Aksh Garg Profile
Aksh Garg

@AkshGarg03

1,392 Followers · 277 Following · 23 Media · 176 Statuses

CS @stanford | DL @sail | Ex @point72, @tesla, @spacex, @deshaw

Joined January 2022
Pinned Tweet
@AkshGarg03
Aksh Garg
1 month
(1/5) @CKT_Conner , @dill_pkl , @emilyzsh , and I are excited to introduce Shard - a proof-of-concept for an infinitely scalable distributed system composed of consumer hardware for training and running ML models! Features: - Data + Pipeline Parallel for handling arbitrarily large
22
31
200
@AkshGarg03
Aksh Garg
2 months
liked by @karpathy i’ve officially made it in life 😇
Tweet media one
@AkshGarg03
Aksh Garg
2 months
stanford midterms be like #cs231n @karpathy
Tweet media one
4
5
208
7
0
422
@AkshGarg03
Aksh Garg
21 days
Re Llama3V: First of all, we want to apologize to the original authors of MiniCPM. We wanted Mustafa to make the original statement but have been unable to contact him since yesterday. @siddrrsh and I posted Llama3V with @mustafaaljadery . Mustafa wrote the code for the project.
@yangzhizheng1
PrimerYang
23 days
Shocked! The Llama3-V project from a Stanford team plagiarized a lot from MiniCPM-Llama3-V 2.5! Its code is a reformatting of MiniCPM-Llama3-V 2.5, and the model's behavior is highly similar to a noised version of the MiniCPM-Llama3-V 2.5 checkpoint. Evidence:
Tweet media one
Tweet media two
Tweet media three
36
171
895
92
58
260
@AkshGarg03
Aksh Garg
2 months
stanford midterms be like #cs231n @karpathy
Tweet media one
4
5
208
@AkshGarg03
Aksh Garg
1 month
1/ As promised, here's my thesis on the future of decentralized training of foundation models. Covers: 1) why decentralized makes sense from scaling, margins, and marketplace lenses 2) challenges 3) exciting enabling research shifts In long form at:
13
21
171
@AkshGarg03
Aksh Garg
1 month
1/ @SohamGovande , @jameszhou02 , @jzhou891 and I spent the weekend building PodPlex: A platform for distributed training & serverless inference at scale I'm very glad to say that we left $10,000 GPU credits richer and 36 hours of sleep poorer more details in 🧵
11
14
136
@AkshGarg03
Aksh Garg
2 months
Releasing Gemma with a 10M context window! We feature: • 1250x context size • Local execution on <32GB of RAM • Infini-attention Check us out on: • 🤗: • GitHub: • Technical blog:
@siddrrsh
Siddharth Sharma
2 months
Introducing Gemma with a 10M context window We feature: • 1250x context length of base Gemma • Requires less than 32GB of memory • Infini-attention + activation compression Check us out on: • 🤗: • GitHub: • Technical
Tweet media one
44
150
1K
12
10
133
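The model, repo, and blog links are elided above, so purely as an illustration of the mechanism the announcement names, here is a minimal, single-head sketch of an Infini-attention-style compressive memory: an elu+1 feature map, an associative matrix M, and a normalizer z, folded in once per segment so attention can reach past segments without keeping their KV cache. The class, dimensions, and toy loop are assumptions for illustration, not the released Gemma-10M code, and the activation compression mentioned in the quoted tweet is not shown.

```python
import torch

def elu_plus_one(x):
    # Nonnegative feature map used by Infini-attention-style linear attention.
    return torch.nn.functional.elu(x) + 1.0

class CompressiveMemory:
    """Running key-value summary over past segments (simplified, single head)."""

    def __init__(self, d_key, d_value):
        self.M = torch.zeros(d_key, d_value)   # associative memory
        self.z = torch.zeros(d_key)            # normalization term

    def retrieve(self, q):
        # q: (seq, d_key) -> memory readout (seq, d_value)
        sigma_q = elu_plus_one(q)
        return (sigma_q @ self.M) / (sigma_q @ self.z).clamp(min=1e-6).unsqueeze(-1)

    def update(self, k, v):
        # Fold the current segment's keys/values into the running summary.
        sigma_k = elu_plus_one(k)
        self.M = self.M + sigma_k.T @ v
        self.z = self.z + sigma_k.sum(dim=0)

# Toy usage: stream a long sequence through in fixed-size segments.
d = 64
mem = CompressiveMemory(d, d)
for segment in torch.randn(10, 128, d):          # 10 segments of 128 tokens
    q, k, v = segment, segment, segment          # stand-ins for projected q/k/v
    past_context = mem.retrieve(q)               # attend over everything seen so far
    mem.update(k, v)
```

Because the memory has fixed size regardless of how many segments have passed, the memory footprint stays bounded while the effective context keeps growing, which is the property the "10M context in <32GB" claim relies on.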
@AkshGarg03
Aksh Garg
1 month
Despite 43 million blind people worldwide, current assistive technologies such as screen readers and braille displays remain extremely expensive and limited. We’re hoping to change that with ApolloVision, a multimodal software layer that can help blind and visually impaired people
2
5
41
@AkshGarg03
Aksh Garg
2 months
first page of 🤗 in <24 hours!! And ahead of Whisper, Llama3-70B, and Phi-3 🤯. Really excited to see how the community interacts with the model. If you haven't seen it yet, 10M context window in <32GB RAM. See below 👇
Tweet media one
@AkshGarg03
Aksh Garg
2 months
Releasing Gemma with a 10M context window! We feature: • 1250x context size • Local execution on <32GB of RAM • Infini-attention Check us out on: • 🤗: • GitHub: • Technical blog:
12
10
133
2
3
37
@AkshGarg03
Aksh Garg
2 months
@karpathy you should check out our work 10M context window Gemma-2B with <32 GB RAM (pinned on profile) - maybe a reason to retweet is close ahead :)
2
0
28
@AkshGarg03
Aksh Garg
1 month
the number of start-ups that died today is absurd…
3
0
20
@AkshGarg03
Aksh Garg
1 month
the stars are aligned when your 149th post randomly happened to be a medium article on parallel computing #CS149 (Parallel Computing)
@AkshGarg03
Aksh Garg
1 month
1/ As promised, here's my thesis on the future of decentralized training of foundation models. Covers: 1) why decentralized makes sense from scaling, margins, and marketplace lenses 2) challenges 3) exciting enabling research shifts In long form at:
13
21
171
2
1
18
@AkshGarg03
Aksh Garg
1 month
first andrej and now @JeffDean , can this day get any better
@siddrrsh
Siddharth Sharma
1 month
That feeling when one of your heroes @JeffDean likes your tweet 🥲
Tweet media one
1
0
15
0
0
17
@AkshGarg03
Aksh Garg
2 months
awesome experience building out the first ever AI orchestration framework for distributing devins w/ @SohamGovande @brendanm0407 @sahiladhawade huge thanks to @cognition_labs , @mercor_ai , @EtchedInc , and prod for hosting check out our demo 👇
@SohamGovande
soham
2 months
had a LOT of fun building d3n at the @cognition_labs hackathon! d3n is an AI agent orchestration framework that can spawn a fleet of devins to solve distributed problems here's how it works 👇 built with @AkshGarg03 @brendanm0407 @sahiladhawade
8
9
86
6
7
17
@AkshGarg03
Aksh Garg
2 months
It’s a warm day at Stanford! ☀️ Stay hydrated, everyone. #StanfordSun
@itsandrewgao
andrew gao
2 months
It's a rainy day at Stanford! 🌧️ Stay dry, everyone. #StanfordRain
2
0
5
0
1
13
@AkshGarg03
Aksh Garg
1 month
we’re in this paradigm shift where technical ppl who once thought ethics was boring (including me) have started to fear and question unbounded AI development. With the amount of influence OpenAI has, it’s very scary to imagine the implications of safety not being the first
@janleike
Jan Leike
1 month
OpenAI must become a safety-first AGI company.
104
291
4K
3
0
12
@AkshGarg03
Aksh Garg
2 months
we're collecting project ideas for D3N to try out!! have interesting devin projects you want to try out? reply below and we'll select the most upvoted/interesting ideas to send to devin Bonus points if the ideas are naturally distributed or parallelizable
@SohamGovande
soham
2 months
FOR A LIMITED TIME: we'll solve your github issues using our d3n orchestrator! it'll spawn fleets of Devins to solve your issues concurrently. comment below your favorite repo for us to try it out on 👇 (public repos only)
2
0
5
7
1
11
@AkshGarg03
Aksh Garg
1 month
(5/5) We’d also like to recognize @ce_zhang and @togethercompute for their support and @sean_t_strong , @DanielleJing , @arasha , @ShravanGReddy , and @Arpan_Shah_ from @pearvc for awarding us the best startup prize at @hackwithtrees ! We really appreciate the vote of confidence and
0
0
9
@AkshGarg03
Aksh Garg
2 months
oops apparently I forgot to say we won 1st place 🥇
@AkshGarg03
Aksh Garg
2 months
awesome experience building out the first ever AI orchestration framework for distributing devins w/ @SohamGovande @brendanm0407 @sahiladhawade huge thanks to @cognition_labs , @mercor_ai , @EtchedInc , and prod for hosting check out our demo 👇
6
7
17
3
0
10
@AkshGarg03
Aksh Garg
1 month
(4/5) We’re optimistic about a decentralized future, unlocking orders of magnitude more compute for AI. We’re very grateful to the efforts by @ce_zhang , @Hades317 , @Yong_jun_He , @jaredq_ , and @awnihannun on the distributed training and optimized edge inference fronts,
2
0
8
@AkshGarg03
Aksh Garg
2 months
@kautilya_p @karpathy existing model with recurrent infini-attention! more details here:
0
1
9
@AkshGarg03
Aksh Garg
2 months
worth it for the socks @SohamGovande
2
0
8
@AkshGarg03
Aksh Garg
1 month
(3/5) Our load balancing algorithm partitions the full network into sub-shards that fit in consumer device memory. Before the first pass, architecture + device profiling across all devices is performed to ensure data parallelism across weaker devices, reducing pipeline bottlenecks.
Tweet media one
1
0
8
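Shard's repos are still private (per a later reply in this timeline), so the following is only a toy sketch of the kind of memory-proportional partitioning the tweet describes: profile each device's free memory, then greedily assign contiguous layer blocks so each sub-shard fits its host. The device names, function name, and sizing are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    free_mem_gb: float   # measured during the profiling pass

def partition_layers(layer_sizes_gb, devices):
    """Greedy partition of a model's layers into contiguous sub-shards,
    one per device, sized proportionally to each device's free memory."""
    total_mem = sum(d.free_mem_gb for d in devices)
    total_model = sum(layer_sizes_gb)
    shards, start = [], 0
    for i, dev in enumerate(devices):
        # Budget for this device, proportional to its share of total memory.
        budget = total_model * dev.free_mem_gb / total_mem
        end, used = start, 0.0
        while end < len(layer_sizes_gb) and (used + layer_sizes_gb[end] <= budget or end == start):
            used += layer_sizes_gb[end]
            end += 1
        if i == len(devices) - 1:      # last device absorbs any remainder
            end = len(layer_sizes_gb)
        shards.append((dev.name, list(range(start, end))))
        start = end
    return shards

# Toy example: a 24-layer model spread over three uneven consumer devices.
layers = [0.9] * 24
fleet = [Device("macbook-m2", 16), Device("gaming-pc-4090", 24), Device("old-laptop", 8)]
print(partition_layers(layers, fleet))
```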
@AkshGarg03
Aksh Garg
1 month
really excited about a highly parallelized and distributed future - and curious if ppl have seen interesting movements in the space recently long thread with my thesis to come soon
2
0
8
@AkshGarg03
Aksh Garg
1 month
went from gpu poor to gpu rich in a week
2
0
8
@AkshGarg03
Aksh Garg
2 months
insane team (+ cognition) @cognition_labs
Tweet media one
2
2
8
@AkshGarg03
Aksh Garg
1 month
(2/5) Users add their devices to the network with 1 terminal command. On the client side, people see all devices currently hosted on the platform. They can then upload their model architecture and dataset and select to train on specific devices or be automatically assigned.
Tweet media one
1
1
7
@AkshGarg03
Aksh Garg
1 month
time to put this on the vision pro and go swimming through activations
@KyeGomezB
Ky⨋ Gom⨋z (U/ACC) (HIRING)
1 month
Nobody is talking about this right now but Google dropped a CRAZY model interpretation graph tool, to enable you to better understand your models. Check it out, link 👇
6
174
1K
1
0
6
@AkshGarg03
Aksh Garg
1 month
2/ PodPlex extends Shard to support native HF architectures. This enabled us to fine-tune Llama2 on a network of RTX 4090s using fully sharded data parallel.
Tweet media one
1
0
6
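PodPlex itself isn't public, so this is only a generic PyTorch FSDP sketch of the setup the tweet describes: wrap an HF causal LM so parameters, gradients, and optimizer state are sharded across ranks. The model id, learning rate, and single toy step are placeholders, and a real run would also pass an auto_wrap_policy so individual transformer blocks get sharded.

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_finetune.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM, AutoTokenizer

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# FSDP shards parameters, gradients, and optimizer state across ranks,
# so no single card has to hold the full model plus optimizer state.
model = FSDP(model.cuda(), use_orig_params=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(["hello world"] * 4, return_tensors="pt").to(f"cuda:{local_rank}")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
dist.destroy_process_group()
```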
@AkshGarg03
Aksh Garg
1 month
4/ 💰Crypto x AI Arbitrage: Decentralization offers cost savings and higher profits. On leading chains, an Nvidia 4090 might earn $5/day in crypto. For AI workloads, the same GPU can earn ~$17/day. Miners could earn more by shifting their GPUs to AI workloads.
1
0
6
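Taking the tweet's own per-day figures at face value (they are estimates, not measurements), the implied arbitrage works out roughly as follows:

```python
# Back-of-the-envelope version of the numbers in the tweet above.
crypto_per_day = 5.0      # USD / day for an RTX 4090 mining on a leading chain
ai_per_day = 17.0         # USD / day renting the same card for AI workloads

uplift = ai_per_day / crypto_per_day
yearly_delta = (ai_per_day - crypto_per_day) * 365

print(f"{uplift:.1f}x more per day, ~${yearly_delta:,.0f} extra per GPU per year")
# -> 3.4x more per day, ~$4,380 extra per GPU per year
```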
@AkshGarg03
Aksh Garg
1 month
@awnihannun @CKT_Conner @dill_pkl @emilyzsh yep! For models on MLX community, we use the MLX variants, and when MLX variants aren't there we default to MPS to still activate Mac GPUs. saw awesome speedups, so appreciate your work a ton!! repos still private atm as we clean things up and think thru next steps but hope we
1
1
6
@AkshGarg03
Aksh Garg
2 months
holy shit, RT'd by @cognition_labs we gotta win hackathons more often 🚀
@cognition_labs
Cognition
2 months
1. d3n is an AI agent orchestration framework that can spawn a fleet of Devin instances to tackle distributed problems in parallel
3
4
80
1
0
6
@AkshGarg03
Aksh Garg
1 month
9/ Had a really awesome time building out a feature update for Shard this past weekend with @SohamGovande @jameszhou02 @JerryZhou50238 Release coming soon.
0
0
6
@AkshGarg03
Aksh Garg
1 month
3/ We adopt a 3-stage approach for rapid iteration. 1) train models in a distributed fashion 2) offload the checkpoints to parallelized @runpod_io serverless inference endpoints 3) evaluate model strengths/weaknesses via Nomic Atlas
Tweet media one
1
0
4
@AkshGarg03
Aksh Garg
2 months
days when ur sick and tired, but still want to keep building can’t tell if this is passion or paranoia
1
0
5
@AkshGarg03
Aksh Garg
1 month
7/ 📡 Communication Optimization: Techniques like DiLoCo, DiPaCo, and Cocktail-SGD are reducing the information shared over slower internet connections. For example, Cocktail-SGD shows only a 1.2x reduction in training speed with a 500Mbps connection, compared to data-center
1
1
4
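A minimal way to see the communication pattern these methods share: each worker takes many local optimizer steps and only exchanges parameters at sparse synchronization points. The sketch below simulates two workers in a single process and simply averages parameters at each sync; DiLoCo proper feeds the parameter deltas to an outer optimizer and Cocktail-SGD adds compression, but the reduced-communication structure is the same.

```python
import copy
import torch

# Two simulated workers; in a real deployment these would be separate machines
# that only exchange parameters at the sync points below.
torch.manual_seed(0)
reference = torch.nn.Linear(16, 1)
workers = [copy.deepcopy(reference) for _ in range(2)]
opts = [torch.optim.SGD(w.parameters(), lr=0.05) for w in workers]

H = 20  # inner steps between synchronizations; larger H => less communication

for outer_round in range(5):
    # Inner phase: every worker trains locally on its own data, no network traffic.
    for w, opt in zip(workers, opts):
        for _ in range(H):
            x = torch.randn(32, 16)
            y = x.sum(dim=1, keepdim=True)
            loss = torch.nn.functional.mse_loss(w(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Sync phase: the only communication, once every H steps, is an average of
    # the parameters across workers.
    with torch.no_grad():
        for params in zip(*[w.parameters() for w in workers]):
            mean = torch.stack([p.data for p in params]).mean(dim=0)
            for p in params:
                p.data.copy_(mean)
```

With H local steps per sync, the number of exchanges drops by a factor of H compared to synchronizing every step, which is what makes slow home internet connections tolerable.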
@AkshGarg03
Aksh Garg
2 months
huge!! really excited about on-device LLMs
@AdrienBrault
Adrien Brault-Lesage
2 months
Pushed to @ollama ! ollama run adrienbrault/nous-hermes2pro-llama3-8b:q4_K_M --format json 'solar system as json'
4
9
55
0
0
5
@AkshGarg03
Aksh Garg
1 month
we said an update to Shard was coming soon...stay tuned for tomorrow's release
@AkshGarg03
Aksh Garg
1 month
(1/5) @CKT_Conner , @dill_pkl , @emilyzsh , and I are excited to introduce Shard - a proof-of-concept for an infinitely scalable distributed system composed of consumer hardware for training and running ML models! Features: - Data + Pipeline Parallel for handling arbitrarily large
22
31
200
0
0
5
@AkshGarg03
Aksh Garg
1 month
2/ Decentralized AI training could revolutionize computing. As communication and compression improve, we may either see a continuation of centralized clusters or a major unlock in available compute for AI training. Here’s why decentralized AI makes sense:
1
0
5
@AkshGarg03
Aksh Garg
1 month
In its current version, users can use their phone/laptop to ask questions about and interact with their environment in seconds. Next, we’re hoping to embed the software onto smart glasses like Meta’s or Frame. If you have app dev/hardware experience, shoot me a dm.
0
0
5
@AkshGarg03
Aksh Garg
2 months
quant firm: 1) Scraper: compiles market data 2) QR Devin: writes models to convert market data into alpha 3) Trader Devin: uses models to inform strategies 4) Dev Devin: converts QR models to high-frequency, systematic strategies
0
1
5
@AkshGarg03
Aksh Garg
1 month
9/ Huge thank you to the entire @runpod_io team for organizing the hackathon! We had a ton of fun building PodPlex out!!
1
0
4
@AkshGarg03
Aksh Garg
1 month
8/ Finally, we pass our eval data to nomic for understanding model strengths/weaknesses and hard-mining challenging samples to train more on.
Tweet media one
2
0
4
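The Nomic Atlas workflow itself isn't shown here, so the following is only a generic hard-example-mining sketch of the idea: score each eval sample by its loss and keep the hardest slice for further training. The model, data, and top_fraction are toy placeholders.

```python
import torch

def mine_hard_examples(model, dataset, top_fraction=0.2):
    """Rank samples by per-example loss and return the hardest slice,
    i.e. the ones worth oversampling in the next training round."""
    losses = []
    model.eval()
    with torch.no_grad():
        for x, y in dataset:
            logits = model(x.unsqueeze(0))
            loss = torch.nn.functional.cross_entropy(logits, y.unsqueeze(0))
            losses.append(loss.item())
    k = max(1, int(len(losses) * top_fraction))
    hardest = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]
    return hardest   # indices into the dataset

# Toy usage with a throwaway classifier and random data.
model = torch.nn.Linear(10, 3)
data = [(torch.randn(10), torch.randint(0, 3, ())) for _ in range(100)]
hard_idx = mine_hard_examples(model, data)
print(f"re-training on {len(hard_idx)} hardest samples:", hard_idx[:5])
```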
@AkshGarg03
Aksh Garg
1 month
on the one hand, I appreciate @janleike and @ilyasut's departure on these grounds; at the same time, however, I fear that these ppl leaving means even less pushback on the priorities
1
0
3
@AkshGarg03
Aksh Garg
1 month
3/ 🔓Unlocking Scale: Training larger models demands more data and compute. The problem isn't GPU supply but under-utilization. Ethereum’s blockchain used up to 40 million GPUs, far exceeding the 48K GPUs for Llama3 and 25K for GPT-4. Imagine harnessing this power for AI.
1
0
4
@AkshGarg03
Aksh Garg
1 month
Interesting insight from a friend today… People fear AGI bc it doesn't conform to our current economic models/understanding. However, that doesn't mean it won’t conform to any. Instead of fitting current models to it, we really need to think outside the box
0
0
4
@AkshGarg03
Aksh Garg
1 month
6/ 🚧 Challenges: Decentralized training is complex due to networking bottlenecks and heterogeneous hardware. Solutions like fully-sharded data parallel and pipeline parallel help distribute large models across smaller nodes, optimizing training efficiency.
1
0
4
@AkshGarg03
Aksh Garg
2 months
a full software engineering organization: 1) PM devin (Pam) for high-level product vision 2) Research devin (Steve) for quick python prototyping 3) C++ devin (Cynthia) for high-frequency, optimized execution
1
1
4
@AkshGarg03
Aksh Garg
1 month
5/ From here, we automatically orchestrate a fleet of Runpod pods for training this model via a custom Docker image.
Tweet media one
2
0
3
@AkshGarg03
Aksh Garg
1 month
@_Borriss_ @karpathy something to look forward to
0
0
2
@AkshGarg03
Aksh Garg
1 month
5/ 🌐 New Marketplaces: Imagine earning money while your computer is idle. Decentralized AI allows your devices to contribute to AI training, creating passive income. Connecting 5 Macs can offer similar memory to an A100 at a fraction of the cost.
2
0
3
@AkshGarg03
Aksh Garg
1 month
8/ What’s Next? Decentralized computing is likely 2-3 years away but offers early opportunities. I'm optimistic that tools like Shard will build solutions to integrate these advancements. Exciting times are ahead for decentralized AI training!
1
0
3
@AkshGarg03
Aksh Garg
1 month
@SohamGovande doesn't need to worry about buying vanilla lattes anymore @siddrrsh
Tweet media one
1
0
3
@AkshGarg03
Aksh Garg
2 months
why choose between crypto and ai when u can do both
Tweet media one
0
0
3
@AkshGarg03
Aksh Garg
2 months
something big is on the way
0
0
3
@AkshGarg03
Aksh Garg
1 month
@lichuacu @kolelee_ magician extraordinaire @kolelee_
0
0
3
@AkshGarg03
Aksh Garg
1 month
awesome writeup by @AaryanSinghal4 , @bfspector , @simran_s_arora , and @HazyResearch on GPU optimizations: also go check out ThunderKittens 👇
@AaryanSinghal4
Aaryan Singhal
1 month
Super excited to be introducing TK! Our embedded DSL exposes fundamental hardware abstractions that allow you to build the AI kernels that GPUs want to run. Had so much fun working on this with @bfspector , @simran_s_arora , and folks at @HazyResearch 🚀 Repo:
8
11
37
0
0
3
@AkshGarg03
Aksh Garg
1 month
0
0
2
@AkshGarg03
Aksh Garg
2 months
@cognition_labs ' devin just successfully pulled an issue in a huge repo I provided, fixed bugs in it, installed missing packages, and wrote unit tests for it. this is real stuff...
Tweet media one
0
0
2
@AkshGarg03
Aksh Garg
2 months
@SohamGovande thanks bro - infinite context devin next?? @cognition_labs
0
0
1
@AkshGarg03
Aksh Garg
2 months
some ideas we've thought of to kick off brainstorming 👇
1
0
2
@AkshGarg03
Aksh Garg
2 months
@zodia1221 @SohamGovande @brendanm0407 @sahiladhawade @cognition_labs while we would love to take credit, that was unfortunately not our coin lmao
0
0
2
@AkshGarg03
Aksh Garg
2 months
got interesting ideas you want us to explore? comment below and let’s get our devin fleet cooking 🏎️
0
0
2
@AkshGarg03
Aksh Garg
2 months
marketing agency: 1) CEO: picks general market (education/AI) ⬇️ 2) Researchers: collect data about the industry ⬇️ 3) Markets: call individual people with personalized info + market knowledge
1
0
2
@AkshGarg03
Aksh Garg
2 months
@alexandr_wang sounds like the most poetic way of saying don't do quant
1
0
1
@AkshGarg03
Aksh Garg
2 months
@fchollet if we can't expand the manifold but just smoothen it out, how should we think about model improvement when public data runs out?
0
0
2
@AkshGarg03
Aksh Garg
2 months
@ellenjxu_ @cognition_labs @mercor_ai built diff for hacking this together in <24 hrs!! congrats @ellenjxu_
0
0
2
@AkshGarg03
Aksh Garg
1 month
7/ We load the model checkpoints and spin up docker containers for parallelized serverless inference across multiple endpoints, >10x’ing inference speed for rapid iteration.
Tweet media one
1
0
2
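As an illustration of the fan-out pattern (not the actual PodPlex code, and with placeholder endpoint URLs), a batch of prompts can be spread across several serverless inference endpoints concurrently so the batch costs roughly one round trip instead of many:

```python
import asyncio
import aiohttp

# Placeholder endpoint URLs -- substitute whatever serverless inference
# endpoints you have deployed (the tweet's actual endpoints aren't public).
ENDPOINTS = [
    "https://endpoint-0.example.com/generate",
    "https://endpoint-1.example.com/generate",
    "https://endpoint-2.example.com/generate",
]

async def query(session, url, prompt):
    async with session.post(url, json={"prompt": prompt}) as resp:
        return await resp.json()

async def fan_out(prompts):
    # Round-robin prompts across endpoints and await them all concurrently.
    async with aiohttp.ClientSession() as session:
        tasks = [
            query(session, ENDPOINTS[i % len(ENDPOINTS)], p)
            for i, p in enumerate(prompts)
        ]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    results = asyncio.run(fan_out([f"eval prompt {i}" for i in range(12)]))
    print(len(results), "responses")
```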
@AkshGarg03
Aksh Garg
2 months
0
0
2
@AkshGarg03
Aksh Garg
2 months
heavy finance focus in replies lol 😂 - silicon valley x wall street crossover imminent
@dill_pkl
Dylan Lim
2 months
@AkshGarg03 AI Financial Advisory Service: 1) Advisor Devin personalizes investment strategies. 2) Risk Manager Devin assesses and mitigates financial risks. 3) Market Analyst Devin forecasts market trends using AI.
0
0
1
1
0
2
@AkshGarg03
Aksh Garg
1 month
4/ To get started, users simply provide the path to their HF model (hacky and supports a select few models for now) and the number of GPUs they want to distribute across
Tweet media one
1
0
2
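The actual PodPlex entry point isn't public; purely as an illustration of the two-argument interface the tweet describes (the flag names and script are invented), it might look like this:

```python
import argparse

# Hypothetical launcher sketch: one HF model path, one GPU count.
parser = argparse.ArgumentParser(description="distribute fine-tuning of an HF model")
parser.add_argument("--model", required=True, help="Hugging Face model id or local path")
parser.add_argument("--num-gpus", type=int, default=2, help="GPUs to shard the run across")
args = parser.parse_args()

print(f"launching {args.model} across {args.num_gpus} GPUs")
# e.g. python launch.py --model meta-llama/Llama-2-7b-hf --num-gpus 8
```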
@AkshGarg03
Aksh Garg
2 months
@ishanskhare good thing @AllenNaliath has already found it
0
0
1
@AkshGarg03
Aksh Garg
1 month
if gpt4o was free, i'm confident OpenAI has stronger/more revolutionary models coming...but i feel my excitement for the tech revolution dampened by long-term ramifications
1
0
1
@AkshGarg03
Aksh Garg
1 month
0
0
1
@AkshGarg03
Aksh Garg
2 months
0
0
1
@AkshGarg03
Aksh Garg
1 month
6/ We reduce costs by 76% (benchmarked on 4090s) by using fault-tolerant training with checkpointing, which allows us to tap into less reliable spot machines on the community cloud.
Tweet media one
1
0
1
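A minimal sketch of the checkpoint-and-resume loop that makes preemptible spot machines viable (the paths, interval, and toy model are assumptions, not PodPlex's actual implementation): save state every N steps, and on restart load the newest checkpoint so a preemption costs at most N steps of work.

```python
import os
import glob
import torch

CKPT_DIR = "checkpoints"          # assumed location; any durable volume works
SAVE_EVERY = 100                  # steps between checkpoints

model = torch.nn.Linear(32, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# On (re)start, resume from the newest checkpoint if one exists.
start_step = 0
ckpts = sorted(glob.glob(os.path.join(CKPT_DIR, "step_*.pt")))
if ckpts:
    state = torch.load(ckpts[-1])
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

os.makedirs(CKPT_DIR, exist_ok=True)
for step in range(start_step, 1000):
    x = torch.randn(64, 32)
    loss = torch.nn.functional.mse_loss(model(x), x.mean(dim=1, keepdim=True))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % SAVE_EVERY == 0:
        torch.save(
            {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
            os.path.join(CKPT_DIR, f"step_{step:06d}.pt"),
        )
```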
@AkshGarg03
Aksh Garg
1 month
@drsarfati @CKT_Conner @dill_pkl @emilyzsh @SaladTech that would be awesome - we would love to be onboarded to try it out
0
0
1
@AkshGarg03
Aksh Garg
2 months
@ismaelitovega we've been pretty compute bound so haven't gotten to training big models with benchmarks yet, but it's on the to-do. stay tuned for more!!
0
0
1
@AkshGarg03
Aksh Garg
25 days
@annietma very aesthetic!
0
0
1
@AkshGarg03
Aksh Garg
2 months
@maxjacob_me if only we started this cheap, wearable AI wouldn't have a bad rep
0
0
1
@AkshGarg03
Aksh Garg
2 months
@evolvingstuff @karpathy we'll watch and see - i'm optimistic about cool things coming ahead :)
0
0
1
@AkshGarg03
Aksh Garg
1 month
awesome work, looking forward to a world with much lower entry barriers to CUDA
@VictorTaelin
Taelin
1 month
RELEASE DAY After almost 10 years of hard work, tireless research, and a dive deep into the kernels of computer science, I finally realized a dream: running a high-level language on GPUs. And I'm giving it to the world! Bend compiles modern programming features, including: -
455
2K
14K
0
0
1
@AkshGarg03
Aksh Garg
2 months
@brendanm0407 and by agent we mean devin. and by devin we mean d3n
1
0
1
@AkshGarg03
Aksh Garg
2 months
@iam_abdulR happiness
1
0
1
@AkshGarg03
Aksh Garg
2 months
@awnihannun this is awesome, do the perf gains also generalize to llama3? will test out tn
1
0
1
@AkshGarg03
Aksh Garg
2 months
@siddrrsh homies on a 1bit quantization with a black and white future
0
0
1
@AkshGarg03
Aksh Garg
2 months
super hyped!!
@0x_Cryptoyang
Cryptoyoung 🦇🔊
2 months
@AkshGarg03 Great job 🧑‍💻! We are planning to add Gemma to the LLMs Arena ⚔️, where community users will vote. Let's see how it performs.
0
0
2
0
0
1
@AkshGarg03
Aksh Garg
2 months
@pledgecoinsol @brendanm0407 pledge coin or devin coin tho?
1
0
1
@AkshGarg03
Aksh Garg
2 months
@wltnwang cool shit
0
0
1
@AkshGarg03
Aksh Garg
2 months
0
0
1
@AkshGarg03
Aksh Garg
2 months
@AllenNaliath @sama most expensive skin ever
0
0
1