Jonathan Frankle

@jefrankle

16,152
Followers
685
Following
238
Media
3,152
Statuses

Chief AI Scientist @Databricks via MosaicML. Leading @DbrxMosaicAI. PhD @MIT_CSAIL. BS/MS @PrincetonCS. DC area native. Making AI efficient for everyone.

P{NY=.6, SF=.2, DC=.1, BOS=.1}
Joined December 2013
Pinned Tweet
@jefrankle
Jonathan Frankle
2 months
Meet DBRX, a new SOTA open LLM from @databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
Tweet media one
34
265
1K
@jefrankle
Jonathan Frankle
4 years
I just open-sourced my codebase for research on neural network pruning, the Lottery Ticket Hypothesis, and other topics in deep learning. It's written in PyTorch and designed to make it easy to add new models, datasets, and experiments. Check it out:
12
261
1K
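As background for the codebase above: a minimal, generic PyTorch sketch of one round of iterative magnitude pruning with weight rewinding, the procedure at the heart of the Lottery Ticket Hypothesis work. This is not code from the linked repository; the `train` callable and the per-layer 20% pruning rate are placeholder assumptions.

```python
# Generic sketch: train, prune the smallest-magnitude weights per layer,
# then rewind the surviving weights to their original initialization.
import copy
import torch
import torch.nn as nn

def magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """Per-layer binary masks that zero out the smallest-magnitude weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:          # skip biases / norm params, as is conventional
            continue
        k = int(param.numel() * sparsity)
        if k == 0:
            masks[name] = torch.ones_like(param)
            continue
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def lottery_ticket_round(model: nn.Module, train, sparsity: float = 0.2):
    """One pruning round: train, mask, rewind survivors to init."""
    init_state = copy.deepcopy(model.state_dict())   # save original initialization
    train(model)                                      # placeholder training loop
    masks = magnitude_masks(model, sparsity)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.copy_(init_state[name] * masks[name])  # rewind + apply mask
    # During retraining, the mask must be re-applied (e.g. after each optimizer
    # step) to keep pruned weights at zero.
    return model, masks
```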
@jefrankle
Jonathan Frankle
1 year
MPT is here! Check out our shiny new LLMs, open-source w/commercial license. The base MPT-7B model is 7B params trained on 1T tokens and reaches LLaMA-7B quality. We also created Instruct (commercial), Chat, and (my favorite) StoryWriter-65k+ variants. 🧵
28
162
779
@jefrankle
Jonathan Frankle
11 months
MPT-30B is here! Same MPT architecture, 30B parameters, > 1T tokens, 8k context window, trained on H100s, great perf (esp on coding), single-GPU inference, commercially usable, and massively upgraded instruct and chat datasets. Take it for a spin!
24
115
665
@jefrankle
Jonathan Frankle
1 year
I defended today, and @mcarbin was kind enough to pass me. My favorite part of the thesis is a ground-up rewrite of the original Lottery Ticket Hypothesis paper with fresh data and a narrative that benefits from four years of hindsight/maturity. Coming soon to an arxiv near you!
42
15
527
@jefrankle
Jonathan Frankle
1 year
72 hrs ago, @togethercompute released the RedPajama dataset. Like everyone, we at @MosaicML were very excited about the idea of a fully open-source Llama. So excited, in fact, that we've already trained a 1B model on 200B tokens! It's on HF (Apache2) here:
13
82
485
@jefrankle
Jonathan Frankle
11 months
I'm absolutely thrilled that @MosaicML has agreed to join @databricks as we continue on our journey to make the latest advances in deep learning efficient and accessible for everyone. The best of MosaicML is yet to come 🎉🎉🎉
@alighodsi
Ali Ghodsi
11 months
Big news: we've agreed to acquire @MosaicML, a leading generative AI platform. I couldn’t be more excited to join forces once the deal closes.
36
212
1K
47
22
474
@jefrankle
Jonathan Frankle
1 year
For those interested, my dissertation is now available. The highlight is that I re-did the original Lottery Ticket Hypothesis paper from scratch (Chapter 3). It follows the same path as the original, but with years of context/maturity + a new experiment 🧵
Tweet media one
5
55
415
@jefrankle
Jonathan Frankle
3 years
I guess the word is out! I'll be joining the @Harvard faculty in the fall of 2023 as part of an amazing cohort of new machine learning professors. Looking forward to sharing more about my lab, how to join, and everything we're building at @hseas when I'm a bit closer to arriving!
@boazbaraktcs
Boaz Barak
3 years
1/21 Banner year for Harvard CS! New hires include Sham Kakade @ShamKakade6 and Fernanda Viegas @viegasf (joining @wattenberg ), as well as David Alvarez-Melis, Anurag Anshu @AnuragAnshu4, Sitan Chen, and Jonathan Frankle @jefrankle
Tweet media one
5
10
198
37
11
407
@jefrankle
Jonathan Frankle
3 years
Reviewer 3 has very strong opinions on BatchNorm.
Tweet media one
7
13
395
@jefrankle
Jonathan Frankle
2 years
TLDR: Announcing 🌟COMPOSER🌟, a PyTorch trainer for efficient training *algorithmically*. Train 2x-4x faster on standard ML tasks, a taste of what's coming from @MosaicML. Star it, 𝚙𝚒𝚙 𝚒𝚗𝚜𝚝𝚊𝚕𝚕 𝚖𝚘𝚜𝚊𝚒𝚌𝚖𝚕, contribute, be efficient! Thread:
8
79
383
@jefrankle
Jonathan Frankle
2 years
Introducing the *Mosaic ResNet*, a new take on a CV workhorse that sets SOTA for efficiency at any ImageNet accuracy. The recipe uses 12 techniques that change the math of training for a 7x speedup over standard baselines + up to 3.8x over the latest work.
7
69
369
@jefrankle
Jonathan Frankle
4 years
Several methods have recently been proposed for pruning neural networks at initialization. In our new paper (@KDziugaite, @roydanroy, @mcarbin), we rigorously study these methods to determine why they "miss the mark" and underperform pruning after training
Tweet media one
4
89
349
@jefrankle
Jonathan Frankle
3 years
NEW WORKSHOP: Sparsity in Neural Networks: Advancing Understanding and Practice (July 8-9, 2021). This workshop will bring together members of the many communities working on neural network sparsity to share their perspectives and the latest cutting-edge research (Deadline: 6/15)
Tweet media one
4
85
337
@jefrankle
Jonathan Frankle
10 months
My latest weekend project: tossing another 500B tokens of 8k-context data at MPT-7B, thereby creating MPT-7B-8k! 1.5T total tokens, 8k context, waaaaay better performance. When we say speed at @MosaicML, we mean it: it took me three days to train.
7
58
296
@jefrankle
Jonathan Frankle
2 years
LLMs are for everyone! Own a GPT-3 trained on your data rather than renting a GPT-3 trained on a web crawl of Reddit. The price is $450K. Email llm-early-access@mosaicml.com to try it. This is just the start: this doesn't use MosaicML speedups. Our goal is to do this for $100K soon. 🧵
@DbrxMosaicAI
Databricks Mosaic Research
2 years
We have exciting news! In our latest and greatest LLM blog, we show how MosaicML Cloud can help you train LLMs from 1B - 70B parameters, and for the first time, publish transparent times + costs for doing so. It's a lot cheaper than you think! (1/9)
7
48
342
7
36
293
@jefrankle
Jonathan Frankle
1 year
And now it's < $50k. 🖼️ Announcing @MosaicML's diffusion offering 📷 We replicated Stable Diffusion 2.0, training from scratch with huge speedup, and we can do it on your data too. Human eval showed the model to be indistinguishable from the original. Blog:
8
29
285
@jefrankle
Jonathan Frankle
3 months
Hello OLMo! Congrats to the amazing @allen_ai team! 7B params, 2T tokens, open training code, open data, intermediate checkpoints, Apache 2.0, the works. A giant leap for open science. Nicely done @mechanicaldirk, @i_beltagy, @soldni, and so many others!
10
48
284
@jefrankle
Jonathan Frankle
4 years
No matter how established I become, I still feel completely inadequate seeing all the NeurIPS tweets. For all the folks out there who feel similarly, you aren't alone.
7
3
278
@jefrankle
Jonathan Frankle
2 years
@Harvard is investing $500M in ML and neuroscience over the next decade thanks to a gift from @ChanZuckerberg. For my part, this makes it possible to study the foundations of deep learning at scales and depths that are otherwise only accessible in industry.
@ChanZuckerberg
Chan Zuckerberg Initiative
2 years
#AI and #MachineLearning are just beginning to make an impact in biology, and there is more untapped potential. We’re launching the Kempner Institute for the Study of Natural and Artificial Intelligence at @Harvard to bring together these two fields
1
14
58
10
35
267
@jefrankle
Jonathan Frankle
4 years
At ICML next week, @KDziugaite @roydanroy @mcarbin and I will present Linear Mode Connectivity and the Lottery Ticket Hypothesis. We study the effect of SGD noise (like data order) on neural net optimization. Those results shed new light on lottery tickets
Tweet media one
3
54
266
@jefrankle
Jonathan Frankle
1 year
In the last two weeks, @MosaicML had lots of big news: We trained a 1B/200B token LLM on RedPajama in < 72hrs, Replit used us to train a SOTA code model in < 10 days, we trained SD2 for < $50k, long context BERTs, and perf #'s on H100s. But the biggest news is coming this week 👀
6
18
259
@jefrankle
Jonathan Frankle
8 months
I AM SO ANGRY. I won't submit to ACL venues again after they shafted a student after rebuttals with this idiotic policy. Since anonymity is gone, though, publicity time! Check out awesome work by @ZackAnkner on improving MLM training by scheduling masking:
@nsaphra
Naomi Saphra
8 months
Just got a desk reject, post-rebuttals, for a paper being submitted to arxiv <30 min late for the anonymity deadline. I talk about how the ACL embargo policy hurts junior researchers and makes ACL venues less desirable for NLP work. I don’t talk about the pointless NOISE it adds.
28
47
404
10
35
254
@jefrankle
Jonathan Frankle
3 years
Even though we've been doing this for a year, I will never get used to the fact that the only in-person audience members for my job talk are my stuffed animals.
Tweet media one
6
1
255
@jefrankle
Jonathan Frankle
1 year
Curious how the RedPajama effort by @togethercompute is progressing and where it stacks up? We evaluated the 7B model they just released 2h ago! Here is how it looks 800B tokens in. (Eval took 16 minutes on 32 A100s.)
Tweet media one
@togethercompute
Together AI
1 year
The first RedPajama models are here! The 3B and 7B models are now available under Apache 2.0 license, including instruction-tuned and chat versions! This project demonstrates the power of the open-source AI community with many contributors ... 🧵
Tweet media one
19
227
887
11
54
249
@jefrankle
Jonathan Frankle
4 years
@davidjschwab @arimorcos and I have a new paper on BatchNorm. It's not exactly a typical BatchNorm paper: we study the accuracy when freezing all weights at random init and "Training BatchNorm and Only BatchNorm." How did this happen? It's a funny story...
Tweet media one
9
61
249
@jefrankle
Jonathan Frankle
3 years
What happens if you freeze all weights at initialization and train *only* BatchNorm? Turns out that BatchNorm's affine parameters are impressively powerful, and they can use random features to reach surprisingly high accuracy. Find out more at the 12pm ET ICLR poster session!
Tweet media one
9
29
250
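A minimal sketch of the experiment described above, assuming a standard torchvision ResNet; this is not the paper's code, just the generic recipe of freezing everything except BatchNorm's affine parameters.

```python
# Freeze every weight at its random initialization and train only BatchNorm's
# gamma/beta (the running statistics still update in train mode).
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=10)   # random init, e.g. CIFAR-10

for param in model.parameters():
    param.requires_grad = False                        # freeze everything...

bn_params = []
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):             # ...except BatchNorm affine params
        module.weight.requires_grad = True
        module.bias.requires_grad = True
        bn_params += [module.weight, module.bias]

optimizer = torch.optim.SGD(bn_params, lr=0.1, momentum=0.9, weight_decay=5e-4)
# Train as usual; only the BatchNorm parameters receive gradient updates.
```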
@jefrankle
Jonathan Frankle
2 years
This is a big deal - I'm so excited it's finally out! This work convinced me that large models like LLMs are really databases. @OfirPress and co-authors created a way to measure the expressive power of querying languages for these new NN DBs and an awesome new querying language.
@OfirPress
Ofir Press
2 years
We've found a new way to prompt language models that improves their ability to answer complex questions Our Self-ask prompt first has the model ask and answer simpler subquestions. This structure makes it easy to integrate Google Search into an LM. Watch our demo with GPT-3 🧵⬇️
52
307
2K
4
28
228
@jefrankle
Jonathan Frankle
4 years
We just posted our ICLR 2020 paper on "The Early Phase of Neural Network Training" on ArXiv. In the paper, we explore the changes neural networks undergo during the crucial first phase of training using winning lottery tickets.
@arimorcos
Ari Morcos
4 years
Recent studies have suggested that the earliest iterations of DNN training are especially critical. In our #ICLR2020 paper with @jefrankle and @davidjschwab , we use the lottery ticket framework to rigorously examine this crucial phase of training.
Tweet media one
1
42
210
0
26
216
@jefrankle
Jonathan Frankle
2 months
This this this. I don't like to call out papers we can't reproduce because I'm not a fan of making life and career harder for PhD students. But I no longer believe anything if we haven't reproduced it ourselves.
@rajammanabrolu
Prithviraj (Raj) Ammanabrolu
2 months
I'm writing this cause I'm a bit salty. We've implemented so many seemingly promising, published & popular papers only for them to utterly flop. At least I like to think that my personal bs Big Model paper classifier is now pretty good given my extensive training data.
4
1
100
11
19
218
@jefrankle
Jonathan Frankle
2 months
Tired reflection at the end of DBRX release day: Last March 24, @databricks released Dolly. Last May 5, Mosaic released MPT-7B. Less than a year later, we've built an LLM that seems to surpass the original ChatGPT. I am so incredibly proud of our team - you all are amazing ♥️
12
13
214
@jefrankle
Jonathan Frankle
1 year
Two weeks later, Stable Diffusion training cost is already down to $125K, a 22% reduction. Our team is blazingly fast at making training blazingly fast.
Tweet media one
@mvpatel2000
Mihir Patel
1 year
Two weeks ago, we released a blog showing training Stable Diffusion from scratch only costs $160K. Proud to report that blog is already out of date. It now costs 💸 $125K 💸. Stay tuned for more speedup from @MosaicML, coming soon to a diffusion model near you!
Tweet media one
3
17
206
2
15
205
@jefrankle
Jonathan Frankle
2 years
What bullshit. Dear OpenAI researchers: My email address is jonathan@mosaicml.com. We are hiring! We have a healthy culture and no elitism, egos, or divas.
@sama
Sam Altman
2 years
OpenAI’s chief scientist: expresses curiosity/openness about a mysterious idea, caveats with “may”. Meta’s chief AI scientist: the certainty of "nope". Probably explains a lot of the past 5 years. Dear Meta AI researchers: My email address is sama@openai.com. We are hiring!
76
60
1K
6
2
202
@jefrankle
Jonathan Frankle
1 year
Would anybody be interested in a couple dozen 1B, llama-style (waaaay past Chinchilla) language models trained on different data mixes? I don't know if this question has been well-studied before.
34
12
196
@jefrankle
Jonathan Frankle
4 years
Thank you @LastWeekTonight for featuring @ClareAngelyn, @alvarombedoya, and my work on police use of face recognition. For those in the ML community thinking about "broader impact," there are big opportunities to use your expertise to make a difference in the policy world!
Tweet media one
2
26
187
@jefrankle
Jonathan Frankle
1 year
Another NeurIPS, another moment of deep disappointment about the bro culture and sense of entitlement in pockets of the ML community.
8
3
178
@jefrankle
Jonathan Frankle
3 years
So now I need to ask my adviser for an iPhone if I want to participate in the intellectual life of the ML community?
7
7
169
@jefrankle
Jonathan Frankle
2 years
I'm no hardware expert, but - if you need 2x the power and (potentially) 2x the price to 3x the compute - it seems to me that hardware has little or nothing to offer when it comes to getting us out of the jam we're in with giant models. Our solution has to be better algorithms.
18
10
164
@jefrankle
Jonathan Frankle
5 years
I'm thrilled that LTH received a best paper award at ICLR 2019. Stay tuned - more lottery ticket work is on the way!
@iclr_conf
ICLR 2024
5 years
Best Paper Award 1: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Jonathan Frankle · Michael Carbin
Tweet media one
3
105
450
4
16
166
@jefrankle
Jonathan Frankle
1 month
Grateful to @atalwalkar for the chance to present my recent work at CMU today! There are very exciting things happening in industry these days.
Tweet media one
13
3
163
@jefrankle
Jonathan Frankle
1 month
Please welcome MegaBlocks to the Databricks family!
Tweet media one
4
15
153
@jefrankle
Jonathan Frankle
5 months
I'm just along for the ride. This is all Nikhil!
@arankomatsuzaki
Aran Komatsuzaki
5 months
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Modifies the scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand
Tweet media one
4
73
407
3
10
148
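A back-of-the-envelope sketch of the idea in the quoted paper, under the usual approximations (training costs ~6·N·D FLOPs, inference ~2·N FLOPs per token) and a Chinchilla-style loss fit; the constants and candidate sizes below are illustrative assumptions, not the paper's numbers.

```python
# Inference-aware model sizing: at a fixed target quality, heavy inference demand
# pushes the optimum toward a smaller model trained on more tokens.
A, B, E, alpha, beta = 406.4, 410.7, 1.69, 0.34, 0.28   # approximate Chinchilla fit

def tokens_for_quality(n_params: float, target_loss: float) -> float:
    """Pre-training tokens needed for a model of size n_params to hit target_loss."""
    gap = target_loss - E - A / n_params**alpha
    return float("inf") if gap <= 0 else (B / gap) ** (1 / beta)

def total_flops(n_params: float, target_loss: float, inference_tokens: float) -> float:
    d = tokens_for_quality(n_params, target_loss)
    return 6 * n_params * d + 2 * n_params * inference_tokens   # train + serve

# Sweep candidate model sizes and pick the cheapest one overall.
candidates = [1e9 * x for x in (3, 7, 13, 30, 70)]
best = min(candidates,
           key=lambda n: total_flops(n, target_loss=2.0, inference_tokens=1e12))
```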
@jefrankle
Jonathan Frankle
3 years
Even after five years of PhD, I continue to be astounded by the casual, gratuitous cruelty that peers and institutions in academia are capable of inflicting without a second thought.
2
1
151
@jefrankle
Jonathan Frankle
5 years
LATEST NEWS ON THE LOTTERY TICKET HYPOTHESIS: We (@KDziugaite, @roydanroy, and @mcarbin) just released an updated paper showing (1) how to scale the LTH to deeper networks on ImageNet and (2) initial insights into why the LTH works. Check it out on arXiv:
Tweet media one
3
26
146
@jefrankle
Jonathan Frankle
2 months
Usual links to get started with DBRX:
* Code is on GitHub:
* Instruct model is on HF:
* Base model is on HF:
* Playground to interact with the model:
5
25
145
@jefrankle
Jonathan Frankle
10 months
I used to believe that @kchonyc was really three postdocs in a trench coat, having never personally seen physical evidence that he existed. I was excited to finally have my hypothesis refuted this evening. Empiricism at work!
Tweet media one
2
4
142
@jefrankle
Jonathan Frankle
2 months
Move to NYC! We have bagels and culture and public transportation and @srush_nlp and bagels!
@srush_nlp
Sasha Rush (ICLR)
2 months
@jefrankle Everyone should move to NYC and build open language models.
3
13
111
7
8
138
@jefrankle
Jonathan Frankle
2 years
A Sunday walk down memory lane: I found the original drafts of the Lottery Ticket Hypothesis paper this weekend. Links and commentary in this 🧵. You can chart progress of public versions on arXiv v1-v5, but it's especially cool to see the earliest attempts at stating the idea.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
16
135
@jefrankle
Jonathan Frankle
2 months
For the first time in my life, I have a practical need for the lottery ticket hypothesis. My time has come.
8
3
136
@jefrankle
Jonathan Frankle
2 years
Today is the third time I've personally found plagiarism during ML reviewing in the past year-ish. I'm seeing more papers now that I'm an AC, but it's still a change. I'm not even trying hard; I'm just checking passages that sound strangely familiar, and I'm right every time.
9
3
135
@jefrankle
Jonathan Frankle
5 years
Just released our new paper about "The Lottery Ticket Hypothesis at Scale" (with Gintare Karolina Dziugaite, @roydanroy, and @mcarbin), extending our prior work to find small trainable subnetworks within deeper, state-of-the-art neural networks.
Tweet media one
2
36
132
@jefrankle
Jonathan Frankle
2 months
New blog by @mvpatel2000 with big updates to our LLM stack and a new recipe for blazingly fast training. FP8 + Configurable ActCkpt + DTensor + Hybrid Sharding + Comm/Act Compression = 700+ TFLOPs on H100s and linear scaling.
Tweet media one
10
17
132
@jefrankle
Jonathan Frankle
11 months
You asked, we delivered. Hello MPT-30B! (Anybody wanna ask for 65B?)
@code_star
Cody Blakeney
11 months
Boy, was everyone asking for it. 30B wen? 30B now!
Tweet media one
Tweet media two
Tweet media three
2
0
14
20
6
127
@jefrankle
Jonathan Frankle
1 year
Very excited to partner with @allen_ai on this incredible project. It's not every day you get to work with the best of the best on what will soon be the best open-source model in the world ⚔️
3
15
128
@jefrankle
Jonathan Frankle
3 years
Repeating my offer from the @MLRetrospective panel today: the ML community desperately needs a survey track (like IEEE S&P SoK). I will happily volunteer to do the work to create/run this if any chairs of @NeuripsConf, @iclr_conf, or @icmlconf are interested
6
8
123
@jefrankle
Jonathan Frankle
3 years
On the job market this year, I was often asked what I considered to be my most impactful piece of research. My answer was always The Perpetual Lineup. The lottery ticket hypothesis affected the lives of grad students. The Perpetual Lineup affected the lives of everyday people.
@GeorgetownCPT
Georgetown Privacy
3 years
1/ 5 years ago today, we released #ThePerpetualLineup, the first-of-its-kind survey of state and local police use of face recognition technology, based on 100 public records requests yielding 16,000+ pages.
3
35
65
1
20
121
@jefrankle
Jonathan Frankle
1 year
Time for my usual refrain: Most papers weren't accepted to ICLR, and don't let Twitter fool you into thinking otherwise. Plenty of smart people and great papers didn't get the outcome they wanted, and you're in very good company if that's you right now.
0
10
120
@jefrankle
Jonathan Frankle
3 years
@tomgoldsteincs We got in trouble with GCP support for naming our GPUs "Bitcoin miner #27"
1
2
119
@jefrankle
Jonathan Frankle
11 months
I've been reading @matei_zaharia's papers since I was an undergrad. It isn't every day you get to work for a celebrity. I'm so excited!
@matei_zaharia
Matei Zaharia
11 months
So excited about this -- bringing amazing platforms for data and AI together. @NaveenGRao, @hanlintang, and @jefrankle have built an amazing team that has steadily reduced the cost of AI training and released breakthroughs like the first open source LLMs with >64K context.
4
17
153
5
2
115
@jefrankle
Jonathan Frankle
2 months
@arthurmensch You're welcome 😉
Tweet media one
5
8
115
@jefrankle
Jonathan Frankle
4 years
Hoping to get a fifth review on my NeurIPS papers so I can complete these poker hands. Three different papers are one review away from a straight, and it would be nice to turn that two-pair into a full house.
5
2
114
@jefrankle
Jonathan Frankle
1 year
MosaicBERT is here! I've been teasing this for a while. TLDR: You have no excuse NOT to pre-train BERT in your papers. The highlights:
* BERT-base quality for $20 and BERT-large quality (using BERT-base) for $100
* 2.4x speedup overall
* Pre-trained weights are available on HF
@DbrxMosaicAI
Databricks Mosaic Research
1 year
📢 Introducing MosaicBERT! Now you can pretrain a high-quality BERT model from scratch on the MosaicML platform for $20. So why should you train your own BERT model? 👇 (1/5)
2
9
97
4
13
110
@jefrankle
Jonathan Frankle
6 months
@NaveenGRao @MosaicML @databricks Startups, need a CEO to get those end-of-year goals? DM me! I ♥️ our startup community. I want to see all the great GenAI products accelerated! I'm willing to give you Naveen so I can keep my GPUs.
5
4
110
@jefrankle
Jonathan Frankle
4 years
Ever wondered what happens when you freeze all the weights in a neural network and only train batch normalization? Me too! Turns out you can get 80%+ accuracy on CIFAR-10 by doing so. Check out our poster and oral in the SEDL workshop in West 121. With David Schwab and @arimorcos
Tweet media one
Tweet media two
Tweet media three
Tweet media four
6
14
110
@jefrankle
Jonathan Frankle
3 years
We found a scaling law that describes the error of entire families of pruned neural networks. For the night owls among you, check out our work "On the Predictability of Pruning Across Scales" at ICML (tonight, 11pm-2am Eastern). Led by @jonsrosenfeld!
Tweet media one
3
16
107
@jefrankle
Jonathan Frankle
2 years
Louder for the people in the back:
LARGE MODELS (GPT, DALLE) = DATABASES
PROMPTS = QUERIES
OUTPUTS = RESPONSES
NNs find new relations w/in data. Anyone, no matter the resources, can study better querying langs and possibly beat a big model they could never afford to train.
@jefrankle
Jonathan Frankle
2 years
This is a big deal - I'm so excited it's finally out! This work convinced me that large models like LLMs are really databases. @OfirPress and co-authors created a way to measure the expressive power of querying languages for these new NN DBs and an awesome new querying language.
4
28
228
4
13
111
@jefrankle
Jonathan Frankle
2 years
Authors are people, and cruelty from the community takes a toll. I've been where the ICML awardees are in a smaller way; I often wish I hadn't gotten an award. Also, students are researchers in training. If they did their best, any shortcomings are on supervision and the process.
2
3
107
@jefrankle
Jonathan Frankle
2 months
We built DBRX end-to-end in 2-3 months on 3K H100s. We train LLMs day-in and day-out with our customers - thousands in the past year. We're constantly finding better ways to build models, and DBRX showcases our latest advances: in data, modeling, performance, and fine-tuning.
Tweet media one
2
12
108
@jefrankle
Jonathan Frankle
3 years
Come to my ICLR poster (12pm ET today) on pruning neural networks at initialization and why we're currently missing the mark. Let's discuss lottery tickets, the nature of optimizing sparse networks, and ways forward for pruning early in training!
Tweet media one
3
4
107
@jefrankle
Jonathan Frankle
2 years
Announcing the BAY AREA EFFICIENT ML POSTER SESSION on Thur 3/31 in Palo Alto. Are you sad that MLSys was postponed? Do you miss getting to see research friends in person? Me too! Submit abstracts for work-in-progress or pandemic-era publications by 3/22.
Tweet media one
2
19
105
@jefrankle
Jonathan Frankle
2 months
It begins...
Tweet media one
2
3
102
@jefrankle
Jonathan Frankle
4 months
"We find that overall, the Intel Gaudi 2 accelerator has the 2nd best training performance-per-chip we've tested (only bested by the NVIDIA H100)." More great AI chips means more FLOPs available for all of us to build great models. Soon, we'll all be GPU (or Gaudi) rich 🤑
@abhi_venigalla
Abhi Venigalla
4 months
New year, new MME 🎉 @dskhudia and I profiled @Intel Gaudi2 accelerators for LLM training and inference, and found great performance and perf/$ !
6
32
137
2
12
99
@jefrankle
Jonathan Frankle
1 year
Big announcement 5 of 6: @MosaicML does inference! As per usual, efficiency is king 👑 We serve LLMs and diffusion models - 15x cheaper than comparable OpenAI offerings. We're happy to serve anything: your model, our model, or anything open-source. Exciting times here at Mosaic!
@DbrxMosaicAI
Databricks Mosaic Research
1 year
📣Announcing MosaicML Inference 📣 Ever wanted a text or image generation API that doesn’t make you send data to a third party? Or a cheaper solution than paying by the token? Or an easy way to get a trained model into production? We can help with that. 🧵
Tweet media one
18
106
666
2
9
102
@jefrankle
Jonathan Frankle
2 months
The endorsement that matters most to me.
Tweet media one
@jefrankle
Jonathan Frankle
2 months
Meet DBRX, a new SOTA open LLM from @databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
Tweet media one
34
265
1K
3
5
99
@jefrankle
Jonathan Frankle
11 months
Come to @MosaicML and don't take either pill!
@miniapeur
Mathieu Alain
11 months
Tweet media one
18
790
6K
7
4
97
@jefrankle
Jonathan Frankle
2 years
How much does it *really* cost to train GPT? There's speculation and (mis-)info out there that might make you think it's out of reach. It isn't. @MosaicML is laser focused on making it easy and accessible. This is Part 1 of a series introducing Mosaic GPT.
1
13
101
@jefrankle
Jonathan Frankle
10 months
@zacharylipton @OpenAI @AnthropicAI @MosaicML We don't really have time to publish. We blog, but not slog [through the publication process]. Importantly, though, we're open about what we do, unlike the other two companies you mentioned.
4
2
98
@jefrankle
Jonathan Frankle
1 year
It's been a busy two weeks at @MosaicML:
* RedPajama-1B in < 72hrs
* @Replit model trained on MosaicML
* SD2.0 for < $50k
* H100 numbers
* Long context BERT
* MosaicML Inference release
* MPT-7B release
Our tools can consistently train great models, and this pace isn't stopping!
@jefrankle
Jonathan Frankle
1 year
MPT is here! Check out our shiny new LLMs, open-source w/commercial license. The base MPT-7B model is 7B params trained on 1T tokens and reaches LLaMA-7B quality. We also created Instruct (commercial), Chat, and (my favorite) StoryWriter-65k+ variants. 🧵
28
162
779
1
13
95
@jefrankle
Jonathan Frankle
3 years
Last chance to register (for free!) to attend the neural network sparsity workshop taking place tomorrow and Friday. Join 700 (!) registrants, 62 poster presenters, 7 spotlights, 6 invited talks, 3 panels, and 1 tutorial. See you tomorrow!
1
23
94
@jefrankle
Jonathan Frankle
3 years
Come check out our new paper on how there are sparse, *transferable* winning ticket subnetworks in BERT pre-trained models at NeurIPS Poster Session 2 today (12pm EST, 9am PST). This project was led by the extraordinary @tianlong_chen @utexasece with teammates at the @MITIBMLab.
Tweet media one
3
7
93
@jefrankle
Jonathan Frankle
8 months
This is how the @MosaicML research team expresses its gratitude to those who go above and beyond in support of our scientific mission ⚔️
@mvpatel2000
Mihir Patel
8 months
Came to @MosaicML for the GPUs. Stayed for the gear
Tweet media one
Tweet media two
10
4
159
7
4
92
@jefrankle
Jonathan Frankle
2 months
Party like it's 2011 #tbt
Tweet media one
5
4
91
@jefrankle
Jonathan Frankle
29 days
Fixed it for you, @code_star
Tweet media one
@rajko_rad
Rajko Radovanović @ ICLR 2024
29 days
Incredible performance and efficiency, all Apache 2.0 open, from the amazing @MistralAI team!!! I’m most excited for the SOTA OSS function calling, code and math reasoning capabilities!! Cc @GuillaumeLample @tlacroix6 @dchaplot @mjmj1oo @sophiamyang
Tweet media one
3
4
71
4
8
91
@jefrankle
Jonathan Frankle
2 months
DBRX-Medium????? 👀
@mvpatel2000
Mihir Patel
2 months
🚨 Announcing DBRX-Medium 🧱, a new SoTA open-weights MoE with 36B active and 132B total parameters, trained on 12T tokens (~3e24 FLOPs). DBRX achieves 150 tok/sec while clearing a wide variety of benchmarks. Deep dive below! 1/N
Tweet media one
15
31
307
1
3
86
@jefrankle
Jonathan Frankle
1 year
How did training go? Zero human intervention needed. None. Nada. Our arch+optimization changes eliminated all loss spikes. The @MosaicML platform (our proprietary training software available to customers) caught and recovered from four hw failures. Please enjoy our empty logbook.
Tweet media one
Tweet media two
6
4
86
@jefrankle
Jonathan Frankle
7 months
I signed on. The world watches Harvard, and Harvard must meet the moment.
@boazbaraktcs
Boaz Barak
7 months
More than 100 Harvard faculty denounce "false equivalency between attacks on noncombatants and self-defense against those atrocities." The conflict is complex but "the events of this week are not complicated. Sometimes there is such a thing as evil"
64
192
1K
1
4
86
@jefrankle
Jonathan Frankle
4 years
Interested in hearing the latest updates on the Lottery Ticket Hypothesis? Come to my talk tomorrow morning at 9:30 at the #AAAI20 Sister Conference Track! New and improved formula with more tickets, more hypotheses, less lottery, same great taste. 🎟️🎟️🎟️
3
10
85
@jefrankle
Jonathan Frankle
11 months
@TiernanRayTech @MosaicML @databricks @OpenAI Our work at MosaicML has nothing to do with the lottery ticket hypothesis, just to be clear.
4
2
83
@jefrankle
Jonathan Frankle
1 year
MPT-7B is now available to run locally. That includes all the variants!
@nomic_ai
Nomic AI
1 year
Huge Release of GPT4All 💥 Powerful LLMs just got faster!
- Anyone can try @MosaicML's new MPT model on their desktop! No GPU required!
- Runs on Windows/Mac/Ubuntu
Try it at:
14
79
362
1
14
83
@jefrankle
Jonathan Frankle
1 year
It's amazing how much more fun I'm having reviewing for @MLSysConf than for the main ML conferences. I'm a big fan of smaller, more focused venues with shared values.
1
3
84
@jefrankle
Jonathan Frankle
11 months
At @MosaicML, we're loyal to getting the most out of every dollar for our customers, not to any one specific way of doing things. Our stack can run anywhere, and now that means AMD! Check out our numbers on MI250X, and join me in getting excited for MI300.
@abhi_venigalla
Abhi Venigalla
11 months
Ready for GPU independence weekend? PyTorch 2.0 and LLM Foundry now work out of the box on ** AMD GPUs! ** We profiled MPT 1B-13B models on AMD MI250 and saw perf within 80% of A100-40GB, which could go up to 94% with better software. It. Just. Works.
23
214
1K
3
3
83
@jefrankle
Jonathan Frankle
11 days
I'm at ICLR!
6
3
83
@jefrankle
Jonathan Frankle
1 year
Excited about MPT-7B-Storywriter-65k+ 📚 with its 65k training context? It's now available to play with on Hugging Face Spaces. Go have fun with ultra-long contexts! 📝
0
16
81
@jefrankle
Jonathan Frankle
2 years
Way too little, way too late @MIT. At least Princeton tried to head unionization off with a big pay bump. All MIT can muster is an unsubstantiated warning that "promises...have been overstated." If the administration actually cared about our best interests, we wouldn't be here.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
5
81
@jefrankle
Jonathan Frankle
28 days
🦙🦙🦙🔥
1
2
81
@jefrankle
Jonathan Frankle
2 months
Read the blog for the full details. DBRX is better than general-purpose open LLMs at general-purpose tasks and better than CodeLLaMA-70B at code. It even gives the closed models a run for their money. It's great at using its 32k context and at RAG too.
2
6
80
@jefrankle
Jonathan Frankle
2 years
Today I learned that there exist ego-driven ML startups that are really, truly cruel to their researchers (and probably the rest of their employees). This is a subtweet.
6
1
79
@jefrankle
Jonathan Frankle
6 months
❤️
@NaveenGRao
Naveen Rao
6 months
Any AI researchers or engineers feeling uneasy about the future, we are hiring at @databricks / @MosaicML !
9
37
298
1
0
79
@jefrankle
Jonathan Frankle
3 years
It's *almost* great to be back at @MIT_CSAIL! If someone has their kayak handy, could they please paddle over and drop off a lifejacket and some buckets?
7
2
78
@jefrankle
Jonathan Frankle
1 year
Third - my personal favorite - MPT-7B-StoryWriter-65k+. This model is fine-tuned on English-language literature with a context length of 65k. How? We use ALiBi position encodings (@OfirPress), so the model can handle any input length and extrapolate to even longer contexts (up to 84k in our testing).
Tweet media one
Tweet media two
5
9
77
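For readers curious how ALiBi enables that extrapolation, a simplified sketch of the bias it adds to attention scores; this is an illustration of the technique, not MPT's actual implementation, and the closed-form slopes assume the head count is a power of two.

```python
# ALiBi: no position embeddings; instead, add a head-specific linear penalty on
# query-key distance to the attention logits, so longer inputs still "just work".
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Causal ALiBi bias of shape (n_heads, seq_len, seq_len)."""
    # Head slopes form a geometric sequence (e.g. 8 heads -> 1/2, 1/4, ..., 1/256).
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = -(i - j) for keys at or before the query, 0 for future keys
    # (future keys are removed by the causal mask anyway).
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)
    return slopes[:, None, None] * distance[None, :, :]

# The bias is added to the (n_heads, q_len, k_len) attention scores before softmax.
# Because it depends only on relative distance, inference-time sequence length can
# exceed the training context, which is how StoryWriter reaches 84k in testing.
bias = alibi_bias(n_heads=8, seq_len=16)
```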