Meet DBRX, a new SOTA open LLM from @databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
I just open-sourced my codebase for research on neural network pruning, the Lottery Ticket Hypothesis, and other topics in deep learning. It's written in PyTorch and designed to make it easy to add new models, datasets, and experiments. Check it out:
MPT is here! Check out our shiny new LLMs, open-source w/commercial license. The base MPT-7B model is 7B params trained on 1T tokens and reaches LLaMA-7B quality. We also created Instruct (commercial), Chat, and (my favorite) StoryWriter-65k+ variants. 🧵
MPT-30B is here! Same MPT architecture, 30B parameters, > 1T tokens, 8k context window, trained on H100s, great perf (esp on coding), single-GPU inference, commercially usable, and massively upgraded instruct and chat datasets. Take it for a spin!
I defended today, and @mcarbin was kind enough to pass me. My favorite part of the thesis is a ground-up rewrite of the original Lottery Ticket Hypothesis paper with fresh data and a narrative that benefits from four years of hindsight/maturity. Coming soon to an arXiv near you!
72 hrs ago, @togethercompute released the RedPajama dataset. Like everyone, we at @MosaicML were very excited about the idea of a fully open-source Llama. So excited, in fact, that we've already trained a 1B model on 200B tokens! It's on HF (Apache 2.0) here:
I'm absolutely thrilled that @MosaicML has agreed to join @databricks as we continue on our journey to make the latest advances in deep learning efficient and accessible for everyone. The best of MosaicML is yet to come 🎉🎉🎉
For those interested, my dissertation is now available. The highlight is that I re-did the original Lottery Ticket Hypothesis paper from scratch (Chapter 3). It follows the same path as the original, but with years of context/maturity + a new experiment 🧵
I guess the word is out! I'll be joining the @Harvard faculty in the fall of 2023 as part of an amazing cohort of new machine learning professors. Looking forward to sharing more about my lab, how to join, and everything we're building at @hseas when I'm a bit closer to arriving!
TLDR: Announcing 🌟COMPOSER🌟, a PyTorch trainer that speeds up training *algorithmically*. Train 2x-4x faster on standard ML tasks, a taste of what's coming from @MosaicML. Star it, pip install mosaicml, contribute, be efficient!
Thread:
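If you're curious what speeding up training *algorithmically* looks like in code, here's a minimal sketch: you hand the trainer a list of speed-up algorithms and it applies them during training. The class names and arguments below are assumed from Composer's public docs, so treat this as an illustrative sketch rather than a canonical recipe.

```python
# pip install mosaicml  (sketch only; Composer API names assumed from its docs)
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

from composer import Trainer
from composer.algorithms import BlurPool, LabelSmoothing
from composer.models import ComposerClassifier

# Tiny synthetic dataset so the example is self-contained.
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)

model = ComposerClassifier(module=resnet18(num_classes=10), num_classes=10)

# The algorithmic-efficiency idea: speedups come from composing training-time
# algorithms (here BlurPool and label smoothing), not from changing hardware.
trainer = Trainer(
    model=model,
    train_dataloader=loader,
    max_duration="1ep",
    optimizers=torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
    algorithms=[BlurPool(), LabelSmoothing(smoothing=0.1)],
)
trainer.fit()
```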
Introducing the *Mosaic ResNet*, a new take on a CV workhorse that sets SOTA for efficiency at any ImageNet accuracy. The recipe uses 12 techniques that change the math of training for a 7x speedup over standard baselines + up to 3.8x over the latest work.
Several methods have recently been proposed for pruning neural networks at initialization. In our new paper (@KDziugaite, @roydanroy, @mcarbin), we rigorously study these methods to determine why they "miss the mark" and underperform pruning after training.
NEW WORKSHOP: Sparsity in Neural Networks: Advancing Understanding and Practice (July 8-9, 2021). This workshop will bring together members of the many communities working on neural network sparsity to share their perspectives and the latest cutting-edge research (Deadline: 6/15)
My latest weekend project: tossing another 500B tokens at an 8k context window on MPT-7B, thereby creating MPT-7B-8k! 1.5T tokens, 8k context, waaaaay better performance. When we say speed at @MosaicML, we mean it: it took me three days to train.
LLMs are for everyone! Own a GPT-3 trained on your data rather than renting a GPT-3 trained on a web crawl of Reddit. The price is $450K. Email llm-early-access@mosaicml.com to try it. This is just the start: this doesn't use MosaicML speedups. Our goal is to do this for $100K soon. 🧵
We have exciting news! In our latest and greatest LLM blog, we show how MosaicML Cloud can help you train LLMs from 1B - 70B parameters, and for the first time, publish transparent times + costs for doing so. It's a lot cheaper than you think! (1/9)
And now it's < $50k. 🖼️ Announcing @MosaicML's diffusion offering 📷 We replicated Stable Diffusion 2.0, training from scratch with a huge speedup, and we can do it on your data too. Human eval showed the model to be indistinguishable from the original. Blog:
Hello OLMo! Congrats to the amazing @allen_ai team! 7B params, 2T tokens, open training code, open data, intermediate checkpoints, Apache 2.0, the works. A giant leap for open science. Nicely done @mechanicaldirk, @i_beltagy, @soldni, and so many others!
No matter how established I become, I still feel completely inadequate seeing all the NeurIPS tweets. For all the folks out there who feel similarly, you aren't alone.
@Harvard is investing $500M in ML and neuroscience over the next decade thanks to a gift from @ChanZuckerberg. For my part, this makes it possible to study the foundations of deep learning at a scale and depth that are otherwise only accessible in industry.
#AI and #MachineLearning are just beginning to make an impact in biology, and there is more untapped potential. We're launching the Kempner Institute for the Study of Natural and Artificial Intelligence at @Harvard to bring together these two fields.
At ICML next week, @KDziugaite, @roydanroy, @mcarbin, and I will present Linear Mode Connectivity and the Lottery Ticket Hypothesis. We study the effect of SGD noise (like data order) on neural net optimization. Those results shed new light on lottery tickets.
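For anyone who wants the mechanics behind "linear mode connectivity": train two copies of a network from the same starting point under different SGD noise (e.g., different data orders), then evaluate the error along the straight line between the two solutions; if the error never rises much above the endpoints, the pair is linearly connected. The helper below is my own hedged sketch of that check, not code from the paper.

```python
import copy
import torch

@torch.no_grad()
def error_barrier(model, state_a, state_b, loader, device="cpu", steps=11):
    """Evaluate classification error along the linear path between two weight sets."""
    errors = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate floating-point tensors; keep integer buffers (e.g. BN counters) as-is.
        interp = {
            k: ((1 - alpha) * v + alpha * state_b[k]) if v.is_floating_point() else v
            for k, v in state_a.items()
        }
        probe = copy.deepcopy(model).to(device)
        probe.load_state_dict(interp)
        probe.eval()
        wrong, total = 0, 0
        for x, y in loader:
            pred = probe(x.to(device)).argmax(dim=-1)
            wrong += (pred != y.to(device)).sum().item()
            total += y.numel()
        errors.append(wrong / total)
    # Instability/barrier: how far the path rises above the worse endpoint.
    barrier = max(errors) - max(errors[0], errors[-1])
    return errors, barrier
```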
In the last two weeks, @MosaicML had lots of big news: We trained a 1B/200B-token LLM on RedPajama in < 72hrs, Replit used us to train a SOTA code model in < 10 days, we trained SD2 for < $50k, long context BERTs, and perf #'s on H100s. But the biggest news is coming this week 👀
I AM SO ANGRY. I won't submit to ACL venues again after they shafted a student after rebuttals with this idiotic policy. Since anonymity is gone, though, publicity time! Check out awesome work by @ZackAnkner on improving MLM training by scheduling masking:
Just got a desk reject, post-rebuttals, for a paper that was submitted to arXiv <30 min past the anonymity deadline. I'll note how the ACL embargo policy hurts junior researchers and makes ACL venues less desirable for NLP work; I won't even get into the pointless NOISE it adds.
Even though we've been doing this for a year, I will never get used to the fact that the only in-person audience members for my job talk are my stuffed animals.
Curious how the RedPajama effort by @togethercompute is progressing and how it stacks up? We evaluated the 7B model they just released 2h ago! Here is how it looks 800B tokens in. (Eval took 16 minutes on 32 A100s.)
The first RedPajama models are here! The 3B and 7B models are now available under Apache 2.0 license, including instruction-tuned and chat versions!
This project demonstrates the power of the open-source AI community with many contributors ... 🧵
@davidjschwab, @arimorcos, and I have a new paper on BatchNorm. It's not exactly a typical BatchNorm paper: we study the accuracy when freezing all weights at random init and "Training BatchNorm and Only BatchNorm." How did this happen? It's a funny story...
What happens if you freeze all weights at initialization and train *only* BatchNorm? Turns out that BatchNorm's affine parameters are impressively powerful, and they can use random features to reach surprisingly high accuracy. Find out more at the 12pm ET ICLR poster session!
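In case it's useful, here's a hedged sketch (mine, not the paper's code) of the setup: freeze every weight at its random initialization and leave only the BatchNorm affine parameters trainable.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)

# Freeze everything at its random initialization...
for p in model.parameters():
    p.requires_grad = False

# ...then re-enable only the BatchNorm affine parameters (per-channel gamma and beta).
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.weight.requires_grad = True
        m.bias.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
print(sum(p.numel() for p in trainable), "trainable parameters")  # a tiny fraction of the model
```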
This is a big deal - I'm so excited it's finally out! This work convinced me that large models like LLMs are really databases. @OfirPress and co-authors created a way to measure the expressive power of querying languages for these new NN DBs and an awesome new querying language.
We've found a new way to prompt language models that improves their ability to answer complex questions
Our Self-ask prompt first has the model ask and answer simpler subquestions. This structure makes it easy to integrate Google Search into an LM. Watch our demo with GPT-3 🧵⬇️
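To make the structure concrete, here's a paraphrased sketch of a self-ask style prompt (the exact wording and examples in the paper may differ). Because the sub-questions appear on explicit "Follow up:" lines, a thin wrapper can intercept each one, send it to a search engine, and paste the result back as the "Intermediate answer:" before the LM continues.

```python
# Paraphrased self-ask prompt structure (illustrative; not the paper's exact text).
SELF_ASK_PROMPT = """\
Question: Who was president of the U.S. when superconductivity was discovered?
Are follow up questions needed here: Yes.
Follow up: When was superconductivity discovered?
Intermediate answer: Superconductivity was discovered in 1911.
Follow up: Who was president of the U.S. in 1911?
Intermediate answer: William Howard Taft.
So the final answer is: William Howard Taft.

Question: {question}
Are follow up questions needed here:"""

print(SELF_ASK_PROMPT.format(question="Who was the maternal grandfather of George Washington?"))
```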
We just posted our ICLR 2020 paper on "The Early Phase of Neural Network Training" on ArXiv. In the paper, we explore the changes neural networks undergo during the crucial first phase of training using winning lottery tickets.
Recent studies have suggested that the earliest iterations of DNN training are especially critical. In our #ICLR2020 paper with @jefrankle and @davidjschwab, we use the lottery ticket framework to rigorously examine this crucial phase of training.
This this this. I don't like to call out papers we can't reproduce because I'm not a fan of making life and career harder for PhD students. But I no longer believe anything if we haven't reproduced it ourselves.
I'm writing this cause I'm a bit salty. We've implemented so many seemingly promising, published & popular papers only for them to utterly flop.
At least I like to think that my personal bs Big Model paper classifier is now pretty good given my extensive training data.
Tired reflection at the end of DBRX release day: Last March 24, @databricks released Dolly. Last May 5, Mosaic released MPT-7B. Less than a year later, we've built an LLM that seems to surpass the original ChatGPT. I am so incredibly proud of our team - you all are amazing ♥️
Two weeks later, Stable Diffusion training cost is already down to $125K, a 22% reduction. Our team is blazingly fast at making training blazingly fast.
Two weeks ago, we released a blog showing training Stable Diffusion from scratch only costs $160K. Proud to report that blog is already out of date. It now costs 💸 $125K 💸. Stay tuned for more speedups from @MosaicML, coming soon to a diffusion model near you!
What bullshit. Dear OpenAI researchers: My email address is jonathan@mosaicml.com. We are hiring! We have a healthy culture and no elitism, egos, or divas.
OpenAI’s chief scientist: expresses curiosity/openness about a mysterious idea, caveats with “may”.
Meta’s chief AI scientist: the certainty of "nope".
Probably explains a lot of the past 5 years.
Dear Meta AI researchers: My email address is sama@openai.com. We are hiring!
Would anybody be interested in a couple dozen 1B, llama-style (waaaay past Chinchilla) language models trained on different data mixes? I don't know if this question has been well-studied before.
Thank you @LastWeekTonight for featuring @ClareAngelyn, @alvarombedoya, and my work on police use of face recognition. For those in the ML community thinking about "broader impact," there are big opportunities to use your expertise to make a difference in the policy world!
I'm no hardware expert, but - if you need 2x the power and (potentially) 2x the price to 3x the compute - it seems to me that hardware has little or nothing to offer when it comes to getting us out of the jam we're in with giant models. Our solution has to be better algorithms.
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Modifies the scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand
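A hedged sketch of the idea (my paraphrase using the standard approximations, not the paper's exact notation): training a model with N parameters on D_tr tokens costs roughly 6*N*D_tr FLOPs, serving D_inf tokens over the model's lifetime costs roughly 2*N*D_inf FLOPs, and the point is to pick N and D_tr that minimize the combined cost while still hitting a target loss under a Chinchilla-style law:

```latex
\min_{N,\; D_{\mathrm{tr}}} \; 6\,N\,D_{\mathrm{tr}} \;+\; 2\,N\,D_{\mathrm{inf}}
\qquad \text{s.t.} \qquad
L(N, D_{\mathrm{tr}}) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D_{\mathrm{tr}}^{\beta}} \;=\; L_{\mathrm{target}}
```

The larger the expected inference demand D_inf, the further the optimum shifts toward a smaller model trained on more tokens than Chinchilla alone would suggest.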
Even after five years of PhD, I continue to be astounded by the casual, gratuitous cruelty that peers and institutions in academia are capable of inflicting without a second thought.
LATEST NEWS ON THE LOTTERY TICKET HYPOTHESIS: We (@KDziugaite, @roydanroy, and @mcarbin) just released an updated paper showing (1) how to scale the LTH to deeper networks on ImageNet and (2) initial insights into why the LTH works. Check it out on arXiv:
I used to believe that @kchonyc was really three postdocs in a trench coat, having never personally seen physical evidence that he existed. I was excited to finally have my hypothesis refuted this evening. Empiricism at work!
A Sunday walk down memory lane: I found the original drafts of the Lottery Ticket Hypothesis paper this weekend. Links and commentary in this 🧵. You can chart progress of public versions on arXiv v1-v5, but it's especially cool to see the earliest attempts at stating the idea.
Today is the third time I've personally found plagiarism during ML reviewing in the past year-ish. I'm seeing more papers now that I'm an AC, but it's still a change. I'm not even trying hard; I'm just checking passages that sound strangely familiar, and I'm right every time.
Just released our new paper about "The Lottery Ticket Hypothesis at Scale" (with Gintare Karolina Dziugaite, @roydanroy, and @mcarbin), extending our prior work to find small trainable subnetworks within deeper, state-of-the-art neural networks.
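For readers new to the area, here's a hedged sketch of the core loop behind these papers: iterative magnitude pruning with weight rewinding. Train, prune the lowest-magnitude surviving weights globally, rewind the survivors to their values from early in training (iteration 0 in the original LTH), and repeat. The train_fn callback and the details below are my own simplified placeholders, not the papers' code.

```python
import torch

def magnitude_prune(model, masks, fraction=0.2):
    """One pruning round: globally remove the lowest-magnitude surviving weights."""
    scores = torch.cat([
        (p.detach().abs() * masks[name]).flatten()
        for name, p in model.named_parameters() if name in masks
    ])
    remaining = scores[scores > 0]
    k = max(1, int(fraction * remaining.numel()))
    threshold = remaining.kthvalue(k).values
    for name, p in model.named_parameters():
        if name in masks:
            masks[name] = (p.detach().abs() > threshold).float() * masks[name]
    return masks

def imp_with_rewinding(model, train_fn, rewind_state, rounds=5, fraction=0.2):
    """Sketch of iterative magnitude pruning with rewinding (train_fn is a user-supplied
    placeholder that must zero out masked weights/gradients while training)."""
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}  # skip biases/norms
    for _ in range(rounds):
        train_fn(model, masks)                  # train the masked subnetwork to completion
        masks = magnitude_prune(model, masks, fraction)
        model.load_state_dict(rewind_state)     # rewind surviving weights to early-training values
    return masks
```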
New blog by @mvpatel2000 with big updates to our LLM stack and a new recipe for blazingly fast training. FP8 + Configurable ActCkpt + DTensor + Hybrid Sharding + Comm/Act Compression = 700+ TFLOPs on H100s and linear scaling.
Very excited to partner with @allen_ai on this incredible project. It's not every day you get to work with the best of the best on what will soon be the best open-source model in the world ⚔️
Repeating my offer from the @MLRetrospective panel today: the ML community desperately needs a survey track (like IEEE S&P's SoK track). I will happily volunteer to do the work to create/run this if any chairs of @NeuripsConf, @iclr_conf, or @icmlconf are interested.
On the job market this year, I was often asked what I considered to be my most impactful piece of research. My answer was always The Perpetual Lineup. The lottery ticket hypothesis affected the lives of grad students. The Perpetual Lineup affected the lives of everyday people.
1/ 5 years ago today, we released #ThePerpetualLineup, the first-of-its-kind survey of state and local police use of face recognition technology, based on 100 public records requests yielding 16,000+ pages.
Time for my usual refrain: Most papers weren't accepted to ICLR, and don't let Twitter fool you into thinking otherwise. Plenty of smart people and great papers didn't get the outcome they wanted, and you're in very good company if that's you right now.
So excited about this -- bringing amazing platforms for data and AI together. @NaveenGRao, @hanlintang, and @jefrankle have built an amazing team that has steadily reduced the cost of AI training and released breakthroughs like the first open-source LLMs with >64K context.
Hoping to get a fifth review on my NeurIPS papers so I can complete these poker hands. Three different papers are one review away from a straight, and it would be nice to turn that two-pair into a full house.
MosaicBERT is here! I've been teasing this for a while. TLDR: You have no excuse NOT to pre-train BERT in your papers. The highlights:
* BERT-base quality for $20 and BERT-large quality (using BERT-base) for $100
* 2.4x speedup overall
* Pre-trained weights are available on HF
📢 Introducing MosaicBERT! Now you can pretrain a high-quality BERT model from scratch on the MosaicML platform for $20. So why should you train your own BERT model? 👇 (1/5)
@NaveenGRao @MosaicML @databricks Startups, need a CEO to get those end-of-year goals? DM me!
I ♥️ our startup community. I want to see all the great GenAI products accelerated!
I'm willing to give you Naveen so I can keep my GPUs.
Ever wondered what happens when you freeze all the weights in a neural network and only train batch normalization? Me too! Turns out you can get 80%+ accuracy on CIFAR-10 by doing so. Check out our poster and oral in the SEDL workshop in West 121. With David Schwab and @arimorcos.
We found a scaling law that describes the error of entire families of pruned neural networks. For the night owls among you, check out our work "On the Predictability of Pruning Across Scales" at ICML (tonight, 11pm-2am Eastern). Led by @jonsrosenfeld!
Louder for the people in the back:
LARGE MODELS (GPT, DALLE) = DATABASES
PROMPTS = QUERIES
OUTPUTS = RESPONSES
NNs find new relations w/in data. Anyone, no matter the resources, can study better querying langs and possibly beat a big model they could never afford to train.
Authors are people, and cruelty from the community takes a toll. I've been where the ICML awardees are in a smaller way; I often wish I hadn't gotten an award. Also, students are researchers in training. If they did their best, any shortcomings are on supervision and the process.
We built DBRX end-to-end in 2-3 months on 3K H100s. We train LLMs day-in and day-out with our customers - thousands in the past year. We're constantly finding better ways to build models, and DBRX showcases our latest advances: in data, modeling, performance, and fine-tuning.
Come to my ICLR poster (12pm ET today) on pruning neural networks at initialization and why we're currently missing the mark. Let's discuss lottery tickets, the nature of optimizing sparse networks, and ways forward for pruning early in training!
Announcing the BAY AREA EFFICIENT ML POSTER SESSION on Thur 3/31 in Palo Alto. Are you sad that MLSys was postponed? Do you miss getting to see research friends in person? Me too! Submit abstracts for work-in-progress or pandemic-era publications by 3/22.
"We find that overall, the Intel Gaudi 2 accelerator has the 2nd best training performance-per-chip we've tested (only bested by the NVIDIA H100)." More great AI chips means more FLOPs available for all of us to build great models. Soon, we'll all be GPU (or Gaudi) rich 🤑
Big announcement 5 of 6: @MosaicML does inference! As per usual, efficiency is king 👑 We serve LLMs and diffusion models - 15x cheaper than comparable OpenAI offerings. We're happy to serve anything: your model, our model, or anything open-source. Exciting times here at Mosaic!
📣Announcing MosaicML Inference 📣
Ever wanted a text or image generation API that doesn’t make you send data to a third party?
Or a cheaper solution than paying by the token?
Or an easy way to get a trained model into production?
We can help with that. 🧵
How much does it *really* cost to train GPT? There's speculation and (mis-)info out there that might make you think it's out of reach. It isn't. @MosaicML is laser-focused on making it easy and accessible. This is Part 1 of a series introducing Mosaic GPT.
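For a sense of why it isn't out of reach, here's a back-of-envelope estimate using the standard ~6*N*D FLOPs approximation. Every number below (model size, token count, utilization, GPU price) is an illustrative assumption of mine, not a published MosaicML figure.

```python
params = 30e9              # hypothetical 30B-parameter GPT-style model
tokens = 600e9             # hypothetical number of training tokens
train_flops = 6 * params * tokens            # ~6*N*D approximation

peak_flops = 312e12        # A100 BF16 peak (~312 TFLOPs)
utilization = 0.40         # assumed sustained utilization
gpu_seconds = train_flops / (peak_flops * utilization)
gpu_hours = gpu_seconds / 3600
cost = gpu_hours * 2.00    # assumed $2 per GPU-hour

print(f"{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")  # roughly 240K GPU-hours, ~$480K under these assumptions
```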
@zacharylipton @OpenAI @AnthropicAI @MosaicML We don't really have time to publish. We blog, but we don't slog [through the publication process]. Importantly, though, we're open about what we do, unlike the other two companies you mentioned.
It's been a busy two weeks at @MosaicML:
* RedPajama-1B in < 72hrs
* @Replit model trained on MosaicML
* SD2.0 for < $50k
* H100 numbers
* Long context BERT
* MosaicML Inference release
* MPT-7B release
Our tools can consistently train great models, and this pace isn't stopping!
Last chance to register (for free!) to attend the neural network sparsity workshop taking place tomorrow and Friday. Join 700 (!) registrants, 62 poster presenters, 7 spotlights, 6 invited talks, 3 panels, and 1 tutorial. See you tomorrow!
Come check out our new paper on how there are sparse, *transferable* winning ticket subnetworks in BERT pre-trained models at NeurIPS Poster Session 2 today (12pm EST, 9am PST). This project was led by the extraordinary @tianlong_chen (@utexasece) with teammates at the @MITIBMLab.
🚨 Announcing DBRX-Medium 🧱, a new SoTA open-weights MoE with 36B active and 132B total parameters, trained on 12T tokens (~3e24 FLOPs). DBRX achieves 150 tok/sec while clearing a wide variety of benchmarks. Deep dive below! 1/N
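For context, the quoted FLOP count lines up with the standard 6*N_active*D training-compute approximation (my arithmetic, not from the post):

```latex
C_{\text{train}} \;\approx\; 6 \, N_{\text{active}} \, D
\;=\; 6 \times (36\times 10^{9}) \times (12\times 10^{12})
\;\approx\; 2.6\times 10^{24} \ \text{FLOPs} \;\approx\; 3\mathrm{e}24
```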
How did training go? Zero human intervention needed. None. Nada. Our arch+optimization changes eliminated all loss spikes. The @MosaicML platform (our proprietary training software, available to customers) caught and recovered from four hw failures. Please enjoy our empty logbook.
More than 100 Harvard faculty denounce "false equivalency between attacks on noncombatants and self-defense against those atrocities."
The conflict is complex but "the events of this week are not complicated. Sometimes there is such a thing as evil"
Interested in hearing the latest updates on the Lottery Ticket Hypothesis? Come to my talk tomorrow morning at 9:30 at the #AAAI20 Sister Conference Track! New and improved formula with more tickets, more hypotheses, less lottery, same great taste. 🎟️🎟️🎟️
Huge Release of GPT4All 💥
Powerful LLMs just got faster!
- Anyone can try @MosaicML's new MPT model on their desktop! No GPU required!
- Runs on Windows/Mac/Ubuntu
Try it at:
It's amazing how much more fun I'm having reviewing for @MLSysConf than for the main ML conferences. I'm a big fan of smaller, more focused venues with shared values.
At @MosaicML, we're loyal to getting the most out of every dollar for our customers, not to any one specific way of doing things. Our stack can run anywhere, and now that means AMD! Check out our numbers on MI250X, and join me in getting excited for MI300.
Ready for GPU independence weekend?
PyTorch 2.0 and LLM Foundry now work out of the box on ** AMD GPUs! ** We profiled MPT 1B-13B models on AMD MI250 and saw perf within 80% of A100-40GB, which could go up to 94% with better software.
It. Just. Works.
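Part of why "It. Just. Works." is that ROCm builds of PyTorch expose the same CUDA-named APIs, so existing training code runs unmodified. A quick hedged sanity check (exact version strings will vary by install):

```python
import torch

print(torch.__version__)              # ROCm wheels typically report a "+rocm" suffix
print(torch.version.hip)              # HIP/ROCm version string; None on CUDA builds
print(torch.cuda.is_available())      # True on a working AMD GPU setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. an MI250 device name
```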
Excited about MPT-7B-Storywriter-65k+ 📚 with its 65k training context? It's now available to play with on Hugging Face Spaces. Go have fun with ultra-long contexts! 📝
Way too little, way too late, @MIT. At least Princeton tried to head off unionization with a big pay bump. All MIT can muster is an unsubstantiated warning that "promises...have been overstated." If the administration actually cared about our best interests, we wouldn't be here.
Read the blog for the full details. DBRX is better than general-purpose open LLMs at general-purpose tasks and better than CodeLLaMA-70B at code. It even gives the closed models a run for their money. It's great at using its 32k context and at RAG too.
Today I learned that there exist ego-driven ML startups that are really, truly cruel to their researchers (and probably the rest of their employees). This is a subtweet.
It's *almost* great to be back at @MIT_CSAIL! If someone has their kayak handy, could they please paddle over and drop off a lifejacket and some buckets?
Third - my personal favorite - MPT-7B-StoryWriter-65k+. This model is fine-tuned on English-language literature with a context length of 65k. How? We use ALiBi position encodings (@OfirPress), so the model can handle any length and extrapolate even longer (up to 84k in our testing).
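For the curious, here's a hedged sketch of the ALiBi idea (my own minimal implementation, not MPT's code): instead of position embeddings, each attention head adds a linear penalty proportional to the query-key distance, which is what lets the model extrapolate to sequences longer than it was trained on.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention biases, added to the attention logits before softmax."""
    # Geometric slopes per head; this simple formula assumes n_heads is a power of two.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # rel[i, j] = min(j - i, 0): zero on the diagonal, increasingly negative with distance
    # for past positions; future positions are handled by the causal mask anyway.
    rel = (pos[None, :] - pos[:, None]).clamp(max=0)
    return slopes[:, None, None] * rel[None, :, :]   # shape: (n_heads, seq_len, seq_len)

bias = alibi_bias(n_heads=8, seq_len=16)
print(bias.shape)   # torch.Size([8, 16, 16])
```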