DBRX is a new open-source, general-purpose #LLM that advances the state of the art in efficiency, using a 132B-parameter MoE architecture. Check out our deep-dive on how we trained and benchmarked #DBRX:
📢 Introducing MPT: a new family of open-source, commercially usable LLMs from @MosaicML. Trained on 1T tokens of text+code, MPT models match and, in many ways, surpass LLaMa-7B. This release includes 4 models: MPT-Base, Instruct, Chat, & StoryWriter (🧵)
📣Announcing MosaicML Inference 📣
Ever wanted a text or image generation API that doesn’t make you send data to a third party?
Or a cheaper solution than paying by the token?
Or an easy way to get a trained model into production?
We can help with that. 🧵
Introducing LLM training on AMD hardware!
MosaicML + PyTorch 2.0 + ROCm 5.4+ = LLM training out of the box with zero code changes.
With MosaicML, the ML community has additional hardware + software options to choose from.
Read more:
Meet MPT-30B, the latest member of @MosaicML's family of open-source, commercially usable models. It's trained on 1T tokens with up to 8k context (even more w/ ALiBi) on A100s and *H100s*, with big improvements to Instruct and Chat. Take it for a spin on HF!
Meet PubMed GPT 🩺 a new SOTA on the US Medical Licensing Exam, developed by MosaicML and @StanfordHAI. It's a normal GPT-3B model trained on medical data that bests hand-designed med models and generic models 40x bigger, a sweet spot for foundation models 🧵
[1/8] Full technical details on our Stable Diffusion 2.0 speedrun are here! On Wednesday, we announced that we had replicated SD2 for < $50k, a 2.7x improvement over our baseline and 6x over Stability's reported cost. Today, we share the technical nitty-gritty on how we did it:
Woo hoo! 🙌 What an honor to make the @Forbes AI 50 List. MosaicML empowers you to build your own #GenerativeAI. Train, finetune, and deploy your custom #LLM today:
Got an extra $20 burning a hole in your wallet? With the MosaicBERT architecture + training recipe, you can now pretrain a competitive #BERT-Base model from scratch on the MosaicML platform for the cost of a large pizza! 🍕⚡️👏 Learn more:
Announcing MPT-7B-8K: a 7B parameter open-source LLM with 8k context length trained with the MosaicML platform.
With its 8k context length, MPT-7B-8K specializes in document summarization and question-answering, and may be used commercially.
Read more:
We have exciting news! In our latest and greatest LLM blog, we show how MosaicML Cloud can help you train LLMs from 1B - 70B parameters, and for the first time, publish transparent times + costs for doing so. It's a lot cheaper than you think! (1/9)
Want to train your own custom #LLMs while keeping your data private? 🔒 We got you. The MosaicML platform keeps your data, models, and source code secure in your private network while abstracting away the complexity of #ML training infrastructure. Learn more:
The MosaicML team is excited to present at the @weights_biases webinar this Thursday, 23-Feb-2023, 7PM CET/10AM PST! Our very own @leavittron will be joining W&B's @carey_phelps to showcase MosaicML #LLM training and W&B's Model Registry. Register at
Large Language Models (LLMs) are gaining in popularity, but training these models from scratch can be a huge pain... until now! Our latest LLM blog series uncovers how to reduce the time, cost, and complexity of training these billion-parameter models:
How good are @nvidia H100s actually? In collaboration with @CoreWeave, we benchmarked A100 vs H100 performance for large language model training.
Here's what we found: [1/6]
Today, an exciting paper from @MSFTResearch:
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
While it's too early to say, this may be remembered as the single biggest efficiency advancement in hyperparameter tuning.
How can the ML community measure LLM quality in a holistic and standardized manner?
The Mosaic Model Gauntlet encompasses 34 benchmarks, organized into 6 broad categories of competency, evaluated with our blazingly fast open-source ICL eval harness.
🧵👇
Hello World! Today we come out of stealth to make ML training more efficient with a mosaic of methods that modify training to improve speed, reduce cost, and boost quality. Read our founders' blog by @NaveenGRao, @hanlintang, @mcarbin, and @jefrankle (1/4)
Is overparametrization the key to solving ImageNet? In this NeurIPS Outstanding Paper Award winner, @SebastienBubeck and @geoishard examine the significance of neural network overparametrization and the universal law of robustness. (1/6)
Is forgetting actually beneficial for training? This top-reviewed ICLR paper () introduces a powerful paradigm that consistently outperforms all other methods on ResNet18 and DenseNet169 on a variety of image datasets. (1/8)
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
They replace the attention mechanism in a Swin transformer with a simple spatial shift of some of the channels. Turns out this actually works.
Ready to use a programmatic approach to prompting #LLMs and building #RAG applications? The @stanfordnlp #dspy repo includes support for @databricks Model Serving and Vector Search! Details:
📢 Today, we're thrilled to announce that @Databricks has completed its acquisition of MosaicML. Our teams share a common goal: to make #GenerativeAI accessible for all. We're excited to change the world together!
Read the press release and stay tuned for more updates:
Data2Vec:
A new paper by Meta AI claims to be “The first high-performance self-supervised algorithm that works for speech, vision, and text.” And the results look very promising.
"Scaling Laws for Neural Language Models" was one of the papers that fueled the recent push towards larger models.
@DeepMind revisits the question, "How should one scale model size relative to dataset size?" and finds some surprising answers!
Our NLP architect @abhi_venigalla continues his work on the use of AMD accelerators at scale for #LLM training. In our latest @databricks blog post, he shares multi-node training performance results on MI250 GPUs:
How much does it take to train a Stable Diffusion model from scratch? The answer: 79,000 A100-hours in 13 days, for a total training cost of <$160k. Our tooling reduces the time and cost to train by 2.5x, and is also extensible and simple to use.
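For scale, those figures imply a rate of roughly $2 per A100-hour; the rate below is our assumption, since the tweet gives only the GPU-hours and the total:

```python
# Back-of-envelope check on the quoted Stable Diffusion training run.
# The $2/A100-hour rate is assumed; the tweet states only 79,000
# A100-hours over 13 days for a total under $160k.
a100_hours = 79_000
dollars_per_a100_hour = 2.0                 # assumed cloud rate
total_cost = a100_hours * dollars_per_a100_hour   # 158,000
gpus_used = a100_hours / (13 * 24)          # ≈253 GPUs running for 13 days
```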
We've shared great research before, but reproducing methods from papers is hard.
Announcing Composer, our library of ML speedups:
Train CV models ~4x faster and NLP models ~2x faster at the same accuracy -- with minimal tuning. (1/5)
We're excited to share details about how we use cyclic LR schedules to estimate time/cost/accuracy tradeoff curves for model training. This is a key element in our approach to #EfficientML -- how we benchmark the speedup methods we implement. Article here:
Today we take a look at a new architecture from DeepMind called RETRO: Retrieval-Enhanced Transformer (). RETRO uses a large database of documents along with an embedding-based retrieval system to improve the “knowledge” of transformers at runtime. (1/15)
DBRX is the top open-source model on the latest WildBench Leaderboard on HuggingFace! Thanks to our friends @allen_ai for this benchmark of LLMs with challenging tasks from real users in the wild. #DBRX
Today, we're excited to share our MLPerf Training 2.0 results: 23.8 minutes to train ResNet-50 to 75.9% validation accuracy on ImageNet, using 8x NVIDIA A100 GPUs.
"RankGen: Improving Text Generation with Large Ranking Models"
It's tempting to look at this paper as yet another method to make the numbers go up. But there's another story here that's much more interesting. [1/14]
📢 Introducing MosaicBERT! Now you can pretrain a high-quality BERT model from scratch on the MosaicML platform for $20. So why should you train your own BERT model? 👇 (1/5)
Happy December! Today, we're looking back at Stochastic Weight Averaging (SWA), now a classic ML efficiency win! SWA is a simple method for improving accuracy with no increase in training time. It is built into fastai, PyTorch, PyTorch Lightning, and our Composer library. (1/12)
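At its core, SWA just keeps a uniform running average of weight snapshots taken along the training trajectory (PyTorch ships this as `torch.optim.swa_utils`); a minimal pure-Python sketch:

```python
def swa_average(checkpoints):
    """Stochastic Weight Averaging: average parameter snapshots
    collected late in training (e.g., once per epoch).
    `checkpoints` is a list of flat parameter lists, one per snapshot."""
    n = len(checkpoints)
    return [sum(ws) / n for ws in zip(*checkpoints)]
```

Evaluating at the averaged point tends to land in a flatter, better-generalizing region of the loss surface than any single snapshot.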
New blog post! Take a look at some best practices for efficient CNN training, and find out how you can apply them easily with our Composer library: #EfficientML
We launched the MPT-7B foundation models just over a month ago, and since then, they’ve been downloaded over 2 million times! We are humbled by this warm reception, and thrilled to see the vibrant #LLM community rise up to share how they're using them!
What makes these models special?
* Licensed for commercial use (unlike LLaMA)
* Trained on more data than any comparable open-source LLM.
* Handle extremely long inputs (trained up to 65k, goes up to 84k w/ALiBi)
* Really fast inference and training code.
Our #DBRX open source #LLM is now available on ! Sign in (or sign up), select custom models, and start chatting with our DBRX-Instruct model. Get started:
Today, we're looking at fine-tuning large models, and this paper submitted to ICLR:
It shows fine-tuning can hurt performance on out-of-distribution examples, and explains why with some nice theory. We'll be keeping an eye on this! (1/8)
Today, we're looking at a recent paper out of DeepMind outlining ReLICv2 (), which shows an exciting step forward in self-supervised learning! S/O to @weballergy
"ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers"
Paper:
Post-training quantization from the DeepSpeed team that can reduce inference cost by up to 5x. [1/8]
🚨 A few months ago we announced that you can train Stable Diffusion from scratch for less than $125k using the MosaicML platform.
A major price drop is coming...and we have the training run to back it up. Stay tuned for a major announcement this week!
🎉 🎉🎉 We have a new price on training Stable Diffusion 2 from scratch:
$50k trained on the MosaicML Platform.
We replicated Stable Diffusion 2.0 with massive training speedups, and now you can too.
Learn more in our latest blog post:
Should you train your ViT? While Vision Transformers (ViT) have delivered groundbreaking performance, that performance often depends on tons of pre-training and data augmentations. This paper () shows that ViTs can be performant without pre-training! (1/11)
"A Neural Corpus Indexer for Document Retrieval"
Paper:
They train a seq-to-seq model to directly spit out document IDs given queries. And it works really well. Like, these are some of the largest accuracy lifts I've ever seen in a paper. [1/15]
New blog post!
@zeqiuwu1, @huyushi98, and @rajammanabrolu share a recent highlight from their work in #LLM finetuning research: Fine-Grained Reinforcement Learning from Human Feedback (RLHF)
Is this the end of contrastive self-supervised pretraining? In this 🧵, we’ll discuss Masked Autoencoders Are Scalable Vision Learners, the exciting new work from Kaiming He, @endernewton, @sainingxie, Yanghao Li, and @inkynumbers at @MetaAI 1/15
There's fast, and then there's blazingly fast. 🔥🔥🔥 With Composer and MosaicML Cloud, you can now evaluate #llms on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses.
Read more:
We're excited to have contributed to the OLMo open-source model from @allen_ai, which was developed using the @databricks Mosaic AI training platform. Read the blog post from @jefrankle to learn more:
Hello OLMo! Congrats to the amazing @allen_ai team! 7B params, 2T tokens, open training code, open data, intermediate checkpoints, Apache 2.0, the works. A giant leap for open science. Nicely done @mechanicaldirk, @i_beltagy, @soldni, and so many others!
📦 To evaluate the coding capabilities of LLMs, you need to execute the code. But what if the LLM spits out malicious code?😱
With MosaicML, you can now evaluate #LLMs on code-gen benchmarks (e.g. HumanEval) in an effortless, end-to-end secure framework.
We’ve released Composer 0.8.0, which introduces a HuggingFaceModel object for reading in your existing 🤗 Transformers models. Training or fine-tuning BERT models with Composer just got much easier.
See the release notes for the full set of enhancements:
The release of @PyTorch 2.0 and AMD ROCm 5.4 has @MosaicML “excited to announce that LLM training works out of the box on AMD Instinct data center GPUs, with zero code changes…” Read more about how the AMD Instinct MI250 helps developers train #AI models.
Do language models really need tokenizers? Work by @colinraffel's group suggests the answer is often no. Their ByT5 model () modifies mT5 to take in raw UTF-8 bytes instead of output from a tokenizer. (1/9)
What should you do if you want to effectively and cheaply “instruction finetune” an LLM? @aditi_jh and @JacobianNeuro share some important insights. (1/5)
Shoutout to @jeremyphoward, @math_rachel, and the whole @fastdotai team. Two-way callbacks and other ideas helped a lot in designing Composer (). We're standing on the shoulders of giants in our shared mission to make AI accessible to everyone.
Since becoming part of @databricks last July, the MosaicML team has continued its mission to optimize and improve #GenAI model training. Our rigorous science leads to real-world results. Visit our new research hub to discover what we've been working on:
Today, we look at optimizing data movement for transformer deep learning networks. In this paper (), Ivanov et al. show that 40% of the runtime is spent in data movement, and that training has become memory bound for transformer networks. (1/6)
It's Christmas in July for the ML Community! 🎄 We found that AMD systems appear stable and deliver training results consistent with NVIDIA systems when using MosaicML's training stack. With StreamingDataset's elastic determinism, we can get the same loss curves.
Think it’s too hard—or too expensive—to train your own GPT or diffusion models from scratch? Think again.
We built the MosaicML platform to tackle the challenges of training large models and unleash the power of #generativeAI. Learn more:
"Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam"
One of the biggest bottlenecks in distributed training is communication between nodes. The 0/1 Adam optimizer can train a BERT-Large while syncing only 1.03 bits/parameter on average. [1/7]
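The communication trick can be sketched independently of the optimizer: ship only a sign per entry plus one shared scale. This is a toy illustration of 1-bit compression, not DeepSpeed's actual implementation:

```python
def one_bit_compress(grad):
    """Toy 1-bit gradient compression: keep only the sign of each
    entry plus a single shared magnitude (the mean absolute value),
    so per-parameter communication drops from 32 bits to ~1 bit."""
    scale = sum(abs(g) for g in grad) / len(grad)
    signs = [1 if g >= 0 else -1 for g in grad]
    return signs, scale

def one_bit_decompress(signs, scale):
    """Reconstruct an approximate gradient from signs and scale."""
    return [s * scale for s in signs]
```

Real systems like 0/1 Adam also keep an error-feedback buffer so the quantization error from one step is compensated on later steps.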
The model is really good. Across 12 different in-context learning tasks, it nearly always surpasses every other open-source model < 30B params and trades off with LLaMa-7B for the best open-source model. Plus it's commercially usable, and finetuned versions are available now.
Composer is trending on GitHub (python)!
Composer helps train ML models faster and cheaper through algorithmic efficiency, and the world is taking notice, thanks to this wonderful community!
See what all the buzz is about in our repo -- and give us a ⭐!
Large Language Models are notorious for being expensive to train, but provide a model that can be evaluated on generalized language understanding benchmarks. What if the goal is to perform well on a task-specific benchmark instead? Can we cut down the costs of pre-training? (1/9)
We used the same tools as our customers: the MosaicML platform, Composer, StreamingDatasets, etc. They made training a piece of cake. Here's our training logbook. 🥱 No loss spikes (we fixed them architecturally). Four hw failures handled automatically. ZERO human intervention.
Does your application require high accuracy but have tight inference constraints? Are you willing to pay any training cost to achieve this? If so, today’s paper may be for you! They reach 82.8% accuracy on ImageNet using only a ResNet-50 model! (1/9)
Today, we’re looking at “Augmenting Convolutional networks with attention-based aggregation” which proposes a modification to the average pooling used in many conv nets and a particularly simple new architecture. (1/10)
We are thrilled to see promising alternative options for AI hardware to provide the best cost, performance, and developer experience for our customers.
Exciting new speedups from @aleks_madry's lab with results on #Imagenet! Optimized data loading with FFCV removes the CPU bottleneck that normally limits the throughput of ResNet+ImageNet+A100 training. (1/4)
ImageNet is the new CIFAR! My students made FFCV (), a drop-in data loading library for training models *fast* (e.g., ImageNet in half an hour on 1 GPU, CIFAR in half a minute).
FFCV speeds up ~any existing training code (no training tricks needed) (1/3)
“Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”
GLUE is worn out. Even SuperGLUE doesn’t stick anymore. Enter BIG-bench, a collection of 204 tasks contributed by 444 authors, designed for evaluating large language models.
Join us next week at @weights_biases' Fully Connected conference on Wednesday, June 7th in San Francisco. Our CTO/Co-Founder @hanlintang will be speaking alongside a roster of #generativeAI luminaries. Full agenda here:
To highlight StoryWriter: Its final training stage has a 65k-token context, 32x LLaMa and 2x GPT-4. This crazy length works out of the box with our LLM Foundry on standard GPUs. We used ALiBi position encodings: they handle any input length and extrapolate longer (84k in our testing).
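ALiBi replaces learned position embeddings with a fixed linear bias on attention scores, which is why it extrapolates. A minimal sketch of the bias computation (head slopes follow the paper's geometric sequence):

```python
def alibi_bias(seq_len, num_heads):
    """ALiBi: each head h subtracts slope_h * (query - key) distance
    from its attention scores before the softmax. Nothing positional
    is learned, so the model can handle sequences longer than any
    seen in training."""
    slopes = [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]
    return [
        [[-slopes[h] * max(0, q - k) for k in range(seq_len)]
         for q in range(seq_len)]
        for h in range(num_heads)
    ]
```

Because the bias is a pure function of distance, extending `seq_len` at inference time needs no retraining.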
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
The DeepSpeed team made an awesome family of MoE models + systems optimizations:
Great to see a MosaicML citation in the wild! Spotted in 'DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing' () by @conglongli, @yuxionghe, et al.
Our team is incredibly proud to partner with @allen_ai and thrilled to see them cook! Achieving such a massive improvement in MMLU, while reducing the compute budget, is a fantastic win. And doing it fully open? Everyone wins. Congrats! Can't wait to see what's next 👀
Announcing our latest addition to the OLMo family, OLMo 1.7!🎉Our team's efforts to improve data quality, training procedures and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog:
Wrangling large #datasets doesn't have to be so hard.
We're here to spare you the headaches.
🎉 Announcing StreamingDataset, designed to make distributed training on large datasets from #cloud storage as fast, accurate, and scalable as possible.
MPT-30B is a bigger sibling of MPT-7B, which we released a few weeks ago. The model arch is the same, the data mix is similar, and the context grew to 8k. We massively upgraded the Instruct and Chat variants over MPT-7B. See the full details in our blog!
Cheers from the MosaicML holiday party! With 2021 winding down, it’s a natural time to take a look back. MosaicML exists with one core mission: to make ML training efficient for everyone. After our first year as a company, we couldn’t be happier with what we’ve accomplished:(1/6)
New year, new summaries! Let's look at dataset quality and its impact on sample efficiency. This paper () studies the ineffectiveness of active learning on visual question answering (VQA) datasets and points to *collective outliers* as the culprit. (1/8)
Technical details time! How did we do this? We started with our own custom variant of the transformer architecture, modified for speed and efficiency (no surprise from us). And then we trained on a ton of data on 440 A100s for 9.5 days.
MPT-7B comes in four different flavors. MPT-7B-Instruct is a commercially-usable instruction-following model finetuned on Dolly+HHRLHF. MPT-7B-Chat is a chatbot finetuned on Alpaca & friends. MPT-7B-StoryWriter-65k+ is finetuned on books w/ a 65k context; it writes awesome fiction.
👀 A LOT more possibilities are about to open up. 64K context length means that your LLM can consume and process much longer documents AND write longer responses for text generation!
Stay tuned for a major LLM announcement later this week.
🤯🤯 LLM trained with 64K+ context length! What could you do with that? Prompted our model with the ENTIRE contents of "The Great Gatsby" and asked it to write the epilogue. Snippet 👇
Model dropping soon to an open-source repo near you.
Epilogue:
It seemed to me that Gatsby
Check out our deep-dive blog post on the Mosaic ResNet training recipe. See the details of our observations, and how to reproduce these results for your needs. We're able to achieve 7x faster training for ResNet-50, and so can you! #EfficientML
The team at @Replit is doing amazing work, and we’re thrilled to provide them with the MosaicML platform to fuel their AI model training needs. Check out their post that shows a holistic view of the LLM lifecycle, and the ecosystem they’ve built:
How much did it cost to train? At list price on @MosaicML, it was between $714k and $871k depending on your GPU choice. It's also incredibly cheap to fine-tune, at between $714 and $871 per 1B tokens.
One of our researchers just went through @TimDettmers' paper on 8-bit optimizers: . It is a pretty cool and very practical way to reduce memory consumption for large models. #EfficientML (1/4)
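The key mechanism, as we understand the paper, is block-wise quantization of optimizer state: each small block of values stores low-bit integer codes plus one float scale. A toy absmax version (illustrative only, not the paper's exact scheme):

```python
def quantize_blockwise(values, block_size=4):
    """Toy block-wise absmax quantization: each block keeps int8-range
    codes plus one float scale, cutting state memory roughly 4x vs.
    float32 while bounding the per-block error."""
    blocks = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        scale = max(abs(v) for v in block) or 1.0   # avoid div-by-zero
        codes = [round(v / scale * 127) for v in block]
        blocks.append((codes, scale))
    return blocks

def dequantize_blockwise(blocks):
    """Reconstruct approximate float values from codes and scales."""
    return [c / 127 * scale for codes, scale in blocks for c in codes]
```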
A new standard for performance made easy. Our MLPerf results today show leading NLP performance, speeding up training of
@huggingface
BERT models by 2.7x. Easily enable our optimizations on the MosaicML Cloud with a single flag.
What about with FP8?
To test this, we integrated NVIDIA’s TransformerEngine into our LLM training stack. As advertised, this took just a few lines of code.
On a billion-parameter LLM, convergence in FP8 equaled that of BF16 with no hyperparameter changes! [4/6]