Databricks Mosaic Research

@DbrxMosaicAI

30,409 Followers
116 Following
316 Media
977 Statuses

We remove the barriers to state-of-the-art generative AI model development and make data + AI available to all.

San Francisco, CA
Joined December 2020
Pinned Tweet
@DbrxMosaicAI
Databricks Mosaic Research
2 months
DBRX is a new open source general-purpose #LLM that advances the state of the art in efficiency, using a 132B-parameter MoE architecture. Check out our deep-dive on how we trained and benchmarked #DBRX :
8
32
171
@DbrxMosaicAI
Databricks Mosaic Research
1 year
📢 Introducing MPT: a new family of open-source commercially usable LLMs from @MosaicML . Trained on 1T tokens of text+code, MPT models match and - in many ways - surpass LLaMa-7B. This release includes 4 models: MPT-Base, Instruct, Chat, & StoryWriter (🧵)
Tweet media one
22
216
1K
@DbrxMosaicAI
Databricks Mosaic Research
1 year
📣Announcing MosaicML Inference 📣 Ever wanted a text or image generation API that doesn’t make you send data to a third party? Or a cheaper solution than paying by the token? Or an easy way to get a trained model into production? We can help with that. 🧵
Tweet media one
18
106
665
@DbrxMosaicAI
Databricks Mosaic Research
11 months
Introducing training LLMs with AMD hardware! MosaicML + PyTorch 2.0 + ROCm 5.4+ = LLM training out of the box with zero code changes. With MosaicML, the ML community has additional hardware + software options to choose from. Read more:
Tweet media one
9
148
680
@DbrxMosaicAI
Databricks Mosaic Research
11 months
Today, we’re excited to share that MosaicML has agreed to join @Databricks !
Tweet media one
16
90
626
@DbrxMosaicAI
Databricks Mosaic Research
11 months
Meet MPT-30B, the latest member of @MosaicML 's family of open-source, commercially usable models. It's trained on 1T tokens with up to 8k context (even more w/ALiBi) on A100s and *H100s* with big improvements to Instruct and Chat. Take it for a spin on HF!
Tweet media one
17
129
550
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Meet PubMed GPT 🩺 a new SOTA on the US Medical Licensing Exam developed by MosaicML and @StanfordHAI . It's a normal GPT-3B model trained on medical data that bests hand-designed med models and generic models 40x bigger, a sweet spot for foundation models🧵
12
132
525
@DbrxMosaicAI
Databricks Mosaic Research
1 year
[1/8] Full technical details on our Stable Diffusion 2.0 speedrun are here! On Wednesday, we announced that we had replicated SD2 for < $50k, 2.7x cheaper than our baseline and 6x cheaper than Stability's reported cost. Today, we share the technical nitty-gritty on how we did it:
Tweet media one
5
44
411
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Woo hoo! 🙌What an honor to make the @Forbes AI 50 List. MosaicML empowers you to build your own #GenerativeAI . Train, finetune, and deploy your custom #LLM today:
Tweet media one
13
45
356
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Got an extra $20 burning a hole in your wallet? With the MosaicBERT architecture + training recipe, you can now pretrain a competitive #BERT -Base model from scratch on the MosaicML platform for the cost of a large pizza! 🍕⚡️👏 Learn more:
Tweet media one
2
14
340
@DbrxMosaicAI
Databricks Mosaic Research
10 months
Announcing MPT-7B-8K: a 7B parameter open-source LLM with 8k context length trained with the MosaicML platform. With its 8k context length, MPT-7B-8K specializes in document summarization and question-answering, and may be used commercially. Read more:
Tweet media one
7
77
356
@DbrxMosaicAI
Databricks Mosaic Research
2 years
We have exciting news! In our latest and greatest LLM blog, we show how MosaicML Cloud can help you train LLMs from 1B - 70B parameters, and for the first time, publish transparent times + costs for doing so. It's a lot cheaper than you think! (1/9)
7
48
342
@DbrxMosaicAI
Databricks Mosaic Research
2 years
📢 MosaicML Cloud is now available for early access! Create advanced AI models faster and cheaper than you thought possible.
3
54
306
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Want to train your own custom #LLMs while keeping your data private? 🔒 We got you. The MosaicML platform keeps your data, models, and source code secure in your private network while abstracting away the complexity of #ML training infrastructure. Learn more:
Tweet media one
4
28
254
@DbrxMosaicAI
Databricks Mosaic Research
1 year
The MosaicML team is excited to present at the @weights_biases webinar this Thursday, 23-Feb-2023, 7PM CET/10AM PST! Our very own @leavittron will be joining W&B's @carey_phelps to showcase MosaicML #LLM training and W&B's Model Registry. Register at
1
19
235
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Large Language Models (LLMs) are gaining in popularity, but training these models from scratch can be a huge pain... until now! Our latest LLM blog series uncovers how to reduce the time, cost, and complexity of training these billion-parameter models:
1
40
252
@DbrxMosaicAI
Databricks Mosaic Research
1 year
How good are @nvidia H100s actually? In collaboration with @CoreWeave , we benchmarked A100 vs H100 performance for large language model training. Here's what we found: [1/6]
2
50
226
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Today, an exciting paper from @MSFTResearch : Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. While it's too early to say, this may be remembered as the single biggest efficiency advancement in hyperparameter tuning.
Tweet media one
3
41
214
@DbrxMosaicAI
Databricks Mosaic Research
10 months
How can the ML community measure LLM quality in a holistic and standardized manner? The Mosaic Model Gauntlet encompasses 34 benchmarks, organized into 6 broad categories of competency, evaluated with our blazingly fast open-source ICL eval harness. 🧵👇
5
41
170
@DbrxMosaicAI
Databricks Mosaic Research
2 years
"(Certified!!) Adversarial Robustness for Free!" Paper: This is a rare paper where having multiple exclamation marks in the title is justified. [1/12]
Tweet media one
3
26
164
@DbrxMosaicAI
Databricks Mosaic Research
3 years
Hello World! Today we come out of stealth to make ML training more efficient with a mosaic of methods that modify training to improve speed, reduce cost, and boost quality. Read our founders' blog by @NaveenGRao @hanlintang @mcarbin @jefrankle (1/4)
Tweet media one
7
41
164
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Is overparametrization the key to solving ImageNet? In this NeurIPS Outstanding Paper Award winner, @SebastienBubeck and @geoishard look at the significance of neural network overparametrization and the universal law of robustness. (1/6)
5
25
160
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Is forgetting actually beneficial for training? This top-reviewed ICLR paper () introduces a powerful paradigm that consistently outperforms all other methods on ResNet18 and DenseNet169 on a variety of image datasets. (1/8)
2
30
153
@DbrxMosaicAI
Databricks Mosaic Research
2 years
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism They replace the attention mechanism in a Swin transformer with a simple spatial shift of some of the channels. Turns out this actually works.
Tweet media one
Tweet media two
3
23
136
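The tweet above describes replacing attention with a spatial shift of a subset of channels. Below is a minimal PyTorch sketch of that idea, a toy illustration rather than the paper's exact implementation: it uses wraparound `torch.roll` where the paper zero-pads, and the shifted channel fraction is an assumed value.

```python
import torch

def spatial_shift(x: torch.Tensor, shift_frac: float = 1 / 12) -> torch.Tensor:
    """Shift small groups of channels by one pixel in each spatial direction.

    x: (batch, channels, height, width). Toy sketch only; the real block
    zero-pads instead of wrapping and is paired with MLP mixing layers.
    """
    _, c, _, _ = x.shape
    g = int(c * shift_frac)                     # channels per shifted group (assumed fraction)
    out = x.clone()
    out[:, 0 * g:1 * g] = torch.roll(x[:, 0 * g:1 * g], shifts=1, dims=2)   # shift down
    out[:, 1 * g:2 * g] = torch.roll(x[:, 1 * g:2 * g], shifts=-1, dims=2)  # shift up
    out[:, 2 * g:3 * g] = torch.roll(x[:, 2 * g:3 * g], shifts=1, dims=3)   # shift right
    out[:, 3 * g:4 * g] = torch.roll(x[:, 3 * g:4 * g], shifts=-1, dims=3)  # shift left
    return out                                  # remaining channels pass through unchanged

feats = torch.randn(2, 96, 56, 56)              # a feature map like those inside a Swin stage
shifted = spatial_shift(feats)
```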
@DbrxMosaicAI
Databricks Mosaic Research
1 month
Ready to use a programmatic approach to prompting #LLMs and building #RAG applications? The @stanfordnlp #dspy repo includes support for @databricks Model Serving and Vector Search! Details:
Tweet media one
1
32
133
@DbrxMosaicAI
Databricks Mosaic Research
10 months
📢 Today, we're thrilled to announce that @Databricks has completed its acquisition of MosaicML. Our teams share a common goal to make #GenerativeAI accessible for all.  We're excited to change the world together! Read the press release and stay tuned for more updates:
5
24
131
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Data2Vec: A new paper by Meta AI claims to be “The first high-performance self-supervised algorithm that works for speech, vision, and text.” And the results look very promising.
2
16
125
@DbrxMosaicAI
Databricks Mosaic Research
2 years
"Scaling Laws for Neural Language Models" was one of the papers that fueled the recent push towards larger models. @DeepMind revisits the question, "How should one scale model size relative to dataset size?" and finds some surprising answers!
Tweet media one
3
27
124
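The DeepMind work referenced above is the compute-optimal ("Chinchilla") scaling analysis. A rough worked example of its practical takeaway, using the common approximations of C ≈ 6·N·D training FLOPs and roughly 20 tokens per parameter (both are rules of thumb, not exact values from the paper):

```python
def compute_optimal(budget_flops: float) -> tuple[float, float]:
    """Roughly size model and dataset for a FLOP budget, assuming
    C ≈ 6*N*D and D ≈ 20*N (approximate rules of thumb)."""
    n_params = (budget_flops / (6 * 20)) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

params, tokens = compute_optimal(1e23)          # e.g. a 1e23-FLOP training budget
print(f"~{params / 1e9:.0f}B params, ~{tokens / 1e12:.1f}T tokens")
# -> ~29B params, ~0.6T tokens
```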
@DbrxMosaicAI
Databricks Mosaic Research
7 months
Our NLP architect @abhi_venigalla continues his work on the use of AMD accelerators at scale for #LLM training. In our latest @databricks blog post, he shares multi-node training performance results on MI250 GPUs:
4
31
120
@DbrxMosaicAI
Databricks Mosaic Research
1 year
How much does it cost to train a Stable Diffusion model from scratch? The answer: 79,000 A100-hours in 13 days, for a total training cost of <$160k. Our tooling reduces the time and cost to train by 2.5x, and is also extensible and simple to use.
1
25
118
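A quick sanity check of the arithmetic in the tweet above: 79,000 A100-hours at an assumed ~$2 per A100-hour lands just under the quoted figure (the hourly rate is an illustrative assumption, not a quoted price):

```python
a100_hours = 79_000
assumed_usd_per_a100_hour = 2.00                 # assumed rate for illustration only
total_cost = a100_hours * assumed_usd_per_a100_hour
print(f"${total_cost:,.0f}")                     # $158,000, consistent with "<$160k"
```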
@DbrxMosaicAI
Databricks Mosaic Research
2 years
We've shared great research before, but reproducing methods from papers is hard. Announcing Composer, our library of ML speedups: . Train CV models ~4x faster and NLP models ~2x faster at the same accuracy -- with minimal tuning. (1/5)
Tweet media one
3
34
105
@DbrxMosaicAI
Databricks Mosaic Research
2 years
We're excited to share details about how we use cyclic LR schedules to estimate time/cost/accuracy tradeoff curves for model training. This is a key element in our approach to #EfficientML -- how we benchmark the speedup methods we implement. Article here:
0
17
98
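For readers unfamiliar with cyclic LR schedules, here is a minimal PyTorch setup of the kind of schedule the tweet above refers to; the model, LR bounds, and step sizes are placeholders, not MosaicML's benchmarking configuration:

```python
import torch

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-1, step_size_up=500, mode="triangular"
)

for step in range(2000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 128)).pow(2).mean()   # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()                                    # LR cycles between base_lr and max_lr
```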
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Today we take a look at a new architecture from DeepMind called RETRO: Retrieval-Enhanced Transformer (). RETRO uses a large database of documents along with an embedding-based retrieval system to improve the “knowledge” of transformers at runtime. (1/15)
2
20
95
@DbrxMosaicAI
Databricks Mosaic Research
2 months
DBRX is the top open-source model on the latest WildBench Leaderboard on HuggingFace! Thanks to our friends @allen_ai for this benchmark of LLMs with challenging tasks from real users in the wild. #DBRX
4
31
96
@DbrxMosaicAI
Databricks Mosaic Research
2 years
"RankGen: Improving Text Generation with Large Ranking Models" It's tempting to look at this paper as yet another method to make the numbers go up. But there's another story here that's much more interesting. [1/14]
Tweet media one
2
23
96
@DbrxMosaicAI
Databricks Mosaic Research
1 year
📢 Introducing MosaicBERT! Now you can pretrain a high-quality BERT model from scratch on the MosaicML platform for $20. So why should you train your own BERT model? 👇 (1/5)
2
10
97
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Happy December! Today, we're looking back at Stochastic Weight Averaging (SWA), now a classic ML efficiency win! SWA is a simple method for improving accuracy with no increase in training time. It is built into fastai, PyTorch, PTL, and our Composer. (1/12)
1
22
94
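A minimal sketch of SWA using the utilities that ship with PyTorch (torch.optim.swa_utils), as mentioned in the tweet above; the model, data, and epoch counts are placeholders:

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(32, 20), torch.randint(0, 2, (32,))) for _ in range(10)]  # dummy data

swa_model = AveragedModel(model)             # maintains the running weight average
swa_scheduler = SWALR(optimizer, swa_lr=0.01)
swa_start = 5

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)   # fold current weights into the average
        swa_scheduler.step()

update_bn(loader, swa_model)                 # refresh BatchNorm stats for the averaged model
```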
@DbrxMosaicAI
Databricks Mosaic Research
2 years
New blog post! Take a look at some best practices for efficient CNN training, and find out how you can apply them easily with our Composer library: #EfficientML
1
18
80
@DbrxMosaicAI
Databricks Mosaic Research
9 months
New Blog Post 📢 Llama2-70B-Chat is now available on MosaicML Inference and @databricks MLflow AI Gateway. Learn more:
Tweet media one
1
20
81
@DbrxMosaicAI
Databricks Mosaic Research
11 months
We launched the MPT-7B foundation models just over a month ago, and since then, they’ve been downloaded over 2 million times! We are humbled by this warm reception, and thrilled to see the vibrant #LLM community rise up to share how they're using them!
Tweet media one
0
12
79
@DbrxMosaicAI
Databricks Mosaic Research
1 year
What makes these models special? * Licensed for commercial use (unlike LLaMA) * Trained on more data than any comparable open-source LLM. * Handle extremely long inputs (trained up to 65k, goes up to 84k w/ALiBi) * Really fast inference and training code.
1
8
78
@DbrxMosaicAI
Databricks Mosaic Research
2 months
Our #DBRX open source #LLM is now available on ! Sign in (or sign up), select custom models, and start chatting with our DBRX-Instruct model. Get started:
Tweet media one
1
17
73
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Today, we're looking at fine-tuning large models, and this paper submitted to ICLR: It shows fine-tuning can hurt performance on out-of-distribution examples, and explains why, using some nice theory. We'll be keeping an eye on this! (1/8)
2
15
76
@DbrxMosaicAI
Databricks Mosaic Research
2 years
"ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers" Paper: Post-training quantization from the DeepSpeed team that can reduce inferences cost up to 5x. [1/8]
Tweet media one
4
8
71
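ZeroQuant itself lives in DeepSpeed rather than PyTorch core, but the basic post-training quantization idea can be illustrated with PyTorch's built-in dynamic quantization: convert trained Linear layers to int8 with no retraining (a generic sketch, not ZeroQuant's method):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
)
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8   # weights stored in int8
)
print(quantized)                                  # Linear -> DynamicQuantizedLinear
```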
@DbrxMosaicAI
Databricks Mosaic Research
1 year
🚨 A few months ago we announced that you can train Stable Diffusion from scratch for less than $125k using the MosaicML platform. A major price drop is coming...and we have the training run to back it up. Stay tuned for a major announcement this week!
Tweet media one
0
9
71
@DbrxMosaicAI
Databricks Mosaic Research
1 year
🎉 🎉🎉 We have a new price on training Stable Diffusion 2 from scratch: $50k trained on the MosaicML Platform. We replicated Stable Diffusion 2.0 with massive training speedups, and now you can too. Learn more in our latest blog post:
4
13
68
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Should you train your ViT? While Vision Transformers (ViT) have delivered groundbreaking performance, that performance often depends on tons of pre-training and data augmentations. This paper () shows that ViTs can be performant without pre-training! (1/11)
3
7
67
@DbrxMosaicAI
Databricks Mosaic Research
2 years
"A Neural Corpus Indexer for Document Retrieval" Paper: They train a seq-to-seq model to directly spit out document IDs given queries. And it works really well. Like, these are some of the largest accuracy lifts I've ever seen in a paper. [1/15]
Tweet media one
1
13
67
@DbrxMosaicAI
Databricks Mosaic Research
7 months
Our #GenAI engineering team shares tips and tricks on how to deploy open source #LLMs for production usage in their latest @databricks blog post.
Tweet media one
0
10
65
@DbrxMosaicAI
Databricks Mosaic Research
2 months
New blog post! @zeqiuwu1 , @huyushi98 , and @rajammanabrolu share a recent highlight from their work in #LLM finetuning research: Fine-Grained Reinforcement Learning from Human Feedback (RLHF)
1
11
63
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Is this the end of contrastive self-supervised pretraining? In this 🧵, we’ll discuss Masked Autoencoders Are Scalable Vision Learners, the exciting new work from Kaiming He, @endernewton , @sainingxie , Yanghao Li, and @inkynumbers at @MetaAI 1/15
3
13
66
@DbrxMosaicAI
Databricks Mosaic Research
1 year
There's fast, and then there's blazingly fast. 🔥🔥🔥 With Composer and MosaicML Cloud, you can now evaluate #llms on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. Read more:
2
7
62
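For intuition, most of these in-context-learning tasks reduce to scoring candidate continuations with the LM and picking the most likely one. Below is a generic (and deliberately slow) sketch of that idea with a small Hugging Face model; it is not Composer's eval harness, and the example question is made up:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probs the LM assigns to the continuation tokens."""
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = lm(full_ids).logits.log_softmax(dim=-1)
    cont_len = full_ids.shape[1] - ctx_len
    targets = full_ids[0, -cont_len:]
    preds = logprobs[0, -cont_len - 1:-1]          # positions that predict the continuation
    return preds[torch.arange(cont_len), targets].sum().item()

# A PIQA-style multiple-choice item (invented for illustration).
question = "To dry wet shoes quickly, you should"
choices = [" stuff them with newspaper.", " put them in the freezer."]
scores = [continuation_logprob(question, c) for c in choices]
print(choices[max(range(len(scores)), key=scores.__getitem__)])
```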
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Did somebody say "best-in-class open-source generative models?" Thanks for the recognition, @databricks ! Check out our MPT-7B series here:
Tweet media one
1
10
62
@DbrxMosaicAI
Databricks Mosaic Research
4 months
We're excited to have contributed to the OLMo open source model from @allen_ai , which was trained using the @databricks Mosaic AI training platform. Read the blog post from @jefrankle to learn more:
@jefrankle
Jonathan Frankle
4 months
Hello OLMo! Congrats to the amazing @allen_ai team! 7B params, 2T tokens, open training code, open data, intermediate checkpoints, Apache 2.0, the works. A giant leap for open science. Nicely done @mechanicaldirk , @i_beltagy , @soldni , and so many others!
10
48
284
1
10
63
@DbrxMosaicAI
Databricks Mosaic Research
9 months
📦 To evaluate the coding capabilities of LLMs, you need to execute the code. But what if the LLM spits out malicious code?😱 With MosaicML, you can now evaluate #LLMs on code gen benchmarks (e.g., HumanEval) in an effortless, end-to-end secure framework.
Tweet media one
1
11
57
@DbrxMosaicAI
Databricks Mosaic Research
2 years
We’ve released Composer 0.8.0, which introduces a HuggingFaceModel object for reading in your existing 🤗 Transformers models. Training or fine-tuning BERT models with Composer just got much easier. See the release notes for the full set of enhancements:
1
14
57
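A minimal sketch of the wrapping step described above, assuming the composer.models.HuggingFaceModel API (exact keyword arguments may differ slightly across Composer versions); the checkpoint and task are illustrative:

```python
from composer.models import HuggingFaceModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

hf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Wrap the 🤗 model so it exposes Composer's ComposerModel interface.
composer_model = HuggingFaceModel(hf_model, tokenizer=tokenizer, use_logits=True)

# composer_model can now be handed to composer.Trainer along with your
# dataloaders, e.g. Trainer(model=composer_model, train_dataloader=..., ...).
```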
@DbrxMosaicAI
Databricks Mosaic Research
11 months
Thanks @AMD for sharing our blog post about how we trained #LLMs on your Instinct accelerators with no code changes. It just works!
@AMD
AMD
11 months
The release of @PyTorch 2.0 and AMD ROCm 5.4 has @MosaicML “excited to announce that LLM training works out of the box on AMD Instinct data center GPUs, with zero code changes…” Read more about how the AMD Instinct MI250 helps developers train #AI models.
5
21
151
1
10
57
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Do language models really need tokenizers? Work by @colinraffel 's group suggests the answer is often no. Their ByT5 model () modifies mT5 to take in raw UTF-8 bytes instead of output from a tokenizer. (1/9)
Tweet media one
2
7
54
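The core trick in ByT5, as summarized above, is simply to treat raw UTF-8 bytes as the token stream (the real model also reserves a handful of special-token IDs). A quick illustration:

```python
text = "Do language models really need tokenizers?"
byte_ids = list(text.encode("utf-8"))            # the "vocabulary" is just byte values 0-255
print(len(byte_ids), byte_ids[:10])
# -> 42 [68, 111, 32, 108, 97, 110, 103, 117, 97, 103]
```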
@DbrxMosaicAI
Databricks Mosaic Research
11 months
And of course, the base model is available for you to build on as you like, on your own or on the MosaicML Platform.
1
2
53
@DbrxMosaicAI
Databricks Mosaic Research
3 months
What should you do if you want to effectively and cheaply “instruction finetune” an LLM? @aditi_jh and @JacobianNeuro share some important insights. (1/5)
Tweet media one
1
7
52
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Shoutout to @jeremyphoward , @math_rachel and the whole @fastdotai team. Two-way callbacks and other ideas helped a lot in designing Composer (). We're standing on the shoulders of giants in our shared mission to make AI accessible to everyone.
Tweet media one
0
7
50
@DbrxMosaicAI
Databricks Mosaic Research
2 months
Since becoming part of @databricks last July, the MosaicML team has continued its mission to optimize and improve #GenAI model training. Our rigorous science leads to real-world results. Visit our new research hub to discover what we've been working on:
2
7
50
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Today, we look at optimizing data movement for transformer deep learning networks. In this paper (), Ivanov et al. show that 40% of the runtime is spent in data movement, and that training has become memory bound for transformer networks. (1/6)
1
12
51
@DbrxMosaicAI
Databricks Mosaic Research
11 months
It's Christmas in July for the ML Community! 🎄 We found that training on AMD systems appears stable and consistent with training on NVIDIA systems when using MosaicML's training stack. With StreamingDataset's elastic determinism, we can get the same loss curves.
Tweet media one
Tweet media two
2
5
49
@DbrxMosaicAI
Databricks Mosaic Research
1 year
In short: with just a few lines of code, H100s were 30% more cost-effective and 3x faster than A100s for a 7 billion parameter MosaicGPT model. [6/6]
Tweet media one
2
6
50
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Think it’s too hard—or too expensive—to train your own GPT or diffusion models from scratch? Think again. We built the MosaicML platform to tackle the challenges of training large models and unleash the power of #generativeAI . Learn more:
3
10
48
@DbrxMosaicAI
Databricks Mosaic Research
2 years
"Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam" One of the biggest bottlenecks in distributed training is communication between nodes. The 0/1 Adam optimizer can train a BERT-Large while syncing only 1.03 bits/parameter on average. [1/7]
Tweet media one
1
8
47
@DbrxMosaicAI
Databricks Mosaic Research
1 year
The model is really good. Across 12 different in-context learning tasks, it nearly always surpasses every other open-source model < 30B params and trades off with LLaMa-7B for the best open-source model. Plus it's commercially usable, and finetuned versions are available now.
Tweet media one
2
5
46
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Composer is trending on GitHub (python)! Composer helps train ML models faster and cheaper through algorithmic efficiency, and the world is taking notice, thanks to this wonderful community! See what all the buzz is about in our repo -- and give us a ⭐!
0
12
42
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Large Language Models are notorious for being expensive to train, but they provide a model that can be evaluated on generalized language understanding benchmarks. What if the goal is to perform well on a task-specific benchmark instead? Can we cut down the costs of pre-training? (1/9)
Tweet media one
1
6
43
@DbrxMosaicAI
Databricks Mosaic Research
1 year
We used the same tools as our customers: the MosaicML platform, Composer, StreamingDatasets, etc. They made training a piece of cake. Here's our training logbook. 🥱 No loss spikes (we fixed them architecturally). Four hw failures handled automatically. ZERO human intervention.
Tweet media one
Tweet media two
3
1
42
@DbrxMosaicAI
Databricks Mosaic Research
1 year
All of these models are available now. You can download the weights on HuggingFace and experiment with them using Spaces.
1
7
40
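A minimal sketch of pulling the MPT-7B weights from the Hugging Face Hub, as the tweet above suggests; trust_remote_code is needed because the MPT architecture ships as custom modeling code in the repo, and the generation settings here are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
)

inputs = tokenizer("MosaicML is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```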
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Does your application require high accuracy, but have tight inference constraints? Are you willing to pay any training costs to achieve this? If so, today's paper may be for you! They reach 82.8% accuracy on ImageNet using only a ResNet-50 model! (1/9)
2
5
42
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Composer gives you world-class model accuracy at a fraction of the cost. FFCV is a super effective tool that helps us do it! Our latest blog post breaks it down: S/O to @gpoleclerc @andrew_ilyas @logan_engstrom @smsampark @hadisalmanx @aleks_madry
2
11
41
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Today, we’re looking at “Augmenting Convolutional networks with attention-based aggregation” which proposes a modification to the average pooling used in many conv nets and a particularly simple new architecture. (1/10)
2
5
41
@DbrxMosaicAI
Databricks Mosaic Research
11 months
We are thrilled to see promising alternative options for AI hardware to provide the best cost, performance, and developer experience for our customers.
Tweet media one
2
1
41
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Exciting new speedups from @aleks_madry 's lab with results on #Imagenet ! Optimized data-loading with FFCV removes the CPU bottleneck which normally limits the throughput of ResNet+ImageNet+A100 training. (1/4)
@aleks_madry
Aleksander Madry
2 years
ImageNet is the new CIFAR! My students made FFCV (), a drop-in data loading library for training models *fast* (e.g., ImageNet in half an hour on 1 GPU, CIFAR in half a minute). FFCV speeds up ~any existing training code (no training tricks needed) (1/3)
Tweet media one
29
390
2K
1
9
40
@DbrxMosaicAI
Databricks Mosaic Research
2 years
“Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models” GLUE is worn out. Even SuperGLUE doesn’t stick anymore. Enter BIG-bench, a collection of 204 tasks contributed by 444 authors, designed for evaluating large language models.
1
4
37
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Join us next week at @weights_biases ' Fully Connected conference on Wednesday, June 7th in San Francisco. Our CTO/Co-Founder @hanlintang will be speaking alongside a roster of #generativeAI luminaries. Full agenda here:
Tweet media one
2
9
37
@DbrxMosaicAI
Databricks Mosaic Research
1 year
To highlight StoryWriter: Its final training stage has a 65k token context, 32x LLaMa and 2x GPT-4. This crazy length works out of the box with our LLM Foundry on standard GPUs. We used ALiBi pos encodings: they handle any input length and extrapolate longer (84k in our testing).
Tweet media one
2
3
36
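For context on why ALiBi extrapolates: instead of positional embeddings, it adds a per-head linear penalty to attention scores that grows with key distance, so nothing in the model is tied to a fixed maximum length. A simplified sketch for power-of-two head counts, not LLM Foundry's exact implementation:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """(n_heads, seq_len, seq_len) biases added to attention scores pre-softmax."""
    # Geometric head slopes 2^(-8/H), 2^(-16/H), ..., 2^(-8).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    rel = (pos[None, :] - pos[:, None]).clamp(max=0)   # 0 on the diagonal, -1, -2, ... for the past
    return slopes[:, None, None] * rel[None, :, :]

bias = alibi_bias(n_heads=8, seq_len=16)
scores = torch.randn(8, 16, 16) + bias   # causal masking of future positions still applies
```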
@DbrxMosaicAI
Databricks Mosaic Research
2 years
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale The DeepSpeed team made an awesome family of MoE models + systems optimizations:
Tweet media one
1
4
36
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Great to see a MosaicML citation in the wild! Spotted in 'DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing' () by @conglongli @yuxionghe et al.
Tweet media one
1
3
36
@DbrxMosaicAI
Databricks Mosaic Research
30 days
Our team is incredibly proud to partner with @allen_ai and thrilled to see them cook! Achieving such a massive improvement in MMLU, while reducing the compute budget, is a fantastic win. And doing it fully open? Everyone wins. Congrats! Can't wait to see what's next 👀
@allen_ai
Allen Institute for AI
1 month
Announcing our latest addition to the OLMo family, OLMo 1.7!🎉Our team's efforts to improve data quality, training procedures and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog:
Tweet media one
13
47
169
0
6
36
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Wrangling large #datasets doesn't have to be so hard. We're here to spare you the headaches. 🎉 Announcing StreamingDataset - designed to make distributed training on large datasets from #cloud storage as fast, accurate, and scalable as possible.
2
5
36
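The usage pattern for the library announced above (package mosaicml-streaming) looks roughly like the sketch below; the bucket path and cache directory are placeholder assumptions, and the data must first be written to MDS shards with MDSWriter:

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

dataset = StreamingDataset(
    remote="s3://my-bucket/my-dataset",   # hypothetical cloud location of MDS shards
    local="/tmp/my-dataset-cache",        # local cache for downloaded shards
    shuffle=True,
    batch_size=32,
)
loader = DataLoader(dataset, batch_size=32, num_workers=8)

for batch in loader:
    ...                                   # each sample is a dict of the fields you wrote
    break
```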
@DbrxMosaicAI
Databricks Mosaic Research
11 months
MPT-30B is a bigger sibling of MPT-7B, which we released a few weeks ago. The model arch is the same, the data mix is similar, and the context grew to 8k. We massively upgraded the Instruct and Chat variants over MPT-7B. See the full details in our blog!
4
3
33
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Cheers from the MosaicML holiday party! With 2021 winding down, it’s a natural time to take a look back. MosaicML exists with one core mission: to make ML training efficient for everyone. After our first year as a company, we couldn’t be happier with what we’ve accomplished:(1/6)
Tweet media one
2
3
35
@DbrxMosaicAI
Databricks Mosaic Research
2 years
New year, new summaries! Let's look at dataset quality and its impact on sample efficiency. This paper () studies the ineffectiveness of active learning on visual question answering (VQA) datasets and points to *collective outliers* as the culprit. (1/8)
2
7
34
@DbrxMosaicAI
Databricks Mosaic Research
1 year
Technical details time! How did we do this? We started with our own custom variant of the transformer architecture, modified for speed and efficiency (no surprise from us). And then we trained on a ton of data on 440 A100s for 9.5 days.
Tweet media one
1
3
34
@DbrxMosaicAI
Databricks Mosaic Research
1 year
MPT-7B comes in four different flavors. MPT-7B-Instruct is a commercially-usable instruction-following model finetuned on Dolly+HHRLHF. MPT-7B-Chat is a chatbot finetuned on Alpaca & friends. MPT-7B-StoryWriter-65k+ is finetuned on books w/context 65k; it writes awesome fiction.
Tweet media one
1
2
34
@DbrxMosaicAI
Databricks Mosaic Research
1 year
👀 A LOT more possibilities are about to open up. 64K context length means that your LLM can consume and process much longer documents AND write longer responses for text generation! Stay tuned for a major LLM announcement later this week.
@NaveenGRao
Naveen Rao
1 year
🤯🤯 LLM trained with 64K+ context length! What could you do with that? Prompted our model with the ENTIRE contents of "The Great Gatsby" and asked it to write the epilogue. Snippet 👇 Model dropping soon to an open-source repo near you. Epilogue: It seemed to me that Gatsby
41
90
678
0
3
33
@DbrxMosaicAI
Databricks Mosaic Research
2 years
Check out our deep-dive blog post on the Mosaic ResNet training recipe. See the details of our observations, and how to reproduce these results for your needs. We're able to achieve 7x faster training for ResNet-50, and so can you! #EfficientML
0
4
32
@DbrxMosaicAI
Databricks Mosaic Research
1 year
The team at @Replit is doing amazing work, and we’re thrilled to provide them with the MosaicML platform to fuel their AI model training needs. Check out their post that shows a holistic view of the LLM lifecycle, and the ecosystem they’ve built:
0
9
32
@DbrxMosaicAI
Databricks Mosaic Research
11 months
How much did it cost to train? At list price on @MosaicML , it was between $714k and $871k depending on your GPU choice. It's also incredibly cheap to fine-tune, at between $714 and $871 per 1B tokens.
Tweet media one
2
0
29
@DbrxMosaicAI
Databricks Mosaic Research
3 years
One of our researchers just went through @TimDettmers ' paper on 8-bit optimizers: . It is a pretty cool and very practical way to reduce memory consumption for large models. #EfficientML (1/4)
2
4
32
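The paper's reference implementation lives in the bitsandbytes library; swapping it in is close to a one-line change. A sketch (it generally requires a CUDA-enabled install):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)   # optimizer state kept in 8 bits

x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```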
@DbrxMosaicAI
Databricks Mosaic Research
2 years
A new standard for performance made easy. Our MLPerf results today show leading NLP performance, speeding up training of @huggingface BERT models by 2.7x. Easily enable our optimizations on the MosaicML Cloud with a single flag.
Tweet media one
3
7
31
@DbrxMosaicAI
Databricks Mosaic Research
6 months
A very enjoyable #podcast featuring @jefrankle , Chief Scientist ( #neuralnetworks ) at @databricks / @MosaicML , with @jthandy ( @getdbt ) and @j_schottenstein ( @LangChainAI ). Listen to the Analytics Engineering podcast with Jonathan now!
Tweet media one
0
11
31
@DbrxMosaicAI
Databricks Mosaic Research
1 year
What about with FP8? To test this, we integrated NVIDIA’s TransformerEngine into our LLM training stack. As advertised, this took just a few lines of code. On a billion-parameter LLM, convergence in FP8 equaled that of BF16 with no hyperparameter changes! [4/6]
Tweet media one
1
1
31
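A minimal sketch of the "few lines of code" pattern described above for FP8 with NVIDIA Transformer Engine (requires an H100-class GPU; the layer size and recipe settings are illustrative, not MosaicML's configuration):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(4096, 4096, bias=True).cuda()
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

x = torch.randn(16, 4096, device="cuda", requires_grad=True)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)                        # matmuls run in FP8 with delayed scaling
out.sum().backward()
```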