Very excited to share some personal news! @johnowhitaker, @pcuenq, @multimodalart and I are writing a book with @oreilly about generative ML🤗
We'll cover many topics, from theory to practical aspects, discuss creative applications, and more!
What topics would you like to see?
Graphs are **everywhere**, from social media and knowledge systems to molecules and meshes! 🧑🏫
Want to learn about Machine Learning for Graphs? Check out this thread! 🧵
🧵Stable Diffusion weights are officially public, and we got some surprises! 🤗
🤗 Public weights
🧨Support with the diffusers library
🔥Load and use the model with a few lines of code
📖Blog post explaining how it works
The ML ecosystem in France is on fire🔥 It has amazing talent and resources. Here are 10 facts you might not know:
1. There are great research labs - from @MistralAI and @kyutai_labs to large ones from @AIatMeta and @GoogleDeepMind. The Llama 2 and CodeLlama authors are based in…
Have you used transformers but not fully grasped how they work internally?👀
Welcome the Random Transformer, a step-by-step walkthrough of the math of the transformer model. Kick off your year understanding what's going on under the hood.
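To make the "under the hood" part concrete, here is a minimal sketch (my own toy illustration, not code from the walkthrough) of the core attention computation, softmax(QKᵀ/√d_k)·V, in plain Python with tiny 2-dimensional vectors:

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output row is a weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Real implementations do this with batched matrix multiplies; the explicit loops just make each step visible.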
💫StarCoder: May the Source be with You!
🔥15B LLM with 8k context
🥳Trained on permissively-licensed code
💻Acts as tech assistant
🤯80+ programming languages
🚀Open source and data
💫Online demos
🧑💻VSCode plugin
🪅1 trillion tokens
Follow this amazing experience! 🧵
Yesterday @GoogleAI released Flan T5, a model that can solve 1800 different tasks 🤯
Since then:
- Open-source models
- Support in @huggingface transformers
- An interactive demo I created to play with the model
I'm super excited to share that today I'm joining @huggingface 🤗 as a Machine Learning engineer in the Open Source team. Looking forward to contributing to the community and democratizing ML 🚀🚀🚀
🧵Thread on images generated with free Open Source tools. No coding is needed to try them out!
"a monkey head that is only made out of avocado, 3D" by dalle mini
WebGPU will change ML 🤯
With the recent release of ONNX Runtime with WebGPU, in-browser ML is about to change. We can now fully leverage GPUs to run ML models (think of Phi, SD, etc) entirely in the browser
Benchmark on my computer: 40x faster ⚡️
Introducing: Zephyr 141B-A35B 🥁
🔥Mixtral-8x22B fine-tune
🤯 Using ORPO: new alignment algorithm (no SFT, open)
🚀 With 7k instances of (open) data
Very strong IFEval, BBH, AGIEval... Enjoy! 🤗
Combine the power of a simple PyTorch model, CLIP, and @Gradio and you get: Draw To Search! 🎨🧑‍🎨
Try it out here!
Find images from movies in a different way 🔥
#model_of_the_day
OpenAI released Point-E, a text-to-3D (point clouds) demo 🔥
You can check out an open-source demo for it at 🤯 Enjoy! The demo uses the lower-quality but much faster version of the model.
Prediction: We're about to face the longest AI Winter in a long, long time❄️ 😱
Starting this weekend, many people will be taking time off to celebrate the new year. We might not have SOTA releases for two or three days📉
You keep reading about sentence embeddings, but you might still not know exactly what they are. You are not alone! 🤗
I wrote a step-by-step walkthrough with code, math, applications, and memes. Kick off your year understanding what embeddings are!
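As a taste of what embeddings enable (a toy illustration of mine, not code from the post): once sentences are mapped to vectors, similarity is typically measured with cosine similarity, which needs only a dot product and two norms:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d "embeddings" (real models output hundreds of dimensions).
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
car = [0.1, 0.9, 0.3]

print(cosine_similarity(cat, kitten))  # close to 1: similar meaning
print(cosine_similarity(cat, car))     # lower: unrelated
```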
Image Editing with Instructions🔥
Input an image, give an instruction ("remove the boy with the blue backpack"), and get your image edited. Amazing what you can do with open-source tools 🤯
Are you overwhelmed by everything happening in the ML ecosystem?
We're doing a small crowdsourced initiative with a high-level distillation+timeline of cool big things happening in the ML landscape. Feel free to contribute! 🤗
NVIDIA, Hugging Face, and ServiceNow release The Stack v2, a massive code dataset🌸
- 37 TB of de-duplicated code
- 913B tokens
- 619 programming languages
Blog post
Data
Welcome Candle, a minimalistic ML framework in Rust 🕯️
🦙Whisper, Llama, Falcon, Bert, StarCoder
🖥️Run models in the browser with WASM
✨Flash Attention
💾Dataset loaders
All in Rust.
What can you find at @huggingface?
- 26000 models for NLP, Audio, Computer Vision and Tabular Data
- 2500 datasets for different domains
- Over 1500 ML demos
All shared by the community, open to everyone, open source. 🤗
The time of closed ML should be over.
You can now transcribe audio with Whisper 70 times faster than the original implementation! 🔥
Transcribe 2-hour movies in 2 minutes!🤯For free, with Open Source tools!
Kaggle notebook (free TPUs)
GitHub repo:
Apropos of nothing, here is a mini-tutorial on three types of Mixture of Experts (MoE): Pre-trained MoE, upcycled MoEs, and FrankenMoEs.
MoE refresher
MoEs replace the feed-forward layers with sparse MoE layers. These layers contain a certain number of experts (e.g. 8), each one…
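The routing described above can be sketched in a few lines of plain Python. This is an illustrative toy (scalar inputs, a fixed router), not any specific model's implementation:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(x, router_logits_fn, experts, k=2):
    """Sparse MoE: route the input to its top-k experts and combine
    their outputs, weighted by the renormalized router scores."""
    probs = softmax(router_logits_fn(x))
    # Pick the k experts with the highest router probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    # Only the selected experts run; their outputs are mixed by weight.
    return sum(probs[i] / total * experts[i](x) for i in top)

# Toy setup: 4 "experts" that just scale their scalar input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router = lambda x: [0.1, 0.2, 2.0, 1.5]  # fixed logits for illustration
print(moe_layer(1.0, router, experts, k=2))
```

With k=2, only experts 2 and 3 execute, which is where the compute savings come from: total parameters grow with the number of experts, but per-token compute grows only with k.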
🚨Free @huggingface courses🚨
Check out our new page with high-quality material 🤗
- NLP and RL courses
- Notebooks, interactive demos, videos
- Lots of exciting things are coming soon!
Which topics would you like to see here?
Open ML is going brrr. In just 5 days
🧱 Databricks releases DBRX
🦾 Mistral releases 7B v2
🚀Qwen1.5 MoE-A2.7B
🐍Jamba, a MoE SSM LLM
🤏Wild 1-bit and 2-bit quantization with HQQ+
3 big pre-trained MoEs, a new Mistral base, and crazy updates for on-device. Let's goo 🚀
Falcon 180B is out🤯
- 180B params
- Trained on 3.5 trillion tokens + 7 million GPU hours
- Quality on par with PaLM 2; outperforms Llama 2 and GPT-3.5 across 13 benchmarks
- 4bit and 8bit precision with similar quality
Demo:
Blog:
LLaVA 1.6 is out! 🥳
- Outperforms Gemini PRO on some benchmarks
- Higher resolution than LLaVA 1.5 (up to 4x more pixels!)
- Better OCR capability and instruction-following
- More conversational
Models:
Blog:
Some extremely exciting news!🤗
We are raising $100 million and looking forward to building the future of open Machine Learning, from Computer Vision to Reinforcement Learning. The future of ML is collaborative.🤗
Jobs:
Announcement:
Looking forward to the next year:
🎻 Fast-to-generate, synthetic music
☁️ Diffusion models applied for 3D objects (meshes, point clouds)
📽️ High-quality artificial video generation (with audio)
All open source and accessible to the community 🤗
🧠 Would you like to learn about Graph Neural Networks?
Stanford is organizing a workshop with leaders of academia and industry. 🤖 The schedule looks amazing and will be streamed for free!
You can register at
OpenBMB, the creators of UltraFeedback, silently released a series of very strong edge models!
- 2.4B base model close to Mistral 7B
- 2.4B DPO model outperforming Llama 70B on MT Bench
- A 3B bilingual VLLM (+12B version RLHF VLLM)
Check the models at 🚀
Each week the @huggingface Spaces of the week look more 🔥
- AudioLM: Text to Audio
- Text to Motion
- BioGPT
- BLIP2: Image to text
- Instruct Pix2Pix
- CoCa: Image captioning
- Lip Movement Recognition
- GLIGen for text-to-image
Check them out in
Introducing: Zephyr Gemma!
The community has struggled to do a good preference-tune of Gemma, so we built an open-source recipe and trained a model to help people get started.
Model:
Demo:
Handbook:
Open Source ML is going brrr 🚀 Here are 5 amazing free OS demos
1⃣Stable Diffusion + ControlNet for hand control
Demo:
Look at those fingers! 😍
⬅️ + prompt = Input
➡️ = Output
Dear Open Source Community,
I'm about to have a trip of 19 hours with no internet access (going from Europe to South America).
Please hold off on releasing new SOTA models for a bit. I'll be back soon.
Appreciated,
Hacker Llama🤗
MoEs paper list: a chronological, annotated curation of MoE-related papers.
From Outrageously Large Neural Networks all the way to Mixtral, check it out!
Did you know that @huggingface transformers has a new document-question-answering pipeline that lets you get insights on your documents and invoices with 3 lines of code? 🤯
Try it out in this colab ⤵️
Introducing...🥁🥁Mamba models are now compatible with 🤗transformers! Mamba models are super fast (scale well!) and have solid quality⚡️
- Generation utilities
- PEFT fine-tuning
- TRL support
Check the repos here!
Did you know the @huggingface tokenizers lib is written in Rust? You can get huge speedups (e.g. @chainyo_ai recently tokenized his 40GB dataset in 5 minutes rather than 4 hours) 🔥
Is working at Hugging Face worth it?
- Open source
- Lots of flexibility (in schedule/work topics)
- No meetings+async
- Team members from all around the world
- Collaborative environment (internally and externally)
- Competitive compensation
- Stay at the forefront of ML
- Growth
If you finetune Microsoft's Phi...is it called phinetuning?
It seems the answer is yes! Check out this community tutorial about end-to-end phinetuning, from the dataset to the benchmarking!
So a researcher at <BigTech> kinda DoS-ed one of our free APIs by abusing it to do benchmarking of Mistral, Llama, and a few other models 🙃
You would have expected them to have the hardware resources or at least pay for the API...Weren't they the GPU-rich?
Want to learn about Q-Learning?🧠
Check out @ThomasSimonini's free open-source Deep Reinforcement Learning course, where you learn in a practical way about:
- Deep Q-Learning
- Policy Gradient
- Actor-Critic Methods
- PPO
Check it out!
🧵ML interview questions of the week! Back to the basics.
1⃣ What is cross validation? Why is it needed?
2⃣🤯What is the curse of dimensionality?
3⃣♨️What is one-hot encoding?
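For questions 1 and 3, toy answers fit in a few lines of plain Python (illustrative sketches, not library code): k-fold cross-validation holds out each fold once so every sample gets validated exactly once, and one-hot encoding turns a category into a 0/1 indicator vector:

```python
def one_hot(categories, value):
    """One-hot encode `value` against a fixed list of categories."""
    return [1 if c == value else 0 for c in categories]

def k_fold_indices(n, k):
    """Yield (train, val) index lists: each of the k folds is held out once,
    so every sample is used for validation exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

print(one_hot(["red", "green", "blue"], "green"))  # [0, 1, 0]
for train, val in k_fold_indices(6, 3):
    print(sorted(train), val)
```

In practice you would use a library helper (e.g. scikit-learn's `KFold`, which also shuffles), but the logic is exactly this.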
If anyone was affected by the BigTech layoffs and is ready to look for a new job: at @huggingface we're still hiring for all kinds of roles (from developer advocacy to ML engineering). We're looking for growth and generalist mindsets and people excited to work in OS ML. Ping me🤗
"Can you figure out what the experts in a Mixture of Experts model are each specialized in?"
Yes, this is touched on in the Mixtral paper (2024) and discussed quite extensively in the ST-MoE paper (2022), section 7. Also summarized in
People's intuition…
.@_sholtodouglas poses a challenge.
In the spirit of @natfriedman (whose Vesuvius Challenge was solved by a listener of my podcast - @LukeFarritor).
Can you figure out what the experts in a Mixture of Experts model are each specialized in?
"A wonderful research project to do:…
So excited about the launch of Kyutai. What is it, and why does it matter:
What is it?
A strongly funded open science and open source research lab just announced in France. It will focus on high-quality open research, training, mentoring, and contributing to the AI ecosystem.…
CodeGemma is here 🔥Official model from Google with impressive code results for its size
Three models
- 2B for code generation and infilling
- 7B for code infilling and natural language
- 7B for instruction following
Enjoy! 🤗
At @huggingface, we're still hiring and have a strong remote-friendly, decentralized culture 🤗
Check some jobs at , but you can also apply via the Wild Card role if there's no good fit for your profile
The @huggingface RL Course just launched its first unit! 🔥
This is not the typical RL course. You get to train an agent, share it with the community, and compare your results on a leaderboard. Too bad I'm at the bottom 😅
What happened in the open-source AI world in August? August is traditionally a slow month...but not for AI it seems! 👇Here is a recap!
Code goes wild💻🦙
- Just 6 months after LLaMA, @MetaAI releases Code Llama, a family of LLMs for code. You can now…
E5-mistral-7b: New technique and SOTA model for text embeddings by Microsoft
Paper:
Model:
Leaderboard:
- Only trained on synthetic data (for 93 langs)
- Decoder-only LLM 🤯(Mistral 7B fine-tune)
- Tops…
How fast has the @huggingface Hub grown since exactly a year ago?
Models: 20k->150k (x7)
Datasets: 5k->31k (x6)
Spaces: 300->14k (x46 😅 but we were just launching)
Bets for next year?
This weekend we achieved a new milestone at @huggingface. We reached 30000 public models! 🥳🚀
Let that number sink in. Thousands of individuals and organizations have shared their ML models, public and available for the whole ecosystem.
🎉🎉🎉
One step closer to 1-bit quantization 🤯 Hidden gem from last week with released code:
Extreme low-bit quantization via partially binarized LLMs while maintaining performance (which previous binarization methods failed to do)
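For intuition only: this sketch shows the classic mean-absolute-value binarization scheme (BinaryConnect/XNOR-style), not the paper's partial-binarization method, which is more involved. Binarizing a weight vector keeps just the signs plus one shared scale α:

```python
def binarize(weights):
    """Binarize a weight vector to {-alpha, +alpha}, where alpha is the
    mean absolute value. Storage drops to 1 bit per weight plus one scale.
    (Illustrative only; partial-binarization methods keep some weights
    in higher precision to preserve quality.)"""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

w = [0.3, -0.5, 0.1, -0.2]
print(binarize(w))  # alpha = 0.275, signs preserved
```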
Proposal: with new MoEs, let's discuss less about the total number of experts, and instead focus on the two main things that we care about:
- # of total params
- # of default activated params
Mixtral-8x7B -> Mixtral-47B-A12B
Mixtral-8x22B -> Mixtral-141B-A35B
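Under this naming, the two numbers follow from simple accounting: total params count every expert, active params count only the top-k routed ones. A sketch with hypothetical per-component sizes chosen to land near Mixtral-8x7B's reported ~47B total / ~13B active (the real per-component split differs):

```python
def moe_param_counts(shared, expert, n_experts, top_k):
    """Rough parameter accounting for a sparse MoE (in billions).
    `shared` = params outside the expert FFNs (attention, embeddings, ...),
    `expert` = params per expert FFN. Both values below are hypothetical."""
    total = shared + n_experts * expert
    active = shared + top_k * expert
    return total, active

print(moe_param_counts(shared=1.6, expert=5.6, n_experts=8, top_k=2))
```

This is exactly why "8x7B" misleads: the shared parameters are counted once, not eight times, so the total is well under 56B, while per-token compute tracks the active count.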
5 open-access video generation models 📹
1. Stable Video Diffusion (image-to-video)
2. LaVie (text-to-video)
3. SEINE (image-to-video)
4. Hotshot-XL (gifs)
5. ModelScope (text-to-video)
The space is on 🔥 Here is a blog post from May introducing the space
Massive release: Qwen 1.5 is out!
- Models from 0.5B to 72B
- Chat models released
- Very strong metrics (best base model, strong chat one!)
- Supports long contexts
30 new models are out!
Enjoy 🚀
Another week in Open ML 🥳
- Cohere Command R+
- Google Gemma Instruct 1.1
- Qwen 1.5 32B model family is out
- JetMoE
- Sailor: LMs for South-East Asia
- Mixture of Depths replication
- Two different bitnet 1.5 open-source replications
Open ML going brrr 🚀