Philipp Schmid

@_philschmid

16,404 Followers
658 Following
462 Media
1,856 Statuses

Tech Lead and LLMs at @huggingface 👨🏻‍💻 🤗 AWS ML Hero 🦸🏻 | Cloud & ML enthusiast | 📍Nuremberg | 🇩🇪

Nürnberg
Joined June 2019
Pinned Tweet
@_philschmid
Philipp Schmid
1 year
Exciting news! 🚀 I am super happy to share our new partnership with @awscloud 🤝 We will work together on making AI open, accessible, and affordable for every company and individual! 👀💸 👉
16
25
192
@_philschmid
Philipp Schmid
1 year
The first open-source ChatGPT alternative got released! 🚀 @togethercompute released a 20B chat-GPT model on Apache-2.0 🗣🆕 You can try it for free on Hugging Face. 😍 Demo: Model: Announcement:
45
399
2K
@_philschmid
Philipp Schmid
1 month
Casual Easter Monday with a huge gift from @OpenAI !🤯 They just released an old GPT-3.5 version. 😍 👉
119
203
1K
@_philschmid
Philipp Schmid
3 months
Gemma, an open Gemini-based LLM, released by Google! 🤯  @Google just released Gemma, their first open LLM based on Gemini, which outperforms Mistral AI 7B! 🤯 Gemma comes in 2 different sizes, 2B & 7B, and can be commercially used! 🔥 TL;DR; 🧮 2B & 7B parameter Instruction and base…
20
255
1K
@_philschmid
Philipp Schmid
3 months
Introducing Hugging Chat Assistant! 🤵 Build your own personal Assistant in Hugging Face Chat in 2 clicks! Similar to @OpenAI GPTs, you can now create custom versions of @huggingface Chat! 🤯 An Assistant is defined by 🏷️ Name, Avatar, and Description 🧠 Any available open…
36
243
869
@_philschmid
Philipp Schmid
4 months
What's the best way to fine-tune open LLMs in 2024? Look no further! 👀 I am excited to share “How to Fine-Tune LLMs in 2024 with Hugging Face” using the latest research techniques, including Flash Attention, Q-LoRA, @OpenAI dataset formats (messages), ChatML, Packing, all built…
20
174
856
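The recipe above maps roughly to a few lines with trl and peft. The following is a minimal, illustrative sketch rather than the guide's exact code: the model id, dataset, and hyperparameters are placeholders, and depending on your trl version, packing/max_seq_length may need to move into an SFTConfig instead.

```python
# Minimal QLoRA fine-tuning sketch (illustrative; model, dataset, and hyperparameters are placeholders).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model

# Load the base model in 4-bit (QLoRA) so it fits on a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA adapters: only a small set of extra weights gets trained.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

dataset = load_dataset("HuggingFaceH4/no_robots", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    packing=True,          # pack short samples into full-length sequences
    max_seq_length=2048,
    args=TrainingArguments(output_dir="llama-sft", num_train_epochs=1, per_device_train_batch_size=1),
)
trainer.train()
```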
@_philschmid
Philipp Schmid
1 year
New Open-source LLMs! 🤯 The Falcon has landed! 🦅 TII just released two new open-source LLMs called Falcon, which come in two sizes: 7B trained on 1.5T tokens and 40B trained on 1T tokens. 🚀🔥 7B: 40B:
23
156
779
@_philschmid
Philipp Schmid
11 months
The Falcon models are taking the open-source LLM space by storm! Falcon 🦅 offers commercial use through the Apache 2.0 license🔓 At @huggingface , we wrote a comprehensive blog post covering everything you need to know about the Falcon models. 👉 🧵 1/2
11
153
690
@_philschmid
Philipp Schmid
1 year
Meet your new coding buddy!😱 We are excited to announce StarChat 💬 - an open-source ChatGPT-like model to answer all your coding questions🚀🎉 Trained on 40k+ conversations to help you code, debug & more in over 80 languages🌎
11
159
688
@_philschmid
Philipp Schmid
19 days
Easily Fine-tune @AIatMeta Llama 3 70B! 🦙 I am excited to share a new guide on how to fine-tune Llama 3 70B with @PyTorch FSDP, Q-Lora, and Flash Attention 2 (SDPA) using @huggingface , built for consumer-size GPUs (4x 24GB). 🚀 Blog: The blog covers: 👨‍💻…
18
153
681
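For the multi-GPU part, the key piece is the FSDP configuration passed to the trainer. Below is a rough sketch of what such training arguments could look like; it is not the blog's exact config, and the accepted fsdp_config keys depend on your transformers/accelerate versions.

```python
# Illustrative FSDP + QLoRA training arguments (values are placeholders, not the blog's config;
# exact fsdp_config keys depend on your transformers/accelerate versions).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3-70b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    bf16=True,
    fsdp="full_shard auto_wrap offload",  # shard params, grads, and optimizer states across GPUs
    fsdp_config={
        "backward_prefetch": "backward_pre",
        "cpu_ram_efficient_loading": True,
        "sync_module_states": True,
    },
)
# Launched with something like: torchrun --nproc_per_node=4 train.py
```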
@_philschmid
Philipp Schmid
4 months
We're excited to announce our partnership between @huggingface and @Google Cloud! 🤗 We will collaborate with Google to foster open AI innovation across open science, open-source, cloud, and hardware. 🧠 Read more: Why This Matters: Keeping AI Open,…
54
124
564
@_philschmid
Philipp Schmid
9 months
It didn't take 48 hours for the open source community to fine-tune Code Llama to beat the GPT-4 (March) version on HumanEval! 🤯 👉 The model comes from the awesome @WizardLM_AI group! 🔥
13
103
577
@_philschmid
Philipp Schmid
10 months
LLaMA 2 released! 🚨🔔 @Meta just released LLaMa 2 the next iteration of LLaMA with a commercial-friendly license.🤯😍 LLaMA 2 comes in 3 different sizes, 7B, 13B, and 70B. The 7B & 13B use the same architecture as LLaMA 1 and are a 1-to-1 replacement for commercial use🔥 🧵1/4
4
162
573
@_philschmid
Philipp Schmid
1 month
Gemma can now code!🤯 🔔  @GoogleDeepMind just released Code Gemma, a collection of specialized open code models. Code Gemma comes in two different sizes, 2B & 7B, excellent for on-device code completion. 🧑🏻‍💻 TL;DR; 🧮 2B & 7B with 8192-token context 🛫 initialized from Gemma Base 🔠…
9
126
568
@_philschmid
Philipp Schmid
4 months
RAG or Fine-tuning 🤔 What's better? RAG? Fine-tuning? Or a combination? @Microsoft created a detailed case study on RAG and fine-tuning for domain-specific applications in the agricultural sector. 👩‍🌾 A must-read for everyone who wants to learn or refresh their knowledge on…
6
121
555
@_philschmid
Philipp Schmid
4 months
Transform Screenshots into HTML Code! The @huggingface multimodal team released Websight, a dataset of 823,000 pairs of website screenshots and HTML/CSS code. 🤯  Websight is designed to train Vision Language Models (VLMs) to convert images into code. The dataset was generated…
8
120
522
@_philschmid
Philipp Schmid
10 months
Open-source LLMs like Falcon, (Open-)LLaMA, X-Gen, or StarCoder have come a long way and can compete with models like GPT-4 on certain use cases. 🥊 I'm excited to share a new blog on how to deploy LLMs using @huggingface Inference Endpoints. 👉 🧵 1/3
12
144
520
@_philschmid
Philipp Schmid
11 months
Introducing StarChat Beta β 🤖 Your new coding buddy 🙌Attention all coders and developers 💻  You can write in plain English, and it will understand your queries, offer explanations, and provide step-by-step guidance to solve coding problems 🤯 👉  🧵1/4
15
122
509
@_philschmid
Philipp Schmid
11 months
Open-source LLMs are behind commercial models when it comes to context length. 🔠 @OpenAI GPT-3.5 now has 16k, GPT-4 32k, and @AnthropicAI Claude up to 100k💪🏻  For example, Meta LLaMA or Falcon have only 2k😔 Here are two amazing blog posts I found in the last week.🚀😍 🧵 1/3
17
98
498
@_philschmid
Philipp Schmid
3 months
Code Llama 70B is here!🤯🧑🏻‍💻  @AIatMeta just released CodeLlama 70B, the latest and biggest iteration of Code Llama! 🚀 Code Llama 70B achieves 67.8% on HumanEval, reaching the initial GPT-4 performance! 👨‍🏫 Key facts✨: 🧮 70B parameter initialized from Llama 2 🔠 Trained on 500B…
9
99
490
@_philschmid
Philipp Schmid
1 year
📣 Exciting News! Falcon Models from TII are now under the Apache 2.0 License! 🚀 🔓 You can now leverage the best-performing open source models in commercial projects. 🙌 🦅 👉
8
93
478
@_philschmid
Philipp Schmid
1 year
New open-source chat-GPT model alert! 🚨 @togethercompute released a new version of their chatGPT-NeoX 20B model with higher quality by fine-tuning on user feedback. 🚀🔥 Demo: Model:
4
94
449
@_philschmid
Philipp Schmid
2 years
6 months ago, EleutherAI released GPT-J 6B, an open-source alternative to GPT-3🧠 But deploying it was very challenging until today📈 Check out my new blog on how to deploy GPT-J using Transformers and Amazon SageMaker🚀🤯 👉🏻  👉🏻
4
71
447
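A deployment with the SageMaker Python SDK typically boils down to a HuggingFaceModel plus a deploy call. This is a hypothetical sketch, not the blog's exact code: the S3 path, IAM role, instance type, and framework versions are placeholders.

```python
# Hypothetical GPT-J deployment sketch with the SageMaker Python SDK (all names/versions are placeholders).
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/gpt-j/model.tar.gz",        # packaged model artifact (placeholder path)
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder execution role
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# Deploy to a real-time endpoint on a GPU instance.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "My favourite open-source model is"}))
```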
@_philschmid
Philipp Schmid
1 year
🚨Attention #NLP enthusiasts! We just published a new blog post on how to fine-tune FLAN-T5-XXL using DeepSpeed & Hugging Face Transformers! 🚀 👉 We ran a series of experiments to help you choose the right hardware setup.🤖💻
7
84
448
@_philschmid
Philipp Schmid
4 months
Teaching LLMs new knowledge or a language requires a lot of training and can lead to “forgetting” its previous skills! Maybe not anymore. 🤯 New research from Tencent shows that you can expand the knowledge of LLMs by making it “bigger” without making it forget its previous…
13
84
441
@_philschmid
Philipp Schmid
4 months
We got a late Christmas gift from @Microsoft ! 🎁🤗 Microsoft just changed the license for their small LLM phi-2 to MIT! 🚀 Phi-2 is a 2.7 billion parameter LLM trained on 1.4T tokens, including synthetic data, achieving 56.7 on MMLU, outperforming Google Gemini Nano. TL;DR: 🧮…
8
65
432
@_philschmid
Philipp Schmid
4 months
Introducing Clipper: The easiest way to convert HTML to Markdown for RAG applications! 🚀 Clipper is a CLI tool that simplifies the process of clipping content from web pages and converting it to Markdown format. With Clipper, you can build markdown datasets for training your…
7
71
430
@_philschmid
Philipp Schmid
2 months
Introducing embedding quantization!💥 A new technique to quantize embeddings to achieve up to 45x faster retrieval while keeping 96% accuracy with open embedding models. This will help scale RAG applications! 🚀 TL;DR: 📝 🔥 Binary quantization: 32x less storage & up to 45x faster…
12
67
418
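The core of binary quantization fits in a few lines of NumPy: keep only the sign of each embedding dimension, pack it into bits, and retrieve with Hamming distance. The sketch below is illustrative and uses random vectors in place of real model embeddings; it is not the sentence-transformers implementation.

```python
# Illustrative binary-quantization sketch: 32x smaller embeddings, retrieval via Hamming distance.
import numpy as np

def binary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 dimensions per byte (float32 -> 1 bit)."""
    return np.packbits(embeddings > 0, axis=-1)

def hamming_search(query: np.ndarray, corpus: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k corpus rows with the smallest Hamming distance to the query."""
    distances = np.unpackbits(query ^ corpus, axis=-1).sum(axis=-1)
    return np.argsort(distances)[:top_k]

# Toy example with random "embeddings" standing in for a real model's output.
corpus_emb = np.random.randn(10_000, 1024).astype(np.float32)
query_emb = np.random.randn(1, 1024).astype(np.float32)

corpus_bin = binary_quantize(corpus_emb)  # shape (10000, 128) uint8 -> 32x less storage
query_bin = binary_quantize(query_emb)

print(hamming_search(query_bin, corpus_bin))
```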
@_philschmid
Philipp Schmid
11 months
Hugging Face LLM Inference Container now supports Falcon 7B and Falcon 40B deployments on Amazon SageMaker🦅🚀 Falcon is the best performing open source LLM available today for commercial use under the Apache 2.0 license! 🤑 👉
11
79
408
@_philschmid
Philipp Schmid
10 months
Thrilled to share a new blog post on how to fine-tune LLaMa 2 with QLoRA and Hugging Face on Amazon SageMaker🧑🏻‍💻 The blog post includes instructions for 7B, 13B, and 70B versions of the Model alongside Hardware requirements. 🖲 👉
11
104
408
@_philschmid
Philipp Schmid
1 month
On-Device 2B LLMs for actions, outperform GPT-4 🤯 The “Octopus v2: On-device language model for super agent” proposes a new method to create on-device agents. 📱🔄 Implementation 1️⃣ Define supported functions as special tokens, e.g. <func_1> and add them to the tokenizer 2️⃣…
10
110
407
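Step 1️⃣ of that recipe corresponds to registering the function names as dedicated tokens. A minimal sketch with transformers is below; the base model and token names are made up for illustration.

```python
# Sketch of step 1: register supported functions as special tokens (model and token names are placeholders).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Each supported on-device function becomes a single dedicated token.
function_tokens = ["<func_1>", "<func_2>", "<func_3>", "<func_end>"]
tokenizer.add_special_tokens({"additional_special_tokens": function_tokens})

# Grow the embedding matrix so the new tokens get trainable embeddings.
model.resize_token_embeddings(len(tokenizer))
```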
@_philschmid
Philipp Schmid
6 months
Using LLMs (GPT-4) as an evaluator for smaller models is becoming the de facto standard. However, relying on closed-source models is suboptimal due to missing control, transparency, and versioning 🤔  The recent paper shows that open LLMs match GPT-4 evaluation skills 🚀 🧶
18
69
400
@_philschmid
Philipp Schmid
2 months
How are open LLMs trained and created in 2024? 🤔 @01AI_Yi just released their paper on how they created Yi, a family of LLMs and V-LLMs. The paper includes details on the data processing, training, and multimodality parts. Let's take a look👀 First Data: How does their data…
6
88
401
@_philschmid
Philipp Schmid
20 days
Data is all we need! 👑 Not only since Llama 3 have we known that data is all we need. Excited to share 🍷 FineWeb, a 15T token open-source dataset! Fineweb is a deduplicated English web dataset derived from CommonCrawl created at @huggingface ! 🌐 TL;DR: 🌐 15T tokens of cleaned…
14
87
393
@_philschmid
Philipp Schmid
2 years
Serverless Deep Learning enters the next chapter📘 Amazon SageMaker Serverless Inference is a new fully managed serverless inference option🚀 Learn how to deploy @huggingface transformers serverless in 1 line of code🤯 🧑🏻‍💻 📘
7
61
386
@_philschmid
Philipp Schmid
26 days
We can do it! 🙌 First open LLM outperforms @OpenAI GPT-4 (March) on MT-Bench. WizardLM 2 is a fine-tuned and preference-trained Mixtral 8x22B! 🤯 TL;DR; 🧮 Mixtral 8x22B based (141B-A40 MoE) 🔓 Apache 2.0 license 🤖 First > 9.00 on MT-Bench with an open LLM 🧬 Used multi-step…
13
79
391
@_philschmid
Philipp Schmid
1 year
OpenAssistant released their Conversational dataset (OASST1) under Apache 2.0. 🤯😍 The dataset includes: 💭 161,443 messages 🌳 66,497 conversation trees 🇺🇸 35 different languages and was created by 13,500 volunteers. 🤗 👉
1
116
386
@_philschmid
Philipp Schmid
1 year
Training FLAN-T5-XXL (11B) on a single consumer-size GPU impossible? 🤔 No, not anymore!! 🤯 We created a blog post on how to train FLAN-T5-XXL on a single GPU using LoRA. 🥳 👉
6
79
374
@_philschmid
Philipp Schmid
3 months
Am I the only one who thinks @OpenAI used @UnrealEngine or something similar to create synthetic data? The Sora videos look like video games to me.
33
14
368
@_philschmid
Philipp Schmid
1 month
We are lowering the prices for Compute on Hugging Face by up to 50%!🤯 Yes, you heard it right @huggingface Spaces & Inference Endpoints are now, on average, 20% cheaper than AWS EC2 on-demand! 🤑 We were working hard to unify and improve our infrastructure to reduce our and…
10
73
368
@_philschmid
Philipp Schmid
5 months
OpenChat is currently one of the best open @ChatGPTapp alternatives. 🚀 The team ( @alignment_lab ) behind OpenChat released the paper on how they achieved ChatGPT (GPT-3.5) performance. The performance of OpenChat comes from its strong base model ( @MistralAI 7B LLM) and in…
6
69
363
@_philschmid
Philipp Schmid
18 days
Phi-3 mini model released under MIT! 🚀 Last Week Llama 3, this week Phi-3 🤯  @Microsoft Phi-3 comes in 3 different sizes: mini (3.8B), small (7B) & medium (14B). Phi-3-mini was released today, claiming to match Llama 3 8B performance! 🚀 3.8B TL;DR: 2️⃣ Instruct Versions with 4k…
13
97
364
@_philschmid
Philipp Schmid
2 months
Introducing StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2 is the next iteration of StarCoder and comes in 3 sizes, trained on 600+ programming languages and over 4 trillion tokens from The Stack v2. It outperforms StarCoder 1 by a wide margin and has the best overall performance…
11
80
364
@_philschmid
Philipp Schmid
10 months
Are Vector Databases here to stay? 🔍 Yes, it seems LLMs get lost in the middle and lose focus on long inputs.🗺👁‍🗨 In “Lost in the Middle: How Language Models Use Long Contexts,” researchers from Stanford tried to understand better how LLMs make use of the context📚✨ 🧵1/5
9
84
351
@_philschmid
Philipp Schmid
3 years
Today is the day. After almost 8 months, first time at the @huggingface office ever. 🧑🏻‍💻🏢 I can confirm they all exist and Hugging Face isn’t an AGI 🤯🤗
6
12
355
@_philschmid
Philipp Schmid
3 months
2.4x faster generation with Llama 70B and new NVIDIA AI XQA-kernel! 🤯  @NVIDIAAI just open-sourced a new GPU kernel (XQA) optimization for MQA and GQA during the generation phase. 🚀 With XQA enabled, NVIDIA could boost the throughput of Meta Llama 70B on H200 (🆕) from 1227…
13
62
349
@_philschmid
Philipp Schmid
6 months
Whisper just got smaller, faster!🔔 The audio team at @huggingface released DistilWhisper, a distilled version of @OpenAI Whisper🧪   DistilWhisper is a drop-in replacement for Whisper on English speech recognition that is 5.8 times faster with accuracy within 1% WER🤯 🧶
4
70
340
@_philschmid
Philipp Schmid
3 months
Can LLMs solve complex problems like humans?🧠 SELF-DISCOVER from Google Deepmind proposes a new framework that teaches itself to think critically and step-by-step, mimicking human reasoning! It's boosting LLMs problem-solving skills by up to 32% 🧮 Implementation: Stage 1: 1️⃣…
7
89
337
@_philschmid
Philipp Schmid
1 year
Have you heard the news? 🗞️ Not one but two new open-source LLMs have been released! 🌍 @MosaicML and @togethercompute released new 7B LLM models under the Apache 2.0 license. 🤯 Available on Hugging Face: Together: MosaicML:
6
70
332
@_philschmid
Philipp Schmid
1 year
Introducing StarCoder ⭐️ a 15B open-source Code-LLM created by @huggingface and @ServiceNow through @BigCodeProject 🔡 8192 token context window 📊 trained on 1 trillion tokens 💭 80+ Programming languages 🔐 only permissive licensed data ✅ commercial use
4
74
328
@_philschmid
Philipp Schmid
2 months
GPU Poor, no more you are.✨ We are excited to announce “Train on DGX” Cloud on @huggingface to train open LLMs on one or more @nvidia H100 and L40S. 🤯  Train on DGX Cloud is now available to every Enterprise Hub organization! 🏙️ Train on DGX: 🚂 Is powered by Hugging Face…
6
61
328
@_philschmid
Philipp Schmid
7 months
Llama 2 outperforms GPT-3.5 on long contexts! 🤯  @AIatMeta silently released Long Llama, a series of new Llama 2 models with up to 32k context. 🏆 Long Llama 2 70B is on par with @OpenAI GPT-4 for summarization and outperforms GPT-3.5 16k on 7/10 long context tasks! 🧶
11
65
315
@_philschmid
Philipp Schmid
5 days
If you are using Whisper for transcription, listen⁉️👂We created an optimized Whisper with Speaker Diarization for @huggingface Inference Endpoints 🤗 We created a reference implementation that optimizes Whisper with Flash Attention and Speculative Decoding and combines it with…
9
57
320
@_philschmid
Philipp Schmid
2 months
Zephyr 7B Gemma released!🔷🔶 We are excited to announce Zephyr Gemma, the best fine-tuned version of @Google Gemma 7B, outperforming Google Gemma Instruct on 5 out of 6 benchmarks, including MT Bench, AGIEval & GPT4All. 🤯🚀 Zephyr Gemma TL;DR; 🧠 Fine-tuned Gemma 7B on DEITA…
5
79
311
@_philschmid
Philipp Schmid
6 months
How can we teach LLMs to be factual, correct, and more reliable? 🤔 RAG is one approach to adding information to the prompt. But, always retrieving can lead to bad responses😔 Self-RAG proposes a new method to teach LLMs when to retrieve information and how to use it.🤯 🧶
6
53
313
@_philschmid
Philipp Schmid
15 days
Llama 3 extended to almost 100,000-token context! ✅ By Combining PoSE and continuing pre-training on Llama 3 8B base for 300M tokens, the community ( @winglian ) managed to extend the context from 8k to 64k. 🚀 Applying rope scaling afterward led to a supported context window of…
7
56
311
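The rope-scaling part at inference time can be sketched as a config override when loading the model. This is not the PoSE continued-pre-training recipe itself, and the scaling type and factor below are illustrative assumptions.

```python
# Sketch: loading Llama 3 8B with a linear RoPE scaling factor to stretch the context window.
# Not the PoSE training recipe; the scaling type and factor are illustrative assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    rope_scaling={"type": "linear", "factor": 8.0},  # ~8k native context * 8 ≈ 64k positions
    torch_dtype="auto",
    device_map="auto",
)
```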
@_philschmid
Philipp Schmid
9 months
Code Llama with @huggingface 🤗 Yesterday, @MetaAI released Code Llama, a family of open-access code LLMs! Today, we release the integration in the Hugging Face ecosystem🔥 Models: 👉 blog post: 👉  Blog post covers how to use it!
7
80
298
@_philschmid
Philipp Schmid
8 months
Falcon 180B released! 🚨🦅 @TIIuae just released their new Falcon model, which beats OpenAI GPT-3.5 and is on par with Google's PaLM-2 Large👑 With 180 billion parameters, Falcon 180B is 2.5x larger than Llama 2 and was trained on 4x more compute🤯 👉 🧶
12
104
304
@_philschmid
Philipp Schmid
6 months
Zephyr code released! The @huggingface H4 team has just released the code for Zephyr-7b, which was trained with Direct Preference Optimization.🚀 👉 The handbook contains robust recipes for: 📚 Supervised fine-tuning 🎯 Direct preference optimization
6
66
300
@_philschmid
Philipp Schmid
4 months
In RAG, retrieving the right information is and will remain crucial for success with powerful LLMs. This raises the question: how can we improve embedding models for our queries and data? 🤔 A new paper by Microsoft, “Improving Text Embeddings with Large Language Models,” proposes using LLMs…
5
56
300
@_philschmid
Philipp Schmid
11 months
New open-source LLMs! 🔔 @salesforce released XGen 7B, a new LLM with an 8k context under Apache 2.0🔓  XGen has the same architecture as @MetaAI LLaMA and can be used as a 1-to-1 replacement for commercial use! 🔥  👉 🧵 1/2
5
73
293
@_philschmid
Philipp Schmid
1 month
Serverless LLMs for Developers!🌪️ We are excited to announce "Deploy on @Cloudflare Workers AI" on @huggingface enabling developers to easily use open LLMs as serverless APIs powered by Cloudflare's edge GPU data centers. 🚀😍 It’s the first integration of our partnership with…
12
67
292
@_philschmid
Philipp Schmid
7 months
How can we improve reasoning and reduce hallucinations of LLMs? 🤔 @GoogleDeepMind and a group of researchers propose a prompting technique, “Hypotheses-to-Theories (HtT)” to teach LLMs rules to improve reasoning and reduce hallucination. 🧠 🧶
6
71
288
@_philschmid
Philipp Schmid
1 month
New open model from @MistralAI ! 🧠 Last night, Mistral released Mixtral 8x22B, a 176B MoE, via magnet link. 🔗🤯 What we know so far: 🧮 176B MoE with ~40B active 📜 context length of 65k tokens. 🪨 Base model can be fine-tuned 👀 ~260GB VRAM in fp16, 73GB in int4 📜 Apache…
11
67
291
@_philschmid
Philipp Schmid
3 months
Code Llama 🦙 + Web Search 🌐 = 10x dev 🧑🏻‍💻! Try on @huggingface for free!! 🤗
8
51
288
@_philschmid
Philipp Schmid
6 months
How can you evaluate LLMs?🤔 Two approaches are human evaluations and using LLMs as a judge👩‍⚖️ As human evaluation is expensive, I wrote a hands-on blog post using the LLMs to evaluate RAG and other applications using @huggingface and @LangChainAI 👉 🧶
3
55
283
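The LLM-as-a-judge pattern is essentially a grading prompt plus a call to a strong model. A minimal sketch is below; the judge model, prompt wording, and rating scale are placeholders, not the blog's exact setup.

```python
# Minimal LLM-as-a-judge sketch: ask a strong model to grade a RAG answer on a 1-5 scale.
# Illustrative only; the judge model and prompt are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")  # placeholder judge model

JUDGE_PROMPT = """You are grading a question-answering system.
Question: {question}
Retrieved context: {context}
Answer: {answer}
Rate the answer from 1 (wrong/unsupported) to 5 (correct and fully grounded in the context).
Reply with the rating only."""

def judge(question: str, context: str, answer: str) -> str:
    prompt = JUDGE_PROMPT.format(question=question, context=context, answer=answer)
    return client.text_generation(prompt, max_new_tokens=4)

print(judge("Who created StarCoder?", "StarCoder was created by BigCode.", "BigCode created it."))
```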
@_philschmid
Philipp Schmid
1 year
End of last year, Google open-sourced FLAN-T5, a better T5 model in every aspect. We created a new repository on Hugging Face, which implements a custom handler for Inference Endpoints that allows you to deploy FLAN-T5-XXL on a single A10G GPU. 🤯😍 👉  🧵:
6
51
284
@_philschmid
Philipp Schmid
2 months
Can we make RAG applications more robust with fine-tuning? A paper by @Microsoft and UC Berkeley put this to the test to see if small open LLMs, like @AIatMeta Llama 7B, can match @OpenAI GPT-3.5. They called it “Retrieval Augmented Fine Tuning (RAFT)”, where you train an LLM…
5
62
275
@_philschmid
Philipp Schmid
2 months
Elon Musk kept his word and released Grok-1🤯 Grok-1 is a 314B Mixture-of-Experts (MoE) transformer. 🧐 What we know so far: 🧠 Base model, not fine-tuned ⚖️ Apache 2.0 license 🧮 314B MoE with 25% active on a token 📊 According to the initial announcement; 73% on MMLU,…
8
68
277
@_philschmid
Philipp Schmid
1 year
Multi-query attention (MQA) is gaining popularity in LLMs. With the release of StarCoder 14B and Falcon 7B/40B, we have two accessible LLMs using it 🔥  MQA shares key and value matrices across attention heads, which makes it possible to generate longer text using less memory. 🧠 🧵
5
51
274
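The memory argument is easy to make concrete with back-of-the-envelope arithmetic: the KV cache scales with the number of key-value heads, so sharing them (MQA) shrinks it by that factor. The numbers below are illustrative, not exact Falcon or StarCoder values.

```python
# Back-of-the-envelope sketch: KV-cache size with multi-head vs. multi-query attention.
# Model dimensions are illustrative, not taken from a specific checkpoint.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_value=2):
    # 2x for keys and values, stored per layer, per kv head, per position (fp16 = 2 bytes).
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value

layers, heads, head_dim = 60, 64, 128
print(kv_cache_bytes(layers, kv_heads=heads, head_dim=head_dim, seq_len=2048, batch=1) / 1e9, "GB (MHA)")
print(kv_cache_bytes(layers, kv_heads=1, head_dim=head_dim, seq_len=2048, batch=1) / 1e9, "GB (MQA)")
```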
@_philschmid
Philipp Schmid
1 year
Generative AI to power the Next-Gen Document Understanding 📄🤖 Say goodbye to traditional OCR and welcome Donut, an MIT-licensed Generative AI model that processes your documents directly. 🤖🥇 👉
7
52
272
@_philschmid
Philipp Schmid
1 year
🎯 Goal: Summarize chats & dialogues ✏️ Content: Fine-tune @GoogleAI FLAN-T5 for summarization ✅ Result: Summarize ChatGPT dialogues with @huggingface Transformers 📔Blog:
10
66
269
@_philschmid
Philipp Schmid
5 days
New MoE alert! 🔔 DeepSeek V2 Chat just got released, a 236B parameter Mixture of Experts model with a 128k context window and 21B active parameters. 🏎️⚡️ TL;DR 🧮 236B parameters with 21B activated during generation 👨‍🏫  160 experts with 6 active in generation 🚀 Matches Mixtral…
6
56
276
@_philschmid
Philipp Schmid
2 months
GaLore is a new memory-efficient fine-tuning technique for “full-tuning” billion-parameter models, like Llama 2 7B, on consumer-size GPUs. In contrast to LoRA, GaLore reduces memory by projecting the optimizer states and gradients into a lower-dimensional space. 🤯 TL;DR 📝 🚀…
3
61
269
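The core idea can be sketched with NumPy: compute a low-rank projector from the gradient's SVD, keep the optimizer state in that subspace, and project the update back. This is an illustrative toy, not the official GaLore implementation.

```python
# Illustrative sketch of the GaLore idea: project the gradient into a low-rank subspace,
# keep optimizer state there, and project the update back. Not the official code.
import numpy as np

def project_gradient(grad: np.ndarray, rank: int):
    """SVD-based projector onto the top-`rank` left singular directions of the gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]           # (m, r) projection matrix
    return p, p.T @ grad      # low-rank gradient of shape (r, n)

m, n, rank = 1024, 1024, 64
grad = np.random.randn(m, n).astype(np.float32)

p, low_rank_grad = project_gradient(grad, rank)
# Optimizer state (e.g. Adam moments) only needs shape (r, n) instead of (m, n).
update_low_rank = 1e-3 * low_rank_grad   # stand-in for an optimizer step in the subspace
update = p @ update_low_rank             # project back to the full weight shape (m, n)
print(update.shape, low_rank_grad.nbytes / grad.nbytes)  # memory ratio of the projected state
```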
@_philschmid
Philipp Schmid
3 months
Can a 2B LLM outperform Mistral 7B or Llama 13B? Creators of the popular Ultrafeedback dataset released MiniCPM, a 2.4B parameter model claiming performance close to Mistral 7B, Llama 2 13B, or Falcon 40B. 🤯🤔 As part of the release, the researchers released a detailed…
8
54
265
@_philschmid
Philipp Schmid
23 days
Llama 3 released! 🚨🔔 @AIatMeta just released their best open LLM! 👑🚀 Llama 3 is the next iteration of Llama with a ~10% relative improvement to its predecessor! 🤯 Llama 3 comes in 2 different sizes 8B and 70B with a new extended tokenizer and commercially permissive license!…
7
63
266
@_philschmid
Philipp Schmid
30 days
Mixture of Adapters? 🤔 PHATGOOSE proposes a new method to combine Adapters (LoRA) into a single MoE-like architecture with gating and routing mechanism to experts (fine-tunes). Implementation 1️⃣ Select or train a collection of fine-tuned adapters (LoRA) with the same base model…
8
54
265
@_philschmid
Philipp Schmid
2 months
Gemma fine-tuning with ChatML works! ☑️  I created a minimal example script to fine-tune Gemma 7B on the Dolly dataset using TRL, PEFT, Flash Attention with LoRA, and the @OpenAI chatML format. 🫡 It should help others to be unblocked. Let me know if it's too minimalistic I…
9
45
262
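The chatML formatting itself is handled by the tokenizer's chat template. A minimal sketch is below; the model id is a placeholder that ships a built-in chat template, and for models without one, trl's setup_chat_format (in recent versions) can register the chatML template and special tokens.

```python
# Sketch of formatting a Dolly-style sample with the tokenizer's chat template (model is a placeholder).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # placeholder model with a chat template

messages = [
    {"role": "user", "content": "Summarize what LoRA does in one sentence."},
    {"role": "assistant", "content": "LoRA injects small trainable low-rank matrices instead of updating all weights."},
]

# Renders the conversation into the model's expected chat format (chatML-style tags for chatML templates).
print(tokenizer.apply_chat_template(messages, tokenize=False))
```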
@_philschmid
Philipp Schmid
2 months
Introducing OpenHermesPreferences 1M! 🦋 We just released the largest open AI preference dataset on Hugging Face! 🤯 Together with @argilla_io , we extended the OpenHermes ( @Teknium1 ) dataset into a pair-wise comparison dataset for RLHF and DPO. 🧠 Dataset: 📦 Size: ~1 million AI…
6
56
257
@_philschmid
Philipp Schmid
2 years
Transformers are changing machine learning, starting with NLP, and now, with audio and computer vision💬👄 👀 You can now use the Hugging Face Inference DLC to do automatic speech recognition using wav2vec2 model or WavLM🤯 🖼  📈
3
59
254
@_philschmid
Philipp Schmid
5 months
What if you can improve LLMs using direct customer feedback? 🤔 Most alignment methods like RLHF or DPO require multiple outputs from the same prompt to improve the model by learning the preferred one. 👀 In real-world use cases, you oftentimes only have the option to collect…
3
59
252
@_philschmid
Philipp Schmid
11 months
"grouped-query attention" (GQA) from Google proposes a method to convert models from multi-head attention (MHA) to GQA 🧠 GQA claims to offer similar benefits to multi-query attention (MQA) with faster inference via reduced # key-value heads.🤯🛫 👉  🧵1/3
3
59
248
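The conversion trick from the paper, mean-pooling groups of key/value heads to initialize the GQA checkpoint, can be sketched in a few lines of NumPy. The shapes below are toy values, not from a specific model.

```python
# Illustrative sketch of the MHA -> GQA conversion idea: mean-pool groups of key/value heads.
import numpy as np

num_heads, num_kv_groups, head_dim, hidden = 16, 4, 64, 1024
heads_per_group = num_heads // num_kv_groups

# Pretend key projection of an MHA layer: one (hidden, head_dim) block per head.
k_proj = np.random.randn(num_heads, hidden, head_dim).astype(np.float32)

# GQA checkpoint initialization: average the key heads inside each group.
k_proj_gqa = k_proj.reshape(num_kv_groups, heads_per_group, hidden, head_dim).mean(axis=1)

print(k_proj.shape, "->", k_proj_gqa.shape)  # (16, 1024, 64) -> (4, 1024, 64)
```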
@_philschmid
Philipp Schmid
17 days
First open LLM from @SnowflakeDB !  Arctic is a 480B Dense-MoE with a 10B dense transformer model and a 128x3.66B MoE MLP, designed specifically for enterprise AI. 🤔 TL;DR: 🧠 480B parameters with 17B active during generation 👨‍🏫  128 experts with 2 active in generation 2️⃣ Instruct…
12
52
252
@_philschmid
Philipp Schmid
5 months
🚨Never trust marketing content🚨 Fixed the results of @GoogleAI Gemini Ultra on MMLU. Details: But yes Gemini Ultra > GPT-4 on CoT @32 according to the report.
15
34
248
@_philschmid
Philipp Schmid
4 months
Can LLMs improve themselves? Self-play fine-tuning (SPIN) is a new method that promises to enhance the performance of LLMs without needing additional human-annotated data beyond the original fine-tuning dataset. 🎮  The paper aims to improve an LLM by having it iteratively…
5
55
243
@_philschmid
Philipp Schmid
8 months
Not, Yet another RoPE extensioN method! 🙄 But listen, “YaRN” allows you to scale LLMs like llama 2 to over 100k context! 🤯 Llama 2 13B 128k on🤗 👉  The code and the Paper: 👉
4
52
244
@_philschmid
Philipp Schmid
5 months
Have you heard of Mixture of Experts (MoE) models? 🤔 With the release of @MistralAI Mixtral 8x7B, MoEs are gaining attention; it is also rumored that @OpenAI GPT-4 is an MoE👀  But what exactly are MoEs, and how do they work? We created an in-depth blog.
0
63
242
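To make the idea concrete, here is a toy sparse MoE layer with top-2 routing over 8 expert MLPs. It is purely illustrative; real implementations (like Mixtral's) batch tokens per expert and add load-balancing losses.

```python
# Toy sparse MoE layer sketch (top-2 routing over 8 expert MLPs); purely illustrative.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, hidden=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, hidden)
        logits = self.router(x)                                    # (tokens, num_experts)
        weights, idx = torch.topk(logits.softmax(dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                                # only the selected experts run
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

with torch.no_grad():
    print(ToyMoE()(torch.randn(10, 512)).shape)  # torch.Size([10, 512])
```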
@_philschmid
Philipp Schmid
11 months
Do we need RL to align LLMs with Human feedback? 🔍👀  Last week, @Stanford researchers unveiled a paper introducing Direct Preference Optimization (DPO) - a new algorithm that could change the way we align LLMs with Human Feedback 🧵 1/3
3
63
243
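The heart of DPO is a single logistic loss on the difference of policy-vs-reference log-ratios for a chosen and a rejected answer, with no reward model or RL loop. A minimal sketch of that objective:

```python
# Minimal sketch of the DPO objective: logistic loss on the difference of log-ratios.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs are summed log-probabilities of the full responses under the given model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy numbers standing in for real log-probs of a chosen and a rejected completion.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)
```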
@_philschmid
Philipp Schmid
2 years
🌸 BLOOM is here, available and accessible for everyone. 🤗🤝 Get yourself an account and try it yourself: 👉  At a glance: 🔢 176 billion parameters 🌍 59 languages 🔓 Open-Access or read more about 👉
5
51
236
@_philschmid
Philipp Schmid
10 months
Is Llama 2 special or just a better iteration of Llama 1? 🤔 Over the weekend, I had time to read the paper Meta released. 📖 Below are some of my findings, which you might have missed📝 🧵 1/6
4
52
235
@_philschmid
Philipp Schmid
26 days
Introducing Idefics2, the strongest Vision-Language-Model (VLM) < 10B! 🚀 Idefics2 comes with significantly enhanced capabilities in OCR, document understanding, and visual reasoning. 💬📄🖼️ TL;DR; 📚 8B base and instruction variant 🖼️ Image + text inputs ⇒ Text output 📷…
2
58
240
@_philschmid
Philipp Schmid
5 months
We just got more details on Mixtral 8x7B from @MistralAI 🧠 Mixtral is a sparse mixture-of-experts model (SMoE) with open weights, outperforming existing open LLMs like Meta Llama 70B.🤯 💪🏻 TL;DR: ⬇️
2
55
237
@_philschmid
Philipp Schmid
1 year
FLANv2 dataset is available on Hugging Face! 🚨 Hugging Face user, SirNeural uploaded a replica of the FLAN dataset, which was used to fine-tune the FLAN-T5 models to the Hugging Face Hub🤗😍 👉 let's train more instruction fine-tuned models! 📈🔥
3
55
236
@_philschmid
Philipp Schmid
5 months
Access to GPUs is becoming increasingly difficult, especially for fine-tuning your own LLMs, like llama or mistral.😔 Happy to share support for open LLMs on @awscloud Trainium.⚡️ 👉 You can now easily fine-tune LLMs using Hugging Face Optimum Neuron…
2
52
234
@_philschmid
Philipp Schmid
2 months
Can ORPO redefine how we train and align LLMs for RLHF? 🤔 State-of-the-art LLMs followed the process of Base Model → Supervised Fine-tuning → RLHF (PPO/DPO). This is very resource-intensive and complex. 😒 Odds Ratio Preference Optimization (ORPO) proposes a new method to…
2
66
235
@_philschmid
Philipp Schmid
4 years
I am very honored that my notebook is now an official part of transformers by @huggingface . 🥳🥰🤗 Learn "How to fine-tune a non-English GPT-2 Model with Trainer class".
7
42
235
@_philschmid
Philipp Schmid
6 months
Did @MSFTResearch leak the parameter count of @OpenAI GPT-3.5 turbo?🤯 According to „CodeFusion: A Pre-trained Diffusion Model for Code Generation“ paper gpt-3.5-turbo has only 20B parameter. Paper:
14
37
232
@_philschmid
Philipp Schmid
1 month
Jamba released! @AI21Labs just released the first production-scale Mamba implementation! Jamba is a hybrid SSM-Transformer MoE rivaling open transformer-based LLMs 🚀 TL;DR: 🧠 52B parameters with 12B active during generation 👨‍🏫 16 experts with 2 active in generation 🆕 New…
5
43
235
@_philschmid
Philipp Schmid
3 months
New Embedding Models for Code released by @awscloud ! Embedding Models are at the heart of every RAG application. Without good embeddings, retrieving relevant context to answer your user prompts is impossible. 🔍 Super exciting to see Amazon release CodeSage, a family of open…
5
38
232
@_philschmid
Philipp Schmid
2 months
Chunking or Splitting your documents correctly is crucial for good RAG applications. 🥇 Without the right concept, you might not be able to retrieve the correct information based on your user input. 🧭 To help you better understand how your documents are split using, e.g.,…
4
44
233
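A common starting point for splitting is a recursive character splitter with some overlap. The sketch below uses LangChain's splitter with illustrative sizes; depending on your version the import path may be langchain.text_splitter instead.

```python
# Sketch of recursive chunking with overlap (a common default for RAG pipelines).
# Chunk sizes and separators are illustrative; older versions import from langchain.text_splitter.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,      # target characters per chunk
    chunk_overlap=64,    # overlap so context isn't cut off at chunk boundaries
    separators=["\n\n", "\n", ". ", " "],  # try paragraph, line, sentence, then word breaks
)

document = "Retrieval Augmented Generation combines a retriever with a generator. " * 50
chunks = splitter.split_text(document)
print(len(chunks), "chunks;", chunks[0][:80], "...")
```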
@_philschmid
Philipp Schmid
3 months
Introducing Messages API with @OpenAI compatibility for @huggingface Inference Endpoints and Text Generation Inference! 🚀  The new API can be directly used with OpenAI's client libraries or third-party tools, like @LangChainAI or @llama_index . 🖼️🦜🦙 Migrating from closed…
8
47
228
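In practice, pointing the official OpenAI client at an endpoint that exposes the Messages API looks roughly like this; the endpoint URL, token, and model name are placeholders for your own deployment.

```python
# Sketch of calling a Hugging Face Inference Endpoint / TGI server through the OpenAI client.
# The base_url, api_key, and model name are placeholders for your own endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1/",  # placeholder endpoint URL
    api_key="hf_xxx",  # your Hugging Face token (placeholder)
)

response = client.chat.completions.create(
    model="tgi",  # placeholder; TGI-backed endpoints accept a generic model name
    messages=[{"role": "user", "content": "What is the Messages API?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```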