Philipp Schmid

@_philschmid

16,404 Followers
658 Following
462 Media
1,856 Statuses

Tech Lead and LLMs at @huggingface 👨🏻‍💻 🤗 AWS ML Hero 🦸🏻 | Cloud & ML enthusiast | 📍Nuremberg | 🇩🇪

Nürnberg
Joined June 2019
Pinned Tweet
@_philschmid
Philipp Schmid
1 year
Exciting news! 🚀 I am super happy to share our new partnership with @awscloud 🤝 We will work together on making AI open, accessible, and affordable for every company and individual! 👀💸 👉
16
25
192
@_philschmid
Philipp Schmid
1 year
The first open-source ChatGPT alternative got released! 🚀 @togethercompute released a 20B chat-GPT model on Apache-2.0 🗣🆕 You can try it for free on Hugging Face. 😍 Demo: Model: Announcement:
45
399
2K
@_philschmid
Philipp Schmid
1 month
Casual Easter Monday with a huge gift from @OpenAI !🤯 They just released an old GPT-3.5 version. 😍 👉
119
203
1K
@_philschmid
Philipp Schmid
3 months
Gemma, an open Gemini-based LLM, released by Google! 🤯  @Google just released Gemma, their first open LLM based on Gemini, which outperforms Mistral AI 7B! 🤯 Gemma comes in 2 different sizes, 2B & 7B, and can be commercially used! 🔥 TL;DR; 🧮 2B & 7B parameter Instruction and base…
20
255
1K
@_philschmid
Philipp Schmid
3 months
Introducing Hugging Chat Assistant! 🤵 Build your own personal Assistant in Hugging Face Chat in 2 clicks! Similar to @OpenAI GPTs, you can now create custom versions of @huggingface Chat! 🤯 An Assistant is defined by 🏷️ Name, Avatar, and Description 🧠 Any available open…
36
243
869
@_philschmid
Philipp Schmid
4 months
What's the best way to fine-tune open LLMs in 2024? Look no further! 👀 I am excited to share “How to Fine-Tune LLMs in 2024 with Hugging Face” using the latest research techniques, including Flash Attention, Q-LoRA, @OpenAI dataset formats (messages), ChatML, Packing, all built…
20
174
856
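The recipe above maps roughly to a few lines with trl and peft. The following is a minimal, illustrative sketch rather than the guide's exact code: the model id, dataset, and hyperparameters are placeholders, and depending on your trl version, packing/max_seq_length may need to move into an SFTConfig instead.

```python
# Minimal QLoRA fine-tuning sketch (illustrative; model, dataset, and hyperparameters are placeholders).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model

# Load the base model in 4-bit (QLoRA) so it fits on a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA adapters: only a small set of extra weights gets trained.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

dataset = load_dataset("HuggingFaceH4/no_robots", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    packing=True,          # pack short samples into full-length sequences
    max_seq_length=2048,
    args=TrainingArguments(output_dir="llama-sft", num_train_epochs=1, per_device_train_batch_size=1),
)
trainer.train()
```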
@_philschmid
Philipp Schmid
1 year
New Open-source LLMs! 🤯 The Falcon has landed! 🦅 TII just released two new open-source LLMs called Falcon, which come in two sizes: 7B trained on 1.5T tokens and 40B trained on 1T tokens. 🚀🔥 7B: 40B:
23
156
779
@_philschmid
Philipp Schmid
11 months
The Falcon models are taking the open-source LLM space by storm! Falcon 🦅 offers commercial use through the Apache 2.0 license🔓 At @huggingface , we wrote a comprehensive blog post covering everything you need to know about the Falcon models. 👉 🧵 1/2
11
153
690
@_philschmid
Philipp Schmid
1 year
Meet your new coding buddy!😱 We are excited to announce StarChat 💬 - an open-source ChatGPT-like model to answer all your coding questions🚀🎉 Trained on 40k+ conversations to help you code, debug & more in over 80 languages🌎
11
159
688
@_philschmid
Philipp Schmid
19 days
Easily Fine-tune @AIatMeta Llama 3 70B! 🦙 I am excited to share a new guide on how to fine-tune Llama 3 70B with @PyTorch FSDP, Q-Lora, and Flash Attention 2 (SDPA) using @huggingface , built for consumer-size GPUs (4x 24GB). 🚀 Blog: The blog covers: 👨‍💻…
18
153
681
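For the multi-GPU part, the key piece is the FSDP configuration passed to the trainer. Below is a rough sketch of what such training arguments could look like; it is not the blog's exact config, and the accepted fsdp_config keys depend on your transformers/accelerate versions.

```python
# Illustrative FSDP + QLoRA training arguments (values are placeholders, not the blog's config;
# exact fsdp_config keys depend on your transformers/accelerate versions).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3-70b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    bf16=True,
    fsdp="full_shard auto_wrap offload",  # shard params, grads, and optimizer states across GPUs
    fsdp_config={
        "backward_prefetch": "backward_pre",
        "cpu_ram_efficient_loading": True,
        "sync_module_states": True,
    },
)
# Launched with something like: torchrun --nproc_per_node=4 train.py
```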
@_philschmid
Philipp Schmid
4 months
We're excited to announce our partnership between @huggingface and @Google Cloud! 🤗 We will collaborate with Google to foster open AI innovation across open science, open-source, cloud, and hardware. 🧠 Read more: Why This Matters: Keeping AI Open,…
54
124
564
@_philschmid
Philipp Schmid
9 months
It didn't take 48 hours for the open source community to fine-tune Code Llama to beat the GPT-4 (March) version on HumanEval! 🤯 👉 The model comes from the awesome @WizardLM_AI group! 🔥
13
103
577
@_philschmid
Philipp Schmid
10 months
LLaMA 2 released! 🚨🔔 @Meta just released LLaMa 2 the next iteration of LLaMA with a commercial-friendly license.🤯😍 LLaMA 2 comes in 3 different sizes, 7B, 13B, and 70B. The 7B & 13B use the same architecture as LLaMA 1 and are a 1-to-1 replacement for commercial use🔥 🧵1/4
4
162
573
@_philschmid
Philipp Schmid
1 month
Gemma can now code!🤯 🔔  @GoogleDeepMind just released Code Gemma, a collection of specialized open code models. Code Gemma comes in two different sizes, 2B & 7B, excellent for on-device code completion. 🧑🏻‍💻 TL;DR; 🧮 2B & 7B with 8192-token context 🛫 initialized from Gemma Base 🔠…
9
126
568
@_philschmid
Philipp Schmid
4 months
RAG or Fine-tuning 🤔 What's better? RAG? Fine-tuning? Or a combination? @Microsoft created a detailed case study on RAG and fine-tuning for domain-specific applications in the agricultural sector. 👩‍🌾 A must-read for everyone who wants to learn or refresh their knowledge on…
6
121
555
@_philschmid
Philipp Schmid
4 months
Transform Screenshots into HTML Code! The @huggingface multimodal team released Websight, a dataset of 823,000 pairs of website screenshots and HTML/CSS code. 🤯  Websight is designed to train Vision Language Models (VLMs) to convert images into code. The dataset was generated…
8
120
522
@_philschmid
Philipp Schmid
10 months
Open-source LLMs like Falcon, (Open-)LLaMA, X-Gen, or StarCoder have come a long way and can compete with models like GPT-4 on certain use cases. 🥊 I'm excited to share a new blog on how to deploy LLMs using @huggingface Inference Endpoints. 👉 🧵 1/3
12
144
520
@_philschmid
Philipp Schmid
11 months
Introducing StarChat Beta β 🤖 Your new coding buddy 🙌Attention all coders and developers 💻  You can write in plain English, and it will understand your queries, offer explanations, and provide step-by-step guidance to solve coding problems 🤯 👉  🧵1/4
15
122
509
@_philschmid
Philipp Schmid
11 months
Open-source LLMs are behind commercial models when it comes to context length. 🔠 @OpenAI GPT-3.5 now has 16k, GPT-4 32k, and @AnthropicAI Claude up to 100k💪🏻  For example, Meta LLaMA or Falcon have only 2k😔 Here are two amazing blog posts I found in the last week.🚀😍 🧵 1/3
17
98
498
@_philschmid
Philipp Schmid
3 months
Code Llama 70B is here!🤯🧑🏻‍💻  @AIatMeta just released CodeLlama 70B, the latest and biggest iteration of Code Llama! 🚀 Code Llama 70B achieves 67.8% on HumanEval, reaching the initial GPT-4 performance! 👨‍🏫 Key facts✨: 🧮 70B parameter initialized from Llama 2 🔠 Trained on 500B…
9
99
490
@_philschmid
Philipp Schmid
1 year
📣 Exciting News! Falcon Models from TII are now under the Apache 2.0 License! 🚀 🔓 You can now leverage the best-performing open source models in commercial projects. 🙌 🦅 👉
8
93
478
@_philschmid
Philipp Schmid
1 year
New open-source chat-GPT model alert! 🚨 @togethercompute released a new version of their chatGPT-NeoX 20B model with higher quality by fine-tuning on user feedback. 🚀🔥 Demo: Model:
4
94
449
@_philschmid
Philipp Schmid
2 years
6 months ago, EleutherAI released GPT-J 6B, an open-source alternative to GPT-3🧠 But deploying it was very challenging until today📈 Check out my new blog on how to deploy GPT-J using Transformers and Amazon SageMaker🚀🤯 👉🏻  👉🏻
4
71
447
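A deployment with the SageMaker Python SDK typically boils down to a HuggingFaceModel plus a deploy call. This is a hypothetical sketch, not the blog's exact code: the S3 path, IAM role, instance type, and framework versions are placeholders.

```python
# Hypothetical GPT-J deployment sketch with the SageMaker Python SDK (all names/versions are placeholders).
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/gpt-j/model.tar.gz",        # packaged model artifact (placeholder path)
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder execution role
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# Deploy to a real-time endpoint on a GPU instance.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "My favourite open-source model is"}))
```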
@_philschmid
Philipp Schmid
1 year
🚨Attention #NLP enthusiasts! We just published a new blog post on how to fine-tune FLAN-T5-XXL using DeepSpeed & Hugging Face Transformers! 🚀 👉 We ran a series of experiments to help you choose the right hardware setup.🤖💻
7
84
448
@_philschmid
Philipp Schmid
4 months
Teaching LLMs new knowledge or a language requires a lot of training and can lead to “forgetting” its previous skills! Maybe not anymore. 🤯 New research from Tencent shows that you can expand the knowledge of LLMs by making it “bigger” without making it forget its previous…
13
84
441
@_philschmid
Philipp Schmid
4 months
We got a late Christmas gift from @Microsoft ! 🎁🤗 Microsoft just changed the license for their small LLM phi-2 to MIT! 🚀 Phi-2 is a 2.7 billion parameter LLM trained on 1.4T tokens, including synthetic data, achieving 56.7 on MMLU, outperforming Google Gemini Nano. TL;DR: 🧮…
8
65
432
@_philschmid
Philipp Schmid
4 months
Introducing Clipper: The easiest way to convert HTML to Markdown for RAG applications! 🚀 Clipper is a CLI tool that simplifies the process of clipping content from web pages and converting it to Markdown format. With Clipper, you can build markdown datasets for training your…
7
71
430
@_philschmid
Philipp Schmid
2 months
Introducing embedding quantization!💥 A new technique to quantize embeddings to achieve up to 45x faster retrieval while keeping 96% accuracy with open embedding models. This will help scale RAG applications! 🚀 TL;DR: 📝 🔥 Binary quantization: 32x less storage & up to 45x faster…
12
67
418
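The core of binary quantization fits in a few lines of NumPy: keep only the sign of each embedding dimension, pack it into bits, and retrieve with Hamming distance. The sketch below is illustrative and uses random vectors in place of real model embeddings; it is not the sentence-transformers implementation.

```python
# Illustrative binary-quantization sketch: 32x smaller embeddings, retrieval via Hamming distance.
import numpy as np

def binary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 dimensions per byte (float32 -> 1 bit)."""
    return np.packbits(embeddings > 0, axis=-1)

def hamming_search(query: np.ndarray, corpus: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k corpus rows with the smallest Hamming distance to the query."""
    distances = np.unpackbits(query ^ corpus, axis=-1).sum(axis=-1)
    return np.argsort(distances)[:top_k]

# Toy example with random "embeddings" standing in for a real model's output.
corpus_emb = np.random.randn(10_000, 1024).astype(np.float32)
query_emb = np.random.randn(1, 1024).astype(np.float32)

corpus_bin = binary_quantize(corpus_emb)  # shape (10000, 128) uint8 -> 32x less storage
query_bin = binary_quantize(query_emb)

print(hamming_search(query_bin, corpus_bin))
```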
@_philschmid
Philipp Schmid
11 months
Hugging Face LLM Inference Container now supports Falcon 7B and Falcon 40B deployments on Amazon SageMaker🦅🚀 Falcon is the best performing open source LLM available today for commercial use under the Apache 2.0 license! 🤑 👉
11
79
408
@_philschmid
Philipp Schmid
10 months
Thrilled to share a new blog post on how to fine-tune LLaMa 2 with QLoRA and Hugging Face on Amazon SageMaker🧑🏻‍💻 The blog post includes instructions for 7B, 13B, and 70B versions of the Model alongside Hardware requirements. 🖲 👉
11
104
408
@_philschmid
Philipp Schmid
1 month
On-Device 2B LLMs for actions, outperform GPT-4 🤯 The “Octopus v2: On-device language model for super agent” proposes a new method to create on-device agents. 📱🔄 Implementation 1️⃣ Define supported functions as special tokens, e.g. <func_1> and add them to the tokenizer 2️⃣…
10
110
407
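Step 1️⃣ of that recipe corresponds to registering the function names as dedicated tokens. A minimal sketch with transformers is below; the base model and token names are made up for illustration.

```python
# Sketch of step 1: register supported functions as special tokens (model and token names are placeholders).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Each supported on-device function becomes a single dedicated token.
function_tokens = ["<func_1>", "<func_2>", "<func_3>", "<func_end>"]
tokenizer.add_special_tokens({"additional_special_tokens": function_tokens})

# Grow the embedding matrix so the new tokens get trainable embeddings.
model.resize_token_embeddings(len(tokenizer))
```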
@_philschmid
Philipp Schmid
6 months
Using LLMs (GPT-4) as an evaluator for smaller models is becoming the de facto standard. However, relying on closed-source models is suboptimal due to missing control, transparency, and versioning 🤔  The recent paper shows that open LLMs match GPT-4 evaluation skills 🚀 🧶
18
69
400
@_philschmid
Philipp Schmid
2 months
How are open LLMs trained and created in 2024? 🤔 @01AI_Yi just released their paper on how they created Yi, a family of LLMs and V-LLMs. The paper includes details on the data processing, training, and multimodality parts. Let's take a look👀 First Data: How does their data…
6
88
401
@_philschmid
Philipp Schmid
20 days
Data is all we need! 👑 Not only since Llama 3 have we known that data is all we need. Excited to share 🍷 FineWeb, a 15T token open-source dataset! Fineweb is a deduplicated English web dataset derived from CommonCrawl created at @huggingface ! 🌐 TL;DR: 🌐 15T tokens of cleaned…
14
87
393
@_philschmid
Philipp Schmid
2 years
Serverless Deep Learning enters the next chapter📘 Amazon SageMaker Serverless Inference is a new fully managed serverless inference option🚀 Learn how to deploy @huggingface transformers serverless in 1 line of code🤯 🧑🏻‍💻 📘
7
61
386
@_philschmid
Philipp Schmid
26 days
We can do it! 🙌 First open LLM outperforms @OpenAI GPT-4 (March) on MT-Bench. WizardLM 2 is a fine-tuned and preference-trained Mixtral 8x22B! 🤯 TL;DR; 🧮 Mixtral 8x22B based (141B-A40 MoE) 🔓 Apache 2.0 license 🤖 First > 9.00 on MT-Bench with an open LLM 🧬 Used multi-step…
13
79
391
@_philschmid
Philipp Schmid
1 year
OpenAssistant released their Conversational dataset (OASST1) under Apache 2.0. 🤯😍 The dataset includes: 💭 161,443 messages 🌳 66,497 conversation trees 🇺🇸 35 different languages and was created by 13,500 volunteers. 🤗 👉
1
116
386
@_philschmid
Philipp Schmid
1 year
Training FLAN-T5-XXL (11B) on a single consumer-size GPU impossible? 🤔 No, not anymore!! 🤯 We created a blog post on how to train FLAN-T5-XXL on a single GPU using LoRA. 🥳 👉
6
79
374
@_philschmid
Philipp Schmid
3 months
Am I the only one who thinks @OpenAI used @UnrealEngine or something similar to create synthetic data? The Sora videos look like video games to me.
33
14
368
@_philschmid
Philipp Schmid
1 month
We are lowering the prices for Compute on Hugging Face by up to 50%!🤯 Yes, you heard it right @huggingface Spaces & Inference Endpoints are now, on average, 20% cheaper than AWS EC2 on-demand! 🤑 We were working hard to unify and improve our infrastructure to reduce our and…
10
73
368
@_philschmid
Philipp Schmid
5 months
OpenChat is currently one of the best open @ChatGPTapp alternatives. 🚀 The team ( @alignment_lab ) behind OpenChat released the paper on how they achieved ChatGPT (GPT-3.5) performance. The performance of OpenChat comes from its strong base model ( @MistralAI 7B LLM) and in…
6
69
363
@_philschmid
Philipp Schmid
18 days
Phi-3 mini model released under MIT! 🚀 Last Week Llama 3, this week Phi-3 🤯  @Microsoft Phi-3 comes in 3 different sizes: mini (3.8B), small (7B) & medium (14B). Phi-3-mini was released today, claiming to match Llama 3 8B performance! 🚀 3.8B TL;DR: 2️⃣ Instruct Versions with 4k…
13
97
364
@_philschmid
Philipp Schmid
2 months
Introducing StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2 is the next iteration of StarCoder and comes in 3 sizes, trained on 600+ programming languages and over 4 trillion tokens from The Stack v2. It outperforms StarCoder 1 by a wide margin and has the best overall performance…
11
80
364
@_philschmid
Philipp Schmid
10 months
Are Vector Databases here to stay? 🔍 Yes, it seems LLMs get lost in the middle and lose focus on long inputs.🗺👁‍🗨 In “Lost in the Middle: How Language Models Use Long Contexts,” researchers from Stanford tried to understand better how LLMs make use of the context📚✨ 🧵1/5
9
84
351
@_philschmid
Philipp Schmid
3 years
Today is the day. After almost 8 months, first time at the @huggingface office ever. 🧑🏻‍💻🏢 I can confirm they all exist and Hugging Face isn’t an AGI 🤯🤗
6
12
355
@_philschmid
Philipp Schmid
3 months
2.4x faster generation with Llama 70B and new NVIDIA AI XQA-kernel! 🤯  @NVIDIAAI just open-sourced a new GPU kernel (XQA) optimization for MQA and GQA during the generation phase. 🚀 With XQA enabled, NVIDIA could boost the throughput of Meta Llama 70B on H200 (🆕) from 1227…
13
62
349
@_philschmid
Philipp Schmid
6 months
Whisper just got smaller, faster!🔔 The audio team at @huggingface released DistilWhisper, a distilled version of @OpenAI Whisper🧪   DistilWhisper is a drop-in replacement for Whisper on English speech recognition that is 5.8 times faster with accuracy within 1% WER🤯 🧶
4
70
340
@_philschmid
Philipp Schmid
3 months
Can LLMs solve complex problems like humans?🧠 SELF-DISCOVER from Google Deepmind proposes a new framework that teaches itself to think critically and step-by-step, mimicking human reasoning! It's boosting LLMs problem-solving skills by up to 32% 🧮 Implementation: Stage 1: 1️⃣…
7
89
337
@_philschmid
Philipp Schmid
1 year
Have you heard the news? 🗞️ Not one but two new open-source LLMs have been released! 🌍 @MosaicML and @togethercompute released new 7B LLM models under the Apache 2.0 license. 🤯 Available on Hugging Face: Together: MosaicML:
6
70
332
@_philschmid
Philipp Schmid
1 year
Introducing StarCoder ⭐️ a 15B open-source Code-LLM created by @huggingface and @ServiceNow through @BigCodeProject 🔡 8192 token context window 📊 trained on 1 trillion tokens 💭 80+ Programming languages 🔐 only permissive licensed data ✅ commercial use
4
74
328
@_philschmid
Philipp Schmid
2 months
GPU Poor, no more you are.✨ We are excited to announce “Train on DGX” Cloud on @huggingface to train open LLMs on one or more @nvidia H100 and L40S. 🤯  Train on DGX Cloud is now available to every Enterprise Hub organization! 🏙️ Train on DGX: 🚂 Is powered by Hugging Face…
6
61
328
@_philschmid
Philipp Schmid
7 months
Llama 2 outperforms GPT-3.5 on long contexts! 🤯  @AIatMeta silently released Long Llama, a series of new Llama 2 models with up to 32k context. 🏆 Long Llama 2 70B is on par with @OpenAI GPT-4 for summarization and outperforms GPT-3.5 16k on 7/10 long context tasks! 🧶
11
65
315
@_philschmid
Philipp Schmid
5 days
If you are using Whisper for transcription, listen⁉️👂We created an optimized Whisper with Speaker Diarization for @huggingface Inference Endpoints 🤗 We created a reference implementation that optimizes Whisper with Flash Attention and Speculative Decoding and combines it with…
9
57
320
@_philschmid
Philipp Schmid
2 months
Zephyr 7B Gemma released!🔷🔶 We are excited to announce Zephyr Gemma, the best fine-tuned version of @Google Gemma 7B, outperforming Google Gemma Instruct on 5 out of 6 benchmarks, including MT Bench, AGIEval & GPT4All. 🤯🚀 Zephyr Gemma TL;DR; 🧠 Fine-tuned Gemma 7B on DEITA…
5
79
311
@_philschmid
Philipp Schmid
6 months
How can we teach LLMs to be factual, correct, and more reliable? 🤔 RAG is one approach to adding information to the prompt. But, always retrieving can lead to bad responses😔 Self-RAG proposes a new method to teach LLMs when to retrieve information and how to use it.🤯 🧶
6
53
313
@_philschmid
Philipp Schmid
15 days
Llama 3 extended to almost 100,000-token context! ✅ By Combining PoSE and continuing pre-training on Llama 3 8B base for 300M tokens, the community ( @winglian ) managed to extend the context from 8k to 64k. 🚀 Applying rope scaling afterward led to a supported context window of…
7
56
311
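The rope-scaling part at inference time can be sketched as a config override when loading the model. This is not the PoSE continued-pre-training recipe itself, and the scaling type and factor below are illustrative assumptions.

```python
# Sketch: loading Llama 3 8B with a linear RoPE scaling factor to stretch the context window.
# Not the PoSE training recipe; the scaling type and factor are illustrative assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    rope_scaling={"type": "linear", "factor": 8.0},  # ~8k native context * 8 ≈ 64k positions
    torch_dtype="auto",
    device_map="auto",
)
```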
@_philschmid
Philipp Schmid
9 months
Code Llama with @huggingface 🤗 Yesterday, @MetaAI released Code Llama, a family of open-access code LLMs! Today, we release the integration in the Hugging Face ecosystem🔥 Models: 👉 blog post: 👉  Blog post covers how to use it!
7
80
298
@_philschmid
Philipp Schmid
8 months
Falcon 180B released! 🚨🦅 @TIIuae just released their new Falcon model, which beats OpenAI GPT-3.5 and is on par with Google's PaLM-2 Large👑 With 180 billion parameters, Falcon 180B is 2.5x larger than Llama 2 and was trained on 4x more compute🤯 👉 🧶
12
104
304
@_philschmid
Philipp Schmid
6 months
Zephyr code released! The @huggingface H4 team has just released the code for Zephyr-7b, which was trained with Direct Preference Optimization.🚀 👉 The handbook contains robust recipes for: 📚 Supervised fine-tuning 🎯 Direct preference optimization
6
66
300
@_philschmid
Philipp Schmid
4 months
In RAG, retrieving the right information is and will remain crucial for success with powerful LLMs. This raises the question: how can we improve embedding models for our queries and data? 🤔 A new paper by Microsoft, “Improving Text Embeddings with Large Language Models,” proposes using LLMs…
5
56
300
@_philschmid
Philipp Schmid
11 months
New open-source LLMs! 🔔 @salesforce released XGen 7B, a new LLM with an 8k context under Apache 2.0🔓  XGen has the same architecture as @MetaAI LLaMA and can be used as a 1-to-1 replacement for commercial use! 🔥  👉 🧵 1/2
5
73
293
@_philschmid
Philipp Schmid
1 month
Serverless LLMs for Developers!🌪️ We are excited to announce "Deploy on @Cloudflare Workers AI" on @huggingface enabling developers to easily use open LLMs as serverless APIs powered by Cloudflare's edge GPU data centers. 🚀😍 It’s the first integration of our partnership with…
12
67
292
@_philschmid
Philipp Schmid
7 months
How can we improve reasoning and reduce hallucinations of LLMs? 🤔 @GoogleDeepMind and a group of researchers propose a prompting technique, “Hypotheses-to-Theories (HtT)” to teach LLMs rules to improve reasoning and reduce hallucination. 🧠 🧶
6
71
288
@_philschmid
Philipp Schmid
1 month
New open model from @MistralAI ! 🧠 Last night, Mistral released Mixtral 8x22B, a 176B MoE, via magnet link. 🔗🤯 What we know so far: 🧮 176B MoE with ~40B active 📜 context length of 65k tokens. 🪨 Base model can be fine-tuned 👀 ~260GB VRAM in fp16, 73GB in int4 📜 Apache…
11
67
291
@_philschmid
Philipp Schmid
3 months
Code Llama 🦙 + Web Search 🌐 = 10x dev 🧑🏻‍💻! Try on @huggingface for free!! 🤗
8
51
288
@_philschmid
Philipp Schmid
6 months
How can you evaluate LLMs?🤔 Two approaches are human evaluations and using LLMs as a judge👩‍⚖️ As human evaluation is expensive, I wrote a hands-on blog post using the LLMs to evaluate RAG and other applications using @huggingface and @LangChainAI 👉 🧶
3
55
283
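The LLM-as-a-judge pattern is essentially a grading prompt plus a call to a strong model. A minimal sketch is below; the judge model, prompt wording, and rating scale are placeholders, not the blog's exact setup.

```python
# Minimal LLM-as-a-judge sketch: ask a strong model to grade a RAG answer on a 1-5 scale.
# Illustrative only; the judge model and prompt are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")  # placeholder judge model

JUDGE_PROMPT = """You are grading a question-answering system.
Question: {question}
Retrieved context: {context}
Answer: {answer}
Rate the answer from 1 (wrong/unsupported) to 5 (correct and fully grounded in the context).
Reply with the rating only."""

def judge(question: str, context: str, answer: str) -> str:
    prompt = JUDGE_PROMPT.format(question=question, context=context, answer=answer)
    return client.text_generation(prompt, max_new_tokens=4)

print(judge("Who created StarCoder?", "StarCoder was created by BigCode.", "BigCode created it."))
```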
@_philschmid
Philipp Schmid
1 year
End of last year, Google open-sourced FLAN-T5, a better T5 model in every aspect. We created a new repository on Hugging Face, which implements a custom handler for Inference Endpoints that allows you to deploy FLAN-T5-XXL on a single A10G GPU. 🤯😍 👉  🧵:
6
51
284
@_philschmid
Philipp Schmid
2 months
Can we make RAG applications more robust with fine-tuning? A paper by @Microsoft and UC Berkeley put this to the test to see if small open LLMs, like @AIatMeta Llama 7B, can match @OpenAI GPT-3.5. They called it “Retrieval Augmented Fine Tuning (RAFT)”, where you train an LLM…
5
62
275
@_philschmid
Philipp Schmid
2 months
Elon Musk kept his word and released Grok-1🤯 Grok-1 is a 314B Mixture-of-Experts (MoE) transformer. 🧐 What we know so far: 🧠 Base model, not fine-tuned ⚖️ Apache 2.0 license 🧮 314B MoE with 25% active on a token 📊 According to the initial announcement; 73% on MMLU,…
8
68
277
@_philschmid
Philipp Schmid
1 year
Multi-query attention (MQA) is gaining popularity in LLMs. With the release of StarCoder 14B and Falcon 7B/40B, we have two accessible LLMs using it 🔥  MQA shares key and value matrices across attention heads, which makes it possible to generate longer text using less memory. 🧠 🧵
5
51
274
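The memory argument is easy to make concrete with back-of-the-envelope arithmetic: the KV cache scales with the number of key-value heads, so sharing them (MQA) shrinks it by that factor. The numbers below are illustrative, not exact Falcon or StarCoder values.

```python
# Back-of-the-envelope sketch: KV-cache size with multi-head vs. multi-query attention.
# Model dimensions are illustrative, not taken from a specific checkpoint.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_value=2):
    # 2x for keys and values, stored per layer, per kv head, per position (fp16 = 2 bytes).
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value

layers, heads, head_dim = 60, 64, 128
print(kv_cache_bytes(layers, kv_heads=heads, head_dim=head_dim, seq_len=2048, batch=1) / 1e9, "GB (MHA)")
print(kv_cache_bytes(layers, kv_heads=1, head_dim=head_dim, seq_len=2048, batch=1) / 1e9, "GB (MQA)")
```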
@_philschmid
Philipp Schmid
1 year
Generative AI to power the Next-Gen Document Understanding 📄🤖 Say goodbye to traditional OCR and welcome Donut, an MIT-licensed Generative AI model that processes your documents directly. 🤖🥇 👉
7
52
272
@_philschmid
Philipp Schmid
1 year
🎯 Goal: Summarize chats & dialogues ✏️ Content: Fine-tune @GoogleAI FLAN-T5 for summarization ✅ Result: Summarize ChatGPT dialogues with @huggingface Transformers 📔Blog:
10
66
269
@_philschmid
Philipp Schmid
5 days
New MoE alert! 🔔 DeepSeek V2 Chat just got released, a 236B parameter Mixture of Experts model with a 128k context window and 21B active parameters. 🏎️⚡️ TL;DR 🧮 236B parameters with 21B activated during generation 👨‍🏫  160 experts with 6 active in generation 🚀 Matches Mixtral…
6
56
276
@_philschmid
Philipp Schmid
2 months
GaLore is a new memory-efficient fine-tuning technique for “full-tuning” billion-parameter models, like Llama 2 7B, on consumer-size GPUs. In contrast to LoRA, GaLore reduces memory by projecting the optimizer states and gradients into a lower-dimensional space. 🤯 TL;DR 📝 🚀…
3
61
269
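The core idea can be sketched with NumPy: compute a low-rank projector from the gradient's SVD, keep the optimizer state in that subspace, and project the update back. This is an illustrative toy, not the official GaLore implementation.

```python
# Illustrative sketch of the GaLore idea: project the gradient into a low-rank subspace,
# keep optimizer state there, and project the update back. Not the official code.
import numpy as np

def project_gradient(grad: np.ndarray, rank: int):
    """SVD-based projector onto the top-`rank` left singular directions of the gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]           # (m, r) projection matrix
    return p, p.T @ grad      # low-rank gradient of shape (r, n)

m, n, rank = 1024, 1024, 64
grad = np.random.randn(m, n).astype(np.float32)

p, low_rank_grad = project_gradient(grad, rank)
# Optimizer state (e.g. Adam moments) only needs shape (r, n) instead of (m, n).
update_low_rank = 1e-3 * low_rank_grad   # stand-in for an optimizer step in the subspace
update = p @ update_low_rank             # project back to the full weight shape (m, n)
print(update.shape, low_rank_grad.nbytes / grad.nbytes)  # memory ratio of the projected state
```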
@_philschmid
Philipp Schmid
3 months
Can a 2B LLM outperform Mistral 7B or Llama 13B? Creators of the popular Ultrafeedback dataset released MiniCPM, a 2.4B parameter model claiming performance close to Mistral 7B, Llama 2 13B, or Falcon 40B. 🤯🤔 As part of the release, the researchers released a detailed…
8
54
265
@_philschmid
Philipp Schmid
23 days
Llama 3 released! 🚨🔔 @AIatMeta just released their best open LLM! 👑🚀 Llama 3 is the next iteration of Llama with a ~10% relative improvement to its predecessor! 🤯 Llama 3 comes in 2 different sizes 8B and 70B with a new extended tokenizer and commercially permissive license!…
7
63
266
@_philschmid
Philipp Schmid
30 days
Mixture of Adapters? 🤔 PHATGOOSE proposes a new method to combine Adapters (LoRA) into a single MoE-like architecture with gating and routing mechanism to experts (fine-tunes). Implementation 1️⃣ Select or train a collection of fine-tuned adapters (LoRA) with the same base model…
8
54
265
@_philschmid
Philipp Schmid
2 months
Gemma fine-tuning with ChatML works! ☑️  I created a minimal example script to fine-tune Gemma 7B on the Dolly dataset using TRL, PEFT, Flash Attention with LoRA, and the @OpenAI chatML format. 🫡 It should help others to be unblocked. Let me know if it's too minimalistic I…
9
45
262
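The chatML formatting itself is handled by the tokenizer's chat template. A minimal sketch is below; the model id is a placeholder that ships a built-in chat template, and for models without one, trl's setup_chat_format (in recent versions) can register the chatML template and special tokens.

```python
# Sketch of formatting a Dolly-style sample with the tokenizer's chat template (model is a placeholder).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # placeholder model with a chat template

messages = [
    {"role": "user", "content": "Summarize what LoRA does in one sentence."},
    {"role": "assistant", "content": "LoRA injects small trainable low-rank matrices instead of updating all weights."},
]

# Renders the conversation into the model's expected chat format (chatML-style tags for chatML templates).
print(tokenizer.apply_chat_template(messages, tokenize=False))
```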
@_philschmid
Philipp Schmid
2 months
Introducing OpenHermesPreferences 1M! 🦋 We just released the largest open AI preference dataset on Hugging Face! 🤯 Together with @argilla_io , we extended the OpenHermes ( @Teknium1 ) dataset into a pair-wise comparison dataset for RLHF and DPO. 🧠 Dataset: 📦 Size: ~1 million AI…
6
56
257
@_philschmid
Philipp Schmid
2 years
Transformers are changing machine learning, starting with NLP, and now, with audio and computer vision💬👄 👀 You can now use the Hugging Face Inference DLC to do automatic speech recognition using wav2vec2 model or WavLM🤯 🖼  📈
3
59
254
@_philschmid
Philipp Schmid
5 months
What if you can improve LLMs using direct customer feedback? 🤔 Most alignment methods like RLHF or DPO require multiple outputs from the same prompt to improve the model by learning the preferred one. 👀 In real-world use cases, you oftentimes only have the option to collect…
3
59
252
@_philschmid
Philipp Schmid
11 months
"grouped-query attention" (GQA) from Google proposes a method to convert models from multi-head attention (MHA) to GQA 🧠 GQA claims to offer similar benefits to multi-query attention (MQA) with faster inference via reduced # key-value heads.🤯🛫 👉  🧵1/3
3
59
248
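The conversion trick from the paper, mean-pooling groups of key/value heads to initialize the GQA checkpoint, can be sketched in a few lines of NumPy. The shapes below are toy values, not from a specific model.

```python
# Illustrative sketch of the MHA -> GQA conversion idea: mean-pool groups of key/value heads.
import numpy as np

num_heads, num_kv_groups, head_dim, hidden = 16, 4, 64, 1024
heads_per_group = num_heads // num_kv_groups

# Pretend key projection of an MHA layer: one (hidden, head_dim) block per head.
k_proj = np.random.randn(num_heads, hidden, head_dim).astype(np.float32)

# GQA checkpoint initialization: average the key heads inside each group.
k_proj_gqa = k_proj.reshape(num_kv_groups, heads_per_group, hidden, head_dim).mean(axis=1)

print(k_proj.shape, "->", k_proj_gqa.shape)  # (16, 1024, 64) -> (4, 1024, 64)
```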
@_philschmid
Philipp Schmid
17 days
First open LLM from @SnowflakeDB !  Arctic is a 480B Dense-MoE with a 10B dense transformer model and a 128x3.66B MoE MLP, designed specifically for enterprise AI. 🤔 TL;DR: 🧠 480B parameters with 17B active during generation 👨‍🏫  128 experts with 2 active in generation 2️⃣ Instruct…
12
52
252
@_philschmid
Philipp Schmid
5 months
🚨Never trust marketing content🚨 Fixed the results of @GoogleAI Gemini Ultra on MMLU. Details: But yes Gemini Ultra > GPT-4 on CoT @32 according to the report.
15
34
248
@_philschmid
Philipp Schmid
4 months
Can LLMs improve themselves? Self-play fine-tuning (SPIN) is a new method that promises to enhance the performance of LLMs without needing additional human-annotated data beyond the original fine-tuning dataset. 🎮  The paper aims to improve an LLM by having it iteratively…
5
55
243
@_philschmid
Philipp Schmid
8 months
Not, Yet another RoPE extensioN method! 🙄 But listen, “YaRN” allows you to scale LLMs like llama 2 to over 100k context! 🤯 Llama 2 13B 128k on🤗 👉  The code and the Paper: 👉
4
52
244
@_philschmid
Philipp Schmid
5 months
Have you heard of Mixture of Experts (MoE) models? 🤔 With the release of @MistralAI Mixtral 8x7B, MoEs are gaining attention; it is also rumored that @OpenAI GPT-4 is an MoE👀  But what exactly are MoEs, and how do they work? We created an in-depth blog.
0
63
242
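To make the idea concrete, here is a toy sparse MoE layer with top-2 routing over 8 expert MLPs. It is purely illustrative; real implementations (like Mixtral's) batch tokens per expert and add load-balancing losses.

```python
# Toy sparse MoE layer sketch (top-2 routing over 8 expert MLPs); purely illustrative.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, hidden=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, hidden)
        logits = self.router(x)                                    # (tokens, num_experts)
        weights, idx = torch.topk(logits.softmax(dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                                # only the selected experts run
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

with torch.no_grad():
    print(ToyMoE()(torch.randn(10, 512)).shape)  # torch.Size([10, 512])
```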
@_philschmid
Philipp Schmid
11 months
Do we need RL to align LLMs with Human feedback? 🔍👀  Last week, @Stanford researchers unveiled a paper introducing Direct Preference Optimization (DPO) - a new algorithm that could change the way we align LLMs with Human Feedback 🧵 1/3
3
63
243
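The heart of DPO is a single logistic loss on the difference of policy-vs-reference log-ratios for a chosen and a rejected answer, with no reward model or RL loop. A minimal sketch of that objective:

```python
# Minimal sketch of the DPO objective: logistic loss on the difference of log-ratios.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs are summed log-probabilities of the full responses under the given model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy numbers standing in for real log-probs of a chosen and a rejected completion.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)
```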
@_philschmid
Philipp Schmid
2 years
🌸 BLOOM is here, available and accessible for everyone. 🤗🤝 Get yourself an account and try it yourself: 👉  At a glance: 🔢 176 billion parameters 🌍 59 languages 🔓 Open-Access or read more about 👉
5
51
236
@_philschmid
Philipp Schmid
10 months
Is Llama 2 special or just a better iteration of Llama 1? 🤔 Over the weekend, I had time to read the paper Meta released. 📖 Below are some of my findings, which you might have missed📝 🧵 1/6
4
52
235
@_philschmid
Philipp Schmid
26 days
Introducing Idefics2, the strongest Vision-Language-Model (VLM) < 10B! 🚀 Idefics2 comes with significantly enhanced capabilities in OCR, document understanding, and visual reasoning. 💬📄🖼️ TL;DR; 📚 8B base and instruction variant 🖼️ Image + text inputs ⇒ Text output 📷…
2
58
240
@_philschmid
Philipp Schmid
5 months
We just got more details on Mixtral 8x7B from @MistralAI 🧠 Mixtral is a sparse mixture-of-experts model (SMoE) with open weights, outperforming existing open LLMs like Meta Llama 70B.🤯 💪🏻 TL;DR: ⬇️
2
55
237
@_philschmid
Philipp Schmid
1 year
FLANv2 dataset is available on Hugging Face! 🚨 Hugging Face user, SirNeural uploaded a replica of the FLAN dataset, which was used to fine-tune the FLAN-T5 models to the Hugging Face Hub🤗😍 👉 let's train more instruction fine-tuned models! 📈🔥
3
55
236
@_philschmid
Philipp Schmid
5 months
Access to GPUs is becoming increasingly difficult, especially for fine-tuning your own LLMs, like llama or mistral.😔 Happy to share support for open LLMs on @awscloud Trainium.⚡️ 👉 You can now easily fine-tune LLMs using Hugging Face Optimum Neuron…
2
52
234
@_philschmid
Philipp Schmid
2 months
Can ORPO redefine how we train and align LLMs for RLHF? 🤔 State-of-the-art LLMs followed the process of Base Model → Supervised Fine-tuning → RLHF (PPO/DPO). This is very resource-intensive and complex. 😒 Odds Ratio Preference Optimization (ORPO) proposes a new method to…
2
66
235
@_philschmid
Philipp Schmid
4 years
I am very honored that my notebook is now an official part of transformers by @huggingface . 🥳🥰🤗 Learn "How to fine-tune a non-English GPT-2 Model with Trainer class".
7
42
235
@_philschmid
Philipp Schmid
6 months
Did @MSFTResearch leak the parameter count of @OpenAI GPT-3.5 turbo?🤯 According to „CodeFusion: A Pre-trained Diffusion Model for Code Generation“ paper gpt-3.5-turbo has only 20B parameter. Paper:
14
37
232
@_philschmid
Philipp Schmid
1 month
Jamba released! @AI21Labs just released the first production-scale Mamba implementation! Jamba is a hybrid SSM-Transformer MoE rivaling open transformer-based LLMs 🚀 TL;DR: 🧠 52B parameters with 12B active during generation 👨‍🏫 16 experts with 2 active in generation 🆕 New…
5
43
235
@_philschmid
Philipp Schmid
3 months
New Embedding Models for Code released by @awscloud ! Embedding Models are at the heart of every RAG application. Without good embeddings, retrieving relevant context to answer your user prompts is impossible. 🔍 Super exciting to see Amazon release CodeSage, a family of open…
5
38
232
@_philschmid
Philipp Schmid
2 months
Chunking or Splitting your documents correctly is crucial for good RAG applications. 🥇 Without the right concept, you might not be able to retrieve the correct information based on your user input. 🧭 To help you better understand how your documents are split using, e.g.,…
4
44
233
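A common starting point for splitting is a recursive character splitter with some overlap. The sketch below uses LangChain's splitter with illustrative sizes; depending on your version the import path may be langchain.text_splitter instead.

```python
# Sketch of recursive chunking with overlap (a common default for RAG pipelines).
# Chunk sizes and separators are illustrative; older versions import from langchain.text_splitter.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,      # target characters per chunk
    chunk_overlap=64,    # overlap so context isn't cut off at chunk boundaries
    separators=["\n\n", "\n", ". ", " "],  # try paragraph, line, sentence, then word breaks
)

document = "Retrieval Augmented Generation combines a retriever with a generator. " * 50
chunks = splitter.split_text(document)
print(len(chunks), "chunks;", chunks[0][:80], "...")
```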
@_philschmid
Philipp Schmid
3 months
Introducing Messages API with @OpenAI compatibility for @huggingface Inference Endpoints and Text Generation Inference! 🚀  The new API can be directly used with OpenAI's client libraries or third-party tools, like @LangChainAI or @llama_index . 🖼️🦜🦙 Migrating from closed…
8
47
228
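In practice, pointing the official OpenAI client at an endpoint that exposes the Messages API looks roughly like this; the endpoint URL, token, and model name are placeholders for your own deployment.

```python
# Sketch of calling a Hugging Face Inference Endpoint / TGI server through the OpenAI client.
# The base_url, api_key, and model name are placeholders for your own endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1/",  # placeholder endpoint URL
    api_key="hf_xxx",  # your Hugging Face token (placeholder)
)

response = client.chat.completions.create(
    model="tgi",  # placeholder; TGI-backed endpoints accept a generic model name
    messages=[{"role": "user", "content": "What is the Messages API?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```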