Thomas Wolf

@Thom_Wolf

68,313 Followers · 4,341 Following · 318 Media · 3,572 Statuses

Co-founder and CSO @HuggingFace - open-source and open-science

Joined February 2011
@Thom_Wolf
Thomas Wolf
2 years
we need a billionaire bullish enough to spend 13b per year on nuclear fusion like zuck is doing on the metaverse
75
207
2K
@Thom_Wolf
Thomas Wolf
5 years
🔥Pytorch-Transformers 1.0🔥 Six NLU/NLG architectures: BERT, GPT, GPT-2, Transfo-XL, XLNet, XLM Total: 27 pretrained models Still the same -Superfast onboarding -SOTA scripts: GLUE, SQuAD, Text generation New -Unified API -Access hidden-states, attentions... -Torchscript -...
Tweet media one
37
590
2K
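A minimal sketch of that unified API, written against today's `transformers` package (the renamed successor of pytorch-transformers); class and argument names below follow the current library, not the 2019 release:

```python
# Sketch only: today's `transformers` package (the renamed successor of
# pytorch-transformers); class/argument names follow the current API, not the 2019 one.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,   # expose every layer's hidden states
    output_attentions=True,      # expose the attention maps
)

inputs = tokenizer("Transfer learning is fun.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(len(outputs.hidden_states))   # 13 tensors: embeddings + 12 encoder layers
print(outputs.attentions[0].shape)  # (batch, heads, seq_len, seq_len)
```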
@Thom_Wolf
Thomas Wolf
3 years
Authors have no say on the animal O'Reilly chooses for the cover of their book. But I'm really happy that they chose a parrot🦜 for the cover of the book on Transformers we are finalising with Lewis and Leandro. It's a Coconut Lorikeet parrot (a very stochastic Coconut Lorikeet😉)
Tweet media one
30
200
2K
@Thom_Wolf
Thomas Wolf
9 days
Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes?? Here comes the first release of 🍷Fineweb. A high-quality, large-scale filtered web dataset outperforming all current datasets of its scale. We trained 200+ ablation…
@gui_penedo
Guilherme Penedo
9 days
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Tweet media one
38
332
1K
24
301
2K
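A sketch of how you might stream FineWeb with the `datasets` library; the hub id "HuggingFaceFW/fineweb" and the "text" field below are assumptions based on the release announcement:

```python
# Sketch: streaming FineWeb with `datasets`. The hub id "HuggingFaceFW/fineweb"
# and the "text" field are assumptions based on the release announcement.
from datasets import load_dataset

fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
for i, doc in enumerate(fw):
    print(doc["text"][:200])  # each record is one filtered web document
    if i == 2:
        break
```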
@Thom_Wolf
Thomas Wolf
2 years
I read a lot of books this year to broaden my horizons in AI/ML with adjacent or complementary disciplines. It was a great pleasure so I'm sharing some of my reading list here with a couple of notes: [1/12]
23
290
2K
@Thom_Wolf
Thomas Wolf
5 years
🤗Transformers 2.0💥 State-of-the-art NLP in TensorFlow 2.0/PyTorch 8 architectures 33 trained models 102 lang. Seamlessly pick the right framework for training, eval, deploy Train on TPU ⏩ finetune/test in PyTorch ⏩ serve w. TFX 🍒Keras magic: train SOTA model in 10 lines👇
Tweet media one
19
401
2K
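A rough sketch of the Keras-side workflow using current library names (the exact 2.0-era calls differed); the model and dataset choices here are purely illustrative:

```python
# Rough sketch of the Keras-side workflow with current library names
# (the exact 2.0-era calls differed); model/dataset choices are illustrative only.
import numpy as np
import tensorflow as tf
from datasets import load_dataset
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

ds = load_dataset("glue", "sst2", split="train[:2000]")
features = dict(tokenizer(ds["sentence"], padding=True, truncation=True, return_tensors="np"))
labels = np.array(ds["label"])

model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))  # uses the model's built-in loss
model.fit(features, labels, epochs=1, batch_size=16)
```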
@Thom_Wolf
Thomas Wolf
4 years
Surviving every AI wave, two kernels have consistently been the beating hearts of Natural Language Processing: Datasets and Metrics Today we release "nlp", a library to easily share & load data/metrics already providing access to 99+ datasets! Try it👉
Tweet media one
17
408
2K
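A quick taste of the library; note that `nlp` was later renamed `datasets`, and the snippet below uses the current name:

```python
# Quick taste of the library. Note: `nlp` was later renamed `datasets`; the snippet
# uses the current name (metrics have since moved to the separate `evaluate` package).
from datasets import load_dataset

squad = load_dataset("squad", split="validation")
print(squad[0]["question"])
print(squad.features)  # typed columns, memory-mapped on disk
```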
@Thom_Wolf
Thomas Wolf
2 months
playing with a basic, fully-local and open-source speech-to-text-to-speech pipeline on my mac. Less than 120 lines of code to chain local whisper + Zephyr (in LM Studio) + an OpenVoice TTS … latency is 1.5-2.5 sec on an M3. already quite impressed how all…
31
144
1K
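A heavily simplified sketch of such a loop: local Whisper for speech-to-text, a local LLM behind LM Studio's OpenAI-compatible server (default localhost:1234), and a TTS step left as a placeholder since OpenVoice's exact API is not shown here:

```python
# Heavily simplified sketch: local Whisper -> local LLM via LM Studio's
# OpenAI-compatible server -> a TTS step left as a placeholder.
import requests
import whisper  # pip install openai-whisper

stt = whisper.load_model("base")

def listen(path: str) -> str:
    return stt.transcribe(path)["text"]

def think(prompt: str) -> str:
    r = requests.post(
        "http://localhost:1234/v1/chat/completions",   # LM Studio local server
        json={"messages": [{"role": "user", "content": prompt}], "temperature": 0.7},
        timeout=60,
    )
    return r.json()["choices"][0]["message"]["content"]

def speak(text: str) -> None:
    ...  # hypothetical: call your local TTS (e.g. OpenVoice) here

speak(think(listen("question.wav")))
```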
@Thom_Wolf
Thomas Wolf
1 month
[75min talk] i finally recorded this lecture I gave two weeks ago because people kept asking me for a video so here it is, enjoy "The Little guide to building Large Language Models in 2024" tried to keep it short and comprehensive – focusing on concepts that are crucial for…
Tweet media one
12
243
1K
@Thom_Wolf
Thomas Wolf
2 years
Just received my first physical copy of our book & the feeling is... surreal. One and a half years in the making & I'm amazingly proud of the result. It covers so much ground, from NLP w/o labels up to training billion-param models, multilinguality, pruning, classif, generation..
Tweet media one
54
130
1K
@Thom_Wolf
Thomas Wolf
3 years
A few things not a lot of people know about HuggingFace: -🤗is a very small team, less than 30 -🤗transformers GH stars are growing faster than legends like PyTorch, will probably pass it in 2021 -Open-source/-science is even more 🤗DNA than ppl think -🤗is cash-flow positive today
27
92
1K
@Thom_Wolf
Thomas Wolf
5 years
Currently working on the coming NAACL "Transfer Learning in NLP" tutorial with @seb_ruder @mattthemathman and @swabhz . Pretty excited! And I've discovered you can write a Transformer model like GPT-2 in less than 40 lines of code now! 40 lines of code & 40 GB of data...
Tweet media one
15
279
1K
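For flavor, here is a compact GPT-style decoder in plain PyTorch, a sketch in the spirit of the "~40 lines" claim rather than the tutorial's actual code:

```python
# A compact GPT-style decoder in plain PyTorch -- a sketch, not the tutorial's code.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d, heads):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    def __init__(self, vocab=50257, d=256, heads=8, layers=4, ctx=128):
        super().__init__()
        self.tok, self.pos = nn.Embedding(vocab, d), nn.Embedding(ctx, d)
        self.blocks = nn.Sequential(*[Block(d, heads) for _ in range(layers)])
        self.head = nn.Linear(d, vocab, bias=False)

    def forward(self, idx):
        x = self.tok(idx) + self.pos(torch.arange(idx.size(1), device=idx.device))
        return self.head(self.blocks(x))  # next-token logits

logits = TinyGPT()(torch.randint(0, 50257, (1, 16)))
```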
@Thom_Wolf
Thomas Wolf
6 years
I've spent most of 2018 training models that could barely fit 1-4 samples/GPU. But SGD usually needs more than a few samples/batch for decent results. I wrote a post gathering practical tips I use, from simple tricks to multi-GPU code & distributed setups:
Tweet media one
11
370
1K
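The simplest trick from that post is gradient accumulation; a toy, self-contained sketch:

```python
# Gradient accumulation: simulate a batch of 64 when only micro-batches of 4
# fit in GPU memory (toy, self-contained example).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(32)]

accumulation_steps = 16          # 16 micro-batches of 4 -> effective batch of 64
optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    loss = criterion(model(inputs), labels)
    (loss / accumulation_steps).backward()  # scale so accumulated grads average, not sum
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```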
@Thom_Wolf
Thomas Wolf
5 years
With 180+ papers mentioning 🤗 Transformers and its predecessors, it was high time to put out a real paper that people could cite. 🥳 🎉 With @LysandreJik @SanhEstPasMoi @julien_c @ClementDelangue @moi_anthony @pierrci @remilouf @MorganFunto @jamieabrew
Tweet media one
11
269
1K
@Thom_Wolf
Thomas Wolf
4 years
They are loosely connected to what I’m working on these days but these three books are still very clearly the most enjoyable read I’ve had since I joined the field. What a pleasure it was to read them!
Tweet media one
21
107
1K
@Thom_Wolf
Thomas Wolf
4 months
The progressive rise of open (source/access) AI models back from the ashes in 2023. This will be remembered as one of the most remarkable changes in the AI field this year
Tweet media one
17
194
906
@Thom_Wolf
Thomas Wolf
4 years
There is a bit of magic in the new 🤗nlp library besides giving dead-simple access to 120+ datasets🧙‍♂️ We've tested it with @qlhoest and loading a 17GB+ dataset like English Wikipedia only takes... 9MB in RAM🐣 And you can iterate over the data at 2-3 Gbit/s🚀 Try it yourself👇
Tweet media one
12
223
1K
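What that memory-mapping looks like from the user side, sketched with today's `datasets` (the successor of `nlp`); "20200501.en" is the 2020-era English Wikipedia config mentioned above:

```python
# Sketch of the memory-mapped behaviour with today's `datasets` (successor of `nlp`).
from datasets import load_dataset

wiki = load_dataset("wikipedia", "20200501.en", split="train")
# The Arrow files stay on disk and are memory-mapped, so RAM usage remains tiny
# even though the dataset itself is tens of GB.
for article in wiki.select(range(3)):
    print(article["title"])
```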
@Thom_Wolf
Thomas Wolf
5 years
I’m working on a series of mini tutorials for a wider NLP audience. 🤗 Transformers can be intimidating and I’d like to show that you can get ~SOTA results in 10 lines of code on tasks such as text/tokens/words classification, question answering, maybe generation. Other topics?
66
104
1K
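The "10 lines" spirit, sketched with today's pipeline API (not the tutorial itself):

```python
# A few-lines sketch with the current pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
qa = pipeline("question-answering")

print(classifier("I love how simple this is!"))
print(qa(question="Who created the library?",
         context="The transformers library was created at Hugging Face."))
```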
@Thom_Wolf
Thomas Wolf
5 years
Pytorch-bert v0.6 is out with OpenAI's pretrained GPT-2 🦄 small model & the usual accompanying example scripts to use it. Now... can you guys wait until the ACL deadline has passed before releasing any crazy new transformer? 😅 Thanks, you are the best! 🥰
Tweet media one
18
262
971
@Thom_Wolf
Thomas Wolf
5 years
Interesting developments happened in 2018/2019 for natural language generation decoding algorithms: here's a thread with some papers & code So, the two most common decoders for language generation used to be greedy-decoding (GD) and beam-search (BS). [1/9]
12
292
954
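Those decoders map to `generate()` flags in today's transformers; a sketch (the thread itself predates this API):

```python
# The decoders from the thread, as generate() flags in today's transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Interesting developments happened in", return_tensors="pt").input_ids

greedy = model.generate(ids, max_new_tokens=20, do_sample=False)   # greedy decoding (GD)
beam = model.generate(ids, max_new_tokens=20, num_beams=5)         # beam search (BS)
sampled = model.generate(ids, max_new_tokens=20, do_sample=True,
                         top_k=50, top_p=0.95, temperature=0.8)    # top-k / nucleus sampling
print(tok.decode(sampled[0], skip_special_tokens=True))
```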
@Thom_Wolf
Thomas Wolf
4 years
I often meet research scientists interested in open-sourcing their code/research and asking for advice. Here is a thread for you. First: why should you open-source models along with your paper? Because science is a virtuous circle of knowledge sharing not a zero-sum competition
Tweet media one
2
284
922
@Thom_Wolf
Thomas Wolf
5 years
Our PyTorch BERT is on pip! I took extra care to make it both easy to use and modular. Uses @ai2_allennlp file caching technique to download/cache/load Google's pretrained models Includes 6 PyTorch models with various architectures, tokenizer & optimizer 👉
Tweet media one
7
277
911
@Thom_Wolf
Thomas Wolf
6 months
Over the past weeks the H4 team has been busy pushing the Zephyr 7B model to new heights 🗻 The new version is now topping all 7b models on chat evals and even 10x larger models 🤯🔥 Here are the intuitions on it 1/ Start with the strongest pretrained model you can find:…
Tweet media one
25
184
892
@Thom_Wolf
Thomas Wolf
1 year
So this week we've finally released 💫 StarCoder () with @BigCodeProject StarCoder is the first large model (15B) which is both high performance (beating the likes of PaLM, LLaMa, CodeGen or OpenAI code-cushman-001 on code generation) and also trained…
Tweet media one
29
170
871
@Thom_Wolf
Thomas Wolf
2 years
Super happy the kindle version of our book is finally out🔥 & paper versions are being printed as I speak 🤩 We made a homepage with updated news at And we’ve open sourced all the code of the book (it's a lot!): #transformersbook
Tweet media one
20
163
867
@Thom_Wolf
Thomas Wolf
1 year
There are completely mind-blowing examples in the GPT4 "sparks of AGI" study
Tweet media one
29
102
846
@Thom_Wolf
Thomas Wolf
5 years
PT-BERT 0.5 out💥 Pretty big release w. not 1 but TWO new pretrained models: -classic: OpenAI's GPT -brand-new: Transformer-XL by Google/CMU As always both should be super easy to use So...BERT now stands for Big-&-Extending-Repository-of-Transformers😅 Happy Transfer Learning!
Tweet media one
7
239
813
@Thom_Wolf
Thomas Wolf
1 year
two years ago the entry point to participate in AI research was to have a couple of GPUs for training or finetuning now the entry level is to be able to regularly train a 50-70B params model on a couple hundred billion tokens
47
81
803
@Thom_Wolf
Thomas Wolf
3 years
I’m not here to solve AGI in 5 years whatever it might be. I’ve read enough AI & neuroscience. I know we’re still years away I’m here to make the research communities healthier, fighting for more collaboration and sharing The journey is even more important than the destination
13
68
809
@Thom_Wolf
Thomas Wolf
4 years
So I've made a new multimodal ML coding exercise & I'm so excited about it that I want to blog/share it w. everyone... but I can't because then it won't be a hiring test anymore 😭 🙃 ... please apply to join @huggingface so I can share it with you! End-result of the ML test 👇
22
126
778
@Thom_Wolf
Thomas Wolf
4 years
Some AI directions: -robustness/comm. sense: SOTA models should work in real-life! -few-shot learning: 20 examples should be enough! -continual learning: GPT3 should know about COVID! -explainability: why should I trust this model? -efficiency: do I need 600B params/2B tokens?
27
116
758
@Thom_Wolf
Thomas Wolf
1 month
this 30-min-read blog post on how to craft and generate a 25B+ tokens synthetic text dataset distills more information and alphas than a typical NeurIPS best paper
Tweet media one
5
113
753
@Thom_Wolf
Thomas Wolf
3 months
We've just open-sourced two tools we use for large-scale data processing and large-scale model trainings: - datatrove – all things webscale data processing: deduplication, filtering, tokenization – - nanotron – all things 3D parallelism: lightweight and…
6
138
748
@Thom_Wolf
Thomas Wolf
11 months
The license of the Falcon 40B model has just been changed to… Apache-2 which means that this model is now free for any usage including commercial use (and same for the 7B) 🎉
@Thom_Wolf
Thomas Wolf
11 months
LLaMa is dethroned 👑 A brand new LLM is topping the Open Leaderboard: Falcon 40B 🛩 *interesting* specs: - tuned for efficient inference - licence similar to Unity allowing commercial use - strong performances - high-quality dataset also released Check the authors' thread 👇
Tweet media one
16
121
621
15
146
742
@Thom_Wolf
Thomas Wolf
4 years
T5 is now officially included in 🤗Transformers v2.7.0 thanks to our joint work with @colinraffel & @PatrickPlaten A powerful encoder-decoder by @GoogleAI which natively handles many NLP tasks as text-to-text tasks Just ask it to "Translate" or "Summarize" and enjoy the result!
Tweet media one
12
164
676
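Text-to-text in practice (a sketch with the current API): prefix the task, get text back:

```python
# Text-to-text sketch with the current API: prefix the task, get text back.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

ids = tok("translate English to German: The house is wonderful.", return_tensors="pt").input_ids
print(tok.decode(t5.generate(ids, max_new_tokens=40)[0], skip_special_tokens=True))

ids = tok("summarize: " + "T5 treats many NLP tasks as text-to-text problems. " * 5,
          return_tensors="pt").input_ids
print(tok.decode(t5.generate(ids, max_new_tokens=40)[0], skip_special_tokens=True))
```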
@Thom_Wolf
Thomas Wolf
4 months
intern applicants: i'm actually very likely more interested in your crazy little unfinished side project or nerdy interest than your gpa – you can proudly show them!
20
48
661
@Thom_Wolf
Thomas Wolf
6 years
Things to keep in mind when reading research papers: -papers are biased towards using complex models -papers from well-funded labs are biased towards using the biggest datasets on the biggest machines -papers from high-profile orgs don't have inherently better ideas (but more PR)
5
174
640
@Thom_Wolf
Thomas Wolf
5 years
Here is an op-for-op @PyTorch re-implementation of @GoogleAI 's BERT model by @sanhestpasmoi , @timrault and I. We made a script to load Google's pre-trained models and it performs about the same as the TF implementation in our tests (see the readme). Enjoy!
11
200
649
@Thom_Wolf
Thomas Wolf
5 months
Google really did miss an opportunity to leapfrog both OpenAI and Meta and regain a place as 👑 of AI by making Gemini open source
32
62
628
@Thom_Wolf
Thomas Wolf
11 months
LLaMa is dethroned 👑 A brand new LLM is topping the Open Leaderboard: Falcon 40B 🛩 *interesting* specs: - tuned for efficient inference - licence similar to Unity allowing commercial use - strong performances - high-quality dataset also released Check the authors' thread 👇
Tweet media one
16
121
621
@Thom_Wolf
Thomas Wolf
3 months
You likely missed it if you only follow ML Twitter but there's a series of mind-blowing tech reports and open-source models coming from China (DeepSeek, MiniCPM, UltraFeedback...) with so many lessons learned and experiments openly shared together with models, data, etc. This…
51
98
563
@Thom_Wolf
Thomas Wolf
3 years
I'm doing a lot of "slow" science these days. Not following arxiv anymore, mostly reading long-form research works and textbooks in areas outside of my usual fields. It feels very good, I should do it more often
7
24
619
@Thom_Wolf
Thomas Wolf
10 months
What was going on with the Open LLM Leaderboard? Its numbers didn't match the ones reported in the LLaMA paper! We've decided to dive into this rabbit hole with friends from the LLaMA & Falcon teams and got back with a blog post of learnings & surprises:
Tweet media one
8
142
604
@Thom_Wolf
Thomas Wolf
4 years
Let me highlight this amazing work I've read recently on #compositionality in NLP, in which you'll find both: - a deep discussion of what it means for a neural model to be compositional - a deep and insightful comparison of LSTM, ConvNet & Transformers! 👉
Tweet media one
4
137
573
@Thom_Wolf
Thomas Wolf
7 days
This take on the FineWeb release is one of the most interesting pieces of feedback and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!) Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
@edunov
Sergey Edunov
9 days
People seem to over-index on the 15T number after Llama 3. While the number matters, what is even more important is the quality and diversity of those tokens. If there was a good way to measure those, that would have been an impressive result to report.
1
9
114
17
127
818
@Thom_Wolf
Thomas Wolf
14 days
Time for the open-source AI robots revolution 🚀 We’ve been playing with a low-cost DJI robot controlled by 3 local open-source AI models (Whisper, Idefics2, Parler-TTS - all Apache2) & orchestrated by Dora-cs In comments a 250 lines code gist to build on top of it => enjoy!!
22
115
590
@Thom_Wolf
Thomas Wolf
6 months
There is a beautiful story that just happened in AI so let me share it for a lighter tone weekend post among all the doom stories in our AI field this week. It’s a story of people on three continents building and sharing in the open a new small efficient and state-of-the-art AI…
13
130
571
@Thom_Wolf
Thomas Wolf
21 days
If you didn’t follow all, the situation has dramatically changed on the arena of LLMs recently: - @AnthropicAI ’s Claude 3 opus is now the undefeated winner of all closed-source models (just look at this win-rate line!) - @cohere Command R+ is the new super strong leader of…
@lmsysorg
lmsys.org
21 days
Exciting news - the latest Arena results are out! @cohere 's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥 Big congrats to @cohere 's incredible work & valuable contribution…
Tweet media one
43
314
1K
13
107
574
@Thom_Wolf
Thomas Wolf
23 days
I was playing with a new robotics framework called Dora-rs today. A super impressive replacement for ROS (the Robot Operating System), for those who know, one of the pain points in lowering the entry barrier to robotics imo. Dora-rs is much much easier to install and fully integrate…
9
77
565
@Thom_Wolf
Thomas Wolf
4 years
Open-sourcing a community-focused library basically means you'll keep fighting with a bunch of well-intentioned people who want to morph your simple code into a cathedral of complex and smart abstractions. Writing easy-to-read, simple-to-use code is an under-rated skill.
9
71
560
@Thom_Wolf
Thomas Wolf
1 year
There is a fascinating recent trend of training *smaller models for longer* w.r.t. Chinchilla optimal predictions Best explanation I've seen of this? This new blog post by @harm_devries (with collaborators of the @BigCodeProject ): Clearly these are only…
Tweet media one
16
114
554
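A back-of-the-envelope illustration of "smaller models for longer", using Chinchilla's roughly-20-tokens-per-parameter rule of thumb and publicly reported token counts:

```python
# Back-of-the-envelope: Chinchilla's rule of thumb is ~20 training tokens per parameter.
params = 7e9                             # a 7B model
chinchilla_tokens = 20 * params          # ~140B tokens would be "compute-optimal"
llama_tokens = 1e12                      # LLaMA-7B was trained on ~1T tokens

print(chinchilla_tokens / 1e9)           # 140.0 (billions of tokens)
print(llama_tokens / chinchilla_tokens)  # ~7x past the Chinchilla-optimal point
```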
@Thom_Wolf
Thomas Wolf
4 years
Developer path: -Started code at 11 -Work on laser-plasma @BerkeleyLab -PhD on quantum physics -Switch to IP law -European Patent Attorney -Discover Machine Learning at @iclr2017 -Open-source first ML library @huggingface -1M download, HF raises $15M, hires crazy talented people
@kvlly
Kelly Vaughn
4 years
Developer path: - Started coding @ 11 - First freelance client @ 14 - Graduated HS @ 17 (freelancing) - Made $56k freelancing @ 25 - Made $137k freelancing @ 26 - Started running an agency @ 27 - Agency did $223k @ 28 - Agency did $430k @ 29 - Aiming for $1 million this year!
116
302
4K
13
35
554
@Thom_Wolf
Thomas Wolf
5 years
🐣 New Tutorial, open-source code & demo! Building a SOTA Conversational AI with transfer learning & OpenAI GPT models -Code/pretrained model from our NeurIPS 2018 ConvAI2 competition model, SOTA on automatic track -Detailed Tutorial w. code -Cool demo 👇
10
168
550
@Thom_Wolf
Thomas Wolf
5 years
💥 NeuralCoref 4.0 is out! Blazing fast English coreference resolution w. SpaCy ☀️ Now on pip: pip install neuralcoref 🍃 Model 10x smaller 💫 Compatible w. SpaCy 2.1 🔗 We've also added a neat feature to incorporate Domain Knowledge Here is an example👇
Tweet media one
7
150
542
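A rough usage sketch, assuming NeuralCoref 4.x alongside spaCy 2.1:

```python
# Rough usage sketch, assuming NeuralCoref 4.x with spaCy 2.1 (pip install neuralcoref).
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)   # adds coreference resolution to the spaCy pipeline

doc = nlp("My sister has a dog. She loves him.")
print(doc._.has_coref)         # True
print(doc._.coref_clusters)    # e.g. [My sister: [My sister, She], a dog: [a dog, him]]
```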
@Thom_Wolf
Thomas Wolf
4 years
I'm thinking about adding very explicit and simple examples to 🤗transformers like this one👇 I like that it's only 45 lines of code but you see/control all the important steps (data processing, training, evaluation, HP search) plus the search gives you robust perf. wdyt?
Tweet media one
25
60
544
@Thom_Wolf
Thomas Wolf
1 year
I'm not gonna lie, the GPT4 just released is quite a bit less exciting than what I was expecting. No multimodal generation and a tech report carefully emptied of any useful info on the model/training/compute. I guess we're getting spoiled in today's AI world
30
32
523
@Thom_Wolf
Thomas Wolf
5 years
The new blog post of @m__dehghani is a very nice and visual introduction to Universal Transformers & their motivations: … I like a lot the reformulation of Graves' Adaptive Computation Time as a dynamic recurrence in depth. Feels like a general idea.
0
146
520
@Thom_Wolf
Thomas Wolf
5 years
A question I get from time to time is how to convert a pretrained TensorFlow model in PyTorch easily and reliably. We're starting to be quite familiar with the process so I've written a short blog post summarizing our workflow and some lessons learned 👇
8
123
520
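The gist of such a conversion, as a sketch rather than the post's exact workflow: list the TF checkpoint variables, then copy them into the PyTorch module under your own name mapping (the checkpoint path and layer names below are hypothetical):

```python
# Sketch of a TF -> PyTorch conversion; checkpoint path and layer names are hypothetical.
import numpy as np
import tensorflow as tf
import torch

ckpt = "/path/to/tf_checkpoint/model.ckpt"
tf_weights = {name: tf.train.load_variable(ckpt, name)
              for name, _shape in tf.train.list_variables(ckpt)}

def to_torch(name: str) -> torch.Tensor:
    arr = np.asarray(tf_weights[name])
    if name.endswith("kernel"):           # TF dense kernels are transposed vs nn.Linear
        arr = np.ascontiguousarray(arr.T)
    return torch.from_numpy(arr)

# e.g. pytorch_layer.weight.data.copy_(to_torch("layer_0/dense/kernel"))
```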
@Thom_Wolf
Thomas Wolf
5 years
New release of Transformers repo is shaping up & I'm very excited! Gifts for all: -SOTA Lovers: new XLNet & XLM archi + 6 new Bert/GPT trained chkpt -Research Lovers: unified model API, attention/hidden-state outputs to swap/study models -Speed Lovers: Torchscript & head pruning!
Tweet media one
4
114
517
@Thom_Wolf
Thomas Wolf
5 months
So many AI founders I meet have ideas that are frustratingly small. like the web just got invented and people are like “web consultancy gonna be the biggest thing ever” Go build something bold!
47
63
512
@Thom_Wolf
Thomas Wolf
4 years
The most enjoyable papers usually fall into two buckets: - the summer internship crazy projects => short ambitious open-ended - the slow & carefully crafted 1-year long projects => read like a novel w. highs, lows and teachings None of these are triggered by conference deadlines
4
47
505
@Thom_Wolf
Thomas Wolf
4 years
This is deeply wrong Predicting students future grades from school history + past grades *and* making automatic decisions based on them is the exact example of a system that should *not* be deployed And the fact that it already determined the fate of 170k students is horrifying
@ExplainableNL
explAInable.NL
4 years
International Baccalaureate program uses prediction algorithm to replace high-stakes exam. So much wrong with this; not just black box, but worse: deep misunderstandings about what you can expect and allow from ML by decision makers (incl at colleges).
5
27
104
18
134
503
@Thom_Wolf
Thomas Wolf
29 days
Little known OSS gem: the Open-source Cookbook A collection of notebooks for building practical AI applications using open-source tools and models: Doc: Currently contains 16 notebooks in English (with some in Chinese as well):…
Tweet media one
1
97
497
@Thom_Wolf
Thomas Wolf
5 years
I needed a good GAN to tweak for a CV+NLP project so here is a *pretrained* version of BigGAN in PyTorch Sweet stuff: -AFAIK code is not public yet so here is an op-for-op implem to read/tweak -Checkpoints 2x smaller (no dead vars) -Print images in terminal (icing on the cake)👇
Tweet media one
10
104
486
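A usage sketch; the package and helper names below are the ones I recall from that release and should be treated as assumptions:

```python
# Usage sketch; package and helper names are assumptions based on the release.
import torch
from pytorch_pretrained_biggan import BigGAN, one_hot_from_names, truncated_noise_sample

model = BigGAN.from_pretrained("biggan-deep-256")
truncation = 0.4
class_vec = torch.from_numpy(one_hot_from_names(["coffee", "mushroom"], batch_size=2))
noise_vec = torch.from_numpy(truncated_noise_sample(truncation=truncation, batch_size=2))

with torch.no_grad():
    images = model(noise_vec, class_vec, truncation)  # (2, 3, 256, 256) tensors in [-1, 1]
```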
@Thom_Wolf
Thomas Wolf
2 years
I really miss the days I was creating the transformers library and then creating the datasets library. My resolution for 2022: free (a lot of) time to code again. Coding is really the fun part of our work
17
15
484
@Thom_Wolf
Thomas Wolf
6 years
I wrote a post on how you can make your Python NLP module 50-100 times faster! Bonus: a Jupyter notebook with examples processing over 80 million words per sec… Spoiler: use spaCy's internals and a bit of Cython magic Hat tips @honnibal @_inesmontani
6
150
483
@Thom_Wolf
Thomas Wolf
5 months
As a non-CS person, what an honor to sit among the authors of these 4 awarded papers at NeurIPS 2023 (out of 13k papers submitted 🤯) I was only an enabler, all props should go to the amazing @Muennighoff (starting soon grad school...) as well as @srush_nlp @boazbaraktcs
Tweet media one
12
33
470
@Thom_Wolf
Thomas Wolf
1 year
This is crazy! I still remember when I started coding Transformers with Victor and Tim one cold night of October 2018 in Bruxelles after attending the EMNLP conference and its social event at the Royal Museums of Fine Arts And today, 5 years after, Hugging Face Transformers is…
Tweet media one
14
41
465
@Thom_Wolf
Thomas Wolf
2 years
“Move slow and build things”
7
66
469
@Thom_Wolf
Thomas Wolf
5 years
We've spent a few evenings last week building an interactive demo called *Write with Transformer* It lets you interact in a very intimate way with GPT-2, call, control, question the model... and I just can't stop playing with it! You can try it at
@julien_c
Julien Chaumond
5 years
At NAACL last week we built a new side project, Write With Transformer. It lets you trigger GPT-2 completions multiple times, in a Google Doc-like interface. 🦄 It's like having a unicorn friend that completes your thoughts 🦄 cc @gdb @AlecRad Try it:
16
161
474
13
130
464
@Thom_Wolf
Thomas Wolf
5 years
I've added FP16 training to our PyTorch BERT repo to easily fine-tune BERT-large on GPU. The repo has become a showcase of all the tools you can use to train huge NNs 🙂 Got >91 F1 on SQuAD training BERT-large a few hours on 4-GPUs. Should take less than a day on 1-(recent)-GPU
Tweet media one
8
115
457
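The modern equivalent of that FP16 setup is a few lines of torch.cuda.amp (the 2019 repo used NVIDIA apex); a toy, self-contained sketch:

```python
# Mixed-precision training with torch.cuda.amp (modern equivalent of the apex FP16 setup).
import torch
import torch.nn as nn

model = nn.Linear(512, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 512).cuda(), torch.randint(0, 2, (8,)).cuda()) for _ in range(4)]

scaler = torch.cuda.amp.GradScaler()
for inputs, labels in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass in half precision where safe
        loss = criterion(model(inputs), labels)
    scaler.scale(loss).backward()          # loss scaling avoids FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```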
@Thom_Wolf
Thomas Wolf
2 years
there is a scary possibility that we may solve all the benchmarks we come up with for AI... without understanding anything fundamentally deep about what intelligence is. A bummer for those like me who see AI as a fantastic way to unlock deeper insights on human intelligence
35
44
454
@Thom_Wolf
Thomas Wolf
4 years
Quarantine had me cancel many side projects & go back to the roots of what I enjoy doing: science and building software that spark joy ...and back to the roots of when I used to do it: these precious first hours of the night, when everybody‘s asleep & the night is still young
Tweet media one
8
17
441
@Thom_Wolf
Thomas Wolf
2 years
Awesome News! Due to the popularity of our book "NLP with Transformers", @OReillyMedia has decided to print it in **full color** from now on in the revised edition. I've just got the first printed copies and the result is 🤩 More info at
Tweet media one
17
46
442
@Thom_Wolf
Thomas Wolf
2 years
Building a library with a strong C++ (or Rust) backend coupled with a carefully designed Python front is such a magically powerful combination Feels like creating pure super-powers
16
28
438
@Thom_Wolf
Thomas Wolf
1 year
realized yesterday that if a closed-source AI company had invented flash attention nobody would know about it – and this makes me sad for the current state of AI knowledge sharing. Many cool AI algorithmic improvements are probably already being kept behind closed doors
Tweet media one
5
62
428
@Thom_Wolf
Thomas Wolf
5 years
BERT Rediscovers the Classical NLP Pipeline by I. Tenney, D. Das & E. Pavlick is 4 pages of great insights. Such a constant source of fascinating papers from Ellie Pavlick & her collaborators! Here's BERT correcting its prediction along the model depth🤯
Tweet media one
1
91
428
@Thom_Wolf
Thomas Wolf
11 months
Nobody's been talking about it but it's rather *mind-blowing* imo that the open-source Falcon 40B model is topping LLaMa 65B on leaderboards and many evals while having required not even half the compute of LLaMa to train from scratch 🤯 Quick back-of-the-envelope calculations: -…
9
64
429
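The usual approximation is training compute ≈ 6 × parameters × tokens; with the publicly reported token counts (used here purely for illustration):

```python
# Usual approximation: training compute ≈ 6 * parameters * tokens.
# Token counts below are the publicly reported ones, used purely for illustration.
falcon_flops = 6 * 40e9 * 1.0e12      # Falcon-40B, ~1T tokens  -> ~2.4e23 FLOPs
llama_flops  = 6 * 65e9 * 1.4e12      # LLaMA-65B, ~1.4T tokens -> ~5.5e23 FLOPs
print(falcon_flops / llama_flops)     # ~0.44, i.e. well under half the compute
```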
@Thom_Wolf
Thomas Wolf
4 months
Some predictions for 2024 – keeping only the more controversial ones. You certainly saw the non-controversial ones (multimodality, etc) already 1. At least 10 new unicorn companies building SOTA open foundation models in 2024 Stars are so aligned: - a smart, small and dedicated…
19
77
411
@Thom_Wolf
Thomas Wolf
4 years
I liked the LSH attention in the reformer Sparse, efficient, simple Dynamic sparse attn is fascinating & mostly dealt by – softmax+topK: Recurrent Independent Mech. (MILA) Product-Key Mem (FB) – 𝛂-entmax: Adap. Sparse Transformer (DeepSPIN) links👇[1/3]
Tweet media one
4
97
413
@Thom_Wolf
Thomas Wolf
4 years
If you're using Transformers from source, we've rolled out 2 nice beta features (TBR in January) 💥Ultra-fast Bert/GPT2 tokenizers (up to 80x faster) 🦄Easy/versatile sequence generation for generative models: top-k/nucleus/temperature sampling, penalized/greedy, beam search...
Tweet media one
9
85
405
@Thom_Wolf
Thomas Wolf
4 years
People asking me to teach classes clearly give zero fuck to the imposter syndrome of a former physics PhD turned lawyer before joining AI Anyway I'll co-teach NLPL Winter School w Yoav Goldberg talking transfer learning, its limits & where the field might head Will share slides
11
19
408
@Thom_Wolf
Thomas Wolf
3 years
A few years ago I was mostly interested in models, creating 🤗transformers, adding BERT, GPT, T5… Over time I’ve seen my interests shift to data (sharing, evaluation, processing) leading to 🤗datasets And I see many people around me follow a similar path We are slowly maturing
@math_rachel
Rachel Thomas
3 years
An overall lack of recognition for the invisible, arduous, & taken-for-granted data work in AI leads to poor data practices, resulting in data cascades (negative, downstream events)... “Everyone wants to do the model work, not the data work” 1/
Tweet media one
18
229
778
10
61
404
@Thom_Wolf
Thomas Wolf
2 years
large language models are slightly boring
29
26
393
@Thom_Wolf
Thomas Wolf
2 years
Interesting podcast investigating what happened in 1984 that made so many women give up on computer science
Tweet media one
8
89
384
@Thom_Wolf
Thomas Wolf
5 years
I've been playing a bit w. the coming TensorFlow 2.0 & I was pleasantly surprised! In a couple of hours, I could convert our PyTorch Bert to TF2.0 and a few hours later, load pretrained weights w. a clean interface. Still a few rough edges but a huge step forward in terms of UX
Tweet media one
8
71
383
@Thom_Wolf
Thomas Wolf
5 years
there, you have them on one slide
Tweet media one
2
92
383
@Thom_Wolf
Thomas Wolf
3 years
ML researchers: we don’t really need to learn about biology or developmental psychology because planes don’t fly like birds anyway Also ML researchers: “this new algorithm takes inspiration from [our layman view of] how XXX happens in humans” 🙃
12
35
380
@Thom_Wolf
Thomas Wolf
3 years
Starting a big project takes a lot more time & energy than people expect. I've been pushing mostly one project per year: -2019 🤗Transformers -2020 🤗Datasets -2021 @BigscienceW I used to find it frustratingly slow – now I accept it Give your projects the time they need to grow
8
25
377
@Thom_Wolf
Thomas Wolf
4 years
Our amazing @julien_c has done an impressive job revamping the frontpage last week. You can now browse all the open-access models (1500+), datasets (120+), and NLP metrics (10+) of our libraries with a nice interface & a cool quick search! Give it a try👇
@mrm8488
Manu Romero
4 years
Love the new look and feel of @huggingface They added all in one search (models, datasets, metrics) and everything is quite intuitive.
0
5
32
5
86
378
@Thom_Wolf
Thomas Wolf
2 years
In 2021, we've seen an explosion of grounded Language Models: from "image/video+text" to more embodied models in "simulation+text" But, when tested on text benchmarks, these grounded models really struggle to improve over pure text-LMs e.g. T5/GPT3 Why? >>
10
79
374
@Thom_Wolf
Thomas Wolf
5 years
. @kaushal316 wrote a nice step-by-step tutorial on how to finetune BERT on a classification task ( @kaggle Toxic Challenge) Covers everything from data processing to model modification Results are top-10% w. a very simple 30-lines-of-code single model 👇
1
106
367
@Thom_Wolf
Thomas Wolf
2 years
We've open-sourced a new experiment: the alpha version of a library called "🎢 simulate" It's a python lib for building a diverse set of simulation environments for embodied and synthetic data research by reusing/tweaking/sharing assets and scenes
6
74
368
@Thom_Wolf
Thomas Wolf
6 years
NeuralCoref v3.0 is out✨! - up to 100x faster than v2.0 (thanks Cython) 🚀 - Integrated in spaCy models and pipeline 🤗 + 💫 = 💙 - Based on the fast neural net model by @stanfordnlp , trained in @PyTorch Check it out: Cc @spacy_io
Tweet media one
5
128
365
@Thom_Wolf
Thomas Wolf
8 months
Crazy how 34-billion-parameter models seemed huge and unmanageable outside of a data center just maybe 1.5 years ago. Now it’s laptop stuff
@ggerganov
Georgi Gerganov
8 months
Full F16 precision 34B Code Llama at >20 t/s on M2 Ultra
40
270
2K
8
39
366
@Thom_Wolf
Thomas Wolf
2 years
ml twitter is dead publishing our internship description videos on tiktok
7
22
353
@Thom_Wolf
Thomas Wolf
5 years
A fascinating article by @lena_voita if you're interested in understanding what makes MLM models like BERT different from LM models like GPT/GPT-2 (auto-regressive) and MT models. And conveyed in such a beautiful blog post, a masterpiece of knowledge sharing!
@lena_voita
Lena Voita
5 years
Evolution of Representations in the Transformer: blog post on our @emnlp2019 paper is out! blog post: paper: @lena_voita , @RicoSennrich , @iatitov
Tweet media one
5
156
616
3
110
362
@Thom_Wolf
Thomas Wolf
5 years
🥳 The Transformers library is turning 1⃣ today 🎂 What a ride! - 16k+ stars on Github - 160+ contributors and the most amazing features are still to come, I'm so excited about what's next Here is Megan's humbling shout out to it in her keynote at TensorFlow World yesterday 😍
Tweet media one
10
55
350
@Thom_Wolf
Thomas Wolf
6 months
one of the most striking events of 2023: the rise of the GPU-poor models. You can do a lot (and much cheaper) with well-trained smaller models
Tweet media one
17
51
349
@Thom_Wolf
Thomas Wolf
3 years
Reading 50 years old AI books I always find cute how people back then thought they were close to figuring out AI comparing LISP/PLANNER programs to brain/consciousness and then I look at us now & can already feel the tender looks of 2040's AI researchers on our current DL models
4
40
348