Thomas Wolf

@Thom_Wolf

68,313 Followers · 4,341 Following · 318 Media · 3,572 Statuses

Co-founder and CSO @HuggingFace - open-source and open-science

Joined February 2011
@Thom_Wolf
Thomas Wolf
2 years
we need a billionaire bullish enough to spend 13b per year on nuclear fusion like zuck is doing on the metaverse
75
207
2K
@Thom_Wolf
Thomas Wolf
5 years
🔥Pytorch-Transformers 1.0🔥 Six NLU/NLG architectures: BERT, GPT, GPT-2, Transfo-XL, XLNet, XLM Total: 27 pretrained models Still the same -Superfast onboarding -SOTA scripts: GLUE, SQuAD, Text generation New -Unified API -Access hidden-states, attentions... -Torchscript -...
Tweet media one
37
590
2K
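A minimal sketch of that unified API, written against today's `transformers` package (the renamed successor of pytorch-transformers); class and argument names below follow the current library, not the 2019 release:

```python
# Sketch only: today's `transformers` package (the renamed successor of
# pytorch-transformers); class/argument names follow the current API, not the 2019 one.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,   # expose every layer's hidden states
    output_attentions=True,      # expose the attention maps
)

inputs = tokenizer("Transfer learning is fun.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(len(outputs.hidden_states))   # 13 tensors: embeddings + 12 encoder layers
print(outputs.attentions[0].shape)  # (batch, heads, seq_len, seq_len)
```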
@Thom_Wolf
Thomas Wolf
3 years
Authors have no say on the animal O'Reilly chooses for the cover of their book. But I'm really happy that they chose a parrot🦜 for the cover of the book on Transformers we are finalising with Lewis and Leandro. It's a Coconut Lorikeet parrot (a very stochastic Coconut Lorikeet😉)
Tweet media one
30
200
2K
@Thom_Wolf
Thomas Wolf
9 days
Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes?? Here comes the first release of 🍷Fineweb. A high-quality, large-scale filtered web dataset outperforming all current datasets of its scale. We trained 200+ ablation…
@gui_penedo
Guilherme Penedo
9 days
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Tweet media one
38
332
1K
24
301
2K
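A sketch of how you might stream FineWeb with the `datasets` library; the hub id "HuggingFaceFW/fineweb" and the "text" field below are assumptions based on the release announcement:

```python
# Sketch: streaming FineWeb with `datasets`. The hub id "HuggingFaceFW/fineweb"
# and the "text" field are assumptions based on the release announcement.
from datasets import load_dataset

fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
for i, doc in enumerate(fw):
    print(doc["text"][:200])  # each record is one filtered web document
    if i == 2:
        break
```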
@Thom_Wolf
Thomas Wolf
2 years
I read a lot of books this year to broaden my horizons in AI/ML with adjacent or complementary disciplines. It was a great pleasure so I'm sharing some of my reading list here with a couple of notes: [1/12]
23
290
2K
@Thom_Wolf
Thomas Wolf
5 years
🤗Transformers 2.0💥 State-of-the-art NLP in TensorFlow 2.0/PyTorch 8 architectures 33 trained models 102 lang. Seamlessly pick the right framework for training, eval, deploy Train on TPU ⏩ finetune/test in PyTorch ⏩ serve w. TFX 🍒Keras magic: train SOTA model in 10 lines👇
Tweet media one
19
401
2K
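A rough sketch of the Keras-side workflow using current library names (the exact 2.0-era calls differed); the model and dataset choices here are purely illustrative:

```python
# Rough sketch of the Keras-side workflow with current library names
# (the exact 2.0-era calls differed); model/dataset choices are illustrative only.
import numpy as np
import tensorflow as tf
from datasets import load_dataset
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

ds = load_dataset("glue", "sst2", split="train[:2000]")
features = dict(tokenizer(ds["sentence"], padding=True, truncation=True, return_tensors="np"))
labels = np.array(ds["label"])

model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))  # uses the model's built-in loss
model.fit(features, labels, epochs=1, batch_size=16)
```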
@Thom_Wolf
Thomas Wolf
4 years
Surviving every AI wave, two kernels have consistently been the beating hearts of Natural Language Processing: Datasets and Metrics Today we release "nlp", a library to easily share & load data/metrics already providing access to 99+ datasets! Try it👉
Tweet media one
17
408
2K
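A quick taste of the library; note that `nlp` was later renamed `datasets`, and the snippet below uses the current name:

```python
# Quick taste of the library. Note: `nlp` was later renamed `datasets`; the snippet
# uses the current name (metrics have since moved to the separate `evaluate` package).
from datasets import load_dataset

squad = load_dataset("squad", split="validation")
print(squad[0]["question"])
print(squad.features)  # typed columns, memory-mapped on disk
```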
@Thom_Wolf
Thomas Wolf
2 months
playing with a basic, fully-local and open-source speech-to-text-to-speech pipeline on my mac. Less than 120 lines of code to chain local whisper + Zephyr (in LM Studio) + an OpenVoice TTS … latency is 1.5-2.5 sec on an M3. already quite impressed how all…
31
144
1K
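A heavily simplified sketch of such a loop: local Whisper for speech-to-text, a local LLM behind LM Studio's OpenAI-compatible server (default localhost:1234), and a TTS step left as a placeholder since OpenVoice's exact API is not shown here:

```python
# Heavily simplified sketch: local Whisper -> local LLM via LM Studio's
# OpenAI-compatible server -> a TTS step left as a placeholder.
import requests
import whisper  # pip install openai-whisper

stt = whisper.load_model("base")

def listen(path: str) -> str:
    return stt.transcribe(path)["text"]

def think(prompt: str) -> str:
    r = requests.post(
        "http://localhost:1234/v1/chat/completions",   # LM Studio local server
        json={"messages": [{"role": "user", "content": prompt}], "temperature": 0.7},
        timeout=60,
    )
    return r.json()["choices"][0]["message"]["content"]

def speak(text: str) -> None:
    ...  # hypothetical: call your local TTS (e.g. OpenVoice) here

speak(think(listen("question.wav")))
```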
@Thom_Wolf
Thomas Wolf
1 month
[75min talk] i finally recorded this lecture I gave two weeks ago because people kept asking me for a video so here it is, enjoy "The Little guide to building Large Language Models in 2024" tried to keep it short and comprehensive – focusing on concepts that are crucial for…
Tweet media one
12
243
1K
@Thom_Wolf
Thomas Wolf
2 years
Just received my first physical copy of our book & the feeling is... surreal. One and a half years in the making & I'm amazingly proud of the result. It covers so much ground, from NLP w/o labels up to training billion-param models, multilinguality, pruning, classif, generation..
Tweet media one
54
130
1K
@Thom_Wolf
Thomas Wolf
3 years
A few things not a lot of people know about HuggingFace: -🤗is a very small team, less than 30 -🤗transformers GH stars are growing faster than legends like PyTorch, will probably pass it in 2021 -Open-source/-science is even more 🤗DNA than ppl think -🤗is cash-flow positive today
27
92
1K
@Thom_Wolf
Thomas Wolf
5 years
Currently working on the coming NAACL "Transfer Learning in NLP" tutorial with @seb_ruder @mattthemathman and @swabhz . Pretty excited! And I've discovered you can write a Transformer model like GPT-2 in less than 40 lines of code now! 40 lines of code & 40 GB of data...
Tweet media one
15
279
1K
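For flavor, here is a compact GPT-style decoder in plain PyTorch, a sketch in the spirit of the "~40 lines" claim rather than the tutorial's actual code:

```python
# A compact GPT-style decoder in plain PyTorch -- a sketch, not the tutorial's code.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d, heads):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    def __init__(self, vocab=50257, d=256, heads=8, layers=4, ctx=128):
        super().__init__()
        self.tok, self.pos = nn.Embedding(vocab, d), nn.Embedding(ctx, d)
        self.blocks = nn.Sequential(*[Block(d, heads) for _ in range(layers)])
        self.head = nn.Linear(d, vocab, bias=False)

    def forward(self, idx):
        x = self.tok(idx) + self.pos(torch.arange(idx.size(1), device=idx.device))
        return self.head(self.blocks(x))  # next-token logits

logits = TinyGPT()(torch.randint(0, 50257, (1, 16)))
```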
@Thom_Wolf
Thomas Wolf
6 years
I've spent most of 2018 training models that could barely fit 1-4 samples/GPU. But SGD usually needs more than a few samples/batch for decent results. I wrote a post gathering practical tips I use, from simple tricks to multi-GPU code & distributed setups:
Tweet media one
11
370
1K
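The simplest trick from that post is gradient accumulation; a toy, self-contained sketch:

```python
# Gradient accumulation: simulate a batch of 64 when only micro-batches of 4
# fit in GPU memory (toy, self-contained example).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(32)]

accumulation_steps = 16          # 16 micro-batches of 4 -> effective batch of 64
optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    loss = criterion(model(inputs), labels)
    (loss / accumulation_steps).backward()  # scale so accumulated grads average, not sum
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```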
@Thom_Wolf
Thomas Wolf
5 years
With 180+ papers mentioning 🤗 Transformers and its predecessors, it was high time to put out a real paper that people could cite. 🥳 🎉 With @LysandreJik @SanhEstPasMoi @julien_c @ClementDelangue @moi_anthony @pierrci @remilouf @MorganFunto @jamieabrew
Tweet media one
11
269
1K
@Thom_Wolf
Thomas Wolf
4 years
They are loosely connected to what I’m working on these days but these three books are still very clearly the most enjoyable read I’ve had since I joined the field. What a pleasure it was to read them!
Tweet media one
21
107
1K
@Thom_Wolf
Thomas Wolf
4 months
The progressive rise of open (source/access) AI models back from the ashes in 2023. This will be remembered as one of the most remarkable changes in the AI field this year
Tweet media one
17
194
906
@Thom_Wolf
Thomas Wolf
4 years
There is a bit of magic in the new 🤗nlp library besides giving dead-simple access to 120+ datasets🧙‍♂️ We've tested it with @qlhoest and loading a 17GB+ dataset like English Wikipedia only takes... 9MB in RAM🐣 And you can iterate over the data at 2-3 Gbit/s🚀 Try it yourself👇
Tweet media one
12
223
1K
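What that memory-mapping looks like from the user side, sketched with today's `datasets` (the successor of `nlp`); "20200501.en" is the 2020-era English Wikipedia config mentioned above:

```python
# Sketch of the memory-mapped behaviour with today's `datasets` (successor of `nlp`).
from datasets import load_dataset

wiki = load_dataset("wikipedia", "20200501.en", split="train")
# The Arrow files stay on disk and are memory-mapped, so RAM usage remains tiny
# even though the dataset itself is tens of GB.
for article in wiki.select(range(3)):
    print(article["title"])
```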
@Thom_Wolf
Thomas Wolf
5 years
I’m working on a series of mini tutorials for a wider NLP audience. 🤗 Transformers can be intimidating and I’d like to show that you can get ~SOTA results in 10 lines of code on tasks such as text/tokens/words classification, question answering, maybe generation. Other topics?
66
104
1K
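The "10 lines" spirit, sketched with today's pipeline API (not the tutorial itself):

```python
# A few-lines sketch with the current pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
qa = pipeline("question-answering")

print(classifier("I love how simple this is!"))
print(qa(question="Who created the library?",
         context="The transformers library was created at Hugging Face."))
```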
@Thom_Wolf
Thomas Wolf
5 years
Pytorch-bert v0.6 is out with OpenAI's pretrained GPT-2 🦄 small model & the usual accompanying example scripts to use it. Now... can you guys wait until the ACL deadline has passed before releasing any crazy new transformer? 😅 Thanks, you are the best! 🥰
Tweet media one
18
262
971
@Thom_Wolf
Thomas Wolf
5 years
Interesting developments happened in 2018/2019 for natural language generation decoding algorithms: here's a thread with some papers & code So, the two most common decoders for language generation used to be greedy-decoding (GD) and beam-search (BS). [1/9]
12
292
954
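Those decoders map to `generate()` flags in today's transformers; a sketch (the thread itself predates this API):

```python
# The decoders from the thread, as generate() flags in today's transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Interesting developments happened in", return_tensors="pt").input_ids

greedy = model.generate(ids, max_new_tokens=20, do_sample=False)   # greedy decoding (GD)
beam = model.generate(ids, max_new_tokens=20, num_beams=5)         # beam search (BS)
sampled = model.generate(ids, max_new_tokens=20, do_sample=True,
                         top_k=50, top_p=0.95, temperature=0.8)    # top-k / nucleus sampling
print(tok.decode(sampled[0], skip_special_tokens=True))
```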
@Thom_Wolf
Thomas Wolf
4 years
I often meet research scientists interested in open-sourcing their code/research and asking for advice. Here is a thread for you. First: why should you open-source models along with your paper? Because science is a virtuous circle of knowledge sharing not a zero-sum competition
Tweet media one
2
284
922
@Thom_Wolf
Thomas Wolf
5 years
Our PyTorch BERT is on pip! I took extra care to make it both easy to use and modular. Uses @ai2_allennlp file caching technique to download/cache/load Google's pretrained models Includes 6 PyTorch models with various architectures, tokenizer & optimizer 👉
Tweet media one
7
277
911
@Thom_Wolf
Thomas Wolf
6 months
Over the past weeks the H4 team has been busy pushing the Zephyr 7B model to new heights 🗻 The new version is now topping all 7b models on chat evals and even 10x larger models 🤯🔥 Here are the intuitions on it 1/ Start with the strongest pretrained model you can find:…
Tweet media one
25
184
892
@Thom_Wolf
Thomas Wolf
1 year
So this week we've finally released 💫 StarCoder () with @BigCodeProject StarCoder is the first large model (15B) which is both high performance (beating the likes of PaLM, LLaMa, CodeGen or OpenAI code-cushman-001 on code generation) and also trained…
Tweet media one
29
170
871
@Thom_Wolf
Thomas Wolf
2 years
Super happy the kindle version of our book is finally out🔥 & paper versions are being printed as I speak 🤩 We made a homepage with updated news at And we’ve open sourced all the code of the book (it's a lot!): #transformersbook
Tweet media one
20
163
867
@Thom_Wolf
Thomas Wolf
1 year
There are completely mind-blowing examples in the GPT4 "sparks of AGI" study
Tweet media one
29
102
846
@Thom_Wolf
Thomas Wolf
5 years
PT-BERT 0.5 out💥 Pretty big release w. not 1 but TWO new pretrained models: -classic: OpenAI's GPT -brand-new: Transformer-XL by Google/CMU As always both should be super easy to use So...BERT now stands for Big-&-Extending-Repository-of-Transformers😅 Happy Transfer Learning!
Tweet media one
7
239
813
@Thom_Wolf
Thomas Wolf
1 year
two years ago the entry point to participate in AI research was to have a couple of GPUs for training or finetuning now the entry level is to be able to regularly train a 50-70B params model on a couple hundred billion tokens
47
81
803
@Thom_Wolf
Thomas Wolf
3 years
I’m not here to solve AGI in 5 years whatever it might be. I’ve read enough AI & neuroscience. I know we’re still years away I’m here to make the research communities healthier, fighting for more collaboration and sharing The journey is even more important than the destination
13
68
809
@Thom_Wolf
Thomas Wolf
4 years
So I've made a new multimodal ML coding exercise & I'm so excited about it that I want to blog/share it w. everyone... but I can't because then it won't be a hiring test anymore 😭 🙃 ... please apply to join @huggingface so I can share it with you! End-result of the ML test 👇
22
126
778
@Thom_Wolf
Thomas Wolf
4 years
Some AI directions: -robustness/comm. sense: SOTA models should work in real-life! -few-shot learning: 20 examples should be enough! -continual learning: GPT3 should know about COVID! -explainability: why should I trust this model? -efficiency: do I need 600B params/2B tokens?
27
116
758
@Thom_Wolf
Thomas Wolf
1 month
this 30-min-read blog post on how to craft and generate a 25B+ tokens synthetic text dataset distills more information and alphas than a typical NeurIPS best paper
Tweet media one
5
113
753
@Thom_Wolf
Thomas Wolf
3 months
We've just open-sourced two tools we use for large-scale data processing and large-scale model trainings: - datatrove – all things webscale data processing: deduplication, filtering, tokenization – - nanotron – all things 3D parallelism: lightweight and…
6
138
748
@Thom_Wolf
Thomas Wolf
11 months
The license of the Falcon 40B model has just been changed to… Apache-2 which means that this model is now free for any usage including commercial use (and same for the 7B) 🎉
@Thom_Wolf
Thomas Wolf
11 months
LLaMa is dethroned 👑 A brand new LLM is topping the Open Leaderboard: Falcon 40B 🛩 *interesting* specs: - tuned for efficient inference - licence similar to Unity allowing commercial use - strong performances - high-quality dataset also released Check the authors' thread 👇
Tweet media one
16
121
621
15
146
742
@Thom_Wolf
Thomas Wolf
4 years
T5 is now officially included in 🤗Transformers v2.7.0 thanks to our joint work with @colinraffel & @PatrickPlaten A powerful encoder-decoder by @GoogleAI which natively handles many NLP tasks as text-to-text tasks Just ask it to "Translate" or "Summarize" and enjoy the result!
Tweet media one
12
164
676
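Text-to-text in practice (a sketch with the current API): prefix the task, get text back:

```python
# Text-to-text sketch with the current API: prefix the task, get text back.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

ids = tok("translate English to German: The house is wonderful.", return_tensors="pt").input_ids
print(tok.decode(t5.generate(ids, max_new_tokens=40)[0], skip_special_tokens=True))

ids = tok("summarize: " + "T5 treats many NLP tasks as text-to-text problems. " * 5,
          return_tensors="pt").input_ids
print(tok.decode(t5.generate(ids, max_new_tokens=40)[0], skip_special_tokens=True))
```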
@Thom_Wolf
Thomas Wolf
4 months
intern applicants: i'm actually very likely more interested in your crazy little unfinished side project or nerdy interest than your gpa – you can proudly show them!
20
48
661
@Thom_Wolf
Thomas Wolf
6 years
Things to keep in mind when reading research papers: -papers are biased towards using complex models -papers from well-funded labs are biased towards using the biggest datasets on the biggest machines -papers from high-profile orgs don't have inherently better ideas (but more PR)
5
174
640
@Thom_Wolf
Thomas Wolf
5 years
Here is an op-for-op @PyTorch re-implementation of @GoogleAI 's BERT model by @sanhestpasmoi , @timrault and I. We made a script to load Google's pre-trained models and it performs about the same as the TF implementation in our tests (see the readme). Enjoy!
11
200
649
@Thom_Wolf
Thomas Wolf
5 months
Google really did miss an opportunity to leapfrog both OpenAI and Meta and regain a place as 👑 of AI by making Gemini open source
32
62
628
@Thom_Wolf
Thomas Wolf
11 months
LLaMa is dethroned 👑 A brand new LLM is topping the Open Leaderboard: Falcon 40B 🛩 *interesting* specs: - tuned for efficient inference - licence similar to Unity allowing commercial use - strong performances - high-quality dataset also released Check the authors' thread 👇
Tweet media one
16
121
621
@Thom_Wolf
Thomas Wolf
3 months
You likely missed it if you only follow ML Twitter but there's a series of mind-blowing tech reports and open-source models coming from China (DeepSeek, MiniCPM, UltraFeedback...) with so many lessons learned and experiments openly shared together with models, data, etc. This…
51
98
563
@Thom_Wolf
Thomas Wolf
3 years
I'm doing a lot of "slow" science these days. Not following arxiv anymore, mostly reading long-form research works and textbooks in areas outside of my usual fields. It feels very good, I should do it more often
7
24
619
@Thom_Wolf
Thomas Wolf
10 months
What was going on with the Open LLM Leaderboard? Its numbers didn't match the ones reported in the LLaMA paper! We've decided to dive into this rabbit hole with friends from the LLaMA & Falcon teams and got back with a blog post of learnings & surprises:
Tweet media one
8
142
604
@Thom_Wolf
Thomas Wolf
4 years
Let me highlight this amazing work I've read recently on #compositionality in NLP, in which you'll find both: - a deep discussion of what it means for a neural model to be compositional - a deep and insightful comparison of LSTM, ConvNet & Transformers! 👉
Tweet media one
4
137
573
@Thom_Wolf
Thomas Wolf
7 days
This take on the FineWeb release is one of the most interesting pieces of feedback and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!) Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
@edunov
Sergey Edunov
9 days
People seem to over-index on the 15T number after Llama 3. While the number matters, what is even more important is the quality and diversity of those tokens. If there was a good way to measure those, that would have been an impressive result to report.
1
9
114
17
127
818
@Thom_Wolf
Thomas Wolf
14 days
Time for the open-source AI robots revolution 🚀 We’ve been playing with a low-cost DJI robot controlled by 3 local open-source AI models (Whisper, Idefics2, Parler-TTS - all Apache2) & orchestrated by Dora-cs In comments a 250 lines code gist to build on top of it => enjoy!!
22
115
590
@Thom_Wolf
Thomas Wolf
6 months
There is a beautiful story that just happened in AI so let me share it for a lighter tone weekend post among all the doom stories in our AI field this week. It’s a story of people on three continents building and sharing in the open a new small efficient and state-of-the-art AI…
13
130
571
@Thom_Wolf
Thomas Wolf
21 days
If you didn’t follow all, the situation has dramatically changed on the arena of LLMs recently: - @AnthropicAI ’s Claude 3 opus is now the undefeated winner of all closed-source models (just look at this win-rate line!) - @cohere Command R+ is the new super strong leader of…
@lmsysorg
lmsys.org
21 days
Exciting news - the latest Arena results are out! @cohere 's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥 Big congrats to @cohere 's incredible work & valuable contribution…
Tweet media one
43
314
1K
13
107
574
@Thom_Wolf
Thomas Wolf
23 days
I was playing with a new robotics framework called Dora-rs today. A super impressive replacement for ROS (the Robot Operating System), for those who know, one of the pain points in lowering the entry barrier to robotics imo. Dora-rs is much much easier to install and fully integrate…
9
77
565
@Thom_Wolf
Thomas Wolf
4 years
Open-sourcing a community-focused library basically means you'll keep fighting with a bunch of well-intentioned people who want to morph your simple code into a cathedral of complex and smart abstractions. Writing easy-to-read, simple-to-use code is an under-rated skill.
9
71
560
@Thom_Wolf
Thomas Wolf
1 year
There is a fascinating recent trend of training *smaller models for longer* w.r.t. Chinchilla optimal predictions Best explanation I've seen of this? This new blog post by @harm_devries (with collaborators of the @BigCodeProject ): Clearly these are only…
Tweet media one
16
114
554
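A back-of-the-envelope illustration of "smaller models for longer", using Chinchilla's roughly-20-tokens-per-parameter rule of thumb and publicly reported token counts:

```python
# Back-of-the-envelope: Chinchilla's rule of thumb is ~20 training tokens per parameter.
params = 7e9                             # a 7B model
chinchilla_tokens = 20 * params          # ~140B tokens would be "compute-optimal"
llama_tokens = 1e12                      # LLaMA-7B was trained on ~1T tokens

print(chinchilla_tokens / 1e9)           # 140.0 (billions of tokens)
print(llama_tokens / chinchilla_tokens)  # ~7x past the Chinchilla-optimal point
```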
@Thom_Wolf
Thomas Wolf
4 years
Developer path: -Started code at 11 -Work on laser-plasma @BerkeleyLab -PhD on quantum physics -Switch to IP law -European Patent Attorney -Discover Machine Learning at @iclr2017 -Open-source first ML library @huggingface -1M download, HF raises $15M, hires crazy talented people
@kvlly
Kelly Vaughn
4 years
Developer path: - Started coding @ 11 - First freelance client @ 14 - Graduated HS @ 17 (freelancing) - Made $56k freelancing @ 25 - Made $137k freelancing @ 26 - Started running an agency @ 27 - Agency did $223k @ 28 - Agency did $430k @ 29 - Aiming for $1 million this year!
116
302
4K
13
35
554
@Thom_Wolf
Thomas Wolf
5 years
🐣 New Tutorial, open-source code & demo! Building a SOTA Conversational AI with transfer learning & OpenAI GPT models -Code/pretrained model from our NeurIPS 2018 ConvAI2 competition model, SOTA on automatic track -Detailed Tutorial w. code -Cool demo 👇
10
168
550
@Thom_Wolf
Thomas Wolf
5 years
💥 NeuralCoref 4.0 is out! Blazing fast English coreference resolution w. SpaCy ☀️ Now on pip: pip install neuralcoref 🍃 Model 10x smaller 💫 Compatible w. SpaCy 2.1 🔗 We've also added a neat feature to incorporate Domain Knowledge Here is an example👇
Tweet media one
7
150
542
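A rough usage sketch, assuming NeuralCoref 4.x alongside spaCy 2.1:

```python
# Rough usage sketch, assuming NeuralCoref 4.x with spaCy 2.1 (pip install neuralcoref).
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)   # adds coreference resolution to the spaCy pipeline

doc = nlp("My sister has a dog. She loves him.")
print(doc._.has_coref)         # True
print(doc._.coref_clusters)    # e.g. [My sister: [My sister, She], a dog: [a dog, him]]
```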
@Thom_Wolf
Thomas Wolf
4 years
I'm thinking about adding very explicit and simple examples to 🤗transformers like this one👇 I like that it's only 45 lines of code but you see/control all the important steps (data processing, training, evaluation, HP search) plus the search gives you robust perf. wdyt?
Tweet media one
25
60
544
@Thom_Wolf
Thomas Wolf
1 year
I'm not gonna lie, the GPT4 just released is quite a bit less exciting than what I was expecting. No multimodal generation and a tech report carefully emptied of any useful info on the model/training/compute. I guess we're getting spoiled in today's AI world
30
32
523
@Thom_Wolf
Thomas Wolf
5 years
The new blog post of @m__dehghani is a very nice and visual introduction to Universal Transformers & their motivations: … I like a lot the reformulation of Graves' Adaptive Computation Time as a dynamic recurrence in depth. Feels like a general idea.
0
146
520
@Thom_Wolf
Thomas Wolf
5 years
A question I get from time to time is how to convert a pretrained TensorFlow model in PyTorch easily and reliably. We're starting to be quite familiar with the process so I've written a short blog post summarizing our workflow and some lessons learned 👇
8
123
520
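The gist of such a conversion, as a sketch rather than the post's exact workflow: list the TF checkpoint variables, then copy them into the PyTorch module under your own name mapping (the checkpoint path and layer names below are hypothetical):

```python
# Sketch of a TF -> PyTorch conversion; checkpoint path and layer names are hypothetical.
import numpy as np
import tensorflow as tf
import torch

ckpt = "/path/to/tf_checkpoint/model.ckpt"
tf_weights = {name: tf.train.load_variable(ckpt, name)
              for name, _shape in tf.train.list_variables(ckpt)}

def to_torch(name: str) -> torch.Tensor:
    arr = np.asarray(tf_weights[name])
    if name.endswith("kernel"):           # TF dense kernels are transposed vs nn.Linear
        arr = np.ascontiguousarray(arr.T)
    return torch.from_numpy(arr)

# e.g. pytorch_layer.weight.data.copy_(to_torch("layer_0/dense/kernel"))
```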
@Thom_Wolf
Thomas Wolf
5 years
New release of Transformers repo is shaping up & I'm very excited! Gifts for all: -SOTA Lovers: new XLNet & XLM archi + 6 new Bert/GPT trained chkpt -Research Lovers: unified model API, attention/hidden-state outputs to swap/study models -Speed Lovers: Torchscript & head pruning!
Tweet media one
4
114
517
@Thom_Wolf
Thomas Wolf
5 months
So many AI founders I meet have ideas that are frustratingly small. like the web just got invented and people are like “web consultancy gonna be the biggest thing ever” Go build something bold!
47
63
512
@Thom_Wolf
Thomas Wolf
4 years
The most enjoyable papers usually fall into two buckets: - the summer internship crazy projects => short ambitious open-ended - the slow & carefully crafted 1-year long projects => read like a novel w. highs, lows and teachings None of these are triggered by conference deadlines
4
47
505
@Thom_Wolf
Thomas Wolf
4 years
This is deeply wrong Predicting students future grades from school history + past grades *and* making automatic decisions based on them is the exact example of a system that should *not* be deployed And the fact that it already determined the fate of 170k students is horrifying
@ExplainableNL
explAInable.NL
4 years
International Baccalaureate program uses prediction algorithm to replace high-stakes exam. So much wrong with this; not just black box, but worse: deep misunderstandings about what you can expect and allow from ML by decision makers (incl at colleges).
5
27
104
18
134
503
@Thom_Wolf
Thomas Wolf
29 days
Little known OSS gem: the Open-source Cookbook A collection of notebooks for building practical AI applications using open-source tools and models: Doc: Currently contains 16 notebooks in English (with some in Chinese as well):…
Tweet media one
1
97
497
@Thom_Wolf
Thomas Wolf
5 years
I needed a good GAN to tweak for a CV+NLP project so here is a *pretrained* version of BigGAN in PyTorch Sweet stuff: -AFAIK code is not public yet so here is an op-for-op implem to read/tweak -Checkpoints 2x smaller (no dead vars) -Print images in terminal (icing on the cake)👇
Tweet media one
10
104
486
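A usage sketch; the package and helper names below are the ones I recall from that release and should be treated as assumptions:

```python
# Usage sketch; package and helper names are assumptions based on the release.
import torch
from pytorch_pretrained_biggan import BigGAN, one_hot_from_names, truncated_noise_sample

model = BigGAN.from_pretrained("biggan-deep-256")
truncation = 0.4
class_vec = torch.from_numpy(one_hot_from_names(["coffee", "mushroom"], batch_size=2))
noise_vec = torch.from_numpy(truncated_noise_sample(truncation=truncation, batch_size=2))

with torch.no_grad():
    images = model(noise_vec, class_vec, truncation)  # (2, 3, 256, 256) tensors in [-1, 1]
```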
@Thom_Wolf
Thomas Wolf
2 years
I really miss the days I was creating the transformers library and then creating the datasets library. My resolution for 2022: free (a lot of) time to code again. Coding is really the fun part of our work
17
15
484
@Thom_Wolf
Thomas Wolf
6 years
I wrote a post on how you can make your Python NLP module 50-100 times faster! Bonus: a Jupyter notebook with examples processing over 80 million words per sec… Spoiler: use spaCy's internals and a bit of Cython magic Hat tips @honnibal @_inesmontani
6
150
483
@Thom_Wolf
Thomas Wolf
5 months
As a non-CS person, what an honor to sit among the authors of these 4 awarded papers at NeurIPS 2023 (out of 13k papers submitted 🤯) I was only an enabler, all props should go to the amazing @Muennighoff (starting soon grad school...) as well as @srush_nlp @boazbaraktcs
Tweet media one
12
33
470
@Thom_Wolf
Thomas Wolf
1 year
This is crazy! I still remember when I started coding Transformers with Victor and Tim one cold night of October 2018 in Bruxelles after attending the EMNLP conference and its social event at the Royal Museums of Fine Arts And today, 5 years after, Hugging Face Transformers is…
Tweet media one
14
41
465
@Thom_Wolf
Thomas Wolf
2 years
“Move slow and build things”
7
66
469
@Thom_Wolf
Thomas Wolf
5 years
We've spent a few evenings last week building an interactive demo called *Write with Transformer* It lets you interact in a very intimate way with GPT-2, call, control, question the model... and I just can't stop playing with it! You can try it at
@julien_c
Julien Chaumond
5 years
At NAACL last week we built a new side project, Write With Transformer. It lets you trigger GPT-2 completions multiple times, in a Google Doc-like interface. 🦄 It's like having a unicorn friend that completes your thoughts 🦄 cc @gdb @AlecRad Try it:
16
161
474
13
130
464
@Thom_Wolf
Thomas Wolf
5 years
I've added FP16 training to our PyTorch BERT repo to easily fine-tune BERT-large on GPU. The repo has become a showcase of all the tools you can use to train huge NNs 🙂 Got >91 F1 on SQuAD training BERT-large a few hours on 4-GPUs. Should take less than a day on 1-(recent)-GPU
Tweet media one
8
115
457
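The modern equivalent of that FP16 setup is a few lines of torch.cuda.amp (the 2019 repo used NVIDIA apex); a toy, self-contained sketch:

```python
# Mixed-precision training with torch.cuda.amp (modern equivalent of the apex FP16 setup).
import torch
import torch.nn as nn

model = nn.Linear(512, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 512).cuda(), torch.randint(0, 2, (8,)).cuda()) for _ in range(4)]

scaler = torch.cuda.amp.GradScaler()
for inputs, labels in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass in half precision where safe
        loss = criterion(model(inputs), labels)
    scaler.scale(loss).backward()          # loss scaling avoids FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```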
@Thom_Wolf
Thomas Wolf
2 years
there is a scary possibility that we may solve all the benchmarks we come up with for AI... without understanding anything fundamentally deep about what intelligence is. A bummer for those like me who see AI as a fantastic way to unlock deeper insights on human intelligence
35
44
454
@Thom_Wolf
Thomas Wolf
4 years
Quarantine had me cancel many side projects & go back to the roots of what I enjoy doing: science and building software that spark joy ...and back to the roots of when I used to do it: these precious first hours of the night, when everybody‘s asleep & the night is still young
Tweet media one
8
17
441
@Thom_Wolf
Thomas Wolf
2 years
Awesome News! Due to the popularity of our book "NLP with Transformers", @OReillyMedia has decided to print it in **full color** from now on in the revised edition. I've just got the first printed copies and the result is 🤩 More info at
Tweet media one
17
46
442
@Thom_Wolf
Thomas Wolf
2 years
Building a library with a strong C++ (or Rust) backend coupled with a carefully designed Python front is such a magically powerful combination Feels like creating pure super-powers
16
28
438
@Thom_Wolf
Thomas Wolf
1 year
realized yesterday that if a closed-source AI company had invented flash attention nobody would know about it – and this makes me sad for the current state of AI knowledge sharing. Many cool AI algorithmic improvements are probably already being kept behind closed doors
Tweet media one
5
62
428
@Thom_Wolf
Thomas Wolf
5 years
BERT Rediscovers the Classical NLP Pipeline by I. Tenney, D. Das & E. Pavlick is 4 pages of great insights. Such a constant source of fascinating papers from Ellie Pavlick & her collaborators! Here's BERT correcting its prediction along the model depth🤯
Tweet media one
1
91
428
@Thom_Wolf
Thomas Wolf
11 months
Nobody's been talking about it but it's rather *mind-blowing* imo that the open-source Falcon 40B model is topping LLaMa 65B on leaderboards and many evals while having required not even half the compute of LLaMa to train from scratch 🤯 Quick back-of-the-envelope calculations: -…
9
64
429
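The usual approximation is training compute ≈ 6 × parameters × tokens; with the publicly reported token counts (used here purely for illustration):

```python
# Usual approximation: training compute ≈ 6 * parameters * tokens.
# Token counts below are the publicly reported ones, used purely for illustration.
falcon_flops = 6 * 40e9 * 1.0e12      # Falcon-40B, ~1T tokens  -> ~2.4e23 FLOPs
llama_flops  = 6 * 65e9 * 1.4e12      # LLaMA-65B, ~1.4T tokens -> ~5.5e23 FLOPs
print(falcon_flops / llama_flops)     # ~0.44, i.e. well under half the compute
```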
@Thom_Wolf
Thomas Wolf
4 months
Some predictions for 2024 – keeping only the more controversial ones. You certainly saw the non-controversial ones (multimodality, etc) already 1. At least 10 new unicorn companies building SOTA open foundation models in 2024 Stars are so aligned: - a smart, small and dedicated…
19
77
411
@Thom_Wolf
Thomas Wolf
4 years
I liked the LSH attention in the reformer Sparse, efficient, simple Dynamic sparse attn is fascinating & mostly dealt by – softmax+topK: Recurrent Independent Mech. (MILA) Product-Key Mem (FB) – 𝛂-entmax: Adap. Sparse Transformer (DeepSPIN) links👇[1/3]
Tweet media one
4
97
413
@Thom_Wolf
Thomas Wolf
4 years
If you're using Transformers from source, we've rolled out 2 nice beta features (TBR in January) 💥Ultra-fast Bert/GPT2 tokenizers (up to 80x faster) 🦄Easy/versatile sequence generation for generative models: top-k/nucleus/temperature sampling, penalized/greedy, beam search...
Tweet media one
9
85
405
@Thom_Wolf
Thomas Wolf
4 years
People asking me to teach classes clearly give zero fuck to the imposter syndrome of a former physics PhD turned lawyer before joining AI Anyway I'll co-teach NLPL Winter School w Yoav Goldberg talking transfer learning, its limits & where the field might head Will share slides
11
19
408
@Thom_Wolf
Thomas Wolf
3 years
A few years ago I was mostly interested in models, creating 🤗transformers, adding BERT, GPT, T5… Over time I’ve seen my interests shift to data (sharing, evaluation, processing) leading to 🤗datasets And I see many people around me follow a similar path We are slowly maturing
@math_rachel
Rachel Thomas
3 years
An overall lack of recognition for the invisible, arduous, & taken-for-granted data work in AI leads to poor data practices, resulting in data cascades (negative, downstream events)... “Everyone wants to do the model work, not the data work” 1/
Tweet media one
18
229
778
10
61
404
@Thom_Wolf
Thomas Wolf
2 years
large language models are slightly boring
29
26
393
@Thom_Wolf
Thomas Wolf
2 years
Interesting podcast investigating what happened in 1984 that made so many women give up on computer science
Tweet media one
8
89
384
@Thom_Wolf
Thomas Wolf
5 years
I've been playing a bit w. the coming TensorFlow 2.0 & I was pleasantly surprised! In a couple of hours, I could convert our PyTorch Bert to TF2.0 and a few hours later, load pretrained weights w. a clean interface. Still a few rough edges but a huge step forward in terms of UX
Tweet media one
8
71
383
@Thom_Wolf
Thomas Wolf
5 years
there, you have them on one slide
Tweet media one
2
92
383
@Thom_Wolf
Thomas Wolf
3 years
ML researchers: we don’t really need to learn about biology or developmental psychology because planes don’t fly like birds anyway Also ML researchers: “this new algorithm takes inspiration from [our layman view of] how XXX happens in humans” 🙃
12
35
380
@Thom_Wolf
Thomas Wolf
3 years
Starting a big project takes a lot more time & energy than people expect. I've been pushing mostly one project per year: -2019 🤗Transformers -2020 🤗Datasets -2021 @BigscienceW I used to find it frustratingly slow – now I accept it Give your projects the time they need to grow
8
25
377
@Thom_Wolf
Thomas Wolf
4 years
Our amazing @julien_c has done an impressive job revamping the frontpage last week. You can now browse all the open-access models (1500+), datasets (120+), and NLP metrics (10+) of our libraries with a nice interface & a cool quick search! Give it a try👇
@mrm8488
Manu Romero
4 years
Love the new look and feel of @huggingface They added all in one search (models, datasets, metrics) and everything is quite intuitive.
0
5
32
5
86
378
@Thom_Wolf
Thomas Wolf
2 years
In 2021, we've seen an explosion of grounded Language Models: from "image/video+text" to more embodied models in "simulation+text" But, when tested on text benchmarks, these grounded models really struggle to improve over pure text-LMs e.g. T5/GPT3 Why? >>
10
79
374
@Thom_Wolf
Thomas Wolf
5 years
. @kaushal316 wrote a nice step-by-step tutorial on how to finetune BERT on a classification task ( @kaggle Toxic Challenge) Covers everything from data processing to model modification Results are top-10% w. a very simple 30-lines-of-code single model 👇
1
106
367
@Thom_Wolf
Thomas Wolf
2 years
We've open-sourced a new experiment: the alpha version of a library called "🎢 simulate" It's a python lib for building a diverse set of simulation environments for embodied and synthetic data research by reusing/tweaking/sharing assets and scenes
6
74
368
@Thom_Wolf
Thomas Wolf
6 years
NeuralCoref v3.0 is out✨! - up to 100x faster than v2.0 (thanks Cython) 🚀 - Integrated in spaCy models and pipeline 🤗 + 💫 = 💙 - Based on the fast neural net model by @stanfordnlp , trained in @PyTorch Check it out: Cc @spacy_io
Tweet media one
5
128
365
@Thom_Wolf
Thomas Wolf
8 months
Crazy how 34-billion-parameter models seemed huge and unmanageable outside of a data center just maybe 1.5 years ago. Now it’s laptop stuff
@ggerganov
Georgi Gerganov
8 months
Full F16 precision 34B Code Llama at >20 t/s on M2 Ultra
40
270
2K
8
39
366
@Thom_Wolf
Thomas Wolf
2 years
ml twitter is dead publishing our internship description videos on tiktok
7
22
353
@Thom_Wolf
Thomas Wolf
5 years
A fascinating article by @lena_voita if you're interested in understanding what makes MLM models like BERT different from LM models like GPT/GPT-2 (auto-regressive) and MT models. And conveyed in such a beautiful blog post, a masterpiece of knowledge sharing!
@lena_voita
Lena Voita
5 years
Evolution of Representations in the Transformer: blog post on our @emnlp2019 paper is out! blog post: paper: @lena_voita , @RicoSennrich , @iatitov
Tweet media one
5
156
616
3
110
362
@Thom_Wolf
Thomas Wolf
5 years
🥳 The Transformers library is turning 1⃣ today 🎂 What a ride! - 16k+ stars on Github - 160+ contributors and the most amazing features are still to come, I'm so excited about what's next Here is Megan's humbling shout out to it in her keynote at TensorFlow World yesterday 😍
Tweet media one
10
55
350
@Thom_Wolf
Thomas Wolf
6 months
one of the most striking events of 2023: the rise of the GPU-poor models. You can do a lot (and much cheaper) with well-trained smaller models
Tweet media one
17
51
349
@Thom_Wolf
Thomas Wolf
3 years
Reading 50 years old AI books I always find cute how people back then thought they were close to figuring out AI comparing LISP/PLANNER programs to brain/consciousness and then I look at us now & can already feel the tender looks of 2040's AI researchers on our current DL models
4
40
348