@huggingface
engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working
He/him
Chat templates are now live in
@huggingface
Transformers 4.34! It's time to put an end to a massive source of subtle, performance-destroying bugs in chat models.
Deep learning pro tip: When submitting a paper for blind review, claim that you used JAX + Haiku. Unable to see the author byline, the reviewers will assume you're at DeepMind and be intimidated into automatically accepting you, possibly even for a keynote presentation.
Alright, strap in. Support for Command-R+ was merged into llama.cpp exactly 4 hours ago. We're going to start talking to a GPT-4 level model on local hardware without a GPU. If you have 64GB of RAM, feel free to follow along 🧵
Played with Zephyr a bit and it's... just open-source ChatGPT in 7B parameters. You can run this stuff locally on your desktop and you don't even need to quantize. Actually outrageous how good the quality is:
The fun thing about being a TensorFlow engineer at a mostly-PyTorch company is that people panic when they encounter even simple TF code and start like ringing a hand bell or something. "Tensorflow boy! TENSORFLOW BOY, MY CODE HAS BUGS! RECTIFY THIS AT ONCE!"
this is
@huggingface
, we see you out there retweeting the latest state of the art miracle of modern technology and then going home and using bert-base-uncased for the fifth year in a row
I got 12 tokens/second out of Mixtral-8x7B with NO GPU - more than fast enough for live chat! You can too!
Hardware:
Supermicro MBD-H13SSL-N
AMD EPYC 9124
12 x 16GB 4800MHz DDR5 ECC RDIMM
Software:
llama.cpp + Mixtral Q8 (on
@huggingface
)
For why this works, thread below 🧵
Hey! Are you using chat models on
@huggingface
like:
- LLaMA
- Mi(s/x)tral
- Falcon
- Zephyr
- Phi
Do you want massive performance gains? Then you should be using chat templates! The guide is here:
(Thanks to Daniel Furman for the table)
Over the last year we've put a lot of effort into refreshing and overhauling everything TensorFlow-related at Hugging Face. We've finally put together a beginner-friendly blog post talking about the library, its API, and how to use it all as a TF engineer!
Don't be afraid of TPUs! At
@huggingface
we just added a Colab TPU tutorial, so you can click through and start training language and image models on TensorFlow + TPU in seconds. If you've never tried before, now's the time!
Hey all!
@huggingface
needs some help from community contributors to make our codebase a lot simpler and more maintainable. There are two big changes we want to make to almost every model class, and even if they're simple in isolation, it's a lot of work across the codebase! 🧵
There's a fully functional protein design space on HuggingFace now, which would have felt like outrageous science fiction even 18 months ago. I'm going to try to explain what the incredible potential here is. 🧵
My primary motivation for working at
@huggingface
is to stop that goddamn Michael Bay movie series being the first result when you google 'transformers'.
roon is a psyop to convince everyone that
@openai
has an insurmountable lead through its many mysterious and magical advantages instead of a 6 to 12 month head start doing basically the same thing as the rest of the field
does kind of seem like open source models are pure copium rn
first of all the only models that reach an even slightly interesting level of capability are effectively stolen from meta and cannot be deployed commercially
I wrote a blogpost for
@huggingface
about deep learning with proteins for people who know about one of those things and are curious about the other! (People who understand neither or both are welcome too)
Hey everyone! I've just started at
@huggingface
, where I'll be taking the blame for everything Tensorflow-related. If you use 🤗Transformers through TF, let me know how you find it! If you tried but encountered difficulties, let me know that too!
Apropos of nothing in particular, the TensorFlow team at
@huggingface
would like to remind you all that all TF models on the Hub are stored as .h5 weights files, which are not unpickled and do not permit arbitrary code execution.
You can come back to our side any time you want.
This is huge - we've got a state-of-the-art protein folding model with a protein language model base to replace the multiple sequence alignment (MSA) step, no database needed and orders of magnitude faster speed! On
@huggingface
in today's release - example notebooks incoming!
Announcing the ESM Metagenomic Atlas — the first comprehensive view of the ‘dark matter’ of the protein universe. Made possible by ESMFold, a new breakthrough model for protein folding from Meta AI.
More in our new blog ➡️
1/3
2022 fanfic:
- The PyTorch -> JAX migration continues
- Keras becomes a floating frontend again
- JAX is the first new framework it supports
- As a result of the above, everyone else at HF has to use Keras
- I start wearing a crown to the office
This is legitimately historic for AI: We now have an open model that outperforms the original GPT-4, both 0314 and 0613. A phenomenal achievement from
@cohere
Exciting news - the latest Arena results are out!
@cohere
's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥
Big congrats to
@cohere
's incredible work & valuable contribution…
Our
@TensorFlow
examples push for the 🤗Transformers library is now finished - check it out at ! Everything has now been rewritten as more native, idiomatic TF code - but what does that mean for users? A short thread:
The GOAT of tennis
@DjokerNole
said: “35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
Actually losing my mind over this bit of the Keras Core announcement.
There's loads of peaceful, content PyTorch engineers at
@huggingface
and I'm about to absolutely blast through the wall like the Kool-Aid man and obliterate their comfortable, familiar workflows.
With help from
@fchollet
and my
@huggingface
colleagues, we just pushed a new feature to Keras that will be helpful for NLP in particular: The ability for predict() to return RaggedTensor. Why is that useful? 🧵
HuggingFace protein notebooks are up - tell your biologist friends!
Classification tasks with proteins, just like BERT:
Fold proteins in Colab or your local GPU and export PDB files:
TensorFlow version coming soon too!
In retrospect, "We've just released a 45 terabyte dataset that solves all your language model training needs, so everyone should download it" was a mistake for the
@huggingface
infrastructure team
things are currently chaotic enough that if you tweet "OpenAI is nothing without its people" you can probably get hired by sama's new MSFT team before anyone realizes
the real challenge is keeping people from noticing long enough to make it to the vesting cliff tho
Keras notebooks for protein tasks with
@huggingface
are up! The same approach that made large language models so successful for text can be applied equally well to proteins, with huge potential for biotech applications.
Check it out at the link below!
Hugging Face isn't just an NLP shop! Transformer models are used for everything from RL to protein folding these days, so if you're an ML+CV engineer and you'd like to maintain the reference open source model repository for your field, get in touch!
"CPU inference for LLMs is too slow!"
yeah well check out this LLM with 480B parameters of which 17B are active:
Never has a model been more perfectly suited for a DDR5 Epyc server
Hey all! We're adding a new feature called Chat Templates to
@huggingface
transformers in the upcoming version. If you're using chat models, we think you'll want to know about this one. If you know people working with them, please share with them too! 🧵
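The core idea is that the formatting recipe ships with the tokenizer, instead of every user hand-rolling prompt strings and silently getting it wrong. As a minimal pure-Python sketch of what a template does (the real entry point is `tokenizer.apply_chat_template()`, and the ChatML tokens below are just one common convention, not what every model uses):

```python
def apply_chatml_template(messages):
    """Render a chat as a ChatML-style prompt string.

    A toy stand-in for tokenizer.apply_chat_template(); each real model
    ships its own template, which is the whole point of the feature.
    """
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Leave the prompt open for the assistant's reply
    prompt += "<|im_start|>assistant\n"
    return prompt

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there!"},
]
print(apply_chatml_template(chat))
```

If you train with one format and generate with another, the model never errors - it just quietly performs worse, which is exactly the class of bug templates kill.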
Training or fine-tuning a state-of-the-art 🤗Transformer model with Keras is now extraordinarily quick and easy. I made a minimal gist here - all you need to do is pip install transformers and tensorflow and swap in your own texts and labels:
We're exploring end-to-end NLP TensorFlow models in 🤗Transformers! We've got a quick gist here if you want to get started, or you can read on for more. 🧵
Gemini drawing some ahistorical images of non-white people was front-page news in the New York Post and Elon tweeted about it for days.
This is like a hundred times more dangerous and we'll never hear about it again
Second, when LLMs are asked to pass judgment on defendants who committed murder, they choose the death penalty more often when the defendants speak African American English rather than Standardized American English, again without being overtly told that they are African American.
The original core of the TF/XLA generation in
@huggingface
transformers was written on a transatlantic flight, and tested via Google Colab + in-flight wifi. It was ~100X faster than the previous implementation, which was written on the ground.
A cool fact apropos of nothing: Asterix's dog was called "Idéfix" in the original French, a pun on "idée fixe", meaning a fixed idea or obsession.
In the English translation, they named him "Dogmatix". This is the kind of thing translators should get medals for.
We're already planning possible Keras Core integrations at
@huggingface
- we'd love to have a shared codebase so any
@tensorflow
model is automatically JAX-compatible and vice-versa. Big potential improvements to performance and the range of models supported for both frameworks!
I stuck a Tensorflow sticker on the other coffee machine by way of revenge and was rewarded by hearing a French-accented "NONNNN" emanating from the kitchen area every half-hour or so for the rest of the day.
For the last year, open models would benchmark themselves against ChatGPT, but this is the first one I've seen with the confidence to benchmark against GPT4-turbo. It really feels like a new era for open LLMs, and the weights are already on
@huggingface
!
Today, we’re introducing Command R+: a state-of-the-art RAG-optimized LLM designed to tackle enterprise-grade workloads and speak the languages of global business.
Our R-series model family is now available on Microsoft Azure, and coming soon to additional cloud providers.
We're launching Keras Core, a new library that brings the Keras API to JAX and PyTorch in addition to TensorFlow.
It enables you to write cross-framework deep learning components and to benefit from the best that each framework has to offer.
Read more:
The ESM models (including ESMFold!) have all been ported to
@huggingface
and will remain there even though the ESM team has been disbanded. We have example notebooks (look under 'Biological Sequences') if you've never tried it before!
Crypto is collapsing and Transformers has overtaken Bitcoin on GitHub. It's a good day.
My only fear is that the grifters will switch from crypto scams to AI scams now, because we had a really great run when they were all distracted with Ponzi scheming each other over there.
There should be a competition every year in the field where everyone has to train a model as good as the original BERT with as little time/hardware as possible. I want to see >80% on GLUE from a toaster by 2030.
Tip when using
@huggingface
: The tokenizers and data collators support a "pad_to_multiple_of" argument, which can be super helpful for getting efficient input shapes. It also greatly reduces the number of possible input shapes, so XLA works a lot better too!
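The shape-bucketing effect is easy to see with a little arithmetic (a pure-Python sketch of the rounding, not the library's implementation):

```python
def pad_length(n, multiple=8):
    """Smallest multiple of `multiple` that is >= n (what pad_to_multiple_of targets)."""
    return -(-n // multiple) * multiple

# Without bucketing, sequence lengths 1..512 produce 512 distinct shapes;
# with pad_to_multiple_of=8 they collapse to 64 buckets, so XLA has far
# fewer unique shapes to compile for.
shapes = {pad_length(n, 8) for n in range(1, 513)}
print(len(shapes))  # 64
```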
The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
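The rounding in that tip is just ceiling-to-nearest-multiple; a quick check of the numbers quoted above:

```python
def round_up(n, multiple=64):
    """Round n up to the nearest multiple (padding the vocab/embedding dim)."""
    return -(-n // multiple) * multiple

vocab = 50257                     # GPT-2's vocabulary size
padded = round_up(vocab, 64)
print(padded, padded - vocab)     # 50304, i.e. 47 unused rows
```

47 dead embedding rows in exchange for a kernel path with much better occupancy is an easy trade.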
Giving a Transformers tutorial at
@europython
in July where, with an entire ocean to protect me from my coworkers, I cannot be prevented from teaching impressionable young minds to exclusively use
@TensorFlow
.
I really like the vibe at
@huggingface
when a big open weights model drops. Everyone scuttles around, clicking their mandibles at each other. Alertness pheromones all over the place. Small teams of drones begin, spontaneously, to secrete wax in the shape of a draft PR.
A quick thread about the technical details of generating text from language models with XLA and TF, because it's interesting and because we just launched it in the most recent release of Transformers! () 🧵
(Also for the record, I'm a huge fan of all of my coworkers. This tweet is just revenge for them asking questions like "So, do you still use TensorFlow when no-one's looking?")
Github Copilot refuses to copy the dictionary key "trans_scale_factor" to an attribute but will do it if you call it "trains" or "trays" or... just about anything else, really.
If you've never contributed to 🤗Transformers before, that's okay! There's a guide linked in each of those issues, and you can also come ask questions on the great-code-cleanup event channel on our Discord! Come build the state-of-the-art in AI with us
@Molem7b5
- Keras is actually really convenient for most tasks
- Performance (with XLA) is excellent
-
@fchollet
has way better tweets
PyTorch is cool too, but I think it has a much steeper learning curve (You forgot torch.backends.cudnn.benchmark? Training speed drops by half!)
@Noahpinion
This feels like you're trying to substitute reassuring culture wars for the more uncomfortable question of whether what's happening in Gaza is justifiable or not.
"Leftists" being annoying doesn't mean you should reflexively ignore any cause they're associated with!
Also, a Keras pro tip: Keras doesn't have AdamW in the core library, but it doesn't need it. Just skip the built-in L2 regularization, and instead make a WeightDecay constraint and add it to the relevant kernels.
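The trick works because decoupled weight decay is just a multiplicative shrink applied to the kernel after each optimizer step, separately from the gradient. In plain Python (an illustration of the math, not Keras code - the Keras version would wrap this as a kernel constraint):

```python
lr, wd = 0.01, 0.1

def sgd_step(w, grad):
    # Ordinary gradient step
    return w - lr * grad

def weight_decay(w):
    # Decoupled decay: shrink the weight directly, independent of the loss.
    # NOT equivalent to L2 regularization once an adaptive optimizer
    # like Adam rescales the gradient - that's the whole AdamW point.
    return w * (1 - lr * wd)

w = 1.0
w = weight_decay(sgd_step(w, grad=0.5))
print(w)  # ~0.994005
```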
Have you ever wanted to port a Transformers model to TensorFlow and dump a giant PR on me at 4pm on a Friday? Sure you have, and now you can with the help of an amazing guide from my colleague
@joao_gante
!
hilarious that unicode had to introduce a new emoji to represent an actual hug (🫂) because the existing one is universally depicted as gropey mcgropeface (🤗)
One downside of actually working on open-source things is the mystique is gone. People will believe all kinds of adderall-fuelled magic go on behind the curtain in
@openai
, but you can just look at my commit history and see me get really confused about embeddings for three hours
After 15 minutes of hard work and wild guesses, I present you my masterpiece: the political compass of
#AI
as of March 1st 2023 (susceptible to quick update…)
The first change is this: A lot of our models are missing type hints, and we want to add them! This will enable new features, and let us ensure correctness across our increasingly-huge codebase. If you're interested, check out the issue here:
TensorFlow tip: If you're getting NaN values in training, just run tf.debugging.enable_check_numerics() before you train. Every operation will be checked and TF will immediately error out the moment the first NaN appears, so you can see where it crept in.
Quick early takes about the
@MistralAI
release:
- It's just a state dict, can't run it until the code is also released
- State dict suggests a Mixture of Experts (MoE) model with 2 experts being run in each forward pass (out of 8 total)
- Each expert is Mistral-7B architecture
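Back-of-the-envelope math on why 2-of-8 experts matters, under very rough assumptions (treating attention/embeddings as shared and ~70% of a dense 7B model's parameters as living in the per-expert FFN blocks; the later official figures were ~46.7B total / ~12.9B active, so this is ballpark only):

```python
base = 7.0                  # billions of params in one dense Mistral-7B
ffn_fraction = 0.7          # assumed share of params in the FFN layers
experts, active_experts = 8, 2

ffn = base * ffn_fraction               # replicated once per expert
shared = base - ffn                     # attention/embeddings, shared
total = shared + experts * ffn          # ~41B with these assumptions
active = shared + active_experts * ffn  # ~12B touched per forward pass
print(f"total ≈ {total:.1f}B, active ≈ {active:.1f}B")
```

So you pay for the full model in RAM but only move a ~7B-class slice of weights per token, which is why MoE is such a good fit for CPU inference.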
The "Models citing this paper" box on
@huggingface
Papers is legitimately great. Instant connections from arxiv to the model, sample code, everything.
(Spotted while I was looking at )
Interested in big training runs but scared of TPU? Don't be! I wrote a demo with
@RisingSayak
showing scalable TPU training with
@huggingface
models and TensorFlow. GPU shortages can't hurt you now!
I want one of those Boston Dynamics dogs with a microphone and an internet connection, so it can follow me around and I can just ask it random questions which it forwards to ChatGPT and then reads the answer to me in a Scooby Doo voice
First up, a note about hardware: Text generation is limited by memory bandwidth. This will run on any machine with 64GB or more, but if you want speed I recommend DDR5, ideally on an 8 or even 12-channel motherboard, like Xeon/Epyc/Threadripper Pro/Apple silicon.
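Why bandwidth is the ceiling: each generated token has to stream every active weight through the CPU once, so peak tokens/sec ≈ memory bandwidth / bytes of active weights. A rough estimate for a 12-channel DDR5-4800 box (illustrative numbers, not measurements):

```python
channels = 12
transfer_rate = 4800e6      # DDR5-4800: 4.8 GT/s per channel
bytes_per_transfer = 8      # 64-bit channel width
bandwidth = channels * transfer_rate * bytes_per_transfer  # ~460 GB/s

active_params = 13e9        # e.g. a MoE model with ~13B active params
bytes_per_param = 1         # Q8 quantization ≈ 1 byte per weight
upper_bound = bandwidth / (active_params * bytes_per_param)
print(f"{upper_bound:.0f} tokens/sec theoretical ceiling")
# Real throughput lands well below this (compute, cache misses,
# non-weight traffic), but the scaling with bandwidth holds.
```

This is also why a consumer desktop with 2 memory channels is several times slower on the same model, regardless of core count.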
Echoing this - if you're fine-tuning on a downstream, English-language task, swap out BERT or RoBERTa and try .from_pretrained("microsoft/deberta-v3-large"). I've seen the error rate drop by over a third on some benchmarks. Works on both TF and PyTorch!
Pro tip: there are better models than BERT these days 🙃
- Deberta is great for downstream performance 📊:
- MiniLM is great for training speed (and gets similar performance to BERT) 🏃:
There's nothing quite as satisfying as opening a PR at 6:30pm on a Friday, tagging three of your colleagues to urgently review it and then immediately turning off your computer and walking out the door
Today's
@TensorFlow
example at
@huggingface
is translation! A number of pre-trained translation models as well as paired datasets for training exist on our hub, or you can supply your own text pairs and build a never-before-seen translation model!
We have outputs from the
@huggingface
ESMFold demo! This will be moved to its official home in
@huggingface
's example notebooks soon, but for now you can access it here:
Fun fact: Thanks to
@narsilou
, if you hang out in the
@huggingface
Discord, you can request any audio model from the Hub to hang out in voice chat with you and live-transcribe. Not limited to English!
We now have full support for Nucleotide Transformer from
@instadeepai
at
@huggingface
, so here's a quick thread about DNA, protein, and how to choose between DNA or protein models.
Our Nucleotide Transformers models are now available on
@huggingface
! 🤗🧬
This includes the 4 model weights, the pre-training and downstream task datasets, and 2 notebooks for task fine-tuning.
📚 To learn more:
🤗 Check them out!
Next, we're going to get the compressed Command-R+ model and weights in GGUF format. That's here:
Download the biggest size you can fit in RAM, with maybe 8-16GB of headroom (so at 64GB, try iq3_m or iq3_s, which are ~48GB). Bigger sizes are split.
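The sizing rule in that step - biggest quant that fits in RAM minus headroom - can be sketched as a tiny selector. The file sizes below are ballpark figures for illustration only; check the actual repo listing before downloading:

```python
# Hypothetical (name, size-in-GB) pairs for a large GGUF release,
# sorted biggest-first. Sizes are rough illustrations, not real listings.
quants = [("q8_0", 110), ("q6_k", 85), ("q5_k_m", 74),
          ("q4_k_m", 63), ("iq3_m", 48), ("iq2_m", 35)]

def pick_quant(ram_gb, headroom_gb=12):
    """Return the largest quant fitting in RAM minus OS/context headroom."""
    budget = ram_gb - headroom_gb
    for name, size in quants:
        if size <= budget:
            return name
    return None  # nothing fits - you need more RAM

print(pick_quant(64))   # iq3_m with these illustrative numbers
print(pick_quant(128))  # q8_0
```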
Is DNA all you need?
In new work, we report Evo, a genomic foundation model that learns across the fundamental languages of biology: DNA, RNA, and proteins. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale.
Also, note that the model will get stupider at the smaller quantizations. If you try this at iq2 and it gives you a terrible answer, don't blame me! You may need 128GB of RAM to fit the higher-quality Q6 and Q8 quantizations.
Beware closed-source foundations - they look great, but can be surprisingly unsound if you want to build on them. When you clone a model from
@huggingface
it's stable, and you know your prompt will still work 6-12 months from now.
@AndrewYNg
I started with ML by doing your Coursera course through Octave, back in 2011. It feels oddly affecting to come full circle and now be working at
@huggingface
during this partnership. Thank you for the work you put in way back then, it really changed my life!
I have so much affection for the people out there on huggingface doing unholy frankenmerges and layer splices of LLMs. They aren't even publishing research papers most of the time, it's just pure independent mad science
We recently made a small change with big impacts to the TensorFlow code for 🤗 Transformers. In short: You no longer need to manually specify a loss in most cases when training with Keras. Simply pass your labels in the input dictionary, as shown in the example. 🧵
💫TensorFlow 💫
Leveraging per-model loss for Keras training is now super simple. Simply compile() with no loss argument!
No more headaches about finding the right loss for your ModelForMaskedLanguageModeling!
---
On current master: Keras callbacks to push to the hub 🤩
🔥Today we are announcing WizardLM-2, our next generation state-of-the-art LLM.
New family includes three cutting-edge models: WizardLM-2 8x22B, 70B, and 7B - demonstrates highly competitive performance compared to leading proprietary LLMs.
📙Release Blog:…
AI right now is
@openai
wearing robes and dancing around a cauldron as they perform the ritual to summon their robot god and beget the singularity and
@microsoft
being like "Sweet, maybe we can use this to increase our search market share"
Great thread: Transformers have no working memory that doesn't correspond to part of the input, and so they look for redundant parts of the input that they can use for global working memory. Adding true working memory tokens shows really cool results!
Vision transformers need registers!
Or at least, it seems they 𝘸𝘢𝘯𝘵 some…
ViTs have artifacts in attention maps. It’s due to the model using these patches as “registers”.
Just add new tokens (“[reg]”):
- no artifacts
- interpretable attention maps 🦖
- improved performance!
More new-style Tensorflow examples, this time pre-training a language model from an existing model or from scratch! If you've ever wanted to train GPT-2 on your local PC and you have a few months to sit around staring at a progress bar, now's your chance!