Matthew Carrigan Profile Banner
Matthew Carrigan Profile
Matthew Carrigan

@carrigmat

3,290
Followers
352
Following
89
Media
1,516
Statuses

@huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working. He/him

Dublin, Ireland
Joined April 2021
Pinned Tweet
@carrigmat
Matthew Carrigan
7 months
Chat templates are now live in @huggingface Transformers 4.34! It's time to put an end to a massive source of subtle, performance-destroying bugs in chat models.
1
6
18
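[Editor's note: the templating API this tweet announces is tokenizer.apply_chat_template. A minimal sketch; the Zephyr checkpoint is just one example of a model whose tokenizer ships a template.]

```python
from transformers import AutoTokenizer

# Any chat model whose tokenizer ships a template works here; Zephyr is one example
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why do chat templates matter?"},
]

# Renders the conversation in the exact format the model saw during training,
# instead of a hand-rolled prompt string that silently mismatches it
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```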
@carrigmat
Matthew Carrigan
2 years
Deep learning pro tip: When submitting a paper for blind review, claim that you used JAX + Haiku. Unable to see the author byline, the reviewers will assume you're at DeepMind and be intimidated into automatically accepting you, possibly even for a keynote presentation.
11
101
1K
@carrigmat
Matthew Carrigan
1 month
Alright, strap in. Support for Command-R+ was merged into llama.cpp exactly 4 hours ago. We're going to start talking to a GPT-4 level model on local hardware without a GPU. If you have 64GB of RAM, feel free to follow along 🧵
24
123
1K
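[Editor's note: a sketch of the CPU-only setup the thread describes, using the llama-cpp-python bindings rather than the raw llama.cpp CLI. The model path and thread count are assumptions to adjust for your machine.]

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical local path to a quantized Command-R+ GGUF (see later in the thread)
llm = Llama(
    model_path="command-r-plus-iq3_m.gguf",
    n_ctx=4096,       # context window
    n_threads=12,     # roughly one per physical core
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a machine with no GPU!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```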
@carrigmat
Matthew Carrigan
2 years
Real programmers debug by putting something like print("aa aaaa AAAAA") inside a hot loop.
38
66
959
@carrigmat
Matthew Carrigan
7 months
Played with Zephyr a bit and it's... just open-source ChatGPT in 7B parameters. You can run this stuff locally on your desktop and you don't even need to quantize. Actually outrageous how good the quality is:
19
104
773
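[Editor's note: running Zephyr locally is a few lines with the transformers pipeline API. A minimal sketch, assuming a GPU or enough RAM for the full-precision 7B weights.]

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,  # ~14GB of weights, no quantization needed
    device_map="auto",
)
messages = [{"role": "user", "content": "Explain beam search in one paragraph."}]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(pipe(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"])
```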
@carrigmat
Matthew Carrigan
2 years
Saw this in the @huggingface office, went back to working on my laptop, and then four hours later shouted "OH I GET IT, BECAUSE IT'S A FORK OF CAFFE"
Tweet media one
11
53
709
@carrigmat
Matthew Carrigan
1 year
Hugging Face infiltration team is in - it was surprisingly easy when everyone was away at NeurIPS.
Tweet media one
10
78
691
@carrigmat
Matthew Carrigan
2 years
The fun thing about being a TensorFlow engineer at a mostly-PyTorch company is that people panic when they encounter even simple TF code and start like ringing a hand bell or something. "Tensorflow boy! TENSORFLOW BOY, MY CODE HAS BUGS! RECTIFY THIS AT ONCE!"
9
40
564
@carrigmat
Matthew Carrigan
1 year
this is @huggingface , we see you out there retweeting the latest state of the art miracle of modern technology and then going home and using bert-base-uncased for the fifth year in a row
15
43
471
@carrigmat
Matthew Carrigan
5 months
I got 12 tokens/second out of Mixtral-8x7B with NO GPU - more than fast enough for live chat! You can too!
Hardware:
- Supermicro MBD-H13SSL-N
- AMD EPYC 9124
- 12 x 16GB 4800MHz DDR5 ECC RDIMM
Software: llama.cpp + Mixtral Q8 (on @huggingface)
For why this works, thread below 🧵
18
48
427
@carrigmat
Matthew Carrigan
1 month
the engineers at @huggingface haven't slept in days you all have to stop
@MistralAI
Mistral AI
1 month
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%%3A1337%2Fannounce&tr=http%3A%2F%%3A1337%2Fannounce
279
846
6K
5
17
401
@carrigmat
Matthew Carrigan
3 months
Hey! Are you using chat models on @huggingface like:
- LLaMA
- Mi(s/x)tral
- Falcon
- Zephyr
- Phi
Do you want massive performance gains? Then you should be using chat templates! The guide is here: (Thanks to Daniel Furman for the table)
Tweet media one
9
51
358
@carrigmat
Matthew Carrigan
2 years
Over the last year we've put a lot of effort into refreshing and overhauling everything TensorFlow-related at Hugging Face. We've finally put together a beginner-friendly blog post talking about the library, its API, and how to use it all as a TF engineer!
8
62
327
@carrigmat
Matthew Carrigan
1 year
Don't be afraid of TPUs! At @huggingface we just added a Colab TPU tutorial, so you can click through and start training language and image models on TensorFlow + TPU in seconds. If you've never tried before, now's the time!
8
65
279
@carrigmat
Matthew Carrigan
2 years
Hey all! @huggingface needs some help from community contributors to make our codebase a lot simpler and more maintainable. There are two big changes we want to make to almost every model class, and even if they're simple in isolation, it's a lot of work across the codebase! 🧵
3
66
274
@carrigmat
Matthew Carrigan
2 years
There's a fully functional protein design space on HuggingFace now, which would have felt like outrageous science fiction even 18 months ago. I'm going to try to explain the incredible potential here. 🧵
5
63
267
@carrigmat
Matthew Carrigan
2 years
My primary motivation for working at @huggingface is to stop that goddamn Michael Bay movie series being the first result when you google 'transformers'.
6
5
215
@carrigmat
Matthew Carrigan
1 year
roon is a psyop to convince everyone that @openai has an insurmountable lead through its many mysterious and magical advantages instead of a 6 to 12 month head start doing basically the same thing as the rest of the field
@tszzl
roon
1 year
does kind of seem like open source models are pure copium rn. first of all, the only models that reach an even slightly interesting level of capability are effectively stolen from meta and cannot be deployed commercially
45
15
642
8
2
205
@carrigmat
Matthew Carrigan
1 year
I wrote a blogpost for @huggingface about deep learning with proteins for people who know about one of those things and are curious about the other! (People who understand neither or both are welcome too)
7
44
212
@carrigmat
Matthew Carrigan
3 years
Hey everyone! I've just started at @huggingface , where I'll be taking the blame for everything Tensorflow-related. If you use 🤗Transformers through TF, let me know how you find it! If you tried but encountered difficulties, let me know that too!
18
14
209
@carrigmat
Matthew Carrigan
2 years
Apropos of nothing in particular, the TensorFlow team at @huggingface would like to remind you all that all TF models on the Hub are stored as .h5 weights files, which are not unpickled and do not permit arbitrary code execution. You can come back to our side any time you want.
Tweet media one
4
12
204
@carrigmat
Matthew Carrigan
2 years
This is huge - we've got a state-of-the-art protein folding model with a protein language model base to replace the multiple sequence alignment (MSA) step, no database needed and orders of magnitude faster speed! On @huggingface in today's release - example notebooks incoming!
@AIatMeta
AI at Meta
2 years
Announcing the ESM Metagenomic Atlas — the first comprehensive view of the ‘dark matter’ of the protein universe. Made possible by ESMFold, a new breakthrough model for protein folding from Meta AI. More in our new blog ➡️ 1/3
24
265
1K
2
42
207
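[Editor's note: the transformers port can be driven in a few lines. A sketch using the facebook/esmfold_v1 checkpoint; the sequence is a toy example, and the PDB-export step lives in the linked notebooks.]

```python
import torch
from transformers import AutoTokenizer, EsmForProteinFolding

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy protein sequence
inputs = tokenizer([sequence], return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    outputs = model(**inputs)  # predicted 3D coordinates, confidence scores, etc.
```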
@carrigmat
Matthew Carrigan
2 years
2022 fanfic: - The PyTorch -> JAX migration continues - Keras becomes a floating frontend again - JAX is the first new framework it supports - As a result of the above, everyone else at HF has to use Keras - I start wearing a crown to the office
@karpathy
Andrej Karpathy
2 years
@giffmana @PreetumNakkiran @francoisfleuret PyTorch is succumbing to entropy at an alarming rate and I'm not sure it has internalized what made everyone switch to it from tensorflow
7
12
138
7
19
201
@carrigmat
Matthew Carrigan
1 month
This is legitimately historic for AI: We now have an open model that outperforms the original GPT-4, both 0314 and 0613. A phenomenal achievement from @cohere
@lmsysorg
lmsys.org
1 month
Exciting news - the latest Arena results are out! @cohere 's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥 Big congrats to @cohere 's incredible work & valuable contribution…
Tweet media one
43
314
1K
3
35
178
@carrigmat
Matthew Carrigan
3 years
Our @TensorFlow examples push for the 🤗Transformers library is now finished - check it out at ! Everything has now been rewritten as more native, idiomatic TF code - but what does that mean for users? A short thread:
3
35
171
@carrigmat
Matthew Carrigan
4 months
each time he invented your discovery before you did, he does one pushup
@SchmidhuberAI
Jürgen Schmidhuber
4 months
The GOAT of tennis @DjokerNole said: "35 is the new 25." I say: "60 is the new 35." AI research has kept me strong and healthy. AI could work wonders for you, too!
Tweet media one
166
149
2K
2
17
170
@carrigmat
Matthew Carrigan
10 months
Actually losing my mind over this bit of the Keras Core announcement. There's loads of peaceful, content PyTorch engineers at @huggingface and I'm about to absolutely blast through the wall like the Kool-Aid man and obliterate their comfortable, familiar workflows.
Tweet media one
4
11
165
@carrigmat
Matthew Carrigan
2 years
With help from @fchollet and my @huggingface colleagues, we just pushed a new feature to Keras that will be helpful for NLP in particular: The ability for predict() to return RaggedTensor. Why is that useful? 🧵
2
13
160
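[Editor's note: the point is that tokenized text is naturally variable-length, so a ragged result from predict() means no wasted padding and no manual trimming. A tiny illustration of the data shape involved.]

```python
import tensorflow as tf

# Two tokenized sequences of different lengths: a natural fit for RaggedTensor
ids = tf.ragged.constant([[101, 2023, 2003, 102],
                          [101, 2460, 102]])
print(ids.shape)          # (2, None): the second axis is ragged
print(ids.row_lengths())  # [4, 3]: no padding tokens anywhere
```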
@carrigmat
Matthew Carrigan
1 year
Tweet media one
5
10
156
@carrigmat
Matthew Carrigan
2 years
HuggingFace protein notebooks are up - tell your biologist friends! Classification tasks with proteins, just like BERT: Fold proteins in Colab or your local GPU and export PDB files: TensorFlow version coming soon too!
2
30
145
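[Editor's note: the "just like BERT" claim is literal - protein sequences go through the same Auto classes as text. A sketch with a small ESM-2 checkpoint; num_labels is a placeholder for your task.]

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "facebook/esm2_t12_35M_UR50D"  # small ESM-2 protein language model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Amino-acid sequences tokenize just like sentences do
inputs = tokenizer(["MKTAYIAKQR", "GGGSSAALLM"], padding=True, return_tensors="pt")
logits = model(**inputs).logits  # one score pair per protein
```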
@carrigmat
Matthew Carrigan
18 days
In retrospect, "We've just released a 45 terabyte dataset that solves all your language model training needs, so everyone should download it" was a mistake for the @huggingface infrastructure team
7
12
137
@carrigmat
Matthew Carrigan
6 months
things are currently chaotic enough that if you tweet "OpenAI is nothing without its people" you can probably get hired by sama's new MSFT team before anyone realizes; the real challenge is keeping people from noticing long enough to make it to the vesting cliff tho
6
9
120
@carrigmat
Matthew Carrigan
1 year
Keras notebooks for protein tasks with @huggingface are up! The same approach that made large language models so successful for text can be applied equally well to proteins, with huge potential for biotech applications. Check it out at the link below!
3
37
115
@carrigmat
Matthew Carrigan
3 years
Hugging Face isn't just an NLP shop! Transformer models are used for everything from RL to protein folding these days, so if you're an ML+CV engineer and you'd like to maintain the reference open source model repository for your field, get in touch!
1
13
102
@carrigmat
Matthew Carrigan
2 years
I brought this on myself.
Tweet media one
@carrigmat
Matthew Carrigan
2 years
The fun thing about being a TensorFlow engineer at a mostly-PyTorch company is that people panic when they encounter even simple TF code and start like ringing a hand bell or something. "Tensorflow boy! TENSORFLOW BOY, MY CODE HAS BUGS! RECTIFY THIS AT ONCE!"
9
40
564
1
2
103
@carrigmat
Matthew Carrigan
17 days
"CPU inference for LLMs is too slow!" yeah well check out this LLM with 480B parameters of which 17B are active: Never has a model been more perfectly suited for a DDR5 Epyc server
0
19
103
@carrigmat
Matthew Carrigan
8 months
Hey all! We're adding a new feature called Chat Templates to @huggingface transformers in the upcoming version. If you're using chat models, we think you'll want to know about this one. If you know people working with them, please share with them too! 🧵
1
26
101
@carrigmat
Matthew Carrigan
3 years
Training or fine-tuning a state-of-the-art 🤗Transformer model with Keras is now extraordinarily quick and easy. I made a minimal gist here - all you need to do is pip install transformers and tensorflow and swap in your own texts and labels:
0
18
91
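[Editor's note: the gist link didn't survive the archive, but the shape of it is roughly this. A minimal sketch, with toy texts and labels standing in for your own.]

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

texts = ["I loved this movie!", "Utterly dreadful."]  # swap in your own data
labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

tokens = dict(tokenizer(texts, padding=True, return_tensors="np"))
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))  # loss handled internally
model.fit(tokens, tf.constant(labels), epochs=3)
```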
@carrigmat
Matthew Carrigan
2 years
We're exploring end-to-end NLP TensorFlow models in 🤗Transformers! We've got a quick gist here if you want to get started, or you can read on for more. 🧵
1
16
89
@carrigmat
Matthew Carrigan
2 months
Gemini drawing some ahistorical images of non-white people was front-page news in the New York Post and Elon tweeted about it for days. This is like a hundred times more dangerous and we'll never hear about it again
@vjhofmann
Valentin Hofmann
2 months
Second, when LLMs are asked to pass judgment on defendants who committed murder, they choose the death penalty more often when the defendants speak African American English rather than Standardized American English, again without being overtly told that they are African American.
Tweet media one
5
26
101
2
29
90
@carrigmat
Matthew Carrigan
2 months
The original core of the TF/XLA generation in @huggingface transformers was written on a transatlantic flight, and tested via Google Colab + in-flight wifi. It was ~100X faster than the previous implementation, which was written on the ground.
@Duderichy
the Rich
2 months
nobody has ever done deep work on an airplane
253
83
6K
3
10
85
@carrigmat
Matthew Carrigan
9 months
A cool fact apropos of nothing: Obelix's dog in the Asterix comics was called "Idéfix" in the original French, a pun on "idée fixe", meaning a fixed idea or obsession. In the English translation, they named him "Dogmatix". This is the kind of thing translators should get medals for.
4
7
82
@carrigmat
Matthew Carrigan
1 year
goddamnit change the name of your company
Tweet media one
1
13
83
@carrigmat
Matthew Carrigan
1 year
Tweet media one
4
7
81
@carrigmat
Matthew Carrigan
10 months
We're already planning possible Keras Core integrations at @huggingface - we'd love to have a shared codebase so any @tensorflow model is automatically JAX-compatible and vice-versa. Big potential improvements to performance and the range of models supported for both frameworks!
3
9
76
@carrigmat
Matthew Carrigan
6 months
Big genomics news today at @huggingface : We're delighted to welcome HyenaDNA to the Hub! Models: Paper: Thanks to @HazyResearch @exnx @MichaelPoli6 @marjanfaizi for the model, and for your work on the port! More info in 🧵
2
13
72
@carrigmat
Matthew Carrigan
2 years
I stuck a Tensorflow sticker on the other coffee machine by way of revenge and was rewarded by hearing a French-accented "NONNNN" emanating from the kitchen area every half-hour or so for the rest of the day.
2
1
67
@carrigmat
Matthew Carrigan
1 month
For the last year, open models would benchmark themselves against ChatGPT, but this is the first one I've seen with the confidence to benchmark against GPT4-turbo. It really feels like a new era for open LLMs, and the weights are already on @huggingface !
@cohere
cohere
1 month
Today, we’re introducing Command R+: a state-of-the-art RAG-optimized LLM designed to tackle enterprise-grade workloads and speak the languages of global business. Our R-series model family is now available on Microsoft Azure, and coming soon to additional cloud providers.
Tweet media one
26
197
955
3
11
64
@carrigmat
Matthew Carrigan
10 months
STOP THE PRESSES MULTI-BACKEND KERAS IS BACK
@fchollet
François Chollet
10 months
We're launching Keras Core, a new library that brings the Keras API to JAX and PyTorch in addition to TensorFlow. It enables you to write cross-framework deep learning components and to benefit from the best that each framework has to offer. Read more:
Tweet media one
127
828
4K
1
3
61
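[Editor's note: a sketch of backend selection as announced at launch - the package name was keras_core, with the backend chosen via environment variable before import.]

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"

import keras_core as keras  # pip install keras-core

# The same model definition now runs on whichever backend was selected above
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10),
])
```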
@carrigmat
Matthew Carrigan
9 months
The ESM models (including ESMFold!) have all been ported to @huggingface and will remain there even though the ESM team has been disbanded. We have example notebooks (look under 'Biological Sequences') if you've never tried it before!
@sokrypton
Sergey Ovchinnikov 🇺🇦
9 months
RIP ESMFold server 😢 (does anyone have extra storage to archive all the predicted structures and pretrained models before @MetaAI pulls the plug?)
Tweet media one
9
30
116
0
12
61
@carrigmat
Matthew Carrigan
2 years
Anyway if you're a TF engineer please apply so we can eventually rise up and become the new upper class, perhaps with bells of our own:
1
5
59
@carrigmat
Matthew Carrigan
2 years
Crypto is collapsing and Transformers has overtaken Bitcoin on GitHub. It's a good day. My only fear is that the grifters will switch from crypto scams to AI scams now, because we had a really great run when they were all distracted with Ponzi scheming each other over there.
Tweet media one
3
4
57
@carrigmat
Matthew Carrigan
2 years
There should be a competition every year in the field where everyone has to train a model as good as the original BERT with as little time/hardware as possible. I want to see >80% on GLUE from a toaster by 2030.
3
4
55
@carrigmat
Matthew Carrigan
1 month
Mixtral 8x22B repo is up:
1
20
56
@carrigmat
Matthew Carrigan
1 year
Tip when using @huggingface : The tokenizers and data collators support a "pad_to_multiple_of" argument, which can be super helpful for getting efficient input shapes. It also greatly reduces the number of possible input shapes, so XLA works a lot better too!
@karpathy
Andrej Karpathy
1 year
The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
86
367
5K
0
4
55
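[Editor's note: the flag works both at tokenization time and in the collators. A quick sketch of each; bert-base-uncased is just a stand-in tokenizer.]

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Pad directly at tokenization time...
batch = tokenizer(
    ["short", "a noticeably longer example sentence"],
    padding=True, pad_to_multiple_of=64, return_tensors="np",
)
print(batch["input_ids"].shape)  # sequence length rounded up to a multiple of 64

# ...or let the collator do it per-batch during training
collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=64)
```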
@carrigmat
Matthew Carrigan
2 years
Giving a Transformers tutorial at @europython in July where, with an entire ocean to protect me from my coworkers, I cannot be prevented from teaching impressionable young minds to exclusively use @TensorFlow .
3
4
56
@carrigmat
Matthew Carrigan
1 month
I really like the vibe at @huggingface when a big open weights model drops. Everyone scuttles around, clicking their mandibles at each other. Alertness pheromones all over the place. Small teams of drones begin, spontaneously, to secrete wax in the shape of a draft PR.
1
9
56
@carrigmat
Matthew Carrigan
2 years
A quick thread about the technical details of generating text from language models with XLA and TF, because it's interesting and because we just launched it in the most recent release of Transformers! () 🧵
1
7
55
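[Editor's note: the launched pattern is compiling generate() with jit_compile and keeping input shapes fixed so XLA doesn't retrace. A sketch with GPT-2 as a stand-in model.]

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

# Compile once; every call with the same input shape reuses the fast XLA program
xla_generate = tf.function(model.generate, jit_compile=True)

# Padding to a fixed length keeps the shape constant across calls
inputs = tokenizer(["TensorFlow is"], padding="max_length", max_length=32,
                   return_tensors="tf")
out = xla_generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```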
@carrigmat
Matthew Carrigan
2 years
(Also for the record, I'm a huge fan of all of my coworkers. This tweet is just revenge for them asking questions like "So, do you still use TensorFlow when no-one's looking?")
1
1
53
@carrigmat
Matthew Carrigan
3 years
T0: Great in a crisis.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
2
54
@carrigmat
Matthew Carrigan
2 years
GitHub Copilot refuses to copy the dictionary key "trans_scale_factor" to an attribute but will do it if you call it "trains" or "trays" or... just about anything else, really.
1
2
52
@carrigmat
Matthew Carrigan
2 years
If you've never contributed to 🤗Transformers before, that's okay! There's a guide linked in each of those issues, and you can also come ask questions on the great-code-cleanup event channel on our Discord! Come build the state-of-the-art in AI with us
1
6
49
@carrigmat
Matthew Carrigan
2 years
@Molem7b5
- Keras is actually really convenient for most tasks
- Performance (with XLA) is excellent
- @fchollet has way better tweets
PyTorch is cool too, but I think it has a much steeper learning curve (You forgot torch.backends.cudnn.benchmark? Training speed drops by half!)
1
2
48
@carrigmat
Matthew Carrigan
2 months
@Noahpinion This feels like you're trying to substitute reassuring culture wars for the more uncomfortable question of whether what's happening in Gaza is justifiable or not. "Leftists" being annoying doesn't mean you should reflexively ignore any cause they're associated with!
7
1
46
@carrigmat
Matthew Carrigan
2 years
Also, a Keras pro tip: Keras doesn't have AdamW in the core library, but it doesn't need it. Just skip the built-in L2 regularization, and instead make a WeightDecay constraint and add it to the relevant kernels.
Tweet media one
1
2
47
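[Editor's note: the attached screenshot isn't in the archive, but the idea reconstructs to roughly this - a sketch, not necessarily the exact class from the image. Note that newer Keras versions do ship AdamW.]

```python
import tensorflow as tf

class WeightDecay(tf.keras.constraints.Constraint):
    """Decoupled weight decay: shrink the kernel after each optimizer step,
    rather than adding an L2 term to the loss."""

    def __init__(self, rate: float = 1e-4):
        self.rate = rate

    def __call__(self, w):
        return w * (1.0 - self.rate)

    def get_config(self):
        return {"rate": self.rate}

# Attach to the kernels you want decayed (typically not biases or LayerNorms)
layer = tf.keras.layers.Dense(64, kernel_constraint=WeightDecay(1e-4))
```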
@carrigmat
Matthew Carrigan
2 years
Pair programming with @GuggerSylvain
Tweet media one
Tweet media two
3
3
48
@carrigmat
Matthew Carrigan
2 years
Have you ever wanted to port a Transformers model to TensorFlow and dump a giant PR on me at 4pm on a Friday? Sure you have, and now you can with the help of an amazing guide from my colleague @joao_gante !
0
13
47
@carrigmat
Matthew Carrigan
2 months
please do not call it that i have stock options in that face
@browserdotsys
bowser
2 months
hilarious that unicode had to introduce a new emoji to represent an actual hug (🫂) because the existing one is universally depicted as gropey mcgropeface (🤗)
15
7
283
0
1
46
@carrigmat
Matthew Carrigan
1 year
One downside of actually working on open-source things is the mystique is gone. People will believe all kinds of adderall-fuelled magic goes on behind the curtain at @openai, but you can just look at my commit history and see me get really confused about embeddings for three hours
@Dorialexander
Alexander Doria
1 year
After 15 minutes of hard work and wild guesses, I present you my masterpiece: the political compass of #AI as of March 1st 2023 (susceptible to quick update…)
Tweet media one
28
96
679
1
1
45
@carrigmat
Matthew Carrigan
2 years
The first change is this: A lot of our models are missing type hints, and we want to add them! This will enable new features, and let us ensure correctness across our increasingly-huge codebase. If you're interested, check out the issue here:
1
1
45
@carrigmat
Matthew Carrigan
1 year
TensorFlow tip: If you're getting NaN values in training, just run tf.debugging.enable_check_numerics() before you train. Every operation will be checked and TF will immediately error out the moment the first NaN appears, so you can see where it crept in.
2
5
45
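[Editor's note: one line, called before anything else runs; the failing op is reported with a stack trace so you can see exactly where the NaN crept in.]

```python
import tensorflow as tf

tf.debugging.enable_check_numerics()  # call once, before training

x = tf.constant([1.0, 0.0])
y = x / 0.0  # raises InvalidArgumentError here (inf/NaN), instead of NaN-ing the loss later
```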
@carrigmat
Matthew Carrigan
5 months
Quick early takes about the @MistralAI release:
- It's just a state dict, can't run it until the code is also released
- State dict suggests a Mixture of Experts (MoE) model with 2 experts being run in each forward pass (out of 8 total)
- Each expert is Mistral-7B architecture
3
5
45
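[Editor's note: for intuition on "2 experts out of 8 per forward pass", a toy top-2 router in plain numpy. Illustrative only, not Mistral's actual implementation.]

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2

# Each "expert" here is just a random linear layer
expert_weights = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))

def moe_layer(x):
    scores = x @ router                # router scores each expert for this token
    top = np.argsort(scores)[-k:]      # indices of the top-k experts
    gate = np.exp(scores[top])
    gate /= gate.sum()                 # softmax over just the chosen experts
    # Only k of the n_experts run: compute scales with k, memory with n_experts
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gate, top))

y = moe_layer(rng.normal(size=d_model))
print(y.shape)  # (16,)
```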
@carrigmat
Matthew Carrigan
11 months
The "Models citing this paper" box on @huggingface Papers is legitimately great. Instant connections from arxiv to the model, sample code, everything. (Spotted while I was looking at )
0
10
44
@carrigmat
Matthew Carrigan
7 months
Extremely important info for people coding in @TensorFlow right now! We're preparing for Keras 3 compatibility in @huggingface transformers already.
@fchollet
François Chollet
7 months
In TensorFlow, tf.keras will start resolving to Keras 3 in TF v2.16, to be released Q1 2024. It's already the case today in tf-nightly.
4
5
44
2
13
44
@carrigmat
Matthew Carrigan
1 year
Interested in big training runs but scared of TPU? Don't be! I wrote a demo with @RisingSayak showing scalable TPU training with @huggingface models and TensorFlow. GPU shortages can't hurt you now!
0
6
44
@carrigmat
Matthew Carrigan
1 year
I want one of those Boston Dynamics dogs with a microphone and an internet connection, so it can follow me around and I can just ask it random questions which it forwards to ChatGPT and then reads the answer to me in a Scooby Doo voice
3
0
44
@carrigmat
Matthew Carrigan
2 years
@a_e_roberts "covid-19 as a human"
Tweet media one
7
5
43
@carrigmat
Matthew Carrigan
1 month
First up, a note about hardware: Text generation is limited by memory bandwidth. This will run on any machine with 64GB or more, but if you want speed I recommend DDR5, ideally on an 8 or even 12-channel motherboard, like Xeon/Epyc/Threadripper Pro/Apple silicon.
1
0
42
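[Editor's note: the arithmetic behind "limited by memory bandwidth" - every generated token streams the active weights out of RAM once, so peak tokens/sec is roughly bandwidth divided by model size. A back-of-envelope sketch using the 12-channel DDR5 numbers from the Mixtral tweet.]

```python
# Rough upper bound: tokens/sec ≈ memory bandwidth / bytes read per token
channels = 12                  # 12-channel EPYC board
mt_per_sec = 4800e6            # DDR5-4800 transfers/sec
bytes_per_transfer = 8         # 64-bit channel

bandwidth = channels * mt_per_sec * bytes_per_transfer   # ≈ 460 GB/s
model_bytes = 48e9             # e.g. a ~48GB quantized model, streamed per token

print(bandwidth / model_bytes)  # ≈ 9.6 tokens/sec ceiling; MoE models read
                                # only their active experts, so they go faster
```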
@carrigmat
Matthew Carrigan
2 years
Echoing this - if you're fine-tuning on a downstream, English-language task, swap out BERT or RoBERTa and try .from_pretrained("microsoft/deberta-v3-large"). I've seen the error rate drop by over a third on some benchmarks. Works on both TF and PyTorch!
@_lewtun
Lewis Tunstall
2 years
Pro tip: there are better models than BERT these days 🙃
- Deberta is great for downstream performance 📊:
- MiniLM is great for training speed (and gets similar performance to BERT) 🏃:
4
13
155
1
3
41
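[Editor's note: the swap really is one string. A sketch; num_labels is a placeholder for your task.]

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Drop-in replacement for a bert-base-uncased fine-tuning setup
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=2
)
```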
@carrigmat
Matthew Carrigan
1 year
There's nothing quite as satisfying as opening a PR at 6:30pm on a Friday, tagging three of your colleagues to urgently review it and then immediately turning off your computer and walking out the door
4
3
42
@carrigmat
Matthew Carrigan
3 years
Today's @TensorFlow example at @huggingface is translation! A number of pre-trained translation models as well as paired datasets for training exist on our hub, or you can supply your own text pairs and build a never-before-seen translation model!
0
13
40
@carrigmat
Matthew Carrigan
2 years
We have outputs from the @huggingface ESMFold demo! This will be moved to its official home in @huggingface 's example notebooks soon, but for now you can access it here:
2
5
39
@carrigmat
Matthew Carrigan
3 years
I've only been at the company for three months and I still am not at all used to famous people having the faintest idea who I am, this is great.
@fchollet
François Chollet
3 years
Great job @huggingface team, in particular @carrigmat
0
4
31
1
0
40
@carrigmat
Matthew Carrigan
2 years
Fun fact: Thanks to @narsilou , if you hang out in the @huggingface Discord, you can request any audio model from the Hub to hang out in voice chat with you and live-transcribe. Not limited to English!
Tweet media one
1
9
38
@carrigmat
Matthew Carrigan
9 months
We now have full support for Nucleotide Transformer from @instadeepai at @huggingface , so here's a quick thread about DNA, protein, and how to choose between DNA or protein models.
@instadeepai
InstaDeep
10 months
Our Nucleotide Transformers models are now available on @huggingface ! 🤗🧬 This includes the 4 model weights, the pre-training, downstream tasks datasets, and 2 notebooks for task finetuning. 📚 To learn more: 🤗 Check them out!
1
29
77
1
10
38
@carrigmat
Matthew Carrigan
1 month
Next, we're going to get the compressed Command-R+ model and weights in GGUF format. That's here: Download the biggest size you can fit in RAM, with maybe 8-16GB of headroom (so at 64GB, try iq3_m or iq3_s, which are ~48GB). Bigger sizes are split.
1
1
39
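[Editor's note: if you'd rather script the download than click through, huggingface_hub can fetch the file. A sketch where repo_id and filename are assumptions to replace with the actual GGUF repo from the tweet's missing link.]

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo/filename: substitute the real GGUF repo from the Hub
path = hf_hub_download(
    repo_id="someuser/c4ai-command-r-plus-GGUF",
    filename="command-r-plus-iq3_m.gguf",
)
print(path)  # local cache path, ready to hand to llama.cpp
```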
@carrigmat
Matthew Carrigan
2 months
Available on @huggingface right now, including weights from intermediate checkpoints during pretraining. Congratulations to everyone involved!
@pdhsu
Patrick Hsu
2 months
Is DNA all you need? In new work, we report Evo, a genomic foundation model that learns across the fundamental languages of biology: DNA, RNA, and proteins. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale.
39
441
2K
0
8
39
@carrigmat
Matthew Carrigan
1 month
Also, note that the model will get stupider at the smaller quantizations. If you try this at iq2 and it gives you a terrible answer, don't blame me! You may need 128GB of RAM to fit the higher-quality Q6 and Q8 quantizations.
1
0
39
@carrigmat
Matthew Carrigan
1 year
@IRHotTakes Look you don't mention that and we don't mention you guys doing a little imperialism in the Philippines as a treat, we had a deal
1
0
35
@carrigmat
Matthew Carrigan
10 months
Beware closed-source foundations - they look great, but can be surprisingly unsound if you want to build on them. When you clone a model from @huggingface it's stable, and you know your prompt will still work 6-12 months from now.
0
4
37
@carrigmat
Matthew Carrigan
3 years
@AndrewYNg I started with ML by doing your Coursera course through Octave, back in 2011. It feels oddly affecting to come full circle and now be working at @huggingface during this partnership. Thank you for the work you put in way back then, it really changed my life!
1
0
38
@carrigmat
Matthew Carrigan
5 months
I have so much affection for the people out there on huggingface doing unholy frankenmerges and layer splices of LLMs. They aren't even publishing research papers most of the time, it's just pure independent mad science
0
4
36
@carrigmat
Matthew Carrigan
3 years
We recently made a small change with big impacts to the TensorFlow code for 🤗 Transformers. In short: You no longer need to manually specify a loss in most cases when training with Keras. Simply pass your labels in the input dictionary, as shown in the example. 🧵
@huggingface
Hugging Face
3 years
💫TensorFlow 💫 Leveraging per-model loss for Keras training is now super simple. Simply compile() with no loss argument! No more headaches about finding the right loss for your ModelForMaskedLanguageModeling! --- On current master: Keras callbacks to push to the hub 🤩
Tweet media one
2
9
79
1
4
35
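[Editor's note: the usage pattern that change enables. A minimal sketch, assuming the same toy setup as the fine-tuning example earlier in this archive.]

```python
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = dict(tokenizer(["great", "awful"], padding=True, return_tensors="np"))
batch["labels"] = np.array([1, 0])  # labels travel inside the input dict

model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))  # note: no loss argument
model.fit(batch, epochs=1)  # the model's own task loss is used automatically
```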
@carrigmat
Matthew Carrigan
2 years
Gonna get a TNSRFLW one that traces the route this one takes and then executes it 15% faster.
@jachiam0
Joshua Achiam ⚗️
2 years
WHOMST
Tweet media one
4
8
242
2
2
36
@carrigmat
Matthew Carrigan
26 days
Tweet media one
@WizardLM_AI
WizardLM
26 days
🔥Today we are announcing WizardLM-2, our next generation state-of-the-art LLM. New family includes three cutting-edge models: WizardLM-2 8x22B, 70B, and 7B - demonstrates highly competitive performance compared to leading proprietary LLMs. 📙Release Blog:…
Tweet media one
77
269
1K
0
4
35
@carrigmat
Matthew Carrigan
1 year
AI right now is @openai wearing robes and dancing around a cauldron as they perform the ritual to summon their robot god and beget the singularity, and @microsoft being like "Sweet, maybe we can use this to increase our search market share"
1
6
34
@carrigmat
Matthew Carrigan
3 months
$4.5 billion
@wagieeacc
Martin Shkreli (e/acc)
3 months
billion dollar UI:
Tweet media one
65
44
2K
3
2
34
@carrigmat
Matthew Carrigan
5 months
We're now in an era where you can run open-source GPT-3.5-level models locally at full speed and you don't even need a GPU. The future is wild.
1
2
34
@carrigmat
Matthew Carrigan
7 months
Great thread: Transformers have no working memory that doesn't correspond to part of the input, and so they look for redundant parts of the input that they can use for global working memory. Adding true working memory tokens shows really cool results!
@TimDarcet
TimDarcet @ICLR24🇦🇹🥐🎻
7 months
Vision transformers need registers! Or at least, it seems they 𝘸𝘢𝘯𝘵 some… ViTs have artifacts in attention maps. It's due to the model using these patches as "registers". Just add new tokens ("[reg]"):
- no artifacts
- interpretable attention maps 🦖
- improved performance!
Tweet media one
43
327
2K
0
6
33
@carrigmat
Matthew Carrigan
3 years
More new-style Tensorflow examples, this time pre-training a language model from an existing model or from scratch! If you've ever wanted to train GPT-2 on your local PC and you have a few months to sit around staring at a progress bar, now's your chance!
1
9
32
@carrigmat
Matthew Carrigan
2 years
Would like to clarify that I'm just standing slightly closer to the camera and not actually twice the height of everyone else.
@suzatweet
Suzana Ilić
2 years
🤗 in Dublin!
Tweet media one
1
6
153
1
0
32