muhtasham

@Muhtasham9

1,169
Followers
928
Following
213
Media
1,482
Statuses

In my pre-training years "Dude! You should move to SF!" - @dylan522p

Munich not yet SF
Joined March 2020
Pinned Tweet
@Muhtasham9
muhtasham
10 months
w boss
Tweet media one
3
1
55
@Muhtasham9
muhtasham
1 month
A short thread about changes in the Transformer architecture since 2017. Reading papers about LLMs, you often see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original paper? (1/6)
Tweet media one
@Muhtasham9
muhtasham
2 years
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
0
2
14
7
139
895
@Muhtasham9
muhtasham
4 months
Evaluating abstractive summarization remains an open area for improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this…
Tweet media one
10
35
258
@Muhtasham9
muhtasham
1 year
Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created…
1
40
241
@Muhtasham9
muhtasham
2 years
Meta: Multi-tasking while reading about Multi-task NLP models
Tweet media one
3
10
133
@Muhtasham9
muhtasham
2 months
StarCoder2 running on M2 8GB
1
7
91
@Muhtasham9
muhtasham
2 months
DeepMind folks can now steal weights behind APIs: “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” Who wants to do the same for GPT-4?
7
6
82
@Muhtasham9
muhtasham
2 years
@_jasonwei @arankomatsuzaki Might contain a lot of subtle issues, see the Clever Hans effect, which is always hard to debug. The law of leaky abstractions in action, as my supervisor says
2
5
72
@Muhtasham9
muhtasham
1 year
🇺🇸US: Innovate then try to regulate 🇪🇺EU: Regulate then try to innovate
5
17
61
@Muhtasham9
muhtasham
2 months
The 🤗 MLX community is amazing
Quantized StarCoder2 model variants available here:
Small guide on running and training StarCoder2 locally:
pip install -U mlx-lm
To run inference on a quantized model:
python -m mlx_lm.generate --model…
@BigCodeProject
BigCode
2 months
Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open!
Tweet media one
15
192
676
2
13
59
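To make the truncated command above concrete, here is a minimal end-to-end sketch using the mlx-lm Python API; the model id is illustrative (substitute whichever quantized StarCoder2 variant you pulled from the mlx-community hub), and the prompt is just an example:

# pip install -U mlx-lm
# Model id below is a hypothetical placeholder; swap in a real
# mlx-community quantized StarCoder2 variant.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/starcoder2-3b-4bit")
print(generate(model, tokenizer, prompt="def fibonacci(n):", max_tokens=100))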
@Muhtasham9
muhtasham
3 months
Happy to show Pod-Helper:
⚡️ Lightning-speed transcription with Whisper
🔧 Built-in audio repair with good old RoBERTa
🧊 Checks your content's vibe effortlessly
See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
2
4
37
@Muhtasham9
muhtasham
2 years
@tszzl Here is the PDF by @amasad
1
1
35
@Muhtasham9
muhtasham
1 year
If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.
0
10
34
@Muhtasham9
muhtasham
3 months
🚀Now supports real-time streaming
@Muhtasham9
muhtasham
3 months
Happy to show Pod-Helper:
⚡️ Lightning-speed transcription with Whisper
🔧 Built-in audio repair with good old RoBERTa
🧊 Checks your content's vibe effortlessly
See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
2
4
37
2
7
31
@Muhtasham9
muhtasham
2 years
Let's see how different LMs multiply matrices / think 💭 using this Space. GPT-J-6B, I see what you did there 👀 Built using amazing @Gradio Blocks 🧱 APIs; you can also use the new @huggingface 🤗 Community Tab to make suggestions and collaborate
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
2 years
Large Language Models are Zero-Shot Reasoners Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.
Tweet media one
59
573
3K
2
11
29
@Muhtasham9
muhtasham
1 year
Ultimate comeback
Tweet media one
0
4
31
@Muhtasham9
muhtasham
1 month
Using the decoder-only language model LLaMA-2 as an example, let's look at the major architectural improvements for LLMs: — Post-LayerNorm → Pre-LayerNorm (). This makes convergence more stable. Now the process goes in such a way that the…
1
0
30
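To make the Pre-LayerNorm item above concrete, here is a minimal toy block in PyTorch. This is an assumed sketch, not LLaMA's code (LLaMA actually uses RMSNorm rather than LayerNorm): the norm is applied before each sub-layer, so the residual path stays a clean identity, which is what stabilizes convergence.

import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    # Toy Pre-LN transformer block: normalize *before* each sub-layer,
    # add residuals on the un-normalized stream.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual first
        return x + self.ff(self.norm2(x))

x = torch.randn(2, 16, 64)
print(PreLNBlock(64, 4)(x).shape)  # torch.Size([2, 16, 64])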
@Muhtasham9
muhtasham
7 months
📢 Just published: How traditional OS concepts like Branch Prediction & Virtual Memory Paging shape today's Large Language Models ( #LLMs ). LLMs = CPUs of early computing? Feedback welcome! 🔗
0
3
29
@Muhtasham9
muhtasham
1 month
— Absolute position embeddings → RoPE (). The idea is to rotate the token embeddings by an angle that depends on the position. And it works well. In addition, the method opened up a number of modifications to expand the context to very large…
1
0
27
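A minimal RoPE sketch in PyTorch for the item above (assumed dimensions, not the LLaMA implementation): each pair of embedding dims is rotated by an angle that grows with position, so relative offsets show up as angle differences between queries and keys.

import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, head_dim), head_dim even. Rotate each (even, odd)
    # pair of dims by an angle proportional to the token position.
    T, D = x.shape
    pos = torch.arange(T, dtype=torch.float32).unsqueeze(1)       # (T, 1)
    freq = base ** (-torch.arange(0, D, 2, dtype=torch.float32) / D)
    angle = pos * freq                                            # (T, D/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = angle.cos(), angle.sin()
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(16, 64)
print(rope(q).shape)  # torch.Size([16, 64])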
@Muhtasham9
muhtasham
2 months
"Flops are cheap, bandwidth is adding more pins, and latency is physics. Deal with it. "
1
7
26
@Muhtasham9
muhtasham
2 years
Your car gathers a shocking amount of data about you, which you don’t get to see, and the manufacturer sells that to third parties, who use it in ways that are counter to your interests.
0
19
28
@Muhtasham9
muhtasham
6 months
@vboykis He deployed on Friday
0
1
27
@Muhtasham9
muhtasham
1 month
— ReLU activation → SwiGLU (). SwiGLU belongs to the family of Gated Linear Units, which add an element-wise multiplication of matrices, one of which has passed through a sigmoid and thus controls the intensity of the signal…
1
0
23
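A minimal SwiGLU feed-forward sketch in PyTorch for the item above (toy sizes, not LLaMA's exact module). Strictly, SwiGLU gates with SiLU/Swish rather than a plain sigmoid, but the idea is the same: one branch is squashed and multiplies the other element-wise, controlling signal intensity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    # Gated feed-forward: silu(gate(x)) * up(x), then project back down.
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)
print(SwiGLU(512, 1024)(x).shape)  # torch.Size([2, 16, 512])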
@Muhtasham9
muhtasham
2 years
When your model is training and you see live footage of forward and back prop via @weights_biases
0
4
22
@Muhtasham9
muhtasham
1 year
@CisLmu researcher distilling latest paper about instruction tuning
Tweet media one
1
4
21
@Muhtasham9
muhtasham
1 month
— Attention modifications (), for example using one K-V pair of matrices per group of Q matrices at once. This improvement mainly affects the optimization of inference. But there are also a huge number of methods aimed at reducing the quadratic…
2
2
22
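A minimal grouped-query attention sketch in PyTorch for the item above (assumed shapes, not production code): 8 query heads share 2 K/V heads, so the KV cache shrinks 4x while the query heads keep their diversity.

import torch

B, T, D = 1, 16, 64            # batch, sequence length, head dim
n_q_heads, n_kv_heads = 8, 2   # 4 query heads per K/V head
q = torch.randn(B, n_q_heads, T, D)
k = torch.randn(B, n_kv_heads, T, D)
v = torch.randn(B, n_kv_heads, T, D)

# Broadcast each K/V head across its group of query heads
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)   # (B, 8, T, D)
v = v.repeat_interleave(group, dim=1)

attn = (q @ k.transpose(-2, -1)) / D**0.5
out = attn.softmax(dim=-1) @ v          # (B, 8, T, D)
print(out.shape)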
@Muhtasham9
muhtasham
2 months
Except it's called AI engineering now. Come to @aiDotEngineer conf to learn more
@vboykis
vicki
2 months
2013 — 2023: you were hired to do machine learning but do data engineering
2023 — : you were hired to do machine learning but do web dev
21
36
773
3
2
22
@Muhtasham9
muhtasham
11 months
Burning some GPUs after the first @LangChainAI meetup in Munich
Tweet media one
1
3
19
@Muhtasham9
muhtasham
3 years
New SOTA on BCI SSVEP spellers. Our new DNN achieves impressive information transfer rates (ITR) with only 0.4 seconds of stimulation: 265.23 bits/min on the benchmark dataset and 196.59 bits/min on the BETA dataset. Paper: Code: #bci #ssvep
Tweet media one
3
1
15
@Muhtasham9
muhtasham
1 year
It all started with the GPT-2 moment, but only last week we trained an internal model and it did well, and fine-tuning made it 50% better. @amasad
Tweet media one
1
3
17
@Muhtasham9
muhtasham
2 years
the amount of detail one can get from @weights_biases is absolutely electric 💥
Tweet media one
0
2
16
@Muhtasham9
muhtasham
1 year
Thanks for putting this together @nathanbenaich and @NotionHQ
Tweet media one
1
3
17
@Muhtasham9
muhtasham
2 months
MLX weights below
@_lewtun
Lewis Tunstall
2 months
Happy to share the latest Zephyr recipe based on @Google 's Gemma 7B 🔷🔶! Outperforms Gemma 7B Instruct on MT Bench & AGIEval, showing the potential of RLAIF to align this series of base models 💪 🧑‍🍳 I hope this recipe enables the community to create many more fine-tunes!
Tweet media one
3
40
162
0
3
15
@Muhtasham9
muhtasham
1 month
“there's a graveyard of ideas around attention” @TrentonBricken
0
3
15
@Muhtasham9
muhtasham
1 year
Full house 🦜 @full_stack_dl
Tweet media one
0
1
16
@Muhtasham9
muhtasham
2 months
Spotted GPT-5 in the wild
Tweet media one
0
1
15
@Muhtasham9
muhtasham
5 months
@lvwerra Yay, congrats! Also recently got promoted to Sr Random Seed Engineer
1
0
14
@Muhtasham9
muhtasham
1 year
@saahil addressing industry's challenges in scaling MLOps in multimodal settings
Tweet media one
0
3
13
@Muhtasham9
muhtasham
2 years
“The thing that determines whether you’re the product isn’t whether you’re paying for the product: it’s whether market power and regulatory forbearance allow the company to get away with selling you.” —  @doctorow
1
8
15
@Muhtasham9
muhtasham
1 month
machine learning is low-precision linear algebra
while developing the TPU, Google cut the mantissa down from 23 bits to 7 and invented bf16
fast forward to now: we have 1.58-bit LLMs
@simonw
Simon Willison
1 month
Huh, I missed this earlier this month: Microsoft Research used a similar trick for their "1.58-bit" LLM BitNet
4
2
42
0
0
14
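For intuition on that last step, here is a minimal sketch of the ternary ("1.58-bit", since log2(3) ≈ 1.58 bits per weight) quantization described in the BitNet b1.58 paper: scale by the mean absolute weight, then round-and-clip every weight to {-1, 0, +1}. An illustrative reimplementation, not Microsoft's code.

import torch

def ternary_quantize(w: torch.Tensor):
    # BitNet-style absmean quantization: scale by mean |w|,
    # then round each weight to the nearest value in {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    q = (w / scale).round().clamp(-1, 1)
    return q, scale

w = torch.randn(4, 4)
q, s = ternary_quantize(w)
print(q)       # entries in {-1., 0., 1.}
print(q * s)   # dequantized approximation of w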
@Muhtasham9
muhtasham
2 years
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
0
2
14
@Muhtasham9
muhtasham
4 months
Supporting local compute pfp by @evanjconrad
Tweet media one
3
0
14
@Muhtasham9
muhtasham
3 months
@swyx Shameless plug but this would make it easier to compare
@Muhtasham9
muhtasham
4 months
Evaluating abstractive summarization remains an open area for improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this…
Tweet media one
10
35
258
1
0
13
@Muhtasham9
muhtasham
5 months
What's the bottleneck of your GPU floor? @anyscalecompute meetup
Tweet media one
0
1
12
@Muhtasham9
muhtasham
2 months
Uncle jokes followed by the biggest GPU, heck yeah #NVIDIA #GTC24
Tweet media one
1
2
12
@Muhtasham9
muhtasham
6 months
Top recommendation: a beautifully written, in-depth explanation of these concepts, which I failed to do in my initial blog. High-quality tokens; future LLMs can boost their reasoning and get a sense of humor from @charles_irl if this blog ends up in their dataset
@charles_irl
Charles 🎉 Frye
6 months
PagedAttention, Virtual Context, Speculative Decoding, Register Tokens: the last year has seen many ideas from systems programming applied to LLMs. Not many folks live in that intersection, so I wrote an explainer post to make them a bit more accessible!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
18
301
2K
1
3
11
@Muhtasham9
muhtasham
1 year
Tweet media one
2
1
11
@Muhtasham9
muhtasham
2 months
It's here
@NVIDIAAIDev
NVIDIA AI Developer
2 months
Accelerate your coding tasks, from code completion to code summarization with StarCoder2, the latest state-of-the-art, open code #LLM built by @HuggingFace , @ServiceNow , and NVIDIA. Learn more 👉
1
37
131
1
0
11
@Muhtasham9
muhtasham
1 year
iCoffe Pro Max
Tweet media one
2
1
10
@Muhtasham9
muhtasham
6 months
Sharing a @huggingface collection of old models, from RoBERTa all the way to GPT-2, pre-trained and fine-tuned on the Tajik language. Stay tuned for more to come: mistral-7b, llama2-7b, and others on the way
1
0
10
@Muhtasham9
muhtasham
2 months
Great tune! Smooth run on M2 8GB
python -m mlx_lm.generate --model mlx-community/OpenCodeInterpreter-SC2-3B-4bit --prompt "Write a quick sort in C++" --temp 0.0 --colorize
@xiangyue96
Xiang Yue
2 months
🌟 Big thanks for making StarCoder 2 open-source! 🚀 We've swiftly finetuned it on our Code-Feedback instruction dataset, the dataset behind OpenCodeInterpreter. 📈 HumanEval Scores are boosted ~30%. 3B Model: from 31.7 to 67.1! 7B Model: from 35.4 to 75.6! 🛠️ CodeFeedback has…
Tweet media one
42
64
265
0
3
10
@Muhtasham9
muhtasham
2 years
Reminder: join the amazing Transformers lecture by @giffmana tomorrow
@MunichNlp
Munich🥨NLP
2 years
🥨NEW EVENT🥨 Transformers in all their glorious detail: @GoogleAI Brain Team scientist Lucas Beyer @giffmana will explain the currently most dominant deep learning architecture for natural language processing in an exclusive event with @MunichNlp . Details below👇
Tweet media one
1
3
11
0
4
10
@Muhtasham9
muhtasham
3 months
Will try to feed 10M tokens over the weekend
Tweet media one
1
0
9
@Muhtasham9
muhtasham
7 months
Tweet media one
0
2
10
@Muhtasham9
muhtasham
8 months
Transformers everywhere…
Tweet media one
0
0
9
@Muhtasham9
muhtasham
2 months
is this the company motto? smh @EMostaque stay strong king
Tweet media one
@amasad
Amjad Masad
2 months
Corporate AI drama is accelerating faster than AI itself.
Tweet media one
39
90
1K
0
0
9
@Muhtasham9
muhtasham
4 months
Patterns from the CIDR database conference:
Stanford - turns out databases are actually LLMs and every problem is an ML problem.
Berkeley - let me solve some NP-hard-ish algorithmic problem using LP and other techniques that might find application 50 years later.
CMU - let me…
0
2
9
@Muhtasham9
muhtasham
7 months
@vboykis Also rich
0
0
0
@Muhtasham9
muhtasham
2 months
Tweet media one
0
0
9
@Muhtasham9
muhtasham
2 years
𝙏𝙝𝙧𝙚𝙚 𝙩𝙝𝙞𝙣𝙜𝙨 𝙚𝙫𝙚𝙧𝙮𝙤𝙣𝙚 𝙨𝙝𝙤𝙪𝙡𝙙 𝙠𝙣𝙤𝙬 𝙖𝙗𝙤𝙪𝙩 𝙑𝙞𝙨𝙞𝙤𝙣 𝙏𝙧𝙖𝙣𝙨𝙛𝙤𝙧𝙢𝙚𝙧𝙨 by @MetaAI Summary thread 🧵
1
1
8
@Muhtasham9
muhtasham
3 months
🟩
@Muhtasham9
muhtasham
10 months
w boss
Tweet media one
3
1
55
0
0
9
@Muhtasham9
muhtasham
1 year
💫StarCoder, which was released today by @BigCodeProject , is a prime example of Open Source outcompeting Big Tech. Shout out to @lvwerra @harmdevries77 @Thom_Wolf @huggingface @ServiceNowRSRCH
@dylan522p
Dylan Patel
1 year
Google "We Have No Moat, And Neither Does OpenAI" Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI This is the opinion of one Googler, we do not agree, simply sharing. $GOOGL $MSFT $META $AI $NVDA $AMZN $AAPL
31
124
690
0
0
9
@Muhtasham9
muhtasham
1 year
Beating OpenAI's large-v2 with a fine-tuned *medium* model: from 85.8 WER down to 23.1 WER. Special thanks to @LambdaAPI and the @huggingface team, especially @sanchitgandhi99 and @reach_vb
0
0
8
@Muhtasham9
muhtasham
7 months
Based @ykilcher at @tum .ai summit
Tweet media one
2
0
8
@Muhtasham9
muhtasham
1 year
Nice here. But have you ever been to @TU_Muenchen ?
Tweet media one
1
0
8
@Muhtasham9
muhtasham
1 year
How to get rich from LLMs 🤑 This made my day @full_stack_dl
Tweet media one
0
0
8
@Muhtasham9
muhtasham
1 month
@jtvhk bruhh they should just outsource to @sfcompute
0
0
8
@Muhtasham9
muhtasham
1 year
Kinda like this emoji 🌉 but with crescent 🌙
Tweet media one
0
0
7
@Muhtasham9
muhtasham
2 months
Image and prompt by yours truly. @marksaroufim 's teaching style is like a casual conversation with a senior engineer on your team
@neurosp1ke
Andreas Köpf
2 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
107
1
1
8
@Muhtasham9
muhtasham
1 year
@MattNiessner @synthesiaIO Forget AutoGPT, AutoProf is the real deal
0
0
8
@Muhtasham9
muhtasham
2 months
@ClementDelangue Yeah your runway should be enough to do this
1
0
8
@Muhtasham9
muhtasham
1 year
“LLMs are not databases, they are not up to date; think of them as a reasoning engine, and some sort of retrieval will solve the issue of up-to-date knowledge” @sama
Tweet media one
0
0
3
@Muhtasham9
muhtasham
9 months
Lot of wisdom from @kagglingdieter
Tweet media one
2
0
8
@Muhtasham9
muhtasham
2 months
Super model MLX weights below
@abacaj
anton
2 months
Release phi-2-super. Fine tuned over phi-2 and aligned with cDPO. MT-bench of 7.1875, surpassing many larger models. Humaneval score 60.98%, Humaneval-Plus 54.88%
Tweet media one
47
62
566
0
2
8
@Muhtasham9
muhtasham
2 months
Took some time off web-sockets
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
8
@Muhtasham9
muhtasham
2 months
Just noticed a slick Apple Podcasts feature with search: now I can just type instead of rewinding to find the right segment. And another banger from @latentspacepod @FanaHOVA @swyx
0
0
6
@Muhtasham9
muhtasham
1 year
With the swarm of users experimenting with @bing Chat aka Sydney, I feel vibes similar to the “OMG LaMDA is sentient” guy. Again, many things can be said, but before folks start posting Terminator images let me leave this here …
Tweet media one
1
1
7
@Muhtasham9
muhtasham
1 year
Benedikt sharing the learnings from 5 data science competitions for recommender systems he did over the last 3 years.
Tweet media one
0
0
7
@Muhtasham9
muhtasham
7 months
bf16 >> fp16 more numerically stable in practice
0
0
7
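A two-line demonstration of why: fp16 spends its bits on mantissa (only a 5-bit exponent, max normal value 65504), so large activations overflow to inf, while bf16 keeps fp32's 8-bit exponent and merely loses precision. Toy values for illustration:

import torch

x = torch.tensor(70000.0)
print(x.to(torch.float16))   # inf -- 70000 exceeds fp16's max of 65504
print(x.to(torch.bfloat16))  # 70144. -- coarser, but finite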
@Muhtasham9
muhtasham
8 months
Thanks @dk21 and @jefrankle for this amazing session, can’t wait for upcoming sessions
@weights_biases
Weights & Biases
8 months
We are LIVE🎉 Tune in for Lesson 3 of the Training & Fine-Tuning LLMs Course with @MosaicML 📚 You will learn data scaling laws to construct custom datasets, & dive deep into data curation, ethics, storage, & streaming best practices. Stream now🔗
0
3
6
0
1
7
@Muhtasham9
muhtasham
2 years
Found the famous book's cover-page star while hiking today @aureliengeron
Tweet media one
0
0
7
@Muhtasham9
muhtasham
9 months
@EMostaque replicate?
0
0
4
@Muhtasham9
muhtasham
3 months
Germany is probably the only country where you get invited to dinner by a VC and the day after get asked to PayPal the amount. Or maybe the recession is hitting everyone hard
1
0
7
@Muhtasham9
muhtasham
9 months
TIL: @lexfridman hails from Buston, Tajikistan 🇹🇯 When our paths cross, I'll be ready with a friendly, "What's up, homie?"
1
1
7
@Muhtasham9
muhtasham
6 months
@amasad @perplexity_ai @googlecloud Damn time to switch all dev to iPad with Replit Core
0
0
2
@Muhtasham9
muhtasham
2 years
@bradneuberg @tszzl @amasad Should be from the Facebook IPO, so around 2012
1
0
7
@Muhtasham9
muhtasham
1 year
@jeremyphoward
Jeremy Howard
1 year
"Any model made available in the EU, without first passing extensive, and expensive, licensing, would subject companies to massive fines of the greater of €20,000,000 or 4% of worldwide revenue. Opensource developers, and hosting services such as GitHub... would be liable"
102
234
1K
0
0
3
@Muhtasham9
muhtasham
2 months
this guy is bending trajectories in latent space
@isidentical
batuhan (e/single)
2 months
@altryne @playground_ai @alisabets @EMostaque down to 1.73sec/image on A100. i really need to sleep but can't stop, am so competitive.
0
0
10
0
1
6
@Muhtasham9
muhtasham
5 months
🍂🍁
Tweet media one
0
0
6
@Muhtasham9
muhtasham
9 months
@Teknium1 prefill: 46.4 tok/s, decode: 7.2 tok/s iPhone 14 pro
0
0
6