muhtasham

@Muhtasham9

1,169
Followers
928
Following
213
Media
1,482
Statuses

In my pre-training years "Dude! You should move to SF!" - @dylan522p

Munich not yet SF
Joined March 2020
Pinned Tweet
@Muhtasham9
muhtasham
10 months
w boss
Tweet media one
3
1
55
@Muhtasham9
muhtasham
1 month
A short thread about changes in the Transformer architecture since 2017. Reading papers about LLMs, you often see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original paper? (1/6)
Tweet media one
@Muhtasham9
muhtasham
2 years
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
0
2
14
7
139
895
@Muhtasham9
muhtasham
4 months
Evaluating abstractive summarization remains an open area for improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this…
Tweet media one
10
35
258
@Muhtasham9
muhtasham
1 year
Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created…
1
40
241
@Muhtasham9
muhtasham
2 years
Meta: Multi-tasking while reading about Multi-task NLP models
Tweet media one
3
10
133
@Muhtasham9
muhtasham
2 months
StarCoder2 running on M2 8GB
1
7
91
@Muhtasham9
muhtasham
2 months
DeepMind folks can now steal weights behind APIs: “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” Who wants to do the same for GPT-4?
7
6
82
@Muhtasham9
muhtasham
2 years
@_jasonwei @arankomatsuzaki Might contain a lot of subtle issues, see the Clever Hans effect, which is always hard to debug. The law of leaky abstractions in action, as my supervisor says
2
5
72
@Muhtasham9
muhtasham
1 year
🇺🇸US: Innovate then try to regulate 🇪🇺EU: Regulate then try to innovate
5
17
61
@Muhtasham9
muhtasham
2 months
The 🤗 MLX community is amazing
Quantized StarCoder2 model variants available here:
Small guide on running and training StarCoder2 locally:
pip install -U mlx-lm
To run inference on a quantized model:
python -m mlx_lm.generate --model…
@BigCodeProject
BigCode
2 months
Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open!
Tweet media one
15
192
676
2
13
59
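To make the truncated command above concrete, here is a minimal end-to-end sketch using the mlx-lm Python API; the model id is illustrative (substitute whichever quantized StarCoder2 variant you pulled from the mlx-community hub), and the prompt is just an example:

# pip install -U mlx-lm
# Model id below is a hypothetical placeholder; swap in a real
# mlx-community quantized StarCoder2 variant.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/starcoder2-3b-4bit")
print(generate(model, tokenizer, prompt="def fibonacci(n):", max_tokens=100))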
@Muhtasham9
muhtasham
3 months
Happy to show Pod-Helper:
⚡️ Lightning-speed transcription with Whisper
🔧 Built-in audio repair with good old RoBERTa
🧊 Checks your content's vibe effortlessly
See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
2
4
37
@Muhtasham9
muhtasham
2 years
@tszzl Here is the PDF by @amasad
1
1
35
@Muhtasham9
muhtasham
1 year
If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.
0
10
34
@Muhtasham9
muhtasham
3 months
🚀Now supports real-time streaming
@Muhtasham9
muhtasham
3 months
Happy to show Pod-Helper:
⚡️ Lightning-speed transcription with Whisper
🔧 Built-in audio repair with good old RoBERTa
🧊 Checks your content's vibe effortlessly
See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
2
4
37
2
7
31
@Muhtasham9
muhtasham
2 years
Let's see how different LMs multiply matrices / think 💭 using this Space. GPT-J-6B, I see what you did there 👀 Built using amazing @Gradio Blocks 🧱 APIs; you can also use the new @huggingface 🤗 Community Tab to make suggestions and collaborate
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
2 years
Large Language Models are Zero-Shot Reasoners Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.
Tweet media one
59
573
3K
2
11
29
@Muhtasham9
muhtasham
1 year
Ultimate comeback
Tweet media one
0
4
31
@Muhtasham9
muhtasham
1 month
Using the decoder-only language model LLaMA-2 as an example, let's look at the major architectural improvements for LLMs: — Post-LayerNorm → Pre-LayerNorm (). This makes convergence more stable. Now the process goes in such a way that the…
1
0
30
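To make the Pre-LayerNorm item above concrete, here is a minimal toy block in PyTorch. This is an assumed sketch, not LLaMA's code (LLaMA actually uses RMSNorm rather than LayerNorm): the norm is applied before each sub-layer, so the residual path stays a clean identity, which is what stabilizes convergence.

import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    # Toy Pre-LN transformer block: normalize *before* each sub-layer,
    # add residuals on the un-normalized stream.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual first
        return x + self.ff(self.norm2(x))

x = torch.randn(2, 16, 64)
print(PreLNBlock(64, 4)(x).shape)  # torch.Size([2, 16, 64])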
@Muhtasham9
muhtasham
7 months
📢 Just published: How traditional OS concepts like Branch Prediction & Virtual Memory Paging shape today's Large Language Models ( #LLMs ). LLMs = CPUs of early computing? Feedback welcome! 🔗
0
3
29
@Muhtasham9
muhtasham
1 month
— Absolute position embeddings → RoPE (). The idea is to rotate the token embeddings by an angle that depends on the position. And it works well. In addition, the method opened up a number of modifications to expand the context to very large…
1
0
27
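A minimal RoPE sketch in PyTorch for the item above (assumed dimensions, not the LLaMA implementation): each pair of embedding dims is rotated by an angle that grows with position, so relative offsets show up as angle differences between queries and keys.

import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, head_dim), head_dim even. Rotate each (even, odd)
    # pair of dims by an angle proportional to the token position.
    T, D = x.shape
    pos = torch.arange(T, dtype=torch.float32).unsqueeze(1)       # (T, 1)
    freq = base ** (-torch.arange(0, D, 2, dtype=torch.float32) / D)
    angle = pos * freq                                            # (T, D/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = angle.cos(), angle.sin()
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(16, 64)
print(rope(q).shape)  # torch.Size([16, 64])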
@Muhtasham9
muhtasham
2 months
"Flops are cheap, bandwidth is adding more pins, and latency is physics. Deal with it. "
1
7
26
@Muhtasham9
muhtasham
2 years
Your car gathers a shocking amount of data about you, which you don’t get to see, and the manufacturer sells that to third parties, who use it in ways that are counter to your interests.
0
19
28
@Muhtasham9
muhtasham
6 months
@vboykis He deployed on Friday
0
1
27
@Muhtasham9
muhtasham
1 month
— ReLU activation → SwiGLU (). SwiGLU belongs to the family of Gated Linear Units, which add an element-wise multiplication of matrices, one of which has passed through a sigmoid and thus controls the intensity of the signal…
1
0
23
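A minimal SwiGLU feed-forward sketch in PyTorch for the item above (toy sizes, not LLaMA's exact module). Strictly, SwiGLU gates with SiLU/Swish rather than a plain sigmoid, but the idea is the same: one branch is squashed and multiplies the other element-wise, controlling signal intensity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    # Gated feed-forward: silu(gate(x)) * up(x), then project back down.
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)
print(SwiGLU(512, 1024)(x).shape)  # torch.Size([2, 16, 512])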
@Muhtasham9
muhtasham
2 years
When your model is training and you see live footage of forward and back prop via @weights_biases
0
4
22
@Muhtasham9
muhtasham
1 year
@CisLmu researcher distilling latest paper about instruction tuning
Tweet media one
1
4
21
@Muhtasham9
muhtasham
1 month
— Attention modifications (), for example using one K-V pair of matrices per group of Q matrices at once. This improvement mainly affects the optimization of inference. But there are also a huge number of methods aimed at reducing the quadratic…
2
2
22
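A minimal grouped-query attention sketch in PyTorch for the item above (assumed shapes, not production code): 8 query heads share 2 K/V heads, so the KV cache shrinks 4x while the query heads keep their diversity.

import torch

B, T, D = 1, 16, 64            # batch, sequence length, head dim
n_q_heads, n_kv_heads = 8, 2   # 4 query heads per K/V head
q = torch.randn(B, n_q_heads, T, D)
k = torch.randn(B, n_kv_heads, T, D)
v = torch.randn(B, n_kv_heads, T, D)

# Broadcast each K/V head across its group of query heads
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)   # (B, 8, T, D)
v = v.repeat_interleave(group, dim=1)

attn = (q @ k.transpose(-2, -1)) / D**0.5
out = attn.softmax(dim=-1) @ v          # (B, 8, T, D)
print(out.shape)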
@Muhtasham9
muhtasham
2 months
Except it's called AI engineering now. Come to @aiDotEngineer conf to learn more
@vboykis
vicki
2 months
2013 — 2023: you were hired to do machine learning but do data engineering
2023 — : you were hired to do machine learning but do web dev
21
36
773
3
2
22
@Muhtasham9
muhtasham
11 months
Burning some GPUs after the first @LangChainAI meetup in Munich
Tweet media one
1
3
19
@Muhtasham9
muhtasham
3 years
New SOTA on BCI SSVEP spellers. Our new DNN achieves impressive information transfer rates (ITR) with only 0.4 seconds of stimulation: 265.23 bits/min on the benchmark dataset and 196.59 bits/min on the BETA dataset. Paper: Code: #bci #ssvep
Tweet media one
3
1
15
@Muhtasham9
muhtasham
1 year
It all started with the GPT-2 moment, but only last week we trained an internal model and it did well, and fine-tuning made it 50% better. @amasad
Tweet media one
1
3
17
@Muhtasham9
muhtasham
2 years
the amount of detail one can get from @weights_biases is absolutely electric 💥
Tweet media one
0
2
16
@Muhtasham9
muhtasham
1 year
Thanks for putting this together @nathanbenaich and @NotionHQ
Tweet media one
1
3
17
@Muhtasham9
muhtasham
2 months
MLX weights below
@_lewtun
Lewis Tunstall
2 months
Happy to share the latest Zephyr recipe based on @Google 's Gemma 7B 🔷🔶! Outperforms Gemma 7B Instruct on MT Bench & AGIEval, showing the potential of RLAIF to align this series of base models 💪 🧑‍🍳 I hope this recipe enables the community to create many more fine-tunes!
Tweet media one
3
40
162
0
3
15
@Muhtasham9
muhtasham
1 month
“there's a graveyard of ideas around attention” @TrentonBricken
0
3
15
@Muhtasham9
muhtasham
1 year
Full house 🦜 @full_stack_dl
Tweet media one
0
1
16
@Muhtasham9
muhtasham
2 months
Spotted GPT-5 in the wild
Tweet media one
0
1
15
@Muhtasham9
muhtasham
5 months
@lvwerra Yay, congrats! Also recently got promoted to Sr Random Seed Engineer
1
0
14
@Muhtasham9
muhtasham
1 year
@saahil addressing industry's challenges in scaling MLOps in multimodal settings
Tweet media one
0
3
13
@Muhtasham9
muhtasham
2 years
“The thing that determines whether you’re the product isn’t whether you’re paying for the product: it’s whether market power and regulatory forbearance allow the company to get away with selling you.” —  @doctorow
1
8
15
@Muhtasham9
muhtasham
1 month
machine learning is low-precision linear algebra
while developing the TPU, Google cut the mantissa down from 23 bits to 7 and invented bf16
fast forward to now: we have 1.58-bit LLMs
@simonw
Simon Willison
1 month
Huh, I missed this earlier this month: Microsoft Research used a similar trick for their "1.58-bit" LLM BitNet
4
2
42
0
0
14
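For intuition on that last step, here is a minimal sketch of the ternary ("1.58-bit", since log2(3) ≈ 1.58 bits per weight) quantization described in the BitNet b1.58 paper: scale by the mean absolute weight, then round-and-clip every weight to {-1, 0, +1}. An illustrative reimplementation, not Microsoft's code.

import torch

def ternary_quantize(w: torch.Tensor):
    # BitNet-style absmean quantization: scale by mean |w|,
    # then round each weight to the nearest value in {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    q = (w / scale).round().clamp(-1, 1)
    return q, scale

w = torch.randn(4, 4)
q, s = ternary_quantize(w)
print(q)       # entries in {-1., 0., 1.}
print(q * s)   # dequantized approximation of w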
@Muhtasham9
muhtasham
2 years
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
0
2
14
@Muhtasham9
muhtasham
4 months
Supporting local compute pfp by @evanjconrad
Tweet media one
3
0
14
@Muhtasham9
muhtasham
3 months
@swyx Shameless plug but this would make it easier to compare
@Muhtasham9
muhtasham
4 months
Evaluating abstractive summarization remains an open area for improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this…
Tweet media one
10
35
258
1
0
13
@Muhtasham9
muhtasham
5 months
What's the bottleneck of your GPU floor? @anyscalecompute meetup
Tweet media one
0
1
12
@Muhtasham9
muhtasham
2 months
Uncle jokes followed by the biggest GPU, heck yeah #NVIDIA #GTC24
Tweet media one
1
2
12
@Muhtasham9
muhtasham
6 months
Top recommendation: a beautifully written, in-depth explanation of these concepts, which I failed to do in my initial blog. High-quality tokens; future LLMs can boost their reasoning and get a sense of humor from @charles_irl if this blog ends up in their dataset
@charles_irl
Charles 🎉 Frye
6 months
PagedAttention, Virtual Context, Speculative Decoding, Register Tokens: the last year has seen many ideas from systems programming applied to LLMs. Not many folks live in that intersection, so I wrote an explainer post to make them a bit more accessible!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
18
301
2K
1
3
11
@Muhtasham9
muhtasham
1 year
Tweet media one
2
1
11
@Muhtasham9
muhtasham
2 months
It's here
@NVIDIAAIDev
NVIDIA AI Developer
2 months
Accelerate your coding tasks, from code completion to code summarization with StarCoder2, the latest state-of-the-art, open code #LLM built by @HuggingFace , @ServiceNow , and NVIDIA. Learn more 👉
1
37
131
1
0
11
@Muhtasham9
muhtasham
1 year
iCoffe Pro Max
Tweet media one
2
1
10
@Muhtasham9
muhtasham
6 months
Sharing a @huggingface collection of old models, from RoBERTa all the way to GPT-2, pre-trained and fine-tuned on the Tajik language. Stay tuned for more to come: mistral-7b, llama2-7b, and others on the way
1
0
10
@Muhtasham9
muhtasham
2 months
Great tune! Smooth run on M2 8GB
python -m mlx_lm.generate --model mlx-community/OpenCodeInterpreter-SC2-3B-4bit --prompt "Write a quick sort in C++" --temp 0.0 --colorize
@xiangyue96
Xiang Yue
2 months
🌟 Big thanks for making StarCoder 2 open-source! 🚀 We've swiftly finetuned it on our Code-Feedback instruction dataset, the dataset behind OpenCodeInterpreter. 📈 HumanEval Scores are boosted ~30%. 3B Model: from 31.7 to 67.1! 7B Model: from 35.4 to 75.6! 🛠️ CodeFeedback has…
Tweet media one
42
64
265
0
3
10
@Muhtasham9
muhtasham
2 years
Reminder: join the amazing Transformers lecture by @giffmana tomorrow
@MunichNlp
Munich🥨NLP
2 years
🥨NEW EVENT🥨 Transformers in all their glorious detail: @GoogleAI Brain Team scientist Lucas Beyer @giffmana will explain the currently most dominant deep learning architecture for natural language processing in an exclusive event with @MunichNlp . Details below👇
Tweet media one
1
3
11
0
4
10
@Muhtasham9
muhtasham
3 months
Will try to feed 10M tokens over the weekend
Tweet media one
1
0
9
@Muhtasham9
muhtasham
7 months
Tweet media one
0
2
10
@Muhtasham9
muhtasham
8 months
Transformers everywhere…
Tweet media one
0
0
9
@Muhtasham9
muhtasham
2 months
is this the company motto? smh @EMostaque stay strong king
Tweet media one
@amasad
Amjad Masad
2 months
Corporate AI drama is accelerating faster than AI itself.
Tweet media one
39
90
1K
0
0
9
@Muhtasham9
muhtasham
4 months
Patterns from the CIDR database conference:
Stanford - turns out databases are actually LLMs and every problem is an ML problem.
Berkeley - let me solve some NP-hard-ish algorithmic problem using LP and other techniques that might find application 50 years later.
CMU - let me…
0
2
9
@Muhtasham9
muhtasham
7 months
@vboykis Also rich
0
0
0
@Muhtasham9
muhtasham
2 months
Tweet media one
0
0
9
@Muhtasham9
muhtasham
2 years
𝙏𝙝𝙧𝙚𝙚 𝙩𝙝𝙞𝙣𝙜𝙨 𝙚𝙫𝙚𝙧𝙮𝙤𝙣𝙚 𝙨𝙝𝙤𝙪𝙡𝙙 𝙠𝙣𝙤𝙬 𝙖𝙗𝙤𝙪𝙩 𝙑𝙞𝙨𝙞𝙤𝙣 𝙏𝙧𝙖𝙣𝙨𝙛𝙤𝙧𝙢𝙚𝙧𝙨 by @MetaAI Summary thread 🧵
1
1
8
@Muhtasham9
muhtasham
3 months
🟩
@Muhtasham9
muhtasham
10 months
w boss
Tweet media one
3
1
55
0
0
9
@Muhtasham9
muhtasham
1 year
💫StarCoder, which was released today by @BigCodeProject , is a prime example of Open Source outcompeting Big Tech. Shout out to @lvwerra @harmdevries77 @Thom_Wolf @huggingface @ServiceNowRSRCH
@dylan522p
Dylan Patel
1 year
Google "We Have No Moat, And Neither Does OpenAI" Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI This is the opinion of one Googler, we do not agree, simply sharing. $GOOGL $MSFT $META $AI $NVDA $AMZN $AAPL
31
124
690
0
0
9
@Muhtasham9
muhtasham
1 year
Beating OpenAI's large-v2 with a fine-tuned *medium* model: from 85.8 WER down to 23.1 WER. Special thanks to @LambdaAPI and the @huggingface team, especially @sanchitgandhi99 and @reach_vb
0
0
8
@Muhtasham9
muhtasham
7 months
Based @ykilcher at @tum .ai summit
Tweet media one
2
0
8
@Muhtasham9
muhtasham
1 year
Nice here. But have you ever been to @TU_Muenchen ?
Tweet media one
1
0
8
@Muhtasham9
muhtasham
1 year
How to get rich from LLMs 🤑 This made my day @full_stack_dl
Tweet media one
0
0
8
@Muhtasham9
muhtasham
1 month
@jtvhk bruhh they should just outsource to @sfcompute
0
0
8
@Muhtasham9
muhtasham
1 year
Kinda like this emoji 🌉 but with crescent 🌙
Tweet media one
0
0
7
@Muhtasham9
muhtasham
2 months
Image and prompt by yours truly. @marksaroufim 's teaching style is like a casual conversation with a senior engineer on your team
@neurosp1ke
Andreas Köpf
2 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
107
1
1
8
@Muhtasham9
muhtasham
1 year
@MattNiessner @synthesiaIO Forget AutoGPT, AutoProf is the real deal
0
0
8
@Muhtasham9
muhtasham
2 months
@ClementDelangue Yeah your runway should be enough to do this
1
0
8
@Muhtasham9
muhtasham
1 year
“LLMs are not databases, they are not up to date; think of them as a reasoning engine, and some sort of retrieval will solve the issue of up-to-date knowledge” @sama
Tweet media one
0
0
3
@Muhtasham9
muhtasham
9 months
Lot of wisdom from @kagglingdieter
Tweet media one
2
0
8
@Muhtasham9
muhtasham
2 months
Super model MLX weights below
@abacaj
anton
2 months
Release phi-2-super. Fine tuned over phi-2 and aligned with cDPO. MT-bench of 7.1875, surpassing many larger models. Humaneval score 60.98%, Humaneval-Plus 54.88%
Tweet media one
47
62
566
0
2
8
@Muhtasham9
muhtasham
2 months
Took some time off web-sockets
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
8
@Muhtasham9
muhtasham
2 months
Just noticed a slick Apple Podcasts feature with search: now I can just type instead of rewinding to find the right segment. And another banger from @latentspacepod @FanaHOVA @swyx
0
0
6
@Muhtasham9
muhtasham
1 year
With the swarm of users experimenting with @bing Chat aka Sydney, I feel vibes similar to the “OMG LaMDA is sentient” guy. Again, many things can be said, but before folks start posting Terminator images let me leave this here …
Tweet media one
1
1
7
@Muhtasham9
muhtasham
1 year
Benedikt sharing the learnings from 5 data science competitions for recommender systems he did over the last 3 years.
Tweet media one
0
0
7
@Muhtasham9
muhtasham
7 months
bf16 >> fp16 more numerically stable in practice
0
0
7
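A two-line demonstration of why: fp16 spends its bits on mantissa (only a 5-bit exponent, max normal value 65504), so large activations overflow to inf, while bf16 keeps fp32's 8-bit exponent and merely loses precision. Toy values for illustration:

import torch

x = torch.tensor(70000.0)
print(x.to(torch.float16))   # inf -- 70000 exceeds fp16's max of 65504
print(x.to(torch.bfloat16))  # 70144. -- coarser, but finite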
@Muhtasham9
muhtasham
8 months
Thanks @dk21 and @jefrankle for this amazing session, can’t wait for upcoming sessions
@weights_biases
Weights & Biases
8 months
We are LIVE🎉 Tune in for Lesson 3 of the Training & Fine-Tuning LLMs Course with @MosaicML 📚 You will learn data scaling laws to construct custom datasets, & dive deep into data curation, ethics, storage, & streaming best practices. Stream now🔗
0
3
6
0
1
7
@Muhtasham9
muhtasham
2 years
Found the famous book's cover-page star while hiking today @aureliengeron
Tweet media one
0
0
7
@Muhtasham9
muhtasham
9 months
@EMostaque replicate?
0
0
4
@Muhtasham9
muhtasham
3 months
Germany is probably the only country where you get invited to dinner by a VC and the day after get asked to PayPal the amount. Or maybe the recession is hitting everyone hard
1
0
7
@Muhtasham9
muhtasham
9 months
TIL: @lexfridman hails from Buston, Tajikistan 🇹🇯 When our paths cross, I'll be ready with a friendly, "What's up, homie?"
1
1
7
@Muhtasham9
muhtasham
6 months
@amasad @perplexity_ai @googlecloud Damn time to switch all dev to iPad with Replit Core
0
0
2
@Muhtasham9
muhtasham
2 years
@bradneuberg @tszzl @amasad Should be from the Facebook IPO, so around 2012
1
0
7
@Muhtasham9
muhtasham
1 year
@jeremyphoward
Jeremy Howard
1 year
"Any model made available in the EU, without first passing extensive, and expensive, licensing, would subject companies to massive fines of the greater of €20,000,000 or 4% of worldwide revenue. Opensource developers, and hosting services such as GitHub... would be liable"
102
234
1K
0
0
3
@Muhtasham9
muhtasham
2 months
this guy is bending trajectories in latent space
@isidentical
batuhan (e/single)
2 months
@altryne @playground_ai @alisabets @EMostaque down to 1.73sec/image on A100. i really need to sleep but can't stop, am so competitive.
0
0
10
0
1
6
@Muhtasham9
muhtasham
5 months
🍂🍁
Tweet media one
0
0
6
@Muhtasham9
muhtasham
9 months
@Teknium1 prefill: 46.4 tok/s, decode: 7.2 tok/s iPhone 14 pro
0
0
6