Andriy Burkov

@burkov

19,678
Followers
142
Following
1,949
Media
8,397
Statuses

Author of 📖 The Hundred-Page Machine Learning Book and the 📖 Machine Learning Engineering book

Québec, Canada
Joined June 2009
Pinned Tweet
@burkov
Andriy Burkov
9 months
You can now ask questions to my books:
Tweet media one
6
15
109
@burkov
Andriy Burkov
3 years
To say "machine learning is just statistics" is as stupid as saying that physics is just mathematics.
311
561
11K
@burkov
Andriy Burkov
19 days
Meta is doing what OpenAI was funded to do, but Zuck is somehow the bad guy while Altman is a visionary.
220
500
6K
@burkov
Andriy Burkov
2 months
If today's Google were the Google of 15 years ago, ChatGPT would have been invented at Google while OpenAI would still be a catching-up non-profit. Today's Google is a shadow of what the company once was: the one that invented an infinite-size email inbox, reinvented online maps,…
212
432
6K
@burkov
Andriy Burkov
4 months
GPT-4 is officially annoying. You ask it to generate 100 entities. It generates 10 and says "I generated only 10. Now you can continue by yourself in the same way." You change the prompt by adding "I will not accept fewer than 100 entities." It generates 20 and says: "I stopped…
556
241
5K
@burkov
Andriy Burkov
2 months
Anyone who has tried to read a scientific article at least once knows that English cannot be used to clearly convey ideas. Most people, including the brightest of scientists, have a hard time writing clearly. I'm sure Nvidia's CEO knows that too. What he is doing here is he is…
@Carnage4Life
Dare Obasanjo🐀
2 months
Jensen Huang, CEO of Nvidia, argues that we should stop saying kids should learn to code. He argues the rise of AI means we can replace programming languages with human language prompts thus enabling everyone to be a programmer. AI will kill coding.
1K
5K
21K
296
356
4K
@burkov
Andriy Burkov
2 months
1. Finetune an LLM on your training data. 2. Demo the performance on the same training data. 3. Make big claims. So typical that even annoying.
@cognition_labs
Cognition
2 months
Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is…
4K
11K
46K
135
241
4K
@burkov
Andriy Burkov
2 months
@OilGains You see - you read my post in English and didn't understand anything.
54
95
3K
@burkov
Andriy Burkov
5 months
Isn't it a matter of prestige for Google to build a serious rival to GPT-4? What has been wrong with this company for the last 5 years? How can they let a 700-person company beat Google in AI? Is it a mediocre CEO or something else?
333
106
3K
@burkov
Andriy Burkov
2 months
Don't let them fool you: AGI today is no nearer to us than it was two years ago. While ChatGPT might appear to be a step closer to AGI, from a scientific standpoint, it's not: training a neural network to predict the next word is not groundbreaking science. Achieving AGI would…
270
323
2K
@burkov
Andriy Burkov
18 days
Well, Llama 3 8B is not that magical after all. (A simple one!)
Tweet media one
202
105
2K
@burkov
Andriy Burkov
8 months
TensorFlow doesn't support Windows anymore. Google is killing yet another product they convinced the world to use.
127
144
2K
@burkov
Andriy Burkov
8 months
My daughter just started college: "Dad, we are studying matrices. There are rules of how to multiply them, I get them, but I don't get why we need all this." Me working on an illustration for the Transformer chapter of my new book: "Oh, look at my screen. Here's why:"
Tweet media one
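[Editor's note: a minimal sketch of the point being made, in NumPy; all names and sizes are illustrative, not taken from the book's chapter. Single-head scaled dot-product attention, the core of a Transformer layer, is essentially three matrix multiplications plus a softmax.]

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head attention: nothing here but matrix products and a softmax."""
    Q = X @ Wq                                  # queries, (seq_len, d_k)
    K = X @ Wk                                  # keys,    (seq_len, d_k)
    V = X @ Wv                                  # values,  (seq_len, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # (seq_len, d_v)

seq_len, d_model, d_k = 4, 8, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))                     # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(scaled_dot_product_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
```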
66
127
2K
@burkov
Andriy Burkov
11 days
Google lays off its entire Python team:
61
233
1K
@burkov
Andriy Burkov
4 years
My top Python libraries for data science:
scikit-learn
PyTorch
TensorFlow
Keras
Pandas
SciPy
NumPy
Seaborn
spaCy
XGBoost
Bonus:
Gensim
Scrapy
Flask
MySQLdb
huggingface
What's missing?
141
239
1K
@burkov
Andriy Burkov
3 months
Noticed how nobody says "deep learning" anymore?
97
70
1K
@burkov
Andriy Burkov
2 years
In my team, we use physical GPUs for machine learning R&D not because cloud GPUs are 5 to 10 times more expensive, but because innovation is impossible when you have to think "do I experiment or do I save money."
35
91
1K
@burkov
Andriy Burkov
18 days
I see too many people don’t understand why Meta spends billions to train and then gives away its large language models. They think Zuck does this purely out of a love for open source. In any decision, there are two elements: 1) a reason and 2) a pretext. Sometimes these align,…
93
127
1K
@burkov
Andriy Burkov
6 months
I really admire how Elon Musk can make an event out of anything. So he apparently trained a GPT-3.5 competitor (which already has dozens of competitors and is not really hard to beat given it only has 20B parameters) called Grok but everybody is already talking about it as an…
198
67
1K
@burkov
Andriy Burkov
16 days
This is how you win the AI race
Tweet media one
59
78
1K
@burkov
Andriy Burkov
4 months
For the first time, I actually believe that Meta will match or beat GPT-4. It looks like for Zuckerberg it's a personal matter. It's also a great strategy: by releasing a model similar to GPT-4 under Apache 2.0 license, it kills the business of its growing competitor while it's…
64
60
980
@burkov
Andriy Burkov
2 years
If I were to start in AI today from scratch, I would start with The Hundred-Page Machine Learning Book. This was in fact my primary motivation to write it. My target audience was me 10 years ago.
13
79
955
@burkov
Andriy Burkov
2 years
Looks like ML converges to the following:
- xgboost for tabular data,
- a pretrained transformer for everything else.
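[Editor's note: a minimal sketch of that two-part recipe, assuming xgboost, scikit-learn, and sentence-transformers are installed; the dataset and the checkpoint name are illustrative.]

```python
# Tabular data -> gradient-boosted trees.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=4)
clf.fit(X_tr, y_tr)
print("tabular accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Everything else (here: text) -> a pretrained transformer.
from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative checkpoint
embeddings = encoder.encode(["xgboost for tables", "transformers for the rest"])
print("text embeddings:", embeddings.shape)
```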
25
110
952
@burkov
Andriy Burkov
2 months
There are only 2 possibilities:
1. GPT-4 is a 2T model and OpenAI uses an entire node of 8xH100 (that costs $400,000) to serve the inference just for you for $20/month.
or
2. GPT-4 is a model that is 10 times smaller (it cannot be smaller than 200B) and OpenAI uses one H100…
169
85
952
@burkov
Andriy Burkov
3 months
The most popular use case for Claude and Gemini is to compare them to GPT-4.
32
72
949
@burkov
Andriy Burkov
5 months
Despite what Elon and many other optimists think, autoregressive models (which LLMs are) will not be able to write more than one or a couple of pages of coherent text. An entire book? No way. Each newly generated word contributes to the error. After a couple of pages, the error is…
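[Editor's note: one simplified way to put the accumulation argument into a formula; this framing is the editor's, not the author's. If each generated token is acceptable independently with probability \(1-\varepsilon\), the chance that an \(n\)-token continuation contains no error decays exponentially:]

\[
P(\text{all } n \text{ tokens acceptable}) = (1-\varepsilon)^n,
\qquad
(1 - 0.01)^{2000} \approx 1.9 \times 10^{-9},
\]

so even a 1% per-token error rate makes a few error-free pages (roughly 2,000 tokens) extremely unlikely without some correction mechanism.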
155
81
799
@burkov
Andriy Burkov
3 years
Technical books are expensive. In some countries, people need to work an entire week to buy one. This is awful. Here's what I think as an author of two expensive books. I don't mind if you pirate my books to learn. Buy them later, when you get a better job thanks to the knowledge.
15
58
791
@burkov
Andriy Burkov
2 years
If you want to do a PhD in AI and look for an exciting research direction, here's one for you: memory-augmented machine learning. The goal is to create algorithms that train a model to decide when to use an external memory of facts, or what to save in it for future use.
24
83
784
@burkov
Andriy Burkov
3 years
Why spend 2 weeks labeling more data when you can spend an entire year designing a more complex NN architecture?
16
51
789
@burkov
Andriy Burkov
2 years
Two books to start your machine learning journey
Tweet media one
9
129
760
@burkov
Andriy Burkov
6 months
This 7B model beats ChatGPT and Grok:
38
76
762
@burkov
Andriy Burkov
3 years
People who try to learn machine learning (or another similar science) by themselves find it very hard to understand why those 1/N or 1/2 are used. It takes years before they realize that they don't serve any purpose other than aesthetically pleasing the scientist who wrote them.
Tweet media one
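[Editor's note: for readers wondering which constants are meant, a typical example (illustrative, not the formula from the tweet's image) is the mean squared error written as]

\[
L(\theta) \;=\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(f_\theta(x_i) - y_i\bigr)^2 ,
\]

where the \(1/N\) and the \(1/2\) rescale the loss but do not change which \(\theta\) minimizes it.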
33
95
747
@burkov
Andriy Burkov
5 months
What is a good detailed tutorial on LLM fine-tuning?
22
63
756
@burkov
Andriy Burkov
2 years
If you want to make a career in machine learning knowing only one ML algorithm, learn xgboost.
28
59
706
@burkov
Andriy Burkov
5 months
SOLAR: an 11B model that beats every open model, including Mixtral, Yi-34B, Llama 2 70B, and Falcon 180B:
Tweet media one
19
72
695
@burkov
Andriy Burkov
4 months
OpenAI doesn't use ChatGPT to power its customer support chatbot. This is everything you need to know about using LLMs for anything more important than generating noisy training data and RAG.
Tweet media one
25
55
686
@burkov
Andriy Burkov
3 years
Data science in a nutshell: do linear regression, earn $175k.
Tweet media one
Tweet media two
19
104
642
@burkov
Andriy Burkov
5 months
The whole idea of "safe/unsafe LLMs" is based on the assumption that an adult person is incapable of critical thinking or can suffer damage from words. This infantile idea is a reflection of how infantile Western civilization has become.
71
92
643
@burkov
Andriy Burkov
7 months
Despite the fact that most LLMs have chat capability and many are even finetuned to chat, this capability is useless in a commercial B2C or B2B setting. Multistage chats are unreliable; they quickly diverge from the business objective, the level of hallucination…
71
78
650
@burkov
Andriy Burkov
3 years
Want to quickly test if a candidate understands machine learning? Ask only two questions: 1) why a test set is needed and 2) why linear regression works poorly with outliers.
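[Editor's note: a minimal illustration of the second question, using scikit-learn and synthetic data. A single extreme outlier drags the least-squares fit because the squared loss grows quadratically with the residual.]

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=20)    # clean line, slope ~2

clean_slope = LinearRegression().fit(X, y).coef_[0]

y_outlier = y.copy()
y_outlier[-1] += 200.0                                   # one extreme outlier
outlier_slope = LinearRegression().fit(X, y_outlier).coef_[0]

print(f"slope without outlier: {clean_slope:.2f}")       # ~2.0
print(f"slope with one outlier: {outlier_slope:.2f}")    # pulled far from 2.0
```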
24
65
609
@burkov
Andriy Burkov
2 years
It's not science if you trained an even larger neural network. It's engineering. Science would be to achieve a similar or better model quality by using a fraction of resources. Science would be to solve a problem that previously wasn't solvable. Leave engineering to engineers.
28
62
579
@burkov
Andriy Burkov
2 years
In machine learning, you never know whether the project will succeed or not. Indeed, in most cases it doesn't. This makes working in a non-AI-centric organization painful as an ML engineer or data scientist. When you cannot commit to success, you seem to be a layman.
17
84
589
@burkov
Andriy Burkov
4 months
It's unlikely that OpenAI will win against The NY Times. The reason for this is simple: they don't know how ChatGPT works and thus will have a hard time answering the judge's question: "Is it possible that your model reproduces the copyrighted content verbatim? If yes, can you…
174
84
580
@burkov
Andriy Burkov
13 days
A 262k-token context finetune of Llama 3 8B:
24
67
567
@burkov
Andriy Burkov
2 years
One of the most important features of machine learning (probably the most important one) is that you don't have to know math to train models. All the optimization is carefully isolated from the user. What previously took talent and years of complex math studies now takes nothing.
31
65
559
@burkov
Andriy Burkov
2 years
If you want to do a Ph.D. in machine learning and look for an exciting research direction, here's one: algorithms and techniques that would allow encrypting the data, training a model on the encrypted data, and then using the model on the unencrypted data.
17
74
537
@burkov
Andriy Burkov
1 month
We should seriously stop calling open-weight LLMs "open source". Weights are not equivalent to the source code in traditional software. Data is. So if the creator of an LLM is showing you weights, they are just showing off. They don't let you reproduce their model independently…
46
68
542
@burkov
Andriy Burkov
2 months
This one will fail miserably when put in the wild. It will end up where self-driving cars have gone. Remember this tweet.
16
24
542
@burkov
Andriy Burkov
4 months
Modern AI has become possible thanks to this game: the first really 3D first-person shooter that benefited from a GPU.
Tweet media one
23
50
529
@burkov
Andriy Burkov
4 months
If you really want to do something useful in AI, instead of training another tiny llama, pick up this project and train a 1B-parameter multilingual BERT with 32k input size. The code is here. The data is all over @huggingface. The…
9
80
505
@burkov
Andriy Burkov
2 months
"The first AI software engineer" my ass.
11
21
504
@burkov
Andriy Burkov
19 days
Llama 3 70B beats Mistral Large. What exactly is Mistral now supposed to sell?
Tweet media one
66
28
506
@burkov
Andriy Burkov
2 months
Function calling accuracy in LLMs really sucks. The best function calling accuracy is obtained with GPT-4 and it's 83.8%. It's already too low to be practical, but one should discount this number more assuming that the test data Berkeley used to evaluate function calling…
24
45
501
@burkov
Andriy Burkov
2 years
A high barrier to entry into the field of machine learning is a lie. Compared to many other professions, ML is very simple to get into. Just learn one programming language considered the simplest to learn (Python) and two libraries that have the best docs (sklearn and PyTorch).
17
69
487
@burkov
Andriy Burkov
5 months
Ok, all of you remember that demo when Google's model called a restaurant to make a reservation and pretended to be a real person? Remember all those wows and cheers? Where is this model now? Did anyone see it, try it IRL? OpenAI didn't pompously show ChatGPT in a video one…
28
33
449
@burkov
Andriy Burkov
5 months
Once, Google caught Bing using Google's search results. Now it's the other way around. Microsoft is the Google of our times.
Tweet media one
43
26
440
@burkov
Andriy Burkov
2 years
Why do so many scientists support paywalled Medium? Isn't it the opposite of the openness the research community strives for with ArXiv, GitHub, and the like?
29
37
439
@burkov
Andriy Burkov
2 months
It's not a coincidence that Claude 3 beats GPT-4 on all benchmarks by a small margin. I think that the trick is to constantly run the pretraining of an ~8x100B MoE LLM on more and more data and do occasional instruct-finetunes to see if it beats GPT-4 on all benchmarks. Once it…
32
41
442
@burkov
Andriy Burkov
2 years
This is probably one of the most important web pages on AI:
2
51
417
@burkov
Andriy Burkov
2 years
If I had an hour to solve a problem I'd spend 55 minutes labeling data and 5 minutes training the model.
13
46
423
@burkov
Andriy Burkov
2 months
@zendaimyo I said English but I meant all human language. Very weird thing to narrow in on to criticize imo.
12
2
420
@burkov
Andriy Burkov
6 months
A 7B model from Intel almost as capable as Falcon 180B:
11
37
419
@burkov
Andriy Burkov
2 years
In computer science, 2% of scientists do 98% of useful research.
33
28
403
@burkov
Andriy Burkov
4 months
So you realize what really drives AI applications. Downloads last month on @huggingface :
Mixtral-8x7B-Instruct-v0.1: 843,843
phi-2: 329,824
bert-base-uncased: 32,670,091
roberta-base: 21,673,938
Clearly, the most useful model right now would be a 1B-parameter BERT with 32k+…
21
43
401
@burkov
Andriy Burkov
3 years
FLAML by Microsoft: a lightweight Python library that finds accurate machine learning models automatically, efficiently, and economically.
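[Editor's note: a minimal usage sketch, assuming FLAML is installed (e.g. `pip install flaml[automl]`); the dataset and time budget are illustrative.]

```python
from flaml import AutoML
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)  # seconds

print(automl.best_estimator)                                   # e.g. "lgbm" or "xgboost"
print(accuracy_score(y_test, automl.predict(X_test)))           # holdout accuracy
```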
4
73
396
@burkov
Andriy Burkov
2 years
"machine learning with Excel" "data science with command line" "deep learning with R" ok, it's possible, but WHY?
28
42
383
@burkov
Andriy Burkov
6 months
Why is there no LLM finetuned specifically for RAG? It's the most important use case for LLMs.
36
27
382
@burkov
Andriy Burkov
2 years
Are you thinking about doing a PhD in AI and looking for an exciting research direction? Here's one for you: ML with a human in the loop. That is, AI should be smart enough to know when to ask a human for a label, when to pass control to the human, and when to doubt the human.
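[Editor's note: one established flavor of this direction is active learning. A minimal uncertainty-sampling sketch with scikit-learn (all choices illustrative): the model asks a "human" for labels only on the examples it is least sure about.]

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(20))                       # pretend only 20 labels exist
unlabeled = list(range(20, len(X)))

model = LogisticRegression(max_iter=1000)
for _ in range(10):                             # 10 rounds of asking a human
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[unlabeled])
    uncertainty = 1.0 - proba.max(axis=1)       # low max-probability = unsure
    ask = int(np.argmax(uncertainty))           # single most uncertain example
    idx = unlabeled.pop(ask)
    labeled.append(idx)                         # the "human" provides y[idx]

print(f"labels used: {len(labeled)}, "
      f"accuracy: {model.score(X[unlabeled], y[unlabeled]):.3f}")
```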
17
45
368
@burkov
Andriy Burkov
5 months
The Apache 2.0 licensed Mixtral beats proprietary GPT-3.5 Turbo, Gemini Pro, and the newest Claude 2.1. It would take just careful fine-tuning to reach GPT-4 level of performance. 2024 will be awesome!
Tweet media one
12
56
364
@burkov
Andriy Burkov
5 months
Claude 2.1 is less capable than Claude 2.0 and Claude 1.0. This is everything you need to know about how well we understand neural networks.
21
25
364
@burkov
Andriy Burkov
3 months
This page is gold. This is how you describe your models:
Tweet media one
7
65
356
@burkov
Andriy Burkov
1 month
A 7B-parameter model that beats ChatGPT-3.5, Mixtral, Gemini Pro, and some of the best 30B and 70B models. Isn't this exciting? Meaning that you can squeeze much more capability per parameter if you know what you are doing.
Tweet media one
16
46
365
@burkov
Andriy Burkov
2 years
Want your machine learning project to be as far as possible from production? Start it in a notebook.
27
28
347
@burkov
Andriy Burkov
6 months
@SciumoInc Nobody pays for GPT-3.5. It's free. Nobody got even close to GPT-4 yet.
10
1
342
@burkov
Andriy Burkov
2 years
There will be no new AI winter. There will be a data science/data scientist winter. Most businesses will soon see no real benefit from having a team of data scientists, given the current cost of such a team.
35
34
331
@burkov
Andriy Burkov
2 years
2011: "We provide AI-powered search." = "We use TF-IDF."
2021: "We provide AI-powered search." = "We use pretrained document embeddings."
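[Editor's note: a rough illustration of the two eras, assuming scikit-learn and sentence-transformers; documents and the checkpoint name are illustrative.]

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

docs = ["how to train a neural network", "tutorial on fitting neural nets"]

# 2011-style "AI-powered search": sparse TF-IDF vectors, exact-word overlap.
tfidf = TfidfVectorizer().fit_transform(docs)
print("TF-IDF similarity:", cosine_similarity(tfidf)[0, 1])   # low: few shared words

# 2021-style: dense pretrained document embeddings, semantic overlap.
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)    # illustrative checkpoint
print("embedding similarity:", cosine_similarity(emb)[0, 1])  # high: similar meaning
```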
8
34
335
@burkov
Andriy Burkov
2 months
@DanielCardena Oh yes, we have seen Bard, that was definitely ground-breaking :-)
2
1
326
@burkov
Andriy Burkov
2 years
The biggest lie in modern AI is that you don't need to understand the math behind it to be able to create successful AI systems. To solve a problem using machine learning, you need to formulate an optimization problem. If you don't understand math, good luck in guessing it!
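[Editor's note: for concreteness, the kind of optimization problem meant here, in a standard textbook formulation:]

\[
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr) \;+\; \lambda\,\Omega(\theta),
\]

where \(\ell\) is a loss, \(\Omega\) a regularizer, and choosing them well for a given problem is exactly the part that is hard to guess without the math.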
16
59
312
@burkov
Andriy Burkov
7 months
We need to find a better name for what we currently call open-source LLMs. To reproduce an LLM, the source code is not enough. It's not even the main component. The main component is the dataset. So, if an organization only releases the source code but keeps the pretraining or…
20
32
310
@burkov
Andriy Burkov
17 days
In English, Llama 3 8B is as good as Mistral Large, Mistral's most capable closed model, likely larger than 200B parameters. This is unbelievable!
Tweet media one
21
23
310
@burkov
Andriy Burkov
4 years
This chart demonstrates a potential roadmap for machine learning engineers.
Tweet media one
5
67
295
@burkov
Andriy Burkov
3 months
I said Apple Vision Pro didn't have a killer app. I'm sorry, I was wrong.
14
38
291
@burkov
Andriy Burkov
5 months
Hallucinations in LLMs are by design. It's a feature, not a bug. And you cannot fix a feature.
Tweet media one
13
40
286
@burkov
Andriy Burkov
19 days
Models smaller than 100B parameters are poor at factual queries. Their logic and math capabilities are approaching GPT-4, but they cannot get the facts right. This is likely where the additional 300B+ parameters are used in larger models. PS: I think the parameters is not the…
35
22
287
@burkov
Andriy Burkov
1 year
Engineers of the future listening to ChatGPT
Tweet media one
10
33
277
@burkov
Andriy Burkov
18 days
@itsHesamSheikh No, because this is not Meta's core business and they don't think it will pay enough to care. They fear losing their business to someone who becomes too strong to fight with. It's better when there are thousands of small AI companies than 3 large ones.
5
11
278
@burkov
Andriy Burkov
2 years
Another ML quiz: you didn't update the model in production but, after some time, the predictions of the model changed for some inputs. The inputs didn't change. What happened?
63
15
274
@burkov
Andriy Burkov
3 years
Machine learning is the only engineering field where, no matter how much of an expert you are, you will answer "I don't know" to the question of whether something can be done.
7
22
267
@burkov
Andriy Burkov
2 months
Theorem: Fixing the problem of hallucinations in LLMs is equivalent to creating AGI.
66
20
269
@burkov
Andriy Burkov
4 years
A series of paths created by 800 unmanned bicycles being pushed until they fall over:
Tweet media one
4
41
263
@burkov
Andriy Burkov
2 years
NLP is one of the most exciting applications of machine learning with lots of interesting challenges, techniques, and tricks. However, no one buys books on NLP. It's discouraging, and I feel sorry for the authors. Why do you think this is?
41
24
253
@burkov
Andriy Burkov
3 years
In reality, Abraham would spend the first 6 hours updating CUDA drivers :-)
@mrdbourke
Daniel Bourke
3 years
“If I had 8 hours to build a machine learning model, I’d spend the first 6 hours preparing my dataset.” - Abraham Lossfunction
22
238
2K
8
29
250
@burkov
Andriy Burkov
4 months
Phi-2, a 2.7B parameter LLM from @Microsoft , is now distributed under the MIT license which allows commercial use. The model beats the most capable models of up to 13B parameters, including Mistral-7B and Llama 2-13B, on most benchmarks, especially on math and coding benchmarks.…
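[Editor's note: a minimal loading sketch with Hugging Face transformers, assuming the public `microsoft/phi-2` checkpoint; generation parameters are illustrative, and depending on your transformers version you may also need `trust_remote_code=True`.]

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```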
2
38
255
@burkov
Andriy Burkov
5 months
A Llama-2-based model finetuned for function calling:
2
29
240
@burkov
Andriy Burkov
1 month
So the term was coined last year, but some geniuses from HR want you to have 3+ years of experience with LLMs, as well as "Lang Chain, LLAMA Index" (looks like they put them in quotes because someone sent them this expression in quotes).
Tweet media one
25
25
241
@burkov
Andriy Burkov
2 years
Transfer learning is a unique skill of neural networks that no other machine learning algorithm has. This unique property is way more important than their ability to learn deep structures.
7
25
238
@burkov
Andriy Burkov
5 months
Just figured out that when you use function calling in the OpenAI API, you should submit the function call result back to the chatbot using "role": "function".
Tweet media one
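[Editor's note: a hedged sketch of that flow with the openai v1.x Python client and the legacy `functions` interface; the weather function is the usual illustrative example, not taken from the tweet, and the sketch assumes the model actually chooses to call the function.]

```python
import json
from openai import OpenAI

client = OpenAI()

functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in Quebec City?"}]
first = client.chat.completions.create(model="gpt-4", messages=messages, functions=functions)
call = first.choices[0].message.function_call            # name + JSON arguments

# Pretend we executed get_weather(**json.loads(call.arguments)) ourselves.
result = {"city": json.loads(call.arguments)["city"], "temp_c": -5}

# Key step from the tweet: send the result back with role "function".
messages += [
    {"role": "assistant", "content": None,
     "function_call": {"name": call.name, "arguments": call.arguments}},
    {"role": "function", "name": call.name, "content": json.dumps(result)},
]
second = client.chat.completions.create(model="gpt-4", messages=messages, functions=functions)
print(second.choices[0].message.content)
```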
13
16
235
@burkov
Andriy Burkov
2 years
Open source NLP is fueling a new wave of startups
0
39
235
@burkov
Andriy Burkov
3 years
The tragedy of a data scientist: a programmer only needs a computer to create, while a data scientist cannot do anything without a dataset.
12
34
233
@burkov
Andriy Burkov
1 month
I don't know why no one has yet implemented such an obvious idea: instead of training an LLM to predict the next word, train it to predict a full paragraph of at most, say, 100 tokens. As a result: decreased hallucinations and 100 times faster inference.
95
9
241
@burkov
Andriy Burkov
3 years
A friendly introduction to machine learning compilers and optimizers
0
39
217