Sebastien Bubeck (@SebastienBubeck)
34,456 Followers · 1,318 Following · 166 Media · 1,440 Statuses

VP GenAI Research, Microsoft AI

Seattle, WA
Joined January 2012
Pinned Tweet
Sebastien Bubeck (@SebastienBubeck) · 10 days ago
phi-3 is here, and it's ... good :-). I made a quick demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open-weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)
40 replies · 183 retweets · 912 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
At @MSFTResearch we had early access to the marvelous #GPT4 from @OpenAI for our work on @bing. We took this opportunity to document our experience. We're so excited to share our findings. In short: time to face it, the sparks of #AGI have been ignited.
[image]
67 replies · 730 retweets · 3K likes

Sebastien Bubeck (@SebastienBubeck) · 4 months ago
Starting the year with a small update: phi-2 is now under the MIT license, enjoy everyone!
[image]
54 replies · 284 retweets · 2K likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
We trained a small transformer (100M params) for basic arithmetic. With the right training data it nails 12-digit by 12-digit multiplication without CoT (that's 10^24 possibilities, so no, it's not memorization🤣). Maybe arithmetic is not the LLM kryptonite after all?🤔
68 replies · 270 retweets · 2K likes
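
To make the claim concrete, here is the kind of synthetic training example such an experiment might use (a sketch only; the actual data format and pipeline are not given in the tweet):

```python
import random

def make_example(n_digits: int = 12) -> str:
    """One plain-text multiplication example; the serialization is an assumption."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a} * {b} = {a * b}"

# There are ~10^24 ordered pairs of 12-digit numbers, so no feasible training
# set can cover them: a model that nails held-out products must have learned
# an algorithm rather than a lookup table.
train_set = [make_example() for _ in range(1_000)]
print(train_set[0])
```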

Sebastien Bubeck (@SebastienBubeck) · 11 months ago
New LLM in town: ***phi-1 achieves 51% on HumanEval with only 1.3B parameters & a 7B-token training dataset*** Any other >50% HumanEval model is >1000x bigger (e.g., WizardCoder from last week is 10x in model size and 100x in dataset size). How? ***Textbooks Are All You Need***
[image]
45 replies · 340 retweets · 2K likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Over the last couple of weeks I gave a few talks on the Sparks paper; here is the MIT recording! The talk doesn't do justice to all the insights we have in the paper itself. Neither talks nor twitter threads are a substitute for actually reading the 155 pages :-)
15 replies · 299 retweets · 585 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
The Chomsky et al. opinion piece in the @nytimes about ChatGPT is making the rounds. Rather than trying to deconstruct their argument, I asked @bing what it thinks of it. Now you can judge for yourself who has the moral high ground 😂.
[image]
51 replies · 293 retweets · 1K likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Enjoy everyone! (And remember it's a base model, so you might have to play around with your prompts; if you want it to follow instructions you can try the format "Instruct: ... Output:")
28 replies · 195 retweets · 1K likes
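
For reference, a minimal sketch of the suggested prompt format using the Hugging Face transformers library (assuming this tweet refers to the phi-2 release and the microsoft/phi-2 checkpoint; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A base model has no chat template, so steer it with the Instruct/Output pattern.
prompt = "Instruct: Explain in one sentence why the sky is blue.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```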

Sebastien Bubeck (@SebastienBubeck) · 8 months ago
How far does one billion parameters take you? As it turns out, pretty far!!! Today we're releasing phi-1.5, a 1.3B-parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs. For warm-up, see an example completion with comparison to Falcon 7B & Llama2-7B.
[image]
32 replies · 182 retweets · 842 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Transformers are changing the world. But how do they learn? And what do they learn? Our 1st @MSFTResearch ML Foundations team paper proposes a synthetic task, LEGO, to investigate such questions. Sample insights on Transformers thanks to LEGO below. 1/8
4 replies · 110 retweets · 738 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
We may have found a solid hypothesis to explain why extreme overparametrization is so helpful in #DeepLearning, especially if one is concerned about adversarial robustness. 1/7
[image]
5 replies · 129 retweets · 677 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Phi-2 numbers, finally! We're seeing a consistent ranking: phi-2 outperforms Mistral 7B & Gemini Nano 2* (*on their reported benchmarks) and is roughly comparable to Llama 2-70B (sometimes better, sometimes worse). Beyond benchmarks, playing with the models tells a similar story.
[image]
Quoting Satya Nadella (@satyanadella) · 5 months ago (95 replies · 161 retweets · 2K likes):
From new best-in-class small language models to state-of-the-art prompting techniques, we’re excited to share these innovations and put them in the hands of researchers and developers.
21 replies · 84 retweets · 587 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
phi-2 is coming to Hugging Face, hold tight :-)
19 replies · 32 retweets · 556 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
For my 500th tweet I'm super excited to release five 1h videos covering the most important results presented in my monograph Convex Optimization: Algorithms and Complexity. This time I tried hard to emphasize the intuition behind the calculations! 1/6
2 replies · 105 retweets · 532 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
phi-2 is really a good base for further fine-tuning: we fine-tuned on 1M math exercises (similar to phi-1 with CodeExercises) & tested on a recent French nationwide math exam (published after phi-2 finished training). The results are encouraging! Go try your own data.
[image]
21 replies · 69 retweets · 525 likes
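
A minimal sketch of the "go try your own data" suggestion using transformers' Trainer (the dataset, exercise format, and hyperparameters below are illustrative assumptions, not the phi team's actual recipe):

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

model_id = "microsoft/phi-2"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

# Toy stand-in for the "1M math exercises": exercise + worked solution
# per training string (the real data format is not public in this thread).
texts = [
    "Exercise: Compute 3 + 4 * 2. Solution: 4 * 2 = 8, then 3 + 8 = 11.",
    "Exercise: Solve 2x + 6 = 0. Solution: 2x = -6, so x = -3.",
]

class ExerciseDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=128, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, i):
        ids = self.enc["input_ids"][i]
        mask = self.enc["attention_mask"][i]
        labels = ids.clone()
        labels[mask == 0] = -100  # don't compute loss on padding
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phi2-math-ft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ExerciseDataset(texts),
)
trainer.train()
```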

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
(Nesterov) Acceleration in convex optimization is one of the most striking phenomena in all of optimization, and now you can learn about all the different viewpoints on it from a very nice 156-page survey paper by d'Aspremont, Scieur and Taylor!
3 replies · 102 retweets · 514 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Sorry, I know it's a bit confusing: to download phi-2 go to Azure AI Studio, find the phi-2 page and click on the "artifacts" tab. See picture.
[image]
Quoting Andrej Karpathy (@karpathy) · 5 months ago (13 replies · 11 retweets · 350 likes):
@simonw No they fully released it. But they hide it very well for some reason. Go to artifacts tab.
32 replies · 55 retweets · 495 likes

Sebastien Bubeck (@SebastienBubeck) · 6 months ago
Microsoft💜Open Source + SLMs!!!!! We're so excited to announce our new *phi-2* model that was just revealed at #MSIgnite by @satyanadella! At 2.7B size, phi-2 is much more robust than phi-1.5 and its reasoning capabilities are greatly improved too. Perfect model to be fine-tuned!
[two images]
18 replies · 87 retweets · 491 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
A major open problem in ML is whether convex techniques (kernel methods in particular) can reproduce the striking successes of deep learning. In a two-part guest post series on I'm a Bandit, @julienmairal weighs in on the question!
[image]
2 replies · 118 retweets · 464 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Just watched an incredible talk by @AlexGDimakis at the Simons Institute, highly recommended. Their Iterative Layer Optimization technique to solve inverse problems with GANs makes a LOT of sense! The empirical results on the famous blurred Obama face speak for themselves! 1/4
[image]
3 replies · 78 retweets · 461 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Time for some retrospective! I shared 25 papers that I particularly enjoyed in the last decade. I would love for you to share some papers that are missing from this list (there are many!!), either here or in the comments on the blog.
2 replies · 106 retweets · 430 likes

Sebastien Bubeck (@SebastienBubeck) · 6 months ago
My group is hiring a large cohort of interns for the summer of 2024 to work on the Foundations of Large Language Models! Come help us uncover the new physics of A.I. to improve LLM building practices! (Pic below from our NeurIPS 2023 paper with interns.)
[image]
9 replies · 52 retweets · 404 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
At #NeurIPS2018, where it was just announced that our paper on non-smooth distributed optimization with Kevin Scaman, @BachFrancis, Laurent Massoulie and Yin Tat Lee got a best paper award. Lots of interesting open problems left there, check out the paper!
10 replies · 48 retweets · 377 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
We (@gauthier_gidel @velythyl @busycalibrating @vernadec & myself) would like to announce the accepted blog posts for @iclr_conf's 1st Blogpost Track. The experiment was a great success with 20 accepted posts out of 61 submissions, roughly the size of the 1st @iclr_conf itself! 1/24
7 replies · 67 retweets · 363 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
I'm really happy that the law of robustness got recognized as an important new insight with a NeurIPS outstanding paper award! The video below summarizes what the law is about, what it means, and what it predicts. It's also a great capstone for @geoishard's fantastic PhD work!
Quoting Microsoft Research (@MSFTResearch) · 2 years ago (1 reply · 25 retweets · 131 likes):
Learn about the significance of overparametrization in neural networks, the universal law of robustness, and what "A Universal Law of Robustness via Isoperimetry" means for future research in this short video with @SebastienBubeck.
25 replies · 41 retweets · 362 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
A fun way to describe Nesterov's momentum:
[image]
1 reply · 52 retweets · 349 likes
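
The attached image is not preserved in this archive. For reference, one standard way to write Nesterov's accelerated gradient method (several equivalent forms exist; this is the convex, L-smooth version with step size η = 1/L):

```latex
% Initialize x_0 = y_0; then for k >= 1:
x_k = y_{k-1} - \eta \nabla f(y_{k-1}),            % gradient step
y_k = x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}).      % momentum step
```

The momentum coefficient (k-1)/(k+2) grows toward 1, and the resulting O(1/k^2) convergence rate beats gradient descent's O(1/k) on smooth convex functions.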

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
The **Machine Learning Foundations** group at @MSFTResearch Redmond is hiring at all levels (including postdoc)! Come join @ZeyuanAllenZhu @suriyagnskr @jerryzli @ilyaraz2 @talw and myself to develop the next generation of ML theory!
11 replies · 83 retweets · 341 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Karen Uhlenbeck concludes her Abel prize lecture with 5 minutes on #DeepLearning!!! She says about it: "My conjecture is there is some interesting mathematics of some sort that I have no idea." Couldn't agree more.
5 replies · 42 retweets · 318 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
We're so pumped to see phi-2 at the top of trending models on @huggingface! Its sibling phi-1.5 already has half a million downloads. Can't wait to see the mechanistic interpretability works that will come out of this & their impact on all the important LLM research questions!
[image]
25 replies · 64 retweets · 314 likes

Sebastien Bubeck (@SebastienBubeck) · 11 months ago
Terence Tao reflecting on GPT-4 in the AI Anthology coordinated by @erichorvitz: "I expect, say, 2026-level AI, when used properly, will be a trustworthy co-author in mathematical research, and in many other fields as well." Terry gets it.
3 replies · 50 retweets · 312 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Why do neural networks generalize? IMO we still have no (good) idea. Recent emerging hypothesis: NN learning dynamics discovers *general-purpose circuits* (e.g., the induction head in transformers). In [link] we take a first step to prove this hypothesis. 1/8
9 replies · 43 retweets · 317 likes

Sebastien Bubeck (@SebastienBubeck) · 11 months ago
I cannot recommend this podcast episode strongly enough. It's simply THE MOST INSIGHTFUL 2 hours of content that you can find on LLMs. And it's by none other than @EldanRonen and Yuanzhi Li from our team @MSFTResearch. Stay tuned for a LOT MORE from us soon.
2 replies · 42 retweets · 307 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
New video! Probably best described as "a motivational speech to study deep learning mathematically" :-). The ever so slightly more formal title is "Mathematical theory of deep learning: Can we do it? Should we do it?" 1/3
2 replies · 40 retweets · 285 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Every Mon/Thu I will post a 1h lecture on the "Five Miracles of Mirror Descent". We start with basic reminders of convexity, the classical analysis of gradient descent, and a discussion of its robustness properties as well as the regret interpretation.
2 replies · 40 retweets · 275 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
AI still has a long way to go... To me this example is exactly what happened with the whole "sentient" discussion: if you prompt with the seed of an answer, the transformer architecture will latch onto this seed. It's really a game of mirrors...
[image]
18 replies · 23 retweets · 272 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
The universal law of robustness is a tentative theoretical justification for *large* overparametrization in neural network learning. Here is a video explaining the law, in the context of other recent results on overparametrization (e.g., double descent).
0 replies · 65 retweets · 272 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
New video with a crash course on *tensors* (spoiler: no, they aren't JUST multi-dimensional arrays!). Includes a discussion of cross norms & basic facts about rank. We then use it to get insights into neural networks (in the context of our law of robustness).
0 replies · 27 retweets · 271 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Microsoft Research is hiring (in-person) interns! There are many different opportunities in all the labs. Here are some options in the Machine Learning research area in MSR @Redmond: ML Foundations, Neural Architecture Search. 1/2
3 replies · 53 retweets · 266 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Part II of @julienmairal's guest post on CNN-inspired kernel methods: you will learn how to efficiently approximate those kernels, and even push the CNN analogy further by doing an end-to-end optimization which includes the approximation step.
[image]
0 replies · 54 retweets · 257 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Congratulations to Laszlo Lovasz and Avi Wigderson for winning the 2021 Abel Prize!!!!!!! What a fantastic recognition for theoretical computer science from the mathematics community.
3 replies · 34 retweets · 250 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Interesting thread! To me the "reason" for the CLT is simply high-dimensional geometry. Consider the unit ball in dimension n+1 & slice it at distance x from the origin to get a dimension-n ball of radius (1-x^2)^{1/2}. The volume of the slice is proportional to (1-x^2)^{n/2} ~ exp(-(1/2)n x^2). Tada, the Gaussian!!
Quoting Stephan Hoyer (@shoyer) · 3 years ago (55 replies · 25 retweets · 258 likes):
Does anyone know a good intuitive explanation for the central limit theorem? I realized the other day that even though I use it all the time I can't really justify *why* it's true.
5 replies · 29 retweets · 249 likes
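
Spelling out the tweet's computation as a display (constants omitted; this is the standard "Gaussian from balls" heuristic, a sketch rather than a proof of the CLT):

```latex
\mathrm{vol}_n\big(\text{slice at height } x\big)
  \;\propto\; \big(1 - x^2\big)^{n/2}
  \;=\; \exp\!\Big(\tfrac{n}{2}\,\log(1 - x^2)\Big)
  \;\approx\; \exp\!\Big(-\tfrac{n}{2}\,x^2\Big)
  \quad\text{for } |x| \ll 1,
```

so the one-dimensional marginal of a uniform point in the ball B^{n+1} is approximately Gaussian with variance ~1/n.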

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Congratulations to our colleague Lin Xiao @MSFTResearch for the #NeurIPS2019 test of time award!!! Online convex optimization and mirror descent for the win!! (As always? :-).)
1 reply · 45 retweets · 248 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Join us on YouTube at 1pm PT/4pm ET today for the premiere of our "debate" with @bgreene @ylecun @tristanharris on whether a new kind of intelligence has emerged with GPT-4, and what consequences it might have.
11 replies · 45 retweets · 241 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Looks like we might be home for some time, so I'm giving a shot at making homemade math videos on proba/optim/ML. The first video gives a proof of the very nice ICML19 theorem by @deepcohen, Rosenfeld & @zicokolter on certified defense against adversarial examples.
2 replies · 44 retweets · 234 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
I rarely tweet about non-ML/math topics but I felt like sharing this one. Just finished my first 100+ mile bike ride with the amazing @ilyaraz2!!!! It was so much fun, and here is the mandatory finish-line picture in front of our beloved @MSFTResearch Building 99 😁
[image]
5 replies · 0 retweets · 229 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
In non-convex optimization, gradient descent is the obvious algorithm because non-local reasoning is hard without convexity. In "How to trap a gradient flow" we go beyond gradient descent by uncovering a new local-to-global phenomenon. Details in the new video!
3 replies · 25 retweets · 222 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
I'm looking for an intern (with hands-on #DeepLearning experience + curious about theory) to work closely with me on adversarial examples this summer. MSR summers are exciting, with lots of strong theory visitors also curious about DL. Fantastic opportunity to build bridges! DM for more / please RT.
8 replies · 69 retweets · 220 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
@pmddomingos The vast majority of AI researchers recognize AI ethics as an important field of study, just as worthy as any other AI subfield. Doesn't mean that everyone has to study it, doesn't mean it has fewer problems than other subfields, but DOES mean that @pmddomingos is extremely misguided.
4 replies · 9 retweets · 214 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
The @TEDTalks by @YejinChoinka is both insightful & beautifully delivered! Totally agree with her that GPT-4 is simultaneously brilliant and incredibly stupid. Yejin gives 3 examples of common-sense failures that are worth examining a bit more closely. 1/5
10 replies · 53 retweets · 213 likes

Sebastien Bubeck (@SebastienBubeck) · 2 months ago
At a time when 314B-parameter models are trending, come join me at #NVIDIAGTC to see what you can do with 1 or 2B parameters :-) (and coming soon, what can you do with 3B?!?)
[image]
7 replies · 13 retweets · 214 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
This is the strongest list of learning theory papers I have ever seen: [link]. Very exciting progress on many fronts! #COLT19
0 replies · 26 retweets · 207 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
This is an excellent blog post on kernels, by one of the world experts on the topic, @BachFrancis. *Anyone* interested in ML (theorists & practitioners alike) should be comfortable with everything written there (i.e. the material has to become insight). 1/4
1 reply · 29 retweets · 197 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
#Mathematics of #MachineLearning by @MSFTResearch & @uwcse & @mathmoves: 2 weeks of lectures on statistical learning theory, convex optimization, bandits, #ReinforcementLearning, and #DeepLearning. Schedule here: [link] and livestream link here: [link]
2 replies · 73 retweets · 194 likes

Sebastien Bubeck (@SebastienBubeck) · 6 years ago
If you are in Montreal next week I recommend attending our first workshop in the "mathematics of ML" program that I co-organize with Gabor Lugosi and Luc Devroye. The lectures will be recorded and hopefully available soon after the workshop.
4 replies · 57 retweets · 194 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Need to brush up your online decision making fundamentals before #NeurIPS2019? Check out these two fantastic new books: Introduction to Bandits (Alex Slivkins) and Bandit Algorithms (Tor Lattimore & @CsabaSzepesvari). 1/2
3 replies · 47 retweets · 184 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
This is really the major open problem in deep learning: gradient descent on these architectures has an uncanny ability to dodge any trap. Why/how?
Quoting JFPuget 🇺🇦 (@JFPuget) · 2 years ago (27 replies · 25 retweets · 609 likes):
Deep learning is too resistant to bugs. I just found a major one in the pipeline I have been using for 2 weeks. Yet it produced results good enough to not alert me to possible bugs.
15 replies · 10 retweets · 169 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Mark your calendars, the next two weeks bring exciting workshops at the Simons Institute: Concentration of Measure Phenomena (Oct. 19–23) and Mathematics of Online Decision Making (Oct. 26–30).
0 replies · 32 retweets · 172 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Amazing news out of the math world: the KLS conjecture has perhaps been proven!!! The paper still needs to be checked carefully, but it follows a well-established line of work (initiated by Ronen Eldan, and refined in particular by Yin Tat Lee). 1/3
4 replies · 13 retweets · 171 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Exciting start of the year for the theory of #DeepLearning! SGD on neural nets can: 1) simulate any other learning algorithm with some poly-time init [Abbe & Sandon]; 2) efficiently learn hierarchical concept classes [@ZeyuanAllenZhu & Y. Li].
1 reply · 42 retweets · 167 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
I just tried the new Bard powered by Palm 2 and asked it to draw a unicorn in TikZ. It's not quite there yet :-).
[image]
17 replies · 11 retweets · 168 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Adversarial examples are imo *the* cleanest major open problem in ML. I don't know what was said precisely, but diminishing the central role of this problem is not healthy for our field. Ofc in the absence of a solution there are many alternative questions that we can/should ask.
Quoting Thomas G. Dietterich (@tdietterich) · 4 years ago (7 replies · 61 retweets · 273 likes):
Very thought-provoking talk by Justin Gilmer at the #ICML2020 UDL workshop. Adversarial examples are just a case of out-of-distribution error. There is no particular reason to defend against the nearest OOD error (i.e., the L-infty adversarial example). 1/
14 replies · 19 retweets · 166 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
And now for an interlude from Transformers taking over the world and the (very unfortunate) Twitter drama: *The randomized k-server conjecture is false!* Joint work with Christian Coester & Yuval Rabani. The picture below is our hard metric space for k-server.
[image]
3 replies · 18 retweets · 166 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
It's shaping up to be a fine afternoon! (Yes, Talagrand's new book is out!)
[image]
4 replies · 8 retweets · 163 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Since the seminal works [@goodfellow_ian, Shlens, @ChrSzegedy, ICLR15; @aleks_madry et al., ICLR18] it is known that larger models help for robustness. We posit that in fact *overparametrization is a fundamental law of robustness*. A thread (and a video).
3 replies · 27 retweets · 165 likes
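
For reference, the law in question (Bubeck–Sellke, "A Universal Law of Robustness via Isoperimetry"), stated informally with constants and the isoperimetry assumption on the data suppressed:

```latex
% Any p-parameter function f that fits n noisy d-dimensional data points
% below the noise level must satisfy
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{n d}{p}},
```

so achieving an O(1)-Lipschitz (robust) interpolator requires overparametrization p ≳ nd, far beyond the p ≈ n needed merely to fit the data.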

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Lots of discussion around #LaMDA is missing the point: ofc it's not sentient, but the issue is that those systems are so good at mimicking that non-technical people can easily be fooled. As is often the case when topics escape experts, the truth matters less than how it "feels".
7 replies · 12 retweets · 161 likes

Sebastien Bubeck (@SebastienBubeck) · 9 months ago
Unfortunately this is a correct take. I expect all my work on LLMs to remain unpublished because of this situation. Maybe that's the price to pay when the community gets too big. For me personally it's a non-issue, but what about young students entering the field?
Quoting Peter Richtarik (@peter_richtarik) · 9 months ago (4 replies · 22 retweets · 281 likes):
#NeurIPS2023 reviewing if science was sport: any athlete can evaluate any other athlete, irrespective of their specialization, experience, or level. Result? An amateur 100m dash guy criticizes a high jumper for lack of speed during her world-record-breaking jump. Reject.
8 replies · 9 retweets · 162 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Want to learn more about overparametrization, adversarial examples, and why interpolation does not lead to overfitting (generalization IV lecture)? The videos from yesterday's talks at the deep learning bootcamp are already online and they're worth a watch!
1 reply · 38 retweets · 159 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
I personally think that LLM learning is closer to the process of evolution than it is to humans learning within their lifetime. In fact, a better caricature would be to compare human learning with LLMs' in-context learning capabilities.
Quoting Yann LeCun (@ylecun) · 1 year ago (695 replies · 501 retweets · 3K likes):
Humans don't need to learn from 1 trillion words to reach human intelligence. What are LLMs missing?
7 replies · 13 retweets · 158 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Today I'm not announcing a new paper but rather a fitness goal, with my fantastic fitness collaborator @ilyaraz2 😁. Started running 6 months ago & did my first half-marathon! Reasonable time, 1:52, but @ilyaraz2 crushed it at 1:46!!! Running is so much fun, highly recommended 😁
[image]
4 replies · 0 retweets · 156 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Lots of discussion about open source LLMs catching up to The Big Ones, including eye-catching claims such as 90% of ChatGPT's quality (by the really cool work of @lmsysorg). Two Sparks authors, @marcotcr & @scottlundberg, explore this further in a new blog post. 1/2
4 replies · 28 retweets · 151 likes

Sebastien Bubeck (@SebastienBubeck) · 16 days ago
Hmmm, I have a feeling this plot might need an overhaul rather soon🤣. I guess phi-2 was the lower left part of the triangle. I wonder what those guys have been up to in the last 6 months? 🤔
Quoting Armand Joulin (@armandjoulin) · 16 days ago (6 replies · 9 retweets · 115 likes):
Fixed the fix.
[image]
10 replies · 12 retweets · 154 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Starting tomorrow (with livestream): the Simons Institute workshop on *Learning and Testing in High Dimensions*. We have a great line-up of talks, featuring many of the recent exciting results in high-dimensional learning!
0 replies · 18 retweets · 154 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
@YiMaTweets It might not be a great idea to give your audience the impression that most mysteries of DL have been resolved when in fact hardly any of them have been... there is work for at least a full generation to make progress here.
2 replies · 4 retweets · 155 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Fantastic progress by Tor Lattimore on bandit convex optimization!!! The regret is now d^{2.5} sqrt(T) (down from d^{9.5} sqrt(T)), and the proof is short and sweet. Very close to the conjectured bound of d^{1.5} sqrt(T). 1/2
4 replies · 18 retweets · 150 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
I think most people in ML will relate to the title of this paper😄. Next philosophical breakthrough: think about the reality vector as a set of weights in a neural net?🤣
[image]
6 replies · 12 retweets · 142 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
The Machine Learning Foundations team at @MSFTResearch Redmond is looking for a postdoc. Come join us (@ZeyuanAllenZhu @suriyagnskr @jerryzli @talw16 and Yi Zhang) to work on topics ranging from quantum learning to understanding transformer architectures!
2 replies · 31 retweets · 144 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
New video! I discuss the "Physics of AI": how controlled experiments and toy mathematical models could help us make progress on understanding Deep Learning, with two examples from MSR's Machine Learning Foundations group: LEGO & the Edge of Stability analysis.
4 replies · 22 retweets · 142 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Congratulations @DeepMindAI, I am amazed. 3 years ago I bet that by 2021, AI would still not compete with pros at SC2. Today I lost that bet pretty badly... Maybe it's time to do a Bayesian update on my beliefs... #AlphaStar
2 replies · 11 retweets · 142 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
New video on the very nice proof by @ZeyuanAllenZhu and Yuanzhi Li showing the limitations of kernel methods (even when the training set can be chosen for the task at hand) compared to more sophisticated procedures (e.g., deep learning).
6 replies · 18 retweets · 141 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
An accessible presentation by @ZeyuanAllenZhu of his breakthrough discovery with Yuanzhi Li of the backward feature correction phenomenon (feature purification is also discussed in a second part). Interesting progress toward explaining the power of deep learning!
1 reply · 42 retweets · 138 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Tor Lattimore did it again: after improving Bandit Convex Optimization to n^2.5 sqrt(T) (down from n^9.5), he now shows n^1 for ridge functions, i.e. a 1-dim convex function composed with a linear map is no harder than the merely linear case! No algorithm matching those rates is known!
2 replies · 17 retweets · 138 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Congratulations to Nick Littlestone & Manfred Warmuth, giants of learning theory, for winning the 30-year #FOCS test of time award! The weighted majority algorithm has been hugely influential in #TCS, and led to many practical breakthroughs (e.g., boosting).
0 replies · 23 retweets · 133 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
I was planning to do only 1 fitness post/year but I'm just too excited about this milestone not to share: just finished my first real Olympic triathlon in 3h05!!! I missed my target by 5 minutes but I will put that down to the heatwave, which made the run a bit excruciating...
[image]
3 replies · 0 retweets · 132 likes

Sebastien Bubeck (@SebastienBubeck) · 8 months ago
phi-1.5 & phi-1 are available right now on @huggingface & @Azure ML! We can't wait to see what the community will discover with them. The phi-1.5 team (Yuanzhi Li, @EldanRonen, @allie_adg, @suriyagnskr) is ready to answer questions too!
2 replies · 19 retweets · 129 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
I'm quite excited by this: a *Blog Posts track* at ICLR! Posts will be officially "published" by ICLR (& can be cited as such). Key requirement: blog about *previously published papers at ICLR*. It's an attempt to embed memory into our conference publication system, which is sorely needed.
Quoting ICLR 2024 (@iclr_conf) · 3 years ago (10 replies · 201 retweets · 1K likes):
ICLR is happy to announce the call for contributions for our very first "Blog Posts Track". We invite submissions in blog format discussing previously published papers at ICLR. For details on this exciting new experiment in publication models see: [link]
2 replies · 15 retweets · 128 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
While reading up on superconcentration for an upcoming neural network paper, I found these delightful slides by Sourav Chatterjee: difficult material masterfully explained, giving you exactly the essence of deep phenomena. Highly, highly recommended!
1 reply · 23 retweets · 124 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
How much overparametrization is needed for neural net memorization? In 1988 Eric Baum answered with a "combinatorial" construction. But in fact even NTK can do it! And there is more: measuring the norm of the weights rather than the number of neurons, we give a *complex* weight training method. 1/3
[image]
1 reply · 16 retweets · 126 likes

Sebastien Bubeck (@SebastienBubeck) · 6 years ago
Just started a YouTube channel! The first set of videos will be recordings of a 10h bandit minicourse. After that I plan to record video lectures of my crash course in learning theory ([links]).
1 reply · 34 retweets · 122 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
I had a lot of fun lecturing at @ENS_ULM last week on *The Five Miracles of Mirror Descent* (robustness, potential-based, tracking, information geometry, adaptivity). I am indebted to Claire Boyer, who took excellent notes. Videos will be online soon too.
3 replies · 15 retweets · 121 likes

Sebastien Bubeck (@SebastienBubeck) · 8 years ago
Already 8000 registered for NIPS 2016, it's insane...
6 replies · 97 retweets · 119 likes

Sebastien Bubeck (@SebastienBubeck) · 8 months ago
How can such a small model have completions seemingly coming from a frontier LLM? Well, **Textbooks Are All You Need** strikes back! Indeed, on top of phi-1's data, phi-1.5 is trained *only on synthetic data*. See the video to learn more about this strategy.
5 replies · 18 retweets · 121 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
I don't know if @OpenAI's new language model is really better than the competition, but I am very impressed by their marketing skills. The headline "what we built is so good that we can't even tell you what we built" is pure genius!!! 1/2
5 replies · 15 retweets · 120 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Multi-agent learning is full of open problems, even for basic bandits. With T. Budzinski we resolve one such question, but more interestingly we achieve a seemingly impossible property! Q: What else are we wrongly assuming to be impossible in this field??
[image]
1 reply · 23 retweets · 122 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Almost 400 submissions to #COLT19!! This is great news: #ML is in dire need of more theoretical grounding, and I have high hopes that the COLT community (both old timers and newcomers) has a shot at doing that! Looking forward to unearthing the breakthroughs in these 400 papers :)
4 replies · 8 retweets · 121 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
My 1st #DeepLearning paper & yet another win for #19thcenturyMathematics! The Weierstrass transform (used to prove the Stone-Weierstrass theorem) produces *smooth* (= robust) functions. #AdversarialTraining on Weierstrass-transformed #deepnets gives ell_2 SOTA on ImageNet!
[image]
4 replies · 16 retweets · 118 likes
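
The Weierstrass transform of a function f is its convolution with a Gaussian, Wf(x) = E_{δ ~ N(0, σ²I)}[f(x + δ)], which is provably smooth. A minimal Monte-Carlo sketch of applying this smoothing to a classifier at inference time (an illustration of the operator only; the paper's actual training procedure and parameters are not given in the tweet):

```python
import torch

def weierstrass_smooth(f, x, sigma=0.25, n_samples=64):
    """Monte-Carlo estimate of the Weierstrass (Gaussian-smoothed) transform
    of f at a single input x: average f over Gaussian perturbations."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n_samples, *x.shape)
    with torch.no_grad():
        return f(noisy).mean(dim=0)  # averaged logits: a smoother function of x

# Usage with any torch model mapping a batch of inputs to logits
# (model and image below are placeholders):
# smoothed_logits = weierstrass_smooth(model, image)
```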