Sebastien Bubeck (@SebastienBubeck)
34,456 Followers · 1,318 Following · 166 Media · 1,440 Statuses

VP GenAI Research, Microsoft AI

Seattle, WA
Joined January 2012
Pinned Tweet
Sebastien Bubeck (@SebastienBubeck) · 10 days ago
phi-3 is here, and it's ... good :-). I made a quick demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open-weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)
40 replies · 183 retweets · 912 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
At @MSFTResearch we had early access to the marvelous #GPT4 from @OpenAI for our work on @bing. We took this opportunity to document our experience. We're so excited to share our findings. In short: time to face it, the sparks of #AGI have been ignited.
[image]
67 replies · 730 retweets · 3K likes

Sebastien Bubeck (@SebastienBubeck) · 4 months ago
Starting the year with a small update: phi-2 is now under the MIT license, enjoy everyone!
[image]
54 replies · 284 retweets · 2K likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
We trained a small transformer (100M params) for basic arithmetic. With the right training data it nails 12-digit by 12-digit multiplication without CoT (that's 10^24 possibilities, so no, it's not memorization🤣). Maybe arithmetic is not the LLM kryptonite after all?🤔
68 replies · 270 retweets · 2K likes
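
To make the claim concrete, here is the kind of synthetic training example such an experiment might use (a sketch only; the actual data format and pipeline are not given in the tweet):

```python
import random

def make_example(n_digits: int = 12) -> str:
    """One plain-text multiplication example; the serialization is an assumption."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a} * {b} = {a * b}"

# There are ~10^24 ordered pairs of 12-digit numbers, so no feasible training
# set can cover them: a model that nails held-out products must have learned
# an algorithm rather than a lookup table.
train_set = [make_example() for _ in range(1_000)]
print(train_set[0])
```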

Sebastien Bubeck (@SebastienBubeck) · 11 months ago
New LLM in town: ***phi-1 achieves 51% on HumanEval with only 1.3B parameters & a 7B-token training dataset*** Any other >50% HumanEval model is >1000x bigger (e.g., WizardCoder from last week is 10x in model size and 100x in dataset size). How? ***Textbooks Are All You Need***
[image]
45 replies · 340 retweets · 2K likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Over the last couple of weeks I gave a few talks on the Sparks paper; here is the MIT recording! The talk doesn't do justice to all the insights we have in the paper itself. Neither talks nor twitter threads are a substitute for actually reading the 155 pages :-)
15 replies · 299 retweets · 585 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
The Chomsky et al. opinion piece in the @nytimes about ChatGPT is making the rounds. Rather than trying to deconstruct their argument, I asked @bing what it thinks of it. Now you can judge for yourself who has the moral high ground 😂.
[image]
51 replies · 293 retweets · 1K likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Enjoy everyone! (And remember it's a base model, so you might have to play around with your prompts; if you want it to follow instructions you can try the format "Instruct: ... Output:")
28 replies · 195 retweets · 1K likes
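
For reference, a minimal sketch of the suggested prompt format using the Hugging Face transformers library (assuming this tweet refers to the phi-2 release and the microsoft/phi-2 checkpoint; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A base model has no chat template, so steer it with the Instruct/Output pattern.
prompt = "Instruct: Explain in one sentence why the sky is blue.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```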

Sebastien Bubeck (@SebastienBubeck) · 8 months ago
How far does one billion parameters take you? As it turns out, pretty far!!! Today we're releasing phi-1.5, a 1.3B-parameter LLM exhibiting emergent behaviors surprisingly close to much larger LLMs. For warm-up, see an example completion with comparison to Falcon 7B & Llama2-7B.
[image]
32 replies · 182 retweets · 842 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Transformers are changing the world. But how do they learn? And what do they learn? Our 1st @MSFTResearch ML Foundations team paper proposes a synthetic task, LEGO, to investigate such questions. Sample insights on Transformers thanks to LEGO below. 1/8
4 replies · 110 retweets · 738 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
We may have found a solid hypothesis to explain why extreme overparametrization is so helpful in #DeepLearning, especially if one is concerned about adversarial robustness. 1/7
[image]
5 replies · 129 retweets · 677 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Phi-2 numbers, finally! We're seeing a consistent ranking: phi-2 outperforms Mistral 7B & Gemini Nano 2* (*on their reported benchmarks) and is roughly comparable to Llama 2-70B (sometimes better, sometimes worse). Beyond benchmarks, playing with the models tells a similar story.
[image]
Quoting Satya Nadella (@satyanadella) · 5 months ago (95 replies · 161 retweets · 2K likes):
From new best-in-class small language models to state-of-the-art prompting techniques, we’re excited to share these innovations and put them in the hands of researchers and developers.
21 replies · 84 retweets · 587 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
phi-2 is coming to Hugging Face, hold tight :-)
19 replies · 32 retweets · 556 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
For my 500th tweet I'm super excited to release five 1h videos covering the most important results presented in my monograph Convex Optimization: Algorithms and Complexity. This time I tried hard to emphasize the intuition behind the calculations! 1/6
2 replies · 105 retweets · 532 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
phi-2 is really a good base for further fine-tuning: we fine-tuned on 1M math exercises (similar to phi-1 with CodeExercises) & tested on a recent French nationwide math exam (published after phi-2 finished training). The results are encouraging! Go try your own data.
[image]
21 replies · 69 retweets · 525 likes
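
A minimal sketch of the "go try your own data" suggestion using transformers' Trainer (the dataset, exercise format, and hyperparameters below are illustrative assumptions, not the phi team's actual recipe):

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

model_id = "microsoft/phi-2"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

# Toy stand-in for the "1M math exercises": exercise + worked solution
# per training string (the real data format is not public in this thread).
texts = [
    "Exercise: Compute 3 + 4 * 2. Solution: 4 * 2 = 8, then 3 + 8 = 11.",
    "Exercise: Solve 2x + 6 = 0. Solution: 2x = -6, so x = -3.",
]

class ExerciseDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=128, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, i):
        ids = self.enc["input_ids"][i]
        mask = self.enc["attention_mask"][i]
        labels = ids.clone()
        labels[mask == 0] = -100  # don't compute loss on padding
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phi2-math-ft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ExerciseDataset(texts),
)
trainer.train()
```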

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
(Nesterov) Acceleration in convex optimization is one of the most striking phenomena in all of optimization, and now you can learn about all the different viewpoints on it from a very nice 156-page survey paper by d'Aspremont, Scieur and Taylor!
3 replies · 102 retweets · 514 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Sorry, I know it's a bit confusing: to download phi-2 go to Azure AI Studio, find the phi-2 page and click on the "artifacts" tab. See picture.
[image]
Quoting Andrej Karpathy (@karpathy) · 5 months ago (13 replies · 11 retweets · 350 likes):
@simonw No they fully released it. But they hide it very well for some reason. Go to artifacts tab.
32 replies · 55 retweets · 495 likes

Sebastien Bubeck (@SebastienBubeck) · 6 months ago
Microsoft💜Open Source + SLMs!!!!! We're so excited to announce our new *phi-2* model that was just revealed at #MSIgnite by @satyanadella! At 2.7B size, phi-2 is much more robust than phi-1.5 and its reasoning capabilities are greatly improved too. Perfect model to be fine-tuned!
[two images]
18 replies · 87 retweets · 491 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
A major open problem in ML is whether convex techniques (kernel methods in particular) can reproduce the striking successes of deep learning. In a two-part guest post series on I'm a Bandit, @julienmairal weighs in on the question!
[image]
2 replies · 118 retweets · 464 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Just watched an incredible talk by @AlexGDimakis at the Simons Institute, highly recommended. Their Iterative Layer Optimization technique to solve inverse problems with GANs makes a LOT of sense! The empirical results on the famous blurred Obama face speak for themselves! 1/4
[image]
3 replies · 78 retweets · 461 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Time for some retrospective! I shared 25 papers that I particularly enjoyed in the last decade. I would love for you to share some papers that are missing from this list (there are many!!), either here or in the comments on the blog.
2 replies · 106 retweets · 430 likes

Sebastien Bubeck (@SebastienBubeck) · 6 months ago
My group is hiring a large cohort of interns for the summer of 2024 to work on the Foundations of Large Language Models! Come help us uncover the new physics of A.I. to improve LLM building practices! (Pic below from our NeurIPS 2023 paper with interns.)
[image]
9 replies · 52 retweets · 404 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
At #NeurIPS2018, where it was just announced that our paper on non-smooth distributed optimization with Kevin Scaman, @BachFrancis, Laurent Massoulie and Yin Tat Lee got a best paper award. Lots of interesting open problems left there, check out the paper!
10 replies · 48 retweets · 377 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
We (@gauthier_gidel @velythyl @busycalibrating @vernadec & myself) would like to announce the accepted blog posts for @iclr_conf's 1st Blogpost Track. The experiment was a great success with 20 accepted posts out of 61 submissions, roughly the size of the 1st @iclr_conf itself! 1/24
7 replies · 67 retweets · 363 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
I'm really happy that the law of robustness got recognized as an important new insight with a NeurIPS outstanding paper award! The video below summarizes what the law is about, what it means, and what it predicts. It's also a great capstone for @geoishard's fantastic PhD work!
Quoting Microsoft Research (@MSFTResearch) · 2 years ago (1 reply · 25 retweets · 131 likes):
Learn about the significance of overparametrization in neural networks, the universal law of robustness, and what "A Universal Law of Robustness via Isoperimetry" means for future research in this short video with @SebastienBubeck.
25 replies · 41 retweets · 362 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
A fun way to describe Nesterov's momentum:
[image]
1 reply · 52 retweets · 349 likes
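
The attached image is not preserved in this archive. For reference, one standard way to write Nesterov's accelerated gradient method (several equivalent forms exist; this is the convex, L-smooth version with step size η = 1/L):

```latex
% Initialize x_0 = y_0; then for k >= 1:
x_k = y_{k-1} - \eta \nabla f(y_{k-1}),            % gradient step
y_k = x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}).      % momentum step
```

The momentum coefficient (k-1)/(k+2) grows toward 1, and the resulting O(1/k^2) convergence rate beats gradient descent's O(1/k) on smooth convex functions.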

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
The **Machine Learning Foundations** group at @MSFTResearch Redmond is hiring at all levels (including postdoc)! Come join @ZeyuanAllenZhu @suriyagnskr @jerryzli @ilyaraz2 @talw and myself to develop the next generation of ML theory!
11 replies · 83 retweets · 341 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Karen Uhlenbeck concludes her Abel prize lecture with 5 minutes on #DeepLearning!!! She says about it: "My conjecture is there is some interesting mathematics of some sort that I have no idea." Couldn't agree more.
5 replies · 42 retweets · 318 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
We're so pumped to see phi-2 at the top of trending models on @huggingface! Its sibling phi-1.5 already has half a million downloads. Can't wait to see the mechanistic interpretability works that will come out of this & their impact on all the important LLM research questions!
[image]
25 replies · 64 retweets · 314 likes

Sebastien Bubeck (@SebastienBubeck) · 11 months ago
Terence Tao reflecting on GPT-4 in the AI Anthology coordinated by @erichorvitz: "I expect, say, 2026-level AI, when used properly, will be a trustworthy co-author in mathematical research, and in many other fields as well." Terry gets it.
3 replies · 50 retweets · 312 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Why do neural networks generalize? IMO we still have no (good) idea. Recent emerging hypothesis: NN learning dynamics discovers *general-purpose circuits* (e.g., the induction head in transformers). In [link] we take a first step to prove this hypothesis. 1/8
9 replies · 43 retweets · 317 likes

Sebastien Bubeck (@SebastienBubeck) · 11 months ago
I cannot recommend this podcast episode strongly enough. It's simply THE MOST INSIGHTFUL 2 hours of content that you can find on LLMs. And it's by none other than @EldanRonen and Yuanzhi Li from our team @MSFTResearch. Stay tuned for a LOT MORE from us soon.
2 replies · 42 retweets · 307 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
New video! Probably best described as "a motivational speech to study deep learning mathematically" :-). The ever so slightly more formal title is "Mathematical theory of deep learning: Can we do it? Should we do it?" 1/3
2 replies · 40 retweets · 285 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Every Mon/Thu I will post a 1h lecture on the "Five Miracles of Mirror Descent". We start with basic reminders of convexity, the classical analysis of gradient descent, and a discussion of its robustness properties as well as the regret interpretation.
2 replies · 40 retweets · 275 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
AI still has a long way to go... To me this example is exactly what happened with the whole "sentient" discussion: if you prompt with the seed of an answer, the transformer architecture will latch onto this seed. It's really a game of mirrors...
[image]
18 replies · 23 retweets · 272 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
The universal law of robustness is a tentative theoretical justification for *large* overparametrization in neural network learning. Here is a video explaining the law, in the context of other recent results on overparametrization (e.g., double descent).
0 replies · 65 retweets · 272 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
New video with a crash course on *tensors* (spoiler: no, they aren't JUST multi-dimensional arrays!). Includes a discussion of cross norms & basic facts about rank. We then use it to get insights into neural networks (in the context of our law of robustness).
0 replies · 27 retweets · 271 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Microsoft Research is hiring (in-person) interns! There are many different opportunities in all the labs. Here are some options in the Machine Learning research area in MSR @Redmond: ML Foundations, Neural Architecture Search. 1/2
3 replies · 53 retweets · 266 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Part II of @julienmairal's guest post on CNN-inspired kernel methods: you will learn how to efficiently approximate those kernels, and even push the CNN analogy further by doing an end-to-end optimization which includes the approximation step.
[image]
0 replies · 54 retweets · 257 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Congratulations to Laszlo Lovasz and Avi Wigderson for winning the 2021 Abel Prize!!!!!!! What a fantastic recognition for theoretical computer science from the mathematics community.
3 replies · 34 retweets · 250 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Interesting thread! To me the "reason" for the CLT is simply high-dimensional geometry. Consider the unit ball in dimension n+1 & slice it at distance x from the origin to get a dimension-n ball of radius (1-x^2)^{1/2}. The volume of the slice is proportional to (1-x^2)^{n/2} ~ exp(-(1/2)n x^2). Tada, the Gaussian!!
Quoting Stephan Hoyer (@shoyer) · 3 years ago (55 replies · 25 retweets · 258 likes):
Does anyone know a good intuitive explanation for the central limit theorem? I realized the other day that even though I use it all the time I can't really justify *why* it's true.
5 replies · 29 retweets · 249 likes
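
Spelling out the tweet's computation as a display (constants omitted; this is the standard "Gaussian from balls" heuristic, a sketch rather than a proof of the CLT):

```latex
\mathrm{vol}_n\big(\text{slice at height } x\big)
  \;\propto\; \big(1 - x^2\big)^{n/2}
  \;=\; \exp\!\Big(\tfrac{n}{2}\,\log(1 - x^2)\Big)
  \;\approx\; \exp\!\Big(-\tfrac{n}{2}\,x^2\Big)
  \quad\text{for } |x| \ll 1,
```

so the one-dimensional marginal of a uniform point in the ball B^{n+1} is approximately Gaussian with variance ~1/n.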

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Congratulations to our colleague Lin Xiao @MSFTResearch for the #NeurIPS2019 test of time award!!! Online convex optimization and mirror descent for the win!! (As always? :-).)
1 reply · 45 retweets · 248 likes

Sebastien Bubeck (@SebastienBubeck) · 5 months ago
Join us on YouTube at 1pm PT/4pm ET today for the premiere of our "debate" with @bgreene @ylecun @tristanharris on whether a new kind of intelligence has emerged with GPT-4, and what consequences it might have.
11 replies · 45 retweets · 241 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Looks like we might be home for some time, so I'm giving a shot at making homemade math videos on proba/optim/ML. The first video gives a proof of the very nice ICML19 theorem by @deepcohen, Rosenfeld & @zicokolter on certified defense against adversarial examples.
2 replies · 44 retweets · 234 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
I rarely tweet about non-ML/math topics but I felt like sharing this one. Just finished my first 100+ mile bike ride with the amazing @ilyaraz2!!!! It was so much fun, and here is the mandatory finish-line picture in front of our beloved @MSFTResearch Building 99 😁
[image]
5 replies · 0 retweets · 229 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
In non-convex optimization, gradient descent is the obvious algorithm because non-local reasoning is hard without convexity. In "How to trap a gradient flow" we go beyond gradient descent by uncovering a new local-to-global phenomenon. Details in the new video!
3 replies · 25 retweets · 222 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
I'm looking for an intern (with hands-on #DeepLearning experience + curious about theory) to work closely with me on adversarial examples this summer. MSR summers are exciting, with lots of strong theory visitors also curious about DL. Fantastic opportunity to build bridges! DM for more / please RT.
8 replies · 69 retweets · 220 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
@pmddomingos The vast majority of AI researchers recognize AI ethics as an important field of study, just as worthy as any other AI subfield. Doesn't mean that everyone has to study it, doesn't mean it has fewer problems than other subfields, but DOES mean that @pmddomingos is extremely misguided.
4 replies · 9 retweets · 214 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
The @TEDTalks by @YejinChoinka is both insightful & beautifully delivered! Totally agree with her that GPT-4 is simultaneously brilliant and incredibly stupid. Yejin gives 3 examples of common-sense failures that are worth examining a bit more closely. 1/5
10 replies · 53 retweets · 213 likes

Sebastien Bubeck (@SebastienBubeck) · 2 months ago
At a time when 314B-parameter models are trending, come join me at #NVIDIAGTC to see what you can do with 1 or 2B parameters :-) (and coming soon, what can you do with 3B?!?)
[image]
7 replies · 13 retweets · 214 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
This is the strongest list of learning theory papers I have ever seen: [link]. Very exciting progress on many fronts! #COLT19
0 replies · 26 retweets · 207 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
This is an excellent blog post on kernels, by one of the world experts on the topic, @BachFrancis. *Anyone* interested in ML (theorists & practitioners alike) should be comfortable with everything written there (i.e. the material has to become insight). 1/4
1 reply · 29 retweets · 197 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
#Mathematics of #MachineLearning by @MSFTResearch & @uwcse & @mathmoves: 2 weeks of lectures on statistical learning theory, convex optimization, bandits, #ReinforcementLearning, and #DeepLearning. Schedule here: [link] and livestream link here: [link]
2 replies · 73 retweets · 194 likes

Sebastien Bubeck (@SebastienBubeck) · 6 years ago
If you are in Montreal next week I recommend attending our first workshop in the "mathematics of ML" program that I co-organize with Gabor Lugosi and Luc Devroye. The lectures will be recorded and hopefully available soon after the workshop.
4 replies · 57 retweets · 194 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Need to brush up your online decision making fundamentals before #NeurIPS2019? Check out these two fantastic new books: Introduction to Bandits (Alex Slivkins) and Bandit Algorithms (Tor Lattimore & @CsabaSzepesvari). 1/2
3 replies · 47 retweets · 184 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
This is really the major open problem in deep learning: gradient descent on these architectures has an uncanny ability to dodge any trap. Why/how?
Quoting JFPuget 🇺🇦 (@JFPuget) · 2 years ago (27 replies · 25 retweets · 609 likes):
Deep learning is too resistant to bugs. I just found a major one in the pipeline I have been using for 2 weeks. Yet it produced results good enough to not alert me to possible bugs.
15 replies · 10 retweets · 169 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Mark your calendars, the next two weeks bring exciting workshops at the Simons Institute: Concentration of Measure Phenomena (Oct. 19–23) and Mathematics of Online Decision Making (Oct. 26–30).
0 replies · 32 retweets · 172 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Amazing news out of the math world: the KLS conjecture has perhaps been proven!!! The paper still needs to be checked carefully, but it follows a well-established line of work (initiated by Ronen Eldan, and refined in particular by Yin Tat Lee). 1/3
4 replies · 13 retweets · 171 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Exciting start of the year for the theory of #DeepLearning! SGD on neural nets can: 1) simulate any other learning algorithm with some poly-time init [Abbe & Sandon]; 2) efficiently learn hierarchical concept classes [@ZeyuanAllenZhu & Y. Li].
1 reply · 42 retweets · 167 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
I just tried the new Bard powered by Palm 2 and asked it to draw a unicorn in TikZ. It's not quite there yet :-).
[image]
17 replies · 11 retweets · 168 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Adversarial examples are imo *the* cleanest major open problem in ML. I don't know what was said precisely, but diminishing the central role of this problem is not healthy for our field. Ofc in the absence of a solution there are many alternative questions that we can/should ask.
Quoting Thomas G. Dietterich (@tdietterich) · 4 years ago (7 replies · 61 retweets · 273 likes):
Very thought-provoking talk by Justin Gilmer at the #ICML2020 UDL workshop. Adversarial examples are just a case of out-of-distribution error. There is no particular reason to defend against the nearest OOD error (i.e., the L-infty adversarial example). 1/
14 replies · 19 retweets · 166 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
And now for an interlude from Transformers taking over the world and the (very unfortunate) Twitter drama: *The randomized k-server conjecture is false!* Joint work with Christian Coester & Yuval Rabani. The picture below is our hard metric space for k-server.
[image]
3 replies · 18 retweets · 166 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
It's shaping up to be a fine afternoon! (Yes, Talagrand's new book is out!)
[image]
4 replies · 8 retweets · 163 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Since the seminal works [@goodfellow_ian, Shlens, @ChrSzegedy, ICLR15; @aleks_madry et al., ICLR18] it is known that larger models help for robustness. We posit that in fact *overparametrization is a fundamental law of robustness*. A thread (and a video).
3 replies · 27 retweets · 165 likes
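
For reference, the law in question (Bubeck–Sellke, "A Universal Law of Robustness via Isoperimetry"), stated informally with constants and the isoperimetry assumption on the data suppressed:

```latex
% Any p-parameter function f that fits n noisy d-dimensional data points
% below the noise level must satisfy
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{n d}{p}},
```

so achieving an O(1)-Lipschitz (robust) interpolator requires overparametrization p ≳ nd, far beyond the p ≈ n needed merely to fit the data.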

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Lots of discussion around #LaMDA is missing the point: ofc it's not sentient, but the issue is that those systems are so good at mimicking that non-technical people can easily be fooled. As is often the case when topics escape experts, the truth matters less than how it "feels".
7 replies · 12 retweets · 161 likes

Sebastien Bubeck (@SebastienBubeck) · 9 months ago
Unfortunately this is a correct take. I expect all my work on LLMs to remain unpublished because of this situation. Maybe that's the price to pay when the community gets too big. For me personally it's a non-issue, but what about young students entering the field?
Quoting Peter Richtarik (@peter_richtarik) · 9 months ago (4 replies · 22 retweets · 281 likes):
#NeurIPS2023 reviewing if science was sport: any athlete can evaluate any other athlete, irrespective of their specialization, experience, or level. Result? An amateur 100m dash guy criticizes a high jumper for lack of speed during her world-record-breaking jump. Reject.
8 replies · 9 retweets · 162 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Want to learn more about overparametrization, adversarial examples, and why interpolation does not lead to overfitting (generalization IV lecture)? The videos from yesterday's talks at the deep learning bootcamp are already online and they're worth a watch!
1 reply · 38 retweets · 159 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
I personally think that LLM learning is closer to the process of evolution than it is to humans learning within their lifetime. In fact, a better caricature would be to compare human learning with LLMs' in-context learning capabilities.
Quoting Yann LeCun (@ylecun) · 1 year ago (695 replies · 501 retweets · 3K likes):
Humans don't need to learn from 1 trillion words to reach human intelligence. What are LLMs missing?
7 replies · 13 retweets · 158 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
Today I'm not announcing a new paper but rather a fitness goal, with my fantastic fitness collaborator @ilyaraz2 😁. Started running 6 months ago & did my first half-marathon! Reasonable time, 1:52, but @ilyaraz2 crushed it at 1:46!!! Running is so much fun, highly recommended 😁
[image]
4 replies · 0 retweets · 156 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
Lots of discussion about open source LLMs catching up to The Big Ones, including eye-catching claims such as 90% of ChatGPT's quality (by the really cool work of @lmsysorg). Two Sparks authors, @marcotcr & @scottlundberg, explore this further in a new blog post. 1/2
4 replies · 28 retweets · 151 likes

Sebastien Bubeck (@SebastienBubeck) · 16 days ago
Hmmm, I have a feeling this plot might need an overhaul rather soon🤣. I guess phi-2 was the lower left part of the triangle. I wonder what those guys have been up to in the last 6 months? 🤔
Quoting Armand Joulin (@armandjoulin) · 16 days ago (6 replies · 9 retweets · 115 likes):
Fixed the fix.
[image]
10 replies · 12 retweets · 154 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Starting tomorrow (with livestream): the Simons Institute workshop on *Learning and Testing in High Dimensions*. We have a great line-up of talks, featuring many of the recent exciting results in high-dimensional learning!
0 replies · 18 retweets · 154 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
@YiMaTweets It might not be a great idea to give your audience the impression that most mysteries of DL have been resolved when in fact hardly any of them have been... there is work for at least a full generation to make progress here.
2 replies · 4 retweets · 155 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Fantastic progress by Tor Lattimore on bandit convex optimization!!! The regret is now d^{2.5} sqrt(T) (down from d^{9.5} sqrt(T)), and the proof is short and sweet. Very close to the conjectured bound of d^{1.5} sqrt(T). 1/2
4 replies · 18 retweets · 150 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
I think most people in ML will relate to the title of this paper😄. Next philosophical breakthrough: think about the reality vector as a set of weights in a neural net?🤣
[image]
6 replies · 12 retweets · 142 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
The Machine Learning Foundations team at @MSFTResearch Redmond is looking for a postdoc. Come join us (@ZeyuanAllenZhu @suriyagnskr @jerryzli @talw16 and Yi Zhang) to work on topics ranging from quantum learning to understanding transformer architectures!
2 replies · 31 retweets · 144 likes

Sebastien Bubeck (@SebastienBubeck) · 1 year ago
New video! I discuss the "Physics of AI": how controlled experiments and toy mathematical models could help us make progress on understanding Deep Learning, with two examples from MSR's Machine Learning Foundations group: LEGO & the Edge of Stability analysis.
4 replies · 22 retweets · 142 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Congratulations @DeepMindAI, I am amazed. 3 years ago I bet that by 2021, AI would still not compete with pros at SC2. Today I lost that bet pretty badly... Maybe it's time to do a Bayesian update on my beliefs... #AlphaStar
2 replies · 11 retweets · 142 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
New video on the very nice proof by @ZeyuanAllenZhu and Yuanzhi Li showing the limitations of kernel methods (even when the training set can be chosen for the task at hand) compared to more sophisticated procedures (e.g., deep learning).
6 replies · 18 retweets · 141 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
An accessible presentation by @ZeyuanAllenZhu of his breakthrough discovery with Yuanzhi Li of the backward feature correction phenomenon (feature purification is also discussed in a second part). Interesting progress toward explaining the power of deep learning!
1 reply · 42 retweets · 138 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
Tor Lattimore did it again: after improving Bandit Convex Optimization to n^2.5 sqrt(T) (down from n^9.5), he now shows n^1 for ridge functions, i.e. a 1-dim convex function composed with a linear map is no harder than the merely linear case! No algorithm matching those rates is known!
2 replies · 17 retweets · 138 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Congratulations to Nick Littlestone & Manfred Warmuth, giants of learning theory, for winning the 30-year #FOCS test of time award! The weighted majority algorithm has been hugely influential in #TCS, and led to many practical breakthroughs (e.g., boosting).
0 replies · 23 retweets · 133 likes

Sebastien Bubeck (@SebastienBubeck) · 2 years ago
I was planning to do only 1 fitness post/year but I'm just too excited about this milestone not to share: just finished my first real Olympic triathlon in 3h05!!! I missed my target by 5 minutes but I will put that down to the heatwave, which made the run a bit excruciating...
[image]
3 replies · 0 retweets · 132 likes

Sebastien Bubeck (@SebastienBubeck) · 8 months ago
phi-1.5 & phi-1 are available right now on @huggingface & @Azure ML! We can't wait to see what the community will discover with them. The phi-1.5 team (Yuanzhi Li, @EldanRonen, @allie_adg, @suriyagnskr) is ready to answer questions too!
2 replies · 19 retweets · 129 likes

Sebastien Bubeck (@SebastienBubeck) · 3 years ago
I'm quite excited by this: a *Blog Posts track* at ICLR! Posts will be officially "published" by ICLR (& can be cited as such). Key requirement: blog about *previously published papers at ICLR*. It's an attempt to embed memory into our conference publication system, which is sorely needed.
Quoting ICLR 2024 (@iclr_conf) · 3 years ago (10 replies · 201 retweets · 1K likes):
ICLR is happy to announce the call for contributions for our very first "Blog Posts Track". We invite submissions in blog format discussing previously published papers at ICLR. For details on this exciting new experiment in publication models see: [link]
2 replies · 15 retweets · 128 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
While reading up on superconcentration for an upcoming neural network paper, I found these delightful slides by Sourav Chatterjee: difficult material masterfully explained, giving you exactly the essence of deep phenomena. Highly, highly recommended!
1 reply · 23 retweets · 124 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
How much overparametrization is needed for neural net memorization? In 1988 Eric Baum answered with a "combinatorial" construction. But in fact even NTK can do it! And there is more: measuring the norm of the weights rather than the number of neurons, we give a *complex* weight training method. 1/3
[image]
1 reply · 16 retweets · 126 likes

Sebastien Bubeck (@SebastienBubeck) · 6 years ago
Just started a YouTube channel! The first set of videos will be recordings of a 10h bandit minicourse. After that I plan to record video lectures of my crash course in learning theory ([links]).
1 reply · 34 retweets · 122 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
I had a lot of fun lecturing at @ENS_ULM last week on *The Five Miracles of Mirror Descent* (robustness, potential-based, tracking, information geometry, adaptivity). I am indebted to Claire Boyer, who took excellent notes. Videos will be online soon too.
3 replies · 15 retweets · 121 likes

Sebastien Bubeck (@SebastienBubeck) · 8 years ago
Already 8000 registered for NIPS 2016, it's insane...
6 replies · 97 retweets · 119 likes

Sebastien Bubeck (@SebastienBubeck) · 8 months ago
How can such a small model have completions seemingly coming from a frontier LLM? Well, **Textbooks Are All You Need** strikes back! Indeed, on top of phi-1's data, phi-1.5 is trained *only on synthetic data*. See the video to learn more about this strategy.
5 replies · 18 retweets · 121 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
I don't know if @OpenAI's new language model is really better than the competition, but I am very impressed by their marketing skills. The headline "what we built is so good that we can't even tell you what we built" is pure genius!!! 1/2
5 replies · 15 retweets · 120 likes

Sebastien Bubeck (@SebastienBubeck) · 4 years ago
Multi-agent learning is full of open problems, even for basic bandits. With T. Budzinski we resolve one such question, but more interestingly we achieve a seemingly impossible property! Q: What else are we wrongly assuming to be impossible in this field??
[image]
1 reply · 23 retweets · 122 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
Almost 400 submissions to #COLT19!! This is great news: #ML is in dire need of more theoretical grounding, and I have high hopes that the COLT community (both old timers and newcomers) has a shot at doing that! Looking forward to unearthing the breakthroughs in these 400 papers :)
4 replies · 8 retweets · 121 likes

Sebastien Bubeck (@SebastienBubeck) · 5 years ago
My 1st #DeepLearning paper & yet another win for #19thcenturyMathematics! The Weierstrass transform (used to prove the Stone-Weierstrass theorem) produces *smooth* (= robust) functions. #AdversarialTraining on Weierstrass-transformed #deepnets gives ell_2 SOTA on ImageNet!
[image]
4 replies · 16 retweets · 118 likes
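
The Weierstrass transform of a function f is its convolution with a Gaussian, Wf(x) = E_{δ ~ N(0, σ²I)}[f(x + δ)], which is provably smooth. A minimal Monte-Carlo sketch of applying this smoothing to a classifier at inference time (an illustration of the operator only; the paper's actual training procedure and parameters are not given in the tweet):

```python
import torch

def weierstrass_smooth(f, x, sigma=0.25, n_samples=64):
    """Monte-Carlo estimate of the Weierstrass (Gaussian-smoothed) transform
    of f at a single input x: average f over Gaussian perturbations."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n_samples, *x.shape)
    with torch.no_grad():
        return f(noisy).mean(dim=0)  # averaged logits: a smoother function of x

# Usage with any torch model mapping a batch of inputs to logits
# (model and image below are placeholders):
# smoothed_logits = weierstrass_smooth(model, image)
```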