Really excited about the launch of this research initiative. Hiring Research Scientists now; Research Software Engineers and postdocs over the next few months. 300 H100 GPUs. Multidisciplinary teams. Princeton helps keep AI expertise in the open sphere. More:
“The dramatic rise of AI capabilities…is a watershed event for humanity…It is also sure to transform research and teaching in every academic discipline.” – @prfsanjeevarora, director of the new @Princeton Language and Intelligence initiative. For more:
Conventional wisdom: "Not enough data? Use classic learners (Random Forests, RBF SVMs, ...), not deep nets." New paper: infinitely wide nets beat these and also beat finite nets. Infinite nets train faster than finite nets here (hint: Neural Tangent Kernel)!
"Is optimization the right language to understand the brain?" is a famous controversy in neuroscience. My new blog post asks whether optimization is the right language even to understand deep learning. (TL;DR: let's think trajectories!)
Princeton has a new Center for Language and Intelligence, researching LLMs + large AI models, as well as their interdisciplinary applications. Looking for postdocs/research scientists/engineers; attractive conditions.
Conventional wisdom: slowly decay the learning rate (lr) when training deep nets. Empirically, some exotic lr schedules also work, e.g., cosine. New work with Zhiyuan Li: an exponentially increasing lr works too! Experiments + surprising math explanation. See
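A minimal sketch of what such a schedule looks like; the function name and constants are illustrative, not taken from the paper:

```python
# Hypothetical sketch of an exponentially *increasing* learning-rate schedule.
# base_lr and growth are made-up constants for illustration.
def exp_increasing_lr(step, base_lr=0.1, growth=1.0005):
    """Learning rate grows geometrically in the step count."""
    return base_lr * growth ** step

# With batch norm + weight decay, growing weight norms shrink the *effective*
# step size, which is one intuition for why such a schedule can still train.
lrs = [exp_increasing_lr(t) for t in (0, 1000, 2000)]
```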
Blog post on our new theory for word2vec-like representation learning methods for images, text, etc. Explains why representations do well on previously unseen classification tasks. Relevant to meta-learning, transfer learning? Paper
Workshop: "Theory of Deep Learning: Where Next?" at the Institute for Advanced Study, Tuesday--Friday this week. Amazing schedule of talks!
Registration is closed (sorry), but follow livestream here
Big congratulations to Avi Wigderson of IAS Princeton for winning the Turing Award in CS. Truly an all-time great in theoretical computer science and discrete math. Also one of the nicest human beings I know -- friend and mentor to so many (including me).
Our long-delayed blog post on the ICLR20 paper showing that current deep nets can be trained with a learning rate that is exponentially increasing. Not just experiments but also a mathematical proof that this is at least as powerful as usual LR tuning.
How do you compute with an infinitely wide deep net (e.g., AlexNet or VGG with width taken to infinity)? Despite crazy overparametrization, this net works OK on the finite dataset CIFAR10. To understand how this was done (via "Neural Tangent Kernels") see
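For intuition: prediction with an infinitely wide net reduces to kernel regression with the Neural Tangent Kernel. A sketch with a stand-in RBF kernel where the architecture-specific NTK would go (all names are illustrative):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Stand-in kernel; the real NTK has its own closed form per architecture.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_predict(X_train, y_train, X_test, kernel=rbf_kernel, reg=1e-6):
    # Kernel regression: f(x) = K(x, X) (K(X, X) + reg*I)^{-1} y.
    K_tt = kernel(X_train, X_train) + reg * np.eye(len(X_train))
    return kernel(X_test, X_train) @ np.linalg.solve(K_tt, y_train)
```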
Deep-learning-free text embeddings. Surprisingly simple text embeddings suffice to match the performance of much more sophisticated methods for capturing the meaning of text.
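In the spirit of SIF ("smooth inverse frequency") embeddings, a sentence vector is just a frequency-weighted average of word vectors (optionally followed by removing the top principal component across sentences). A toy sketch; the vocabulary and probabilities below are made up:

```python
import numpy as np

def sif_embedding(sentence, word_vecs, word_prob, a=1e-3):
    # Weighted average: rarer words (low corpus probability) get more weight.
    vecs = [(a / (a + word_prob[w])) * word_vecs[w]
            for w in sentence.split() if w in word_vecs]
    return np.mean(vecs, axis=0)
```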
Contrastive learning gives great data representations. New paper (title is an homage to Zhang et al. '16) says understanding it requires opening the black box of deep learning.
(Note: Lead author Nikunj Saunshi is on the job market.)
We're looking for postdoctoral fellows in AI! We offer: an excellent cohort of young researchers, a dedicated GPU cluster with 300 H100s, $100K salary (+$10K research funds), stunning campus. 1 hour from NYC and Philly. Renewable, i.e., possible to stay multiple years. Join us!
Excited to announce the Princeton Language and Intelligence Postdoctoral Research Fellowship!
Candidates are encouraged to apply by the start-of-review date, Friday, December 1, 11:59 pm (EST), for full consideration.
Details:
We're hiring Research Engineers and Research Scientists now, and postdocs in the winter. Please join us in developing AI as well as applying it to academic disciplines, including the humanities, social sciences, and the sciences.
With sparse coding again popular for interpretability in LLMs, please look at older work! "Latent structure in word embeddings", "Atoms of meaning", decoding brain fMRI via sentence embeddings
Fine tuned LLMs can solve many NLP tasks. A priori, fine-tuning a huge LM on a few datapoints could lead to catastrophic overfitting. So why doesn’t it? Our theory + experiments (on GLUE) reveal that fine-tuning is often well-approximated as simple kernel-based learning. 1/2
New blog post by Nadav Cohen. If we want to understand deep learning, we have to start analysing the trajectory of gradient descent rather than the landscape. The paper is here
Matching AlexNet performance (89%) on CIFAR10 using a kernel method. Excluding deep nets, the previous best was 86% (Mairal, NIPS'16). Key ideas: convolutional NTK + Coates-Ng random patches layer + a way to fold data augmentation into the kernel definition
Hoping to read the new papers by Allen-Zhu et al. Training provably converges on greatly overparametrized deep nets. And such overparametrized deep nets can generalize when trained on data from a teacher net.
Shiller's advice is good in any field. Easy but sad explanation for why young people often ignore this advice: the (N+1)th result in a field with N results is difficult to obtain, hence easy to publish. The 1st or 2nd results in a field are easier to obtain, but harder to publish.
Remember matrix completion? Deep linear nets solve it better than the old nuclear norm algorithm. Analysis requires going beyond the traditional optimization view and understanding #trajectories. Blog post by Nadav and Wei. Paper
New mathematical explanation of the lack of barriers in the deep learning landscape (i.e., low-cost solutions interconnected via regions of low cost; ICML18). Applies to realistic deep nets and uses the noise stability property. Rong Ge's blog post about our paper
Has deep learning overfitted to the test sets of popular datasets? Move over, Occam! Rip Van Winkle's Razor gives nontrivial upper bounds on the amount of overfitting for popular architectures. Blog post + article with Yi Zhang
Saliency maps give “human interpretability” to deep learning. NIPS18 paper (@mrtz, @goodfellow_ian, @_beenkim) showed they fail “sanity checks” involving model and data randomization. We fix saliency maps to pass sanity checks ("Competition for pixels")
Day-long event at the Institute for Advanced Study on Fri Feb 22. Deep Learning: Alchemy or Science? Speakers: Mike Collins, @ylecun, @zacharylipton, Joelle Pineau, Shai Shalev-Shwartz. Will be livestreamed. Panel will respond to questions from the worldwide audience via Twitter.
Giving three talks for the ETH Zurich Paul Bernays Lectures 2022: "The quest for mathematical understanding of artificial intelligence." This week's two talks are accessible to non-experts.
Blog post on new mismatches between current theories of optimization and modern deep learning. Tiny learning rates don't hurt generalization. Surprising insight about fast mixing in the landscape and what it means. New theory with @zhiyuanli_ and @vfleaking.
Simons Foundation and NSF propose to spend $20M to fund projects on Mathematical and Scientific Foundations of Deep Learning
An interesting public-private partnership to fund basic research.
2nd article on deep-learning-free text embeddings that are simple and fast to implement, and compete quite well with far more complicated embeddings.
My new paper (joint with Nadav Cohen and Elad Hazan) on the benefits of overparametrization is up. I recommend Nadav's nice blog post as a starting point:
Visited the new @GoogleAI lab in Palmer Square, Princeton and enjoyed the excellent coffee with my colleague (and lab co-director) @HazanPrinceton. Exciting times for machine learning and AI in Princeton NJ!
Theory of Deep Learning: Where Next? Workshop @the_IAS, Princeton, Oct 15-18, 2019. Great speaker lineup! Registration open. Contributed paper/talk/poster submission deadline: Sept 2.
Blog returns from summer. New article by Simon Du and Wei Hu on Neural Tangent Kernels (which capture the power of infinitely wide nets trained on finite datasets). Watch out for more in coming weeks!
Computing Convolutional Neural Tangent Kernels (CNTKs) for 20-layer nets with a pooling layer is computationally expensive, and many people wrote to us wondering how it is feasible. Short answer: these students not only have great theory chops, but can also write CUDA!
We have released code for computing Convolutional Neural Tangent Kernel (CNTK) used in our paper "On Exact Computation with an Infinitely Wide Neural Net", which will appear in NeurIPS 2019.
Paper:
Code:
Seminar series in theoretical ML is continuing online this summer at @the_IAS. Upcoming speakers: @mraginsky (today at 12:20pm!), Mike Jordan, Shankar Sastry, etc. Registration required.
Congratulations to @StanfordAILab Director @chrmanning, awarded the 2024 IEEE John von Neumann Medal, one of @IEEEAwards's top awards, “for outstanding achievements in computer-related science and technology”, for his advances in #NLProc.
Princeton Language and Intelligence Initiative looking for Research Scientists (PhD required). Foci: (i) foundation models, LLMs; (ii) applications of models to other disciplines; (iii) understanding effects on society and mitigating harms. Let's chat at ICML23?
#IASMLyear Special year in machine learning, optimization and statistics, 2019-20, at the Institute for Advanced Study. Visit with stipend for a term or a year; shorter visits possible for industry folks. Apply by Dec 1
How do you induce embeddings for a word from a single or a few occurrences? A simple method that also improves unsupervised sentence embeddings: à la carte embeddings.
Also how different meanings of a word reside inside its embedding (TACL)
Five amazing expositions of zero-knowledge proofs by Amit Sahai of UCLA, aimed at five very different types of listeners. Heartening to see a mathy video rack up millions of views in a few weeks.
Efficient Covid testing: how to test more patients with a fixed number of test kits. Cool applications of math concepts we teach our students: coding theory, compressed sensing, etc.
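One such idea, sketched under toy assumptions: pool samples, then decode COMP-style (anyone who appears in a negative pool is certified negative). The pooling design below is illustrative, not a specific published protocol.

```python
def run_pools(samples, pools):
    # A pooled test is positive iff the pool contains at least one positive.
    return [any(samples[i] for i in pool) for pool in pools]

def comp_decode(n, pools, results):
    # COMP decoding: membership in any negative pool certifies a negative;
    # everyone else remains a suspected positive.
    cleared = set()
    for pool, positive in zip(pools, results):
        if not positive:
            cleared.update(pool)
    return [i for i in range(n) if i not in cleared]

# 6 patients, 1 infected (patient 2), using only 5 pooled tests instead of 6.
samples = [False, False, True, False, False, False]
pools = [[0, 1, 2], [2, 3], [4, 5], [0, 4], [1, 3]]
suspected = comp_decode(6, pools, run_pools(samples, pools))
```

With k infected among n patients, roughly O(k log n) pooled tests suffice, which is the compressed-sensing flavor of the savings.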
New blog post describes our new paper (with Rong Ge, Behnam Neyshabur, Yi Zhang) making progress on the generalization mystery of deep nets. The bounds are orders of magnitude better than recent papers.
The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful!
Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.
Very excited about this paper and its implications. Turing-completeness of transformers implies they can simulate other models inside them. But it's nontrivial that a net can do gradient updates on another net inside it which is 1/8th its size. Great work by the student team!
**New paper**
In-context learning has been explained as simulating + training simple models at inference. We show a 2B model can run GD on an internal 125M model. Surprising simulation + AI safety implications! 1/5
w/ @SadhikaMalladi, @xiamengzhou, @prfsanjeevarora
Panel discussion at 4:30pm in the IAS workshop "Theory of Deep Learning: Where Next?" Panelists include @ylecun, @chrmanning, Srebro, Bottou, Collins, Kakade, etc. Please tweet your questions for the panel in response to this.
Postdoc positions in theoretical machine learning at Princeton CS Dept. Relevant faculty include Elad Hazan, Ryan Adams, Yoram Singer, and me. Mention in cover letter which faculty you are interested in. Best to apply by Dec 15; latest by Jan 10.
Good to see the leader in this week's Economist about large language models. Covers many of the issues being discussed in AI/ML, including the nature of "intelligence", huge training cost (and "rich getting richer"), scaling phenomena, geopolitics.
“Foundation models” represent a breakthrough in artificial intelligence or AI. They are a new form of creative, non-human intelligence and promise to bring great benefits
My talk at @mitidss on theory for contrastive unsupervised representation learning (word2vec-like methods popular for learning embeddings of images, text, molecules, etc.). Paper (with an amazing student group) is here. Blog post soon!
I was quite curious what OpenAI's preparedness unit is working on, and @aleks_madry gave a good high-level view in our Princeton Alignment and Safety seminar. Kudos to @SadhikaMalladi and @YangsiboHuang for interesting followup Q&A
Article on (i) theory of emergence of complex skills in LLMs, (ii) SKILL-MIX eval, which shows LLMs are able to use skill combos not seen during training. @QuantaMagazine's thoroughness and quality are exemplary! Quotes @geoffreyhinton. Video of related talk
“Stochastic parrots” generate text only by combining information they have already seen, not through any understanding of their own. Are ChatGPT, Bard and other large chatbots simply parroting their training data? The answer is probably no.
Nontrivial generalization bounds on deep nets are tough. The PGDL competition (NeurIPS20) promoted empirical study of predictors of generalization error. Our ICLR22 spotlight aced the PGDL testbed. Idea: estimate with synthetic data from GANs trained on the training data
Speaking tomorrow (Friday) 2pm in the @icmlconf workshop on Theoretical Physics in Deep Learning. Title: "Is Optimization a Sufficient Language to Understand Deep Learning?" (Also, grad student Orestis speaking Thurs 4pm about our work on word2vec-like methods for representation learning.)
Congratulations to @tengyuma for an honorable mention in the 2018 ACM Doctoral Dissertation Award! Congrats also to @chelseabfinn and Ryan Beckett. Tengyu and Ryan were both Princeton grad students!
Fantastic popular lecture by Stanford's Chris Manning on Natural Language Processing and Deep Learning. Best popular introduction I know of to the mysteries of language and how to teach machines to understand them.
Very interesting papers from @ZeyuanAllenZhu, and this trick in particular. I recall hearing evidence that OpenAI does label training data with source/provenance (the LLM sometimes spits out those memorized labels). Can't remember where/whom I learnt this from
Result 10/11/12: surprisingly, when pre-training on good data (e.g., Wiki) together with "junk" (e.g., Common Crawl), the LLM's capacity on good data may decrease by 20x! A simple fix: add domain tokens to your data; LLMs can auto-detect domains rich in knowledge and prioritize them.
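The fix is essentially a one-liner at data-preparation time; the tag format below is made up for illustration:

```python
def tag_with_domain(doc: str, domain: str) -> str:
    # Prepend a provenance token so the model can tell, e.g., "wikipedia"
    # text apart from noisy "web" text during pre-training.
    return f"<|domain:{domain}|> {doc}"

tagged = tag_with_domain("Paris is the capital of France.", "wikipedia")
```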
Yoshua Bengio, Geoffrey Hinton and Yann LeCun, the fathers of #DeepLearning, receive the 2018 #ACMTuringAward for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing today.
Cohen et al. 2021 showed that gradient descent in deep nets doesn't operate according to traditional optimization: it operates beyond the "Edge of Stability." New paper with @zhiyuanli_ and @Abhishek_034 analyses GD beyond EoS and shows a sharpness-reduction benefit.
Proof of convergence to a global optimum for gradient descent on linear neural networks (joint w/ @prfsanjeevarora, @Hoooway, Noah Golowich) --- check it out tomorrow in the #NeurIPS2018 DL theory workshop poster session (220D, 3PM)!
Research Software Engineer positions in AI! Enable core AI research & interdisciplinary applications at Princeton. SoTA GPU cluster with 300 Nvidia H100s. Attractive and collaborative work environment. Positions based in Princeton (but flexible work setup), starting ASAP.
Launching the blog @PrincetonPLI with a post on SKILL-MIX. LLMs aren't just "stochastic parrots." @geoffreyhinton recently mentioned this as evidence that LLMs do "understand" the world a fair bit. More blog posts on the way! (Hinton's post here: )
Our paper on provably efficient algorithms for topic modeling finally appeared in CACM.
Many people use these methods instead of older EM or MCMC approaches.
Excited about this new work from our group. Local SGD will be increasingly important as distributed training strategies (with asynchronous updates) allow more flexible training of large AI models. Great theory and experiments, kudos to @hmgxr128 and @vfleaking!
Local SGD, though designed to reduce communication, can generalize better than SGD! Our #ICLR2023 paper gives the first theoretical explanation of this phenomenon: local steps inject extra noise, driving the iterate to drift faster to flatter minima on the minimizer manifold. 1/4
Fine-tuning language models using just forward pass! Our paper should interest you if you have enough GPU memory to evaluate your model but not enough for efficient backpropagation. Zeroth order optimization is an old idea but there are subtleties and tricks in making this work!
Introducing MeZO - a memory-efficient zeroth-order optimizer that can fine-tune large language models with forward passes while remaining performant. MeZO can train a 30B model on 1x 80GB A100 GPU.
Paper:
Code:
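The core idea behind MeZO-style zeroth-order training, sketched on a toy quadratic (function names and constants are mine, not the paper's): estimate the directional derivative from two forward passes along a shared random direction, so no backward pass or gradient memory is needed.

```python
import random

def zo_step(theta, loss_fn, lr=0.01, eps=1e-3, rng=None):
    rng = rng or random.Random(0)
    z = [rng.gauss(0, 1) for _ in theta]                 # random direction
    lp = loss_fn([t + eps * zi for t, zi in zip(theta, z)])
    lm = loss_fn([t - eps * zi for t, zi in zip(theta, z)])
    g = (lp - lm) / (2 * eps)                            # directional derivative
    # Move along z, scaled by the estimated derivative: an SPSA-style update.
    return [t - lr * g * zi for t, zi in zip(theta, z)]

loss = lambda th: sum((t - 1.0) ** 2 for t in th)
theta, rng = [0.0, 0.0], random.Random(0)
for _ in range(200):
    theta = zo_step(theta, loss, rng=rng)
```

In MeZO the perturbation is regenerated from a saved RNG seed instead of being stored, which is where the memory savings come from.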
I will present my thesis defense tomorrow!
Language Agents: From Next-Token Prediction to Digital Automation
- 10am EST on Thursday, May 2
-
- WebShop, ReAct, ToT, CoALA
- Briefly: SWE-bench/agent
- Thoughts on the future of language agents
Skeptical about deep learning theory that uses continuous formulations (e.g. SDE) to reason about discrete Stochastic Gradient Descent? Don't miss this poster today.
Stochastic Differential Equations (SDEs) have been widely used to model and understand SGD; e.g., the famous Linear Scaling Rule follows directly from them.
But is this heuristic approximation really valid in deep learning practice?
paper:
🧵(1/5)
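The Linear Scaling Rule that the SDE view predicts is simple to state: keep the ratio of learning rate to batch size fixed. A one-line sketch:

```python
def linearly_scaled_lr(lr, batch_size, new_batch_size):
    # In the SDE view of SGD, gradient-noise intensity scales like
    # lr / batch_size, so scaling the batch by kappa calls for scaling
    # the learning rate by kappa as well to keep the dynamics matched.
    return lr * new_batch_size / batch_size

new_lr = linearly_scaled_lr(0.1, 256, 1024)  # 4x batch -> 4x learning rate
```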
Excited about our new work from @PrincetonPLI. Our grads never cease to amaze us. It's better to use just 5% of the instruction-tuning data (suitably selected) instead of the full dataset.
Lots of instruction tuning data out there...but how to best adapt LLMs for specific queries? Don’t use ALL of the data, use LESS! 5% beats the full dataset. Can even use one small model to select data for others!
Paper:
Code: [1/n]
Researchers @PrincetonPLI have created an autonomous AI software engineer that's free and open source.
💻 Called SWE-agent, it uses an LLM, like GPT-4, to automatically fix coding problems in GitHub.
🤯 It can solve problems in about 90 seconds with high accuracy
Encoder-decoder GAN architectures still don't fix the theoretical problems in the GAN framework, such as mode collapse. Encoders may produce nonsense codes and the discriminator is none the wiser. Blog post and ICLR'18 paper
Deepseek's new VLM is very impressive. But p. 7 mentions they trained on 1M books from "Anna's Archive", i.e., illegal downloads. That's 100B very high-quality tokens. Dark new world...
Today we're excited to introduce Devin, the first AI software engineer.
Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.
Devin is
LLMs can exhibit unsafe behaviors after fine-tuning on perfectly benign-looking data. To avoid this, it is best to ignore recommended fine-tuning best practices (e.g., on Llama 2). TL;DR: fine-tune without the recommended safety prompt, but use the safety prompt at inference.
Fine-tuning can improve chatbots (e.g., Llama 2-Chat, GPT-3.5) on downstream tasks — but may unintentionally break their safety alignment.
Our new paper: Adding a safety prompt is enough to largely mitigate the issue, but be cautious about when to add it!
Our paper on generalization bounds for deep nets (joint with Rong Ge, Behnam Neyshabur, and Yi Zhang) is here. It uses a new approach based on direct compression. See also my blog post on
NSF funding large projects in infrastructure for computing. Deep learning (e.g., foundation models) is an obvious use. Hoping universities are looking at this. Contact me if you need Princeton as a partner.
The Chinchilla paper is one of my favorite papers of the last few years
I love that they actually came up with a law for training models. Very few papers are bold enough to make that claim & back it up with excellent experiments
In our second PLI blog post, authors @_carlosejimenez and @jyangballin describe testing LLMs on challenges that software engineers face every day. Read it here:
Proof that you don't need Olympiad golds for building towards a better Devin if you have open source.
(although at Princeton, Olympiad medalists are so commonplace that even if they exist, they don't bother to mention them)