Sanjeev Arora Profile
Sanjeev Arora

@prfsanjeevarora

21,272
Followers
32
Following
9
Media
409
Statuses

Director, @PrincetonPLI and Professor @PrincetonCS . Seeks math/conceptual understanding of deep learning and large AI models.

New Jersey, USA
Joined July 2017
Pinned Tweet
@prfsanjeevarora
Sanjeev Arora
8 months
Really excited about the launch of this research initiative. Hiring Research Scientists now. Research Software Engineers and postdocs over next few months. 300 H100 GPUs. Multidisciplinary teams. Princeton helps keep AI expertise in the open sphere. More:
@PrincetonPLI
Princeton PLI
8 months
“The dramatic rise of AI capabilities…is a watershed event for humanity…It is also sure to transform research and teaching in every academic discipline.” – @prfsanjeevarora , director of the new @Princeton Language and Intelligence initiative. For more:
Tweet media one
1
7
56
14
66
492
@prfsanjeevarora
Sanjeev Arora
5 years
Conventional wisdom: "Not enough data? Use classic learners (Random Forests, RBF SVM, ..), not deep nets." New paper: infinitely wide nets beat these and also beat finite nets. Infinite nets train faster than finite nets here (hint: Neural Tangent Kernel)!
10
207
827
@prfsanjeevarora
Sanjeev Arora
5 years
"Is optimization the right language to understand the brain?" is a famous controversy in neuroscience. My new blog post asks if optimization is the right language even to understand deep learning? (TL;DR: let's think: trajectories!)
6
200
718
@prfsanjeevarora
Sanjeev Arora
1 year
Princeton has a new Center for Language and Intelligence, researching LLMs + large AI models, as well as their interdisciplinary applications. Looking for postdocs/research scientists/engineers; attractive conditions.
22
116
622
@prfsanjeevarora
Sanjeev Arora
5 years
Conventional wisdom: slowly decay learning rate (lr) when training deep nets. Empirically, some exotic lr schedules also work, eg cosine. New work with Zhiyuan Li: exponentially increasing lr works too! Experiments + surprising math explanation. See
15
136
554
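A minimal sketch of the schedule idea in the tweet above: an exponentially *increasing* learning rate applied to a network with normalization layers and weight decay (the regime where the equivalence is claimed). The model, data, and all constants below are illustrative placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

# Toy normalized network trained with weight decay; hyperparameters are illustrative.
model = nn.Sequential(nn.Linear(32, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=1.002)   # lr grows ~0.2% per step

for step in range(1000):
    x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))      # stand-in for a real batch
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step(); sched.step()
```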
@prfsanjeevarora
Sanjeev Arora
5 years
Blogpost on our new theory for word2vec-like representation learning methods for images, text, etc. Explains why representations do well on previously unseen classification tasks. Relevant to meta-learning, transfer learning? Paper
0
142
504
@prfsanjeevarora
Sanjeev Arora
5 years
Workshop: "Theory of Deep Learning: Where Next?" at the Institute for Advanced Study, Tuesday--Friday this week. Amazing schedule of talks! Registration is closed (sorry), but follow livestream here
8
124
502
@prfsanjeevarora
Sanjeev Arora
6 years
Off to ICML'18 to present a tutorial on "Toward Theoretical Understanding of Deep Learning" Tuesday 1pm. Lecture slides and bibliography here.
2
120
494
@prfsanjeevarora
Sanjeev Arora
1 month
Big congratulations to Avi Wigderson of IAS Princeton for winning the Turing Award in CS. Truly an all-time great in theoretical computer science and discrete math. Also one of the nicest human beings I know --friend and mentor to so many (including me)
3
66
488
@prfsanjeevarora
Sanjeev Arora
4 years
Our long-delayed blogpost on ICLR20 paper that shows current deep nets can be trained with learning rate that is exponentially increasing. Not just experiments but also a mathematical proof that this is at least as powerful as usual LR tuning.
2
118
470
@prfsanjeevarora
Sanjeev Arora
4 years
"New directions in optimization, statistics, and machine learning." April 15,16. Online workshop at @the_IAS . Speakers include @pushmeet @StefanoErmon @_beenkim @RogerGrosse @zicokolter @chelseabfinn @percyliang . Enter email info for zoom link
15
91
393
@prfsanjeevarora
Sanjeev Arora
5 years
How do you compute with an infinitely wide deep net (eg, AlexNet or VGG with width taken to infinity)? Despite crazy overparametrization, this net works OK on the finite dataset CIFAR10. To understand how this was done (via "Neural Tangent Kernels") see
4
79
377
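One way to see what "computing with an infinitely wide net" means: training such a net reduces to kernel regression with the Neural Tangent Kernel Θ(x, x') = ⟨∇_θ f(x), ∇_θ f(x')⟩. The sketch below computes the *empirical* (finite-width) NTK of a toy MLP via autograd and solves the kernel regression; the paper instead evaluates the exact infinite-width (convolutional) kernel in closed form. Sizes and data are placeholders.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 512), nn.ReLU(), nn.Linear(512, 1))

def grad_vector(x):
    # Gradient of the scalar output w.r.t. all parameters, flattened into one vector.
    net.zero_grad()
    net(x.unsqueeze(0)).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

X = torch.randn(20, 10)                        # toy inputs
G = torch.stack([grad_vector(x) for x in X])   # one gradient vector per example
K = G @ G.T                                    # empirical NTK Gram matrix

y = torch.randn(20, 1)                         # toy labels; kernel regression with the NTK
alpha = torch.linalg.solve(K + 1e-6 * torch.eye(20), y)
```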
@prfsanjeevarora
Sanjeev Arora
6 years
Deep-learning-free text embeddings. Surprisingly simple text embeddings suffice to match the performance of much more sophisticated methods for capturing the meaning of text.
1
116
350
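For concreteness, a minimal sketch in the spirit of the SIF-style embeddings discussed here: a smooth-inverse-frequency weighted average of word vectors, followed by removal of the top principal direction. It assumes precomputed word vectors `vec[w]` and unigram probabilities `p[w]`; the weight parameter a = 1e-3 is a typical choice, not a prescription.

```python
import numpy as np

def sif_embeddings(sentences, vec, p, a=1e-3):
    # Weighted average of word vectors per sentence, weight a / (a + p(w)).
    X = np.stack([
        np.mean([a / (a + p[w]) * vec[w] for w in s], axis=0)
        for s in sentences
    ])
    u = np.linalg.svd(X, full_matrices=False)[2][0]   # top right-singular vector
    return X - np.outer(X @ u, u)                     # remove the common component
```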
@prfsanjeevarora
Sanjeev Arora
2 years
Contrastive learning gives great data representations. New paper (title is a homage to Zhang et al.'16) says understanding requires opening the black box of deep learning. (Note: lead author Nikunj Saunshi is on the job market.)
2
60
301
@prfsanjeevarora
Sanjeev Arora
7 months
We're looking for postdoctoral fellows in AI! We offer: excellent cohort of young researchers, dedicated GPU cluster with 300 H100s, $100K salary (+$10K research funds), stunning campus. 1 hour from NYC and Philly. Renewable, i.e., possible to stay multiple years. Join us!
@PrincetonPLI
Princeton PLI
7 months
Excited to announce the Princeton Language and Intelligence Postdoctoral Research Fellowship! Candidates are encouraged to apply by the start-of-review date, Friday, December 1, 11:59 pm (EST), for full consideration. Details:
Tweet media one
3
13
56
6
86
303
@prfsanjeevarora
Sanjeev Arora
7 months
We're hiring Research Engineers and Research Scientists now, and postdocs in the winter. Please join us in developing AI as well as applying it to academic disciplines, including the humanities, social sciences, and the sciences.
@PrincetonPLI
Princeton PLI
7 months
"Beyond ChatGPT: New Princeton Language and Intelligence (PLI) initiative pushes the boundaries of large AI models." Read full story:
Tweet media one
0
7
39
6
42
253
@prfsanjeevarora
Sanjeev Arora
7 months
With sparse coding again popular for interpretability in LLMs, please look at older work! "Latent structure in word embeddings", "Atoms of meaning", decoding brain fMRI via sentence embeddings
1
37
245
@prfsanjeevarora
Sanjeev Arora
2 years
Fine tuned LLMs can solve many NLP tasks. A priori, fine-tuning a huge LM on a few datapoints could lead to catastrophic overfitting. So why doesn’t it? Our theory + experiments (on GLUE) reveal that fine-tuning is often well-approximated as simple kernel-based learning. 1/2
5
34
239
@prfsanjeevarora
Sanjeev Arora
6 years
New blog post by Nadav Cohen. If we want to understand deep learning, we have to start analysing the trajectory of gradient descent rather than the landscape. The paper is here
0
75
239
@prfsanjeevarora
Sanjeev Arora
5 years
Matching Alexnet performance (89%) on CIFAR10 using kernel method. Excluding deep nets, previous best was 86% (Mairal NIPS'16). Key Ideas: convolutional NTK + Coates-Ng random patches layer + way to fold data augmentation into kernel defn
0
47
231
@prfsanjeevarora
Sanjeev Arora
6 years
Hoping to read new papers by Allen-Zhu et al. Training provably converges on greatly overparametrized deep nets. And such overparametrized deep nets can generalize when trained on data from teacher net. and
1
55
226
@prfsanjeevarora
Sanjeev Arora
4 years
InstaHide: trains deep nets on encrypted data only. Very fast, preserves privacy of user data, small accuracy loss (unlike differential privacy).
Tweet media one
@IASMLSeminars
IAS Seminar Series on Theoretical Machine Learning
4 years
This coming week Prof. @FeiziSoheil of @umdcs and @prfsanjeevarora of @PrincetonCS and @the_IAS will be speaking at @the_IAS as part of the Special Year on Theoretical #MachineLearning . Details and registration info at and .
Tweet media one
2
19
74
5
72
216
@prfsanjeevarora
Sanjeev Arora
5 years
Shiller's advice is good in any field. Easy but sad explanation for why young people often ignore this advice: the (N+1)th result in a field with N results is difficult to obtain, hence easy to publish. The 1st or 2nd result in a field is easier to obtain, but harder to publish.
@econfilm
Econ Films
5 years
Nobel Laureate @RobertJShiller gives advice to young Economists: be daring and go beyond the frontiers of knowledge. An @econfilm production for @lindaunobel . #econtwitter
3
62
115
3
47
215
@prfsanjeevarora
Sanjeev Arora
5 years
Remember matrix completion? Deep linear nets solve it better than the old nuclear norm algorithm. Analysis requires going beyond traditional optimization view and understanding #trajectories . Blog post by Nadav and Wei: . Paper
2
52
216
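A sketch of the deep-matrix-factorization idea behind the tweet above, under illustrative sizes and hyperparameters: parametrize the completed matrix as a product of three square factors, fit only the observed entries by gradient descent from a small initialization, and rely on the implicit bias of the optimization trajectory toward low rank.

```python
import torch

n, rank = 50, 3
M = torch.randn(n, rank) @ torch.randn(rank, n)            # ground-truth low-rank matrix
mask = (torch.rand(n, n) < 0.3).float()                    # ~30% of entries observed

Ws = [(1e-2 * torch.randn(n, n)).requires_grad_() for _ in range(3)]   # depth-3 linear net
opt = torch.optim.SGD(Ws, lr=0.2)

for step in range(5000):
    W = Ws[2] @ Ws[1] @ Ws[0]                               # end-to-end matrix W = W3 W2 W1
    loss = ((mask * (W - M)) ** 2).sum() / mask.sum()       # fit only observed entries
    opt.zero_grad(); loss.backward(); opt.step()
```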
@prfsanjeevarora
Sanjeev Arora
5 years
New mathematical explanation of lack of barriers in deep learning landscape (i.e., low-cost solutions interconnected via regions of low cost; ICML18). Applies to realistic deep nets and uses noise stability property. Rong Ge's blog post about our paper
4
60
211
@prfsanjeevarora
Sanjeev Arora
3 years
Has deep learning overfitted to test sets of popular datasets? Move over Occam! Rip Van Winkle's Razor gives nontrivial upper bounds on amount of overfit for popular architectures. Blog post + article with Yi Zhang
2
41
204
@prfsanjeevarora
Sanjeev Arora
5 years
Saliency maps give “human interpretability” to deep learning. NIPS18 paper ( @mrtz @goodfellow_ian @_beenkim ) showed they fail “sanity checks” involving model and data randomization. We fix saliency maps to pass sanity checks ("Competition for pixels")
1
42
199
@prfsanjeevarora
Sanjeev Arora
5 years
Day long event at the Institute for Advanced Study on Fri Feb 22. Deep Learning: Alchemy or Science? Speakers: Mike Collins, @ylecun , @zacharylipton , Joelle Pineau, Shai Shalev Schwartz. Will be livestreamed. Panel will respond to qs from worldwide audience via twitter.
4
48
198
@prfsanjeevarora
Sanjeev Arora
2 years
Giving three talks for ETH Zurich Paul Bernays Lecture 2022. "The quest for mathematical understanding of artificial intelligence." . This week's two talks are accessible to non-experts.
8
33
191
@prfsanjeevarora
Sanjeev Arora
4 years
Blog post on new mismatches between current theories of optimization and modern deep learning. Tiny Learning Rates don't hurt generalization. Surprising insight about fast mixing in landscape and what it means. New theory with @zhiyuanli_ and @vfleaking .
2
40
188
@prfsanjeevarora
Sanjeev Arora
4 years
Simons Foundation and NSF propose to spend $20M to fund projects on Mathematical and Scientific Foundations of Deep Learning. An interesting public-private partnership to fund basic research.
0
28
176
@prfsanjeevarora
Sanjeev Arora
6 years
2nd article on deep-learning-free text embeddings that are trivial and fast to implement, and compete quite well with way more complicated embeddings.
2
54
173
@prfsanjeevarora
Sanjeev Arora
2 years
Graduation celebrations are back! Glad to celebrate with @weihu_ , who's off to Asst. Prof position at @UMich this Fall.
Tweet media one
4
0
165
@prfsanjeevarora
Sanjeev Arora
6 years
My new paper (joint with Nadav Cohen and Elad Hazan) on the benefits of overparametrization is up . I recommend Nadav's nice blog post as a starting point:
2
38
165
@prfsanjeevarora
Sanjeev Arora
5 years
Visited the new @GoogleAI lab in Palmer Square, Princeton and enjoyed the excellent coffee with my colleague (and lab co-director) @HazanPrinceton . Exciting times for machine learning and AI in Princeton NJ!
Tweet media one
4
9
163
@prfsanjeevarora
Sanjeev Arora
5 years
Theory of Deep Learning: Where Next? Workshop @the_IAS Princeton Oct 15-18 2019. Great speaker lineup! Registration open. Contributed paper/talk/poster submission deadline Sept 2.
1
44
160
@prfsanjeevarora
Sanjeev Arora
5 years
Blog returns from summer. New article by Simon Du and Wei Hu on Neural Tangent Kernels (which capture the power of infinitely wide nets trained on finite datasets). . Watch out for more in coming weeks!
0
27
151
@prfsanjeevarora
Sanjeev Arora
5 years
Computing Convolutional Neural Tangent Kernels (CNTK) for 20-layer nets with pooling layer is computationally expensive and many people wrote to us wondering how it is feasible. Short answer: these students not only have great theory chops, but can also write CUDA!
@RuosongW
Ruosong Wang
5 years
We have released code for computing Convolutional Neural Tangent Kernel (CNTK) used in our paper "On Exact Computation with an Infinitely Wide Neural Net", which will appear in NeurIPS 2019. Paper: Code:
1
46
209
1
26
147
@prfsanjeevarora
Sanjeev Arora
4 years
Seminar series in theoretical ML is continuing online this summer at @the_IAS . Upcoming speakers: @mraginsky (today at 12:20pm!), Mike Jordan, Shankar Sastry, etc. Registration required.
1
23
144
@prfsanjeevarora
Sanjeev Arora
6 years
Introductory article on the generalization mystery of deep learning
0
54
144
@prfsanjeevarora
Sanjeev Arora
4 months
Congratulations on this overdue recognition of your amazing work @chrmanning !
@StanfordAILab
Stanford AI Lab
4 months
Congratulations to @StanfordAILab Director @chrmanning , awarded the 2024 IEEE John von Neumann Medal, one of @IEEEAwards ’s top awards “for outstanding achievements in computer-related science and technology”, for his advances in #NLProc .
Tweet media one
22
61
561
1
4
139
@prfsanjeevarora
Sanjeev Arora
4 years
Looking forward to talks by @wellingmax and Yoshua Bengio this week as part of our special year on machine learning @the_IAS
@IASMLSeminars
IAS Seminar Series on Theoretical Machine Learning
4 years
This coming week Prof. @wellingmax of @UvA_Amsterdam and Prof. Yoshua Bengio of @umontreal @MILAMontreal will be speaking at @the_IAS as part of the Special Year on Theoretical #MachineLearning . Details and registration info at and .
Tweet media one
1
31
84
0
18
128
@prfsanjeevarora
Sanjeev Arora
10 months
Princeton Language and Intelligence Initiative looking for Research Scientists (PhD reqd). Foci: (i) Foundation Models, LLMs; (ii) Applications of models to other disciplines; (iii) Understanding effects on society and mitigating harms. Let's chat at ICML23?
1
34
132
@prfsanjeevarora
Sanjeev Arora
6 years
#IASMLyear Special year in machine learning, optimization and statistics 2019-20; Institute for Advanced Study. Visit with stipend for term or a year; shorter visits possible for industry folks. Apply by Dec 1
0
35
127
@prfsanjeevarora
Sanjeev Arora
6 years
How do you induce embeddings for a word from a single or a few occurrences? Simple method that also improves unsupervised sentence embeddings: a la carte embeddings. Also how different meanings of a word reside inside its embedding (TACL)
1
43
126
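A rough sketch of the a la carte construction described above (inputs and names here are hypothetical): a rare word's embedding is induced by applying a learned linear transform A to the average of embeddings of the words in its contexts, where A is fit by least squares on frequent words whose embeddings are already known.

```python
import numpy as np

def fit_transform_A(frequent_words, vec, contexts_of):
    # C: average-of-context vectors; V: existing embeddings of the same frequent words.
    C = np.stack([np.mean([vec[c] for c in contexts_of[w]], axis=0) for w in frequent_words])
    V = np.stack([vec[w] for w in frequent_words])
    A, *_ = np.linalg.lstsq(C, V, rcond=None)      # solve C @ A ≈ V
    return A.T                                      # so induced vector = A @ context_avg

def induce(rare_contexts, vec, A):
    ctx_avg = np.mean([vec[c] for c in rare_contexts], axis=0)
    return A @ ctx_avg
```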
@prfsanjeevarora
Sanjeev Arora
5 years
43% women among Princeton Computer Science majors (Engg track) in class of 2022, out of a total of 115! Outlier or part of a national trend?
9
5
122
@prfsanjeevarora
Sanjeev Arora
2 years
Five amazing expositions of zero-knowledge proofs by Amit Sahai of UCLA aimed at five v. different types of listeners. Heartening to see a mathy video rack up millions of views in a few weeks.
1
21
118
@prfsanjeevarora
Sanjeev Arora
4 years
Efficient Covid testing: how to test more patients with fixed number of test kits. Cool applications of math concepts we teach our students: coding theory, compressed sensing, etc.
1
35
122
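As one concrete instance of the math alluded to above, here is a sketch of the simplest pooled-testing scheme (Dorfman two-stage testing): test pools of size k, then retest individuals only in pools that come back positive. Prevalence and pool size below are illustrative.

```python
import numpy as np

def dorfman_tests(population, prevalence=0.01, pool_size=10, rng=np.random.default_rng(0)):
    infected = rng.random(population) < prevalence
    tests = 0
    for i in range(0, population, pool_size):
        pool = infected[i:i + pool_size]
        tests += 1                       # one test for the whole pool
        if pool.any():
            tests += len(pool)           # retest each member of a positive pool
    return tests

print(dorfman_tests(10_000), "tests instead of 10000")
```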
@prfsanjeevarora
Sanjeev Arora
6 years
New blog post describes our new paper (with Rong Ge, Behnam Neyshabur, Yi Zhang) making progress on the generalization mystery of deep nets. The bounds are orders of magnitude better than recent papers.
2
37
117
@prfsanjeevarora
Sanjeev Arora
3 months
Amazing finding (wonder how much the compute budget was :)). Not surprising if you think about how fractals emerge in even simpler settings.
@jaschasd
Jascha Sohl-Dickstein
3 months
The boundary between trainable and untrainable neural network hyperparameter configurations is *fractal*! And beautiful! Here is a grid search over a different pair of hyperparameters -- this time learning rate and the mean of the parameter initialization distribution.
26
175
1K
1
13
116
@prfsanjeevarora
Sanjeev Arora
10 months
Very excited about this paper and its implications. Turing-completeness of transformers implies they can simulate other models inside them. But it's nontrivial that a net can do gradient updates on another net inside it that is 1/8th its size. Great work by the student team!
@Abhishek_034
Abhishek Panigrahi
10 months
**New paper ** In-context learning was explained as simulate + train simple models at inference. We show a 2B model can run GD on an internal 125M model. Surprising simulation + AI safety implications! 1/5 w/ @SadhikaMalladi , @xiamengzhou , @prfsanjeevarora
2
49
242
2
21
108
@prfsanjeevarora
Sanjeev Arora
5 years
Panel discussion at 4:30pm in IAS workshop "Theory of Deep Learning: Where Next?" Panelists include @ylecun , @chrmanning , Srebro, Bottou, Collins, Kakade etc. Please tweet your questions for the panel in response to this.
6
18
111
@prfsanjeevarora
Sanjeev Arora
4 years
Postdoc positions in theoretical machine learning at Princeton CS Dept. Relevant faculty include Elad Hazan, Ryan Adams, Yoram Singer, and me. Mention in cover letter which faculty you are interested in. Best to apply by Dec 15; latest by Jan 10.
2
40
112
@prfsanjeevarora
Sanjeev Arora
5 years
The AIML group at Princeton University computer science
1
12
112
@prfsanjeevarora
Sanjeev Arora
2 years
Good to see the leader in this week's Economist about large language models. Covers many of the issues being discussed in AI/ML, including the nature of "intelligence", huge training costs (and the "rich getting richer"), scaling phenomena, and geopolitics.
@TheEconomist
The Economist
2 years
“Foundation models” represent a breakthrough in artificial intelligence or AI. They are a new form of creative, non-human intelligence and promise to bring great benefits
11
44
111
4
15
103
@prfsanjeevarora
Sanjeev Arora
5 years
My talk at @mitidss on theory for contrastive unsupervised representation learning (word2vec-like methods popular for learning embeddings of images, text, molecules etc.). Paper (with amazing student group) is here . Blog post soon!
0
24
103
@prfsanjeevarora
Sanjeev Arora
2 months
I was quite curious what OpenAI's preparedness unit is working on, and @aleks_madry gave a good high-level view in our Princeton Alignment and Safety seminar. Kudos to @SadhikaMalladi and @YangsiboHuang for the interesting followup Q&A
2
12
99
@prfsanjeevarora
Sanjeev Arora
4 years
Looking forward to these talks by @aleks_madry (Tues 12:30EST) and Mike Jordan (Thurs 3pm) this week! Register to get zoom password.
@IASMLSeminars
IAS Seminar Series on Theoretical Machine Learning
4 years
This coming week Prof. @aleks_madry of @MIT_CSAIL and Prof. Michael I. Jordan of @berkeley_ai , will be speaking at @the_IAS as part of the Special Year on Theoretical #MachineLearning . Details and registration info at and .
Tweet media one
1
46
113
2
14
97
@prfsanjeevarora
Sanjeev Arora
2 years
Congratulations to @PeterShor1 and Dan Spielman for winning the breakthrough prize this year!
1
3
98
@prfsanjeevarora
Sanjeev Arora
3 months
Article on (i) theory of emergence of complex skills in LLMs, (ii) SKILL-MIX eval -- shows LLMs are able to use skill combos not seen during training. @QuantaMagazine 's thoroughness and quality are exemplary! Quotes @geoffreyhinton . Video of related talk
@QuantaMagazine
Quanta Magazine
4 months
“Stochastic parrots” generate text only by combining information they have already seen, not through any understanding of their own. Are ChatGPT, Bard and other large chatbots simply parroting their training data? The answer is probably no.
8
50
152
2
9
99
@prfsanjeevarora
Sanjeev Arora
6 years
Was fun to give the Ahlfors lectures in the Harvard Math dept this week. Link from the Harvard Crimson. Slides for talk 1: Slides for talk 2:
2
20
95
@prfsanjeevarora
Sanjeev Arora
2 years
Nontrivial generalization bounds on deep nets are tough. PGDL competition (Neurips20) promoted empirical study of predictors of generalization error. Our ICLR22 spotlight aced the PGDL testbed. Idea: estimate with synthetic data from GANs trained on the training data
3
16
90
@prfsanjeevarora
Sanjeev Arora
5 years
2-day workshop "New Directions in Reinforcement Learning and Control" @the_IAS in Princeton Nov 7-8. Schedule and livestream here .
0
23
92
@prfsanjeevarora
Sanjeev Arora
5 years
Speaking tomorrow (Friday) 2pm in @icmlconf workshop on Theor. Physics in Deep Learning. Title: "Is Optimization a sufficient language to understand deep learning?" (Also, grad Orestis speaking Thurs 4pm about our work on word2vec-like methods for representation learning.)
2
7
91
@prfsanjeevarora
Sanjeev Arora
5 years
Congratulations to @tengyuma for honorable mention in 2018 ACM Doctoral Dissertation award! . Congrats also to @chelseabfinn and Ryan Beckett. Tengyu and Ryan were both Princeton grad students!
1
5
89
@prfsanjeevarora
Sanjeev Arora
5 years
Deep Learning #AlchemyOrScience at @the_IAS *this Friday* 9:50am EST livestream link: Tweet questions for speakers using #AlchemyOrScience + speaker name.
@prfsanjeevarora
Sanjeev Arora
5 years
Deep Learning: Alchemy or Science? IAS February 22, 2019. Description: ; Agenda:
0
15
59
1
21
89
@prfsanjeevarora
Sanjeev Arora
6 years
Fantastic popular lecture by Stanford's Chris Manning on Natural Language Processing and Deep Learning. Best popular introduction I know of to the mysteries of language and how to teach machines to understand them.
0
29
85
@prfsanjeevarora
Sanjeev Arora
1 month
Very interesting papers @ZeyuanAllenZhu . This trick especially. I recall hearing evidence that OpenAI does label training data with source/provenance (the LLM sometimes spits out those memorized labels). Can't remember where/who I learnt this from
@ZeyuanAllenZhu
Zeyuan Allen-Zhu
1 month
Result 10/11/12: surprisingly, when pre-training on good data (e.g., Wiki) together with "junk" (e.g., Common Crawl), the LLM's capacity on good data may decrease by 20x! A simple fix: add domain tokens to your data; LLMs can auto-detect domains rich in knowledge and prioritize.
Tweet media one
18
40
288
3
9
86
@prfsanjeevarora
Sanjeev Arora
2 years
AI/ML winter school for Indian college profs Dec 12-23; @Infosys Mysore campus. Fully paid; apply by Nov 1 Thanks to @pulkitology @sameer_ @pathak2206 , @dineshjayaraman Shubhashis Banerjee, CV Jawahar, Chetan Arora @kvijayraghavan @AshokaUniv 1/2
7
22
83
@prfsanjeevarora
Sanjeev Arora
5 years
An overdue recognition! Congratulations Yoshua, Geoff and @ylecun !
@TheOfficialACM
Association for Computing Machinery
5 years
Yoshua Bengio, Geoffrey Hinton and Yann LeCun, the fathers of #DeepLearning , receive the 2018 #ACMTuringAward for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing today.
Tweet media one
26
1K
3K
0
4
80
@prfsanjeevarora
Sanjeev Arora
5 years
Nice succinct resource on optimization.
@HazanPrinceton
Elad Hazan
5 years
finished compiling lecture notes from my course on optimization for machine learning: (comments/suggestions welcome!)
9
147
660
0
10
81
@prfsanjeevarora
Sanjeev Arora
2 years
Cohen et al. 2021 showed that gradient descent in deep nets doesn't operate according to traditional optimization: it operates beyond the "edge of stability." New paper with @zhiyuanli_ @Abhishek_034 analyses GD beyond EoS and shows a sharpness-reduction benefit.
4
20
81
@prfsanjeevarora
Sanjeev Arora
5 years
Also will discuss this paper briefly in my talk at the DL theory workshop. The talk is about the importance of analysing optimization trajectories.
@nadavcohen
Nadav Cohen
5 years
Proof of convergence to global optimum for gradient descent on linear neural networks (joint w/ @prfsanjeevarora @Hoooway Noah Golowich) --- check it out tomorrow in #NeurIPS2018 DL theory workshop poster session (220D, 3PM)!
Tweet media one
1
7
39
0
19
76
@prfsanjeevarora
Sanjeev Arora
8 months
Research Software Engineer positions in AI! Enable core AI research & interdisciplinary applications at Princeton. SoTA GPU cluster with 300 Nvidia H100s. Attractive and collaborative work environment. Positions based in Princeton (but flexible work setup), starting asap.
@PrincetonPLI
Princeton PLI
8 months
We are hiring! We invite applications for our Research Software Engineer (RSE) position. Details:
Tweet media one
0
6
13
2
18
73
@prfsanjeevarora
Sanjeev Arora
6 months
Launching blog @PrincetonPLI with a post on skillmix. LLMs aren't just "stochastic parrots." @geoffreyhinton recently mentioned this as evidence that LLMs do "understand" the world a fair bit. More blog posts on the way! (Hinton's post here: )
@PrincetonPLI
Princeton PLI
6 months
We are excited to introduce the PLI Blog! First post by @prfsanjeevarora , "Are Language Models Mere Stochastic Parrots? The SkillMix Test Says NO."
Tweet media one
1
5
31
3
15
72
@prfsanjeevarora
Sanjeev Arora
6 years
Our paper on provably efficient algorithms for topic modeling finally appeared in CACM. Many people use these methods instead of older EM or MCMC approaches.
2
29
72
@prfsanjeevarora
Sanjeev Arora
1 year
Excited about this new work from our group. Local SGD will be increasingly important as distributed training strategies (with asynchronous updates) allow more flexible training of large AI models. Great theory and experiments, kudos to @hmgxr128 and @vfleaking !
@hmgxr128
Xinran Gu
1 year
Local SGD, though designed to reduce communication, can generalize better than SGD! Our #ICLR2023 paper gives the first theoretical explanation of this phenomenon: local steps inject extra noise, driving the iterate to drift faster to flatter minima on the minimizer manifold. 1/4
2
32
186
3
4
71
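A minimal sketch of the Local SGD communication pattern discussed in this thread, with placeholder model, data, and hyperparameters: each worker takes H local SGD steps on its own batches, then all worker parameters are averaged once per round.

```python
import copy
import torch
import torch.nn as nn

def local_sgd_round(global_model, worker_batches, H=8, lr=0.05):
    # worker_batches: one list of (x, y) batches per worker (placeholder data).
    workers = [copy.deepcopy(global_model) for _ in worker_batches]
    for model, batches in zip(workers, worker_batches):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in batches[:H]:                              # H local steps, no communication
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                     # communicate once: average weights
        for p_glob, *p_workers in zip(global_model.parameters(),
                                      *[m.parameters() for m in workers]):
            p_glob.copy_(torch.stack(p_workers).mean(dim=0))
    return global_model
```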
@prfsanjeevarora
Sanjeev Arora
11 months
Fine-tuning language models using just forward pass! Our paper should interest you if you have enough GPU memory to evaluate your model but not enough for efficient backpropagation. Zeroth order optimization is an old idea but there are subtleties and tricks in making this work!
@SadhikaMalladi
Sadhika Malladi
11 months
Introducing MeZO - a memory-efficient zeroth-order optimizer that can fine-tune large language models with forward passes while remaining performant. MeZO can train a 30B model on 1x 80GB A100 GPU. Paper: Code:
Tweet media one
9
93
455
0
6
69
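The core of a two-point (SPSA-style) zeroth-order update of the kind this line of work builds on, sketched with a generic loss function and a flat parameter vector (both hypothetical here): estimate the directional derivative from two forward passes and step along the random direction. The actual MeZO implementation adds memory tricks, e.g., regenerating the perturbation from a random seed instead of storing it.

```python
import torch

def zo_step(theta, loss_fn, lr=1e-3, eps=1e-3):
    z = torch.randn_like(theta)
    loss_plus = loss_fn(theta + eps * z)       # forward pass 1
    loss_minus = loss_fn(theta - eps * z)      # forward pass 2
    grad_est = (loss_plus - loss_minus) / (2 * eps)
    return theta - lr * grad_est * z           # SGD step along the estimated gradient

# Toy usage: minimize ||theta - 1||^2 using forward passes only.
theta = torch.zeros(5)
for _ in range(2000):
    theta = zo_step(theta, lambda t: ((t - 1) ** 2).sum())
```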
@prfsanjeevarora
Sanjeev Arora
6 years
Interesting survey of progress in analysis of nonconvex optimization in machine learning by Prateek Jain and Purushottam Kar.
1
26
72
@prfsanjeevarora
Sanjeev Arora
15 days
Great thesis by @ShunyuYao12 !
@ShunyuYao12
Shunyu Yao
15 days
I will present my thesis defense tomorrow! Language Agents: From Next-Token Prediction to Digital Automation - 10am EST on Thursday, May 2 - - WebShop, ReAct, ToT, CoALA - Briefly: SWE-bench/agent - Thoughts on the future of language agents
Tweet media one
26
55
671
0
6
71
@prfsanjeevarora
Sanjeev Arora
2 years
Skeptical about deep learning theory that uses continuous formulations (e.g. SDE) to reason about discrete Stochastic Gradient Descent? Don't miss this poster today.
@zhiyuanli_
Zhiyuan Li
2 years
Stochastic Differential Equation (SDE) has been widely used to model and understand SGD, e.g., the famous Linear Scaling Rule follows directly from it. But is this heuristic approximation really valid in deep learning practice? paper: 🧵(1/5)
Tweet media one
5
32
198
1
11
68
@prfsanjeevarora
Sanjeev Arora
3 months
Excited about our new work from @PrincetonPLI . Our grads never cease to amaze us. It's better to use just 5% of the instruction-tuning data (suitably selected) instead of the full dataset.
@xiamengzhou
Mengzhou Xia
3 months
Lots of instruction tuning data out there...but how to best adapt LLMs for specific queries? Don’t use ALL of the data, use LESS! 5% beats the full dataset. Can even use one small model to select data for others! Paper: Code: [1/n]
Tweet media one
13
98
435
0
2
68
@prfsanjeevarora
Sanjeev Arora
22 days
SWE-agent rocks
@PrincetonCS
Princeton Computer Science
1 month
Researchers @PrincetonPLI have created an autonomous AI software engineer that’s free and open source. 💻 Called SWE-agent, it uses an LLM, like GPT-4, to automatically fix coding problems in GitHub. 🤯 It can solve problems in about 90 seconds with high accuracy
Tweet media one
1
9
46
1
7
62
@prfsanjeevarora
Sanjeev Arora
6 years
Encoder-decoder GANs architectures still don't fix the theoretical problems in GANs framework such as mode collapse. Encoders may produce nonsense codes and the discriminator is none the wiser. Blog post and ICLR'18 paper
3
15
62
@prfsanjeevarora
Sanjeev Arora
2 months
Deepseek's new VLM is very impressive. But p 7 of mentions they trained on 1M books from "Anna's Archive", i.e., illegal downloads. That's 100B very high-quality tokens. Dark new world...
2
9
61
@prfsanjeevarora
Sanjeev Arora
2 months
Excited to see how SWE Bench from @PrincetonPLI group is now guiding use of AI for software engineering.
@cognition_labs
Cognition
2 months
Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is
5K
11K
46K
0
10
59
@prfsanjeevarora
Sanjeev Arora
2 months
LLMs can exhibit unsafe behaviors after fine-tuning on perfectly benign-looking data. To avoid this, it is best to ignore recommended fine-tuning best practices (eg on Llama2). TL;DR: fine-tune without the recommended safety prompt, but use the safety prompt at inference.
@vfleaking
Kaifeng Lyu
2 months
Fine-tuning can improve chatbots (e.g., Llama 2-Chat, GPT-3.5) on downstream tasks — but may unintentionally break their safety alignment. Our new paper: Adding a safety prompt is enough to largely mitigate the issue, but be cautious about when to add it!
Tweet media one
4
18
74
1
7
59
@prfsanjeevarora
Sanjeev Arora
6 years
Our paper on generalization bounds for deep nets (joint with Rong Ge, Behnam Neyshabur, and Yi Zhang) is here. It uses a new approach based upon direct compression. See also my blog post on
0
26
57
@prfsanjeevarora
Sanjeev Arora
1 year
NSF funding large projects in infrastructure for computing. Deep learning (eg foundation models) an obvious use. Hoping universities are looking at this. Contact me if you need Princeton as partner.
1
6
58
@prfsanjeevarora
Sanjeev Arora
1 year
Very nice paper indeed. Learnt a lot from it.
@finbarrtimbers
finbarr
1 year
The Chinchilla paper is one of my favorite papers of the last few years I love that they actually came up with a law for training models. Very few papers bold enough to make that claim & back it up with excellent experiments
Tweet media one
7
54
564
2
8
57
@prfsanjeevarora
Sanjeev Arora
5 months
I love this paper. Current coding evals do not test on complicated problems involved in real-life software engineering. Great post!
@PrincetonPLI
Princeton PLI
5 months
In our second PLI Blog post, authors @_carlosejimenez and @jyangballin describe testing LLMs for challenges that software engineers face everyday. Read it here:
Tweet media one
0
10
40
0
10
56
@prfsanjeevarora
Sanjeev Arora
6 years
Slides and links for my Plenary lecture at International Congress of Mathematicians 2018. (Thanks to Assaf Naor for the picture).
3
15
57
@prfsanjeevarora
Sanjeev Arora
1 month
Yeah the grads here are amazing. Every day at work is a treat.
@deliprao
Delip Rao e/σ
1 month
Proof that you don't need Olympiad golds for building towards a better Devin if you have open source. (although at Princeton, Olympiad medalists are so commonplace that even if they exist, they don't bother to mention them)
Tweet media one
36
44
409
0
0
54