Jesse Mu Profile Banner
Jesse Mu Profile
Jesse Mu

@jayelmnop

4,904
Followers
583
Following
120
Media
582
Statuses

Computational linguistics @AnthropicAI

@jayelmnop
Jesse Mu
1 year
I've found the killer app of large language models.
Tweet media one
Tweet media two
Tweet media three
57
514
4K
@jayelmnop
Jesse Mu
4 years
The machine learning research process
Tweet media one
@yin_psyched
Yin Chen, M.S.
4 years
I can’t stop laughing at this.
Tweet media one
133
6K
29K
13
340
2K
@jayelmnop
Jesse Mu
1 year
Since prompting, instruction tuning, RLHF, ChatGPT etc are such new and fast-moving topics, I haven't seen many university course lectures covering this content. So we made some new slides for this year's CS224n: NLP w/ Deep Learning course at @Stanford !
Tweet media one
20
292
2K
@jayelmnop
Jesse Mu
1 year
PSA to anyone who wants to write an op-ed criticizing LLMs (yes, including Noam Chomsky): if you're going to come up with hypothetical failure cases for LLMs, at a minimum, please actually check that your case fails with a modern LLM
Tweet media one
Tweet media two
Tweet media three
31
88
870
@jayelmnop
Jesse Mu
2 years
I am announcing the Perverse Scaling Prize: a $1.14 USD prize for tasks which exhibit any of the following scaling curves
Tweet media one
@EthanJPerez
Ethan Perez
2 years
We’re announcing the Inverse Scaling Prize: a $100k grand prize + $150k in additional prizes for finding an important task where larger language models do *worse*. Link to contest details: 🧵
Tweet media one
48
313
2K
9
57
747
@jayelmnop
Jesse Mu
1 year
Excited to present 3 #NeurIPS2022 papers on a trend I've been very excited about recently: blurring the boundaries between language models and RL agents (+a bonus 4th paper on active learning!) 🧵(0/7) PS: I'm on the industry job market!
Tweet media one
9
87
685
@jayelmnop
Jesse Mu
1 year
Prompting is cool and all, but isn't it a waste of compute to encode a prompt over and over again? We learn to compress prompts up to 26x by using "gist tokens", saving memory+storage and speeding up LM inference: (w/ @XiangLisaLi2 and @noahdgoodman ) 🧵
14
119
602
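A minimal sketch of the attention-masking idea behind gisting, as I read the tweet above (all sizes and the helper name are illustrative, not from the paper): positions after the gist tokens may attend causally to the gist tokens and to each other, but not to the raw prompt, so the prompt's content must be compressed into the gist token activations.

```python
import numpy as np

def gist_mask(n_prompt: int, n_gist: int, n_rest: int) -> np.ndarray:
    """Causal attention mask with post-gist positions blocked from the prompt."""
    n = n_prompt + n_gist + n_rest
    mask = np.tril(np.ones((n, n), dtype=bool))   # standard causal (lower-triangular) mask
    # Block attention from positions after the gist tokens back to the raw prompt,
    # so the prompt is only reachable via the gist token activations.
    mask[n_prompt + n_gist:, :n_prompt] = False
    return mask

# A 4-token prompt, 1 gist token (index 4), 3 following tokens (indices 5-7)
m = gist_mask(n_prompt=4, n_gist=1, n_rest=3)
```

Here position 5 can attend to the gist token at index 4 but not to prompt indices 0-3, which is what forces the compression.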
@jayelmnop
Jesse Mu
2 years
TIL in 2009 two Berkeley undergrads flipped a coin *40,000* times (1hr/day for a semester) to see whether a coin flip was truly random (it's biased towards the side facing up pre-flip!) Gives a new meaning to the term "undergraduate research project"...
Tweet media one
8
73
512
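A quick back-of-the-envelope check on why 40,000 flips is a sensible sample size for this (the ~51% same-side figure is the Diaconis et al. prediction; treat it as an assumed effect size here):

```python
import math

n = 40_000
se = math.sqrt(0.25 / n)      # std. error of the sample proportion under a fair coin, sqrt(p(1-p)/n) at p=0.5
z = (0.51 - 0.50) / se        # z-score of a one-percentage-point same-side bias
print(f"standard error = {se:.4f}, z = {z:.1f}")
```

A one-point bias shows up as a roughly 4-sigma effect at n = 40,000, so the undergrads' semester of flipping was (just) enough to detect it.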
@jayelmnop
Jesse Mu
1 year
Life update: this week I joined the Alignment team @AnthropicAI ! I’m starting part-time for now as I finish up my PhD at Stanford. Excited to work on making large language models safer and more aligned!
27
9
485
@jayelmnop
Jesse Mu
2 months
We’re hiring for the adversarial robustness team @AnthropicAI ! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in 🧵)
Tweet media one
4
72
462
@jayelmnop
Jesse Mu
4 years
New preprint with @jacobandreas : we generate explanations of the individual neurons inside deep neural networks by identifying *compositional logical concepts* that closely approximate neuron behavior (e.g. "water that isn't blue") (1/5)
Tweet media one
Tweet media two
5
113
459
@jayelmnop
Jesse Mu
8 months
My lecture on prompting, instruction tuning, and RLHF for Stanford's CS224n course is (finally!) available online:
@jayelmnop
Jesse Mu
1 year
Since prompting, instruction tuning, RLHF, ChatGPT etc are such new and fast-moving topics, I haven't seen many university course lectures covering this content. So we made some new slides for this year's CS224n: NLP w/ Deep Learning course at @Stanford !
Tweet media one
20
292
2K
4
90
436
@jayelmnop
Jesse Mu
1 year
@AlexReibman Whew! Time to go back to my day job of solving leetcode #42 (trapping rain water) and #1330 (reverse subarray to maximize array value)
0
2
405
@jayelmnop
Jesse Mu
1 year
New LM eval just dropped—Google has no moat??
Tweet media one
5
29
360
@jayelmnop
Jesse Mu
4 months
Seeing some confusion like: "You trained a model to do Bad Thing, why are you surprised it does Bad Thing?" The point is not that we can train models to do Bad Thing. It's that if this happens, by accident or on purpose, we don't know how to stop a model from doing Bad Thing 1/5
@AnthropicAI
Anthropic
4 months
New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.
Tweet media one
128
582
3K
11
39
335
@jayelmnop
Jesse Mu
2 years
Excited to share my work from my internship at @MetaAI : improving exploration in RL with language abstractions! Paper: 🧵 (1/8)
5
48
289
@jayelmnop
Jesse Mu
3 months
Achievement unlocked ✅, thanks for the shout-out @karpathy !
Tweet media one
@karpathy
Andrej Karpathy
3 months
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer" Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and
Tweet media one
383
2K
14K
3
7
256
@jayelmnop
Jesse Mu
3 years
Anonymous Reviewer #2 's list of missing references
Tweet media one
3
8
242
@jayelmnop
Jesse Mu
2 years
Stable Diffusion Telephone: take an image, generate a likely prompt with CLIP interrogator (), feed the prompt back into Stable Diffusion, rinse and repeat
10
32
237
@jayelmnop
Jesse Mu
4 years
GPT-4: Language Models are Fully Autonomous Vehicles
@jeffbigham
hci.social/@jbigham
4 years
i think i'm going to wait until GPT-4 to upgrade. seems like a mid-cycle release. trillion parameters or bust.
4
12
179
6
14
210
@jayelmnop
Jesse Mu
1 year
Something I didn't fully understand until recently—
Imagine FLOPs for 2 transformer fwd passes with 1 input token:
- w/ no KV cache
- w/ a 2K length KV cache
Decoding w/ a 2K length KV cache (w/ no optimizations) is only ~10% more FLOPs than no KV cache. Feedforward is pricey!
Tweet media one
6
18
212
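The arithmetic behind the tweet above can be sketched per layer (model dimensions are my assumption, roughly 7B-scale with d_model = 4096 and a 4x FFN; 2 FLOPs per multiply-accumulate):

```python
def flops_per_layer(d_model: int, cache_len: int, ffn_mult: int = 4) -> float:
    """Approximate FLOPs to decode ONE token through one transformer layer."""
    qkv_out = 4 * 2 * d_model * d_model            # Q, K, V, and output projections
    ffn = 2 * 2 * d_model * (ffn_mult * d_model)   # FFN up- and down-projections
    attn_scores = 2 * 2 * cache_len * d_model      # QK^T plus attention-weighted V
    return qkv_out + ffn + attn_scores

d = 4096
no_cache = flops_per_layer(d, cache_len=1)
with_cache = flops_per_layer(d, cache_len=2048)
overhead = with_cache / no_cache - 1
print(f"extra FLOPs from a 2K KV cache: {overhead:.1%}")
```

With these assumed dimensions the cache term comes out to under 10% of the projection + FFN cost, consistent with the "~10% more FLOPs" claim: attending over the cache is linear in context length, while the projections and feedforward are quadratic in d_model and dominate.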
@jayelmnop
Jesse Mu
22 days
Tweet media one
2
20
204
@jayelmnop
Jesse Mu
4 years
Compositional Explanations of Neurons will be an oral presentation at #NeurIPS2020 !
@jayelmnop
Jesse Mu
4 years
New preprint with @jacobandreas : we generate explanations of the individual neurons inside deep neural networks by identifying *compositional logical concepts* that closely approximate neuron behavior (e.g. "water that isn't blue") (1/5)
Tweet media one
Tweet media two
5
113
459
3
28
189
@jayelmnop
Jesse Mu
2 years
Hey, can I borrow your marker? CS academics: Sure! If you found this helpful, please cite
@misc{fleming2022,
  title={EXPO Black Dry Erase Marker, Chisel Tip},
  author={Fleming, Sam},
  year={2022},
  eprint={2407.1242},
  primaryClass={cs.OS - Office Supplies}
}
4
12
181
@jayelmnop
Jesse Mu
1 year
Playing games with #ChatGPT : 1. Tic-Tac-Toe
Tweet media one
Tweet media two
Tweet media three
4
13
175
@jayelmnop
Jesse Mu
2 years
"deep learning is easy" this is how slack decides when to send a notification
Tweet media one
11
21
166
@jayelmnop
Jesse Mu
1 year
Gist model checkpoints are now up on @huggingface . Give it a try and see what prompts you can (or can't) compress! LLaMA-7B (weight diff only): FLAN-T5-XXL: Code:
@jayelmnop
Jesse Mu
1 year
Prompting is cool and all, but isn't it a waste of compute to encode a prompt over and over again? We learn to compress prompts up to 26x by using "gist tokens", saving memory+storage and speeding up LM inference: (w/ @XiangLisaLi2 and @noahdgoodman ) 🧵
14
119
602
0
31
158
@jayelmnop
Jesse Mu
2 years
prompt engineering is such a brittle and hacky way to use my half-trillion param black box LM I trained on reddit shitposts via
adamw (not adam)
cyclic lr=5e-5 (5e-4 too high)
rotary positional embs (sinusoidal embs no good)
batch size set to 124x the number of a100s on the clust
@gneubig
Graham Neubig
2 years
Recently some complain about prompting as an approach to NLP. "It's so brittle." "Prompt engineering is hacky." etc. But there's another way to view it: prompt engineering is another way of tuning the model's parameters, and human interpretable! See 1/2
Tweet media one
4
99
567
5
4
144
@jayelmnop
Jesse Mu
4 years
This week's @stanfordnlp seminar Thursday 10am PT: Diyi Yang ( @Diyi_Yang ) from Georgia Tech will speak on "When Social Context Meets NLP: Learning with Less Data and More Structures"! Open to the public - register at
Tweet media one
4
22
102
@jayelmnop
Jesse Mu
2 years
The year is 2053. The 10k most popular words in the English dictionary have all been claimed and implemented as Huggingface python packages. To complete basic daily activities you open a REPL:
>>> import food
>>> import reading
>>> import toothbrush
4
3
101
@jayelmnop
Jesse Mu
5 months
I'll be presenting Gist tokens, our new approach to LLM prompt compression, tomorrow (Thursday) morning at #NeurIPS2023 Great Hall & Hall B1+B2 (level 1) #604 , 10:45–12:45 CST Stop by!
Tweet media one
@jayelmnop
Jesse Mu
1 year
Prompting is cool and all, but isn't it a waste of compute to encode a prompt over and over again? We learn to compress prompts up to 26x by using "gist tokens", saving memory+storage and speeding up LM inference: (w/ @XiangLisaLi2 and @noahdgoodman ) 🧵
14
119
602
0
15
99
@jayelmnop
Jesse Mu
4 months
Even as someone relatively optimistic about AI risk, working on this project was eye-opening. For example, I was almost certain that red-teaming the model for Bad Thing would stop the model from doing Bad Thing, but it just ended up making the model do Bad Thing more 🫠 5/5
3
8
93
@jayelmnop
Jesse Mu
1 year
I’m now seeing name.gpt Twitter names instead of name.eth. It’s over, the bubble has burst
4
1
89
@jayelmnop
Jesse Mu
1 year
Problem w/ this article is the cited LM evals are misleading: they don't measure frontier capabilities but a very narrow task distr. Claims that closed LMs have no moat must evaluate OSS models on actual knowledge work, not stuff like "name a restaurant"
2
9
88
@jayelmnop
Jesse Mu
1 year
#ChatGPT is not as creative here:
Tweet media one
Tweet media two
Tweet media three
5
0
89
@jayelmnop
Jesse Mu
4 years
I'm presenting work on regularizing visual representations with language at the #NeurIPS2019 ViGIL workshop today (Friday) - West Hall 202 @ 12:10pm - joint work with Percy Liang and Noah Goodman - stop by!
Tweet media one
2
14
82
@jayelmnop
Jesse Mu
2 years
With all of the astonishing neural net announcements coming out multiple times a week these days, it's worth keeping in mind that Spam Detection—*the* introductory example used in every blogpost, Youtube tutorial, and ML/NLP undergrad course—is *still* not solved
4
4
82
@jayelmnop
Jesse Mu
6 months
I'll be at #NeurIPS2023 this year! I've been having a lot of fun at Anthropic—excited to chat about (1) what it's like to work here, and (2) research topics including alignment, red-teaming, language+RL, and more
@sleepinyourhat
Sam Bowman
6 months
If you'll be at #NeurIPS2023 and you're interested in chatting with someone at Anthropic about research or roles, there'll be a few of us around. Expression of interest form here:
Tweet media one
2
21
200
2
3
78
@jayelmnop
Jesse Mu
4 years
This Friday 9/11 at 12pm PDT I'm giving a talk at Deep Learning: Classics and Trends on Compositional Explanations of Neurons () - open to the public! More info: Mailing list + zoom link:
Tweet media one
3
26
75
@jayelmnop
Jesse Mu
1 year
The 2-layer MLP I wrote in Matlab as part of Andrew Ng's ML coursera course back in 2017 achieved parity with Bard on the XOR problem
Tweet media one
2
4
70
@jayelmnop
Jesse Mu
3 years
In our new @StanfordAILab blog post, @ShikharMurty and I discuss the problem of training deep learning models with natural language explanations for tasks in vision, NLP, and beyond:
@StanfordAILab
Stanford AI Lab
3 years
Language is a powerful mechanism for people to communicate goals, beliefs and concepts. Can we use language to train machine learning models? Read our new blog post on Learning from Language Explanations:
1
44
127
0
13
65
@jayelmnop
Jesse Mu
2 years
> is emeritus ML/stats prof
> uses sample size of n = 1 to extrapolate to 2 million female graduate students
@pmddomingos
Pedro Domingos
2 years
Corridor conversation:
Me: What discrimination have you experienced?
Female grad student: [Thinks a while] I can't think of any, but the literature says there is, so there must be.
9
4
58
4
2
66
@jayelmnop
Jesse Mu
1 year
@scalo This is Claude from @AnthropicAI
3
0
63
@jayelmnop
Jesse Mu
1 year
To be clear, there *is* lots of disruption potential from smaller, on-device, OSS LMs. Siri doesn't need PaLM-540B to figure out how to turn on your lights. But to say that these foundation models are made completely irrelevant by someone fitting LLaMA-7B on an iPhone is silly
5
2
59
@jayelmnop
Jesse Mu
2 years
@jachiam0 Isn't there still a question mark about the economic implications of DALL-E/GPT-3/PaLM/etc? It *feels* valuable but I'm not seeing any self-sustaining business models yet (besides cool prototypes, e.g. copilot) I think the fate of LMaaS startups (OpenAI, Cohere) will tell?
11
4
55
@jayelmnop
Jesse Mu
1 month
Another thorny safety challenge for LLMs. Like Sleeper Agents (), @cem__anil has found behavior that is stubbornly resistant to finetuning. Training on MSJ shifts the intercept, but not the slope, of the relationship b/t # of shots and attack efficacy.
Tweet media one
@AnthropicAI
Anthropic
1 month
New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here:
Tweet media one
83
350
2K
3
5
56
@jayelmnop
Jesse Mu
2 years
Interested in human/machine communication? Excited to share 2 papers at #NeurIPS2021 :
1/ Emergent Communication of Generalizations, poster A0 Fri 12/10 8:30-10a PT
2/ Multi-party Referential Communication in Complex Strategic Games @ MiC workshop, Mon 12/13 1:15p PT
read on ⬇
Tweet media one
1
8
53
@jayelmnop
Jesse Mu
3 years
“Why did you decide to do a remote internship in NYC?”
Tweet media one
1
1
52
@jayelmnop
Jesse Mu
8 months
Tweet media one
2
0
51
@jayelmnop
Jesse Mu
2 years
web3 is incredibly effective, fatalistic branding. We should be calling machine learning stats2 and deep learning stats3
3
5
49
@jayelmnop
Jesse Mu
4 years
Our last @stanfordnlp seminar before Thanksgiving features Yonatan Bisk ( @ybisk ) from CMU: "Language Should be Embodied—But what does that mean?" Thursday 10am PT / open to the public / register at ! #robonlp #nlproc
Tweet media one
1
19
49
@jayelmnop
Jesse Mu
2 years
Seeing more tweets from people who are convinced that the DALL-E generations are simply too good/curated to be real, and that there’s a human behind the scenes. Wondering if this is the birth of an “AI isn’t real” conspiracy movement that’ll only grow stronger in the future
2
3
47
@jayelmnop
Jesse Mu
4 months
Forgetting about deceptive alignment for now, a basic and pressing cybersecurity question is: If we have a backdoored model, can we throw our whole safety pipeline (SL, RLHF, red-teaming, etc.) at it and guarantee its safety? Our work shows that in some cases, we can't 2/5
2
1
48
@jayelmnop
Jesse Mu
4 years
For this week's @stanfordnlp seminar Thursday 10am PT, we'll have Mikel Artetxe ( @artetxem ) of Facebook AI Research speaking on Unsupervised Machine Translation! Open to the public - non-Stanford affiliates register at
Tweet media one
0
11
46
@jayelmnop
Jesse Mu
4 years
For this week's @stanfordnlp seminar Thursday 10am PT, excited to have Yoon Kim (MIT-IBM Watson/incoming MIT EECS prof) speaking on Deep Unsupervised Learning of Syntactic Structure. Open to the public - non-Stanford affiliates register at
Tweet media one
4
10
44
@jayelmnop
Jesse Mu
2 years
the "import torch" to "import openai" researcher pipeline
0
0
44
@jayelmnop
Jesse Mu
2 years
@pfau The internet has also decided that lo-fi/pixelated/sloppy memes tend to be funnier
2
1
43
@jayelmnop
Jesse Mu
4 years
Excited to kick-off virtual @stanfordnlp seminars () Thursdays 10am PT - open to the public! We'll first be hearing from @_jessethomason_ on "From Human Language to Agent Action". Non-Stanford affiliates, register for Zoom link:
Tweet media one
1
6
43
@jayelmnop
Jesse Mu
3 years
Very excited for our last @stanfordnlp seminar of the year: Melanie Subbiah (now @ColumbiaCompSci , prev @OpenAI ) on GPT-3: Few-shot Learning with a Giant Language Model. Thursday 10am PT / open to the public / register at ! #gpt3 #NLProc
Tweet media one
1
12
40
@jayelmnop
Jesse Mu
1 year
1️⃣ Improving Intrinsic Exploration with Language Abstractions Using language abstractions to guide exploration in RL, e.g. by self-designing a curriculum of increasingly difficult language goals Also see @ykilcher review: (1/7)
@jayelmnop
Jesse Mu
2 years
Excited to share my work from my internship at @MetaAI : improving exploration in RL with language abstractions! Paper: 🧵 (1/8)
5
48
289
1
6
40
@jayelmnop
Jesse Mu
2 years
Really enjoyed speaking to @ykilcher about our recent work on RL exploration w/ language () - check out the interview below, as well as the excellent paper review here:
@ykilcher
Yannic Kilcher 🇸🇨
2 years
Check out this interview with Jesse Mu, author of "Improving Intrinsic Exploration with Language Abstractions"! Simple idea, big impact: Adding natural language really helps intrinsic exploration in reinforcement learning💪Watch to find out more:
Tweet media one
1
19
101
1
9
38
@jayelmnop
Jesse Mu
2 years
Classic active learning problems become more interesting when combined w/ the few-shot abilities of foundation models: in tiny datasets, the kind of data you train on matters a lot. Check out our work (w/ @AlexTamkin , Salil Deshpande, Dat Nguyen, Noah Goodman) towards this end!
@AlexTamkin
Alex Tamkin 🦣
2 years
How can we choose examples for a model that induce the intended behavior? We show how *active learning* can help pretrained models choose good examples—clarifying a user's intended behavior, breaking spurious correlations, and improving robustness! 1/
Tweet media one
4
62
259
0
7
38
@jayelmnop
Jesse Mu
4 months
Backdoored models may seem far-fetched now, but just saying "just don't train the model to be bad" is discounting the rapid progress made in the past year on poisoning the entire LLM pipeline, including human feedback [1], instruction tuning [2], and even pretraining [3] data. 3/5
1
1
38
@jayelmnop
Jesse Mu
2 months
If this sounds fun, we’d love to chat! Please email {jesse,ethan,miranda} at anthropic dot com with [ASL-3] in the subject line, a paragraph about why you might be a good fit, and any previous experience you have. We will read (and try to respond to) every message we get!
2
0
35
@jayelmnop
Jesse Mu
2 months
I too have gotten Claude 3 to vertically center a <div>
@DanielJLosey
Daniel Losey 🔀
2 months
I quite literally did the work of 50 front-end web developers working for a week in one night thanks to Claude 3.
35
31
671
3
0
34
@jayelmnop
Jesse Mu
2 years
Tweet media one
2
0
35
@jayelmnop
Jesse Mu
4 years
🗣️ Can neural networks learn pragmatics via multi-agent communication, w/o explicit pragmatic reasoning? Yes! Check out our amortized Rational Speech Acts (RSA) model at #CogSci2020 (w/ Julia White, Noah Goodman) paper: talk/qa: Sat Aug 1, 2pm ET/6pm UTC
Tweet media one
0
8
33
@jayelmnop
Jesse Mu
5 years
#Stanford #AISalon with Chris Re ( @HazyResearch ), @JeffDean . Overwhelming themes of the day: huge, multi-task models, weak supervision, transfer learning
Tweet media one
1
4
33
@jayelmnop
Jesse Mu
1 year
@sleepinyourhat New LM benchmark is Inception score on rendered SVG generations New txt2img benchmark is BERTScore after OCRing the image generated from the prompt "first page of a newly discovered Shakespearean play, 1597, 35mm film photograph, colorized"
2
0
33
@jayelmnop
Jesse Mu
2 years
@Bam4d @MetaAI I'm in the same boat as Chris, travel cancelled last minute by Meta (fmr intern). I also had to pay out of pocket for registration. @EMostaque @nathanbenaich any chance of support? Can provide verification. @kchonyc any chance @NeurIPSConf could support (eg waiving registration)?
3
2
33
@jayelmnop
Jesse Mu
3 years
🗣️ New work on emergent communication! Agents trained on Lewis reference games develop successful but uninterpretable language. Meanwhile, our language conveys ideas&categories, not just ref exps. We propose communicating over *sets* of objects, increasing compositionality (1/5)
Tweet media one
Tweet media two
1
2
32
@jayelmnop
Jesse Mu
5 months
If you happened upon a tin of Cafe Du Monde coffee at #NeurIPS2023 , allow extra time at the airport for TSA to double/triple check it
3
1
32
@jayelmnop
Jesse Mu
4 months
[1] [2] [3] [4] and, of course, deceptive alignment! 4/5
1
3
31
@jayelmnop
Jesse Mu
1 year
3. Hangman
Tweet media one
Tweet media two
Tweet media three
1
1
30
@jayelmnop
Jesse Mu
2 years
Speculating on LMs, anonymity, and the internet 1/
Tweet media one
2
2
30
@jayelmnop
Jesse Mu
1 year
Covers GPT 1-3, in-context learning, (zero-shot) chain-of-thought, instruction finetuning, RLHF (w/ an intro to RL for the uninitiated), constitutional AI, etc, and discusses pros/cons of alignment methods Hope it can be helpful, and please let me know if you spot any errors!
3
0
29
@jayelmnop
Jesse Mu
1 year
There's a lot more work to be done here: parameter-efficient gisting, compressing longer prompts, etc Paper: Code: This was a joint effort with @XiangLisaLi2 and @noahdgoodman . Also thx to the Stanford Alpaca team, esp. @lxuechen !
0
3
28
@jayelmnop
Jesse Mu
3 years
I'm presenting Compositional Explanations of Neurons (w/ @jacobandreas ) at #NeurIPS2020 on Thursday: 🗣️ oral 6:30am PST/2:30pm GMT (track 28 deep learning) 📜 poster 9-11am PST/5-7pm GMT (gather town deep learning C3 - spot A3) Stop by!
@jayelmnop
Jesse Mu
4 years
New preprint with @jacobandreas : we generate explanations of the individual neurons inside deep neural networks by identifying *compositional logical concepts* that closely approximate neuron behavior (e.g. "water that isn't blue") (1/5)
Tweet media one
Tweet media two
5
113
459
0
5
28
@jayelmnop
Jesse Mu
4 years
For this week's @stanfordnlp seminar Thursday 10am PST, Douwe Kiela ( @douwekiela ) from Facebook AI Research will present "Rethinking Benchmarking in AI" and the Dynabench platform ()! Open to the public - non-Stanford registration at
Tweet media one
0
6
27
@jayelmnop
Jesse Mu
2 months
People on the team right now include me and:
- @EthanJPerez
- @megtong_
- @ZhongRuiqi
- @MrinankSharma
We exist under the broader Alignment Science team led by Sam Bowman ( @sleepinyourhat ), which has too many awesome colleagues to count.
1
1
28
@jayelmnop
Jesse Mu
2 years
I want SpamBERT to win Best Paper Award at ACL 2023
1
2
27
@jayelmnop
Jesse Mu
1 year
2. Chess (ChatGPT is a sore loser)
Tweet media one
Tweet media two
Tweet media three
2
1
27
@jayelmnop
Jesse Mu
4 years
Tweet media one
1
0
25
@jayelmnop
Jesse Mu
1 year
3️⃣ STaR: Bootstrapping Reasoning with Reasoning (led by @ericzelikman , @Yuhu_ai_ ) Improving multistep reasoning in LMs by bootstrapping off of self-generated rationales Essentially doing RL in chain-of-thought rationale space! (3/7)
@Yuhu_ai_
Yuhuai (Tony) Wu
2 years
Language models can dramatically improve their reasoning by learning from chains of thought that they generate. With STaR, just a few worked examples can boost accuracy to that of a 30X larger model (GPT-J to GPT-3). W. @ericzelikman , Noah Goodman 1/
Tweet media one
8
93
525
2
5
23
@jayelmnop
Jesse Mu
1 year
To recap, I see LMs and RL converging from 2 directions: RL➡️?⬅️LMs
Starting from RL: imbuing agents w/ language priors [1️⃣,2️⃣]
Starting from LMs: improving reasoning not from static corpora, but RL exploration & interaction [3️⃣]
Excited for these paths to intertwine! (5/7)
1
4
23
@jayelmnop
Jesse Mu
2 years
@Miles_Brundage "OK {Mozart,The Beatles,Kendrick,...} made a good song but it was cherrypicked. Show me all the bad outputs"
1
0
22
@jayelmnop
Jesse Mu
1 year
I'm on the job market! Mostly industry (+startups). Interested in both traditional RS positions *and* applied roles deploying products to users and improving from feedback. Please DM or reach out at NeurIPS! (Also reach out in general, happy to chat about anything) (6/7)
2
2
21
@jayelmnop
Jesse Mu
2 months
Context: we’ve been pushing towards our ASL (AI Safety Level) safety commitments under our Responsible Scaling Policy—think about this as a “sprint on safety.” Red-teaming and adversarial robustness are a major part of this story.
Tweet media one
1
1
21
@jayelmnop
Jesse Mu
2 years
@NandoDF Now have it explain this
Tweet media one
2
1
20
@jayelmnop
Jesse Mu
1 year
4️⃣ (bonus!) Active Learning Helps Pretrained Models Learn the Intended Task (led by @AlexTamkin ) Revisiting classic active learning techniques in the context of modern foundation models and few-shot task ambiguity (4/7)
@AlexTamkin
Alex Tamkin 🦣
2 years
How can we choose examples for a model that induce the intended behavior? We show how *active learning* can help pretrained models choose good examples—clarifying a user's intended behavior, breaking spurious correlations, and improving robustness! 1/
Tweet media one
4
62
259
1
4
20
@jayelmnop
Jesse Mu
2 years
For your next icebreaker, try two truths and a lie, except after your initial guess, one of the other choices is revealed to be true, and you all debate whether it’s beneficial to change your initial guess
0
0
20
@jayelmnop
Jesse Mu
5 years
"Broader Context Improves Metaphor Identification" with Ekaterina Shutova, @HYannakoudakis accepted to #naacl2019 !
1
2
19
@jayelmnop
Jesse Mu
5 years
@kaifulee . @Susan_Athey explains requirements for AI to break out of existing “narrow” domains: (1) efficient causal/counterfactual reasoning from small data (vs chugging away at ImageNet) (2) interpretability, robustness, trust.
Tweet media one
1
2
20
@jayelmnop
Jesse Mu
1 year
4. Blackjack (not bad!)
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
0
19
@jayelmnop
Jesse Mu
2 years
Tweet media one
1
0
19
@jayelmnop
Jesse Mu
2 months
Some examples of what our team has been up to (1/2):
1. Understanding and mitigating gradient-based attacks
2. Multimodal adversarial robustness
3. Interp-based safety interventions (working with our *world-class* interpretability team)
1
0
18
@jayelmnop
Jesse Mu
3 years
Check out our work (w/ @rose_e_wang , Julia White, Noah Goodman) on training better and more pragmatic LMs by communicating with ensembles of listeners—to appear in #EMNLP2021 Findings!
@rose_e_wang
Rose
3 years
How do we train language models (LMs) to be good pragmatic conversational partners? We investigate this in our #EMNLP2021 Findings paper: Calibrate your listeners! Robust communication-based training for pragmatic speakers. 📜: 📺:
2
9
60
0
2
18
@jayelmnop
Jesse Mu
2 months
Like any RS/RE role, writing papers is part of the job, but the biggest pitch for our team is the impact you'll have beyond that. Anthropic is (still!) small and nimble, and there are many fun opportunities to collaborate across product, T&S, and policy.
1
0
18
@jayelmnop
Jesse Mu
3 years
Eric is doing very prescient work on vulnerabilities in production NLP systems - this will be super interesting!
@stanfordnlp
Stanford NLP Group
3 years
Eric Wallace ( @Eric_Wallace_ ) from UC Berkeley presents at this week's Stanford NLP Seminar on vulnerabilities in NLP models. Thursday Mar 4 10am PT, open to the public! Sign up at
Tweet media one
0
15
70
0
7
18
@jayelmnop
Jesse Mu
1 year
2️⃣ Improving Policy Learning with Language Dynamics Distillation (led by @hllo_wrld ) Increasing RL sample efficiency by pretraining agents to model env dynamics from language-annotated demonstrations (2/7)
@hllo_wrld
Victor Zhong
2 years
Our latest reading to learn paper Language Dynamics Distillation will appear at #NeurIPS2022 ! In LDD, we pretrain the agent to read to model env dynamics. LDD improves generalization on 5 distinct language grounding envs over naive RL, VAE, inverse RL. 🧵
Tweet media one
Tweet media two
2
12
51
1
3
18