The PEARLS Lab at
@UCSD_CSE
is now open for business! I'm recruiting Fall 24 PhD students in all things interactive and grounded AI, RL, and NLP!! Join us in the land of 🏖️ beach (🧋pearl tea included). Apply by Dec 20. Please help spread the word!
More:
Soon™, I'll be an Asst Prof
@UCSanDiego
@UCSD_CSE
focusing on interactive & grounded AI, RL, NLP
I will also be a research scientist
@MosaicML
helping lead efforts to make tech like RLHF more accessible
Looking for PhD students & research eng/scientists to join me in ☀️SoCal🏖️
I haven't been home in years. I stay up at night thinking of all the people I'll never see again. I'd like to have a home to go back to. All I can do is donate/RT so I'm boosting
#CovidIndia
posts that can help. If this bothers you, pls mute/unfollow. Don't send me DMs like this
The secret to aligning LMs to human preferences is reinforcement learning. But why & how is it used? Announcing
💻RL4LMs: library to train any
@huggingface
LM w/ RL
👾GRUE: benchmark of 6 NLP tasks+rewards
📈NLPO: new RL alg 4 LMs
🌐
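(For readers new to this: the conceptual core of RL-for-LM training is a reward that mixes the task signal with a KL penalty toward the initial model. A minimal sketch below, assuming precomputed log-probs; this is the standard RLHF shaping trick, not RL4LMs' exact internals, and the function name is mine.)

```python
import torch

def kl_shaped_reward(task_reward: torch.Tensor,
                     policy_logps: torch.Tensor,
                     ref_logps: torch.Tensor,
                     kl_coef: float = 0.1) -> torch.Tensor:
    """Per-token rewards for PPO-style LM training.

    policy_logps / ref_logps: log-probs of the sampled tokens under the
    current policy and the frozen initial LM, shape (batch, seq_len).
    task_reward: sequence-level score (e.g., a GRUE task metric), shape (batch,).
    """
    kl = policy_logps - ref_logps   # per-token KL estimate on the sample
    rewards = -kl_coef * kl         # penalize drifting from the initial LM
    rewards[:, -1] += task_reward   # sequence-level reward lands on the last token
    return rewards
```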
Why do ML academics have such knee-jerk reactions to writing rules or engines to ground and control an ML system?
"It won't work in the real world" is such an unsubstantiated argument. Have you ever actually put an ML system in production?? How do you think those work???
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
This is Q*BERT, an agent that explores using an intrinsic motivation to learn a knowledge graph of the world by asking questions.
Paper:
Code:
Do embodied agents dream of 🤖pixelated sheep🐏?
Meet DECKARD, an agent that "dreams" a world model hypothesizing how to achieve tasks via an LLM. Efficiently training more generally capable RL agents by grounding LMs with actions in a world!
In
#ICML2023
I spent my high school and half my undergrad like this (minus the sleep and serenity). The 16-hour-workday grindset is a helluva drug. Took me years to recover. It's ineffective, inefficient, and will just plain leave you miserable
A first-author NeurIPS paper to get into high school soon. This is a questionable move from NeurIPS
The number of emails I get from some of the high schools in Cali is insane. They're all from "top" high schools and most come directly from parents who know the game.
The first paper of my PhD from three years ago with
@mark_riedl
has 100 citations! Not much by today's ML/NLP standards perhaps, but it means a lot to me, especially because of how non-"mainstream" the work is
Finally! A very natural next step I'm glad someone tried
Here's one more free paper idea: use NLPO instead of PPO to mask out next tokens during generation based on compiler syntax feedback to make exploration more efficient. V bullish on LLM+RL+CodeGen
StepCoder
Improve Code Generation with Reinforcement Learning from Compiler Feedback
paper page:
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning…
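(A rough sketch of the free paper idea floated above: constrain next-token sampling with an external syntax check. `is_valid_prefix` here is a hypothetical stand-in for compiler/parser feedback; actual NLPO learns its masking policy rather than hard-coding one.)

```python
import torch

def mask_syntax_breaking_tokens(logits, prefix_text, tokenizer, is_valid_prefix,
                                top_k: int = 50):
    """Set logits to -inf for top-k candidate tokens that break code syntax.

    is_valid_prefix(text) -> bool is a hypothetical hook wrapping compiler
    or parser feedback on a partial program. Only the top-k tokens are
    checked because running a parser per candidate token is expensive.
    """
    masked = logits.clone()
    for tok in torch.topk(logits, k=top_k).indices.tolist():
        candidate = prefix_text + tokenizer.decode([tok])
        if not is_valid_prefix(candidate):
            masked[tok] = float("-inf")
    return masked
```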
Unsolicited advice for (academics) interested in Big Model capabilities/scaling research. Stay grounded in reality (industry) and write fewer papers. Honestly, very very few recent papers in the area actually matter
How can we get language-based reinforcement learning agents to act in more altruistic and less harmful ways towards themselves and others?
One way is to constrain their actions with social commonsense. New
#NAACL2022
paper on social value alignment 🧵👇
🚨New Paper Alert🚨
Having trouble keeping your (AI) dragon motivated? Same here. So we figured out how to teach it, interactively w/ RL & lang pretraining, to act consistently + talk naturally wrt its motivations when questing in a fantasy text game.
1/4
A major use of conversational AI is looking for information, but most systems unrealistically rely on the user to figure out how to ask exactly the right question. The new INSCIT benchmark evals how agents can take the initiative to guide users to the info they need
Announcement time! I'm on the academic job market this cycle! Please reach out if I'm a good fit!
I make trustworthy and safe AI agents that communicate with language, build world models, and learn from human and environmental feedback. More:
Two
#NeurIPS2023
spotlights accepted!!
1. Our work on how to improve the Human Feedback portion of RLHF to be more effective, a direction which I believe is the clear future of feedback learning. And ...
F in RLHF is overall preference, which conveys limited info🙁
We introduce Fine-Grained RLHF🚀and train LMs with explicit feedback like "sentence 1 is not factual", "sentence 2 is toxic"
More effective & enables LM customization
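(To make "fine-grained" concrete, a toy sketch: score each sentence with separate reward models instead of one preference score per response. The reward-model callables and weights here are illustrative, not the paper's code.)

```python
def fine_grained_reward(sentences, factuality_rm, toxicity_rm,
                        w_fact: float = 1.0, w_tox: float = 1.0) -> list[float]:
    """Dense per-sentence rewards instead of one scalar per response.

    factuality_rm / toxicity_rm: hypothetical callables returning a score
    in [0, 1] for a single sentence. The RL update can then credit or blame
    individual sentences ("sentence 2 is toxic") rather than the whole output.
    """
    return [w_fact * factuality_rm(s) - w_tox * toxicity_rm(s)
            for s in sentences]
```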
Announcing the 1st Workshop on 🎨Creative AI Across Modalities🎶 at AAAI 2023!
Come chat and learn about the latest in creative AI for Art, Music, Narrative, Poetry, Sciences and so much more from the entire community!
4-8 pg submissions due: Nov 4
More:
If you're a student trying to do Big Model AI right now, my one piece of advice is to take, and pay attention in, both a Systems course (covering GPUs) and something covering Human Participant Study Design. These are basic prereqs for Every Thing Else.
Check out our new easy-to-use, off-policy reinforcement learning algorithm to selectively *un*learn *un*desirable behaviors in LMs without sacrificing other capabilities!
Quark: Controllable Text Generation with Reinforced Unlearning
abs:
introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model
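(The core mechanic, roughly: quantize the rewards of sampled generations into K bins and prepend a learned bin token, so at inference you can condition on the best bin. A minimal sketch with made-up token names, not the released code.)

```python
import numpy as np

def assign_reward_tokens(rewards: np.ndarray, k: int = 5) -> list[str]:
    """Map each sample's scalar reward to one of K quantile-bin control tokens.

    Training prepends each sequence with its bin token; generation conditions
    on the highest bin (here "<rk_4>") to steer away from the unwanted
    property, while a KL term keeps the model near the original.
    """
    cutpoints = np.quantile(rewards, np.linspace(0, 1, k + 1)[1:-1])
    bins = np.digitize(rewards, cutpoints)  # values in 0..k-1
    return [f"<rk_{b}>" for b in bins]
```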
Open letter to all game devs on here. In honor of mother's day, I want more video games where I can summon my amma to hit people being mean to me with a chappal. Thanks.
DPO is nice and easy to get running but I have yet to see it out perform an(y) online actor critic RL algo with large scale (noisyish) human feedback data. I've burned too many GPU hours. No exploration or reward means it's not well suited to initial RLHF training on real data
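(For reference, the vanilla DPO objective being critiqued, as a minimal PyTorch sketch; per-sequence log-probs are assumed precomputed. This is the loss from the DPO paper, not any particular library's implementation.)

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """DPO loss over a batch of (chosen, rejected) preference pairs.

    Each argument: summed sequence log-probs, shape (batch,). Note there is
    no sampling, reward model, or exploration anywhere in here, which is
    exactly why it can struggle on noisy real-world feedback vs online RL.
    """
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen - rejected).mean()
```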
Announcing "Wordplay: When Language Meets Games" at
#NeurIPS2020
, your one stop workshop for all things interactive (narrative + language learning + AI). Now with an amazing set of speakers and organizers spanning all these fields!
From the GPT4 tech report:
"This report contains no further details about the architecture (including model size), ... dataset ... training method, or similar."
It's a product. Not science. That's fine. I better not see *any* ACL prompting papers on it.
Only a year ago, our paper on when and how to use RL in NLP was accepted to ICLR '23. We're now at 100 citations and 2k GitHub stars!
Less about the numbers, more about how excited I am that so many people are working on RL for NLP! Only a few years ago this was unimaginable!
There is no doubt that being veg is less delicious. People who argue otherwise are kidding themselves. But a lot of that is because there are fewer options on menus, so much less money driving creativity. The more plant-based eaters and chefs there are, the tastier it'll get.
The power of powers of 2!! We noticed this while building encoder-decoder models like T5 into our RL4LMs open-source RLHF toolkit: just snapping the vocab size to the nearest power of 2 significantly improves run times!!
The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
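(The trick itself is a one-liner; a sketch, with the nanoGPT numbers as a sanity check:)

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round the vocab size up so the embedding/unembedding matmuls hit
    better-occupancy GPU kernel paths. The padded rows correspond to no
    real token, so model outputs are unchanged."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

assert pad_vocab_size(50257) == 50304  # the nanoGPT numbers quoted above
```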
I'm writing this cause I'm a bit salty. We've implemented so many seemingly promising, published & popular papers only for them to utterly flop.
At least I like to think that my personal bs Big Model paper classifier is now pretty good given my extensive training data.
The sacrifices to the twin gods of compute and crowdworking worked! In under 3 months we built the best commercially viable open weight LLM
We're committed to opening up AI research again by giving *you* the result of our efforts!
We're just getting started
Meet DBRX, a new SOTA open LLM from
@databricks
. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
4 line review I got today -
Line 1: "Method lacks novelty."
Line 4: "Actually if I think about it, this is very novel. I see no issues."
Reject.
How do you even respond to this? T_T
RL4LMs has 500 stars on GitHub! Thanks for the support for your one stop shop for all things RLHF!
3000+ expts over 7 NLP tasks, 4 RL algos, any Huggingface generative LM, 20+ metrics, human preference collection UIs, continual deployment, and more!
The secret to aligning LMs to human preferences is reinforcement learning. But Why&How is it used? Announcing
💻RL4LMs: library to train any
@huggingface
LM w/ RL
👾GRUE: benchmark of 6 NLP tasks+rewards
📈NLPO: new RL alg 4 LMs
🌐
I'm quite tired of industry papers that don't have any released data/models/code + not enough implementation details making them impossible to reproduce. The blanket reason being "Company IP". Just don't publish then.
Introducing the JerichoWorld Dataset! Designed to measure textual world modeling agents' situated knowledge representation and commonsense reasoning skills. Thousands of autoannotated (text→knowledge graph+actions) pairs across dozens of text games.
This week for Christmas I got two grant proposals 🎉rejected🎉 cause they are "5+ year moonshots that are not worth wasting resources on" 🥰
New to the whole professing thing, can I eventually send them the paper with the caption "here's your 5+ year moonshot, took us 1 🚀" ?
Someone understands my pain. The root of suffering is tokenization
A lot of the things people point out as "LLMs can't do X" are actually tokenizer issues. This becomes really obvious really fast once you spend time down at the low level and see how messed up all forms of encoding are
We will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.
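(Easy to see firsthand with tiktoken's GPT-2 encoding; exact token IDs differ across vocabularies, so treat the outputs as illustrative.)

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# The same word with and without a leading space maps to different token IDs:
print(enc.encode("hello"), enc.encode(" hello"))

# Digit strings split at arbitrary boundaries, one reason LLM arithmetic
# is shakier than it "should" be:
print(enc.encode("12345678"))
```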
I'm attending
#NeurIPS2023
!! Presenting two spotlights and recruiting PhD students for my PEARLS lab at
@ucsd_cse
& research engineers/scientists
@MosaicML
/
@databricks
!! A heavy focus on LLMs+RL(HF) & Embodied NLP. Email me!
🧋
🌐
We need a unified rule that job searches only consider your top 3 papers. Otherwise, job-hunting (PhD) students have too much pressure to publish lots (even if faculty don't)
Our latest work answering the question: Do you really need the RL in RLHF?
Yes! You really do. But it requires work on improving the HF portion to go from very sparse pairwise preferences to something more informative and fine-grained.
Let's build better rewards!!
Our paper on Multimodal RL(AI)F is now accepted to
#CVPR2023
with special thanks to
@YoungjaeYu3
and
@JiwanChung
.
Tune your language models to understand multimodal inputs with RL while keeping their zero shot language abilities intact!!
The Wordplay: When Language Meets Games workshop is back y'all!!! 3rd edition will be held at
#NAACL2022
in Seattle (+hybrid virtual). Your one stop shop for all things interactive language learning, narrative, and more!! Much excite!!! More updates soon:
I missed multiple NeurIPS early in my PhD cause I couldn't get a Canadian visa. I know many who can't do CVPR or ACL this year. Statements on "fostering inclusivity" are just theatre unless conference locations are moved outside Canada/US
Indian PhD students from
@iiscbangalore
, who have first-authored papers at prestigious conferences like
@CVPR
, are facing unjust denials of Canadian visas, with shocking reasons like "limited employment possibilities in India" and "purpose of visit not consistent with a temp. stay."
New
#SIGDIAL2021
paper on
#DnD
style storytelling through multi-user dialogue! Predicting relationship types between characters via sentiment while learning to talk helps
#AI
models be better DnD players!!
Paper:
🧵👇1/3
PSA for PhD applicants: US school offers will start going out very soon. Exploding/short deadlines are NOT a thing; you have until April 15 to make a decision. Top schools will have visit days in March. Go to those, talk to ppl, make an informed choice
WTF! I've heard multiple accounts that "exploding offers" on the 1-2 week timescale are now a regular occurrence in the AI Ph.D. application process.
Not okay.
If you're not in the application cycle and hear about this, speak up! Honestly, I'll help coach people on this negotiation.
Two
#NeurIPS2022
accepted papers!! Bless the ACs!! See y'all in New Orleans and let's chat interaction, language, grounding, and reinforcement learning!!
1/2 HEX-RL: Explainable RL in Natural Language using Knowledge Graphs! Led by the amazing
@beckypeng6
!
🚨Preprint Alert🚨
"Inherently Explainable RL in Natural Language"
The Hierarchically Explainable (HEX) RL agent that thinks out loud to tell us why decisions are made by pointing to the facts in its internal state that most influence its actions.
Paper:
The Dungeon Meowsters are live in Toronto for
#ACL2023
to talk all things:
#DnD
, theory of mind, multi agent grounded dialogue, reinforcement learning, table top games, and more!!
Catch
@peizNLP
at 4 pm today at Session 8!!
📍Introducing an AI Dungeon Master’s Guide🧙♂️, or how to make a
#DnD
DM dialogue agent trained with intents and theory of mind-inspired💭reinforcement learning.
Predicting how your players will react to you ahead of time makes for a better DM!
📃
How is it that a relatively early-stage startup has a 4000-A100 GPU cluster seemingly effortlessly, when the best-funded academic institutes struggle to pay for a small fraction of that?
Following time-honored academic tradition: I'm happy to announce that I have taken the profile pic that will stay on my website until I ascend to full professor (at least).
#AAAI2021
Come chat about C2PO: the causal, commonsense plot ordering storyteller and *how* ppl think about causality using commonsense expectations in stories.
Sat. 2/6 8:45-10:30, 4:45-6:30 PST.
Paper:
Site:
w/ EILab,
@mark_riedl
The correct answer to "what online RL algo should you use" has always been and will always be "whatever you know how to tune the hyperparameters for best"
“You're in an open field to the west of a white house. There's a mailbox here.” First scene of Zork1 materialized thanks to the new
#dalle
by
@jmhessel
. Automated text game -> visual novel pipeline when??
Can confirm,
@YejinChoinka
definitely favors exploration over exploitation and encourages others (me at least for sure) to also "be adventurous and live like a game character"!! A very well-deserved award!
#UWAllen
@uwnlp
's
@YejinChoinka
aims to develop
#AI
with the ability to reason and communicate about the world in physical and abstract terms, like humans can do. As a 2022
#MacFellow
, she looks forward to taking the “adventurous route” in her research:
Embodied agents and pixelated sheep 🐑 To be presented at
#ICML2023
next Thursday 27th in Hawaii by
@kolbytn
and me!
Come chat with us about language grounding, embodied AI, world models, RL+NLP and more!!
Our new work on language agents that augment their action space with symbolic modules!
Basically, don't teach your LM to be a calculator when it can just use an existing one instead. A step towards Neurosymbolic LM tool use for math, navigation, and more!
Transformers are robust reasoners, but frustratingly lack the ability for accurate math, navigation, & other easily coded tasks. In our new work "Behavior Cloned Transformers are Neurosymbolic Reasoners", we show you can have the best of both worlds. 1/3
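(The flavor of the idea in a toy dispatcher; the "calc:" action grammar below is made up for illustration, not the paper's format.)

```python
import re

def route_action(action: str) -> str:
    """Send an LM-emitted symbolic action to an exact module instead of
    letting the LM do the arithmetic itself.

    Toy grammar: "calc: <expr>" goes to Python's evaluator; anything else
    is passed through as plain text.
    """
    m = re.fullmatch(r"calc:\s*([\d\s+\-*/().]+)", action)
    if m:
        return str(eval(m.group(1)))  # exact, unlike LM "mental math"
    return action

print(route_action("calc: 12345 * 678"))  # -> 8369910
```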
Join us! I'm looking for interns next summer
@allen_ai
to work on RL for NLP that learns from human feedback, and on grounding language in envs like text games, Minecraft, NetHack, etc. for open-ended RL agents
📢📢
Looking for a Summer 2023 research internship? Apply to the Mosaic team
@allen_ai
!!
📢📢
topics include: commonsense, language generation, vision+language, RL, + more!
Applications due Nov 13th!
So you burned a lot of money and trained a really good RLHF model for your existing users' preferences. Now a new user comes along with very different preferences. How do you scale effectively to new RLHF use cases without wasting everything? New paper!
🎯 Tired of one-size-fits-all AI chatter? ChatGPT tends to generate verbose & overly informative responses. This is because the current RLHF pipeline only allows aligning LLMs to the general preferences of the population. However, in the real world, people may have multiple,…
Evaling LLMs is hard but the interesting thing is that AI/ML people seem weirdly determined to get rid of humans entirely in the eval and RLHF processes. Pls ground your metrics to something real pls thanks
I'm mostly just worried that they're the only type who will be able to submit to this. Esp cause "high school paper" will definitely get used as a metric
Also, seriously, let the kids go touch sand, plenty of time to be in a lab later
The Worldformer will now appear at
#NeurIPS2021
in the main track, alongside the JerichoWorld benchmark in the benchmarks track. Get ready for a NeurIPS where I talk about world models not once but twice!!! 🎉🎉
Wformer:
Benchmark:
Parte the second thread, as promised:
Here's the Worldformer: a sota text game world model that multi-task learns to generate all possible lang actions and the *difference* between world states as a knowledge graph, using it to learn env dynamics!
Paper:
Pretty funny when you get a paper review saying "method won't be of practical value" when it's been deployed in production serving millions in industry for a couple months already 🤡
(Most) Academic Labs are sleeping on selling lab merch to keep themselves funded. The PEARLS Lab is not! (Innovating funding schemes cause making merch is fun+easy!! And NSF is... not)
🌐
Now in
#ACL2023
!! Look forward to
@peizNLP
's presentation! See y'all in Toronto and let's chat
#DnD
dialogue, theory of mind, and all things interactive NLP!!
Camera ready soon!
@HeinrichKuttler
Sorry for the mistake. We recognize the issue and are indeed pushing a v1.1 to fix it. Originally, the errors in the reference answers were left in intentionally, as we wanted to demonstrate the limitations of the GPT-4 judge in the paper. However, since MT-Bench has become widely used, those…
Yes! Language as an interface!! Conversational information search works best when LMs are grounded in an underlying info source. See our recent TACL paper led by
@zeqiuwu1
for more on this idea
I'm pretty optimistic that the LLM reliability / factualness issue can be fixed. The key is to use LLMs as a dialog interface and not as a store of knowledge. LLMs as the query layer between a human user and a knowledge graph with sources (which can be hybrid generated/curated).
Academics will get fully priced out of Big Model SOTA research v soon. Even fine-tuning won't be possible for the best OSS models. The best-funded unis have 1000 times less compute than Big Tech
@timo_schick
@LukeZettlemoyer
Very interesting work! You might be interested in our very related work on LMs that use tools in interactive settings, to be presented at EACL this year.
This is fascinating but also worrying for deep RL in general in some ways. If agents can be observation permutation invariant, what can we even really claim about how they learn env dynamics/semantics?
The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
We explore RL agents that still work even when their observations get shuffled around a lot!
A fun paper w/
@yujin_tang
web
pdf