The PEARLS Lab at
@UCSD_CSE
is now open for business! I'm recruiting Fall 24 PhD students in all things interactive and grounded AI, RL, and NLP!! Join us in the land of 🏖️ beach (🧋pearl tea included). Apply by Dec 20. Please help spread the word!
More:
Soon™, I'll be an Asst Prof
@UCSanDiego
@UCSD_CSE
focusing on interactive & grounded AI, RL, NLP
I will also be a research scientist
@MosaicML
helping lead efforts to make tech like RLHF more accessible
Looking for PhD students & research eng/scientists to join me in ☀️SoCal🏖️
I haven't been home in years. I stay up at night thinking of all the people I'll never see again. I'd like to have a home to go back to. All I can do is donate/RT so I'm boosting
#CovidIndia
posts that can help. If this bothers you, pls mute/unfollow. Don't send me DMs like this
The secret to aligning LMs to human preferences is reinforcement learning. But why & how is it used? Announcing
💻RL4LMs: library to train any
@huggingface
LM w/ RL
👾GRUE: benchmark of 6 NLP tasks+rewards
📈NLPO: new RL alg 4 LMs
🌐
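(For readers new to this: the conceptual core of RL-for-LM training is a reward that mixes the task signal with a KL penalty toward the initial model. A minimal sketch below, assuming precomputed log-probs; this is the standard RLHF shaping trick, not RL4LMs' exact internals, and the function name is mine.)

```python
import torch

def kl_shaped_reward(task_reward: torch.Tensor,
                     policy_logps: torch.Tensor,
                     ref_logps: torch.Tensor,
                     kl_coef: float = 0.1) -> torch.Tensor:
    """Per-token rewards for PPO-style LM training.

    policy_logps / ref_logps: log-probs of the sampled tokens under the
    current policy and the frozen initial LM, shape (batch, seq_len).
    task_reward: sequence-level score (e.g., a GRUE task metric), shape (batch,).
    """
    kl = policy_logps - ref_logps   # per-token KL estimate on the sample
    rewards = -kl_coef * kl         # penalize drifting from the initial LM
    rewards[:, -1] += task_reward   # sequence-level reward lands on the last token
    return rewards
```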
Why do ML academics have such knee-jerk reactions to writing rules or engines to ground and control an ML system?
"It won't work in the real world" is such an unsubstantiated argument. Have you ever actually put an ML system in production?? How do you think those work???
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
This is Q*BERT, an agent that explores using an intrinsic motivation to learn a knowledge graph of the world by asking questions.
Paper:
Code:
Do embodied agents dream of 🤖pixelated sheep🐏?
Meet DECKARD, an agent that "dreams" a world model hypothesizing how to achieve tasks via an LLM. Efficiently training more generally capable RL agents by grounding LMs with actions in a world!
In
#ICML2023
I spent my high school and half my undergrad like this (minus the sleep and serenity). The 16-hour-workday grindset is a helluva drug. Took me years to recover. It's ineffective, inefficient, and will just plain leave you miserable
A first-author NeurIPS paper to get into high school soon. This is a questionable move from NeurIPS
The number of emails I get from some of the high schools in Cali is insane. They're all from "top" high schools and most come directly from parents who know the game.
The first paper of my PhD from three years ago with
@mark_riedl
has 100 citations! Not much by today's ML/NLP standards perhaps, but it means a lot to me, especially because of how non-"mainstream" the work is
Finally! A very natural next step I'm glad someone tried
Here's one more free paper idea: use NLPO instead of PPO to mask out next tokens during generation based on compiler syntax feedback to make exploration more efficient. V bullish on LLM+RL+CodeGen
StepCoder
Improve Code Generation with Reinforcement Learning from Compiler Feedback
paper page:
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning…
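(A rough sketch of the free paper idea floated above: constrain next-token sampling with an external syntax check. `is_valid_prefix` here is a hypothetical stand-in for compiler/parser feedback; actual NLPO learns its masking policy rather than hard-coding one.)

```python
import torch

def mask_syntax_breaking_tokens(logits, prefix_text, tokenizer, is_valid_prefix,
                                top_k: int = 50):
    """Set logits to -inf for top-k candidate tokens that break code syntax.

    is_valid_prefix(text) -> bool is a hypothetical hook wrapping compiler
    or parser feedback on a partial program. Only the top-k tokens are
    checked because running a parser per candidate token is expensive.
    """
    masked = logits.clone()
    for tok in torch.topk(logits, k=top_k).indices.tolist():
        candidate = prefix_text + tokenizer.decode([tok])
        if not is_valid_prefix(candidate):
            masked[tok] = float("-inf")
    return masked
```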
Unsolicited advice for (academics) interested in Big Model capabilities/scaling research. Stay grounded in reality (industry) and write fewer papers. Honestly, very very few recent papers in the area actually matter
How can we get language-based reinforcement learning agents to act in more altruistic and less harmful ways towards themselves and others?
One way is to constrain their actions with social commonsense. New
#NAACL2022
paper on social value alignment 🧵👇
🚨New Paper Alert🚨
Having trouble keeping your (AI) dragon motivated? Same here. So we figured out how to teach it, interactively w/ RL & lang pretraining, to act consistently + talk naturally wrt its motivations when questing in a fantasy text game.
1/4
A major use of conversational AI is looking for information, but most systems unrealistically rely on the user to figure out how to ask exactly the right question. The new INSCIT benchmark evals how agents can take the initiative to guide users to the info they need
Announcement time! I'm on the academic job market this cycle! Please reach out if I'm a good fit!
I make trustworthy and safe AI agents that communicate with language, build world models, and learn from human and environmental feedback. More:
Two
#NeurIPS2023
spotlights accepted!!
1. Our work on how to improve the Human Feedback portion of RLHF to be more effective, a direction which I believe is the clear future of feedback learning. And ...
F in RLHF is overall preference, which conveys limited info🙁
We introduce Fine-Grained RLHF🚀and train LMs with explicit feedback like "sentence 1 is not factual", "sentence 2 is toxic"
More effective & enables LM customization
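(To make "fine-grained" concrete, a toy sketch: score each sentence with separate reward models instead of one preference score per response. The reward-model callables and weights here are illustrative, not the paper's code.)

```python
def fine_grained_reward(sentences, factuality_rm, toxicity_rm,
                        w_fact: float = 1.0, w_tox: float = 1.0) -> list[float]:
    """Dense per-sentence rewards instead of one scalar per response.

    factuality_rm / toxicity_rm: hypothetical callables returning a score
    in [0, 1] for a single sentence. The RL update can then credit or blame
    individual sentences ("sentence 2 is toxic") rather than the whole output.
    """
    return [w_fact * factuality_rm(s) - w_tox * toxicity_rm(s)
            for s in sentences]
```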
Announcing the 1st Workshop on 🎨Creative AI Across Modalities🎶 at AAAI 2023!
Come chat and learn about the latest in creative AI for Art, Music, Narrative, Poetry, Sciences and so much more from the entire community!
4-8 pg submissions due: Nov 4
More:
If you're a student trying to do Big Model AI right now, my one piece of advice is to take, and pay attention in, both a Systems course (covering GPUs) and something covering Human Participant Study Design. These are basic prereqs for Every Thing Else.
Check out our new easy-to-use, off-policy reinforcement learning algorithm to selectively *un*learn *un*desirable behaviors in LMs without sacrificing other capabilities!
Quark: Controllable Text Generation with Reinforced Unlearning
abs:
introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model
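(The core mechanic, roughly: quantize the rewards of sampled generations into K bins and prepend a learned bin token, so at inference you can condition on the best bin. A minimal sketch with made-up token names, not the released code.)

```python
import numpy as np

def assign_reward_tokens(rewards: np.ndarray, k: int = 5) -> list[str]:
    """Map each sample's scalar reward to one of K quantile-bin control tokens.

    Training prepends each sequence with its bin token; generation conditions
    on the highest bin (here "<rk_4>") to steer away from the unwanted
    property, while a KL term keeps the model near the original.
    """
    cutpoints = np.quantile(rewards, np.linspace(0, 1, k + 1)[1:-1])
    bins = np.digitize(rewards, cutpoints)  # values in 0..k-1
    return [f"<rk_{b}>" for b in bins]
```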
Open letter to all game devs on here. In honor of mother's day, I want more video games where I can summon my amma to hit people being mean to me with a chappal. Thanks.
DPO is nice and easy to get running but I have yet to see it out perform an(y) online actor critic RL algo with large scale (noisyish) human feedback data. I've burned too many GPU hours. No exploration or reward means it's not well suited to initial RLHF training on real data
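(For reference, the vanilla DPO objective being critiqued, as a minimal PyTorch sketch; per-sequence log-probs are assumed precomputed. This is the loss from the DPO paper, not any particular library's implementation.)

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """DPO loss over a batch of (chosen, rejected) preference pairs.

    Each argument: summed sequence log-probs, shape (batch,). Note there is
    no sampling, reward model, or exploration anywhere in here, which is
    exactly why it can struggle on noisy real-world feedback vs online RL.
    """
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen - rejected).mean()
```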
Announcing "Wordplay: When Language Meets Games" at
#NeurIPS2020
, your one stop workshop for all things interactive (narrative + language learning + AI). Now with an amazing set of speakers and organizers spanning all these fields!
From the GPT4 tech report:
"This report contains no further details about the architecture (including model size), ... dataset ... training method, or similar."
It's a product. Not science. That's fine. I better not see *any* ACL prompting papers on it.
Only a year ago, our paper on when and how to use RL in NLP was accepted to ICLR '23. We're now at 100 citations and 2k GitHub stars!
Less about the numbers, more about how excited I am that so many people are working on RL for NLP! Only a few years ago this was unimaginable!
There is no doubt that being veg is less delicious. People who argue otherwise are kidding themselves. But a lot of that is because there are fewer options on menus, so much less money driving creativity. The more plant-based eaters and chefs there are, the tastier it'll get.
The power of powers of 2!! We noticed this while building encoder-decoder models like T5 into our RL4LMs open-source RLHF toolkit: just snapping the vocab size to the nearest power of 2 significantly improves run times!!
The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
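(The trick itself is a one-liner; a sketch, with the nanoGPT numbers as a sanity check:)

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round the vocab size up so the embedding/unembedding matmuls hit
    better-occupancy GPU kernel paths. The padded rows correspond to no
    real token, so model outputs are unchanged."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

assert pad_vocab_size(50257) == 50304  # the nanoGPT numbers quoted above
```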
I'm writing this cause I'm a bit salty. We've implemented so many seemingly promising, published & popular papers only for them to utterly flop.
At least I like to think that my personal bs Big Model paper classifier is now pretty good given my extensive training data.
The sacrifices to the twin gods of compute and crowdworking worked! In under 3 months we built the best commercially viable open weight LLM
We're committed to opening up AI research again by giving *you* the result of our efforts!
We're just getting started
Meet DBRX, a new SOTA open LLM from
@databricks
. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.
4 line review I got today -
Line 1: "Method lacks novelty."
Line 4: "Actually if I think about it, this is very novel. I see no issues."
Reject.
How do you even respond to this? T_T
RL4LMs has 500 stars on GitHub! Thanks for the support for your one stop shop for all things RLHF!
3000+ expts over 7 NLP tasks, 4 RL algos, any Huggingface generative LM, 20+ metrics, human preference collection UIs, continual deployment, and more!
The secret to aligning LMs to human preferences is reinforcement learning. But Why&How is it used? Announcing
💻RL4LMs: library to train any
@huggingface
LM w/ RL
👾GRUE: benchmark of 6 NLP tasks+rewards
📈NLPO: new RL alg 4 LMs
🌐
I'm quite tired of industry papers that don't have any released data/models/code + not enough implementation details making them impossible to reproduce. The blanket reason being "Company IP". Just don't publish then.
Introducing the JerichoWorld Dataset! Designed to measure textual world modeling agents' situated knowledge representation and commonsense reasoning skills. Thousands of autoannotated (text→knowledge graph+actions) pairs across dozens of text games.
This week for Christmas I got two grant proposals 🎉rejected🎉 cause they are "5+ year moonshots that are not worth wasting resources on" 🥰
New to the whole professing thing, can I eventually send them the paper with the caption "here's your 5+ year moonshot, took us 1 🚀" ?
Someone understands my pain. The root of suffering is tokenization
A lot of the things people point out as "LLMs can't do X" are actually tokenizer issues. This becomes really obvious really fast once you spend time down at the low level and see how messed up all forms of encoding are
We will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.
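(Easy to see firsthand with tiktoken's GPT-2 encoding; exact token IDs differ across vocabularies, so treat the outputs as illustrative.)

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# The same word with and without a leading space maps to different token IDs:
print(enc.encode("hello"), enc.encode(" hello"))

# Digit strings split at arbitrary boundaries, one reason LLM arithmetic
# is shakier than it "should" be:
print(enc.encode("12345678"))
```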
I'm attending
#NeurIPS2023
!! Presenting two spotlights and recruiting PhD students for my PEARLS lab at
@ucsd_cse
& research engineers/scientists
@MosaicML
/
@databricks
!! A heavy focus on LLMs+RL(HF) & Embodied NLP. Email me!
🧋
🌐
We need a unified rule that job searches only consider your top 3 papers. Otherwise, job-hunting (PhD) students have too much pressure to publish lots (even if faculty don't)
Our latest work answering the question: Do you really need the RL in RLHF?
Yes! You really do. But it requires work on improving the HF portion to go from very sparse pairwise preferences to something more informative and fine-grained.
Let's build better rewards!!
Our paper on Multimodal RL(AI)F is now accepted to
#CVPR2023
with special thanks to
@YoungjaeYu3
and
@JiwanChung
.
Tune your language models to understand multimodal inputs with RL while keeping their zero shot language abilities intact!!
The Wordplay: When Language Meets Games workshop is back y'all!!! 3rd edition will be held at
#NAACL2022
in Seattle (+hybrid virtual). Your one stop shop for all things interactive language learning, narrative, and more!! Much excite!!! More updates soon:
I missed multiple NeurIPS early in my PhD cause I couldn't get a Canadian visa. I know many who can't do CVPR or ACL this year. Statements on "fostering inclusivity" are just theatre unless conference locations are moved outside Canada/US
Indian PhD students from
@iiscbangalore
, who have first-authored papers at prestigious conferences like
@CVPR
, are facing unjust denials of Canadian visas, with shocking reasons like "limited employment possibilities in India" and "purpose of visit not consistent with a temp. stay."
New
#SIGDIAL2021
paper on
#DnD
style storytelling through multi-user dialogue! Predicting relationship types between characters via sentiment while learning to talk helps
#AI
models be better DnD players!!
Paper:
🧵👇1/3
PSA for PhD applicants: US school offers will start going out very soon. Exploding/short deadlines are NOT a thing; you have until April 15 to make a decision. Top schools will have visit days in March. Go to those, talk to ppl, make an informed choice
WTF! I've heard multiple accounts that "exploding offers" on the 1-2 week timescale are now a regular occurrence in the AI Ph.D. application process.
Not okay.
If you're not in the application cycle and hear about this, speak up! Honestly, I'll help coach people on this negotiation.
Two
#NeurIPS2022
accepted papers!! Bless the ACs!! See y'all in New Orleans and let's chat interaction, language, grounding, and reinforcement learning!!
1/2 HEX-RL: Explainable RL in Natural Language using Knowledge Graphs! Led by the amazing
@beckypeng6
!
🚨Preprint Alert🚨
"Inherently Explainable RL in Natural Language"
The Hierarchically Explainable (HEX) RL agent that thinks out loud to tell us why decisions are made by pointing to the facts in its internal state that most influence its actions.
Paper:
The Dungeon Meowsters are live in Toronto for
#ACL2023
to talk all things:
#DnD
, theory of mind, multi agent grounded dialogue, reinforcement learning, table top games, and more!!
Catch
@peizNLP
at 4 pm today at Session 8!!
📍Introducing an AI Dungeon Master’s Guide🧙♂️, or how to make a
#DnD
DM dialogue agent trained with intents and theory of mind-inspired💭reinforcement learning.
Predicting how your players will react to you ahead of time makes for a better DM!
📃
How is it that a relatively early-stage startup has a 4000-A100 GPU cluster seemingly effortlessly, when the best-funded academic institutes struggle to pay for a small fraction of that?
Following time-honored academic tradition: I'm happy to announce that I have taken the profile pic that will stay on my website until I ascend to full professor (at least).
#AAAI2021
Come chat about C2PO: the causal, commonsense plot ordering storyteller and *how* ppl think about causality using commonsense expectations in stories.
Sat. 2/6 8:45-10:30, 4:45-6:30 PST.
Paper:
Site:
w/ EILab,
@mark_riedl
The correct answer to "what online RL algo should you use" has always been and will always be "whatever you know how to tune the hyperparameters for best"
“You're in an open field to the west of a white house. There's a mailbox here.” First scene of Zork1 materialized thanks to the new
#dalle
by
@jmhessel
. Automated text game -> visual novel pipeline when??
Can confirm,
@YejinChoinka
definitely favors exploration over exploitation and encourages others (me at least for sure) to also "be adventurous and live like a game character"!! A very well-deserved award!
#UWAllen
@uwnlp
's
@YejinChoinka
aims to develop
#AI
with the ability to reason and communicate about the world in physical and abstract terms, like humans can do. As a 2022
#MacFellow
, she looks forward to taking the “adventurous route” in her research:
Embodied agents and pixelated sheep 🐑 To be presented at
#ICML2023
next Thursday 27th in Hawaii by
@kolbytn
and me!
Come chat with us about language grounding, embodied AI, world models, RL+NLP and more!!
Our new work on language agents that augment their action space with symbolic modules!
Basically, don't teach your LM to be a calculator when it can just use an existing one instead. A step towards Neurosymbolic LM tool use for math, navigation, and more!
Transformers are robust reasoners, but frustratingly lack the ability for accurate math, navigation, & other easily coded tasks. In our new work "Behavior Cloned Transformers are Neurosymbolic Reasoners", we show you can have the best of both worlds. 1/3
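(The flavor of the idea in a toy dispatcher; the "calc:" action grammar below is made up for illustration, not the paper's format.)

```python
import re

def route_action(action: str) -> str:
    """Send an LM-emitted symbolic action to an exact module instead of
    letting the LM do the arithmetic itself.

    Toy grammar: "calc: <expr>" goes to Python's evaluator; anything else
    is passed through as plain text.
    """
    m = re.fullmatch(r"calc:\s*([\d\s+\-*/().]+)", action)
    if m:
        return str(eval(m.group(1)))  # exact, unlike LM "mental math"
    return action

print(route_action("calc: 12345 * 678"))  # -> 8369910
```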
Join us! I'm looking for interns next summer
@allen_ai
to work on RL for NLP that learns from human feedback, and on grounding language in envs like text games, Minecraft, NetHack, etc. for open-ended RL agents
📢📢
Looking for a Summer 2023 research internship? Apply to the Mosaic team
@allen_ai
!!
📢📢
topics include: commonsense, language generation, vision+language, RL, + more!
Applications due Nov 13th!
So you burned a lot of money and trained a really good RLHF model for your existing users' preferences. Now a new user comes along with very different preferences. How do you scale effectively to new RLHF use cases without wasting everything? New paper!
🎯 Tired of one-size-fits-all AI chatter? ChatGPT tends to generate verbose & overly informative responses. This is because the current RLHF pipeline only allows aligning LLMs to the general preferences of the population. However, in the real world, people may have multiple,…
Evaling LLMs is hard but the interesting thing is that AI/ML people seem weirdly determined to get rid of humans entirely in the eval and RLHF processes. Pls ground your metrics to something real pls thanks
I'm mostly just worried that they're the only type who will be able to submit to this. Esp cause "high school paper" will definitely get used as a metric
Also, seriously, let the kids go touch sand, plenty of time to be in a lab later
The Worldformer will now appear at
#NeurIPS2021
in the main track, alongside the JerichoWorld benchmark in the benchmarks track. Get ready for a NeurIPS where I talk about world models not once but twice!!! 🎉🎉
Wformer:
Benchmark:
Parte the second thread, as promised:
Here's the Worldformer: a sota text game world model that multi-task learns to generate all possible lang actions and the *difference* between world states as a knowledge graph, using it to learn env dynamics!
Paper:
Pretty funny when you get a paper review saying "method won't be of practical value" when it's been deployed in production serving millions in industry for a couple months already 🤡
(Most) Academic Labs are sleeping on selling lab merch to keep themselves funded. The PEARLS Lab is not! (Innovating funding schemes cause making merch is fun+easy!! And NSF is... not)
🌐
Now in
#ACL2023
!! Look forward to
@peizNLP
's presentation! See y'all in Toronto and let's chat
#DnD
dialogue, theory of mind, and all things interactive NLP!!
Camera ready soon!
@HeinrichKuttler
Sorry for the mistake. We recognize the issue and are indeed pushing a v1.1 to fix it. Originally, the errors in the reference answers were left in intentionally, as we wanted to demonstrate the limitations of the GPT-4 judge in the paper. However, since MT-Bench has become widely used, those…
Yes! Language as an interface!! Conversational information search works best when LMs are grounded in an underlying info source. See our recent TACL paper led by
@zeqiuwu1
for more on this idea
I'm pretty optimistic that the LLM reliability / factualness issue can be fixed. The key is to use LLMs as a dialog interface and not as a store of knowledge. LLMs as the query layer between a human user and a knowledge graph with sources (which can be hybrid generated/curated).
Academics will get fully priced out of Big Model SOTA research v soon. Even fine-tuning won't be possible for the best OSS models. The best-funded unis have 1000 times less compute than Big Tech
@timo_schick
@LukeZettlemoyer
Very interesting work! You might be interested in our very related work on LMs that use tools in interactive settings, to be presented at EACL this year.
This is fascinating but also worrying for deep RL in general in some ways. If agents can be observation permutation invariant, what can we even really claim about how they learn env dynamics/semantics?
The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
We explore RL agents that still work even when their observations get shuffled around a lot!
A fun paper w/
@yujin_tang
web
pdf