PhDone!!!! 👨🎓
08/2019-04/2024 What a journey 🥳🚞
I especially feel lucky to share this once-in-a-lifetime moment with people I love ❤️ . And to see my passion-driven research efforts being acknowledged by researchers I deeply admire 🌞!! Special thanks to my awesome committee
Can LLMs translate reasoning into decision-making insights?
Bad news: NO! Without any help, LLMs' "thinking" doesn't really translate into "doing".
Good news: A little bit of structure goes FaR!
We present Foresee and Reflect (FaR), a 0-shot reasoning mechanism that boosts GPT-4 from 50% to 71% on our new T4D benchmark.
There is an old Chinese saying: 千人千面 (Thousands of people and thousands of faces). Just as each person is unique, each problem is unique. How do we prepare LLMs to solve complex unseen problems through reasoning?
Our (
@USCViterbi
,
@GoogleDeepMind
) new paper: Self-Discover: Large Language Models Self-Compose Reasoning Structures
📍Introducing an AI Dungeon Master’s Guide🧙♂️, or how to make a
#DnD
DM dialogue agent trained with intents and theory of mind-inspired💭reinforcement learning.
Predicting how your players will react to you ahead of time makes for a better DM!
📃
Three (co-)first-author
#EMNLP2021
papers after multiple rounds of rejections and iterations. Feeling extremely glad and relieved that not giving up actually pays off in the end. Kudos to those whose work got rejected and who are fighting to try again, the day will come!! 🎈
Now that the official letter has arrived, I'm thrilled to say that I'll be interning at
@allen_ai
@ai2_mosaic
team this summer in Seattle!! It has always been my dream to work on common sense research at Mosaic😊. Also equally excited to meet new and old friends in the Seattle area✈️
I’m at
#NeurIPS23
and on the job market🎷🧳!! Come and talk about anything LLM reasoning, evaluating communicating agents, human-AI collaboration for new discoveries, coffee and jazz in NOLA☕️
‼️New Paper‼️ Yeah, knowledge-grounded response generation models are 🆒, but have you tried using a single model to externalize the implicit common sense *and then* produce responses? We propose a “Think🤔-Before-Speak🗣️” self-talk model that generates better responses!🧵 [1/7]
Now in
#ACL2023
!! Look forward to
@peizNLP
's presentation! See y'all in Toronto and let's chat
#DnD
dialogue, theory of mind, and all things interactive NLP!!
Camera ready soon!
🚨 Can response generation models read between the lines? Our 🆕
#EMNLP2021
paper probes whether RG models can identify commonsense reasons: we annotate CS explanations in dialogues and evaluate RG models' CS reasoning capabilities.
🎯Want your dialogue model to generate impressive responses like
#ChatGPT
🤯but don’t have the compute like
#OpenAI
? Try our new data Reflect💡accepted to
#EMNLP2022
that helps models generate 30% more quality responses! Our secret is annotating common ground between speakers 🧵
Excited to introduce 𝙍𝙄𝘾𝘼, a logically-grounded challenge to probe LMs' ability to make robust commonsense inferences despite textual perturbations. 🧵
Preprint:
Project Page:
Is the quality of *ACL reviews inversely proportional to the quality of LLMs? 🤷♂️
1 review I received does not have any reasons to accept/reject and 1 review's reason to reject basically only spells out "ChatGPT" 🤡
The Dungeon Meowsters are live in Toronto for
#ACL2023
to talk all things:
#DnD
, theory of mind, multi agent grounded dialogue, reinforcement learning, table top games, and more!!
Catch
@peizNLP
at 4 pm today at Session 8!!
I'm also at ACL now! Already met a bunch of familiar faces on my way from the airport to the venue!
Excited to chat about theory-of-mind, communicating agents, NLP+games, life and anything!!
#ACL2023
Pic taken on flight🌅 🍁
💬Excited to finally release my last summer's intern project that was accepted at
@sigdial
2021! We align dialogs with ConceptNet triples and with newly crowdsourced ones prompted from SocialIQA. Models trained on these dialogs produce better responses!
Link:
Excited to introduce 𝙍𝙄𝘾𝘼, a logically-grounded challenge to probe LMs' ability to make robust commonsense inferences despite textual perturbations. 🧵
Preprint:
Project Page:
Super excited that this simple idea to externalize knowledge-grounding in RG seems to be working😃! Project from my (second-time!) intern
@AmazonScience
Alexa AI. Huge thanks to my co-authors
@AmazonScience
@nlp_USC
@USC_ISI
!!
Paper preview: 🧵 [7/7]
USC NLP (
@nlp_usc
) is like 5x its size at the last in-person conference (Hong Kong, 2019) 👀
Can’t say enough about how amazing this
#NAACL2022
conference is for our students. So much needed 😭🙌
I live 15 min away and was planning on attending the same event tomorrow. Can't put into words how sad and terrified I am to hear about this tragedy happening on Lunar New Year's Eve. Hope we all have the strength to get through this and fight for some changes in 2023.
🚩Exciting foundational work towards building an ever-improving interactive agent:
- Identifying user intents/tasks and
- Inferring user satisfaction
Both, extracted from unstructured chat logs, are necessary first steps toward self-improvement.
Esp. found this fig of how ppl are actually
Learning from interaction is different from learning from annotations. Today we are excited to share how we are starting to learn from people's interactions to understand and improve Copilot (web) for our consumer customers:
#Microsoft
#Copilot
#Bing
Happy to announce that our paper on incorporating commonsense KG in LMs for social reasoning () has received The Best Paper Award at EMNLP-Deep Learning Inside Out (DeeLIO) Workshop! ✌️
@emnlp2020
#emnlp2020
#deelioEMNLP
Self-Discover is a fascinating new algorithm from researchers at Google DeepMind and USC that searches for "atomic reasoning modules"
One of the quickest ways to improve your LLM programs is to add Chain-of-Thought, "Let's think step by step ...", which is one such atomic reasoning module
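To make the idea concrete, here is a minimal sketch of the two prompting primitives in play: the generic zero-shot CoT trigger and a task-specific reasoning structure in the Self-Discover spirit. `call_llm` is a hypothetical placeholder for whatever completion API you use.

```python
# Minimal sketch of the two prompting primitives, assuming a hypothetical
# `call_llm` helper for whatever completion API you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API of choice")

def chain_of_thought(question: str) -> str:
    # The classic zero-shot CoT trigger: ask the model to reason step by step.
    return call_llm(f"{question}\n\nLet's think step by step.")

def with_reasoning_structure(question: str, structure: str) -> str:
    # Self-Discover-style: prepend a task-specific reasoning structure
    # (discovered once per task) instead of the generic CoT trigger.
    return call_llm(f"Follow this reasoning structure:\n{structure}\n\nTask: {question}")
```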
Attending
@IC2S2
in Amsterdam! Will present our work on automatically detecting politically-polarized words from online discussions (w/ Yupeng Gu,
@YizhouSun
, and
@masonporter
) on July 20!!
Exciting work led by
@aman_madaan
on mixing and matching LMs to get performance boost with reasonable costs!
Meta-verifiers help consolidate verification results and make better decisions on model routing😎
Language model APIs now come in all shapes and sizes (
@OpenAI
,
@AnthropicAI
,
@togethercompute
), with prices varying by up to 50x (Ada < Llama7b < Chatgpt < GPT-4). It makes sense to mix and match them, using smaller models for simpler queries and saving the $$ for the more complex ones
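A minimal sketch of the mix-and-match idea in the spirit of AutoMix: answer with a cheap model first and escalate to a larger one only when a (meta-)verifier is not confident. `small_lm`, `large_lm`, and `verifier_confidence` are hypothetical stand-ins, not the paper's actual components.

```python
# Rough sketch of cost-aware routing in the spirit of AutoMix; the three
# helpers below are hypothetical stand-ins, not the paper's components.

def small_lm(query: str) -> str:
    return "cheap draft answer to: " + query    # stand-in for a small, cheap model

def large_lm(query: str) -> str:
    return "expensive answer to: " + query      # stand-in for a large, costly model

def verifier_confidence(query: str, answer: str) -> float:
    return 0.5                                  # stand-in; a real (meta-)verifier scores the answer

def route(query: str, threshold: float = 0.8) -> str:
    answer = small_lm(query)                    # 1) answer cheaply first
    if verifier_confidence(query, answer) >= threshold:
        return answer                           # 2) keep it if the verifier is confident
    return large_lm(query)                      # 3) otherwise escalate to the big model
```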
At
#EMNLP2021
and interested in how to create logically-equivalent🧑🔬 but linguistically-varied probing sets and how big LMs 🤖 perform? Come to our 7D Oral session today 12:45-14:15 PST/16:45-18:15 AST where I present 𝙍𝙄𝘾𝘼!
@nlp_usc
Excited to introduce 𝙍𝙄𝘾𝘼, a logically-grounded challenge to probe LMs' ability to make robust commonsense inferences despite textual perturbations. 🧵
Preprint:
Project Page:
🥳Check out our theory of mind workshop at
#ICML2023
Lots of new discussions on whether LLM displays some extent of theory of mind.
Want to share your thoughts/hear more of how diff fields view ToM? Come to our workshop in July in Honolulu 🌺
1. 🔔**𝘾𝙖𝙡𝙡 𝙛𝙤𝙧 𝙋𝙖𝙥𝙚𝙧𝙨 𝙛𝙤𝙧 𝙏𝙝𝙚𝙤𝙧𝙮-𝙤𝙛-𝙈𝙞𝙣𝙙 𝙒𝙤𝙧𝙠𝙨𝙝𝙤𝙥**🔔
The First Workshop on Theory of Mind in Communicating Agents (ToM 2023) will be hosted at
@icmlconf
in July'23 in Honolulu 🌺
CfP:
🧵
#ICML2023
#ToM2023
#ML
#NLProc
Remember back in 2020 (wow, a long time ago) I got nervous receiving long and detailed reviews, but learned a lot about how to do better research. Now when it's release day, I just feel numb seeing generic and short reviews with no motivation to engage in deep scientific discussion
- wonder why we chose
#DnD
for goal-driven grounded dialogs?
- how do we use RL to model a theory-of-mind-inspired lookahead module?
- how good is
#GPT4
as a dungeon master?
- where did I buy my dungeon meowster shirt?!
Come by our
#ACL2023
poster today at *4:15pm*🐲😼
Now in
#ACL2023
!! Look forward to
@peizNLP
's presentation! See y'all in Toronto and let's chat
#DnD
dialogue, theory of mind, and all things interactive NLP!!
Camera ready soon!
Really enjoyed the keynote from
@AlisonGopnik
Esp. on how
#cogsci
and
#devpsych
can provide insights for
#LLM
research and more rigorous exp designs for cog abilities such as
#ToM
Very excited about designing and exploring proper evaluation along these lines, stay tuned 👀
... and finally also notifications for student research workshop have been sent out. Congratulations to the smart students who will present their work at
#acl2019nlp
!
🙋Catch us at the Ethics session for our paper on biases in commonsense knowledge bases tmr 8:30-10:30 AM PST/12:30-2:30 PM AST. Can't wait to chat with old and new friends!!!!
@nlp_usc
We will be presenting this work tomorrow with
@peizNLP
at the virtual poster session II: Ethics and NLP (8:30 PT; 12:30 AST). Come say hi if you are around 😊
#EMNLP2021
Maybe the days when I'm constantly worried about renewing my visa and afraid of not being able to continue my studies once I leave the country have finally passed.
#GodBlessAmerica
Same for international students, especially PhDs. Many simply cannot risk their early academic career to go back home and see loved ones. My grandparents’ health conditions are worsening and not being able to go back breaks my heart every time I video chat with them.
It sucks not being able to visit your family
Just ask immigrant scientists who haven't seen their family in years because of visa issues & costs, and the very real possibility they would not be allowed back in the country
It's time to fix the US visa system
Sunday morning read: Self-Discover: Large Language Models Self-Compose Reasoning Structures by
@peizNLP
et al. ☕️
When building LLM applications, developers typically break down complex tasks into sub-tasks. What if we use an LLM to "self-discover" how to break down the tasks?
If you want a respite from OpenAI drama, how about joining academia?
I'm starting Conceptualization Lab, recruiting PhDs & Postdocs!
We need new abstractions to understand LLMs. Conceptualization is the act of building abstractions to see something new.
Self-Discover consists of 2 stages:
1) Discovery stage where we use meta-prompts to guide LLMs to generate a structure given a few task examples (without labels!)
2) Solving stage where we simply append the structure to each task instance. (Full prompts in the paper; a rough sketch of both stages is below.)
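As referenced above, a minimal sketch of the two stages, assuming a generic `call_llm` helper and a toy seed set of reasoning modules (the paper's meta-prompts and module set differ):

```python
# Minimal sketch of the two Self-Discover stages; `call_llm` and the tiny
# module list below are illustrative placeholders, not the paper's prompts.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API of choice")

REASONING_MODULES = [
    "Break the problem down into sub-problems",
    "Use critical thinking to verify each step",
    "Reason about the problem step by step",
]

def discover_structure(task_examples: list[str]) -> str:
    # Stage 1 (once per task, no labels): meta-prompt the LLM to select, adapt,
    # and compose reasoning modules into a structure for this kind of task.
    meta_prompt = (
        "Task examples:\n" + "\n".join(task_examples) + "\n\n"
        "Select and adapt the relevant reasoning modules below, then compose them "
        "into a step-by-step reasoning structure for solving this kind of task:\n"
        + "\n".join(REASONING_MODULES)
    )
    return call_llm(meta_prompt)

def solve(instance: str, structure: str) -> str:
    # Stage 2 (per instance): simply append the discovered structure to the instance.
    return call_llm(f"Follow this reasoning structure:\n{structure}\n\nTask: {instance}")
```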
@deliprao
@allen_ai
Thanks for sharing! The paper for this dataset () was actually released in late 2019. Maybe more old (🤔) test datasets/ideas should be revisited to evaluate LLMs!
How FaR Are Large Language Models From Agents with Theory-of-Mind?
paper page:
"Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those
We prepared some fun examples of self-discovered structures on different tasks. We compared Self-Discover with CoT/Plan-Solve and human-written structures, check em out here!
New Preprint Alert! 📢
Classical decision theory has helped humans make rational decisions under uncertainty for decades. Can it do the same for Large Language Models?
We present DeLLMa (“dilemma”), a Decision-making LLM assistant.
🔗
1/🧵
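For readers unfamiliar with the classical machinery DeLLMa draws on, here is the textbook expected-utility calculation it builds on (a toy example with made-up numbers, not the paper's actual pipeline):

```python
# Textbook expected-utility maximization, the classical decision-theory idea
# DeLLMa builds on (toy example with made-up numbers, not the paper's pipeline).

def expected_utility(action, states, probs, utility):
    # probs[s]: believed probability of state s; utility(a, s): value of outcome (a, s).
    return sum(probs[s] * utility(action, s) for s in states)

def best_action(actions, states, probs, utility):
    # Pick the action with the highest probability-weighted utility.
    return max(actions, key=lambda a: expected_utility(a, states, probs, utility))

states = ["rain", "sun"]
probs = {"rain": 0.3, "sun": 0.7}
payoff = {("umbrella", "rain"): 5, ("umbrella", "sun"): 2,
          ("no_umbrella", "rain"): -5, ("no_umbrella", "sun"): 4}
utility = lambda a, s: payoff[(a, s)]

print(best_action(["umbrella", "no_umbrella"], states, probs, utility))  # -> umbrella (EU 2.9 vs 1.3)
```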
The PEARLS Lab at
@UCSD_CSE
is now open for business! I'm recruiting Fall 24 PhD students in all things interactive and grounded AI, RL, and NLP!! Join us in the land of 🏖️ beach (🧋pearl tea included). Apply by Dec 20. Please help spread the word!
More:
Soon™, I'll be an Asst Prof
@UCSanDiego
@UCSD_CSE
focusing on interactive & grounded AI, RL, NLP
I will also be a research scientist
@MosaicML
helping lead efforts to make tech like RLHF more accessible
Looking for PhD students & research eng/scientists to join me in ☀️SoCal🏖️
Further analysis shows the discovered structures display some extent of "universality" and are transferable across multiple large/small LMs. They retain more performance than prompt optimization when transferred to different LMs.
This effectively alleviates model-switching costs
Google presents Self-Discover
The proposed method improves LLM performance on reasoning tasks
It outperforms Self-Consistency by more than 20%, while requiring 10-40x less inference compute
Looking forward to discussing our recent work on using inference-time compute for effective reasoning at
#NeurIPS2023
!
🗓️ Self-Refine: Iterative Refinement with Self-Feedback, Wed 13 Dec 5 p.m., Great Hall & Hall B1+B2 (level 1) Poster
#324
🗓️ AutoMix:
We test on 25 challenging reasoning tasks including BigBench-Hard (23 sub-tasks), Thinking for Doing (T4D), and MATH, and the improvements are pretty consistent!
We also find that Self-Discover helps most on complex reasoning tasks requiring world knowledge.
📍Introducing an AI Dungeon Master’s Guide🧙♂️, or how to make a
#DnD
DM dialogue agent trained with intents and theory of mind-inspired💭reinforcement learning.
Predicting how your players will react to you ahead of time makes for a better DM!
📃
Why is acting hard?
We analyze 3 oracle settings with provided inferences and find that whenever LLMs are given hints on what to reason about, they choose actions much better!
Bottleneck → LLMs struggle to identify implicit inferences by themselves.
@rikvannoord
Would cross-lingual analogies (to test bilingual word embeddings) make more sense? Like A:B :: C:D where A, B are in one language and C, D in another. In this way D naturally cannot be either A or B.
@Leox1v95
Super cool, thanks for sharing!! We also have a new paper on a similar idea of externalizing implicit commonsense knowledge in response generation; would love to connect and potentially discuss this interesting direction sometime😃
Even from my brief interaction with Mohamed, I was already so impressed by his insights and learned a ton. Working with him in Nairobi would be a great delight!!
We research, build, and ship some of the most exciting and challenging products in the world, with passion, from Nairobi. Come join us in building M365 Copilot.
We ❤️ engineers and scientists that ❤️ breaking and building things!
@MasakhaneNLP
@DeepIndaba
Since our meta-prompts operate at the task level, Self-Discover is also very efficient compared to inference-intensive methods like Self-Consistency. Here we find Self-Discover can even outperform methods requiring 40x more calls per instance!
We propose a new eval paradigm: Thinking for Doing (T4D) to probe whether LLMs can act based on inferences abt others’ mental states (theory-of-mind).
We convert an inference-probing benchmark ToMi to action-probing T4D and LLMs drop significantly!
We design Foresee and Reflect (FaR) to guide models to first predict potential future events and then reflect on what actions to perform.
FaR-structured reasoning outperforms chain-of-thought and other baselines, boosting GPT-4 from 50% to 71%.
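A rough two-step prompt scaffold in the spirit of Foresee-and-Reflect; the paper's exact prompt wording differs, and `call_llm` is a hypothetical placeholder:

```python
# Rough two-step prompt scaffold in the spirit of Foresee-and-Reflect (FaR);
# the paper's exact prompts differ, and `call_llm` is a hypothetical placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API of choice")

def foresee_and_reflect(observations: str, candidate_actions: list[str]) -> str:
    # Step 1 (Foresee): predict likely future events and what each character may need.
    foresight = call_llm(
        f"Observations:\n{observations}\n\n"
        "Predict what is likely to happen next and what each character might need, "
        "given their likely beliefs."
    )
    # Step 2 (Reflect): pick the action that best addresses the foreseen needs.
    return call_llm(
        f"Observations:\n{observations}\n\nForesight:\n{foresight}\n\n"
        "Candidate actions:\n" + "\n".join(candidate_actions) + "\n\n"
        "Reflect on the foresight above and choose the single best action."
    )
```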
It's easy to reject papers. You can manufacture issues and sink them based on amorphous/vague notions such as novelty, impact, clarity, etc.
Every paper has minor problems that you can amplify. It takes a lot more courage to argue to accept something.
Google Deepmind presents Self-Discover
Large Language Models Self-Compose Reasoning Structures
paper page:
SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH
The secret to aligning LMs to human preferences is reinforcement learning. But Why&How is it used? Announcing
💻RL4LMs: library to train any
@huggingface
LM w/ RL
👾GRUE: benchmark of 6 NLP tasks+rewards
📈NLPO: new RL alg 4 LMs
🌐
Have you ever wished for a large-scale public dialog dataset with quality? We'd like to tell you that your wish has finally come true🎄 To quench your thirst, we give you SODA🥤, the first MILLION-scale HIGH-quality dataset with RICH social interaction✨ 🧵
Check out our poster at
#EMNLP2022
!! I will not be there in person but the amazing
@HJCH0
will present on *December 10th at 9AM* local time at the *Atrium* !
RICA is the first project in my PhD and it has been a long (sometimes exhausting) and fun journey🏃. I'm extremely grateful for all the help along the way, especially my amazing co-authors
@ark_kade
, Seyeon Lee,
@billyuchenlin
, Danial Ho,
@jay_mlr
, and
@xiangrenNLP
!!
@HarrySurden
@Swarooprm7
@_akhaliq
Thanks! It's a great question. Here are more detailed examples of how each stage in Self-Discover works, based on the selected modules for a movie recommendation task. We plan to include a full example in the appendix next!
We compare self-talk RG models against knowledge-grounded RG models that take in *ground-truth* knowledge (selected by word overlap with the reference response) and still find that our models produce better responses! Noisy knowledge input falls short, as expected. 🧵 [6/7]
🪄We propose a theory-of-mind-inspired methodology for training a model to generate guidance for students with RL, where a DM with intent:
1) learns to predict how the players will react;
2) uses this prediction as reward/feedback on how effective these utterances are at guiding (rough sketch below).
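As noted above, a conceptual sketch of that reward signal (not the paper's actual implementation): the DM proposes a guiding utterance, a learned player model predicts the players' reaction, and agreement with the intended action is rewarded.

```python
# Conceptual sketch of the reward idea (not the paper's actual code): a learned
# player model predicts how players would react to the DM's utterance, and
# agreement with the DM's intended action becomes the RL reward.

def guidance_reward(dm_utterance, intended_action, player_model):
    predicted_reaction = player_model.predict(dm_utterance)   # anticipated player action
    return 1.0 if predicted_reaction == intended_action else 0.0

# During training, this scalar reward would be fed back (e.g., via policy gradient)
# to update the DM's utterance-generation policy.
```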
Ablations show both F and R matter and noisy foresight hurts (how can models recover?).
Crucially, we find FaR generalizes to diverse contexts including story structures and task domains.
Meet Gandalf🧙♂️: new task on generating guidance in goal-driven & grounded communication. We aim to model a teacher (DM) who attempts to guide a set of students (players) toward performing certain actions grounded in a shared world.
Compared to existing dialog datasets, Reflect💡1) contains explicit human annotations of 5 inference types for each response and 2) provides 15 different plausible responses for each dialogue context (9k in total). We argue this more naturally mimics human communication.
To better analyze this worrisome phenomenon, we define *representational harms* in a set of statements targeting various social groups and use two approximations, sentiment and regard (from
@ewsheng
) to quantify the harms: non-neutral views such as prejudice and favoritism.
Human eval results comparing against traditional RG show that our model produces more informative and less generic responses. Using soft-matching to align knowledge and using information-seeking question-answer pairs as the knowledge format helps produce even better ones. 🧵 [5/7]
We formulate abstract commonsense using first-order logic, use unseen strings to avoid factual recalling, and apply perturbations to form a logically-equivalent statement set. Then we probe LMs in two settings that directly examine the model’s predictions (non-parametric probes).
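A minimal sketch of what such a non-parametric probe could look like: each axiom is instantiated with novel entity names, perturbed into logically equivalent variants, and the LM only gets credit if it prefers the correct completion on every variant. The example statements and the `score` helper are illustrative placeholders, not RICA's actual data or code.

```python
# Illustrative sketch of the non-parametric probing setup: an axiom is
# instantiated with novel entity names, perturbed into logically equivalent
# variants, and the LM is only credited if it prefers the correct completion
# on every variant. Statements and `score` are placeholders, not RICA's data.

def score(statement: str, option: str) -> float:
    raise NotImplementedError("return the LM's score for filling [MASK] with `option`")

VARIANTS = [
    "A prindag is heavier than a flumo, so a prindag is [MASK] likely to sink.",
    "A flumo is lighter than a prindag, so a prindag is [MASK] likely to sink.",
]

def robustly_correct(variants, correct="more", wrong="less"):
    # Credit only if the correct option wins on ALL logically-equivalent perturbations.
    return all(score(v, correct) > score(v, wrong) for v in variants)
```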
Models produce dull and generic dialogs because existing data, esp. crowdsourced, contains simple and safe responses, as workers want to annotate quickly. We propose a *two-step* process that asks ppl to infer about CG and then write responses based on each inference dimension.
We propose evaluation protocols targeting three dimensions: knowledge quality, knowledge-response connection, and response quality. We find that for 75% of the time, given unseen dialogues, our model produces relevant commonsense knowledge and responses are grounded. 🧵 [4/7]
#DnD
is perfect for this setting since DM intends to guide players to achieve a set of story goals.
Introducing G-Dragon🐉: large-scale D&D data with labeled guidance from online gameplay. We cast dialog generation as a POMDP and train inverse dynamics models to label guidance.
We decompose the RG process to externalize the knowledge grounding step by training RG models to self-talk (inspired by
@VeredShwartz
et al.) so that they explicitly generate the relevant commonsense knowledge and reference it when responding (rough sketch below). 🧵 [2/7]
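As referenced above, a minimal sketch of the think-before-speak decomposition with a single model; `generate` is a hypothetical placeholder for one generation call:

```python
# Minimal sketch of the think-before-speak decomposition with a single model;
# `generate` is a hypothetical placeholder for one generation call.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your seq2seq / LM generation call")

def think_before_speak(dialogue_history: str) -> tuple[str, str]:
    # Step 1 (Think): externalize the implicit commonsense behind the dialogue.
    knowledge = generate(f"Dialogue:\n{dialogue_history}\n\nRelevant commonsense knowledge:")
    # Step 2 (Speak): generate a response explicitly grounded in that knowledge.
    response = generate(f"Dialogue:\n{dialogue_history}\n\nKnowledge:\n{knowledge}\n\nResponse:")
    return knowledge, response
```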
I'm so excited to see what the community can build off of Reflect💡and eager for more common ground-aware conversational models!! Huge thanks to my collaborators
@USC_ISI
📜Paper:
💡Data:
🧭Website: