Khanh Nguyen

@khanhxuannguyen

1,487
Followers
460
Following
70
Media
973
Statuses

Postdoc at CHAI Berkeley with Prof. Stuart Russell, Prev. Postdoc at Princeton NLP, PhD @umdcs , Human-AI Communication, Interactive Learning, NLP.

Joined September 2014
Pinned Tweet
@khanhxuannguyen
Khanh Nguyen
4 months
📢 Excited to announce our new paper Language-guided world models: A model-based approach to AI control • We develop LWMs: world models that can read texts to capture new environment dynamics • These models enable humans to efficiently control agents by providing language
Tweet media one
4
47
200
@khanhxuannguyen
Khanh Nguyen
1 year
The RLHF page of HuggingFace () misses many important citations. Here are some classical RLHF papers that you should cite and why.
4
60
343
@khanhxuannguyen
Khanh Nguyen
3 months
😠It is still ridiculous to me how much money/time was wasted simply because people don't read some old papers. 💡If you want to know why REINFORCE/A2C is better than PPO, read our paper: We have identified all of the common issues for you: -
Tweet media one
Tweet media two
Tweet media three
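To make the contrast concrete, here is a minimal, self-contained PyTorch sketch of REINFORCE with a running-mean baseline on a toy bandit problem. It is purely illustrative (the toy reward and hyperparameters are invented, not from the paper); it only shows how little machinery the estimator needs compared to PPO-style training.

```python
import torch

# Toy policy: a table of logits over a vocabulary of 8 "tokens".
# Hypothetical reward: +1 for emitting token 3, 0 otherwise.
torch.manual_seed(0)
logits = torch.zeros(8, requires_grad=True)       # policy parameters
optimizer = torch.optim.SGD([logits], lr=0.5)
reward_fn = lambda tok: 1.0 if tok.item() == 3 else 0.0

baseline = 0.0                                    # running-mean baseline (variance reduction only)
for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                        # sample from the current policy
    reward = reward_fn(action)
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE / score-function estimator: no learned critic, no clipping.
    loss = -(reward - baseline) * dist.log_prob(action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=-1))              # probability mass concentrates on token 3
```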
7
26
175
@khanhxuannguyen
Khanh Nguyen
10 months
The objective mismatch issue raised in John Schulman's ICML talk was already foreseen by our paper () 6 years ago. Sadly it wasn't cited nearly enough. Biased opinion: our paper deserves more readers and acknowledgement. In fact, not OpenAI's papers, but
1
20
145
@khanhxuannguyen
Khanh Nguyen
1 year
Why does RL-tuning hurt the calibration of LLMs? The RL objective can be written as a reverse KL divergence, which encourages mode-seeking behavior (i.e., a peaky distribution). The RL+translation literature studied this phenomenon a long time ago (, )
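For reference, here is the standard identity behind this claim, written for the common KL-regularized form of the objective (my notation, not taken from the linked papers):

```latex
% KL-regularized RL with reference policy \pi_0, reward r, and temperature \beta.
% Up to a constant in \pi, maximizing the objective equals minimizing a reverse KL
% to the Gibbs distribution \pi^*:
\max_\pi \; \mathbb{E}_{y\sim\pi}[r(y)] - \beta\,\mathrm{KL}(\pi \,\|\, \pi_0)
\;\;\Longleftrightarrow\;\;
\min_\pi \; \mathrm{KL}(\pi \,\|\, \pi^*),
\qquad \pi^*(y) \propto \pi_0(y)\, e^{r(y)/\beta}.
```

Reverse KL is mode-seeking: when the policy cannot represent the target exactly, it prefers to concentrate on a few high-reward modes and drop the rest, which is the peaky, over-confident behavior described above.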
4
11
91
@khanhxuannguyen
Khanh Nguyen
1 year
Working on calibration/uncertainty for LLMs, which papers should I cite? Guo et al. () is pretty popular but it is about classification tasks. Calibration on sequences comes with distinct challenges.
3
14
92
@khanhxuannguyen
Khanh Nguyen
2 years
I have finally graduated, thanks to tremendous support from my research mentors ( @haldaume3 @brendan642 @debadeepta , Dipendra Misra, and others), my family, and friends. I will be a postdoc @princeton_nlp and later @CHAI_Berkeley . Looking for opportunities to give talks :P
14
5
83
@khanhxuannguyen
Khanh Nguyen
7 months
📢Learning is all about communication. And who is the master of communication? Humans! 😯Our new paper enables AI agents to learn more like humans. 🔥Our agents define and share increasingly abstract intentions over time, and as a result, learn with progressive efficiency.
Tweet media one
2
15
72
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
@khanhxuannguyen
Khanh Nguyen
5 years
Our HANNA paper on Visual Navigation with Natural Multimodal Assistance has been accepted to #emnlp2019 . New task/dataset/model/learning algorithm for leveraging vision-and-language human assistance in object-finding tasks in photo-realistic environments! (with @haldaume3 )
Tweet media one
2
14
65
@khanhxuannguyen
Khanh Nguyen
9 months
After a wonderfullll year at Princeton, I am excited to join CHAI Berkeley, working with Prof. Russell and Prof. Dragan to continue my effort to make AI communicate more effectively with humans. Connect with me if you are interested in learning from language feedback, learning to ask
10
2
62
@khanhxuannguyen
Khanh Nguyen
7 months
🚀 Dive into the untold story of Alignment via Human Feedback from an NLP perspective! This paper brilliantly encapsulates the epoch often overlooked in surveys written by RL groups. An absolute must-read for newcomers in the field! 📚
0
22
59
@khanhxuannguyen
Khanh Nguyen
3 years
Do AI agents know what they want? Can they ask specific questions that faithfully reflect their intrinsic needs? We develop a general decision-making framework for simultaneously learning 𝙬𝙝𝙚𝙣 𝙖𝙣𝙙 𝙬𝙝𝙖𝙩 𝙩𝙤 𝙖𝙨𝙠 (w/ @ybisk @haldaume3 )
Tweet media one
3
10
58
@khanhxuannguyen
Khanh Nguyen
3 months
This great work confirms my intuition: people have rediscovered problems of RLHF that were observed and documented many years ago when the method was first tried on machine translation. The finding in this paper is similar to . People, especially
@billyuchenlin
Bill Yuchen Lin 🤖
3 months
"Less (tuning) is more for alignment" is an intriguing hypothesis. Is alignment tuning really that “superficial”⁉️ 🤔 If so, how so? 🤔 Can any straightforward analysis explain this? 🤔 What if I tell you “no tuning can also be great for alignment”? 🫢 😉 If you’re interested in
Tweet media one
10
66
325
2
8
55
@khanhxuannguyen
Khanh Nguyen
7 years
Our code for Reinforcement Learning + Neural Machine Translation: @nlproc
@khanhxuannguyen
Khanh Nguyen
7 years
Our #emnlp2017 work improves machine translation with simulated ratings using reinforcement learning #nlproc
Tweet media one
2
8
43
0
11
50
@khanhxuannguyen
Khanh Nguyen
3 years
Maybe it's time to move beyond rewards and start 𝘁𝗮𝗹𝗸𝗶𝗻𝗴 properly to our ML agents! Our ILIAD #ICML2021 paper formulates a learning framework where natural language is the only communication medium used by the teacher. Blog:
1
20
51
@khanhxuannguyen
Khanh Nguyen
5 years
Happy to introduce 𝗚𝗹𝗼𝗯𝗮𝗹 𝗩𝗼𝗶𝗰𝗲𝘀, an evaluation dataset for multilingual and cross-lingual summarization in 15 languages (w. @haldaume3 ). New materials for studying translation quality in downstream tasks, zero-shot learning, etc. #NLProc #summarization #multilingual
Tweet media one
1
10
49
@khanhxuannguyen
Khanh Nguyen
1 year
Passing false-belief tests = model HAS theory of mind Passing false-belief tests ≠ model USES theory of mind to perform tasks Our #ACL2023 paper: formulates 𝑻𝒂𝒔𝒌-𝑶𝒓𝒊𝒆𝒏𝒕𝒆𝒅 cognitive capabilities, which are used to perform tasks.
Tweet media one
1
11
45
@khanhxuannguyen
Khanh Nguyen
7 years
Our #emnlp2017 work improves machine translation with simulated ratings using reinforcement learning #nlproc
Tweet media one
2
8
43
@khanhxuannguyen
Khanh Nguyen
10 months
Very delighted to receive an Outstanding paper award at @tom_icml2023 . It is a great honor to be acknowledged by experts in the domain you have just recently ventured into :)
@khanhxuannguyen
Khanh Nguyen
1 year
Passing false-belief tests = model HAS theory of mind Passing false-belief tests ≠ model USES theory of mind to perform tasks Our #ACL2023 paper: formulates 𝑻𝒂𝒔𝒌-𝑶𝒓𝒊𝒆𝒏𝒕𝒆𝒅 cognitive capabilities, which are used to perform tasks.
Tweet media one
1
11
45
4
7
40
@khanhxuannguyen
Khanh Nguyen
5 years
HANNA: Visual Navigation with Multimodal Natural Assistance is online Our agent finds objects in photo-realistic environments by learning to query simulated humans for instructions. Paper: Github:
Tweet media one
1
11
40
@khanhxuannguyen
Khanh Nguyen
2 years
Accepted at #icml2022 :)
@khanhxuannguyen
Khanh Nguyen
3 years
Do AI agents know what they want? Can they ask specific questions that faithfully reflect their intrinsic needs? We develop a general decision-making framework for simultaneously learning 𝙬𝙝𝙚𝙣 𝙖𝙣𝙙 𝙬𝙝𝙖𝙩 𝙩𝙤 𝙖𝙨𝙠 (w/ @ybisk @haldaume3 )
Tweet media one
3
10
58
0
7
37
@khanhxuannguyen
Khanh Nguyen
3 months
I woke up to this wonderful paper!!! @KreutzerJulia (the RLHF veteran) and @CohereForAI have done it! They show REINFORCE beats PPO convincingly and propose a better version. Only those who understand the past can shape the future.
@aahmadian_
Arash Ahmadian
3 months
PPO has been cemented as the defacto RL algorithm for RLHF. But… is this reputation + complexity merited?🤔 Our new work revisits PPO from first principles🔎 📜 w @chriscremer_   @mgalle   @mziizm @KreutzerJulia Olivier Pietquin @ahmetustun89 @sarahookr
Tweet media one
13
96
474
1
5
36
@khanhxuannguyen
Khanh Nguyen
11 months
I wrote a thought piece showing RLHF = variational inference on a Bayesian cognitive model (generalized RSA). I hope that realizing this connection can help us better understand recent developments on LLMs and inspire future research.
Tweet media one
0
6
30
@khanhxuannguyen
Khanh Nguyen
11 months
Nice work! Remember that the SOTA LLMs do not implement SOTA learning algorithms. Imitation learning was less popular because of the expert query cost. But that cost is now much cheaper with LLMs as experts. Much cool IL work from the past can now find its way into real-world
@xkianteb
Kianté Brantley
11 months
New paper! Learning to Generate Better Than Your LLM () RLHF has become a powerful paradigm for fine-tuning LLM, but we only use general-purpose RL algorithms. We introduce new algorithmic paradigm that takes advantage of additional feedback for learning.
1
74
372
0
4
31
@khanhxuannguyen
Khanh Nguyen
7 months
When a language model guides a human, giving false instructions can frustrate them or even put them in danger. We propose a cost-effective method for detecting hallucinations in navigation instructions. More about our #EMNLP2023 findings paper⬇️ (1/n)
Tweet media one
2
5
27
@khanhxuannguyen
Khanh Nguyen
3 months
My opinion on SORA as a world model (ignore this post if you think of it as just a video-editing tool): - Generating high-resolution, realistic outputs makes it hard to use SORA as a planner. We should have more work on planning with abstract representations of the world (e.g.,
@khanhxuannguyen
Khanh Nguyen
4 months
As humans, we influence others' worldview to shape their behavior. A kid asks his mom if he can go swimming at a nearby lake. The mom says: "there was a drowning accident over there last year." After listening to that, the kid chooses to stay home. Here, instead of giving an
1
0
5
4
2
27
@khanhxuannguyen
Khanh Nguyen
7 months
📢Internship at CHAI Berkeley. Apply by Nov 13. Opportunity to work with a group of leading experts in AI safety. I am particularly looking for students who are interested in learning from language feedback, and learning to ask questions.
0
4
24
@khanhxuannguyen
Khanh Nguyen
3 months
Do language-to-world models like OpenAI SORA excite you? We are excited too! In this recent paper, we lay out a vision for this type of model. More than just video-creation tools, they will enable humans to collaborate with AI safely and control it easily. The code has been released. Check it out!
@khanhxuannguyen
Khanh Nguyen
4 months
📢 Excited to announce our new paper Language-guided world models: A model-based approach to AI control • We develop LWMs: world models that can read texts to capture new environment dynamics • These models enable humans to efficiently control agents by providing language
Tweet media one
4
47
200
0
2
22
@khanhxuannguyen
Khanh Nguyen
5 years
Watch our agent, HANNA, find objects by asking for directions along the way. Full demo on Youtube: Paper: Github:
0
7
19
@khanhxuannguyen
Khanh Nguyen
6 years
Tweet media one
1
2
18
@khanhxuannguyen
Khanh Nguyen
3 years
The hardest paper I have ever been a part of, in terms of argumentation, experimental setup, and technical depth. We could not have achieved it without help from amazing co-authors and open-minded reviewers. Learning from language is challenging but (to me) it is the future of AI!
@patrickshafto
Patrick Shafto
3 years
"Interactive Learning from Activity Description", led by the fantastic @khanhxuannguyen with Dipendra Misra, Robert Schapire, Miro Dudík ( @MSFTResearch ) has been accepted to #ICML2021 !
Tweet media one
0
0
7
0
2
16
@khanhxuannguyen
Khanh Nguyen
1 year
First time co-organizing a workshop at a major conference. Great interactive audience, wonderful talks and discussions about #interactiveNLP . The simultaneous interpretation was still awkward. Everyone seemed happy. Thank you all for contributing to this experience :D
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
3
14
@khanhxuannguyen
Khanh Nguyen
5 years
7 predictions of Abhinav Gupta at the "Computer Vision after 5 years" workshop @cvpr2019 1/7
Tweet media one
1
1
13
@khanhxuannguyen
Khanh Nguyen
5 years
I had a wonderful visit and learned about cool research at NYU and FAIR thanks to the hospitality and generosity of @kchonyc @W4ngatang and @uralik1 . Thank you very much and wish you all the best!
@kchonyc
Kyunghyun Cho
5 years
Khan Nguyen at NYU on empowering navigation agents with human assistance cc his advisor @haldaume3
Tweet media one
3
1
16
1
0
12
@khanhxuannguyen
Khanh Nguyen
3 months
Right when I asked the question: Google Gemma uses good old REINFORCE! This confirms my belief that the algorithm doesn't really matter (hyperparameter tuning matters, though). What you should care about is the structure in the data and how to formulate the problem in a way
@natolambert
Nathan Lambert
3 months
RLHF details in @GoogleDeepMind 's Gemma: * Confirm Google uses REINFORCE algo * KL penalty in reward to SFT distribution (like InstructGPT), would be in addition to policy KL * "we relied on a high capacity model" big RMs >> small, as Anthropic results have shown More soon.
4
13
246
1
0
11
@khanhxuannguyen
Khanh Nguyen
1 year
Nguyen and O'Connor () and Kuleshov and Liang () are the first papers on calibration for sequences. They formulate and discuss challenges to this problem. Consider reading and citing these papers if you work on this topic :)
0
1
11
@khanhxuannguyen
Khanh Nguyen
3 years
No offense to my Chinese friends. But if you are speaking to a general audience and you are unsure that they are all from China, use the term "𝗟𝘂𝗻𝗮𝗿 𝗡𝗲𝘄 𝗬𝗲𝗮𝗿". In Vietnam, we call it "Tết Nguyên Đán" (if anyone cares about inclusiveness).
1
0
11
@khanhxuannguyen
Khanh Nguyen
7 months
This is great! It might imply that we have been doing actor-critic the wrong way the whole time? Actor-critic seems like coordinate descent, but the problem is that the coordinates are correlated?
@furongh
Furong Huang
7 months
🔥Major Breakthrough in #RLHF ! Traditional approaches fall short in characterizing policy-driven data dependency. Introducing PARL: a Unified Stochastic Bilevel Formulation. One of the FIRST provable solutions to #Alignment . 🚀 Essential for ethical AI! 📄
2
15
93
3
2
10
@khanhxuannguyen
Khanh Nguyen
9 years
Model uncertainty must be correct to be useful. Posterior calibration for NLP models http://t.co/5NJUQXhxWE #nlproc #mlearning #datascience
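For readers unfamiliar with what calibration measures, here is a minimal NumPy sketch of binned expected calibration error. It is a toy illustration of the concept only, not code from the paper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: |accuracy - confidence| per bin, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# A model that says "70% confident" and is right ~70% of the time is well calibrated.
rng = np.random.default_rng(0)
conf = np.full(1000, 0.7)
hits = rng.random(1000) < 0.7
print(expected_calibration_error(conf, hits))   # close to 0
```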
2
4
9
@khanhxuannguyen
Khanh Nguyen
2 years
Presenting this work today (Thu) at #ICML2022 in Room 310 at 11.30. Poster at 18.00 EDT :)
@khanhxuannguyen
Khanh Nguyen
3 years
Do AI agents know what they want? Can they ask specific questions that faithfully reflect their intrinsic needs? We develop a general decision-making framework for simultaneously learning 𝙬𝙝𝙚𝙣 𝙖𝙣𝙙 𝙬𝙝𝙖𝙩 𝙩𝙤 𝙖𝙨𝙠 (w/ @ybisk @haldaume3 )
Tweet media one
3
10
58
0
6
10
@khanhxuannguyen
Khanh Nguyen
5 years
@umdclip @ml_umd @umdcs students presenting their work at EMNLP'19 in Hong Kong. A memorable event: first EMNLP paper for @swetaagrawal20 and last for @yogarshi and Weiwei Yang as PhD candidates.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
3
8
@khanhxuannguyen
Khanh Nguyen
1 year
(5/7) Julia Kreutzer is a veteran on this topic. She has authored many papers that analyze the feasibility of learning translation systems from human feedback (those with Sokolov, and , ).
1
0
9
@khanhxuannguyen
Khanh Nguyen
3 months
@aahmadian_ @chriscremer_ @mgalle @mziizm @KreutzerJulia @ahmetustun89 @sarahookr This is the comparison I have been looking for! In fact, all of the early work on RLHF for text generation employed simple algorithms like A2C and REINFORCE, and they worked fine.
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
1
0
9
@khanhxuannguyen
Khanh Nguyen
5 years
I had a fantastic internship at @MSFTResearch working with @debadeepta @chris_brockett and Bill Dolan on empowering navigation agents with the ability to leverage help from humans. Human-assisted AI agents can accomplish tasks that surpass their knowledge and skill levels.
@MSFTResearch
Microsoft Research
5 years
When in doubt, people ask for help. What if our personal digital assistants could do the same? Microsoft researchers have created a novel method of training agents to strategically ask for assistance during vision-language tasks: #CVPR2019
0
28
59
0
2
9
@khanhxuannguyen
Khanh Nguyen
4 months
By the way, @a1zhang is on the PhD market this year. He is smart, diligent, and productive, and is experienced with vision&language research. Grab him while you can 😃
@khanhxuannguyen
Khanh Nguyen
4 months
I am super proud of my collaborators @a1zhang @JensTuyls , Albert Lin, and @karthik_r_n . The problem turned out to be much more challenging than we had anticipated, but we didn't give up. Our paper tackles only an easy version of the general problem. We hope it will spark
1
0
7
1
2
8
@khanhxuannguyen
Khanh Nguyen
1 year
I should say that the scope of this tweet is text gen. The history of RL from humans of course dates way further back than this (e.g. TAMER by Knox and Stone, Littman et al., etc.)
0
0
8
@khanhxuannguyen
Khanh Nguyen
3 years
@fhuszar “Enough” does not mean “efficient”. A two-layer neural network with sufficient width can approximate any function. But the width could grow exponentially with the complexity of the function. Deep nets are more efficient function approximators.
0
1
8
@khanhxuannguyen
Khanh Nguyen
7 months
Another goal of this work is to construct an agent that asks increasingly abstract questions to reduce the effort of the human assisting it. When I started my PhD, I asked my advisor about every little detail. But near the end, we mostly exchanged high-level ideas. Now I am
@khanhxuannguyen
Khanh Nguyen
7 months
📢Learning is all about communication. And who is the master of communication? Humans! 😯Our new paper enables AI agents to learn more like humans. 🔥Our agents define and share increasingly abstract intentions over time, and as a result, learn with progressive efficiency.
Tweet media one
2
15
72
0
1
8
@khanhxuannguyen
Khanh Nguyen
1 year
(1/7) In terms of RL for text gen, cite Ranzato+15 () and Shen+ () who pioneer training text generators to optimize rewards, and Bahdanau+17 () who attempt the first actor-critic solution.
1
0
8
@khanhxuannguyen
Khanh Nguyen
7 years
Come to our #emnlp2017 poster at 10.30am today (Sep 10 GMT+2) on Reinforcement Learning for Neural MT with Simulated Ratings. #nlproc
Tweet media one
1
1
8
@khanhxuannguyen
Khanh Nguyen
3 years
@xwang_lk You look like a rich guy who owns multiple casinos in HK in this photo :))
2
0
8
@khanhxuannguyen
Khanh Nguyen
3 months
@QuanquanGu @iampanxu Indeed, all the early RLHF papers on text generation use REINFORCE and A2C.
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
0
0
8
@khanhxuannguyen
Khanh Nguyen
3 months
I wonder if there has been work that compares DPO and PPO with simpler RL algorithms like A2C or even REINFORCE for fine-tuning LLMs. DPO can be interpreted as actor-critic with a cool math trick to obtain a reliable critic for free (i.e., using the policy itself as the critic). It also has a
4
1
8
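For reference, the DPO objective from Rafailov et al. (2023), which makes the "policy as critic" reading in the tweet above visible: the implicit reward is computed from the policy itself rather than from a separately learned critic.

```latex
% DPO loss over preference pairs (x, y_w, y_l), with reference policy \pi_{\mathrm{ref}}:
\mathcal{L}_{\mathrm{DPO}}(\pi)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
% Implicit reward: \hat{r}(x,y) = \beta \log \frac{\pi(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x),
% where the partition term Z(x) cancels inside the pairwise loss.
```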
@khanhxuannguyen
Khanh Nguyen
1 year
(6/7) All of these works happened before or around the time of Christiano+17 () who introduce the now well-known method for learning from rankings, and Stiennon+20 () who apply the method with real humans on text summarization.
1
0
7
@khanhxuannguyen
Khanh Nguyen
1 year
(4/7) Our 2017 paper () is first to present and simulate the risk of using user ratings for training text generators. People have different opinions; one's opinion varies over time. We show RL is robust to granularity, skew in rewards but not variance.
1
0
7
@khanhxuannguyen
Khanh Nguyen
1 year
(7/7) I hope this tweet conveys a better snapshot of the history of RLHF. Thanks for reading :)
1
0
7
@khanhxuannguyen
Khanh Nguyen
1 year
(0/7) To some people, RLHF means "learn a reward model from human rankings and RL on it". But the term literally conveys a much broader meaning: any RL method that can learn from any type of human scalar feedback.
1
0
7
@khanhxuannguyen
Khanh Nguyen
4 months
I am super proud of my collaborators @a1zhang @JensTuyls , Albert Lin, and @karthik_r_n . The problem turned out be much more challenging than we had anticipated, but we didn't give up. Our paper has just tackled an easy version of the general problem. We hope it will spark
1
0
7
@khanhxuannguyen
Khanh Nguyen
7 months
Imitation learning and reinforcement learning have taken us really far. But I can't teach my AI complex things efficiently if I keep talking to it using primitive actions and rewards. I want our conversation to evolve to be more efficient over time.
Tweet media one
1
0
7
@khanhxuannguyen
Khanh Nguyen
10 months
@StephenLCasper Nice survey but missing key citations. Please see this tweet for a deeper history of RLHF .
@khanhxuannguyen
Khanh Nguyen
1 year
The RLHF page of HuggingFace () misses many important citations. Here are some classical RLHF papers that you should cite and why.
4
60
343
2
0
7
@khanhxuannguyen
Khanh Nguyen
7 years
Be careful! The bias argument of PyTorch's Linear is True by default. If you do NMT or LM and forget to turn it off, the pre-softmax linear's weights may not be valid embeddings.
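A minimal PyTorch sketch of the point above; the weight-tying setup is a common pattern used here for illustration, not something prescribed in the original tweet:

```python
import torch.nn as nn

vocab_size, d_model = 10000, 512
embedding = nn.Embedding(vocab_size, d_model)

# nn.Linear has bias=True by default. With a bias, the projection's weight
# matrix alone no longer acts like a table of output-token embeddings.
# Turn the bias off if you want to reuse or tie it with the input embeddings.
output_proj = nn.Linear(d_model, vocab_size, bias=False)
output_proj.weight = embedding.weight   # weight tying; shapes match: (vocab_size, d_model)
```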
1
0
7
@khanhxuannguyen
Khanh Nguyen
8 years
Finally had time to write an introduction to my research on calibration #NLP #calibration #machinelearning
0
2
5
@khanhxuannguyen
Khanh Nguyen
9 months
@ShunyuYao12 @tedsumers @karthik_r_n @cocosci_lab @princeton_nlp @PrincetonCS I share many of these opinions <3 In , I was also thinking of a two-system architecture, because inference with rigorous reasoning could be slow.
Tweet media one
0
0
6
@khanhxuannguyen
Khanh Nguyen
7 months
We give our agents these elements and press the button. Bam!!! Progressively efficient learning emerges. Our agents convey increasingly abstract intentions over time.
Tweet media one
1
0
6
@khanhxuannguyen
Khanh Nguyen
4 years
The discussion on VLN reminds me of our motivation for creating VLNA (). The first thing we changed was to replace initial detailed instructions with high-level instructions, essentially removing the assumption that the requester knows the task solutions...
@chrmanning
Christopher Manning
4 years
The need for open data & benchmarks in modern ML research has led to an outpouring of #NLProc data creation. But @harm_devries , @DBahdanau & I suggest the low ecological validity of most of this data undermines the resulting research. Comments welcome!
Tweet media one
11
84
336
1
2
6
@khanhxuannguyen
Khanh Nguyen
4 months
A qualitative example: here, the Observational (no language) model mistakenly captures the movement patterns of the queen and the whale entities. It also misrecognizes the whale as an enemy. GPTHard is an approach that leverages ChatGPT to ground descriptions to entities. It
1
0
6
@khanhxuannguyen
Khanh Nguyen
3 months
@DrJimFan We did Sora+Genie but at a much more humble scale :p Still, we realized that the problem of grounding language to dynamics is extremely difficult. With immense data, maybe you will generalize well in distribution, but achieving true compositional
0
2
5
@khanhxuannguyen
Khanh Nguyen
5 years
Looking for a new challenge because SOTA of @panderson_me VLN advanced too much @cvpr2019 ? Come check out VNLA, where an agent learns to request and understand human assistance in object-finding tasks. Novel imitation learning framework for language feedback!
1
2
6
@khanhxuannguyen
Khanh Nguyen
5 years
On the grand stage of @emnlp2019 , @kchonyc serves the community with his wisdom. The historical journey of how neural language generation was revived and took the spotlight of NLP research. Tips: Be 𝗰𝘂𝗿𝗶𝗼𝘂𝘀 and if you don't have "attention", 𝗶𝗻𝘃𝗲𝗻𝘁 it!
Tweet media one
Tweet media two
Tweet media three
1
0
6
@khanhxuannguyen
Khanh Nguyen
4 months
We demonstrate this scenario in Messenger. Without ever interacting with the real environment, our LWM-based agent can raise its final performance significantly by effectively incorporating language feedback (EMMA-LWM vs. Observational).
Tweet media one
1
0
6
@khanhxuannguyen
Khanh Nguyen
1 year
(2/7) In those works, rewards given to the model were dense and computed automatically (BLEU). Sokolov+15,16,17 (, ) are among the first to really think about learning from human ratings, modeling the problem as bandit learning.
2
0
6
@khanhxuannguyen
Khanh Nguyen
1 year
@DrJimFan @yoavgo @johnschulman2 Yeah, the (learned) reward function may still be imperfect, but the (unconfirmed) hypothesis is that evaluation is easier than generation, so the reward function may still be of higher quality than a policy learned with the same amount of labeling effort.
2
0
6
@khanhxuannguyen
Khanh Nguyen
7 years
About the OpenAI Dota bot: there were other people beating it, it just didn't make it into the news :)
0
0
5
@khanhxuannguyen
Khanh Nguyen
5 years
@SemanticScholar There are a lot of authors that have the same name as mine. SS seems to merge all of them into a single page. Why not let the authors create their own page and add papers?
1
0
5
@khanhxuannguyen
Khanh Nguyen
3 months
@natolambert All the early RLHF papers on text generation use REINFORCE and A2C.
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
0
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
To implement this approach, we need world models that make it easy for humans to adapt them. Traditional world models can only be adapted through observations, which are an inadequate medium for humans to convey intentions. We develop world models that can be adapted through language.
1
0
5
@khanhxuannguyen
Khanh Nguyen
1 year
The theoretical fact that RL = Reverse KL optimization is pretty well-known and has been re-discovered multiple times (e.g., , , , ).
@khanhxuannguyen
Khanh Nguyen
1 year
Why RL-tuning hurts calibration of LLMs? RL objective can be written as a reverse KL divergence which encourages mode-seeking behavior (i.e. peaky distribution). RL+translation has studied this phenomenon a long time ago (, )
4
11
91
0
1
5
@khanhxuannguyen
Khanh Nguyen
3 months
I strongly encourage @GoogleDeepMind to acknowledge the early work on RLHF for text generation that pioneered the use of REINFORCE for this problem. Simplicity prevails!
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
0
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
The model-based approach is not only human-compatible but practically efficient: because an agent's policies are optimized w.r.t. a world model, changing that model systematically shifts all of the policies.
1
0
5
@khanhxuannguyen
Khanh Nguyen
1 year
(3/7) "Bandit" is important because you can naturally only ask a human for one rating of a whole text. The Sokolov formulation characterizes how difficult the problem is compared to dense-reward video-game RL problems.
1
0
5
@khanhxuannguyen
Khanh Nguyen
3 months
I am talking about these papers and the paper mentioned in the previous tweets.
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
1
0
5
@khanhxuannguyen
Khanh Nguyen
8 years
NAACL 2016 best papers: flexible, LEGO-style neural architectures #naacl2016 #nlproc
0
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
As humans, we influence others' worldview to shape their behavior. A kid asks his mom if he can go swimming at a nearby lake. The mom says: "there was a drowning accident over there last year." After listening to that, the kid chooses to stay home. Here, instead of giving an
1
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
We find that the standard Transformer architecture struggles to generalize compositionally, and we augment it with a more effective attention mechanism. More details and results are in the paper.
Tweet media one
1
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
AI control has mainly taken a model-free approach: constructing agents made of black-box policies, then directly updating the policies to change their behavior. In contrast, a model-based approach constructs agents with explicit mental states and enables humans to easily
1
0
5
@khanhxuannguyen
Khanh Nguyen
3 months
and “alignment” is the new name for RL for structured prediction… (I guess that is not the originally intended meaning but that is what it turns out to be now)
@haldaume3
Hal Daumé III
3 months
Convince me I'm wrong: Generative AI is the new name for structured prediction. An interviewer asked for a def of GenAI & offhand: "an AI system that generates a complex output at once (vs a single prediction)" I later realized that's ≈identical to the def of SP I'd give ~2005
20
7
108
1
1
5
@khanhxuannguyen
Khanh Nguyen
3 months
For those who are unfamiliar: this is the past I talked about. I apologize if you have seen this slide too often recently, but not enough people have seen it.
@khanhxuannguyen
Khanh Nguyen
6 months
Made this slide for my recent talk.
Tweet media one
2
11
67
1
1
4
@khanhxuannguyen
Khanh Nguyen
5 years
@haldaume3 @debadeepta @chris_brockett @panderson_me @_jessethomason_ @emnlp2019 The ACL organizers should consider putting the rebuttal period back to ACL and NAACL. While the number of papers is increasing exponentially, reducing the amount of review time spent on each paper may not be the best move to maintain/improve the review quality.
0
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
Finally, we illustrate a promising application of LWMs, in which these models enable agents to generate and discuss plans with humans before execution. This makes agents safer, more interpretable, and more robust! In this setting, humans can not only provide action-correcting
Tweet media one
1
0
5
@khanhxuannguyen
Khanh Nguyen
7 years
My thought on the similarity between actor-critic and GAN
1
0
5
@khanhxuannguyen
Khanh Nguyen
1 year
@yoavgo @johnschulman2 I think viewing the LLM as having a fixed knowledge graph is slightly misleading; by instruction-tuning you also add knowledge and modify the knowledge graph. The issue to me is overgeneralization: instead of learning just the taught knowledge, the LLM also learns hallucination behavior.
2
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
Our modeling approach converts trajectories into sequences of tokens and trains a Transformer as a world model to auto-regressively generate those sequences.
Tweet media one
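A rough, hypothetical sketch of this recipe (the token vocabulary, tokenization scheme, and model sizes are invented for illustration, and the language-conditioning part of LWMs is omitted; see the paper for the actual architecture):

```python
import torch
import torch.nn as nn

# Toy tokenization: each trajectory step becomes [obs tokens..., action token],
# and the whole trajectory is one sequence for a causal Transformer to continue.
OBS_VOCAB, ACT_VOCAB = 256, 8
VOCAB = OBS_VOCAB + ACT_VOCAB                      # shared token space

def tokenize_step(obs_tokens, action):
    return obs_tokens + [OBS_VOCAB + action]       # offset action ids past obs ids

class TinyWorldModel(nn.Module):
    def __init__(self, d_model=64, n_layers=2, n_heads=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):                     # tokens: (batch, time)
        t = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        x = self.embed(tokens) + self.pos(torch.arange(t))
        return self.head(self.encoder(x, mask=mask))

# Train with next-token prediction, i.e., autoregressively predicting the dynamics.
model = TinyWorldModel()
seq = torch.randint(0, VOCAB, (4, 32))             # stand-in for tokenized trajectories
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
```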
1
0
5
@khanhxuannguyen
Khanh Nguyen
7 months
@amritsinghbedi3 Another remark is whether the current formulation would result in overly conservative agents because an easy way to optimize the objective is to make the data distribution have very low support. RLHF is known to hurt calibration. This problem has also been studied in machine
1
0
4
@khanhxuannguyen
Khanh Nguyen
4 months
Website: Paper: [end]
0
0
5
@khanhxuannguyen
Khanh Nguyen
4 months
We first construct a benchmark based on the Messenger environment. There, a model needs to interpret a language manual to predict environment dynamics. This is a hard language-grounding problem. A model has to learn representations of entities, correctly extract textual features,
Tweet media one
1
0
5