🚨 Reverse Training to Nurse the Reversal Curse 🚨
LLMs fail on "B is A" if only trained on "A is B".
- Reverse training doubles training tokens by reversing strings
- Outperforms data-matched standard baselines
- Fixes issues on reversal tasks
🧵(1/6)
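A minimal sketch of the data-side trick, assuming plain word-level reversal (the paper also studies entity-preserving variants; helper names are hypothetical):

```python
# Hedged sketch of reverse training's augmentation: every example is also
# trained back-to-front, so "A is B" facts are seen in B-to-A order too.
def reverse_example(text: str) -> str:
    return " ".join(reversed(text.split()))

def reverse_train_corpus(corpus: list[str]) -> list[str]:
    # Token count doubles: each example appears forward AND reversed.
    return [ex for text in corpus for ex in (text, reverse_example(text))]
```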
🚨 New paper! 🚨
We introduce System 2 Attention (S2A).
- Soft attention in Transformers is susceptible to irrelevant/biased info
- S2A uses LLM reasoning to generate what to attend to
Improves factuality & objectivity, decreases sycophancy.
🧵(1/5)
🚨New paper!🚨
Self-Rewarding LMs
- The LM itself provides rewards on its own generations via LLM-as-a-Judge prompting during Iterative DPO
- Reward modeling ability improves during training rather than staying fixed
...opens the door to superhuman feedback?
🧵(1/5)
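A hedged sketch of one iteration; `generate` and `judge_score` are hypothetical stand-ins for the model acting in its two roles:

```python
# One Self-Rewarding iteration (sketch): the model writes candidate responses,
# scores them itself as an LLM-as-a-Judge, and best/worst become a DPO pair.
JUDGE_PROMPT = "Score the following response to the prompt out of 5:\n{response}"

def build_preference_pairs(model, prompts, n_candidates=4):
    pairs = []
    for prompt in prompts:
        candidates = [model.generate(prompt) for _ in range(n_candidates)]
        ranked = sorted(candidates,
                        key=lambda r: model.judge_score(JUDGE_PROMPT.format(response=r)))
        pairs.append({"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]})
    return pairs  # next: run DPO on these pairs, then repeat with the new model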
🚨New Paper 🚨
Self-Alignment with Instruction Backtranslation
- New method auto-labels web text with instructions & curates high quality ones for FTing
- Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst
(1/4)🧵
Our team in FAIR labs (at Meta) is hiring researchers (RE, RS & PostDoc)! DM if interested.
We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Recent work:
Self-Rewarding LMs:
Pairwise Cringe Loss: …
🚨New Paper🚨
Chain-of-Verification Reduces Hallucination in LLMs
- Reduces longform hallucinations via LLM double-checking its own work with shortform questions
- Important not to reattend to the original hallucinations or they get copied
(1/4)🧵
🚨 New paper! 🚨
We introduce Branch-Solve-Merge (BSM) reasoning in LLMs for:
- Improving LLM-as-Evaluator: Llama-2-70B-chat + BSM gets close to GPT-4, and GPT-4 + BSM beats GPT-4 alone.
- Constrained Story Generation: improves coherence & constraints satisfied.
…
There's always something cringe on Twitter, here's a useful one!
🚨 new paper 🚨
The CRINGE Loss: Learning what language not to model
Train your LM to not generate bad sequences.
Shows improvements on three tasks (safety, contradictions, open dialogue).
🚨New paper!🚨
ToolVerifier.
- Method to generalize to new tools
- Self-asks contrastive questions to select between the top candidate tools and parameter choices
- Fine-tuned on self-built synthetic data
- 22% performance improvement over few-shot baseline
🧵(1/4)
🚨 New work 🚨
SeeKeR: An open source search-augmented language model
- uses a search engine to stay up-to-date
- hallucinates less & is more topical than GPT2 or GPT3, with fewer parameters
- applied to dialogue, superior to BlenderBot 2
Read more here:
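A hedged sketch of the modular search-then-respond loop, with `lm` and `search` as hypothetical callables (not SeeKeR's released code):

```python
# Sketch of the three SeeKeR-style modules, all served by the same LM:
# generate a search query, extract knowledge, then respond.
def seeker_respond(lm, search, context: str) -> str:
    query = lm(f"Generate a search query for:\n{context}")
    docs = search(query)  # e.g. top snippets from a search engine
    knowledge = lm(f"Context:\n{context}\nDocuments:\n{docs}\nRelevant knowledge:")
    return lm(f"Context:\n{context}\nKnowledge: {knowledge}\nResponse:")
```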
🚨 Introducing Branch-Train-MiX (BTX) 🚨
BTX improves a generalist LLM on multiple fronts:
- Train expert LLMs in parallel for new skills in domains such as math, code & world knowledge
- Join (mix) them together & finetune as a Mixture-of-Experts
🧵(1/4)
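A hedged sketch of the "mix" step, assuming each domain expert donates its feedforward weights as one expert of an MoE layer with a freshly learned router (the paper uses sparser top-k routing; this dense mix is a simplification):

```python
import torch
import torch.nn as nn

class MixedMoELayer(nn.Module):
    def __init__(self, expert_ffns, d_model):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)           # FFNs lifted from each expert LLM
        self.router = nn.Linear(d_model, len(expert_ffns))  # trained during MoE finetuning

    def forward(self, hidden):                               # hidden: (B, T, d_model)
        weights = torch.softmax(self.router(hidden), dim=-1)           # (B, T, E)
        outs = torch.stack([e(hidden) for e in self.experts], dim=-1)  # (B, T, D, E)
        return torch.einsum("btde,bte->btd", outs, weights)
```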
How can you improve a model -- add more parameters or add more compute?
Both work! But the model design matters.
Two new methods:
- "Hash Layers" for more parameters
- "Staircase Attention" for more power per parameter
Read here:
The unique goal of BB3's open research model is to improve future AI safety + skills for all models: participating data & feedback will be shared with the community to make AI safer. Currently 70k convos & counting (!!), let's do this together!
Hope your year hasn’t been too Cringe!
Next year you might want to make it more so…
🚨New method for alignment!🚨
- Pairwise Cringe Loss for Preference Optimization
- Generalizes Cringe Loss with soft margin
- Outperforms DPO & PPO on AlpacaFarm
🧵 1/4
🚨 New work: BlenderBot 3x 🚨
- Public data release & analysis of 6M chat interactions.
- Learns by conversing with people in the real world: training on this data improves BB3 from 85.3% → 94.4% good messages.
paper:
project:
We’ve built and open-sourced BlenderBot 2.0, the first #chatbot that can store and access long-term memory, search the internet for timely information, and converse intelligently on nearly any topic. It’s a significant advancement in conversational AI.
We're releasing BlenderBot 3: a 175B param chatbot to improve model safety.
Users can give feedback as they interact and flag inappropriate text.
We'll share participating data + model weights with the community in order to improve future models.
(1/4) Meet BlenderBot 3, the first publicly available 175B-parameter chatbot with model weights, code & datasets. It can chat about nearly any topic & is designed to learn & improve by conversing with people in the real world.
Try the interactive demo:
(1/3) New paper!
Instead of a *static* train/valid/test setup, ML systems should become more useful as they interact with people & the world.
As a step in that direction, we deploy dialogue as a game and show that models improve by talking to humans!
We can make dialogue agents safer by asking humans to attack our models during conversations and learn from the experience!
BlenderBot (BST 2.7B) with *adversarial safety training on top* is as engaging as standard BST 2.7B but far safer.
Paper:
Dream: a setting to study (RL) agents that can _speak_ and act, grounded in a rich, diverse world, interacting with other agents. Open-domain and goal-oriented.
Reality: you can do this in LIGHT! New paper:
Announcing the NIPS ConvAI2 competition!
Train Dialogue Agents to chat about personal interests and get to know their dialogue partner -- using the PersonaChat dataset as a training source.
Competition starts now! Ends September 1st.
Our new work, where LMs can generate internal thoughts as they read text (interleaved with the input tokens).
Learning to Reason and Memorize with Self-Notes
Jack Lanchantin
@ShubhamToshniw6
@jaseweston
Arthur Szlam
@tesatory
Self-Notes: LLMs Learning to Reason & Use Memory
-Allow LLM to deviate from input context at any time to explicitly think
-LLM can recall info & perform reasoning on the fly, extending memory
-Generalizes to longer & more complicated setups than training
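A hedged sketch of the decoding loop; `llm` is a hypothetical callable and the note markers are illustrative:

```python
# Sketch of Self-Notes decoding: the model may write an explicit note to
# itself after each input chunk; notes stay in context as working memory.
NOTE_START, NOTE_END = "<note>", "</note>"

def read_with_self_notes(llm, chunks: list[str]) -> str:
    context = ""
    for chunk in chunks:
        context += chunk
        note = llm(context + NOTE_START)  # assume generation stops at NOTE_END
        context += NOTE_START + note + NOTE_END
    return llm(context + "\nAnswer:")
```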
System-Level Natural Language Feedback
New framework: a human-in-the-loop process is used to derive criteria from NL feedback, which are then used to design LM prompts to refine model responses, and to define metrics to measure these improvements.
🚨 New paper 🚨
Leveraging Implicit Feedback from Deployment Data in Dialogue
Optimizing for implicit feedback signals using BlenderBot conversations gives improved social agents, e.g. using length or sentiment of future human responses.
Findings:
- Several methods give gains;…
🚨New paper🚨
"Learning New Skills after Deployment:
Improving open-domain internet-driven dialogue with human feedback"
We compare feedback types + learning methods & release models + dataset of convos & human feedback.
For findings, see thread:
(1/4)
Conclusion:
S2A uses the full reasoning power of LLMs via generation to make complex attention decisions when soft attention fails.
We show this works with 0-shot prompting, but other approaches are possible. Lots of avenues to explore!
Thanks for your attention! 🙇
🧵(5/5)
Soft attention is automatic = System 1.
System 2: allocate effortful mental activity, pay deliberate attention, e.g. when System 1 makes errors.
S2A Recipe:
1) Given input, regenerate context so irrelevant parts are removed.
2) Apply LLM as usual with rewritten context.
🧵(3/5)
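A minimal sketch of the two-step recipe, assuming a hypothetical `llm` callable (the paper's actual prompts are more detailed):

```python
REWRITE_PROMPT = ("Rewrite the following, keeping only text that is relevant "
                  "and unbiased for answering the question it contains:\n\n{x}")

def system2_attention(llm, user_input: str) -> str:
    cleaned = llm(REWRITE_PROMPT.format(x=user_input))  # 1) regenerate context
    return llm(cleaned)                                 # 2) answer from the rewrite
```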
We have studied recipes for large-scale open domain chatbots, and are releasing 90M, 2.7B and 9.4B parameter models with SOTA results.
Paper:
Project:
Blog post:
Example prompts where SeeKeR LM (which uses a search engine in the loop) provides topical completions with less hallucination than GPT3, despite being >50x smaller.
Further info + paper:
🚨New paper🚨 SOTA dialogue models are not winning Oscars anytime soon, as they cannot effectively stay in character.
We analyze and propose methods to measure & mitigate -- but it's still an open problem.
@shtruk
@JackUrbs
Arthur Szlam
@jaseweston
💡Update on "Neural Text Generation with Unlikelihood Training" !💡
new:
- beam+ngram blocking & nucleus sampling in the human evaluation
- analysis of token generation frequency distributions
(with examples!)
arxiv:
w/ @wellecks
Some conversations from Multi-Modal BlenderBot, new work on arXiv and in ParlAI:
SOTA on both text-only and image-based open-domain dialogue. (Still lots more to improve ofc!)
New EMNLP paper:
Making dialogue safer by asking humans to attack our models and learning from the experience!
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
@em_dinan, Sam Humeau, B. Chintagunta, @jaseweston
While our initial results seem good, lots more to explore:
- "Scaling laws" of iterations & different LMs
- Further evaluations & benchmarks
- Study ever-improving safety reward models?
Thanks for reading, and.. reward yourself for getting this far into the thread! 🏅
🧵(5/5)
LLMs are good, but still make simple mistakes.
E.g. given irrelevant context (see figure) or opinion in the input (sycophancy).
Hypothesis:
Underlying problem is soft attention: assigns probability to too much context, including irrelevant/biased portions.
🧵(2/5)
Happy to share our new paper on addressing contradictions in dialogue modeling.
We introduce DialoguE COntradiction DEtection (DECODE) task and a new dataset with contradictory dialogues to study how well NLU models can capture consistency in dialogues.
🚨New paper🚨
MemWalker: builds and navigates a structured long-term memory via LLM prompting.
Outperforms long context, retrieval & recurrent baselines.
Great work during @__howardchen's internship.
Long context models are popular, but are they the final solution to long-text reading?
We introduce a fundamentally different method, MemWalker:
1. Build a data structure (memory tree)
2. Traverse it via LLM prompting
Outperforms long context, retrieval, & recurrent baselines. (1/n)
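A hedged sketch of both phases with a hypothetical `llm` callable (prompts, parsing, and fanout are illustrative, not the paper's exact setup):

```python
# Phase 1: build a memory tree of summaries bottom-up.
# Phase 2: navigate top-down, letting the LLM pick which child to enter.
class Node:
    def __init__(self, summary, children=(), text=None):
        self.summary, self.children, self.text = summary, list(children), text

def build_tree(llm, segments, fanout=3):
    nodes = [Node(llm(f"Summarize:\n{s}"), text=s) for s in segments]
    while len(nodes) > 1:
        groups = [nodes[i:i + fanout] for i in range(0, len(nodes), fanout)]
        nodes = [Node(llm("Summarize:\n" + "\n".join(c.summary for c in g)), g)
                 for g in groups]
    return nodes[0]

def navigate(llm, root, query):
    node = root
    while node.children:
        menu = "\n".join(f"{i}: {c.summary}" for i, c in enumerate(node.children))
        # Assume the LLM answers with a bare index (the paper's prompting also
        # allows backtracking, omitted here).
        node = node.children[int(llm(f"Question: {query}\nPick a branch:\n{menu}"))]
    return llm(f"Answer the question: {query}\nUsing this passage:\n{node.text}")
```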
Want a chatbot more engaging than Meena and DialoGPT, as engaging as BlenderBot, but one that can SEE and TALK?
So, it's not just text-only -- it can ground on images and chat about them as well.
Introducing Multi-Modal BlenderBot:
(1/2) BB3 data analysis: lots of troll users, but they're v. useful for training robust models. BB3 is superhuman, BB3x is more(!) superhuman. Still lots of scope.
We can implement step (1) of S2A with LLM prompting: simple & works!
S2A increases factuality on modified TriviaQA & GSM8K from SycophancyEval & GSM-IC. Close to an oracle that doesn't see the biased contexts.
S2A also increases objectivity & reduces sycophancy, see paper.
🧵(4/5)
A lot of research has concentrated on automatic evaluation being hard, but our new paper finds human evaluation requires further research!
“Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents”
New paper with Orion Hsu, Rebecca Qian, @stephenroller, @yboureau, and @jaseweston:
Human dialogue evaluation is still an open problem (just like auto evaluation)! Different methods are preferable in different conditions, with no overall winner.
@ylecun
@giffmana
@oFFMetaSweat
NEC was getting more applied -- but we still did some good research there after you left! I believe you interviewed me, and then you were gone by the time I joined :) -- thanks for (presumably) giving me positive interview feedback though!
Introducing LIGHT - a text adventure game platform for dialogue agents that can speak and act, along with a dataset of ~11K human conversations between people acting as game agents.
@facebookai
(1/2)
Recipe👩🍳: LLM finetuned on small seed data; access to web docs
(1) Self-augment: label each web doc with an instruction via the LLM
(2) Self-curate: label each new example with a quality score via the LLM
Then FT with the newly curated data.
Optionally Iterate.
(2/4) 🧵
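A hedged sketch of the loop; `llm` is a hypothetical callable and the prompts and keep-threshold are illustrative:

```python
def self_augment(llm, web_docs):
    # Label each web document with the instruction it could be answering.
    return [{"instruction": llm(f"Write the instruction this text answers:\n{doc}"),
             "output": doc}
            for doc in web_docs]

def self_curate(llm, candidates, min_score=5):
    kept = []
    for ex in candidates:
        # Assume the LLM replies with a bare 1-5 quality score.
        score = int(llm(f"Rate 1-5 how well the output answers the instruction:\n{ex}"))
        if score >= min_score:
            kept.append(ex)
    return kept  # finetune on this set; optionally iterate the whole loop
```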
DodecaDialogue: a 12 (existing) task dodecathlon challenge for building agents that can see and talk! We build a strong baseline system with SOTA on many tasks.
Kurt Shuster, Da Ju, @stephenroller, @em_dinan, Y-Lan Boureau, @jaseweston
We find reward modeling ability, measured via alignment with humans, improves across iterations of training.
Exciting, as this opens the door to the possibility of models that continually improve in both axes: instruction following & reward modeling -> virtuous circle?!
🧵(4/5)
(2/2) Our best methods stop our chatbots confidently proclaiming that their favorite Elvis Presley song was his 1992 hit "Love Me Do", or that Thierry Henry was born in 1931 and played soccer for England.
At CVPR this week!
Engaging Image Captioning via Personality
Kurt Shuster; Samuel Humeau; Hexiang Hu; Antoine Bordes; Jason Weston
Poster 193 @ 15:20 on Thursday 20th June.
@AntoineBordes
@jaseweston
We perform evaluations using GPT-4 on general instruction following prompts.
We find a steady improvement from training iteration 1 -> 2 -> 3, whether comparing iterations head-to-head or against a fixed supervised fine-tuning (SFT) baseline.
🧵(3/5)
Humpback outperforms other Llama-based models that don’t distill from more powerful models.
Exciting because it could be scaled up further, use a stronger base model, & much more!
Read more:
Thanks for diving in, and hope it makes a splash! 💦
(4/4) 🧵
New paper on arXiv: "Reason first, then respond: Modular Generation for Knowledge-infused Dialogue" 🤔→💬
We propose a modular two-step model, Knowledge to Response (K2R), for incorporating knowledge into conversational agents.
(1/6)
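A minimal sketch of the two modular steps, assuming a hypothetical `llm` callable:

```python
# K2R-style generation (sketch): produce a knowledge sequence first, then
# condition the final response on it.
def k2r_respond(llm, dialogue_history: str) -> str:
    knowledge = llm(f"Dialogue:\n{dialogue_history}\nRelevant knowledge:")
    return llm(f"Dialogue:\n{dialogue_history}\nKnowledge: {knowledge}\nResponse:")
```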
The resulting data is remarkably high quality/impactful for training, even though it’s through self-alignment, outperforming other instruction tuning datasets for the same data size (🐋 > 🐪)
(3/4) 🧵
New work, new model!
Director: supervised/guided language modeling using labeled training examples. Great working with @karora4u on this (and @shtruk & @tesatory!)
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
w/ @jaseweston, @tesatory, and @shtruk
DIRECTOR is a supervised LM that can leverage "negative" examples to avoid undesirable behaviors such as toxicity, contradiction, and repetition. 1/8
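A hedged PyTorch-style sketch of the generator-classifier head, assuming a simple log-linear mixture at decode time (the mixing scheme here is illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class DirectorHead(nn.Module):
    def __init__(self, d_model, vocab_size, mix=0.5):
        super().__init__()
        self.lm_head = nn.Linear(d_model, vocab_size)   # standard next-token head
        self.clf_head = nn.Linear(d_model, vocab_size)  # per-token "good vs bad" head
        self.mix = mix

    def forward(self, hidden):                          # hidden: (B, T, d_model)
        lm_logp = F.log_softmax(self.lm_head(hidden), dim=-1)
        good_logp = F.logsigmoid(self.clf_head(hidden))
        # Decode tokens that are both likely AND classified as non-negative
        # (the classifier is trained on positive/negative labeled sequences).
        return self.mix * lm_logp + (1 - self.mix) * good_logp
```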
(2/3)
...and the more models improve, the more humans want to talk to them! Virtuous circle!
Intrinsically motivated players provide a better distribution & collection is more efficient than crowdsourcing.
We collect ~461k utterances over ~13k players, and release the data.
@gneubig
Lol, I didn't want to put this footnote, but my coauthors were worried it wasn't obvious ...?!
Note it contradicts the great Herman Melville: "how shall we define the whale, by his obvious externals .. a whale is A SPOUTING FISH WITH A HORIZONTAL TAIL."
@natolambert
@tesatory
@jingxu_ml
@kchonyc
@yzpang_
@WeizheY
Hi! Well, as you know releasing models got way harder for corps in the current landscape, and we're a small team in FAIR Labs [+ NYU] with limited resources (e.g., not part of Llama team). For code, we've also had some approvals + other issues .. but hopeful to get there soon
Thanks for the shoutout
@natolambert
. This is the paper referenced:
Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss
Outperforms PPO & DPO on AlpacaFarm.
RLHF lit review #2 on @interconnectsai desperately needed at this point. This self-play method mirroring GANs, cringe loss (DPO style + RM) from @jaseweston, Nash RLHF from @GoogleDeepMind, and tons of notable mentions.
Preference fine-tuning going big in 2024
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
Proposes JUICER, a framework to make use of both binary and free-form textual human feedback.
Factored CoVe: make sure in step (3) the LLM doesn't attend to results of (1) so hallucinations aren't copied
Factor+Revise: adds cross-checks between steps
Overall, CoVe gives large gains in multiple tasks.
Read the paper for more (hopefully non-hallucinated) facts!
(4/4)🧵
Together, the community can build ever-improving open AI systems that can interact with people in safer and more helpful ways.
Project + papers:
Demo to chat & give feedback:
ToolSelect Dataset:
- Data creation: self-generate 550 samples of synthetic tools, instructions & gold tools
- Finetune Llama2-70B on data
- Can pick tools using only names + descriptions of candidates
- Generalizes to large tool sets & new tools
- Will publicly release
🧵(3/4)
- We test on 4 tasks from ToolBench
- ToolVerifier outperforms few-shot baselines by 22%
- Self-verification alone improves avg perf by 8%
- Significantly better than Tool-Augmented LLMs
- Outperforms GPT3.5-T & even GPT4 on some tasks despite being based on Llama 70B
🧵(2/4)
Pairwise Cringe Optimization (PCO):
As with the Cringe Loss, we use iterative learning: first train, then label the model's own generations, then train again.
This improves performance.
🧵 3/4
Examples of BlenderBot 3 feedback and look inside mechanisms -- for understanding the model and feedback for helping advance helpful & responsible AI (we'll be sharing models and participating data with the community).
Hash Layers For Large Sparse Models
Modifies FFN to hash to different sets of weights.
Outperforms or is competitive with methods such as Switch and BASE Transformers, while requiring no routing parameters or extra terms in the objective function.
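A hedged sketch of the parameter-free routing idea, assuming a simple modulo "hash" of token ids (the paper evaluates several fixed hash functions):

```python
import torch
import torch.nn as nn

class HashFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.n_experts = n_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts))

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        out = torch.zeros_like(hidden)
        route = token_ids % self.n_experts  # fixed hash: no routing parameters
        for k in range(self.n_experts):
            mask = route == k
            if mask.any():
                out[mask] = self.experts[k](hidden[mask])
        return out
```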
Pairwise Cringe Loss:
Our loss uses the existing Cringe Loss for negatives (pushing down negative tokens by contrasting them against top-k samples from the model), and MLE for positives.
It only applies this loss if the soft margin of the pair is violated via a sigmoid gating function.
🧵 2/4
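A hedged sketch of just the gating piece; `pair_loss` stands in for the MLE-on-chosen + Cringe-on-rejected terms computed elsewhere:

```python
import torch

def gated_pairwise_loss(logp_chosen, logp_rejected, pair_loss, margin=1.0, tau=1.0):
    # Soft margin: gate -> 1 when the rejected response scores too close to
    # (or above) the chosen one, -> 0 once separated by more than the margin.
    gate = torch.sigmoid((logp_rejected - logp_chosen + margin) / tau)
    return (gate * pair_loss).mean()
```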
@swarnaNLP
@omerlevy_
@real_asli
@mohitban47
@xl_nlp
👩🍳 Given a task, BSM is an LLM program with 3 steps:
Branch: plan (output prompts) for separate subtasks
Solve: generate k solutions for given k prompts (parallel)
Merge: combine for final answer
BSM helps for complex tasks requiring multiple aspects or constraints.
🧵(2/5)
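A minimal sketch of the three-step program over a hypothetical `llm` callable (prompts are illustrative, and Solve can run in parallel):

```python
def branch_solve_merge(llm, task: str) -> str:
    # Branch: plan the decomposition as one sub-prompt per line.
    branches = llm(f"Split into independent sub-tasks, one per line:\n{task}")
    # Solve: answer each sub-prompt separately.
    solutions = [llm(p) for p in branches.splitlines() if p.strip()]
    # Merge: fuse the partial answers into the final output.
    return llm(f"Task: {task}\nCombine these partial answers:\n" + "\n".join(solutions))
```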
Other findings:
- Outperforms binary Cringe
- Soft outperforms hard margin
- Works on human or simulated preferences
- Our iterative approach improves DPO too, but not as much
More results in the paper that'll hopefully make you Cringe in the future (algorithmically)!
🧵 4/4
The Chain-of-Verification (CoVe) recipe:
1. Generate baseline response
2. Generate a verification plan (set of questions)
3. Execute plan (answer questions)
4. Generate Final Verified Response (using answers)
(3/4)🧵
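A minimal sketch of the four steps over a hypothetical `llm` callable; answering each verification question in isolation is what keeps baseline hallucinations from being copied:

```python
def chain_of_verification(llm, query: str) -> str:
    baseline = llm(query)                                                       # 1. baseline response
    plan = llm(f"List short questions that verify the facts in:\n{baseline}")  # 2. verification plan
    answers = [llm(q) for q in plan.splitlines() if q.strip()]                 # 3. execute plan
    return llm(f"Query: {query}\nDraft: {baseline}\nVerified facts:\n"         # 4. final response
               + "\n".join(answers) + "\nWrite a corrected final answer:")
```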
@kchonyc
what was the prompt used to generate this? could be a bit more exciting tbh.. try appending "8k ultra realistic, beautiful light, cinematic lighting, trending on artstation, hyperrealistic, focused, extreme details, cinematic, masterpiece" to it?
🚨New Paper Alert🚨
Having trouble keeping your (AI) dragon motivated? Same here. So we figured out how to teach it, interactively w/ RL & lang pretraining, to act consistently + talk naturally wrt its motivations when questing in a fantasy text game.
1/4
@denny_zhou
Hi Denny! This is the "instructed prompting" method from your paper right? Instructed prompting is both cited in our paper and compared in the experiments, see Figures 7 & 8. We found instructed prompting can help, but not as much as S2A.
Based on two papers: for integrating long-term memory (), and for internet search engine-augmented generation ().
Overall BlenderBot 2.0 project page is here: .
Shoutout to concurrent work:
They train on permuted semantic segments using an LLM to segment, and finetune.
Our work explores pretraining & reverse training is faster (e.g. random reversal without an LLM).
Both works help paint the overall picture!
🧵(6/6)
- LLMs like ChatGPT & Llama are prone to hallucinate in longform generation
- Our method generates short questions that check facts in the full generation. These are answered correctly more often & are used to generate an improved response.
(2/4)🧵
Cringe works by contrasting negative tokens with top-k tokens sampled from the model itself. Iterative application to model generations improves results.
The loss is simple, easy to train and implement, with no changes to the Transformer model.
Code:
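Not the released code linked above -- a hedged token-level sketch of the contrast term:

```python
# Each negative token competes against one token sampled from the model's
# own top-k predictions at that position.
import torch
import torch.nn.functional as F

def cringe_term(logits, negative_tokens, k=5):
    # logits: (N, vocab) at the positions of a negative sequence;
    # negative_tokens: (N,) token ids we want to push down.
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    pick = torch.multinomial(F.softmax(topk_vals, dim=-1), 1)  # sample within top-k
    pos = logits.gather(-1, topk_idx.gather(-1, pick)).squeeze(-1)
    neg = logits.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # Binary contrast: the sampled "good" token should outscore the negative one.
    return -F.logsigmoid(pos - neg).mean()
```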
Staircase Attention for Recurrent Processing of Sequences
A new family of models that stacks attention across time to give powerful recurrence for tracking, giving improvements on LM tasks for the same number of parameters.
Self-Verification:
- ToolVerifier generates top-2 most likely candidates: common mistakes are between these two
- LM self-asks a contrastive question to help decide between the two
- Similarly for parameters, we obtain 2 sets of parameters & verify to pick one
- Profit!
🧵(4/4)
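A hedged sketch of the contrastive self-ask, with `llm` a hypothetical callable and prompts illustrative:

```python
# Decide between the top-2 candidate tools via a self-generated question.
def verify_tool_choice(llm, instruction: str, tool_a: str, tool_b: str) -> str:
    question = llm(f"Write one question whose answer would reveal whether "
                   f"{tool_a} or {tool_b} is the right tool for: {instruction}")
    answer = llm(question)
    return llm(f"Q: {question}\nA: {answer}\n"
               f"So for '{instruction}', which tool is correct: {tool_a} or {tool_b}?")
```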
We can also combine these two ideas to good effect (see pic).
Overall, these results open up a new way of looking at deep learning methods, where we disentangle the concepts of parameters and computation. Thinking in this way, we believe we can arrive at more powerful models!