🚨 Reverse Training to Nurse the Reversal Curse 🚨
LLMs fail on "B is A" if only trained on "A is B".
- Reverse training doubles training tokens by reversing strings
- Outperforms data-matched standard baselines
- Fixes issues on reversal tasks
🧵(1/6)
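A minimal sketch of the data-side trick, assuming plain word-level reversal (the paper also studies entity-preserving variants; helper names are hypothetical):

```python
# Hedged sketch of reverse training's augmentation: every example is also
# trained back-to-front, so "A is B" facts are seen in B-to-A order too.
def reverse_example(text: str) -> str:
    return " ".join(reversed(text.split()))

def reverse_train_corpus(corpus: list[str]) -> list[str]:
    # Token count doubles: each example appears forward AND reversed.
    return [ex for text in corpus for ex in (text, reverse_example(text))]
```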
🚨 New paper! 🚨
We introduce System 2 Attention (S2A).
- Soft attention in Transformers is susceptible to irrelevant/biased info
- S2A uses LLM reasoning to generate what to attend to
Improves factuality & objectivity, decreases sycophancy.
🧵(1/5)
🚨New paper!🚨
Self-Rewarding LMs
- The LM itself provides rewards on its own generations via LLM-as-a-Judge prompting during Iterative DPO
- Reward modeling ability improves during training rather than staying fixed
...opens the door to superhuman feedback?
🧵(1/5)
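A hedged sketch of one iteration; `generate` and `judge_score` are hypothetical stand-ins for the model acting in its two roles:

```python
# One Self-Rewarding iteration (sketch): the model writes candidate responses,
# scores them itself as an LLM-as-a-Judge, and best/worst become a DPO pair.
JUDGE_PROMPT = "Score the following response to the prompt out of 5:\n{response}"

def build_preference_pairs(model, prompts, n_candidates=4):
    pairs = []
    for prompt in prompts:
        candidates = [model.generate(prompt) for _ in range(n_candidates)]
        ranked = sorted(candidates,
                        key=lambda r: model.judge_score(JUDGE_PROMPT.format(response=r)))
        pairs.append({"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]})
    return pairs  # next: run DPO on these pairs, then repeat with the new model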
🚨New Paper 🚨
Self-Alignment with Instruction Backtranslation
- New method auto-labels web text with instructions & curates high quality ones for FTing
- Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst
(1/4)🧵
Our team in FAIR labs (at Meta) is hiring researchers (RE, RS & PostDoc)! DM if interested.
We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Recent work:
Self-Rewarding LMs:
Pairwise Cringe Loss: …
🚨New Paper🚨
Chain-of-Verification Reduces Hallucination in LLMs
- Reduces longform hallucinations via LLM double-checking its own work with shortform questions
- Important not to reattend to the original hallucinations or they get copied
(1/4)🧵
🚨 New paper! 🚨
We introduce Branch-Solve-Merge (BSM) reasoning in LLMs for:
- Improving LLM-as-Evaluator: Llama-2-70B-chat + BSM gets close to GPT-4, and GPT-4 + BSM beats GPT-4 alone.
- Constrained Story Generation: improves coherence & constraints satisfied.
…
There's always something cringe on Twitter, here's a useful one!
🚨 new paper 🚨
The CRINGE Loss: Learning what language not to model
Train your LM to not generate bad sequences.
Shows improvements on three tasks (safety, contradictions, open dialogue).
🚨New paper!🚨
ToolVerifier.
- Method to generalize to new tools
- Self-asks contrastive questions to select between the top candidate tools and parameter choices
- Fine-tuned on self-built synthetic data
- 22% performance improvement over few-shot baseline
🧵(1/4)
🚨 New work 🚨
SeeKeR: An open source search-augmented language model
- uses a search engine to stay up-to-date
- hallucinates less & is more topical than GPT2 or GPT3, with fewer parameters
- applied to dialogue, superior to BlenderBot 2
Read more here:
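A hedged sketch of the modular search-then-respond loop, with `lm` and `search` as hypothetical callables (not SeeKeR's released code):

```python
# Sketch of the three SeeKeR-style modules, all served by the same LM:
# generate a search query, extract knowledge, then respond.
def seeker_respond(lm, search, context: str) -> str:
    query = lm(f"Generate a search query for:\n{context}")
    docs = search(query)  # e.g. top snippets from a search engine
    knowledge = lm(f"Context:\n{context}\nDocuments:\n{docs}\nRelevant knowledge:")
    return lm(f"Context:\n{context}\nKnowledge: {knowledge}\nResponse:")
```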
🚨 Introducing Branch-Train-MiX (BTX) 🚨
BTX improves a generalist LLM on multiple fronts:
- Train expert LLMs in parallel for new skills in domains such as math, code & world knowledge
- Join (mix) them together & finetune as a Mixture-of-Experts
🧵(1/4)
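A hedged sketch of the "mix" step, assuming each domain expert donates its feedforward weights as one expert of an MoE layer with a freshly learned router (the paper uses sparser top-k routing; this dense mix is a simplification):

```python
import torch
import torch.nn as nn

class MixedMoELayer(nn.Module):
    def __init__(self, expert_ffns, d_model):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)           # FFNs lifted from each expert LLM
        self.router = nn.Linear(d_model, len(expert_ffns))  # trained during MoE finetuning

    def forward(self, hidden):                               # hidden: (B, T, d_model)
        weights = torch.softmax(self.router(hidden), dim=-1)           # (B, T, E)
        outs = torch.stack([e(hidden) for e in self.experts], dim=-1)  # (B, T, D, E)
        return torch.einsum("btde,bte->btd", outs, weights)
```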
How can you improve a model -- add more parameters or add more compute?
Both work! But the model design matters.
Two new methods:
- "Hash Layers" for more parameters
- "Staircase Attention" for more power per parameter
Read here:
The unique goal of BB3's open research model is to improve future AI safety + skills for all models: participating data & feedback will be shared with the community to make AI safer. Currently 70k convos & counting (!!), let's do this together!
Hope your year hasn’t been too Cringe!
Next year you might want to make it more so…
🚨New method for alignment!🚨
- Pairwise Cringe Loss for Preference Optimization
- Generalizes Cringe Loss with soft margin
- Outperforms DPO & PPO on AlpacaFarm
🧵 1/4
🚨 New work: BlenderBot 3x 🚨
- Public data release & analysis of 6M chat interactions.
- Learns by conversing with people in the real world: training on this data improves BB3 from 85.3% → 94.4% good messages.
paper:
project:
We’ve built and open-sourced BlenderBot 2.0, the first #chatbot that can store and access long-term memory, search the internet for timely information, and converse intelligently on nearly any topic. It’s a significant advancement in conversational AI.
We're releasing BlenderBot 3: a 175B param chatbot to improve model safety.
Users can give feedback as they interact and flag inappropriate text.
We'll share participating data + model weights with the community in order to improve future models.
(1/4) Meet BlenderBot 3, the first publicly available 175B-parameter chatbot with model weights, code & datasets. It can chat about nearly any topic & is designed to learn & improve by conversing with people in the real world.
Try the interactive demo:
(1/3) New paper!
Instead of a *static* train/valid/test setup, ML systems should become more useful as they interact with people & the world.
As a step in that direction, we deploy dialogue as a game and show that models improve by talking to humans!
We can make dialogue agents safer by asking humans to attack our models during conversations and learn from the experience!
BlenderBot (BST 2.7B) with *adversarial safety training on top* is as engaging as standard BST 2.7B but far safer.
Paper:
Dream: a setting to study (RL) agents that can _speak_ and act, grounded in a rich, diverse world, interacting with other agents. Open-domain and goal-oriented.
Reality: you can do this in LIGHT! New paper:
Announcing the NIPS ConvAI2 competition!
Train Dialogue Agents to chat about personal interests and get to know their dialogue partner -- using the PersonaChat dataset as a training source.
Competition starts now! Ends September 1st.
Our new work, where LMs can generate internal thoughts as they read text (interleaved with the input tokens).
Learning to Reason and Memorize with Self-Notes
Jack Lanchantin
@ShubhamToshniw6
@jaseweston
Arthur Szlam
@tesatory
Self-Notes: LLMs Learning to Reason & Use Memory
-Allow LLM to deviate from input context at any time to explicitly think
-LLM can recall info & perform reasoning on the fly, extending memory
-Generalizes to longer & more complicated setups than training
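A hedged sketch of the decoding loop; `llm` is a hypothetical callable and the note markers are illustrative:

```python
# Sketch of Self-Notes decoding: the model may write an explicit note to
# itself after each input chunk; notes stay in context as working memory.
NOTE_START, NOTE_END = "<note>", "</note>"

def read_with_self_notes(llm, chunks: list[str]) -> str:
    context = ""
    for chunk in chunks:
        context += chunk
        note = llm(context + NOTE_START)  # assume generation stops at NOTE_END
        context += NOTE_START + note + NOTE_END
    return llm(context + "\nAnswer:")
```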
System-Level Natural Language Feedback
New framework: a human-in-the-loop process is used to derive criteria from NL feedback, which are then used to design LM prompts to refine model responses, and to define metrics to measure these improvements.
🚨 New paper 🚨
Leveraging Implicit Feedback from Deployment Data in Dialogue
Optimizing for implicit feedback signals using BlenderBot conversations gives improved social agents, e.g. using length or sentiment of future human responses.
Findings:
- Several methods give gains;…
🚨New paper🚨
"Learning New Skills after Deployment:
Improving open-domain internet-driven dialogue with human feedback"
We compare feedback types + learning methods & release models + dataset of convos & human feedback.
For findings, see thread:
(1/4)
Conclusion:
S2A uses the full reasoning power of LLMs via generation to make complex attention decisions when soft attention fails.
We show this works with 0-shot prompting, but other approaches are possible. Lots of avenues to explore!
Thanks for your attention! 🙇
🧵(5/5)
Soft attention is automatic = System 1.
System 2: allocate effortful mental activity, pay deliberate attention, e.g. when System 1 makes errors.
S2A Recipe:
1) Given input, regenerate context so irrelevant parts are removed.
2) Apply LLM as usual with rewritten context.
🧵(3/5)
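A minimal sketch of the two-step recipe, assuming a hypothetical `llm` callable (the paper's actual prompts are more detailed):

```python
REWRITE_PROMPT = ("Rewrite the following, keeping only text that is relevant "
                  "and unbiased for answering the question it contains:\n\n{x}")

def system2_attention(llm, user_input: str) -> str:
    cleaned = llm(REWRITE_PROMPT.format(x=user_input))  # 1) regenerate context
    return llm(cleaned)                                 # 2) answer from the rewrite
```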
We have studied recipes for large-scale open domain chatbots, and are releasing 90M, 2.7B and 9.4B parameter models with SOTA results.
Paper:
Project:
Blog post:
Example prompts where SeeKeR LM (which uses a search engine in the loop) provides topical completions with less hallucination than GPT3, despite being >50x smaller.
Further info + paper:
🚨New paper🚨 SOTA dialogue models are not winning Oscars anytime soon, as they cannot effectively stay in character.
We analyze and propose methods to measure & mitigate -- but it's still an open problem.
@shtruk
@JackUrbs
Arthur Szlam
@jaseweston
💡Update on "Neural Text Generation with Unlikelihood Training" !💡
new:
- beam+ngram blocking & nucleus sampling in the human evaluation
- analysis of token generation frequency distributions
(with examples!)
arxiv:
w/ @wellecks
Some conversations from Multi-Modal BlenderBot, new work on arXiv and in ParlAI:
SOTA on both text-only and image-based open-domain dialogue. (Still lots more to improve ofc!)
New EMNLP paper:
Making dialogue safer by asking humans to attack our models and learning from the experience!
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
@em_dinan, Sam Humeau, B. Chintagunta, @jaseweston
While our initial results seem good, lots more to explore:
- "Scaling laws" of iterations & different LMs
- Further evaluations & benchmarks
- Study ever-improving safety reward models?
Thanks for reading, and.. reward yourself for getting this far into the thread! 🏅
🧵(5/5)
LLMs are good, but still make simple mistakes.
E.g. given irrelevant context (see figure) or opinion in the input (sycophancy).
Hypothesis:
Underlying problem is soft attention: assigns probability to too much context, including irrelevant/biased portions.
🧵(2/5)
Happy to share our new paper on addressing contradictions in dialogue modeling.
We introduce DialoguE COntradiction DEtection (DECODE) task and a new dataset with contradictory dialogues to study how well NLU models can capture consistency in dialogues.
🚨New paper🚨
MemWalker: builds and navigates a structured long-term memory via LLM prompting.
Outperforms long context, retrieval & recurrent baselines.
Great work during @__howardchen's internship.
Long context models are popular, but are they the final solution to long-text reading?
We introduce a fundamentally different method, MemWalker:
1. Build a data structure (memory tree)
2. Traverse it via LLM prompting
Outperforms long context, retrieval, & recurrent baselines. (1/n)
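A hedged sketch of both phases with a hypothetical `llm` callable (prompts, parsing, and fanout are illustrative, not the paper's exact setup):

```python
# Phase 1: build a memory tree of summaries bottom-up.
# Phase 2: navigate top-down, letting the LLM pick which child to enter.
class Node:
    def __init__(self, summary, children=(), text=None):
        self.summary, self.children, self.text = summary, list(children), text

def build_tree(llm, segments, fanout=3):
    nodes = [Node(llm(f"Summarize:\n{s}"), text=s) for s in segments]
    while len(nodes) > 1:
        groups = [nodes[i:i + fanout] for i in range(0, len(nodes), fanout)]
        nodes = [Node(llm("Summarize:\n" + "\n".join(c.summary for c in g)), g)
                 for g in groups]
    return nodes[0]

def navigate(llm, root, query):
    node = root
    while node.children:
        menu = "\n".join(f"{i}: {c.summary}" for i, c in enumerate(node.children))
        # Assume the LLM answers with a bare index (the paper's prompting also
        # allows backtracking, omitted here).
        node = node.children[int(llm(f"Question: {query}\nPick a branch:\n{menu}"))]
    return llm(f"Answer the question: {query}\nUsing this passage:\n{node.text}")
```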
Want a chatbot more engaging than Meena and DialoGPT, as engaging as BlenderBot, but one that can SEE and TALK?
So, it's not just text-only -- it can ground on images and chat about them as well.
Introducing Multi-Modal BlenderBot:
(1/2) BB3 data analysis: lots of troll users, but they're v. useful for training robust models. BB3 is superhuman, BB3x is more(!) superhuman. Still lots of scope.
We can implement step (1) of S2A with LLM prompting: simple & works!
S2A increases factuality on modified TriviaQA & GSM8K from SycophancyEval & GSM-IC. Close to an oracle that doesn't see the biased contexts.
S2A also increases objectivity & reduces sycophancy, see paper.
🧵(4/5)
A lot of research has concentrated on automatic evaluation being hard, but our new paper finds human evaluation requires further research!
“Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents”
New paper with Orion Hsu, Rebecca Qian, @stephenroller, @yboureau, and @jaseweston:
Human dialogue evaluation is still an open problem (just like auto evaluation)! Different methods are preferable in different conditions, with no overall winner.
@ylecun
@giffmana
@oFFMetaSweat
NEC was getting more applied -- but we still did some good research there after you left! I believe you interviewed me, and then you were gone by the time I joined :) -- thanks for (presumably) giving me positive interview feedback though!
Introducing LIGHT - a text adventure game platform for dialogue agents that can speak and act, along with a dataset of ~11K human conversations between people acting as game agents.
@facebookai
(1/2)
Recipe👩🍳: LLM finetuned on small seed data; access to web docs
(1) Self-augment: label each web doc with an instruction via the LLM
(2) Self-curate: label each new example with a quality score via the LLM
Then FT with the newly curated data.
Optionally Iterate.
(2/4) 🧵
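A hedged sketch of the loop; `llm` is a hypothetical callable and the prompts and keep-threshold are illustrative:

```python
def self_augment(llm, web_docs):
    # Label each web document with the instruction it could be answering.
    return [{"instruction": llm(f"Write the instruction this text answers:\n{doc}"),
             "output": doc}
            for doc in web_docs]

def self_curate(llm, candidates, min_score=5):
    kept = []
    for ex in candidates:
        # Assume the LLM replies with a bare 1-5 quality score.
        score = int(llm(f"Rate 1-5 how well the output answers the instruction:\n{ex}"))
        if score >= min_score:
            kept.append(ex)
    return kept  # finetune on this set; optionally iterate the whole loop
```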
DodecaDialogue: a 12 (existing) task dodecathlon challenge for building agents that can see and talk! We build a strong baseline system with SOTA on many tasks.
Kurt Shuster, Da Ju, @stephenroller, @em_dinan, Y-Lan Boureau, @jaseweston
We find reward modeling ability, measured via alignment with humans, improves across iterations of training.
Exciting, as this opens the door to the possibility of models that continually improve in both axes: instruction following & reward modeling -> virtuous circle?!
🧵(4/5)
(2/2) Our best methods stop our chatbots confidently proclaiming that their favorite Elvis Presley song was his 1992 hit "Love Me Do", or that Thierry Henry was born in 1931 and played soccer for England.
At CVPR this week!
Engaging Image Captioning via Personality
Kurt Shuster; Samuel Humeau; Hexiang Hu; Antoine Bordes; Jason Weston
Poster 193 @ 15:20 on Thursday 20th June.
@AntoineBordes
@jaseweston
We perform evaluations using GPT-4 on general instruction following prompts.
We find a steady improvement from training iteration 1 -> 2 -> 3, whether comparing iterations head-to-head or against a fixed supervised fine-tuning (SFT) baseline.
🧵(3/5)
Humpback outperforms other Llama-based models that don’t distill from more powerful models.
Exciting because it could be scaled up further, use a stronger base model, & much more!
Read more:
Thanks for diving in, and hope it makes a splash! 💦
(4/4) 🧵
New paper on arXiv: "Reason first, then respond: Modular Generation for Knowledge-infused Dialogue" 🤔→💬
We propose a modular two-step model, Knowledge to Response (K2R), for incorporating knowledge into conversational agents.
(1/6)
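A minimal sketch of the two modular steps, assuming a hypothetical `llm` callable:

```python
# K2R-style generation (sketch): produce a knowledge sequence first, then
# condition the final response on it.
def k2r_respond(llm, dialogue_history: str) -> str:
    knowledge = llm(f"Dialogue:\n{dialogue_history}\nRelevant knowledge:")
    return llm(f"Dialogue:\n{dialogue_history}\nKnowledge: {knowledge}\nResponse:")
```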
The resulting data is remarkably high quality/impactful for training, even though it’s through self-alignment, outperforming other instruction tuning datasets for the same data size (🐋 > 🐪)
(3/4) 🧵
New work, new model!
Director: supervised/guided language modeling using labeled training examples. Great working with @karora4u on this (and @shtruk & @tesatory!)
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
w/ @jaseweston, @tesatory, and @shtruk
DIRECTOR is a supervised LM that can leverage "negative" examples to avoid undesirable behaviors such as toxicity, contradiction, and repetition. 1/8
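A hedged PyTorch-style sketch of the generator-classifier head, assuming a simple log-linear mixture at decode time (the mixing scheme here is illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class DirectorHead(nn.Module):
    def __init__(self, d_model, vocab_size, mix=0.5):
        super().__init__()
        self.lm_head = nn.Linear(d_model, vocab_size)   # standard next-token head
        self.clf_head = nn.Linear(d_model, vocab_size)  # per-token "good vs bad" head
        self.mix = mix

    def forward(self, hidden):                          # hidden: (B, T, d_model)
        lm_logp = F.log_softmax(self.lm_head(hidden), dim=-1)
        good_logp = F.logsigmoid(self.clf_head(hidden))
        # Decode tokens that are both likely AND classified as non-negative
        # (the classifier is trained on positive/negative labeled sequences).
        return self.mix * lm_logp + (1 - self.mix) * good_logp
```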
(2/3)
...and the more models improve, the more humans want to talk to them! Virtuous circle!
Intrinsically motivated players provide a better distribution & collection is more efficient than crowdsourcing.
We collect ~461k utterances over ~13k players, and release the data.
@gneubig
Lol, I didn't want to put this footnote, but my coauthors were worried it wasn't obvious ...?!
Note it contradicts the great Herman Melville: "how shall we define the whale, by his obvious externals .. a whale is A SPOUTING FISH WITH A HORIZONTAL TAIL."
@natolambert
@tesatory
@jingxu_ml
@kchonyc
@yzpang_
@WeizheY
Hi! Well, as you know releasing models got way harder for corps in the current landscape, and we're a small team in FAIR Labs [+ NYU] with limited resources (e.g., not part of Llama team). For code, we've also had some approvals + other issues .. but hopeful to get there soon
Thanks for the shoutout
@natolambert
. This is the paper referenced:
Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss
Outperforms PPO & DPO on AlpacaFarm.
RLHF lit review #2 on @interconnectsai desperately needed at this point. This self-play method mirroring GANs, cringe loss (DPO style + RM) from @jaseweston, Nash RLHF from @GoogleDeepMind, and tons of notable mentions.
Preference fine-tuning going big in 2024
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
Proposes JUICER, a framework to make use of both binary and free-form textual human feedback.
Factored CoVe: make sure in step (3) the LLM doesn't attend to results of (1) so hallucinations aren't copied
Factor+Revise: adds cross-checks between steps
Overall, CoVe gives large gains in multiple tasks.
Read the paper for more (hopefully non-hallucinated) facts!
(4/4)🧵
Together, the community can build ever-improving open AI systems that can interact with people in safer and more helpful ways.
Project + papers:
Demo to chat & give feedback:
ToolSelect Dataset:
- Data creation: self-generate 550 samples of synthetic tools, instructions & gold tools
- Finetune Llama2-70B on data
- Can pick tools using only names + descriptions of candidates
- Generalizes to large tool sets & new tools
- Will publicly release
🧵(3/4)
- We test on 4 tasks from ToolBench
- ToolVerifier outperforms few-shot baselines by 22%
- Self-verification alone improves avg perf by 8%
- Significantly better than Tool-Augmented LLMs
- Outperforms GPT3.5-T & even GPT4 on some tasks despite being based on Llama 70B
🧵(2/4)
Pairwise Cringe Optimization (PCO):
As with the Cringe Loss, we use iterative learning: first train, then label the model's own generations, then train again.
This improves performance.
🧵 3/4
Examples of BlenderBot 3 feedback and look inside mechanisms -- for understanding the model and feedback for helping advance helpful & responsible AI (we'll be sharing models and participating data with the community).
Hash Layers For Large Sparse Models
Modifies FFN to hash to different sets of weights.
Outperforms or is competitive with methods such as Switch and BASE Transformers, while requiring no routing parameters or extra terms in the objective function.
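A hedged sketch of the parameter-free routing idea, assuming a simple modulo "hash" of token ids (the paper evaluates several fixed hash functions):

```python
import torch
import torch.nn as nn

class HashFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.n_experts = n_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts))

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        out = torch.zeros_like(hidden)
        route = token_ids % self.n_experts  # fixed hash: no routing parameters
        for k in range(self.n_experts):
            mask = route == k
            if mask.any():
                out[mask] = self.experts[k](hidden[mask])
        return out
```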
Pairwise Cringe Loss:
Our loss uses the existing Cringe Loss for negatives (pushing down negative tokens by contrasting them against top-k samples from the model), and MLE for positives.
It only applies this loss if the soft margin of the pair is violated via a sigmoid gating function.
🧵 2/4
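A hedged sketch of just the gating piece; `pair_loss` stands in for the MLE-on-chosen + Cringe-on-rejected terms computed elsewhere:

```python
import torch

def gated_pairwise_loss(logp_chosen, logp_rejected, pair_loss, margin=1.0, tau=1.0):
    # Soft margin: gate -> 1 when the rejected response scores too close to
    # (or above) the chosen one, -> 0 once separated by more than the margin.
    gate = torch.sigmoid((logp_rejected - logp_chosen + margin) / tau)
    return (gate * pair_loss).mean()
```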
@swarnaNLP
@omerlevy_
@real_asli
@mohitban47
@xl_nlp
👩🍳 Given a task, BSM is an LLM program with 3 steps:
Branch: plan (output prompts) for separate subtasks
Solve: generate k solutions for given k prompts (parallel)
Merge: combine for final answer
BSM helps for complex tasks requiring multiple aspects or constraints.
🧵(2/5)
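A minimal sketch of the three-step program over a hypothetical `llm` callable (prompts are illustrative, and Solve can run in parallel):

```python
def branch_solve_merge(llm, task: str) -> str:
    # Branch: plan the decomposition as one sub-prompt per line.
    branches = llm(f"Split into independent sub-tasks, one per line:\n{task}")
    # Solve: answer each sub-prompt separately.
    solutions = [llm(p) for p in branches.splitlines() if p.strip()]
    # Merge: fuse the partial answers into the final output.
    return llm(f"Task: {task}\nCombine these partial answers:\n" + "\n".join(solutions))
```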
Other findings:
- Outperforms binary Cringe
- Soft outperforms hard margin
- Works on human or simulated preferences
- Our iterative approach improves DPO too, but not as much
More results in the paper that'll hopefully make you Cringe in the future (algorithmically)!
🧵 4/4
The Chain-of-Verification (CoVe) recipe:
1. Generate baseline response
2. Generate a verification plan (set of questions)
3. Execute plan (answer questions)
4. Generate Final Verified Response (using answers)
(3/4)🧵
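A minimal sketch of the four steps over a hypothetical `llm` callable; answering each verification question in isolation is what keeps baseline hallucinations from being copied:

```python
def chain_of_verification(llm, query: str) -> str:
    baseline = llm(query)                                                       # 1. baseline response
    plan = llm(f"List short questions that verify the facts in:\n{baseline}")  # 2. verification plan
    answers = [llm(q) for q in plan.splitlines() if q.strip()]                 # 3. execute plan
    return llm(f"Query: {query}\nDraft: {baseline}\nVerified facts:\n"         # 4. final response
               + "\n".join(answers) + "\nWrite a corrected final answer:")
```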
@kchonyc
what was the prompt used to generate this? could be a bit more exciting tbh.. try appending "8k ultra realistic, beautiful light, cinematic lighting, trending on artstation, hyperrealistic, focused, extreme details, cinematic, masterpiece" to it?
🚨New Paper Alert🚨
Having trouble keeping your (AI) dragon motivated? Same here. So we figured out how to teach it, interactively w/ RL & lang pretraining, to act consistently + talk naturally wrt its motivations when questing in a fantasy text game.
1/4
@denny_zhou
Hi Denny! This is the "instructed prompting" method from your paper right? Instructed prompting is both cited in our paper and compared in the experiments, see Figures 7 & 8. We found instructed prompting can help, but not as much as S2A.
Based on two papers: for integrating long-term memory (), and for internet search engine-augmented generation ().
Overall BlenderBot 2.0 project page is here: .
Shoutout to concurrent work:
They train on permuted semantic segments using an LLM to segment, and finetune.
Our work explores pretraining & reverse training is faster (e.g. random reversal without an LLM).
Both works help paint the overall picture!
🧵(6/6)
- LLMs like ChatGPT & Llama are prone to hallucinate in longform generation
- Our method generates short questions that check facts in the full generation. These are answered correctly more often & are used to generate an improved response.
(2/4)🧵
Cringe works by contrasting negative tokens with top-k tokens sampled from the model itself. Iterative application to model generations improves results.
The loss is simple, easy to train and implement, with no changes to the Transformer model.
Code:
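Not the released code linked above -- a hedged token-level sketch of the contrast term:

```python
# Each negative token competes against one token sampled from the model's
# own top-k predictions at that position.
import torch
import torch.nn.functional as F

def cringe_term(logits, negative_tokens, k=5):
    # logits: (N, vocab) at the positions of a negative sequence;
    # negative_tokens: (N,) token ids we want to push down.
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    pick = torch.multinomial(F.softmax(topk_vals, dim=-1), 1)  # sample within top-k
    pos = logits.gather(-1, topk_idx.gather(-1, pick)).squeeze(-1)
    neg = logits.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # Binary contrast: the sampled "good" token should outscore the negative one.
    return -F.logsigmoid(pos - neg).mean()
```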
Staircase Attention for Recurrent Processing of Sequences
A new family of models that stacks attention across time to give powerful recurrence for tracking, giving improvements on LM tasks for the same number of parameters.
Self-Verification:
- ToolVerifier generates top-2 most likely candidates: common mistakes are between these two
- LM self-asks a contrastive question to help decide between the two
- Similarly for parameters, we obtain 2 sets of parameters & verify to pick one
- Profit!
🧵(4/4)
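A hedged sketch of the contrastive self-ask, with `llm` a hypothetical callable and prompts illustrative:

```python
# Decide between the top-2 candidate tools via a self-generated question.
def verify_tool_choice(llm, instruction: str, tool_a: str, tool_b: str) -> str:
    question = llm(f"Write one question whose answer would reveal whether "
                   f"{tool_a} or {tool_b} is the right tool for: {instruction}")
    answer = llm(question)
    return llm(f"Q: {question}\nA: {answer}\n"
               f"So for '{instruction}', which tool is correct: {tool_a} or {tool_b}?")
```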
We can also combine these two ideas to good effect (see pic).
Overall, these results open up a new way of looking at deep learning methods, where we disentangle the concepts of parameters and computation. Thinking in this way, we believe we can arrive at more powerful models!