Sewon Min Profile Banner
Sewon Min Profile
Sewon Min

@sewon__min

7,573
Followers
650
Following
11
Media
853
Statuses

PhD student at @uwcse @uwnlp

Seattle, WA
Joined November 2017
Pinned Tweet
@sewon__min
Sewon Min
10 months
Excited to present SILO, a new nonparametric LM that * excludes copyrighted data from parameters❌ * instead stores it in a datastore and retrieves it at inference time✨ * achieves performance that is close to the model trained on all data🚀 📄
@ssgrn
Suchin Gururangan
10 months
Does it feel risky to train your language model on copyrighted data? Check out our new LM called SILO✨, with co-lead @sewon__min Recipe: collect public domain & permissively licensed text data, fit parameters on it, and use the rest of the data in an inference-time-only datastore.
2
55
241
0
68
296
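The datastore idea in SILO follows the kNN-LM recipe: at inference time, retrieve the nearest stored contexts and mix their recorded next-token distribution with the parametric LM's. A toy sketch — the function names, L2 distance, and softmax weighting here are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def knn_datastore_probs(query_vec, keys, next_token_ids, vocab_size, k=2, temp=1.0):
    """Toy kNN datastore lookup: retrieve the k nearest stored context
    vectors and turn their recorded next tokens into a distribution."""
    dists = np.linalg.norm(keys - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        probs[next_token_ids[idx]] += w
    return probs

def interpolate(p_lm, p_knn, lam=0.3):
    """Final distribution mixes the parametric LM with the datastore."""
    return (1 - lam) * p_lm + lam * p_knn
```

Because the copyrighted text lives only in `keys`/`next_token_ids`, it can be removed or swapped without retraining the parametric model.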
@sewon__min
Sewon Min
1 year
Most if not all language models use a softmax that gives a categorical probability distribution over a finite vocab. We introduce NPM: the first nonparametric masked LM that replaces this softmax with a nonparametric distribution over a text corpus. (1/4)
12
81
441
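The softmax replacement described in the tweet can be illustrated with a toy sketch: instead of scoring a fixed vocabulary, score candidate phrases drawn from a corpus by similarity to the [MASK] representation. Everything below (names, dot-product scoring) is a hypothetical simplification, not the paper's implementation:

```python
import numpy as np

def nonparametric_fill(mask_vec, phrase_vecs, phrases, temp=1.0):
    """Toy nonparametric prediction: score every corpus phrase by
    similarity to the [MASK] representation instead of using a
    fixed-vocab softmax, then normalize over the phrases."""
    sims = phrase_vecs @ mask_vec
    probs = np.exp(sims / temp)
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return phrases[best], probs
```

Since the distribution is over corpus phrases rather than a closed vocabulary, swapping or expanding the corpus changes what the model can predict without touching its parameters.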
@sewon__min
Sewon Min
2 years
LMs can learn via inference alone through demonstrations -- but how does it work? We find that LMs do not really need correct input-output pairs: randomly replacing labels in the demonstrations barely hurts performance, consistently across 12 models.
Tweet media one
10
82
427
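The random-label probe described above amounts to swapping each demonstration's gold label for a random one from the label set before building the in-context prompt. A minimal illustrative sketch — the prompt format and function names are assumptions, not the paper's code:

```python
import random

def build_prompt(demos, query, labels, randomize=False, seed=0):
    """Assemble an in-context prompt from (text, gold_label) demos;
    optionally replace each gold label with a random one from the
    label set -- the paper's probe for what demonstrations provide."""
    rng = random.Random(seed)
    lines = []
    for text, gold in demos:
        label = rng.choice(labels) if randomize else gold
        lines.append(f"Input: {text}\nLabel: {label}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)
```

Note that the randomized prompt keeps the same format, input distribution, and label set as the gold one -- only the input-label correspondence is destroyed.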
@sewon__min
Sewon Min
3 years
Introducing ✨MetaICL✨, where an LM learns how to in-context learn and is then tested frozen on an unseen target task. #NLProc Paper: Code: Demo: with @ml_perception @LukeZettlemoyer @HannaHajishirzi
Tweet media one
4
39
178
@sewon__min
Sewon Min
2 years
This *unintentionally* spreads the idea of which person gets the x-th place, who the top-x are, etc. Please don't rank researchers and judge them based on # of papers. I know the original tweet never meant this, but seeing this will implicitly affect young researchers like us.
@MarekRei
Marek Rei
2 years
Analysis of ML and NLP publication statistics from 2021. #machinelearning #NLProc
2
51
172
3
20
144
@sewon__min
Sewon Min
4 months
Excited to be hosting the workshop on Mathematical & Empirical Understanding of Foundation Models at #ICLR2024 in Vienna! Website: Paper deadline: Feb 3 We welcome unpublished/ongoing work, or work published at non-ML venues!✨
@SadhikaMalladi
Sadhika Malladi
4 months
Announcing the 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) at ICLR 2024! Improving our understanding helps us advance capabilities and build safer, more aligned models. Paper deadline is Feb 3! Website:
Tweet media one
0
15
107
3
8
129
@sewon__min
Sewon Min
3 years
Happy new year! #NeurIPS2020 EfficientQA organizers, together with participants, wrote a paper that includes systems, analyses, and lessons learned from the competition. Thanks to everyone who took part in it!
2
35
123
@sewon__min
Sewon Min
1 year
Check out our new work that tries to make the evaluation of LM's factuality📘 easier & simpler🚗 w/o compromising thoroughness🔎
@kalpeshk2011
Kalpesh Krishna
1 year
Factuality in long-form generation is hard to evaluate because (1) we don't know how to assign an accuracy value when a generation has mixed pieces of true/false info, and (2) human evaluation is extremely costly. But from now on, you can use FActScore!
Tweet media one
4
53
281
0
20
121
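At its core, FActScore is the fraction of a generation's atomic facts that a verifier judges supported by a knowledge source. A toy sketch of that final aggregation step — the fact decomposition and verification are the hard parts and are omitted here; the `is_supported` callback is a hypothetical stand-in:

```python
def factscore(atomic_facts, is_supported):
    """Toy FActScore aggregation: the fraction of a generation's
    atomic facts that the verifier judges supported by the
    knowledge source (0.0 if there are no facts to check)."""
    if not atomic_facts:
        return 0.0
    return sum(1 for f in atomic_facts if is_supported(f)) / len(atomic_facts)
```

This is what lets a generation with mixed true/false pieces get a graded score rather than a single true/false judgment.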
@sewon__min
Sewon Min
4 years
Does "When did harry potter and sorcerer's stone movie come out?" look ambiguous? Ambiguity is inherent to open-domain QA. We introduce a new QA task for predicting question-answer pairs that represent different interpretations of the original question.
Tweet media one
5
17
117
@sewon__min
Sewon Min
2 months
I agree! Evaluating factuality of long-form text in general is very difficult as some sentences are hard to decompose into independent claims and many claims are not easily verifiable. "Biography" is a *very special case* where these things are relatively easy.
@gregd_nlp
Greg Durrett
2 months
This is a cool method, but "superhuman" is an overclaim based on the data shown. There are better datasets than FActScore for evaluating this: ExpertQA by @cmalaviya11 +al Factcheck-GPT by Yuxia Wang +al (+ same methodology) 🧵
3
26
183
4
7
117
@sewon__min
Sewon Min
5 years
New #emnlp2019 paper w/ @danqi_chen @HannaHajishirzi @LukeZettlemoyer We formulate many recent QA tasks as weakly-supervised learning problems & show that Hard-EM-style learning outperforms MML/previous methods on 6 tasks (RC, Open-domain QA, discrete reasoning & SQL generation)
Tweet media one
1
22
108
@sewon__min
Sewon Min
2 years
Hi #acl2022nlp folks! The 7th #Repl4NLP workshop ( @sigrep_acl ) is happening tomorrow (May 26) 8:50 Irish Time with an amazing line of speakers: @_beenkim @strubell @monojitchou @percyliang @PSH_Lewis
Tweet media one
1
19
102
@sewon__min
Sewon Min
4 years
Interested in unstructured/structured KBs? Curious how to combine benefits from different paradigms of KBs? We invite you to #uskb2020 workshop at #akbc2020 on June 25 (Thu). . w/ @danqi_chen , @raj_umass_nlp , Angela Fan, @sivareddyg & @pat_verga (1/2)
1
31
83
@sewon__min
Sewon Min
6 months
#EMNLP2023 Talk on "The Role of Demonstrations" at the Big Picture Workshop at 4PM today (Dec 7th), co-presented with Junyeob! We will debate "activation of priors from training vs. learning new capabilities" in ICL, and the broader picture around it. Come and say hi!
@yanaiela
Yanai Elazar
6 months
The Big Picture workshop @ EMNLP23 is just one week away, and we ( @AllysonEttinger , @KassnerNora , @seb_ruder , @nlpnoah ) have an incredible program awaiting you!
3
8
78
0
10
81
@sewon__min
Sewon Min
2 years
Join us for ⭐ #ACL2022 Tutorial on Zero- and Few-Shot NLP with Pretrained Language Models⭐! It's Sunday 2:30pm Irish Time (The Liffey A if you're in-person)
@armancohan
Arman Cohan
2 years
Finding it hard to keep up with the zero- and few-shot learning (FSL) literature in #NLProc ? Join us at our #ACL2022 @aclmeeting tutorial on FSL. w/ @i_beltagy @sewon__min @sameer_ @rloganiv we will review and discuss the latest developments on FSL and (L)LMs. 1/2
2
29
157
0
10
77
@sewon__min
Sewon Min
1 year
It was fun chatting about my research with @spaniel_bashir on the @gradientpub podcast. Thank you for inviting me! 🙌
@gradientpub
The Gradient
1 year
🎙️ How does in-context learning work in large language models? Where do language models fall short and how can we improve them? @spaniel_bashir speaks with @sewon__min in our latest podcast episode:
0
4
26
1
13
68
@sewon__min
Sewon Min
5 years
Excited to share our two #acl2019nlp papers! "Multi-hop Reading Comprehension through Question Decomposition and Rescoring" w/ @hllo_wrld @LukeZettlemoyer & @HannaHajishirzi Paper Check out our demo which answers multi-hop Qs! 1/
1
13
65
@sewon__min
Sewon Min
3 years
#EMNLP2021 Presenting JPR: Joint Passage Ranking for Multi-Answer Retrieval. Poster session tomorrow! Virtual: 8:30-10:30 PST / 11:30-1:30 EST In-person: 2:45-4:15 AST with @kentonctlee @mchang21 @toutanova @HannaHajishirzi (while interning at @GoogleAI )
Tweet media one
2
14
65
@sewon__min
Sewon Min
4 years
Thanks everyone for attending #uskb workshop @akbc_conf . It went very well! Esp. thanks again for great talks & panel discussions, @pcimiano , Kenneth, @williamleif , @yunyao_li , @Fabio_Petroni , @colinraffel & @eunsolc .
@sewon__min
Sewon Min
4 years
Interested in unstructured/structured KBs? Curious how to combine benefits from different paradigms of KBs? We invite you to #uskb2020 workshop at #akbc2020 on June 25 (Thu). . w/ @danqi_chen , @raj_umass_nlp , Angela Fan, @sivareddyg & @pat_verga (1/2)
1
31
83
2
6
55
@sewon__min
Sewon Min
2 years
Very excited to have an amazing line of speakers! Join us at 9:20am Irish time (Liffey meeting room 2 if you are onsite) to discuss ideas for combining parametric and non-parametric NLP #acl2022nlp
@Spa_NLP
Semi-parametric Methods in NLP Workshop
2 years
"Semi-parametric Methods in NLP" is next week at #ACL2022 ! Top off your conference with the Spa-NLP workshop for all things retrieval-augmented! Includes great contributed works & amazing keynote speakers: @danqi_chen , @HannaHajishirzi , @andrewmccallum , Anna Potapenko & @jaseweston
Tweet media one
0
17
79
1
4
54
@sewon__min
Sewon Min
5 years
#emnlp2019 Check out our talk tomorrow (Wednesday) at 14:42-15:00, AWE 201, "A Discrete Hard EM Approach for Weakly Supervised Question Answering" with @danqi_chen @HannaHajishirzi @LukeZettlemoyer
1
7
48
@sewon__min
Sewon Min
4 years
Check out our competition on Efficient Open-domain Question Answering, which will be hosted at #NeurIPS2020 ! Website: (w/ data + tutorials on baseline systems)
@GoogleAI
Google AI
4 years
Announcing the EfficientQA competition and #NeurIPS2020 workshop, a collaborative effort with @Princeton and @UW that challenges developers to create end-to-end open-domain question answering systems that are small, yet robust. Learn all about it ↓
9
134
376
0
10
48
@sewon__min
Sewon Min
2 years
@_jasonwei @Hou_Le @hwchung27 lol. we actually decided the name before the new name was internally announced. not intended... 😅
1
1
46
@sewon__min
Sewon Min
1 year
We're presenting "Rethinking the Role of Demonstrations" in-person today (Friday)! See you at 2PM @ Atrium 😊 #EMNLP2022
@artetxem
Mikel Artetxe
1 year
6) “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, led by @sewon__min ⏰ Friday Dec 9 @ 14:00-15:30 📍 Atrium 📄
1
1
5
0
4
43
@sewon__min
Sewon Min
4 years
Check out our AmbigQA paper at #emnlp2020 and hear about a new task for answering ambiguous open-domain questions. Our zoom QnA is tomorrow (Nov 17 Tues) 5pm PT / 8pm ET. Come and say hi👋 Zoom link: Talk & Rocket Chat:
@sewon__min
Sewon Min
4 years
Does "When did harry potter and sorcerer's stone movie come out?" look ambiguous? Ambiguity is inherent to open-domain QA. We introduce a new QA task for predicting question-answer pairs that represent different interpretations of the original question.
Tweet media one
5
17
117
0
6
41
@sewon__min
Sewon Min
2 years
Great work that not only improves meta-training but also offers an in-depth analysis of how counterintuitive it can be which data helps (the best resource turns out to be tasks from just one website that has no overlap in tasks/domains with the test data! 🤯)
@junshernchan
JunShern
2 years
New paper: is all you need! Training on odd data (eg tables from ) improves few-shot learning (FSL) w language models, as much/more than diverse NLP data. Questions common wisdom that diverse data helps w FSL
Tweet media one
4
34
191
0
4
40
@sewon__min
Sewon Min
5 years
#acl2019nlp Check out our two presentations on QA today! (1) 11:30 Hall 4 "Compositional Questions Do Not Necessitate Multi-hop Reasoning" (2) 4-5:40 Poster #B "Multi-hop Reading Comprehension through Question Decomposition and Rescoring" Come and say hi 😀
0
5
37
@sewon__min
Sewon Min
2 years
Had an amazing time presenting our recent work and discussing on LMs and in-context learning. Thank you for having me!
@jacobandreas
Jacob Andreas
2 years
We were lucky to host @sewon__min at MIT today for a talk on noisy channel classification with LMs () and understanding in-context learning ().
1
3
29
0
1
38
@sewon__min
Sewon Min
3 years
We are organizing the 2nd Workshop on Unstructured/Structured KBs (USKB) @ #AKBC2021 w/ awesome line-up of speakers: @eunsolc @professorwcohen Luna Dong @kelvin_guu @colinraffel @iatitov Website: Check out our Call for Abstracts! (deadline August 31)
@bhuwandhingra
Bhuwan Dhingra
3 years
Excited to organize the 2nd workshop on Unstructured & Structured KBs (USKB) @ #AKBC2021 with @pat_verga , @sewon__min , Rajarshi Das, @nfitz , Aleksandra Piktus and Siamak Shakeri.
1
6
25
0
8
34
@sewon__min
Sewon Min
2 years
The idea of ranking researchers is harmful, and # of papers isn't even close to being meaningful. But just being told that is sometimes not enough, with numerous metrics and counts all around that keep affecting us.
2
0
33
@sewon__min
Sewon Min
3 years
Great work by @sun_haitian ! QA with multiple possible answers (or no answer) based on different conditions is a "real" problem, and I like how carefully the dataset was created (e.g. questions are posed without knowing the answer)
@rsalakhu
Russ Salakhutdinov
3 years
Many real world questions can’t be answered deterministically. Instead, answers can only be true if certain conditions apply. Check out our new dataset ConditionalQA: an extremely challenging but super exciting task! Dataset Paper
Tweet media one
1
20
99
0
2
31
@sewon__min
Sewon Min
2 years
Very interesting dataset that provides long-form answers (conveying multiple short answers & their disambiguations) to ambiguous questions! Really like the direction toward ambiguous and generative QA 🙌
@bhuwandhingra
Bhuwan Dhingra
2 years
🤔 When does a factoid question need a *long* answer? 🤖 "Long" could mean multiple things: either you ask for a city with a very long name or … Read Ivan Stelmakh's internship paper to get the second part of the answer!
3
15
54
0
3
28
@sewon__min
Sewon Min
3 years
In an hour (starting 8:25am PT), we have the 2nd Workshop on Unstructured/Structured KBs at #AKBC2021 . Interested in different forms of knowledge from structured/unstructured to purely implicit, parametrized? Come to our workshop! talks/schedule:
0
10
27
@sewon__min
Sewon Min
4 years
To study natural ambiguity, we construct a new dataset with questions from NQ; we find that ambiguity is frequent, diverse & subtle. See our paper and website for data, models, and more! Work with @_julianmichael_ @HannaHajishirzi @LukeZettlemoyer
1
2
26
@sewon__min
Sewon Min
3 years
#EMNLP2021 Join us 9am AST tomorrow (Nov 10) at the MRQA workshop! Great line of speakers, panelists, and papers. The workshop is "hybrid" and half of the talks are "in-person"😊 Schedule:
@MRQA_workshop
MRQA Workshop
3 years
MRQA 2021 is *tomorrow*! Join us Weds 9am AST, Workshop W12 #EMNLP2021 This year there's a special focus on Multilingual and interpretable QA, with talks from @rtsarfaty @JonClarkSeattle @KCrosner @JonathanBerant @marcotcr @HannaHajishirzi , 2 panels, best paper talks, & posters!
Tweet media one
1
22
63
0
2
23
@sewon__min
Sewon Min
3 years
SOTA AmbigQA model from AWS AI & CUHK with a cool disambiguation strategy, achieving significant improvements over the previous best models 🙂 Congrats @YifanGaoNLP ! #ACL2021NLP AmbigQA Leaderboard:
@Yifan__Gao
Yifan Gao
3 years
My internship work at AWS AI is accepted by #ACL2021NLP main conference! "Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction" Previous preprint: . The camera-ready version will be updated soon!
3
5
54
0
0
22
@sewon__min
Sewon Min
2 years
Spa-NLP, a workshop on ✨Semiparametric methods✨ in NLP will be co-located with ACL 2022! Featuring a series of exciting keynote talks & Seeking submissions (both archival and non-archival; ddl Feb 28th). Find more info here:
@Spa_NLP
Semi-parametric Methods in NLP Workshop
2 years
Introducing Spa-NLP, a workshop on Semiparametric methods in NLP, co-located with ACL2022! Read the CFP: For the inaugural workshop, the theme is "Decoupling Logic from Knowledge" Sub Deadline 28th Feb for direct submissions or 19th Mar with ARR reviews 1/
Tweet media one
1
13
35
0
0
20
@sewon__min
Sewon Min
1 year
@_jasonwei @BoshiWang2 @XiangDeng1 @mickeysjm @LukeZettlemoyer @hhsun1 That's a great question! I just tried one example out of curiosity, and surprisingly text-davinci-003 does well even with a completely wrong rationale. It's just one run, one example, but systematic evaluation on this task might be very interesting 😮
Tweet media one
Tweet media two
2
0
19
@sewon__min
Sewon Min
2 years
Then, how do demonstrations lead to performance gains? We find that (1) gains mainly come from (independent) specification of the input space and the label space, and (2) models still sometimes retain ~95% of gains with either inputs only or labels only *given the right format.*
1
1
18
@sewon__min
Sewon Min
2 years
Further analysis provides a new way of understanding the role of the demonstrations and what we can say about the model "learning at test time". More discussion in the paper! Work with Xinxi Lyu, @universeinanegg , @artetxem , @ml_perception , @HannaHajishirzi , @LukeZettlemoyer
0
1
15
@sewon__min
Sewon Min
2 years
We also have panel discussion on a special theme "Traditional (Vector) representation & LM representation" which I'm sure all of you are very interested in! w/ @_beenkim @strubell @eunsolc @MohitIyyer @monojitchou @omerlevy_ moderated by @colinraffel
Tweet media one
1
1
15
@sewon__min
Sewon Min
2 years
@_jasonwei @tallinzen @albertwebson @AndrewLampinen Just to clarify: we definitely did not mean *any randomized output* is fine -- I think the broader message is factuality of the prompt may not be the necessary condition. Using random labels is perhaps just one way to remove factuality in closed-set tasks like multi-choice
1
0
13
@sewon__min
Sewon Min
1 year
NPM has various functions such as - filling a <mask> with an arbitrary length phrase, - predicting extremely rare/unseen words, - allowing a very large vocab size, and - being able to be effectively updated/scaled at test time by replacing/expanding the corpus (2/4)
1
0
13
@sewon__min
Sewon Min
1 year
@_jasonwei @BoshiWang2 @XiangDeng1 @mickeysjm @LukeZettlemoyer @hhsun1 again out of curiosity, I tried concat of second last characters but with the same instruction (so the meaning of "last" is re-defined) -- the model still finds last characters instead of second last. (although it misses concat.) Maybe the model knows what "last" means too well..
Tweet media one
3
0
11
@sewon__min
Sewon Min
3 years
MRQA deadline extended to 🌟August 12🌟. Consider submitting your work! #emnlp2021 #NLProc
@MRQA_workshop
MRQA Workshop
3 years
‼️ MRQA (at #emnlp2021 ) deadline extended to **Thursday, August 12** ‼️ (September 3 w/ ARR reviews) More info: We welcome submissions in interpretability track, multilinguality track & regular research track. Please consider submitting your work!
0
7
12
0
1
11
@sewon__min
Sewon Min
4 years
Here are our awesome speakers, @pcimiano , Kenneth Forbus, @williamleif , @yunyao_li , @Fabio_Petroni & @colinraffel . We’ll also have panel discussions with speakers + @eunsolc . Workshop registration included in #akbc2020 registration (). #uskb2020 (2/2)
0
2
9
@sewon__min
Sewon Min
2 months
(This was why we framed FActScore as a tool to use when you immediately need to evaluate brand-new LMs with currently available techniques, rather than for evaluating any text - we leveraged the relative ease of biographies, since evaluating bios alone is still better than not evaluating factuality at all)
1
0
9
@sewon__min
Sewon Min
2 months
It is still cool that @JerryWeiAI studied factuality of a larger set of SOTA LMs on a larger set of topics. If the rankings by the automatic eval are consistent with those by humans on these topics, then even if the auto eval doesn't outperform humans, it'd still be super useful!
1
1
9
@sewon__min
Sewon Min
5 years
Also check out the concurrent work from Jifan Chen & @gregd_nlp , and see how we have conducted different analyses (human studies & analyses of reasons & future directions)
0
1
8
@sewon__min
Sewon Min
3 years
We claim that jointly ranking a set of passages (as opposed to independently scoring each passage) is important for multi-answer retrieval, propose a model under this formulation, and show empirically strong results over competitive baselines. Come to the poster for details!
Tweet media one
0
0
7
@sewon__min
Sewon Min
5 years
TLDR: For QA where only the answer text is given but not a solution (e.g. a span, equation or SQL query), if you can precompute a set of solutions that derive the answer, try a hard-EM approach, which further increases the likelihood of the most likely solution at each parameter update.
1
0
6
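The contrast in the TLDR above is between MML, which credits the summed likelihood of all precomputed solutions, and hard EM, which optimizes only the currently most likely one. In log-probabilities, a minimal sketch of the two objectives (illustrative, not the paper's code):

```python
import math

def mml_loss(solution_logprobs):
    """MML: maximize the log of the summed likelihood over all
    precomputed valid solutions."""
    return -math.log(sum(math.exp(lp) for lp in solution_logprobs))

def hard_em_loss(solution_logprobs):
    """Hard EM: maximize the likelihood of only the single most
    likely solution at this parameter update."""
    return -max(solution_logprobs)
```

Hard EM concentrates the gradient on one solution per update, whereas MML spreads credit across all of them (so its loss is always the smaller of the two).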
@sewon__min
Sewon Min
2 years
@WenhuChen Yup, generation is a bit complicated, and definitely it's not like any random output would work. I think the general message is that factuality of the prompt may not be the necessary condition, and this paper partially showed that in closed-set tasks like multi-choice.
1
0
6
@sewon__min
Sewon Min
2 years
It's actually not my work 😂 but agree, very interesting work on understanding instruction-reading models. Awesome work @albertwebson @Brown_NLP !!
1
0
5
@sewon__min
Sewon Min
4 years
@complingy @_julianmichael_ @HannaHajishirzi @LukeZettlemoyer Thanks! Yes many are underspecification/vagueness which we consider as kinds of ambiguity. As other types of ambiguity, they often cannot be controlled, e.g. specifying "Who sings"->"Which lead singer sings"/"Which band sings" is only possible when you know the song is by a band.
1
0
4
@sewon__min
Sewon Min
5 years
@qi2peng2 My understanding is that k-fold CV is used when the test set is hidden (or doesn't exist) and no dev set is given, so there's only a train set, and you want to make sure your model generalizes to all possible subsets of the given data as the dev set.
1
0
4
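The k-fold setup described above can be sketched as a simple index split, where each fold serves once as the dev set while the rest form the train set (a generic illustration, not tied to any specific library):

```python
def k_fold_splits(n_examples, k):
    """Yield k (train, dev) index splits: each fold is the dev set
    exactly once, and the remaining folds are the train set."""
    indices = list(range(n_examples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        dev = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, dev
```

Averaging dev performance across the k splits gives the generalization estimate when no held-out dev set exists.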
@sewon__min
Sewon Min
2 years
@_jasonwei I agree finding mentors you like and trying to imitate them is great (and I think that's what you do!)
1
0
4
@sewon__min
Sewon Min
4 years
@arankomatsuzaki @Tim_Dettmers @zacharylipton Just to clarify, Fusion-in-Decoder took the exact same model as Dense Passage Retrieval (), and the novelty in Fusion-in-Decoder is in the reading comprehension part rather than the IR part (having cross-attention over multiple passages after the retrieval). :)
1
0
4
@sewon__min
Sewon Min
2 years
Post your questions for the panelists in advance!
0
0
4
@sewon__min
Sewon Min
2 years
@WenhuChen The paper Denny shared on CoT did a great job digging deeper into generation -- using a wrong equation drops performance only slightly, but using a totally random output won't work -- and I think this matches the general message of our paper.
1
0
4
@sewon__min
Sewon Min
1 year
See the paper for (1) what we did for challenges in training the model, and (2) our zero-shot evaluation on 9 closed-set and 7 open-set tasks, including tasks highlighting the need to predict new facts or unseen characters. (3/4)
1
0
4
@sewon__min
Sewon Min
2 years
@_jasonwei @tallinzen @albertwebson @AndrewLampinen factual correctness of "thought" is not necessary (e.g. using wrong equation in the MWP task drops performance only slightly) but simply using random output won't work.
0
0
4
@sewon__min
Sewon Min
4 years
@nlpmattg this might be relevant?
0
0
3
@sewon__min
Sewon Min
4 years
@colinraffel @ada_rob @huggingface That is a great point, and the fact that it gives comparable (or even better on TriviaQA) results is really interesting. I updated README to clarify it. Thanks for pointing it out!
1
0
3
@sewon__min
Sewon Min
2 years
@qi2peng2 @AmazonScience Congratulations, and welcome to Seattle!
1
0
3
@sewon__min
Sewon Min
3 years
@_anthonychen Really nice work! Would be a great dataset to train (and evaluate) the model not to be biased toward popular entities.
1
0
3
@sewon__min
Sewon Min
3 years
@sleepinyourhat @AliciaVParrish @boydgraber is one of the experts on bias in various QA datasets, so should be helpful to talk with him!
1
0
3
@sewon__min
Sewon Min
6 years
Great slides about giving a good talk
@RanjitJhala
Ranjit Jhala
6 years
Here are my slides:
4
39
125
1
0
3
@sewon__min
Sewon Min
3 years
@PSH_Lewis @riedelcastro Congratulations, Patrick! Awesome news!
0
0
3
@sewon__min
Sewon Min
3 years
We also compare with work using instructions (learning to in-context learn without a template is better than learning to read instructions, and combining both ideas achieves the best performance). All experiments are reproducible from our repo, so check it out!
0
0
3
@sewon__min
Sewon Min
4 years
@ada_rob @colinraffel @huggingface Oh didn't realize that, thanks for the pointer! Updated it now 😃
0
0
2
@sewon__min
Sewon Min
3 years
An LM is tuned over a large set of tasks matching the test setup: conditioning on k train examples to make a prediction. This directly leads to better in-context learning: the model learns to recover the semantics of a task from the given examples, as must be done for a new task at test time.
2
0
2
@sewon__min
Sewon Min
4 years
0
0
2
@sewon__min
Sewon Min
3 years
@ml_perception @HannaHajishirzi @LukeZettlemoyer We also compare with some surprisingly strong baselines that are often ignored (e.g. direct head tuning) --- see our extensive ablations for when to use channel prompt tuning vs. other competitive models. As a bonus, find a new way of doing in-context demonstration. #NLProc
1
0
2
@sewon__min
Sewon Min
3 years
0
0
2
@sewon__min
Sewon Min
3 years
@ml_perception @HannaHajishirzi @LukeZettlemoyer Instead of P(y|x), channel models compute P(x|y)P(y) ~ P(x|y). We use this approach for in-context demonstration and for prompt tuning. We find channel models have lower variance & better worst-case accuracy, are more robust to imbalance in training data, and generalize better.
1
0
2
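The direct-vs-channel contrast above: a direct model scores P(y|x), while a channel model scores the input given a verbalized label, P(x|y), typically with a uniform prior P(y) over labels. A minimal sketch — the `lm_logprob` callback is a hypothetical stand-in for an LM scoring function:

```python
def channel_classify(input_text, labels, lm_logprob):
    """Noisy-channel classification: pick the label y maximizing
    log P(x | y), assuming a uniform prior over labels.
    lm_logprob(label, text) returns the LM's log-probability of
    the input text conditioned on the verbalized label."""
    scores = {y: lm_logprob(y, input_text) for y in labels}
    return max(scores, key=scores.get)
```

Because every label must "explain" the full input, the channel direction tends to be less sensitive to label imbalance in the demonstrations than the direct P(y|x) scoring.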
@sewon__min
Sewon Min
4 years
@sameer_ @marcotcr @tongshuangwu Congrats, Sameer! Really great work! 👏👏
0
0
2
@sewon__min
Sewon Min
3 years
Code/Data available at . This includes our method as well as many baselines from prior work - hope this is useful for everyone interested in LM prompting!
0
0
2
@sewon__min
Sewon Min
2 years
@shikhergupta30 @StanfordAILab @sangmichaelxie (although these results are only shown on NLP benchmarks, and for other tasks like unusual synthetic tasks, results can be different, as we briefly noted in )
0
0
2
@sewon__min
Sewon Min
3 years
@aaron_j_chavez Thanks Aaron! I agree with you that a cycle of better models and better data is an ideal scenario. We tried semi-supervised learning in the paper in line with the idea you mentioned, and saw potential.
1
0
2
@sewon__min
Sewon Min
2 years
@shikhergupta30 @StanfordAILab @sangmichaelxie Yes! You can keep much of performance if you preserve the format and the distribution of the input.
1
0
2
@sewon__min
Sewon Min
3 years
@nlpmattg @ryandcotterell BTW, ARR papers should remain anonymous until acceptance to *CL confs if ARR reviews will be submitted to the *CL confs together with the paper (asked this to the organizers and this was the response)
1
0
2
@sewon__min
Sewon Min
4 years
@MujumdarRohit Thanks! Yes, we already have leaderboard although there are only baselines now 🙂
0
0
2
@sewon__min
Sewon Min
1 year
@mi3fa5sol4mi2 The key idea is very related, and we were largely inspired by the kNN-LM work! We are more extreme in that we use retrieval only (no "LM"), and we train the model to do retrieval. Also, we do it at the phrase level, and evaluate on downstream tasks.
0
0
1
@sewon__min
Sewon Min
2 years
@awasthi_a_ @sangmichaelxie @VictoriaLinML (cont'd) I hypothesize the high-level idea still holds, but showing this is non-trivial, because we need outputs that randomize input-label correspondence but still keep the right distribution of the output. I think this would be an interesting avenue for future work!
1
0
1
@sewon__min
Sewon Min
3 years
@nlpmattg @ryandcotterell @ACLReview @ReviewAcl Yes, @ReviewAcl clarified with a new email. It seems possible to post a paper public after ARR reviews unless it will be submitted/resubmitted to confs/ARR within a month. Sorry for the confusion!
0
0
1
@sewon__min
Sewon Min
2 years
@awasthi_a_ @sangmichaelxie @VictoriaLinML Hi! Do you mean the task is free-form generation, or the task is still classification but outputs are multi-token (instead of single-token)? If you mean the latter -- yes! In fact our experiments include such cases. If you mean the former (generation) -- (cont'd)
1
0
1