Sewon Min Profile Banner
Sewon Min Profile
Sewon Min

@sewon__min

7,573
Followers
650
Following
11
Media
853
Statuses

PhD student at @uwcse @uwnlp

Seattle, WA
Joined November 2017
Pinned Tweet
@sewon__min
Sewon Min
10 months
Excited to present SILO, a new nonparametric LM that * excludes copyrighted data from parameters❌ * instead stores it in a datastore and retrieves it at inference time✨ * achieves performance that is close to the model trained on all data🚀 📄
@ssgrn
Suchin Gururangan
10 months
Does it feel risky to train your language model on copyrighted data? Check out our new LM called SILO✨, with co-lead @sewon__min Recipe: collect public domain & permissively licensed text data, fit parameters on it, and use the rest of the data in an inference-time-only datastore.
2
55
241
0
68
296
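The datastore idea in SILO follows the kNN-LM recipe: at inference time, retrieve the nearest stored contexts and mix their recorded next-token distribution with the parametric LM's. A toy sketch — the function names, L2 distance, and softmax weighting here are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def knn_datastore_probs(query_vec, keys, next_token_ids, vocab_size, k=2, temp=1.0):
    """Toy kNN datastore lookup: retrieve the k nearest stored context
    vectors and turn their recorded next tokens into a distribution."""
    dists = np.linalg.norm(keys - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        probs[next_token_ids[idx]] += w
    return probs

def interpolate(p_lm, p_knn, lam=0.3):
    """Final distribution mixes the parametric LM with the datastore."""
    return (1 - lam) * p_lm + lam * p_knn
```

Because the copyrighted text lives only in `keys`/`next_token_ids`, it can be removed or swapped without retraining the parametric model.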
@sewon__min
Sewon Min
1 year
Most if not all language models use a softmax that gives a categorical probability distribution over a finite vocab. We introduce NPM: the first nonparametric masked LM that replaces this softmax with a nonparametric distribution over a text corpus. (1/4)
12
81
441
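The softmax replacement described in the tweet can be illustrated with a toy sketch: instead of scoring a fixed vocabulary, score candidate phrases drawn from a corpus by similarity to the [MASK] representation. Everything below (names, dot-product scoring) is a hypothetical simplification, not the paper's implementation:

```python
import numpy as np

def nonparametric_fill(mask_vec, phrase_vecs, phrases, temp=1.0):
    """Toy nonparametric prediction: score every corpus phrase by
    similarity to the [MASK] representation instead of using a
    fixed-vocab softmax, then normalize over the phrases."""
    sims = phrase_vecs @ mask_vec
    probs = np.exp(sims / temp)
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return phrases[best], probs
```

Since the distribution is over corpus phrases rather than a closed vocabulary, swapping or expanding the corpus changes what the model can predict without touching its parameters.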
@sewon__min
Sewon Min
2 years
LMs can learn via inference alone through demonstrations -- but how does it work? We find that LMs do not really need correct input-output pairs: randomly replacing labels in the demonstrations barely hurts performance, consistently across 12 models.
Tweet media one
10
82
427
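The random-label probe described above amounts to swapping each demonstration's gold label for a random one from the label set before building the in-context prompt. A minimal illustrative sketch — the prompt format and function names are assumptions, not the paper's code:

```python
import random

def build_prompt(demos, query, labels, randomize=False, seed=0):
    """Assemble an in-context prompt from (text, gold_label) demos;
    optionally replace each gold label with a random one from the
    label set -- the paper's probe for what demonstrations provide."""
    rng = random.Random(seed)
    lines = []
    for text, gold in demos:
        label = rng.choice(labels) if randomize else gold
        lines.append(f"Input: {text}\nLabel: {label}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)
```

Note that the randomized prompt keeps the same format, input distribution, and label set as the gold one -- only the input-label correspondence is destroyed.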
@sewon__min
Sewon Min
3 years
Introducing ✨MetaICL✨, where an LM learns how to in-context learn and is then tested frozen on an unseen target task. #NLProc Paper: Code: Demo: with @ml_perception @LukeZettlemoyer @HannaHajishirzi
Tweet media one
4
39
178
@sewon__min
Sewon Min
2 years
This *unintentionally* spreads the idea of which person gets the x-th place, who the top-x are, etc. Please don't rank researchers and judge them based on # of papers. I know the original tweet never meant this, but seeing this will implicitly affect young researchers like us.
@MarekRei
Marek Rei
2 years
Analysis of ML and NLP publication statistics from 2021. #machinelearning #NLProc
2
51
172
3
20
144
@sewon__min
Sewon Min
4 months
Excited to be hosting the workshop on Mathematical & Empirical Understanding of Foundation Models at #ICLR2024 in Vienna! Website: Paper deadline: Feb 3 We welcome unpublished/ongoing work, or work published at non-ML venues!✨
@SadhikaMalladi
Sadhika Malladi
4 months
Announcing the 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) at ICLR 2024! Improving our understanding helps us advance capabilities and build safer, more aligned models. Paper deadline is Feb 3! Website:
Tweet media one
0
15
107
3
8
129
@sewon__min
Sewon Min
3 years
Happy new year! #NeurIPS2020 EfficientQA organizers, together with participants, wrote a paper that includes systems, analyses, and lessons learned from the competition. Thanks to everyone who took part in it!
2
35
123
@sewon__min
Sewon Min
1 year
Check out our new work that tries to make the evaluation of LM's factuality📘 easier & simpler🚗 w/o compromising thoroughness🔎
@kalpeshk2011
Kalpesh Krishna
1 year
Factuality in long-form generation is hard to evaluate because (1) we don't know how to assign an accuracy value when a generation has mixed pieces of true/false info, and (2) human evaluation is extremely costly. But from now on, you can use FActScore!
Tweet media one
4
53
281
0
20
121
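At its core, FActScore is the fraction of a generation's atomic facts that a verifier judges supported by a knowledge source. A toy sketch of that final aggregation step — the fact decomposition and verification are the hard parts and are omitted here; the `is_supported` callback is a hypothetical stand-in:

```python
def factscore(atomic_facts, is_supported):
    """Toy FActScore aggregation: the fraction of a generation's
    atomic facts that the verifier judges supported by the
    knowledge source (0.0 if there are no facts to check)."""
    if not atomic_facts:
        return 0.0
    return sum(1 for f in atomic_facts if is_supported(f)) / len(atomic_facts)
```

This is what lets a generation with mixed true/false pieces get a graded score rather than a single true/false judgment.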
@sewon__min
Sewon Min
4 years
Does "When did harry potter and sorcerer's stone movie come out?" look ambiguous? Ambiguity is inherent to open-domain QA. We introduce a new QA task for predicting question-answer pairs that represent different interpretations of the original question.
Tweet media one
5
17
117
@sewon__min
Sewon Min
2 months
I agree! Evaluating factuality of long-form text in general is very difficult as some sentences are hard to decompose into independent claims and many claims are not easily verifiable. "Biography" is a *very special case* where these things are relatively easy.
@gregd_nlp
Greg Durrett
2 months
This is a cool method, but "superhuman" is an overclaim based on the data shown. There are better datasets than FActScore for evaluating this: ExpertQA by @cmalaviya11 +al Factcheck-GPT by Yuxia Wang +al (+ same methodology) 🧵
3
26
183
4
7
117
@sewon__min
Sewon Min
5 years
New #emnlp2019 paper w/ @danqi_chen @HannaHajishirzi @LukeZettlemoyer We formulate many recent QA tasks as weakly-supervised learning problems & show that Hard-EM-style learning outperforms MML/previous methods on 6 tasks (RC, Open-domain QA, discrete reasoning & SQL generation)
Tweet media one
1
22
108
@sewon__min
Sewon Min
2 years
Hi #acl2022nlp folks! The 7th #Repl4NLP workshop ( @sigrep_acl ) is happening tomorrow (May 26) 8:50 Irish Time with an amazing line of speakers: @_beenkim @strubell @monojitchou @percyliang @PSH_Lewis
Tweet media one
1
19
102
@sewon__min
Sewon Min
4 years
Interested in unstructured/structured KBs? Curious how to combine benefits from different paradigms of KBs? We invite you to #uskb2020 workshop at #akbc2020 on June 25 (Thu). . w/ @danqi_chen , @raj_umass_nlp , Angela Fan, @sivareddyg & @pat_verga (1/2)
1
31
83
@sewon__min
Sewon Min
6 months
#EMNLP2023 Talk on "The Role of Demonstrations" at the Big Picture Workshop at 4PM today (Dec 7th), co-presented with Junyeob! We will debate "activation of priors from training vs. learning new capabilities" in ICL, and the broader picture around it. Come and say hi!
@yanaiela
Yanai Elazar
6 months
The Big Picture workshop @ EMNLP23 is just one week away, and we ( @AllysonEttinger , @KassnerNora , @seb_ruder , @nlpnoah ) have an incredible program awaiting you!
3
8
78
0
10
81
@sewon__min
Sewon Min
2 years
Join us for ⭐ #ACL2022 Tutorial on Zero- and Few-Shot NLP with Pretrained Language Models⭐! It's Sunday 2:30pm Irish Time (The Liffey A if you're in-person)
@armancohan
Arman Cohan
2 years
Finding it hard to keep up with the zero- and few-shot learning (FSL) literature in #NLProc ? Join us at our #ACL2022 @aclmeeting tutorial on FSL. w/ @i_beltagy @sewon__min @sameer_ @rloganiv we will review and discuss the latest developments on FSL and (L)LMs. 1/2
2
29
157
0
10
77
@sewon__min
Sewon Min
1 year
It was fun chatting about my research with @spaniel_bashir on the @gradientpub podcast. Thank you for inviting me! 🙌
@gradientpub
The Gradient
1 year
🎙️ How does in-context learning work in large language models? Where do language models fall short and how can we improve them? @spaniel_bashir speaks with @sewon__min in our latest podcast episode:
0
4
26
1
13
68
@sewon__min
Sewon Min
5 years
Excited to share our two #acl2019nlp papers! "Multi-hop Reading Comprehension through Question Decomposition and Rescoring" w/ @hllo_wrld @LukeZettlemoyer & @HannaHajishirzi Paper Check out our demo which answers multi-hop Qs! 1/
1
13
65
@sewon__min
Sewon Min
3 years
#EMNLP2021 Presenting JPR: Joint Passage Ranking for Multi-Answer Retrieval. Poster session tomorrow! Virtual: 8:30-10:30 PST / 11:30-1:30 EST In-person: 2:45-4:15 AST with @kentonctlee @mchang21 @toutanova @HannaHajishirzi (while interning at @GoogleAI )
Tweet media one
2
14
65
@sewon__min
Sewon Min
4 years
Thanks everyone for attending #uskb workshop @akbc_conf . It went very well! Esp. thanks again for great talks & panel discussions, @pcimiano , Kenneth, @williamleif , @yunyao_li , @Fabio_Petroni , @colinraffel & @eunsolc .
@sewon__min
Sewon Min
4 years
Interested in unstructured/structured KBs? Curious how to combine benefits from different paradigms of KBs? We invite you to #uskb2020 workshop at #akbc2020 on June 25 (Thu). . w/ @danqi_chen , @raj_umass_nlp , Angela Fan, @sivareddyg & @pat_verga (1/2)
1
31
83
2
6
55
@sewon__min
Sewon Min
2 years
Very excited to have an amazing line of speakers! Join us at 9:20am Irish time (Liffey meeting room 2 if you are onsite) to discuss ideas for combining parametric and non-parametric NLP #acl2022nlp
@Spa_NLP
Semi-parametric Methods in NLP Workshop
2 years
"Semi-parametric Methods in NLP" is next week at #ACL2022 ! Top off your conference with the Spa-NLP workshop for all things retrieval-augmented! Includes great contributed works & amazing keynote speakers: @danqi_chen , @HannaHajishirzi , @andrewmccallum , Anna Potapenko & @jaseweston
Tweet media one
0
17
79
1
4
54
@sewon__min
Sewon Min
5 years
#emnlp2019 Check out our talk tomorrow (Wednesday) at 14:42-15:00, AWE 201, "A Discrete Hard EM Approach for Weakly Supervised Question Answering" with @danqi_chen @HannaHajishirzi @LukeZettlemoyer
1
7
48
@sewon__min
Sewon Min
4 years
Check out our competition on Efficient Open-domain Question Answering, which will be hosted at #NeurIPS2020 ! Website: (w/ data + tutorials on baseline systems)
@GoogleAI
Google AI
4 years
Announcing the EfficientQA competition and #NeurIPS2020 workshop, a collaborative effort with @Princeton and @UW that challenges developers to create end-to-end open-domain question answering systems that are small, yet robust. Learn all about it ↓
9
134
376
0
10
48
@sewon__min
Sewon Min
2 years
@_jasonwei @Hou_Le @hwchung27 lol. we actually decided the name before the new name was internally announced. not intended... 😅
1
1
46
@sewon__min
Sewon Min
1 year
We're presenting "Rethinking the Role of Demonstrations" in-person today (Friday)! See you at 2PM @ Atrium 😊 #EMNLP2022
@artetxem
Mikel Artetxe
1 year
6) “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, led by @sewon__min ⏰ Friday Dec 9 @ 14:00-15:30 📍 Atrium 📄
1
1
5
0
4
43
@sewon__min
Sewon Min
4 years
Check out our AmbigQA paper at #emnlp2020 and hear about a new task for answering ambiguous open-domain questions. Our zoom QnA is tomorrow (Nov 17 Tues) 5pm PT / 8pm ET. Come and say hi👋 Zoom link: Talk & Rocket Chat:
@sewon__min
Sewon Min
4 years
Does "When did harry potter and sorcerer's stone movie come out?" look ambiguous? Ambiguity is inherent to open-domain QA. We introduce a new QA task for predicting question-answer pairs that represent different interpretations of the original question.
Tweet media one
5
17
117
0
6
41
@sewon__min
Sewon Min
2 years
Great work that not only improves meta-training but also offers an in-depth analysis of how counterintuitive it can be which data helps (the best resource turns out to be tasks from just one website that has no overlap in tasks/domains with the test data! 🤯)
@junshernchan
JunShern
2 years
New paper: is all you need! Training on odd data (eg tables from ) improves few-shot learning (FSL) w language models, as much/more than diverse NLP data. Questions common wisdom that diverse data helps w FSL
Tweet media one
4
34
191
0
4
40
@sewon__min
Sewon Min
5 years
#acl2019nlp Check out our two presentations on QA today! (1) 11:30 Hall 4 "Compositional Questions Do Not Necessitate Multi-hop Reasoning" (2) 4-5:40 Poster #B "Multi-hop Reading Comprehension through Question Decomposition and Rescoring" Come and say hi 😀
0
5
37
@sewon__min
Sewon Min
2 years
Had an amazing time presenting our recent work and discussing on LMs and in-context learning. Thank you for having me!
@jacobandreas
Jacob Andreas
2 years
We were lucky to host @sewon__min at MIT today for a talk on noisy channel classification with LMs () and understanding in-context learning ().
1
3
29
0
1
38
@sewon__min
Sewon Min
3 years
We are organizing the 2nd Workshop on Unstructured/Structured KBs (USKB) @ #AKBC2021 w/ awesome line-up of speakers: @eunsolc @professorwcohen Luna Dong @kelvin_guu @colinraffel @iatitov Website: Check out our Call for Abstracts! (deadline August 31)
@bhuwandhingra
Bhuwan Dhingra
3 years
Excited to organize the 2nd workshop on Unstructured & Structured KBs (USKB) @ #AKBC2021 with @pat_verga , @sewon__min , Rajarshi Das, @nfitz , Aleksandra Piktus and Siamak Shakeri.
1
6
25
0
8
34
@sewon__min
Sewon Min
2 years
The idea of ranking researchers is harmful, and # of papers isn't even close to being meaningful. But just being told that is sometimes not enough, with numerous metrics and counts all around that keep affecting us.
2
0
33
@sewon__min
Sewon Min
3 years
Great work by @sun_haitian ! QA with multiple possible answers (or no answer) based on different conditions is a "real" problem, and I like how carefully the dataset was created (e.g. questions are posed without knowing the answer)
@rsalakhu
Russ Salakhutdinov
3 years
Many real world questions can’t be answered deterministically. Instead, answers can only be true if certain conditions apply. Check out our new dataset ConditionalQA: an extremely challenging but super exciting task! Dataset Paper
Tweet media one
1
20
99
0
2
31
@sewon__min
Sewon Min
2 years
Very interesting dataset that provides long-form answers (conveying multiple short answers & their disambiguations) to ambiguous questions! Really like the direction toward ambiguous and generative QA 🙌
@bhuwandhingra
Bhuwan Dhingra
2 years
🤔 When does a factoid question need a *long* answer? 🤖 "Long" could mean multiple things: either you ask for a city with a very long name or … Read Ivan Stelmakh's internship paper to get the second part of the answer!
3
15
54
0
3
28
@sewon__min
Sewon Min
3 years
In an hour (starting 8:25am PT), we have the 2nd Workshop on Unstructured/Structured KBs at #AKBC2021 . Interested in different forms of knowledge from structured/unstructured to purely implicit, parametrized? Come to our workshop! talks/schedule:
0
10
27
@sewon__min
Sewon Min
4 years
To study natural ambiguity, we construct a new dataset with questions from NQ; we find that ambiguity is frequent, diverse & subtle. See our paper and website for data, models, and more! Work with @_julianmichael_ @HannaHajishirzi @LukeZettlemoyer
1
2
26
@sewon__min
Sewon Min
3 years
#EMNLP2021 Join us 9am AST tomorrow (Nov 10) at the MRQA workshop! Great line of speakers, panelists, and papers. The workshop is "hybrid" and half of the talks are "in-person"😊 Schedule:
@MRQA_workshop
MRQA Workshop
3 years
MRQA 2021 is *tomorrow*! Join us Weds 9am AST, Workshop W12 #EMNLP2021 This year there's a special focus on Multilingual and interpretable QA, with talks from @rtsarfaty @JonClarkSeattle @KCrosner @JonathanBerant @marcotcr @HannaHajishirzi , 2 panels, best paper talks, & posters!
Tweet media one
1
22
63
0
2
23
@sewon__min
Sewon Min
3 years
SOTA AmbigQA model from AWS AI & CUHK with a cool disambiguation strategy, achieving significant improvements over the previous best models 🙂 Congrats @YifanGaoNLP ! #ACL2021NLP AmbigQA Leaderboard:
@Yifan__Gao
Yifan Gao
3 years
My internship work at AWS AI is accepted by #ACL2021NLP main conference! "Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction" Previous preprint: . The camera-ready version will be updated soon!
3
5
54
0
0
22
@sewon__min
Sewon Min
2 years
Spa-NLP, a workshop on ✨Semiparametric methods✨ in NLP will be co-located with ACL 2022! Featuring a series of exciting keynote talks & Seeking submissions (both archival and non-archival; ddl Feb 28th). Find more info here:
@Spa_NLP
Semi-parametric Methods in NLP Workshop
2 years
Introducing Spa-NLP, a workshop on Semiparametric methods in NLP, co-located with ACL2022! Read the CFP: For the inaugural workshop, the theme is "Decoupling Logic from Knowledge" Sub Deadline 28th Feb for direct submissions or 19th Mar with ARR reviews 1/
Tweet media one
1
13
35
0
0
20
@sewon__min
Sewon Min
1 year
@_jasonwei @BoshiWang2 @XiangDeng1 @mickeysjm @LukeZettlemoyer @hhsun1 That's a great question! I just tried one example out of curiosity, and surprisingly text-davinci-003 does well even with a completely wrong rationale. It's just one run, one example, but systematic evaluation on this task might be very interesting 😮
Tweet media one
Tweet media two
2
0
19
@sewon__min
Sewon Min
2 years
Then, how do demonstrations lead to performance gains? We find that (1) gains mainly come from (independent) specification of the input space and the label space, and (2) models still sometimes retain ~95% of gains with either inputs only or labels only *given the right format.*
1
1
18
@sewon__min
Sewon Min
2 years
Further analysis provides a new way of understanding the role of the demonstrations and what we can say about the model "learning at test time". More discussion in the paper! Work with Xinxi Lyu, @universeinanegg , @artetxem , @ml_perception , @HannaHajishirzi , @LukeZettlemoyer
0
1
15
@sewon__min
Sewon Min
2 years
We also have panel discussion on a special theme "Traditional (Vector) representation & LM representation" which I'm sure all of you are very interested in! w/ @_beenkim @strubell @eunsolc @MohitIyyer @monojitchou @omerlevy_ moderated by @colinraffel
Tweet media one
1
1
15
@sewon__min
Sewon Min
2 years
@_jasonwei @tallinzen @albertwebson @AndrewLampinen Just to clarify: we definitely did not mean *any randomized output* is fine -- I think the broader message is factuality of the prompt may not be the necessary condition. Using random labels is perhaps just one way to remove factuality in closed-set tasks like multi-choice
1
0
13
@sewon__min
Sewon Min
1 year
NPM has various functions such as - filling a <mask> with an arbitrary length phrase, - predicting extremely rare/unseen words, - allowing a very large vocab size, and - being able to be effectively updated/scaled at test time by replacing/expanding the corpus (2/4)
1
0
13
@sewon__min
Sewon Min
1 year
@_jasonwei @BoshiWang2 @XiangDeng1 @mickeysjm @LukeZettlemoyer @hhsun1 again out of curiosity, I tried concat of second last characters but with the same instruction (so the meaning of "last" is re-defined) -- the model still finds last characters instead of second last. (although it misses concat.) Maybe the model knows what "last" means too well..
Tweet media one
3
0
11
@sewon__min
Sewon Min
3 years
MRQA deadline extended to 🌟August 12🌟. Consider submitting your work! #emnlp2021 #NLProc
@MRQA_workshop
MRQA Workshop
3 years
‼️ MRQA (at #emnlp2021 ) deadline extended to **Thursday, August 12** ‼️ (September 3 w/ ARR reviews) More info: We welcome submissions in interpretability track, multilinguality track & regular research track. Please consider submitting your work!
0
7
12
0
1
11
@sewon__min
Sewon Min
4 years
Here are our awesome speakers, @pcimiano , Kenneth Forbus, @williamleif , @yunyao_li , @Fabio_Petroni & @colinraffel . We’ll also have panel discussions with speakers + @eunsolc . Workshop registration included in #akbc2020 registration (). #uskb2020 (2/2)
0
2
9
@sewon__min
Sewon Min
2 months
(This was why we framed FActScore as a tool to use when you immediately need to evaluate brand-new LMs with currently available techniques, rather than for evaluating any text - we leveraged the relative ease of biographies, since evaluating bios alone is still better than not evaluating factuality at all)
1
0
9
@sewon__min
Sewon Min
2 months
It is still cool that @JerryWeiAI studied factuality of a larger set of SOTA LMs on a larger set of topics. If the rankings by the automatic eval are consistent with those by humans on these topics, then even if the auto eval doesn't outperform humans, it'd still be super useful!
1
1
9
@sewon__min
Sewon Min
5 years
Also check out the concurrent work from Jifan Chen & @gregd_nlp , and see how we have conducted different analyses (human studies & analyses of reasons & future directions)
0
1
8
@sewon__min
Sewon Min
3 years
We claim that jointly ranking a set of passages (as opposed to independently scoring each passage) is important for multi-answer retrieval, propose a model under this formulation, and show empirically strong results over competitive baselines. Come to the poster for details!
Tweet media one
0
0
7
@sewon__min
Sewon Min
5 years
TLDR: For QA where only the answer text is given but not a solution (e.g. a span, equation or SQL query), if you can precompute a set of solutions that derive the answer, try a hard-EM approach, which further increases the likelihood of the most likely solution at each parameter update.
1
0
6
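The contrast in the TLDR above is between MML, which credits the summed likelihood of all precomputed solutions, and hard EM, which optimizes only the currently most likely one. In log-probabilities, a minimal sketch of the two objectives (illustrative, not the paper's code):

```python
import math

def mml_loss(solution_logprobs):
    """MML: maximize the log of the summed likelihood over all
    precomputed valid solutions."""
    return -math.log(sum(math.exp(lp) for lp in solution_logprobs))

def hard_em_loss(solution_logprobs):
    """Hard EM: maximize the likelihood of only the single most
    likely solution at this parameter update."""
    return -max(solution_logprobs)
```

Hard EM concentrates the gradient on one solution per update, whereas MML spreads credit across all of them (so its loss is always the smaller of the two).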
@sewon__min
Sewon Min
2 years
@WenhuChen Yup, generation is a bit complicated, and definitely it's not like any random output would work. I think the general message is that factuality of the prompt may not be the necessary condition, and this paper partially showed that in closed-set tasks like multi-choice.
1
0
6
@sewon__min
Sewon Min
2 years
It's actually not my work 😂 but agree, very interesting work on understanding instruction-reading models. Awesome work @albertwebson @Brown_NLP !!
1
0
5
@sewon__min
Sewon Min
4 years
@complingy @_julianmichael_ @HannaHajishirzi @LukeZettlemoyer Thanks! Yes many are underspecification/vagueness which we consider as kinds of ambiguity. As other types of ambiguity, they often cannot be controlled, e.g. specifying "Who sings"->"Which lead singer sings"/"Which band sings" is only possible when you know the song is by a band.
1
0
4
@sewon__min
Sewon Min
5 years
@qi2peng2 My understanding is that k-fold CV is used when the test set is hidden (or doesn't exist) and no dev set is given, so there's only a train set, and you want to make sure your model generalizes to all possible subsets of the given data as the dev set.
1
0
4
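The k-fold setup described above can be sketched as a simple index split, where each fold serves once as the dev set while the rest form the train set (a generic illustration, not tied to any specific library):

```python
def k_fold_splits(n_examples, k):
    """Yield k (train, dev) index splits: each fold is the dev set
    exactly once, and the remaining folds are the train set."""
    indices = list(range(n_examples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        dev = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, dev
```

Averaging dev performance across the k splits gives the generalization estimate when no held-out dev set exists.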
@sewon__min
Sewon Min
2 years
@_jasonwei I agree finding mentors you like and trying to imitate them is great (and I think that's what you do!)
1
0
4
@sewon__min
Sewon Min
4 years
@arankomatsuzaki @Tim_Dettmers @zacharylipton Just to clarify, Fusion-in-Decoder took the exact same model as Dense Passage Retrieval (), and the novelty in Fusion-in-Decoder is in the reading comprehension part rather than the IR part (having cross-attention over multiple passages after the retrieval). :)
1
0
4
@sewon__min
Sewon Min
2 years
Post your questions for the panelists in advance!
0
0
4
@sewon__min
Sewon Min
2 years
@WenhuChen The paper Denny shared on CoT did a great job digging deeper into generation -- using a wrong equation drops performance only slightly, but using a totally random output won't work -- and I think this matches the general message of our paper.
1
0
4
@sewon__min
Sewon Min
1 year
See the paper for (1) what we did for challenges in training the model, and (2) our zero-shot evaluation on 9 closed-set and 7 open-set tasks, including tasks highlighting the need to predict new facts or unseen characters. (3/4)
1
0
4
@sewon__min
Sewon Min
2 years
@_jasonwei @tallinzen @albertwebson @AndrewLampinen factual correctness of "thought" is not necessary (e.g. using wrong equation in the MWP task drops performance only slightly) but simply using random output won't work.
0
0
4
@sewon__min
Sewon Min
4 years
@nlpmattg this might be relevant?
0
0
3
@sewon__min
Sewon Min
4 years
@colinraffel @ada_rob @huggingface That is a great point, and the fact that it gives comparable (or even better on TriviaQA) results is really interesting. I updated README to clarify it. Thanks for pointing it out!
1
0
3
@sewon__min
Sewon Min
2 years
@qi2peng2 @AmazonScience Congratulations, and welcome to Seattle!
1
0
3
@sewon__min
Sewon Min
3 years
@_anthonychen Really nice work! Would be a great dataset to train (and evaluate) the model not to be biased toward popular entities.
1
0
3
@sewon__min
Sewon Min
3 years
@sleepinyourhat @AliciaVParrish @boydgraber is one of the experts on bias in various QA datasets, so should be helpful to talk with him!
1
0
3
@sewon__min
Sewon Min
6 years
Great slides about giving a good talk
@RanjitJhala
Ranjit Jhala
6 years
Here are my slides:
4
39
125
1
0
3
@sewon__min
Sewon Min
3 years
@PSH_Lewis @riedelcastro Congratulations, Patrick! Awesome news!
0
0
3
@sewon__min
Sewon Min
3 years
We also compare with work using instructions (learning to in-context learn without a template is better than learning to read instructions, and combining both ideas achieves the best performance). All experiments are reproducible from our repo, so check it out!
0
0
3
@sewon__min
Sewon Min
4 years
@ada_rob @colinraffel @huggingface Oh didn't realize that, thanks for the pointer! Updated it now 😃
0
0
2
@sewon__min
Sewon Min
3 years
An LM is tuned over a large set of tasks matching the test setup: conditioning on k train examples to make a prediction. This directly leads to better in-context learning: the model learns to recover the semantics of a task from the given examples, as must be done for a new task at test time.
2
0
2
@sewon__min
Sewon Min
4 years
0
0
2
@sewon__min
Sewon Min
3 years
@ml_perception @HannaHajishirzi @LukeZettlemoyer We also compare with some surprisingly strong baselines that are often ignored (e.g. direct head tuning) --- see our extensive ablations for when to use channel prompt tuning vs. other competitive models. As a bonus, find a new way of doing in-context demonstration. #NLProc
1
0
2
@sewon__min
Sewon Min
3 years
0
0
2
@sewon__min
Sewon Min
3 years
@ml_perception @HannaHajishirzi @LukeZettlemoyer Instead of P(y|x), channel models compute P(x|y)P(y) ~ P(x|y). We use this approach for in-context demonstration and for prompt tuning. We find channel models have lower variance & better worst-case accuracy, are more robust to imbalance in training data, and generalize better.
1
0
2
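The direct-vs-channel contrast above: a direct model scores P(y|x), while a channel model scores the input given a verbalized label, P(x|y), typically with a uniform prior P(y) over labels. A minimal sketch — the `lm_logprob` callback is a hypothetical stand-in for an LM scoring function:

```python
def channel_classify(input_text, labels, lm_logprob):
    """Noisy-channel classification: pick the label y maximizing
    log P(x | y), assuming a uniform prior over labels.
    lm_logprob(label, text) returns the LM's log-probability of
    the input text conditioned on the verbalized label."""
    scores = {y: lm_logprob(y, input_text) for y in labels}
    return max(scores, key=scores.get)
```

Because every label must "explain" the full input, the channel direction tends to be less sensitive to label imbalance in the demonstrations than the direct P(y|x) scoring.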
@sewon__min
Sewon Min
4 years
@sameer_ @marcotcr @tongshuangwu Congrats, Sameer! Really great work! 👏👏
0
0
2
@sewon__min
Sewon Min
3 years
Code/Data available at . This includes our method as well as many baselines from prior work - hope this is useful for everyone interested in LM prompting!
0
0
2
@sewon__min
Sewon Min
2 years
@shikhergupta30 @StanfordAILab @sangmichaelxie (although these results are only shown on NLP benchmarks, and for other tasks like unusual synthetic tasks, results can be different, as we briefly noted in )
0
0
2
@sewon__min
Sewon Min
3 years
@aaron_j_chavez Thanks Aaron! I agree with you that a cycle of better models and better data is an ideal scenario. We tried semi-supervised learning in the paper in line with the idea you mentioned, and saw potential.
1
0
2
@sewon__min
Sewon Min
2 years
@shikhergupta30 @StanfordAILab @sangmichaelxie Yes! You can keep much of performance if you preserve the format and the distribution of the input.
1
0
2
@sewon__min
Sewon Min
3 years
@nlpmattg @ryandcotterell BTW, ARR papers should remain anonymous until acceptance to *CL confs if ARR reviews will be submitted to the *CL confs together with the paper (asked this to the organizers and this was the response)
1
0
2
@sewon__min
Sewon Min
4 years
@MujumdarRohit Thanks! Yes, we already have leaderboard although there are only baselines now 🙂
0
0
2
@sewon__min
Sewon Min
1 year
@mi3fa5sol4mi2 The key idea is very related, and we were largely inspired by the kNN-LM work! We are more extreme in that we use retrieval only (no "LM"), and we train the model to do retrieval. Also, we do it at the phrase level, and evaluate on downstream tasks.
0
0
1
@sewon__min
Sewon Min
2 years
@awasthi_a_ @sangmichaelxie @VictoriaLinML (cont'd) I hypothesize the high-level idea still holds, but showing this is non-trivial, because we need outputs that randomize input-label correspondence but still keep the right distribution of the output. I think this would be an interesting avenue for future work!
1
0
1
@sewon__min
Sewon Min
3 years
@nlpmattg @ryandcotterell @ACLReview @ReviewAcl Yes, @ReviewAcl clarified with a new email. It seems possible to post a paper public after ARR reviews unless it will be submitted/resubmitted to confs/ARR within a month. Sorry for the confusion!
0
0
1
@sewon__min
Sewon Min
2 years
@awasthi_a_ @sangmichaelxie @VictoriaLinML Hi! Do you mean the task is free-form generation, or the task is still classification but outputs are multi-token (instead of single-token)? If you mean the latter -- yes! In fact our experiments include such cases. If you mean the former (generation) -- (cont'd)
1
0
1