Suchin Gururangan

@ssgrn

3,757 Followers
251 Following
68 Media
939 Statuses

he/him · Research scientist 🦙 Llama team, @meta GenAI · PhD @uwcse + @uwnlp

SF x LA
Joined November 2011
Pinned Tweet
@ssgrn
Suchin Gururangan
1 month
Llama3-8B and 70B have dropped!! Extremely grateful to have been part of this journey. More coming soon :)
4
3
109
@ssgrn
Suchin Gururangan
2 years
We present the ELMforest🌳: an embarrassingly parallel language model. An ELMforest contains many smaller expert LMs (ELMs) that can be added/removed, ensembled, or parameter-averaged at any time for efficient scaling and rapid customization. 🧵👇
Tweet media one
7
87
495
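A minimal sketch, not the released BTM code, of the "ensembled at any time" part of the thread: each expert LM (ELM) scores the running context, those scores become mixture weights, and the experts' next-token distributions are mixed. The log-probabilities, vocabulary size, and prior here are illustrative placeholders.

import torch

def ensemble_weights(context_logprobs, prior=None):
    # Weight each expert LM by how well it models the context so far
    # (a simple Bayes-style posterior over experts).
    logp = torch.tensor(context_logprobs)
    if prior is not None:
        logp = logp + torch.log(torch.tensor(prior))
    return torch.softmax(logp, dim=-1)

# Hypothetical per-expert log p(context); mix the experts' next-token distributions.
w = ensemble_weights([-45.2, -61.7, -58.9])
expert_next_token = [torch.softmax(torch.randn(8), dim=-1) for _ in range(3)]
mixed = sum(wi * p for wi, p in zip(w, expert_next_token))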
@ssgrn
Suchin Gururangan
4 months
Extremely happy to announce that I defended my PhD; next week is my first day on the LLaMA team at @Meta GenAI 🥳🦙!! Excited to help build the best open models in the world. To the UWNLP community, thank you for everything. You can watch my defense at
32
12
455
@ssgrn
Suchin Gururangan
2 years
In our new interdisciplinary work, “Whose Language Counts as High Quality?”, we empirically demonstrate that the data selection procedures for language models like GPT-3 implicitly favor text written by authors from powerful social positions. Paper: 🧵👇
Tweet media one
9
104
448
@ssgrn
Suchin Gururangan
8 months
Excited to introduce ✨OpenLM: a simple, efficient, and customizable LLM training library! Made with @Mitchnw , @Vaishaal , @sy_gadre , @achalddave , @lschmidt3 , and others, in collaboration with @laion_ai and @StabilityAI . /1
4
85
429
@ssgrn
Suchin Gururangan
5 months
Introducing time vectors! Time vectors are a simple way to adapt LMs to new time periods; our results suggest that time is encoded in the weights of finetuned models. Led by my incredible undergrad mentee, Kai Nylund! Paper: Code: /1
Tweet media one
5
61
332
@ssgrn
Suchin Gururangan
3 years
Excited to introduce DEMix layers, a module with domain "experts" that make a language model modular! You can mix, add, or remove experts, enabling rapid adaptation. 🧵👇 Paper: Work with @ml_perception , @universeinanegg , @nlpnoah , and @LukeZettlemoyer
Tweet media one
4
69
289
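A toy sketch of the idea, not the paper's implementation: the feedforward block of a transformer layer is replaced by one expert per domain, and at inference the expert outputs are mixed with whatever weights you like, so experts can be added or removed without touching the rest of the model. Dimensions and domain names below are made up.

import torch
import torch.nn as nn

class DomainExpertFFN(nn.Module):
    # Toy DEMix-style layer: one feedforward expert per domain.
    def __init__(self, d_model, d_hidden, domains):
        super().__init__()
        self.experts = nn.ModuleDict({
            d: nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                             nn.Linear(d_hidden, d_model))
            for d in domains
        })

    def forward(self, x, domain_weights):
        # domain_weights: dict mapping domain name -> mixture weight (sums to 1)
        return sum(w * self.experts[d](x) for d, w in domain_weights.items())

layer = DomainExpertFFN(d_model=16, d_hidden=64, domains=["news", "biomed"])
x = torch.randn(2, 5, 16)
out = layer(x, {"news": 0.8, "biomed": 0.2})  # mix experts at inference time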
@ssgrn
Suchin Gururangan
4 years
BioMed-RoBERTa is now available on @huggingface transformers! Check it out on the new @allen_ai model repository: . We hope this model is useful for researchers working on bioNLP applications, like those for CORD-19. 1/4
2
57
241
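Loading it is a few lines with transformers; the hub identifier below is the one I believe AllenAI published, so verify it on the model repository if it errors.

from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "allenai/biomed_roberta_base"  # assumed hub ID; check the AllenAI model repo
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

inputs = tokenizer("The patient was treated with <mask>.", return_tensors="pt")
logits = model(**inputs).logits  # masked-LM logits, including for the <mask> position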
@ssgrn
Suchin Gururangan
10 months
Does it feel risky to train your language model on copyrighted data? Check out our new LM called SILO✨, with co-lead @sewon__min. Recipe: collect public domain & permissively licensed text data, fit parameters on it, and use the rest of the data in an inference-time-only datastore.
2
55
241
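SILO keeps high-risk text out of the parameters and reaches it only through a nonparametric datastore at inference time. A minimal sketch of the kNN-LM-style interpolation that the "inference-time-only datastore" refers to; the mixing weight and the distributions are placeholders, not the paper's setup.

import torch

def mix_next_token(lm_probs, datastore_probs, lam=0.25):
    # Interpolate the parametric LM with a distribution built from
    # nearest neighbors retrieved from the external datastore.
    return lam * datastore_probs + (1.0 - lam) * lm_probs

vocab = 8
lm_probs = torch.softmax(torch.randn(vocab), dim=-1)         # params: permissive data only
datastore_probs = torch.softmax(torch.randn(vocab), dim=-1)  # neighbors: the rest of the data
mixed = mix_next_token(lm_probs, datastore_probs)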
@ssgrn
Suchin Gururangan
4 years
New Findings #emnlp2020 paper is live at ! With @samgehman @MaartenSap @YejinChoinka @nlpnoah , we present RealToxicityPrompts, a scalable evaluation framework for measuring toxicity in NLG, and discover pervasive toxicity in training data of recent LMs.👇
Tweet media one
Tweet media two
Tweet media three
5
57
230
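The evaluation loop behind the framework is conceptually small: condition the LM on each prompt, sample several continuations, and summarize their toxicity scores. The sketch below uses placeholder generate/score functions; the paper scores continuations with Perspective API.

def prompt_toxicity(prompts, generate, score_toxicity, n_samples=5):
    # For each prompt: sample continuations, keep the worst-case score
    # (an "expected maximum toxicity"-style summary when averaged over prompts).
    results = {}
    for prompt in prompts:
        scores = [score_toxicity(generate(prompt)) for _ in range(n_samples)]
        results[prompt] = max(scores)
    return results

# `generate` is any LM sampler; `score_toxicity` is any classifier returning a
# score in [0, 1]. Both are stand-ins here, not part of the released framework.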
@ssgrn
Suchin Gururangan
1 year
We present Cluster-Branch-Train-Merge (c-BTM), a new way to scale sparse expert LLMs on any dataset — completely asynchronously. 🧵👇 Paper: Code + Models:
Tweet media one
3
40
218
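A back-of-the-envelope sketch of the recipe; the embeddings, clustering setup, and temperature are placeholders rather than the paper's choices. Cluster the corpus, train one expert LM per cluster completely independently (hence asynchronously), then at inference turn each context's distance to the cluster centers into expert mixture weights.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["stock markets fell today", "the protein binds the receptor",
          "the senate passed the bill", "mice were injected with the compound"]

# 1) Cluster the training corpus; each cluster later gets its own expert LM.
vec = TfidfVectorizer().fit(corpus)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(corpus))

# 2) At inference, turn distances to the cluster centers into expert weights.
def expert_weights(context, temperature=0.1):
    d = km.transform(vec.transform([context]))[0]  # distance to each cluster center
    logits = -d / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()

print(expert_weights("the drug reduced tumor growth"))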
@ssgrn
Suchin Gururangan
4 years
Labeling researchers like @timnitgebru who are not being heard as "emotional" is a sexist and condescending prescription. The real way one delays solutions to bias in ML is by gaslighting the experts, and then saying if they call out your BS, they are preventing progress.
@ylecun
Yann LeCun
4 years
@timnitGebru @soumithchintala It's also important to avoid assuming bad intent from your interlocutor. It only serves to inflame emotions, to hurt people who could be helpful, to mask the real issues, to delay the development of meaningful solutions, and to delay meaningful action. 17/N N=17.
20
14
341
3
31
204
@ssgrn
Suchin Gururangan
4 years
1/ Really excited about this one! "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks" is live! With @anmarasovic , @swabhz , @kylelostat , @i_beltagy , Doug Downey, and @nlpnoah , to appear at ACL2020. Paper: Code:
3
34
159
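The recipe itself, domain-/task-adaptive pretraining, is a second phase of masked-LM training on unlabeled in-domain text before task fine-tuning. A minimal sketch with Hugging Face components; the model choice, data, and hyperparameters are illustrative only.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

domain_texts = ["unlabeled in-domain sentence one.", "unlabeled in-domain sentence two."]
batch = collator([tokenizer(t, truncation=True) for t in domain_texts])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(**batch).loss  # one continued-pretraining (MLM) step on domain text
loss.backward()
optimizer.step()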
@ssgrn
Suchin Gururangan
1 year
Thanks to @TechAtBloomberg for supporting my research, and a special thanks to @uwnlp , @MetaAI , and @allen_ai for helping me grow as a scientist and human. In the interest of open access to academic materials, I've released my fellowship statement here:
@TechAtBloomberg
Tech At Bloomberg
1 year
Congratulations to @uwcse + @uwnlp 's @ssgrn on being named one of the 2022-2023 @Bloomberg #DataScience Ph.D. Fellows! Learn more about his research focus and the other Fellows in our newest cohort: #AI #ML #NLProc
Tweet media one
0
4
38
15
12
159
@ssgrn
Suchin Gururangan
4 years
Excited to share that I'm joining @uwcse and @uwnlp to start my PhD in Computer Science this fall! 🥳 Looking forward to continue being part of the incredible research community in Seattle. Thanks to everyone for advice and discussions along the way!!
12
2
157
@ssgrn
Suchin Gururangan
4 years
So honored to have won honorable mention at #acl2020nlp ! This project would not have been possible without the amazing cross-team collaboration at AI2, many thanks to @anmarasovic @swabhz @kylelostat @_DougDowney @i_beltagy @nlpnoah . Come to our QA sessions at 10am and 2p PST!
@aclmeeting
ACL 2024
4 years
Honorable mention for overall best paper (1) at #acl2020nlp : Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey and Noah A. Smith
2
11
94
7
18
148
@ssgrn
Suchin Gururangan
3 years
Happy to share that I’m joining @facebookai as a visiting researcher in the Seattle NLP group! Looking forward to some fun collaborations next year.
1
1
136
@ssgrn
Suchin Gururangan
1 year
Since it's PhD fellowship season, I wrote down some tips I learned while writing a fellowship proposal last year. I also include my Bloomberg fellowship proposal as a reference: Deadline for the Bloomberg fellowship this year is 4/28!
1
22
133
@ssgrn
Suchin Gururangan
4 years
@swabhz and I got hitched - pandemic style! Many thanks to @waleed_ammar @lucyluwang @nlpnoah Bryan, Karen, and Maddy for making this last-minute, DIY wedding so smooth and memorable. 🎉🍾
Tweet media one
Tweet media two
20
0
131
@ssgrn
Suchin Gururangan
4 years
Following @nelsonfliu 's example, I've also shared my personal research statement from my NLP PhD applications last cycle, along with some salient pieces of advice I received while writing it! Check it out here: Good luck to all prospective applicants!
4
18
97
@ssgrn
Suchin Gururangan
2 years
@swabhz and I are driving down to LA tomorrow — we’ll miss all our dear friends and our beloved Seattle! I’ll work remotely from the beach 🏖 while @swabhz starts her new gig @nlp_usc :) LA peeps, please hit us up!! We’re looking forward to all the new adventures 🌊🏄‍♀️☀️
2
1
87
@ssgrn
Suchin Gururangan
2 years
DEMix was accepted at #NAACL2022 @naaclmeeting ! Stay tuned for an updated version with additional experiments/baselines. Super excited about modularity as a mechanism to address the many customization, efficiency, and safety concerns of dense language models!
@ssgrn
Suchin Gururangan
3 years
Excited to introduce DEMix layers, a module with domain "experts" that make a language model modular! You can mix, add, or remove experts, enabling rapid adaptation. 🧵👇 Paper: Work with @ml_perception , @universeinanegg , @nlpnoah , and @LukeZettlemoyer
Tweet media one
4
69
289
3
4
69
@ssgrn
Suchin Gururangan
2 years
We’ll present “Whose Language is High Quality” at the #EMNLP2022 theme track: “Open questions, major obstacles, and unresolved issues in NLP”! We argue that there is no such thing as a general purpose corpus for language models, due to language ideologies of the data curator.
@ssgrn
Suchin Gururangan
2 years
In our new interdisciplinary work, “Whose Language Counts as High Quality?”, we empirically demonstrate that the data selection procedures for language models like GPT-3 implicitly favor text written by authors from powerful social positions. Paper: 🧵👇
Tweet media one
9
104
448
1
6
68
@ssgrn
Suchin Gururangan
2 months
Shoutout to @orevaahia et al who wrote a great paper that revealed this issue!
@aidangomez
Aidan Gomez
2 months
One subtlety worth mentioning is how significant the tokenizer is to the cost to use models in non-English languages. Our tokenizer is meaningfully better than others at the 9 non-English languages, achieving up to a 2x effective cost reduction to use.
Tweet media one
5
14
125
3
7
71
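The effect is easy to check yourself: tokenize the same non-English sentence with different tokenizers and compare token counts, since API cost and context usage scale with tokens. The two tokenizers below are just publicly available examples, not the ones in the quoted comparison.

from transformers import AutoTokenizer

text = "भारत में कई भाषाएँ बोली जाती हैं।"  # Hindi example sentence
for name in ["gpt2", "xlm-roberta-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, len(tok(text)["input_ids"]))  # more tokens => higher per-query cost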
@ssgrn
Suchin Gururangan
4 years
This model is part of our larger ACL 2020 paper, "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"; work with @anmarasovic , @swabhz , @kylelostat , @i_beltagy , Doug Downey, and @nlpnoah
Tweet media one
3
9
61
@ssgrn
Suchin Gururangan
3 years
Really proud of my department. Swift response, no gaslighting, no tiptoeing around facts, no indirect language. Clear and unequivocal support for AI ethics and marginalized people in our community. That’s how it’s done!
@uwcse
Allen School
3 years
#UWAllen leadership is aware of recent “discussions” involving Pedro Domingos, a professor emeritus (retired) in our school. We do not condone a member of our community engaging in a Twitter flame war belittling individuals and downplaying valid concerns over ethics in AI. 1/11
30
217
2K
1
2
59
@ssgrn
Suchin Gururangan
3 years
The website/syllabus for the computing ethics class I'm TAing this quarter is up! I wanted to highlight a few of its features that I'm particularly excited about.
1
8
52
@ssgrn
Suchin Gururangan
1 year
I’ll be talking about quality filtering and language ideologies at the 11am session in Hall A-D! Do swing by :) #EMNLP2022
@ssgrn
Suchin Gururangan
2 years
In our new interdisciplinary work, “Whose Language Counts as High Quality?”, we empirically demonstrate that the data selection procedures for language models like GPT-3 implicitly favor text written by authors from powerful social positions. Paper: 🧵👇
Tweet media one
9
104
448
1
4
52
@ssgrn
Suchin Gururangan
5 years
Code and pre-print of our #ACL2019 paper "Variational Pretraining for Semi-supervised Text Classification" are now available! With @dangitstam , @dallascard , and @nlpnoah . Paper: Code: [1/14]
2
14
50
@ssgrn
Suchin Gururangan
6 years
@NAACLHLT paper with @swabhz @omerlevy_ @royschwartz02 @sleepinyourhat and @nlpnoah is online now! Our work reveals annotation artifacts that inflate the performance of natural language inference models. Take a read:
0
18
50
@ssgrn
Suchin Gururangan
5 years
“Variational Pretraining for Semi-supervised Text Classification” with @dangitstam , @dallascard , and @nlpnoah will be at @ACL2019_Italy ! We’ll present a new framework for minimal-compute (i.e. CPU-friendly) pretraining with VAEs. Code and arxiv link in flight!
1
5
48
@ssgrn
Suchin Gururangan
4 years
Editing ACL talk transcriptions -- among my favorite mistakes so far, at the intro: "Hi everyone, my name is such a girl again...this is a project with on a mirosevic, Swallow swam that the, kylo, is Beltagy, Doug Downey, and it was Smith." 🤦‍♂️
1
0
47
@ssgrn
Suchin Gururangan
3 years
With the academic year starting, just wanted to bump a short post I wrote about writing personal statements for NLP/AI grad school apps: Also, UW has a nice application mentorship program: Hope these are useful resources!
0
6
46
@ssgrn
Suchin Gururangan
4 years
I‘m hosting an #acl2020nlp mentoring session on PhD apps with @sjmielke and @sebgehr on Mon 9am PT! Ping us if you have any burning questions. I’ll also be talking about “Don’t Stop Pretraining” () on Wed 10a (14A) and 2p PT (15B). Hope to see y’all there!
2
7
44
@ssgrn
Suchin Gururangan
1 year
Super excited to give this talk! I'll be discussing BTM () and c-BTM (), and making the argument that we shouldn't train dense language models anymore :) If you'd like to tune in, check the zoom link below!
@USC_ISI
USC ISI
1 year
New #naturallanguage seminar this Thursday! @ssgrn , PhD candidate at @uwcse , will discuss the issues associated with dense-training #languagemodels and introduce a new class of #LMs that are fundamentally modular. Tune in on Zoom here: @USC @USCViterbi
Tweet media one
0
2
12
0
6
44
@ssgrn
Suchin Gururangan
8 years
Tweet media one
0
20
39
@ssgrn
Suchin Gururangan
3 years
Yahooo so proud of @swabhz ! We're soo excited to move to LA!! Getting ready for some sun ☀️and waves 🏄
@swabhz
Swabha Swayamdipta
3 years
I'm thrilled to share some personal news - I'll be joining the University of Southern California @CSatUSC as an Assistant Professor of CS and the Gabilan Assistant Professor in Fall 2022. Super excited to be part of the NLP group at USC @nlp_usc and more broadly, SoCal NLP 😃 🏖️
96
25
777
1
0
36
@ssgrn
Suchin Gururangan
1 year
. @colinraffel , @margs_li , @SamuelAinsworth , and I are proposing a workshop on Collaborative, Communal, and Continual Machine Learning at NeurIPS 2023! If you'd like to be a reviewer for our workshop, please sign up here:
2
12
36
@ssgrn
Suchin Gururangan
5 years
Excited to have a paper around improving reproducibility in NLP with @JesseDodge @royschwartz02 @dallascard and @nlpnoah at #emnlp2019 ! We present a new framework for incorporating computational budget into model performance benchmarks. Stay tuned for arxiv link and code!
0
4
34
@ssgrn
Suchin Gururangan
5 years
A project I've been helping out on! Hopefully it's useful for anyone wanting to speed up their pretraining/fine-tuning setups with @huggingface -compatible LMs.
@i_beltagy
Iz Beltagy
5 years
At last, language model pretraining with PyTorch+TPUs Our code trains PyTorch BERT/RoBERTa on TPUs, which is faster and cheaper than GPUs. Also check the repo for a more detailed comparison between TPUs/GPUs on PyTorch/Tensorflow.
6
66
290
0
4
32
@ssgrn
Suchin Gururangan
1 year
Check out our new paper on editing models with task arithmetic! I’m really excited about this direction. Our results suggest that model behaviors organize in a metric space, and interpolation allows us to adopt new behaviors with no additional training. So many cool ideas here!
@gabriel_ilharco
Gabriel Ilharco
1 year
Introducing task vectors! A new way to steer models by doing arithmetic with model weights. Subtract to make models forget, add to make them learn 📜: 🖥️:
Tweet media one
20
271
1K
0
3
32
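At the state-dict level the arithmetic is literally element-wise: a task vector is the fine-tuned weights minus the pretrained weights, and scaling, adding, or negating it steers the model. A minimal sketch with hypothetical checkpoints, not the authors' released code.

def task_vector(finetuned, pretrained):
    # tau = theta_finetuned - theta_pretrained, per parameter tensor
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vector(pretrained, tau, alpha=1.0):
    # alpha > 0 adds the behavior; alpha < 0 "forgets" it
    return {k: pretrained[k] + alpha * tau[k] for k in pretrained}

# e.g., make a model forget a behavior it was fine-tuned for (hypothetical state dicts):
# edited = apply_task_vector(pre_sd, task_vector(ft_sd, pre_sd), alpha=-1.0)
# model.load_state_dict(edited)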
@ssgrn
Suchin Gururangan
1 year
Looking forward to hanging out with folks at #emnlp2022 this week!!
4
0
30
@ssgrn
Suchin Gururangan
3 months
New paper on using gradient similarity search to select instruction tuning data! We have tricks to make the computation and search efficient, and show gradients from small models can identify useful instructions for larger models. Led by @xiamengzhou and @SadhikaMalladi !
@xiamengzhou
Mengzhou Xia
3 months
Lots of instruction tuning data out there...but how to best adapt LLMs for specific queries? Don’t use ALL of the data, use LESS! 5% beats the full dataset. Can even use one small model to select data for others! Paper: Code: [1/n]
Tweet media one
13
98
434
1
3
29
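Stripped of the efficiency tricks the tweet mentions (LoRA gradients, random projections, optimizer-aware features), the selection rule is: score each candidate instruction example by the similarity of its gradient to the target task's gradient and keep the top-scoring ones. A toy sketch over precomputed gradient features; shapes and values are stand-ins.

import torch
import torch.nn.functional as F

def select_top_k(candidate_grads, target_grad, k):
    # candidate_grads: (N, d) per-example (projected) gradients; target_grad: (d,)
    scores = F.cosine_similarity(candidate_grads, target_grad.unsqueeze(0), dim=1)
    return scores.topk(k).indices

cand = torch.randn(1000, 64)  # stand-ins for projected per-example gradients
target = torch.randn(64)      # gradient feature for the target queries
chosen = select_top_k(cand, target, k=50)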
@ssgrn
Suchin Gururangan
4 months
We found a trick to characterize quality filtering patterns in big web dumps — using “about me” pages where people self-identify with various social dimensions. Lucy discovered a bunch of interesting implicit preferences in filters; check out the paper below to learn more!
@lucy3_li
Lucy Li
4 months
New preprint! 📜 We investigate how ten “quality” and English langID filters, drawn from prior lit on LLM pretraining data curation pipelines, affect webpages linked to self-descriptions of their creators. Paper: Data: 🧵(1/6)
Tweet media one
3
22
139
0
1
26
@ssgrn
Suchin Gururangan
2 years
Hell yeah!! 🔥🔥 Go @swabhz and team!!
@swabhz
Swabha Swayamdipta
2 years
I'm super honored to receive an outstanding paper award at NeurIPS 2021 for MAUVE🟪 with the awesome team of @KrishnaPillutla @rown @jwthickstun @wellecks @YejinChoinka and Zaid Harchaoui! Learn more about our work 👉
18
34
369
0
0
25
@ssgrn
Suchin Gururangan
2 years
If you're interested in doing a PhD, do consider Ana as your adviser! She's an awesome researcher, collaborator, and friend! Also Utah!! ⛰️🏂
@anmarasovic
Ana Marasović
2 years
Some good news! I'm joining the University of Utah @UUtah @UtahSoC as an Assistant Professor this summer. I'm excited I'll be part of the U's NLP group and work alongside @viveksrikumar & @EllenRiloff !
79
33
524
1
1
25
@ssgrn
Suchin Gururangan
2 years
Looking forward to seeing everyone at #NAACL2022 . I'm going to talk about DEMix on Tuesday (4:15p, Elwha A), stop by if you're also excited about modular LMs. New paper version: Happy to chat about research, PhD life, anything -- just say hi!
@ssgrn
Suchin Gururangan
3 years
Excited to introduce DEMix layers, a module with domain "experts" that make a language model modular! You can mix, add, or remove experts, enabling rapid adaptation. 🧵👇 Paper: Work with @ml_perception , @universeinanegg , @nlpnoah , and @LukeZettlemoyer
Tweet media one
4
69
289
0
2
24
@ssgrn
Suchin Gururangan
1 year
For more discussion around the harms associated with data selection for LLMs, and not being transparent about it, check out our paper on the very subject!
@benmschmidt
Ben Schmidt / @[email protected]
1 year
I think we can call it shut on 'Open' AI: the 98 page paper introducing GPT-4 proudly declares that they're disclosing *nothing* about the contents of their training set.
Tweet media one
103
1K
7K
0
2
23
@ssgrn
Suchin Gururangan
8 months
We used OpenLM to train two new LMs: a 1B model on 1.6T tokens, and a 7B model on 1.3T tokens. OpenLM-1B is one of the best 1B models out there, and OpenLM-7B gets similar zero shot accuracy as LLaMA-7B and MPT-7B. Both are publicly available: /2
1
4
22
@ssgrn
Suchin Gururangan
1 year
Excited for our tag-team talk tomorrow on BTM! Check the registration link below if you'd like to attend :)
@stanfordnlp
Stanford NLP Group
1 year
For this week's NLP Seminar, we are excited to host @margs_li and @ssgrn ! The talk will happen Thursday at 11 AM PT. Non-Stanford affiliates registration link: . Information will be sent out one hour before the talk.
Tweet media one
0
12
32
0
4
22
@ssgrn
Suchin Gururangan
1 year
Thanks for having me yesterday, @USC_ISI ! If you missed the talk, you can catch the recording here:
@USC_ISI
USC ISI
1 year
New #naturallanguage seminar this Thursday! @ssgrn , PhD candidate at @uwcse , will discuss the issues associated with dense-training #languagemodels and introduce a new class of #LMs that are fundamentally modular. Tune in on Zoom here: @USC @USCViterbi
Tweet media one
0
2
12
1
1
22
@ssgrn
Suchin Gururangan
4 months
Check out our new paper on x-BTM, multilingual Branch-Train-Merge! More evidence that dense models leave a lot on the table when training on very heterogeneous data.
@TerraBlvns
Terra Blevins
4 months
Expert language models go multilingual! Introducing ✨X-ELM✨(Cross-lingual Expert Language Models), a multilingual generalization of the BTM paradigm to efficiently and fairly scale model capacity for many languages! Paper:
Tweet media one
2
39
171
0
1
21
@ssgrn
Suchin Gururangan
3 years
Twitter-verse! @adityakusupati & I are doing a user study to understand how people perceive Internet text, possibly generated by machines. We'd appreciate it if you could take our fun 6-question survey, which should take about 15 mins. Thanks a ton! Survey:
2
12
21
@ssgrn
Suchin Gururangan
3 years
Also, for anyone considering UW CSE, don’t let the toxic threads deter you. There are tons of opportunities to study and do research in computing ethics here; lots of friendly and welcoming people working on it from a POV of HCI, accessibility, security, AI.
0
3
21
@ssgrn
Suchin Gururangan
2 years
We introduced lo-fi, a simple way to reduce the costs of fine-tuning large models on multi-node GPU clusters. Instead of fine-tuning a single model across nodes, fine-tune k single-node workers and average their parameters at the end! Check out more details from @Mitchnw below.
@Mitchnw
Mitchell Wortsman
2 years
Progress in model averaging raises the Q: is communication between nodes necessary during fine-tuning? A: In certain settings (e.g., DeiT IN1k or OPT CC fine-tune), local fine-tuning matches performance. lo-fi: distributed fine-tuning without communication
Tweet media one
2
27
105
1
1
20
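A sketch of the recipe in the tweet: each of the k workers fine-tunes on its own node with no cross-node communication, and their checkpoints are averaged once at the end. The checkpoint paths are hypothetical.

import torch

def average_checkpoints(paths):
    sds = [torch.load(p, map_location="cpu") for p in paths]
    return {k: sum(sd[k].float() for sd in sds) / len(sds) for k in sds[0]}

# Each worker fine-tunes independently, then (hypothetical paths):
# merged = average_checkpoints([f"worker{i}/final.pt" for i in range(4)])
# model.load_state_dict(merged)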
@ssgrn
Suchin Gururangan
4 years
Just gave a talk at the UW Computing + Society colloquium! Discussed RealToxicityPrompts and ethical questions surrounding LM pretraining. Session includes talks on toxicity detection, experience with disability, CS education, and social values in ML:
0
2
20
@ssgrn
Suchin Gururangan
2 years
Do consider @dallascard as an adviser! You’ll work on extremely fun and creative ideas! I’ve learned a ton from him over the years — he’s an awesome collaborator and overall human.
@dallascard
Dallas Card
2 years
I'm really looking forward to reading PhD applications for @umsi ! If you're interested in historical or political text, the cultures and practices of science, or the societal impacts of machine learning, please consider applying by December 1st!
3
30
133
0
1
19
@ssgrn
Suchin Gururangan
6 years
Excited to announce I'll be joining @allenai_org this fall!
2
0
18
@ssgrn
Suchin Gururangan
11 months
Thanks so much for having me! It was a great discussion :)
@MilaNLProc
MilaNLP
11 months
📖For our weekly @MilaNLProc lab seminar, it was a pleasure to have @ssgrn for a discussion on all things language models, open-sourcing and regulation. #NLProc
0
0
5
0
1
17
@ssgrn
Suchin Gururangan
1 year
Lots of great findings in this paper, highly recommend checking it out if you are interested in the food for LMs!
@ShayneRedford
Shayne Longpre
1 year
#NewPaperAlert When and where does pretraining (PT) data matter? We conduct the largest published PT data study, varying: 1⃣ Corpus age 2⃣ Quality/toxicity filters 3⃣ Domain composition We have several recs for model creators… 📜: 1/ 🧵
Tweet media one
12
85
349
0
2
17
@ssgrn
Suchin Gururangan
5 months
Last year, we introduced a new method called “task vectors” (), which allows one to steer model behavior with very simple arithmetic operations. In our new work, we extend this approach to temporal adaptation. /2
1
1
17
@ssgrn
Suchin Gururangan
5 years
In "Show Your Work", we propose a method to compare model performance in the context of computational budget. We also release a new AllenNLP hyperparameter search library! Check it out here:
0
2
17
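The paper's central report is the expected best validation performance as a function of how many hyperparameter trials a budget allows, estimated from the trials actually run. A small sketch of that estimator with made-up scores; this is not the released AllenNLP tooling.

import numpy as np

def expected_max_performance(scores, budgets):
    # E[max of n i.i.d. draws] estimated from observed validation scores.
    v = np.sort(np.asarray(scores, dtype=float))
    N = len(v)
    cdf = np.arange(1, N + 1) / N    # P(V <= v_i) under the empirical distribution
    cdf_lower = np.arange(0, N) / N  # P(V <  v_i)
    return {n: float(np.sum(v * (cdf ** n - cdf_lower ** n))) for n in budgets}

scores = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73]  # hypothetical validation accuracies
print(expected_max_performance(scores, budgets=[1, 2, 5]))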
@ssgrn
Suchin Gururangan
11 months
Heading to #ACL2023NLP ! Looking forward to seeing everyone this week :)
0
0
17
@ssgrn
Suchin Gururangan
4 years
Echoing @sarameghanbeery : I just went through the grad app cycle for NLP — can demystify the process, give feedback, or connect you with folks @uwnlp or other universities. If interested in a research internship/residency @allen_ai hit me up too! DMs are open. @black_in_ai
@sarameghanbeery
Sara Beery
4 years
More concretely, any Black students in AI interested in applying to @Caltech this fall? I want to help - with application strategy, successful NSF essays, introductions to Caltech profs, and at the very least I'll fight to make sure your app is seen. @black_in_ai @BlackAFinSTEM
4
37
112
1
2
15
@ssgrn
Suchin Gururangan
4 years
This is an amazing program to get more research experience before starting grad school! Let me know if you have any questions about it.
@ai2_allennlp
AllenNLP
4 years
Wondering about deferring a grad school admittance for a year, or want to prepare more before applying? The AllenNLP research team at @allen_ai will be considering applications for our predoctoral young investigator program soon. Get your application in by Feb. 15!
1
16
81
0
1
15
@ssgrn
Suchin Gururangan
8 months
@jeremyphoward @Mitchnw @Vaishaal @sy_gadre @achalddave @lschmidt3 @laion_ai @StabilityAI The main feature of OpenLM is that it is minimal - no major dependencies beyond PyTorch/webdataset and one can read/modify all of the code in a short period of time! so like nanoGPT, but scales to thousands of GPUs :)
2
0
14
@ssgrn
Suchin Gururangan
4 years
Stay tuned for preprint, code, and many more RoBERTas in the coming weeks! 4/4
1
0
14
@ssgrn
Suchin Gururangan
3 years
Also, I wouldn’t be doing NLP research if it weren’t for the UW CLMS program, which @emilymbender spearheads! She has helped build the careers of a *ton* of people in the field.
Apparently the distinguished Emily M. Bender is "mediocre and mostly ignored." I didn't know what "AI" was when Emily finished her PhD at Stanford 1 year before I joined as an undergrad. I'm pretty sure she taught Meg. And dude wants to say we gave HER "fame"? GTFOH.
9
23
182
0
2
13
@ssgrn
Suchin Gururangan
3 years
This team was such a pleasure to work with, and @eaclark07 was an incredible project lead! Thanks for the recognition :D
@eaclark07
Elizabeth Clark
3 years
Update: it’s been selected as an #ACL2021NLP Outstanding Paper! 🎉 The zoom conversations with @tal_august , Sofia, Nikita, @ssgrn , and @nlpnoah about human evals were a weekly highlight for me, so I’m very excited they’re being recognized like this!
7
8
160
0
0
13
@ssgrn
Suchin Gururangan
5 months
Our results show that time is really structured in LM weight space, and further show the power of interpolation for model editing! Let us know if you have any comments or questions. /8
2
0
13
@ssgrn
Suchin Gururangan
10 months
We are releasing all resources, including our permissively licensed training data (Open License Corpus 📄) & pretrained LMs! See our paper for more details and rich areas for future work toward mitigating the legal risk of LMs 🌟
Tweet media one
2
2
13
@ssgrn
Suchin Gururangan
2 years
To appear at #EMNLP2022 , we introduce a new massively multi-domain dataset called M2D2! M2D2 is built on Wikipedia and Semantic Scholar, and serves as a testbed for studying language model domain adaptation in the wild. Led by the great @machelreid ! Check out the thread below 👇
@machelreid
Machel Reid
2 years
New paper at #EMNLP2022 ! We introduce M2D2, the first massively multi-domain language modeling dataset with ~150 hierarchically organized domains! Work with @hllo_wrld @ssgrn @LukeZettlemoyer Paper: Code:
Tweet media one
6
53
294
0
1
13
@ssgrn
Suchin Gururangan
2 years
A key issue is that language ideologies in data selection go undocumented and implicit. We believe the community could be much more intentional and transparent in their data selection practices. These decisions may be informed by specific downstream applications! /12
1
3
12
@ssgrn
Suchin Gururangan
11 months
BTM ftw 😉
@soumithchintala
Soumith Chintala
11 months
i might have heard the same 😃 -- I guess info like this is passed around but no one wants to say it out loud. GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference. Glad that Geohot said it out loud. Though, at this point, GPT-4 is
57
390
2K
1
0
12
@ssgrn
Suchin Gururangan
4 years
It’s SO easy to find toxic content in Webtext/Common Crawl, which is considered “standard” for pretraining LMs. Very important, hard research to be done on how to better detect such content, understand how LMs are affected by it, and how to control unwanted behavior.
@ani_nenkova
Ani Nenkova
4 years
The average engineer deploying ML systems may or may not understand how much the resulting product depends on the training data. They are likely to say they train on the ‘standard dataset used in research’. ML researchers are responsible for clarifying the possibilities for bias
8
44
308
2
3
12
@ssgrn
Suchin Gururangan
4 years
Problematic use case of LMs: elevating a single variant of English as "Standard", erasing valid variants of the language, like those spoken by non-L1 speakers!
Tweet media one
2
4
12
@ssgrn
Suchin Gururangan
5 months
Time vectors are constructed by fine-tuning a language model on text from a certain time period, and then subtracting the pretrained model's weights from the result. This represents a direction of movement in weight space that improves performance on that time period. /3
1
0
12
@ssgrn
Suchin Gururangan
1 year
Awesome comprehensive survey of a burgeoning area of research!
@seb_ruder
Sebastian Ruder
1 year
In our new survey “Modular Deep Learning”, we provide a unified taxonomy of the building blocks of modular neural nets and connect disparate threads of research. 📄 📢 🌐 w/ @PfeiffJo @licwu @PontiEdoardo
Tweet media one
8
97
426
0
0
11
@ssgrn
Suchin Gururangan
5 months
We find that time vectors are structured on a manifold: time vectors that are closer together in weight space are also adjacent in time. The angle between two time vectors is really well correlated with the degree of temporal misalignment. /5
Tweet media one
2
0
11
@ssgrn
Suchin Gururangan
2 years
We argue that quality filtering implies a language ideology – a sociolinguistics term for a subjective belief about language use. These ideologies are often implicit/undocumented. What language is high quality enough to be included in the corpus? Whose language is excluded? /3
1
3
11
@ssgrn
Suchin Gururangan
9 months
This is a great compilation of advice! One thing that helped me a lot is to learn to be shameless about bad first drafts. Good papers, like any art, are the product of refinement and iteration! Just be sure to be open to feedback and messing around with stuff till it clicks.
@VeredShwartz
Vered Shwartz
9 months
I just published Tips for Writing NLP Papers. I wrote it for my students so I don't have to sound like a broken record (and edit papers for the same issues over and over again 😃). But some of you might find it useful too.
28
238
1K
0
1
10
@ssgrn
Suchin Gururangan
5 years
If you're at #WecNLP19 , come by the poster session at 5:30p (location #25 ) to discuss cheap, low-resource pretraining!
Tweet media one
0
0
10
@ssgrn
Suchin Gururangan
10 months
Why does SILO mitigate legal risks? 🤔 Because data owners can *remove* their data from the model entirely at any time. Use of their data can also be *attributed* at the token-level. SILO better aligns with various data-use regulations (e.g. fair use, GDPR, etc).
1
0
10
@ssgrn
Suchin Gururangan
2 years
Current large language models require massive multi-node synchronization during training. In contrast, for ELMforests, different parts of the model are independently trained on different subsets of training data, with no need for multi-node training/inference. 2/n
Tweet media one
2
0
10
@ssgrn
Suchin Gururangan
4 years
Bubbling this up -- please sign up, NLP friends!!
@sarameghanbeery
Sara Beery
4 years
@ssgrn @uwnlp @allen_ai @black_in_ai From @red_abebe - @black_in_ai has a mentorship program for grad apps in place, you can join BAI as an ally to get matched with mentees:
0
0
2
0
8
10
@ssgrn
Suchin Gururangan
2 years
Our results suggest there is no such thing as a general-purpose corpus, since selecting data for language models is highly subjective. The data curator must adopt a language ideology, and it will likely conflict with other perspectives of what makes text high quality. /11
1
2
9
@ssgrn
Suchin Gururangan
4 years
In this work, we investigate how to effectively adapt language models to distant domains. We explore questions like: What are domains in NLP? Are they hierarchical and overlapping? Can you generate a micro-domain around a task? Are language models truly language universal? 3/4
1
1
10
@ssgrn
Suchin Gururangan
8 years
masterchief data scientist @rapid7 @hrbrmstr getting ready to speak @SOURCEConf
Tweet media one
Tweet media two
1
2
10
@ssgrn
Suchin Gururangan
3 years
Super cool! @snehaark et al show that task level routing leads to better expert specialization and efficient inference options for large models. These results are super related to the modular features we observed in DEMix layers. Task/domain level routing looks really promising!
@snehaark
Sneha Kudugunta
3 years
#EMNLP2021 Findings paper “Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference” w/ @bignamehyp , @ankurbpn , Maxim Krikun, @lepikhin , @lmthang , @orf_bnw about TaskMoE, an inference friendly alternative to token-based MoEs. Link: 1/n
Tweet media one
2
13
101
0
1
9
@ssgrn
Suchin Gururangan
1 year
Happy to share that lo-fi will appear in TMLR!
@Mitchnw
Mitchell Wortsman
2 years
Progress in model averaging raises the Q: is communication between nodes necessary during fine-tuning? A: In certain settings (e.g., DeiT IN1k or OPT CC fine-tune), local fine-tuning matches performance. lo-fi: distributed fine-tuning without communication
Tweet media one
2
27
105
0
2
9
@ssgrn
Suchin Gururangan
10 months
But how good is SILO against existing LMs? 👀 Its performance is much better than using low-risk data only, and in fact, is close to the model trained on all of the data without restriction (Pythia). (parametric-only is much worse, but using a datastore is a key 🚀)
1
0
9
@ssgrn
Suchin Gururangan
4 years
Super cool effort and the right way to go with NLP benchmarks -- there is lots of exciting research to do on how to handle evolving model evaluations. I think this will have a particularly strong impact on bias measurements in NLP, where annotator subjectivity plays a huge role!
@douwekiela
Douwe Kiela
4 years
I’m super excited to announce Dynabench - a new and ambitious research platform for dynamic data collection and benchmarking: 1/n
9
117
460
0
1
9
@ssgrn
Suchin Gururangan
5 months
We also extrapolate task-specific models _to the future_ using analogy arithmetic: create a time vector from a language model finetuned on unlabeled data from the year you want to generalize to, and then interpolate that time vector with your task-specific model. /7
Tweet media one
2
0
9
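A state-dict-level sketch of that step, under my reading of the tweet: finetune the LM on unlabeled text from the target year, then interpolate the resulting weights with the task-specific model. The checkpoints and the mixing coefficient are hypothetical, not the paper's settings.

def interpolate(sd_a, sd_b, alpha=0.5):
    # Weight-space interpolation between two models with the same architecture.
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# task_model: finetuned on labeled data from a source year (hypothetical state dict)
# target_year_lm: the same pretrained LM finetuned on unlabeled text from the target year
# future_task_model = interpolate(task_model, target_year_lm, alpha=0.7)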
@ssgrn
Suchin Gururangan
2 years
ELMforests with BTM scale *really* well compared to regular dense LMs, both in and out of domain. ELM ensembles perform the best, but we also get big boosts by collapsing ELMs into a single LM with weighted parameter-averaging! 5/n
Tweet media one
2
1
9
@ssgrn
Suchin Gururangan
8 years
#RustConf 2016 /r/playrust classifier slides available here
0
6
9
@ssgrn
Suchin Gururangan
4 years
This thread beautifully articulates the assimilationist racism that Indian American immigrants simultaneously endure and perpetuate. 1/5
@SahajKohli
Sahaj Kaur Kohli
4 years
When I was a teenager -- with religiously unshorn, long, thick hair -- I'd stand in front of the mirror wishing I had what I'd call a "white girl ponytail" -- thin, short, light hair that could be thrown on the top of my head all undemanding & bouncy.
8
31
232
1
0
9
@ssgrn
Suchin Gururangan
2 years
Right now, despite their impressive performance, LLMs are expensive, centralized monoliths. These results make me excited about the possibility of making LLMs more democratized, collaborative, and flexible! Only more research will tell how this will play out :) 10/n
1
0
9
@ssgrn
Suchin Gururangan
4 years
Applications are live for PYIs and interns at AllenNLP! Feel free to DM if you have any questions about the positions, team, etc.
1
2
9