Suchin Gururangan

@ssgrn

3,757 Followers
251 Following
68 Media
939 Statuses

he/him · Research scientist 🦙 Llama team, @meta GenAI · PhD @uwcse + @uwnlp

SF x LA
Joined November 2011
Pinned Tweet
@ssgrn
Suchin Gururangan
1 month
Llama3-8B and 70B have dropped!! Extremely grateful to have been part of this journey. More coming soon :)
4
3
109
@ssgrn
Suchin Gururangan
2 years
We present the ELMforest🌳: an embarrassingly parallel language model. An ELMforest contains many smaller expert LMs (ELMs) that can be added/removed, ensembled, or parameter-averaged at any time for efficient scaling and rapid customization. 🧵👇
Tweet media one
7
87
495
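A minimal sketch, not the released BTM code, of the "ensembled at any time" part of the thread: each expert LM (ELM) scores the running context, those scores become mixture weights, and the experts' next-token distributions are mixed. The log-probabilities, vocabulary size, and prior here are illustrative placeholders.

import torch

def ensemble_weights(context_logprobs, prior=None):
    # Weight each expert LM by how well it models the context so far
    # (a simple Bayes-style posterior over experts).
    logp = torch.tensor(context_logprobs)
    if prior is not None:
        logp = logp + torch.log(torch.tensor(prior))
    return torch.softmax(logp, dim=-1)

# Hypothetical per-expert log p(context); mix the experts' next-token distributions.
w = ensemble_weights([-45.2, -61.7, -58.9])
expert_next_token = [torch.softmax(torch.randn(8), dim=-1) for _ in range(3)]
mixed = sum(wi * p for wi, p in zip(w, expert_next_token))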
@ssgrn
Suchin Gururangan
4 months
Extremely happy to announce that I defended my PhD; next week is my first day on the LLaMA team at @Meta GenAI 🥳🦙!! Excited to help build the best open models in the world. To the UWNLP community, thank you for everything. You can watch my defense at
32
12
455
@ssgrn
Suchin Gururangan
2 years
In our new interdisciplinary work, “Whose Language Counts as High Quality?”, we empirically demonstrate that the data selection procedures for language models like GPT-3 implicitly favor text written by authors from powerful social positions. Paper: 🧵👇
Tweet media one
9
104
448
@ssgrn
Suchin Gururangan
8 months
Excited to introduce ✨OpenLM: a simple, efficient, and customizable LLM training library! Made with @Mitchnw , @Vaishaal , @sy_gadre , @achalddave , @lschmidt3 , and others, in collaboration with @laion_ai and @StabilityAI . /1
4
85
429
@ssgrn
Suchin Gururangan
5 months
Introducing time vectors! Time vectors are a simple way to adapt LMs to new time periods; our results suggest that time is encoded in the weights of finetuned models. Led by my incredible undergrad mentee, Kai Nylund! Paper: Code: /1
Tweet media one
5
61
332
@ssgrn
Suchin Gururangan
3 years
Excited to introduce DEMix layers, a module with domain "experts" that make a language model modular! You can mix, add, or remove experts, enabling rapid adaptation. 🧵👇 Paper: Work with @ml_perception , @universeinanegg , @nlpnoah , and @LukeZettlemoyer
Tweet media one
4
69
289
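A toy sketch of the idea, not the paper's implementation: the feedforward block of a transformer layer is replaced by one expert per domain, and at inference the expert outputs are mixed with whatever weights you like, so experts can be added or removed without touching the rest of the model. Dimensions and domain names below are made up.

import torch
import torch.nn as nn

class DomainExpertFFN(nn.Module):
    # Toy DEMix-style layer: one feedforward expert per domain.
    def __init__(self, d_model, d_hidden, domains):
        super().__init__()
        self.experts = nn.ModuleDict({
            d: nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                             nn.Linear(d_hidden, d_model))
            for d in domains
        })

    def forward(self, x, domain_weights):
        # domain_weights: dict mapping domain name -> mixture weight (sums to 1)
        return sum(w * self.experts[d](x) for d, w in domain_weights.items())

layer = DomainExpertFFN(d_model=16, d_hidden=64, domains=["news", "biomed"])
x = torch.randn(2, 5, 16)
out = layer(x, {"news": 0.8, "biomed": 0.2})  # mix experts at inference time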
@ssgrn
Suchin Gururangan
4 years
BioMed-RoBERTa is now available on @huggingface transformers! Check it out on the new @allen_ai model repository: . We hope this model is useful for researchers working on bioNLP applications, like those for CORD-19. 1/4
2
57
241
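Loading it is a few lines with transformers; the hub identifier below is the one I believe AllenAI published, so verify it on the model repository if it errors.

from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "allenai/biomed_roberta_base"  # assumed hub ID; check the AllenAI model repo
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

inputs = tokenizer("The patient was treated with <mask>.", return_tensors="pt")
logits = model(**inputs).logits  # masked-LM logits, including for the <mask> position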
@ssgrn
Suchin Gururangan
10 months
Does it feel risky to train your language model on copyrighted data? Check out our new LM called SILO✨, with co-lead @sewon__min. Recipe: collect public domain & permissively licensed text data, fit parameters on it, and use the rest of the data in an inference-time-only datastore.
2
55
241
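SILO keeps high-risk text out of the parameters and reaches it only through a nonparametric datastore at inference time. A minimal sketch of the kNN-LM-style interpolation that the "inference-time-only datastore" refers to; the mixing weight and the distributions are placeholders, not the paper's setup.

import torch

def mix_next_token(lm_probs, datastore_probs, lam=0.25):
    # Interpolate the parametric LM with a distribution built from
    # nearest neighbors retrieved from the external datastore.
    return lam * datastore_probs + (1.0 - lam) * lm_probs

vocab = 8
lm_probs = torch.softmax(torch.randn(vocab), dim=-1)         # params: permissive data only
datastore_probs = torch.softmax(torch.randn(vocab), dim=-1)  # neighbors: the rest of the data
mixed = mix_next_token(lm_probs, datastore_probs)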
@ssgrn
Suchin Gururangan
4 years
New Findings #emnlp2020 paper is live at ! With @samgehman @MaartenSap @YejinChoinka @nlpnoah , we present RealToxicityPrompts, a scalable evaluation framework for measuring toxicity in NLG, and discover pervasive toxicity in training data of recent LMs.👇
Tweet media one
Tweet media two
Tweet media three
5
57
230
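The evaluation loop behind the framework is conceptually small: condition the LM on each prompt, sample several continuations, and summarize their toxicity scores. The sketch below uses placeholder generate/score functions; the paper scores continuations with Perspective API.

def prompt_toxicity(prompts, generate, score_toxicity, n_samples=5):
    # For each prompt: sample continuations, keep the worst-case score
    # (an "expected maximum toxicity"-style summary when averaged over prompts).
    results = {}
    for prompt in prompts:
        scores = [score_toxicity(generate(prompt)) for _ in range(n_samples)]
        results[prompt] = max(scores)
    return results

# `generate` is any LM sampler; `score_toxicity` is any classifier returning a
# score in [0, 1]. Both are stand-ins here, not part of the released framework.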
@ssgrn
Suchin Gururangan
1 year
We present Cluster-Branch-Train-Merge (c-BTM), a new way to scale sparse expert LLMs on any dataset — completely asynchronously. 🧵👇 Paper: Code + Models:
Tweet media one
3
40
218
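A back-of-the-envelope sketch of the recipe; the embeddings, clustering setup, and temperature are placeholders rather than the paper's choices. Cluster the corpus, train one expert LM per cluster completely independently (hence asynchronously), then at inference turn each context's distance to the cluster centers into expert mixture weights.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["stock markets fell today", "the protein binds the receptor",
          "the senate passed the bill", "mice were injected with the compound"]

# 1) Cluster the training corpus; each cluster later gets its own expert LM.
vec = TfidfVectorizer().fit(corpus)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(corpus))

# 2) At inference, turn distances to the cluster centers into expert weights.
def expert_weights(context, temperature=0.1):
    d = km.transform(vec.transform([context]))[0]  # distance to each cluster center
    logits = -d / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()

print(expert_weights("the drug reduced tumor growth"))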
@ssgrn
Suchin Gururangan
4 years
Labeling researchers like @timnitgebru who are not being heard as "emotional" is a sexist and condescending prescription. The real way one delays solutions to bias in ML is by gaslighting the experts, and then saying if they call out your BS, they are preventing progress.
@ylecun
Yann LeCun
4 years
@timnitGebru @soumithchintala It's also important to avoid assuming bad intent from your interlocutor. It only serves to inflame emotions, to hurt people who could be helpful, to mask the real issues, to delay the development of meaningful solutions, and to delay meaningful action. 17/N N=17.
20
14
341
3
31
204
@ssgrn
Suchin Gururangan
4 years
1/ Really excited about this one! "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks" is live! With @anmarasovic , @swabhz , @kylelostat , @i_beltagy , Doug Downey, and @nlpnoah , to appear at ACL2020. Paper: Code:
3
34
159
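The recipe itself, domain-/task-adaptive pretraining, is a second phase of masked-LM training on unlabeled in-domain text before task fine-tuning. A minimal sketch with Hugging Face components; the model choice, data, and hyperparameters are illustrative only.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

domain_texts = ["unlabeled in-domain sentence one.", "unlabeled in-domain sentence two."]
batch = collator([tokenizer(t, truncation=True) for t in domain_texts])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(**batch).loss  # one continued-pretraining (MLM) step on domain text
loss.backward()
optimizer.step()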
@ssgrn
Suchin Gururangan
1 year
Thanks to @TechAtBloomberg for supporting my research, and a special thanks to @uwnlp , @MetaAI , and @allen_ai for helping me grow as a scientist and human. In the interest of open access to academic materials, I've released my fellowship statement here:
@TechAtBloomberg
Tech At Bloomberg
1 year
Congratulations to @uwcse + @uwnlp 's @ssgrn on being named one of the 2022-2023 @Bloomberg #DataScience Ph.D. Fellows! Learn more about his research focus and the other Fellows in our newest cohort: #AI #ML #NLProc
Tweet media one
0
4
38
15
12
159
@ssgrn
Suchin Gururangan
4 years
Excited to share that I'm joining @uwcse and @uwnlp to start my PhD in Computer Science this fall! 🥳 Looking forward to continue being part of the incredible research community in Seattle. Thanks to everyone for advice and discussions along the way!!
12
2
157
@ssgrn
Suchin Gururangan
4 years
So honored to have won honorable mention at #acl2020nlp ! This project would not have been possible without the amazing cross-team collaboration at AI2, many thanks to @anmarasovic @swabhz @kylelostat @_DougDowney @i_beltagy @nlpnoah . Come to our QA sessions at 10am and 2p PST!
@aclmeeting
ACL 2024
4 years
Honorable mention for overall best paper (1) at #acl2020nlp : Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey and Noah A. Smith
2
11
94
7
18
148
@ssgrn
Suchin Gururangan
3 years
Happy to share that I’m joining @facebookai as a visiting researcher in the Seattle NLP group! Looking forward to some fun collaborations next year.
1
1
136
@ssgrn
Suchin Gururangan
1 year
Since it's PhD fellowship season, I wrote down some tips I learned while writing a fellowship proposal last year. I also include my Bloomberg fellowship proposal as a reference: Deadline for the Bloomberg fellowship this year is 4/28!
1
22
133
@ssgrn
Suchin Gururangan
4 years
@swabhz and I got hitched - pandemic style! Many thanks to @waleed_ammar @lucyluwang @nlpnoah Bryan, Karen, and Maddy for making this last-minute, DIY wedding so smooth and memorable. 🎉🍾
Tweet media one
Tweet media two
20
0
131
@ssgrn
Suchin Gururangan
4 years
Following @nelsonfliu 's example, I've also shared my personal research statement from my NLP PhD applications last cycle, along with some salient pieces of advice I received while writing it! Check it out here: Good luck to all prospective applicants!
4
18
97
@ssgrn
Suchin Gururangan
2 years
@swabhz and I are driving down to LA tomorrow — we’ll miss all our dear friends and our beloved Seattle! I’ll work remotely from the beach 🏖 while @swabhz starts her new gig @nlp_usc :) LA peeps, please hit us up!! We’re looking forward to all the new adventures 🌊🏄‍♀️☀️
2
1
87
@ssgrn
Suchin Gururangan
2 years
DEMix was accepted at #NAACL2022 @naaclmeeting ! Stay tuned for an updated version with additional experiments/baselines. Super excited about modularity as a mechanism to address the many customization, efficiency, and safety concerns of dense language models!
@ssgrn
Suchin Gururangan
3 years
Excited to introduce DEMix layers, a module with domain "experts" that make a language model modular! You can mix, add, or remove experts, enabling rapid adaptation. 🧵👇 Paper: Work with @ml_perception , @universeinanegg , @nlpnoah , and @LukeZettlemoyer
Tweet media one
4
69
289
3
4
69
@ssgrn
Suchin Gururangan
2 years
We’ll present “Whose Language is High Quality” at the #EMNLP2022 theme track: “Open questions, major obstacles, and unresolved issues in NLP”! We argue that there is no such thing as a general purpose corpus for language models, due to language ideologies of the data curator.
@ssgrn
Suchin Gururangan
2 years
In our new interdisciplinary work, “Whose Language Counts as High Quality?”, we empirically demonstrate that the data selection procedures for language models like GPT-3 implicitly favor text written by authors from powerful social positions. Paper: 🧵👇
Tweet media one
9
104
448
1
6
68
@ssgrn
Suchin Gururangan
2 months
Shoutout to @orevaahia et al who wrote a great paper that revealed this issue!
@aidangomez
Aidan Gomez
2 months
One subtlety worth mentioning is how significant the tokenizer is to the cost to use models in non-English languages. Our tokenizer is meaningfully better than others at the 9 non-English languages, achieving up to a 2x effective cost reduction to use.
Tweet media one
5
14
125
3
7
71
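The effect is easy to check yourself: tokenize the same non-English sentence with different tokenizers and compare token counts, since API cost and context usage scale with tokens. The two tokenizers below are just publicly available examples, not the ones in the quoted comparison.

from transformers import AutoTokenizer

text = "भारत में कई भाषाएँ बोली जाती हैं।"  # Hindi example sentence
for name in ["gpt2", "xlm-roberta-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, len(tok(text)["input_ids"]))  # more tokens => higher per-query cost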
@ssgrn
Suchin Gururangan
4 years
This model is part of our larger ACL 2020 paper, "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"; work with @anmarasovic , @swabhz , @kylelostat , @i_beltagy , Doug Downey, and @nlpnoah
Tweet media one
3
9
61
@ssgrn
Suchin Gururangan
3 years
Really proud of my department. Swift response, no gaslighting, no tiptoeing around facts, no indirect language. Clear and unequivocal support for AI ethics and marginalized people in our community. That’s how it’s done!
@uwcse
Allen School
3 years
#UWAllen leadership is aware of recent “discussions” involving Pedro Domingos, a professor emeritus (retired) in our school. We do not condone a member of our community engaging in a Twitter flame war belittling individuals and downplaying valid concerns over ethics in AI. 1/11
30
217
2K
1
2
59
@ssgrn
Suchin Gururangan
3 years
The website/syllabus for the computing ethics class I'm TAing this quarter is up! I wanted to highlight a few of its features that I'm particularly excited about.
1
8
52
@ssgrn
Suchin Gururangan
1 year
I’ll be talking about quality filtering and language ideologies at the 11am session in Hall A-D! Do swing by :) #EMNLP2022
@ssgrn
Suchin Gururangan
2 years
In our new interdisciplinary work, “Whose Language Counts as High Quality?”, we empirically demonstrate that the data selection procedures for language models like GPT-3 implicitly favor text written by authors from powerful social positions. Paper: 🧵👇
Tweet media one
9
104
448
1
4
52
@ssgrn
Suchin Gururangan
5 years
Code and pre-print of our #ACL2019 paper "Variational Pretraining for Semi-supervised Text Classification" are now available! With @dangitstam , @dallascard , and @nlpnoah . Paper: Code: [1/14]
2
14
50
@ssgrn
Suchin Gururangan
6 years
@NAACLHLT paper with @swabhz @omerlevy_ @royschwartz02 @sleepinyourhat and @nlpnoah is online now! Our work reveals annotation artifacts that inflate the performance of natural language inference models. Take a read:
0
18
50
@ssgrn
Suchin Gururangan
5 years
“Variational Pretraining for Semi-supervised Text Classification” with @dangitstam , @dallascard , and @nlpnoah will be at @ACL2019_Italy ! We’ll present a new framework for minimal-compute (i.e. CPU-friendly) pretraining with VAEs. Code and arxiv link in flight!
1
5
48
@ssgrn
Suchin Gururangan
4 years
Editing ACL talk transcriptions -- among my favorite mistakes so far, at the intro: "Hi everyone, my name is such a girl again...this is a project with on a mirosevic, Swallow swam that the, kylo, is Beltagy, Doug Downey, and it was Smith." 🤦‍♂️
1
0
47
@ssgrn
Suchin Gururangan
3 years
With the academic year starting, just wanted to bump a short post I wrote about writing personal statements for NLP/AI grad school apps: Also, UW has a nice application mentorship program: Hope these are useful resources!
0
6
46
@ssgrn
Suchin Gururangan
4 years
I‘m hosting an #acl2020nlp mentoring session on PhD apps with @sjmielke and @sebgehr on Mon 9am PT! Ping us if you have any burning questions. I’ll also be talking about “Don’t Stop Pretraining” () on Wed 10a (14A) and 2p PT (15B). Hope to see y’all there!
2
7
44
@ssgrn
Suchin Gururangan
1 year
Super excited to give this talk! I'll be discussing BTM () and c-BTM (), and making the argument that we shouldn't train dense language models anymore :) If you'd like to tune in, check the zoom link below!
@USC_ISI
USC ISI
1 year
New #naturallanguage seminar this Thursday! @ssgrn , PhD candidate at @uwcse , will discuss the issues associated with dense-training #languagemodels and introduce a new class of #LMs that are fundamentally modular. Tune in on Zoom here: @USC @USCViterbi
Tweet media one
0
2
12
0
6
44
@ssgrn
Suchin Gururangan
8 years
Tweet media one
0
20
39
@ssgrn
Suchin Gururangan
3 years
Yahooo so proud of @swabhz ! We're soo excited to move to LA!! Getting ready for some sun ☀️and waves 🏄
@swabhz
Swabha Swayamdipta
3 years
I'm thrilled to share some personal news - I'll be joining the University of Southern California @CSatUSC as an Assistant Professor of CS and the Gabilan Assistant Professor in Fall 2022. Super excited to be part of the NLP group at USC @nlp_usc and more broadly, SoCal NLP 😃 🏖️
96
25
777
1
0
36
@ssgrn
Suchin Gururangan
1 year
. @colinraffel , @margs_li , @SamuelAinsworth , and I are proposing a workshop on Collaborative, Communal, and Continual Machine Learning at NeurIPS 2023! If you'd like to be a reviewer for our workshop, please sign up here:
2
12
36
@ssgrn
Suchin Gururangan
5 years
Excited to have a paper around improving reproducibility in NLP with @JesseDodge @royschwartz02 @dallascard and @nlpnoah at #emnlp2019 ! We present a new framework for incorporating computational budget into model performance benchmarks. Stay tuned for arxiv link and code!
0
4
34
@ssgrn
Suchin Gururangan
5 years
A project I've been helping out on! Hopefully it's useful for anyone wanting to speed up their pretraining/fine-tuning setups with @huggingface -compatible LMs.
@i_beltagy
Iz Beltagy
5 years
At last, language model pretraining with PyTorch+TPUs Our code trains PyTorch BERT/RoBERTa on TPUs, which is faster and cheaper than GPUs. Also check the repo for a more detailed comparison between TPUs/GPUs on PyTorch/Tensorflow.
6
66
290
0
4
32
@ssgrn
Suchin Gururangan
1 year
Check out our new paper on editing models with task arithmetic! I’m really excited about this direction. Our results suggest that model behaviors organize in a metric space, and interpolation allows us to adopt new behaviors with no additional training. So many cool ideas here!
@gabriel_ilharco
Gabriel Ilharco
1 year
Introducing task vectors! A new way to steer models by doing arithmetic with model weights. Subtract to make models forget, add to make them learn 📜: 🖥️:
Tweet media one
20
271
1K
0
3
32
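At the state-dict level the arithmetic is literally element-wise: a task vector is the fine-tuned weights minus the pretrained weights, and scaling, adding, or negating it steers the model. A minimal sketch with hypothetical checkpoints, not the authors' released code.

def task_vector(finetuned, pretrained):
    # tau = theta_finetuned - theta_pretrained, per parameter tensor
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vector(pretrained, tau, alpha=1.0):
    # alpha > 0 adds the behavior; alpha < 0 "forgets" it
    return {k: pretrained[k] + alpha * tau[k] for k in pretrained}

# e.g., make a model forget a behavior it was fine-tuned for (hypothetical state dicts):
# edited = apply_task_vector(pre_sd, task_vector(ft_sd, pre_sd), alpha=-1.0)
# model.load_state_dict(edited)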
@ssgrn
Suchin Gururangan
1 year
Looking forward to hanging out with folks at #emnlp2022 this week!!
4
0
30
@ssgrn
Suchin Gururangan
3 months
New paper on using gradient similarity search to select instruction tuning data! We have tricks to make the computation and search efficient, and show gradients from small models can identify useful instructions for larger models. Led by @xiamengzhou and @SadhikaMalladi !
@xiamengzhou
Mengzhou Xia
3 months
Lots of instruction tuning data out there...but how to best adapt LLMs for specific queries? Don’t use ALL of the data, use LESS! 5% beats the full dataset. Can even use one small model to select data for others! Paper: Code: [1/n]
Tweet media one
13
98
434
1
3
29
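Stripped of the efficiency tricks the tweet mentions (LoRA gradients, random projections, optimizer-aware features), the selection rule is: score each candidate instruction example by the similarity of its gradient to the target task's gradient and keep the top-scoring ones. A toy sketch over precomputed gradient features; shapes and values are stand-ins.

import torch
import torch.nn.functional as F

def select_top_k(candidate_grads, target_grad, k):
    # candidate_grads: (N, d) per-example (projected) gradients; target_grad: (d,)
    scores = F.cosine_similarity(candidate_grads, target_grad.unsqueeze(0), dim=1)
    return scores.topk(k).indices

cand = torch.randn(1000, 64)  # stand-ins for projected per-example gradients
target = torch.randn(64)      # gradient feature for the target queries
chosen = select_top_k(cand, target, k=50)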
@ssgrn
Suchin Gururangan
4 months
We found a trick to characterize quality filtering patterns in big web dumps — using “about me” pages where people self-identify with various social dimensions. Lucy discovered a bunch of interesting implicit preferences in filters; check out the paper below to learn more!
@lucy3_li
Lucy Li
4 months
New preprint! 📜 We investigate how ten “quality” and English langID filters, drawn from prior lit on LLM pretraining data curation pipelines, affect webpages linked to self-descriptions of their creators. Paper: Data: 🧵(1/6)
Tweet media one
3
22
139
0
1
26
@ssgrn
Suchin Gururangan
2 years
Hell yeah!! 🔥🔥 Go @swabhz and team!!
@swabhz
Swabha Swayamdipta
2 years
I'm super honored to receive an outstanding paper award at NeurIPS 2021 for MAUVE🟪 with the awesome team of @KrishnaPillutla @rown @jwthickstun @wellecks @YejinChoinka and Zaid Harchaoui! Learn more about our work 👉
18
34
369
0
0
25
@ssgrn
Suchin Gururangan
2 years
If you're interested in doing a PhD, do consider Ana as your adviser! She's an awesome researcher, collaborator, and friend! Also Utah!! ⛰️🏂
@anmarasovic
Ana Marasović
2 years
Some good news! I'm joining the University of Utah @UUtah @UtahSoC as an Assistant Professor this summer. I'm excited I'll be part of the U's NLP group and work alongside @viveksrikumar & @EllenRiloff !
79
33
524
1
1
25
@ssgrn
Suchin Gururangan
2 years
Looking forward to seeing everyone at #NAACL2022 . I'm going to talk about DEMix on Tuesday (4:15p, Elwha A), stop by if you're also excited about modular LMs. New paper version: Happy to chat about research, PhD life, anything -- just say hi!
@ssgrn
Suchin Gururangan
3 years
Excited to introduce DEMix layers, a module with domain "experts" that make a language model modular! You can mix, add, or remove experts, enabling rapid adaptation. 🧵👇 Paper: Work with @ml_perception , @universeinanegg , @nlpnoah , and @LukeZettlemoyer
Tweet media one
4
69
289
0
2
24
@ssgrn
Suchin Gururangan
1 year
For more discussion around the harms associated with data selection for LLMs, and not being transparent about it, check out our paper on the very subject!
@benmschmidt
Ben Schmidt / @[email protected]
1 year
I think we can call it shut on 'Open' AI: the 98 page paper introducing GPT-4 proudly declares that they're disclosing *nothing* about the contents of their training set.
Tweet media one
103
1K
7K
0
2
23
@ssgrn
Suchin Gururangan
8 months
We used OpenLM to train two new LMs: a 1B model on 1.6T tokens, and a 7B model on 1.3T tokens. OpenLM-1B is one of the best 1B models out there, and OpenLM-7B gets similar zero shot accuracy as LLaMA-7B and MPT-7B. Both are publicly available: /2
1
4
22
@ssgrn
Suchin Gururangan
1 year
Excited for our tag-team talk tomorrow on BTM! Check the registration link below if you'd like to attend :)
@stanfordnlp
Stanford NLP Group
1 year
For this week's NLP Seminar, we are excited to host @margs_li and @ssgrn ! The talk will happen Thursday at 11 AM PT. Non-Stanford affiliates registration link: . Information will be sent out one hour before the talk.
Tweet media one
0
12
32
0
4
22
@ssgrn
Suchin Gururangan
1 year
Thanks for having me yesterday, @USC_ISI ! If you missed the talk, you can catch the recording here:
@USC_ISI
USC ISI
1 year
New #naturallanguage seminar this Thursday! @ssgrn , PhD candidate at @uwcse , will discuss the issues associated with dense-training #languagemodels and introduce a new class of #LMs that are fundamentally modular. Tune in on Zoom here: @USC @USCViterbi
Tweet media one
0
2
12
1
1
22
@ssgrn
Suchin Gururangan
4 months
Check out our new paper on x-BTM, multilingual Branch-Train-Merge! More evidence that dense models leave a lot on the table when training on very heterogeneous data.
@TerraBlvns
Terra Blevins
4 months
Expert language models go multilingual! Introducing ✨X-ELM✨(Cross-lingual Expert Language Models), a multilingual generalization of the BTM paradigm to efficiently and fairly scale model capacity for many languages! Paper:
Tweet media one
2
39
171
0
1
21
@ssgrn
Suchin Gururangan
3 years
Twitter-verse! @adityakusupati & I are doing a user study to understand how people perceive Internet text, possibly generated by machines. We'd appreciate it if you could take our fun 6-question survey, which should take about 15 mins. Thanks a ton! Survey:
2
12
21
@ssgrn
Suchin Gururangan
3 years
Also, for anyone considering UW CSE, don’t let the toxic threads deter you. There are tons of opportunities to study and do research in computing ethics here; lots of friendly and welcoming people working on it from a POV of HCI, accessibility, security, AI.
0
3
21
@ssgrn
Suchin Gururangan
2 years
We introduced lo-fi, a simple way to reduce the costs of fine-tuning large models on multi-node GPU clusters. Instead of fine-tuning a single model across nodes, fine-tune k single-node workers and average their parameters at the end! Check out more details from @Mitchnw below.
@Mitchnw
Mitchell Wortsman
2 years
Progress in model averaging raises the Q: is communication between nodes necessary during fine-tuning? A: In certain settings (e.g., DeiT IN1k or OPT CC fine-tune), local fine-tuning matches performance. lo-fi: distributed fine-tuning without communication
Tweet media one
2
27
105
1
1
20
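A sketch of the recipe in the tweet: each of the k workers fine-tunes on its own node with no cross-node communication, and their checkpoints are averaged once at the end. The checkpoint paths are hypothetical.

import torch

def average_checkpoints(paths):
    sds = [torch.load(p, map_location="cpu") for p in paths]
    return {k: sum(sd[k].float() for sd in sds) / len(sds) for k in sds[0]}

# Each worker fine-tunes independently, then (hypothetical paths):
# merged = average_checkpoints([f"worker{i}/final.pt" for i in range(4)])
# model.load_state_dict(merged)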
@ssgrn
Suchin Gururangan
4 years
Just gave a talk at the UW Computing + Society colloquium! Discussed RealToxicityPrompts and ethical questions surrounding LM pretraining. Session includes talks on toxicity detection, experience with disability, CS education, and social values in ML:
0
2
20
@ssgrn
Suchin Gururangan
2 years
Do consider @dallascard as an adviser! You’ll work on extremely fun and creative ideas! I’ve learned a ton from him over the years — he’s an awesome collaborator and overall human.
@dallascard
Dallas Card
2 years
I'm really looking forward to reading PhD applications for @umsi ! If you're interested in historical or political text, the cultures and practices of science, or the societal impacts of machine learning, please consider applying by December 1st!
3
30
133
0
1
19
@ssgrn
Suchin Gururangan
6 years
Excited to announce I'll be joining @allenai_org this fall!
2
0
18
@ssgrn
Suchin Gururangan
11 months
Thanks so much for having me! It was a great discussion :)
@MilaNLProc
MilaNLP
11 months
📖For our weekly @MilaNLProc lab seminar, it was a pleasure to have @ssgrn for a discussion on all things language models, open-sourcing and regulation. #NLProc
0
0
5
0
1
17
@ssgrn
Suchin Gururangan
1 year
Lots of great findings in this paper, highly recommend checking it out if you are interested in the food for LMs!
@ShayneRedford
Shayne Longpre
1 year
#NewPaperAlert When and where does pretraining (PT) data matter? We conduct the largest published PT data study, varying: 1⃣ Corpus age 2⃣ Quality/toxicity filters 3⃣ Domain composition We have several recs for model creators… 📜: 1/ 🧵
Tweet media one
12
85
349
0
2
17
@ssgrn
Suchin Gururangan
5 months
Last year, we introduced a new method called “task vectors” (), which allows one to steer model behavior with very simple arithmetic operations. In our new work, we extend this approach to temporal adaptation. /2
1
1
17
@ssgrn
Suchin Gururangan
5 years
In "Show Your Work", we propose a method to compare model performance in the context of computational budget. We also release a new AllenNLP hyperparameter search library! Check it out here:
0
2
17
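The paper's central report is the expected best validation performance as a function of how many hyperparameter trials a budget allows, estimated from the trials actually run. A small sketch of that estimator with made-up scores; this is not the released AllenNLP tooling.

import numpy as np

def expected_max_performance(scores, budgets):
    # E[max of n i.i.d. draws] estimated from observed validation scores.
    v = np.sort(np.asarray(scores, dtype=float))
    N = len(v)
    cdf = np.arange(1, N + 1) / N    # P(V <= v_i) under the empirical distribution
    cdf_lower = np.arange(0, N) / N  # P(V <  v_i)
    return {n: float(np.sum(v * (cdf ** n - cdf_lower ** n))) for n in budgets}

scores = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73]  # hypothetical validation accuracies
print(expected_max_performance(scores, budgets=[1, 2, 5]))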
@ssgrn
Suchin Gururangan
11 months
Heading to #ACL2023NLP ! Looking forward to seeing everyone this week :)
0
0
17
@ssgrn
Suchin Gururangan
4 years
Echoing @sarameghanbeery : I just went through the grad app cycle for NLP — can demystify the process, give feedback, or connect you with folks @uwnlp or other universities. If interested in a research internship/residency @allen_ai hit me up too! DMs are open. @black_in_ai
@sarameghanbeery
Sara Beery
4 years
More concretely, any Black students in AI interested in applying to @Caltech this fall? I want to help - with application strategy, successful NSF essays, introductions to Caltech profs, and at the very least I'll fight to make sure your app is seen. @black_in_ai @BlackAFinSTEM
4
37
112
1
2
15
@ssgrn
Suchin Gururangan
4 years
This is an amazing program to get more research experience before starting grad school! Let me know if you have any questions about it.
@ai2_allennlp
AllenNLP
4 years
Wondering about deferring a grad school admittance for a year, or want to prepare more before applying? The AllenNLP research team at @allen_ai will be considering applications for our predoctoral young investigator program soon. Get your application in by Feb. 15!
1
16
81
0
1
15
@ssgrn
Suchin Gururangan
8 months
@jeremyphoward @Mitchnw @Vaishaal @sy_gadre @achalddave @lschmidt3 @laion_ai @StabilityAI The main feature of OpenLM is that it is minimal - no major dependencies beyond PyTorch/webdataset and one can read/modify all of the code in a short period of time! so like nanoGPT, but scales to thousands of GPUs :)
2
0
14
@ssgrn
Suchin Gururangan
4 years
Stay tuned for preprint, code, and many more RoBERTas in the coming weeks! 4/4
1
0
14
@ssgrn
Suchin Gururangan
3 years
Also, I wouldn’t be doing NLP research if it weren’t for the UW CLMS program, which @emilymbender spearheads! She has helped build the careers of a *ton* of people in the field.
Apparently the distinguished Emily M. Bender is "mediocre and mostly ignored." I didn't know what "AI" was when Emily finished her PhD at Stanford 1 year before I joined as an undergrad. I'm pretty sure she taught Meg. And dude wants to say we gave HER "fame"? GTFOH.
9
23
182
0
2
13
@ssgrn
Suchin Gururangan
3 years
This team was such a pleasure to work with, and @eaclark07 was an incredible project lead! Thanks for the recognition :D
@eaclark07
Elizabeth Clark
3 years
Update: it’s been selected as an #ACL2021NLP Outstanding Paper! 🎉 The zoom conversations with @tal_august , Sofia, Nikita, @ssgrn , and @nlpnoah about human evals were a weekly highlight for me, so I’m very excited they’re being recognized like this!
7
8
160
0
0
13
@ssgrn
Suchin Gururangan
5 months
Our results show that time is really structured in LM weight space, and further show the power of interpolation for model editing! Let us know if you have any comments or questions. /8
2
0
13
@ssgrn
Suchin Gururangan
10 months
We are releasing all resources, including our permissively licensed training data (Open License Corpus 📄) & pretrained LMs! See our paper for more details and rich areas for future work toward mitigating the legal risk of LMs 🌟
Tweet media one
2
2
13
@ssgrn
Suchin Gururangan
2 years
To appear at #EMNLP2022 , we introduce a new massively multi-domain dataset called M2D2! M2D2 is built on Wikipedia and Semantic Scholar, and serves as a testbed for studying language model domain adaptation in the wild. Led by the great @machelreid ! Check out the thread below 👇
@machelreid
Machel Reid
2 years
New paper at #EMNLP2022 ! We introduce M2D2, the first massively multi-domain language modeling dataset with ~150 hierarchically organized domains! Work with @hllo_wrld @ssgrn @LukeZettlemoyer Paper: Code:
Tweet media one
6
53
294
0
1
13
@ssgrn
Suchin Gururangan
2 years
A key issue is that language ideologies in data selection go undocumented and implicit. We believe the community could be much more intentional and transparent in their data selection practices. These decisions may be informed by specific downstream applications! /12
1
3
12
@ssgrn
Suchin Gururangan
11 months
BTM ftw 😉
@soumithchintala
Soumith Chintala
11 months
i might have heard the same 😃 -- I guess info like this is passed around but no one wants to say it out loud. GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference. Glad that Geohot said it out loud. Though, at this point, GPT-4 is
57
390
2K
1
0
12
@ssgrn
Suchin Gururangan
4 years
It’s SO easy to find toxic content in Webtext/Common Crawl, which is considered “standard” for pretraining LMs. Very important, hard research to be done on how to better detect such content, understand how LMs are affected by it, and how to control unwanted behavior.
@ani_nenkova
Ani Nenkova
4 years
The average engineer deploying ML systems may or may not understand how much the resulting product depends on the training data. They are likely to say they train on the ‘standard dataset used in research’. ML researchers are responsible for clarifying the possibilities for bias
8
44
308
2
3
12
@ssgrn
Suchin Gururangan
4 years
Problematic use case of LMs: elevating a single variant of English as "Standard", erasing valid variants of the language, like those spoken by non-L1 speakers!
Tweet media one
2
4
12
@ssgrn
Suchin Gururangan
5 months
Time vectors are constructed by fine-tuning a language model on text from a certain time period, and then subtracting the pretrained model's weights from the result. This represents a direction of movement in weight space that improves performance on that time period. /3
1
0
12
@ssgrn
Suchin Gururangan
1 year
Awesome comprehensive survey of a burgeoning area of research!
@seb_ruder
Sebastian Ruder
1 year
In our new survey “Modular Deep Learning”, we provide a unified taxonomy of the building blocks of modular neural nets and connect disparate threads of research. 📄 📢 🌐 w/ @PfeiffJo @licwu @PontiEdoardo
Tweet media one
8
97
426
0
0
11
@ssgrn
Suchin Gururangan
5 months
We find that time vectors are structured on a manifold: time vectors that are closer together in weight space are also adjacent in time. The angle between two time vectors is really well correlated with the degree of temporal misalignment. /5
Tweet media one
2
0
11
@ssgrn
Suchin Gururangan
2 years
We argue that quality filtering implies a language ideology – a sociolinguistics term for a subjective belief about language use. These ideologies are often implicit/undocumented. What language is high quality enough to be included in the corpus? Whose language is excluded? /3
1
3
11
@ssgrn
Suchin Gururangan
9 months
This is a great compilation of advice! One thing that helped me a lot is to learn to be shameless about bad first drafts. Good papers, like any art, are the product of refinement and iteration! Just be sure to be open to feedback and messing around with stuff till it clicks.
@VeredShwartz
Vered Shwartz
9 months
I just published Tips for Writing NLP Papers. I wrote it for my students so I don't have to sound like a broken record (and edit papers for the same issues over and over again 😃). But some of you might find it useful too.
28
238
1K
0
1
10
@ssgrn
Suchin Gururangan
5 years
If you're at #WecNLP19 , come by the poster session at 5:30p (location #25 ) to discuss cheap, low-resource pretraining!
Tweet media one
0
0
10
@ssgrn
Suchin Gururangan
10 months
Why does SILO mitigate legal risks? 🤔 Because data owners can *remove* their data from the model entirely at any time. Use of their data can also be *attributed* at the token-level. SILO better aligns with various data-use regulations (e.g. fair use, GDPR, etc).
1
0
10
@ssgrn
Suchin Gururangan
2 years
Current large language models require massive multi-node synchronization during training. In contrast, for ELMforests, different parts of the model are independently trained on different subsets of training data, with no need for multi-node training/inference. 2/n
Tweet media one
2
0
10
@ssgrn
Suchin Gururangan
4 years
Bubbling this up -- please sign up, NLP friends!!
@sarameghanbeery
Sara Beery
4 years
@ssgrn @uwnlp @allen_ai @black_in_ai From @red_abebe - @black_in_ai has a mentorship program for grad apps in place, you can join BAI as an ally to get matched with mentees:
0
0
2
0
8
10
@ssgrn
Suchin Gururangan
2 years
Our results suggest there is no such thing as a general-purpose corpus, since selecting data for language models is highly subjective. The data curator must adopt a language ideology, and it will likely conflict with other perspectives of what makes text high quality. /11
1
2
9
@ssgrn
Suchin Gururangan
4 years
In this work, we investigate how to effectively adapt language models to distant domains. We explore questions like: What are domains in NLP? Are they hierarchical and overlapping? Can you generate a micro-domain around a task? Are language models truly language universal? 3/4
1
1
10
@ssgrn
Suchin Gururangan
8 years
masterchief data scientist @rapid7 @hrbrmstr getting ready to speak @SOURCEConf
Tweet media one
Tweet media two
1
2
10
@ssgrn
Suchin Gururangan
3 years
Super cool! @snehaark et al show that task level routing leads to better expert specialization and efficient inference options for large models. These results are super related to the modular features we observed in DEMix layers. Task/domain level routing looks really promising!
@snehaark
Sneha Kudugunta
3 years
#EMNLP2021 Findings paper “Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference” w/ @bignamehyp , @ankurbpn , Maxim Krikun, @lepikhin , @lmthang , @orf_bnw about TaskMoE, an inference friendly alternative to token-based MoEs. Link: 1/n
Tweet media one
2
13
101
0
1
9
@ssgrn
Suchin Gururangan
1 year
Happy to share that lo-fi will appear in TMLR!
@Mitchnw
Mitchell Wortsman
2 years
Progress in model averaging raises the Q: is communication between nodes necessary during fine-tuning? A: In certain settings (e.g., DeiT IN1k or OPT CC fine-tune), local fine-tuning matches performance. lo-fi: distributed fine-tuning without communication
Tweet media one
2
27
105
0
2
9
@ssgrn
Suchin Gururangan
10 months
But how good is SILO against existing LMs? 👀 Its performance is much better than using low-risk data only, and in fact, is close to the model trained on all of the data without restriction (Pythia). (parametric-only is much worse, but using a datastore is a key 🚀)
1
0
9
@ssgrn
Suchin Gururangan
4 years
Super cool effort and the right way to go with NLP benchmarks -- there is lots of exciting research to do on how to handle evolving model evaluations. I think this will have a particularly strong impact on bias measurements in NLP, where annotator subjectivity plays a huge role!
@douwekiela
Douwe Kiela
4 years
I’m super excited to announce Dynabench - a new and ambitious research platform for dynamic data collection and benchmarking: 1/n
9
117
460
0
1
9
@ssgrn
Suchin Gururangan
5 months
We also extrapolate task-specific models _to the future_ using analogy arithmetic: create a time vector from a language model finetuned on unlabeled data from the year you want to generalize to, and then interpolate that time vector with your task-specific model. /7
Tweet media one
2
0
9
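A state-dict-level sketch of that step, under my reading of the tweet: finetune the LM on unlabeled text from the target year, then interpolate the resulting weights with the task-specific model. The checkpoints and the mixing coefficient are hypothetical, not the paper's settings.

def interpolate(sd_a, sd_b, alpha=0.5):
    # Weight-space interpolation between two models with the same architecture.
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# task_model: finetuned on labeled data from a source year (hypothetical state dict)
# target_year_lm: the same pretrained LM finetuned on unlabeled text from the target year
# future_task_model = interpolate(task_model, target_year_lm, alpha=0.7)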
@ssgrn
Suchin Gururangan
2 years
ELMforests with BTM scale *really* well compared to regular dense LMs, both in and out of domain. ELM ensembles perform the best, but we also get big boosts by collapsing ELMs into a single LM with weighted parameter-averaging! 5/n
Tweet media one
2
1
9
@ssgrn
Suchin Gururangan
8 years
#RustConf 2016 /r/playrust classifier slides available here
0
6
9
@ssgrn
Suchin Gururangan
4 years
This thread beautifully articulates the assimilationist racism that Indian American immigrants simultaneously endure and perpetuate. 1/5
@SahajKohli
Sahaj Kaur Kohli
4 years
When I was a teenager -- with religiously unshorn, long, thick hair -- I'd stand in front of the mirror wishing I had what I'd call a "white girl ponytail" -- thin, short, light hair that could be thrown on the top of my head all undemanding & bouncy.
8
31
232
1
0
9
@ssgrn
Suchin Gururangan
2 years
Right now, despite their impressive performance, LLMs are expensive, centralized monoliths. These results make me excited about the possibility of making LLMs more democratized, collaborative, and flexible! Only more research will tell how this will play out :) 10/n
1
0
9
@ssgrn
Suchin Gururangan
4 years
Applications are live for PYIs and interns at AllenNLP! Feel free to DM if you have any questions about the positions, team, etc.
1
2
9