Announcing MatFormer - a nested🪆(Matryoshka) Transformer that offers elasticity across deployment constraints.
MatFormer is an architecture that lets us use 100s of accurate smaller models that we never explicitly trained!
1/9
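Roughly the nesting idea, as a toy sketch (illustrative only, not the paper's code; all sizes and variable names here are made up): a single FFN whose hidden units are ordered so that any prefix of them is itself a smaller, usable FFN.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64
W_in = 0.1 * rng.normal(size=(d_model, d_ff))
W_out = 0.1 * rng.normal(size=(d_ff, d_model))

def nested_ffn(x, frac):
    """Run the FFN using only the first `frac` fraction of its hidden units."""
    k = int(d_ff * frac)
    h = np.maximum(x @ W_in[:, :k], 0.0)   # ReLU over a prefix of hidden units
    return h @ W_out[:k, :]

x = rng.normal(size=(4, d_model))
for frac in (0.25, 0.5, 1.0):              # smaller submodels via slicing, no retraining
    print(frac, nested_ffn(x, frac).shape)
```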
New EMNLP paper “Investigating Multilingual NMT Representation at Scale” w/ @ankurbpn, @orf_bnw, @caswell_isaac, @naveenariva. We study transfer in massively multilingual NMT @GoogleAI from the perspective of representational similarity.
Paper: 1/n
Tired: Catching imposter syndrome by reading PhD applications from students way smarter than you.
Wired: Getting excited about talking them into building cool things with you ✨
We wrote a blog post about our work on Task-level Mixture-of-Experts (TaskMoE), and why it's a great way to efficiently serve large models (vs. more common approaches like training a large model and then compressing it via distillation).
Read all about Task-level Mixture-of-Experts (TaskMoE), a promising step towards efficiently training and deploying large models, with no loss in quality and with significantly reduced inference latency ↓
Late tweet, but thank you ENSLP #NeurIPS2023 for the best paper award, and @Devvrit_Khatri for the excellent presentation on behalf of the team @adityakusupati!
Excited to push further on conditional computation for tiny fast flexible models 🚀
@ankurbpn @orf_bnw @naveenariva @GoogleAI Huge thanks to my collaborators at @GoogleAI, without whom this work would not have been possible. This work was done as a part of the Google AI Residency - applications open soon, so definitely check it out! 8/8
I'm at #NeurIPS2023 today presenting MADLAD-400 with @BZhangGo and @adityakusupati at 5:15pm in Hall B1/B2 #314! Come by and chat w/ us about creating *massive* datasets, making sure they're not garbage, and multilingual LMs :D
📢🪆MatViT-B/16 & L/16 model checkpoints & code are public - drop-in replacements that enable elastic compute for free!🔥
Try them out; let us know😉
Shout out to @kfrancischen for the release; @anuragarnab & @m__dehghani for the amazing Scenic library.
New research demonstrates how a model for multilingual #MachineTranslation of 100+ languages trained with a single massive #NeuralNetwork significantly improves performance on both low- and high-resource language translation. Read all about it at:
We just released the MADLAD-400 dataset on @huggingface! Big (7.2T tokens), remarkably multilingual (419 languages), and cleaner than mC4, check it out:
Kudos to @996roma for doing the analysis of linguistic phenomena in RxR, and many thanks to @snehaark for working with Roma for Telugu! Also, to @yoavartzi and team for establishing this overall approach with Touchdown.
Reasons to hire Aditya:
1) v cool representation learning research with real world impact
2) genuinely cares about and bats for his mentees and collaborators
3) vibes are immaculate ✨
📢📢At the last minute, I decided to go on the job market this year!!!
Grateful for RTs & promotion at your univ.😇
CV & Statements:
Will be at #NeurIPS2023 presenting AdANNS, Priming, Objaverse & MADLAD! DM if you are around, would love to catch up👋
How many languages can we support with Machine Translation? We train a translation model on 1000+ languages, using it to launch 24 new languages on Google Translate without any parallel data for these languages. Technical 🧵below: 1/18
However, crawled datasets are often noisy, and this is even worse for under-resourced languages, with many datasets containing data that is not even in the labeled language (). So, we self-audited our initial dataset and kept only 419 of the 498 languages. 3/n
Data cleaning, documentation and auditing practices beyond English still have a long way to go, and we hope that this work furthers progress in this area! 6/n
I used to use a small notebook to set my agenda, but I started a version of this after reading @deviparikh's post .
The most immediate benefit was the reduced mental overload of remembering to talk to people/return stuff.
What do we need to scale NLP research to 1000 languages? We started off with a goal to build a monolingual corpus in 1000 languages by mining data from the web. Here’s our work documenting our struggles with Language Identification (LangID):
1/8
Manual smell tests of your data are limited, but super useful! I would love for all new large scale datasets to define their own audits AND release the results in all its messy glory.
Great work by Sneha to create a new, open, and highly multilingual web dataset...with a great acronym! It also sets a nice precedent that every single one of the 419 languages in the crawl was looked at and considered for specific filtering.
An early draft of the machine learning interviews book is out 🥳
The book is open-sourced and free. Job search is a stressful process, and I hope that this effort can help in some way.
Contributions and feedback are appreciated!
Measuring the social impacts of foundation models for as many languages as we support is super important - @Chris_Choquette and @katherine1ee led some intriguing work investigating the memorization properties of multilingual models.
419 languages is so many languages (!!)
Side note:
We investigated how having lots of different languages in one model impacts what and how much is memorized. Which examples get memorized depends on what other examples are in the training data!
@ankurbpn @orf_bnw @naveenariva @GoogleAI We also find that representations of high-resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair, which is critical to determining how much cross-lingual transfer can be expected in a zero- or few-shot setting. 6/n
Massively Multilingual NMT in the wild: 100+ languages, 1B+ parameters, trained using 25B+ examples. Check out our new paper for an in-depth analysis:
#GoogleAI
Is it just me or is the worst part about transitioning from school to industry getting around the fact that you're productive only at 10am and 10pm?
@TaliaRinger No, go for it! For North Indian weddings you should be fine with either; for a South Indian wedding I'd err on the side of wearing a sari (ask the host!). If you want to wear a sari, book an appointment with a local salon to get someone to tie it for you - it's less stressful.
Me going over my Google Keep notes near the end of the week:
Note: Talk to xyz
Me: Who is xyz?
Note: Weekend
Me: Yes, it exists.
Note: what is useful?
Me: Not you, clearly. 🙄
OK y'all, it is now my pleasure to show off some of the truly, genuinely heinous plots students in my Reproducible Data Analysis class made.
Content warning: these plots are f***ing awful.
My parents started out teaching me both Telugu and English, but prioritized the latter for far too long for similar reasons. I took Telugu class for ~8 years when we moved back to India, but it simply isn't the same.
My older sister's kindergarten teacher told my parents to stop speaking Chinese at home or she would struggle, so my dad spoke only English to us. My mom spoke a mix, and my grandparents spoke only Chinese, so we still learned some Chinese, but the message was assimilate or fail
XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE)
arxiv:
github (code + pretrained models):
with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhu
Easily deploying large models is an important direction of research, and we believe TaskMoE is a promising step towards more inference friendly algorithms that retain the quality gains of scaling. 9/9
The focus on SOTA has caused a dramatic increase in the cost of AI, leading to environmental tolls and inclusiveness issues. We advocate research on efficiency in addition to accuracy (#greenai). Work w/ @JesseDodge @nlpnoah and @etzioni at @allen_ai
- Write a novel or collection of short stories with women who both work on sciencey things and have an ok personal life.
- Perform Standup.
- Write/direct a film with women who both work on sciencey things and have an ok personal life.
- Paint enough for an art exhibition.
I want to:
- Publish a fantasy novel
- Perform standup comedy
- Produce an animated film
What do you want to do that's outside of your traditional "career" trajectory?
The Guardian is updating our style guide to accurately reflect the nature of the environmental crisis.
“Climate change” —> “climate emergency, crisis or breakdown”
“Global warming” —> “global heating”
“Climate skeptic” —> “climate science denier”
@chipro Speculation, but I'm certain more than 8% of women viewing are interested: women of twitter, never hesitate to reach out! I caught myself doing this recently - when @chipro organized Brunchpropagation, I deffo held back for a bit thinking "Why would anyone want to talk to me?"
Finally, when scaling up to 200 language pairs, our 128-expert task-MoE (13B parameters) still performs competitively with a token-level counterpart, while improving peak inference throughput by 2.6x. 8/n
The Pokémon paper is out! In this study, we scanned the brains of adults who, as children, became Pokémon experts. We find a region of their brain becomes uniquely responsive to @Pokemon, helping us get at why the brain is organized the way it is.
Took a while (don't ask) but here they are: Notes from "Science of Deep Learning" class co-taught with @KonstDaskalakis now available: . More coming soon (promise!). Feedback very welcome! Thanks to @andrew_ilyas for the heroic effort on the final revisions.
Another significant advantage of TaskMoE is that we retain all the gains from scaling - our method is +2.1 BLEU on average across all languages vs distilling the TokenMoE to a student model with a size comparable to the subnetwork extracted from TaskMoE. 7/n
1/ Can we use model-based planning in behavior space rather than action space? DADS can discover skills without any rewards, which can later be composed zero-shot via planning in the behavior space for new tasks.
Paper:
Website:
So we route tokens according to broader categories (routing by task boundaries vs routing per token) - that is, every token of a language is routed to the same subnetwork.
This enables the model to dedicate fewer experts to a single task identity during training and inference. 4/n
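A toy sketch of the difference (illustrative only, not the paper's implementation; the gate, expert count, and pooling choice here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model = 4, 8
tokens = rng.normal(size=(6, d_model))           # embeddings for one sentence
gate = rng.normal(size=(d_model, num_experts))   # routing weights (random here)

# Token-level routing: each token independently picks its top-1 expert, so one
# sentence may touch several experts and all of them must be served.
token_choice = np.argmax(tokens @ gate, axis=-1)

# Task-level routing: pool the gate input over the task (e.g. the language pair),
# so every token of that task goes to the same expert and only that expert
# needs to be loaded at inference time.
task_choice = int(np.argmax(tokens.mean(axis=0) @ gate))

print("token-level experts used:", sorted(set(token_choice.tolist())))
print("task-level expert used:  ", task_choice)
```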
I've made this cheat sheet and I think it's important. Most stats 101 tests are simple linear models - including "non-parametric" tests. It's so simple we should only teach regression. Avoid confusing students with a zoo of named tests. 1/n
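For instance (a minimal sketch, assuming scipy and statsmodels; the data below are simulated, not from the cheat sheet): a two-sample t-test gives the same p-value as regressing the outcome on a 0/1 group dummy.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(42)
group = np.repeat([0, 1], 50)                    # 0/1 dummy for the two groups
y = 2.0 + 0.5 * group + rng.normal(size=100)     # outcome with a true group effect

# Classic named test (equal-variance two-sample t-test)
t, p_ttest = stats.ttest_ind(y[group == 1], y[group == 0])

# The same thing as a linear model: y ~ 1 + group
ols = sm.OLS(y, sm.add_constant(group)).fit()

print(p_ttest, ols.pvalues[1])                   # identical p-values
```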
Mixture-of-Experts (or MoE) models are a great way to scale! Researchers have successfully scaled multilingual neural machine translation (MNMT) models up to 1 Trillion parameters and beyond. 2/n