Sneha Kudugunta Profile
Sneha Kudugunta

@snehaark

2,262 Followers · 749 Following · 23 Media · 497 Statuses

addicted to tpus @GoogleDeepMind @uwcse | varying proportions of AI and mediocre jokes (not mutually exclusive) | she/her/hers

San Francisco Bay Area, CA
Joined September 2014
Pinned Tweet
@snehaark
Sneha Kudugunta
7 months
MatFormer is a small but significant step towards true conditional computation models. Why use many neuron when few neuron do trick? 🙃
@adityakusupati
Aditya Kusupati
7 months
Announcing MatFormer - a nested🪆(Matryoshka) Transformer that offers elasticity across deployment constraints. MatFormer is an architecture that lets us use 100s of accurate smaller models that we never actually trained for! 1/9
quoted tweet: replies 7 · retweets 110 · likes 612
replies 1 · retweets 8 · likes 74
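The nesting idea is easy to sketch: train one set of FFN weights and let each smaller sub-model be a prefix slice of the hidden dimension. A minimal PyTorch-style sketch follows; the class name, widths, and forward pass are hypothetical illustrations, not the released MatFormer code.

```python
import torch
import torch.nn as nn

class NestedFFN(nn.Module):
    """Matryoshka-style nested feed-forward block (illustrative sketch).

    One set of weights is trained; at inference you pick a prefix of the
    hidden dimension, so smaller sub-models come "for free"."""

    def __init__(self, d_model=512, d_ff=2048, widths=(256, 512, 1024, 2048)):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)    # weight shape: (d_ff, d_model)
        self.down = nn.Linear(d_ff, d_model)  # weight shape: (d_model, d_ff)
        self.widths = widths                  # hypothetical nested granularities

    def forward(self, x, width=None):
        w = width or max(self.widths)
        # Prefix-slice the hidden dimension: rows of `up`, columns of `down`.
        h = torch.relu(x @ self.up.weight[:w].T + self.up.bias[:w])
        return h @ self.down.weight[:, :w].T + self.down.bias

x = torch.randn(4, 512)
ffn = NestedFFN()
y_small = ffn(x, width=256)  # smaller sub-model: same weights, less compute
y_full = ffn(x)              # full model
```

In the paper, a handful of nested granularities are trained jointly, and intermediate sub-models can then be extracted without ever being trained standalone.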
@snehaark
Sneha Kudugunta
5 months
bro have you made your neurips poster yet
replies 15 · retweets 30 · likes 821
@snehaark
Sneha Kudugunta
8 months
Excited to announce MADLAD-400 - a 2.8T token web-domain dataset that covers 419 languages(!). Arxiv: Github: 1/n
replies 24 · retweets 137 · likes 810
@snehaark
Sneha Kudugunta
5 years
New EMNLP paper “Investigating Multilingual NMT Representations at Scale” w/ @ankurbpn, @orf_bnw, @caswell_isaac, @naveenariva. We study transfer in massively multilingual NMT @GoogleAI from the perspective of representational similarity. Paper: 1/n
replies 9 · retweets 73 · likes 288
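For readers who want to poke at representational similarity themselves, here is a minimal NumPy sketch. It uses linear CKA as an illustrative stand-in for the paper's actual analysis (which is SVCCA-based), and the inputs are made up.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X, Y: (n_examples, dim) activations for the same inputs, e.g. encoder
    states for aligned sentences in two languages. Returns a value in [0, 1].
    """
    X = X - X.mean(axis=0)  # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
print(linear_cka(X, X + 0.1 * rng.normal(size=(100, 64))))  # high (near 1)
print(linear_cka(X, rng.normal(size=(100, 64))))            # low
```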
@snehaark
Sneha Kudugunta
4 months
Tired: Catching imposter syndrome by reading PhD applications from students way smarter than you. Wired: Getting excited about talking them into building cool things with you ✨
replies 2 · retweets 2 · likes 169
@snehaark
Sneha Kudugunta
2 years
We wrote a blogpost about our work on Task-level Mixture-of-Experts (TaskMoE), and why they're a great way to efficiently serve large models (vs. more common approaches like training → compression via distillation).
@GoogleAI
Google AI
2 years
Read all about Task-level Mixture-of-Experts (TaskMoE), a promising step towards efficiently training and deploying large models, with no loss in quality and with significantly reduced inference latency ↓
quoted tweet: replies 3 · retweets 84 · likes 334
replies 3 · retweets 21 · likes 114
@snehaark
Sneha Kudugunta
3 years
#EMNLP2021 Findings paper “Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference” w/ @bignamehyp, @ankurbpn, Maxim Krikun, @lepikhin, @lmthang, @orf_bnw about TaskMoE, an inference-friendly alternative to token-based MoEs. Link: 1/n
replies 2 · retweets 13 · likes 101
@snehaark
Sneha Kudugunta
5 months
Late tweet, but thank you ENSLP #NeurIPS2023 for the best paper award, and @Devvrit_Khatri for the excellent presentation on behalf of the team @adityakusupati! Excited to push further on conditional computation for tiny fast flexible models 🚀
@adityakusupati
Aditya Kusupati
7 months
Announcing MatFormer - a nested🪆(Matryoshka) Transformer that offers elasticity across deployment constraints. MatFormer is an architecture that lets us use 100s of accurate smaller models that we never actually trained for! 1/9
quoted tweet: replies 7 · retweets 110 · likes 612
replies 3 · retweets 8 · likes 88
@snehaark
Sneha Kudugunta
5 years
Our Colab is out! Link: I'll be talking about our paper today (11/5) (w/ @ankurbpn, @iseeaswell, @naveenariva, @orf_bnw) "Investigating Multilingual NMT Representations at Scale" at AWE Hall 2C (17:24) @emnlp2019. #emnlp2019 #NLProc #googleAI
@snehaark
Sneha Kudugunta
5 years
New EMNLP paper “Investigating Multilingual NMT Representations at Scale” w/ @ankurbpn, @orf_bnw, @caswell_isaac, @naveenariva. We study transfer in massively multilingual NMT @GoogleAI from the perspective of representational similarity. Paper: 1/n
quoted tweet: replies 9 · retweets 73 · likes 288
replies 1 · retweets 27 · likes 88
@snehaark
Sneha Kudugunta
5 years
@ankurbpn @orf_bnw @naveenariva @GoogleAI Huge thanks to my collaborators at @GoogleAI, without whom this work would not have been possible. This work was done as a part of the Google AI Residency - applications open soon, so definitely check it out! 8/8
replies 1 · retweets 12 · likes 83
@snehaark
Sneha Kudugunta
1 year
Which one of y'all did this?
replies 7 · retweets 5 · likes 84
@snehaark
Sneha Kudugunta
6 months
GPUs are unfaithful
replies 6 · retweets 1 · likes 68
@snehaark
Sneha Kudugunta
5 months
I'm at #NeurIPS2023 today presenting MADLAD-400 with @BZhangGo and @adityakusupati at 5:15pm in Hall B1/B2 #314! Come by and chat w/ us about creating *massive* datasets, making sure they're not garbage, and multilingual LMs :D
@snehaark
Sneha Kudugunta
8 months
Excited to announce MADLAD-400 - a 2.8T token web-domain dataset that covers 419 languages(!). Arxiv: Github: 1/n
quoted tweet: replies 24 · retweets 137 · likes 810
replies 1 · retweets 12 · likes 68
@snehaark
Sneha Kudugunta
4 years
Be kind to yourself. Don't be your own Reviewer 2. ✨ #selfcare
replies 0 · retweets 4 · likes 67
@snehaark
Sneha Kudugunta
7 months
We made the MatFormer code for ViT public!
@adityakusupati
Aditya Kusupati
7 months
📢🪆MatViT-B/16 & L/16 model checkpoints & code are public - drop-in replacements that enable elastic compute for free!🔥 Try them out; let us know😉 Shout out to @kfrancischen for the release; @anuragarnab & @m__dehghani for the amazing Scenic library.
quoted tweet: replies 0 · retweets 23 · likes 108
replies 1 · retweets 10 · likes 62
@snehaark
Sneha Kudugunta
5 years
A summary of all the exciting work on massively multilingual massive NMT that's been happening over the past year @GoogleAI .
@GoogleAI
Google AI
5 years
New research demonstrates how a model for multilingual #MachineTranslation of 100+ languages trained with a single massive #NeuralNetwork significantly improves performance on both low- and high-resource language translation. Read all about it at:
quoted tweet: replies 6 · retweets 236 · likes 572
replies 2 · retweets 7 · likes 49
@snehaark
Sneha Kudugunta
7 months
MADLAD-400 is now available publicly. Big thanks to Dustin and @mechanicaldirk at @ai2_allennlp! Arxiv:
@ai2_allennlp
AllenNLP
7 months
We just released the MADLAD-400 dataset on @huggingface ! Big (7.2T tokens), remarkably multilingual (419 languages), and cleaner than mC4, check it out:
quoted tweet: replies 4 · retweets 74 · likes 286
replies 0 · retweets 11 · likes 47
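A hedged example of pulling one language split with the `datasets` library. The repo id, config, split, and field names below are my best guess at the Hugging Face release, so verify them on the dataset page.

```python
from datasets import load_dataset

# Assumed layout: repo "allenai/madlad-400", one config per language code,
# "clean"/"noisy" splits, a "text" field. Streaming avoids downloading
# terabytes of data up front.
telugu = load_dataset("allenai/madlad-400", "te", split="clean", streaming=True)

for i, example in enumerate(telugu):
    print(example["text"][:200])
    if i == 2:
        break
```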
@snehaark
Sneha Kudugunta
5 months
come on bro, let's make the TPUs fly
replies 1 · retweets 1 · likes 38
@snehaark
Sneha Kudugunta
4 years
as a researcher my love language is sending papers that remind me of you. how do i have friends
replies 0 · retweets 0 · likes 34
@snehaark
Sneha Kudugunta
4 years
Expectation: Now I'm at home all the time I'll be so good at texting and calling my friends. Reality: Replies "haha" to text after 2 business days.
replies 1 · retweets 1 · likes 32
@snehaark
Sneha Kudugunta
4 years
A great dataset on VLN - I don't see Telugu datasets very often. I was super excited to help change this as a Telugu speaker!
@jasonbaldridge
Jason Baldridge
4 years
Kudos to @996roma for doing the analysis of linguistic phenomena in RxR, and many thanks to @snehaark for working with Roma for Telugu! Also, to @yoavartzi and team for establishing this overall approach with Touchdown.
quoted tweet: replies 0 · retweets 4 · likes 17
replies 2 · retweets 4 · likes 32
@snehaark
Sneha Kudugunta
5 years
"It's okay, who even looks at the supplementary material?"
replies 0 · retweets 4 · likes 32
@snehaark
Sneha Kudugunta
5 months
Reasons to hire Aditya: 1) v cool representation learning research with real world impact 2) genuinely cares about and bats for his mentees and collaborators 3) vibes are immaculate ✨
@adityakusupati
Aditya Kusupati
5 months
📢📢At the last minute, I decided to go on the job market this year!!! Grateful for RTs & promotion at your univ.😇 CV & Statements: Will be at #NeurIPS2023 ! presenting AdANNS, Priming, Objaverse & MADLAD. DM if you are around, would love to catch up👋
quoted tweet: replies 2 · retweets 49 · likes 182
replies 0 · retweets 1 · likes 25
@snehaark
Sneha Kudugunta
8 months
Most CommonCrawl-based datasets cover 200-250 languages - we applied state-of-the-art LangID models to crawl over 498 languages. 2/n
replies 1 · retweets 2 · likes 25
@snehaark
Sneha Kudugunta
4 years
I'll be talking about this paper at @WiMLworkshop (East Hall C) today at 11:10am and at the evening poster session 6:30pm onwards (Poster #257, East Hall B)! Paper: Colab: Co-authors: @ankurbpn, @iseeaswell, @naveenariva, @orf_bnw
@snehaark
Sneha Kudugunta
5 years
New EMNLP paper “Investigating Multilingual NMT Representations at Scale” w/ @ankurbpn, @orf_bnw, @caswell_isaac, @naveenariva. We study transfer in massively multilingual NMT @GoogleAI from the perspective of representational similarity. Paper: 1/n
quoted tweet: replies 9 · retweets 73 · likes 288
replies 0 · retweets 3 · likes 24
@snehaark
Sneha Kudugunta
4 years
Literally all my shitty NLU model does: The The the a a . . . A a the the #%$@^%$# A a a a a a a a a. A I I I I I I Of the of the of the. The. #@$!
replies 0 · retweets 0 · likes 22
@snehaark
Sneha Kudugunta
5 years
I look forward to talking about our paper "Investigating NMT Representations at Scale" at @WiMLworkshop 🥳 #WiML2019 Original Tweet:
@WiMLworkshop
WiML
5 years
The #WiML2019 program features 8 remarkable contributed talks by: Kimia Nadjahi ( @TelecomParis_ ), Xinyi Chen ( @Google ), Liyan Chen ( @FollowStevens ), @snehaark ( @Google ), Qian Huang ( @Cornell ), Mansi Gupta ( @PetuumInc ), Margarita Boyarskaya ( @nyuniversity ), @natashajaques ( @MIT )
quoted tweet: replies 1 · retweets 19 · likes 57
replies 0 · retweets 1 · likes 22
@snehaark
Sneha Kudugunta
2 years
A 🧵 on all the research involving a 1000+ language model (🤯) that went into adding 24 new languages to Google Translate
@iseeaswell
Isaac R Caswell
2 years
How many languages can we support with Machine Translation? We train a translation model on 1000+ languages, using it to launch 24 new languages on Google Translate without any parallel data for these languages. Technical 🧵below: 1/18
quoted tweet: replies 7 · retweets 88 · likes 336
replies 0 · retweets 0 · likes 20
@snehaark
Sneha Kudugunta
8 months
However, crawled datasets are often noisy, and the problem is even worse for under-resourced languages, with many datasets containing data that is not even in the labeled language (). So, we self-audited our initial dataset, and kept only 419 of the 498 languages. 3/n
replies 1 · retweets 2 · likes 21
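The audit boils down to checking that each document's predicted language agrees with its corpus label, and dropping languages where almost nothing survives. A minimal sketch; `predict_language` is a hypothetical stand-in for a real LangID model, and both thresholds are made up.

```python
def predict_language(text):
    """Hypothetical stand-in for a real LangID classifier.
    Returns (language_code, confidence)."""
    return ("und", 0.0)  # replace with an actual model

def langid_audit(docs, labeled_lang, min_confidence=0.7):
    """Keep only documents whose predicted language matches the corpus label."""
    kept = []
    for doc in docs:
        lang, confidence = predict_language(doc)
        if lang == labeled_lang and confidence >= min_confidence:
            kept.append(doc)
    return kept

def keep_language(docs, labeled_lang, min_survival=0.2):
    """Drop a language entirely if too little of its data passes the audit."""
    return len(langid_audit(docs, labeled_lang)) >= min_survival * max(len(docs), 1)
```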
@snehaark
Sneha Kudugunta
5 months
like actually tho
@snehaark
Sneha Kudugunta
5 months
I'm at #NeurIPS2023 today presenting MADLAD-400 with @BZhangGo and @adityakusupati at 5:15pm in Hall B1/B2 #314! Come by and chat w/ us about creating *massive* datasets, making sure they're not garbage, and multilingual LMs :D
quoted tweet: replies 1 · retweets 12 · likes 68
replies 2 · retweets 2 · likes 20
@snehaark
Sneha Kudugunta
5 years
These are stunning and well made and the puns are even better.
@NeuralBricolage
helena sarin
5 years
NeuralBricolage Publishing House presents The Book of GANesis the link to the full version #NeurIPS2019 #unpaper #folkaiart
quoted tweet: replies 7 · retweets 47 · likes 154
replies 0 · retweets 4 · likes 20
@snehaark
Sneha Kudugunta
8 months
Data cleaning, documentation and auditing practices beyond English still have a long way to go, and we hope this work furthers progress in this area! 6/n
replies 1 · retweets 0 · likes 20
@snehaark
Sneha Kudugunta
4 years
Neighborhood cafe crammed with tech bros on a Wednesday afternoon after our employers make us WFH to slow down #coronavirus .
replies 0 · retweets 1 · likes 18
@snehaark
Sneha Kudugunta
8 months
We also train baselines on this dataset for MT and LMs, and release the checkpoints. 5/n
replies 2 · retweets 2 · likes 18
@snehaark
Sneha Kudugunta
5 years
I used to use a small notebook to set my agenda, but I started a version of this after reading @deviparikh's post. The most immediate benefit was the reduced mental overhead of remembering to talk to people/return stuff.
@devonzuegel
Devon ☀️
5 years
A calendar is not just a reminder device for keeping track of external events. Used right, a calendar can be a full-fledged tool for thought.
quoted tweet: replies 12 · retweets 15 · likes 263
replies 1 · retweets 1 · likes 16
@snehaark
Sneha Kudugunta
4 years
confused yet happy about the general praise of a major conference's review process on my feed. why is nobody crying #acl2020nlp
replies 0 · retweets 0 · likes 16
@snehaark
Sneha Kudugunta
5 months
to clarify I am on the left idk what beamer is
replies 2 · retweets 1 · likes 15
@snehaark
Sneha Kudugunta
4 years
Difficult work on an often neglected problem by @iseeaswell and team (to appear at COLING 2020)
@iseeaswell
Isaac R Caswell
4 years
What do we need to scale NLP research to 1000 languages? We started off with a goal to build a monolingual corpus in 1000 languages by mining data from the web. Here’s our work documenting our struggles with Language Identification (LangID): 1/8
quoted tweet: replies 10 · retweets 89 · likes 316
replies 1 · retweets 0 · likes 15
@snehaark
Sneha Kudugunta
4 months
v. excited that MiTTenS 🧤 covers languages not commonly seen in misgendering datasets!
@jasmijnbastings
Jasmijn Bastings
4 months
We released a new paper and dataset: 🧤 MiTTenS 🧤: A Dataset for Evaluating Misgendering in Translation #NLProc
quoted tweet: replies 2 · retweets 21 · likes 98
replies 0 · retweets 1 · likes 13
@snehaark
Sneha Kudugunta
5 years
Every time I use Colab, I'm struck by how genuinely useful I find each new feature.
@GoogleColab
Colaboratory
5 years
You can now edit📝, create🆕, save💾, and move➡️ files and folders📂 through file browser on the left.
quoted tweet: replies 22 · retweets 147 · likes 696
replies 0 · retweets 0 · likes 12
@snehaark
Sneha Kudugunta
8 months
Manual smell tests of your data are limited, but super useful! I would love for all new large-scale datasets to define their own audits AND release the results in all their messy glory.
@iseeaswell
Isaac R Caswell
8 months
Great work by Sneha to create a new, open, and highly multilingual web dataset...with a great acronym! It also sets a nice precedent that every single one of the 419 languages in the crawl was looked at and considered for specific filtering.
quoted tweet: replies 2 · retweets 1 · likes 22
replies 0 · retweets 1 · likes 12
@snehaark
Sneha Kudugunta
4 years
Me: Hmm, I wonder which parts of the results section I should edit? Past Me: Me: Thanks...
replies 1 · retweets 1 · likes 13
@snehaark
Sneha Kudugunta
8 months
Work done with amazing collaborators @iseeaswell, @BZhangGo, @xgarcia238, Christopher Choquette-Choo, @katherine1ee, Derrick Xin, @adityakusupati, @romistella, @ankurbpn and @orf_bnw! 7/7
replies 3 · retweets 1 · likes 12
@snehaark
Sneha Kudugunta
4 years
In 2040 people will study how Gen-Zers coped with 2020 by pretending to be ants in an ant colony on Facebook.
replies 0 · retweets 1 · likes 11
@snehaark
Sneha Kudugunta
5 months
A blog on MatFormer with an OSS reimplementation of MatLMs!
@KempnerInst
Kempner Institute at Harvard University
5 months
In our latest Deeper Learning blog post, the authors introduce an algorithmic method to elastically deploy large models, the #MatFormer . Read more: #KempnerInstitute @adityakusupati @snehaark @Devvrit_Khatri @Tim_Dettmers
quoted tweet: replies 0 · retweets 12 · likes 21
replies 0 · retweets 1 · likes 12
@snehaark
Sneha Kudugunta
3 years
Highly recommend this book by @chipro - I used an early draft of this for interview prep and there's nothing quite like it!
@chipro
Chip Huyen
3 years
An early draft of the machine learning interviews book is out 🥳 The book is open-sourced and free. Job search is a stressful process, and I hope that this effort can help in some way. Contributions and feedback are appreciated!
quoted tweet: replies 48 · retweets 562 · likes 3K
replies 1 · retweets 1 · likes 11
@snehaark
Sneha Kudugunta
8 months
Measuring the social impacts of foundation models for as many languages as we support is super important - @Chris_Choquette and @katherine1ee led some intriguing work investigating the memorization properties of multilingual models.
@katherine1ee
Katherine Lee
8 months
419 languages is so many languages (!!) Side note: We investigated how having lots of different languages in one model impacts what and how much is memorized. Which examples get memorized depends on what other examples are in the training data!
quoted tweet: replies 0 · retweets 2 · likes 21
replies 0 · retweets 1 · likes 10
@snehaark
Sneha Kudugunta
5 years
@ankurbpn @orf_bnw @naveenariva @GoogleAI We also find that representations of high resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair, which is critical to determining how much cross-lingual transfer can be expected in a zero or few-shot setting. 6/n
replies 1 · retweets 2 · likes 10
@snehaark
Sneha Kudugunta
3 years
A paper on the state of multilingual datasets - it's a harder problem than you'd think, and so much more work is needed!
@iseeaswell
Isaac R Caswell
3 years
Does the data used for multilingual modeling really contain content in the languages it says it does? Short answer: sometimes 🙁 1/n
quoted tweet: replies 7 · retweets 53 · likes 126
replies 0 · retweets 0 · likes 10
@snehaark
Sneha Kudugunta
5 years
@ankurbpn @orf_bnw @naveenariva @GoogleAI We find that encoder representations of different languages cluster according to linguistic similarity... 3/n
replies 3 · retweets 2 · likes 9
@snehaark
Sneha Kudugunta
6 months
People are everything 💖
replies 0 · retweets 1 · likes 10
@snehaark
Sneha Kudugunta
5 years
@ankurbpn @orf_bnw @naveenariva @GoogleAI Colab to play with coming soon! 7/n
replies 1 · retweets 0 · likes 9
@snehaark
Sneha Kudugunta
5 years
@ankurbpn @orf_bnw @naveenariva @GoogleAI … Even at a more fine-grained level. 4/n
replies 2 · retweets 1 · likes 9
@snehaark
Sneha Kudugunta
5 years
Watching this project evolve was fascinating - definitely read if you're interested in multitask learning 🎉
@orf_bnw
Orhan Firat
5 years
Massively Multilingual NMT in the wild: 100+ languages, 1B+ parameters, trained using 25B+ examples. Check out our new paper for an in depth analysis: #GoogleAI
quoted tweet: replies 1 · retweets 54 · likes 153
replies 0 · retweets 0 · likes 8
@snehaark
Sneha Kudugunta
5 years
Is it just me, or is the worst part about transitioning from school to industry getting around the fact that you're productive only at 10am and 10pm?
replies 0 · retweets 1 · likes 8
@snehaark
Sneha Kudugunta
5 years
Where were you all my undergrad life.
@suzatweet
Suzana Ilić
5 years
This is really neat! You take a screenshot of an equation, it gives you the LaTeX code, you can directly modify in the taskbar, copy, paste, done.
quoted tweet: replies 93 · retweets 3K · likes 8K
replies 0 · retweets 0 · likes 8
@snehaark
Sneha Kudugunta
2 years
@TaliaRinger No, go for it! For North Indian weddings you should be fine with either; for a South Indian wedding I'd err on the side of wearing a sari (ask the host!). If you want to wear a sari, book an appointment with a local salon to get someone to tie it for you; it's less stressful.
replies 0 · retweets 0 · likes 7
@snehaark
Sneha Kudugunta
8 months
An interesting resource for lightweight extremely multilingual LangID - how FUN!
@iseeaswell
Isaac R Caswell
8 months
Have you ever wanted a LangID model that works on 1500+ languages? check out FUN-LangID: !
quoted tweet: replies 1 · retweets 8 · likes 50
replies 0 · retweets 0 · likes 7
@snehaark
Sneha Kudugunta
5 years
Me going over my Google Keep notes near the end of the week: Note: Talk to xyz Me: Who is xyz? Note: Weekend Me: Yes, it exists. Note: what is useful? Me: Not you, clearly. 🙄
replies 0 · retweets 0 · likes 7
@snehaark
Sneha Kudugunta
5 years
Oops
@kareem_carr
🔥Kareem Carr | Statistician 🔥
5 years
therapist: and what do we do when we’re stressed about deadlines? me: start a new research project? therapist: no
quoted tweet: replies 56 · retweets 1K · likes 7K
replies 0 · retweets 0 · likes 6
@snehaark
Sneha Kudugunta
5 years
Yep. Too old for that.
replies 0 · retweets 0 · likes 6
@snehaark
Sneha Kudugunta
5 years
This is art.
@drdrewsteen
Drew Steen
5 years
OK y'all, it is now my pleasure to show off some of the truly, genuinely heinous plots students in my Reproducible Data Analysis class made. Content warning: these plots are f***ing awful.
quoted tweet: replies 64 · retweets 831 · likes 3K
replies 0 · retweets 0 · likes 6
@snehaark
Sneha Kudugunta
5 years
My parents started out teaching me both Telugu and English, but prioritized the latter for far too long for similar reasons. I took Telugu class for ~8 years when we moved back to India, but it simply isn't the same.
@ekp
Ellen K. Pao
5 years
My older sister's kindergarten teacher told my parents to stop speaking Chinese at home or she would struggle, so my dad spoke only English to us. My mom spoke a mix, and my grandparents spoke only Chinese, so we still learned some Chinese, but the message was assimilate or fail
quoted tweet: replies 48 · retweets 161 · likes 963
replies 0 · retweets 0 · likes 6
@snehaark
Sneha Kudugunta
5 years
There are 8 copies of this paper lying around the nearest office printer already. 🤓
@quocleix
Quoc Le
5 years
XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE) arxiv: github (code + pretrained models): with Zhilin Yang, @ZihangDai , Yiming Yang, Jaime Carbonell, @rsalakhu
quoted tweet: replies 19 · retweets 700 · likes 2K
replies 0 · retweets 0 · likes 5
@snehaark
Sneha Kudugunta
1 year
An absolutely OP resource 😺
@iseeaswell
Isaac R Caswell
1 year
Just added 84 more languages to GATITOS! It now has a total of 113 languages, many of which with no other public resources 😊
quoted tweet: replies 1 · retweets 11 · likes 37
replies 0 · retweets 0 · likes 5
@snehaark
Sneha Kudugunta
3 years
Easily deploying large models is an important direction of research, and we believe TaskMoE is a promising step towards more inference friendly algorithms that retain the quality gains of scaling. 9/9
replies 0 · retweets 0 · likes 5
@snehaark
Sneha Kudugunta
5 years
@ankurbpn @orf_bnw @naveenariva @GoogleAI We look at how our similarity measure captures linguistic similarity vs. script. 5/n
replies 1 · retweets 1 · likes 5
@snehaark
Sneha Kudugunta
5 years
*Adds to pile of things to tell myself when I feel a bout of imposter syndrome coming on*
@justinkan
Justin Kan
5 years
Sometimes the best experience is the one you’re wholly unqualified for.
quoted tweet: replies 11 · retweets 154 · likes 807
replies 1 · retweets 0 · likes 5
@snehaark
Sneha Kudugunta
5 years
@ass_deans Caveat: You cannot possibly catch them all.
replies 1 · retweets 1 · likes 4
@snehaark
Sneha Kudugunta
5 years
Now that mindful AI is a thing, can we have AI mindfulness retreats?
@royschwartzNLP
Roy Schwartz
5 years
The focus on SOTA has caused a dramatic increase in the cost of AI, leading to environmental tolls and inclusiveness issues. We advocate research on efficiency in addition to accuracy ( #greenai ). Work w/ @JesseDodge @nlpnoah and @etzioni at @allen_ai
quoted tweet: replies 1 · retweets 50 · likes 160
replies 1 · retweets 0 · likes 4
@snehaark
Sneha Kudugunta
4 years
@asiddhant1 @orf_bnw @ankurbpn You need to think bigger - we can't have AGI without modeling the Multiverse too 😤 👽🚀🌌
replies 0 · retweets 0 · likes 4
@snehaark
Sneha Kudugunta
5 years
- Write a novel or collection of short stories with women who both work on sciencey things and have an ok personal life. - Perform Standup. - Write/direct a film with women who both work on sciencey things and have an ok personal life. - Paint enough for an art exhibition.
@shl
Sahil Lavingia
5 years
I want to: - Publish a fantasy novel - Perform standup comedy - Produce an animated film What do you want to do that's outside of your traditional "career" trajectory?
quoted tweet: replies 322 · retweets 81 · likes 1K
replies 0 · retweets 1 · likes 4
@snehaark
Sneha Kudugunta
3 years
At inference time, we can extract sub-networks by discarding unused experts for each task. 5/n
replies 1 · retweets 1 · likes 3
@snehaark
Sneha Kudugunta
5 years
This is good and this is important.
@juliacarriew
Julia Carrie Wong
5 years
The Guardian is updating our style guide to accurately reflect the nature of the environmental crisis. “Climate change” —> “climate emergency, crisis or breakdown” “Global warming” —> “global heating” “Climate skeptic” —> “climate science denier”
quoted tweet: replies 332 · retweets 10K · likes 27K
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 years
@chipro Speculation, but I'm certain more than 8% of women viewing are interested: women of twitter, never hesitate to reach out! I caught myself doing this recently - when @chipro organized Brunchpropagation, I deffo held back for a bit thinking "Why would anyone want to talk to me?"
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
4 years
@scychan_brains @DeepMind @FelixHill84 @AndrewLampinen Congratulations Stephanie, super excited for you!!! 🎉🎊💃
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
3 years
@KreutzerJulia @MasakhaneNLP Congratulations, Julia!
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
3 years
Finally, when scaling up to 200 language pairs, our 128-expert TaskMoE (13B parameters) still performs competitively with a token-level counterpart, while improving peak inference throughput by 2.6x. 8/n
replies 1 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 years
A study of my brain?
@JesseGomezBrain
Jesse Gomez
5 years
The Pokémon paper is out! In this study, we scanned the brains of adults, who as children, became Pokémon experts. We find a region of their brain becomes uniquely responsive to @Pokemon , helping us get at why the brain is organized the way it is.
quoted tweet: replies 32 · retweets 433 · likes 1K
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 years
That's not even what phones are for, you see. 🙃
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 months
@yacineMTB @mallocmyheart @dingboard_ 👀 I promise to make the worst shitposts
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
4 years
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 years
@sarahookr Soup and Hummus - I was grateful for the non-sweet food!
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 years
This sounds wonderful 😍
@aleks_madry
Aleksander Madry
5 years
Took a while (don't ask) but here they are: Notes from "Science of Deep Learning" class co-taught with @KonstDaskalakis now available: . More coming soon (promise!). Feedback very welcome! Thanks to @andrew_ilyas for heroic effort on doing final revisions.
quoted tweet: replies 11 · retweets 121 · likes 473
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
5 years
@archit_sharma97 @gamaga_ai Why can't you have 100 tabs open like a civilized human being, @archit_sharma97?
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
3 years
Another significant advantage of TaskMoE is that we retain all the gains from scaling - our method is +2.1 BLEU on average across all languages vs distilling the TokenMoE to a student model with size comparable to the subnetwork extracted from TaskMoE. 7/n
replies 1 · retweets 1 · likes 3
@snehaark
Sneha Kudugunta
5 years
🥳
@archit_sharma97
Archit Sharma
5 years
1/ Can we use model-based planning in behavior space rather than action space? DADS can discover skills without any rewards, which can later be composed zero-shot via planning in the behavior space for new tasks. Paper: Website:
quoted tweet: replies 1 · retweets 9 · likes 47
replies 0 · retweets 0 · likes 3
@snehaark
Sneha Kudugunta
3 years
So we route tokens according to broader categories (route by task boundaries vs. route per token) - that is, every token of a language is routed to the same subnetwork. This enables the model to dedicate fewer experts to a single task identity during training and inference. 4/n
replies 1 · retweets 1 · likes 2
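In code, the difference from a token-level MoE is just where the routing decision comes from. A minimal PyTorch-style sketch; expert count, sizes, and the router are illustrative placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TaskMoE(nn.Module):
    """Task-level routing sketch: one expert decision per task, not per token."""

    def __init__(self, d_model=512, n_experts=4, n_tasks=8):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        # The router scores experts from a task embedding (e.g. a language id),
        # so every token in the batch shares one routing decision.
        self.task_embed = nn.Embedding(n_tasks, d_model)
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x, task_id):
        logits = self.router(self.task_embed(task_id))  # (n_experts,)
        expert = int(logits.argmax())                   # one expert per task
        return self.experts[expert](x)

# Because routing depends only on task_id, unused experts can be discarded
# at inference, leaving a small dense sub-network per task.
x = torch.randn(4, 10, 512)     # (batch, tokens, d_model)
moe = TaskMoE()
y = moe(x, torch.tensor(2))     # every token of task 2 hits the same expert
```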
@snehaark
Sneha Kudugunta
5 years
PS - Silliness aside, this is definitely something we should care about for the long term
replies 0 · retweets 0 · likes 2
@snehaark
Sneha Kudugunta
5 years
Yes, pls
@hankgreen
Hank Green
5 years
WE CAN FUCKING DO THIS
quoted tweet: replies 58 · retweets 4K · likes 25K
replies 0 · retweets 0 · likes 2
@snehaark
Sneha Kudugunta
4 years
@putadent Didn't submit anything, I'm just 🍿
replies 1 · retweets 0 · likes 2
@snehaark
Sneha Kudugunta
5 years
I keep discovering things I wish I found during my undergrad. *sigh*
@jonaslindeloev
Jonas K. Lindeløv
5 years
I've made this cheat sheet and I think it's important. Most stats 101 tests are simple linear models - including "non-parametric" tests. It's so simple we should only teach regression. Avoid confusing students with a zoo of named tests. 1/n
quoted tweet: replies 93 · retweets 3K · likes 9K
replies 0 · retweets 0 · likes 2
@snehaark
Sneha Kudugunta
3 years
Mixture-of-Experts (or MoE) models are a great way to scale! Researchers have successfully scaled multilingual neural machine translation (MNMT) models up to 1 trillion parameters and beyond. 2/n
replies 1 · retweets 1 · likes 2
@snehaark
Sneha Kudugunta
5 years
I've been hoping @waitbutwhy would release one of these since @neuralink 's tweet last week - this is going to be a delightful Saturday afternoon 🤓
@waitbutwhy
Tim Urban
7 years
It's finally here: the full story on Neuralink. I knew the future would be nuts but this is a whole other level.
quoted tweet: replies 754 · retweets 6K · likes 15K
replies 0 · retweets 0 · likes 2