📢Life update:📢
I moved to Toronto, where I'm now an associate professor at the University of Toronto and an associate research director at the Vector Institute.
I wrote a blog post about the long winding path that led me here:
Announcing a new research focus in my lab: Developing tools to enable collaboratively-built and continually-improved models.
Blog post:
Paper on model "patches":
Paper on "merging" models:
Thread ⬇️ (1/11)
The year is 2012. I am learning deep learning. We pre-train models as denoising autoencoders to provide a better initialization.
The year is 2022. I am teaching deep learning. We pre-train models as denoising autoencoders to provide a better initialization.
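The recipe being poked fun at here has barely changed in a decade: corrupt the input, train the network to reconstruct the clean version, then reuse the learned weights as an initialization. A toy numpy sketch of that idea (a linear denoising autoencoder with made-up sizes, not any specific published setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 clean examples with 20 features each.
X = rng.normal(size=(200, 20))

d_in, d_hidden, lr = 20, 8, 0.01
W_enc = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_in))

losses = []
for step in range(300):
    # Denoising objective: corrupt the input, reconstruct the original.
    X_noisy = X + rng.normal(scale=0.5, size=X.shape)
    H = X_noisy @ W_enc              # encode the corrupted input
    X_hat = H @ W_dec                # decode back to input space
    losses.append(float(((X_hat - X) ** 2).mean()))

    # Manual gradient step on the reconstruction error (batch-averaged).
    G = 2 * (X_hat - X) / len(X)
    grad_dec = H.T @ G
    grad_enc = X_noisy.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# After pre-training, W_enc would initialize the first layer
# of a supervised network -- in 2012 and in 2022 alike.
```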
A student recently asked me if they should use BERT, GPT-n, or T5 for a simple NLP problem; I recommended a bag-of-words model. Where do I sign up for my curmudgeon license?
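For the record, the kind of baseline I mean fits in a few lines of standard-library code: bag-of-words counts feeding a nearest-centroid classifier (a toy sketch with invented data; any linear model over word counts works similarly):

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words: map a string to word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(examples):
    """Sum each class's bag-of-words vectors into a centroid."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    v = bow(text)
    return max(centroids, key=lambda lbl: cosine(v, centroids[lbl]))

train_data = [
    ("great movie loved it", "pos"),
    ("terrible plot hated it", "neg"),
    ("wonderful acting great fun", "pos"),
    ("boring and terrible", "neg"),
]
model = train(train_data)
print(predict(model, "loved the acting, great fun"))  # prints "pos"
```

No GPU, no pre-training, and for many simple problems it's embarrassingly hard to beat.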
New paper! We perform a systematic study of transfer learning for NLP using a unified text-to-text model, then push the limits to achieve SoTA on GLUE, SuperGLUE, CNN/DM, and SQuAD.
Paper:
Code/models/data/etc:
Summary ⬇️ (1/14)
New preprint!
We demonstrate an attack that can extract non-trivial chunks of training data from GPT-2.
Should we be worried about this? Probably!
Paper:
Blog post:
Today is my first day as a faculty researcher at
@huggingface
! I am extremely excited to join this incredible community. Expect awesome things soon! 🤗🚀
FixMatch was accepted at NeurIPS with 7/7/7/7 scores... after being rejected from CVPR and ICML for being "too simple". If you're dealing with a bogus rejection and know your work is good - don't quit, resubmit! Or just post to arxiv and skip the conference review roulette...
I often get emails from enthusiastic new researchers from outside the US. They take free ML courses and develop OSS, but can't afford MS programs, can't get into PhDs w/o publications, have trouble publishing w/o mentorship, and can't get visas for an RAship. Any advice for them?
I'm starting a professorship in the CS department at UNC in fall 2020 (!!) and am hiring students! If you're interested in doing a PhD
@unccs
please get in touch. More info here:
I recently have had a number of aspiring ML researchers ask me how to stay on top of the paper onslaught. Here are three concrete tips:
1) Pick a tiny subfield to focus on
2) Skim
3) Rely on your community
Thread to explain ⬇️ (1/5)
📢 I am hiring PhD students for Fall 2021! 📢
If you want to work with us on semi-supervised/unsupervised/transfer learning and beyond, you should apply:
Also, GRE is optional and we offer need-based admissions fee waivers! Contact me for more info.
This semester I'm teaching a role-playing paper-reading seminar on Large Language Models, covering 57 (!) papers on the good, bad, and ugly of LLMs. Follow along here:
Stages of implementing a machine learning algorithm:
1) Syntax errors
2) Dimension mismatch errors
3) NaNs
4) Model trains, but results are bad
5) Hyperparameter tweaking
...
N) Success!
If you are reeling from a NeurIPS rejection or stressing about an ICLR submission, remember that some of the best papers were never published anywhere except arxiv. Thread of a few favorites (1/5):
Can your NLP model handle noooisy mEsSy
#realworldtext
?
ByT5 works on raw UTF-8 bytes (no tokenization!), beats SoTA models on many popular tasks, and is more robust to noise.
📜 Preprint:
💾 Code/Models:
Summary thread ⬇️ (1/9)
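The "no tokenization" part is literal: the model's inputs are just the UTF-8 bytes of the string. A minimal sketch of the input mapping (the offset of 3 reflects my understanding that the released vocab reserves the first few IDs for special tokens like padding/EOS; treat it as an assumption, not the exact released code):

```python
def text_to_ids(text, offset=3):
    """Map a string to byte-level token IDs.

    UTF-8 bytes are 0..255; IDs below `offset` are assumed to be
    reserved for special tokens (e.g. pad/EOS).
    """
    return [b + offset for b in text.encode("utf-8")]

def ids_to_text(ids, offset=3):
    """Inverse mapping: drop special IDs, decode the rest as UTF-8."""
    return bytes(i - offset for i in ids if i >= offset).decode("utf-8")

ids = text_to_ids("noooisy mEsSy ¿text?")
print(len(ids))          # one ID per *byte*, not per word or subword
print(ids_to_text(ids))  # round-trips exactly, typos and all
```

Because every string maps losslessly to bytes, there is no out-of-vocabulary problem, which is a big part of the robustness to noisy text.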
The T5 paper has been published in JMLR! 🎉
Since I have already talked more than enough about T5, here instead is a thread about the (awesome) process of publishing in JMLR:
(1/10)
New blog post: "GANs and Divergence Minimization", which covers the perspective of GANs as minimizing an "adversarial divergence" and draws parallels to maximum likelihood training. Also provides some motivation for better evaluation of GANs.
As of today, I've been an assistant professor for two years. It's been both awesome and difficult. I wrote a blog post about some of the things I've struggled with and how I've coped with them.
Hot take: Mathiness [1] is like an adversarial patch [2] for ML conference reviewers: Mathiness causes a reviewer to classify the paper as "accept" regardless of whether the math is useful/valid and the paper is any good. [3] Fig. 6 has some empirical evidence of this. (refs ⬇️)
New preprint! We introduce 𝚃-𝙵𝚎𝚠 and (𝙸𝙰)³, a few-shot learning recipe that outperforms in-context learning at dramatically lower costs and gets super-human results on the RAFT benchmark for the first time.
📄
💾
🧵⬇️
(1/9)
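(𝙸𝙰)³ is parameter-efficient in a very literal sense: the base model stays frozen, and the only trainable parameters are a few rescaling vectors that elementwise-multiply the attention keys, values, and intermediate FFN activations. A minimal numpy sketch of that mechanism (shapes and names are illustrative stand-ins, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, seq = 16, 64, 10

# Frozen pretrained weights (random stand-ins for illustration).
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))
W_ff = rng.normal(size=(d_model, d_ff))

# The only trainable (IA)^3 parameters: three rescaling vectors,
# initialized to ones so the adapted model starts out identical
# to the frozen base model.
l_k = np.ones(d_model)
l_v = np.ones(d_model)
l_ff = np.ones(d_ff)

def adapted_layer(x):
    k = (x @ W_k) * l_k                 # rescaled keys
    v = (x @ W_v) * l_v                 # rescaled values
    h = np.maximum(x @ W_ff, 0) * l_ff  # rescaled FFN activations
    return k, v, h

x = rng.normal(size=(seq, d_model))
k, v, h = adapted_layer(x)

# Trainable parameter count is tiny compared to the frozen weights:
n_trainable = l_k.size + l_v.size + l_ff.size  # 96 in this toy layer
n_frozen = W_k.size + W_v.size + W_ff.size     # 1536 in this toy layer
```

Fine-tuning updates only the `l_*` vectors, which is why the recipe is so much cheaper than full fine-tuning or long in-context prompts.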
New blog post where I argue that "large language model development" can be considered a new subfield that grew out of deep learning, NLP, etc. and reflect on what to do when your field of study gives birth to a new one:
How can we recycle specialized PEFT modules to create a generalist MoE-style model? We introduce PHATGOOSE, which learns a post-hoc routing scheme and significantly improves zero-shot generalization.
📜
📝
💾
Also, I am 1000% hiring PhD students this round! If you want to work on
- open models
- collaborative/decentralized training
- building models like OSS
- coordinating model ecosystems
- mitigating risks
you should definitely apply! Deadline is Friday 😬
The t5 library now has a simple API that connects the text-to-text data loading/processing/evaluation pipeline to
@huggingface
Transformers' PyTorch implementation of the T5 models! Here's a usage example:
New blog post about a course format
@_AlecJacobson
and I have been using: the role-playing seminar. It's an alternative to the standard one-presenter-per-class graduate-level paper-reading seminar and is dramatically more interactive, informative, and fun.
Many people are familiar with code smell () but researchers should also have a good sense of "paper smell". Here are some examples for ML papers (thread):
New paper w/
@D_Berthelot_ML
Aurko Roy and
@goodfellow_ian
where we propose an adversarial regularizer for improving interpolation in autoencoders and measure whether it also improves representation learning performance. Paper , code
What's it take for an LLM to learn a fact? And can an LLM tell what's factual and not?
Check out our 💥two💥 new papers!
LLMs Struggle to Learn Long-Tail Knowledge
Evaluating the Factual Consistency of LLMs Through Summarization
I just made this figure for a class I am teaching on "learning from limited labeled data". The left plot represents 6 years of results; the right plot is ~1 year. Anyone else feel like our field is moving kinda fast?
New blog post: "You Don't Know JAX", a brief tutorial which covers the basics of computing gradients, just-in-time compilation, and auto-batching with JAX.
The slides from my talk "A Few Unusual Autoencoders" which I gave last month at
@VectorInst
and
@nyuMARL
are now online: The talk covers MusicVAE, ACAI, and some unpublished "adversarial denoising autoencoder" work.
Super happy that our code for "Realistic Evaluation of Semi-Supervised Learning Algorithms" () is finally released on GitHub: Send us pull requests! Joint work with
@avitaloliver
@gstsdn
@ekindogus
@goodfellow_ian
.
Now that "Do Transformer Modifications Transfer Across Implementations and Applications?" has been accepted to
#EMNLP2021
, we can finally tweet about it!
Paper 📝:
Code 💾:
Thread summary: ⬇️ (1/8)
When and why is it possible to extract training data from large language models?
In a new preprint, we show that the number of times a sequence is duplicated in the training data heavily impacts whether it can be successfully extracted.
Thread⬇️ (1/8)
After ~1 year, my article on building ML models like OSS has been published in Communications of the ACM!
Lots of exciting work in this direction since then and lots to come. If you are interested, join our community:
I contributed to the "Learning with Fewer Labeled Examples" chapter of this incredible book. The chapter is a very broad and up-to-date overview of semi-supervised/transfer/meta/few-shot learning, domain adaptation, data augmentation, and beyond.
I am pleased to announce that the camera ready version of my new textbook, "Probabilistic Machine Learning: An Introduction", is finally available from . Hardcopies will be available from MIT Press in Feb 2022.
Last year,
@yisongyue
told me that he has his students meet without him to brainstorm honest collective feedback. I had my advisees do this and it was super helpful, so I wrote a blog post about it:
There is a strange situation in our field: Most people I know and respect (and most people on Twitter in general) agree that "simple is better than complex". But the consensus of the cabal of anonymous, faceless reviewers seems to be the opposite. What is going on?
Reviewers automatically assume that simple is not novel. This is sheer laziness. Yes, it may be simple and obvious in retrospect, but someone had to have that insight first. Simple is good. Simple is robust, easy to implement and reproduce, broadly applicable, etc.
Now that "How Much Knowledge Can You Pack Into the Parameters of a Language Model?" has been published at
#EMNLP2020
(poster at Gather Session 3G, 11/17 UTC-05:00), I can tell you the funny and awful story of how this paper came to be. (1/19)
I gave this talk again today to an audience of CS majors who didn't have any ML experience. It's really rewarding to be forced to explain things like variational inference and autoregressive models without using _any_ technical language.
New pre-print! Monotonic Infinite Lookback Attention (MILk): an online attention mechanism which we applied to simultaneous machine translation. It allows the model to attend to the entire input sequence up to a location set by a monotonic attention head.
Hot take: When evaluating a self-supervised model's performance on a new task without fine-tuning, don't call it "zero-shot"; call it "weakly supervised multi-task". These models only succeed when their unsupervised pre-training actually provides weak supervision for the task.
New paper with Chung-Cheng Chiu: Monotonic Chunkwise Attention (MoChA), an online/linear-time attention mechanism which computes soft attention over small chunks with adaptively set boundaries. Matches the performance of (offline) softmax attention on WSJ!
📣 Announcing the ICLR 2021 Workshop on Enormous Language Models 📣
We have an incredible speaker lineup that covers building, evaluating, critiquing, and improving large LMs, as well as a collaborative participant-driven benchmark and 2 panels!
More info:
In case you missed our
#neurips
poster on MixMatch () today because you aren't in Vancouver or didn't survive the poster session stampede, here's the PDF: and here's a transcript of what I said to everyone who came by: ⬇️ 1/11
New preprint! We introduce a simplified version of pattern-exploiting training called ADAPET. ADAPET outperforms PET and iPET on SuperGLUE without using task-specific unlabeled data or ensembling and beats few-shot GPT-3 with a much smaller model.
I am reading "A Neural Probabilistic Language Model" in detail for the first time and wow is it a fun read - discusses and justifies word embeddings, advocates scaling up models and data, uses rudimentary data- and model-parallel training... all done from scratch on CPUs.
Single-blind: Reviewers know author's identities
Double-blind: Reviewers don't know author's identities
Triple-blind: Reviewers must write reviews without reading their assigned submissions
Quadruple-blind: Authors are never told if their paper was accepted or rejected
...
I recently came across , which "assumes 2-3 runs" of T5-11B. In fact, we trained T5-11B *once*. That's why we spent 35 pages figuring out how we should train before we started training. You don't want to mess up a training run that big.
We showed last year (with OpenAI co-authors!) that it's surprisingly easy to extract verbatim training data from large LMs:
It kind of boggles my mind that they included GPL'd source code in the training set for this model.
Protip: if a random person asks you what you do and you want to avoid talking about the singularity, Sophia the robot, or "Facebook had to shut down AI when it invented its new language", just say "statistics".
I somehow missed this great paper by
@tuvuumass
et al.: They learn "task embeddings" (a la task2vec) for NLP tasks and show how they can be used to predict the effectiveness of intermediate-task transfer. Lots of experiments and a promising direction!
Mind-boggling results on the final EfficientQA leaderboard: The best system beat the REALM baseline by almost 20 points, and a 30 megabyte model got > 25% accuracy! Looking forward to hearing more about these systems at NeurIPS.
The mT5 paper was accepted to NAACL 🎉 so now we can stop pretending that it doesn't exist! Updated arxiv with many juicy new results, including a simple way to prevent "accidental translation" exhibited by generative models in zero-shot settings.
We are releasing mT5: A massively-multilingual version of T5 that supports over 💯 languages! mT5 was pre-trained on a multilingual version of C4 and achieves SoTA on many cross-lingual NLP tasks.
📜Pre-print:
💾Code/models:
As a contributor to this book, I've been offered a free copy. However, I don't know what I'd do with an actual physical book in 2022. If you'd like my copy, please reply with a < 280 character description of the benefit you'd get from receiving a copy and I'll pick a recipient.
I am delighted to announce that my new book, “Probabilistic Machine Learning: An Introduction”, is finally available in print format! You can order it from , or from Amazon. Also available at 1/4
In 15 minutes I'll be giving a talk on "The Benefits of Unified Frameworks for Language Understanding" at the "Conceptual Understanding of Deep Learning" workshop (). Livestream here:
I think we need a taxonomy of adjectives for describing neural network size.
"Large neural networks"
"Outrageously large neural networks" ()
"Ridiculously large neural networks"
"Inconceivably large neural networks"
"Uncomfortably large neural networks" ...
I saw this paper when it was presented at NeurIPS 2018 and really enjoyed it. It's worth a read for anyone who works on or thinks about generative models.
If all training images for a GAN/VAE/PixelCNN have 2 objects, will they only generate images with 2 objects? If trained on (🔵,💙,🔴), will they also generate ❤️? Find out in
@shengjia_zhao
's blog post on generalization and bias for generative models.
👉
Hot take: The most surprising thing about BERT isn't how well it worked when it was proposed, but how much better it would have worked if they had just pre-trained for longer on a more diverse dataset.
NeurIPS95 "Learning to Learn"
workshop focused on "unsupervised learning on a large corpus of unlabelled data to learn features for subsequent supervised learning on a smaller labelled corpus" and "using models previously learned for other problems when learning new problems" 🤔
Should we agree as a field not to post ICLR submissions on arxiv until after the review period is over? The paper is already public thanks to OpenReview, so it can (and should) be cited as existing work. arxiv'ing only serves to deanonymize it, which is probably a net negative.
A video of my talk "Doing Strange Things with Attention" which I gave at AI
@WithTheBest
in October is now online: Covers feedforward attention, sequence embedding using attention, monotonic attention, and a new variant called MoChA.
Presenting BiT, an open-source approach for large-scale pre-training of models covering a wide range of visual tasks, which highlights the importance of choices in the model architecture for downstream performance. Learn all about it below:
#neurips
tips day 5 (h/t
@chris_j_beckham
)! Conferences are a parade of successes. Remember that for every impressive paper there are many (unpublished) ideas that didn't pan out. Take this opportunity to ask people about negative results!
The
#ICLR2021
Workshop on Enormous Language Models (WELM) is tomorrow, May 7th!
Full info:
Livestream:
gathertown info for ICLR registrants:
Thread summarizing the talks & panels ⬇️ (1/14)
Today, the T5 team competed against T5 in a "pub quiz" on (context-free) questions from the TriviaQA/NQ validation sets. We LOST! We only got 20% right; T5 got 35%. To see how to fine-tune T5 on context-free QA (or any other task) with a free TPU, check out our Colab tutorial ⬇️
As promised, we have made the Text-To-Text Transfer Transformer (T5) models much easier to fine-tune for new tasks, and we just released a Colab notebook where you can try it yourself on a free TPU!
👇
(1/3)
I actually encourage my students & colleagues to get on Twitter, because (for better or worse) it remains the best place to find out about new papers. Most of the time, I only check a filtered version of my timeline that only shows tweets with a link. 🤷
New work w/
@yaoqinucsd
, Nicholas Carlini,
@goodfellow_ian
, and Gary Cottrell on generating imperceptible, robust, and targeted adversarial examples for speech recognition systems!
Paper:
Audio samples:
Google has open-sourced Lingvo, which is the excellent codebase we used for the Monotonic (Chunkwise) Attention papers! Has also been used in dozens of other Brain papers. Code: Pre-print:
Protip: It is not too late to apply to start a PhD in Fall 2020 at the UNC CS department! The deadline for applications is, amazingly, not until March 10th.
TIL that ICLR is the
#1
conference in "Artificial Intelligence" according to Google Scholar Metrics () but it's still not included in . All rankings are silly and arbitrary, but this seems especially silly and especially arbitrary.
2) Skim
You'll find that many papers within your subfield of choice have a lot in common - there is often only a small nugget of novelty in each paper. It's incredibly important to develop your ability to find this nugget as quickly as possible. (3/5)
I finally put up the slides for my faculty job talk from last year:
They are now pretty out-of-date but I spent a ton of time making them fancy and clear. Includes overviews of a few frameworks for deep generative modeling, +MoChA/MILk, MusicVAE, and ACAI.
The camera ready version of "Realistic Evaluation of Deep Semi-Supervised Learning Algorithms" is now up on arxiv: Includes an entire bonus page, two new tables, a new figure, and a couple of new experiments!
PSA: If a paper on a generative model of images only presents results on MNIST/SVHN/CelebA, you should be skeptical that it will work in general. These datasets are extremely regular - they are normalized so that objects tend to appear in the same location/orientation.
We're having a *debate* at the Transfer Learning for NLP workshop
@NeurIPSConf
this year.
@kchonyc
is one of our debaters; the other one can't make it to NeurIPS anymore 😢 Who do you want to see go toe-to-toe with Cho?