Colin Raffel

@colinraffel

30,270 Followers
655 Following
114 Media
1,577 Statuses

nonbayesian parameterics, sweet lessons, and random birds. Friend of @srush_nlp

Joined March 2017
Pinned Tweet
@colinraffel
Colin Raffel
5 months
📢Life update:📢 I moved to Toronto, where I'm now an associate professor at the University of Toronto and an associate research director at the Vector Institute. I wrote a blog post about the long winding path that led me here:
129
42
1K
@colinraffel
Colin Raffel
2 years
Announcing a new research focus in my lab: Developing tools to enable collaboratively-built and continually-improved models. Blog post: Paper on model "patches": Paper on "merging" models: Thread ⬇️ (1/11)
Tweet media one
23
386
2K
@colinraffel
Colin Raffel
2 years
Tweet media one
18
139
2K
@colinraffel
Colin Raffel
2 years
The year is 2012. I am learning deep learning. We pre-train models as denoising autoencoders to provide a better initialization. The year is 2022. I am teaching deep learning. We pre-train models as denoising autoencoders to provide a better initialization.
17
80
1K
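For readers who haven't seen the recipe the tweet above jokes about, here is a minimal sketch of denoising-autoencoder pre-training in PyTorch; the architecture, noise level, and input size are arbitrary placeholders, not anything from the tweet.

```python
import torch
from torch import nn

# Minimal denoising autoencoder: corrupt the input, train the network to
# reconstruct the clean version, then reuse the encoder as an initialization.
class DenoisingAE(nn.Module):
    def __init__(self, dim=784, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def pretrain_step(clean_batch, noise_std=0.3):
    noisy = clean_batch + noise_std * torch.randn_like(clean_batch)
    loss = loss_fn(model(noisy), clean_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# After pre-training, model.encoder can be reused to initialize a downstream model.
```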
@colinraffel
Colin Raffel
3 years
A student recently asked me if they should use BERT, GPT-n, or T5 for a simple NLP problem; I recommended a bag-of-words model. Where do I sign up for my curmudgeon license?
27
66
1K
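A bag-of-words baseline like the one recommended above can be a few lines with scikit-learn; the toy data below is made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment data (hypothetical); a real problem would use its own labeled text.
texts = ["great movie", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]

# Bag-of-words features plus a linear classifier: often a strong, cheap baseline.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["what a great waste of time"]))
```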
@colinraffel
Colin Raffel
5 years
New paper! We perform a systematic study of transfer learning for NLP using a unified text-to-text model, then push the limits to achieve SoTA on GLUE, SuperGLUE, CNN/DM, and SQuAD. Paper: Code/models/data/etc: Summary ⬇️ (1/14)
Tweet media one
9
370
1K
@colinraffel
Colin Raffel
3 years
New preprint! We demonstrate an attack that can extract non-trivial chunks of training data from GPT-2. Should we be worried about this? Probably! Paper: Blog post:
15
236
1K
@colinraffel
Colin Raffel
3 years
Today is my first day as a faculty researcher at @huggingface ! I am extremely excited to join this incredible community. Expect awesome things soon! 🤗🚀
22
36
1K
@colinraffel
Colin Raffel
4 years
FixMatch was accepted at NeurIPS with 7/7/7/7 scores... after being rejected from CVPR and ICML for being "too simple". If you're dealing with a bogus rejection and know your work is good - don't quit, resubmit! Or just post to arxiv and skip the conference review roulette...
20
86
991
@colinraffel
Colin Raffel
4 years
I often get emails from enthusiastic new researchers from outside the US. They take free ML courses and develop OSS, but can't afford MS programs, can't get into PhDs w/o publications, have trouble publishing w/o mentorship, and can't get visas for an RAship. Any advice for them?
79
146
942
@colinraffel
Colin Raffel
4 years
I'm starting a professorship in the CS department at UNC in fall 2020 (!!) and am hiring students! If you're interested in doing a PhD @unccs please get in touch. More info here:
82
146
893
@colinraffel
Colin Raffel
3 years
I recently have had a number of aspiring ML researchers ask me how to stay on top of the paper onslaught. Here are three concrete tips: 1) Pick a tiny subfield to focus on 2) Skim 3) Rely on your community Thread to explain ⬇️ (1/5)
9
136
891
@colinraffel
Colin Raffel
4 years
📢 I am hiring PhD students for Fall 2021! 📢 If you want to work with us on semi-supervised/unsupervised/transfer learning and beyond, you should apply: Also, GRE is optional and we offer need-based admissions fee waivers! Contact me for more info.
20
254
829
@colinraffel
Colin Raffel
4 years
I'm starting to think my main job as an ML professor is to be an associative memory for arxiv papers
9
23
823
@colinraffel
Colin Raffel
2 years
This semester I'm teaching a role-playing paper-reading seminar on Large Language Models, covering 57 (!) papers on the good, bad, and ugly of LLMs. Follow along here:
10
114
820
@colinraffel
Colin Raffel
6 years
Stages of implementing a machine learning algorithm: 1) Syntax errors 2) Dimension mismatch errors 3) NaNs 4) Model trains, but results are bad 5) Hyperparameter tweaking ... N) Success!
19
182
800
@colinraffel
Colin Raffel
4 years
Best drawing of a neural network that I have ever seen.
@EljazryMohamed
Mohamed Adel Musallam | محمد عادل مسلم
4 years
Part about the Perceptron from Frank Rosenblatt.
Tweet media one
5
44
238
6
92
736
@colinraffel
Colin Raffel
5 years
If you are reeling from a NeurIPS rejection or stressing about an ICLR submission, remember that some of the best papers were never published anywhere except arxiv. Thread of a few favorites (1/5):
19
155
718
@colinraffel
Colin Raffel
3 years
Can your NLP model handle noooisy mEsSy #realworldtext ? ByT5 works on raw UTF-8 bytes (no tokenization!), beats SoTA models on many popular tasks, and is more robust to noise. 📜 Preprint: 💾 Code/Models: Summary thread ⬇️ (1/9)
Tweet media one
6
149
650
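To illustrate what "raw UTF-8 bytes (no tokenization!)" means in practice, here is a sketch of byte-level input ids; the offset of 3 for special tokens reflects my understanding of ByT5's vocabulary layout and should be treated as an assumption.

```python
text = "Can your NLP model handle noooisy mEsSy #realworldtext ?"

# Byte-level "tokenization": every UTF-8 byte becomes one id.
# The +3 offset reserves ids for pad/eos/unk, as I believe ByT5 does (assumption).
token_ids = [b + 3 for b in text.encode("utf-8")]

# Decoding is just the inverse: drop the offset and decode the bytes.
decoded = bytes(i - 3 for i in token_ids).decode("utf-8")
assert decoded == text
```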
@colinraffel
Colin Raffel
4 years
The T5 paper has been published in JMLR! 🎉 Since I have already talked more than enough about T5, here instead is a thread about the (awesome) process of publishing in JMLR: (1/10)
6
100
641
@colinraffel
Colin Raffel
5 years
New blog post: "GANs and Divergence Minimization", which covers the perspective of GANs as minimizing an "adversarial divergence" and draws parallels to maximum likelihood training. Also provides some motivation for better evaluation of GANs.
4
142
571
@colinraffel
Colin Raffel
6 years
New work on arxiv with @avitaloliver @gstsdn @ekindogus @goodfellow_ian on realistically evaluating deep semi-supervised learning algorithms: Thread with our contributions and findings ⬇️ 1/10
Tweet media one
3
195
555
@colinraffel
Colin Raffel
2 years
As of today, I've been an assistant professor for two years. It's been both awesome and difficult. I wrote a blog post about some of the things I've struggled with and how I've coped with them.
13
45
530
@colinraffel
Colin Raffel
4 years
Hot take: Mathiness [1] is like an adversarial patch [2] for ML conference reviewers: Mathiness causes a reviewer to classify the paper as "accept" regardless of whether the math is useful/valid and the paper is any good. [3] Fig. 6 has some empirical evidence of this. (refs ⬇️)
14
82
510
@colinraffel
Colin Raffel
2 years
New preprint! We introduce 𝚃-𝙵𝚎𝚠 and (𝙸𝙰)³, a few-shot learning recipe that outperforms in-context learning at dramatically lower costs and gets super-human results on the RAFT benchmark for the first time. 📄 💾 🧵⬇️ (1/9)
Tweet media one
15
100
504
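The core idea of (IA)³, as I understand it from the paper, is to learn small vectors that elementwise rescale keys, values, and inner feed-forward activations while the rest of the model stays frozen. The module below is a hypothetical sketch of that rescaling for the feed-forward case, not the authors' implementation.

```python
import torch
from torch import nn

class IA3FeedForward(nn.Module):
    """Sketch of an (IA)3-style feed-forward block: pretrained weights frozen,
    only a learned elementwise scaling of the hidden activations is trained."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff, bias=False)
        self.wo = nn.Linear(d_ff, d_model, bias=False)
        for p in self.parameters():
            p.requires_grad = False             # pretrained weights stay frozen
        self.l_ff = nn.Parameter(torch.ones(d_ff))  # the only trainable parameters

    def forward(self, x):
        return self.wo(self.l_ff * torch.relu(self.wi(x)))
```

Analogous learned vectors would rescale the attention keys and values in each layer.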
@colinraffel
Colin Raffel
5 months
New blog post where I argue that "large language model development" can be considered a new subfield that grew out of deep learning, NLP, etc. and reflect on what to do when your field of study gives birth to a new one:
7
85
502
@colinraffel
Colin Raffel
5 years
I got married this weekend. 🧡
Tweet media one
58
0
488
@colinraffel
Colin Raffel
3 months
How can we recycle specialized PEFT modules to create a generalist MoE-style model? We introduce PHATGOOSE, which learns a post-hoc routing scheme and significantly improves zero-shot generalization. 📜 📝 💾
Tweet media one
11
80
479
@colinraffel
Colin Raffel
5 months
Also, I am 1000% hiring PhD students this round! If you want to work on - open models - collaborative/decentralized training - building models like OSS - coordinating model ecosystems - mitigating risks you should definitely apply! Deadline is Friday 😬
@colinraffel
Colin Raffel
5 months
📢Life update:📢 I moved to Toronto, where I'm now an associate professor at the University of Toronto and an associate research director at the Vector Institute. I wrote a blog post about the long winding path that led me here:
129
42
1K
12
75
459
@colinraffel
Colin Raffel
4 years
The t5 library now has a simple API that connects the text-to-text data loading/processing/evaluation pipeline to @huggingface Transformers' PyTorch implementation of the T5 models! Here's a usage example:
3
101
421
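For context, here is roughly what running a pretrained T5 checkpoint through Hugging Face Transformers looks like; the prompt and checkpoint name are placeholders, and this is separate from the t5 library's own pipeline API mentioned above.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load a small pretrained T5 checkpoint and run a text-to-text task.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```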
@colinraffel
Colin Raffel
3 years
New blog post about a course format @_AlecJacobson and I have been using: the role-playing seminar. It's an alternative to the standard one-presenter-per-class graduate-level paper-reading seminar and is dramatically more interactive, informative, and fun.
14
69
394
@colinraffel
Colin Raffel
4 years
Many people are familiar with code smell () but researchers should also have a good sense of "paper smell". Here are some examples for ML papers (thread):
6
96
371
@colinraffel
Colin Raffel
6 years
New paper w/ @D_Berthelot_ML Aurko Roy and @goodfellow_ian where we propose an adversarial regularizer for improving interpolation in autoencoders and measure whether it also improves representation learning performance. Paper , code
Tweet media one
5
129
371
@colinraffel
Colin Raffel
1 year
What's it take for an LLM to learn a fact? And can an LLM tell what's factual and not? Check out our 💥two💥 new papers! LLMs Struggle to Learn Long-Tail Knowledge Evaluating the Factual Consistency of LLMs Through Summarization
10
61
353
@colinraffel
Colin Raffel
4 years
I just made this figure for a class I am teaching on "learning from limited labeled data". The left plot represents 6 years of results; the right plot is ~1 year. Anyone else feel like our field is moving kinda fast?
Tweet media one
10
46
332
@colinraffel
Colin Raffel
4 years
Me at the #neurips poster session when I see a paper I reviewed and fought for accepting
1
7
323
@colinraffel
Colin Raffel
5 years
New blog post: "You Don't Know JAX", a brief tutorial which covers the basics of computing gradients, just-in-time compilation, and auto-batching with JAX.
5
78
309
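The three features the post covers, gradients, JIT compilation, and auto-batching, fit in a few lines of JAX; the toy function below is mine, not from the post.

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # A toy scalar function of parameters w and a single input x.
    return jnp.sum((w * x - 1.0) ** 2)

grad_loss = jax.grad(loss)              # gradients via automatic differentiation
fast_grad = jax.jit(grad_loss)          # just-in-time compile with XLA
batched_grad = jax.vmap(fast_grad, in_axes=(None, 0))  # auto-batch over inputs

w = jnp.ones(3)
xs = jnp.arange(12.0).reshape(4, 3)     # a batch of 4 inputs
print(batched_grad(w, xs).shape)        # (4, 3): one gradient per example
```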
@colinraffel
Colin Raffel
6 years
The slides from my talk "A Few Unusual Autoencoders" which I gave last month at @VectorInst and @nyuMARL are now online: The talk covers MusicVAE, ACAI, and some unpublished "adversarial denoising autoencoder" work.
6
70
303
@colinraffel
Colin Raffel
3 years
Now that "Do Transformer Modifications Transfer Across Implementations and Applications?" has been accepted to #EMNLP2021 , we can finally tweet about it! Paper 📝: Code 💾: Thread summary: ⬇️ (1/8)
8
62
302
@colinraffel
Colin Raffel
2 years
My PhD student @zhenlinx made me liang pi (a mutual favorite) in celebration of passing his thesis defense! Congrats, Zhenlin.
Tweet media one
6
3
300
@colinraffel
Colin Raffel
2 years
When and why is it possible to extract training data from large language models? In a new preprint, we show that the number of times a sequence is duplicated in the training data heavily impacts whether it can be successfully extracted. Thread⬇️ (1/8)
Tweet media one
4
62
296
@colinraffel
Colin Raffel
4 years
Does anyone know of a list of ML PhD research internships? Asking for some friends...
16
35
289
@colinraffel
Colin Raffel
1 year
After ~1 year, my article on building ML models like OSS has been published in the Communications of the ACM! Lots of exciting work in this direction since then and lots to come. If you are interested, join our community:
@colinraffel
Colin Raffel
2 years
Announcing a new research focus in my lab: Developing tools to enable collaboratively-built and continually-improved models. Blog post: Paper on model "patches": Paper on "merging" models: Thread ⬇️ (1/11)
Tweet media one
23
386
2K
7
34
291
@colinraffel
Colin Raffel
7 years
Belated blog post about what I did during the Brain residency and what I'm doing now:
7
75
282
@colinraffel
Colin Raffel
3 years
I contributed to the "Learning with Fewer Labeled Examples" chapter of this incredible book. The chapter is a very broad and up-to-date overview of semi-supervised/transfer/meta/few-shot learning, domain adaptation, data augmentation, and beyond.
@sirbayes
Kevin Patrick Murphy
3 years
I am pleased to announce that the camera ready version of my new textbook, "Probabilistic Machine Learning: An Introduction", is finally available from . Hardcopies will be available from MIT Press in Feb 2022.
Tweet media one
47
787
4K
3
34
279
@colinraffel
Colin Raffel
1 year
Last year, @yisongyue told me that he has his students meet without him to brainstorm honest collective feedback. I had my advisees do this and it was super helpful, so I wrote a blog post about it:
6
32
271
@colinraffel
Colin Raffel
4 years
Friendly reminder that when the BERT paper came out less than two years ago, the authors considered 340M parameters 🔥🔥🔥extreme🔥🔥🔥
Tweet media one
8
19
268
@colinraffel
Colin Raffel
4 years
There is a strange situation in our field: Most people I know and respect (and most people on Twitter in general) agree that "simple is better than complex". But the consensus of the cabal of anonymous, faceless reviewers seems to be the opposite. What is going on?
@lintool
Jimmy Lin
4 years
Reviewers automatically assume that simple is not novel. This is sheer laziness. Yes, it may be simple and obvious in retrospect, but someone had to have that insight first. Simple is good. Simple is robust, easy to implement and reproduce, broadly applicable, etc.
58
509
4K
20
16
264
@colinraffel
Colin Raffel
6 years
TIL there are many (near) duplicates in CIFAR-10. For example, variants of this car image appear (at least) 16 times in the training set. (thread)
Tweet media one
6
93
262
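One simple way to surface candidates like this is a nearest-neighbor search over raw pixels; this sketch uses scikit-learn and the CIFAR-10 loader bundled with TensorFlow, and the subset size and distance threshold are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from tensorflow.keras.datasets import cifar10

# Load the CIFAR-10 training set and flatten each image into a vector.
(x_train, _), _ = cifar10.load_data()
x = x_train.reshape(len(x_train), -1).astype(np.float32) / 255.0
x = x[:5000]  # subset to keep the brute-force search quick

# Find each image's nearest neighbor (other than itself) by Euclidean distance.
nbrs = NearestNeighbors(n_neighbors=2).fit(x)
dists, idxs = nbrs.kneighbors(x)

# Flag pairs below an arbitrary distance threshold as near-duplicate candidates.
threshold = 1.0
candidates = [(i, idxs[i, 1]) for i in range(len(x)) if dists[i, 1] < threshold]
print(f"{len(candidates)} candidate near-duplicate pairs")
```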
@colinraffel
Colin Raffel
3 years
Now that "How Much Knowledge Can You Pack Into the Parameters of a Language Model?" has been published at #EMNLP2020 (poster at Gather Session 3G, 11/17 UTC-05:00), I can tell you the funny and awful story of how this paper came to be. (1/19)
4
30
266
@colinraffel
Colin Raffel
6 years
I gave this talk again today to an audience of CS majors who didn't have any ML experience. It's really rewarding to be forced to explain things like variational inference and autoregressive models without using _any_ technical language.
@colinraffel
Colin Raffel
6 years
The slides from my talk "A Few Unusual Autoencoders" which I gave last month at @VectorInst and @nyuMARL are now online: The talk covers MusicVAE, ACAI, and some unpublished "adversarial denoising autoencoder" work.
6
70
303
5
44
259
@colinraffel
Colin Raffel
5 years
New pre-print! Monotonic Infinite Lookback Attention (MILk): an online attention mechanism which we applied to simultaneous machine translation. It allows the model to attend to the entire input sequence up to a location set by a monotonic attention head.
Tweet media one
2
60
254
@colinraffel
Colin Raffel
2 years
I'm glad everyone likes my dumb joke
2
6
257
@colinraffel
Colin Raffel
4 years
Hot take: When evaluating a self-supervised model's performance on a new task without fine-tuning, don't call it "zero-shot"; call it "weakly supervised multi-task". These models only succeed when their unsupervised pre-training actually provides weak supervision for the task.
6
23
253
@colinraffel
Colin Raffel
6 years
New paper with Chung-Cheng Chiu: Monotonic Chunkwise Attention (MoChA), an online/linear-time attention mechanism which computes soft attention over small chunks with adaptively set boundaries. Matches the performance of (offline) softmax attention on WSJ!
Tweet media one
2
68
247
@colinraffel
Colin Raffel
3 years
📣 Announcing the ICLR 2021 Workshop on Enormous Language Models 📣 We have an incredible speaker lineup that covers building, evaluating, critiquing, and improving large LMs, as well as a collaborative participant-driven benchmark and 2 panels! More info:
6
47
249
@colinraffel
Colin Raffel
4 years
In case you missed our #neurips poster on MixMatch () today because you aren't in Vancouver or didn't survive the poster session stampede, here's the PDF: and here's a transcript of what I said to everyone who came by: ⬇️ 1/11
4
52
249
@colinraffel
Colin Raffel
3 years
New preprint! We introduce a simplified version of pattern-exploiting training called ADAPET. ADAPET outperforms PET and iPET on SuperGLUE without using task-specific unlabeled data or ensembling and beats few-shot GPT-3 with a much smaller model.
Tweet media one
5
44
247
@colinraffel
Colin Raffel
6 years
Controversial (?) opinion: Hinton diagrams are cooler than heatmaps. (from )
Tweet media one
Tweet media two
22
41
242
@colinraffel
Colin Raffel
2 years
I am reading "A Neural Probabilistic Language Model" in detail for the first time and wow is it a fun read - discusses and justifies word embeddings, advocates scaling up models and data, uses rudimentary data- and model-parallel training... all done from scratch on CPUs.
2
17
239
@colinraffel
Colin Raffel
2 years
Single-blind: Reviewers know author's identities Double-blind: Reviewers don't know author's identities Triple-blind: Reviewers must write reviews without reading their assigned submissions Quadruple-blind: Authors are never told if their paper was accepted or rejected ...
10
11
235
@colinraffel
Colin Raffel
4 years
I recently came across , which "assumes 2-3 runs" of T5-11B. In fact, we trained T5-11B *once*. That's why we spend 35 pages figuring out how we should train before we start training. You don't want to mess up a training run that big.
9
16
228
@colinraffel
Colin Raffel
3 years
We showed last year (with OpenAI co-authors!) that it's surprisingly easy to extract verbatim training data from large LMs: It kind of boggles my mind that they included GPL'd source code in the training set for this model.
@mitsuhiko
Armin Ronacher
3 years
I don't want to say anything but that's not the right license Mr Copilot.
72
1K
5K
3
32
229
@colinraffel
Colin Raffel
6 years
Protip: if a random person asks you what you do and you want to avoid talking about the singularity, Sophia the robot, or "Facebook had to shut down AI when it invented its new language", just say "statistics".
12
34
226
@colinraffel
Colin Raffel
4 years
I somehow missed this great paper by @tuvuumass et al.: They learn "task embeddings" (a la task2vec) for NLP tasks and show how they can be used to predict the effectiveness of intermediate-task transfer. Lots of experiments and a promising direction!
3
28
227
@colinraffel
Colin Raffel
3 years
Mind-boggling results on the final EfficientQA leaderboard: The best system beat the REALM baseline by almost 20 points, and a 30 megabyte model got > 25% accuracy! Looking forward to hearing more about these systems at NeurIPS.
0
31
222
@colinraffel
Colin Raffel
3 years
The mT5 paper was accepted to NAACL 🎉 so now we can stop pretending that it doesn't exist! Updated arxiv with many juicy new results, including a simple way to prevent "accidental translation" exhibited by generative models in zero-shot settings.
Tweet media one
@ada_rob
Adam Roberts
4 years
We are releasing mT5: A massively-multilingual version of T5 that supports over 💯 languages! mT5 was pre-trained on a multilingual version of C4 and achieves SoTA on many cross-lingual NLP tasks. 📜Pre-print: 💾Code/models:
Tweet media one
4
124
506
4
38
222
@colinraffel
Colin Raffel
2 years
As a contributor to this book, I've been offered a free copy. However, I don't know what I'd do with an actual physical book in 2022. If you'd like my copy, please reply with a < 280 character description of the benefit you'd get from receiving a copy and I'll pick a recipient.
@sirbayes
Kevin Patrick Murphy
2 years
I am delighted to announce that my new book, “Probabilistic Machine Learning: An Introduction”, is finally available in print format! You can order it from , or from Amazon. Also available at 1/4
Tweet media one
45
487
3K
47
25
205
@colinraffel
Colin Raffel
3 years
In 15 minutes I'll be giving a talk on "The Benefits of Unified Frameworks for Language Understanding" at the "Conceptual Understanding of Deep Learning" workshop (). Livestream here:
4
26
201
@colinraffel
Colin Raffel
6 years
I think we need a taxonomy of adjectives for describing neural network size. "Large neural networks" "Outrageously large neural networks" () "Ridiculously large neural networks" "Inconceivably large neural networks" "Uncomfortably large neural networks" ...
19
40
197
@colinraffel
Colin Raffel
5 years
I saw this paper when it was presented at NeurIPS 2018 and really enjoyed it. It's worth a read for anyone who works on or thinks about generative models.
@StefanoErmon
Stefano Ermon
5 years
If all training images for a GAN/VAE/PixelCNN have 2 objects, will they only generate images with 2 objects? If trained on (🔵,💙,🔴), will they also generate ❤️? Find out in @shengjia_zhao 's blog post on generalization and bias for generative models. 👉
1
136
520
2
32
194
@colinraffel
Colin Raffel
4 years
Hot take: The most surprising thing about BERT isn't how well it worked when it was proposed, but how much better it would have worked if they had just pre-trained for longer on a more diverse dataset.
5
19
192
@colinraffel
Colin Raffel
4 years
NeurIPS95 "Learning to Learn" workshop focused on "unsupervised learning on a large corpus of unlabelled data to learn features for subsequent supervised learning on a smaller labelled corpus" and "using models previously learned for other problems when learning new problems" 🤔
3
11
189
@colinraffel
Colin Raffel
3 years
😢
Tweet media one
5
17
187
@colinraffel
Colin Raffel
4 years
The problem with "let the data speak for itself" is that most of the time data doesn't know how to talk
9
12
183
@colinraffel
Colin Raffel
4 years
#neurips tip! Try to learn about *one* great new paper a day. Any more than that can be overwhelming, any less and you're missing the point a little.
3
15
178
@colinraffel
Colin Raffel
5 years
Should we agree as a field not to post ICLR submissions on arxiv until after the review period is over? The paper is already public thanks to OpenReview, so it can (and should) be cited as existing work. arxiv'ing only serves to deanonymize it, which is probably a net negative.
8
14
173
@colinraffel
Colin Raffel
6 years
A video of my talk "Doing Strange Things with Attention" which I gave at AI @WithTheBest in October is now online: Covers feedforward attention, sequence embedding using attention, monotonic attention, and a new variant called MoChA.
1
59
169
@colinraffel
Colin Raffel
5 years
Fitting a 2D Gaussian to a mixture distribution via KL (aka maximum likelihood) and reverse KL (GAN-like).
3
36
171
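To make the distinction concrete, here is a sketch of the two objectives for fitting a diagonal Gaussian to a two-component mixture; the target mixture, parameterization, and sample size are arbitrary choices, and either loss would be minimized with an ordinary gradient-descent loop.

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import multivariate_normal
from jax.scipy.special import logsumexp

# Target p: an equal-weight mixture of two 2D Gaussians (arbitrary choice).
means = jnp.array([[-2.0, 0.0], [2.0, 0.0]])
def log_p(x):
    comps = jnp.stack([multivariate_normal.logpdf(x, m, jnp.eye(2)) for m in means])
    return logsumexp(comps) - jnp.log(2.0)

# Model q: a single diagonal Gaussian with parameters (mu, log_sigma).
def log_q(params, x):
    mu, log_sigma = params
    return multivariate_normal.logpdf(x, mu, jnp.diag(jnp.exp(2.0 * log_sigma)))

def forward_kl_loss(params, xs_from_p):
    # KL(p || q) up to a constant: maximum likelihood on samples drawn from p.
    return -jnp.mean(jax.vmap(lambda x: log_q(params, x))(xs_from_p))

def reverse_kl_loss(params, key, n=256):
    # KL(q || p): sample from q via reparameterization; needs p's log-density.
    mu, log_sigma = params
    xs = mu + jnp.exp(log_sigma) * jax.random.normal(key, (n, 2))
    return jnp.mean(jax.vmap(lambda x: log_q(params, x) - log_p(x))(xs))
```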
@colinraffel
Colin Raffel
4 years
Most underhyped paper of 2019 IMO ()
@GoogleAI
Google AI
4 years
Presenting BiT, an open-source approach for large-scale pre-training of models covering a wide range of visual tasks, which highlights the importance of choices in the model architecture for downstream performance. Learn all about it below:
9
233
737
3
18
171
@colinraffel
Colin Raffel
4 years
Does anyone else feel a unique kind of anxiety as they watch the months tick by on the arxiv IDs of new preprints?
3
4
159
@colinraffel
Colin Raffel
1 year
Super excited to be heading to #NeurIPS2022 with five (!) of my students! Here's a thread of all the places you can find us: (1/9)
4
7
149
@colinraffel
Colin Raffel
4 years
The only measure of intelligence I'm comfortable with is perplexity
7
28
155
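For the record, the quantity in question is just the exponentiated average negative log-likelihood per token; a toy calculation (with made-up probabilities) is below.

```python
import math

# Probabilities the model assigned to each token of a held-out sequence (made up).
token_probs = [0.25, 0.10, 0.50, 0.05]

# Perplexity = exp(mean negative log-likelihood per token).
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(f"perplexity = {perplexity:.2f}")
```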
@colinraffel
Colin Raffel
4 years
#neurips tips day 5 (h/t @chris_j_beckham )! Conferences are a parade of successes. Remember that for every impressive paper there are many (unpublished) ideas that didn't pan out. Take this opportunity to ask people about negative results!
2
17
153
@colinraffel
Colin Raffel
3 years
The #ICLR2021 Workshop on Enormous Language Models (WELM) is tomorrow, May 7th! Full info: Livestream: gathertown info for ICLR registrants: Thread summarizing the talks & panels ⬇️ (1/14)
2
49
151
@colinraffel
Colin Raffel
6 years
I'll (help) present 3 posters @ICLR18 : Realistic Evaluation of Semi-Supervised Learning Mon 4/30 4:30-6:30 #3 , Thermometer Encoding Tue 5/1 4:30-6:30 #14 , Monotonic Chunkwise Attention Wed 5/2 11:00-1:00 #28 !
Tweet media one
Tweet media two
Tweet media three
0
34
147
@colinraffel
Colin Raffel
5 months
I'll be at #NeurIPS2023 supporting my collaborators who are presenting , , , and . Find me to chat about decentralizing/democratizing/de-risking ML!
5
11
147
@colinraffel
Colin Raffel
4 years
Today, the T5 team competed against T5 in a "pub quiz" on (context-free) questions from the TriviaQA/NQ validation sets. We LOST! We only got 20% right; T5 got 35%. To see how to fine-tune T5 on context-free QA (or any other task) with a free TPU, check out our Colab tutorial ⬇️
@ada_rob
Adam Roberts
4 years
As promised, we have made the Text-To-Text Transfer Transformer (T5) models much easier to fine-tune for new tasks, and we just released a Colab notebook where you can try it yourself on a free TPU! 👇 (1/3)
6
109
402
1
36
146
@colinraffel
Colin Raffel
3 years
I actually encourage my students & colleagues to get on Twitter, because (for better or worse) it remains the best place to find out about new papers. Most of the time, I only check a filtered version of my timeline that only shows tweets with a link. 🤷
3
7
143
@colinraffel
Colin Raffel
5 years
Google has open-sourced Lingvo, which is the excellent codebase we used for the Monotonic (Chunkwise) Attention papers! Has also been used in dozens of other Brain papers. Code: Pre-print:
0
40
140
@colinraffel
Colin Raffel
4 years
Protip: It is not too late to apply to start a PhD in Fall 2020 at the UNC CS department! The deadline for applications is, amazingly, not until March 10th.
@colinraffel
Colin Raffel
4 years
I'm starting a professorship in the CS department at UNC in fall 2020 (!!) and am hiring students! If you're interested in doing a PhD @unccs please get in touch. More info here:
82
146
893
4
32
142
@colinraffel
Colin Raffel
2 years
TIL that ICLR is the #1 conference in "Artificial Intelligence" according to Google Scholar Metrics () but it's still not included in . All rankings are silly and arbitrary, but this seems especially silly and especially arbitrary.
Tweet media one
Tweet media two
10
8
140
@colinraffel
Colin Raffel
3 years
2) Skim You'll find that many papers within your subfield of choice have a lot in common - there is often only a small nugget of novelty in each paper. It's incredibly important to develop your ability to find this nugget as quickly as possible. (3/5)
3
3
140
@colinraffel
Colin Raffel
4 years
I finally put up the slides for my faculty job talk from last year: They are now pretty out-of-date but I spent a ton of time making them fancy and clear. Includes overviews of a few frameworks for deep generative modeling, +MoChA/MILk, MusicVAE, and ACAI.
3
21
138
@colinraffel
Colin Raffel
6 years
The camera ready version of "Realistic Evaluation of Deep Semi-Supervised Learning Algorithms" is now up on arxiv: Includes an entire bonus page, two new tables, a new figure, and a couple of new experiments!
0
27
138
@colinraffel
Colin Raffel
6 years
PSA: If a paper on a generative model of images only presents results on MNIST/SVHN/CelebA, you should be skeptical that it will work in general. These datasets are extremely regular - they are normalized so that objects tend to appear in the same location/orientation.
2
25
137
@colinraffel
Colin Raffel
1 year
We're having a *debate* at the Transfer Learning for NLP workshop @NeurIPSConf this year. @kchonyc is one of our debaters; the other one can't make it to NeurIPS anymore 😢 Who do you want to see go toe-to-toe with Cho?
25
4
133