Micah Goldblum

@micahgoldblum

5,637 Followers · 818 Following · 41 Media · 814 Statuses

🤖Postdoc at NYU with @ylecun / @andrewgwils. All things machine learning🤖 🚨Incoming prof at Columbia University🚨

Joined December 2014
@micahgoldblum
Micah Goldblum
2 years
TLDR: Diffusion models (like DALLE or Imagen) generate pretty pictures from Gaussian noise, but the same training and generation update rules generalize easily to other degradations, including completely deterministic ones. 1/7
12
150
986
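A minimal sketch of the idea (not the paper's code; the blur degradation, the `restorer` network interface, and the loop sizes are illustrative): the diffusion-style training and sampling rules stay the same even when the degradation is fully deterministic.

```python
import torch
import torch.nn.functional as F

def degrade(x, t, T):
    # Any severity-indexed degradation works here; a Gaussian blur whose strength
    # grows with t is one simple deterministic choice (illustrative kernel/sigma).
    sigma = 0.1 + 3.0 * t / T
    k = 7
    coords = torch.arange(k, device=x.device).float() - k // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(x.shape[1], 1, k, k)
    return F.conv2d(x, kernel, padding=k // 2, groups=x.shape[1])

def train_step(restorer, x0, T, opt):
    # Same training rule as standard diffusion: sample a severity level,
    # degrade the clean image, and regress the network output back to it.
    t = torch.randint(1, T + 1, (1,)).item()
    loss = F.mse_loss(restorer(degrade(x0, t, T), t), x0)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample(restorer, xT, T):
    # Same (naive) generation rule: repeatedly estimate the clean image and
    # re-degrade it to one step less severe.
    x = xT
    for t in range(T, 0, -1):
        x0_hat = restorer(x, t)
        x = degrade(x0_hat, t - 1, T) if t > 1 else x0_hat
    return x
```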
@micahgoldblum
Micah Goldblum
2 years
A common point raised by ML reviewers is that a method is too simple or is made of existing parts. But simplicity is a strength, not a weakness. People are much more likely to adopt simple methods, and simple ones are also typically more interpretable and intuitive. 1/2
29
101
890
@micahgoldblum
Micah Goldblum
1 year
Self-Supervised Learning (SSL) is quickly becoming a de facto way of training neural networks, but if you have ever tried it yourself, you know that getting high performance is tricky! Check out our new thorough guide to all things SSL.
7
86
481
@micahgoldblum
Micah Goldblum
7 months
🚨Excited to announce a large-scale comparison of pretrained vision backbones including SSL, vision-language models, and CNNs vs ViTs across diverse downstream tasks ranging from classification to detection to OOD generalization and more! NeurIPS 2023🚨🧵
6
93
415
@micahgoldblum
Micah Goldblum
2 years
How much data are augmentations worth? We show that augmentations can actually be worth more than extra data and invariance! They increase variance across batches, and this extra stochasticity finds flatter minima. 1/8
3
65
400
@micahgoldblum
Micah Goldblum
2 years
Gradient-boosted decision trees are still thought to be competitive with neural networks on tabular data. But NNs have a massive advantage: they learn representations, and this ability can be leveraged for transfer learning. 1/4
5
45
365
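A minimal sketch of the representation-transfer idea (hypothetical layer sizes; assumes matching upstream/downstream feature sets, which a later tweet in this thread relaxes): pretrain an MLP trunk on an upstream tabular task, then reuse it with a fresh head downstream.

```python
import torch
import torch.nn as nn

def make_trunk(num_features, width=256, depth=3):
    layers, d = [], num_features
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    return nn.Sequential(*layers)

# Upstream: train trunk + head on a large labeled tabular dataset.
trunk = make_trunk(num_features=100)
upstream_model = nn.Sequential(trunk, nn.Linear(256, 10))
# ... train upstream_model as usual ...

# Downstream: reuse the learned representations with a new task-specific head.
downstream_model = nn.Sequential(trunk, nn.Linear(256, 2))
optimizer = torch.optim.Adam([
    {"params": trunk.parameters(), "lr": 1e-4},                 # gently fine-tune the trunk
    {"params": downstream_model[1].parameters(), "lr": 1e-3},   # train the fresh head faster
])
```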
@micahgoldblum
Micah Goldblum
1 year
🚨Here’s an intuitive explanation for why training on lots and lots of data creates emergent properties, for instance math and reasoning, in large language models like #GPT-4 and #ChatGPT 🚨 1/17
6
34
271
@micahgoldblum
Micah Goldblum
8 months
I’m on the faculty job market this year! Going from a late start as a math PhD student to an ML postdoc was a fun challenge. Building a research agenda alongside amazing students has been rewarding, with 9 papers accepted to NeurIPS this year. Don’t let rejections get you down!
6
19
261
@micahgoldblum
Micah Goldblum
2 years
One view of ML history is that we started out with MLPs and evolved towards more specialized architectures like CNNs for vision, LSTMs for sequences, etc. But actually, the exact opposite is true! 🚨🧵1/6
2
25
218
@micahgoldblum
Micah Goldblum
2 years
Simplicity doesn’t preclude novelty, even when the method is composed of existing parts. During the NeurIPS review period, DO NOT downgrade papers just because the method is simple. If anything, question methods which are needlessly complicated when simple solutions will do. 2/2
6
11
217
@micahgoldblum
Micah Goldblum
9 months
We show that neural networks have a remarkable preference for low complexity which overlaps strongly with real-world data across modalities. PAC-Bayes proves that such models generalize, explaining why NNs are almost universally effective.
@DrJimFan
Jim Fan
9 months
There are few who can deliver both great AI research and charismatic talks. OpenAI Chief Scientist @ilyasut is one of them. I watched Ilya's lecture at Simons Institute, where he delved into why unsupervised learning works through the lens of compression. Sharing my notes:
55
432
3K
1
28
196
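For reference, a standard McAllester-style PAC-Bayes bound of the kind invoked here: for a prior P fixed before seeing the n training samples and any posterior Q over models, with probability at least 1−δ,

\[
\mathbb{E}_{h\sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[\hat{L}(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\!\frac{2\sqrt{n}}{\delta}}{2n}},
\]

so posteriors that concentrate on low-complexity models close to the prior (small KL term) are guaranteed to generalize.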
@micahgoldblum
Micah Goldblum
5 months
I’m thrilled to announce the first issue of a community survey on the state and future of deep learning! We asked folks their opinions on benchmarking, transformers, interpretability, theories of deep learning, and directions we should be working on. 1/3
5
33
185
@micahgoldblum
Micah Goldblum
2 years
Typical transfer learning pipelines involve initializing at pre-trained weights and hoping that relevant learned information magically transfers even when the weights change during fine-tuning. But you can transfer so much more than just initialization! 1/4
1
31
177
@micahgoldblum
Micah Goldblum
2 years
The new @icmlconf review format is horrendous (no reviewer scores). Students will spend an inordinate amount of time drafting rebuttals for reviewers who have already committed to rejecting their papers. Massive waste of person hours.
1
5
172
@micahgoldblum
Micah Goldblum
2 years
The following statement, while a commonly held view, is actually false! “Learning theory says that the more functions your model can represent, the more samples it needs to learn anything”. 1/8
@ylecun
Yann LeCun
2 years
OK, debates about the necessity of "priors" (or lack thereof) in learning systems are pointless. Here are some basic facts that all ML theorists and most ML practitioners understand, but a number of folks-with-an-agenda don't seem to grasp. Thread. 1/
33
206
1K
7
20
172
@micahgoldblum
Micah Goldblum
2 years
We usually use NNs in silico, but they can also operate on analog systems involving optics, etc. You can train high-performance physical NNs that perform inference orders of magnitude faster than digital computers. 1/2
2
31
162
@micahgoldblum
Micah Goldblum
2 years
Lazy reviewer starter pack: “needs theoretical justification”, “should cite [paper that came out a week after the submission deadline]”, “not novel enough for NeurIPS”, “more datasets, models, and baselines [that don’t apply or are in the appendix]”, “borderline accept/reject”
6
9
156
@micahgoldblum
Micah Goldblum
2 years
As we go into the NeurIPS reviewing process, remember to accept every paper that you think would contribute to the conference! Don’t read papers trying to find little things to criticize. Instead, try also to find the valuable pieces that the community might want to read. 1/3
2
10
153
@micahgoldblum
Micah Goldblum
6 months
🚨Real data is often massively class-imbalanced, and standard NN pipelines built on balanced benchmarks can fail! We show that simply tuning standard pipelines beats all of the newfangled samplers and objectives designed for imbalance. #NeurIPS2023 🚨🧵1/8
3
19
150
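One illustrative example of "just tuning the standard pipeline" (not the paper's exact recipe): treat the class weights of an ordinary cross-entropy loss as a hyperparameter and tune it on a validation set.

```python
import torch
import torch.nn as nn

# counts[c] = number of training examples in class c (placeholder values)
counts = torch.tensor([9000., 900., 100.])

def class_weights(counts, power):
    # A single tunable exponent interpolates between no reweighting (power=0)
    # and inverse-frequency reweighting (power=1); tune it like any other hyperparameter.
    w = (counts.sum() / counts) ** power
    return w / w.mean()

criterion = nn.CrossEntropyLoss(weight=class_weights(counts, power=0.5))
```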
@micahgoldblum
Micah Goldblum
2 years
Data scientists working with tabular data try simple linear models first and work their way through GBDT and maybe finally expressive NNs and ensembles. In contrast, vision or NLP practitioners start with the most powerful NN at their disposal. 1/2
7
7
143
@micahgoldblum
Micah Goldblum
2 years
After years of ML research and conference publications, I'm finally attending my first in-person conference this summer … as a postdoc … in my home city …🙃
3
1
141
@micahgoldblum
Micah Goldblum
2 years
Lessons from ICML @icmlconf : (1) Eliminate short talks, especially pre-recorded ones. (2) Poster sessions ≫ talks so allocate more time to them. (3) NO EVENTS DURING LUNCH/DINNER TIME! Poster sessions ended ~8:30pm and people went without dinner. (4) Don’t serve moldy bagels.
4
7
141
@micahgoldblum
Micah Goldblum
1 year
There’s a pervasive myth that the No Free Lunch Theorem prevents us from building general-purpose learners. Instead, we need to select models on a per-domain basis. Is this really true? Let’s talk about it! 🧵 1/16 w/ @andrewgwils , @m_finzi , K. Rowan
9
22
127
@micahgoldblum
Micah Goldblum
1 year
#StableDiffusion and #ChatGPT use prompts, but hard prompts (actual text) perform poorly, while soft prompts are uninterpretable and nontransferable. We designed an easy-to-use prompt optimizer PEZ for discovering good hard prompts, complete with a demo.🧵
4
30
124
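A minimal sketch of the projection trick at the core of this kind of hard-prompt optimizer (not the released PEZ code; `loss_fn` and the frozen embedding matrix `vocab_emb` are placeholders): keep a continuous prompt, but compute the loss and gradient at its projection onto real tokens and apply that gradient back to the continuous copy.

```python
import torch

def nearest_tokens(soft_emb, vocab_emb):
    # Project each continuous prompt embedding onto its nearest real token embedding.
    dists = torch.cdist(soft_emb, vocab_emb)          # (num_tokens, vocab_size)
    ids = dists.argmin(dim=-1)
    return ids, vocab_emb[ids]

def optimize_hard_prompt(loss_fn, vocab_emb, num_tokens=8, steps=500, lr=0.1):
    idx = torch.randint(len(vocab_emb), (num_tokens,))
    soft = vocab_emb[idx].clone().requires_grad_(True)
    opt = torch.optim.Adam([soft], lr=lr)
    for _ in range(steps):
        ids, hard = nearest_tokens(soft.detach(), vocab_emb)
        hard = hard.detach().requires_grad_(True)
        loss = loss_fn(hard)                          # e.g., CLIP similarity to a target image
        (grad,) = torch.autograd.grad(loss, hard)
        opt.zero_grad()
        soft.grad = grad                              # gradient taken at the projected (hard) prompt
        opt.step()
    return nearest_tokens(soft.detach(), vocab_emb)[0]  # token ids of the final hard prompt
```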
@micahgoldblum
Micah Goldblum
2 years
Reviewers, engage with authors during the discussion period! So many misunderstandings are waiting to be cleared up by great rebuttals, and so many mistakes have already been fixed. Don’t get some unlucky grad student’s paper rejected because you were too lazy to engage!
2
13
119
@micahgoldblum
Micah Goldblum
2 years
I want to point out several problems (areas for improvement) in the @NeurIPSConf review process which I haven't heard talked about. (1) Do not show reviewer scores to other reviewers since these bias score changes via peer pressure (do show scores to authors and ACs). 1/3
4
7
114
@micahgoldblum
Micah Goldblum
2 years
Just made a quick plot of ICLR 2023 mean reviewer scores by percentile. To be in the top 25% of papers, you need a mean reviewer score of at least 5.67
3
9
116
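For anyone reproducing this kind of cutoff (placeholder data; the real numbers come from scraped reviews), the threshold is just an upper percentile of the per-paper mean scores:

```python
import numpy as np

# mean reviewer score per submission (placeholder values)
mean_scores = np.array([3.0, 4.33, 5.67, 6.0, 5.0, 4.67, 7.33, 5.33])
top_25_cutoff = np.percentile(mean_scores, 75)
print(f"Top 25% of papers need a mean score of at least {top_25_cutoff:.2f}")
```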
@micahgoldblum
Micah Goldblum
2 years
Some people feel that transfer learning (TL) doesn’t apply to tabular data just because there exist unrelated domains (e.g. credit-card fraud vs. disease diagnosis). However, there are also adjacent tabular domains where TL makes a ton of sense (e.g. diagnosis of different diseases). 1/6
@tunguz
Bojan Tunguz
2 years
Saw a new article on transfer learning for tabular data using NNs. I don’t have the time to take a closer look, but my initial reaction is the following: 1/4
15
13
136
3
15
113
@micahgoldblum
Micah Goldblum
2 years
After a long day of ML research without any paws. Research is so ruff!
2
2
107
@micahgoldblum
Micah Goldblum
2 years
I’m on the academic job market this year 🚨🥳🚨! Let me know if there are any interesting opportunities I’m likely to have overlooked or catch me at #NeurIPS2022 !
1
10
100
@micahgoldblum
Micah Goldblum
2 years
People think hierarchical features are why NNs generalize. Has anyone formalized this? How would you verify/falsify? Historically, people thought early layers are tuned to extract low-level features (e.g. edges), while late layers learn to extract abstract ones (e.g. faces) 1/2
7
11
100
@micahgoldblum
Micah Goldblum
1 year
Diffusion models like #StableDiffusion and #dalle2 generate beautiful pictures, but are these images new or are they copies of the images they were trained on? 🧵 #CVPR2023
4
29
100
@micahgoldblum
Micah Goldblum
2 years
If my dog steps on my keyboard while I'm drafting a conference submission, is she a co-author? Some fields are very loose with authorship.
6
4
96
@micahgoldblum
Micah Goldblum
2 years
Why are facial recognition systems so unfair across race/gender? A lot of people think it comes from imbalanced training data, but it even happens with perfectly balanced training data. In fact, randomly initialized face rec systems are unfair too! 1/3
6
17
84
@micahgoldblum
Micah Goldblum
2 years
Paper found here: All the awesome collaborators that made this happen: @arpitbansal297 , @EBorgnia , Hong-Min Chu, Jie Li, @hamid_kazemi22 , @furongh , @jonasgeiping , @tomgoldsteincs 7/7
4
6
84
@micahgoldblum
Micah Goldblum
2 years
In deep learning, we typically split the data, train on the training split, and evaluate on the validation split, so we only train on part of the data when we are comparing models. In contrast, the marginal likelihood tries to use data more holistically. 1/3
3
11
84
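Concretely, the marginal likelihood of a model M scores every data point conditioned on the points that came before it, rather than a single held-out split:

\[
\log p(\mathcal{D}\mid\mathcal{M}) \;=\; \log \int p(\mathcal{D}\mid\theta)\,p(\theta\mid\mathcal{M})\,d\theta \;=\; \sum_{i=1}^{n}\log p\big(d_i \mid d_{1},\dots,d_{i-1},\mathcal{M}\big),
\]

so every point plays both the role of training data (for later points) and validation data (given earlier points).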
@micahgoldblum
Micah Goldblum
2 years
Classic Reviewer 2: “The authors have now thoroughly addressed all my concerns. I raise my score from 4 to 5.”
5
2
81
@micahgoldblum
Micah Goldblum
2 years
ML security research has been DOMINATED by adversarial examples/defenses for the past few years, not because it is the most important area of security but because it is easy to work on (low implementation/hardware/know-how costs). 1/2
6
4
72
@micahgoldblum
Micah Goldblum
2 years
Thrilled that our paper on model selection won the Outstanding Paper Award at ICML 2022. All credit goes to my great collaborators. Check out @LotfiSanae 's talk and drop by our poster tomorrow!
@LotfiSanae
Sanae Lotfi
2 years
I'm so proud that our paper on the marginal likelihood won the Outstanding Paper Award at #ICML2022 !!! Congratulations to my amazing co-authors @Pavel_Izmailov , @g_benton_ , @micahgoldblum , @andrewgwils 🎉 Talk on Thursday, 2:10 pm, room 310 Poster 828 on Thursday, 6-8 pm, hall E
13
33
324
1
2
63
@micahgoldblum
Micah Goldblum
2 years
We should value practicality and intuitiveness over novelty. Novelty often means an idea is so complicated that the reader couldn't have imagined coming up with it. That's not a good thing. That's bad. Good ideas often seem obvious in retrospect.
5
4
61
@micahgoldblum
Micah Goldblum
2 years
What are the real reasons that NNs work so much better than other models? It sure as hell isn’t because of the “implicit bias of SGD”. Is it their inductive biases, parameter efficiency, ease of optimization? Would other models work just as well if only we could scale them up?
11
3
61
@micahgoldblum
Micah Goldblum
2 months
We show how to make data poisoning and backdoor attacks way more potent by synthesizing them from scratch with guided diffusion. 🧵 1/8 Paper:
1
10
58
@micahgoldblum
Micah Goldblum
2 years
(3) Reviewers shouldn’t be allowed to click “Author Rebuttal Acknowledgement” (without writing a response to the rebuttal) if they don’t increase their score. It is important to justify to the authors why their rebuttal doesn’t address your concerns. 3/3
2
7
57
@micahgoldblum
Micah Goldblum
2 years
Diffusion models, from DALLE to Imagen, operate by sampling random Gaussian noise and iteratively denoising/noising until they converge to a pretty picture. This simple-sounding process is underpinned by several theoretical motivations. 2/7
1
8
57
@micahgoldblum
Micah Goldblum
2 years
Even using deterministic degradations, training and test-time update rules that underlie diffusion models can be generalized, calling into question the orthodox understanding of diffusion and opening up research on a whole new direction of generative models. 6/7
2
7
57
@micahgoldblum
Micah Goldblum
8 months
Transformers seem to work for all sorts of data, made possible by a shared structure that virtually all real data shares. This also allows NNs to be near-universal compressors. The real world is simple, so all we need is models with a simplicity bias.
@_akhaliq
AK
8 months
Language Modeling Is Compression paper page: It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training
46
389
2K
0
5
56
@micahgoldblum
Micah Goldblum
2 years
ML practitioners used to encode their beliefs about a problem, like invariances, into their architectures by hand. We show that transformers actually learn these same structures directly from the data! 3/6
2
0
49
@micahgoldblum
Micah Goldblum
2 years
One interpretation of diffusion models views them as score estimators, whereby noise is added to the score estimates to sample images via stochastic gradient Langevin dynamics: 3/7
1
4
48
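The sampler referenced here is standard stochastic gradient Langevin dynamics with a learned score s_θ(x) ≈ ∇_x log p(x):

\[
x_{t+1} \;=\; x_t \;+\; \frac{\epsilon}{2}\, s_\theta(x_t) \;+\; \sqrt{\epsilon}\, z_t, \qquad z_t \sim \mathcal{N}(0, I),
\]

where annealed variants tie the step size ε to the current noise level.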
@micahgoldblum
Micah Goldblum
1 year
#NeurIPS2022 was fun! No moldy bagels like ICML, more poster sessions, fewer talks. Minor suggestions: (1) Don’t schedule poster sessions during meal times (e.g. sessions from 11am-1pm). I noticed a lot of people skipping the lunchtime poster session because they were hungry. 1/3
2
1
48
@micahgoldblum
Micah Goldblum
1 year
🚨NeurIPS poster Wednesday: 11-1, Hall J #512 🚨 Backdoor attacks are dangerous, but existing attacks are easy to detect. We develop a backdoor attack whose poisons are indistinguishable from clean samples. Can you tell which are poisoned? 1/4
5
11
47
@micahgoldblum
Micah Goldblum
1 year
Check out our paper, with @m_finzi , Keefer Rowan, @andrewgwils , where we show just how important simplicity bias, formalized using Kolmogorov complexity, is for machine learning. The paper is easy to approach for all audiences! 16/17
1
2
46
@micahgoldblum
Micah Goldblum
2 years
A second closely related interpretation views diffusion models as autoencoders with a fixed encoder that noises the image and a learned decoder that reverses this random process by approximating the reverse conditional distributions with Gaussians: 4/7
1
5
46
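In the usual DDPM notation, the fixed noising encoder and the learned Gaussian decoder are:

\[
q(x_t\mid x_{t-1}) = \mathcal{N}\!\big(x_t;\,\sqrt{1-\beta_t}\,x_{t-1},\,\beta_t I\big), \qquad
p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\,\mu_\theta(x_t,t),\,\Sigma_\theta(x_t,t)\big),
\]

with the decoder trained so that composing its steps approximately reverses the fixed noising process.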
@micahgoldblum
Micah Goldblum
2 years
(2) Low acceptance rates make reviewers worry that assigning a high score, thus increasing the chance of that paper being accepted, in turn decreases the chance of their own paper being accepted. After all, the same AC may be in charge of both papers. This is a bad incentive. 2/3
3
2
45
@micahgoldblum
Micah Goldblum
2 years
SSL can even be used to learn an explicit prior probability distribution over parameters. 8/8
1
2
44
@micahgoldblum
Micah Goldblum
7 months
Despite the popularity of ViTs and SSL, our benchmark suggests that the best backbones for most vision tasks are actually modern convnets (e.g. ConvNeXt) pretrained on massive labeled classification datasets. Future SSL works should train on bigger datasets to be competitive. 2/7
1
1
44
@micahgoldblum
Micah Goldblum
2 years
I will be at ICML in person next week to present our works on marginal likelihood, model inversion/explainability, and privacy breaches in federated learning. Shoot me a message if you want to connect!
2
0
44
@micahgoldblum
Micah Goldblum
2 years
The vast majority of NNs deployed/released have no privacy guarantee. If someone ever figures out how to recover training data from trained models, they will instantly recover boatloads of private data, and there’s nothing we can do to stop it! The models are already released.
4
6
43
@micahgoldblum
Micah Goldblum
2 years
Just how important are handcrafted inductive biases like CNNs for computer vision? We can just learn them! ViTs are often actually more translation invariant than CNNs after training.
@gruver_nate
Nate Gruver
2 years
CNNs are famously equivariant by design, but how about vision transformers? Using a new equivariance measure, the Lie derivative, we show that trained transformers are often more equivariant than trained CNNs! w/ @m_finzi @micahgoldblum @andrewgwils 1/6
4
83
514
0
3
42
@micahgoldblum
Micah Goldblum
2 years
What started out merely as interesting properties of NNs became the main focus of ML security. But data poisoning and privacy are far bigger threats! Training data is scraped at scale without supervision, and models are trained on user data without any privacy guarantees. 2/2
0
2
42
@micahgoldblum
Micah Goldblum
2 years
Sometimes I wish I knew about RL, but then I remember that there's only so much time in the day and I have to prioritize, so I go back to watching Seinfeld re-runs.
0
0
42
@micahgoldblum
Micah Goldblum
3 months
Interestingly, we find that bigger models can often be compressed into FEWER bits than smaller models, explaining why they perform better. In the future, if we can compress models better and better, we can make tighter and tighter bounds that explain why LLMs work so well. 9/9
1
3
42
@micahgoldblum
Micah Goldblum
2 years
Under both of these interpretations, noise is central to why diffusion works. But such models can be used to reverse numerous other degradations, including completely deterministic ones: blur, masking, pixelation, snow-ifying, and … wait for it … animorph. 5/7
1
6
42
@micahgoldblum
Micah Goldblum
2 years
Lit reviews are super slow now that Google Scholar thinks I'm a bot.
7
2
41
@micahgoldblum
Micah Goldblum
2 years
Check out our easy-to-use tool for measuring equivariance via the Lie derivative. It even allows for layer-wise analysis and scales gracefully across architectures and input sizes: 5/6
1
2
41
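The released tool computes Lie derivatives; as a rough stand-in, here is a simple finite-shift check of translation equivariance (the `model` interface and shift size are placeholders, not the paper's metric):

```python
import torch

def translation_equivariance_error(model, x, shift=1):
    """Compare f(shift(x)) against shift(f(x)); zero means perfectly equivariant.
    Assumes model maps images to feature maps with the same spatial layout."""
    out_of_shifted = model(torch.roll(x, shifts=shift, dims=-1))  # shift the input along width
    shifted_out = torch.roll(model(x), shifts=shift, dims=-1)     # shift the output instead
    return (out_of_shifted - shifted_out).norm() / shifted_out.norm()
```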
@micahgoldblum
Micah Goldblum
2 years
Remember that grad students worked their butts off on those papers, and they shouldn’t be rejected just because they didn’t compare to the n+1’th method that conveniently happens to be yours. 2/3
2
0
40
@micahgoldblum
Micah Goldblum
2 years
How long does it take you to read or review a paper on average? Just reading a paper in full detail takes me hours, so unless I’m way slower than everyone else, I assume most reviewers are just skimming their papers.
8
0
40
@micahgoldblum
Micah Goldblum
8 months
It’s pathetic when ML conferences raise the acceptance cutoff in order to make the conference look prestigious. If 80% of papers are amazing, then accept them, especially in cases where the conference can easily host more papers.
0
3
39
@micahgoldblum
Micah Goldblum
2 years
Tabular deep learning is still in its infancy. Lots of room for improvement!
@rasbt
Sebastian Raschka
2 years
A Short Chronology Of Deep Learning For Tabular Data: Deep tabular methods are an interesting research direction! So, this morning, I sat down and summarized my thoughts + the recent papers I read.
44
207
974
1
4
37
@micahgoldblum
Micah Goldblum
2 years
Of course, reject papers with terminal flaws. But accepting more papers won’t dilute the conference, and there are tons of respected conferences in other fields which are bigger or have higher acceptance rates. 3/3
0
0
32
@micahgoldblum
Micah Goldblum
2 years
What's the deal with reviewer scores for NeurIPS this year? 5's are considered "borderline accept" whereas they are usually "weak reject". Will 5's still end up meaning the same thing as usual? (i.e. average reviewer score of 5-6 likely gets rejected)
4
0
32
@micahgoldblum
Micah Goldblum
2 years
What’s the best way to read (non-theory) papers?  I typically go nonlinearly: start with the abstract and bulleted contributions, skim the experimental setup, look at results and baselines, read the experimental conclusions, and then go back and fill in the gaps.
5
0
32
@micahgoldblum
Micah Goldblum
2 years
Well-designed NN inductive biases prefer simple functions when they are compatible with the data but are perfectly capable of learning more complex models when necessary. As better tabular NNs emerge, data scientists will jump to them from the start. 2/2
1
0
29
@micahgoldblum
Micah Goldblum
2 years
To be clear, we should *NOT* abolish ML conferences altogether! They keep the community productive, and in that sense they have been wildly successful. But they could clearly use reform to decrease stress as well as bias and bad incentives in the review process.
2
1
30
@micahgoldblum
Micah Goldblum
2 years
There’s this large contingent of deep learning cynics who think today’s DL research is particularly sloppy and esoteric. I’d remind them that most papers in all fields are low impact, and other empirical fields (DL is one) such as biology are chock full of erroneous findings. 1/4
3
3
29
@micahgoldblum
Micah Goldblum
6 months
✈️ I’ll be at NeurIPS next week. Shoot me a message if you want to meet up! ✈️
0
0
29
@micahgoldblum
Micah Goldblum
2 years
Interestingly, a single module applied recursively does the same exact thing (). Also, relatively shallow but wide networks achieve near ImageNet SOTA (). 2/2
4
1
29
@micahgoldblum
Micah Goldblum
2 years
Swing by and chat with me and my awesome collaborators at the poster sessions next week!
Tuesday 5:30pm
- PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
- End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking
1/4
1
2
29
@micahgoldblum
Micah Goldblum
2 years
Have any generous ACs aggregated NeurIPS review score distributions? What's the 80th percentile this year?
2
2
27
@micahgoldblum
Micah Goldblum
2 years
BTW, see some of the problems with marginal likelihood here 3/3
1
2
26
@micahgoldblum
Micah Goldblum
2 years
We use Bayesian tools to transfer the loss function rather than only a single initialization. You can use our simple drop-in replacement wherever you would use pre-trained models, including classification, segmentation, and more. 2/4
1
2
26
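Roughly (a sketch of the idea, not the paper's exact objective): the pretraining data induces a learned prior over parameters, e.g. a Gaussian, and fine-tuning minimizes the downstream negative log-likelihood plus that prior rather than merely starting from a single initialization,

\[
\mathcal{L}(\theta) \;=\; -\sum_{i}\log p\big(y_i \mid x_i,\theta\big) \;-\; \log \mathcal{N}\big(\theta;\,\mu_{\text{pre}},\,\Sigma_{\text{pre}}\big),
\]

so the downstream model is pulled toward the whole region of parameters that performed well upstream, not just one point in it.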
@micahgoldblum
Micah Goldblum
2 years
We're happy to see that multiple people have pointed out that types of tabular transfer learning are already widely used in industry and in scientific applications. We hope our work can improve tabular transfer learning and make it even more broadly useful!
@tunguz
Bojan Tunguz
2 years
I have won several Kaggle tabular data competitions. I have worked on or with three different AutoML projects for tabular data. I have collaborated with many top experts in the field and some of the largest companies. 1/2
7
10
156
0
2
24
@micahgoldblum
Micah Goldblum
7 months
Jonas is a great researcher and mentor. If you're looking to start a PhD in machine learning, make sure to check him out!
@jonasgeiping
Jonas Geiping
7 months
We're also looking to hire PhD students interested in machine learning! You can find more information about joining my group on my webpage:
1
15
49
0
1
24
@micahgoldblum
Micah Goldblum
2 years
Augmentations like horizontal flips which are consistent with the data distribution can be valuable even when you have tons of training data, yet aggressive augmentations like TrivialAugment quickly become harmful. 4/8
2
3
23
@micahgoldblum
Micah Goldblum
2 years
Not only do transformers learn symmetries, but they can actually be MORE equivariant than CNNs, which are designed specifically for translation equivariance. So what is next in the evolution of ML? One architecture to rule them all? 4/6
1
0
23
@micahgoldblum
Micah Goldblum
1 year
In fact, neural networks (or any other model for that matter) which are sufficiently compressible are formally guaranteed to generalize well to new and unseen test samples. 12/17
1
1
23
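The formal statement behind this is an Occam-style compression bound: if a model h can be described by a prefix-free code of K(h) bits, then with probability at least 1−δ over n i.i.d. samples,

\[
L(h) \;\le\; \hat{L}(h) \;+\; \sqrt{\frac{K(h)\ln 2 + \ln\frac{1}{\delta}}{2n}},
\]

so any model that both fits the training data and admits a short description must generalize.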
@micahgoldblum
Micah Goldblum
2 years
Earlier high-performance vision and language systems were highly specialized, like HOG features and Latent Dirichlet Allocation, whereas tasks once performed by these tools can now all be handled by transformers. 2/6
1
1
23
@micahgoldblum
Micah Goldblum
2 years
Priors, i.e. inductive biases, need not restrict the function class at all. We actually rely on this principle all the time! Flexible neural networks prefer simple functions, even though they can express complex ones, which allows them to generalize. 3/8
2
0
21
@micahgoldblum
Micah Goldblum
7 months
SSL pretraining and convolutional neural network architectures additionally yield superior adversarial robustness compared to supervised pretraining and vision transformers, respectively. 6/7
2
1
21
@micahgoldblum
Micah Goldblum
2 years
While it is true that a model which can only express few functions needs few samples to learn, the converse is not true! This underscores the failure of ideas like VC dimension and Rademacher complexity to explain neural network generalization. 2/8
1
0
21
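The one-directional statement is the classical uniform-convergence bound for a finite class H: with probability at least 1−δ,

\[
\sup_{h\in\mathcal{H}}\big|L(h)-\hat{L}(h)\big| \;\le\; \sqrt{\frac{\ln|\mathcal{H}| + \ln\frac{2}{\delta}}{2n}},
\]

which upper-bounds the samples needed in terms of the size of the class; a model outside any such guarantee can still generalize for other reasons, e.g. a bias toward simple functions.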
@micahgoldblum
Micah Goldblum
7 months
Check out Pavel if you are looking to start a PhD in ML. Pavel has a stellar track record of great research, and you'd get to be located in NYC!
@Pavel_Izmailov
Pavel Izmailov
7 months
📢 I am recruiting Ph.D. students for my new lab at @nyuniversity ! Please apply, if you want to work on understanding deep learning and large models, and do a Ph.D. in the most exciting city on earth. Details on my website: . Please spread the word!
30
185
913
0
1
21
@micahgoldblum
Micah Goldblum
8 months
@VarunChandrase3 Thanks! It signifies that I want a job 😎
0
0
20
@micahgoldblum
Micah Goldblum
1 year
I’ll be in Rwanda next week for #ICLR2023 . Shoot me a message if you want to say hi! 👋 And drop by any of my papers to chat:
1
0
21
@micahgoldblum
Micah Goldblum
2 years
We propose an algorithm for transfer learning even when the sets of features differ between upstream and downstream problems. Interestingly, MLPs sometimes transfer better than SOTA models. And current tabular SSL methods do NOT produce transferable representations. 3/4
1
0
20
@micahgoldblum
Micah Goldblum
2 years
What is the difference between academia in Europe and the US? Tenure process, teaching responsibilities, grant applications, etc.
5
1
19
@micahgoldblum
Micah Goldblum
2 years
The ML conference system is noisy yet efficient, much like SGD. Any reforms that dramatically increase the turnaround time to denoise the review process, a la GD, should be thrown out. Already, papers can be outdated by the time the conference happens. Let’s not make that worse.
5
1
20