@micahgoldblum
Micah Goldblum
2 years
One view of ML history is that we started out with MLPs and evolved towards more specialized architectures like CNNs for vision, LSTMs for sequences, etc. But actually, the exact opposite is true! 🚨🧵1/6
@micahgoldblum
Micah Goldblum
2 years
Earlier high-performance vision and language systems were highly specialized, like HOG features and Latent Dirichlet Allocation, whereas the tasks these tools once performed can now all be handled by transformers. 2/6
@micahgoldblum
Micah Goldblum
2 years
ML practitioners used to encode their beliefs about a problem, like invariances, into their architectures by hand. We show that transformers actually learn these same structures directly from the data! 3/6
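To make the contrast concrete, here is a minimal PyTorch sketch (my own illustration, not code from the paper): a convolution with circular padding is translation-equivariant by construction, while a generic linear layer over flattened pixels carries no such constraint and must learn any equivariance from data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 3, 32, 32)
shift = lambda t: torch.roll(t, shifts=(1, 1), dims=(-2, -1))

# Hand-coded inductive bias: a conv layer commutes with translation by design.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode="circular")
err_conv = (conv(shift(x)) - shift(conv(x))).abs().max()
print(f"conv equivariance error: {err_conv:.1e}")  # ~1e-7, i.e. float noise

# No built-in bias: a linear layer over flattened pixels gives no guarantee,
# so any translation equivariance it ends up with must come from the data.
mlp = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 3 * 32 * 32))
f = lambda t: mlp(t).reshape_as(t)
err_mlp = (f(shift(x)) - shift(f(x))).abs().max()
print(f"linear-layer equivariance error: {err_mlp:.1e}")  # large
```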
@micahgoldblum
Micah Goldblum
2 years
Not only do transformers learn symmetries, but they can actually be MORE equivariant than CNNs, which are designed specifically for translation equivariance. So what is next in the evolution of ML? One architecture to rule them all? 4/6
@micahgoldblum
Micah Goldblum
2 years
Check out our easy-to-use tool for measuring equivariance via the Lie derivative. It even allows for layer-wise analysis and scales gracefully across architectures and input sizes: 5/6
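As a rough illustration of what the metric computes, here is a hedged PyTorch sketch for the translation case, assuming an image classifier `model`. The function names and the central-difference approximation are my own assumptions, not the released tool's API; the actual tool is more general, supports layer-wise analysis, and uses autograd rather than finite differences.

```python
import torch
import torch.nn.functional as F

def translate(x, t):
    """Continuously translate a batch of images by t pixels along x (bilinear)."""
    n, _, h, w = x.shape
    theta = torch.tensor([[1.0, 0.0, 2.0 * t / w],
                          [0.0, 1.0, 0.0]], dtype=x.dtype, device=x.device)
    grid = F.affine_grid(theta.expand(n, 2, 3), list(x.shape), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)

@torch.no_grad()
def translation_lie_derivative(model, x, eps=1e-2):
    """Central-difference estimate of d/dt model(T_t x) at t = 0.
    For a classifier (the group acts trivially on the logits) this vanishes
    exactly when the model is invariant to infinitesimal translations."""
    return (model(translate(x, eps)) - model(translate(x, -eps))) / (2 * eps)

# Usage sketch: smaller norms mean the network is closer to translation-invariant.
# lie = translation_lie_derivative(model, images)
# print(lie.norm(dim=-1).mean().item())
```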
@micahgoldblum
Micah Goldblum
2 years
All credit goes to my awesome collaborators! @gruver_nate, @m_finzi, and @andrewgwils 6/6