This paper is incredible: looking at the heavy baggage picked up by applying the diffusion mindset to diffusion models, rolling up the sleeves, saying "math is math", and bringing it down to bare bones.
Papers like this should be revered by the community: they memeify knowledge.
When @_Laurent and I started learning about diffusion models, we were puzzled by the amount of jargon and concepts.
So, we derived a model from scratch with our own graphics-people intuitions. Simple derivation, simple implementation, SOTA quality.
Transformers are too far gone for a paper like this. They'll get a book chapter rephrasing their properties in the language of operators, then another with implementation tricks.
In the years before the NN resurgence, my mom had a theory that their failure was a people problem: ppl never bothered to model the problem, assuming a handwavy compute process would do it for them.
Many things have changed since then, but fundamentally it's just "compute got faster lol"