Sander Dieleman Profile
Sander Dieleman

@sedielem

50,525 Followers
1,541 Following
74 Media
1,848 Statuses

Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).

London, England
Joined December 2014
Pinned Tweet
@sedielem
Sander Dieleman
2 months
New blog post! Some thoughts about diffusion distillation. Actually, quite a lot of thoughts 🤭 Please share your thoughts as well!
9
82
430
@sedielem
Sander Dieleman
2 years
A very common trick in neural net training, often omitted in papers: add a tiny number ε (e.g. 1e-10) to any quantity in a denominator or square root, so you don't divide by 0. My advice: always add ε! If it doesn't help, it won't hurt, and you might avoid a few NaN encounters👀
Tweet media one
30
187
2K
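A minimal sketch of the trick in the tweet above, assuming a NumPy setting; the constant 1e-10 and the helper function are illustrative, not taken from the screenshot.

```python
import numpy as np

EPS = 1e-10  # tiny constant; the exact value is a judgment call

def cosine_similarity(a, b, eps=EPS):
    # Adding eps to the denominator means an all-zero vector
    # yields a similarity of 0 instead of a NaN.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

print(cosine_similarity(np.zeros(3), np.ones(3)))  # 0.0, no NaN
```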
@sedielem
Sander Dieleman
1 year
Me: "NOOO, you can't just treat spectrograms as images, the frequency and time axes have completely different semantics, there is no locality in frequency and ..." These guys: "Stable diffusion go brrr"
@_akhaliq
AK
1 year
Riffusion, real-time music generation with stable diffusion @huggingface model: project page:
Tweet media one
64
628
3K
18
144
1K
@sedielem
Sander Dieleman
10 months
New blog post: perspectives on diffusion, or how diffusion models are autoencoders, deep latent variable models, score function predictors, reverse SDE solvers, flow-based models, RNNs, and autoregressive models, all at once!
16
203
885
@sedielem
Sander Dieleman
1 year
New blog post about diffusion language models: Diffusion models have completely taken over generative modelling of perceptual signals -- why is autoregression still the name of the game for language modelling? And can we do anything about that?
25
173
869
@sedielem
Sander Dieleman
6 years
Stacking WaveNet autoencoders on top of each other leads to raw audio models that can capture long-range structure in music. Check out our new paper: Listen to some minute-long piano music samples:
Tweet media one
Tweet media two
5
246
782
@sedielem
Sander Dieleman
1 year
First Riffusion, now this. Perhaps pixels are all you need🤔
@_akhaliq
AK
1 year
Image-and-Language Understanding from Pixels Only abs:
Tweet media one
12
223
873
14
87
607
@sedielem
Sander Dieleman
2 years
This paper is a goldmine for anyone training diffusion models, carefully picking apart theory and practice and showing which choices really matter. I was quite excited to see the authors of the StyleGAN series of papers tackle this topic, and boy do they deliver!
Tweet media one
@_akhaliq
AK
2 years
Elucidating the Design Space of Diffusion-Based Generative Models abs: improve efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of an existing ImageNet-64 model from 2.07 to near-SOTA 1.55
Tweet media one
0
43
272
2
110
607
@sedielem
Sander Dieleman
3 months
This one's easy! That honour goes to "the diffusion bible", as I like to call it. It's been well over a year and I still refer to it several times a week. Very few papers I've read come close, in terms of signal-to-noise ratio.
Tweet media one
@sp_monte_carlo
Sam Power
3 months
what paper (not your own, maybe not even in your own area) can you not stop telling people about?
88
44
450
8
66
549
@sedielem
Sander Dieleman
6 years
"We conclude that the common association between sequence modeling and recurrent nets should be reconsidered, and convolutional nets should be regarded as a natural starting point for sequence modeling tasks." Great to see more work in this direction!
Tweet media one
8
197
528
@sedielem
Sander Dieleman
4 years
Very excited about the renewed focus on iterative refinement as a powerful tool for generative modelling! Here are a few relevant ICLR 2021 submissions: (image credit: ) (1/3)
5
108
520
@sedielem
Sander Dieleman
6 months
5-6 years ago I was working on music generation at DeepMind, but let me tell you, this is... something else. Incredibly excited to be able to finally share what our team has been working on!
@demishassabis
Demis Hassabis
6 months
Thrilled to share #Lyria , the world's most sophisticated AI music generation system. From just a text prompt Lyria produces compelling music & vocals. Also: building new Music AI tools for artists to amplify creativity in partnership w/YT & music industry
95
538
3K
18
38
495
@sedielem
Sander Dieleman
2 years
New blog post about the magic of diffusion guidance! Guidance powers the recent spectacular results in text-conditioned image generation (DALL·E 2, Imagen), so the time is right for a closer look at this simple, yet extremely effective technique.
10
97
452
@sedielem
Sander Dieleman
7 years
Harmonic networks ( @deworrall92 et al.) are fully rotation equivariant convnets. Very cool!
Tweet media one
Tweet media two
Tweet media three
4
167
395
@sedielem
Sander Dieleman
3 years
To synthesise realistic megapixel images, learn a high-level discrete representation with a conditional GAN, then train a transformer on top. Beautiful synergy between adversarial and likelihood-based learning! 🧵 (1/8)
@_akhaliq
AK
3 years
Taming Transformers for High-Resolution Image Synthesis pdf: abs: project page:
Tweet media one
7
101
490
4
84
389
@sedielem
Sander Dieleman
9 months
New blog post about the geometry of diffusion guidance: This complements my previous blog post on the topic of guidance, but it has a lot of diagrams which I was too lazy to draw back then! Guest-starring Bundle, the cutest bunny in ML 🐇
9
77
355
@sedielem
Sander Dieleman
1 year
New paper: continuous diffusion for categorical data. We train diffusion language models with cross-entropy, using score interpolation instead of score matching. The training distribution of noise levels is adapted on the fly with time warping. (1/3)
Tweet media one
Tweet media two
5
74
342
@sedielem
Sander Dieleman
5 years
Likelihood is a great loss fn, it's all about the space you measure it in! Our latest work on hierarchical AR image models (w/ @JeffreyDeFauw , Karen Simonyan): We generated 128x128 & 256x256 samples for all ImageNet classes: (1/2)
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
99
336
@sedielem
Sander Dieleman
4 years
New blog post: 'Generating music in the waveform domain' A comprehensive overview of the field and some personal thoughts, based on a tutorial I gave at @ismir2019 with @jordiponsdotme and Jongpil Lee back in November. Comments / feedback welcome!
Tweet media one
8
113
296
@sedielem
Sander Dieleman
5 years
I will be at #NeurIPS2018 to present our work on music generation in the raw audio domain, using a stack of WaveNet autoencoders. Poster #87 on Tuesday Dec 4th, 5PM-7PM! Paper: Samples:
Tweet media one
2
76
296
@sedielem
Sander Dieleman
7 years
I've been working on WaveNet autoencoders with @GoogleBrain Magenta. blog post: paper:
Tweet media one
6
99
285
@sedielem
Sander Dieleman
1 year
End of year shower thought: Before AlexNet, we used layer-wise pre-training to train neural nets with >2 layers -- backprop just couldn't hack it. Diffusion and autoregression are the new layer-wise pre-training: decompose generation into many steps, train one step at a time!
9
19
243
@sedielem
Sander Dieleman
1 year
Batch normalisation appears to be falling out of favour (probably for the best IMO, so many bugs end up being batchnorm bugs😬). One area where it persists is GAN discriminators (e.g. in StyleGAN-T and VQGAN). Are there any other settings where batchnorm is still hard to avoid?
19
19
240
@sedielem
Sander Dieleman
3 years
🆕Variable-rate discrete representation learning🆕 We learn slowly-varying discrete representations of speech signals, compress them with run-length encoding, and train transformers to model language in the speech domain 🗣️ 📜 🔊
Tweet media one
Tweet media two
1
54
230
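The compression step mentioned in the tweet above is ordinary run-length encoding; a minimal sketch on a toy token sequence (the (symbol, length) pair format is illustrative, not the paper's exact scheme):

```python
def run_length_encode(tokens):
    # Collapse runs of repeated discrete symbols into (symbol, run_length) pairs.
    encoded = []
    for t in tokens:
        if encoded and encoded[-1][0] == t:
            encoded[-1] = (t, encoded[-1][1] + 1)
        else:
            encoded.append((t, 1))
    return encoded

print(run_length_encode([7, 7, 7, 3, 3, 9]))  # [(7, 3), (3, 2), (9, 1)]
```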
@sedielem
Sander Dieleman
5 months
With all the recent work on distilling diffusion models into single-pass models, I've been thinking a lot about diffusion model training as solving a kind of optimal transport problem🚐 (1/6)
3
25
228
@sedielem
Sander Dieleman
5 months
Parameterising neural nets to predict logits and training them using the cross-entropy loss function is an extremely effective combination. This setup works for diffusion models as well, by using score interpolation instead of score matching! See (§3.1)
@hi_tysam
Fern
5 months
The more I work in ML the more I feel like nearly any loss objective can, and should, be rephrased as its cross-entropy-based analog.
6
6
75
3
19
216
@sedielem
Sander Dieleman
1 month
10 years ago to the day, I published my first ML-related blog post: My blogging has been very sporadic over the years, but sharing what I've learnt has been very rewarding, and probably a pretty good career move as well😁 I highly recommend it!
3
15
216
@sedielem
Sander Dieleman
1 year
Two neat papers about diffusion for high-res images without cascading. Similar observations:
- tuning the noise schedule is really important
- the bulk of computation can be done on a significantly more compact representation
Tweet media one
Tweet media two
2
28
214
@sedielem
Sander Dieleman
4 years
WaveGrad generates waveforms from spectrograms by iteratively following the log-likelihood gradient. The surprising thing is that it needs as little as 6 steps to produce good quality audio! Seems like the resurgence of score matching is in full swing :)
Tweet media one
@heiga_zen
Heiga Zen (全 炳河)
4 years
Yet another neural vocoder from my team mates in Google Brain is out! The new model, "WaveGrad", is not autoregressive/Flow/GAN. It is based on score matching / diffusion probabilistic models. Check it please!!
2
62
314
0
42
211
@sedielem
Sander Dieleman
7 years
Lots of interesting work on "fixing" GANs right now: [1/3]
Tweet media one
Tweet media two
Tweet media three
3
82
208
@sedielem
Sander Dieleman
2 years
@A_K_Nain I think this is exacerbated by the fact that there are multiple formalisms (e.g. VAE-style, score-based, SDE, ...) and everything has 2-3 different names, depending on who you ask! I strongly recommend @YSongStanford 's compendium (with Python notebooks!):
2
32
201
@sedielem
Sander Dieleman
8 months
This work shows scalar quantisation is competitive with VQ across a range of tasks, but simplifies things a lot: no codebook collapse, no EMA updates, ... because no codebook! I've been a fan of scalar quantisation for a while, see
@_akhaliq
AK
8 months
Finite Scalar Quantization: VQ-VAE Made Simple paper page: propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a…
Tweet media one
4
49
243
6
31
190
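A rough sketch of the scalar-quantisation idea referenced above; the tanh bounding and num_levels value are assumptions, and the actual FSQ method additionally uses a straight-through estimator and per-channel level counts.

```python
import numpy as np

def scalar_quantize(z, num_levels=7):
    # Squash each latent channel into a bounded range, then round to the
    # nearest integer, giving num_levels distinct values per channel (for
    # odd num_levels). There is no codebook to collapse and nothing to
    # update with EMA: the "codes" are just the fixed integer grid.
    bounded = np.tanh(z) * (num_levels - 1) / 2.0
    return np.round(bounded)
```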
@sedielem
Sander Dieleman
4 years
Good advice! For classification models, a scatter plot of the cross-entropy loss vs. prediction entropy (~confidence) for individual examples can be very revealing. More generally: study model behaviour for individual data points, don't look at aggregate statistics exclusively.
@karpathy
Andrej Karpathy
4 years
When you sort your dataset descending by loss you are guaranteed to find something unexpected, strange and helpful.
30
227
2K
1
36
189
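A sketch of the per-example diagnostic described in the tweet above, assuming NumPy arrays of logits and integer labels (the function name is illustrative):

```python
import numpy as np

def per_example_diagnostics(logits, labels):
    # logits: (N, C), labels: (N,) integer class indices.
    logits = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)
    loss = -log_probs[np.arange(len(labels)), labels]              # per-example cross-entropy
    entropy = -(probs * log_probs).sum(axis=1)                     # prediction entropy (~confidence)
    return loss, entropy

# loss, entropy = per_example_diagnostics(logits, labels)
# plt.scatter(entropy, loss, s=2)  # one point per example, not an aggregate statistic
```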
@sedielem
Sander Dieleman
6 years
Invertible neural networks are really cool! Check out this excellent blog post about a new paper where they are used to analyse inverse problems: paper: (1/4)
Tweet media one
Tweet media two
Tweet media three
6
59
186
@sedielem
Sander Dieleman
8 years
I've uploaded my PhD thesis "Learning feature hierarchies for musical audio signals", which I defended in January:
5
48
182
@sedielem
Sander Dieleman
6 months
At the end of the summer, I gave an invited talk at the @M2lSchool in Thessaloniki about training neural networks. It's a bit of a jumble of ideas, suggestions and best practices I've amassed over the years, interspersed with concrete examples.
4
28
183
@sedielem
Sander Dieleman
8 years
tl;dr: connect every CNN layer to every other layer. Simple but effective idea, well-written paper. Worth a read!
@brandondamos
Brandon Amos ✈️ ICLR
8 years
Densely Connected Convolutional Networks
Tweet media one
Tweet media two
2
66
95
3
63
178
@sedielem
Sander Dieleman
1 year
If diffusion model sampling tries your patience, check out consistency models: single-step sampling! No adversarial loss! In addition to being a very cool idea, this paper significantly leans on the formalism from Karras et al. 2022 AKA my favourite diffusion paper😁 Neat!
@_akhaliq
AK
1 year
Consistency Models achieve the new state-of-the-art FID of 3.55 on CIFAR10 and 6.20 on ImageNet 64×64 for one-step generation abs:
Tweet media one
8
98
437
0
33
175
@sedielem
Sander Dieleman
4 years
Neat idea: if you fit augmentation params with gradient descent (jointly with model params) using a prior that gently encourages more augmentation, they will naturally drift towards the maximal sensible values, which correspond to the degree of invariance exhibited by the data.
@andrewgwils
Andrew Gordon Wilson
4 years
Translation equivariance has imbued CNNs with powerful generalization abilities. Our #NeurIPS2020 paper shows how to *learn* symmetries -- rotations, translations, scalings, shears -- from training data alone! w/ @g_benton_ , @Pavel_Izmailov , @m_finzi . 1/9
6
90
409
0
31
175
@sedielem
Sander Dieleman
4 years
New blog post, in which I wax lyrical about typicality and the curse of dimensionality: I tweeted about this concept a while back, but it turns out I have more to say on the topic. It's a bit more speculative than what I usually write, hope you like it!
3
48
165
@sedielem
Sander Dieleman
3 months
I've got a blog post brewing... maybe even two blog posts! They are about diffusion models🙃
4
5
161
@sedielem
Sander Dieleman
5 months
10 years ago today: @avdnoord and I presenting our audio-based music recommendation demo at @NeurIPSConf 2013! We went on to intern at Spotify & Google Play Music the next summer (blog post: ), and by summer 2015, we had both joined @GoogleDeepMind .
Tweet media one
5
4
162
@sedielem
Sander Dieleman
7 years
The TF wrapper we use internally at DeepMind has been open sourced. Lasagne users might like this one, it shares a lot of design principles.
@GoogleDeepMind
Google DeepMind
7 years
Excited to release #Sonnet - a library for constructing complex Neural Network models in TensorFlow. Get started:
Tweet media one
4
473
714
1
79
160
@sedielem
Sander Dieleman
7 years
Google Assistant is now powered by WaveNet!
3
41
158
@sedielem
Sander Dieleman
4 years
Our latest work on GANs for text-to-speech, from characters/phonemes to waveforms with a single model. Learning varying alignment without teacher forcing is tricky, but we found dynamic time warping (DTW) to be very effective.
@GoogleDeepMind
Google DeepMind
4 years
In our new paper [] we propose EATS: End-to-End Adversarial Text-to-Speech, which allows for speech synthesis directly from text or phonemes without the need for multi-stage training pipelines or additional supervision. Audio:
Tweet media one
8
200
706
2
36
154
@sedielem
Sander Dieleman
4 years
A concept that really helped me to understand the behaviour of likelihood-based sequence models is "typicality": It was originally defined in an information-theoretic context, but it is equally relevant in machine learning. (1/4)
1
44
152
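A quick numerical illustration of the concept in the simplest possible setting, a high-dimensional standard Gaussian: samples concentrate on a thin shell far away from the mode.

```python
import numpy as np

d = 10_000                                   # dimensionality
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, d))           # 1000 samples from N(0, I_d)
norms = np.linalg.norm(x, axis=1)

print(norms.mean(), np.sqrt(d))              # ~100.0 vs 100.0: the typical radius
print(norms.min())                           # nowhere near 0, even though 0 is the density's peak
```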
@sedielem
Sander Dieleman
2 years
Diffusion models work by learning to invert a process that gradually destroys information, step-by-step. Adding Gaussian noise is only one way to construct such a process, here's another: running the heat equation across the spatial dimensions of the image gradually blurs it.
@arnosolin
Arno Solin
2 years
🔥 'Generative modelling with inverse heat dissipation' 🔥 \w Severi and @HeinonenMarkus . A model that learns to generate images by inverting a PDE that effectively 'blurs' an image and comes with appealing properties. 📄 🎬 [1/6]
12
87
451
3
16
154
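A minimal sketch of the degradation process described above: one explicit finite-difference step of the 2D heat equation, applied repeatedly to blur an image. The step size and the periodic boundary handling are assumptions, not the paper's exact setup.

```python
import numpy as np

def heat_step(img, alpha=0.2):
    # du/dt = laplacian(u), discretised with a 5-point stencil and periodic
    # boundaries. Repeated application gradually blurs the image, destroying
    # high-frequency detail first.
    lap = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
           np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)
    return img + alpha * lap

def dissipate(img, steps):
    for _ in range(steps):
        img = heat_step(img)
    return img
```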
@sedielem
Sander Dieleman
1 year
@BlackHC This is all you need to know
@videodrome
Robbie Barrat
6 years
I'm laughing so hard at this slide a friend sent me from one of Geoff Hinton's courses; "To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it."
Tweet media one
25
760
2K
3
4
146
@sedielem
Sander Dieleman
2 years
In addition to being autoencoders, diffusion models are also RNNs. I quite like the perspective of diffusion models as simply a way to train very deep generative nets with something that scales better than backpropagation. Oh and BTW diffusion models are also normalising flows🙃
@jxmnop
jack morris
2 years
Diffusion is just an easy-to-optimize way to give neural networks adaptive computation time. Makes sense then that diffusion models beat GANs, which only get one forward pass to generate an image. have to wonder what other ways there are to integrate for loops into NNs...
24
49
604
3
13
142
@sedielem
Sander Dieleman
10 months
We will be hosting the Machine Learning for Audio workshop🔊🎶 at #NeurIPS2023 in New Orleans in December! Submission deadline: September 29. Cool things are happening in this space🚀so join us if you can and spread the word! Speakers, schedule, etc.:
1
35
140
@sedielem
Sander Dieleman
4 years
Some thoughts about "Scaling Laws for Autoregressive Generative Modeling" by Henighan et al. (). It's a lot to take in, but highly recommended reading! (1/5)
Tweet media one
1
17
136
@sedielem
Sander Dieleman
5 months
I read the adversarial diffusion distillation paper recently ( it's neat, check it out!), and realised it's probably the first paper in many months that I've actually read all the way through! What should I be reading on the way to #NeurIPS2023 ? ✈️
5
13
138
@sedielem
Sander Dieleman
5 years
I will be at #ICLR2019 this week, find me if you want to talk about generative models and/or ML for audio/music 🎵. Also make sure to check out the poster and talk for MAESTRO on Tuesday at 10AM!
Tweet media one
1
33
137
@sedielem
Sander Dieleman
2 years
When working on WaveNet, we noticed there is a "critical model size" at which point it suddenly starts working well -- smaller models basically don't work at all. In retrospect, I suppose this is another instance of "sudden emergence". This probably applies to all AR models.
@AlexTamkin
Alex Tamkin 🦣
2 years
Why do certain capabilities seem to suddenly emerge in LLMs? One possibility: Even if your probability of predicting the next token correctly goes up gradually (x-axis), Your probability of getting a *multi-token* output correct can shoot up really quickly (y=x^k)
3
12
150
4
14
136
@sedielem
Sander Dieleman
7 years
PatternNet & PatternLRP: nice work from a former colleague on interpreting neural network classification decisions.
Tweet media one
Tweet media two
0
46
132
@sedielem
Sander Dieleman
3 years
JAX's clean, compact APIs and its powerful function transformations (𝚟𝚖𝚊𝚙 ALL the things!), combined with DeepMind's adoption of "incremental buy-in" as a philosophy underpinning our software infrastructure, have had a huge positive impact on my work.
@GoogleDeepMind
Google DeepMind
3 years
In a new blog post, @davidmbudden and @matteohessel discuss how JAX has helped accelerate our mission, and describe an ecosystem of open source libraries that have been developed to make JAX even better for machine learning researchers everywhere:
Tweet media one
4
118
567
1
16
132
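A tiny example of the "vmap ALL the things" style alluded to above; the linear model and names are illustrative, not DeepMind's internal code.

```python
import jax
import jax.numpy as jnp

def example_loss(params, x, y):
    # squared error of a linear model, written for a *single* example
    return (jnp.dot(params, x) - y) ** 2

# vmap turns the per-example function into a batched one,
# and the transformations compose freely with grad and jit:
batched_loss = jax.vmap(example_loss, in_axes=(None, 0, 0))
mean_loss = lambda params, xs, ys: jnp.mean(batched_loss(params, xs, ys))
grad_fn = jax.jit(jax.grad(mean_loss))
```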
@sedielem
Sander Dieleman
5 months
I've been fascinated by Aapo's work on score matching since back when I was doing my PhD. Of course back then, the best application we could think of was training restricted Boltzmann machines🙃 I always had a feeling we would see score matching resurface at some point!
@volokuleshov
Volodymyr Kuleshov 🇺🇦
5 months
It's crazy how many modern generative models are 15-year old Aapo Hyvarinen papers. Noise contrastive estimation => GANs Score matching => diffusion Ratio matching => discrete diffusion If I were a student today, I'd carefully read Aapo's papers, they’re a gold mine of ideas.
Tweet media one
10
117
1K
2
16
133
@sedielem
Sander Dieleman
5 years
More progress in flow-based models! tl;dr: use masking as in autoregressive flows to get triangular Jacobians (so they can be computed analytically, avoiding power series), but use fixed point iteration for fast inversion as in i-ResNet / residual flows.
@DrYangSong
Yang Song
5 years
Releasing our paper on MintNet! It's a new flow model built by replacing normal convolutions in ResNets with masked convolutions. It has exact likelihood, fast sampling with fixed-point iteration, and better performance than published results on MNIST, CIFAR-10 and small ImageNet
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
58
259
2
25
130
@sedielem
Sander Dieleman
3 years
Unsupervised speech recognition🤯 a conditional GAN learns to map pre-trained and segmented speech audio features to phoneme label sequences. It is trained only to produce realistic looking words and sentences -- no need for any labeled data. Amazed at how well this works!
@MichaelAuli
Michael Auli
3 years
Today we are announcing our work on building speech recognition models without any labeled data! wav2vec-U rivals some of the best supervised systems from only two years ago. Paper: Blog: Code:
15
314
1K
1
25
127
@sedielem
Sander Dieleman
3 years
This work has the promise of freeing us from the technical debt that normalisation layers bring with them. I've often wondered if the gains from BatchNorm are offset by the myriad of bugs it has the potential to introduce 🤔 Now we can get competitive results without it!
@ajmooch
Andy Brock
3 years
Normalizer-Free ResNets: Our ICLR2021 paper w/ @sohamde_ & @SamuelMLSmith We show how to train deep ResNets w/o *any* normalization to ImageNet test accuracies competitive with ResNets, and EfficientNets at a range of FLOP budgets, while training faster.
Tweet media one
Tweet media two
8
87
409
2
30
128
@sedielem
Sander Dieleman
2 months
The way overfitting is usually taught: you underfit for a while, then at some point, you start overfitting. This "phase transition" perspective can be misleading. As Alex points out, you can have both at the same time. It's probably more useful to think of it as a trade-off.
Tweet media one
@unixpickle
Alex Nichol
2 months
I'm surprised how few people realize it's possible to underfit and overfit at the same time.
2
2
60
5
7
127
@sedielem
Sander Dieleman
6 months
Fun thread about the magic of diffusion🙂 ...though I can't resist pointing out that this glosses over an important fact: 99% of bits in images are not perceptually relevant, diffusion is good at modelling the 1% that matter. Blog post with more details:
@quantian1
Quantian1
6 months
The fact that it is actually perfectly general, is nothing short of astounding. It’s like watching a street magician pull a handkerchief from his nose, and then for his next trick he astral projects you to the realm of Platonic forms.
5
42
846
1
21
127
@sedielem
Sander Dieleman
10 months
It's been a while, so I thought I'd write a quick blog post about some different perspectives on diffusion models this weekend, but it's already grown to 10 sections and shows no signs of abating. Short-form blogging just isn't my style, I suppose 🙃 Coming soon...ish!
5
4
119
@sedielem
Sander Dieleman
5 months
At the latent diffusion tutorial panel yesterday, I briefly mentioned the difficulties of training autoencoders on language data. Today at the poster session, I found this paper. Looks like they've figured out a way to make this work! (§4.1) #NeurIPS2023
1
9
116
@sedielem
Sander Dieleman
5 months
If you're at @NeurIPSConf , come check out our demo at the @GoogleDeepMind booth on Wednesday at noon, we've got some cool stuff to share! 🎶 #NeurIPS2023
Tweet media one
1
11
113
@sedielem
Sander Dieleman
8 years
Our paper about exploiting cyclic symmetry in convnets was accepted at ICML!
Tweet media one
1
42
111
@sedielem
Sander Dieleman
2 years
Amazing audio generation results using a two-level approach: a semantic (low-rate) and an acoustic (higher-rate) representation, learnt separately, are combined to hierarchically generate waveforms with long-range coherence. Very impressive speech and piano continuations!
@_akhaliq
AK
2 years
AudioLM: a Language Modeling Approach to Audio Generation abs: project page:
Tweet media one
4
72
372
2
16
109
@sedielem
Sander Dieleman
3 years
Flow-based models are usually less expressive because the Jacobian needs to be easily invertible. Keller et al. train both forward and inverse models, matching them using cycle-consistency. Then the Jacobian of one model can be used in the loss of the other, no inversion needed!
@t_andy_keller
Andy Keller
3 years
Excited to share my first paper! Self Normalizing Flows -- An efficient training method for unconstrained normalizing flows. Joint work w/ the ever supportive @jornpeters , @priyankjaini , @emiel_hoogeboom , Patrick Forré & @wellingmax 1/5
Tweet media one
10
54
428
2
20
109
@sedielem
Sander Dieleman
2 years
Neat idea: to apply diffusion models to discrete data, map the discrete symbols to binary patterns. This paper also contains a few tricks that have the potential to improve diffusion models across the board, most notably "self-conditioning". Worth a read!
@_akhaliq
AK
2 years
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning abs: first represent the discrete data as binary bits, and then train a continuous diffusion model to model these bits as real numbers called analog bits
Tweet media one
3
33
199
1
13
110
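A sketch of the discrete-to-continuous mapping described above, assuming NumPy; the bit ordering and the {-1, +1} scaling are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def to_analog_bits(tokens, num_bits):
    # Write each integer symbol as num_bits binary digits,
    # then treat them as real numbers in {-1.0, +1.0}.
    bits = (tokens[..., None] >> np.arange(num_bits)) & 1
    return bits.astype(np.float32) * 2.0 - 1.0

def from_analog_bits(analog):
    # Decode generated samples by thresholding each bit at zero.
    bits = (analog > 0).astype(np.int64)
    return (bits << np.arange(bits.shape[-1])).sum(axis=-1)

tokens = np.array([0, 5, 255])
assert np.all(from_analog_bits(to_analog_bits(tokens, 8)) == tokens)
```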
@sedielem
Sander Dieleman
2 years
"Negative results" appendices are great and I wish they were more common! Especially in empirically driven ML research, where the devil is so often in the details. That said: just because someone says something didn't pan out, doesn't necessarily mean you can't make it work🙃
@surajkothawade
suraj kothawade
2 years
I wish research papers had a section in the appendix titled "What did not work". Although the main paper should outline "what works", it's worth writing about the series of failed experiments.
41
229
2K
3
5
104
@sedielem
Sander Dieleman
4 years
Neat trick for faster (parallel) sampling from autoregressive models: treat it as solving a triangular system of nonlinear equations and use fixed point iteration, instead of sampling step by step.
@DrYangSong
Yang Song
4 years
Excited to share our paper on accelerating feedforward computations in ML — such as evaluating a DenseNet or sampling from autoregressive models — via parallel computing. Speedup factors are around 1.2–33 under various conditions and computation models.
5
131
628
1
18
103
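A rough sketch of the fixed-point view, for the simplest case of deterministic (greedy) decoding; true sampling additionally requires fixing the per-position noise up front. step_fn is a placeholder for one parallel forward pass of a causal model, not an API from the paper.

```python
import numpy as np

def parallel_decode(step_fn, x_init, max_iters):
    # View decoding as a triangular system x[i] = f(x[:i]) and solve it with
    # Jacobi-style fixed-point iteration: update every position in parallel,
    # repeat until nothing changes. Converges in at most len(x) iterations
    # (which is just sequential decoding), often in far fewer.
    x = np.asarray(x_init).copy()
    for _ in range(max_iters):
        x_new = step_fn(x)          # f applied to every prefix in one pass
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x
```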
@sedielem
Sander Dieleman
1 month
Text-to-music is having a moment👀 The team behind Udio are some of the brightest and most goal-driven people I've had the pleasure to work with, before they went on to found Uncharted Labs. Amazing to see the fruits of their labour out in the open!
@udiomusic
udio
1 month
Introducing Udio, an app for music creation and sharing that allows you to generate amazing music in your favorite styles with intuitive and powerful text-prompting. 1/11
861
1K
6K
0
8
101
@sedielem
Sander Dieleman
2 years
Diffusion models for language have mostly used discrete diffusion processes (e.g. D3PM, ARDM, SUNDAE, DiffusER, ...), but if you want to stick with continuous diffusion, you can simply embed token sequences first. This works pretty well as it turns out, even at scale!
@_akhaliq
AK
2 years
Self-conditioned Embedding Diffusion for Text Generation abs: propose SED, the first generally-capable continuous diffusion model for text generation
Tweet media one
1
43
188
4
19
102
@sedielem
Sander Dieleman
8 years
New ResNet results from He et al.: put ReLU/batchnorm before weight layers instead of after!
Tweet media one
0
59
98
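A sketch of the reordering in PyTorch: a minimal "pre-activation" block, not the paper's full architecture.

```python
import torch.nn as nn

class PreActBlock(nn.Module):
    # Pre-activation residual block: batchnorm and ReLU come *before* each
    # conv instead of after, leaving the skip path as a clean identity.
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return x + out
```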
@sedielem
Sander Dieleman
8 years
A human rendition of one of the #WaveNet piano samples, and some detailed analysis from Magenta:
Tweet media one
0
42
94
@sedielem
Sander Dieleman
6 years
Autoregressive models like PixelCNN don't necessarily have to be trained using maximum likelihood. Here's an interesting alternative from several of my colleagues!
@GoogleDeepMind
Google DeepMind
6 years
Autoregressive Quantile Networks for Generative Modeling:
1
76
257
1
22
94
@sedielem
Sander Dieleman
2 years
The point of doing this in a square root is slightly less obvious: the derivative of √x w.r.t. x is 1/(2√x). Don't ignore this one unless you like exploding gradients! My code always ends up thoroughly seasoned with εs🧂 Are there any other situations where adding ε is useful?
6
2
96
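The square-root case in numbers, evaluating the derivative stated in the tweet above (the ε value is illustrative):

```python
import numpy as np

def grad_sqrt(x, eps=0.0):
    # d/dx sqrt(x + eps) = 1 / (2 * sqrt(x + eps))
    return 1.0 / (2.0 * np.sqrt(x + eps))

print(grad_sqrt(1e-12))         # 500000.0: already enormous
print(grad_sqrt(0.0))           # inf: the exploding gradient
print(grad_sqrt(0.0, 1e-10))    # 50000.0: large but finite, bounded by 1/(2*sqrt(eps))
```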
@sedielem
Sander Dieleman
5 months
Diffusion circle at @NeurIPSConf : let's meet at 2:30pm on Thursday (tomorrow!) outside Hall E (Gate 10B) and then find a place to sit and have a chat. We'll do it old school and just sit on the floor somewhere. I'll share location updates live! Tell your friends! #NeurIPS2023 📢
4
12
94
@sedielem
Sander Dieleman
11 months
Interesting alternative derivation of diffusion models without differential equations or variational inference. Reminiscent of flow matching / rectified flow. Which of these perspectives is the simplest is subjective IMO, but more is better: new perspectives inspire new ideas!
@eric_heitz
Eric Heitz
11 months
When @_Laurent and I started learning about diffusion models, we were puzzled by the amount of jargon and concepts. So, we derived a model from scratch with our own graphics-people intuitions. Simple derivation, simple implementation, SOTA quality.
Tweet media one
33
355
2K
2
11
89
@sedielem
Sander Dieleman
10 months
📢 diffusion circle! 📢 As is becoming tradition, let's talk diffusion/iterative refinement at #ICML2023 ! Let's meet at registration 3:00pm on Thursday July 27 and find a spot to sit and chat. Please share and tag anyone at the conference who might be interested!
8
11
90
@sedielem
Sander Dieleman
8 years
Presenting our poster on cyclic symmetry in CNNs this afternoon at #ICML2016 ! (With @JeffreyDeFauw and @koraykv )
Tweet media one
1
39
91
@sedielem
Sander Dieleman
11 months
Neat idea: distill LLMs with reverse instead of forward KL, so the student overgeneralises less. A bit more involved, but it seems to pay off! Reminiscent of probability density distillation (), but this method works for categorical distributions.
@arankomatsuzaki
Aran Komatsuzaki
11 months
Knowledge Distillation of Large Language Models - Proposes MiniLLM that distills smaller language models from generative larger language models - Scalable for different model families with 120M to 13B parameters repo: abs:
Tweet media one
5
75
288
1
14
86
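A toy illustration of the forward/reverse distinction for categorical distributions; the probability vectors are made up for illustration and the loss here is not the paper's full objective.

```python
import numpy as np

def kl(p, q):
    # KL(p || q) for categorical distributions given as probability vectors.
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([0.70, 0.25, 0.04, 0.01])
student = np.array([0.60, 0.39, 0.005, 0.005])

# Forward KL(teacher || student) punishes the student for assigning little mass
# where the teacher has some, so the student tends to spread out (overgeneralise).
# Reverse KL(student || teacher) punishes the student for putting mass where the
# teacher has little, giving mode-seeking behaviour instead.
print(kl(teacher, student), kl(student, teacher))
```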
@sedielem
Sander Dieleman
5 months
Who's at @NeurIPSConf next week? I'll be on the panel for on Mon Dec 11, I'm co-organising on Sat Dec 16, and I'll be hanging out near the @GoogleDeepMind booth throughout the week. Keen to chat about generating stuff! #NeurIPS2023
10
4
88
@sedielem
Sander Dieleman
1 month
This blog post is an amazing exposition and analysis of consistency models, and how they relate to diffusion models, leading to several suggested improvements to the training procedure that look very promising. Definitely worth a read!
@ZhengyangGeng
Zhengyang Geng
1 month
🚀Our latest blog post unveils the power of Consistency Models and introduces Easy Consistency Tuning (ECT), a new way to fine-tune pretrained diffusion models to consistency models. SoTA fast generative models using 1/32 training cost! 🔽 Get ready to speed up your generative…
Tweet media one
7
47
143
1
12
88
@sedielem
Sander Dieleman
4 years
I believe I've only encountered self-similarity matrices in the context of music structure analysis until now. This is a really neat application of the idea: counting repetitions in videos.
@debidatta
Debidatta Dwibedi
4 years
Introducing RepNet, a model that counts repetitions in videos of *any* action w @yusufaytar , @JonathanTompson , @psermanet and Andrew Zisserman Paper: Project: Video: #CVPR2020 #computervision #deeplearning
16
125
485
4
16
85
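The core computation behind a self-similarity matrix is tiny; a sketch assuming a (time, feature) array of per-frame embeddings:

```python
import numpy as np

def self_similarity(features, eps=1e-10):
    # features: (T, D) per-frame embeddings. Normalise each frame, then take
    # all pairwise dot products; repeated segments show up as bright
    # off-diagonal stripes in the resulting (T, T) matrix.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    return f @ f.T
```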
@sedielem
Sander Dieleman
11 months
Making diffusion language models work as well as autoregressive ones will be a challenge (see my earlier blog post: ). This paper quantifies this and finds a 64x efficiency disadvantage across all scales 👀 a big gap, but at least it's a constant factor!
@__ishaan
Ishaan Gulrajani
11 months
New paper with @tatsu_hashimoto ! Likelihood-Based Diffusion Language Models: Likelihood-based training is a key ingredient of current LLMs. Despite this, diffusion LMs haven't shown any nontrivial likelihoods on standard LM benchmarks. We fix this!🧵
Tweet media one
8
37
254
4
18
87
@sedielem
Sander Dieleman
3 years
@DavidSKrueger Absolutely! Score-based / diffusion-based generative models are basically denoising autoencoders. Sure, they predict ε from x + ε instead of predicting x, but that's just a question of adding a residual connection🙂
2
3
87
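In the simple additive-noise setting the reply describes, the residual connection is literally one subtraction; scaled diffusion parameterisations also divide by the signal scale. model is a placeholder for a noise-predicting network.

```python
def denoise(model, x_noisy):
    # An eps-predicting network turned into an x-predicting one:
    # the clean estimate is the noisy input minus the predicted noise,
    # i.e. a residual connection around the network.
    eps_hat = model(x_noisy)
    return x_noisy - eps_hat
```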
@sedielem
Sander Dieleman
2 years
@Thom_Wolf Classifier-free guidance is a cheatcode that makes these models perform as if they had 10x the parameters. At least in terms of sample quality, and at the cost of diversity. All of the recent spectacular results rely heavily on this trick.
1
2
85
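A sketch of the guidance rule being referred to, assuming a noise-predicting model; model and its signature are placeholders.

```python
def guided_prediction(model, x_t, t, cond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional prediction
    # towards the conditional one. guidance_scale = 1 recovers the conditional
    # model; larger values trade diversity for sample quality.
    eps_uncond = model(x_t, t, cond=None)
    eps_cond = model(x_t, t, cond=cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```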
@sedielem
Sander Dieleman
2 years
This is neat: state-space models (S4-style) for raw audio! WaveNet's dilated convolutions are an elegant architectural prior for waveforms, which I'm still very fond of, but this clearly wins in terms of param efficiency. Includes a bidirectional extension to diffusion models🥳
@arankomatsuzaki
Aran Komatsuzaki
2 years
It's Raw! Audio Generation with State-Space Models Achieves SotA perf on autoregressive unconditional waveform generation. proj: repo: abs:
Tweet media one
1
23
161
3
8
85
@sedielem
Sander Dieleman
1 year
Who's coming to #NeurIPS2022 ? ICML was great, but attendance was, understandably, a tad sparse... I'm looking forward to (re)connecting with more people this time around! Keen to talk about generative models, iterative refinement, diffusion, that sort of thing🤓
8
1
84
@sedielem
Sander Dieleman
4 years
On Thursday Aug 20, I'm speaking about generating music in the waveform domain at the Vienna deep learning meetup! Virtually of course, from my desk in London 🙂 Sign up to attend: I'll cover most of plus some recent developments!
2
20
82
@sedielem
Sander Dieleman
1 year
Rumours of GANs' demise have been greatly exaggerated, part 2
@_akhaliq
AK
1 year
Scaling up GANs for Text-to-Image Synthesis present our 1B-parameter GigaGAN, achieving lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs at 0.13s, orders of magnitude faster than diffusion and autoregressive …
40
294
1K
3
5
83
@sedielem
Sander Dieleman
3 years
This work shows how you can sample from autoregressive models with Langevin dynamics, the same iterative refinement approach that powers sampling from score- & diffusion-based models! With this, you can use AR models like WaveNet for denoising, inpainting and source separation.
@vivjay30
Vivek Jayaram
3 years
Excited to share our new paper to appear at @icmlconf ! We show a new way to sample from an autoregressive model like Wavenet. Using Langevin sampling, we can solve many tasks like super-resolution, inpainting, or separation with the same network. Website:
3
35
162
0
7
81
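For reference, the generic (unadjusted) Langevin update the tweet alludes to, sketched with a placeholder score function; the paper applies this idea to autoregressive models rather than score networks.

```python
import numpy as np

def langevin_sample(score_fn, x, step_size, num_steps, rng):
    # Unadjusted Langevin dynamics: repeatedly nudge x along the gradient of
    # log p(x) (the "score"), plus Gaussian noise to keep exploring.
    for _ in range(num_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * noise
    return x
```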
@sedielem
Sander Dieleman
4 years
We've updated the EATS paper on arXiv: 'End-to-end' has many possible interpretations – Table 5 in the appendix (p. 21) describes some of the many ways in which the TTS pipeline has been factorised into stages in the literature, for easier comparison.
Tweet media one
@GoogleDeepMind
Google DeepMind
4 years
In our new paper [] we propose EATS: End-to-End Adversarial Text-to-Speech, which allows for speech synthesis directly from text or phonemes without the need for multi-stage training pipelines or additional supervision. Audio:
Tweet media one
8
200
706
2
25
80
@sedielem
Sander Dieleman
1 year
This is definitely a problem with AR waveform models, which produce very long sequences (~10^6 steps) and are prone to "going off the rails". It's clearly not been much of an issue with language models so far, but I suppose it could be in the long run! Diffusion it is, then?😁
@ylecun
Yann LeCun
1 year
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes. Here is the argument: Let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1-e)^n 1/
Tweet media one
220
541
3K
9
7
78
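Plugging numbers into the quoted (1-e)^n argument makes the sequence-length point concrete; e here is an illustrative per-step derailment probability, not a measured one.

```python
e = 1e-4                       # illustrative per-step probability of derailing
for n in (10**2, 10**4, 10**6):
    print(n, (1 - e) ** n)
# 100       -> ~0.99   (short LM outputs: barely affected)
# 10_000    -> ~0.37
# 1_000_000 -> ~4e-44  (waveform-length sequences: essentially guaranteed to derail)
```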
@sedielem
Sander Dieleman
2 years
Audio folk! We are hosting a workshop at ICML this year, and we're very keen to hear what you've been working on. Especially if it generates speech🗣️, music🎶, bird song🐦, rain sounds🌧️, traffic noise🚘, or anything in between! Submissions are due by May 11 (up to 4 pages).
@KulisBrian
Brian Kulis
2 years
Announcing the Workshop on Machine Learning for Audio Synthesis at #ICML2022 @icmlconf ! Paper submissions on all aspects of audio generation/synthesis using ML welcome. Webpage: Organizers: @sedielem , Yu Zhang, @rmanzelli , @saddlepoint18 , @KulisBrian
1
26
73
0
16
79
@sedielem
Sander Dieleman
3 years
I've previously discussed the importance of measuring likelihoods in the right space in a blog post () and on Twitter (e.g. ). (5/8)
@sedielem
Sander Dieleman
3 years
Measuring likelihoods in the right representation space is important, and we need prior knowledge to find that space. This work formalises this argument for anomaly detection, and also demonstrates that looking at typicality instead of density isn't enough for reliable detection.
0
13
53
1
8
78