Edward Z. Yang @ezyang Twitter profile

Edward Z. Yang

@ezyang

3 years

Do you want to work on PyTorch? The PyTorch team at Facebook is hiring! Remote in many locations is OK and most things we do are open source. Reach out to me in DMs if you're interested.

Edward Z. Yang

@ezyang

3 years

🚨🚨🚨 HEY EVERYONE I HAVE A PODCAST ABOUT PYTORCH INTERNALS DEVELOPMENT Two episodes public so far, three more recorded and unreleased. Also on Spotify, Apple and Google 🚨🚨🚨

PyTorch Developer Podcast

The PyTorch Developer Podcast is a place for the PyTorch dev team to do bite sized (10-20 min) topics about all sorts of internal development topics in PyTorch.

pytorch-dev-podcast.simplecast.com

Edward Z. Yang

@ezyang

2 years

I never thought I'd say this, but writing the first compiler for PyTorch in C++ instead of Python was such a big mistake 😂

Edward Z. Yang

@ezyang

3 years

Compile time just absolutely destroys casual contributions. "Oh, you have a free hour to write a fix? Well, spend it compiling the project first"

Edward Z. Yang

@ezyang

3 years

State of PyTorch core, September 2021 edition:

State of PyTorch core: September 2021 edition

There are a lot of projects currently going on in PyTorch core and it can be difficult to keep track of all of them or how they relate with each other....

dev-discuss.pytorch.org

Edward Z. Yang

@ezyang

7 years

Dr. Edward Z. Yang (March 16, 2017)

Edward Z. Yang

@ezyang

2 years

PyTorch Dev Podcast is coming back, first episode on Monday, with weekly release! I'll also be experimenting with screencasts (I've got one very boring one recorded, more planned); stay tuned on where to get them!

Edward Z. Yang

@ezyang

2 years

In case you haven't seen it, PyTorch Dev Podcast is back! Our first episode from the break is about Torch vs ATen APIs:

PyTorch Developer Podcast

PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How to tell them apart? What...

pytorch-dev-podcast.simplecast.com

Edward Z. Yang

@ezyang

5 years

Unannotated slides for my PyTorch Internals talk at the PyTorch NYC meetup yesterday are at (I'm also planning to write a longform version with text.)

Edward Z. Yang

@ezyang

5 years

Great blog post by @cwillycs about contributing to PyTorch for the very first time. Best quote: “I don’t have a PhD in ML… I can’t do this.” Ssshh, of course you can.

Committing to PyTorch by someone who doesn’t know a ton about PyTorch

Hello, friends. My name is Cami. Welcome to my mind dump.

medium.com

Edward Z. Yang

@ezyang

1 month

Programming with copilot is like having a gps that kind of works but is also constantly trying to drive you off a cliff

Edward Z. Yang

@ezyang

2 years

As a staff engineer, you are expected to solve impactful problems that no one else can solve. For problems that can be solved via CODING, are you more likely to be breaking down a technical barrier or an organizational barrier? (Choices described more downthread.)

Technical barriers: 352 votes

Organizational barriers: 977 votes

Edward Z. Yang

@ezyang

2 years

Automatic code formatting is how we have chosen to incrementally solve the problem of why can't we store source code in a more structured representation

Edward Z. Yang

@ezyang

2 years

Assuming I don't mess it up, I'll be streaming my undergrad NYU class on programming languages at The first class is mostly logics and then a trip through JavaScript history, as a way to frame many of the things that will be discussed in the class

edwardzyang - Twitch

I work on PyTorch at Meta; I also teach a programming languages class at NYU.

www.twitch.tv

Edward Z. Yang

@ezyang

4 years

There’s premature optimization, and then there’s not doing dumb shit

Edward Z. Yang

@ezyang

1 year

ok so this just paid back my $20. so fucking worth it

Edward Z. Yang

@ezyang

6 years

Modularity is the ability to answer no to the question: "Do I have to reread every line of source code when I make this change?"

Edward Z. Yang

@ezyang

5 months

It is with deepest regret that the performance "optimization" I spent half an hour working on... makes the end to end test suite run 5% slower

Edward Z. Yang

@ezyang

7 years

After being coddled with Haskell's excellent multithreading support, working with multiprocessing in Python is positively rage inducing

Edward Z. Yang

@ezyang

1 year

Me code reviewing a 1.2k line PR

Edward Z. Yang

@ezyang

6 years

The reason to use types is not that it will stop bugs from people who know what they are doing (though it very well may), but it will stop bugs from people who have no idea what they are doing (and are just trying to get the fucking thing to work)

Edward Z. Yang

@ezyang

2 years

Upon reflection, it is shocking that no one in my work sphere figured out how to do collaborative whiteboarding over the entire pandemic (this based on observation that no one is whiteboarding in any meeting I attended)

Edward Z. Yang

@ezyang

2 months

TL: "Let's build an autograd engine, how hard could it be." Two years later: "I REGRET EVERYTHING"

Edward Z. Yang

@ezyang

8 years

As promised: What Template Haskell gets wrong and Racket gets right

Edward Z. Yang

@ezyang

1 year

My read on Mojo is it's what you would do if Swift for Tensorflow failed and you were like "why did it fail" and concluded it's because no one likes Swift, so instead you do Python, and also MLIR is cool so backend to MLIR directly instead of TF

Edward Z. Yang

@ezyang

6 years

hasktorch: tensors and neural networks with Haskell and Backpack! How cool is that?!

GitHub - hasktorch/hasktorch: Tensors and neural networks in Haskell

Tensors and neural networks in Haskell. Contribute to hasktorch/hasktorch development by creating an account on GitHub.

github.com

Edward Z. Yang

@ezyang

4 years

Engineers think we can solve anything because we're very good at deciding not to work on unsolvable problems

Edward Z. Yang

@ezyang

7 years

Since several people had forwarded Rich Hickey's latest talk to me, I decided to write a blog post about it

Edward Z. Yang

@ezyang

3 years

High abstraction often results in a loss of tactile feel for low level characteristics; e.g., GC ~> poor understanding of how much memory you're allocating; templates ~> poor understanding of how much code you're compiling. Can we restore feedback w/o losing abstraction?

Edward Z. Yang

@ezyang

2 years

One way to tell you're working with a really well written codebase: when things go wrong, you find there are well oiled diagnostic tools in just the right places to help you figure things out. Could be as simple as useful logging, or as complex as automated repro generation

Edward Z. Yang

@ezyang

2 months

Podcast drop: CUDA graph trees

PyTorch Developer Podcast

CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode="reduce-overhead". Their primary innovation is that they allow the reuse of memory across multiple CUDA...

pytorch-dev-podcast.simplecast.com

Edward Z. Yang

@ezyang

1 year

Finished debugging a crazy bug related to CPython eval frame minutiae. So many fucking twists. Read the full story at

Debugging story: The case of the flaky Dynamo export tests

Keep asking why. For months, Dynamo export tests have been intermittently failing with “AssertionError: whole graph export entails exactly one call”. Flaky tests that you can’t reproduce are pretty...

dev-discuss.pytorch.org

Edward Z. Yang

@ezyang

8 years

Backpack merged. You can try it out. New blog:

Edward Z. Yang

@ezyang

2 years

Tomorrow at 3PM EST I will be livestreaming a deep dive of torchdynamo on my Twitch channel (roughly two hours before my regular PL class). We will try to understand enough to fix

Side effect to OrderedDict doesn't allow resumption · Issue #131 · pytorch/torchdynamo

from typing import List import torch import torchdynamo import collections torchdynamo.config.debug = True def toy_example(a, b, d): x = a + b d[3] = 1 return x * b d = collections.OrderedDict() wi...

github.com

Edward Z. Yang

@ezyang

10 months

Is an interpreter always simpler to implement than a compiler? Discuss!

Edward Z. Yang

@ezyang

5 years

Unit tests are not a substitute for understanding

Edward Z. Yang

@ezyang

2 years

Recording for PyTorch Dev Podcast season 2 has begun!!!

Edward Z. Yang

@ezyang

6 years

I made a thing: Convolution Visualizer (basically, it's but you can twiddle the parameters as you like)
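
The parameters such a visualizer lets you twiddle (kernel size, stride, padding, dilation) jointly determine which inputs each output reads and how long the output is. Below is a minimal sketch, assuming a 1D, single-channel convolution computed as cross-correlation, which is the convention deep learning frameworks use. The names conv_output_size and conv1d are hypothetical and are not the visualizer's code, although the size formula matches the one documented for PyTorch's Conv1d.

```python
from typing import List


def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1D convolution over an input of length n."""
    effective_k = dilation * (k - 1) + 1  # span covered by the dilated kernel
    return (n + 2 * padding - effective_k) // stride + 1


def conv1d(x: List[float], w: List[float], stride: int = 1, padding: int = 0,
           dilation: int = 1) -> List[float]:
    """Naive 1D cross-correlation with zero padding, stride and dilation."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    out_len = conv_output_size(len(x), len(w), stride, padding, dilation)
    out = []
    for i in range(out_len):
        start = i * stride  # left edge of the i-th window in the padded input
        # Kernel tap j reads the input `dilation` positions after tap j - 1.
        out.append(sum(w[j] * padded[start + j * dilation] for j in range(len(w))))
    return out
```

For example, `conv1d([1, 2, 3, 4, 5], [1, 0, -1])` slides a difference kernel across the input, while passing `dilation=2` spreads the same taps further apart without adding weights.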

Edward Z. Yang

@ezyang

12 days

I wonder if naively written Rust code can be slower than naively written code in a compiled GC'ed language, simply because a garbage collector beats out clone()'ing everywhere just to shut up the borrow checker

Edward Z. Yang

@ezyang

3 years

RIP my inbox 😂

Edward Z. Yang

@ezyang

2 years

come for the functional programming and borrow checking, stay for the package manager

Edward Z. Yang

@ezyang

3 years

I've always thought Skip was pretty cool and I have wondered how their incremental computation engine was implemented internally to get such good performance. Thread.

Edward Z. Yang

@ezyang

27 days

It is pretty frustrating that I am basically working on an optimizing compiler for a dependently typed language (PT2 + dynamic shapes + data dependent shapes) but it is so different from the literature that I don't know if any of it is relevant to our situation

Edward Z. Yang

@ezyang

3 years

Rewrote my Python program in the same style I'd write Haskell code and now it's slow and I'm like

Edward Z. Yang

@ezyang

2 years

You know how some PL courses organize themselves by making little languages and successively extending them? Should do the same thing but with emphasis on some ancillary feature X, e.g., debugging, permitting BC-breaking changes in the ecosystem, modularity, separate compilation...

Edward Z. Yang

@ezyang

7 years

Reverse mode automatic differentiation in a single picture.
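
The picture itself did not survive. In its place, here is a minimal sketch of the same idea in code, assuming scalar values and only `+` and `*`. The forward pass records each operation's inputs together with its local derivatives, and the backward pass walks that graph in reverse topological order, accumulating gradients with the chain rule. The Var class is hypothetical and is not PyTorch's autograd.

```python
from typing import List, Tuple


class Var:
    """A scalar that remembers how it was computed (a node in the graph)."""

    def __init__(self, value: float, parents: Tuple[Tuple["Var", float], ...] = ()):
        self.value = value
        self.parents = parents  # (input node, d(self)/d(input)) pairs
        self.grad = 0.0

    def __add__(self, other: "Var") -> "Var":
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other: "Var") -> "Var":
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self) -> None:
        # Topologically sort the graph so that each node is processed only after
        # every node that consumed it has pushed its gradient contribution.
        order: List["Var"] = []
        seen = set()

        def visit(node: "Var") -> None:
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += local_grad * node.grad  # chain rule
```

With `x = Var(3.0)`, `y = Var(4.0)`, and `z = x * y + x`, calling `z.backward()` sets `x.grad` to y + 1 and `y.grad` to x. Gradients accumulate with `+=` because `x` is used twice. The recursive traversal keeps the sketch short, but very deep graphs would need an explicit stack instead.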

Edward Z. Yang

@ezyang

2 years

The problem with strong Python typing is that it is essentially a different language that no one knows how to write

Edward Z. Yang

@ezyang

8 years

Blog post: The convergence of compilers, build systems and package managers

Edward Z. Yang

@ezyang

6 years

Happy Birthday Simon!

Edward Z. Yang

@ezyang

2 years

My cancelable take is that it is possible to write good software in C++

Edward Z. Yang

@ezyang

1 year

First class dims are criminally underused. Here is an extremely straightforward implementation of roi_align, in pure Python, with NO explicit loops:

A pure Python implementation of roi_align that looks just like its CUDA kernel

vision_maskrcnn has been failing in the PT2 benchmark suite for as long as I can remember, and part of the reason is that when we run it twice in eager mode, it gives different results. It turned out...

dev-discuss.pytorch.org

Edward Z. Yang

@ezyang

2 years

Join me and @_seemethere to talk about PyTorch's new CI system on GitHub Actions

PyTorch Developer Podcast

PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk...

pytorch-dev-podcast.simplecast.com

Edward Z. Yang

@ezyang

3 years

python changed their tracebacks from reporting line number of the end of the statement to the beginning of statement and this broke my code and surely I live in the dumbest timeline

Edward Z. Yang

@ezyang

5 years

Annotated slides in essay form of my "PyTorch Internals" talk are now up at

Edward Z. Yang

@ezyang

8 years

If I'm not mistaken, my Backpack implementations in my GHC/Cabal/cabal-install branches are feature complete. Refactor-merge-time!!

Edward Z. Yang

@ezyang

2 years

Which is more difficult: writing an ahead-of-time (symbolic) or runtime (eager) reverse mode automatic differentiation implementation? For the PyTorch team, you can observe that we struggled with AoT more. But it is difficult to suss out exactly why this is the case. 🧵

Edward Z. Yang

@ezyang

8 years

I guess I should write a blog post "What Template Haskell gets wrong and Racket gets right"

Edward Z. Yang

@ezyang

7 years

This thesis should be on the airwaves

Edward Z. Yang

@ezyang

8 years

"Dynamic Witnesses for Static Type Errors" (Seidel, Jhala, Weimer) -- someone did this finally! Hooray!

Edward Z. Yang

@ezyang

3 years

Compiler Twitter: If we decided to make a formal semantics of PyTorch operators (add, conv, etc), what tool should we use to write it? Are we forced to build our own?

Edward Z. Yang

@ezyang

3 months

tfw you overflowed the 16-bit integer because your training job has 70k nodes
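
Here is the arithmetic behind the joke. A signed 16-bit integer tops out at 32,767, or 65,535 if unsigned, so a count of 70,000 does not fit in either. The sketch below uses Python's ctypes, which silently truncates on overflow rather than raising an error. The node count is simply the figure from the tweet.

```python
import ctypes

NODES = 70_000  # figure from the tweet

INT16_MAX = 2**15 - 1   # 32767
UINT16_MAX = 2**16 - 1  # 65535

# ctypes does no overflow checking: the value is reduced modulo 2**16.
as_int16 = ctypes.c_int16(NODES).value
as_uint16 = ctypes.c_uint16(NODES).value

print(NODES > INT16_MAX, NODES > UINT16_MAX)  # True True
print(as_int16, as_uint16)                     # 4464 4464
```

Because 70,000 mod 65,536 = 4,464, the job would silently behave as if it had about 4.5k nodes. Nothing errors out; the value is just wrong.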

Edward Z. Yang

@ezyang

8 years

The transition from "too embarrassed to ask the question" to "horrifying realization that no one else knows the answer either"

Edward Z. Yang

@ezyang

2 months

Podcast drop: AOTInductor

PyTorch Developer Podcast

AOTInductor is a feature in PyTorch that lets you export an inference model into a self-contained dynamic library, which can subsequently be loaded and used to run optimized inference. It is aimed...

pytorch-dev-podcast.simplecast.com

Edward Z. Yang

@ezyang

8 years

With any luck, Backpack will merge tomorrow. Just need to get all the CI passing.

Edward Z. Yang

@ezyang

4 years

Algebraic effects in web assembly! So exciting

Typed continuations to model stacks · Issue #1359 · WebAssembly/design

Motivation Wasm currently lacks any support for multiple stacks of execution and switching between them. That prevents it from efficiently representing most control flow abstractions that appear in...

github.com

Edward Z. Yang

@ezyang

3 years

Monoids are so fucking great for configuration
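
A concrete reading of why this works: configuration layers with an associative merge and an empty config as the identity form a monoid. As a result, defaults, config files, and command-line overrides can be combined with a fold, and they can be pre-combined in any grouping without changing the result. The sketch below assumes flat dicts with dotted keys and right-biased overrides; merge and EMPTY are hypothetical names.

```python
from functools import reduce
from typing import Any, Dict

# Assumption: configs are flat dicts with dotted keys such as "train.lr".
Config = Dict[str, Any]
EMPTY: Config = {}  # identity element: merge(EMPTY, c) == merge(c, EMPTY) == c


def merge(a: Config, b: Config) -> Config:
    """Right-biased union: keys in b override keys in a. This is associative."""
    return {**a, **b}


defaults = {"train.lr": 0.1, "train.epochs": 10, "seed": 0}
from_file = {"train.epochs": 20}
from_cli = {"train.lr": 0.01}

# Fold all layers, lowest priority first.
final = reduce(merge, [defaults, from_file, from_cli], EMPTY)
# final == {"train.lr": 0.01, "train.epochs": 20, "seed": 0}
```

Keys are flat by design. A naive recursive merge of nested dicts can stop being associative when a key holds a dict in one layer and a scalar in another.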

Edward Z. Yang

@ezyang

8 years

Backpack 2/2 merged

Edward Z. Yang

@ezyang

2 years

Interview question: implement a basic calculator that evaluates expressions, then add a debugger for it
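
Here is a minimal sketch of one possible answer. It assumes non-negative integer literals, `+ - * /`, and parentheses, with no unary minus and no error handling. The "debugger" is just a hook called before each subexpression is evaluated, which is enough to build tracing or breakpoints on top of. All names (tokenize, parse, evaluate, on_step) are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple, Union

Expr = Union[int, Tuple[str, "Expr", "Expr"]]  # a literal or (op, left, right)
Hook = Callable[[Expr, int], None]             # called with (subexpression, depth)


def tokenize(src: str) -> List[str]:
    return re.findall(r"\d+|[-+*/()]", src)  # whitespace is skipped implicitly


def parse(src: str) -> Expr:
    tokens = tokenize(src)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom() -> Expr:
        tok = take()
        if tok == "(":
            e = expr()
            take()  # consume ")"
            return e
        return int(tok)

    def term() -> Expr:  # * and / bind tighter than + and -
        e = atom()
        while peek() in ("*", "/"):
            e = (take(), e, atom())
        return e

    def expr() -> Expr:  # + and -, left-associative
        e = term()
        while peek() in ("+", "-"):
            e = (take(), e, term())
        return e

    return expr()


def evaluate(e: Expr, on_step: Optional[Hook] = None, depth: int = 0) -> int:
    if on_step is not None:
        on_step(e, depth)  # debugger hook: inspect, log, or pause here
    if isinstance(e, int):
        return e
    op, lhs, rhs = e
    a = evaluate(lhs, on_step, depth + 1)
    b = evaluate(rhs, on_step, depth + 1)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a // b  # floor division, to stay in integers
```

For example, `evaluate(parse("2 * (3 + 4) - 5"), on_step=lambda e, d: print("  " * d, e))` prints an indented trace of every subexpression before returning its value. The interesting follow-up questions in an interview would be how to add stepping, breakpoints, or source positions to that hook.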

Edward Z. Yang

@ezyang

11 months

Every positive float8 E4M3 number. First row is subnormals, last row normally would be NaNs in IEEE but they robbed it for more dynamic range. No infinities.
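
The table can be regenerated with a short script. Below is a minimal sketch, assuming the E4M3 "FN" variant (as in torch.float8_e4m3fn): 1 sign bit, 4 exponent bits with bias 7, and 3 mantissa bits. In this variant, exponent field 0 encodes subnormals, and only the all-ones pattern S.1111.111 is NaN, so the top exponent row still holds ordinary finite values. The helper names are hypothetical.

```python
from typing import List

EXP_BITS, MAN_BITS, BIAS = 4, 3, 7


def e4m3_value(bits: int) -> float:
    """Decode a 7-bit E4M3 pattern (sign bit omitted, so the result is >= 0)."""
    exp = bits >> MAN_BITS
    man = bits & ((1 << MAN_BITS) - 1)
    if exp == 0:  # subnormal: 0.mmm * 2^(1 - bias)
        return (man / 8) * 2.0 ** (1 - BIAS)
    return (1 + man / 8) * 2.0 ** (exp - BIAS)  # normal: 1.mmm * 2^(exp - bias)


def positive_e4m3() -> List[List[float]]:
    """One row per exponent field; the NaN pattern 1111.111 is skipped."""
    rows = []
    for exp in range(1 << EXP_BITS):
        row = [e4m3_value((exp << MAN_BITS) | man) for man in range(1 << MAN_BITS)
               if not (exp == 15 and man == 7)]
        rows.append(row)
    return rows


for row in positive_e4m3():
    print(" ".join(f"{v:g}" for v in row))
```

The first row starts at zero, and its smallest nonzero entry is 2^-9. The last row ends at 1.75 × 2^8 = 448. Under IEEE-style rules the whole last row would be reserved for infinities and NaNs, and the largest finite value would instead be 1.875 × 2^7 = 240.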

Edward Z. Yang

@ezyang

8 years

Put GraphViz on the list of "software everyone uses, and which really needs to be rewritten"

Edward Z. Yang

@ezyang

8 years

TYPES. They are magnificent. Types for the refactor god.

Edward Z. Yang

@ezyang

2 years

So I took a stab at outlining like, the first two lectures of "how to design PLs with debuggers" at but really what I found out is I know nothing about debuggers and I'm hoping maybe someone else can run with this (cc @cfbolz )

Designing a PL to support debuggers

docs.google.com

Edward Z. Yang

@ezyang

1 month

It's alive!

Edward Z. Yang

@ezyang

2 years

Organizational barriers are code that, if you put a gun to someone's head, they could write it. But for some reason, they don't. Maybe it doesn't look good on perf, or it's really boring to do, or there's too much. See also the "bulldozer method":

Dan Luu

@danluu

2 years

There are a lot of problems that seem like way too much brute force work to be feasible, until you start the work and realize that the velocity increase you get from the practice of doing the work means that the problem is straightforward and feasible.

Edward Z. Yang

@ezyang

4 years

How does one become a floating point expert? E.g., how does one get to the point where you can decide whether or not to replace cos(x) with sqrt(1-sin(x)^2) and you know it will be profitable AND numerically stable?
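
The numerical-stability half of this particular example can at least be probed experimentally. Mathematically, sqrt(1 - sin(x)^2) equals |cos(x)|, so the rewrite has the wrong sign wherever cos(x) < 0. In addition, near x = π/2, sin(x) rounds to (nearly) 1.0, and the subtraction cancels catastrophically. The small check below uses Python's standard math module; the helper names are hypothetical.

```python
import math


def rewritten_cos(x: float) -> float:
    """The candidate rewrite cos(x) -> sqrt(1 - sin(x)^2)."""
    return math.sqrt(1.0 - math.sin(x) ** 2)


def rel_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


# 1. Sign: sqrt never returns a negative number, but cos(2.0) is negative.
x = 2.0
print(math.cos(x), rewritten_cos(x))  # about -0.416 vs about 0.416

# 2. Cancellation: near pi/2, sin(y) rounds to (almost exactly) 1.0, so
#    1 - sin(y)^2 keeps essentially none of the true value's digits.
y = math.pi / 2 - 1e-10
print(rel_error(rewritten_cos(y), math.cos(y)))  # roughly 1 or larger

# 3. Away from those regions, the two expressions agree closely.
z = 0.5
print(rel_error(rewritten_cos(z), math.cos(z)))
```

Spot checks like these only reveal failures; they do not prove a rewrite safe. Whether the rewrite is profitable depends on the target's relative costs of cos, sin, and sqrt, which this check does not measure.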

Edward Z. Yang

@ezyang

4 years

Higher order reverse mode AD implementation in 35 lines of Python

Edward Z. Yang

@ezyang

7 years

Interested in Backpack? I've released a draft copy of my thesis here: Please let me know about anything unclear!

Releases · ezyang/thesis

Thesis. Contribute to ezyang/thesis development by creating an account on GitHub.

github.com

Edward Z. Yang

@ezyang

3 years

oh my god I think I responded to everybody

Edward Z. Yang

@ezyang

7 years

New to Haskell? Want to contribute to OS? We've been working on newcomer tasks in Cabal. Here's an example:

cabal init allows a license that cabal install does not understand · Issue #4496 · haskell/cabal

Summary. For example, a license containing a space: $ cabal init ... Please choose a license: ... 14) Other (specify) Your choice? [default: BSD3] 14 Please specify? hello world ... $ cabal install...

github.com

Edward Z. Yang

@ezyang

3 months

streaming!!! let's implement a feature in pytorch

edwardzyang - Twitch

I work on PyTorch at Meta; I also teach a programming languages class at NYU.

www.twitch.tv

Edward Z. Yang

@ezyang

1 year

Every year I forget why algebraic effects are algebraic and then I have to go reread Bauer's paper on the subject but this year it doesn't feel like it is sticking lol

Edward Z. Yang

@ezyang

3 months

Podcast drop: Tensor subclasses and PT2

PyTorch Developer Podcast

Tensor subclasses allow you to extend PyTorch with new types of tensors without having to write any C++. They have been used to implement DTensor, FP8, Nested Jagged Tensor and Complex Tensor....

pytorch-dev-podcast.simplecast.com

Edward Z. Yang

@ezyang

3 years

I suppose the moral of the story is that it doesn't matter if your thing is better, if it's not staffed and people don't understand how to adapt it to solve new problems it will get replaced

Edward Z. Yang

@ezyang

5 months

the real open source power move is ignoring all your bug reports for two years and then fixing half of them in an afternoon

Edward Z. Yang

@ezyang

2 years

@yminsky Picked C++ because we could get static types, but it turns out writing an FP style compiler in C++ is awful (we ended up doing an LLVM style IR), the types aren't even that good so we ended up having to introduce dynamic types anyway for IR nodes

Edward Z. Yang

@ezyang

1 year

Using chatgpt to code is very, very similar to an extremely fast but extremely inexperienced junior engineer. Need lots of code review...

Edward Z. Yang

@ezyang

3 years

doc fix in 2015 🤣

Edward Z. Yang

@ezyang

2 years

That new language smell when all the ecosystem libraries are less than five years old

Edward Z. Yang

@ezyang

3 years

What's the current state of PL research for incremental programming?
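
For readers new to the area, most incremental systems share a common core mechanism: dependency tracking plus memoization. Each derived value records which inputs it read, and when an input changes, only the values that depend on it are recomputed. Below is a minimal pull-based sketch. Input, Derived, and the recompute counter are hypothetical, and the sketch does not describe any particular system's engine.

```python
from typing import Any, Callable, Dict, List

_tracking: List[Dict[int, Any]] = []  # stack of dependency sets being recorded


class Input:
    """A mutable source cell. Its version bumps on every real change."""

    def __init__(self, value: Any):
        self.value = value
        self.version = 0

    def set(self, value: Any) -> None:
        if value != self.value:
            self.value = value
            self.version += 1

    def get(self) -> Any:
        if _tracking:
            _tracking[-1][id(self)] = (self, self.version)  # record the read
        return self.value


class Derived:
    """A memoized computation that re-runs only if an input it read changed."""

    def __init__(self, fn: Callable[[], Any]):
        self.fn = fn
        self.deps: Dict[int, Any] = {}  # id -> (input cell, version seen)
        self.value: Any = None
        self.computed = False
        self.recomputes = 0  # for demonstration only

    def _stale(self) -> bool:
        return not self.computed or any(
            cell.version != seen for cell, seen in self.deps.values())

    def get(self) -> Any:
        if self._stale():
            _tracking.append({})
            try:
                self.value = self.fn()
            finally:
                self.deps = _tracking.pop()
            self.computed = True
            self.recomputes += 1
        if _tracking:  # an enclosing Derived inherits our input dependencies
            _tracking[-1].update(self.deps)
        return self.value
```

For example, `total = Derived(lambda: a.get() + b.get())` recomputes after `a.set(...)` but not after changes to an unrelated input. Real systems add refinements this sketch lacks, such as "early cutoff," where an unchanged intermediate result stops recomputation from propagating further.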

Edward Z. Yang

@ezyang

3 years

making code run fast is very easy just don't do dumb shit what's the problem

Edward Z. Yang

@ezyang

5 years

That moment when you realize that the "official" library is written by someone who doesn't know what they are doing

Edward Z. Yang

@ezyang

1 year

Severely tempted to do an effect handlers and OO design patterns mash up blog post again
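
Python has no effect handlers, but generators can imitate one-shot handlers. The computation yields an effect request, and a handler decides how to answer it before resuming the computation. From the OO side, swapping handlers looks a lot like the strategy or dependency-injection pattern. Below is a minimal sketch; Ask, Log, run, and the handler functions are hypothetical. Generators cannot be resumed more than once, so multi-shot handlers are out of reach with this encoding.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generator, List


@dataclass
class Ask:
    """Effect: request a configuration value by name."""
    key: str


@dataclass
class Log:
    """Effect: emit a message."""
    msg: str


Handler = Callable[[Any], Any]


def run(gen: Generator[Any, Any, Any], handle: Handler) -> Any:
    """Drive a generator, answering each yielded effect with `handle`."""
    try:
        effect = next(gen)
        while True:
            effect = gen.send(handle(effect))  # resume with the handler's answer
    except StopIteration as stop:
        return stop.value


def greet() -> Generator[Any, Any, str]:
    name = yield Ask("name")
    yield Log(f"greeting {name}")
    return f"hello, {name}"


def production(effect: Any) -> Any:
    if isinstance(effect, Ask):
        return "world"
    if isinstance(effect, Log):
        return None  # discard logs
    raise NotImplementedError(effect)


def make_test_handler(answers: Dict[str, Any], log: List[str]) -> Handler:
    def handle(effect: Any) -> Any:
        if isinstance(effect, Ask):
            return answers[effect.key]
        if isinstance(effect, Log):
            log.append(effect.msg)  # capture logs for assertions
            return None
        raise NotImplementedError(effect)
    return handle
```

The same `greet` runs unchanged under `production` or under a test handler that records its logs. In OO terms, that is the kind of swap you would otherwise get by injecting a different collaborator object.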

Edward Z. Yang

@ezyang

6 years

A year into Backpack: Lots of things have happened in Backpack since I graduated my PhD! Here are some of the most interesting ones.

Edward Z. Yang

@ezyang

4 years

Writing a type checker in 30 lines of miniKanren is a great parlor trick. And then you have a program that can synthesize a term given a type!

Edward Z. Yang

@ezyang

1 year

Fridge thought: programming languages that were designed to harness and contain an "army of junior engineers" will be the best languages for LLMs to code in

Edward Z. Yang

@ezyang

2 years

@dogfishbar If I had my way, it would have been in Haskell. But it was impossible to justify (to myself, anyway) at the time

Edward Z. Yang

@ezyang

1 year

I wonder if OCaml adding effect handlers before the type system catches up will end up being seen as a good idea or a mistake

Edward Z. Yang

@ezyang

2 years

Vibing “program transformations by symbolically executing an interpreter” (send me papers!)
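
Here is a toy illustration of one reading of this idea, in the spirit of partial evaluation. An ordinary interpreter runs on values that may be symbolic, and whatever it cannot compute becomes residual code. In this example, the "transformation" is constant folding plus a few algebraic identities, and it falls out of the interpreter's primitive arithmetic. Sym, interp, and the tuple AST are all hypothetical.

```python
from typing import Dict, Tuple, Union

# AST: ints, variable names (str), or (op, lhs, rhs) tuples with op "+" or "*".
Ast = Union[int, str, Tuple[str, "Ast", "Ast"]]


class Sym:
    """A symbolic value: residual code (as a string) known only at runtime."""

    def __init__(self, code: str):
        self.code = code

    def __repr__(self) -> str:
        return self.code


Value = Union[int, Sym]


def add(a: Value, b: Value) -> Value:
    if isinstance(a, int) and isinstance(b, int):
        return a + b  # both known: compute now
    if a == 0:
        return b
    if b == 0:
        return a
    return Sym(f"({a} + {b})")  # otherwise emit residual code


def mul(a: Value, b: Value) -> Value:
    if isinstance(a, int) and isinstance(b, int):
        return a * b
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    return Sym(f"({a} * {b})")


def interp(e: Ast, env: Dict[str, Value]) -> Value:
    """A plain interpreter; it never checks whether values are symbolic."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    op, lhs, rhs = e
    a, b = interp(lhs, env), interp(rhs, env)
    return add(a, b) if op == "+" else mul(a, b)
```

Interpreting `k*x + c*x` with `k = 1`, `c = 0`, and a symbolic `x` yields the residual program `x`. The simplification was never written as a separate compiler pass; it emerged from running the interpreter on partially known inputs.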