Edward Z. Yang

@ezyang

10,229
Followers
973
Following
69
Media
7,448
Statuses

I work on PyTorch at Meta. Chatty alt at @difficultyang. Mastodon: @ezyang@types.pl

Edison, NJ
Joined May 2008
@ezyang
Edward Z. Yang
3 years
Do you want to work on PyTorch? The PyTorch team at Facebook is hiring! Remote in many locations is OK and most things we do are open source. Reach out to me in DMs if you're interested.
33
215
1K
@ezyang
Edward Z. Yang
3 years
🚨🚨🚨 HEY EVERYONE I HAVE A PODCAST ABOUT PYTORCH INTERNALS DEVELOPMENT Two episodes public so far, three more recorded and unreleased. Also on Spotify, Apple and Google 🚨🚨🚨
11
99
534
@ezyang
Edward Z. Yang
2 years
I never thought I'd say this, but writing the first compiler for PyTorch in C++ instead of Python was such a big mistake 😂
21
24
415
@ezyang
Edward Z. Yang
3 years
Compile time just absolutely destroys casual contributions. "Oh, you have a free hour to write a fix? Well, spend it compiling the project first"
25
26
387
@ezyang
Edward Z. Yang
7 years
Dr. Edward Z. Yang (March 16, 2017)
32
7
255
@ezyang
Edward Z. Yang
2 years
PyTorch Dev Podcast is coming back, first episode on Monday, with weekly release! I'll also be experimenting with screencasts (I've got one very boring one recorded, more planned); stay tuned on where to get them!
3
18
210
@ezyang
Edward Z. Yang
5 years
Unannotated slides for my PyTorch Internals talk at the PyTorch NYC meetup yesterday are at (I'm also planning to write a longform version with text.)
6
45
194
@ezyang
Edward Z. Yang
5 years
Great blog post by @cwillycs about contributing to PyTorch for the very first time. Best quote: “I don’t have a PhD in ML… I can’t do this.” Ssshh, of course you can.
1
49
163
@ezyang
Edward Z. Yang
1 month
Programming with copilot is like having a gps that kind of works but is also constantly trying to drive you off a cliff
6
18
151
@ezyang
Edward Z. Yang
2 years
As a staff engineer, you are expected to solve impactful problems that no one else can solve. For problems that can be solved via CODING, are you more likely to be breaking down a technical barrier or an organizational barrier? (Choices described more downthread.)
Technical barriers
352
Organizational barriers
977
9
16
138
@ezyang
Edward Z. Yang
2 years
Automatic code formatting is how we have chosen to incrementally solve the problem of why can't we store source code in a more structured representation
10
13
133
@ezyang
Edward Z. Yang
2 years
Assuming I don't mess it up, I'll be streaming my undergrad NYU class on programming languages at The first class is mostly logics and then a trip through JavaScript history, as a way to frame many of the things that will be discussed in the class
6
19
118
@ezyang
Edward Z. Yang
4 years
There’s premature optimization, and then there’s not doing dumb shit
1
21
108
@ezyang
Edward Z. Yang
1 year
ok so this just paid back my $20. so fucking worth it
Tweet media one
6
2
105
@ezyang
Edward Z. Yang
6 years
Modularity is the ability to answer no to the question: "Do I have to reread every line of source code when I make this change?"
0
37
103
@ezyang
Edward Z. Yang
5 months
It is with deepest regret that the performance "optimization" I spent half an hour working on... makes the end to end test suite run 5% slower
6
1
100
@ezyang
Edward Z. Yang
7 years
After being coddled with Haskell's excellent multithreading support, working with multiprocessing in Python is positively rage inducing
1
36
102
@ezyang
Edward Z. Yang
1 year
Me code reviewing a 1.2k line PR
Tweet media one
3
2
102
@ezyang
Edward Z. Yang
6 years
The reason to use types is not that it will stop bugs from people who know what they are doing (though it very well may), but it will stop bugs from people who have no idea what they are doing (and are just trying to get the fucking thing to work)
7
25
98
@ezyang
Edward Z. Yang
2 years
Upon reflection, it is shocking that no one in my work sphere figured out how to do collaborative whiteboarding over the entire pandemic (this based on observation that no one is whiteboarding in any meeting I attended)
20
8
93
@ezyang
Edward Z. Yang
2 months
TL: "Let's build an autograd engine, how hard could it be." Two years later: "I REGRET EVERYTHING"
4
3
94
@ezyang
Edward Z. Yang
8 years
As promised: What Template Haskell gets wrong and Racket gets right
2
50
92
@ezyang
Edward Z. Yang
1 year
My read on Mojo is it's what you would do if Swift for Tensorflow failed and you were like "why did it fail" and concluded it's because no one likes Swift, so instead you do Python, and also MLIR is cool so backend to MLIR directly instead of TF
5
5
92
@ezyang
Edward Z. Yang
4 years
Engineers think we can solve anything because we're very good at deciding not to work on unsolvable problems
3
6
89
@ezyang
Edward Z. Yang
7 years
Since several people had forwarded Rich Hickey's latest talk to me, I decided to write a blog post about it
3
36
89
@ezyang
Edward Z. Yang
3 years
High abstraction often results in a loss of tactile feel for low level characteristics; e.g., GC ~> poor understanding of how much memory you're allocating; templates ~> poor understanding of how much code you're compiling. Can we restore feedback w/o losing abstraction?
12
7
86
@ezyang
Edward Z. Yang
2 years
One way to tell you're working with a really well written codebase: when things go wrong, you find there are well oiled diagnostic tools in just the right places to help you figure things out. Could be as simple as useful logging, or as complex as automated repro generation
4
11
86
@ezyang
Edward Z. Yang
8 years
Backpack merged. You can try it out. New blog:
2
54
81
@ezyang
Edward Z. Yang
10 months
Is an interpreter always simpler to implement than a compiler? Discuss!
24
5
73
@ezyang
Edward Z. Yang
5 years
Unit tests are not a substitute for understanding
2
20
74
@ezyang
Edward Z. Yang
2 years
Recording for PyTorch Dev Podcast season 2 has begun!!!
0
2
73
@ezyang
Edward Z. Yang
6 years
I made a thing: Convolution Visualizer (basically, it's but you can twiddle the parameters as you like)
4
22
69
@ezyang
Edward Z. Yang
12 days
I wonder if naively written Rust code can be slower than naively written code in compiled GC'ed language, simply because a garbage collector beats out clone()'ing everywhere just to shut up the borrow checker
13
4
68
@ezyang
Edward Z. Yang
3 years
RIP my inbox 😂
1
0
70
@ezyang
Edward Z. Yang
2 years
come for the functional programming and borrow checking, stay for the package manager
0
1
69
@ezyang
Edward Z. Yang
3 years
I've always thought Skip was pretty cool and I have wondered how their incremental computation engine was implemented internally to get such good performance. Thread.
2
10
67
@ezyang
Edward Z. Yang
27 days
It is pretty frustrating that I am basically working on an optimizing compiler for a dependently typed language (PT2 + dynamic shapes + data dependent shapes) but it is so different from the literature that I don't know if any of it is relevant to our situation
7
4
66
@ezyang
Edward Z. Yang
3 years
Rewrote my Python program in the same style I'd write Haskell code and now it's slow and I'm like
Tweet media one
4
1
65
@ezyang
Edward Z. Yang
2 years
You know how some PL courses organize themselves by making little languages and successively extending them? Should do the same thing but with emphasis on some ancillary feature X e.g., debugging, permitting BC breaking changes in the ecosystem, modularity, separate compilation..
4
7
66
@ezyang
Edward Z. Yang
7 years
Reverse mode automatic differentiation in a single picture.
Tweet media one
1
14
65
@ezyang
Edward Z. Yang
2 years
The problem with strong Python typing is that it is essentially a different language that no one knows how to write
11
3
65
@ezyang
Edward Z. Yang
8 years
Blog post: The convergence of compilers, build systems and package managers
3
38
65
@ezyang
Edward Z. Yang
6 years
Happy Birthday Simon!
0
33
61
@ezyang
Edward Z. Yang
2 years
My cancellable take is that it is possible to write good software in C++
5
2
62
@ezyang
Edward Z. Yang
3 years
python changed their tracebacks from reporting line number of the end of the statement to the beginning of statement and this broke my code and surely I live in the dumbest timeline
3
3
59
@ezyang
Edward Z. Yang
5 years
Annotated slides in essay form of my "PyTorch Internals" talk are now up at
4
11
57
@ezyang
Edward Z. Yang
8 years
If I'm not mistaken, my Backpack implementations in my GHC/Cabal/cabal-install branches are feature complete. Refactor-merge-time!!
2
23
55
@ezyang
Edward Z. Yang
2 years
Which is more difficult: writing an ahead-of-time (symbolic) or runtime (eager) reverse mode automatic differentiation implementation? For the PyTorch team, you can observe that we struggled with AoT more. But it is difficult to suss out exactly why this is the case. 🧵
3
3
56
@ezyang
Edward Z. Yang
8 years
I guess I should write a blog post "What Template Haskell gets wrong and Racket gets right"
6
9
55
@ezyang
Edward Z. Yang
7 years
This thesis should be on the airwaves
3
18
56
@ezyang
Edward Z. Yang
8 years
"Dynamic Witnesses for Static Type Errors" (Seidel, Jhala, Weimer) -- someone did this finally! Hooray!
1
22
54
@ezyang
Edward Z. Yang
3 years
Compiler Twitter: If we decided to make a formal semantics of PyTorch operators (add, conv, etc), what tool should we use to write it? Are we forced to build our own?
8
6
51
@ezyang
Edward Z. Yang
3 months
tfw you overflowed the 16-bit integer because your training job has 70k nodes
2
1
52
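The arithmetic behind the tweet above can be checked in a few lines. This is only a sketch: the tweet does not say whether the counter was signed or unsigned, so both limits are shown, and the wraparound uses `ctypes.c_uint16`, which truncates without raising. The name `NODES` is illustrative.

```python
import ctypes

NODES = 70_000  # node count quoted in the tweet

UINT16_MAX = 2**16 - 1  # 65535
INT16_MAX = 2**15 - 1   # 32767

# 70k exceeds both limits, so it cannot be stored in 16 bits either way.
fits_unsigned = NODES <= UINT16_MAX
fits_signed = NODES <= INT16_MAX

# ctypes integer types do no overflow checking: the value is reduced mod 2**16.
wrapped = ctypes.c_uint16(NODES).value

print(fits_unsigned, fits_signed, wrapped)
```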
@ezyang
Edward Z. Yang
8 years
The transition from "too embarrassed to ask the question" to "horrifying realization that no one else knows the answer either"
1
9
51
@ezyang
Edward Z. Yang
8 years
With any luck, Backpack will merge tomorrow. Just need to get all the CI passing.
4
12
50
@ezyang
Edward Z. Yang
3 years
Monoids are so fucking great for configuration
5
1
48
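One way to read the tweet above, as a minimal sketch: treat each configuration source as a value with an associative `combine` and an empty element, so any number of layers fold into one. The `Config` class, its fields, and the layer names are hypothetical; the merge rule (later non-`None` values win) is one assumed choice among many.

```python
from dataclasses import dataclass, fields
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Config:
    # None means "this layer has no opinion about the field".
    verbose: Optional[bool] = None
    jobs: Optional[int] = None
    target: Optional[str] = None

    def combine(self, other: "Config") -> "Config":
        # Right-biased merge: a field set in `other` overrides the one in `self`.
        merged = {}
        for f in fields(self):
            theirs = getattr(other, f.name)
            merged[f.name] = theirs if theirs is not None else getattr(self, f.name)
        return Config(**merged)


EMPTY = Config()  # identity element: combining with it changes nothing

defaults = Config(verbose=False, jobs=1)
config_file = Config(jobs=8, target="release")
command_line = Config(verbose=True)

# Associativity means the layers can be folded in one pass, in priority order.
final = reduce(Config.combine, [defaults, config_file, command_line], EMPTY)
print(final)
```

Because `combine` is associative and `EMPTY` is its identity, adding another layer (environment variables, per-project overrides) needs no special-case code.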
@ezyang
Edward Z. Yang
8 years
Backpack 2/2 merged
3
9
48
@ezyang
Edward Z. Yang
2 years
Interview question: implement a basic calculator that evaluates expressions, then add a debugger for it
9
2
48
@ezyang
Edward Z. Yang
11 months
Every positive float8 E4M3 number. First row is subnormals, last row normally would be NaNs in IEEE but they robbed it for more dynamic range. No infinities.
Tweet media one
4
8
47
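The table in the tweet above can be rebuilt with a short script. This is a sketch that assumes the common E4M3 layout (4 exponent bits, 3 mantissa bits, exponent bias 7, and a single NaN encoding at exponent 15 with mantissa 7); the function name `e4m3_value` is illustrative and not taken from any library.

```python
BIAS = 7  # assumed exponent bias for E4M3


def e4m3_value(exp: int, man: int):
    """Decode a non-negative E4M3 bit pattern; returns None for the NaN encoding."""
    if exp == 15 and man == 7:
        return None  # the only slot in the last row that is not a finite number
    if exp == 0:
        # Subnormals: no implicit leading 1, fixed exponent of 1 - BIAS.
        return (man / 8) * 2.0 ** (1 - BIAS)
    # Normals: implicit leading 1.
    return (1 + man / 8) * 2.0 ** (exp - BIAS)


# One row per exponent value, one column per mantissa value.
rows = [[e4m3_value(e, m) for m in range(8)] for e in range(16)]

# Keep strictly positive finite values (drops 0.0 and the NaN slot).
positives = [v for row in rows for v in row if v]

print(len(positives), min(positives), max(positives))
```

Under these assumptions the last row (exponent 15) holds ordinary finite values up to 448 rather than infinities and NaNs, which is the extra dynamic range the tweet mentions.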
@ezyang
Edward Z. Yang
3 years
Tweet media one
2
4
45
@ezyang
Edward Z. Yang
8 years
Put GraphViz on the list of "software everyone uses, and which really needs to be rewritten"
5
4
46
@ezyang
Edward Z. Yang
8 years
TYPES. They are magnificent. Types for the refactor god.
1
21
45
@ezyang
Edward Z. Yang
2 years
So I took a stab at outlining like, the first two lectures of "how to design PLs with debuggers" at but really what I found out is I know nothing about debuggers and I'm hoping maybe someone else can run with this (cc @cfbolz )
5
5
45
@ezyang
Edward Z. Yang
1 month
It's alive!
Tweet media one
5
5
46
@ezyang
Edward Z. Yang
2 years
Organizational barriers are code that, if you put a gun to someone's head, they could write it. But for some reason, they don't. Maybe it doesn't look good on perf, or it's really boring to do, or there's too much. See also the "bulldozer method":
@danluu
Dan Luu
2 years
There are a lot of problems that seem like way too much brute force work to be feasible, until you start the work and realize that the velocity increase you get from the practice of doing the work means that the problem is straightforward and feasible.
1
26
171
2
2
45
@ezyang
Edward Z. Yang
4 years
How does one become a floating point expert? E.g., how does one get to the point where you can decide whether or not to replace cos(x) with sqrt(1-sin(x)^2) and you know it will be profitable AND numerically stable?
10
2
45
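A quick experiment shows why the rewrite in the tweet above is not a free substitution. This is only an illustration, not an answer to the question: it compares `math.cos` with a hypothetical `cos_via_sin` helper at a few hand-picked points.

```python
import math


def cos_via_sin(x: float) -> float:
    """Candidate rewrite sqrt(1 - sin(x)^2); can only match cos(x) where cos(x) >= 0."""
    s = math.sin(x)
    # The clamp is defensive; it keeps sqrt's argument non-negative.
    return math.sqrt(max(0.0, 1.0 - s * s))


def relative_error(x: float) -> float:
    exact = math.cos(x)
    return abs(cos_via_sin(x) - exact) / abs(exact)


# Away from pi/2 the two agree to roughly machine precision.
well_conditioned = relative_error(0.5)

# Near pi/2, sin(x) is within rounding distance of 1, so 1 - sin(x)^2
# suffers catastrophic cancellation and most digits are lost.
near_half_pi = relative_error(math.pi / 2 - 1e-8)

# Where cos(x) < 0 the rewrite has the wrong sign entirely.
wrong_sign = (math.cos(2.0), cos_via_sin(2.0))

print(well_conditioned, near_half_pi, wrong_sign)
```

So the rewrite is at best valid on a restricted domain, and its accuracy depends on how close `sin(x)` gets to 1.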
@ezyang
Edward Z. Yang
4 years
Higher order reverse mode AD implementation in 35 lines of Python
3
5
45
@ezyang
Edward Z. Yang
7 years
Interested in Backpack? I've released a draft copy of my thesis here: Please let me know about anything unclear!
3
9
44
@ezyang
Edward Z. Yang
3 years
oh my god I think I responded to everybody
2
0
43
@ezyang
Edward Z. Yang
1 year
Every year I forget why algebraic effects are algebraic and then I have to go reread Bauer's paper on the subject but this year it doesn't feel like it is sticking lol
2
0
44
@ezyang
Edward Z. Yang
3 years
I suppose the moral of the story is that it doesn't matter if your thing is better, if it's not staffed and people don't understand how to adapt it to solve new problems it will get replaced
6
6
43
@ezyang
Edward Z. Yang
5 months
the real open source power move is ignoring all your bug reports for two years and then fixing half of them in an afternoon
5
2
40
@ezyang
Edward Z. Yang
2 years
@yminsky Picked C++ because we could get static types, but it turns out writing an FP style compiler in C++ is awful (we ended up doing an LLVM style IR), the types aren't even that good so we ended up having to introduce dynamic types anyway for IR nodes
2
0
42
@ezyang
Edward Z. Yang
1 year
Using chatgpt to code is very, very similar to an extremely fast but extremely inexperienced junior engineer. Need lots of code review...
3
4
41
@ezyang
Edward Z. Yang
3 years
doc fix in 2015 🤣
Tweet media one
1
1
41
@ezyang
Edward Z. Yang
2 years
That new language smell when all the ecosystem libraries are less than five years old
1
1
40
@ezyang
Edward Z. Yang
8 years
0
24
39
@ezyang
Edward Z. Yang
3 years
What's the current state of PL research for incremental programming?
10
3
40
@ezyang
Edward Z. Yang
3 years
making code run fast is very easy just don't do dumb shit what's the problem
2
6
40
@ezyang
Edward Z. Yang
5 years
That moment when you realize that the "official" library is written by someone who doesn't know what they are doing
0
1
40
@ezyang
Edward Z. Yang
1 year
Severely tempted to do an effect handlers and OO design patterns mash up blog post again
2
0
39
@ezyang
Edward Z. Yang
6 years
A year into Backpack: Lots of things have happened in Backpack since I graduated my PhD! Here are some of the most interesting ones.
0
11
39
@ezyang
Edward Z. Yang
4 years
Writing a type checker in 30 lines of miniKanren is a great parlor trick. And then you have a program that can synthesize a term given a type!
1
3
38
@ezyang
Edward Z. Yang
1 year
Fridge thought: programming languages that were designed to harness and contain an "army of junior engineers" will be the best languages for LLMs to code in
3
1
39
@ezyang
Edward Z. Yang
2 years
@dogfishbar If I had my way, it would have been in Haskell. But it was impossible to justify (to myself, anyway) at the time
5
0
36
@ezyang
Edward Z. Yang
1 year
I wonder if OCaml adding effect handlers before the type system catches up will end up being seen as good idea or mistake
1
3
37
@ezyang
Edward Z. Yang
2 years
Vibing “program transformations by symbolically executing an interpreter” (send me papers!)
7
2
37