Allen Downey

@AllenDowney

17,207
Followers
1,095
Following
905
Media
7,875
Statuses

Author of Probably Overthinking It, Think Python, and Think Bayes. Emeritus Prof at Olin College, consultant with PyMC Labs.

Boston, MA
Joined April 2013
Pinned Tweet
@AllenDowney
Allen Downey
7 months
My copy of Probably Overthinking It has arrived! It should be shipping soon. If you would like to pre-order, you can get 30% off from University of Chicago Press. Use the code UCPNEW.
Tweet media one
14
10
139
@AllenDowney
Allen Downey
1 year
Today I asked ChatGPT to solve almost every exercise in Think Python and DSIRP. It did. My conclusion: everyone who writes code should spend the next month doing professional development on writing code with LLM-assist. This is how code will be written from now on.
156
266
2K
@AllenDowney
Allen Downey
1 year
In the last week, three people on reddit/r/statistics have asked about testing whether a sample came from a Gaussian distribution. The answer is that you should never test for normality. The result is a non-answer to the wrong question.
38
218
2K
@AllenDowney
Allen Downey
2 years
This semester I wrote a book, Data Structures and Information Retrieval in Python. It covers data structures and algorithms, organized around a motivating example: building a search engine. It's all in Jupyter notebooks that run on Colab. With quizzes!
15
248
1K
@AllenDowney
Allen Downey
3 years
Hey, look what I got!
Tweet media one
16
42
1K
@AllenDowney
Allen Downey
6 years
A few years ago I wrote a short book that explains basic use of Git: It contains exercises you can do on a practice repository: Today I added a page called "Merge conflicts with minimal pain": Good luck!
7
305
995
@AllenDowney
Allen Downey
1 year
Math notation is good for a lot of things, but representing algorithms is not one of them. Fortunately, we have other formal languages that are really good at representing algorithms -- programming languages. It's a shame academic papers don't use them more often.
Tweet media one
30
69
974
@AllenDowney
Allen Downey
2 years
Modeling and Simulation in Python is now available for preorder, and Barnes and Noble has it on sale for 25% off:
Tweet media one
15
98
746
@AllenDowney
Allen Downey
4 years
The "Elements of Data Science" curriculum is now substantially complete: It is a notebook-based introduction to Data Science in Python for people with no prior experience in programming or statistics.
9
209
711
@AllenDowney
Allen Downey
4 years
I have started work on the second edition of Think Bayes. I think it will be much better! Here's the notebook for Chapter 1: There are exercises at the end if you want to play along at home.
9
128
678
@AllenDowney
Allen Downey
17 days
The 3rd Edition of Think Python is available now at The print edition is available for preorder, expected to ship in June. What's new? Jupyter notebooks, turtle graphics, doctest, unittest, regular expressions, and a new, full color, parrot on the cover!
Tweet media one
11
164
658
@AllenDowney
Allen Downey
4 months
This is interesting if true, but I'm not sure it is. I tried to replicate the US graph with GSS data and I'm not seeing it. Whether the gap is growing depends on how seriously we take the last two points in a noisy series. But it's nowhere near 30 points.
Tweet media one
@jburnmurdoch
John Burn-Murdoch
4 months
NEW: an ideological divide is emerging between young men and women in many countries around the world. I think this is one of the most important social trends unfolding today, and provides the answer to several puzzles.
Tweet media one
2K
15K
56K
24
114
645
@AllenDowney
Allen Downey
4 months
A recent article in the Financial Times claims that there is an increasing ideological gender gap in several countries. In this article, I replicate their analysis with GSS data and conclude that there is little evidence the gap in the US is growing.
Tweet media one
18
90
559
@AllenDowney
Allen Downey
2 years
A modest proposal: Let's stop using the term "bias-variance tradeoff". "Underfitting" and "overfitting" are clear, self-explanatory, and easy to remember and recognize. "Bias" and "variance" add nothing but confusion. And the word "bias" is already too overloaded.
31
52
553
@AllenDowney
Allen Downey
1 year
Modeling and Simulation in Python is off to the printer! And available for pre-order :) To celebrate, I published one of my favorite examples: One Queue or Two?
Tweet media one
8
62
505
@AllenDowney
Allen Downey
3 years
In summary: Python Python Python Python Python Python
@randal_olson
Randy Olson
3 years
What #programming languages are required for data jobs at @Meta . #DataScience #MachineLearning #dataviz Source:
Tweet media one
52
428
2K
9
76
494
@AllenDowney
Allen Downey
3 years
#1 New Release in Statistics...
Tweet media one
6
37
485
@AllenDowney
Allen Downey
5 years
New rule: after I explain something, if a student takes a photo of the board, I get to be in it.
Tweet media one
6
43
485
@AllenDowney
Allen Downey
3 years
I have news! After 18 years at @olincollege, I am leaving at the end of this academic year. Since I turn 55 in May, I am retiring, technically :) Not sure yet what I'll do next. I'll take some time to figure it out -- watch this space!
37
8
440
@AllenDowney
Allen Downey
2 years
Planning my Data Science class for the spring, I have 7 slots reserved to critique interesting data visualizations: what do you notice? what do you wonder? what works? what would you try changing? The NYT cochlea of COVID will be on the list. What else?
Tweet media one
45
40
432
@AllenDowney
Allen Downey
5 years
1952: Persecution
2009: Apology
2013: Royal pardon
2019: Alan Turing on the 50 pound note.
8
105
397
@AllenDowney
Allen Downey
5 years
Most of the teaching examples for classification algorithms are toy data, fake applications (looking at you, irises and Titanic datasets). Any suggestions for examples with real data, real applications, ideally in the space of data science for social good?
45
65
393
@AllenDowney
Allen Downey
3 years
@causalinf If grandfathers count: when I was 10 I got to stay up all night in the print room of the Boston Herald. I got (and still have) my name in several typefaces on metal slugs. And I got to press the big, red STOP THE PRESSES button! Which is actually big and red.
5
3
381
@AllenDowney
Allen Downey
6 years
The second edition of Modeling and Simulation in Python is done: I am printing copies for my class this fall. If you would like a hard copy, you can get one from Lulu: Cover design by Olin's own Tim Sauder.
Tweet media one
6
117
377
@AllenDowney
Allen Downey
6 years
Can someone explain why, if you write an idea in math notation, that's "theory", which provides deep understanding of the math "behind" it, but if you write the same idea in a programming language, it's just hacking? This bizarre prejudice is the bane of my professional life.
29
78
353
@AllenDowney
Allen Downey
2 years
Programming languages like Python are more readable than pseudocode, and have the additional advantage of being executable and debuggable. Pseudocode is obsolete.
@JulyanArbel
Julyan Arbel
2 years
Seek & Find: Will you spot the 3 errors hidden in this algorithm? Published paper, true story. Apart from that algorithm, I liked the paper :-)
Tweet media one
16
9
97
16
36
329
@AllenDowney
Allen Downey
1 year
We have a cover! What do you think?
Tweet media one
21
12
319
@AllenDowney
Allen Downey
3 years
I know I shouldn't always take the bait, but someone on the Internet was mean about Bayesian statistics, so I wrote a manifesto: "Bayesian and frequentist results are not the same, ever"
12
57
308
@AllenDowney
Allen Downey
2 years
In my academic career, I have taught about 80 classes for a total of ~2000 class meetings. Today at 4pm will be my last. (At least for a while)
17
3
301
@AllenDowney
Allen Downey
5 years
Got a helpful email today from a student reading my book with a screenreader. Among the suggestions for better accessibility: use "can not" rather than "can't"; with a screenreader it is hard to hear the difference between "can" and "can't". And that's an important difference!
7
43
287
@AllenDowney
Allen Downey
6 years
Why learning to program is actually getting harder, and what we can do about it:
23
132
287
@AllenDowney
Allen Downey
4 years
I've been working on a new book about SQL, Astropy, Pandas, and Matplotlib. Here's the first public draft (generated with Jupyter Book)
Tweet media one
5
52
279
@AllenDowney
Allen Downey
5 years
Think Julia will be available soon at an Amazon near you: To my surprise, O'Reilly has gone with a radically new cover design!
Tweet media one
2
49
259
@AllenDowney
Allen Downey
5 years
For my Data Science class in the spring I am compiling resources related to data visualization. What books would you recommend? Web sites? Other? I will collect suggestions and post them next week.
56
44
254
@AllenDowney
Allen Downey
1 year
Ruin this joke by explaining it? Ok! This is an instance of Berkson's paradox. If a dingy restaurant with a broken website has bad food, it won't last, so among surviving restaurants, greasy styrofoam is correlated with good food. There's a chapter about this in my book!
@DanWilbur
Dan Wilbur
1 year
I was checking to see if a burrito place near me had good reviews and the photos of the place look dingy and the food looks greasy and they serve it in styrofoam and when I clicked their website I got a 404 error. This is going to be the best food I’ve ever tasted.
6
73
1K
8
29
243
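The selection effect described in the tweet above can be sketched with a small simulation. This is only an illustration under invented assumptions: food quality and ambiance are modeled as independent standard normal scores, and the survival rule (food + ambiance > 0) is hypothetical, not taken from the book.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)

# Hypothetical model: food quality and ambiance are independent scores.
restaurants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]

# Assumed survival rule: a restaurant stays open only if food + ambiance > 0.
survivors = [(food, amb) for food, amb in restaurants if food + amb > 0]

corr_all = pearson(*zip(*restaurants))
corr_survivors = pearson(*zip(*survivors))

print(f"all restaurants: {corr_all:+.2f}")
print(f"survivors only:  {corr_survivors:+.2f}")
```

Across all restaurants the two scores are essentially uncorrelated, but among the survivors the correlation comes out clearly negative: a surviving place with poor ambiance tends to have better food, even though neither trait causes the other.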
@AllenDowney
Allen Downey
8 months
The Overton Paradox in three graphs: 1. Older people are more likely to say they are conservative.
Tweet media one
7
45
238
@AllenDowney
Allen Downey
3 years
I'm happy to report that the second edition of Think Bayes is available for preorder now. The red striped mullet says "order yours now!"
Tweet media one
3
31
238
@AllenDowney
Allen Downey
5 years
Scientists: if you are still writing articles primarily in the passive voice because you think journals require it, please check the style guides. Many journals, including Nature, Science, and PNAS, have been begging you to stop for years.
3
82
221
@AllenDowney
Allen Downey
4 years
I'm working on a new series of notebooks to teach probability and Bayesian statistics. The first notebook starts with a famous example of the conjunction fallacy, Tversky and Kahneman's Linda the banker. Want to play along at home?
6
44
217
@AllenDowney
Allen Downey
5 years
Today is my first day on an exciting new project. This semester I will be at Harvard one day a week, co-teaching a seminar on data science education and helping to design a new undergrad class for Spring 2020. Details at
11
20
213
@AllenDowney
Allen Downey
6 years
Hot off the press!
Tweet media one
0
33
207
@AllenDowney
Allen Downey
2 years
Anybody have a favorite introduction to SQL (book or online resource) for a student who wants to do an independent study?
44
35
205
@AllenDowney
Allen Downey
3 years
More examples of Simpson's paradox in the General Social Survey. Old people are racist, sexist, and homophobic, but it's not because they're old; it's because they were born a long time ago.
Tweet media one
7
34
207
@AllenDowney
Allen Downey
5 years
In my Complexity Science class, I mentioned the way NumPy creates "views" to avoid copying arrays. To explain the idea more clearly, I created this notebook, which you can run on Colab: It has some exercises you can work on, in case you are bored.
2
46
202
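The notebook linked in the tweet above is not reproduced here. As a stand-in, this minimal sketch shows the view behavior the tweet mentions: basic slicing shares memory with the original array, while an explicit copy or integer-array indexing does not. The array contents are arbitrary.

```python
import numpy as np

a = np.arange(6)

view = a[1:4]                 # basic slicing returns a view onto a's buffer
view[0] = 99                  # writing through the view also changes a

independent = a[1:4].copy()   # an explicit copy owns its own data
independent[1] = -1           # a is unaffected

fancy = a[[1, 2, 3]]          # integer-array indexing returns a copy

print(a.tolist())                          # [0, 99, 2, 3, 4, 5]
print(np.shares_memory(a, view))           # True
print(np.shares_memory(a, independent))    # False
print(np.shares_memory(a, fancy))          # False
```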
@AllenDowney
Allen Downey
3 years
The slides, video, and notebooks from my Survival Analysis tutorial are available now from
Tweet media one
4
35
202
@AllenDowney
Allen Downey
3 years
If you've been vaccinated, thank a scientist. And then thank about 100 engineers. Because inventing mRNA vaccines is science. But manufacturing billions of doses, keeping them cold, and delivering them around the world is a feat of engineering.
5
24
203
@AllenDowney
Allen Downey
3 years
I've been working on Modeling and Simulation in Python for... a while. On my 4th try, I have a version I am happy with. It's still a work in progress, but I've posted a mostly complete draft:
Tweet media one
3
37
200
@AllenDowney
Allen Downey
4 years
I don't understand why @github STILL can't reliably render a Jupyter notebook. It's been years! NBViewer is 100% reliable as far as I can tell. Why is this so hard?
13
14
195
@AllenDowney
Allen Downey
1 month
Which plot indicates a stronger relationship? Discussion here:
Tweet media one
18
25
198
@AllenDowney
Allen Downey
3 years
Metaphorical use of "growing exponentially" is growing exponentially. But only metaphorically. Literally, it's growing linearly.
Tweet media one
3
38
191
@AllenDowney
Allen Downey
3 years
Without looking at the color bar, can you tell which colors are hotter than others? Now look at the color bar... does it help?
Tweet media one
42
24
185
@AllenDowney
Allen Downey
4 years
This fall, I am taking my Data Science class on the road... ...the really long road to Ashesi University in Ghana: Classes start August 31. I can't wait!
9
17
184
@AllenDowney
Allen Downey
6 years
Every time someone invites me to Slack, I spend 15 minutes figuring out what email address to use, I connect once, and then never use it again. I'm on about 20 channels, can't log into any of them, and have no idea how or why anyone uses it. Is it terrible, or am I just old?
25
10
180
@AllenDowney
Allen Downey
3 years
New chapter in Elements of Data Science: using resampling to compute sampling distributions and confidence intervals (and understand what they mean).
Tweet media one
1
23
176
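For readers who have not seen resampling, here is a minimal bootstrap sketch using only the standard library. It is not taken from the Elements of Data Science chapter; the sample is synthetic and the function name `bootstrap_means` is made up for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical sample: 50 draws from a normal distribution (mean 10, sd 2).
sample = [rng.gauss(10, 2) for _ in range(50)]

def bootstrap_means(data, iters=2000):
    """Resample with replacement and collect the mean of each resample."""
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(iters)]

# The sorted resampled means approximate the sampling distribution of the mean.
means = sorted(bootstrap_means(sample))

# The spread of the resampled means approximates the standard error.
std_err = statistics.stdev(means)

# Percentile method: the middle 90% of resampled means is a 90% interval.
low = means[int(0.05 * len(means))]
high = means[int(0.95 * len(means)) - 1]

print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"standard error ~ {std_err:.2f}")
print(f"90% CI ~ ({low:.2f}, {high:.2f})")
```

The interval here summarizes how much the estimate would vary if the sampling were repeated, which is the interpretation the tweet alludes to.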
@AllenDowney
Allen Downey
4 years
I ran into a NumPy gotcha today: np.var and np.cov have inconsistent default behavior.
var divides by N
cov divides by N-1
IMO, both should use N, which computes a simple descriptive statistic; if you want an estimator, you have to ask for it
9
28
175
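The inconsistency in the tweet above can be checked directly. This sketch uses an arbitrary four-element array; the `ddof` argument is the documented way to make either function match the other.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # mean 2.5, sum of squared deviations 5.0

var_default = np.var(x)           # ddof=0 by default: 5.0 / 4
cov_default = np.cov(x)           # divides by N-1 by default: 5.0 / 3

var_sample = np.var(x, ddof=1)    # ask np.var for the N-1 estimator
cov_population = np.cov(x, ddof=0)   # ask np.cov to divide by N

print(var_default, float(cov_default))
print(var_sample, float(cov_population))
```

Passing `ddof` explicitly in both calls avoids relying on either default.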
@AllenDowney
Allen Downey
4 years
Theorem: If students find a statistical concept hard, the problem is the concept, not the students. Proof by example: p-values, confidence intervals and likelihood functions are "hard to understand" because they are fundamentally broken, bad ideas.
8
25
164
@AllenDowney
Allen Downey
5 years
Last week I asked for your favorite data visualization resources. Thank you to everyone who replied. I have organized the responses (and added a few of my own):
1
41
165
@AllenDowney
Allen Downey
4 years
What's new in English version 20.20?
* Singular "they" is recommended.
* Ending a sentence with a preposition is allowed.
* Split infinitives are allowed.
* "Whom" is now deprecated.
* Latin plurals are deprecated.
* The passive voice in science writing is deprecated.
7
35
159
@AllenDowney
Allen Downey
4 years
Here's a quick Bayesian analysis of the results from the vaccine trial announced today: Based on some guesses about the raw data, and some modeling assumptions, it seems unlikely that the effectiveness is less than 80%.
1
37
154
@AllenDowney
Allen Downey
1 month
I was at @Google today to give a talk about Chapter 7 of Probably Overthinking It: Causation, Collision, and Confusion. I'll post the video when it's available, but in the meantime, the slides are here:
1
14
156
@AllenDowney
Allen Downey
2 years
conda is so slow it is now unusable, and now can't solve some environments it used to. mamba is fast but buggy and the documentation is not ready for prime time. Is there a good option for package management?
51
10
155
@AllenDowney
Allen Downey
10 months
There are lots of good articles explaining how LLMs work at a mechanical level. This is the best explanation I've seen of how LLMs are able to do what they do, at least as we currently understand it.
3
33
149
@AllenDowney
Allen Downey
2 years
The Think Bayes red mullet is going to India! It looks a little nervous, but I'm sure it will be fine.
Tweet media one
6
8
149
@AllenDowney
Allen Downey
8 months
3. But if you group people by decade of birth, most groups get more liberal as they get older.
Tweet media one
4
28
149
@AllenDowney
Allen Downey
3 years
The second edition of Think Bayes went to the printer today! ebook versions will be available within a week, paper copies 2-3 weeks.
8
11
147
@AllenDowney
Allen Downey
3 years
I wrote an article that uses Bayesian decision analysis to find the optimal strategy for plugging in a USB connector. It turns out there's a reason it's so common to flip twice.
Tweet media one
3
35
146
@AllenDowney
Allen Downey
5 years
My revised tutorial on Bayesian Statistics is ready to go. The slides and notebooks are here: If you are coming to @SciPyConf and you want to see it live, good seats are still available:
Tweet media one
2
43
147
@AllenDowney
Allen Downey
4 months
I am excited to announce the forthcoming third edition of Think Python! What's new? Jupyter notebooks on Colab, learning to program with ChatGPT, regular expressions, automated testing -- and turtle graphics that work in notebooks!
Tweet media one
10
18
147
@AllenDowney
Allen Downey
5 years
Does anyone know why GitHub has such a hard time rendering Jupyter notebooks? For me, it often fails several times and then works, or never works. Whereas nbviewer seems to work, quickly, 100% of the time. @github, can you borrow nbviewer's renderer?
12
17
143
@AllenDowney
Allen Downey
6 years
Reproducible Science Starter Kit:
The Open Science Framework
@LorenaABarba's reproducibility manifesto
Best Practices for Scientific Computing
Software Carpentry; lessons learned
1
91
143
@AllenDowney
Allen Downey
4 years
If you've ever been confused about joint distributions, marginal distributions, and conditional distributions, you might like the next notebook in Bite Size Bayes: Welcome to the world of two dimensions!
2
26
141
@AllenDowney
Allen Downey
3 years
I'm teaching Complexity Science this semester, so I updated the notebooks. They run on @googlecolab, so you can run them without installing anything! Links here:
Tweet media one
3
18
142
@AllenDowney
Allen Downey
1 year
I'm starting a new job this week, as a curriculum designer at Brilliant @brilliantorg, focusing on data science and computer science. If you want to try one of their online classes, here's a freebie:
14
4
140
@AllenDowney
Allen Downey
5 years
My thoughts on this whole 10x engineer thing: Many people are effectively 0x engineers (that's "zero ex") because they are working on things that will never have positive impact. Pay less attention to 10x; focus on making sure you are not 0x.
6
23
135
@AllenDowney
Allen Downey
1 year
The video of my tutorial on Bayesian Decision Analysis, from PyData Global 2022, is available now. For links to the video, slides, and Jupyter notebook, start at
Tweet media one
2
36
137
@AllenDowney
Allen Downey
5 years
I am developing examples of survival analysis for my Data Science class. Anyone have any favorite application domains and/or datasets? @Cmrn_DP So far:
1) Literal survival, probably using datasets from R
2) Time until marriage, divorce, NSFG
3) Customer conversion, dataset?
36
18
132
@AllenDowney
Allen Downey
6 years
I am moving toward treating NumPy as core Python and using all NumPy functions instead of the math module. The side effect is that Python sequences get quietly upgraded to NumPy arrays, which are usually faster and smaller.
@SciPyTip
Scientific Python
6 years
You cannot call math.cos() on a list, but you can call numpy.cos() on a list.
3
8
35
6
30
132
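A short check of the behavior described in the quoted tip, and of the "quiet upgrade" to an array mentioned in the reply. The list of angles is arbitrary.

```python
import math
import numpy as np

angles = [0.0, math.pi / 2, math.pi]

try:
    math.cos(angles)              # math.cos accepts a single real number only
except TypeError as err:
    print("math.cos rejects a list:", err)

result = np.cos(angles)           # the list is converted to an ndarray
print(type(result).__name__)      # ndarray
print(np.round(result, 3))
```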
@AllenDowney
Allen Downey
4 years
I am working on an ebook, tentatively called Bite Size Bayes, that introduces Bayesian statistics gradually, for people with no prior stats. Here's Python notebook 2, if you want to check it out. R version coming soon!
3
34
134
@AllenDowney
Allen Downey
2 years
Replication crisis update "none of the 193 experiments were described in sufficient detail" "of the 50 experiments from 23 papers that were [repeated], effect sizes were, on average, 85 percent lower than those reported in the original experiments."
3
46
130
@AllenDowney
Allen Downey
2 years
Really nice introduction to probabilistic programming languages by showing a simple Python implementation.
1
21
129
@AllenDowney
Allen Downey
4 years
Beryl Hoffman has written a Jupyter notebook that runs on Colab, pulls live COVID data, and demonstrates Pandas visualization tools. Really nice work!
0
43
130
@AllenDowney
Allen Downey
5 years
A few weeks ago I led a workshop at Harvard on "Using computation to teach everything else" The slides for the workshop are at Including my favorite provocative slides:
Tweet media one
4
40
126
@AllenDowney
Allen Downey
2 years
Just ran some older code and got a million warnings about features that have been deprecated. If you find yourself writing one of these warning messages, please, please, please include instructions for whatever the new thing is that replaces the deprecated thing.
9
11
121
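One way to follow the request in the tweet above is to name the replacement in the warning text itself. This is a sketch with hypothetical function names (`old_average`, `new_mean`), not code from any real library.

```python
import warnings

def new_mean(values):
    """Hypothetical replacement function."""
    return sum(values) / len(values)

def old_average(values):
    """Hypothetical deprecated function whose warning names its replacement."""
    warnings.warn(
        "old_average() is deprecated; call new_mean(values) instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_mean(values)

# Capture the warning so the message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_average([1, 2, 3])

print(result)
print(caught[0].message)
```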
@AllenDowney
Allen Downey
4 years
Mental health tip: DO NOT watch the election live like it's a sporting event. To keep you watching, TV people will make it seem like new information is streaming in. It's not. They're just making up stories about noise. If you must, check once Tuesday night. Then go to bed.
5
13
120
@AllenDowney
Allen Downey
5 years
Just used Tabula to read a data table from a PDF. Easy install; worked the first time. 10/10
5
10
117
@AllenDowney
Allen Downey
10 months
Let's practice Bayesian thinking.
H: aliens
D: unsupported testimony before Congress
P(D|H) = high (if there were aliens, someone would talk)
P(D|not H) = high (not even the craziest thing said in Congress this week)
Likelihood ratio close to 1 = little or no evidence
@EdKrassen
Ed Krassenstein
10 months
Breaking: Former US Intelligence agent David Grusch, while testifying to Congress in the UFO hearings, just scared the crap out of me. The idea of UFOs and aliens isn’t frightening to me, but the idea that they could exist and that they could be trying to harm humanity is very
5K
2K
14K
4
10
112
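The reasoning in the tweet above is the odds form of Bayes's theorem: posterior odds equal prior odds times the likelihood ratio. The sketch below uses made-up numbers, since the tweet only says "high" for both likelihoods; the prior of 0.01 and the function name are assumptions for illustration.

```python
def posterior_probability(prior, p_d_given_h, p_d_given_not_h):
    """Update P(H) after seeing D, using the odds form of Bayes's theorem."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_d_given_h / p_d_given_not_h
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.01  # illustrative prior, not from the tweet

# Both likelihoods equally high: the ratio is 1 and the prior barely moves.
weak = posterior_probability(prior, p_d_given_h=0.9, p_d_given_not_h=0.9)

# For contrast, data ten times likelier under H shifts the belief noticeably.
strong = posterior_probability(prior, p_d_given_h=0.9, p_d_given_not_h=0.09)

print(f"likelihood ratio 1:  {weak:.4f}")
print(f"likelihood ratio 10: {strong:.4f}")
```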
@AllenDowney
Allen Downey
3 years
I just published an article that demonstrates my incremental process for developing and testing models in PyMC: It also explains the relationships between the four distributions of Bayesian analysis.
Tweet media one
3
29
110
@AllenDowney
Allen Downey
5 years
In recent Jupyter, it looks like the default matplotlib backend is inline, so the magic command %matplotlib inline is no longer necessary. Good: that's one less thing to explain to beginners. But can someone confirm that I can count on this behavior? @ProjectJupyter
5
10
111
@AllenDowney
Allen Downey
6 years
I just learned about a new open-source statistics book by @russpoldrack It uses resampling/bootstrap for statistical inference, so I approve :)
0
26
112
@AllenDowney
Allen Downey
3 years
I think this example was written in Pascal, translated into Java, and then translated into Python. By someone who doesn't know Python.
Tweet media one
10
12
109
@AllenDowney
Allen Downey
4 years
I'm developing an introductory data science curriculum. Would you like to play along? Try out this do-it-yourself, choose-your-own-adventure mini-project that explores the relationship between political alignment and other attitudes and beliefs.
3
25
109
@AllenDowney
Allen Downey
1 year
Exploring the effect of researcher choices on statistical results: 73 teams estimate the same effect size with the same data, and generate 1,253 different results. As always, statistical results depend on modeling decisions.
5
33
110
@AllenDowney
Allen Downey
4 years
Here's a concise, readable summary of what we know about Covid-19, from the most reliable source we've got. Highly recommended:
2
43
107
@AllenDowney
Allen Downey
4 months
We are born Gaussian, but we grow up to be lognormal. Video, slides, and Jupyter notebooks from my talk @PyDataGlobal : Extremes, outliers, and GOATS: on life in a lognormal world
Tweet media one
4
15
103
@AllenDowney
Allen Downey
3 years
New cover idea for Elements of Data Science. (The cover wraps, so the left half is the back and the right half is the front) What do we think?
Tweet media one
13
10
105
@AllenDowney
Allen Downey
4 years
Just heard from my friends @OReillyMedia that Think Bayes second edition is a go. So, let's celebrate with the notebook for Chapter 2, featuring cookies, dice, socks, and Elvis.
2
12
104
@AllenDowney
Allen Downey
5 years
Last week I asked "If you were starting a book project today, what tools would you use?" I got many excellent suggestions, which I summarized here:
4
30
102
@AllenDowney
Allen Downey
3 years
If you search for "python breadth first search", a substantial majority of the implementations you find are accidentally quadratic. Should be O(n+m), instead they are O(n^2). Here's the first hit: can you spot the performance error?
Tweet media one
5
9
103
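The implementation the tweet refers to was in an image and is not reproduced here, so the specific error is unknown. Two common causes of accidentally quadratic breadth-first search are popping from the front of a list (`list.pop(0)` is linear in the list length) and testing membership in a list of visited nodes. The sketch below avoids both, assuming the graph is an adjacency-list dictionary; the example graph is hypothetical.

```python
from collections import deque

def bfs(graph, start):
    """Return the nodes reachable from start, in breadth-first order."""
    seen = {start}            # set: O(1) average membership tests
    queue = deque([start])    # deque: O(1) pops from the left
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)   # mark on enqueue so no node is queued twice
                queue.append(neighbor)
    return order

# Hypothetical adjacency-list graph for illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))   # ['a', 'b', 'c', 'd']
```

Each node is enqueued once and each edge is examined once, which gives the O(n+m) bound mentioned in the tweet.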
@AllenDowney
Allen Downey
7 years
More Python, less R, less Other.
Tweet media one
6
69
95