» teej

@teej_m

9,283
Followers
1,522
Following
1,049
Media
12,119
Statuses

» Working on Titan » » my friends call me teej

San Francisco, CA
Joined November 2008
Pinned Tweet
@teej_m
» teej
2 years
I thought these were drawn exclusively for O’Reilly. My whole life is a lie.
Tweet media one
376
4K
36K
@teej_m
» teej
10 months
There’s only one 𝕏 that I respect
Tweet media one
20
1K
5K
@teej_m
» teej
3 years
“We have big data, can you help us?” The data:
Tweet media one
44
588
4K
@teej_m
» teej
4 months
@chrisalbon Every single software company CFO saw what happened with Twitter (90% labor cost reduction) and started asking hard questions of leadership. Also section 174.
18
16
995
@teej_m
» teej
10 months
@anandc Can we not glorify CEOs committing crimes
6
15
946
@teej_m
» teej
4 years
Building a data pipeline in 2020 is like building a bridge in the 14th century:
• You do a lot of work that gets thrown away
• Half the job is getting rid of the stuff you don't want
• The folks who started it are dead by the time it's done
12
153
924
@teej_m
» teej
1 year
I have done hundreds of analytical SQL coding interviews. My standards are higher than most. Here’s what you need to know to pass it.
@thebmbennett
b bennett | 500+ connections
1 year
i’m coaching someone through a data analyst job search, and when i asked if they knew sql their response was “i dont need to learn it bc it’s easy to pick up on the job” sql is deceptively complex. easy to pick up. difficult to master. please take the time to learn sql
15
17
335
14
99
856
@teej_m
» teej
2 years
I can confirm @chrisalbon's book cover animal is from the 1885 tome Animate Creation by Rev. J. G. Wood. I do not have a matching bottle of wine.
3
9
518
@teej_m
» teej
9 months
I have a teeny little announcement: I've worked with cloud data warehouses for 9 years. And for 9 years, I've felt frustrated at the dev experience. To fix it, I'm working on a new open source project. I'm calling it Titan. Let me give you a sneak peek at what Titan is – 1/
19
52
476
@teej_m
» teej
2 years
This is the story of how I independently landed on the same ideas that make up dbt today. I joined a DTC startup in 2013. This is how their data infrastructure looked shortly before I inherited it.
Tweet media one
15
53
437
@teej_m
» teej
3 years
Snowflake released MATCH_RECOGNIZE. I'd never heard of it before. Say you run an ecomm store and want to analyze the shopping funnel. To model the steps in SQL you either need a messy set of JOINs or a daisy-chain of window functions. Or you write 15 lines of MATCH_RECOGNIZE:
Tweet media one
21
63
425
@teej_m
» teej
2 months
Hey friends, I have news to share! I started a company, it's called Titan Systems. Titan builds security software for Snowflake, starting with access management: users, roles, and permissions. I joined the Y Combinator Winter 2024 batch to help me bring this idea to market.
Tweet media one
36
26
395
@teej_m
» teej
8 months
@MichaelFilipiuk @soychotic It allows Walgreens to sell ads. They get to charge more for a 6ft tall video ad than just for shelf placement alone.
3
1
351
@teej_m
» teej
2 years
Databases like Postgres use indexes. If we query a single row in a table, adding an index makes it fast. Snowflake doesn't have indexes. But there are other features that help speed queries up: 1/ Pruning 2/ Clustering 3/ Query Plan Rewriting Let's look at each.
@bernhardsson
Erik Bernhardsson
2 years
@yonidavidson Isn't Snowflake's approach just "use full table scans for everything"?
4
0
10
7
41
317
@teej_m
» teej
2 years
I don't know who needs to hear this – Negotiating job offers is one of the most important professional skills you can learn. I don't know how to teach it, but I at least want to share what it can look like. I sent this exact email to negotiate a prior offer.
Tweet media one
11
18
311
@teej_m
» teej
2 years
Congrats! You decided to roll your own data pipeline from Postgres to a cloud data warehouse! I'm proud of you, I think cloud data warehouses are very cool. Here's every question you'll probably need to answer to build this pipeline –
8
33
293
@teej_m
» teej
2 years
Tweet media one
0
9
287
@teej_m
» teej
1 year
Bunch of thoughts on data I want to write down so I can start next year fresh. Let’s go.
7
44
280
@teej_m
» teej
2 years
For analytical SQL writers, leetcode SQL questions are way too easy. The most popular hards are solved with row_number, self join, or count-case-when. Beginner stuff. Give me your hardest SQL interview questions. I want to suffer.
35
19
279
@teej_m
» teej
4 months
SQL
@ccccjjjjeeee
Christopher Ehrlich
4 months
Which technology is this?
Tweet media one
892
71
1K
3
18
261
@teej_m
» teej
2 years
I once applied for Head Of Analytics at a startup. I was dismissed quickly, they wanted a CS PhD to solve the bin packing problem. It turns out — they had zero diligence with discounts, spent all their cash on bad user acq., and drowned in retention problems. Layoffs ensued.
6
25
259
@teej_m
» teej
9 months
@paul_rietschka The pandas api is bad
30
2
241
@teej_m
» teej
2 months
@chrisalbon @cursor_ai Ask and ye shall receive - how I use Cursor's codegen
18
15
241
@teej_m
» teej
2 months
Every “business person” who doesn’t understand data tomorrow: Wow, February sales are up 3.6% year over year!
6
15
240
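One reading of the joke above is that a leap-year February has 29 days instead of 28, so a monthly total rises even when daily sales are flat. The sketch below works through that arithmetic. The years 2023 and 2024 are an assumption for illustration (the tweet names no years), and both function names are hypothetical.

```python
import calendar


def days_in_february(year: int) -> int:
    # monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, 2)[1]


def yoy_lift_from_calendar_alone(prev_year: int, year: int) -> float:
    """Year-over-year change in a February total when daily sales are perfectly flat."""
    return days_in_february(year) / days_in_february(prev_year) - 1


# Assumption: 2023 vs 2024 is an illustrative pair, not taken from the tweet.
lift = yoy_lift_from_calendar_alone(2023, 2024)
print(f"{lift:.1%}")  # 29/28 - 1, printed as 3.6%
```

Comparing average sales per day instead of monthly totals removes this calendar effect.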
@teej_m
» teej
3 years
I’m barely 15 min into CMU’s advanced DB course, this is wild
• He is lecturing from Amsterdam’s red light district!!
• It looks like it’s night but he doesn’t have a hotel
• The head TA is described as “vicious AF” and is a reformed gang member
Tweet media one
12
17
240
@teej_m
» teej
3 years
@TechEmails “If the plan is clear, no meeting is needed” 😍😍
1
5
229
@teej_m
» teej
5 years
I won’t stop beating this drum - SQL is a critical skill and an important programming language.
8
52
223
@teej_m
» teej
9 months
I didn't have pandas coming for me on my 2023 bingo card but here we are
@pandas_dev
pandas
9 months
We are open source, make it better. ;)
308
2K
27K
10
5
215
@teej_m
» teej
1 year
Professional news - I'm on sabbatical! I plan to unwind, go outside, and think about the impact I want with the next 10 years of my career.
20
0
216
@teej_m
» teej
2 years
If you didn't make it to Snowflake Summit, you missed my talk on funnels! "Funnel analysis" is an old marketing term adopted by web & app analytics to measure cohort behavior, onboarding dropoff, and navigation paths. Measuring funnels in SQL is slow and painful. Let's fix that
Tweet media one
5
18
213
@teej_m
» teej
4 months
@ClearHueBlue @chrisalbon When have ✨consequences✨ ever stopped a public company executive from making a bad decision?
1
6
205
@teej_m
» teej
1 year
If you work in analytics, I want you to do this today – Go to your boss, tell them you need to expense $90. Tell them it will save you 10 hours of work in the next month, but $90 pays for the whole year. Here's a few of my top hits from Olga's substack.
@OlgaBerezovsky
Olga Berezovsky
1 year
When you go to a little fun data event and run into founders and friends, NY analysts-bloggers (the big one), and meet the best people. I love SF❤️
2
0
19
3
13
205
@teej_m
» teej
4 months
If you report week-over-week metrics to me on the 1st week of January, you're fired. Right to jail, right away.
Tweet media one
12
10
199
@teej_m
» teej
1 year
A physical "timestamp"
Tweet media one
7
21
179
@teej_m
» teej
8 months
Ok I don’t know why I’m so fired up about this, but – If you run a data science team and your top priority is “automating” the weekly business review, you are not gonna make it. Very negative signal. Your job is not schlep reduction. Your job is business maximization.
@luke_metro
Luke Metro
8 months
After watching friends at FAANG I must admit that management consultants were right about everything. Designing a promo system and org chart aligned with biz needs may be the most important thing that a company does. Principal-agent problems destroy lives. Becoming McKinseypilled
43
122
2K
16
10
164
@teej_m
» teej
8 months
@nicko_md @powerbottomdad1 Europeans referring to where someone was born is so typically European
6
1
159
@teej_m
» teej
2 years
For the last time Frank, you cannot compare this timestamp field across planets unless you’ve adjusted for time dilation. I know the sales numbers don’t add up! I told you before we need ACCURATE position data. I’ve asked the ops team three times to replace the atomic clocks…
@siliconion
CC🥥
2 years
Imagine dealing with time zones between earth and mars. Glad I'll be dead by then
7
4
82
1
7
163
@teej_m
» teej
4 years
I've done this at 5 different co's now. Here's what I do:
- Data team reports the weekly metrics
- Split into A) notable trends, B) watching close, and C) business as usual
- 95% of metrics fall into C and are not discussed
I've earned the authority to say "wait and see"
@andrewchen
andrew chen
4 years
"Why did metric X go down last week?" <-- very common question that you might find yourself asking. Or your ceo/board asks you. It's frustrating to not know. But the reality is that there's often random spikes of usage, seasonality, and other external factors
13
28
326
3
11
161
@teej_m
» teej
2 years
In Vegas, Snowflake announced their big new thing: Snowpark for Python. There's just one problem. No one knows what the heck it actually is. new blog, who dis Let's talk about the 5 new things introduced with Snowpark.
Tweet media one
5
32
159
@teej_m
» teej
9 months
Aww, apology accepted. Look, we’re on the same team, at least we’re not writing JavaScript.
@pandas_dev
pandas
9 months
We'd like to apologize to @teej_m . This tweet was never meant as a personal attack, or a "fix it yourself" as many users understood. We welcome all criticism and work with the community every day to make the API better, constrained by being as backward compatible as possible.
31
18
733
4
4
155
@teej_m
» teej
2 years
Great question from Locally Optimistic Slack: Should I split my team of data scientists into those who work on forward-looking insights and others who handle day-to-day adhoc requests? No. The whole team must do both. Let's talk about why.
5
24
155
@teej_m
» teej
3 months
“The data is clean, don’t worry” The data –
@TylerGlaiel
Tyler Glaiel
3 months
Tweet media one
224
3K
21K
7
14
148
@teej_m
» teej
9 months
@pandas_dev I tried to fix the most popular page of Pandas documentation so the example was actually usable and yall closed the PR. Lots of people told me they were confused by this.
@teej_m
» teej
2 years
They closed my PR. Pandas is such an important library. If they can’t take small changes I don’t know how they expect to improve the documentation long term. I won’t waste my time with it.
6
1
52
2
3
141
@teej_m
» teej
2 years
dbt v1.3 will include Python models. You can now write models using dataframe syntax that’s more familiar to those who use pandas. A major objection from @josh_wills - switching from SQL to Python is too much work to be worth it. Well … I may have solved that problem.
@josh_wills
Josh Wills
2 years
@teej_m the expression on the right is gorgeous, but I feel like the amount of boilerplate and rewriting required to get to it from my starting SQL query would preclude me from doing the work in all but the most extreme cases
2
1
7
9
16
141
@teej_m
» teej
2 years
I have committed the cardinal sin of programming. I wrote a package manager. Please forgive me for what I've done.
Tweet media one
4
6
141
@teej_m
» teej
4 months
@LIM49Spartan Wouldn't this presumably have shown up in a simple audit of the hypergolics supply? If you ship enough PBVs with water in them, surely you have too much expensive gas left over. The mistake had to be known sooner by somebody.
7
2
141
@teej_m
» teej
2 years
DuckDB is a new embedded database (like SQLite) that's built for analytics. Their goal is no external dependencies. That makes it simple to drop-in to all sorts of places, like a mobile app, a Jupyter notebook, or your smart toaster. That effort is paying off, check this out -
@hamiltonulmer
Hamilton Ulmer
2 years
excited to share an alpha version of @RillData Developer, a reactive EDA-centric SQL tool for exploring and transforming datasets. Something I've always wanted as a data person :) Powered by @duckdb and @sveltejs ✨ try it & let me know what you think!
15
47
331
3
23
141
@teej_m
» teej
7 months
Cool paper out of Stanford/Airbnb – 2-sided marketplaces are a pain to A/B test. It's impossible to stop test cells from interacting. That causes interference and adds bias. With a neat new design, you can reduce bias with very little effort.
Tweet media one
3
11
140
@teej_m
» teej
1 year
Data people – What was the last technical book you found useful?
51
10
139
@teej_m
» teej
2 years
Did you know that queries in Snowflake are translated to a streaming data pipeline? You might have guessed it if you've ever looked at the query profile. All optimization starts at the query profile. It's so much more than an explain plan, take a look:
Tweet media one
5
13
137
@teej_m
» teej
1 year
Always cast source timestamps to UTC. That is your canonical value. Then create 3 other columns: UTC date, biz timezone time, biz timezone date. Add suffixes with the grain and tz. No tz suffix means biz tz. So 4 cols:
created_at_utc
created_at_date_utc
created_at
created_at_date
@adkravetz
Alex Kravetz
1 year
@teej_m @HamelHusain @sarahcat21 Can you say more about the timestamp conventions? Running into exactly these sort of issues and want to get ahead.
0
0
3
6
13
137
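The four-column convention in the tweet above can be sketched in Python as follows. This is a minimal illustration, not the author's code. The function name `timestamp_columns` is hypothetical, and the business timezone is a fixed UTC-8 offset chosen so the example runs without a timezone database. A real pipeline would more likely pass a DST-aware zone such as `zoneinfo.ZoneInfo("America/Los_Angeles")`.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def timestamp_columns(source_ts: datetime, biz_tz: tzinfo, name: str = "created_at") -> dict:
    """Derive the canonical UTC value plus the three companion columns."""
    if source_ts.tzinfo is None:
        # A naive timestamp cannot be cast to UTC without guessing its zone.
        raise ValueError("source timestamp must be timezone-aware")
    utc = source_ts.astimezone(timezone.utc)  # canonical value
    biz = utc.astimezone(biz_tz)              # same instant, business wall clock
    return {
        f"{name}_utc": utc,
        f"{name}_date_utc": utc.date(),
        name: biz,                 # no tz suffix means business timezone
        f"{name}_date": biz.date(),
    }


# Assumption: a fixed UTC-8 offset stands in for the business timezone.
BIZ_TZ = timezone(timedelta(hours=-8))
cols = timestamp_columns(datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc), BIZ_TZ)
for column, value in cols.items():
    print(column, value)
```

The sample instant falls on January 1 in UTC but on December 31 in the business timezone, which is the kind of mismatch the separate date columns make explicit.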
@teej_m
» teej
5 months
Your data pipeline when someone upstream drops a column
@crazyclipsonly
Crazy Clips
5 months
Fire destroys aluminum plant in seconds 😳
732
2K
25K
2
16
135
@teej_m
» teej
1 year
If you ever want self serve to work, you need to make a grocery store. You can’t give folks directions to the farm to pick their own produce.
4
11
136
@teej_m
» teej
4 years
Every analytics tool on the planet needs this. ALL OF THEM. This is a huge time saver. Analyst productivity = business productivity. I want tools that make me and my team super productive. I get irritated that no one seems to care about the humans using these tools.
@patrickc
Patrick Collison
4 years
The new @Stripe Dashboard analytics now make it super easy to perform all the usual intertemporal comparisons in a few clicks, automatically surfacing past periods of equal length. Much less fiddling with calendar/date widgets required!
Tweet media one
12
34
507
3
9
132
@teej_m
» teej
9 months
@pandas_dev Aww, apology accepted. Look, we’re on the same team, at least we’re not writing JavaScript.
8
3
128
@teej_m
» teej
3 years
SQL 201 - the foundational skills for OLAP
Syllabus
- CTEs
- Self join
- Set difference
- Basic windows
- Unions
4
12
130
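For readers unfamiliar with the syllabus items above, here is a small sketch of four of them (CTE, set difference, self join, basic window) run against an in-memory SQLite database from Python. The `orders` table and its rows are made up for illustration, and SQLite is only a stand-in for an OLAP warehouse. The window function assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 'ana', '2024-01-05'),
        (2, 'ana', '2024-02-10'),
        (3, 'ben', '2024-01-20'),
        (4, 'cy',  '2024-02-02');
""")

# CTEs + set difference: customers who ordered in January but not in February.
jan_only = conn.execute("""
    WITH jan AS (SELECT customer FROM orders WHERE order_date LIKE '2024-01-%'),
         feb AS (SELECT customer FROM orders WHERE order_date LIKE '2024-02-%')
    SELECT customer FROM jan
    EXCEPT
    SELECT customer FROM feb
""").fetchall()

# Self join: pair each order with a later order from the same customer.
repeat_pairs = conn.execute("""
    SELECT a.customer, a.order_id, b.order_id
    FROM orders AS a
    JOIN orders AS b
      ON a.customer = b.customer AND a.order_date < b.order_date
    ORDER BY a.order_id
""").fetchall()

# Basic window: number each customer's orders in date order.
order_seq = conn.execute("""
    SELECT order_id,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date) AS seq
    FROM orders
    ORDER BY order_id
""").fetchall()

print(jan_only, repeat_pairs, order_seq)
```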
@teej_m
» teej
4 years
I’ve been coy before but screw it – I work on Twitch-scale live streaming video. I get to solve problems that have never been solved before. The catch: it’s online sex work. The upside: my job helps make the world’s oldest profession safer than ever. And I’m hiring a data eng.
6
31
128
@teej_m
» teej
3 years
@FrankARinaldi1 @david_perell The premise is bogus. These subjects are all 10th/11th grade classes. Average Harvard applicant in 2020 knows way more than this.
1
0
125
@teej_m
» teej
3 years
You will never reproduce Google Analytics numbers from your own log data. Stop trying to do this. Just don’t do it. Your engineers may convince themselves they can. They cannot.
8
13
121
@teej_m
» teej
2 years
Snowflake announced a new table type - iceberg table. What makes this different than an external table is that it’s read/write, with all the data & metadata still stored in your S3. They demoed a transaction deleting data from both an iceberg and normal table at once.
9
10
121
@teej_m
» teej
2 years
Let’s talk about actual cost issues with Snowflake and how they could be fixed. There’s 3 categories that matter:
- Cost allocation
- Cost optimization
- Clustering & Partitioning
Let’s dig in
@jthandy
Tristan Handy
2 years
[1/N] I screwed up over the weekend with one particular link in my most recent Roundup. I want to take a second to own that I did a poor job using my platform and apologize.
6
3
67
8
13
119
@teej_m
» teej
2 years
What’s good in streaming data these days? Spark/Flink/??? Planning out some streaming compute, mostly windowed aggregates. Looking for diverse perspectives and advice. We value:
- low maintenance
- in Python/JS/SQL
- in GCP (nice to have)
Designing for 200k events/min.
34
7
113
@teej_m
» teej
2 years
Data work is glue work. And an important part of that work is the emotional regulation of the company as a whole. Numbers are not enough. The data team can be a voice to say – yes there's uncertainty, but waiting for more data isn't going to change the decision. This is hard.
6
7
116
@teej_m
» teej
9 months
What makes a data analyst great? One skill I think is important that I want to talk about – thinking in distributions aka “just look at the histogram already”. Distributions are so much more informative than just the average.
3
5
111
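A small sketch of the "look at the histogram" point above, using only the Python standard library. The load times are made-up data and `text_histogram` is a hypothetical helper. The example shows an average that describes none of the observations.

```python
import statistics
from collections import Counter


def text_histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return {edge: bins[edge] for edge in sorted(bins)}


# Assumption: illustrative page load times in seconds, from two distinct groups.
fast = [1, 2, 2, 2, 2, 3]
slow = [18, 19, 20, 20, 21, 22]
load_times = fast + slow

mean = statistics.mean(load_times)
hist = text_histogram(load_times, bin_width=5)

print(f"mean = {mean}")
for edge, count in hist.items():
    print(f"{edge:>2}-{edge + 4:<2} {'#' * count}")
```

The mean lands in a range where no observation falls, while the histogram shows the two clusters immediately.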
@teej_m
» teej
2 years
Great question from the @LocalOptimistic slack - how do you stay technical as a manager? Does it matter? My answer - I constantly worry about not being technical enough An unorganized thread for my unorganized system.
3
10
111
@teej_m
» teej
1 year
I still need to do a write up on the $5k Snowflake query my team ran in the fall.
@FredKSchott
fks
1 year
Just accidentally spent $300 on a single BigQuery query AMA
Tweet media one
165
132
4K
10
4
112
@teej_m
» teej
1 year
Most important: you need to learn how to do a technical interview. It’s not a quiz, it’s an audition. Silent and technically correct is a failure. Talk, a lot. Ask questions. Walk me through the problem and how you want to solve it. I need to know we can do the work together.
6
3
108
@teej_m
» teej
2 years
Next week at Snowflake Summit, I'm giving a talk on funnel analysis. I just released the code companion to the talk () There's 2 things I want to show you today –
4
12
107
@teej_m
» teej
2 years
I don’t trust any data system that doesn’t support joins.
12
2
106
@teej_m
» teej
5 years
I used to think endianness, the order in which computers store the bytes of a value, was some nonsense word. It turns out it’s based on Gulliver’s Travels. Two factions feud over which end to break the hard-boiled egg, the big-endians vs the little-endians.
3
31
104
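As a concrete illustration, endianness is about the order of the bytes within a multi-byte value. The sketch below serializes one 32-bit integer both ways with Python's built-in `int.to_bytes`. The value `0x12345678` is an arbitrary example.

```python
value = 0x12345678  # arbitrary 32-bit example value

# Big-endian: most significant byte first.
big = value.to_bytes(4, byteorder="big")
# Little-endian: least significant byte first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Reading the bytes back with the matching byte order recovers the value.
restored = int.from_bytes(little, byteorder="little")
# Reading them with the wrong byte order silently yields a different number.
misread = int.from_bytes(little, byteorder="big")
print(hex(restored), hex(misread))
```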
@teej_m
» teej
1 year
Ignore the AI hype. I’ve tried it all, it’s not that great. But you know what’s fun as hell? Midjourney. Best $10 subscription of my life. I would cancel Netflix for this. I only keep Discord on my phone to send dumb prompts to Midjourney.
Tweet media one
Tweet media two
6
2
106
@teej_m
» teej
1 year
@IgorBrigadir all these prompts will be lost in time, like tears in the rain
0
2
104
@teej_m
» teej
5 months
If you want to write documentation that gets used, you need to 🛑STOP🛑 writing like you're Wikipedia It feels good to put your cute little facts into a tidy taxonomy, but that doesn't help you scale knowledge. Instead, organize around: personas, decisions, and actions
6
12
103
@teej_m
» teej
1 year
I've built data transformations in the warehouse for 7+ years. The development environment - if you can even call it that - has always been a mess. SQLMesh is the thing I wish I had on day 1. Look at all this cool stuff that it does out of the box.
@Captaintobs
Toby Mao
1 year
I love how DevOps enables software engineers to safely ship code faster. As a data engineer, I think we deserve something similar: the DataOps revolution. Here’s a 🧵 on why I made SQLMesh, an open source DataOps framework, which launches today
6
25
163
7
7
102
@teej_m
» teej
2 years
Just spent an hour debugging a SQL issue. It turns out I completely forgot that you can’t concatenate strings with + SQL was a mistake folks, pack it up, we’re done here
10
1
100
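The tweet above does not say which database was involved, so the sketch below uses SQLite (via Python's `sqlite3`) purely as a stand-in. In SQLite, `+` is arithmetic and non-numeric strings are coerced to 0, while `||` is the concatenation operator. Other dialects differ; SQL Server, for instance, does concatenate with `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# '+' is numeric addition in SQLite: both strings coerce to 0, so no error is raised.
plus_result = conn.execute("SELECT 'data' + 'base'").fetchone()[0]

# '||' is the concatenation operator.
pipe_result = conn.execute("SELECT 'data' || 'base'").fetchone()[0]

print(repr(plus_result), repr(pipe_result))
```

The absence of an error on the `+` version is part of what makes this kind of bug slow to find.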
@teej_m
» teej
2 years
What is the actual problem that a "data lineage tool" solves? If you say data mesh, you're fired.
65
4
97
@teej_m
» teej
1 year
Notebooks are a workshop. Production systems are the factory. Not everything needs to be put into production. Not everything should be a notebook. You need both. Lean in to the strength of each. When do you hand-mill vs injection mold?
2
3
98
@teej_m
» teej
2 years
Never been happier about a chart in my life
Tweet media one
1
6
93
@teej_m
» teej
3 years
@TechEmails Yesterday I liked your tweet (old fashioned, I still use twitter, smile)
1
0
96
@teej_m
» teej
2 years
One day, you too will struggle to find your birth year in the drop down.
5
10
95
@teej_m
» teej
2 years
Snowflake + dbt 1.1.0 performance trick. This has no business working so well, but it does. For large, incremental models with a surrogate unique_key - add one of your surrogate columns to the unique_key config. That's it. This change cut my MERGE times nearly in half:
Tweet media one
2
9
97
@teej_m
» teej
1 year
BI (charts with pickers) feels further from solved than ever. Looker is great at 1-2 things but otherwise awful. All the startups are dead. Tableau will never go away. Everything else sucks, nothing is 10x better.
11
4
93
@teej_m
» teej
3 years
In an interview, I won’t ask you what a p-value is. I don’t know what it is. Not well enough to tell a statistician, at least. I can ride a bike safely even though I have no idea how they stay upright (voodoo). Instead we’ll discuss how you make decisions under uncertainty.
5
7
91
@teej_m
» teej
3 years
Learning to drive decisions quickly, a bias to action, is a critical competency for an analyst. Every skill you learn – communication, storytelling, experimentation, metrics design, causal inference – supports this. Here are a few tactics I use to speed up time-to-decision.
@bennstancil
Benn Stancil
3 years
it's friday, and this one might be a real fight: after talking with @borisjabes , i now believe that analysts should measure themselves entirely on how quickly they can convince other people to make a decision.
20
15
95
4
17
92
@teej_m
» teej
2 months
As a representative of Data Twitter, I hereby declare a moratorium on sankey diagrams
8
2
92
@teej_m
» teej
2 years
Tweet media one
3
2
89
@teej_m
» teej
3 years
When I deny a request for data, it's often when a PM is frozen in analysis paralysis. Data teams support decisions. That means you need to decide. Earning the authority to deny requests is one of the most important factors to running a world-class data team.
@seanrose
Sean Rose
3 years
One thing I see a lot of product orgs struggle with is the simple ownership of the fact that they’re making choices. There’s never any amount of data, research, backlogs, etc. that is going to decide anything for you.
3
18
195
5
7
88
@teej_m
» teej
23 days
My YC W24 journey in 3 photos –
Giving my unconference talk at the start of the batch (IYKYK)
My classic YC sign photo, taken before the @paulg @jesslivingston talk
Learning from @daltonc on how to stay sane and not die, on the last day of YC
Tweet media one
Tweet media two
Tweet media three
5
2
88
@teej_m
» teej
2 years
I’m late to this party but R + dplyr seems great. I haven’t written R in a bit but the tidyverse syntax clicked pretty fast. I like piping. I want something in Python that’s halfway between dbplyr and Malloy. I’m cooking up something dumb, per usual.
16
1
88
@teej_m
» teej
2 years
This is why so much tension exists in the DS role. Analytics is considered ops - a service org of dashboard jockeys. The data bitches with no levers on growth. But Data Science is “hard”, so you hire a team of PhDs to solve problems the business doesn’t actually have.
2
4
83
@teej_m
» teej
2 years
Snowflake summit thread list:
2
9
86
@teej_m
» teej
2 years
I got a preview of something new and I'm so excited about it. Let's talk about ✨event analytics✨ I have over a trillion events in my db. Just getting them in is a pain. Analyzing them is harder. I can count clicks of a button, but WHY do people click at all? Why did they stop?
5
6
86
@teej_m
» teej
2 years
I’ve sat on board meetings, set company OKRs, gone to the leadership off-sites, done annual planning - all the “in the room where it happens” work. None of that gives me the feeling of influence I get from experimentation. Sitting inside the event loop of a company is powerful.
5
12
86
@teej_m
» teej
16 days
Day 1 at the Titan office
Tweet media one
5
1
85
@teej_m
» teej
2 years
Yes, I basically built crappy versions of airflow, dbt, mode, fivetran, retool, eppo, & census/hightouch. I am tired of yall stealing my ideas. Just ask next time, I'll give them to you for free.
@teej_m
» teej
2 years
As we enriched our data, demand grew to use it. We built connectors to push data to google sheets, email marketing tools, on-site personalization, Salesforce, and more. For better or worse, SQL was the core of our stack.
Tweet media one
3
1
31
2
0
85
@teej_m
» teej
7 months
Pro tip: Python's standard library has methods to produce random numbers, you don't need numpy.
@fed_speak
fed_speak
7 months
They used a legit random number generator lmao send them all to jail forever
Tweet media one
56
372
4K
6
4
85
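A brief sketch of the standard-library route mentioned above, using the `random` and `secrets` modules. The seed and ranges are arbitrary choices for illustration.

```python
import random
import secrets

# A dedicated, seeded generator keeps simulations reproducible
# without touching the module-level global state.
rng = random.Random(42)

roll = rng.randint(1, 6)              # integer in [1, 6], both ends inclusive
sample = rng.sample(range(100), k=5)  # 5 distinct values, drawn without replacement
noise = rng.gauss(0.0, 1.0)           # draw from a normal distribution (mean 0, sd 1)

# The random module is not suitable for security purposes;
# secrets draws from the operating system's randomness source instead.
token = secrets.randbelow(1000)       # integer in [0, 1000)

print(roll, sample, noise, token)
```

numpy remains the better fit for generating large arrays of random numbers. For a handful of draws, the standard library is enough.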
@teej_m
» teej
4 months
@xriskology @LeboldJacob Check your community notes lol
1
0
83