We’re hiring for a Chief of Staff!
I’m looking for someone who will be my second pair of hands and have a front-row seat to building a startup. This is a great role for people who eventually want to become a founder or COO.
Why @artie_tech?
- We’re building something really
I still watch most of @ycombinator’s videos in my spare time.
Most people who have done YC or follow YC closely have heard that a lot of the advice they give is “common sense” or “simple advice”.
It’s totally true - there’s no magical advice out there - and I realize the only
Learn how @Keep_Card moved operational dashboards to @SnowflakeDB and decreased query load on their production databases!
Keep, a Canadian credit card and payments company, uses Artie to sync business-critical data from Postgres and @MongoDB into Snowflake in real time.
They
We’re hiring a third founding engineer to join our team at @artie_tech!
We’re a YC-backed company and based in SF. We work out of our office in SOMA 5x/week.
Artie is a real-time data replication solution for databases and data warehouses. To do this, we leverage Kafka and
We had our first team offsite last week
It was not without casualties…
🦵 1 sprained ankle
🏎️ 2 flipped ATVs
🚘 1 scratched bumper
🚗 1 dented bumper (different car)
We had a lot of fun though and also discussed what we value / our company culture
So all in all a big success
I spoke on the Founders Panel at the Female Funders and Founders Summit at @UCBerkeley this weekend with @linamelia and @ashleyz413.
It was great to be back on campus and on commencement day of all days!
I shared my experience on how I got started, working with my
One year ago it was just the 2 of us working from home
Now we’re a team of 5 (& mini aussie) working out of a loft in SOMA
This week we’re at Pismo Beach for our first offsite!
Amazing how much can change in 1 year - so thankful for the idea, leap of faith, and founding
We’ve hit an internal revenue milestone and we’re introducing a new employee perk - free lunches! 🥳
ICYMI we’re hiring a founding engineer (apply via link in 🧵).
Day 1 at @SnowflakeDB Summit!
👀 A perfect day to announce that our new snapshotting methodology is GA and 5-10x faster!
We’ve been sharing this with a few beta customers the last couple of months and it is now available to all new and existing customers 🎉
If you’re looking
1/ Beyond excited to share our case study with @RoutableHQ!
Routable, an accounts payable software provider, uses @artie_tech to sync financial transactions and other production data from #postgres to #redshift in seconds.
Who else will be in SF for @SnowflakeDB Summit June 3-6?
I’ll be around and I’d love to finally meet some folks in-person! DM me if you’re in town. I’ll be walking around with some Artie swag 🙃
Here are a couple of events I’m planning to attend, hosted by @HightouchData,
I’ve been interviewing candidates over the past few weeks for the Chief of Staff role at @artie_tech.
Here are some things I do that seem to resonate well
👉 Be ready to share real examples of what they would do in the role (we have a case study that gives a good overview of
Anyone who sells anything should read 🌟Gary Halbert’s Boron Letters 🌟
A friend shared it with me recently and it’s one of the best things I’ve read.
Who is Gary Halbert?
Gary Halbert was one of the best copywriters. He had the ability to turn simple words into compelling
1/ Companies may not need real-time data all of the time.
Maybe your support_ticket table only needs to be replicated in real-time during business hours (Monday to Friday 8am to 6pm). But you don’t want to keep paying for @SnowflakeDB ingestion costs
Storage costs on OLAP databases are easily 5-10x cheaper than OLTP databases.
This is because OLTP systems require high-speed, high-availability storage to handle a large number of transactions reliably. Typically they use more expensive SSDs. OLAP systems comparatively emphasize
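To make that concrete with rough, illustrative numbers (assumed list prices, not quotes): provisioned SSD block storage at ~$115/TB-month vs compressed warehouse storage at ~$23/TB-month works out to 115 / 23 ≈ 5x, before even accounting for columnar compression.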
It takes ~10 mins to set up a connector with @artie_tech.
There are just three simple steps.
1️⃣ Set up your database source
2️⃣ Choose the tables you want to sync (and/or enable history mode for slowly changing dimension tables)
3️⃣ Set up your destination
Then, sit back, let the
Incredibly proud of how fast we ship and acknowledge customer requests
Real account of what happened last Friday 👇
2:18pm: customer requests adding data type for new columns in our schema change alerts
2:20pm: Slack message was acknowledged
2:27pm: PR shipped & example
It’s YC Demo Day Sep 6-7! I’m very excited to share a bit about @artie_tech live tomorrow!
Here’s a real pic of me speaking to a crowd 🙃
Come check out the S23 batch and see how much we’ve achieved.
#ycombinator
#demoday
Schema change alerts are critical for data teams.
They’re one of the many tools that can help maintain consistency and reliability across databases and other downstream environments.
Schema change alerts can help data teams –
👉 Prevent data inconsistency: ensure everyone
Sharing a quote from our recent case study with @Keep_Card ❤️
“Artie has been a foundational component of building our data stack. As a data-driven fintech startup, we wanted all the benefits of fast and reliable database replication, without having to invest months of
This is my favorite blog post by far.
There are a lot of generalized discussions about CDC replication, but not all CDC pipelines are built the same.
Here are some of our architectural design principles for @artie_tech 🧵
#dataengineering
#datareplication
I’ve said this before, but one thing we feel strongly about is offering a data pipeline that doesn’t come with a long list of exceptions.
If a certain data type is supported in the source system, then we need to figure out how to replicate it over to any downstream system.
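To make “no exceptions” concrete, here’s a hypothetical Postgres table (names invented for illustration) mixing types that many pipelines silently drop or stringify, all of which should land intact downstream:
-- Hypothetical source table with awkward-but-supported types
CREATE TABLE events (
  id       BIGINT PRIMARY KEY,
  payload  JSONB,      -- nested JSON
  tags     TEXT[],     -- arrays
  duration INTERVAL,   -- intervals
  location POINT       -- geometric types
);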
We’re very excited to launch Microsoft SQL Server as a source!
You can now stream data from Microsoft SQL Server to data warehouses/lakes and other databases in real-time ⚡
SQL Server is also one of our supported destinations, so if you have a SQL Server → SQL Server use
For founders who have never done sales before starting a company, there are several evolutions in learning founder-led sales.
For me, the first lesson was to learn to be persistent and keep following up with prospects. This was very uncomfortable for me initially. It’s
I heard a funny story from one of our customers the other day.
We offer a Postgres replication slot monitor as part of our product. They tried to create a monitor with a really low threshold of 1GB so they could test to see what an alert would look like.
A couple weeks passed
Warning, you need a microscope to see this image 🔬
Biggest learning from talking to a ton of data folks? It’s really hard to do their job.
Suppose your company uses 20 different data tools (not a stretch btw), all of which are interdependent for running the analytical and
We recently upgraded to use the #BigQuery Storage Write API to stream data into BigQuery.
This results in a performance gain of at least 2-2.5x and also lowers customers’ BigQuery costs. The cost reduction comes from Storage Write API being cheaper than the older streaming API
Countdown to Snowflake Summit: just 5 days to go!
This year we’re not just attending, but we’ll be there as a Snowflake Technology Partner 🤩
If you’re looking for the fastest and most reliable way to get data into @SnowflakeDB, let’s connect!
Book a meeting with me to see how
Customer-facing analytics and data products can be incredibly valuable and can really elevate the customer experience.
They can be a point of differentiation against competitors and a feature to upsell enterprise customers.
A few benefits of offering customer-facing analytics as part
We’re vacationing in Hawaii this week. It’s been amazing because I’ve had time to:
👉 Read a book nonstop for 3 hours
👉 Write up a draft of a blog in one sitting
Vacation is a reminder of the power of not having to context switch.
Context switching is one of the hardest things
Stumbled upon a great blog post called “Invest in Lines, Not Dots” by Mark Suster. It’s a good reminder on the importance of viewing relationships as continuous lines instead of isolated points.
I think this can be extended way beyond investing and fundraising, and into personal
@artie_tech now supports column exclusion! 🎊
If you need to keep certain columns from being replicated to downstream destinations, whether to protect sensitive data (PII), meet regulatory requirements, or ensure only relevant data is synced, we have you covered.
We recently realized our founding story includes making the most SF decision ever 🌉
We were telling the team how it all started and Dana (founding engineer) cracked up.
🚐 We got obsessed with van life during lockdowns in 2020. I can’t say how many hours of van life videos we
I think having data consistency should be table stakes for data pipelines. Sadly it’s not the case.
If you run the following command for a particular table in your database and data warehouse/lake, I’m curious what the row discrepancies are:
SELECT COUNT(*) FROM table_name;
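A minimal version of that spot check, using a hypothetical orders table:
-- On the source (e.g. Postgres):
SELECT COUNT(*) FROM orders;
-- On the destination (e.g. Snowflake):
SELECT COUNT(*) FROM orders;
-- A gap that persists beyond normal replication lag means rows are being dropped or duplicated.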
I got 25/36 on this facial expression reading test. I was curious how Jessica would do, so I sent it to her. She got 36/36. She really is the social radar. (via @markessien)
A ton of smart people will post about key learnings and takeaways from @SnowflakeDB Summit
I’ll make my contribution by sharing where you can get the coolest swag 🤑
Check out the @twilio @segment booth when you have some downtime, you won’t regret it
#SnowflakeSummit
Some fun data edge cases we have come across 😅
I’m very curious about the use case for negative years - if anyone can share, I’d love to hear it.
👉 Negative years
👉 Non JSON values in a JSONB column
#dataengineering
#datareplication
(1/2)
One thing we feel really strongly about is offering a data pipeline that doesn’t come with a long list of exceptions.
If a certain data type is supported in the source system, then we need to figure out how to replicate it over to any downstream system. Otherwise, it’s not
The team has been investing a lot of time to make the dashboard and onboarding flow intuitive. We learned what we had to fix by having customers share their screens and seeing where they fumble.
Over the past few months, customers fully onboarded themselves and created a
How do you choose the right Postgres table replica identity and what are the performance implications?
Postgres tables require a replica identity to be configured in order to capture `update` and `delete` operations.
Replica identity specifies the type of information written to
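A quick sketch of how you’d inspect and set it (table name is hypothetical; the commands are standard Postgres):
-- d = default (primary key), f = full, n = nothing, i = index
SELECT relname, relreplident FROM pg_class WHERE relname = 'orders';
-- Tables without a primary key generally need FULL so that
-- update/delete events carry the old row values:
ALTER TABLE orders REPLICA IDENTITY FULL;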
@SnowflakeDB users 🔊
If you have a use case where you need low latency replication during certain times but don’t want continuous ingestion using up your Snowflake credits, check out Eco Mode!
Eco Mode allows you to get real-time data when you need it, without the ballooning
Quote from Poor Charlie’s Almanack. It made me stop and think about @artie_tech.
It’s an easy yes for us 🙂
We built Artie because we were frustrated by all the compromises we had to make with existing tools.
So we built the end-to-end, turnkey solution that we wanted. One
3/ Jason Hodson, Director of Data & Analytics, chose to partner with us for our low latency, data accuracy, and zero day-to-day maintenance.
Thank you for trusting us to power your database replication process 🤝
Why use an external buffer when performing data replication?
SO many benefits. Here are a few to call out:
1️⃣ Load reduction on source database. Ability to capture changes and store them temporarily outside the database, so the database doesn’t have to handle direct read
“The frequency of ingestion sets a ceiling on downstream frequency” from Joe Reis' Fundamentals of Data Engineering
This is really important for data engineers to think about when designing their stack. Ingestion is one of the first steps of data engineering responsibilities -
As a data person, you always get questions around what rows changed, when certain rows were updated or deleted, what row values looked like a month ago or over a period of time, and many more.
Answering these questions is always a time suck and sometimes the data you need is
2/
1⃣Bite the bullet and just pay up to keep data in sync all the time.
2⃣Replicate the support_ticket table every 2 hours and accept this constraint (yes, you still have to run it every 2 hours outside business hours).
3⃣Use Snowflake Eco Mode to replicate data in real-time
We launched a new feature!!
💥 Schema change alerts 💥
It’s really tough being a data engineer. They’re not in control of source tables, but they are on the hook to accurately and reliably replicate those tables for downstream consumption.
There is frequently a lack of
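To illustrate the kind of change these alerts surface (table and column names are hypothetical):
-- An upstream team ships this without telling the data team:
ALTER TABLE orders ADD COLUMN discount_cents BIGINT;
ALTER TABLE orders ALTER COLUMN status TYPE TEXT;
-- A schema change alert flags both before downstream models and dashboards break.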
If you have WAL-xiety (write-ahead log anxiety), keep reading.
To do CDC replication, you have to first enable logical replication on your database instance.
Prior to Postgres 16, logical replication could only be enabled on the primary database. This causes WAL-xiety for folks
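The standard Postgres setup, as a sketch (slot name is made up; pgoutput is the built-in output plugin):
-- Enable logical decoding at the instance level (requires a restart):
ALTER SYSTEM SET wal_level = logical;
-- Verify after the restart:
SHOW wal_level;
-- Create a slot for the CDC connector to read from:
SELECT pg_create_logical_replication_slot('artie_slot', 'pgoutput');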
2/ This reduced end-to-end latency in detecting and stopping fraud, and reduced losses from fraud.
With Artie, the team has the confidence to roll out their Instant Pay product to other geographies, allowing their customers to get paid faster.
One of the best pieces of advice I ever received is to always focus on the problem.
If it’s a big problem, break it down and think from first principles.
I think this applies everywhere but recently I have been referring back to this piece of advice for sales, marketing, and
3/ Snowflake Eco Mode is an advanced setting that allows customers to minimize time utilization and maximize resource utilization of their Snowflake virtual warehouse.
Get real-time data when you need it, without the ballooning compute costs. Read more about Eco Mode with the
Switching to SCD tables can drastically reduce storage needs by avoiding redundant data copies.
Plus, pairing them with CDC allows you to track all intraday changes, something you miss out on with daily snapshots.
I'm curious to know how many of you rely on daily snapshots for maintaining historical records of your tables.
Have you considered the benefits of using slowly changing dimension (SCD) tables, specifically type 2 or type 4? 🤔
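A minimal SCD type 2 sketch (illustrative schema, not any particular implementation):
CREATE TABLE customers_history (
  customer_id BIGINT,
  email       TEXT,
  plan        TEXT,
  valid_from  TIMESTAMP,  -- when this version became current
  valid_to    TIMESTAMP   -- NULL while it is still current
);
-- “What did this row look like a month ago?” becomes a range query:
SELECT * FROM customers_history
WHERE customer_id = 42
  AND valid_from <= NOW() - INTERVAL '1 month'
  AND (valid_to > NOW() - INTERVAL '1 month' OR valid_to IS NULL);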
[5/6] We encapsulate all the complexity so that a one-person data team can set up Artie in minutes and help the company leverage real-time data for analytics and operational use cases.
👉 Timestamp value where the year exceeds the YYYY format (e.g. 20350) and causes downstream encoding issues
👉 JSON values that are not JSON compliant, like {"foo": "bar", "foo": "bar"}
(2/2)
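On the JSON one, Postgres itself shows why duplicate keys are ambiguous (standard behavior, try it yourself):
SELECT '{"foo": "bar", "foo": "baz"}'::json;   -- stored verbatim, duplicate keys preserved
SELECT '{"foo": "bar", "foo": "baz"}'::jsonb;  -- collapsed to {"foo": "baz"}, last key wins
-- A replication pipeline has to pick one behavior and apply it consistently downstream.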
[6/6] If your company has been thinking about standing up CDC/streaming architecture or you’re interested in learning more about how CDC-based replication could benefit your organization, please reach out. I’d love to chat.