For the past few months, I have been learning about the internals of databases. I found many excellent articles on writing compilers, but I could not find many practical resources for databases. So I wrote one. CaskDB is the project I wish I had started with.
fun fact: SQLite is the most deployed and most used database. There are over one trillion (1e12) SQLite databases in active use.
It is maintained by three people. They don't allow outside contributions.
I always wondered how adblockers for YouTube worked; this post explains it nicely. It also covers the cat-and-mouse chase between YouTube and adblockers. A fascinating read.
here’s a fun fact: almost all modern computer systems depend on a single time zone database that gets updated when local laws change
it’s maintained by two people
Here is a fascinating story of how researchers teamed up with SQLite core developers to make it faster using Bloom filters!
Let's also dive into database internals and understand how databases implement joins.
📄Paper: SQLite - Past, Present, and Future (2022)
If you want something better than a Bloom filter, here are some alternatives. I found this answer, which summarizes the research progress in probabilistic data structures:
Bloom filter -> Cuckoo filter -> XOR filter
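As a reference point for the first structure in that chain, here is a minimal Bloom filter sketch. The bit-array size, the number of hashes, and the salted SHA-256 hashing are illustrative assumptions, not taken from the linked answer.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Because bits are only ever set, a plain Bloom filter cannot delete items, which is one of the gaps the later structures address.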
Did you know that if SQLite performs better, then the lifetime of your mobile increases?
But how?
Let's uncover that from the paper: SQL Statement Logging for Making SQLite Truly Lite. 1/n
Are you an influencer? Are people bothering you about your empty GitHub contribution graph?
Look no further, use Rockstar! Rockstar generates a real-looking GitHub contribution graph so that you can focus on your business.
See how it looks on my profile:
This is an excellent post!
@thorstenball
neatly explains why text editors don't use a plain string and instead use ropes, gap buffers, etc., why Zed uses a rope, and how they implemented the rope on top of a B+ tree called SumTree
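To make the rope idea concrete, here is a minimal sketch of a rope as an unbalanced binary tree of string chunks. It is not Zed's SumTree (which the post describes as a B+ tree); the class and method names are hypothetical and chosen only for illustration.

```python
class Rope:
    """Leaf nodes hold text; inner nodes hold two children and a cached length."""

    def __init__(self, text="", left=None, right=None):
        self.text = text
        self.left = left
        self.right = right
        self.length = len(text) if left is None else left.length + right.length

    @staticmethod
    def concat(a, b):
        # Concatenation creates one new node; no characters are copied.
        return Rope(left=a, right=b)

    def char_at(self, i):
        # Walk down the tree, using cached lengths to pick a side.
        node = self
        while node.left is not None:
            if i < node.left.length:
                node = node.left
            else:
                i -= node.left.length
                node = node.right
        return node.text[i]

    def split(self, i):
        # Return two ropes covering [0, i) and [i, length).
        if self.left is None:
            return Rope(self.text[:i]), Rope(self.text[i:])
        if i < self.left.length:
            head, tail = self.left.split(i)
            return head, Rope.concat(tail, self.right)
        head, tail = self.right.split(i - self.left.length)
        return Rope.concat(self.left, head), tail

    def insert(self, i, s):
        # Only the chunk containing position i is re-sliced.
        head, tail = self.split(i)
        return Rope.concat(Rope.concat(head, Rope(s)), tail)

    def __str__(self):
        if self.left is None:
            return self.text
        return str(self.left) + str(self.right)
```

An insert into a flat string copies everything after the insertion point, while this sketch touches a single chunk. A real implementation would also keep the tree balanced, which this one does not.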
Most of the career advice I see on this platform is not applicable to systems level software (OS, compilers, etc).
As I promised a couple of weeks back, here are some tips from my career on how to develop your own career, including a real-life story of how we hired
@iavins
:…
in 2021, Notion sharded their single 'monolithic' db into 32 physical dbs, each with 15 logical shards
Their post on the process is excellent: it takes a systematic approach to sharding and covers schema design, capacity planning, and the technical details of the migration
🔗
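The arithmetic works out to 32 × 15 = 480 logical shards. A minimal routing sketch under stated assumptions: the tweet gives only the counts, so the choice of partition key, the SHA-256 hashing, and the contiguous logical-to-physical mapping below are hypothetical.

```python
import hashlib

PHYSICAL_DBS = 32
LOGICAL_PER_DB = 15
LOGICAL_SHARDS = PHYSICAL_DBS * LOGICAL_PER_DB  # 480


def route(partition_key: str) -> tuple[int, int]:
    """Map a partition key to (physical database, logical shard)."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    logical = int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS
    # Assumed layout: logical shards 0-14 live on database 0, 15-29 on 1, ...
    physical = logical // LOGICAL_PER_DB
    return physical, logical
```

Keeping many logical shards per physical database means a later re-shard can move whole logical shards to new machines without re-hashing every row.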
I am excited to share that on September 8th, I'll be unwrapping the fascinating story of SQLite.
I am going to present the 'SQLite: Past, Present, and Future' paper at
@papersweloveblr
1/2
TIL PRQL - Pipelined Relational Query Language, pronounced “Prequel”. It compiles to SQL and claims that it makes writing complex SQL queries simple and intuitive
Paper: How does the performance of a graph database such as Neo4j compare to the performance of a relational database such as Postgres?
"Postgres outperformed Neo4j in almost every query under various settings" oof
I found this neat article which shows how to turn Postgres into a graph database. I wonder how it fares with larger workloads. Are there any benchmarks/blogs where someone replaced Postgres with Neo4j / Dgraph, etc.?
I sat with a friend to read this paper on SQLite, it took us only three hours. That is sufficient to get most of the ideas! 🚀
📄
details of the meetup in the next tweet 1/2
I can’t believe it’s already been a month since I joined the fantastic team at
@tursodatabase
I will be working on databases full time now. Super excited 🚀🚀
Thank you everyone for the amazing support. Can't believe CaskDB crossed 1000 stars already! Amazed and humbled🤗
What's CaskDB? It's an educational project that aims to teach you how to write a persistent key-value store from scratch
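To give a feel for what "persistent key-value store from scratch" can mean, here is a minimal sketch of an append-only log with an in-memory index. It is not CaskDB's actual code or file format; the `LogKV` name and the record layout are assumptions made for this example.

```python
import os
import struct

# Hypothetical record layout: key length, value length, then key and value.
HEADER = struct.Struct("<II")


class LogKV:
    def __init__(self, path: str) -> None:
        self.index = {}  # key -> (offset of value in file, value length)
        self.file = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Replay the log from the start; later records overwrite earlier ones.
        self.file.seek(0)
        offset = 0
        while True:
            header = self.file.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key_len, val_len = HEADER.unpack(header)
            key = self.file.read(key_len)
            self.file.seek(val_len, os.SEEK_CUR)  # skip the value bytes
            self.index[key] = (offset + HEADER.size + key_len, val_len)
            offset += HEADER.size + key_len + val_len

    def set(self, key: bytes, value: bytes) -> None:
        # Writes only ever append, so old values stay in the file.
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(HEADER.pack(len(key), len(value)) + key + value)
        self.file.flush()
        self.index[key] = (offset + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.file.seek(offset)
        return self.file.read(length)

    def close(self) -> None:
        self.file.close()
```

Reopening the same file rebuilds the index, which is what makes the store persistent. Deletes, checksums, and compaction of stale records are left out of this sketch.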
found this super cool decade-old video of Solomon Hykes showing Docker to the public for the first time in a five-minute lightning talk at PyCon.
and then it changed everything...
📺
It is hard, but it is not impossible. All it takes is discipline, determination and dedication. You can do it!
I locked myself in a room, and I got this contribution graph in under two minutes using rockstar -
Paper: Cuckoo filters, an alternative to Bloom filters that improves on them in three ways:
1. support for deleting items dynamically
2. better lookup performance and
3. better space efficiency for applications requiring low false positive rates (ε < 3%)
📄
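The deletion support in point 1 comes from storing small fingerprints in buckets instead of setting shared bits. A minimal sketch of that idea, assuming 8-bit fingerprints, SHA-256 hashing, and a power-of-two bucket count (all illustrative choices, not parameters from the paper):

```python
import hashlib
import random


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count makes the XOR trick below reversible.
        assert num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        return _h(b"fp:" + item.encode()) % 255 + 1  # 1..255, never zero

    def _index(self, item: str) -> int:
        return _h(item.encode()) & self.mask

    def _alt(self, index: int, fp: int) -> int:
        # The other candidate bucket depends only on the index and fingerprint.
        return (index ^ _h(fp.to_bytes(1, "big"))) & self.mask

    def add(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Simplification: the last evicted fingerprint is dropped here.
        return False

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]

    def delete(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Deleting is only safe for items that were actually added; removing an item that was never inserted can evict another item's matching fingerprint.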
C is not just a bad language. It is a demonic instrument, that leaves scars on your soul that not even time can heal.
We had a small bug recently where one extension that works with SQLite stopped working with libSQL. One of our engineers,
@iavins
, spent a whole day debugging…
Yes, they did have a Christian Code of Conduct adapted from The Rule of St. Benedict. But it was controversial, and after two years they changed it to a Code of Ethics
“Thread per core” arguments conflate two stories:
- I have carefully optimized my tasks to guarantee balanced work among threads and therefore can avoid the synchronization costs of work stealing
- wahh wahh wahh sharing state in a thread safe way is hard I don’t wanna
I tried to learn C earlier but couldn’t make progress. C is everywhere, from systems stuff to databases. SQLite is in C, Postgres is in C!
I am picking it up again and hoping to understand the internals. If you have any advice, do share!
I would like to understand write-amplification in B Tree vs LSM Tree. Is there any survey/research paper explaining the same? or a blog post?
I found a few articles online, but none covered all the aspects and missed nuance. e.g. batching in LSM tree, WAL in B tree etc.
@iavins
If you want something that’s faster to construct than XOR / Binary Fuse Filters (e.g. if write-intensive), and simple to intuit/implement as a middle ground, plus significantly faster than Bloom Filters…
…then take a look at the Split Block Bloom Filter in Apache Impala, which…
Next paper - 📄 Looking Ahead Makes Query Plans Robust
Query optimisation is an NP-Hard problem. So, people find ways to make query execution better. In this paper, the researchers introduce Lookahead Information Passing where they use Bloom Filters for faster execution 1/3
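The general idea of using a Bloom filter to discard probe rows before they reach a hash join can be sketched as below. This is not the paper's Lookahead Information Passing algorithm, only a simplified single-join illustration; the function names, the filter size, and the use of a Python integer as the bit array are assumptions.

```python
import hashlib


def _positions(key, num_bits, num_hashes=3):
    # Salt one hash function with an index to get several bit positions.
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % num_bits


def bloom_join(build_rows, probe_rows, num_bits=1024):
    """Hash join of (key, payload) rows with a Bloom filter pre-check."""
    bloom = 0  # a Python int used as a bit array
    table = {}
    for key, payload in build_rows:
        table.setdefault(key, []).append(payload)
        for pos in _positions(key, num_bits):
            bloom |= 1 << pos

    results = []
    skipped = 0
    for key, payload in probe_rows:
        # Any unset bit proves the key is absent from the build side.
        if any(not (bloom >> pos) & 1 for pos in _positions(key, num_bits)):
            skipped += 1
            continue
        for match in table.get(key, []):
            results.append((key, match, payload))
    return results, skipped
```

In this toy version the filter and the hash table live side by side, so nothing is saved. The benefit appears when the cheap filter check can run much earlier than the join itself, for example while scanning the larger table.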
I am trying out Fly's distributed systems coding challenge. But my code was so bad that it crashed Jepsen itself 💀
But hey, achievement unlocked I guess?
When Jepsen breaks, it also says sorry politely.
Those are just extremes. Due to the scaling difficulties of Postgres/MySQL, people made Citus/Vitess.
Also, many NoSQL databases today are ACID compliant. They aren’t mutually exclusive, NoSQL + ACID is perfectly fine.
Run away from system design instructors who say, "Use a NoSQL database because SQL doesn't scale" 🤦‍♂️
It shows the sheer immaturity and lack of practical experience of the instructor because crude generalization doesn't exist in computer science and software engineering. Always…
A few days ago,
@penberg
posted about the Hekaton MVCC paper. I hit a wall implementing it.
I tried my best to figure it out and kept doubting myself because I did not know that papers could have errors!
Later, I confirmed the error with the authors:
I enjoyed reading this post by Philip O’Toole, which goes through the journey, design and implementation of rqlite (which came out nine years ago!)
It details consensus algorithm upgrades, scaling read performance, reducing disk usage etc.
TIL Verus
> Verus is a tool for verifying the correctness of code written in Rust. Developers write specs of what their code should do, and Verus statically checks that the executable Rust code will always satisfy the specifications for all possible executions of the code
Reminder that P99 Conf is happening today and tomorrow.
They have some excellent talks lined up.
Looking forward to the talks by
@sarna_dev
,
@glcst
,
@gwenshap
, and matklad (
@TigerBeetleDB
)🚀
This is an excellent and informative post by
@boyter
on how to start a Go project in 2023
Covers initial setup, testing, profiling, linters and other tooling:
Next up in Systems Distributed premieres: join
@alanamarzoev
and TigerBeetle live in the chat as Alana shares what makes caching so hard in application development!
Monday April 17th at 4PM UTC
This fantastic paper covers the historical perspective, how it fares with analytics workloads, comparisons with
@duckdb
and some super cool tricks to make SQLite faster.
Join me! - 2/2
I watched Andy's lectures on Transactions and found this crazy story of how a bitcoin exchange lost 896 BTC in a day because they used Mongo, which did not support transactions (at that time)! They shut down soon after 😲😲
Thank you to everyone who showed up, and on a Friday evening at that. I had great fun discussing with you all.
I met many amazing people and had great discussions; please DM / email me to continue the conversation.
This was my first ever public talk; any feedback is welcome.
Language I dislike: Rust
Language I begrudgingly respect: Rust
Language I think is overrated: Rust
Language I think is underrated: Rust
Language I love: Rust
Language I hate: Rust
Language I dream of writing in: Rust
Language I dislike: Bash
Language I begrudgingly respect: COBOL
Language I think is overrated: Brainfuck
Language I think is underrated: Java
Language I like: Rust
Language I love: C
Language I dream of writing in: My own (some day)
I no longer have a job!
I was part of a big layoff at
@PlanetScale
today. I'll miss working there, it was a great experience.
Feeling bummed, kinda embarrassed, but also slightly optimistic?
Trying to figure out what's next. I'd love to hear any ideas ❤️
The book has taught me so much and I still learn something new every time. The perfect book for a database fan boi like me.
I wish more books like this came out, and that databases got the same love as compilers
When I wrote Database Internals back in ~2018, my main goal was to make the field more approachable and less intimidating. Everyone should be welcome to enter, there's so much work to do here!
Over the last year, so many people have read it. I've seen at least 3 book reading…
Writing a Simple Garbage Collector in C
This seems approachable; it covers implementing a memory allocator like malloc and a mark-and-sweep garbage collector. For simplicity, it is a stop-the-world collector, meaning your code halts while the GC is running
🔗
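The mark-and-sweep idea itself fits in a few lines. The sketch below models a heap as a list of Python objects rather than raw memory, so it shows the algorithm and not the article's C implementation; `Obj` and `Heap` are hypothetical names.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other objects
        self.marked = False


class Heap:
    def __init__(self):
        self.objects = []    # everything ever allocated and not yet freed
        self.roots = []      # stand-in for stack and global variables

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        """Stop-the-world collection; returns the number of freed objects."""
        # Mark: flag everything reachable from the roots.
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if obj.marked:
                continue
            obj.marked = True
            stack.extend(obj.refs)
        # Sweep: keep marked objects, drop the rest, and reset the marks.
        live = [obj for obj in self.objects if obj.marked]
        freed = len(self.objects) - len(live)
        for obj in live:
            obj.marked = False
        self.objects = live
        return freed
```

Reachability from the roots, not reference counts, decides what survives, so unreachable cycles are collected too.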
The 2nd ed of The Garbage Collection Handbook: The Art of Automatic Memory Management is available to preorder. I have yet to read the first edition (1996), but some people have recommended this to me
If writing a garbage collector is your jam, then get it!
While I was writing the GPU article, I had to review Little's law, which comes from queueing theory. I came across an old book that explains queueing theory in the context of computer systems.
It is not too math-heavy; you can read through it without too many equations. As it's…