📝 Public change event streams (think
#Debezium
) should be evolved with forward compatibility in mind. Find out why, and how to do so with the help of data contracts created via
#ApacheFlink
, in my latest blog post 👇.
"Change Data Capture Breaks Encapsulation". Does it, though? 🤔
New post on the blog: Learn how to create data contracts for your change event streams and evolve them with compatibility in mind. By
@gunnarmorling
.
What to focus on during a code review? Don't waste your time with automatable formalities like code style. Rather spend your review budget on those aspects which will be hard/expensive to change later on. The "Code Review Pyramid" provides some guidance on what to look for.
"Good code documents itself" is one of the most damaging takes in software engineering. Code itself won't tell you about the decisions and rationale behind it, nor about higher-level structures and abstractions. All this needs elaboration in documentation.
📢 "The One Billion Row Challenge"
How fast can YOU aggregate 1B rows using modern
#Java
? Grab your threads, flex your SIMD, and kick off 2024 true coder style by joining this friendly little competition. Submissions accepted until Jan 31.
👉
⏱️ Just ten more days until the release of
@java
17, the next version with long-term support! To shorten the waiting time a bit, I'll do one tweet per day on a cool feature added since 11 (previous LTS), introducing just some of the changes that make the upgrade worthwhile. Let's go 🚀!
🗣️ "Distill years of Java experience down to a set of best practices that help developers build high-quality Java applications and libraries"
Lots of good advice for
#Java
developers on this site by
@JonathanGiles
👍.
"Google Best Practices for
@java
Libraries"
Lots of great advice in these docs for library authors, e.g. to have a well-defined minimal public interface, avoid split packages, and much more. That's a really useful resource 👍!
"Why Kafka Is so Fast"
Sequential I/O, zero-copy, async disk flushes, and other design choices of
#ApacheKafka
, very well explained in this post by Emil Koutanov (
@IsTheArchitect
). Excellent read if you want to dive a bit into the details of Kafka 👍.
If you have a
#Maven
parent POM for your org or project, here's an enforcer rule to put into it which will ban any current or future usage of vulnerable
#log4j2
versions.
📢 Some personal news: after ten years, it's my last week at
@RedHat
!
Feeling blessed to have had the opportunity to work with and learn from this world-class team, help to build amazing open-source projects and communities, and travel the world. Thanks for everything!
If there's one thing I dislike about our industry, it's its tendency to dogmatism: "100% TDD", "ORM sucks!", "That's not truly RESTful", etc. Reality is always nuanced, and believing in simple "truths" won't get you very far.
I wish more library maintainers would follow a (close to) zero dependencies policy. Libs should never depend on stuff like Guava, kotlin-stdlib, or logger implementations. Yes, it means less comfort for yourself, but your users will be grateful.
"pgroll: zero-downtime, reversible, schema migrations for
#Postgres
"
Wow, very refreshing take on schema updates: using updatable views to expose old and new schema in parallel, until all app instances have been migrated. Great idea!
"SQLite: Past, Present, and Future"
Excellent paper by Kevin P. Gaffney, D. Richard Hipp et al., touching on the history and design of
#SQLite
, some perf benchmarks, its suitability for OLAP workloads, etc.
"Distributed transaction patterns for microservices compared"
Excellent in-depth write-up on approaches for coordinating data changes across multiple services, also touching on the outbox pattern with
#Debezium
, by
@bibryam
. Great stuff!
Splitting a monolith into microservices with their own data stores means giving up on foreign keys for ensuring referential integrity. Readworthy post on this problem and possible solution strategies, by
@stephennimmo
.
When I'm interrupted during some coding work (lunch break, etc), I usually add some quick code which won't compile. I found that's the quickest reminder for where I left off when getting back. Anyone else doing that?
As a software engineer, there's no way to understand 100% of everything; you need to identify key areas where you build up deep expertise while having good enough knowledge of the rest. Being conscious about this becomes only ever more important as you progress in your career.
🗣️ "There’s no magic behind CQRS and Event Sourcing. Before starting your journey it is crucial to understand the many impacts of the two patterns"
Nice read about employing ES and CQRS in the domain of air traffic management, by
@teivah
.
👇 And here's another one which should be on every data engineer's bookshelf. While I may be biased a teeny-tiny bit, I highly recommend it to folks looking for a pragmatic hands-on guide on building realtime analytics systems, with tech like Kafka, Debezium, and Pinot.
Not buying too many IT books on paper these days, but when I do, it is about building data systems 🤓! It's also one of the benefits here at
#Decodable
: everyone gets a generous yearly betterment budget, so you can just buy things that help you do the job, no questions asked 💯.
"Real-time Data Infrastructure at Uber"
Apache Kafka for streaming storage, Flink for stream processing, Pinot for OLAP, HDFS for archival storage, Presto for interactive queries -- Nice rundown of
@UberEng
's data infra, by Yupeng Fu and
@ChinmaySoman
.
🧵 "How does Apache Flink compare to Kafka Streams?"
Both do stream processing, but differ in some important aspects. A few folks asked me about this recently, so I thought I'd share some thoughts. This is from a user's perspective, not touching on implementation details. 1/10
🗣️ "This article focuses on how we leveraged open source technology to build Uber's first 'near real-time' exactly-once events processing system"
Nice write-up about
@ApacheFlink
,
@apachekafka
, and
@ApachePinot
at work at
@UberEng
👍.
🎉 Living the stream -- I am thrilled to share that I've joined
@Decodableco
as a software engineer! The world is real-time, and so should be your data. Beyond excited to help build the real-time data platform for everyone 🚀!
📝 Blogged: "The JDK Flight Recorder File Format"
Spent some time with a debugger and a hex editor for exploring the format of
#JFR
recordings, and how it efficiently stores large numbers of events.
#Java
(link to SVG version of the image inside)
"6 Event-Driven Architecture Patterns"
A two-part series by
@NSilnitsky
about some patterns that proved useful for building robust distributed systems at
@WixEng
.
📢 Thrilled to share my article "Saga Orchestration for
#Microservices
Using the Outbox Pattern" is up on
@InfoQ
!
Discussing how to implement Sagas safely and reliably using change data capture and
#Debezium
, with services connected via
#ApacheKafka
.
Before trash-talking someone else's code, always keep in mind they probably tried to do the best they could given their specific knowledge and context of the situation back then.
Got asked how stream processing platforms (e.g. Apache Flink, Kafka Streams, Spark Structured Streaming) compare to streaming databases (e.g. RisingWave, Materialize, PranaDB). There's some overlap and similarities, but also differences. Here are some aspects which may help 1/10
Quick 🧵 on what's "Head-of-Line Blocking" in
@apachekafka
, why it is a problem, and what some mitigation strategies are.
Context: Records in Kafka are written to topic partitions, which are read sequentially by consumers. To parallelize processing, consumers can be organized in
🔟 Ambiguous null pointer exceptions were a true annoyance in the past. Not a problem any longer since Java 14: Helpful NPEs (JEP 358) now show exactly which variable is null. A very nice improvement to
#OpenJDK
, previously available only in SAP's JVM.
🧵 If you run
@apachekafka
in production, creating clusters, topics, connectors etc. by hand is tedious and error-prone. Better rely on declarative configuration which you put into revision control and apply in an automated way,
#GitOps
-style. Some tools which help with that:
Whenever I see folks complaining about the
#Java
language copying things that "other languages had years ago", I think there's a misunderstanding. Picking up approaches that worked elsewhere (and leaving out those that didn't!) is a core idea and key part of Java's success.
#Java
coding tip: when you catch an exception, either log it, *or* re-throw it, but don't do both at once. Only the final handler of an exception (ultimately, the uncaught exception handler) should log it, in order to avoid duplicated log entries.
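A minimal sketch of that tip, with made-up names (`OrderService`, `OrderLookupException`, and the in-memory `LOG` list standing in for a real logger are all assumptions for illustration): the intermediate layer adds context and re-throws, and only the outermost handler logs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type, used only for this illustration.
class OrderLookupException extends RuntimeException {
    OrderLookupException(String message, Throwable cause) {
        super(message, cause);
    }
}

class OrderService {

    // Stand-in for a real logger, so the sketch has no dependencies.
    static final List<String> LOG = new ArrayList<>();

    // Low-level operation which fails.
    static String loadOrder(long id) {
        throw new IllegalStateException("connection refused");
    }

    // Intermediate layer: adds context and re-throws, but does NOT log.
    static String findOrder(long id) {
        try {
            return loadOrder(id);
        } catch (IllegalStateException e) {
            throw new OrderLookupException("Could not load order " + id, e);
        }
    }

    // Final handler: the only place where the exception gets logged.
    static String handleRequest(long id) {
        try {
            return findOrder(id);
        } catch (OrderLookupException e) {
            LOG.add(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
            return "error";
        }
    }
}
```

One failure yields exactly one log entry, and that entry still carries the context added along the way via the exception's cause chain.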
Woot, the latest
@OpenJDK
14 early-access build brings the long-awaited
@java
Records (JEP 359) as a preview feature 🎉!
Couldn't wait to give it a try, and first impression is great. Liking the compact constructor style.
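For reference, a small sketch of the compact constructor style as it looks in the final version of records (Java 16+); the `Measurement` record and its components are made up for illustration.

```java
// Hypothetical record. The compact constructor has no parameter list;
// it validates and normalizes the components before they are assigned.
record Measurement(String station, double celsius) {
    Measurement {
        if (station == null || station.isBlank()) {
            throw new IllegalArgumentException("station must not be blank");
        }
        station = station.strip();
    }
}
```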
📢
#1BRC
—The Results Are In!
Oh what a ride it was. A big congrats to the winners
🥇
@thomaswue
/
@quananh1999
/
@TheMukel
🥈 Artsiom Korzun
🥉
@jerrinot
and everyone else on the leaderboard! What a great experience. Complete results and some stats:
👉
"How to (and how not to) design
#REST
APIs"
Some of these rules are probably more agreeable than others (e.g. not quite buying into using strings for all ids), but overall a very reasonable set of guidelines, by
@jeffschnitzer
.
When looking for ways for documenting your software architecture, check out the excellent arc42 template by Gernot Starke and Peter Hruschka; licensed under CC BY-SA 4.0, it touches on many critical aspects like general goals and constraints, ADRs, etc.
A fair share of software design is professional procrastination: keeping options on the table, doors open for as long as you can, in order to avoid locking yourself into a corner by committing too early to abstractions which turn out poor later on.
"Understanding Request Latency with Profiling"
Absolute must-read post by
@richardstartin
about the differences between CPU and wall-clock time profiling, applying both to find the performance bottleneck in a practical example.
String templates in
@java
21 (JEP 430) are going to be a game changer. Not only can you embed any kind of Java expression within templates, you also can define your own template processors, e.g. to return a JSON object node or a query object. Noice!
Regular reminder: when it comes to persistent state, think about scaling up before scaling out. Set up an RDBMS on a decently sized machine, have someone make sure the queries you run aren't too bad, and you'll get quite far without the headaches of distributed state.
📢 Blogged: "Getting Started With Java Development in 2023"
An opinionated guide for folks new to
#Java
, with recommendations on which version to use, what's the right build tool and IDE, etc. Happy Java programming 🚀!
"How to Share Data Between Microservices on High Scale"
Nice overview on microservices data exchange approaches by Shiran Metsuyanim of
@fiverr
, also discussing how publishing events via Kafka decouples their services.
Most
@java
record examples center around their usage as named tuples with multiple attributes. But records are also great for representing domain-specific types with just one value. Always guaranteed to be valid, more type-safe application logic, easy identification of any usages.
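A rough sketch of such single-value records, assuming two hypothetical id types (`CustomerId`, `OrderId`) and a made-up `Orders.describe()` method:

```java
// Hypothetical single-value domain types; an instance can only exist
// if it passed validation in the compact constructor.
record CustomerId(long value) {
    CustomerId {
        if (value <= 0) {
            throw new IllegalArgumentException("customer id must be positive");
        }
    }
}

record OrderId(long value) {
    OrderId {
        if (value <= 0) {
            throw new IllegalArgumentException("order id must be positive");
        }
    }
}

class Orders {
    // With two plain long parameters, swapping the arguments would compile
    // just fine; with distinct types the compiler rejects the mix-up.
    static String describe(CustomerId customer, OrderId order) {
        return "order " + order.value() + " of customer " + customer.value();
    }
}
```

Searching for usages of `CustomerId` then finds every place in the code base that deals with customer ids, which is much harder with bare `long` values.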
Avoid calling Thread::sleep() in tests to account for some async activity as much as you can. It's brittle and needlessly extends the duration of tests. Rather use tools like Awaitility to explicitly wait until some specific condition has been met.
Hey conference organizers: could you please *always* have unique descriptive URLs for each talk on the agenda?! No JavaScript, no pop-ups, no UUIDs. Let folks share links to the talks they're excited about 🙏!
🧑⚕️ Upgrading your
@java
version is like going to the dentist: skip for too long, and the next encounter may be painful and unpleasant. Do it every six months and you'll hardly notice.
"How Query Engines Work"
Ever wanted to build your own query engine? Then check out this guide by
@andygrove_io
, touching on all the relevant parts like query parsing, planning, optimizing, etc.
📢 Just blogged: "Ten Tips to Make Conference Talks Suck Less"
Some thoughts on things which are easy to get wrong during a talk but which can make a huge difference for how it's received.
Woot, support for helpful NullPointerExceptions (JEP 358) has landed in the
@java
14 early access builds! Opt-in for now (-XX:+ShowCodeDetailsInExceptionMessages), planned to be enabled by default in JDK 15. Loving this, great improvement 🤩!
#OpenJDK
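A small sketch for trying this out, using made-up `Customer` and `Address` types. On JDKs where helpful NPE messages are active, the returned message names the invocation that failed and the expression that was null; on JDKs without the feature, or with it switched off, the message is typically just null.

```java
class HelpfulNpeExample {

    // Hypothetical types for illustration.
    record Address(String city) {}
    record Customer(Address address) {}

    // Triggers an NPE in a call chain and returns the exception message.
    static String npeMessageFor(Customer customer) {
        try {
            return customer.address().city().toUpperCase();
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }
}
```

Calling `HelpfulNpeExample.npeMessageFor(new HelpfulNpeExample.Customer(null))` shows what the message looks like when `address()` returns null.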
Might be unpopular (or not?), but I still prefer Docker Compose over K8s by far for most demos or examples which folks should run on their local machines. Much easier to get started with, allowing you to focus on the things you actually want to show and teach.
@DarrenBaldwin03
Oooh, one of my favorite topics ;) The purpose of ORM isn't to abstract SQL. Rather it's mapping result sets to objects, and (for some) dirty checking of managed object graphs and synchronizing changes efficiently to the database.
Always wanna cry when reading such comments. The primary purpose of Hibernate ORM is not to "do SQL for you", but to map result sets to object graphs, track changes, and efficiently sync those changes back to the DB. You can, and absolutely must, stay in control of emitted SQL.
📢 Blogged: "
#Loom
and Thread Fairness"
Taking
@java
19's Project Loom for a spin, I learned about an interesting aspect to the scheduling of CPU-bound workloads on virtual threads.
🎉 Very nice, JEP 413 ("Code Snippets in Java API Documentation") has landed in
@java
18 EA! This is going to be a huge improvement for API authors and users. Some first impressions after a quick test with the
#JavaDoc
of the Bean Validation API. 1/4
After working for a year now with code using Java's `var` keyword, it's just not clicking for me. Sure, in my IDE the compiler will infer the type, but for instance when reviewing code on GitHub, I feel it often makes things harder to grok.
One common technique used by many
#1BRC
solutions is SWAR ("SIMD within a register"). It may look like magic at first, but with a bit of time, it's actually not too difficult to understand. Great post by
@lemire
which explains the idea.
👉
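To give a flavor of SWAR, here's a minimal sketch of one well-known trick: finding a given byte (say, a `;` separator) in eight bytes at once using plain `long` arithmetic. The class and method names are made up, and this isn't taken from any particular #1BRC entry.

```java
class Swar {

    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Returns the index (0-7, counting from the least significant byte) of
    // the first byte in 'word' equal to 'target', or -1 if there is none.
    static int firstIndexOf(long word, byte target) {
        long pattern = (target & 0xFFL) * ONES; // broadcast target into all 8 bytes
        long x = word ^ pattern;                // matching bytes become 0x00
        // Classic "has zero byte" test: sets the high bit of each zero byte.
        // Bytes above the first match may be flagged spuriously due to
        // borrows, but the lowest flagged byte is always a real match.
        long match = (x - ONES) & ~x & HIGH_BITS;
        return match == 0 ? -1 : Long.numberOfTrailingZeros(match) >>> 3;
    }

    // Test helper: packs an 8-character ASCII string into a long, with the
    // first character in the least significant byte (little-endian).
    static long pack(String eightChars) {
        long word = 0;
        for (int i = 0; i < 8; i++) {
            word |= (eightChars.charAt(i) & 0xFFL) << (8 * i);
        }
        return word;
    }
}
```

Instead of a loop with eight comparisons and branches, the search boils down to a handful of arithmetic instructions on a single register.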
On microservices🆚monoliths, it's not that one always beats the other. You can build tightly coupled, intermingled microservices and well modularized, maintainable monoliths -- and the other way around. Understand the implications of both + choose what makes the most sense for your case.
When struggling to get the brackets right in List<Map<String, List<String>>>, it's not a problem with generics or the language, but rather a hint for weak modelling. Those lists and maps are concepts of your domain waiting to be expressed as classes encapsulating state and logic.
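As a rough illustration, assume the inner `Map<String, List<String>>` maps product names to their tags (a made-up domain; `Tags` and `Catalog` are hypothetical names). Naming the concepts turns the outer type into a plain `List<Catalog>` and gives the logic a home:

```java
import java.util.List;
import java.util.Map;

// Hypothetical domain types replacing the nested collections.
record Tags(List<String> values) {
    Tags {
        values = List.copyOf(values); // defensive, unmodifiable copy
    }

    boolean contains(String tag) {
        return values.contains(tag);
    }
}

record Catalog(Map<String, Tags> tagsByProduct) {
    Catalog {
        tagsByProduct = Map.copyOf(tagsByProduct);
    }

    // Logic which would otherwise be scattered across all the call sites.
    List<String> productsTagged(String tag) {
        return tagsByProduct.entrySet().stream()
                .filter(entry -> entry.getValue().contains(tag))
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```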
The "--release" compiler option new in
@java
9 is super-useful: finally you can safely compile for older versions (6, 7, 8) without accidentally referencing newer APIs. No external tools or separate JDKs needed, as the compiler has full signatures of the earlier platform APIs.
Working on a new blog post about custom
@Java
Flight Recorder events and feeding them to Prometheus/Grafana using the new JFR Event Streaming API. Having both live monitoring/alerting and recording files for offline analysis is a really powerful combo.
"1,133 changed files with 95,870 additions and 8,270 deletions"
That's the commit which integrates
#Loom
into
#OpenJDK
. Deepest respect not only to those involved in creating it, but also to those who reviewed these 104k changed lines 🤯.
When designing an API, start with writing up fake client code against it, then incrementally define those methods in the API. This helps you to think from the user's perspective and make sure they can achieve their goals with the API, rather than coming up with an API in vacuum.
📢 Starting your journey with
#ApacheKafka
and looking for some orientation?
Then we (
@hpgrahsl
and I) got something for you: "A Great Day Out With... Apache Kafka", a curated collection of related resources. Enjoy, and safe travels in streaming land!
🔗
As I'm just seeing this bizarre take again: don't let anyone shame you for using a debugger, it's one of the most powerful tools in the box to understand the runtime behavior of programs. Use what gets the job done, not what someone on the internet finds aesthetically pleasing.
"Keep your cache always fresh with
#Debezium
!"
📢 The slides and demo from my
#Current22
talk are online now. Video recording should follow in a bit.
🖥️
🤖
Unused
#ApacheKafka
topics can become a real problem on production clusters, causing network, CPU, and memory overhead. Nice write-up from
@LinkedInEng
about their tooling for identifying and removing empty topics. Anyone aware of something OSS for this?
If you're using
@java
in your job, I highly recommend spending some time poking around a bit in the Java Language Specification. It is written in a really comprehensible way, and it'll widen your understanding of the language for sure.
"Improving
#JVM
Warm-up on
#Kubernetes
"
Vikas Kumar explains why you should not run your
#Java
applications with a fixed quota of a single CPU core. Instead, use Burstable QoS to allow for increased CPU usage during start-up.
@unclebobmartin
Sad to see you on this downwards spiral of increasingly provocative hot-takes, Bob. Remember, you don't have to go down this path. You can stop at any time and use your reach to be the catalyst for a positive change.
If you have been away from
#Java
for a while, check out the Java Playground. It's the simplest way to explore recent language additions like records, text blocks, or switch expressions--right from within your browser ☕.
Very honoured and grateful for being named a Java Champion!
#Java
is a centerpiece of my professional life, and I can't begin to express how thankful I am for being part of its outstanding community and getting to know and being inspired by so many fantastic people!
Wow, that's a really cool experiment by
@ozangunalp
:
#ApacheKafka
compiled to native binary via
#Quarkus
and
#GraalVM
. A single broker node up and running on my Mac M1 in ~130ms w/ 60MB RSS. This should come in really handy e.g. for testing Kafka apps!
🐘 For CDC users, one of the most exciting features in
#Postgres
version 16 is the support for logical replication from stand-by servers (a.k.a. read replicas). I wrote a two-part series about this:
Part 1⃣ 👉
Part 2⃣ 👉
1/4
🤓 What's your
@java
API secret tip? I.e. a type or method from the JDK, which isn't widely known yet very useful, and which you think more people should be aware of? My vote goes for Desktop.moveToTrash(). What's yours?
"Events: Fat or Thin"
Enjoyed reading this discussion of event design philosophies: fully self-contained, only containing delta information, only containing a reference for retrieving more context out of band, etc 👍. By Satjinder Bath.
"Normalize as much as possible, denormalize as much as necessary". Got this advice on data modelling from a senior developer in my first job, and I'm still thankful to them for that.
A nice little gem was added in
#Java
16 b27: Stream.toList(). It's not only more concise compared to collect(Collectors.toList()), but also more efficient -- in particular with parallel() -- as it avoids result copying. Benchmark is a few simple ops on 100K elem stream of Long.
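A quick side-by-side sketch of the two variants (class and method names are made up; this is not the benchmark code). One difference worth knowing: the list returned by `Stream.toList()` is unmodifiable, whereas `Collectors.toList()` makes no guarantee about the mutability of its result.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class ToListExample {

    // Java 16+: concise, returns an unmodifiable list.
    static List<Long> evenSquares(long upTo) {
        return LongStream.rangeClosed(1, upTo)
                .boxed()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .toList();
    }

    // The pre-16 way, going through a collector.
    static List<Long> evenSquaresWithCollector(long upTo) {
        return LongStream.rangeClosed(1, upTo)
                .boxed()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .collect(Collectors.toList());
    }
}
```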
As it came up more than once in
#1BRC
entries, I just had to get my own copy of "Hacker's Delight". Kudos to the betterment budget we have at work, making this an easy decision. Let the bit fiddling begin 🤓!
If you have even the vaguest interest in JVM performance topics, check out
@richardstartin
's blog. It's a gold mine of deep and insightful posts on that topic 👇.
One data architecture I expect we'll see more in 2023 is
#SQLite
/
#DuckDB
deployed as caches at the edge, updated via change feeds from system-of-record: stellar read performance due to close local proximity to users and fully queryable data models tailored for specific use cases.
As for Java build systems, Maven is the right default choice for most teams. Gradle lets you go way further and has many advantages, e.g. around build avoidance, but in the absence of someone who truly groks how Gradle works (which takes time!), it's too easy to shoot yourself in the foot.
LOL, saw Apache Kafka being referred to as "legacy platform" today in a start-up's Series A announcement. By all means, be ambitious; improve, innovate, invent. But let's also keep things somewhat real. Such hyperbolic messaging doesn't speak in your favour.