JEPSEN

Distributed Systems Safety Research

About Jepsen

Jepsen aims to improve the safety of distributed databases, queues, consensus systems, etc. We maintain an open source library for safety testing, and publish free, in-depth analyses of specific systems. In each analysis we explore whether the system lives up to its documentation’s claims, file new bugs, and suggest recommendations for operators. In addition to paid analysis, Jepsen offers technical talks, training classes, and consulting services.

Jepsen pushes vendors to make accurate claims and test their software rigorously, helps users choose databases and queues that fit their needs, and teaches engineers how to evaluate distributed systems correctness for themselves.


News

Recent research, analyses, and announcements.

Upcoming Events

2025-03-03

By popular demand, we’re offering another open session of the distributed systems fundamentals class: four half-days discussing the basics of distributed systems theory and practice. For the first time we’re also opening up the accompanying workshop, for up to five participants. Please join!

Jepsen also has two conference talks coming up: one at BugBash 2025 (April 3-4), in Washington, DC, and a second at Systems Distributed 2025 (June 19-20), in Amsterdam.

Antithesis, Buf, and Jepsen are running a joint webinar on December 5th, 2024. We’ll discuss a Kafka protocol safety issue, talk about the challenges of distributed systems testing, and show how Jepsen and Antithesis helped identify critical safety errors in Bufstream. Come watch Antithesis pause, rewind, and explore a running Bufstream cluster in an interactive debugging shell!

Bufstream 0.1.0

2024-10-24

Jepsen worked with Buf to analyze the safety of Bufstream, a Kafka-compatible streaming system. We found three safety and two liveness issues in Bufstream 0.1.0, including the loss of acknowledged writes in healthy clusters. These problems were resolved by version 0.1.3. We also discovered serious issues in Kafka’s transaction protocol, including write loss, aborted read, and torn transactions. These problems affect Kafka, Bufstream, and (presumably) other Kafka-compatible systems; they remain outstanding.

In late 2023 we reported that MySQL and MariaDB’s REPEATABLE READ did not, in fact, provide repeatable reads. The MariaDB team has been hard at work this past year. They’ve added a new flag, --innodb-snapshot-isolation=true, which causes REPEATABLE READ to prevent Lost Update, Non-repeatable Read, and violations of Monotonic Atomic View.

Jepsen has not yet tested this, but it looks like MariaDB might, with the new flag enabled, offer Snapshot Isolation at REPEATABLE READ.
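For readers unfamiliar with Lost Update, here is a minimal Python sketch of the anomaly under assumed simplifications. Two transactions each read a counter, add 1, and write the result back. The `Store` class, its methods, and the `first_committer_wins` parameter are hypothetical. The conflict check is a simplified first-committer-wins rule, a common way snapshot-isolation systems prevent Lost Update. It does not describe how MariaDB's `--innodb-snapshot-isolation` flag is actually implemented.

```python
# Hypothetical toy model of Lost Update; not MariaDB's implementation.

class ConflictError(Exception):
    """Raised when a write conflicts with a concurrently committed write."""


class Store:
    def __init__(self, value=0):
        self.value = value
        self.version = 0  # incremented on every committed write

    def begin(self):
        # Each transaction reads from a snapshot taken when it begins.
        return {"snapshot": self.value, "read_version": self.version}

    def commit(self, txn, new_value, first_committer_wins):
        # With first_committer_wins, reject the write if another transaction
        # committed after this one took its snapshot (simplified rule).
        if first_committer_wins and self.version != txn["read_version"]:
            raise ConflictError("concurrent update detected")
        self.value = new_value
        self.version += 1


def run(first_committer_wins):
    store = Store()
    t1 = store.begin()
    t2 = store.begin()  # both transactions see value 0
    store.commit(t1, t1["snapshot"] + 1, first_committer_wins)
    try:
        store.commit(t2, t2["snapshot"] + 1, first_committer_wins)
    except ConflictError:
        return store.value, "t2 aborted"
    return store.value, "both committed"


print(run(first_committer_wins=False))  # (1, 'both committed'): one increment lost
print(run(first_committer_wins=True))   # (1, 't2 aborted'): client must retry t2
```

Without the check, two successful increments leave the counter at 1, so one update is silently lost. With the check, the second transaction aborts instead, and the client can retry it against a fresh snapshot.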

Jepsen’s distributed systems training introduces engineers and operators to the fundamentals of nodes and networks, consistency and availability, techniques for replicating state, a slew of design patterns, and production concerns. By popular request, we’re offering a special session of this class that anyone can register for. Join us on Zoom, December 16th through 19th, 2024. Tickets are on sale now.
