JEPSEN

Distributed Systems Safety Research

About Jepsen

Jepsen aims to improve the safety of distributed databases, queues, consensus systems, etc. We maintain an open source library for safety testing, and publish free, in-depth analyses of specific systems. In each analysis we explore whether the system lives up to its documentation’s claims, file new bugs, and offer recommendations for operators. In addition to paid analysis, Jepsen offers technical talks, training classes, and consulting services.

Jepsen pushes vendors to make accurate claims and test their software rigorously, helps users choose databases and queues that fit their needs, and teaches engineers how to evaluate distributed systems correctness for themselves.

News

Recent research, analyses, and announcements.

Antithesis, Buf, and Jepsen are running a joint webinar on December 5th, 2024. We’ll discuss a Kafka protocol safety issue, talk about the challenges of distributed systems testing, and show how Jepsen and Antithesis helped identify critical safety errors in Bufstream. Come watch Antithesis pause, rewind, and explore a running Bufstream cluster in an interactive debugging shell!

Bufstream 0.1.0

2024-10-24

Jepsen worked with Buf to analyze the safety of Bufstream, a Kafka-compatible streaming system. We found three safety and two liveness issues in Bufstream 0.1.0, including the loss of acknowledged writes in healthy clusters. These problems were resolved by version 0.1.3. We also discovered serious issues in Kafka’s transaction protocol, including write loss, aborted read, and torn transactions. These problems affect Kafka, Bufstream, and (presumably) other Kafka-compatible systems; they remain outstanding.

In late 2023 we reported that MySQL and MariaDB’s REPEATABLE READ did not, in fact, provide repeatable reads. The MariaDB team has been hard at work this past year. They’ve added a new flag, --innodb-snapshot-isolation=true, which causes REPEATABLE READ to prevent Lost Update, Non-repeatable Read, and violations of Monotonic Atomic View.

Jepsen has not yet tested this, but it looks like MariaDB might, with the new flag enabled, offer Snapshot Isolation at REPEATABLE READ.

Jepsen’s distributed systems training introduces engineers and operators to the fundamentals of nodes and networks, consistency and availability, techniques for replicating state, a slew of design patterns, and production concerns. By popular request, we’re offering a special session of this class that anyone can register for. Join us on Zoom, December 16th through 19th, 2024. Tickets are on sale now.

Jepsen 0.3.6

2024-10-17

Jepsen 0.3.6 is now available on GitHub and Clojars. This is a sizeable release. It includes a significant correctness fix for a rare bug that could cause operations in the history to print with the wrong data. It also adds a new namespace for composing databases, nemeses, and generators when working with systems where each node has a different role. Kafka-style tests gain new powers and are significantly faster. And we have the usual slew of small bugfixes, dependency bumps, and quality-of-life improvements. Happy testing!

The full release notes are available on GitHub.
