Today marks the end of a very long journey for RavenDB as we release version 6.0 into the wild. This completes a multi-year effort on our part, which I’m really excited to share with you.
In version 6.0 the RavenDB team had a number of goals, with the primary one being, as always, providing a simple and powerful platform for your data and documents.
The full list of changes in the release is here and you can already download the binaries of the new release to try it out.
Below you’ll find more information about some of the highlights for the new release, but the executive summary for the technically inclined is:
- Deep integration with Kafka & RabbitMQ - allowing you to send and consume messages directly from RavenDB.
- Sharding support allows you to run massively large databases easily and cheaply.
- Corax is a new indexing engine for RavenDB that can provide a 10x performance improvement for both indexing and queries.
With this new release we are also announcing a pricing update for our commercial customers. You can find more details at the bottom of this post.
Before we get to the gist of the changes, I want to spend a minute talking about this release. One of the reasons that we took so long to craft this release was an uncompromising eye toward ease of use, performance, and quality of experience.
It may sound strange to talk about “quality of experience” when discussing a database engine; we aren’t talking about a weekend trip or a romantic getaway, after all. The reason RavenDB exists is that we found other databases to be complicated and delicate beasts, and we wanted to create a database that would be a joy to work with. Something that you could throw your data and queries at, and it would just work.
With version 6.0, I’m happy to say that we managed to keep on doing just that while providing you with a host of new features and capabilities. Without further ado, allow me to tell you about the highlights of this release.
Sharding
Sharding is the process of splitting your data among multiple nodes, to allow you to scale beyond the terabyte range. RavenDB has actually had this feature since as far back as 2010, when version 1.0 came out. I’m speaking about this today because we have decided to revamp the entire approach to sharding in the new release. Sharding in RavenDB version 6.0 consists of selecting the sharded option at database creation time, and everything else is handled by the cluster. Splitting the data across the nodes, balancing the load, and giving you the feeling that you are working on a unified database are all our responsibility.
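To make the idea of splitting data among nodes concrete, here is a minimal sketch of hash-based routing. It is an illustration only, not RavenDB's actual algorithm or API: the bucket count, the `ShardedStore` class, and the helper names are all hypothetical. The point it shows is that callers use a single `put`/`get` interface and never see which shard holds a document.

```python
import hashlib

NUM_BUCKETS = 1024  # hypothetical bucket count, chosen only for this sketch


def bucket_for(doc_id: str) -> int:
    """Map a document ID to a stable bucket number."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a bucket to a shard by splitting the bucket space into equal ranges."""
    return bucket_for(doc_id) * num_shards // NUM_BUCKETS


class ShardedStore:
    """Toy store that hides the shard layout behind a single put/get interface."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, doc_id: str, doc: dict) -> None:
        self.shards[shard_for(doc_id, len(self.shards))][doc_id] = doc

    def get(self, doc_id: str):
        return self.shards[shard_for(doc_id, len(self.shards))].get(doc_id)


# The caller never mentions shards; routing happens inside the store.
store = ShardedStore(num_shards=3)
for i in range(1000):
    store.put(f"orders/{i}", {"total": i})
```

Because the same ID always hashes to the same bucket, a read is routed to the shard that received the write, which is what lets the application stay unaware of the layout.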
A chief goal of the new sharding feature in RavenDB was to make sure that you don’t have to be aware that you are running in a sharded environment. In fact, you can even switch the backend to a sharded database behind your application’s back, with no modifications to the application.
Your systems can take advantage of the sharding feature, but they aren’t forced to. This makes it easier when you need to scale because you don’t need an expensive rewrite or a stop-the-world migration to move to the new architecture.
RavenDB, with or without sharding, offers you the same capabilities. Features such as indexing, map/reduce, ETL tasks, or subscriptions all work in the same way regardless of how you choose to store your data. A metric of success for us is that after you switch over to sharding, the only thing you’ll notice is that you can now scale your systems even more. And, of course, you can do that easily, safely, and quickly.
I recorded a webinar showcasing our new sharding approach that you can watch.
Corax Indexing Engine
The Corax indexing engine is now available in version 6.0. We have been working on it since 2014, and as you can imagine, it hasn’t been a simple-to-solve challenge. Corax has a single task: to do everything that we use Lucene for, but faster. In fact, much faster.
RavenDB has used Lucene as its indexing engine from the start, and Lucene has been the industry benchmark for search engines for a long time. In the context of RavenDB, we have optimized Lucene heavily for the past 15 years. When building Corax, it wasn’t sufficient to match what Lucene can do; we also had to significantly outperform it.
With version 6.0, we now offer both Corax and Lucene as indexing engines for RavenDB. You can choose to use either engine (globally or per index). When upgrading from an older version of RavenDB, you’ll keep on using the Lucene engine but can create new indexes with Corax.
Corax outperforms Lucene on a wide range of scenarios by an order of magnitude. This includes both indexing time and query latency. Corax also manages to be far faster than Lucene while utilizing less CPU and memory. One of the ways it does that is by preparing, in advance, the relevant data structures that Lucene needs to compute at runtime.
Corax indexes tend to consume about 10% - 20% more disk space than Lucene, as a trade-off for lower memory usage, far faster querying performance, and reduced indexing time.
In the same manner as sharding, switching to the new indexing engine will get you performance improvements, with no difference in capabilities or features.
Kafka & RabbitMQ Integration
Kafka & RabbitMQ integration has been extended in version 6.0. RavenDB already supported writing to Kafka and RabbitMQ, and we have now extended this capability to allow you to read messages from Kafka and RabbitMQ and turn them into documents.
This capability provides complete support for queuing infrastructure integration. RavenDB can work with existing pipelines and easily expose its data for the rest of the organization as well as consume messages from the queue and create or update documents accordingly.
Such documents are then available for querying, aggregation, etc. This makes it trivial to use a low-code solution to quickly and efficiently plug your system into Kafka and RabbitMQ. In the style of all RavenDB features, we take upon ourselves all the complexity inherent in such a solution. You only need to provide the connection string and the details about what you want to pull, and the full management of pulling or pushing data, handling distributed state, failover, and monitoring is the responsibility of the RavenDB cluster.
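As a rough picture of what “consume messages and create or update documents” means, here is a small self-contained sketch. It does not use Kafka, RabbitMQ, or any RavenDB API: the in-memory `documents` dictionary stands in for the database, the `incoming` list stands in for a topic or queue, and the `Id` field and `apply_message` helper are assumptions made for this example.

```python
import json


def apply_message(store: dict, raw: bytes) -> str:
    """Turn one queue message into a document, creating or updating it by ID."""
    message = json.loads(raw)
    doc_id = message["Id"]              # hypothetical field carrying the document ID
    doc = store.setdefault(doc_id, {})  # create the document if it does not exist yet
    doc.update(message)                 # later messages overwrite earlier field values
    return doc_id


# Stand-in for messages pulled from a topic or queue.
incoming = [
    b'{"Id": "orders/1", "Status": "Placed", "Total": 40}',
    b'{"Id": "orders/2", "Status": "Placed", "Total": 15}',
    b'{"Id": "orders/1", "Status": "Shipped"}',
]

documents: dict = {}
for raw in incoming:
    apply_message(documents, raw)
```

After the loop, `orders/1` holds the merged state of both of its messages. In a real deployment the cluster would also own the parts this sketch leaves out, such as tracking which messages were already consumed and failing over between nodes.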
Data Archiving
Data archiving is a new capability in RavenDB, aiming to help users who deal with very large amounts of data. While a business wants to retain all its data forever, it works primarily with recent data, typically from the last year or two. As time goes by, old data accumulates, and going through irrelevant data consumes computing and memory resources from the database.
RavenDB version 6.0 offers a way to mark a document with an attribute that specifies when RavenDB should move that document to archive mode. Aside from telling RavenDB when to archive a document, there is nothing further that you need to do.
When the time comes to archive the document, RavenDB will change the document mode, which will have a number of effects. The document itself is retained; it is not deleted. If you aren’t already using document compression, the archived document is stored in a compressed manner, to reduce disk usage further.
Subscriptions and indexes will ignore archived documents. Unless explicitly requested, archived documents will remain on the disk and consume no memory resources. Archiving reduces the resources consumed by archived documents while keeping them available if you need to look them up.
You can also construct indexes that operate over both archived and regular documents if you still want to allow queries over archived data. In this way, you can retain your entire dataset in the main system of record but still operate on a significantly smaller chunk of it during normal operations.
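The archiving flow described above can be sketched in a few lines. This is a conceptual model, not RavenDB's implementation: the `@archive-at` and `@archived` metadata keys and both helper functions are assumptions made for this example, and compression is not modeled at all.

```python
from datetime import datetime, timezone

ARCHIVE_AT = "@archive-at"  # assumed metadata key holding the archive time
ARCHIVED = "@archived"      # assumed metadata flag set once a document is archived


def archive_due(docs: dict, now: datetime) -> int:
    """Flip documents whose archive time has passed into archived mode."""
    count = 0
    for doc in docs.values():
        meta = doc["@metadata"]
        due = meta.get(ARCHIVE_AT)
        if due is not None and datetime.fromisoformat(due) <= now:
            meta[ARCHIVED] = True  # the document is retained, only its mode changes
            del meta[ARCHIVE_AT]
            count += 1
    return count


def active(docs: dict) -> list:
    """IDs an index or subscription would process when it skips archived documents."""
    return [doc_id for doc_id, doc in docs.items()
            if not doc["@metadata"].get(ARCHIVED, False)]


docs = {
    "invoices/1": {"Total": 10, "@metadata": {ARCHIVE_AT: "2022-01-01T00:00:00+00:00"}},
    "invoices/2": {"Total": 20, "@metadata": {ARCHIVE_AT: "2030-01-01T00:00:00+00:00"}},
    "invoices/3": {"Total": 30, "@metadata": {}},
}
archived_count = archive_due(docs, now=datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Only `invoices/1` is past its archive time here, so it is flagged and drops out of `active(docs)`, yet it is still present in `docs` and can be looked up directly by ID.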
Performance, observability, and usability enhancements have also been on the menu for the version 6.0 release. There are far too many individual features and improvements to count them all in this format. I encourage you to check the full release details on the website.
You are probably well aware of the drastic changes that happened in the business environment in the last few years. As part of adjusting to this new reality, we are updating our pricing to allow us to keep delivering top-notch solutions and support to our valued customers.
The new pricing policy is effective starting from January 1st, 2024.
What does this mean to you right now? Absolutely nothing until January 1st, 2024. All subscription renewals in 2023 keep their current price point.
An Early Bird benefit for our existing customers: renew your 2024 subscription while locking in your 2023 price point.
* Available for Cloud and on-premises Customers.
In the weeks ahead, we'll provide detailed updates about these changes and ensure a smooth transition. Our goal is to empower you to make optimized choices about your RavenDB subscription. Your satisfaction remains our priority.
Our team is ready to assist you. Please reach out to us at [email protected] for any questions you have.
Finally, I would like to thank the RavenDB team for their hard work in delivering this version.
Our goal for this release was to deliver you a lot more while making sure that you’ll not be exposed to the underlying complexity of the problems that we are solving for you.
I believe that as you try out the new release, you’ll see how successful we are in providing you with an excellent database platform to take your systems to new heights.
And with that, all that is left is to encourage you to try out the new version and let us know what you think about it.