RavenDB Cloud now offering NVMe based clusters
I’m really happy to announce that RavenDB Cloud is now offering NVMe based instances on the Performance tier. In short, that means that you can deploy RavenDB Cloud clusters to handle some truly high workloads.
You can learn more about what is actually going on in our documentation. For performance numbers and costs, feel free to skip to the bottom of this post.
I’m assuming that you may not be familiar with everything that a database needs to run fast, and this feature deserves a full explanation of what is on offer. So here are the full details of what you can now do.
RavenDB is a transactional database that often processes far more data than the memory available on the machine. Consequently, it needs to read from and write to the disk. In fact, as a database, you can say that it is its primary role. This means that one of the most important factors for database performance is the speed of your disk. I have written about the topic before in more depth, if you are interested in exploring the topic.
When running on-premises, it’s easy to get the best disks you can. We recommend at least good SSDs and prefer NVMe drives for best results. When running on the cloud, the situation is quite different. Machines in the cloud are assumed to be transient, they come and go. Disks, on the other hand, are required to be persistent. So a typical disk on the cloud is actually a remote storage device (typically replicated). That means that disk I/O on the cloud is… slow. To the point where you could get better performance from off-the-shelf hardware from 20 years ago.
There are higher tiers of high-performance disks available in the cloud, of course. If you need them, you are paying quite a lot for that additional performance. There is another option, however. You can use NVMe disks on the cloud as well. Well, you could, but do you want to?
The reason you’d want to use an NVMe disk in the cloud is performance, of course. But the problem with achieving this performance on the cloud is that you lose the safety of “this disk is persistent beyond the machine”. In other words, this is literally a disk that is attached to the physical server hosting your VM. If something goes wrong with that machine, you lose the disk. Traditionally, that means that you can only use that for transient data, not as the backend store for a database.
However, RavenDB has some interesting options to deal with this. RavenDB Cloud runs RavenDB clusters with 3 copies of the data by default, operating in a full multi-master configuration. Given that we already have multiple copies of the data, what would happen if we lost a machine?
The underlying watchdog would realize that something happened and initiate recovery, which will effectively spawn the instance on another node. That works, but what about the data? All of that data is now lost. The design of RavenDB treats that as an acceptable scenario, the cluster would detect such an issue, move the affected node to rehabilitation mode, and start pumping all the data from the other nodes in the cluster to it.
In short, now we’ve shifted from a node failure being catastrophic to having a small bump in the data traffic bill at the end of the month. Thanks to its multi-master setup, RavenDB can recover even if two nodes go down at the same time, as we’ll recover from the third one. RavenDB Cloud runs the nodes in the cluster in separate availability zones specifically to handle such failure scenarios.
We have run into this scenario multiple times, both as part of our testing and as actual production events. I am happy to say that everything works as expected, the failed node comes up empty, is filled by the rest of the cluster, and then seamlessly resumes its work. The users were not even aware that something happened.
Of course, there is always the possibility that the entire region could go down, or that three separate instances in three separate availability zones would fail at the same time. What happens then? That is expected to be a rare scenario, but we are all about covering our edge cases.
In such a scenario, you would need to recover from backup. Clusters using NVMe disks are configured to run using Snapshot backups, which consume slightly more disk space than normal but can be restored more quickly.
RavenDB Cloud also blocks the user's ability to scale up or down such clusters from the portal and requires a support ticket to perform them. This is because special care is needed when performing such operations on NVMe machines. Even with those limitations, we are able to perform such actions with zero downtime for the users.
And after all this story, let’s talk numbers. Take a look at the following table illustrating the costs vs. performance on AWS (us-east-1):
Type | # of cores | Memory | Disk | Cost ($ / hour) |
P40 (Premium disk) | 16 | 64 GB | 2 TB, 10,000 IOPS, 360 MB/s | 8.790 |
PN30 (NVMe) | 8 | 64 GB | 2 TB, 110,000 IOPS, 440 MB/s | 6.395 |
PN40 (NVMe) | 16 | 128 GB | 4 TB, 220,000 IOPS, 880 MB/s | 12.782 |
The situation is even more blatant when looking at the numbers on Azure (eastus):
Type | # of cores | Memory | Disk | Cost ($ / hour) |
P40 (Premium disk) | 16 | 64 GB | 2 TB, 7,500 IOPS, 250 MB/s | 7.089 |
PN30 (NVMe) | 8 | 64 GB | 2 TB, 400,000 IOPS, 2 GB/s | 6.485 |
PN40 (NVMe) | 16 | 128 GB | 4 TB, 800,000 IOPS, 4 GB/s | 12.956 |
In other words, you can upgrade to the NVMe cluster and actually reduce the spend if you are stalled on I/O. We expect most workloads to see both higher performance and lower cost from a move from P40 with premium disk to PN30 (same amount of memory, fewer cores). For most workloads, we have found that IO rate matters even more than core count or CPU horsepower.
I’m really excited about this new feature, not only because it can give you a big performance boost but also because it leverages the same behavior that RavenDB already exhibits for handling issues in the cluster and recovering from unexpected failures.
In short, you now have some new capabilities at your fingertips, being able to use RavenDB Cloud for even more demanding workloads. Give it a try, I hear it goes vrooom 🙂.
Comments
It's worth mentioning, that in Azure, using 2 P30 will result in better I/O metrics than 1 P40.
Comment preview