Michael’s talk will help you make sense of how RavenDB is all put together:
RavenDB - A guided tour into ravenDB from Øredev Conference on Vimeo.
The recording of my talk at the RavenDB Day two weeks ago is now available. You can watch it here:
RavenDB - Unraveling RavenDB 3.0 from Øredev Conference on Vimeo.
I’m excited to tell you that RavenDB is a featured vendor in DZone’s 2014 Guide to Big Data. The guide includes expert opinions and tips, industry knowledge, and data platform and database comparisons. It will also give you good background information about the different NoSQL solutions that are currently available.
Readers can download a free copy of the guide here.
During the RavenDB Days conference, I got a lot of questions from customers. Here is one of them.
There is a migration process that deals with an event sourcing system. We have 10,000,000 commits with 5 – 50 events per commit, and each event results in a property update to an entity.
That gives us roughly 300,000,000 events to process (an average of 30 events per commit). The trivial way to solve this would be:
foreach (var commit in YieldAllCommits())
{
    using (var session = docStore.OpenSession())
    {
        foreach (var evnt in commit.Events)
        {
            var entity = session.Load<Customer>(evnt.EntityId);
            evnt.Apply(entity);
        }
        session.SaveChanges();
    }
}
That works, but it tends to be slow. The worst case here would result in 310,000,000 requests to the server (one load per event, plus one SaveChanges call per commit).
Note that this has the nice property that all the changes in a commit are saved in a single transaction. We’re going to relax this behavior and use something better here.
We’ll take the implementation of this LRU cache and add an event that fires when an item is dropped from the cache, as well as support for iterating over its contents.
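Here is a minimal sketch of what such a cache might look like. The class name and members match the usage in the code below, but this is an illustration under those assumptions, not the actual implementation from the linked post:

using System;
using System.Collections;
using System.Collections.Generic;

// Illustrative LRU cache with an eviction callback and iteration support.
public class LeastRecentlyUsedCache<TKey, TValue> : IEnumerable<KeyValuePair<TKey, TValue>>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> items =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    // Called with the evicted value whenever we drop an item from the cache.
    public Action<TValue> OnEvict = delegate { };

    public LeastRecentlyUsedCache(int capacity)
    {
        this.capacity = capacity;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (items.TryGetValue(key, out node) == false)
        {
            value = default(TValue);
            return false;
        }
        // Move the node to the front, marking it as most recently used.
        order.Remove(node);
        order.AddFirst(node);
        value = node.Value.Value;
        return true;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (items.TryGetValue(key, out node))
            order.Remove(node);
        items[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));

        if (items.Count > capacity)
        {
            // Drop the least recently used entry and notify the caller.
            var last = order.Last;
            order.RemoveLast();
            items.Remove(last.Value.Key);
            OnEvict(last.Value.Value);
        }
    }

    public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
    {
        return order.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}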
using (var bulk = docStore.BulkInsert(allowUpdates: true))
{
    var cache = new LeastRecentlyUsedCache<string, Customer>(capacity: 10 * 1000);
    // When an entity is evicted from the cache, flush it to the server.
    cache.OnEvict = c => bulk.Store(c);

    foreach (var commit in YieldAllCommits())
    {
        foreach (var evnt in commit.Events)
        {
            Customer entity;
            if (cache.TryGetValue(evnt.EntityId, out entity) == false)
            {
                // Not in the cache; load it from the server and remember it.
                using (var session = docStore.OpenSession())
                {
                    entity = session.Load<Customer>(evnt.EntityId);
                    cache.Set(evnt.EntityId, entity);
                }
            }
            evnt.Apply(entity);
        }
    }

    // Flush whatever is still in the cache at the end of the run.
    foreach (var kvp in cache)
    {
        bulk.Store(kvp.Value);
    }
}
Here we are using a cache of 10,000 items, with the assumption that events cluster around entities, so a lot of changes to an entity happen at roughly the same time. We take advantage of that to try to load each document only once, and we use bulk insert to flush those changes to the server when needed. This code also handles the case where we flushed a document out of the cache and then get more events for it, but the assumption is that this scenario is much less common.
This is a big release, and it is a big deal for us.
It took me 18(!) blog posts to discuss just the items that we wanted highlighted, out of over twelve hundred resolved issues and tens of thousands of commits by a pretty large team.
Even at a rate of two posts a day, this still took two weeks to go through.
We are also working on the new book, multiple events coming up as well as laying down the plans for RavenDB vNext. All of this is very exciting, but for now, I want to ask your opinion. Based on the previous posts in this series, and based on your own initial impressions of RavenDB, what do you think?
This is me signing off, quite tired.
One of the important roles of operations is going to an existing server and checking that everything is fine. This is routine maintenance stuff. It can be things like checking whether we have enough disk space for our expected growth, or whether we have too many indexes.
Here is some data from this blog’s production system:
Note that we have the squeeze button, for when you need to squeeze every bit of perf out of the system. Let us see what happens when I click it (I used a different production db, because this one was already optimized).
Here is what we get:
You can see that RavenDB suggests that we merge indexes, so we can reduce the overall number of indexes we have.
We can also see recommendations for deleting unused indexes in general.
The idea is that we keep track of those stats and allow you to make decisions based on them, so you don’t have to go by gut feeling or guesses.
After looking at all the pretty pictures, let us take a look at what is available to ops behind the covers.
The first such change is abandoning performance counters. In 2.5, we reported a lot of our state through performance counters. However, while they are a standard tool that is easy to work with using admin tools, they were also unworkable for us. We had multiple cases where RavenDB would hang because performance counters were corrupted; they require specific permissions, and in general they were a lot of hassle. Instead of relying on performance counters, we are now using the metrics.net package to handle that. This gives us a lot more flexibility. We can now generate a lot more metrics, and we have. All of those are available in the /debug/metrics endpoint, and in the studio as well.
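For example, here is a minimal sketch of pulling those metrics from code. The server URL and database name are placeholders, and I’m assuming the usual /databases/{name} prefix for database-level endpoints:

using System;
using System.Net.Http;

class MetricsCheck
{
    static void Main()
    {
        using (var client = new HttpClient())
        {
            // /debug/metrics returns the current metrics as JSON.
            // Replace the server URL and database name with your own.
            var json = client.GetStringAsync(
                "http://localhost:8080/databases/Northwind/debug/metrics").Result;
            Console.WriteLine(json);
        }
    }
}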
Another major change we made was to consolidate all of the database administration details into a centralized location:
Manage your server gives us all the tools we need to manage the databases on this server.
You can manage permissions, backup and restore, watch what is going on and in general do admin style operations.
In particular, note that we made it slightly harder to use the system database. The intent now is that the system database is reserved for managing the RavenDB server itself, and all users’ data will reside in their own databases.
You can also start a compaction directly from the studio:
Compactions are good if you want to ask RavenDB to return some disk space to the OS (by default we reserve it for our own usage).
Restore & backup are possible via the studio, but usually, admins want to script those out. We had Raven.Backup.exe to handle scripted backup for a while now. And you could restore using Raven.Server.exe --restore from the command line.
The problem was that this restored the database to disk, but didn’t wire it to the server, so you had the extra step of doing that. This was useful for restoring system databases, not so much for named databases.
We now have:
These make a clear distinction between those operations.
Another related issue is how the Smuggler handles errors. Previously, the full export process had to complete successfully for you to have a valid output. Now we are more robust in the face of errors such as an unreliable network or timeouts. That means that if your network has a tendency to cut connections off at the knees, you will be able to resume (assuming you use incremental export) and still get your data.
We have also made a lot of changes in the Smuggler to make it work more nicely in common deployment scenarios, where request size and time are usually limited. The whole process is simply more robust now.
Speaking of making things more robust, another area we paid attention to was memory usage over time. Beyond just reducing our memory usage in common scenarios, we have also improved our GC story. We can now invoke explicit GCs when we know that we have created a lot of garbage that needs to be gotten rid of. We’ll also invoke Large Object Heap compaction if needed, utilizing the new features in the .NET Framework.
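The .NET feature in question is the on-demand Large Object Heap compaction mode that was added in .NET 4.5.1. Roughly, triggering it looks like this:

using System;
using System.Runtime;

class GcMaintenance
{
    static void CompactLargeObjectHeap()
    {
        // Ask the next blocking collection to also compact the large object heap.
        // The setting is one-shot: it resets to Default after that collection runs.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}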
That is quite enough for a single post, but it still doesn’t cover all the operations changes. I’ll cover the stuff that should make you drool in the next post.
One of the most challenging things to do in production is to know what is going on. In order to facilitate that, we have dedicated some time to exposing the internal guts of RavenDB to the outside world (assuming that the outside world has the appropriate permissions).
One way to do that is to subscribe to the log stream from RavenDB. You can do it like this:
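In the studio this is a single click. To give you an idea of the shape of the feature from code, here is a rough sketch of consuming such a live stream; the /admin/logs/events endpoint name is my placeholder, not necessarily the documented URL:

using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class LogStreamClient
{
    static void Main()
    {
        WatchLogs().Wait();
    }

    static async Task WatchLogs()
    {
        using (var socket = new ClientWebSocket())
        {
            // Placeholder endpoint; check your server for the actual logs URL.
            await socket.ConnectAsync(
                new Uri("ws://localhost:8080/admin/logs/events"),
                CancellationToken.None);

            var buffer = new byte[8192];
            while (socket.State == WebSocketState.Open)
            {
                var result = await socket.ReceiveAsync(
                    new ArraySegment<byte>(buffer), CancellationToken.None);
                // Each message is a log event; we just print it here.
                // (Fragmented messages are ignored for brevity.)
                Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, result.Count));
            }
        }
    }
}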
This gives you the following:
Note that this requires no configuration changes or restarting the server or database. As long as your log subscription is active, we’ll send you a live stream of all the log activity in RavenDB, which should allow you to get a lot of useful insight into what exactly it is that RavenDB is doing.
This is especially important if you need to do any sort of troubleshooting, because that is when you need to have logs, and restarting the server to enable them is often out of the question (it would likely resolve the very issue you want to understand). Honestly, this is a feature that we need in order to support customers; it is going to be much easier to just say “let us look at the logs” rather than having to go over how to configure them. Another thing to note is that this can all be done remotely; you don’t have to have access to the physical server. It does require admin permissions on the server, though, so not just any user can do that.
Another production view that is available to you is the Traffic Watcher:
This gives you the option of looking at the requests that are actually hitting the server. It is a subset of information from the logs, but it is usually a lot more interesting to watch. And again, this can be done remotely as well. You can watch all databases, or just a single one.
But most important from a support perspective is the new Debug Info! package. And yes, it deserves the bang in the name. What this does is gather a lot of important information from the database: all the current stats, plus a lot of the stuff that we need in order to figure out what is going on. The idea is that if you have a problem, we won’t have to ask for a lot of separate pieces of information; you can get it all in a single shot.
Oh, and we can also grab the actual stack trace information from your system, so we even know exactly what your system is doing.
In my next post, I’ll discuss one last operational concern, optimizations.
This has been the most important change in RavenDB 3.0, in my opinion. Not because of complexity and scope; pretty much everything here is much simpler than other features we have done. But it is important because it makes RavenDB much easier to operate. Since the get-go, we have tried to make sure that RavenDB would be a low friction system. We usually focused on the developer experience, though, and that showed when we had to deal with operational issues.
Things were more complex than they should have been. Now, to be fair, we had the appropriate facilities to figure things out, ranging from debug endpoints, to performance counters, to a great debug log story. The problem is that, in my eyes, we were merely on par with other systems. RavenDB wasn’t created to be on par; RavenDB was created so that when you use it, you would sigh and say “that is how it should be done”. With RavenDB 3.0, I think we are much closer to that.
Because we have done so much work here, I’m going to split things into multiple posts. This one is the one with all the pretty pictures, as you can imagine. The next one will talk about the actual operational behavior changes we made.
Let me go over some of those things with you. Here you can see the stats view, including a look at an index’s details.
That is similar to what we had before. But it gets interesting when we want to start actually looking at the data more deeply. Here are the indexing stats on my machine:
You can see that the Product/Sales index has a big fanout, for example, by the fact that it has more items out than in. You can also see how many items we indexed per batch, and how we do parallel indexing.
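A fanout simply means that the map emits more than one index entry per document. As an illustration, an index like Product/Sales typically maps each order line separately, along these lines (the Order and OrderLine shapes here are assumed, in the style of the Northwind sample data):

using System.Collections.Generic;
using System.Linq;
using Raven.Client.Indexes;

public class OrderLine
{
    public string Product;
    public int Quantity;
    public decimal PricePerUnit;
}

public class Order
{
    public List<OrderLine> Lines;
}

// One order document produces one index entry per order line,
// so the number of entries out can far exceed the documents in.
public class Product_Sales : AbstractIndexCreationTask<Order, Product_Sales.Result>
{
    public class Result
    {
        public string Product;
        public int Count;
        public decimal Total;
    }

    public Product_Sales()
    {
        Map = orders => from order in orders
                        from line in order.Lines
                        select new
                        {
                            Product = line.Product,
                            Count = 1,
                            Total = line.Quantity * line.PricePerUnit
                        };

        Reduce = results => from result in results
                            group result by result.Product into g
                            select new
                            {
                                Product = g.Key,
                                Count = g.Sum(x => x.Count),
                                Total = g.Sum(x => x.Total)
                            };
    }
}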
We also have a lot more metrics to look at, such as the current requests view across several time frames.
The live index work view:
The indexing batch size and the prefetching stats graphs give us live memory consumption for indexing, as well as some view of which indexing strategy is currently in use.
Combining those stats, we have a lot of information at our fingertips, and can get a better idea about what exactly is going on inside RavenDB.
But so far, this is just looking at things. Let us see what else we can do. RavenDB does a lot of things in the background, from bulk insert work to set based operations. We added a view that lets you see those tasks, and cancel them if you need to:
You can now see all the work done by the replication background processes, which will give you a better idea of what your cluster is doing. And of course there is the topology view that we already looked at.
We also added views for most of the debug endpoints that RavenDB has. Here we are looking at the subscribed changes connections.
We get a lot of metrics available to us now. In fact, we went a bit crazy there and started tracking a lot of stuff. This will help you understand what is going on internally. And you can also get nice histograms.
There is a lot of stuff there, so I won’t cover it all, but I will show you what I think is one of the nicest features:
This will give you real stats about resource usage in your system. Including counts of documents per collection and the size on disk.
Okay, that is enough with the pretty pictures. In my next post, I’ll talk about the actual changes we made to support operations better.
SQL Replication has been a part of RavenDB for quite some time, showing up for the first time in the 1.0 build as the Index Replication Bundle. This turned out to be a very useful feature, and in 3.0 we had a dedicated developer working on it for several weeks, banging it into new and interesting shapes.
We started out with a proper design for how you want to use it. And I’m just going to take you through the process for a bit, then talk about the backend changes.
We start by defining a named connection string (note that you can actually test this immediately):
And then we define the actual replication behavior:
Notice the Tools control at the top? Clicking it and selecting Simulate will give you:
So you can actually see the commands that we are going to execute to replicate a specific document. That is going to save a lot of head scratching about “why isn’t this replicating properly”.
You can even run this simulation against your source db, to check for errors such as constraint violations, etc.
The SQL Replication bundle now supports forcing query recompilation, which avoids caching bad query plans in SQL Server:
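For context, a recompile hint on a SQL Server statement looks roughly like this. This is a generic ADO.NET sketch of the technique, not the exact SQL that RavenDB generates; the table and columns are made up:

using System.Data.SqlClient;

class RecompileExample
{
    static void UpdateOrderTotal(string connectionString, string id, decimal total)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = connection.CreateCommand())
        {
            // OPTION (RECOMPILE) tells SQL Server to build a fresh plan for this
            // statement instead of reusing a cached (and possibly bad) one.
            command.CommandText =
                "UPDATE Orders SET Total = @total WHERE Id = @id OPTION (RECOMPILE)";
            command.Parameters.AddWithValue("@id", id);
            command.Parameters.AddWithValue("@total", total);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}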
And for the prudent DBA, we have done a lot to give you additional information. In particular, you can look at the metrics and see what is going on.
And:
In this case, I actually don’t have a relational database on this machine to test this, but I’m sure that you can figure it out.
The nice thing about it is that we’ll report separate metrics per table, so your DBA can see if a particular table is causing a slowdown.
Overall, we streamlined everything and tried to give you as much information upfront as possible, as well as tracking the entire process. You’ll find it much easier to work with and troubleshoot if needed.
This actually ties in very well with our next topic: the operations changes in RavenDB that make it easier to manage. But that will be in a future post.