Caching documents in RavenDB: The good, the bad and the ugly
RavenDB has a hidden feature that is enabled by default, and it is not something you usually need to be aware of: built-in support for caching. Consider the following code:
async Task<Dictionary<string, int>> HowMuchWorkToDo(string userId)
{
    using var session = _documentStore.OpenAsyncSession();
    var results = await session.Query<Item>()
        .GroupBy(x => new { x.Status, x.AssignedTo })
        .Where(g => g.Key.AssignedTo == userId && g.Key.Status != "Closed")
        .Select(g => new
        {
            Status = g.Key.Status,
            Count = g.Count()
        })
        .ToListAsync();
    return results.ToDictionary(x => x.Status, x => x.Count);
}
What happens if I call it twice with the same user? The first time, RavenDB will send the query to the server, where it will be evaluated and executed. The server will also send an ETag header with the response. The client will remember the response and its ETag in its own memory.
The next time this is called on the same user, the client will again send a request to the server. This time, however, it will also inform the server that it has a previous response to this query, with the specified ETag. The server, when realizing the client has a cached response, will do a (very cheap) check to see if the cached response matches the current state of the server. If so, it can inform the client (using 304 Not Modified) that it can use its cache.
In this way, we benefit twice:
- First, on the server side, we avoid the need to compute the actual query.
- Second, on the network side, we aren’t sending a full response back, just a very small notification to use the cached version.
You’ll note, however, that there is still an issue. We have to go to the server to check. That means that we still pay the network costs. So far, this feature is completely transparent to the user. It works behind the scenes to optimize server query costs and network bandwidth costs.
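To make the round-trip concrete, here is a small in-memory simulation of the ETag validation flow. All of the names here (ToyServer, ToyClient, CachedResponse) are invented for illustration; the real exchange happens over HTTP, using the ETag and If-None-Match headers, inside the RavenDB client:

```csharp
using System;

// A cached body together with the ETag the server handed out with it.
public record CachedResponse(string ETag, string Body);

public class ToyServer
{
    // The server's current state; any update changes the ETag.
    public int Version { get; private set; } = 1;
    public void UpdateData() => Version++;

    private string CurrentETag => $"W/\"{Version}\"";

    // If the client's ETag still matches, we skip running the query and
    // send only a tiny "304 Not Modified" style notification (no body).
    public (bool NotModified, CachedResponse Response) Query(string clientETag)
    {
        if (clientETag == CurrentETag)
            return (true, null); // cheap check, no body sent over the wire
        // Otherwise run the (expensive) query and send body + new ETag.
        return (false, new CachedResponse(CurrentETag, $"results v{Version}"));
    }
}

public class ToyClient
{
    private readonly ToyServer _server;
    private CachedResponse _cache; // remembered response + ETag

    public ToyClient(ToyServer server) => _server = server;

    public string Query()
    {
        // Always go to the server, but tell it which ETag we already have.
        var (notModified, response) = _server.Query(_cache?.ETag);
        if (notModified)
            return _cache.Body; // serve from the local cache
        _cache = response;      // remember for next time
        return response.Body;
    }
}
```

Calling Query() a second time returns the cached body without the server recomputing anything; after UpdateData(), the next call fetches a fresh body and a fresh ETag.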
We have a full-blown article on caching in RavenDB if you care to know more details instead of just “it makes things work faster for me”.
Aggressive Caching in RavenDB
The next stage is to involve the user. Enter the AggressiveCache() feature (see the full documentation here), which allows the user to specify an additional aspect. Now, when the client has the value in the cache, it will skip going to the server entirely and serve the request directly from the cache.
What about cache invalidation? Instead of having the client check on each request if things have changed, we invert the process. The client asks the server to notify it when things change, and until it gets notice from the server, it can serve responses completely from the local cache.
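The inverted flow can be sketched the same way: the client trusts its cache completely and only evicts an entry when the server pushes a change notification. Again, these types (PushServer, AggressiveClient) are invented for illustration and are not the actual client internals:

```csharp
using System;
using System.Collections.Generic;

// Toy model of aggressive caching: the server pushes invalidations,
// and the client serves purely from its cache between notifications.
public class PushServer
{
    public event Action<string> DocumentChanged; // change notifications

    private readonly Dictionary<string, string> _docs = new();
    public string Load(string id) => _docs.TryGetValue(id, out var v) ? v : null;

    public void Put(string id, string value)
    {
        _docs[id] = value;
        DocumentChanged?.Invoke(id); // notify subscribed clients
    }
}

public class AggressiveClient
{
    private readonly PushServer _server;
    private readonly Dictionary<string, string> _cache = new();
    public int ServerCalls { get; private set; }

    public AggressiveClient(PushServer server)
    {
        _server = server;
        // On a change notice, evict so the next read hits the server.
        _server.DocumentChanged += id => _cache.Remove(id);
    }

    public string Load(string id)
    {
        if (_cache.TryGetValue(id, out var cached))
            return cached; // no server round-trip at all
        ServerCalls++;
        var value = _server.Load(id);
        _cache[id] = value;
        return value;
    }
}
```

Until a notification arrives, repeated loads cost zero network calls; the trade-off is that correctness now depends on that notification channel, which is exactly where the complexity discussed below comes from.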
I really love this feature. That was the Good part; now let’s talk about the other pieces:
There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton
The bad part of caching is that this introduces more complexity to the system. Consider a system with two clients that are using the same database. An update from one of them may show up at different times in each. Cache invalidation will not happen instantly, and it is possible to get into situations where the server fails to notify the client about the update, meaning that we didn’t clear the cache.
We have a good set of solutions for all of those, I think. But it is important to understand that the problem space itself is inherently hard.
In particular, let’s talk about dealing with the following query:
var emps = await session.Query<Employee>()
    .Include(x => x.Department)
    .Where(x => x.Location.City == "London")
    .ToListAsync();
When an employee is changed on the server, it will send a notice to the client, which can evict the item from the cache, right? But what about when a department is changed?
For that matter, what happens if a new employee is added to London? How do we detect that we need to refresh this query?
There are solutions to those problems, but they are super complicated and have various failure modes that often require more computing power than actually running the query. For that reason, RavenDB uses a much simpler model. If the server notifies us about any change, we’ll mark the entire cache as suspect.
The next request will have to go to the server (again with an ETag, etc.) to verify that the response hasn’t changed. Note that if the specific query results haven’t changed, we’ll get a 304 Not Modified from the server, and the client will use the cached response.
Conservatively aggressive approach
In other words, even when using aggressive caching, RavenDB still has to go to the server sometimes. What is the impact of this approach when you have a system under load?
We’ll still use aggressive caching, but you’ll see brief periods where we aren’t checking with the server (typically we can serve from the cache for about a second or so), followed by queries to the server to check for any changes.
In most cases, this is what you want. We still benefit from the cache while reducing the number of remote calls by about 50%, and we don’t have to worry about missing updates. The downside shows up when, as application developers, we know that a particular document or query is independent of most changes, so we want to cache it until we get notice that that specific document has changed.
The default aggressive caching in RavenDB will not be of major help here, I’m afraid. But there are a few things you can do.
You can use Aggressive Caching in the NoTracking mode. In that mode, the client will not ask the server for notifications on changes, and will cache the responses in memory until they expire (clock expiration or size expiration only).
There is also a feature suggestion that calls for updating the aggressive cache in the background; I would love to hear more feedback on this proposal.
Another option is to build this one level above RavenDB itself, while still using its capabilities. We have a scenario where we know we want to cache a specific set of documents and refresh the cache only when those documents are updated, so let’s write exactly that.
Here is the code:
public class RecordCache<T>
{
    private ConcurrentLru<string, T> _items =
        new(256, StringComparer.OrdinalIgnoreCase);
    private readonly IDocumentStore _documentStore;

    public RecordCache(IDocumentStore documentStore)
    {
        const BindingFlags Flags = BindingFlags.Instance |
            BindingFlags.NonPublic | BindingFlags.Public;
        var violation = typeof(T).GetFields(Flags)
            .FirstOrDefault(f => f.IsInitOnly is false);
        if (violation != null)
        {
            throw new InvalidOperationException(
                "You should cache *only* immutable records, but got: " +
                typeof(T).FullName + " with " + violation.Name +
                " which is not read only!");
        }

        var changes = documentStore.Changes();
        changes.ConnectionStatusChanged += (_, args) =>
        {
            _items = new(256, StringComparer.OrdinalIgnoreCase);
        };
        changes.ForDocumentsInCollection<T>()
            .Subscribe(e =>
            {
                _items.TryRemove(e.Id, out _);
            });
        _documentStore = documentStore;
    }

    public ValueTask<T> Get(string id)
    {
        if (_items.TryGetValue(id, out var result))
        {
            return ValueTask.FromResult(result);
        }
        return new ValueTask<T>(GetFromServer(id));
    }

    private async Task<T> GetFromServer(string id)
    {
        using var session = _documentStore.OpenAsyncSession();
        var item = await session.LoadAsync<T>(id);
        _items.Set(id, item);
        return item;
    }
}
There are a few things to note about this code. We are holding live instances, so we ensure that the values we keep are immutable records. Otherwise, we may hand the same instance to two threads which can be… fun.
Note that document IDs in RavenDB are case insensitive, so we pass the right string comparer.
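The reflection guard in the constructor is worth seeing in isolation. A positional C# record compiles its members down to init-only (readonly) backing fields, so it passes the check, while an ordinary mutable class trips it. Here is a standalone sketch of just that check (the sample types WorkSummary and MutableSummary are invented for the demonstration):

```csharp
using System;
using System.Linq;
using System.Reflection;

public record WorkSummary(string Status, int Count); // immutable: passes

public class MutableSummary { public int Count; }    // mutable field: fails

public static class ImmutabilityGuard
{
    // Returns the name of the first mutable field, or null when the
    // type has only read-only (init-only) instance fields.
    public static string FindMutableField(Type type)
    {
        const BindingFlags Flags = BindingFlags.Instance |
            BindingFlags.NonPublic | BindingFlags.Public;
        return type.GetFields(Flags)
            .FirstOrDefault(f => f.IsInitOnly is false)?.Name;
    }
}
```

Running FindMutableField over the two sample types returns null for the record and the offending field name for the mutable class, which is exactly the condition the RecordCache constructor turns into an exception.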
Finally, the magic happens in the constructor. We register for two important events. Whenever the connection status of the Changes() connection is modified, we clear the cache. This handles any lost updates scenarios that occurred while we were disconnected.
The subscription to change events on that particular collection is what lets us evict a document from the cache when the server notifies us about it, so the next request loads a fresh version.
Caching + Distributed Systems = 🤯🤯🤯
I’m afraid this isn’t an easy topic once you dive into the specifics and constraints we operate under. As I mentioned, I would love your feedback on the background cache refresh feature, or maybe you have better insight into other ways to address the topic.
Comments
Hi Oren,
Thanks for the article. Our expectation is that RavenDB invalidates the Aggressive Cache for just the one item that changes. We use Aggressive Cache for metadata that hardly ever changes for a given tenant (categories, currencies, drop-down lists, custom field definitions, etc.).
The thing we're looking for could just work for Load<> without anything fancy. No need for queries. The server would keep track of the Ids that we load in this special way, and tell the client(s) if those items change (invalidating just that item) so that the next time we ask for it, it goes to the server to get that item. An important point to note is that this needs to work in a load-balanced way. Multiple application servers will be watching the same metadata document in the database and they all need to be told if it's changed.
Our metadata rarely changes (most of it never changes once configured), but we have 5000 different metadata documents across 50+ collections, so it's not as simple as enabling a Data Subscription for a single collection.
For the short-term, we are resorting to the 'DoNotTrackChanges' option and then clearing the RavenDB cache when we know any item of metadata has changed, however this is a sledgehammer because it clears the entire cache. We will also need to use the ChangesAPI to 'broadcast' the change to the metadata item to other application servers. In the short-term it would be great to be able to clear just one item from the cache rather than the entire cache.
Cheers,
Ian
I prefer the other version of the quote
"There are only two hard problems in computer science: cache invalidation, naming things and off-by-one errors", Leon Bambrick
Regards
Paul