Fun with bugs: Advanced Dictionary API
In RavenDB, we really care about performance. That means that our typical code does not follow idiomatic C# code. Instead, we make use of everything that the framework and the language give us to eke out that additional push for performance. Recently we ran into a bug that was quite puzzling. Here is a simple reproduction of the problem:
using System.Runtime.InteropServices;
var counts = new Dictionary<int, int>();
var totalKey = 10_000;
ref var total = ref CollectionsMarshal.GetValueRefOrAddDefault(
counts, totalKey, out _);
for (int i = 0; i < 4; i++)
{
var key = i % 32;
ref var count = ref CollectionsMarshal.GetValueRefOrAddDefault(
counts, key, out _);
count++;
total++;
}
Console.WriteLine(counts[totalKey]);
What would you expect this code to output? We are using two important features of C# here:
- Value types (in this case, an int, but the real scenario was with a struct)
- CollectionMarshal.GetValueRefOrAddDefault()
The latter method is a way to avoid performing two lookups in the dictionary to get the value if it exists and then add or modify it.
If you run the code above, it will output the number 2.
That is not expected, but when I sat down and thought about it, it made sense.
We are keeping track of the reference to a value in the dictionary, and we are mutating the dictionary.
The documentation for the method very clearly explains that this is a Bad Idea. It is an easy mistake to make, but still a mistake. The challenge here is figuring out why this is happening. Can you give it a minute of thought and see if you can figure it out?
A dictionary is basically an array that you access using an index (computed via a hash function), that is all. So if we strip everything away, the code above can be seen as:
var buffer = new int[2];
ref var total = ref var buffer[0];
We simply have a reference to the first element in the array, that’s what this does behind the scenes. And when we insert items into the dictionary, we may need to allocate a bigger backing array for it, so this becomes:
var buffer = new int[2];
ref var total = ref var buffer[0];
var newBuffer = new int[4];
buffer.CopyTo(newBuffer);
buffer = newBuffer;
total = 1;
var newTotal = buffer[0]
In other words, the total variable is pointing to the first element in the two-element array, but we allocated a new array (and copied all the values). That is the reason why the code above gives the wrong result. Makes perfect sense, and yet, was quite puzzling to figure out.
Comments
Interesting post. This bug must have been really tough to spot if it was in a much more complex method obfuscated by many other things going on!
In this case I'd guess the solution would be to set the capacity to 32 ahead of time, or to just use the regular indexer, depending on the use-case?
On a separate note, a while back I provided what I think is an elegant solution to "Challenge: Efficient snapshotable state" in the comments of that post. I am not sure if you had seen it but if not I'd appreciate your thoughts. Thanks either way.
Comment preview
Join the conversation...