Ayende @ Rahien

Nov 30 2018

Refactoring C Code: Giving good SSL errors to your client…

time to read 4 min | 769 words

Tags:

development

Getting errors from SSL isn’t easy. Sometimes, I think that so much encryption has wrapped things up that error reporting is treated as secret information that must be withheld. The root of the problem is that SSL doesn’t really have a way for the two communicating parties to tell each other: “I don’t trust you because you wear glasses”. There are some well known error codes, but for the most part, you’ll get a connection abort. I want to see what it would take to provide good error handling for a network protocol using SSL that handles:

No certificate provided
Expired / not yet valid certificate provided
Unfamiliar certificate provided

In order to do that, we must provide this error handling at a higher level than SSL. In this protocol, the first thing that the server will send to the client on connection will be “OK\r\n” if everything is okay, or otherwise some error string that explains the issue. This turned out to be rather involved, actually.
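A minimal sketch of that first line, assuming the certificate check has already been reduced to a status value. The enum, the function name and the exact error wording (including the "ERR" prefix) are hypothetical; the post only specifies “OK\r\n” or some explanatory error string.

```c
#include <stddef.h>

/* Hypothetical result of inspecting the client certificate after the handshake. */
enum cert_status {
    CERT_OK,
    CERT_MISSING,
    CERT_EXPIRED,
    CERT_NOT_YET_VALID,
    CERT_UNFAMILIAR
};

/* Returns the first line the server writes on a new connection.
 * The error texts are illustrative, not taken from the original code. */
const char *greeting_for(enum cert_status status) {
    switch (status) {
    case CERT_OK:            return "OK\r\n";
    case CERT_MISSING:       return "ERR No certificate provided\r\n";
    case CERT_EXPIRED:       return "ERR Certificate has expired\r\n";
    case CERT_NOT_YET_VALID: return "ERR Certificate is not yet valid\r\n";
    case CERT_UNFAMILIAR:    return "ERR Unfamiliar certificate provided\r\n";
    }
    return "ERR Internal error\r\n";
}
```

Because the message travels inside the already established SSL channel, the client can show it to the user instead of reporting a bare connection abort.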

First, I had to ask OpenSSL to send the “please gimme a client cert”:

Note the callback? It will accept everything, since doing otherwise means that we are going to just disconnect with a very bad error experience. After completing the SSL_accept() call, I’m explicitly checking the client certificate, like so:

I have to say, the sheer amount of code (and explicit error handling) can be exhausting. And the fact that C is a low level language is something that I certainly feel. For example, I have this piece of code:

This took a long time to write. And it should have been near nothing, to be honest. We get the digest of the certificate, convert it from raw bytes to hex and then do a case insensitive comparison against the registered certificates that we already know.

To be honest, that piece of code was stupidly hard to write. I messed up and forgot that snprintf() will always insert a null terminator and I specified that it should insert only 2 characters. That led to some confusion, I have to say.
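The two steps described above (raw bytes to hex, then a case insensitive comparison) can be sketched as follows. The function names are hypothetical and this is not the original listing; it only illustrates the snprintf() pitfall, where the size argument must leave room for the null terminator.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Writes len bytes as uppercase hex into out.
 * out_size must be at least 2 * len + 1. Returns 0 on success, -1 otherwise. */
int bytes_to_hex(const unsigned char *bytes, size_t len, char *out, size_t out_size) {
    if (out_size < len * 2 + 1)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Size 3, not 2: two hex digits plus the NUL that snprintf always writes. */
        snprintf(out + i * 2, 3, "%02X", bytes[i]);
    }
    out[len * 2] = '\0';
    return 0;
}

/* Returns 1 if both strings are equal when ASCII case is ignored, 0 otherwise. */
int hex_equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* both must end at the same position */
}
```

Passing 2 as the size would make snprintf() write a single digit followed by the terminator, which is the kind of confusion mentioned above.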

The last thing that this piece of code does is scan through the registered certificates and figure out if we are familiar with the one the client is using. You might note that this is a linked list, because this is pretty much the only data structure that you get in C. The good thing here is that it is easy to hide the underlying implementation, and this is just some code that I write for fun. I don’t expect to have a lot of certificates to deal with, nor do I expect to have to optimize the connection portion of this code, but it bothers me.

In C#, I would use a dictionary and forget about it. In C, I need to make a decision on what dictionary to use. That adds a lot of friction to the process, to be honest. If I wasn’t using a dictionary, I would use a List<T>, which has the advantage that it is contiguous in memory and fast to scan. For completeness’ sake, here is the code that registers a certificate thumbprint:

As you can see, this isn’t really something that interesting. But even though it doesn’t really matter, I wonder how much effort it will take to avoid the usage of a linked list… I think that I can do that without too much hassle with realloc. Let’s see, here is the new registration code:

And here is how I’m going to be iterating on it:
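A minimal sketch of a realloc-backed thumbprint list and its sequential scan. This is not the original listing: the struct and function names are hypothetical, and the fixed entry size assumes a SHA-1 digest rendered as 40 hex characters.

```c
#include <stdlib.h>
#include <string.h>

#define THUMBPRINT_HEX_LEN 40 /* assumption: SHA-1 digest as hex */

struct thumbprint {
    char hex[THUMBPRINT_HEX_LEN + 1];
};

struct thumbprint_list {
    struct thumbprint *items; /* one contiguous block, grown with realloc */
    size_t count;
    size_t capacity;
};

/* Appends a thumbprint. Returns 0 on success, -1 on bad input or allocation failure. */
int register_thumbprint(struct thumbprint_list *list, const char *hex) {
    if (strlen(hex) != THUMBPRINT_HEX_LEN)
        return -1;
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 4;
        struct thumbprint *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1; /* the old block is still valid and still owned by list */
        list->items = grown;
        list->capacity = new_capacity;
    }
    memcpy(list->items[list->count].hex, hex, THUMBPRINT_HEX_LEN + 1);
    list->count++;
    return 0;
}

/* Sequential scan over the packed entries. The comparison here is exact;
 * a case insensitive compare could be swapped in. */
int is_registered(const struct thumbprint_list *list, const char *hex) {
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->items[i].hex, hex) == 0)
            return 1;
    }
    return 0;
}

void free_thumbprints(struct thumbprint_list *list) {
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Storing the hex text inline, rather than a pointer per entry, is what keeps the whole scan inside one block of memory.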

So that isn’t too bad, and we can be sure that this will be all packed together in memory. Of course, this is pretty silly, since if I have enough certificates to want to benefit from sequential scan performance, I might as well use a hash…

Next topic I want to tackle is actually handling more than a single connected client at a time, but that will be for the next post. The code for this one is here.

Nov 29 2018

Refactoring C Code: Starting with an API

time to read 3 min | 406 words

Tags:

development

In my last post, I introduced a basic error handling mechanism for a C API. The code I showed had a small memory leak in it. I love that fact about it, because it is hidden away and probably won’t show up very easily for production code until very late in the game. Here is the fix:

The problem was that I was freeing the error struct, but wasn’t freeing the actual error message. That is something that is very easy to miss, unfortunately. Especially as we assume that errors are rare. With that minor issue out of the way, let’s look at how we actually write the code. Here is the updated version of creating a connection, with proper error handling:
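The shape of that fix can be sketched like this, assuming the errors are kept in a singly linked list with a heap-allocated message per entry. The struct layout and names are hypothetical, not the original code.

```c
#include <stdlib.h>

struct error_entry {
    char *message;            /* heap-allocated text, owned by the entry */
    struct error_entry *next;
};

/* Frees every entry in the list and resets *head to NULL.
 * Returns the number of entries that were released. */
size_t free_errors(struct error_entry **head) {
    size_t freed = 0;
    while (*head != NULL) {
        struct error_entry *current = *head;
        *head = current->next;
        free(current->message); /* the call that is easy to forget */
        free(current);
        freed++;
    }
    return freed;
}
```

Leaving out the free(current->message) line still compiles and runs fine, which is why this kind of leak tends to surface late.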

The calling code will check if it got NULL and can then decide whether it should add its own error (for context) and return a failure to its own caller, or handle it. The cleanup stuff is still annoying with goto, but I don’t believe that there is much that can be done about it.

I’m going to refactor things a bit, so I have proper separation and an explicit API. Here is what the API looks like now:

This is basically a wrapper around all the things we need to do around SSL. It handles one time initialization and any other necessary work that is required. Note that I don’t actually expose any state outside, instead just forward declaring the state struct and leaving it at that. In addition to the overall server state, there is also a single connection state, whose API looks like this:
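The forward declaration technique mentioned here can be sketched as follows. The names and the single port field are hypothetical and stand in for whatever the real state holds; in a real project the first part would live in a header and the second in a .c file.

```c
#include <stdlib.h>

/* What a header would expose: the type name and functions only. */
struct server_state;
struct server_state *server_state_create(int port);
int server_state_port(const struct server_state *state);
void server_state_drop(struct server_state *state);

/* What the implementation file keeps private: the actual layout. */
struct server_state {
    int port;
};

struct server_state *server_state_create(int port) {
    struct server_state *state = malloc(sizeof *state);
    if (state == NULL)
        return NULL;
    state->port = port;
    return state;
}

int server_state_port(const struct server_state *state) {
    return state->port;
}

void server_state_drop(struct server_state *state) {
    free(state); /* free(NULL) is a no-op, so dropping NULL is safe */
}
```

Callers that only see the header cannot touch the fields, so the layout can change without breaking them.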

Now, let’s bring it all together and see how it works.

This is a pretty trivial echo server, I’ll admit, but it actually handles things like SSL setup and good error handling, and gives us a good baseline to start from.

The code for this series can be found here and this post’s code is this commit. Next, I want to see what it would take to set up client authentication with OpenSSL.

Nov 28 2018

RavenDB 4.2 Features: Let’s get colorful

time to read 1 min | 120 words

Tags:

raven

New in RavenDB 4.2 is the theming support for the studio. If you don’t like the dark theme, you now have a bunch more options.

You can access this from the top right:

Which will give you:

And here is what this looks like after you have made the selection:

You can play around with this (and with all the new RavenDB 4.2 features) in our live test environment.

Nov 27 2018

Refactoring C code: Error handling is HARD, error REPORTING is much harder

time to read 4 min | 792 words

Tags:

development

As part of my usual routine, I’m trying out writing some code in C, to get a feeling for a different environment. I wanted to build something that is both small enough to complete in a reasonable time and complex enough that it would allow me to really explore how to use things. I decided to use C (not C++) because it is both familiar and drastically different from what I usually do. The project in question? Implementing the network protocol I wrote about here. Another part of the challenge that I set for myself is that this should be as close to production quality code as I can make it, which means paying all the usual taxes you would expect.

The first thing that I had to do was to figure out how to actually get networking and SSL working in C. Something that would take me 5 minutes in C# took me several hours of exploring and figuring things out. Eventually, I settled on the obvious choice for SSL: OpenSSL. It is portable, reasonably well documented and seems fairly easy to get started with.

Here is some code from the sample TCP server from the OpenSSL documentation:

This is interesting, it shows what needs to be done and does it quite clearly. Unfortunately, this isn’t production quality code; there is a lot of stuff here that can go wrong that we need to handle. Let’s wrap things in proper functions. The first thing to do is to capture the state of the connection (both its socket and the SSL context associated with it). I created a simple structure to hold that, and here is how I close it.

So far, pretty simple. However, take a look at the code that creates the connection:

As you can see, quite a lot of this function is error handling and cleanup. This code looks like it does the right thing and cleans up after itself in all cases. So far, so good, but we are still missing a very important component. We handled the error, but we haven’t reported it. In other words, from the outside, any failure will look exactly the same to the caller. That is not a good thing if you want to create software that is expressive and will tell you what is wrong so you can fix it.

If this was C#, I would be throwing an exception with the right message. As this is C, we run into some interesting issues. As part of your error handling, you might run into an error, after all… In particular, good error handling usually requires string formatting, and that can cause issues (for example, being unable to allocate memory). There is also the issue of who frees the memory allocated for errors, of course.

Typically, you’ll see code that either prints to the console or to a log file and it is usually a major PITA. OpenSSL uses a thread local error queue for this purpose, which gives you the ability to hold a context, but it requires an awful lot of ceremony to use and doesn’t seem to be useful for generic error handling. I decided to see if I can do a quick and dirty approach to the same problem, with something that is slightly more generic.

The purpose was to get reasonable error handling strategy without too much hassle. Here is what I came up with. This is a bit much, but I’ll explain it all in a bit.

Whenever we get an error, we can call push_error. Note that I included an error code there as well, in addition to the string. This is in case there is a need to do programmatic error handling, for example, to handle a missing file that is reported “upstairs”.

I also included how we get errors out, which is pretty simple. You can figure out from here how you would handle error handling in code, I’m going to assume.
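A minimal sketch of an error stack in this spirit. It is not the original listing: it deliberately swaps in fixed-size storage instead of heap allocation, so that reporting an error cannot itself fail on an allocation, and it is not thread safe. Only the push_error name comes from the post; everything else (sizes, helper names, the output format modeled on the sample output) is an assumption.

```c
#include <stdarg.h>
#include <stdio.h>

#define MAX_ERRORS 16
#define MAX_ERROR_LEN 256

struct error_entry {
    int code;
    int line;
    const char *file;
    const char *func;
    char message[MAX_ERROR_LEN];
};

static struct error_entry errors[MAX_ERRORS];
static size_t error_count;

/* Records one formatted error. Errors beyond MAX_ERRORS are dropped. */
void push_error_at(const char *file, int line, const char *func,
                   int code, const char *format, ...) {
    if (error_count == MAX_ERRORS)
        return;
    struct error_entry *entry = &errors[error_count++];
    entry->file = file;
    entry->line = line;
    entry->func = func;
    entry->code = code;
    va_list args;
    va_start(args, format);
    vsnprintf(entry->message, sizeof entry->message, format, args);
    va_end(args);
}

/* Captures the call site automatically. */
#define push_error(code, ...) \
    push_error_at(__FILE__, __LINE__, __func__, (code), __VA_ARGS__)

/* Prints the most recent error first, so the outermost context leads. */
void print_all_errors(FILE *out) {
    for (size_t i = error_count; i > 0; i--) {
        const struct error_entry *e = &errors[i - 1];
        fprintf(out, "%s:%d - %s() - %d %s\n",
                e->file, e->line, e->func, e->code, e->message);
    }
}

void clear_errors(void) {
    error_count = 0;
}
```

A caller would push the low level failure first (for example push_error(2, "cannot open file: %s", name)) and each layer above adds its own context before returning.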

Here is how this is used:

And the output that this will print to the console would be:

consoleapplication4.cpp:211 - main() - 1065 can't open db: northwind
consoleapplication4.cpp:210 - main() - 2 cannot open file: raven.db

And yay, I invented exceptions!

Also, I have a memory leak there, can you find it?

Nov 26 2018

Design exercise: A generic network protocol

time to read 10 min | 1862 words

I spent some idle time thinking about the topic lately, and I think that this can be a pretty good blog post. The goal is to have a generic application level network protocol for client/server and server/server communication. There are a lot of them out there, and this isn’t actually design work that I intend to implement. It is a thought exercise that runs through a lot of the reasoning behind the design.

There are network protocols that are specific to a purpose, and that is reflected in a lot of hidden assumptions they have. I’m trying to conceive a protocol that would be generic enough to be used for a wide variety of cases. Here are the key points in my thinking.

Don’t reinvent the wheel
Security is a must
RPC is the most common scenario
Don’t cause problems for the client / server by messing up the protocol
Debuggability can’t be bolted on
Push model should also work

By not re-inventing the wheel I mean that this should be relatively simple to implement. That pretty much limits us to TCP as the underlying mechanism. I’m actually going to specify a stream based communication protocol, instead, though. With the advent of QUIC, HTTP/3, etc., that might actually be useful. But the whole idea is that the underlying abstraction that we want to rely on is a connection between two nodes that is a stream. All the issues of packet ordering, retries, congestion, etc. are to be handled at that level.

In this day and age, security is not an optional requirement, and it should be incorporated into the design of the system from the get go. I absolutely adore TLS, and it solves a whole bunch of problems for us at the same time. It gives us a secure channel, it handles authentication on both ends and it is both widely understood and commonly used. This means that by selecting TLS as the security mechanism, we aren’t limiting any clients. So the raw protocol we rely on is TLS/TCP, with authentication done using client certificates.

By far the most common usage for a network protocol is the request/reply model. You can see it in HTTP, SMTP, POP3 and most other network protocols. There is a problem with this model, though. A simple request/reply protocol is going to cause scalability and management issues for the users. What do I mean by that? Look at HTTP as a great example. It is a simple request/reply protocol, and that fact has caused a lot of complexity for users. If you want to send several requests in parallel, you need multiple connections, and head-of-line blocking is a real problem. This impacts both clients and servers and can cause a great deal of hardship for all. Indeed, this is why HTTP/2 uses framing and allows sending multiple requests without specifying the order in which the server replies to them.

A better model would be to break that kind of dependency, and I’m likely going to be modeling at least some of that on the design of HTTP/2.

Speaking of which, HTTP/2 is a binary protocol, which is great if you have the entire internet behind you. If you are designing a network protocol that isn’t going to be natively supported by all and sundry, you are going to need to take into account the debuggability of the solution. The protocol I have in mind is a text based protocol and should be usable from the command line by using something like:

openssl s_client -connect my_server:4833

This will give you what is effectively a shell into the server, and you should be able to write commands there and get their results. I used to play around a lot with network protocols and being able to telnet to a server and manually play with the commands is an amazing experience. As a side effect of this, it also means that having a trace of the communication between client and server will be amazingly useful for diagnostics down the line. For that matter, for certain industries, being able to capture the communication trace might be an absolute requirement for auditing purposes (who did what and when).

So, here is what we have so far:

TLS/TCP as the underlying protocol
Text based, so we can manually type commands and read the replies

What we are left with, though, is what is the actual data on the wire going to look like?

I’m not going to be too fancy, and I want to stick closely to stuff that I know works. The protocol will use messages as the indivisible unit of communication. A message will have the following structure (using RavenDB as the underlying model):

GET employees/1-A employees/2-B
Timeout: 30
Sequence: 293
Include: ReportsTo

PUT "document with spaces"
Sequence: 294
Body: chunked

39
<<binary data 39 bytes in len>>

So, basically, we have a line oriented protocol (each line separated by \r\n, and limited to a well known maximum size). A message starts with a command line, which has the following structure:

cmd (token) args[] (token)

Where token is either a sequence of characters without whitespace or a quoted string if it contains whitespace.
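A minimal sketch of a tokenizer for such a command line, assuming it has already been read without its trailing \r\n. The function name is hypothetical, and escape sequences inside quoted tokens are not handled because the post does not define any.

```c
#include <stddef.h>
#include <string.h>

/* Splits line in place into tokens and stores pointers to them in tokens[].
 * A token is a run of non-whitespace characters, or the text between two
 * double quotes. Returns the number of tokens, or -1 on an unterminated
 * quote or when there are more than max_tokens tokens. */
int tokenize_command(char *line, char *tokens[], int max_tokens) {
    int count = 0;
    char *p = line;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;                       /* skip separators */
        if (*p == '\0')
            break;
        if (count == max_tokens)
            return -1;
        if (*p == '"') {
            char *end = strchr(p + 1, '"');
            if (end == NULL)
                return -1;             /* quote never closed */
            tokens[count++] = p + 1;   /* token excludes the quotes */
            *end = '\0';
            p = end + 1;
        } else {
            tokens[count++] = p;
            while (*p != '\0' && *p != ' ' && *p != '\t')
                p++;
            if (*p != '\0')
                *p++ = '\0';           /* terminate token, move past separator */
        }
    }
    return count;
}
```

For the first sample message, the result is three tokens: the command GET followed by two arguments.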

Following the command line, you have the headers, which pretty much follow the design of HTTP headers. They are used to pass additional information, such as the timeout for that particular command, command specific data (like the Include header on the first command) or protocol details (like the fact that the second command has a body and how to read it). A command ends with an empty line, and then you have an optional body.
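Splitting one header line can be sketched like this, assuming HTTP-style "Name: value" lines with the \r\n already stripped. The function name is hypothetical.

```c
#include <string.h>

/* Splits "Name: value" in place. On success sets *name and *value to point
 * into line and returns 0. Returns -1 if there is no colon or the name is empty. */
int parse_header(char *line, char **name, char **value) {
    char *colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;
    *colon = '\0';                     /* terminate the name */
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;                           /* skip whitespace after the colon */
    *name = line;
    *value = v;
    return 0;
}
```

A reader loop would call this for each line after the command line until it sees the empty line that ends the command.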

The headers here serve a very important role. As you can see, they are key for protocol flexibility and enabling versioning. They give us a good way to add additional data after the first deployment without breaking everything.

Because we want to support people typing this manually, we’ll probably need to have some way to specify message bodies that a human can type on their own without having to compute sizes upfront. This is something that will likely be needed only for human input, so we can probably define a terminating token that would work. I’m not focusing on this because it isn’t a mainline feature, but I wanted to mention it because debuggability isn’t a secondary concern.

You might have noticed a repeated header in the commands I sent: the Sequence header. This one is optional (when a human writes it) but will be very useful for tracing, so tools will always add it. A stream connection is actually composed of two channels, the read and the write. On the read side of this protocol, we read a full command from the network, hand it off to something else to process it and go right back to reading from the network. This design is suitable for the event based systems that have proven to be so useful to scale the amount of work a server can handle. Because we can start reading the next command while we process the current ones, we greatly reduce the number of connections we require and enable a lot more parallel work.

Once a message has been processed, the reply is sent back to the client. Note that there is no requirement that the replies will be sent in the same order as the requests that initiated them. That means that an expensive operation on the server side isn’t going to block cheaper operations that came after it, which is, again, important for the overall speed of the system. It also lends itself quite nicely to event loop based processing. The sequence number for the request is used in the reply to ensure that the client can properly correlate the reply to the relevant request.

On the client side, you write a command to the network. When reading from the network, you need to keep track of the sequence number you sent and route it back to the right caller. The idea here is that on the client side, you may have a single connection that is shared among several threads, reducing the number of overall connections you need and getting better overall utilization from the network.
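The client side routing can be sketched as a small table of pending requests keyed by sequence number. All names here are hypothetical, the table size is arbitrary, and a real client sharing one connection among several threads would need a lock around the table, which this sketch omits.

```c
#include <stddef.h>

#define MAX_PENDING 64

typedef void (*reply_callback)(long sequence, const char *reply, void *context);

struct pending_request {
    long sequence;
    reply_callback on_reply;
    void *context;
    int in_use;
};

struct pending_table {
    struct pending_request slots[MAX_PENDING];
};

/* Remembers who is waiting for this sequence number. Returns -1 if the table is full. */
int pending_add(struct pending_table *table, long sequence,
                reply_callback on_reply, void *context) {
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_request *slot = &table->slots[i];
        if (!slot->in_use) {
            slot->sequence = sequence;
            slot->on_reply = on_reply;
            slot->context = context;
            slot->in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Routes a reply to the caller that sent the matching request.
 * Replies may arrive in any order. Returns -1 for an unknown sequence. */
int pending_complete(struct pending_table *table, long sequence, const char *reply) {
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_request *slot = &table->slots[i];
        if (slot->in_use && slot->sequence == sequence) {
            slot->in_use = 0;          /* free the slot before invoking the callback */
            slot->on_reply(sequence, reply, slot->context);
            return 0;
        }
    }
    return -1;
}

/* Example callback: stores the reply pointer where the caller can find it. */
void store_reply(long sequence, const char *reply, void *context) {
    (void)sequence;
    *(const char **)context = reply;
}
```

With this in place, a reply for Sequence 294 can be delivered before the reply for Sequence 293 without either caller noticing.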

A nice property of this design is that you don’t have to do things this way. If you don’t want to support concurrent requests / replies, just have a single connection and wait to read the reply from the server whenever you make a request. That gives you the option of a simple stateful approach, but also an easy upgrade path down the line if / when you need it. The fact that the mental model of the user is request/reply is a great help, to be honest, even if this isn’t what is actually going on. This greatly reduces the amount of complexity that a user needs to keep in their head.

Some details of the protocol would need gentle massaging, to ensure that a human on the command line can type reasonable commands, but that is fairly straightforward. The text based nature of the communication also lends itself nicely to tracing / audits. At the client or server levels, we can write a <connection-id>.trace file that will have all the reads and writes on that connection. During debugging, you can just tail the right <connection-id>.trace file and see exactly what is going on, or just zip them into an archive for auditing.

Speaking of zipping, let’s consider the following command:

OPTIONS gzip

This command on the connection can do things like change how we encode the data, in this case, ask the server to use gzip for all reads and writes from now on. The server can reply with a message (uncompressed) that it is now switching compression and everything from that point forward will be compressed. Note that unlike HTTP compression, we can get the benefits of compression across multiple requests, and given that most requests / replies have a lot of the same structure, that is likely to benefit us quite a lot.

The last topic I listed is the notion of push operations and how they should be handled. Given that we don’t have a strict request/reply model, there is an obvious way for the server to send additional data “out of band”. A client can request to be notified by the server of certain things, and the server will make a note of that and just send the data back at some later time. There is obviously the need to correlate the push notification to the original request, but that is what we have the headers for. A simple CorrelationId header on the original request and the push notification will be sufficient for the client side to be able to route that to the right callback.

I think that this is pretty much it. This should cover enough to give you a clear idea about what is required and I believe that it is enough for a thought exercise. There are a lot of other details that should probably be answered, for example, how do you deal with very large responses (break them into multiple messages, I would assume, to avoid holding up the connection for other requests), but that should be the gist of it.

Nov 23 2018

Production postmortem: The ARM is killing me

time to read 9 min | 1610 words

“If a tree falls in a forest and no one is around to hear it, does it make a sound?” is a well known philosophical statement. The technological equivalent of this is this story. We got a report that RavenDB was failing in the field. But the details around the failure were critical.

The failure happened in the field, literally. This is a system that is running an industrial robot using a custom ARM board. The failure would only happen on the robot in the field, and would not reproduce on the user’s test environment or on our own systems. Initially, that was all the information that we had: “This particular robot works fine for a while, but as soon as there is a break, RavenDB dies and needs to be restarted”. That was the first time I ran into a system that would crash when it went idle, instead of dying under load, I have to say.

My recommendation that they just keep the robot busy at all times was shot down, but for a while, we were in the dark. It didn’t help that this was literally a custom ARM machine that we had no access to. We finally managed to figure out that the crash was some variant of SIGSEGV or SIGABRT. That was concerning. The ARM machine in question is running on 32 bits, and the worry was that our 32 bits code was somehow doing an out of bounds read. This is a crash in production, so we allocated a couple of people to investigate and try to figure out what was going on.

We started by doing a review of all our 32 bits memory management code and in parallel attempted to reproduce this issue on a Raspberry Pi (the nearest machine we had to what was actually going on). We got a lucky break when someone did manage to kill the RavenDB process in our own lab somehow. The exit code was 139 (Segmentation fault), but we weren’t sure what was actually going on. We were trying all sorts of stuff on the machine, seeing what would cause this. We basically fed it all sorts of data that we had lying around and saw if it would choke on that. One particular data export would sometimes cause a crash. Sometimes. I really really hate this word. That meant that we were stuck with trying to figure out something by repeatedly trying and relying on the law of averages.

It took several more days, but we figured out that a certain sequence of operations would reliably cause a crash within 5 – 30 minutes. As you can imagine, this made debugging pretty hard. The same sequence of operations on Intel machines, either 32 bits or 64 bits, worked without issue, regardless of how many times we repeated them.

We followed several false trails with our investigation into RavenDB’s memory management code in 32 bits. We had a few cases where we thought that we had something, but nothing popped up. We instrumented the code and verified that everything seemed kosher, and it certainly did, but the system still crashed on occasion.

RavenDB usually relies on mmap() to access the data on disk, but on 32 bits, we couldn’t do that. With an addressable memory of just 2 GB, we cannot map the whole file to memory if it is too large. Because of that, we map portions of the file to memory as needed for each transaction. That led us to suspect that we were somehow unmapping memory while it was still in use or something like that. But we went through the code with a fine tooth comb and got nothing. We used strace to try to help point out what was going on and we could see that there were no surprise calls to munmap() that shouldn’t be there.

What was really nasty was the fact that when we failed with SIGSEGV, the error was always on an address just past the area of memory that we mapped. This led us to suspect that we had an out of bounds write and led to a chase for that rogue pointer operation. We instrumented our code ever more heavily, but weren’t able to find any such operation. All our reads and writes were in bounds, and that was incredibly frustrating. RavenDB is a CoreCLR application. As such, debugging it on an ARM device is… challenging. We tried lldb and gdb. Both allow unmanaged debugging, but even with lldb, we couldn’t debug managed code or even just pull the managed stack properly from ARM. Eventually we found this extension, which allows SSH debugging on the Raspberry Pi from a Windows machine.

That helped, and we finally figured out where in our managed code the error happened. This always happened during a copy of memory from a document write to a scratch buffer in a memory mapped file. The entire thing was wrapped in boundary checks and everything was good.

We went back to the drawing board and attempted to set it on fire, because it was no good for us. Once we put the fire out, we looked at what remained and had a Eureka! moment. One of the differences between ARM and x86/x64 machines is in how they treat alignment. On x64/x86, alignment is pretty much a non issue for most operations. On ARM, however, an unaligned operation will cause a CPU fault. That led us to suspect that the SIGABRT error we got was indeed an alignment issue. Most of our code is already aligned on memory because while it isn’t mandatory on x64/x86, it can still get better perf in certain cases, but it is certainly possible that we missed it.

We discovered a horrifying problem:

We were using the CopyBlock method, and obviously that was the issue, right? We wrote a small test program that simulated what we were doing and used unaligned CopyBlock and it just worked. But maybe our situation is different?

Using CopyBlockUnaligned on x86 led to a 40% performance drop (we call this method a lot) and initially it looked like it fixed the problem on ARM. Except that on the third or fourth attempt to reproduce the problem, we ran into our good old SIGSEGV again, so that wasn’t it. This time we went to the drawing board and broke it.

During this time, we have managed to capture the error inside the debugger several times, here is what this looked like:

Reading ARM assembly is not something that I’m used to doing, so I looked at the manual, and it looks like this instruction is to store multiple registers in descending order and… no clue beyond that. It didn’t make any sort of sense to us.

At this point, we were several weeks and four or five people into this investigation (we consider such issues serious). We had instrumented our code to the point where it barely ran, we could manage to reproduce the error in a relatively short time and we were fairly convinced that we were doing things properly. We had gone over the kernel code for memory mapping and unmapping several times, stracing, debugging, everything. We were stumped. But we also had enough data at this point to be able to paint a fairly clear picture of what was going on. So we opened an issue for the CoreCLR about this, suspecting that the issue is in the implementation of this CopyBlockUnaligned.

We got a strange response, though: “This assembly code doesn’t make any sense”. I did mention that I have no idea about ARM assembly, right? We tried reproducing the same thing in gdb, instead of lldb and got the following assembly code:

This looked a lot more readable, to be sure. And it was extremely suspicious. Let me explain why:

The faulting instruction is: ldr r3, [r0, #0]

What this says is basically, read a word from the address pointed to by r0 (with 0 offset) into r3.

Now, r0 in this case has this value: 0x523b3ffd. Note the last three characters, ffd.

We are running this on a 32-bit machine, so a word is 4 bytes in size. 0xFFD + 4 = 0x1001, which is past the 0x1000 (4 KB) page boundary.

In other words, we had a read beyond the current page boundary. In most cases, the next page is mapped, so everything goes smoothly. In some cases, the next page is not mapped, so you are going to get an access violation trying to read a byte from the next page.
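The arithmetic above can be checked with a short sketch. It assumes 4 KB pages (the usual size on 32-bit ARM Linux); `crosses_page_boundary` is a made-up helper name, not something from RavenDB or the CoreCLR.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def crosses_page_boundary(address: int, size: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if reading `size` bytes at `address` touches two pages."""
    first_page = address // page_size
    last_page = (address + size - 1) // page_size  # page of the last byte read
    return first_page != last_page


r0 = 0x523B3FFD                      # value of r0 at the faulting instruction
offset_in_page = r0 % PAGE_SIZE      # the trailing "ffd"
bytes_left_in_page = PAGE_SIZE - offset_in_page  # only 3 bytes remain
spills_over = crosses_page_boundary(r0, 4)       # a 4-byte word does not fit
```

With only 3 bytes left in the page, the 4-byte load has to take its last byte from the following page, which is the byte that faults when that page is not mapped.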

The fix for this is here:

This is literally a single character change. And probably the worst ratio of lines of code to time invested that I have ever seen for any bug. Actually, there wasn’t even any code change in RavenDB’s codebase, so that is 0 lines of code / 4 people x 4 weeks.

The good thing is that at least we have proven that the 32-bit memory code is rock solid, and we have a much better understanding of how to resolve the next issue.

Nov 21 2018

RavenDB 4.2 Features: Pull replication & edge processing

time to read 4 min | 603 words

3 comments

Tags:

raven

The flagship feature for RavenDB 4.2 is the graph queries, but there are a lot of other features that also deserve attention. One of the more prominent second string features is pull replication.

A very common deployment pattern for RavenDB is to have it deployed to the edge. A great example is shown in this webinar, which talks about deploying RavenDB to 36,000 locations and over half a million instances. To my knowledge, this is one of the largest single deployments of RavenDB, and this deployment model is common among our users.

In the past few months I talked with users who use RavenDB on the edge for the following purposes:

- Ships at sea, where RavenDB is used to track cargo and ongoing manifest updates. The ships do not have any internet connection while at sea, but connect to the headquarters when they dock.
- Clinics of health care providers, where each clinic has a RavenDB instance and can operate completely independently if the network is down, but communicates with the central data center during normal operations.
- Industrial robots, where each robot holds its own data and communicates occasionally with a central location.
- Using RavenDB as the back end for an application running on tablets to be used out in the field, which will only have a connection to the central database when back in the office.

We call such deployments the hub & spoke model and distinguish between the types of nodes that we have. We have edge nodes and the central node.

Now, to be clear, both the edge and the central can be either a single node or a full cluster, it doesn’t matter to our discussion.

Pull replication in RavenDB allows you to define a replication definition on the central node once. On each of the edge nodes, you define the pull replication definition and that is pretty much it. Each edge node will connect to the central location and start pulling all the data from that database. On the face of it, it seems like a pretty simple process and not much different from external replication, which we already have in RavenDB.

The difference is that external replication is defined on the central node, for each of the nodes on the edges. Pull replication is defined once on the central node and then defined on each of the edges. The idea here is that deploying a new edge node shouldn’t have any impact on the central database. It is pretty common for users to deploy a new location, and you don’t want to have to go and update the central server whenever that happens.
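The difference in where the configuration lives can be shown with a toy model. This is not the RavenDB API; `CentralNode`, `EdgeNode`, `deploy_edges` and the definition name `to-edges` are all hypothetical, and the model only counts how often the central node has to be touched.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CentralNode:
    external_tasks: List[str] = field(default_factory=list)  # one per edge
    pull_definitions: Set[str] = field(default_factory=set)  # shared by all edges
    config_changes: int = 0  # times someone had to reconfigure the central node

    def add_external_task(self, edge_url: str) -> None:
        # External replication: the central node must know about every edge.
        self.external_tasks.append(edge_url)
        self.config_changes += 1

    def define_pull_replication(self, name: str) -> None:
        # Pull replication: defined once, regardless of how many edges exist.
        if name not in self.pull_definitions:
            self.pull_definitions.add(name)
            self.config_changes += 1


@dataclass
class EdgeNode:
    url: str
    pulls_from: Optional[str] = None

    def pull_from(self, central: CentralNode, definition: str) -> None:
        # The edge refers to an existing definition; the central node is unchanged.
        if definition not in central.pull_definitions:
            raise KeyError(definition)
        self.pulls_from = definition


def deploy_edges(edge_urls: List[str], use_pull: bool) -> int:
    """Deploy the edges and return how many changes the central node needed."""
    central = CentralNode()
    if use_pull:
        central.define_pull_replication("to-edges")
    for url in edge_urls:
        edge = EdgeNode(url)
        if use_pull:
            edge.pull_from(central, "to-edges")
        else:
            central.add_external_task(url)
    return central.config_changes
```

In this model, deploying three edges with external replication changes the central node three times, while pull replication changes it once no matter how many edges are added later.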

There are a few other aspects of this feature that matter greatly. The most important of them is that it is the edge that initiates the connection to the central node, not the central to the edge. This means that the edge can be behind NAT and you don’t have to worry about tunneling, etc.

The second is about security. Pull replication has its own security measures. When you define a pull replication on the central node, you also set up the certificates that are allowed to utilize it. Those certificates are completely separate from the certificates that are used to access the database in general. So your edge nodes don’t have any access to the database at all; all they can do is set up the channel for the central node to send them the data.

This is going to make edge deployments and topologies a lot easier to manage and work with in the future.

Nov 19 2018

The return of the aunt of the sister of the friend of the data format vampire

time to read 1 min | 105 words

6 comments

Tags:

bugs

I was talking about frameworks and environments that are hard to work with, and I gave SharePoint as an example of an application that is meant to be a development platform but doesn’t have the right facilities to actually be a good development platform.

I tried to explain (to a non-technical person) why that is the case and I ran into this page. It contained the following snippet:

I’m sorry, dear non techie, I know you won’t get that, but I believe that the persecution rests.

Nov 14 2018

Use cases for MADV_DONTNEED in Voron

time to read 2 min | 341 words

4 comments

Tags:

development

The rant in this video is an absolutely beautiful one. I ran into this rant while figuring out how MADV_DONTNEED works, and I thought I would give some context on why the behavior is exactly what I want. In fact, you can read my reasoning directly in the Linux Kernel source code.

During a transaction, we need to put the memory pages modified by the transaction somewhere. We put them in temporary storage which we call scratch. Once a transaction is committed, we still use the scratch memory for a short period of time (due to MVCC) and then we flush these pages to the data file. At this point, we are never going to use the data on the scratch pages again. Just leaving them around means that the kernel needs to write them to the file they were mapped from. Under load, that can actually be a large portion of the I/O the system is doing.

We handle that by telling the OS, using MADV_DONTNEED, that we don’t actually need this memory and that it should throw it away and not write it to disk. We are still checking whether this causes us excessive reads when we do that (when the kernel tries to re-read the data from disk).
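For readers who have not used this call, here is a minimal sketch of issuing the hint on a file-backed mapping. It uses Python's `mmap` module rather than Voron's own code; `release_scratch_pages` and `demo` are made-up names, and the temporary file merely stands in for a scratch file. `mmap.madvise` needs Python 3.8+, and `MADV_DONTNEED` is only exposed on platforms that support it, so the sketch skips the call elsewhere.

```python
import mmap
import tempfile


def release_scratch_pages(scratch: mmap.mmap, start: int, length: int) -> bool:
    """Hint that [start, start + length) is no longer needed.

    Returns False when the hint is unavailable on this platform.
    """
    if start % mmap.PAGESIZE != 0:
        raise ValueError("start must be a multiple of the page size")
    if not hasattr(mmap, "MADV_DONTNEED") or not hasattr(scratch, "madvise"):
        return False
    # This is only advice to the kernel; the mapping itself stays valid.
    scratch.madvise(mmap.MADV_DONTNEED, start, length)
    return True


def demo():
    size = 4 * mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # file-backed, like the scratch files described above
        with mmap.mmap(f.fileno(), size) as scratch:
            scratch[:5] = b"dirty"  # simulate pages modified by a transaction
            advised = release_scratch_pages(scratch, 0, size)
            return advised, len(scratch)
```

The mapping remains usable after the call, so the same scratch range can be handed to a later transaction.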

There are things that seem better, though. There is MADV_REMOVE, which will do the same and also zero (efficiently) the data on disk if needed, so it is not likely to cause page faults when reading it back again. The problem is that this is limited to certain file systems. In particular, SMB file systems are really common for containers, so that is something to take into account.

MADV_FREE, on the other hand, does exactly what we want, but will only work on anonymous maps. Our scratch files use actual files, not anonymous maps. This is because we want to give the memory a backing store in the case of memory overload (and to avoid the wrath of the OOM killer). So we explicitly define them as files (although temporary ones).

Nov 13 2018

RavenDB Node.JS client updated (now supporting subscription)

time to read 1 min | 85 words

0 comments

Tags:

raven

The node.js RavenDB client had a milestone release last month, which I only now got around to talking about. The most important thing about this release is that node.js users can now use subscriptions to process data from RavenDB.

Here is how you can do this:

This is one of the most powerful abilities RavenDB has: you will continuously get new orders in your script and be able to process them. This makes it very easy to build features such as data pipelines, background processing, etc.

Oren Eini

CEO of RavenDB
