Modeling hierarchical structures in RavenDB
The question pops up frequently enough and is interesting enough for a post. How do you store a hierarchical data structure, where an entity refers to a parent and children of its own type, in Raven?
The problem here is that we don't have enough information about the problem to actually give an answer. That is because when we think about how we should model the data, we also need to consider how it is going to be accessed. In more precise terms, we need to define what the aggregate root of the data in question is.
Let us take two examples: a Person, which has a parent and children, and a Question, which has answers.
As you can imagine, a Person is an aggregate root. It can stand on its own. I would typically store a Person in Raven using one of two approaches:
Bare references:
{
  "Name": "Ayende",
  "Email": "[email protected]",
  "Parent": "people/18",
  "Children": [ "people/59", "people/29" ]
}

Denormalized references:
{
  "Name": "Ayende",
  "Email": "[email protected]",
  "Parent": { "Name": "Oren", "Id": "people/18" },
  "Children": [
    { "Name": "Raven", "Id": "people/59" },
    { "Name": "Rhino", "Id": "people/29" }
  ]
}
The first option is bare references, which hold only the id of the associated document. This is useful if I only rarely need to follow the reference. If, however, I also need to show some data from the associated documents (as is common), it is generally better to use denormalized references, which embed the data we need from the associated document inside the aggregate.
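A minimal sketch of producing the denormalized form from the bare form. It assumes the documents live in a plain Python dict keyed by id, which stands in for the database and is not the RavenDB client API. The helper names `make_reference` and `denormalize_person` are hypothetical, and only `Name` is copied into each reference, matching the example above.

```python
from typing import Any

Document = dict[str, Any]


def make_reference(store: dict[str, Document], doc_id: str,
                   fields: tuple[str, ...] = ("Name",)) -> Document:
    """Copy the chosen fields of the referenced document next to its id."""
    target = store[doc_id]
    ref = {name: target[name] for name in fields}
    ref["Id"] = doc_id
    return ref


def denormalize_person(store: dict[str, Document], person: Document) -> Document:
    """Return a copy of a Person whose bare references became denormalized ones."""
    result = dict(person)  # shallow copy; the input document is left untouched
    if person.get("Parent") is not None:
        result["Parent"] = make_reference(store, person["Parent"])
    result["Children"] = [make_reference(store, child_id)
                          for child_id in person.get("Children", [])]
    return result


store = {
    "people/18": {"Name": "Oren"},
    "people/59": {"Name": "Raven"},
    "people/29": {"Name": "Rhino"},
}
bare = {"Name": "Ayende", "Parent": "people/18", "Children": ["people/59", "people/29"]}
denormalized = denormalize_person(store, bare)
```

The trade-off is visible in the code. The denormalized copy costs one extra lookup per reference when the document is written, but none when it is read and displayed.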
But the same approach wouldn’t work for Questions. In the Question model, we have utilized the same data structure to hold both the question and the answer. This sort of double utilization is pretty common, unfortunately. For example, you can see it being used in StackOverflow, where both Questions & Answers are stored as posts.
The problem from a design perspective is that in this case a Question is not an aggregate root in the same sense that a Person is. A Question is an aggregate root if it is an actual question, but not if it is a Question instance that holds the answer to another question. I would model this using:
{
  "Content": "How to model relations in RavenDB?",
  "User": "users/1738",
  "Answers": [
    { "Content": "You can use.. ", "User": "users/92" },
    { "Content": "Or you might...", "User": "users/94" }
  ]
}
In this case, we are embedding the children directly inside the root document.
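To make the difference between the "everything is a post" model and the aggregate above concrete, here is a sketch that regroups StackOverflow-style posts into one document per actual question. It assumes each post has `Id`, `Content`, `User` and an optional `ParentId`; `ParentId` and the function name are hypothetical and do not come from the post.

```python
from typing import Any

Post = dict[str, Any]


def to_question_aggregates(posts: list[Post]) -> list[Post]:
    """Group posts into one document per question, with answers embedded.

    Posts without a ParentId are treated as questions; posts with one are
    answers to that question (an answer whose parent is missing raises KeyError).
    """
    questions: dict[str, Post] = {}
    for post in posts:  # first pass: the aggregate roots
        if post.get("ParentId") is None:
            questions[post["Id"]] = {"Content": post["Content"],
                                     "User": post["User"], "Answers": []}
    for post in posts:  # second pass: answers become embedded children
        parent_id = post.get("ParentId")
        if parent_id is not None:
            questions[parent_id]["Answers"].append(
                {"Content": post["Content"], "User": post["User"]})
    return list(questions.values())


posts = [
    {"Id": "posts/1", "Content": "How to model relations in RavenDB?", "User": "users/1738"},
    {"Id": "posts/2", "Content": "You can use.. ", "User": "users/92", "ParentId": "posts/1"},
    {"Id": "posts/3", "Content": "Or you might...", "User": "users/94", "ParentId": "posts/1"},
]
aggregates = to_question_aggregates(posts)
```

The answers lose their own ids in the process. That is the point: they only exist inside their question.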
So I am afraid that the answer to that question is: it depends.
Comments
In CouchDB, you would not want to embed the answers to a question directly in the document because if two people answered the question at about the same time, or if you were using replication and they answered it between replication cycles, then you would get a 409 (conflict). If you add the answers as documents of their own, two people adding at the same time will not cause conflicts.
Would this scenario not be a problem with RavenDB? What about RavenDB makes the proper choice of strategy different?
Nathan,
That is a good point. WRT replication, Raven would be in the same situation as CouchDB, but Raven also supports the notion of partial updates, things like: "Add this answer to the Answers array"
Which means that two concurrent updates can both succeed.
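The idea behind a partial update can be sketched with a toy in-memory server. The client does not load the document, append locally and save the whole thing back. It sends only the operation ("add this answer to the Answers array"), and the server applies it to the current version. `DocumentServer` and `patch_add` are hypothetical names. This models the concept only and is not RavenDB's PATCH request format.

```python
import copy
import threading
from typing import Any


class DocumentServer:
    """Toy in-memory stand-in for the database; not the RavenDB client API."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}
        self._lock = threading.Lock()

    def put(self, doc_id: str, doc: dict[str, Any]) -> None:
        with self._lock:
            self._docs[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id: str) -> dict[str, Any]:
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def patch_add(self, doc_id: str, array_name: str, value: Any) -> None:
        # Hypothetical "add this value to that array" operation, applied under
        # a lock to the document as it is *now*, so no stale copy is written back.
        with self._lock:
            self._docs[doc_id].setdefault(array_name, []).append(copy.deepcopy(value))


server = DocumentServer()
server.put("questions/1", {"Content": "How to model relations in RavenDB?", "Answers": []})

# Two users answer at the same time; each one sends only a patch.
answers = [
    {"Content": "You can use.. ", "User": "users/92"},
    {"Content": "Or you might...", "User": "users/94"},
]
threads = [threading.Thread(target=server.patch_add, args=("questions/1", "Answers", a))
           for a in answers]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(server.get("questions/1")["Answers"]))  # prints 2
```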
How do the partial updates work? Does the app have to specify that it is doing a partial update or does Raven do this behind the scenes? Got a link handy?
ravendb.net/documentation/docs-http-api-patch
I have a similar question to Nathan's. Given the StackOverflow model you presented, if two people answer the question at about the same time, won't you get conflicts when storing the data to the db?
I can imagine the following scenario:
1) Person A answers question.
2) Get question document for Person A
3) Append answer A
4) Person B answers question.
5) Get question document for Person B
6) Save Person A's answer to DB.
7) Append answer B
8) Save Person B's answer to DB.
If we let the last-in win, Person A's answer is completely gone. I've actually avoided working on a part of my application that requires this sort of modeling because I haven't figured out what to do yet.
Obviously storing the answers as entities themselves would help, but we'd almost always want to access the data as one document in this situation. Can you expand on a strategy here?
Thanks
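Brian's interleaving can be replayed against a toy in-memory store to make the lost update concrete. `load`, `save`, the answer texts and the `users/a` / `users/b` ids are hypothetical. The point is that saving a whole document that was read before another writer's save silently discards that writer's change.

```python
import copy
from typing import Any

# One in-memory "database" holding the question document (a stand-in, not RavenDB).
db: dict[str, dict[str, Any]] = {
    "questions/1": {"Content": "How to model relations in RavenDB?", "Answers": []}
}


def load(doc_id: str) -> dict[str, Any]:
    # Each session gets its own copy, as a client would after reading the document.
    return copy.deepcopy(db[doc_id])


def save(doc_id: str, doc: dict[str, Any]) -> None:
    # Last write wins: the whole stored document is replaced.
    db[doc_id] = copy.deepcopy(doc)


# Steps 1 and 4 (the users writing their answers) happen outside the database.
doc_a = load("questions/1")                                          # step 2
doc_a["Answers"].append({"Content": "Answer A", "User": "users/a"})  # step 3
doc_b = load("questions/1")                                          # step 5
save("questions/1", doc_a)                                           # step 6
doc_b["Answers"].append({"Content": "Answer B", "User": "users/b"})  # step 7
save("questions/1", doc_b)                                           # step 8

print([a["Content"] for a in db["questions/1"]["Answers"]])  # prints ['Answer B']
```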
Brian,
As I told Nathan, the answer for that is to use Raven's partial document update support, which would resolve the issue.
Interesting!
So... for a limitlessly recursive hierarchy (e.g. a parent-child relationship), you want each element in its own document, but for depth-limited relationships (e.g. question-answer), you can put all the "children" in a collection in the "parent" document, and the "children" need not have documents of their own, correct? If so, that makes sense to me.
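For the unbounded case Jason describes, each person is its own document and the hierarchy is navigated by following references, one load per level. A minimal sketch, assuming bare `Parent` references held in a plain dict. `ancestors`, `people/1`, `people/7` and "Grandparent" are hypothetical.

```python
from typing import Any


def ancestors(store: dict[str, dict[str, Any]], person_id: str) -> list[str]:
    """Follow bare "Parent" references upward, one document lookup per level.

    Stops at a person without a Parent, and raises on a cycle instead of
    looping forever.
    """
    chain: list[str] = []
    seen = {person_id}
    current = store[person_id].get("Parent")
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = store[current].get("Parent")
    return chain


people = {
    "people/1": {"Name": "Ayende", "Parent": "people/18"},
    "people/18": {"Name": "Oren", "Parent": "people/7"},
    "people/7": {"Name": "Grandparent"},
}
```

Because the depth is unknown, nothing bounds the size of a single document. In the question-answer case, by contrast, the answers cannot themselves have answers, so embedding them is safe.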
Ahh, thanks, I see now. I read that as only being available with replication. Reading the mailing list, it seems there is client support at the store level for this. I haven't seen any examples of it though. I'll go ahead and ask on the list.
Jason,
Yes...
Although I would put it differently
It seems like the client API does not support the "patch" command, right?
Maybe I'm missing something here, but the denormalized Person example saves only the id and name. When you query the model, how do the children and parent convert back to a whole C# Person (with its own parent and children)?
@c# model
You can use the id string and load the document based on that, i.e.
var person = session.Load<Person>("people/59");
Just to add: Load is a generic method that needs to have the type specified as "Person", but it got stripped out in my answer.
@Matt Warren, I get this if you go with the bare reference approach and then in your POCO class you have a string ParentId { get; set; }.
But with the denormalized approach, what kind of class do you get back? It is neither an id field nor a full Person class.
btw "c# model" was intended to be the title not the name, a funny mistake :)
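One common way to map the denormalized form back to classes is sketched here in Python rather than C#, assuming the field names from the JSON in the post. The embedded objects become a small reference type holding the id and the copied name. Getting the full Person, with its own parent and children, is a separate load by that id, as in the `session.Load` answer above. `PersonReference` and `load_person` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PersonReference:
    """A denormalized reference: the target's id plus the copied Name."""
    id: str
    name: str


@dataclass
class Person:
    name: str
    email: str = ""
    parent: Optional[PersonReference] = None
    children: list[PersonReference] = field(default_factory=list)

    @staticmethod
    def from_document(doc: dict[str, Any]) -> "Person":
        def to_ref(raw: dict[str, Any]) -> PersonReference:
            return PersonReference(id=raw["Id"], name=raw["Name"])

        parent = doc.get("Parent")
        return Person(
            name=doc["Name"],
            email=doc.get("Email", ""),
            parent=to_ref(parent) if parent is not None else None,
            children=[to_ref(raw) for raw in doc.get("Children", [])],
        )


def load_person(store: dict[str, dict[str, Any]], person_id: str) -> Person:
    # Resolving a reference to a full Person is a separate lookup by id.
    return Person.from_document(store[person_id])


doc = {
    "Name": "Ayende",
    "Parent": {"Name": "Oren", "Id": "people/18"},
    "Children": [{"Name": "Raven", "Id": "people/59"}, {"Name": "Rhino", "Id": "people/29"}],
}
person = Person.from_document(doc)
store = {"people/59": {"Name": "Raven", "Children": []}}
first_child = load_person(store, person.children[0].id)
```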
What would be the cost of updating the name in a partially denormalized reference?
Sebastian,
It shouldn't be very expensive.
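The cost is roughly proportional to the number of documents that hold a copy of the name. For a Person, those are only their parent (via `Children`) and their children (via `Parent`). The sketch below assumes the denormalized shape from the post. For clarity it scans a whole in-memory dict, whereas in practice you would presumably find the referencing documents with a query. `rename_person` and the sample data are hypothetical.

```python
from typing import Any


def rename_person(store: dict[str, dict[str, Any]], person_id: str, new_name: str) -> int:
    """Rename a person and fix every denormalized copy of the name.

    Returns how many *other* documents were modified.
    """
    store[person_id]["Name"] = new_name
    touched = 0
    for doc_id, doc in store.items():
        if doc_id == person_id:
            continue
        refs = list(doc.get("Children", []))  # same dict objects, so edits stick
        if doc.get("Parent") is not None:
            refs.append(doc["Parent"])
        changed = False
        for ref in refs:
            if ref["Id"] == person_id and ref["Name"] != new_name:
                ref["Name"] = new_name
                changed = True
        if changed:
            touched += 1
    return touched


store = {
    "people/1": {"Name": "Ayende", "Parent": {"Name": "Oren", "Id": "people/18"},
                 "Children": [{"Name": "Raven", "Id": "people/59"}]},
    "people/18": {"Name": "Oren", "Parent": None,
                  "Children": [{"Name": "Ayende", "Id": "people/1"}]},
    "people/59": {"Name": "Raven", "Parent": {"Name": "Ayende", "Id": "people/1"},
                  "Children": []},
}
count = rename_person(store, "people/1", "Ayende Rahien")
```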
Is there a way to load only a small part of the Answers for paging (e.g. if there will be hundreds or thousands of them), so the database won't have to send all of them?
... and what happens if a Username is stored for every Answer, since it always needs to be displayed, but the user is allowed to change his Username?
Will I have to loop through all documents in the database where the Username is stored (almost everywhere there is a user action) and update the Username? Will it be a problem?
Thanks for a great blog.
Martin,
Yes, you can.
You create an index that projects those out, and then query on that.
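What such an index gives you can be approximated in memory. Each question's embedded answers are flattened into one row per answer, and a query then takes a single page of rows, so only that page would need to travel back to the client. In RavenDB, the database itself would do this with a map index and a paged query. The functions `project_answers` and `answers_page` below are hypothetical stand-ins and do not use RavenDB's index API.

```python
from typing import Any, Iterator


def project_answers(questions: dict[str, dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Emit one row per embedded answer, roughly what a projecting map index would."""
    for question_id, question in questions.items():
        for position, answer in enumerate(question.get("Answers", [])):
            yield {"QuestionId": question_id, "Position": position,
                   "Content": answer["Content"], "User": answer["User"]}


def answers_page(questions: dict[str, dict[str, Any]], question_id: str,
                 page: int, page_size: int) -> list[dict[str, Any]]:
    """Return one zero-based page of a question's answers, in original order."""
    rows = [r for r in project_answers(questions) if r["QuestionId"] == question_id]
    rows.sort(key=lambda r: r["Position"])
    start = page * page_size
    return rows[start:start + page_size]


questions = {
    "questions/1": {
        "Content": "How to model relations in RavenDB?",
        "Answers": [{"Content": f"Answer {i}", "User": f"users/{i}"} for i in range(25)],
    }
}
third_page = answers_page(questions, "questions/1", page=2, page_size=10)
```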
Martin,
Changing a username is a rare occasion; you can handle that as a background process.
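A sketch of what such a background job might do, assuming each answer stores a denormalized `Username` next to its `User` id. The post's JSON stores only `User`, so `Username`, `update_username` and the sample data are hypothetical. The function rewrites the copies in memory and groups the modified question ids into small batches, so a job could save one batch at a time instead of everything in one request.

```python
from typing import Any


def update_username(questions: dict[str, dict[str, Any]], user_id: str,
                    new_username: str, batch_size: int = 100) -> list[list[str]]:
    """Rewrite the denormalized Username on every answer written by user_id.

    Returns the ids of the modified question documents, grouped into batches
    of at most batch_size, in the dict's iteration order.
    """
    dirty: list[str] = []
    for question_id, question in questions.items():
        modified = False
        for answer in question.get("Answers", []):
            if answer.get("User") == user_id and answer.get("Username") != new_username:
                answer["Username"] = new_username
                modified = True
        if modified:
            dirty.append(question_id)
    return [dirty[i:i + batch_size] for i in range(0, len(dirty), batch_size)]


questions = {
    "questions/1": {"Answers": [{"Content": "...", "User": "users/92", "Username": "martin"}]},
    "questions/2": {"Answers": [{"Content": "...", "User": "users/94", "Username": "other"}]},
    "questions/3": {"Answers": [{"Content": "...", "User": "users/92", "Username": "martin"}]},
}
batches = update_username(questions, "users/92", "martin_b", batch_size=1)
```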