Modeling hierarchical structures in RavenDB

The question pops up frequently enough and is interesting enough for a post. How do you store a data structure like this in Raven?

The problem here is that we don’t have enough information to actually give an answer. When we think about how to model the data, we also need to consider how it is going to be accessed. In more precise terms, we need to define what the aggregate root of the data in question is.

Let us take the following two examples:

As you can imagine, a Person is an aggregate root. It can stand on its own. I would typically store a Person in Raven using one of two approaches:

Bare references

{
  "Name": "Ayende",
  "Email": "[email protected]",
  "Parent": "people/18",
  "Children": [
        "people/59",
        "people/29"
  ]
}

Denormalized references

{
  "Name": "Ayende",
  "Email": "[email protected]",
  "Parent": { "Name": "Oren", "Id": "people/18"},
  "Children": [
        { "Name": "Raven", "Id": "people/59"},
        { "Name": "Rhino", "Id": "people/29"}
  ]
}

The first option is bare references, which hold only the id of the associated document. This is useful if I need to reference the associated data only rarely. If, however, I also need to show some data from the associated documents (as is common), it is generally better to use denormalized references, which embed the data that we need from the associated document inside the aggregate.
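To make the trade-off concrete, here is a minimal Python sketch of displaying a person's family names under each approach. The `store` dictionary, `counting_load`, and both helper functions are hypothetical in-memory stand-ins, not the RavenDB client API; the ids and names are taken from the documents above.

```python
# In-memory stand-in for a document store; NOT the RavenDB client API.
store = {
    "people/18": {"Name": "Oren"},
    "people/59": {"Name": "Raven"},
    "people/29": {"Name": "Rhino"},
}

bare = {
    "Name": "Ayende",
    "Parent": "people/18",
    "Children": ["people/59", "people/29"],
}

denormalized = {
    "Name": "Ayende",
    "Parent": {"Name": "Oren", "Id": "people/18"},
    "Children": [
        {"Name": "Raven", "Id": "people/59"},
        {"Name": "Rhino", "Id": "people/29"},
    ],
}


def family_names_bare(doc, load):
    """Bare references: every displayed name costs a lookup by id."""
    ids = [doc["Parent"], *doc["Children"]]
    return [load(doc_id)["Name"] for doc_id in ids]


def family_names_denormalized(doc):
    """Denormalized references: the names are already inside the document."""
    refs = [doc["Parent"], *doc["Children"]]
    return [ref["Name"] for ref in refs]


loads = []  # records every extra document fetched


def counting_load(doc_id):
    loads.append(doc_id)
    return store[doc_id]


names_bare = family_names_bare(bare, counting_load)
names_denormalized = family_names_denormalized(denormalized)
```

Both calls produce the same names, but the bare version fetches one extra document per reference, while the denormalized version reads only the aggregate itself.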

But the same approach wouldn’t work for Questions. In the Question model, we have utilized the same data structure to hold both the question and the answer. This sort of double utilization is pretty common, unfortunately. For example, you can see it being used in StackOverflow, where both Questions & Answers are stored as posts.

The problem from a design perspective is that in this case a Question is not an aggregate root in the same sense that a Person is. A Question is an aggregate root if it is an actual question, not if it is a Question instance that holds the answer to another question. I would model this using:

{
   "Content": "How to model relations in RavenDB?",
   "User": "users/1738",
   "Answers" : [
      {"Content": "You can use.. ", "User": "users/92" },
      {"Content": "Or you might...", "User": "users/94" }
   ]
}

In this case, we are embedding the children directly inside the root document.

So I am afraid that the answer to that question is: it depends.

Tags:

Raven

Comments

23 Jun 2010
13:19 PM

Nathan Stott

In CouchDB, you would not want to embed the answers to a question directly in the document because if two people answered the question at about the same time, or if you were using replication and they answered it between replication cycles, then you would get a 409 (conflict). If you add the answers as documents of their own, two people adding at the same time will not cause conflicts.

Would this scenario not be a problem with RavenDB? What about RavenDB makes the proper choice of strategy different?

23 Jun 2010
13:38 PM

Ayende Rahien

Nathan,

That is a good point. WRT replication, Raven would be in the same situation as CouchDB, but Raven also supports the notion of partial updates, things like: "Add this answer to the Answers array"

Which means that two concurrent updates can both succeed.

23 Jun 2010
13:41 PM

Nathan Stott

How do the partial updates work? Does the app have to specify that it is doing a partial update or does Raven do this behind the scenes? Got a link handy?

23 Jun 2010
14:02 PM

Ayende Rahien

ravendb.net/documentation/docs-http-api-patch

23 Jun 2010
14:30 PM

Brian Vallelunga

I have a similar question to Nathan's. Given the StackOverflow model you presented, if two people answer the question at about the same time, won't you get conflicts storing the data to the db?

I can imagine the following scenario:

1) Person A answers question.

2) Get question document for Person A

3) Append answer A

4) Person B answers question.

5) Get question document for Person B

6) Save Person A's answer to DB.

7) Append answer B

8) Save Person B's answer to DB.

If we let the last-in win, Person A's answer is completely gone. I've actually avoided working a part of my application that requires this sort of modeling because I haven't figured out what to do yet.

Obviously storing the answers as entities themselves would help, but we'd almost always want to access the data as one document in this situation. Can you expand on a strategy here?

Thanks

23 Jun 2010
14:33 PM

Ayende Rahien

Brian,

As I told Nathan, the answer for that is to use Raven's partial document update support, which would resolve the issue.
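The interleaving Brian describes, and the way a partial update avoids it, can be sketched in a few lines of Python. The `DocumentStore` class, its `patch_append` method, and the `questions/1` and `questions/2` ids are hypothetical; this is a toy in-memory model of the idea, not Raven's patch API.

```python
import copy


class DocumentStore:
    """Toy in-memory store; a hypothetical stand-in, not the RavenDB API."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        # Whole-document write: the last writer wins.
        self._docs[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id):
        return copy.deepcopy(self._docs[doc_id])

    def patch_append(self, doc_id, field, value):
        # Partial update: applied to the stored document itself, so it
        # never overwrites items that another writer appended earlier.
        self._docs[doc_id][field].append(copy.deepcopy(value))


def new_question():
    return {
        "Content": "How to model relations in RavenDB?",
        "User": "users/1738",
        "Answers": [],
    }


answer_a = {"Content": "You can use.. ", "User": "users/92"}
answer_b = {"Content": "Or you might...", "User": "users/94"}

# Read-modify-write, interleaved as in the numbered scenario above.
store = DocumentStore()
store.put("questions/1", new_question())
copy_a = store.get("questions/1")    # step 2
copy_a["Answers"].append(answer_a)   # step 3
copy_b = store.get("questions/1")    # step 5
store.put("questions/1", copy_a)     # step 6
copy_b["Answers"].append(answer_b)   # step 7
store.put("questions/1", copy_b)     # step 8: overwrites answer A
lost_update = store.get("questions/1")["Answers"]

# The same two answers sent as partial updates instead.
store.put("questions/2", new_question())
store.patch_append("questions/2", "Answers", answer_a)
store.patch_append("questions/2", "Answers", answer_b)
patched = store.get("questions/2")["Answers"]
```

With whole-document writes only answer B survives, while the append-style updates keep both answers because neither writer sends back a stale copy of the array.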

23 Jun 2010
15:13 PM

Jason Young

Interesting!

So... for a limitlessly recursive hierarchy (e.g. parent-child relationship), you want each element in its own document, but for depth-limited relationships (e.g. question-answer), you can put all the "children" in a collection in the "parent" document, and "children" need not have documents of their own, correct? If so, that makes sense to me.

23 Jun 2010
15:19 PM

Brian Vallelunga

Ahh, thanks, I see now. I read that as only being available with replication. Reading the mailing list, it seems there is client support at the store level for this. I haven't seen any examples of it though. I'll go ahead and ask on the list.

23 Jun 2010
15:21 PM

Ayende Rahien

Jason,

Yes...

Although I would put it differently

24 Jun 2010
03:01 AM

DavidChan

Seems like the client API does not support the "patch" command, right?

24 Jun 2010
04:39 AM

c# model

Maybe I'm missing something here, but the Person denormalized example saves only the id and name. When you query the model, how do children and parent convert back to a whole C# Person (with its own parent and children)?

24 Jun 2010
10:16 AM

Matt Warren

@c# model

You can use the id string and load the document based on that, i.e.

var person = session.Load<Person>("people/59");

24 Jun 2010
10:18 AM

Matt Warren

Just to add: Load is a generic method that needs to have the type specified, here Person.

25 Jun 2010
13:50 PM

Daniel Cohen

@Matt Warren, I get this if you go with the bare reference approach and then in your POCO class you have a string ParentId { get; set; }

But in the denormalized way, what kind of class do you get in return? It's not an id field nor a full Person class.

btw "c# model" was intended to be the title not the name, a funny mistake :)
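The thread does not answer this question directly. One plausible shape, shown here as a Python sketch rather than C#, is a small reference class that carries only the copied fields. `PersonReference`, `to_reference`, and `person_from_document` are hypothetical names introduced for illustration, not types from the RavenDB client.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonReference:
    """Hypothetical reference type: only the fields copied into the parent document."""
    id: str
    name: str


@dataclass
class Person:
    name: str
    parent: Optional[PersonReference] = None
    children: List[PersonReference] = field(default_factory=list)


def to_reference(raw):
    return PersonReference(id=raw["Id"], name=raw["Name"])


def person_from_document(doc):
    """Map a denormalized Person document onto the classes above."""
    parent = to_reference(doc["Parent"]) if doc.get("Parent") else None
    children = [to_reference(child) for child in doc.get("Children", [])]
    return Person(name=doc["Name"], parent=parent, children=children)


document = {
    "Name": "Ayende",
    "Parent": {"Name": "Oren", "Id": "people/18"},
    "Children": [
        {"Name": "Raven", "Id": "people/59"},
        {"Name": "Rhino", "Id": "people/29"},
    ],
}

person = person_from_document(document)
```

Under this assumption the parent and children are neither plain ids nor full Person objects. When the whole related Person is needed, its `id` can be used to load that document separately.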

28 Jun 2010
11:46 AM

sebastien

What would be the cost of updating the name in a partially denormalized reference?

28 Jun 2010
13:07 PM

Ayende Rahien

Sebastian,

It shouldn't be very expensive.

14 Jul 2010
22:48 PM

Martin

Is there a way to load only a small part of the Answers for paging (e.g. if there will be hundreds or thousands of them), so the database won't have to send all of them?

14 Jul 2010
22:55 PM

Martin

... and what happens if a Username is stored for every Answer, as it always needs to be displayed, but the user is allowed to change his Username?

Will I have to loop through all documents in the database where the Username is stored (almost everywhere there is a user action) and update the Username? Will it be a problem?

Thanks for a great blog.

20 Jul 2010
11:18 AM

Ayende Rahien

Martin,

Yes, you can.

You create an index that projects those out, and then query on that.
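The reply does not show the index definition, so the following Python sketch only simulates the idea in memory: flatten the embedded answers into one entry per answer, then page over those entries. The functions `project_answers` and `page`, the entry field names, and the `questions/1` id are all assumptions made for illustration.

```python
def project_answers(question_id, question):
    """Flatten embedded answers into one entry per answer, roughly what a
    projecting index would expose. Field names here are illustrative."""
    return [
        {
            "QuestionId": question_id,
            "Position": position,
            "Content": answer["Content"],
            "User": answer["User"],
        }
        for position, answer in enumerate(question["Answers"])
    ]


def page(entries, page_number, page_size):
    """Return one zero-based page of entries."""
    start = page_number * page_size
    return entries[start:start + page_size]


# A question with 25 embedded answers (generated sample data).
question = {
    "Content": "How to model relations in RavenDB?",
    "User": "users/1738",
    "Answers": [
        {"Content": f"Answer {i}", "User": f"users/{i}"} for i in range(25)
    ],
}

entries = project_answers("questions/1", question)
second_page = page(entries, page_number=1, page_size=10)
last_page = page(entries, page_number=2, page_size=10)
```

In a real deployment the projection and the paging would happen on the server, so only the requested page travels over the wire. Here everything runs in one process purely to show the shape of the data.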

20 Jul 2010
11:18 AM

Ayende Rahien

Martin,

Changing a username is a rare occasion; you can handle that as a background process.
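A background job of this kind could look roughly like the Python sketch below, which uses the denormalized Person documents from the post rather than answers. The `rename_person` function, the in-memory `docs` dictionary, and the `people/1` id are hypothetical. A real job would query for the affected documents instead of scanning everything.

```python
def rename_person(docs, person_id, new_name):
    """Update the person's own document, then fix every denormalized copy
    of the name. Returns the ids of the other documents that were touched."""
    docs[person_id]["Name"] = new_name
    touched = []
    for doc_id, doc in docs.items():
        refs = [doc.get("Parent"), *doc.get("Children", [])]
        changed = False
        for ref in refs:
            # Only denormalized references (dicts) carry a copy of the name.
            if isinstance(ref, dict) and ref.get("Id") == person_id and ref["Name"] != new_name:
                ref["Name"] = new_name
                changed = True
        if changed:
            touched.append(doc_id)
    return touched


docs = {
    "people/18": {
        "Name": "Oren",
        "Parent": None,
        "Children": [{"Name": "Ayende", "Id": "people/1"}],
    },
    "people/1": {
        "Name": "Ayende",
        "Parent": {"Name": "Oren", "Id": "people/18"},
        "Children": [{"Name": "Raven", "Id": "people/59"}],
    },
    "people/59": {
        "Name": "Raven",
        "Parent": {"Name": "Ayende", "Id": "people/1"},
        "Children": [],
    },
}

touched = rename_person(docs, "people/1", "Ayende Rahien")
```

The work is proportional to the number of documents that hold a copy of the name, which is why it suits a background process rather than the request that changes the name.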

Oren Eini

CEO of RavenDB