NHibernateStreaming large result sets
Note: I am not feeling very well for the past week or so, which is why I am posting so rarely.
NHibernate is meant to be used in an OLTP system, as such, it is usually used in cases where we want to load a relatively small amount of data from the database, work with it and save it back. For reporting scenarios, there are better alternatives, usually (and before you ask, any reporting package will do. Right tool for the job, etc).
But there are cases where you want to do use NHibernate in reporting scenarios nonetheless. Maybe because the reporting requirements aren’t enough to justify going to a separate tool, or because you want to use what you already know. It is in those cases where you tend to run into problems, because you violate the assumptions that were made while building NHibernate.
Let us imagine the following use case, we want to print a list of book names to the user:
using (ISession s = OpenSession()) { var books = s.CreateQuery("from Book") .List<Book>(); foreach (var book in books) { Console.WriteLine(book.Name); } }
There are several problems here:
- We query on a large result set without a limit clause.
- We read a lot of data into memory.
- We only start processing the data after it was completely read from the database.
What I would like to see is something like this:
while(dataReader.Read()) { Console.WriteLine(dataReader.GetString("Name")); }
This still suffer from the problem of reading a large result set, but we will consider this a part of our requirements, so we’ll just have to live with it. The data reader code has two major advantages, it uses very little memory, and we can start processing the data as soon as the first row loads from the database.
How can we replicate that with NHibernate?
Well, as usual with NHibernate, it is only a matter of finding the right extension point. In this case, the List method on the query also has an overload that accepts an IList parameter:
That make it as simple as implementing our own IList implementation:
public class ActionableList<T> : IList { private Action<T> action; public ActionableList(Action<T> action) { this.action = action; } public int Add(object value) { action((T)value); return -1; } public bool Contains(object value) { throw new NotImplementedException(); } // ... }
And now we can call it:
using (ISession s = OpenSession()) { var books = new ActionableList<Book>(book => Console.WriteLine(book.Name)); s.CreateQuery("from Book") .List(books); }
This will have the exact same effect as the pervious NHibernate code, but it will start printing the results as soon as the first result loads from the database. We still have the problem of memory consumption, though. The session will keep track of all the loaded objects, and if we load a lot of data, it will eventually blow out with an out of memory exception.
Luckily, NHibernate has a ready made solution for this, the stateless session. The code now looks like this:
using (IStatelessSession s = sessionFactory.OpenStatelessSession()) { var books = new ActionableList<Book>(book => Console.WriteLine(book.Name)); s.CreateQuery("from Book") .List(books); }
The stateless session, unlike the normal NHibernate session, doesn’t keep track of loaded objects, so the code here and the data reader code are essentially the same thing.
More posts in "NHibernate" series:
- (19 Nov 2010) Complex relationships
- (27 Jun 2010) Streaming large result sets
- (25 May 2009) Why do we need to specify the query type twice?
- (20 Mar 2009) Avoid identity generator when possible
- (26 Mar 2007) Nullable DateTime Issues
- (08 Jan 2007) Fetching multiply collections in one roundtrip
Comments
awesome - I guess you could also create a list that creates an IEnumerator that supplies the results one by one. Would the effect be similar?
<book ();
__For reporting scenarios, there are better alternatives, usually (and before you ask, any reporting package will do. Right tool for the job, etc).
I'll bite, what's everyone use? I've been using Quartz.Net to run heavy reporting jobs at night and as sort of a load balancer (so really I use NHibernate, as Quartz.net just triggers the report). I wasn't aware of any report specific tools. Of course searching ".NET Reporting" brings up a bunch of garbage ...
Partial solution would be also to use Enumerable (if books table is not extra large):
using (ISession s = OpenSession())
{
<book();
}
However there is performance penalty:
(select Id from Book ) * 1
(select * from Book Where Id = :id) * number of records
Ideally I would like to see in NHibernate Iterable interface which would work like this:
using (ISession s = OpenSession())
{
<book(); // DB not touched; books contains only proxy
}
I don't know if this is technically possible. I know that for my particular needs it would be very convenient - calculating complex fees for each account periodically. Currently we are using stored procedures for things like this.
Effectively an NHibernate-friendly cursor.
I nice demonstration of consuming large amounts of data responsively without allocating the memory to store the entire result set.
On one hand I want to say: "That violates semantics of IList completely!" On the other, I don't see a better solution.
Wheres NH3! Heard no news about NH in ages.
Hi Ayende
Is that really true?
I remember quite some while back I wanted to stream results and posting to nhusers made it seem that was not the case:
groups.google.com/.../d9423d9a45ff9c62
One of the findings in that thread was:
The SqlDataReader is wrapped in a Driver.NDataReader which loads
everything into an internal list!
Maybe like you said all that was missing is the correct extension point?
Ajai
This is precisely what Rx is for. When you created that ActionableList, you actually just reinvented a push-model (akin to IObservable in Rx). I for one would be interested to see Rx (and other .net features) getting built into NH that would enrich it from its java ancestry
Not to mention that if we use Rx (instead of reinventing another ActionableList), we'll also get all the nice works those guys have been working hard on at Redmond, e.g. Linq support, parallelism. Hence..
// not only caters for simplest use-cases
IObservable books = s.CreateQuery("from Book")
books.Subscribe(book=> Console.Writeline(book.Name));
// but also fancy stuff
IObservable customers = s.CreateQuery("from Customer").Stream();
var mayLike = from book in books
mayLike.Subscribe(x=>
Maybe a bad example, but thats beside the point.
I thought it's a really neat abstraction for a "push-model queryable cursor" :)
Ajai,
The use of NDataReader depends on the exact mix of what you are doing, the driver that you use and the configuration option.
Yes, if it is wrapped, it isn't good, but it is controllable.
Does this work with Crtieria API?
Yes
Doesn't this break NHibernate.Linq which only work under ISession and In essence breaks people's generic repository pattern?
@Frank,
Probably, but I don't really like that attempt anyway
So what are your suggestions or pointers? To me, it doesn't make sense to have IStatelessSession and ISession not inherit from some other common interface and I believe there's already a suggested patch (NH-2211) which doesn't look like it made it into the trunk (3.0).
@Frank,
The problem is that there really aren't is-a between stateless and regular session.
They serve different purposes.
I guess the folks who did the NHibernate.Linq thought otherwise. Whether or not you're reading data from ISession or IStatelessSession, the behavior is the same, it's just one does not do any tracking and the when the foundation is not designed properly, we start seeing things like Linq not working on IStatelessSession. I guess we can agree to disagree but I thank you for your input.
Not really, no.
Sure, on the surface it looks like that is all it is doing, but they produce quite a different behavior.
Is there a nice way to terminate data fetching when we use this approach?
I mean, say, we need 100 objects that meet a criteria that can only be enforced in-memory.
Bahador ,
No
I tried this approach while streaming a few MBs over UMTS - it takes over 55 seconds. Unfortunately no performance impact was measureable. It seems that the streaming starts as soon as all the data was loaded from the database. Is there something I am missing or is this method not working when streaming from the database directly row by row (like the DataReader)?
Thilo,
There are several factors at play here.
a) you may be using an ADO.Net provider that will only push results to NHibernate after a significant amount has been reached.
b) you may be using a buffered output.
The value of streaming is reducing the time to first byte, and you need to ensure that you deal with this in both ends.
Comment preview