Coding out of procrastination

A developer’s rants and light bulb moments!


Saved by a yield – Bulking NHibernate Read-only Data


I’ve just finished working on a service which exports information from a legacy database once per day. In a nutshell: retrieve, map, publish.

It works as a four-step process:

  • Query the data layer for records modified after last run date (~24hrs)
  • Load the records into a domain model (responsibility of NHibernate)
  • Convert the collection to our flat shared schema DTOs (AutoMapper)
  • Publish the messages to the bus (NServiceBus / MSMQ)
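
As a rough sketch of how those four steps chain together: every type and member name below is a hypothetical stand-in, and plain delegates take the place of NHibernate, AutoMapper and NServiceBus so that the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes: the real domain model and DTO are not shown in this post.
public class Derivative
{
    public int Id { get; set; }
    public decimal Price { get; set; }
    public DateTime PricingUpdated { get; set; }
}

public class DerivativeDto
{
    public int Id { get; set; }
    public decimal Price { get; set; }
}

public class ExportJob
{
    private readonly Func<DateTime, IEnumerable<Derivative>> loadModifiedSince; // steps 1 and 2
    private readonly Func<Derivative, DerivativeDto> map;                       // step 3
    private readonly Action<DerivativeDto> publish;                             // step 4

    public ExportJob(
        Func<DateTime, IEnumerable<Derivative>> loadModifiedSince,
        Func<Derivative, DerivativeDto> map,
        Action<DerivativeDto> publish)
    {
        this.loadModifiedSince = loadModifiedSince;
        this.map = map;
        this.publish = publish;
    }

    // Runs retrieve, map, publish and returns the number of messages published.
    public int Run(DateTime lastRun)
    {
        int published = 0;
        foreach (DerivativeDto dto in loadModifiedSince(lastRun).Select(map))
        {
            publish(dto);
            published++;
        }
        return published;
    }
}

public static class Program
{
    public static void Main()
    {
        var now = new DateTime(2009, 8, 6);
        var table = new List<Derivative>
        {
            new Derivative { Id = 1, Price = 10m, PricingUpdated = now.AddHours(-2) },
            new Derivative { Id = 2, Price = 20m, PricingUpdated = now.AddDays(-3) }
        };

        var job = new ExportJob(
            since => table.Where(d => d.PricingUpdated > since),
            d => new DerivativeDto { Id = d.Id, Price = d.Price },
            dto => Console.WriteLine("published " + dto.Id));

        // Only the record modified within the last 24 hours is exported.
        Console.WriteLine(job.Run(now.AddDays(-1)));
    }
}
```

Because the job only sees an IEnumerable, it does not care whether the records arrive all at once or a page at a time.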

Whilst performance wasn’t a stated priority, the service still had to complete the daily job within a few hours, using only the resources a basic server afforded. In the initial stages of implementation I had my integration tests running against a SQLite database, and performance-wise everything was progressing smoothly. However, when we pointed it at a production database (SQL Server) with a six-figure number of aggregate roots, instead of the fewer than 100 in the test data, we soon hit problems: the dreaded out-of-memory exceptions.

It didn’t take long to pinpoint that the issue was related to the data access strategy: bringing all the entities out of the database at once made the Session (not helped by its change tracking etc.) balloon at a shocking rate. In hindsight, it was a bad approach to begin with. This was a showstopper, and with it approaching 5pm, and happy hour on the beach finishing at 7pm, I could already see the evening turning out to be disappointing.

I contemplated a few ideas: changing some settings with regards to mutability (we are only reading), switching to an IStatelessSession, and even polluting the consumer with some paging mechanism. I played around for a while and then thought about a yield inside the service layer, essentially implementing my own iterator to keep the unit of work small and thus the underlying NHibernate session minimal.

In my service layer, and wherever possible, I favour IEnumerable over IList. For the definition I had:

IEnumerable<Derivative> GetDerivativesWithPriceChanges(DateTime laterThan)

And for the code (refactor due):

    public IEnumerable<Derivative> GetDerivativesWithPriceChanges(DateTime laterThan)
    {
        Expression<Func<Derivative, bool>> predicate = d => d.PricingUpdated > laterThan;

        int bufferSize = 10; // page size; make configurable, inject
        int marker = 0;      // number of rows to skip for the current page
        int lastAmountRetrieved = 0;

        do
        {
            // One short-lived unit of work (and so one ISession) per page.
            // It stays open while the consumer works through this page.
            using (UOW.Start())
            {
                var selection
                    = new Func<IQueryable<Derivative>, IQueryable<Derivative>>
                        (d => d.Skip(marker).Take(bufferSize));

                // ToList() materialises the whole page before anything is yielded.
                var derivatives = derivativeRepository.GetByQuery(predicate, selection).ToList();

                lastAmountRetrieved = derivatives.Count;

                foreach (var derivative in derivatives)
                {
                    yield return derivative;
                }
            }

            marker += bufferSize;

        } while (lastAmountRetrieved == bufferSize); // a short page means we have reached the end
    }

Inside GetDerivativesWithPriceChanges I’m paging through the database, opening and closing the unit of work after a specified number of entities are returned. Note: the UOW system provides the underlying ISession to the repositories. Although my UOW differs slightly, I recommend looking at Ayende’s Rhino Commons for a good example.
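
To make the mechanics concrete, here is a minimal, self-contained sketch of the same idea without NHibernate. The Paged.Stream helper and its fetchPage delegate are hypothetical names, not part of the service above: the delegate stands in for the repository call, and an in-memory list stands in for the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paged
{
    // Hypothetical helper: streams items one page at a time.
    // fetchPage(skip, take) stands in for the repository query.
    public static IEnumerable<T> Stream<T>(Func<int, int, IList<T>> fetchPage, int pageSize)
    {
        // Validate eagerly; the iterator body below only runs on enumeration.
        if (fetchPage == null) throw new ArgumentNullException(nameof(fetchPage));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return Iterate(fetchPage, pageSize);
    }

    private static IEnumerable<T> Iterate<T>(Func<int, int, IList<T>> fetchPage, int pageSize)
    {
        int skip = 0;
        int retrieved;
        do
        {
            IList<T> page = fetchPage(skip, pageSize);
            retrieved = page.Count;

            foreach (T item in page)
            {
                yield return item;
            }

            skip += pageSize;
        } while (retrieved == pageSize); // a short page means there is nothing left
    }
}

public static class Demo
{
    public static void Main()
    {
        List<int> table = Enumerable.Range(1, 25).ToList(); // pretend database table
        int queries = 0;

        IEnumerable<int> stream = Paged.Stream<int>((skip, take) =>
        {
            queries++;
            return table.Skip(skip).Take(take).ToList();
        }, 10);

        Console.WriteLine(queries);               // 0: nothing is fetched until we enumerate
        Console.WriteLine(stream.Take(3).Sum());  // 6: served entirely from the first page
        Console.WriteLine(queries);               // 1
        Console.WriteLine(stream.Count());        // 25: a fresh enumeration, three more pages
        Console.WriteLine(queries);               // 4
    }
}
```

Two things worth noticing: enumerating the sequence a second time re-runs every query, and when the row count is an exact multiple of the page size, one extra (empty) page is requested before the loop stops.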

Best of all, the consumer of the service is none the wiser. The code for paging was already on the base repository (as simple as Skip(x).Take(y) with NHibernate.Linq), and as an added benefit we get quicker feedback in our logging system and in our integration tests. It’s certainly not the purest implementation, but since I was working with read-only data it turned out perfect for this scenario, and I was enjoying a Heineken by the beach by 6.30pm. I imagine I will eventually refactor some of the logic for read-only paging/UOW streaming into its own class, but for today: check in, and close solution. :)

A few lessons learnt:

  1. Memory is never infinite.
  2. Long-running operations are bad for integration tests. These tests are often already an order of magnitude slower than unit tests, so keep them trim.
  3. Always consider the nature and quantity of the data you’ll be working with in production. If unsure, ask.
  4. The problem is rarely solely NHibernate; more usually it is the high-level approach taken to tackle the problem. Don’t damn the NHibernate Session.


Written by matt-csharp

August 6th, 2009 at 8:15 am

Posted in nhibernate
