Twitter About Home

LINQ to SQL: Lazy and Eager Loading Hiccups

In a recent post, I covered a few strategies for using LINQ to SQL in a traditional multi-tier architecture. The winning strategy for me, so far anyway, is DataContext-per-unit-of-work – it’s both convenient and safe. Or so I thought.

Published Dec 2, 2007

By default, child collections in LINQ to SQL are lazy-loaded. That is, they’re not loaded until they’re required, then the child entities are populated on a one-by-one basis. In the worst case this causes the “Select N+1” problem. If we’re not going to use the child entities, great, we win, but if we know we are, it’s better to load them all in advance in a single query.

RIP : “Including” operator

Before the March 2007 CTP of LINQ to SQL, we saw plenty of demos and code snippets using the “Including” operator. It looked like this:

var query = (from customer in MyDataContext.Customers select customer)
                .Including(customer => customer.Orders);

That was a nice, concise syntax to invoke eager loading selectively (i.e. on a per-query basis). It’s quite similar to NHibernate’s ICriteria.SetFetchMode(“Orders”, FetchMode.Eager) mechanism for controlling per-query loading strategies.

Unfortunately, LINQ to SQL’s “Including” syntax was killed in the March CTP. Random forum comments and blog posts imply it was removed because it was either too buggy or just didn’t fit with their general architecture. Unfortunately we’re now a bit stuck if we want eager-loading on a per-query basis.

The official solution: DataLoadOptions

First it was called DataShape, then it was renamed to DataLoadOptions. You use it like this:

MyDataContext context = new MyDataContext();
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<customer>(c => c.Orders);
context.LoadOptions = options;
var query = from c in context.Customers // ... etc

Not exactly elegant, but it can be a powerful way to control the “shape” of the data that’s returned. Note the shape.AssociateWith() method which lets you control the ordering and filtering of related entities too.

Unfortunately, changes to DataContext.LoadOptions affect the entire DataContext (obviously), and once assigned, cannot be changed during the lifetime of the DataContext (for consistency). This is completely at odds with the DataContext-per-unit-of-work strategy, as different subtasks within the same unit of work now have unwanted dependencies on each other.

Trying to use DataLoadOptions safely, and failing

I’m not yet convinced that there is a good way to control fetching strategies in LINQ to SQL on a per-query basis, since DataLoadOptions is the only control we have and that only works on a per-DataContext basis. If you don’t need deferred execution in a particular case, you can do this:

public static IList<customer> CustomersWithOrders
{
    get
    {
        return Customers
                .Select(c => new { Customer = c, Orders = c.Orders })
		.ToList()
                .Select(pair => pair.Customer)
		.ToList();
    }
}

Here we load pairs (Customer, Orders), then from that project back to simple Customer objects. Since the relevant Order records are already loaded, no further queries are needed to evaluate customer.Orders for these customers.

If anybody can help me out with a better solution, i.e. that retains the deferred IQueryable semantics, I would be really glad to hear it. Until then, this is pretty frustrating. It may be that I’m on completely the wrong lines, but the LINQ-to-SQL multi-tier story doesn’t yet appear to have a happy ending.

SQL Server in the future

One limitation when eagerly loading object graphs is the dimensionality of the result set. To get customers and their orders, we have to JOIN the two tables together, and waste a lot of space to represent the parent-child relationship in two dimensions, like this:

CustomerID CustomerName OrderID ProductID ProductName Price
1 Steve 53 77 Ferarri $50,000
1 Steve 53 91 Hamster $4.99
2 Frank 54 35 Chips $0.50
2 Frank 54 91 Hamster $4.99
2 Frank 54 102 Submarine $3,500,000

The other problem, besides wasting space, is that it’s hard to page over data like this. How many rows must you retrieve to ensure you get the first 10 complete Customer records? LINQ to SQL, like other ORMs, does some clever trickery to hide you from this problem, but it’s still wasteful.

What I’d really like to see from SQL Server is multi-dimensional result sets. You can only have 2-dimensional results sets today, but why stop there? Of course, it’s hard to visualise 6-dimensional results in Query Analyzer, but I don’t really care about that. If aggregation operations weren’t forced to collapse lists onto scalars, we could GROUP BY customer, and get the list of Orders as a child table.

CustomerID CustomerName Orders
1 Steve
OrderID Lines
53
ProductID ProductName Price
77 Ferarri $50,000
91 Hamster $4.99
2 Frank
OrderID Lines
54
ProductID ProductName Price
35 Chips $0.50
91 Hamster $4.99
102 Submarine $3,500,000

This would map trivially onto an object graph and is very easy to page over. I guess there are lots of reasons why SQL Server doesn’t already do this. Perhaps someone with more knowledge of its architecture can explain.

By the way, this is exactly what LINQ-to-objects does when you use its “groupby” keyword, and it’s very slick.

READ NEXT

LINQ-to-SQL: the multi-tier story

If your web application is even slightly bigger than the typical MSDN example, you’re almost certainly using some sort of multi-tier architecture. Unfortunately, as Dinesh Kulkari (MS LINQ Program Manager) explains on his blog, LINQ-to-SQL version 1 has no out-of-the-box multi-tier story.

Published Nov 29, 2007