LINQ, Deferred Execution, and Generators

A brief look at an invaluable tool to modern C# developers

LINQ (Language INtegrated Query) was introduced in C# 3.0 back in 2007, and it will be celebrating its tenth birthday this year. Though it's been around longer than most of us have been coding, there are still plenty of developers who aren't familiar with it. So, what is LINQ?

At its heart, LINQ provides methods to operate on collections. Filtering, ordering, and grouping are some common applications. For example, prior to LINQ, the way one would filter a List<Person> to those aged 21 or older might be:

List<Person> theBoys = new List<Person>();
foreach (Person person in allPeople)
{
    if (person.GetAge() >= 21)
    {
        theBoys.Add(person);
    }
}

That's a lot of code for something so simple. Do this on a couple dozen collections and you'll end up with enough code to kill Neo. Most of us have better things to do with our time. Enter LINQ:

var theBoys = allPeople
    .Where(person => person.GetAge() >= 21)
    .ToList();

Boom: one statement, broken up for readability. The type inference (also introduced in C# 3) helps, too. Before I get into deferred execution and generators, let's make sure we get the basics.

allPeople is a List<Person>. List<T> implements IEnumerable<T>, which is key to understanding what's going to happen next.

The call to Where on the second line is an extension method for IEnumerable<T>, defined on the static Enumerable class (I won't get into extension methods in this post, but there is plenty of information available elsewhere). This method accepts a Func<T, bool>. So, in our case, it expects us to pass a lambda expression which takes a Person as its sole parameter and returns true or false. The lambda expression we provide is applied to every element in the collection (not completely true, but more on this later), and only the elements for which it returns true are kept. How useful!
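If it helps to see that Func<T, bool> on its own, here is a small sketch with the lambda stored in a named delegate before it is handed to Where. The Person class below is a stand-in I made up for the example (the real one isn't shown in this post), and WhereDemo and OfAge are invented names too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person type used in this post.
public class Person
{
    public string Name { get; }
    private readonly int _age;

    public Person(string name, int age) { Name = name; _age = age; }

    public int GetAge() => _age;
}

public static class WhereDemo
{
    // Same filter as above, with the lambda stored in a named delegate.
    public static List<Person> OfAge(IEnumerable<Person> people)
    {
        Func<Person, bool> isOfAge = person => person.GetAge() >= 21;
        return people.Where(isOfAge).ToList();
    }

    public static void Main()
    {
        var allPeople = new List<Person>
        {
            new Person("Bob", 22),
            new Person("Joe", 21),
            new Person("Tim", 19),
        };

        foreach (var person in OfAge(allPeople))
        {
            Console.WriteLine(person.Name); // Bob, then Joe
        }
    }
}
```

Passing the lambda inline, as in the original statement, does exactly the same thing; the named delegate just makes the parameter type visible.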

But wait: while adding .ToList() at the end of all our LINQ expressions might make one's code conceptually simpler, it would mean we miss out on the potential performance gains of deferred execution.

Deferred Execution

Deferred execution is exactly what it sounds like: the code won't execute until you need it to. This offers performance benefits similar to short-circuiting an expression or breaking out of a loop early. For example, suppose theBoys need to delegate someone to pick up some cold ones from the nearest gas station. Who would they send? Someone with at least $15 for that Natty 30-pack, of course! However, theBoys aren't meeting until later this morning, and their financial situation can change a lot in that time.

var beerGuys = theBoys.Where(guy => guy.BankAccount?.Balance >= 15);

(Since we are dealing with college-aged guys, don't forget your null check on the BankAccount property!)

Note the absence of .ToList() at the end — that's the important part. Right now, beerGuys is not a List<Person>. Instead, it's an IEnumerable<Person>.

What's the difference?

An IEnumerable<T> simply denotes that it can be iterated over. That's all we know about it right now. At the moment after that code executes, we don't know anything else about the results, such as the size or which elements are returned... until we enumerate it. Here's an example.

At 10 AM, theBoys consisted of:

  • Bob, with $7
  • Joe, with $3
  • Fred, with $0

So, we execute the Where method and enumerate the results, and much to the disappointment of theBoys, no one is able to afford beer. So is the day ruined? Not quite. We wait until 10:45 AM: prime day-drinking time.

In this 45-minute time span, Joe was able to step it up and gather $12, bringing him to a grand total of $15. Just enough for the beer! Now if we enumerate beerGuys, Joe shows up in the collection! But we didn't modify beerGuys directly, so why is it different? This is the main difference between having the ToList() at the end and not having it. The ToList() enumerates the IEnumerable<Person> entirely and stores the results in a new list, but stopping at the Where method allows the original collection to be observed at different points in time as-is, without creating a new collection.
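Here is the morning's timeline as a runnable sketch. The Person and BankAccount classes are minimal stand-ins I invented for the example, Balance is assumed to be a decimal, and DeferredDemo and BeerGuys are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal types; the post does not show the real ones.
public class BankAccount
{
    public decimal Balance { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public BankAccount BankAccount { get; set; } // may be null
}

public static class DeferredDemo
{
    // Only describes the query; no Person is inspected until it is enumerated.
    public static IEnumerable<Person> BeerGuys(IEnumerable<Person> boys)
    {
        return boys.Where(guy => guy.BankAccount?.Balance >= 15m);
    }

    public static void Main()
    {
        var joe = new Person { Name = "Joe", BankAccount = new BankAccount { Balance = 3m } };
        var theBoys = new List<Person>
        {
            new Person { Name = "Bob", BankAccount = new BankAccount { Balance = 7m } },
            joe,
            new Person { Name = "Fred", BankAccount = new BankAccount { Balance = 0m } },
        };

        IEnumerable<Person> beerGuys = BeerGuys(theBoys);
        List<Person> snapshot = beerGuys.ToList(); // runs the query now and keeps the result

        Console.WriteLine(beerGuys.Count()); // 10 AM: 0

        joe.BankAccount.Balance += 12m;      // Joe gathers $12

        Console.WriteLine(beerGuys.Count()); // 10:45 AM: 1, because the query ran again
        Console.WriteLine(snapshot.Count);   // still 0, because the list was built at 10 AM
    }
}
```

The snapshot list is what you get with ToList(): correct at the moment it was built, and blind to everything that happens afterwards.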

LINQ accomplishes this by using something called a "generator" (the C# documentation calls it an "iterator").

Generators

What exactly is a generator? Now we've reached the true core of LINQ and deferred execution. Take a look at this code:

public IEnumerable<Beer> GetColdOnes(List<Beer> beers)
{
    foreach (var beer in beers)
    {
        if (!beer.IsEmpty() && beer.IsCold())
        {
            yield return beer;
        }
    }
}

GetColdOnes returns all the non-empty, cold beers from the List<Beer> passed in. But it takes a significant amount of time and effort to determine whether a beer is cold, and this time could be better spent actually drinking the beer. After all, theBoys don't need to determine if all of them are cold: just the ones they are going to drink! That's where the yield return statement comes into play. This method won't go through each and every beer up front to determine if it's non-empty and cold. Instead, it checks the beers one at a time as the consumer requests them, handing each match back to the calling code and pausing until the next request. This allows the consuming code to stop at any point without wasting excessive resources. This also means none of the method body runs until the first element is requested, which is what we call "deferred execution."
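To watch the laziness happen, here is a sketch that counts how many times a beer's temperature actually gets checked. The Beer class is a stand-in I made up for the example, and its static ColdChecks counter exists only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Beer type that counts how often its temperature is checked.
public class Beer
{
    public static int ColdChecks;
    private readonly bool _cold;

    public Beer(bool cold) { _cold = cold; }

    public bool IsEmpty() => false;
    public bool IsCold() { ColdChecks++; return _cold; }
}

public static class GeneratorDemo
{
    public static IEnumerable<Beer> GetColdOnes(List<Beer> beers)
    {
        foreach (var beer in beers)
        {
            if (!beer.IsEmpty() && beer.IsCold())
            {
                yield return beer;
            }
        }
    }

    public static void Main()
    {
        var beers = new List<Beer> { new Beer(false), new Beer(true), new Beer(true), new Beer(true) };

        var coldOnes = GetColdOnes(beers);
        Console.WriteLine(Beer.ColdChecks); // 0: calling the method ran none of its body

        coldOnes.First();
        Console.WriteLine(Beer.ColdChecks); // 2: one warm beer, then the first cold one

        coldOnes.ToList();
        Console.WriteLine(Beer.ColdChecks); // 6: a fresh pass checked all four
    }
}
```

Asking for one beer cost two checks instead of four. With a cooler of a few hundred beers, that gap is the whole point.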

A concrete comparison: iterating through a List<Beer>, adding the cold ones to a new List<Beer>, and returning the cold ones at the end is all immediately executed, whether we need them or not. However, using the generator pattern is similar to the following:

  1. The consumer asks the IEnumerable<Beer> for its next element.
  2. If there are no more beers to check, the generator stops.
  3. Is the next beer both cold and non-empty?
  4. If not, go back to step 2.
  5. If so, return it to the caller and pause until the next request.

When used in the context of a foreach loop, this process occurs every time the loop runs, and the result of the last step is assigned to the loop variable just before the loop body executes again.
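For the curious, here is roughly what a foreach loop expands to when written out by hand. This is a simplified sketch, not the exact code the compiler emits, and GetNames and Drain are names I made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // Hypothetical generator standing in for GetColdOnes.
    public static IEnumerable<string> GetNames()
    {
        yield return "Natty";
        yield return "Busch";
    }

    // Roughly what foreach (var name in source) { seen.Add(name); } does.
    public static List<string> Drain(IEnumerable<string> source)
    {
        var seen = new List<string>();
        using (IEnumerator<string> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())         // runs the generator up to its next yield return
            {
                string name = enumerator.Current; // the value that was just yielded
                seen.Add(name);                   // the loop body
            }
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Drain(GetNames()))); // Natty, Busch
    }
}
```

Each call to MoveNext is one trip through the numbered steps above, and breaking out of the loop early simply means MoveNext is never called again.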

The LINQ equivalent of GetColdOnes is:

var coldOnes = beers.Where(b => !b.IsEmpty() && b.IsCold());
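To connect the two, here is a sketch of what a Where-style method could look like if you wrote it yourself as a generator. MyWhere is a hypothetical name and this is not the actual source of Enumerable.Where, which also validates its arguments up front and has specialized code paths for lists and arrays. The shape of the idea is the same, though.

```csharp
using System;
using System.Collections.Generic;

public static class SketchExtensions
{
    // Hypothetical name; a simplified stand-in for Enumerable.Where.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
        {
            if (predicate(item))
            {
                yield return item; // hand back one match, then pause
            }
        }
    }
}

public static class MyWhereDemo
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        var evens = numbers.MyWhere(n => n % 2 == 0);
        Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6
    }
}
```

This is also why the earlier "applied to every element" was not completely true: the predicate only runs on the elements the consumer actually asks for.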

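Every time coldOnes is enumerated, the filter runs against beers as it is at that moment, so if the beers list changes, coldOnes changes with it. (One caveat: adding to or removing from a List<T> in the middle of an enumeration makes the enumerator throw an InvalidOperationException, so make those changes between enumerations.) An analogy, if I may: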

Bob asks Fred to retrieve all the beers from the cooler which are cold right now. Joe asks Fred to retrieve a single cold beer and leave the rest in the cooler.

Joe is taking advantage of deferred execution. The status of the beers could change at any time: they might always stay cold no matter where they are, or the cooler might suddenly warm up and all the beers get warm. However, Joe is always happy with the beer he's currently drinking, because he's only requesting one more cold beer at a time.

Bob, however, wanted all the cold beers immediately. While Bob might have a better time initially, it could also mean that he's drinking warm beer by the time he gets to the end of his stash.
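The analogy translates to code fairly directly. In this sketch the Beer class, with its settable Cold property, is a stand-in I invented, and CoolerDemo and ColdOnes are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Beer type with a temperature that can change.
public class Beer
{
    public bool Cold { get; set; } = true;
    public bool IsEmpty() => false;
    public bool IsCold() => Cold;
}

public static class CoolerDemo
{
    public static IEnumerable<Beer> ColdOnes(List<Beer> beers) =>
        beers.Where(b => !b.IsEmpty() && b.IsCold());

    public static void Main()
    {
        var cooler = new List<Beer> { new Beer(), new Beer(), new Beer() };

        List<Beer> bobsStash = ColdOnes(cooler).ToList(); // Bob: everything, right now
        IEnumerable<Beer> joesSupply = ColdOnes(cooler);  // Joe: one at a time, on request

        foreach (var beer in cooler) beer.Cold = false;   // the cooler warms up

        Console.WriteLine(bobsStash.Count);                   // 3: Bob still holds three beers
        Console.WriteLine(bobsStash.Count(b => b.IsCold())); // 0: and all of them are warm
        Console.WriteLine(joesSupply.Any());                  // False: Joe is never handed a warm one
    }
}
```

Neither approach is wrong. ToList() is the right call when you want a stable snapshot, and the lazy query is the right call when you want the latest state each time you look.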

Finally, since I've been building up to this the entire time...

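var coldOnes = beers.Where(b => !b.IsEmpty() && b.IsCold());
// Any() re-runs the query and stops at the first cold one it finds.
while (coldOnes.Any())
{
    foreach (var theBoy in theBoys)
    {
        try
        {
            // First() re-runs the query too, and throws InvalidOperationException
            // when nothing is left. Assumes CrackOpen leaves the beer empty.
            theBoy.CrackOpen(coldOnes.First());
        }
        catch (InvalidOperationException e)
        {
            throw new OutOfBeerException("No more cold ones :(", e);
        }
    }
}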

Congratulations on making it all the way through. You deserve a beer. Go grab a cold one and relax.


For any LINQ newcomers, I sincerely hope this post has been helpful. I touched on a lot of material very briefly, but there are lots of resources covering anything I left out. My favorite resource is referencesource.microsoft.com. There you can browse the source code of the .NET Framework, including the Enumerable class which contains most of the magic of LINQ.

I’d love to hear any feedback you have or if you found this post helpful. This is my first "code blog" post, so please leave your thoughts below!