2011 – Page 8 – The Blog of Colin Mackay

Parallelisation in .NET 4.0 – Part 2 Throwing Exceptions

With more threads running simultaneously in an application there is increasing complexity when it comes to debugging. When exceptions are thrown you usually catch them somewhere and handle them. But what happens if you throw an exception inside a thread?

Naturally, if you can handle the exception within the thread then that makes life much easier. But what if an exception bubbles up and out into the code that created the thread?

The example in my previous post on Parallelisation in .NET 4.0 had the calls to a third party service happening in separate threads. So, what happens if an exception is raised somewhere in the call?

In the service call GetAvailability, I’ve simulated some error conditions to throw exceptions based on the input to illustrate the examples. This is what it looks like:

public HotelAvail GetAvailability(string hotelCode, DateTime startDate, int nights)
{
    // Throw some exceptions depending on the input.
    if (hotelCode == null)
        throw new ArgumentNullException("hotelCode");

    if ((hotelCode.Length > 10) || (hotelCode.Length == 0))
        throw new ArgumentOutOfRangeException("hotelCode", // parameter name first, then the message
            "Hotel Codes are 1 to 10 chars in length. Got code which was " +
            hotelCode.Length + " chars.");

    if (hotelCode.StartsWith("Z"))
        throw new AvailabilityException("Hotel code '" + hotelCode +
                                        "' does not exist"); // A custom exception type
    // ... etc. ...
}

The calling code, from the previous example, looks like this:

public IEnumerable<HotelAvail> GetAvailability(IEnumerable<string> codes,
        DateTime startDate, int numNights)
{
        return codes.AsParallel().Select(code =>
            new AvailService().GetAvailability(code, startDate, numNights))
            .ToList();
}

If we provide incorrect input into the service such that it causes exceptions to be raised then Visual Studio responds in the normal way by breaking the debugging session at the point closest to where the exception is thrown.

If we were to wrap the call to the service in a try/catch block (as in the following code sample) then we’d expect that Visual Studio wouldn’t break the debugging session, as there is a handler (the catch block) for the exception.

public IEnumerable<HotelAvail> GetAvailabilityPlinqException(IEnumerable<string> codes,
        DateTime startDate, int numNights)
{
    try
    {
        return codes.AsParallel().Select(code =>
            new AvailService().GetAvailability(code, startDate, numNights))
            .ToList();
    }
    catch (Exception ex)
    {
        // Do stuff to handle the exception.
    }
    return null;
}

Normally, that would be the case. However, if the handler is outside the thread that threw the exception, as in the above example, the situation is somewhat different. In this case the Exception Assistant will appear and highlight the exception (or the code nearest the exception if it can’t highlight the throw statement itself^*).

This happens because the exception is not caught within the thread in which it was originally thrown.

The AggregateException

If you just tell the debugger to continue, the application will carry on executing, but the code that created the threads will have to handle an AggregateException. This is a special exception class with an InnerExceptions (note the plural) property that holds all the exceptions thrown from each of the threads.

You can enumerate over each of the inner exceptions to find out what happened in each of the threads.
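To make that concrete, here is a minimal sketch of catching the AggregateException around a PLINQ query and walking its InnerExceptions. The AggregateDemo class and its Lookup method are hypothetical stand-ins for the availability service, not the real thing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateDemo
{
    // Hypothetical stand-in for the service call: rejects codes starting with "Z".
    public static string Lookup(string code)
    {
        if (code.StartsWith("Z"))
            throw new InvalidOperationException("Hotel code '" + code + "' does not exist");
        return code + ": ok";
    }

    // Runs the lookups in parallel and returns a description of every
    // exception that escaped the worker threads.
    public static List<string> CollectErrors(IEnumerable<string> codes)
    {
        var messages = new List<string>();
        try
        {
            codes.AsParallel().Select(code => Lookup(code)).ToList();
        }
        catch (AggregateException aex)
        {
            // One entry per exception that was gathered from the threads.
            foreach (Exception ex in aex.InnerExceptions)
                messages.Add(ex.GetType().Name + ": " + ex.Message);
        }
        return messages;
    }

    public static void Main()
    {
        foreach (string message in CollectErrors(new[] { "1", "Z123", "2" }))
            Console.WriteLine(message);
    }
}
```

Only one of the three inputs throws here, so the handler sees a single inner exception. With several failing inputs the number you get back may vary from run to run.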

Be aware, however, that an AggregateException can itself contain another AggregateException, so simply enumerating InnerExceptions may yield yet another AggregateException. For example, if the top-level AggregateException contains a nested AggregateException alongside an ApplicationException, then iterating over the InnerExceptions like this:

foreach(Exception ex in aggregateException.InnerExceptions)
{
    // ... do stuff ...
}

will yield:

AggregateException
ApplicationException

You can flatten the hierarchy into a single AggregateException whose InnerExceptions collection contains no further AggregateException objects. To do this, call Flatten() on the original AggregateException. This returns a new AggregateException whose InnerExceptions property you can enumerate without having to worry about any hierarchy.

For example:

foreach(Exception ex in aggregateException.Flatten().InnerExceptions)
{
    // ... do stuff ...
}

Which results in the following exceptions being enumerated by the loop:

ApplicationException
NullReferenceException
ArgumentException
DivideByZeroException
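If you want to try this yourself, the sketch below builds a nested hierarchy by hand and lists the exception types before and after flattening. The exact nesting is an assumption on my part (one inner AggregateException holding the three leaf exceptions), and FlattenDemo is a hypothetical name:

```csharp
using System;
using System.Linq;

public static class FlattenDemo
{
    // Builds a two-level hierarchy. The nesting is assumed for illustration.
    public static AggregateException Build()
    {
        var nested = new AggregateException(
            new NullReferenceException(),
            new ArgumentException(),
            new DivideByZeroException());

        return new AggregateException(nested, new ApplicationException());
    }

    // Returns the type name of each direct inner exception.
    public static string[] TypeNames(AggregateException aex)
    {
        return aex.InnerExceptions.Select(ex => ex.GetType().Name).ToArray();
    }

    public static void Main()
    {
        AggregateException aex = Build();

        // Direct children only: the nested AggregateException is still there.
        Console.WriteLine(string.Join(", ", TypeNames(aex)));

        // After Flatten() only the leaf exceptions remain.
        Console.WriteLine(string.Join(", ", TypeNames(aex.Flatten())));
    }
}
```

Flatten() does not modify the original exception, so you can still inspect the nested structure afterwards if you need it.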

But it’s broken, why doesn’t it just stop?

Well, it does. Once a thread has thrown an exception that bubbles up and out then no new tasks are started, so no new threads are created, and no new work gets done. However, remember that there will be other threads running as well and if one breaks, maybe others will break too, or maybe they will complete successfully. We won’t know unless they are allowed to finish what they are doing.

Going back to the room availability example, if the input contains invalid hotel codes then the service will throw an exception that is not caught within the thread. What if a selection of good and bad hotel codes is passed?

1, 2, 3, Z123, 4, 5, 6, 1234567890ABC, 7, 8, 9

Of the above list “Z123” and “1234567890ABC” are both invalid and produce different exceptions. However, when running tests the AggregateException only contains one of the exceptions.

To show what happens, I’ve modified my “service” like this and run it through a console application. Here’s the full code:

The service class

public class AvailService
{
    // ...

    public HotelAvail GetAvailability(string hotelCode, DateTime startDate, int nights)
    {
        Console.WriteLine("Start @ {0:HH-mm-ss.fff}: {1}", DateTime.Now, hotelCode);

        ValidateInput(hotelCode);

        // ... do stuff to process the request ...

        Console.WriteLine("  End @ {0:HH-mm-ss.fff}: {1}", DateTime.Now, hotelCode);
        return result;
    }

    private void ValidateInput(string hotelCode)
    {
        if (hotelCode == null)
        {
            Console.WriteLine("Error @ {0:HH-mm-ss.fff}: hotelCode is null", DateTime.Now);
            throw new ArgumentNullException("hotelCode");
        }

        if ((hotelCode.Length > 10) || (hotelCode.Length == 0))
        {
            Console.WriteLine("Error @ {0:HH-mm-ss.fff}: hotelCode is {1}", DateTime.Now, hotelCode);
            throw new ArgumentOutOfRangeException("hotelCode", // parameter name first, then the message
                "Hotel Codes are 1 to 10 chars in length. Got code which was " +
                hotelCode.Length + " chars.");
        }

        if (hotelCode.StartsWith("Z"))
        {
            Console.WriteLine("Error @ {0:HH-mm-ss.fff}: hotelCode is {1}", DateTime.Now, hotelCode);
            throw new AvailabilityException("Hotel code '" + hotelCode +
                                            "' does not exist");
        }
    }
}

The method on the controller class

public IEnumerable<HotelAvail> GetAvailability(IEnumerable<string> codes,
        DateTime startDate, int numNights)
{
    return codes.AsParallel().Select(code =>
        new AvailService().GetAvailability(code, startDate, numNights))
        .ToList();
}

The Main method on the Program class

static void Main(string[] args)
{
    string[] codes = "1,2,3,Z123,4,5,6,1234567890ABC,7,8,9".Split(',');
    AvailController ctrl = new AvailController();

    DateTime start = DateTime.Now;
    try
    {
        var result = ctrl.GetAvailability(codes,
            DateTime.Today.AddDays(7.0), 2);
    }
    catch (AggregateException aex)
    {
        Console.WriteLine(aex.Message);

        foreach (Exception ex in aex.InnerExceptions)
            Console.WriteLine(" -- {0}", ex.Message);

    }
    finally
    {
        DateTime end = DateTime.Now;
        Console.WriteLine("Total time in ms: {0}",
                            (end - start).TotalMilliseconds);

    }
}

And the console output is:

Start @ 16-36-36.518: 7
Start @ 16-36-36.518: Z123
Start @ 16-36-36.518: 6
Start @ 16-36-36.518: 1
Error @ 16-36-36.526: hotelCode is Z123
  End @ 16-36-42.438: 1
  End @ 16-36-42.654: 6
  End @ 16-36-42.900: 7
One or more errors occurred.
 -- Hotel code 'Z123' does not exist
Total time in ms: 6400

As you can see, only 4 items got started out of an initial input collection of 11 items. The error occurred 8ms after these items started. Those items that did not cause an error were allowed to continue to completion. The result variable in the Main method is never assigned because of the exception, so we never get the results of the three items that did succeed.

Naturally, the best course of action is not to let the exception bubble up and out of the thread in which the code is executing.
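One way to do that is to catch the exception inside the lambda and hand back an object that records either the value or the error. This is only a sketch under my own assumptions: LookupOutcome, SafeParallelDemo and the simplified Lookup method are hypothetical and stand in for HotelAvail and the real service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper pairing an input code with its value or its error.
public class LookupOutcome
{
    public string Code { get; set; }
    public string Value { get; set; }
    public Exception Error { get; set; }
    public bool Succeeded { get { return Error == null; } }
}

public static class SafeParallelDemo
{
    // Simplified stand-in for the service, using the same validation rules.
    public static string Lookup(string code)
    {
        if (code == null)
            throw new ArgumentNullException("code");
        if (code.Length == 0 || code.Length > 10)
            throw new ArgumentOutOfRangeException("code", "Codes are 1 to 10 chars in length.");
        if (code.StartsWith("Z"))
            throw new InvalidOperationException("Hotel code '" + code + "' does not exist");
        return "availability for " + code;
    }

    // The try/catch lives inside the lambda, so no exception leaves the worker thread.
    public static List<LookupOutcome> LookupAll(IEnumerable<string> codes)
    {
        return codes.AsParallel().Select(code =>
        {
            try
            {
                return new LookupOutcome { Code = code, Value = Lookup(code) };
            }
            catch (Exception ex)
            {
                return new LookupOutcome { Code = code, Error = ex };
            }
        }).ToList();
    }

    public static void Main()
    {
        string[] codes = "1,2,3,Z123,4,5,6,1234567890ABC,7,8,9".Split(',');
        foreach (LookupOutcome outcome in LookupAll(codes))
            Console.WriteLine("{0}: {1}", outcome.Code,
                outcome.Succeeded ? outcome.Value : "FAILED - " + outcome.Error.Message);
    }
}
```

Because nothing escapes the lambda, every input is processed and the caller gets the successful results along with the failures, rather than losing everything to a single AggregateException.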

^* Note, there appears to be a bug in Visual Studio with the Exception Assistant not always highlighting the correct line of code.

Tip of the Day #21: Prefer the use of first-child CSS selector over last-child

I just got this fantastic tip from Jamie Boyd, a colleague of mine:

The :first-child and :last-child selectors are super-useful for applying alternate styling to items in lists and things like that (e.g. removing the margin from the last item in a container-spanning nav bar). But when it comes to browser support, they are not equal.

:last-child is actually only supported in IE9+, whereas :first-child has had partial support since IE7 (where it works, but styles won’t update if dynamic content is added).

So if you can, use :first-child rather than :last-child.

Tip of the day #20: Don't spam your own email while developing apps that send email

When we develop applications, often there will be a requirement for that application to send out emails. While this is going on we usually end up with lots of emails being sent to our own email address for test purposes.

I got this fantastic tip from a colleague of mine, Andy Gibson, so here it is:

If you want to test the email an application sends out without spamming your inbox, you can modify your web.config with the following code so that it will save the emails to your machine as flat files rather than sending them through the SMTP client. If you combine this with the ASP.NET 4 config transforms for each build configuration (Web.Release.config, Web.Debug.config, etc.) then this becomes even niftier.

<system.net>
  <mailSettings>
    <smtp deliveryMethod="SpecifiedPickupDirectory">
      <specifiedPickupDirectory pickupDirectoryLocation="D:\Email"/>
      <network host="localhost"/> <!-- Required for .NET 4.0! -->
    </smtp>
  </mailSettings>
</system.net>

There is an added benefit to this: .NET saves each message as a .eml file, so your default mail client (in my case Outlook 2007) will open it on double click, or if you need to see the raw email including headers, you can open it in Notepad.

Parallelisation in .NET 4.0 – Part 1 looping

In an upcoming project we have a need for using some parallelisation features. There are two aspects to this: we have to make multiple calls out to a web service that can take some time to return, and in the meantime we have to get data out of the CMS to match up to the data coming back from the web service.

I’ll be writing a series (just for the irony of it) of posts on these new features in .NET and how we will be implementing them.

The problem

We have a web service that we have to call to get data back to our system. Calls to the web service take in the region of 4 to 7 seconds each to return data to us. The performance of the web service does not degrade significantly if we make multiple calls to it.

The Solution (first attempt)

Since the calls to the web service are not altering state we can safely make those calls in parallel. There is a class called AvailService that calls the web service and gets the results back to us. If you’re interested, the web service checks on the availability of rooms in a hotel based on the hotel code you pass and the stay date range (expressed as a start date and number of nights).

What we could have done in a serial implementation is this:

public IEnumerable<HotelAvail> GetAvailabilitySerial(
    IEnumerable<string> codes, DateTime startDate, int numNights)
{
    List<HotelAvail> result = new List<HotelAvail>();

    foreach (string code in codes)
    {
        AvailService service = new AvailService();
        HotelAvail singleResult = service.
                GetAvailability(code, startDate, numNights);
        result.Add(singleResult);
    }

    return result;
}

This simply goes around each item and gets the availability of rooms in that hotel for the stay date range. It could be refactored into a LINQ expression, but I’m going to leave it as a fuller loop just to show more clearly what’s going on.

Using Parallel.For

When we change this to a parallel loop it doesn’t really change much. I’ve changed the list to an array which is set up with the correct size of the result set, and each parallel iteration only interacts with one slot in the array.

public IEnumerable<HotelAvail> GetAvailability (
    IList<string> codes, DateTime startDate, int numNights)
{
    HotelAvail[] result = new HotelAvail[codes.Count];

    Parallel.For(0, codes.Count, i =>
        {
            string code = codes[i];
            result[i] = new AvailService().
                GetAvailability(
                    code, startDate, numNights);
        });

    return result;
}

The code in the lambda expression is the part that is parallelised. I’ve had to make some concessions here as well. The IEnumerable of codes is now an IList, because I need to be able to access specific indexes in the list in order to align it with the array. That way there is no accidental overwriting of elements in the result set.

If the ordering of the output is important, (e.g. must be the same order as the input) then this will maintain that ordering. Many of the remaining solutions do not maintain the order.

However, there is a better way to do this that doesn’t involve setting up and maintaining structures in this way and is closer to our serial code.

Using a Parallel.ForEach and a ConcurrentBag

public IEnumerable<HotelAvail> GetAvailabilityConcurrentCollection(
    IEnumerable<string> codes,
    DateTime startDate, int numNights)
{
    ConcurrentBag<HotelAvail> result = new ConcurrentBag<HotelAvail>();

    Parallel.ForEach(codes, code => result.Add(
        new AvailService().
            GetAvailability(code, startDate, numNights)));

    return result;
}

The content of the Parallel.ForEach here is almost identical to the serial foreach version, except that the code is slightly more terse. However, this time I’m using a ConcurrentBag for the result collection. As the method has always returned an IEnumerable this will not change the return type of the method, meaning that a serial method can be made parallel (assuming other considerations necessary for parallelism are taken into account) fairly easily.

The order of the results may be quite different from the order of the input.

Using PLINQ

Finally, I’ve refactored the code using PLINQ. This example is really quite terse, but if you understand LINQ then it should be very easy to pick up.

public IEnumerable<HotelAvail> GetAvailabilityPlinq(
    IEnumerable<string> codes,
    DateTime startDate, int numNights)
{

    return codes.AsParallel().Select(code =>
        new AvailService().GetAvailability(code, startDate, numNights))
        .ToList();
}

Again, the order of the result may be quite different from the order of the input. If ordering is important to you, you can instruct PLINQ to maintain the ordering by using .AsOrdered(). This ensures that the output from the PLINQ expression is in the same order as the input.

e.g.

public IEnumerable<HotelAvail> GetAvailabilityPlinq(
    IEnumerable<string> codes,
    DateTime startDate, int numNights)
{

    return codes.AsParallel().AsOrdered().Select(code =>
        new AvailService().GetAvailability(code, startDate, numNights))
        .ToList();
}
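To see the effect of .AsOrdered() without needing the hotel service, here is a small self-contained sketch that squares a range of numbers in parallel. OrderingDemo and its method names are hypothetical and only there for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
    // With AsOrdered() the results come back in input order.
    public static List<int> SquaresOrdered(IEnumerable<int> input)
    {
        return input.AsParallel().AsOrdered().Select(n => n * n).ToList();
    }

    // Without AsOrdered() PLINQ is free to return results in any order.
    public static List<int> SquaresUnordered(IEnumerable<int> input)
    {
        return input.AsParallel().Select(n => n * n).ToList();
    }

    public static void Main()
    {
        IEnumerable<int> input = Enumerable.Range(1, 20);
        Console.WriteLine("Ordered:   " + string.Join(" ", SquaresOrdered(input)));
        Console.WriteLine("Unordered: " + string.Join(" ", SquaresUnordered(input)));
    }
}
```

The unordered version may happen to come back in input order on some runs, particularly with small inputs, so don’t take a single tidy result as a guarantee.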

The important part to all this is the call to the service, which has remained the same throughout all the samples. It is really just the infrastructure around that call that has changed over these examples.

The results

This table and graph show the results of some tests I ran between the serial and parallel versions. The numbers across the top row and X-axis represent the number of calls to the service. The numbers in the table and Y-Axis represent the time (in seconds) to complete the calls.

Calls	1	2	3	4	5	6	7	8	9
Serial	5.52	11.22	16.72	22.60	28.59	34.32	40.32	45.67	51.64
Parallel	5.52	6.20	6.35	6.45	11.24	12.35	12.45	12.86	16.66

Finally, a warning about using parallelised code from Alex Mackey’s book Introducing .NET 4.0:

“Although parallelization enhancements make writing code to run in parallel much easier, don’t underestimate the increase in complexity that parallelizing an application can bring. Parallelization shares many of the same issues you might have experienced when creating multithreaded applications. You must take care when developing parallel applications to isolate code that can be parallelized.”

Tesco, Your car wash sucks

Earlier today I was at Tesco to refuel my car and I noticed that it was a bit overdue for a wash, so when I paid for my fuel I also purchased a voucher for the car wash. It was the premium super-duper all singing all dancing wash for £6.

When I drove round to where the car wash was there was a queue of three people in front of me so I had to wait. There was a chap at the jet wash too and I noticed that he was much slower. By the time my turn came around he was still there washing his car. In fact, by the time I was finished he was still washing his car, so he must have put much more money in that machine than I did for the car wash. In hindsight, I think he took the better decision. Why?

When I got home, I went to open the boot to retrieve my shopping and I noticed that the back of the car was still dirty. Sure, bits of it were clean, and all of it was still wet, but it was obvious that the brushes on the rollers don’t clean very well. Or maybe they don’t clean cars with near vertical rears (like the Toyota Yaris) very well. I could still wipe my finger through the dirt. And here’s a picture just to show you. (You can click the image to see it full size and you can see my finger mark in the remaining dirt.)

All I can say is that I’ll not be back to Tesco to use their car wash again. If I do find myself there, I may just use the Jet Wash like the other chap did. That seemed the more sensible solution.