Using semaphores to restrict access to resources

I’m in the process of building a small data extraction application. It uses the new Parallel Extensions in .NET 4 in order to more efficiently extract data from a web service. While some threads are blocked waiting on the web service to respond, other threads are working away processing the results of the previous call.

Initially when I set this up I didn’t throttle the calls to the web service. I let everything through. However, in this environment I quickly discovered that I was having to re-try calls a lot because the original call for some data was timing out. When I looked in Fiddler to see what was going on I discovered that as I ran the application I was getting over a screen full of started requests that were not finishing or just taking a very long time to complete. I was overloading the server and it couldn’t cope with the volume of requests.

With this in mind I added in some code to the class that initiated the web service calls in order to ensure that it didn’t call the web service too frequently. This is where the semaphores come in to play.

Semaphores are a type of synchronisation mechanism that allow you to limit access to some segment of code. No more than a specified number of threads may enter the segment of code at any one time. If more threads attempt to enter that segment of code than are permitted then any new thread arriving will be forced to wait until access is granted.

I’ll show you what I mean:

   1:  public class WebServiceHelper
   2:  {
   3:      private static Semaphore pool = new Semaphore(3, 3);
   4:   
   5:      public ResultsData GetData(RequestData request)
   6:      {
   7:          try
   8:          {
   9:              pool.WaitOne();
  10:              return GetDataImpl(request);
  11:          }
  12:          finally
  13:          {
  14:              pool.Release();
  15:          }
  16:      }
  17:   
  18:      private ResultsData GetDataImpl(RequestData request)
  19:      {
  20:          // Do stuff here
  21:      }
  22:   
  23:  }

This is just a fragment of the class in order to show just the important bits.

In line 3 we set up the Semaphore as a static, so that all instances of the class can have access to it. It doesn’t need to be a static if you are going to reuse the same instance of the class in many places, but for the purposes of this example I’m using a static.

The Semaphore is initialised with an initial count of 3 (first parameter), which means that three resources are currently available, and a maximum count also of 3 (second parameter), which means we can have a maximum of three resources in use at any one time.

In the GetData method (lines 5-16) I wrap the call that does the actual work in a try-finally block. If any exceptions are thrown, this is not the place to handle them. The only thing this method should be concerned with is ensuring the resources are properly synchronised. In line 9 we wait for a resource to become available. The first three calls will not block because we’ve started off with three available, but after that calls may block if necessary. On line 10 we call the method that does the actual work we are interested in (this avoids cluttering up one method with both the details of the work and the synchronisation code). In the finally block (lines 12 to 15) we ensure that the resource is released regardless of the ultimate outcome. It doesn’t matter whether an exception was thrown or the call was successful; we always release the resource at the end of the operation.

WaitOne (line 9) does have overloads that accept a time to wait either as a TimeSpan or integer representing milliseconds. This means that you can ensure you are not blocking infinitely if an error occurs and the resource is never released.
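
As a rough sketch of how the timeout overload might be used: the ThrottledCaller class and TryRun method below are hypothetical names for illustration, not part of the application described above. This version also waits before entering the try block, so that Release is only called when a slot was actually acquired.

```csharp
using System;
using System.Threading;

public static class ThrottledCaller
{
    // At most three callers may run their work at once, as in the example above.
    private static readonly Semaphore Pool = new Semaphore(3, 3);

    // Returns true if the work ran, false if no slot became free within the timeout.
    public static bool TryRun(Action work, TimeSpan timeout)
    {
        // WaitOne(TimeSpan) returns false on timeout instead of blocking forever.
        if (!Pool.WaitOne(timeout))
        {
            return false;
        }

        try
        {
            work();
            return true;
        }
        finally
        {
            // Only release a slot that was actually acquired.
            Pool.Release();
        }
    }
}
```

The caller can then decide what to do when TryRun returns false, for example retry later or report that the service is busy.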

That just about sums it up. I now have an application that I can parallelise yet ensure that I don’t overload the web server at the same time.

I should also point out that using Semaphores (or any kind of locking or synchronisation method) does reduce the parallelisability of the application, but they can be useful to ensure safe access to data or resources. However, there are also other techniques which help reduce the need for these synchronisation schemes.

What a waste of money by Currys (but win for GAME)

At the end of last year, I was at a Microsoft event where we got to see a number of new Microsoft technologies. At this event I got my first chance to have a look at the XBOX 360 Kinect. Since I’m not a gamer I hadn’t paid much attention to what a Kinect was until I actually saw one and had a play on one. Then I instantly wanted one. If you’ve never seen it, even if you are not into computer games, I would highly recommend you have a look.

Anyway, since arriving back home I decided to have a look at getting my hands on one. I’m not a gamer, so I don’t already have an XBOX 360, but since all the options were explained to me I now know exactly what I want. And what I want is an XBOX 360 250Gb HD with the Kinect sensor bar. I know I should be able to get that bundle for somewhere in the region of £300. But I’m looking for a deal. With that in mind I went looking for options. So I searched on Bing and Google.

They both return advertised links (PPC: Pay Per Click) as well as the regular (“organic”) results. So, I click on almost all of them, opening them in new tabs. (Remember, I am looking for a deal, so I want to compare quickly what each of the offerings is.)

Currys has a paid link with the tag line “Buy Xbox Kinect. We are in stock Reserve and Collect yours Now.” [sic] It sounds promising, doesn’t it?

So, I click the link and go looking for the price. Nope can’t see a price.

Some form of Add to Basket link, surely that’ll get me a price. Nope, can’t see that either.

Anything at all that looks remotely like some form of buy/purchase/reserve link. Anything at all! Nope. Not a thing.

I know what I want. I’m motivated to buy. All I want to know is that you’ve got it in stock and how much you want for it.

Well done Currys, you’ve wasted money on advertising a product that I cannot see how to actually buy. I got so irritated that I went to close down the tab in my browser. But… I didn’t do that. I got to thinking about how the follow up from the advert had not served its purpose. The advert hooked me in, but the website was so ineffectual that I was heading off elsewhere.

So, how do I actually buy it? There is no “buy this” call to action, so I really don’t know where to go from here. Any button I press is going to be a bit random and I have to think about what is likely to give me the best route to accomplishing my goal.

I really feel at this point that the website isn’t doing its job properly. Surely the purpose of this website is to get people to buy stuff? That’s how it makes money. That’s why Currys spend money on building the site and advertising its existence. It is so they can get people to come to them to buy stuff rather than go to a competitor to buy stuff.

Let’s consider if this had been a situation where I had actually walked in to a Currys store. It would have been akin to me asking a sales assistant on the shop floor “Can you tell me the price of an XBox 360 with 250Gb drive and the Kinect Sensor bar?” and instead of answering my question they wax lyrical about what a great product it is.

I scroll down the page scanning any text for things that look like links or buttons. There are some pictures with “Find out more” links below each of them. Two of them actually have the sensor bar on them, one of which also has the console on it. I actually had to open both links up to figure that out because at scanning speed they look pretty similar. It is only when I’m analysing my actions that I really consciously take in what the difference is.

Once I get to the correct page I’m presented with a grid of pretty similar looking pictures. At least this time there is a description below each of them and a price (finally, I’m getting the information I actually wanted). However, my issues with this website are not over. There are several similar bundles which vary only slightly from each other by the type of console and by the packaged games. The graphics are too small to see what the difference is in the games, and the consoles all look alike, so I need the text to tell me which is which. The descriptions pretty much all say “MICROSOFT Xbox 360 Game Console with…”; occasionally one will say something else, such as “MICROSOFT Xbox 250Gb Bundle with…”

This is not giving me what I want. In fact, Currys are doing themselves a disservice as well. Some of the titles that just say “Xbox 360” without reference to the type of Xbox are actually the 250Gb version, so at a glance I would skip past them because I’ve also seen other descriptions that say “250Gb” so I am assuming it is a lower spec model that I’m not interested in.

Had my interest not been piqued by the issues with this website I’d have left a long time ago. Instead, I took some time to understand what was actually going on and to highlight the problems.

I’m guessing there was a meeting at some point to discuss the design of the website. At this meeting various aspects of the site were discussed. In the rush to get the site out of the door short cuts were taken. Certain things weren’t thought about properly.

The “Find out more” button actually takes you to a page where you can browse the products relating to the page you’ve just come from. Why not tell me that? I’d have been much more interested if the link had mentioned that I’d see prices, bundle options or what not. Yes, technically I am finding out more, but it didn’t really inspire me to find out more, which is more my point.

The product names in the page that allows you to browse the products are all clipped. I’m guessing that at some point a graphic designer put together some visuals to show how the page should look. A web developer converts that into a working site. The visuals show two line product names but the developer sees that some product names are too long to match the visuals, so the product names get clipped and thus rendered (in situations where there are very similar product bundles) next to useless. Again, time is probably very tight. An unforeseen situation early in the project has now come to light. There is no time to redesign the visuals so the next best solution is taken. That’s to force the product names into the space provided.

Did I buy from Currys now that I had spent all that time analysing their site? No. Had an interest in usability not caused me to have a think about what was going on I would have left long before. In the end I bought my XBOX 360 Kinect with 250Gb HDD from GAME… in store! And, you know what? When I was looking in store I couldn’t see an XBOX 360 with 250Gb HD, and as I was searching a sales assistant asked me if she could help. I said I was looking for the version with the 250Gb HDD and she said that they didn’t have any left; however, if I bought the HDD as a separate item at the same time as the XBOX they would discount it so that it was the same total price as buying the model with the 250Gb HDD included. Fantastic! Oh… and they knocked roughly 25% off each of the Kinect games we bought to get going with.

Parallelisation in .NET 4.0 – The concurrent dictionary

One thing that I was always conscious of when developing concurrent code was that shared state is very difficult to deal with. It still is difficult to deal with, however the Parallel extensions have some things to help deal with shared information better and one of them is the subject of this post.

The ConcurrentDictionary has accessors and mutators that “try” to work over the data. If the operation fails then it returns false. If it works you get a true, naturally. To show this, I’ve written a small program that counts the words in Grimm’s Fairy Tales (which I downloaded from the Project Gutenberg website) and displays the top forty most used words.

Here is the program:

   1:  class Program
   2:  {
   3:      private static ConcurrentDictionary<string, int> wordCounts =
   4:          new ConcurrentDictionary<string, int>();
   5:   
   6:      static void Main(string[] args)
   7:      {
   8:          string[] lines = File.ReadAllLines("grimms-fairy-tales.txt");
   9:          Parallel.ForEach(lines, line => { ProcessLine(line); });
  10:   
  11:          Console.WriteLine("There are {0} distinct words", wordCounts.Count);
  12:          var topForty = wordCounts.OrderByDescending(kvp => kvp.Value).Take(40);
  13:          foreach (KeyValuePair<string, int> word in topForty)
  14:          {
  15:              Console.WriteLine("{0}: {1}", word.Key, word.Value);
  16:          }
  17:          Console.ReadLine();
  18:      }
  19:   
  20:      private static void ProcessLine(string line)
  21:      {
  22:          var words = line.Split(' ')
  23:              .Select(w => w.Trim().ToLowerInvariant())
  24:              .Where(w => !string.IsNullOrEmpty(w));
  25:          foreach (string word in words)
  26:              CountWord(word);
  27:      }
  28:   
  29:      private static void CountWord(string word)
  30:      {
  31:          if (!wordCounts.TryAdd(word, 1))
  32:              UpdateCount(word);
  33:      }
  34:   
  35:      private static void UpdateCount(string word)
  36:      {
  37:          int value = wordCounts[word];
  38:          if (!wordCounts.TryUpdate(word, value + 1, value))
  39:          {
  40:              Console.WriteLine("Failed to count '{0}' (was {1}), trying again...",
  41:                  word, value);
  42:   
  43:              UpdateCount(word);
  44:          }
  45:      }
  46:  }

The ConcurrentDictionary is set up in lines 3 and 4 with the word as the key and the count as the value, but the important part is in the CountWord and UpdateCount methods (starting on lines 29 and 35 respectively).

We start by attempting to add a word to the dictionary with a count of 1 (line 31). If that fails then we must have already added the word to the dictionary, in which case we will need to update the existing value (lines 37-44). In order to do that we need to get hold of the existing value (line 37), which we can do with a simple indexer using the word as the key. We then attempt to update the value (line 38). The reason I say we attempt to do that is that there are many threads operating on the same dictionary object and the update may fail.

The TryUpdate method ensures that you are updating the correct thing as it asks you to pass in the original value and the new value. If someone got there before you (a race condition) the original value will be different to what is currently in the dictionary and the update will not happen. This ensures that the data is consistent.  In our case, we simply try again.
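
For comparison, ConcurrentDictionary also has an AddOrUpdate method that performs the "add, or retry the update until it sticks" loop internally. The sketch below is an alternative to the CountWord/UpdateCount pair, not the approach used in the program above; the WordCounter class and CountWords method are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WordCounter
{
    // Counts how often each word occurs, processing the words in parallel.
    public static ConcurrentDictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new ConcurrentDictionary<string, int>();

        Parallel.ForEach(words, word =>
        {
            // Adds the word with a count of 1, or applies the update delegate to
            // the current count. The dictionary retries internally on a race,
            // so the delegate may be invoked more than once for a single call.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });

        return counts;
    }
}
```

The trade-off is that the retries become invisible: you no longer get the chance to log each collision as the original program does.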

The result of the application is as follows.

Failed to count 'the' (was 298), trying again...
Failed to count 'the' (was 320), trying again...
Failed to count 'and' (was 337), trying again...
Failed to count 'of' (was 113), trying again...
Failed to count 'the' (was 979), trying again...
Failed to count 'the' (was 989), trying again...
Failed to count 'and' (was 698), trying again...
Failed to count 'well' (was 42), trying again...
Failed to count 'the' (was 4367), trying again...
Failed to count 'and' (was 3463), trying again...
Failed to count 'the' (was 4654), trying again...
Failed to count 'to' (was 1772), trying again...
Failed to count 'the' (was 4798), trying again...
Failed to count 'the' (was 4805), trying again...
Failed to count 'the' (was 4858), trying again...
Failed to count 'her' (was 508), trying again...
Failed to count 'and' (was 3693), trying again...
Failed to count 'and' (was 3705), trying again...
Failed to count 'and' (was 3719), trying again...
Failed to count 'the' (was 4909), trying again...
Failed to count 'she' (was 600), trying again...
Failed to count 'to' (was 1852), trying again...
Failed to count 'curdken' (was 3), trying again...
Failed to count 'the' (was 4665), trying again...
Failed to count 'which' (was 124), trying again...
Failed to count 'the' (was 5361), trying again...
Failed to count 'and' (was 4327), trying again...
Failed to count 'to' (was 2281), trying again...
Failed to count 'they' (was 709), trying again...
Failed to count 'they' (was 715), trying again...
Failed to count 'and' (was 4668), trying again...
Failed to count 'you' (was 906), trying again...
Failed to count 'of' (was 1402), trying again...
Failed to count 'the' (was 6708), trying again...
Failed to count 'and' (was 5149), trying again...
Failed to count 'snowdrop' (was 21), trying again...
Failed to count 'draw' (was 18), trying again...
Failed to count 'he' (was 1834), trying again...
There are 10369 distinct words
the: 7168
and: 5488
to: 2725
a: 1959
he: 1941
of: 1477
was: 1341
in: 1136
she: 1134
his: 1031
that: 1024
you: 981
it: 921
her: 886
but: 851
had: 829
they: 828
as: 770
i: 755
for: 740
with: 731
so: 693
not: 691
said: 678
when: 635
then: 630
at: 628
on: 576
will: 551
him: 544
all: 537
be: 523
have: 481
into: 478
is: 444
went: 432
came: 424
little: 381
one: 358
out: 349

As you can see in this simple example, a race condition was encountered 38 times.

A quick intro to the HTML Agility Pack

I want a way to extract all the post data out of my blog. I’m building a little application to do that, mostly as an exercise to try out some new technologies. In this post I’m going to show a little of the HTML Agility Pack, which is the framework I’m using to extract the information out of a blog entry page.

Creating an HtmlDocument

In the following code snippet, html is a string containing some HTML:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

However, the HtmlDocument class also has a Load method that is overloaded and can accept a Stream, TextReader or a string (representing a file path) in order to get the HTML. The one obvious thing that was missing was a version that took a URL, although HttpWebResponse does have a GetResponseStream method that returns a Stream which you could pass in.

Navigating the HTML Document

Once you have loaded in your HTML you will want to navigate it. To do that you need to get hold of HtmlNode that represents the document as a whole:

HtmlNode docNode = doc.DocumentNode;

The docNode will then give you all the bits and pieces you need to navigate around the HTML. If you are already used to the LINQ to XML classes introduced in .NET 3.5 then you shouldn’t have too much trouble finding your way around here.

For example, here is a snippet of code that gets all the URLs out of the anchor tags:

var linkUrls = docNode.SelectNodes("//a[@href]")
     .Select(node => node.Attributes["href"].Value);

The linkUrls variable is actually an IEnumerable<string> (if you are curious).

One thing that is particularly annoying

There is one thing that I find particularly annoying, however. SelectNodes returns an HtmlNodeCollection, but if the XPath in the SelectNodes method call results in no nodes being found then it returns null instead of an empty collection. For me, it is perfectly valid to get an empty collection if the query returned no results. Because of this, I can’t simply write code like the section above. I actually have to check for null before continuing. That means the code in the previous section actually looks like this:

HtmlNodeCollection nodes = docNode.SelectNodes("//a[@href]");
if (nodes != null)
{
    var linkUrls = nodes.Select(node => node.Attributes["href"].Value);
    // And whatever else we were doing.
}
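
One way to avoid repeating the null check at every call site is a small extension method that swaps a null sequence for an empty one. This is only a sketch: OrEmpty is a hypothetical helper of my own naming, not part of the HTML Agility Pack.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Returns the sequence itself, or an empty sequence when it is null.
    // Extension methods can be called on a null reference, which is what
    // makes this usable directly on the result of SelectNodes.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source)
    {
        return source ?? Enumerable.Empty<T>();
    }
}
```

With that in place the original one-liner should work again, because the collection returned by SelectNodes can be enumerated as a sequence of HtmlNode: docNode.SelectNodes("//a[@href]").OrEmpty().Select(node => node.Attributes["href"].Value).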

What next?

Well, as you can see the functionality is actually fairly easy to follow. I was initially dismayed at the lack of apparent documentation for it until I realised that the folks that have built the framework have done a great job of ensuring that it works very similarly to libraries already in the .NET framework itself so it is remarkably quick to get used to.

Table names – Singular or Plural

Earlier this morning I tweeted asking for a quick idea of whether to go with singular table names or plural table names, i.e. the difference between having a table called “Country” vs. “Countries”.

Here are the very close results:

  • Singular: 8
  • Plural: 6
  • Either: 1

Why Singular:

  • That’s how they start off life on my ER diagram
  • You don’t need to use a plural name to know a table will hold many of an item.
  • A table consists of rows of items that are singular

Why Plural:

  • It is the only choice unless you are only ever storing one row in each table.
  • because they contain multiple items
  • It contains Users
  • I think of it as a collection rather than a type/class
  • SELECT TOP 1 * FROM Customers

Why either:

  • Either works, so long as it is consistent across the entire db/app

Parallelisation in .NET 4.0 – Part 2 Throwing Exceptions

With more threads running simultaneously in an application there is increasing complexity when it comes to debugging. When exceptions are thrown you usually catch them somewhere and handle them. But what happens if you throw an exception inside a thread?

Naturally, if you can handle the exception within the thread then that makes life much easier. But what if an exception bubbles up and out into code that created the thread?

The example in my previous post on Parallelisation in .NET 4.0 had the calls to a third party service happening in separate threads. So, what happens if an exception is raised somewhere in the call?

In the service call GetAvailability, I’ve simulated some error conditions to throw exceptions based on the input to illustrate the examples. This is what it looks like:

public HotelAvail GetAvailability(string hotelCode, DateTime startDate, int nights)
{
    // Throw some exceptions depending on the input.
    if (hotelCode == null)
        throw new ArgumentNullException("hotelCode");

    if ((hotelCode.Length > 10) || (hotelCode.Length == 0))
        throw new ArgumentOutOfRangeException(
            "hotelCode", // parameter name first, then the message
            "Hotel Codes are 1 to 10 chars in length. Got code which was " +
            hotelCode.Length + " chars.");

    if (hotelCode.StartsWith("Z"))
        throw new AvailabilityException("Hotel code '" + hotelCode +
                                        "' does not exist"); // A custom exception type
    // ... etc. ...
}

The calling code, from the previous example, looks like this:

public IEnumerable<HotelAvail> GetAvailability(IEnumerable<string> codes,
        DateTime startDate, int numNights)
{
        return codes.AsParallel().Select(code =>
            new AvailService().GetAvailability(code, startDate, numNights))
            .ToList();
}

If we provide incorrect input into the service such that it causes exceptions to be raised then Visual Studio responds in the normal way by breaking the debugging session at the point closest to where the exception is thrown.

If we were to wrap the call to the service in a try catch block (as in the following code sample) then we’d expect that Visual Studio wouldn’t break the debugging session as there is a handler (the catch block) for the exception.

public IEnumerable<HotelAvail> GetAvailabilityPlinqException(IEnumerable<string> codes,
        DateTime startDate, int numNights)
{
    try
    {
        return codes.AsParallel().Select(code =>
            new AvailService().GetAvailability(code, startDate, numNights))
            .ToList();
    }
    catch (Exception ex)
    {
        // Do stuff to handle the exception.
    }
    return null;
}

Normally, that would be the case. However, if the handler is outside the thread that threw the exception, as in the above example, the situation is somewhat different. In this case the Exception Assistant will appear and highlight the exception (or the code nearest the exception if it can’t highlight the throw statement itself*).

AvailabilityException in Exception Assistant

This happens because the exception is not caught within the thread in which it was originally thrown.

The AggregateException

If you just tell the debugger to continue executing the application it will continue, but the code that created the threads will have to handle an AggregateException. This is a special exception class that contains an InnerExceptions (note the plural) property that contains all the exceptions thrown from each of the threads.

AggregateException.InnerExceptions

You can enumerate over each of the inner exceptions to find out what happened in each of the threads.

Be aware, however, that an AggregateException can, itself, contain an AggregateException. So simply reading InnerExceptions may yet yield another AggregateException. For example, if the hierarchy of exceptions looks like this:

AggregateException Hierarchy

Then the results of iterating over the InnerExceptions will be:

foreach(Exception ex in aggregateException.InnerExceptions)
{
    // ... do stuff ...
}
  • AggregateException
  • ApplicationException

You can flatten the hierarchy into a single AggregateException object whose InnerExceptions contains no further AggregateException objects. To do this call Flatten() on the original AggregateException. This returns a new AggregateException whose InnerExceptions you can then read without having to worry about any hierarchy.

For example:

foreach(Exception ex in aggregateException.Flatten().InnerExceptions)
{
    // ... do stuff ...
}

Which results in the following exceptions being enumerated by the loop:

  • ApplicationException
  • NullReferenceException
  • ArgumentException
  • DivideByZeroException
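
To try this out without running any parallel code, a nested AggregateException can be built by hand. The sketch below assumes a hierarchy consistent with the two lists above (an outer AggregateException holding an inner AggregateException and an ApplicationException); the exact shape in the original diagram may differ, and FlattenDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Builds the assumed hierarchy: the outer exception directly contains an
    // AggregateException (with three inner exceptions) and an ApplicationException.
    public static AggregateException BuildNested()
    {
        var inner = new AggregateException(
            new NullReferenceException(),
            new ArgumentException(),
            new DivideByZeroException());

        return new AggregateException(inner, new ApplicationException());
    }

    // Lists the type names of the direct inner exceptions only.
    public static List<string> TypeNames(AggregateException aex)
    {
        return aex.InnerExceptions.Select(ex => ex.GetType().Name).ToList();
    }
}
```

Calling FlattenDemo.TypeNames(nested) on the result of BuildNested() lists two entries, one of which is the inner AggregateException. Calling it on nested.Flatten() lists the four leaf exceptions with no AggregateException among them.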

But it’s broken, why doesn’t it just stop?

Well, it does. Once a thread has thrown an exception that bubbles up and out then no new tasks are started, so no new threads are created, and no new work gets done. However, remember that there will be other threads running as well and if one breaks, maybe others will break too, or maybe they will complete successfully. We won’t know unless they are allowed to finish what they are doing.

Going back to the room availability example, if the input hotel codes contain invalid codes then it will throw an exception that is not caught within the thread. What if a selection of good and bad hotel codes is passed:

1, 2, 3, Z123, 4, 5, 6, 1234567890ABC, 7, 8, 9

Of the above list “Z123” and “1234567890ABC” are both invalid and produce different exceptions. However, when running tests the AggregateException only contains one of the exceptions.

To show what happens, I’ve modified my “service” like this and run it through a console application. Here’s the full code:

The service class

public class AvailService
{
    // ...

    public HotelAvail GetAvailability(string hotelCode, DateTime startDate, int nights)
    {
        Console.WriteLine("Start @ {0:HH-mm-ss.fff}: {1}", DateTime.Now, hotelCode);

        ValidateInput(hotelCode);

        // ... do stuff to process the request ...

        Console.WriteLine("  End @ {0:HH-mm-ss.fff}: {1}", DateTime.Now, hotelCode);
        return result;
    }

    private void ValidateInput(string hotelCode)
    {
        if (hotelCode == null)
        {
            Console.WriteLine("Error @ {0:HH-mm-ss.fff}: hotelCode is null", DateTime.Now);
            throw new ArgumentNullException("hotelCode");
        }

        if ((hotelCode.Length > 10) || (hotelCode.Length == 0))
        {
            Console.WriteLine("Error @ {0:HH-mm-ss.fff}: hotelCode is {1}", DateTime.Now, hotelCode);
            throw new ArgumentOutOfRangeException(
                "hotelCode", // parameter name first, then the message
                "Hotel Codes are 1 to 10 chars in length. Got code which was " +
                hotelCode.Length + " chars.");
        }

        if (hotelCode.StartsWith("Z"))
        {
            Console.WriteLine("Error @ {0:HH-mm-ss.fff}: hotelCode is {1}", DateTime.Now, hotelCode);
            throw new AvailabilityException("Hotel code '" + hotelCode +
                                            "' does not exist");
        }
    }
}

The method on the controller class

public IEnumerable<HotelAvail> GetAvailability(IEnumerable<string> codes,
        DateTime startDate, int numNights)
{
    return codes.AsParallel().Select(code =>
        new AvailService().GetAvailability(code, startDate, numNights))
        .ToList();
}

The Main method on the Program class

static void Main(string[] args)
{
    string[] codes = "1,2,3,Z123,4,5,6,1234567890ABC,7,8,9".Split(',');
    AvailController ctrl = new AvailController();

    DateTime start = DateTime.Now;
    try
    {
        var result = ctrl.GetAvailability(codes,
            DateTime.Today.AddDays(7.0), 2);
    }
    catch (AggregateException aex)
    {
        Console.WriteLine(aex.Message);

        foreach (Exception ex in aex.InnerExceptions)
            Console.WriteLine(" -- {0}", ex.Message);

    }
    finally
    {
        DateTime end = DateTime.Now;
        Console.WriteLine("Total time in ms: {0}",
                            (end - start).TotalMilliseconds);

    }
}

And the console output is:

Start @ 16-36-36.518: 7
Start @ 16-36-36.518: Z123
Start @ 16-36-36.518: 6
Start @ 16-36-36.518: 1
Error @ 16-36-36.526: hotelCode is Z123
  End @ 16-36-42.438: 1
  End @ 16-36-42.654: 6
  End @ 16-36-42.900: 7
One or more errors occurred.
 -- Hotel code 'Z123' does not exist
Total time in ms: 6400

As you can see, only 4 items got started out of an initial input collection of 11 items. The error occurred 8ms after these items started. Those items that did not cause an error were allowed to continue to completion. The result variable in the Main method is never assigned because of the exception, so we never get the results of the three items that did succeed.

Naturally, the best course of action is not to let the exception bubble up and out of the thread in which the code is executing.
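
One possible way to do that, sketched under assumptions rather than taken from the project, is to catch the exception inside the lambda and return an object that carries either the result or the error for each input. Outcome and SafeParallel are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Holds either the value produced for one input or the exception it caused.
public class Outcome<TInput, TResult>
{
    public TInput Input { get; set; }
    public TResult Value { get; set; }
    public Exception Error { get; set; }
    public bool Succeeded { get { return Error == null; } }
}

public static class SafeParallel
{
    // Runs work for every input in parallel and never lets an exception
    // escape the worker thread. The order of the results is not guaranteed.
    public static List<Outcome<TInput, TResult>> SelectSafely<TInput, TResult>(
        IEnumerable<TInput> inputs, Func<TInput, TResult> work)
    {
        return inputs.AsParallel().Select(input =>
        {
            try
            {
                return new Outcome<TInput, TResult> { Input = input, Value = work(input) };
            }
            catch (Exception ex)
            {
                // Caught on the thread that threw, so the query keeps going.
                return new Outcome<TInput, TResult> { Input = input, Error = ex };
            }
        }).ToList();
    }
}
```

In the availability example this would be called as SafeParallel.SelectSafely(codes, code => new AvailService().GetAvailability(code, startDate, numNights)). Every code is then attempted, and the caller can separate the successes from the failures afterwards.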

* Note, there appears to be a bug in Visual Studio with the Exception Assistant not always highlighting the correct line of code.

Tip of the Day #21: Prefer the use of first-child CSS selector over last-child

I just got this fantastic tip from Jamie Boyd, a colleague of mine:

The :first-child and :last-child selectors are super-useful for applying alternate styling to items in lists and things like that (e.g. removing the margin from the last item in a container-spanning nav bar). But when it comes to browser support, they are not equal.

:last-child is actually only supported in IE9+, whereas :first-child has had partial support since IE7 (where it works, but styles won’t update if dynamic content is added).

So if you can, use :first-child rather than :last-child.

Tip of the day #20: Don't spam your own email while developing apps that send email

When we develop applications, often there will be a requirement for that application to send out emails. While this is going on we usually end up with lots of emails being sent to our own email address for test purposes.

I got this fantastic tip from a colleague of mine, Andy Gibson, so here it is:

If you want to test the email an application sends out without spamming your inbox you can modify your web.config with the following code so that it will save the emails to your machine as flat files rather than sending them through the SMTP client. If you combine this with the ASP.NET 4 configuration transforms (Web.Release.config, Web.Debug.config, etc.) then this becomes even niftier.

<system.net>
  <mailSettings>
    <smtp deliveryMethod="SpecifiedPickupDirectory">
      <specifiedPickupDirectory pickupDirectoryLocation="D:\Email"/>
      <network host="localhost"/> <!-- Required for .NET 4.0! -->
    </smtp>
  </mailSettings>
</system.net>

There is an added benefit to this: .NET saves it as a .eml file, so your default mail client (in my case Outlook 2007) will open it on double click, or if you need to see the raw email including headers, you can open it in Notepad.
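
The same delivery method can also be set in code rather than in web.config, which can be handy in a test harness. This is a minimal sketch, assuming you control how the SmtpClient is created; PickupDirectoryMailer and SaveToDirectory are hypothetical names.

```csharp
using System.IO;
using System.Net.Mail;

public static class PickupDirectoryMailer
{
    // Writes the message as an .eml file into pickupDirectory instead of
    // handing it to an SMTP server.
    public static void SaveToDirectory(MailMessage message, string pickupDirectory)
    {
        // The pickup folder must already exist when Send is called.
        Directory.CreateDirectory(pickupDirectory);

        // A host is supplied to mirror the network element in the config above.
        using (var client = new SmtpClient("localhost"))
        {
            client.DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory;
            client.PickupDirectoryLocation = pickupDirectory; // must be an absolute path
            client.Send(message);
        }
    }
}
```

The config approach is still the better fit when the application code should stay unaware of whether it is running in development or production.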

Parallelisation in .NET 4.0 – Part 1 looping

In an upcoming project we have a need for some parallelisation features. There are two aspects to this: we have to make multiple calls out to a web service that can take some time to return, and in the meantime we have to get data out of the CMS to match up with the data coming back from the web service.

I’ll be writing a series (just for the irony of it) of posts on these new features in .NET and how we will be implementing them.

The problem

We have a web service that we have to call to get data back to our system. Calls to the web service take in the region of 4 to 7 seconds each to return data to us. The performance of the web service does not degrade significantly if we make multiple calls to it.

The Solution (first attempt)

Since the calls to the web service are not altering state we can safely make those calls in parallel. There is a class called AvailService that calls the web service and gets the results back to us. If you’re interested, the web service checks on the availability of rooms in a hotel based on the hotel code you pass and the stay date range (expressed as a start date and number of nights).

What we could have done in a serial implementation is this:

public IEnumerable<HotelAvail> GetAvailabilitySerial(
    IEnumerable<string> codes, DateTime startDate, int numNights)
{
    List<HotelAvail> result = new List<HotelAvail>();

    foreach (string code in codes)
    {
        AvailService service = new AvailService();
        HotelAvail singleResult = service.
                GetAvailability(code, startDate, numNights);
        result.Add(singleResult);
    }

    return result;
}

This simply goes around each item and gets the availability of rooms in that hotel for the stay date range. It could be refactored into a LINQ expression, but I’m going to leave it as a fuller loop just to show more clearly what’s going on.
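For reference, the LINQ refactoring mentioned above might look something like the sketch below. The real HotelAvail and AvailService classes are not shown in this post, so the versions here are hypothetical stand-ins (with an assumed Code property) that only exist to make the sample compile and run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real classes, which call a web service.
public class HotelAvail
{
    public string Code { get; set; }
}

public class AvailService
{
    public HotelAvail GetAvailability(
        string code, DateTime startDate, int numNights)
    {
        return new HotelAvail { Code = code };
    }
}

public static class SerialLinqDemo
{
    public static IEnumerable<HotelAvail> GetAvailabilitySerialLinq(
        IEnumerable<string> codes, DateTime startDate, int numNights)
    {
        // ToList() forces the calls to run here rather than lazily
        // when the caller enumerates the result.
        return codes
            .Select(code => new AvailService()
                .GetAvailability(code, startDate, numNights))
            .ToList();
    }

    public static void Main()
    {
        IEnumerable<HotelAvail> results = GetAvailabilitySerialLinq(
            new[] { "EDI", "GLA" }, new DateTime(2010, 6, 1), 2);
        foreach (HotelAvail avail in results)
        {
            Console.WriteLine(avail.Code);
        }
    }
}
```

This is still a serial implementation: each call completes before the next one starts, and the results come back in input order.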

Using Parallel.For

When we change this to a parallel loop, not much changes. I’ve replaced the list with an array that is set up with the correct size for the result set, and each parallel iteration only interacts with one slot in the array.

public IEnumerable<HotelAvail> GetAvailability (
    IList<string> codes, DateTime startDate, int numNights)
{
    HotelAvail[] result = new HotelAvail[codes.Count];

    Parallel.For(0, codes.Count, i =>
        {
            string code = codes[i];
            result[i] = new AvailService().
                GetAvailability(
                    code, startDate, numNights);
        });

    return result;
}

The code in the lambda expression is the part that is parallelised. I’ve had to make some concessions here as well. The IEnumerable of codes is now an IList, because I need to be able to access specific indexes in the list in order to align it with the array. That way there is no accidental overwriting of elements in the result set.

If the ordering of the output is important (e.g. it must be in the same order as the input) then this will maintain that ordering. Many of the remaining solutions do not maintain the order.

However, there is a better way to do this that doesn’t involve setting up and maintaining structures in this way and is closer to our serial code.

Using a Parallel.ForEach and a ConcurrentBag

public IEnumerable<HotelAvail> GetAvailabilityConcurrentCollection(
    IEnumerable<string> codes,
    DateTime startDate, int numNights)
{
    ConcurrentBag<HotelAvail> result = new ConcurrentBag<HotelAvail>();

    Parallel.ForEach(codes, code => result.Add(
        new AvailService().
            GetAvailability(code, startDate, numNights)));

    return result;
}

The content of the Parallel.ForEach here is almost identical to the serial foreach version, except that the code is slightly more terse. However, this time I’m using a ConcurrentBag for the result collection. As the method has always returned an IEnumerable this will not change the return type of the method, meaning that a serial method can be made parallel (assuming other considerations necessary for parallelism are taken into account) fairly easily.

The order of the results may be quite different from the order of the input.
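If you need to match each result back to the hotel code that produced it, one option (not the approach used in this post) is to collect the results in a ConcurrentDictionary keyed by code instead of a ConcurrentBag. This is a rough sketch; HotelAvail and AvailService are hypothetical stand-ins for the real classes, and the method name is made up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real classes, which call a web service.
public class HotelAvail
{
    public string Code { get; set; }
}

public class AvailService
{
    public HotelAvail GetAvailability(
        string code, DateTime startDate, int numNights)
    {
        return new HotelAvail { Code = code };
    }
}

public static class KeyedResultsDemo
{
    public static IDictionary<string, HotelAvail> GetAvailabilityByCode(
        IEnumerable<string> codes, DateTime startDate, int numNights)
    {
        var result = new ConcurrentDictionary<string, HotelAvail>();

        Parallel.ForEach(codes, code =>
        {
            // The indexer setter is thread-safe; a duplicate code would
            // simply overwrite the earlier entry.
            result[code] = new AvailService()
                .GetAvailability(code, startDate, numNights);
        });

        return result;
    }

    public static void Main()
    {
        IDictionary<string, HotelAvail> byCode = GetAvailabilityByCode(
            new[] { "EDI", "GLA", "ABZ" }, new DateTime(2010, 6, 1), 2);
        Console.WriteLine(byCode["GLA"].Code);
    }
}
```

The trade-off is that the return type is no longer a plain IEnumerable of HotelAvail, so callers of the serial version would need to change.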

Using PLINQ

Finally, I’ve refactored the code using PLINQ. This example is really quite terse, but if you understand LINQ then it should be very easy to pick up.

public IEnumerable<HotelAvail> GetAvailabilityPlinq(
    IEnumerable<string> codes,
    DateTime startDate, int numNights)
{

    return codes.AsParallel().Select(code =>
        new AvailService().GetAvailability(code, startDate, numNights))
        .ToList();
}

Again, the order of the results may be quite different from the order of the input. If ordering is important to you, you can instruct PLINQ to maintain it by using .AsOrdered(). This ensures that the output from the PLINQ expression is in the same order as the input.

For example:

public IEnumerable<HotelAvail> GetAvailabilityPlinq(
    IEnumerable<string> codes,
    DateTime startDate, int numNights)
{

    return codes.AsParallel().AsOrdered().Select(code =>
        new AvailService().GetAvailability(code, startDate, numNights))
        .ToList();
}

The important part to all this is the call to the service, which has remained the same throughout all the samples. It is really just the infrastructure around that call that has changed over these examples.

The results

This table and graph show the results of some tests I ran between the serial and parallel versions. The numbers across the top row and X-axis represent the number of calls to the service. The numbers in the table and Y-Axis represent the time (in seconds) to complete the calls.

 

Calls          1      2      3      4      5      6      7      8      9
Serial      5.52  11.22  16.72  22.60  28.59  34.32  40.32  45.67  51.64
Parallel    5.52   6.20   6.35   6.45  11.24  12.35  12.45  12.86  16.66

Parallel vs Serial processing
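To make the comparison easier to read, the speed-up for each column can be worked out as serial time divided by parallel time. The snippet below is just a small sketch that does that arithmetic on the figures from the table; the class and member names are made up.

```csharp
using System;
using System.Linq;

public static class SpeedupDemo
{
    // Timings in seconds for 1 to 9 calls, copied from the table above.
    public static readonly double[] SerialSeconds =
        { 5.52, 11.22, 16.72, 22.60, 28.59, 34.32, 40.32, 45.67, 51.64 };
    public static readonly double[] ParallelSeconds =
        { 5.52, 6.20, 6.35, 6.45, 11.24, 12.35, 12.45, 12.86, 16.66 };

    // Speed-up per column: serial time divided by parallel time.
    public static double[] Speedups()
    {
        return SerialSeconds
            .Zip(ParallelSeconds, (serial, parallel) => serial / parallel)
            .ToArray();
    }

    public static void Main()
    {
        double[] speedups = Speedups();
        for (int i = 0; i < speedups.Length; i++)
        {
            Console.WriteLine("{0} call(s): {1:F2}x", i + 1, speedups[i]);
        }
    }
}
```

Note that the parallel timings jump after the fourth and again after the eighth call. That would be consistent with roughly four calls running at once on the test machine, but that is an inference from the numbers rather than something I measured.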

 

Finally, a warning about using parallelised code from Alex Mackey’s book Introducing .NET 4.0:

“Although parallelization enhancements make writing code to run in parallel much easier, don’t underestimate the increase in complexity that parallelizing an application can bring. Parallelization shares many of the same issues you might have experienced when creating multithreaded applications. You must take care when developing parallel applications to isolate code that can be parallelized.”

Tesco, Your car wash sucks

Earlier today I was at Tesco to refuel my car and I noticed that it was a bit overdue for a wash, so when I paid for my fuel I also purchased a voucher for the car wash. It was the premium super-duper all singing all dancing wash for £6.

When I drove round to where the car wash was there was a queue of three people in front of me so I had to wait. There was a chap at the jet wash too and I noticed that he was much slower. By the time my turn came around he was still there washing his car. In fact, by the time I was finished he was still washing his car; he must have put much more money in that machine than I did for the car wash. In hindsight, I think he made the better decision. Why?

When I got home, I went to open the boot to retrieve my shopping and I noticed that the back of the car was still dirty. Sure, bits of it were clean, and all of it was still wet, but it was obvious that the brushes on the rollers don’t clean very well. Or maybe they don’t clean cars with near-vertical rears (like the Toyota Yaris) very well. I could still wipe my finger through the dirt. And here’s a picture just to show you. (You can click the image to see it full size and you can see my finger mark in the remaining dirt.)

All I can say is that I’ll not be back to Tesco to use their car wash again. If I do find myself there, I may just use the Jet Wash like the other chap did. That seemed the more sensible solution.