Software Development

But why is the iterator operating in multiple-threads

Background

Recently, I had a bit of a problem with NHibernate when I was converting some code into parallel tasks. (If you have no interest in NHibernate, then don’t worry – it is just background to the issue I was having when I spotted this gulf between my expectation and reality. NHibernate is incidental to this and I won’t mention it much beyond this paragraph.) It turns out that Parallel.ForEach runs the iterator in multiple threads, not just the the action it performs on each item received from the iterator. NHibernate, being the source of the data was running inside the iterator and when I attached NHibernate Profiler to see what it could turn up it very quickly began reporting that the NHibernate session was running in multiple-threads and that NHibernate was not designed to be thread safe.

The Iterator Patten in .NET

In .NET the iterator pattern is exposed via an IEnumerator or IEnumerator<T> and there is some syntactic sugar so that you can create an iterator method using yield return. There is also syntactic sugar surrounding the consumption of iterators via foreach. This almost completely hides the complexities of IEnumerator implementations.

There are some limitations to this. The interface is inherently not thread safe as it does not provide for an atomic operation that retrieves an element and moves the internal pointer on to the next. You have to call MoveNext() followed by Current if it returned true. If the iterator needs thread-safety, it is the responsibility of the caller to provide it.

But, then this happens…

Knowing this, I would have assumed (always a bad idea, but I’m only human) that Parallel.ForEach() operates over the iterator in a single thread, but farms out each loop to different threads, but I was wrong. Try the following code for yourself and see what happens:

public class Program
{
    public static void Main(string[] args)
    {
        Parallel.ForEach(
            YieldedNumbers(),
            (n) => { Thread.Sleep(n); });
        Console.WriteLine("Done!");
        Console.ReadLine();
    }

    public static IEnumerable<int> YieldedNumbers()
    {
        Random rnd = new Random();
        int lastKnownThread = Thread.CurrentThread.ManagedThreadId;
        int detectedSwitches = 0;
        for (int i = 0; i < 1000; i++)
        {
            int currentThread = Thread.CurrentThread.ManagedThreadId;
            if (lastKnownThread != currentThread)
            {
                detectedSwitches++;
                Console.WriteLine(
                    $"{detectedSwitches}: Last known thread ({lastKnownThread}) is not the same as the current thread ({currentThread}).");
                lastKnownThread = currentThread;
            }
            yield return rnd.Next(10,150);
        }
    }
}

The Action<int> passed to the Parallel.ForEach simply simulates some work being done (and the times sent to the Thread.Sleep() are roughly analogous to the times of the tasks in the original project).

What I’ve done here also is detect when the thread changes and report that to the console. It happens roughly 15%-18% of the time on the runs I’ve made on my machine. Now that was surprising (not really, because NHibernate Profiler had already told me – but to have a very clean example of the same was). I can’t blame any weirdness in third party libraries. It happens with some very basic .NET code in a console application.

Possible Solutions

1. My first thought was to dump all the data retrieved from the iterator into a collection of some sort (e.g. an array or list), but the iterator was originally put in place because the volume of data was causing memory pressure. The app ran overnight and will process anything between a few hundred to a few hundred thousand customers and testing found that it significantly slowed down around the 7000 mark because of the size of the data, and fell over completely not far past that. So, the iterator that I created hides the fact that I now page the data, the calling code knows nothing about this paging and didn’t have to be modified. So that solution was out of the question, we’d be back to the problem we had a while ago.

2.The data could be processed in batches and each fully retrieved batch be run in parallel one at at time. I did try that but it just made the calling code difficult to read and more complex than it needed to be. The reader has to be able to understand why there are batches, and the person writing the code has to remember that the data may not fit an exact number of batches and will have to process the final batch outside the loop which adds to the cognitive load on the reader/maintainer.

public static void Main(string[] args)
{
    int batchSize = 97;
    List batch = new List<int>();
    foreach (int item in YieldedNumbers())
    {
        batch.Add(item);
        if (batch.Count >= batchSize)
            ProcessBatch(batch);
    }
    ProcessBatch(batch);

    Console.WriteLine("Done!");
    Console.ReadLine();
}

private static int batchCount = 0;
private static void ProcessBatch(List<int> batch)
{
    batchCount ++;
    Console.WriteLine($"Processing batch {batchCount} containing {batch.Count} items");
    Parallel.ForEach(batch, (n) => { Thread.Sleep(n); });
    batch.Clear();
}

// The YieldedNumbers() method is unchanged from before.

The iterator is always called from a single thread and therefore never complains on this set up.

3. Use the Microsoft Data Flow for the Task Parallel library. Personally, I think this one is best because the pattern is clear and the complex bits can be moved away from the main algorithm. The only part I didn’t like was the effort to set up the Producer/Consumer pattern using this library, but it handles all the bits I want to abstract away quite nicely… And that set up can be abstracted out later. Here’s the basic algorithm.

public static void Main(string[] args)
{
    var producerOptions = new DataflowBlockOptions { BoundedCapacity = 97 };
    var buffer = new BufferBlock<int>(producerOptions);
    var consumerOptions = new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = Environment.ProcessorCount,
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
    var consumer = new ActionBlock<int>( n=> {  Thread.Sleep(n); }, consumerOptions);
    buffer.LinkTo(consumer, linkOptions);
    Produce(buffer);
    Task.WaitAll(consumer.Completion);

    Console.WriteLine("Done!");
    Console.ReadLine();
}

private static void Produce(ITargetBlock target)
{
    foreach (var n in YieldedNumbers())
    {
        // Normally, this will return immediately, but if the queue has
        // reached its limit then it will wait until the consumer has
        // processed items on the queue.
        Task.WaitAll(target.SendAsync(n));
    }
    // Set the target to the completed state to signal to the consumer
    // that no more data will be available.
    target.Complete();
}

I originally had the the Produce() method as an async/await method… But that didn’t work, it seems that doing that the iterator shifts around threads again because when the code wakes up after the await it may be restarted on a new thread. So I put it back to a simple Task.WaitAll() and it kept it all on the same thread.

The producer options are set so that the queue size is limited, it stops pulling from the producer if the queue reaches capacity and thus it keeps the app running smoothly. The producer won’t over produce.

The consumer options need to be set explicitly otherwise it acts on a single thread. Unlike other things in the TPL it won’t necessarily optimise for the number of cores you have, you have to specify that, and a crude rule of thumb for getting that number is Environment.ProcessorCount (crude, because if you have hyper threading it can treat that as being multiple processor cores). However, it is good enough unless you really need to optimise things accurately.

Now, a lot of this can be abstracted away so that the calling code can just get on with what it needs without the distractions that this pattern introduces.

Most of this code can be extracted out to a class that extends IEnumerable<T>

public static class IEnumerableExtensions
{
    public static void ConsumeInParallel<T>(this IEnumerable<T> source, Action<T> action, int queueLimit = int.MaxValue)
    {
        var producerOptions = new DataflowBlockOptions { BoundedCapacity = queueLimit };
        var buffer = new BufferBlock<T>(producerOptions);
        var consumerOptions = new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = Environment.ProcessorCount,
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        var consumer = new ActionBlock<T>(action, consumerOptions);
        buffer.LinkTo(consumer, linkOptions);
        Produce(source, buffer);
        Task.WaitAll(consumer.Completion);
    }

    private static void Produce<T>(IEnumerable<T> source, ITargetBlock<T> target)
    {
        foreach (var n in source)
            Task.WaitAll(target.SendAsync(n));
        target.Complete();
    }
}

With this, we can use any IEnumerator<T> as a source of data and it will happily process it. The queueLimit ensures that we don’t end up with too much data waiting to be processed as we don’t want memory pressures causing the app to become unstable.

The calling code now looks much neater:

public static void Main(string[] args)
{
    YieldedNumbers().ConsumeInParallel(n=> {Thread.Sleep(n);}, 97);

    Console.WriteLine("Done!");
    Console.ReadLine();
}
Software Development

Overusing the Null-Conditional Operator

The null-conditional operator is the the ?. between object and field/property/method. It simply says that if the thing on the left hand side is null then the thing on the right hand side is not evaluated. It is a shorthand, so:

if (a != null)
{
    a.DoSomething();
}

becomes

a?.DoSomething();

And that’s great. It makes life much simpler, and if you’re using ReSharper it will alert you when you could use this operator over a null guard check.

But, and this is quite a big “but”, I have noticed a trend to overuse it.

I’ve seen people do crazy stuff like replace most (almost all) instances of the dot-operator so their code is littered with these question-marks before the dot.

var result = myObject?.GetSomething()?.SomeValue?.ToString()?.Split()?.Where(s=>s?.Length > 0);

And when you get to that level of lunacy you’re basically turning on the old Visual Basic OnError Resume Next head-in-the-sand error handling anti-pattern.

I want to make this absolutely clear. The null-conditional operator is not bad, per se. However, over using it or using it without thought to the logic of your application is bad as it hides potential bugs.

You should only use it when you would normally do a null check in advance. If ReSharper says you it can refactor your code to use it, then it most likely is fine to use (you were probably using a longer construct for the same thing already which implies you’ve most likely thought about it – No on likes writing reams of code for no good reason).

Software Development

Application Configuration in .NET Core – Part 2

In the first part we got started by pulling in configuration data from multiple source in .NET Core, in this part we’ll look at mapping the configuration onto a set of classes so that is becomes easier to access.

This is something that is built into ASP.NET Core through dependency injection, so we’ll look at that.

In ASP.NET Core applications, the configuration is setup in the Startup class normally. It is added in the ConfigureServices method. There are two lines you need to add:

public void ConfigureServices(IServiceCollection services)
{
  services.AddOptions();
  services.Configure(GetConfiguration());
  // Other services are configured here.
}

In order to get them to compile you need an additional NuGet package called Microsoft.Extensions.Options.ConfigurationExtensions. This ensures that everything you need to have configuration converted to the type you specify is set up and that it can be dependency injected into your code.

The GetConfiguration() call is to get the generated IConfigurationRoot and looks similar to the way we set up the configuration int the previous post. For this example it looks like this:

private IConfigurationRoot GetConfiguration()
{
  var builder = new ConfigurationBuilder();

  builder.AddInMemoryCollection(new Dictionary<string, string>
  {
    { "InMemory", "This value comes from the in-memory collection" }
  });

  builder.AddEnvironmentVariables();
  builder.AddJsonFile("appSettings.json");

  return builder.Build();
}

I’ve added configuration from the environment variables as well.

As the app is configured to map the configuration settings to an object structure, here’s what it looks like:

public class MyConfiguration
{
  public string UserName { get; set; } // From the environment variable of the same name
  public string InMemory { get; set; } // From the in-memory collection
  public string RootItem { get; set; } // from appSettings.json
  public FavouriteStuff Favourites { get; set; } // from appSettings.json
  public string[] Fruits { get; set; } // from appSettings.json
}

public class FavouriteStuff
{
  public string TvShow { get; set; }
  public string Movie { get; set; }
  public string Food { get; set; }
  public string Drink { get; set; }
}

As you can see the structure can be deep if necessary which gives you quite a lot of flexibility.

The appSettings.json file looks like this:

{
  "RootItem": "This is at the root",
  "Favourites": {
    "TvShow": "Star Trek: The Next Generation",
    "Movie": "First Contact",
    "Food": "Haggis",
    "Drink":  "Cream Soda"  
  },
  "Fruits": [
    "Apples",
    "Oranges",
    "Pears",
    "Cherries",
    "Bananas",
    "Strawberries",
    "Raspberries"
  ]           
}

Now, the controller that needs the configuration information looks like this:

public class HomeController : Controller
{
    private readonly IOptions<MyConfiguration> _config;

    public HomeController(IOptions<MyConfiguration> config)
    {
        _config = config;
    }
    public IActionResult Index()
    {
        JsonSerializerSettings settings = new JsonSerializerSettings();
        settings.Formatting = Formatting.Indented;
        return new JsonResult(_config.Value, settings);
    }
}

All this does is take the `MyConfiguration` object created by the framework on our behalf and render it as JSON to the browser. The key parts are that the constructor takes an IOptions<MyConfiguration> reference, which you store in the controller and you can then access as needed in any of the methods of the controller.

Finally, the output in the browser looks as follows.

{
  "UserName": "colin.mackay",
  "InMemory": "This value comes from the in-memory collection",
  "RootItem": "This is at the root",
  "Favourites": {
    "TvShow": "Star Trek: The Next Generation",
    "Movie": "First Contact",
    "Food": "Haggis",
    "Drink": "Cream Soda"
  },
  "Fruits": [
    "Apples",
    "Oranges",
    "Pears",
    "Cherries",
    "Bananas",
    "Strawberries",
    "Raspberries"
  ]
}

It looks very similar to the appSettings.json file, but you can see that it has, in addition, the “UserName” and “InMemory” elements which don’t appear in that file.

Software Development

Application configuration in .NET Core – Part 1

.NET Core has a new way of working with configuration that is much more flexible than the way that previous versions of .NET have.

It allows you to:

  1. Pull configuration from multiple sources and bring it in to one place.
  2. Easily map that configuration information into classes to make access easier.
  3. Override configuration from previous sources so that you can import a base configuration then override settings on per-environment basis.

This post will be concerned with the first of these: Pulling configuration from multiple sources and bringing it together in to one place. We’ll discuss the second and third aspect in future posts.

Getting Started

To use it you need to add the Microsoft.Extensions.Configuration NuGet package to your application.

Microsoft.Extensions.Configuration 1.0.0 NuGet package
Microsoft.Extensions.Configuration NuGet package

Once you’ve imported the package your project.json will contain:

  "dependencies": {
    "Microsoft.Extensions.Configuration": "1.0.0",
    .... Other dependencies here ....
  }

From the basic configuration package you don’t really get much in the way of configuration sources, only the in-memory one is available. However, that’s just enough to show you the basic set up of the configuration in an application.

public class Program
{
    public static void Main(string[] args)
    {
        // Defines the sources of configuration information for the 
        // application.
        var builder = new ConfigurationBuilder()
            .AddInMemoryCollection(new []
            {
                new KeyValuePair<string, string>("the-key", "the-value"),
            });

        // Create the configuration object that the application will
        // use to retrieve configuration information.
        var configuration = builder.Build();

        // Retrieve the configuration information.
        var configValue = configuration["the-key"];
        Console.WriteLine($"The value for 'the-key' is '{configValue}'");

        Console.ReadLine();
    }
}

The builder is the thing that allows you to set up the sources of configuration information. Each provider adds extension methods so you can add them easily to the builder. The InMemoryCollection simply takes an IEnumerable of KeyValuePairs to initialise its values.

Once you have set up your configuration sources you can build all that into an actual object you can use in your application, by calling Build() on the builder object. From here on you can access configuration values with indexer notation.

Adding a JSON File Source

So far, what we have isn’t very useful. We need to pull configuration information from outside the application such as a JSON file. To do that, we need to add another NuGet package. This one provides a JSON provider and is called Microsoft.Extensions.Configuration.Json.

Microsoft.Extensions.Configuration.Json NuGet package
Microsoft.Extensions.Configuration.Json NuGet package

We can now extend the simple application above by adding an appsettings.json file and adding in the code to build it.

var builder = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json")
    .AddInMemoryCollection(new []
    {
        new KeyValuePair("the-key", "the-value"),
    });

And the appsettings.json looks like this:

{
  "my-other-key": "my-other-value" 
}

And the value is retrieved like any other:

configValue = configuration["my-other-key"];
Console.WriteLine($"The value for 'my-other-key' is '{configValue}'");

However, while this looks like it should work, it won’t. When you added a settings file previously, Visual Studio would mark it for copying to the output folder so that the running application could find it. However, it doesn’t do that with .NET Core (yet – I do hope they add it).

Instead you get a FileNotFoundException, like this:

Exception Assistant showing a File Not Found Exception
An unhandled exception of type ‘System.IO.FileNotFoundException’ occurred in Microsoft.Extensions.Configuration.FileExtensions.dll Additional information: The configuration file ‘appsettings.json’ was not found and is not optional.

To get the appsettings.json file added to the output folder you are going to have to modify the project.json file.

In the buildOptions section add copyToOutput with the name of the file. If there is more than one file you can put in an array of files rather than just the one. The top of the project.json file now looks like this:

{
  "version": "1.0.0-*",
  "buildOptions": {
    "emitEntryPoint": true,
    "copyToOutput": "appsettings.json"
  },
  .... The rest of the file goes here ....

The next time the project is run it will copy the appsettings.json file and you won’t get an exception to say that the file was not found.

Software Development

Setting up Ubuntu for .NET Development

First up, at the time of writing only Ubuntu 14.04LTS is supported. I’ve read that it will work on 15.04, but I know it won’t work on 15.10 because of a binary incompatibility on a library that .net core relies on.

Step 1: Install the .NET Execution Environment

Follow the instructions at https://docs.asp.net/en/latest/getting-started/installing-on-linux.htm

This will install the .NET Execution Environment (DNX)

Step 2: Install Node.js

Since .NET Core relies on node js for parts, and there are some cool code generators using node.js as the templating engine, install node.js by following the instructions here: https://nodejs.org/en/download/package-manager/#debian-and-ubuntu-based-linux-distributions

I used version 4.x LTS (4.4.1 to be exact)

Step 3: Install Visual Studio Code

This is actually optional – I’m installing it because I wanted to get the standard IDE for C#. You can get away with running just the regular text editor installed with Ubuntu.

First, download Visual Studio Code. Then follow the setup instructions…. Kind of.

Unzipped the zip file to /usr/local/bin with

sudo unzip ~/Downloads/VSCode-linux-x64-stable.zip

Then I created the link as in the instructions so that I can launch from the terminal.

To launch from the terminal and get the prompt back use

code &

Step 4: Install Yeoman

Before you do, you’ll need up to update NPM as the version that comes with 4.x LTS is older and the current version of Yeomen doesn’t like it.

sudo npm install -g npm

Install yoman by following the instructions here: https://github.com/omnisharp/generator-aspnet#generator-aspnet

Remember to put sudo in front of install commands specifying -g (global) otherwise you’ll get an error message.

Step 5: Create a project

Move to a directory that you want to create a new project in. I use ~/dev for all my development work.

Then start Yeoman with:

yo aspnet

This will result in a prompt that looks like this:

     _-----_
    |       |    .--------------------------.
    |--(o)--|    |      Welcome to the      |
   `---------´   |   marvellous ASP.NET 5   |
    ( _´U`_ )    |        generator!        |
    /___A___\    '--------------------------'
     |  ~  |     
   __'.___.'__   
 ´   `  |° ´ Y ` 

? What type of application do you want to create? (Use arrow keys)
❯ Empty Application 
  Console Application 
  Web Application 
  Web Application Basic [without Membership and Authorization] 
  Web API Application 
  Nancy ASP.NET Application 
  Class Library 
  Unit test project 

You can then use the arrow keys to move up and down the list.

Choose “Web Application Basic”

It will then prompt for a name. I chose “MyHelloWorldApp”

It will create that directory and populate it with files for the project. You’ll still need to restore the packages that you need, and yeomen gives you some help on getting that done.

If you follow the yeomen instructions you’ll find that at the dnu build step it fails. This is because the project template has dual targeting. It targets .NET 4.5.1 and .NET Core. On Linux only .NET Core will run. To remove the dual targetting open the project.json file and find the section that looks like this:

  "frameworks": {
    "dnx451": {},
    "dnxcore50": {}
  },

And remove the entry for "dnx451" then save the file.

dnu build won’t work just yet. If you try it you’ll get an error message:

/home/colin/dev/MyHelloWorldApp/project.lock.json(1,0): error NU1006: Dependencies in project.json were modified. Please run "dnu restore" to generate a new lock file.

Build failed.
    0 Warning(s)
    1 Error(s)

So, run dnu restore once again so that the dependencies synchronised with the project.

Once that’s done type dnu build and it will now succeed.

You now have a basic environment set up on Linux for developing .NET Core applications and have demonstrated that you can create and build a simple ASP.NET Core application.

Software Development

Using GitHub Two Factor Authentication (2FA) with TeamCity

If you have two factor authentication (2FA) set up in GitHub and you also want to use TeamCity, the easiest way to set this up is to set up SSH keys to access the GitHub repository.

The first step is to follow this guide to creating SSH keys for GitHub. Remember the passphrase you use when creating the key, you’ll need it later.

Once you have created your keys and applied it to your GitHub account you can then follow this guide for managing SSH keys in TeamCity.

Finally, when setting up your VCS Root in Team City you set the Fetch URL to the SSH variant. You can find this on your project page on Github towards the bottom of the right sidebar.

You may need to click the “SSH” link below the URL if it does not already show the SSH URL.

Back in Team City you can paste this URL in the Fetch URL box in the general settings. Further down the form in the Authentication Settings section you can specify the SSH key you uploaded earlier.

By specifying “Uploaded Key” the boxes below will change. Select the key you uploaded earlier, the user name is “git”, and enter the passphase you used when you created the SSH key.

You should now be able to test the connection to see if all is well.

Software Development

Debugging a process that cannot be invoked through Visual Studio.

Sometimes it is rather difficult to debug through Visual Studio directly even although the project is right there in front of you. In my case I have an assembly that is invoked from a wrapper that is itself invoked from an MSBuild script. I could potentially get VS to invoke the whole pipeline but it seemed to me a less convoluted process to try and attach the debugger to the running process and debug from there.

But what if the process is something quite ephemeral. If the process starts up, does its thing, then shuts down you might not have time to attach a debugger to it before the process has completed. Or the thing you are debugging is in the start up code and there is no way to attach a debugger in time for that.

However there is something that can be done (if you have access to the source code and can rebuild).

for (int i = 30; i >= 0; i--)
{
    Console.WriteLine("Waiting for debugger to attach... {0}", i);
    if (Debugger.IsAttached)
        break;
    Thread.Sleep(1000);
}

if (Debugger.IsAttached)
{
    Console.WriteLine("A debugger has attached to the process.");
    Debugger.Break();
}
else
{
    Console.WriteLine("A debugger was not detected... Continuing with process anyway.");
}

You could get away with less code, but I like this because is is reasonably flexible and I get to see what’s happening.

First up, I set a loop to count down thirty seconds to give me time to attach the debugger. On each loop it checks to see if the debugger is attached already and exits early if it has (This is important as otherwise you could attach the debugger then get frustrated waiting for the process to continue.)

After the loop (regardless of whether it simply timed-out or a debugger was detected) it does a final check and breaks the execution if a debugger is detected.

Each step of the way it outputs to the console what it is doing so you can see when to attach the debugger and you can see when the debugger got attached, or not.

My recommendation, if you want to use this code, is to put it in a utility class somewhere that you can call when needed, then take the call out afterwards.