Code Review: Making a drop down list out of an enum

I’ve come across code like this a couple of times and it is rather odd:

IEnumerable<CalendarViewEnum> _calendarViewEnums =
    Enum.GetValues(typeof(CalendarViewEnum)).Cast<CalendarViewEnum>();
List<SelectListItem> selectList = new List<SelectListItem>();
foreach (CalendarViewEnum calendarViewEnum in _calendarViewEnums)
{
  switch (calendarViewEnum)
  {
    case CalendarViewEnum.FittingRoom:
      selectList.Add(new SelectListItem { 
          Text = AdminPreferencesRes.Label_CalendarViewFittingRoom, 
          Value = ((int)calendarViewEnum).ToString(CultureInfo.InvariantCulture), 
          Selected = (calendarViewEnum == CalendarViewEnum.FittingRoom) 
          });
      break;
    case CalendarViewEnum.Staff:
      selectList.Add(new SelectListItem { 
          Text = AdminPreferencesRes.Label_CalendarViewStaff, 
          Value = ((int)calendarViewEnum).ToString(CultureInfo.InvariantCulture), 
          Selected = (calendarViewEnum == CalendarViewEnum.Staff) 
      });
    break;
    case CalendarViewEnum.List:
      selectList.Add(new SelectListItem { 
          Text = AdminPreferencesRes.Label_CalendarViewList, 
          Value = ((int)calendarViewEnum).ToString(CultureInfo.InvariantCulture), 
          Selected = (calendarViewEnum == CalendarViewEnum.List) 
      });
    break;
    default:
      throw new Exception("CalendarViewEnum Enumeration does not exist");
  }
}
return selectList.ToArray();

So, what this does is generate a list of values from an enum, then loop over that list generating a second list of SelectListItems (for a drop down list box on the UI). Each item consists of a friendly name (to display to the user), an integer value (which is returned to the server on selection) and a Boolean value representing whether that item is selected (which, as written, is always true, so it is lucky that MVC ignores this given the way the drop down list was rendered, otherwise it would get very confused).

Each iteration only has one possible path (but the runtime doesn’t know this, so it slavishly runs through the switch statement each time). That means we can do a lot to optimise and simplify this code.

Here it is:

List<SelectListItem> selectList = new List<SelectListItem>();
selectList.Add(new SelectListItem { 
    Text = AdminPreferencesRes.Label_CalendarViewFittingRoom, 
    Value = ((int) CalendarViewEnum.FittingRoom).ToString(CultureInfo.InvariantCulture) 
  });
selectList.Add(new SelectListItem { 
    Text = AdminPreferencesRes.Label_CalendarViewStaff, 
    Value = ((int)CalendarViewEnum.Staff).ToString(CultureInfo.InvariantCulture) 
  });
selectList.Add(new SelectListItem { 
    Text = AdminPreferencesRes.Label_CalendarViewList, 
    Value = ((int)CalendarViewEnum.List).ToString(CultureInfo.InvariantCulture) 
  });
return selectList.ToArray();

There is one other bit of refactoring we can do. We always, without exception, return the same items from this method, and the number of them is known at compile time. So, let’s just generate the array directly:

return new []
{
  new SelectListItem { 
      Text = AdminPreferencesRes.Label_CalendarViewFittingRoom, 
      Value = ((int) CalendarViewEnum.FittingRoom).ToString(CultureInfo.InvariantCulture) 
    },
  new SelectListItem { 
      Text = AdminPreferencesRes.Label_CalendarViewStaff, 
      Value = ((int)CalendarViewEnum.Staff).ToString(CultureInfo.InvariantCulture) 
    },
  new SelectListItem { 
      Text = AdminPreferencesRes.Label_CalendarViewList, 
      Value = ((int)CalendarViewEnum.List).ToString(CultureInfo.InvariantCulture) 
    }
};

So, in the end a redundant loop has been removed, along with a redundant conversion from list to array. The code is also easier to read and easier to maintain. It is easier to read because the cyclomatic complexity of the method is now one, whereas previously it was 5 (one for the loop, and one for each case clause in the switch statement [assuming I’ve calculated it correctly]). The lower the cyclomatic complexity of a method, the easier it is to understand and maintain, as there are fewer conditions to deal with. It is easier to maintain because instead of adding a whole new case to the switch statement, a single line just needs to be added to the array. The redundant Selected property has also been removed.

There are still ways to improve this, but the main issue for anyone maintaining this code (working out what that loop was actually doing) has now been resolved.
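
For example, one possible further step (a sketch only, reusing the CalendarViewEnum and AdminPreferencesRes names from above) is to keep the enum-to-label mapping in a single dictionary and build the items with LINQ, so adding a new enum value only needs a new dictionary entry:

private static readonly Dictionary<CalendarViewEnum, string> CalendarViewLabels =
    new Dictionary<CalendarViewEnum, string>
    {
        { CalendarViewEnum.FittingRoom, AdminPreferencesRes.Label_CalendarViewFittingRoom },
        { CalendarViewEnum.Staff, AdminPreferencesRes.Label_CalendarViewStaff },
        { CalendarViewEnum.List, AdminPreferencesRes.Label_CalendarViewList }
    };

public static SelectListItem[] GetCalendarViewSelectList()
{
    // Enum.GetValues supplies the values; the dictionary supplies the friendly
    // names. A value with no entry in the dictionary throws, much like the
    // default case in the original switch statement.
    return Enum.GetValues(typeof(CalendarViewEnum))
        .Cast<CalendarViewEnum>()
        .Select(view => new SelectListItem
        {
            Text = CalendarViewLabels[view],
            Value = ((int)view).ToString(CultureInfo.InvariantCulture)
        })
        .ToArray();
}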

Tip of the day: Using the null-coalescing operator over the conditional operator

I’ve recently been refactoring a lot of code that used the conditional operator and looked something like this:

int someValue = myEntity.SomeNullableValue.HasValue
                    ? myEntity.SomeNullableValue.Value
                    : 0;

That might seem less verbose than the traditional alternative, which looks like this:

int someValue = 0;
if (myEntity.SomeNullableValue.HasValue)
    someValue = myEntity.SomeNullableValue.Value;

…or other variations on that theme.

However, there is a better way of expressing this. That is to use the null-coalescing operator.

Essentially, what it says is that the value on the left of the operator will be used unless it is null, in which case the value on the right is used. You can also chain them together, which effectively returns the first non-null value.

So now the code above looks a lot more manageable and understandable:

int someValue = myEntity.SomeNullableValue ?? 0;
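
To illustrate the chaining mentioned above, here is a sketch (the userPreferences object and its DefaultValue property are made up for the example; both nullable values are int?):

// Reading left to right, the first non-null value wins; 0 is the final fallback.
int someValue = myEntity.SomeNullableValue
                ?? userPreferences.DefaultValue
                ?? 0;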

Why should you be returning an IEnumerable

I’ve seen many places where a method returns a List<T> (or IList<T>) when it appears that this may not actually be required, or even desirable, when all things are considered.

A List is mutable: you can change the state of the List. You can add things to it, you can remove things from it, and you can change the items it contains. This means that everything that has a reference to the List instantly sees the changes, whether an element has changed or elements have been added or removed. If you are working in a multi-threaded environment, which will be increasingly common as time goes on, you will get issues with thread safety if the List is used on other threads and one or more of them starts changing it.

Return values should, unless you have a specific use case in mind already, be IEnumerable<T>, which is not mutable. If the underlying type is still a List (or Array, or any of a myriad of other types that implement IEnumerable<T>) you can still return it directly. Also, some LINQ expressions will self-optimise if the underlying type is one which better supports what LINQ is doing. (Remember that LINQ extension methods operate on IEnumerable<T> or IQueryable<T> anyway, so you can do what you like regardless of what the underlying type is.)

If you ensure that your return values are IEnumerable<T> to begin with, and further down the line you realise you need to return an Array or List<T> from the method, it is easy to start doing that. This is because everything accepting the return value will still be expecting an IEnumerable<T>, which List<T> and Array both implement. If, however, you started with a List<T> and later move to returning an IEnumerable<T>, then because so much code will have come to expect a List<T> without actually needing it, you will have a lot of refactoring to do just to update the reference types.
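
For example, a method can promise only IEnumerable<T> in its signature while still using a List<T> internally (a sketch with hypothetical Customer and repository types):

public IEnumerable<Customer> GetActiveCustomers()
{
    // The underlying type is a List<Customer>, but callers only ever see an
    // IEnumerable<Customer>, so the implementation is free to change later
    // without any calling code needing to be updated.
    List<Customer> activeCustomers = new List<Customer>();
    foreach (Customer customer in _customerRepository.GetAll())
    {
        if (customer.IsActive)
            activeCustomers.Add(customer);
    }
    return activeCustomers;
}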

Have I convinced you yet? If not, think about this. How often are you inserting items into a collection of objects after the initial creation routine? How often do you remove items from a collection after the initial creation routine? How often do you need to access a specific item by index within a collection after the initial creation routine? My guess is almost never. There are some occasions, but not actually that many.

It took me a while to get my head around always using an IEnumerable<T>, until I realised that I almost never need to do the things in the above paragraph. I almost always just need to loop over a collection of objects, or filter a collection of objects to produce a smaller set. Both of those things can be done with just an IEnumerable<T> and LINQ.

But what if you need a count of the objects in the collection? Wouldn’t that be inefficient with an IEnumerable<T> and LINQ? Well, do you really need a count? Often I just need to know whether there are any objects at all in the collection; I don’t care how many there actually are, in which case the LINQ extension method Any() can be used. If you do need a count, LINQ is clever enough to work out that the underlying type may expose a Count property and will call that (anything that implements ICollection<T>, such as arrays, lists, dictionaries, sets, and so on), so it is not iterating over all the objects counting them up each time.
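
For example (sketched with a hypothetical orders collection):

// Any() stops as soon as it finds the first element, so it answers
// "are there any?" without counting everything.
if (orders.Any())
{
    ProcessOrders(orders);
}

// When a number genuinely is needed, Count() will use the underlying
// ICollection<T>.Count property if one is available rather than iterating.
int numberOfOrders = orders.Count();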

Remember, there is nothing necessarily wrong with putting a ToArray() or ToList() on the end of a LINQ expression before returning the result as an IEnumerable<T>. That removes the issues that deferred execution can bring (e.g. unexpected surprises when the query suddenly evaluates during the first iteration but breaks in the process), and it avoids repeatedly applying the filter in the Where() method or the transformation in the Select() method.
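
For example, the earlier sketch could be written with LINQ, with ToArray() forcing the filter to run exactly once before the result is handed back as an IEnumerable<T> (again, hypothetical names):

public IEnumerable<Customer> GetActiveCustomers()
{
    // ToArray() evaluates the Where() here and now; callers iterating the
    // result later will not re-run the filter or trigger deferred execution.
    return _customerRepository.GetAll()
        .Where(customer => customer.IsActive)
        .ToArray();
}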

Just because an object is of a specific type, doesn’t mean you have to return that specific type.

For example, consider the services you actually need on the collection that you are returning, remembering how much LINQ gives you. The following diagram shows what each of the interfaces expects to be implemented and what a number of the common collection types implement themselves.

Incidentally, the reason some of the interfaces on the Array class are in a different colour is that these interfaces are added by the runtime. So if you have a string[] it will expose IEnumerable<string>.

I’d suggest that, as a general rule, IEnumerable<T> should be the return type whenever the method returns anything that implements it, unless something from ICollection<T> or IList<T> (or any other collection type) is absolutely, desperately needed, and not just because some existing code expects, say, an IList<T> (even though it uses no more services from it than it would had it been an IEnumerable<T>).

The mutability that implementations of ICollection<T> and IList<T> give will prove problematic in the long term. If you have a large team with members that don’t fully understand what is going on (and this is quite common given the general level of developer documentation) they are likely to change the contents of the collection without understanding the implications. In some situations this may be fine, in others it may be disastrous.

Finally, if you absolutely do need to return a more specific collection type then instead of returning a reference to the concrete class, return a reference to the lowest interface that you need. For example, if you have a List<T> and you need to add further items to it, but not at specific locations in the list, then ICollection<T> will be the most suitable return type.
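
A quick sketch of that last point (hypothetical tag collection):

// ICollection<T> exposes Add() and Remove(), but not the index-based access
// of IList<T>, so it advertises exactly the capability callers need and no more.
public ICollection<string> GetDefaultTags()
{
    return new List<string> { "new", "unassigned" };
}

// A caller can add further items...
ICollection<string> tags = GetDefaultTags();
tags.Add("urgent");
// ...but cannot insert at a specific position: ICollection<T> has no Insert().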

Interpreting promotional codes

The application I’m working on these days has a thing in it called a booking code. Now a booking code can be many things. It can tell you which discount code to use, which third party was being used to make the order, which business customer is making the order and so on. It is really just a convenience on the user interface so we didn’t have a large selection of text boxes asking for a discount code, merchant code or corporate customer ID, etc.

The application can determine what the code is based on which prefix has been used. “D” for a discount, “ME” for a merchant, “CC” for a corporate customer and so on.

So far all this seems fairly simple and straightforward. It is very simple to perform some conditional logic based on, for example, a check on the booking code string:

bookingCode.StartsWith("ME")

However, it does mean that the code is soon cluttered up with lots of conditional statements testing the start of a string with magic values.

Of the two issues here the easiest to address is the magic values. It is easy enough to set up a series of const values representing the various prefixes. For example:

public const string MerchantPrefix = "ME";
public const string DiscountPrefix = "D";
public const string CorporateCustomerPrefix = "CC";

Then, whenever such a condition has to be tested, the constant value can be used. This reduces the potential for errors in the code because if the name of the constant is mistyped the compiler will catch it, whereas if the magic value is mistyped bugs can be introduced. Also, by using a constant you can provide a more meaningful name than the prefix alone.
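
For example, given the bookingCode string the user typed in:

// A typo in the constant name fails to compile; a typo in a magic string
// would silently introduce a bug.
if (bookingCode.StartsWith(MerchantPrefix))
{
    // handle the order as coming from a merchant
}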

However, this is not the end of the story.

Because these prefixes are artificial, there was also lots of code to strip off the prefix and check that the right number of characters was being stripped off, or to have none stripped off at all (because in one case that type of code genuinely did always have the prefix in the back-end system). It seemed a lot easier to me to refactor the code and create a BookingCode class that encapsulated that functionality.

The new class takes in its constructor the code as the user would have typed it. It has a property that exposes the type of booking code as an enum so it can easily be used in switch statements. It also has a number of properties along the lines of IsMerchantIdentifier, IsDiscountCode or IsCorporateCustomer for use where a single departure from the normal processing is required (i.e. an if statement).

Finally, the BookingCode class has a string property that exposes the actual code that the back end system needs. That way all the code that was stripping off the prefixes can be removed. The possibility of introducing intermittent errors is also reduced because there is now only one place where the code the user typed in is deconstructed into its component parts.
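
A sketch of what such a class might look like (the names and prefix handling are illustrative only, see the note below; the case where a prefix is kept for the back-end system is omitted for brevity):

public enum BookingCodeType
{
    Unknown,
    Discount,
    Merchant,
    CorporateCustomer
}

public class BookingCode
{
    // The prefix constants from earlier now live with the class that uses them.
    public const string MerchantPrefix = "ME";
    public const string DiscountPrefix = "D";
    public const string CorporateCustomerPrefix = "CC";

    public BookingCode(string rawCode)
    {
        // The raw value the user typed in is deconstructed in exactly one place: here.
        if (rawCode.StartsWith(MerchantPrefix))
        {
            Type = BookingCodeType.Merchant;
            Code = rawCode.Substring(MerchantPrefix.Length);
        }
        else if (rawCode.StartsWith(CorporateCustomerPrefix))
        {
            Type = BookingCodeType.CorporateCustomer;
            Code = rawCode.Substring(CorporateCustomerPrefix.Length);
        }
        else if (rawCode.StartsWith(DiscountPrefix))
        {
            Type = BookingCodeType.Discount;
            Code = rawCode.Substring(DiscountPrefix.Length);
        }
        else
        {
            Type = BookingCodeType.Unknown;
            Code = rawCode;
        }
    }

    // The type of code, as an enum, for use in switch statements.
    public BookingCodeType Type { get; private set; }

    // The code exactly as the back-end system expects it (prefix removed).
    public string Code { get; private set; }

    // Convenience properties for single departures from normal processing.
    public bool IsMerchantIdentifier { get { return Type == BookingCodeType.Merchant; } }
    public bool IsDiscountCode { get { return Type == BookingCodeType.Discount; } }
    public bool IsCorporateCustomer { get { return Type == BookingCodeType.CorporateCustomer; } }
}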

All in all this is a much more robust solution to the way the code used to work.

NOTE: The details in the example code above serve as an example only and do not represent the actual system in production.

Tip of the Day #15: Loop Performance

When you look at the code it will probably seem somewhat obvious, but it is interesting how the same thought process isn’t necessarily there when actually developing the code, especially when under the pressure of a looming deadline.

Take for example this snippet of code (from a fictitious hotel management system) that may have been run by a receptionist to print out the check in forms that customers of the hotel will fill in on their arrival.

List<Booking> bookings = GetTodaysBookings();
foreach (Booking booking in bookings)
{
    PrintService service = GetLocalPrintService();
    service.PrintCheckInForm(booking);
}

The method GetLocalPrintService() could be doing a multitude of things. It may simply be creating a new instance of the PrintService object, or it could be resolving a number of dependencies in order to set itself up to communicate with the local printer. But whatever it is doing, we don’t actually need a new instance of the service on each iteration. The code will work just as well if we have just one instance that is re-used throughout the loop.

That being the case, the creation of the PrintService can be moved outside the loop so it is created only once, thus removing unnecessary work from the loop. The new code then looks like this:

List<Booking> bookings = GetTodaysBookings();
PrintService service = GetLocalPrintService();
foreach (Booking booking in bookings)
{
    service.PrintCheckInForm(booking);
}

As I said at the top, this is obvious. Isn’t it?

The value of smaller methods

A while ago on a forum I advised someone that their method was too long and difficult to read and maintain. I suggested that they break the code into smaller methods as it would improve the maintainability and readability. I got a response from another person on the forum suggesting that it was stupid to do that because it made debugging more difficult: you had to “step into” so many methods. At the time I didn’t respond. However, I’m hoping to change that with this post.

I’m prompted to write this because I’ve been catching up with the DotNetRocks podcasts and they have an advert on some episodes, running up to TechEd in Barcelona, about winning a big monitor so you can more easily read “that impossibly long line of code”. I don’t like impossibly long lines of code (or even possible yet very long lines of code).

I actually used to work with someone whose code reviews included the very strict “thou shalt not exceed 80 characters per line”. Although I don’t stick to a hard limit, I don’t tend to go over about 100 characters per line, which is the amount of code that will comfortably fit on a regular monitor (1024×768) with various bits of screen real estate taken up on either side (for things like the solution explorer and toolbox) without having to scroll horizontally.

In free flowing languages like C# it is possible to allow one code line to take several screen lines. A line of code in C# is terminated with a semi-colon rather than a new-line marker, so it is possible to insert new-lines in the middle of a code line. However, before doing this I’ll see if there are other ways to make the line shorter.

For example, the following line of code:

someObject.SomeMethod(GetWidget(variableA, variableB), anotherObject.SomeProperty, someFieldOnThisClass);

In a normal application the above line would be indented some way because it would be contained in a method, in a class, in a namespace, it may possibly be inside a conditional statement (e.g. an if statement) or loop (e.g. a foreach statement). That adds, typically, another 16 characters to the line length.

This line of code is also more difficult to debug because in order to step-into SomeMethod you first have to step in and then out of GetWidget and SomeProperty.

So, to reduce the line length you can refactor the code so that it is built up over a number of lines instead:

Widget widget = GetWidget(variableA, variableB);
int propertyValue = anotherObject.SomeProperty;
someObject.SomeMethod(widget, propertyValue, someFieldOnThisClass);

On the long line of code that saves 33 characters, so the code is now somewhat easier to read as it doesn’t get to the point where any horizontal scrolling is necessary. It is also easier to debug as it is possible to directly step into SomeMethod without getting waylaid in and out of GetWidget and SomeProperty first.

For cases where you can’t extract any more out of the line there is still the ability to wrap the line of code onto a new line of text.

In general I’d say that any method that exceeds a screen in size is too big and should be considered for refactoring. I don’t use this as a hard limit, just a general guide. But why put that limit on the size of a method?

Let’s take an example:

public void DoSomething(bool someParameter)
{
      if (someParameter == true)
      {
          // Some processing that takes more than a screen.
          // Assume it contains nested ifs, fors, while, etc.
      }
      else
      {
          // Else what? What was the original if statement
          // I guess we'll just have to scroll up to see
          // (and just hope I match it with the correct if)
      }
}

The same goes for any code that uses a lot of conditional or looping constructs: it is easy to get confused about which closing curly brace goes with which opening statement, because the opening part is off the screen and the context is lost.

Another example would be looking through an overly long method to see if any code is repeated, or perhaps the code at one point does the same as in another method. That block of code within the method would be an excellent candidate for refactoring out.

I once went on holiday and came back to find that another developer had finished writing a control that I’d started. I didn’t recognise much of my original code, but I did find a 1500 line Render method that manually wrote the HTML to the stream. There was code in there that inserted breaks and spacers between bits of data. In fact, I found several dozen instances of the same handful of lines of code; just by refactoring those out I got the Render method down a good few hundred lines.

If I had been writing that method I would have worked out what the constituent parts were and written smaller methods for each of those parts. The control was a customised control to write out an editable table in a way that the DataGrid didn’t handle. I think my render method would have been along the lines of:

protected override void Render(HtmlTextWriter writer)
{
    RenderHeader(writer);
    RenderSpacer(writer);
    bool isAltLine = false;
    foreach (ControlData dataItem in data)
    {
        RenderData(writer, dataItem, isAltLine);
        RenderSpacer(writer);
        isAltLine = !isAltLine;
    }
    RenderFooter(writer);
}

Each of the subordinate Render methods (such as RenderHeader, RenderData, RenderSpacer and RenderFooter) would also be split up into smaller chunks. For example, the RenderHeader method would iterate through all the column headers and call some RenderColumnHeader method. The RenderData method may, for example, iterate through the data and call specific render methods for each type of data to be displayed.
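
For example, RenderHeader might look something along these lines (a sketch; ControlColumn and the columns field are hypothetical, in the same spirit as ControlData and data above):

private void RenderHeader(HtmlTextWriter writer)
{
    writer.RenderBeginTag(HtmlTextWriterTag.Tr);
    foreach (ControlColumn column in columns)
    {
        // Each column header gets its own, equally small, method.
        RenderColumnHeader(writer, column);
    }
    writer.RenderEndTag();
}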

So, as to the objection that it makes debugging more difficult because you have to step into more methods, you can see that it should make debugging simpler. In the above example it becomes easier, for instance, to concentrate on a problem with rendering a line of data rather than fight through a mountain of code before getting to the bit that renders that data.

As I hope you can see it would be much easier to be able to concentrate on small parts of the process rather than have to take the whole process in one go.
