Getting values out of XML in .NET 3.5 (LINQ to XML series part 2)

In my last post I gave a brief introduction to some of the new XML classes available in .NET 3.5. In this post I’ll continue that introduction by explaining how to get information out of the XML.

First off, lets assume we have some XML that looks like this:

XElement root = new XElement("root",
    new XAttribute("Attribute", "TheValue"),
    new XElement("FirstChild"),
    new XElement("SecondChild", new XElement("Grandchild", "The content of the grandchild")));

or, if you prefer in XML format, like this:

<root Attribute=”TheValue”>

<FirstChild />

<SecondChild>

<Grandchild>The content of the grandchild</Grandchild>

</SecondChild>

</root>

There are a number of ways to get the content of the grandchild element. For example:

Console.WriteLine(root.Element("SecondChild").Element("Grandchild").Value);

Value returns a string which contains the content of the element specified. In the above case it will output:

The content of the grandchild

However, you need to watch out for when there are child elements of the thing you want the value of as their content is included when you get the value. For example, if the above XML is extended so that it looks like this:

<root Attribute=”TheValue”>

<FirstChild />

<SecondChild>

<Grandchild>The content of the grandchild

<Great-grandchild>GGC content</Great-grandchild>

<Grandchild>

</SecondChild>

</root>

And the above line of C# is executed again, the result is now:

The content of the grandchildGGC content

As you can see the content of the element you want plus its child elements are now returned. This may not necessarily be what you want.

There is a second way to get the content from an element. That is to use a casting operator. You can cast the element to a number of types. In this case to a string. for example:

Console.WriteLine((string)root.Element("SecondChild").Element("Grandchild"));

The result is the same as calling Value on the element.

Be careful here, because casting an element to a string will not have the same result as calling ToString() on an element. You can see that if you simply pass the element itself to writeline (which will then call ToString() for you). For example:

Console.WriteLine(root.Element("SecondChild").Element("Grandchild"));

The result is:

<Grandchild>The content of the grandchild

<Great-grandchild>GGC content</Great-grandchild>

</Grandchild>

The process is similar if you are dealing with attribute. Using the above XML as an example, an attribute value can be retrieved like this:

Console.WriteLine(root.Attribute("Attribute").Value);

Or, using the cast operator to a string like this:

Console.WriteLine((string)root.Attribute("Attribute"));

Both of the above pieces of code output the same thing: TheValue

If you were to call WriteLine on an XAttribute object you’ll see that the ToString() method returns something slightly different. It returns this: Attribute=”TheValue”

Lets say, for instance, that the Attribute had a value of 123.456, which is a valid number representation. I mentioned earlier about casting operators on XElement and XAttribute. Well, you can cast this to a double if you prefer to get the value in that type. There is no tedious converting in your own code as the framework can handle that for you. For example:

(double)root.Attribute("Attribute")

That’s it for this post. There will be more on XML and LINQ soon.

Introduction to LINQ to XML

Last year I wrote about the new languages features available in C# 3.0 (Anonymous Types, Extension Methods, Automatic Properties, A start on LINQ, Object Initialisers I, Object Initialisers II, & Object Initialisers III) and since then I’ve really got in to LINQ, especially LINQ to XML. The reason for that is that I hate XPath and I see LINQ to XML as a much easier way of querying XML files without faffing about with terse XPath strings. I would much rather have the ability to easily see what is going on with the query than have to figure out why my XPath isn’t working for me.

However, LINQ to XML is more than just new funky querying mechanisms. There is a whole new set of classes to deal with XML that are much easier and more intuitive than the classes that were provided back with .NET 1.0, in my opinion.

The main two classes in the new way of doing XML are XElement and XAttribute. For example, to create a new element:

XElement root = new XElement("root");

And to add an attribute to that element:

root.Add(new XAttribute(“AttributeName”, “TheValue”));

Which produces the result: <root AttributeName=”TheValue” />

If you look at the intellisense for XElement constructor you’ll see that none of the 5 overloads takes a string. The nearest is an XName. This is because there is an implicit conversion happening between a string and an XName so that creating XElements does not have to be so arduous. It would be quite irritating to have to declare XElement objects like this:

XElement root = new XElement(XName.Get("root"));

At this point you’ll find that all the VB developers will be gloating because VB9 contains a feature called XML Literals whereby the developer can just write XML directly into the source code file and VB will parse and compile it correctly. An incredibly handy feature I’m sure you’ll agree. But, since I’m a C# developer that’s what I’ll stick with – especially considering that the majority of demos of LINQ to XML I’ve seen are VB based.

If you look closely at XName’s Get method you’ll see that there are two overrides, one for an expanded name, and the other for a local name and a namespace name. The expanded name is just a string of the name with the namespace embedded in the string inside curly braces, like this:

XName.Get("{mynamespace}root");

If you prefer you can use the other overloaded version and provide two strings. The equivalent XName in that case would be created like this:

XName.Get("root", "mynamespace");

Now, you are probably wondering why a static method is being used rather than a constructor. This is because the XML classes are clever enough to reuse existing XName objects. If you create a second XName object with the same characteristics as an existing XName object it will just reuse the existing XName. For example, the following code will output “True” to the console:

XName name1 = XName.Get("{ns}MyName");
XName name2 = XName.Get("MyName", "ns");
Console.WriteLine(object.ReferenceEquals(name1, name2));

XName is immutable (it cannot change) so this is a perfectly acceptable thing to do.

The extended name notation also works if you are using strings while constructing your XElement. For example:

XElement root = new XElement("{mynamespace}root");

However, there is another way of applying namespaces in an XElement. You can use an XNamespace object and add it to the string. Like this:

XNamespace ns = XNamespace.Get("mynamespace");
XElement root = new XElement(ns + "root");

As you can probably tell the + operator has been overloaded so it can be used to add a namespace to a string to produce an XName.

Technorati Tags: ,,,,,

Mixins in C# 3.0

This is something I’ve been mulling around in my head for a few days now. “Out of the box” C# 3.0 does not support mixins, but I think you can get some of the abilities of a mixin with what is there already.

Firstly I should probably explain what a mixin is. A mixin is a class that provides some specific functionality that is to be inherited by a derived class, but it does not have a specialisation (kind-of) relationship with the derived class.

The example that I have is of a class hierarchy representing different types of animal.The base class is Animal, derived from that is Avian and Mammal. Derived from Avian is Parrot, Penguin and Chicken. Derived from Mammal is Dog, Cat, Whale and Bat.

Class-Diagram-1

These animals all have various methods of locomotion. Some can swim, some can run and others can fly. However, as you can see there is no obvious relationship through the base class. It might seem at first glance while designing the class hierarchy that an avian should be able to fly. It is, after all, the first thing that springs to mind when thinking about how birds get from one place to another. But what about flightless birds such as the Dodo? Similarly, don’t all mammals run? No, there are many that live in the sea.

As you can see, adding methods for flight on the Avian base class or running on the Mammal base class don’t work in all cases. This is where mixins come into play.

Mixins can, in this example, provide the functionality for flight, running or swimming, or any other form of locomotion by having the appropriate class inherit the functionality. However, C# does not permit multiple inheritance. You can inherit from one base class only in C#.

But, you can implement multiple interfaces. At this point you are probably thinking “Ah-hah! But interfaces don’t have any functionality”. True, you won’t get too far if you just use some interfaces on the classes. But it is the first step.

Class-Diagram-2

With C# 3.0 came the introduction of Extension Methods and they can be applied, not only to classes but, also, to interfaces. Extension Methods provide additional functionality on an existing class without modifying the class. (You can read more about Extension Methods here). It then becomes possible to create a static helper class for specific functionality that defines the extension methods. Because the classes implement the interface (even if the actual interface doesn’t contain any methods or properties to implement) it will pick up all the extension methods also.

public  static  class SwimMixin
{
    public static void Swim(this ISwim target)
    {
        // Perform Swim functionality on the target
    }
}

This provides very limited mixin functionality. The imitation mixin cannot hold any data of its own which means that so long as the imitation mixin can get away without adding attribute information of its own then it is still useful.

If you need to have the mixin hold its own data then I can, at present, see a number of potential solutions to this problem. Unfortunately no solution is terribly elegant, nor are they problem free.

The first is to use a lookup keyed on weak references to the actual instantiated class with the result of the lookup returning the data needed for the Mixin. The reason for the weak reference is to ensure that the instances of the class get cleared out and are not retained by the imitation mixin. Remember the imitation mixin is built out of a static class so it won’t go out of scope and get cleared up by the garbage collector and everything it holds will stay around as long as the application is running. The main problem with this approach is that as the number of actual instantiated classes increases the lookups get larger and will naturally slow down. Also, some mechanism for clearing out the keys and data that are no longer required has to be implemented as the actual objects are garbage collected.

The second is to use the interface that the extension method is using to provide a method that can be used by the imitation mixin to access its data. This would mean that the actual  instance of the class would have to hold onto some additional data on behalf of the imitation mixin, which negates part of its usefulness.

The third is to create a base class for that all classes that may wish to use a mixin inherit from. This base class can contain “instance” data, in a hashtable keyed on the mixin type (for instance) on behalf of the mixin itself. This would, unfortunately, mean that the data is exposed and render encapsulation useless. It also causes a small hit each time a mixin method needs access to its “instance” data. Naturally, if you are inheriting from an existing framework class you won’t have the option of putting in a base class to hold the mixin data.

Class-Diagram-3

It isn’t too hard to see that it may be possible in the future to have mixin behaviour built directly into the language as we are already part of the way there. In the meantime some limited functionality is available which can be extended to include instance data for the mixin itself with some extra work, but it isn’t without its problems.

The value of smaller methods

A while ago on a forum I advised someone that their method was too long and difficult to read and maintain. I suggested that they break the code into smaller methods as it would improve the maintainability and readability of the methods. I got a response back from another person on the forum suggesting that it was stupid to do that as it made debugging more difficult because you had to “step-in” to so many methods and that made things more difficult. At the time I didn’t respond to that. However, I’m hoping to change that with this post.

I’m prompted to write this because I’ve been catching up with the DotNetRocks podcasts and they have an advert on some episodes running up to to the TechEd in Barcelona about winning a big monitor so you can more easily read “that impossibly long line of code”. I don’t like impossibly long lines of code (or even possible yet very long lines of code).

I actually used to work with someone who’s code reviews included the very strict “thou shalt not exceed 80 characters per line” and although I don’t stick to a hard limit I don’t tend to go over about 100 characters per line, which is the amount of code that will comfortably fit on a regular monitor (1024×768) with various bits of screen real estate taken up on either side (for things like the solution explorer and toolbox) without having to scroll horizontally.

In free flowing languages like C# it is possible to to allow one code line take several screen lines. A line of code in C# is terminated with a semi-colon rather than a new-line marker. It is possible, therefore, to insert new-lines in the middle of a code line. However, before doing this I’ll see if there other ways to make the line shorter.

For example, the following line of code:

someObject.SomeMethod(GetWidget(variableA, variableB), anotherObject.SomeProperty, someFieldOnThisClass);

In a normal application the above line would be indented some way because it would be contained in a method, in a class, in a namespace, it may possibly be inside a conditional statement (e.g. an if statement) or loop (e.g. a foreach statement). That adds, typically, another 16 characters to the line length.

This line of code is also more difficult to debug because in order to step-into SomeMethod you first have to step in and then out of GetWidget and SomeProperty.

So, to reduce the line length you can refactor the code so that it is built up over a number of lines instead

Widget widget = GetWidget(variableA, variableB);
int propertyValue = anotherObject.SomeProperty;
someObject.SomeMethod(widget, propertyValue, someFieldOnThisClass);

On the long line of code that saves 33 characters, so the code is now somewhat easier to read as it doesn’t get to the point where any horizontal scrolling is necessary. It is also easier to debug as it is possible to directly step into SomeMethod without getting waylaid in and out of GetWidget and SomeProperty first.

For cases where you can’t extract any more out of the line there is still the ability to wrap the line of code onto a new line of text.

In general I’d say that any method that exceeds a screen in size is too big and should be considered for refactoring. I don’t use this as a hard limit, just a general guide. But why put that limit on the size of a method?

But lets take an example:

public void DoSomething(bool someParameter)
{
      if (someParameter == true)
      {
          // Some processing that takes more than a screen.
          // Assume it contains nested ifs, fors, while, etc.
      }
      else
      {
          // Else what? What was the original if statement
          // I guess we'll just have to scroll up to see
          // (and just hope I match it with the correct if)
      }
}

The same would go for any code that uses a lot of conditional or looping code because it is easy to get confused with what the end curly-braces go with because the opening part is off the screen and the context is lost.

Another example would be looking through an overly long method to see if any code is repeated, or perhaps the code at one point does the same as in another method. That block of code within the method would be an excellent candidate for refactoring out.

I once went on holiday and to come back to find that another developer had finished writing a control that I’d started. I didn’t recognise much of my original code, but I did find a 1500 line Render method that manually wrote the HTML to the stream. There was code in there that inserted breaks and spacers between bits of data. In fact, I found several (dozens of) instances of the same handful of lines of code that just by refactoring that out I got the Render method down a good few hundred lines.

If I had been writing that method I would have worked out what the constituent parts were and written smaller methods for each of those parts. The control was a customised control to write out an editable table in a way that the DataGrid didn’t handle. I think my render method would have been along the lines of:

protected override void Render(HtmlTextWriter writer)
{
    RenderHeader(writer);
    RenderSpacer(writer);
    bool isAltLine = false;
    foreach (ControlData dataItem in data)
    {
        RenderData(writer, dataItem, isAltLine);
        RenderSpacer(writer);
        isAltLine = !isAltLine;
    }
    RenderFooter(writer);
}

Each of the subordinate Render methods (such as RenderHeader, RenderData, RenderSpacer and RenerFooter) would also be split up in to smaller chunks. For example the RenderHeader would have iterated through all the column headers and called some RenderColumnHeader method. The RenderData method may, for example, iterate through the data and call specific render methods for each type of data to be displayed.

So, as to the objection that it makes debugging more difficult because you have to step into more methods, you can see that it should make debugging simpler. In the above example it becomes easier, for instance, to concentrate on a problem with rendering a line of data rather than fight through a mountain of code before getting to the bit that renders that data.

As I hope you can see it would be much easier to be able to concentrate on small parts of the process rather than have to take the whole process in one go.

Technorati Tags: ,,,

 

What is a DAL (Part 4)

As has been mentioned previously, one of the purposes of the DAL is to shield that application from the database. That said, what happens if a DAL throws an exception? How should the application respond to it? In fact, how can it respond to an exception that it should not know about?

If something goes wrong with a query in the database an exception is thrown. If the database is SQL Server then a SqlException is thrown. If it isn’t SQL Server then some other exception is thrown. Or the DAL may be performing actions against a completely different type of data source such as an XML file, plain text file, web service or something completely different. If the application knows nothing about the back end database (data source) then how does it know which exception to respond to?

In short, it doesn’t. It can’t know which of the myriad of possible exceptions that could be thrown will be and how to respond to it. The calling code could just catch(Exception ex) but that is poor practice. It is always best to catch the most specific exception possible.

The answer is to create a specific exception that the DAL can use. A DalException that calling code can use. The original exception is still available as an InnerException on the DalException.

using System;
using System.Runtime.Serialization;

namespace Dal
{
    public class DalException : Exception
    {
        public DalException()
            : base()
        {
        }

        public DalException(string message)
            : base(message)
        {
        }

        public DalException(string message, Exception innerException)
            : base(message, innerException)
        {
        }

        public DalException(SerializationInfo info, StreamingContext context)
            : base(info, context)
        {
        }
    }
}

The DAL will catch the original exception, create a new one based on the original and throw the new exception.

public DataSet GetPolicy(int policyId)
{
    try
    {
        SqlDataAdapter da =
            (SqlDataAdapter)this.BuildBasicQuery("GetPolicy");
        da.SelectCommand.Parameters.AddWithValue("@id", policyId);
        DataSet result = new DataSet();
        da.Fill(result);
        return result;
    }
    catch (SqlException sqlEx)
    {
        DalException dalEx = BuildDalEx(sqlEx);
        throw dalEx;
    }
}

The code for wrapping the original exception in the DAL Exception can be refactored in to a separate method so it can be used repeatedly. Depending on what it needs to do it may be possible to put that as a protected method on one of the abstract base classes

private DalException BuildDalEx(SqlException sqlEx)
{
    string message = string.Format("An exception occured in the Policy DALrn" +
        "Message: {0}", sqlEx.Message);
    DalException result = new DalException(message, sqlEx);
    return result;
}

Previous articles in the series:

 

 

Technorati Tags: , ,

 

What is a DAL (Part 3)

In this continuation of my series on the DAL I’m going to show the ability to create several DALs and have a Factory class instantiate the correct DAL based on settings in a config file.

One of the purposes of the DAL is to shield the application from the detail of accessing the database. You may have a system where you may have to talk to different databases depending on the way the software is installed. Typically this is most useful by creators of components that need to plug into many different database systems.

The DAL will no longer be a singleton, but rely on a Factory to create the correct DAL class. This is most useful when the data may be coming from different sources. It can also be useful when your application is in development and the database isn’t ready because you can then create a mock DAL and return that instead of the real thing.

UML Diagram showing multiple DALs

UML Diagram showing multiple DALs

The above UML Diagram shows the base DAL that was introduced in the last article in this series. Again, it is an abstract class as it does not contain enough functionality to be instantiated on its own. However, it is now joined by two additional abstract classes. These add a bit more functionality specific to the target database, but still not enough to be useful as an object in its own right.

Finally at the bottom layer are the various concrete DAL classes that act as the proxies to the database.

So that the calling application doesn’t need to know what database it is dealing with the concrete classes implement interfaces. That way the application receives a reference to an IPolicyDal, for example, without having to know whether it is connected to an Oracle database or a SQL Server database (or any other future database you may need to support)

The factory class, not shown in the UML diagram above, picks up information in a config file which tells it which specialised DAL to actually create. It returns that specialised instance through a reference to the interface. This means that the calling code does not need to know what specialised class has actually been created. Nor should it need to care.

using System;
using System.Collections.Generic;
using System.Text;
using System.Reflection;

namespace Dal
{
    /// <summary>
    /// Creates and dispenses the correct DAL depending on the configuration
    /// </summary>
    public class DalFactory
    {
        private static IPolicyDal thePolicyDal = null;
        private static IClaimDal theClaimDal = null;

        /// <summary>
        /// Gets the policy DAL
        /// </summary>
        /// <returns>A dal that implements IPolicyDal</returns>
        public static IPolicyDal GetPolicyDal()
        {
            if (thePolicyDal == null)
            {
                string policyDalName = DalConfig.PolicyDal;
                Type policyDalType = Type.GetType(policyDalName);
                thePolicyDal =
                    (IPolicyDal)Activator.CreateInstance(policyDalType);
            }
            return thePolicyDal;
        }

        /// <summary>
        /// Gets the claim DAL
        /// </summary>
        /// <returns>A DAL that implements IClaimDal</returns>
        public static IClaimDal GetClaimDal()
        {
            if (theClaimDal == null)
            {
                string claimDalName = DalConfig.ClaimDal;
                Type claimDalType = Type.GetType(claimDalName);
                theClaimDal =
                    (IClaimDal)Activator.CreateInstance(claimDalType);
            }
            return theClaimDal;
        }
    }
}

The factory stores the each DAL class as it is created so that if a Get...() method is called repeatedly it always returns the same instance. The DAL classes should not be storing any state so this is a safe thing to do.

The DalConfig class is a very simple static class that consists of a number of property getters that retrieves the information out of the config file. The PolicyDal and ClaimDal properties return the fully qualified name of the specialised DAL class to instantiate.

Previous articles in the series:

Technorati Tags: ,

What is a DAL?

I occasionally see posts on forums asking for help with some database
problem. Then I see they’ve been using the the wizards in Visual Studio (which
is okay for a beginner, but should really be ditched once the basic concepts
have been learned). I suggest that they use a DAL and I get lots of positive
signals back that they’ll do that in future. Then a week or two later they’ve
got themselves into another problem and they’re still using wizard generated
code.

So, it occurred to me that just saying “Use a DAL” doesn’t mean anything. If
someone told me that I need to use the finkle-widget pattern I wouldn’t know
what they are talking about. And if a person has never heard the term DAL
before, or never had it explained to them, then it is just as useful as saying
“use the finkle-widget pattern”

So, I figured it would be a good idea to show what a very basic DAL looks
like and to explain what it is.

A DAL is a Data Abstraction Layer. It is the part of the application that is
responsible for communicating with a data source. That data source is typically
a database, but it can be anything you like, such as an XML file, a plain text
file or anything else that data can be read from or written to.

In this example it will be a static class that communicates with a SQL
Server. It picks up the connection string
from the config file. It will communicate with stored procedures and return
DataTables, DataSets or single values. It doesn’t do everything a DAL could do,
but it shows the basic functionality.

Whenever I create DAL classes I try and think about them as if they are a
proxy for a group of actual stored procedures. I give the methods the same name
as the stored procedures, I give each method the same parameter names as the
stored procedure.

A DAL class has to mange the connection, so in this example I have a static
initialiser that gets the connection string from the config file and stores
that.

        private static string connectionString;
        static Dal()         {             connectionString =
                ConfigurationManager.AppSettings["ConnectionString"];         }

On each query a new connection object is created. The reason for this is that
the recommendation is that you acquire-query-release. Acquire a connection,
perform your query, then release the connection back to the pool.

            cmd.Connection.Open();
            int result = (int)cmd.ExecuteScalar();
            cmd.Connection.Close();

The connection Open / Close calls are not necessary when using a DataAdapter
as it will open the connection and then close it again in the Fill method. If
the connection was open before Fill is called then it will stay open.

There are two helper methods that are not publicly available. They are
BuildCommand and BuildBasicQuery.

BuildCommand creates the connection and command object and attaches the
connection to the command. It also tells the command the name of the stored
procedure that is to be called.

BuildBasicQuery uses BuildCommand, but then attaches the command to a
DataAdapter so that it can be used to obtain query results.

The full code of the sample DAL is below:

using System;
using System.Collections.Generic;
using System.Text;
using System.Configuration;
using System.Data.Common;
using System.Data.SqlClient;
using System.Data;

namespace Cam.DataAbstractionLayer
{
    public static class Dal
    {
        private static string connectionString;

        /// <summary>
        /// The static initialiser will be called the first time any
        /// method on the class is called. This ensures that the
        /// connection string is available.
        /// </summary>
        static Dal()
        {
            connectionString =
                ConfigurationManager.AppSettings["ConnectionString"];
        }


        /// <summary>
        /// Builds the command object with an appropriate connection and
        /// sets the stored procedure name.
        /// </summary>
        /// <param name="storedProcedureName">The name of the stored
        /// procedure</param>
        /// <returns>The command object</returns>
        private static SqlCommand BuildCommand(string storedProcedureName)
        {
            // Create a connection to the database.
            SqlConnection connection = new SqlConnection(connectionString);

            // Create the command object - The named stored procedure
            // will be called when needed.
            SqlCommand result = new SqlCommand(storedProcedureName,
                connection);
            result.CommandType = CommandType.StoredProcedure;
            return result;
        }

        /// <summary>
        /// Builds a DataAdapter that can be used for retrieving the
        /// results of queries
        /// </summary>
        /// <param name="storedProcedureName">The name of the stored
        /// procedure</param>
        /// <returns>A data adapter</returns>
        private static SqlDataAdapter BuildBasicQuery(
            string storedProcedureName)
        {
            SqlCommand cmd = BuildCommand(storedProcedureName);

            // Set up the data adapter to use the command already setup.
            SqlDataAdapter result = new SqlDataAdapter(cmd);
            return result;
        }

        /// <summary>
        /// A sample public method. There are no parameters, it simply
        /// calls a stored procedure that retrieves all the products
        /// </summary>
        /// <returns>A DataTable containing the product data</returns>
        public static DataTable GetAllProducts()
        {
            SqlDataAdapter dataAdapter = BuildBasicQuery("GetAllProducts");

            // Get the result set from the database and return it
            DataTable result = new DataTable();
            dataAdapter.Fill(result);
            return result;
        }

        /// <summary>
        /// A sample public method. It takes one parameter which is
        /// passed to the database.
        /// </summary>
        /// <param name="invoiceNumber">A number which identifies the
        /// invoice</param>
        /// <returns>A dataset containing the details of the required
        /// invoice</returns>
        public static DataSet GetInvoice(int invoiceNumber)
        {
            SqlDataAdapter dataAdapter = BuildBasicQuery("GetInvoice");
            dataAdapter.SelectCommand.Parameters.AddWithValue(
                "@invoiceNumber", invoiceNumber);

            DataSet result = new DataSet();
            dataAdapter.Fill(result);
            return result;
        }

        /// <summary>
        /// A sample public method. Creates an invoice in the database
        /// and returns the invoice number to the calling code.
        /// </summary>
        /// <param name="customerId">The id of the customer</param>
        /// <param name="billingAddressId">The id of the billing
        /// address</param>
        /// <param name="date">The date of the invoice</param>
        /// <returns>The invoice number</returns>
        public static int CreateInvoice(int customerId,
            int billingAddressId, DateTime date)
        {
            SqlCommand cmd = BuildCommand("CreateInvoice");
            cmd.Parameters.AddWithValue("@customerId", customerId);
            cmd.Parameters.AddWithValue("@billingAddressId",
                billingAddressId);
            cmd.Parameters.AddWithValue("@date", date);

            cmd.Connection.Open();
            int result = (int)cmd.ExecuteScalar();
            cmd.Connection.Close();
            return result;
        }
    }
}

So, that is that. A very basic introduction to creating a DAL.

Tags:

SQL Injection Attacks

Every day I see messages on various forums asking for help with SQL. Nothing wrong with that. People want to understand how something works, or have a partial understanding but something is keeping them from completing their task. However, I frequently also see messages that have SQL statements being built in C# or VB.NET that are extremely susceptible to injection attack. Sometimes it is from the original poster and, while they really need to learn to defend their systems, that is fine as they are trying to learn. Nevertheless there is also a proportion of people responding to these questions that give advice that opens up gaping security holes in the original poster’s system, if they follow that advice.

Consider this following example:

C#

static DataSet GetCustomersFromCountry(string countryName)
{
    SqlConnection conn = new SqlConnection("Persist Security Info=False;"+
        "Integrated Security=SSPI;database=northwind;server=(local)");
    string commandText = string.Format("SELECT * FROM Customers WHERE Country='{0}'",
        countryName);
    SqlCommand cmd = new SqlCommand(commandText, conn);
    SqlDataAdapter da = new SqlDataAdapter(cmd);
    DataSet ds = new DataSet();
    da.Fill(ds);
    return ds;
}

VB.NET

Function GetCustomersFromCountry(ByVal countryName As String) As DataSet
    Dim conn As SqlConnection = New SqlConnection("Persist Security Info=False;" + _
        "Integrated Security=SSPI;database=northwind;server=(local)")
    Dim commandText As String = String.Format( _
        "SELECT * FROM Customers WHERE Country='{0}'", _
        countryName)
    Dim cmd As SqlCommand = New SqlCommand(commandText, conn)
    Dim da As SqlDataAdapter = New SqlDataAdapter(cmd)
    GetCustomersFromCountry = New DataSet
    da.Fill(GetCustomersFromCountry)
End Function

What happens here is that what ever the value of countryName is will be inserted (injected, if you prefer) directly into the SQL string. More often than not I see examples of code on forums where there has been absolutely no checking done and the developer has used countryNameTextBox.Text directly in the string format or concatenation statement. In these cases just imagine what the effect of various unrestricted text box entries might be.

For instance, imagine the values a malicious user might put in the text box on a web form. What if they type ';DROP TABLE Customers;-- ?

That would expand the full SQL Statement passed by the .NET application to be

SELECT * FROM Customers WHERE Country='';DROP TABLE Customers; -- '

So, no more customers (at least in the database… But how long in real life?)

Some people might then say, sure, but who in their right mind would give that kind of access on a SQL Server to the ASP.NET account? If you ask that question then you cannot have seen the number of people who post code with the connection strings clearly showing that, firstly, they are using the sa account for their web application and, secondly, by posting their problem to a forum they have given to the world the password of their sa account.

Some others might say, yes yes yes, but wouldn’t an attacker would have to know what the overall SQL statement is before they can successfully inject something? Not so, I say. If you look at code posted on forums it becomes obvious that the vast majority of values from textboxes are inserted right after an opening apostrophe, like the example above. Based on that assumption, all an attacker needs to do is close the apostrophe, add a semi-colon and then inject the code they want. Finally, just to make sure that any remaining SQL from the original statement is ignored they add a couple of dashes (comment markers in SQL)

These defenders-of-bad-SQL-because-you-can-never-completely-secure-your-system-anyway-so-why-bother will often follow up with, okay okaay! But the attacker would have to know the structure of the database as well! Well, maybe not. Normally there are common table names. I’m sure most people that have been dealing with databases for a few years will have come across many with tables with the same names. Customers, Users, Contacts, Orders, Suppliers are common business table names. If that doesn’t work it may be possible to inject an attack on sysobjects. Often an attacker just gets lucky or notices a quirky output when entering something unusual and uses that to work on cracking the web site or database.

So here I present three tips for improving the security of your SQL Server database. In no particular order, they are: Use parameterised queries. Login using an appropriate account and grant only the permissions necessary. Use stored procedures.

* Using parameterised queries is really very simple, and it can make your code easier to read, and therefore to maintain. Parameters also have other advantages too (for instance you can receive values back from parameters, not just use them for sending information into the query). The previous code example can be changed very easily to use parameters. For instance:

C#

static DataSet GetCustomersFromCountry(string countryName)
{
    SqlConnection conn = new SqlConnection("Persist Security Info=False;"+
        "Integrated Security=SSPI;database=northwind;server=(local)");
    string commandText = "SELECT * FROM Customers WHERE Country=@CountryName";
    SqlCommand cmd = new SqlCommand(commandText, conn);
    cmd.Parameters.Add("@CountryName",countryName);
    SqlDataAdapter da = new SqlDataAdapter(cmd);
    DataSet ds = new DataSet();
    da.Fill(ds);
    return ds;
}

VB.NET

Function GetCustomersFromCountry(ByVal countryName As String) As DataSet
    Dim conn As SqlConnection = New SqlConnection("Persist Security Info=False;" + _
        "Integrated Security=SSPI;database=northwind;server=(local)")
    Dim commandText As String = "SELECT * FROM Customers WHERE Country=@CountryName"
    Dim cmd As SqlCommand = New SqlCommand(commandText, conn)
    cmd.Parameters.Add("@CountryName", countryName)
    Dim da As SqlDataAdapter = New SqlDataAdapter(cmd)
    GetCustomersFromCountry = New DataSet
    da.Fill(GetCustomersFromCountry)
End Function

* The application should be set up to use a specific account when accessing the SQL Server. That account should then be given access to only the things it needs. For instance:

GRANT SELECT ON Customers TO AspNetAccount

It is generally unwise to GRANT permission ON someObject TO PUBLIC because then everyone has the permission.

* My final tip is to use only stored procedures for selecting and modifying data, because then the code that accesses the tables is controlled on SQL server. You then do not need to grant access directly to the tables, only the stored procedures that are called. The extra protection then comes by virtue of the fact that the only operations that can be performed are those that the stored procedures allow. They can perform additional checks and ensure that relevant related tables are correctly updated.

NOTE: This was rescued from the Wayback Machine. The original post was dated Saturday, 25th September 2004.

Tags:


Original posts:

Excellent post Colin. I’d always wondered what a SQL Injection attack was without actually bothering to Google for it – and now I know.

9/26/2004 6:08 AM | Rob

this is great job, colin. i’ve just know the words ‘SQl injection’ but dont know what it exactly means. now i’ve got it. i can defend my own database.

thanks

10/3/2004 3:41 PM | Fired Dragon

I’ve seen people who’ve read this article thinking they can’t do it because you’ve only given .NET syntax. You should probably emphasize that ‘classic’ ADO also has Command objects with a Parameters collection, as does the more obscure underlying OLE DB object model. You can also use parameters with ODBC. There’s no excuse – parameters are cleaner, more secure, and less prone to error than building a SQL string. The slight disadvantage is that the code becomes a little less portable – SQL Server uses the @param notation, Oracle uses :param, and the OleDbCommand and OdbcCommand both use positional parameters marked with a ?

12/18/2004 1:01 AM | Mike Dimmick

I just can’t get this to work. I do exactly as the example shows, and the query doesn’t replace the parameters with the values to search for. The query came up with nothing, until I entered a row in the database with the name of the parameter as a field. It finds that, so it just doesn’t parse the command. I don’t get it.

7/27/2005 10:46 AM | Vesa Ahola

Since I cannot see your code I cannot see what is going wrong. However, I wrote a longer article about the subject over on codeproject.com. Perhaps that may help. See:

http://www.codeproject.com/cs/database/SqlInjectionAttacks.asp

7/27/2005 4:28 PM | Colin Angus Mackay

Things I keep forgetting about FileInfo

This is going to sound like a real newbie post. But I keep forgetting this particular bit of information and I keep having to write little throw away applications to find out the answer. FileInfo has a number of properties and the MSDN description on them are almost next to useless.

Given the file C:\folder\file.ext

So there we have it. The things that I keep forgetting about FileInfo that MSDN just does not explain (except the Extension property, it at least does explain that one)

NOTE: This was rescued from the Google Cache. The original was dated Monday, 28th March 2005.

The simplicity of nullable types

I just discovered nullable types. Wow! They are really simple and such a powerful feature. Just see for yourself….

If you have an int or a DateTime or any other value type you’ll already know that you cannot assign null to them. But in C#2.0 you can.

You can define a nullable int by adding a question mark to the end of the type like this:

int? a = null;

However, you’ll want the new code to operate with old code which hasn’t yet been upgraded to use nullable types, so there is a new binary operator to help. The ?? (I’ve no idea how your meant to pronounce that. I just say “Double question mark”)

So, if you want to assign a (above) to a regular int you can do the following:

int b = a ?? -1;

If a is non-null then b is assigned the same value as a. If a is null then b is assigned the value on the right side of the ??. So, just like the old days where you’d make up a value to represent null for an integer (I normally used int.MinValue)

NOTE: This was rescued from the google cache. The original date was Friday, 9th June, 2006.

Tags: