
What is a DAL (Part 3)

In this continuation of my series on the DAL I’m going to show how to create several DALs and have a factory class instantiate the correct one based on settings in a config file.

One of the purposes of the DAL is to shield the application from the details of accessing the database. You may have a system that has to talk to different databases depending on the way the software is installed. Typically this is most useful to creators of components that need to plug into many different database systems.

The DAL will no longer be a singleton, but will rely on a factory to create the correct DAL class. This is most useful when the data may be coming from different sources. It can also be useful when your application is in development and the database isn’t ready, because you can create a mock DAL and return that instead of the real thing.

UML diagram showing multiple DALs

The above UML diagram shows the base DAL that was introduced in the last article in this series. Again, it is an abstract class as it does not contain enough functionality to be instantiated on its own. However, it is now joined by two additional abstract classes. These add a bit more functionality specific to the target database, but still not enough to be useful as objects in their own right.

Finally at the bottom layer are the various concrete DAL classes that act as the proxies to the database.

So that the calling application doesn’t need to know which database it is dealing with, the concrete classes implement interfaces. That way the application receives a reference to an IPolicyDal, for example, without having to know whether it is connected to an Oracle database or a SQL Server database (or any other future database you may need to support).
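
As an illustration, the interface and a mock implementation might look something like this sketch (the member names are hypothetical; the article doesn’t define what IPolicyDal contains):

using System.Data;

namespace Dal
{
    /// <summary>
    /// The contract the application codes against. The caller never
    /// needs to know which concrete DAL is behind it.
    /// </summary>
    public interface IPolicyDal
    {
        // Hypothetical member for illustration only.
        DataTable GetPolicy(int policyNumber);
    }

    /// <summary>
    /// A mock DAL returning canned data - handy while the real
    /// database is still under development.
    /// </summary>
    public class MockPolicyDal : IPolicyDal
    {
        public DataTable GetPolicy(int policyNumber)
        {
            DataTable result = new DataTable();
            result.Columns.Add("PolicyNumber", typeof(int));
            result.Columns.Add("HolderName", typeof(string));
            result.Rows.Add(policyNumber, "Test Holder");
            return result;
        }
    }
}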

The factory class, not shown in the UML diagram above, picks up information in a config file which tells it which specialised DAL to actually create. It returns that specialised instance through a reference to the interface. This means that the calling code does not need to know what specialised class has actually been created. Nor should it need to care.

using System;
using System.Collections.Generic;
using System.Text;
using System.Reflection;

namespace Dal
{
    /// <summary>
    /// Creates and dispenses the correct DAL depending on the configuration
    /// </summary>
    public class DalFactory
    {
        private static IPolicyDal thePolicyDal = null;
        private static IClaimDal theClaimDal = null;

        /// <summary>
        /// Gets the policy DAL
        /// </summary>
        /// <returns>A dal that implements IPolicyDal</returns>
        public static IPolicyDal GetPolicyDal()
        {
            if (thePolicyDal == null)
            {
                string policyDalName = DalConfig.PolicyDal;
                Type policyDalType = Type.GetType(policyDalName);
                thePolicyDal =
                    (IPolicyDal)Activator.CreateInstance(policyDalType);
            }
            return thePolicyDal;
        }

        /// <summary>
        /// Gets the claim DAL
        /// </summary>
        /// <returns>A DAL that implements IClaimDal</returns>
        public static IClaimDal GetClaimDal()
        {
            if (theClaimDal == null)
            {
                string claimDalName = DalConfig.ClaimDal;
                Type claimDalType = Type.GetType(claimDalName);
                theClaimDal =
                    (IClaimDal)Activator.CreateInstance(claimDalType);
            }
            return theClaimDal;
        }
    }
}

The factory stores each DAL instance as it is created so that if a Get...() method is called repeatedly it always returns the same instance. The DAL classes should not store any state, so this is a safe thing to do.

The DalConfig class is a very simple static class that consists of a number of property getters that retrieve the information out of the config file. The PolicyDal and ClaimDal properties return the fully qualified name of the specialised DAL class to instantiate.
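
The DalConfig class itself isn’t shown here, but a minimal sketch might look like this, assuming the class names live in the appSettings section of the config file (the key names are my own invention):

using System.Configuration;

namespace Dal
{
    /// <summary>
    /// Reads the names of the DAL classes out of the config file.
    /// </summary>
    public static class DalConfig
    {
        /// <summary>
        /// The fully qualified name of the policy DAL class to create,
        /// e.g. "Dal.SqlServerPolicyDal".
        /// </summary>
        public static string PolicyDal
        {
            get { return ConfigurationManager.AppSettings["PolicyDal"]; }
        }

        /// <summary>
        /// The fully qualified name of the claim DAL class to create.
        /// </summary>
        public static string ClaimDal
        {
            get { return ConfigurationManager.AppSettings["ClaimDal"]; }
        }
    }
}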


What is a DAL? (Part 2)

In my last post about the DAL I presented a very basic DAL based on a static class. This is fine for small applications, but for larger applications the limitations will soon start to show themselves.

In this post I’ll convert the DAL to a series of singletons so that each singleton operates on a different part of the database.

First of all, I should explain what a singleton is. A singleton is a class that has only one instance. The way that single instance is generated is controlled from the class itself.

When some code wants to use the singleton it calls a static property called Instance to get the one and only instance of the class. The property checks to see if there is already an instance available and, if not, creates it. This instance is stored in a private static field on the class. To ensure that only one instance of the class is ever created, the constructor is made private so that nothing outside the class can construct it.

The following is the DAL with just the parts that make it a singleton:

    public class InvoiceDal
    {
        /// <summary>
        /// A private field to hold the only instance of this class
        /// </summary>
        private static InvoiceDal instance;

        /// <summary>
        /// A private constructor so that the only thing that can
        /// construct this class is itself
        /// </summary>
        private InvoiceDal()
        {
            // Do initialisation stuff here.
        }

        /// <summary>
        /// A public property that retrieves the only instance of the
        /// class
        /// </summary>
        public static InvoiceDal Instance
        {
            get
            {
                if (instance == null)
                    instance = new InvoiceDal();
                return instance;
            }
        }
    }
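
One caveat, and this note is my addition rather than part of the original article: this lazy initialisation is not thread-safe. If two threads pass the null check at the same time, two instances can be constructed. In a multi-threaded application the check can be guarded with a lock, along these lines:

    public class InvoiceDal
    {
        private static InvoiceDal instance;

        // An object to lock on while checking and creating the instance.
        private static readonly object padlock = new object();

        private InvoiceDal()
        {
        }

        public static InvoiceDal Instance
        {
            get
            {
                // Serialise access so that two threads cannot both see
                // a null instance and construct the class twice.
                lock (padlock)
                {
                    if (instance == null)
                        instance = new InvoiceDal();
                    return instance;
                }
            }
        }
    }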

Now the methods from the original static DAL (see previous post) can be carried over almost as they are. It is just the static keyword that has to be removed.

But, the point of this post is to refactor the old static DAL into a number of DAL classes where each class represents some component. The part given already represents the invoicing system. The next part will be for the stock inventory system.

It should, hopefully, be obvious that part of the DAL given already will be needed again. The lazy way is to just copy and paste the methods into a new DAL class. This is wasteful and the poor sod that has to maintain the code will not thank you. In practice what tends to happen is that, in your absence, you will be the butt of all jokes and your name will be dragged through the mud. This is something you don’t want to happen to you.

So, instead, an abstract base class will be created which contains the common parts of the DAL. Certain decisions need to be made about what goes in the base class. For example, is the base class responsible for figuring out what connection string to use? If you are only going to connect to one database then it certainly does seem to be a good choice. What if you need to (or expect to) connect to multiple databases? In those situations the derived class may be the best place to figure out the connection string. For this example, I’ll assume that multiple databases are in use, but all from the same type of database system (e.g. different databases running on SQL Server). I’ll come on to different database systems later in the series.

The abstract base class looks like this:

using System.Data;
using System.Data.SqlClient;

public abstract class DalBase
{
    // Protected rather than private so that a derived class can set it,
    // and an instance field rather than static so that each derived DAL
    // can point at a different database.
    protected string connectionString;

    protected DalBase()
    {
    }

    /// <summary>
    /// Builds the command object with an appropriate connection and sets
    /// the stored procedure name.
    /// </summary>
    /// <param name="storedProcedureName">The name of the stored
    /// procedure</param>
    /// <returns>The command object</returns>
    protected SqlCommand BuildCommand(string storedProcedureName)
    {
        // Create a connection to the database.
        SqlConnection connection = new SqlConnection(connectionString);

        // Create the command object - The named stored procedure
        // will be called when needed.
        SqlCommand result = new SqlCommand(storedProcedureName,
            connection);
        result.CommandType = CommandType.StoredProcedure;
        return result;
    }

    /// <summary>
    /// Builds a DataAdapter that can be used for retrieving the results
    /// of queries
    /// </summary>
    /// <param name="storedProcedureName">The name of the stored
    /// procedure</param>
    /// <returns>A data adapter</returns>
    protected SqlDataAdapter BuildBasicQuery(string storedProcedureName)
    {
        SqlCommand cmd = BuildCommand(storedProcedureName);

        // Set up the data adapter to use the command already setup.
        SqlDataAdapter result = new SqlDataAdapter(cmd);
        return result;
    }
}

And the derived class looks like this:

using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

public class InvoiceDal : DalBase
{

    /// <summary>
    /// A private field to hold the only instance of this class
    /// </summary>
    private static InvoiceDal instance;

    /// <summary>
    /// A private constructor so that the only thing that can
    /// construct this class is itself
    /// </summary>
    private InvoiceDal()
    {
        connectionString =
            ConfigurationManager.AppSettings["InvoiceConnectionString"];
    }

    /// <summary>
    /// A public property that retrieves the only instance of the class
    /// </summary>
    public static InvoiceDal Instance
    {
        get
        {
            if (instance == null)
                instance = new InvoiceDal();
            return instance;
        }
    }

    /// <summary>
    /// A sample public method. There are no parameters, it simply calls
    /// a stored procedure that retrieves all the products
    /// </summary>
    /// <returns>A DataTable containing the product data</returns>
    public DataTable GetAllProducts()
    {
        SqlDataAdapter dataAdapter = BuildBasicQuery("GetAllProducts");

        // Get the result set from the database and return it
        DataTable result = new DataTable();
        dataAdapter.Fill(result);
        return result;
    }

    /// <summary>
    /// A sample public method. It takes one parameter which is passed
    /// to the database
    /// </summary>
    /// <param name="invoiceNumber">A number which identifies the
    /// invoice</param>
    /// <returns>A dataset containing the details of the required
    /// invoice</returns>
    public DataSet GetInvoice(int invoiceNumber)
    {
        SqlDataAdapter dataAdapter = BuildBasicQuery("GetInvoice");
        dataAdapter.SelectCommand.Parameters.AddWithValue(
            "@invoiceNumber", invoiceNumber);

        DataSet result = new DataSet();
        dataAdapter.Fill(result);
        return result;
    }

    /// <summary>
    /// A sample public method. Creates an invoice in the database and
    /// returns the invoice number to the calling code.
    /// </summary>
    /// <param name="customerId">The id of the customer</param>
    /// <param name="billingAddressId">The id of the billing
    /// address</param>
    /// <param name="date">The date of the invoice</param>
    /// <returns>The invoice number</returns>
    public int CreateInvoice(int customerId, int billingAddressId,
        DateTime date)
    {
        SqlCommand cmd = BuildCommand("CreateInvoice");
        cmd.Parameters.AddWithValue("@customerId", customerId);
        cmd.Parameters.AddWithValue("@billingAddressId",
            billingAddressId);
        cmd.Parameters.AddWithValue("@date", date);

        // Open the connection just long enough to run the command, and
        // make sure it is closed again even if the call throws.
        cmd.Connection.Open();
        try
        {
            return (int)cmd.ExecuteScalar();
        }
        finally
        {
            cmd.Connection.Close();
        }
    }

}

It is now possible to create more of these derived classes for various parts of the application all sharing the same common code in the base class.
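
For instance, a DAL for the stock inventory system mentioned earlier might start out like this (a sketch only; the stored procedure and config key names are placeholders of my own):

public class StockDal : DalBase
{
    /// <summary>
    /// A private field to hold the only instance of this class
    /// </summary>
    private static StockDal instance;

    private StockDal()
    {
        connectionString =
            ConfigurationManager.AppSettings["StockConnectionString"];
    }

    /// <summary>
    /// A public property that retrieves the only instance of the class
    /// </summary>
    public static StockDal Instance
    {
        get
        {
            if (instance == null)
                instance = new StockDal();
            return instance;
        }
    }

    /// <summary>
    /// Retrieves the current stock level of a single product.
    /// </summary>
    public DataTable GetStockLevel(int productId)
    {
        SqlDataAdapter dataAdapter = BuildBasicQuery("GetStockLevel");
        dataAdapter.SelectCommand.Parameters.AddWithValue(
            "@productId", productId);

        DataTable result = new DataTable();
        dataAdapter.Fill(result);
        return result;
    }
}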


db4o: An Embeddable Database Engine for Object-Oriented Environments

Just posted details about this event for Scottish Developers in Glasgow.

The unique design of db4o’s native object database engine makes it the ideal choice to be embedded in equipment and devices, in packaged software running on mobile or desktop platforms, or in real-time control systems – in short: in all Java and .NET environments, where no database administrator (DBA) is present. db4o is used by some of the world’s most innovative companies, including Ricoh, Boeing, BMW, Bosch, Seagate, and Intel.

Jim Paterson describes the key features of db4o, and shows how to save objects in the database, use the db4o query mechanisms, control transactions and object activation, refactor databases, replicate data to a relational database and use db4o in a web app. You will also find out about the extensive developer resources provided by the db4o community.

Jim is a Lecturer in the School of Engineering and Computing at Glasgow Caledonian University. He is an author of “The Definitive Guide to db4o”, the first book to be published about db4o, and has written articles for the onJava and ASP Today websites. He has a particular interest in teaching software development. He has a PhD in Physics, and worked previously as a research physicist and as a software engineer.

Map of the venue showing nearby transport links

Room 16 is on Level 2 of the Continuing Professional Development Centre. The CPDC is on the left side (just past the bookshop) as you enter the university from the pedestrian entrance on Cowcaddens Road.

The event is FREE and all are welcome.


Normalising the data model

Sometimes I see on forums someone who is trying to get some SQL statement to yield data in a particular way but the data model is just thwarting their attempts, or, if they do get something to work, the SQL statement that does the job is horrendously complex. This tends to happen because the data is not normalised (or “normalized” if you are using the American spelling) to the third normal form. Denormalised data models tend to happen for two reasons: firstly, because the modeller is inexperienced and does not realise the faux pas they have made in the model; and, secondly, because the modeller has found that a properly normalised data model just doesn’t have the performance needed to do the job required.

The Scenario

In the example scenario that I am going to present, a private education company is looking to build a system that helps track their tutors and students. So as not to be overwhelming I am only going to concentrate on one aspect of the system – the tutor. A tutor may be multilingual and can teach in a variety of languages, and they may also be able to teach a number of subjects. The Tutors table has joins to a table for languages and a table for subjects. The model looks like this:

Partially denormalised data model

As you can see there are 3 joins from Tutors to Languages and 4 joins from Tutors to Subjects. This makes joins between these tables particularly complex. For example, to find out the languages that a tutor speaks, a query like this would have to be formed:

SELECT  l1.Name AS LanguageName1,
        l2.Name AS LanguageName2,
        l3.Name AS LanguageName3
FROM Tutors AS t
LEFT OUTER JOIN Languages AS l1 ON l1.LanguageID = t.Language1
LEFT OUTER JOIN Languages AS l2 ON l2.LanguageID = t.Language2
LEFT OUTER JOIN Languages AS l3 ON l3.LanguageID = t.Language3
WHERE t.TutorID = @TutorID

So, what happens if the tutor is fluent in more than three languages? Either the system cannot accept the fourth language, or it will have to be changed to accommodate it. If the latter option is chosen, imagine the amount of work needed to make that change.

A similar situation occurs with the join to the Subjects table.

Solution

A better way to handle this sort of situation is with a many-to-many join. Many database systems cannot directly create a many-to-many join between two tables and must create an intermediate table. For those database systems that appear to be able to model a many-to-many join directly (GE-Smallworld comes to mind) what is actually happening is that an intermediate table is being created in the background that isn’t normally visible and the database takes care of this automatically.

The resulting data model will look like this:

Normalised data model
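
In SQL Server the intermediate TutorLanguage table might be created like this (a sketch; the column types are assumed from the diagram):

-- A sketch of the intermediate (junction) table. The composite
-- primary key stops the same tutor/language pair being added twice.
CREATE TABLE TutorLanguage
(
    TutorID    int NOT NULL
        REFERENCES Tutors (TutorID),
    LanguageID int NOT NULL
        REFERENCES Languages (LanguageID),
    PRIMARY KEY (TutorID, LanguageID)
)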

This allows a tutor to be able to register any number of languages or subjects. It also means that any joins on the data are easier as there are no duplicate joins for each Language or Subject. The above SELECT statement can be rewritten as:

SELECT  l.Name AS LanguageName
FROM Tutors AS t
INNER JOIN TutorLanguage AS tl ON tl.TutorID = t.TutorID
INNER JOIN Languages AS l ON tl.LanguageID = l.LanguageID
WHERE t.TutorID = @TutorID

This will result in one row being returned for each language rather than all the languages being returned in one row. It is possible to pivot the results back to one row, but in SQL Server 2000 that would add more complexity to the query than I am willing to discuss in this article. If you want to know how to pivot results in SQL Server 2000 then see the page on Cross-Tab Reports in the SQL Server Books Online. SQL Server 2005 allows PIVOTed results directly. For more information on the differences between the SQL Server 2000 and 2005 ways of doing things see: Pivot (or Unpivot) Your Data – Windows IT Pro

Migrating existing data

Naturally, if you have existing data using the denormalised schema and you want to migrate it to the normalised schema then you will need to be careful about the order in which changes are made lest you lose your data.

  1. Create the intermediate table.
  2. Change any stored procedures using the denormalised schema to the normalised schema.
    • You may also need to change code outside the database. If you find yourself needing to do this then I strongly recommend that you read about the benefits of stored procedures.
  3. Perform an insert for each of the denormalised joins into the intermediate table.
  4. Remove the old joins.

If possible the above should be scripted so that the database changes occur as quickly as possible as, depending on your situation, you may have to take your production system off-line while making the change. Testing the changes in a development environment first should ensure that the scripts are written well and don’t fall over when being run on the production database.

To move the denormalised Language joins to the normalised schema some SQL like this can be used.

INSERT INTO TutorLanguage (TutorID, LanguageID)
    SELECT TutorID, Language1 AS LanguageID
    FROM Tutors
    WHERE Language1 IS NOT NULL
UNION
    SELECT TutorID, Language2 AS LanguageID
    FROM Tutors
    WHERE Language2 IS NOT NULL
UNION
    SELECT TutorID, Language3 AS LanguageID
    FROM Tutors
    WHERE Language3 IS NOT NULL

It can, of course, be written as a series of individual INSERT INTO…SELECT statements rather than one large UNIONed SELECT.
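
For example, the first of those three statements would look like this, with the other two differing only in the language column used:

INSERT INTO TutorLanguage (TutorID, LanguageID)
SELECT TutorID, Language1
FROM Tutors
WHERE Language1 IS NOT NULL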

NOTE: This was rescued from the Google Cache. The original date was Sunday 3rd April 2005.


Types of join

Occasionally there is a post on a forum asking what a certain type of join is all about, so I thought it would probably be good to have a stock explanation to refer people to so that I don’t re-write near enough the same response each time the question arises.

First, let’s consider these two tables.

A

Key         Data
----------- ----------
1           a
2           b

B

Key         Data
----------- ----------
1           c
3           d

We can see that the only match is where Key is 1.

INNER JOIN

In an INNER JOIN that will be the only thing returned. If we use the query

SELECT A.[Key] AS aKey, A.Data AS aData, B.[Key] AS bKey, b.Data AS bData
FROM A
INNER JOIN B ON a.[Key] = b.[Key]

the returned set will be

aKey        aData      bKey        bData
----------- ---------- ----------- ----------
1           a          1           c

In the case of the various outer joins non-matches will be returned also.

LEFT OUTER JOIN

In a LEFT OUTER JOIN everything on the left side will be returned. Any matches on the right side will be returned also, but if there is no match on the right side then nulls are returned instead.

The query

SELECT A.[Key] AS aKey, A.Data AS aData, B.[Key] AS bKey, b.Data AS bData
FROM A
LEFT OUTER JOIN B ON a.[Key] = b.[Key]

returns

aKey        aData      bKey        bData
----------- ---------- ----------- ----------
1           a          1           c
2           b          NULL        NULL

RIGHT OUTER JOIN

The RIGHT OUTER JOIN is very similar to the LEFT OUTER JOIN, except that, of course, the matching is reversed. Everything on the right side is returned, and only matches on the left side are returned. Any non-matches will be filled with nulls on the left side.

The query

SELECT A.[Key] AS aKey, A.Data AS aData, B.[Key] AS bKey, b.Data AS bData
FROM A
RIGHT OUTER JOIN B ON a.[Key] = b.[Key]

returns

aKey        aData      bKey        bData
----------- ---------- ----------- ----------
1           a          1           c
NULL        NULL       3           d

FULL OUTER JOIN

A FULL OUTER JOIN returns a set containing all rows from either side, matched if possible, but nulls put in place if not.

The query

SELECT A.[Key] AS aKey, A.Data AS aData, B.[Key] AS bKey, b.Data AS bData
FROM A
FULL OUTER JOIN B ON a.[Key] = b.[Key]

returns

aKey        aData      bKey        bData
----------- ---------- ----------- ----------
1           a          1           c
2           b          NULL        NULL
NULL        NULL       3           d

CROSS JOIN

The CROSS JOIN doesn’t obey the same set of rules as the other joins. This is because it doesn’t care about matching rows from either side, so there is no ON qualifier within the join clause. This is a simple join that joins every row on the left side to every row on the right side. Where the inner join and the left/right outer joins cannot return more rows than exist in the most populous of the source tables (assuming, as here, that the join columns are unique), and the full outer join’s maximum result set is the sum of the source rows, the CROSS JOIN returns the product of the rows from each side. If you have 5 rows in Table A, and 6 rows in Table B, it will return a set containing 30 rows.

The query

SELECT A.[Key] AS aKey, A.Data AS aData, B.[Key] AS bKey, b.Data AS bData
FROM A
CROSS JOIN B

returns

aKey        aData      bKey        bData
----------- ---------- ----------- ----------
1           a          1           c
2           b          1           c
1           a          3           d
2           b          3           d

NOTE: This was rescued from the Google Cache: The original date was Monday, 27th February 2006.


The Normal Forms of Database Normalization

I’m doing a clear out at the moment as I’m getting the house ready to sell – I’m planning to move to West Lothian – and I came across some old notes about database normalisation. They summarise the first five normal forms, so I thought I’d share them.

First Normal Form: Every column contains just one value. That means that if you have a person’s name it should be split up to forename and surname, an address is split up so that the city and postcode are in separate columns, and so on.

Second Normal Form: The First Normal Form plus every table has a primary key. That means that everything can be uniquely referenced through the primary key.

Third Normal Form: The Second Normal Form plus every non-key column depends on the primary key. So, for example, if you have an Orders table then columns such as DateOfOrder and DateOfDispatch are valid, but listing the client’s details such as name, address and so on is not valid.

Fourth Normal Form: Remove repeating columns to a new table. Repeating columns are indicative of a missing relationship. So, if you have an Orders table it will not have item1, item2, and item3 columns. They will be removed to a separate table and form part of a join to the orders table.

Fifth Normal Form: Remove repetition within columns. Simply, this is enumerated values. For example, a company may accept Visa, Mastercard and American Express for payment. Rather than repeat those strings in the order payment table, they can be placed in a small lookup table and the order payment table can refer to its smaller primary key.

However, I should point out that my notes are at odds with other references that I found. It would seem that the correct Fifth Normal Form is to do with ternary many-to-many relationships. In fact, a quick search on the internet found that whatever I had written down as the Fifth Normal Form isn’t to be found anywhere. More research is in order to find out where my idea of the Fifth Normal Form comes from.

NOTE: This was rescued from the Google Cache. The original date was Saturday, 22nd April 2006.


Delete the row, or set a deleted flag?

For auditing purposes some database systems run without deleting rows from the database – they just flag the row as being deleted and ensure that queries that run on the table add a line to the WHERE clause to filter out the “deleted” rows.

However, as Daniel Turini points out on Code Project, this can hurt the database performance.

Specifically he says:

Bear in mind that having a flag leads to bad index performance. In my (rather old) SQL Server DO’s and DONT’s, I suggest you not to index on the “sex” or “gender” field:

First, let’s understand how indexes speed up table access. You can see indexes as a way of quickly partitioning a table based on a criteria. If you create an index with a column like “Sex”, you’ll have only two partitions: Male and Female. What optimization will you have on a table with 1,000,000 rows? Remember, maintaining an index is slow. Always design your indexes with the most sparse columns first and the least sparse columns last, e.g., Name + Province + Sex.

Creating a “deleted” flag will lead to bad performance in several common cases. What I’d suggest is to move those rows to a “Deleted” table, and have a view using UNION ALL so you can easily access all the data for auditing purposes.

Not to mention that it’s easy to forget an “AND NOT IsDeleted” in some of your queries.

However, others reckon the performance hit is negligible, or that moving the data to a new table could damage the data. It is an interesting debate and you can read it here, from the start.
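
As a sketch of the approach Daniel suggests (the table and column names here are invented purely for illustration), the auditing view might look like this:

-- Live rows stay in Orders; "deleted" rows are moved to a table
-- with the same columns. The view reunites them for auditing.
CREATE VIEW AllOrders
AS
    SELECT OrderID, CustomerID, OrderDate FROM Orders
    UNION ALL
    SELECT OrderID, CustomerID, OrderDate FROM DeletedOrders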

NOTE: This was rescued from the Google Cache. The original date was Thursday, 29th June 2006.
