How to get a value from a text box into the database

This question was asked on a forum and I took some time to construct a reasonably lengthy reply so I’m copying it to my blog for a bit of permanence.

I suspect that many of my regular readers will be dismayed at the lack of proper architecture (e.g. layering) but we all had to start somewhere and I suspect that your first programs were not properly layered or structured either. I know mine certainly weren’t. My aim with this was to show how a simple goal can be achieved, what basic things are needed and how to fit it all together by doing the simplest thing that would work (a mantra from the agile world).

Here’s the post (slightly edited to put back some of the original context):

Okay – Let’s step back and show the whole thing from text box to database. NOTE: this example shows everything in one place. This is generally considered poor practice, but as you are only just starting I’ll not burden you with the principles of layered architecture, the single responsibility principle and so on. (Just be aware they exist and one day you’ll have to learn about them.)

So, let’s say you have a form with two text boxes, one for a name, and one for an age. Lets call them NameTB and AgeTB. The user can enter information in these text boxes and press a button that adds them to the database.

First, we need to get the data from the text boxes into a form we can use.

string name = NameTB.Text;
int age = Convert.ToInt32(AgeTB.Text);

Since text boxes only deal with strings we have to convert the string into a number (an Int32 – a 32-bit integer) for the age value.

Now, we need to set up the database connection and command in order to insert this. I’ll assume you already have a connection string to your database; I’ve called it myConnectionString for this example.

SqlConnection myConnection = new SqlConnection(myConnectionString);
SqlCommand myCommand = new SqlCommand("INSERT Person(NameField, AgeField) "+
    "VALUES (@nameParam, @ageParam)", myConnection);

I’ve now set up the SQL Command with an insert statement. I’ve assumed there is a table called Person and it has two columns called NameField and AgeField. I’m also going to insert the values via parameters, which I’ve indicated with @nameParam and @ageParam. SQL Server requires that all parameter names start with an @ symbol. Other databases may vary.
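
For reference, a minimal sketch of the table this example assumes might look like this (the column types are a guess; adjust them to suit your data):

CREATE TABLE Person
(
    NameField nvarchar(100),   -- assumed type and length
    AgeField int
)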

myCommand.Parameters.AddWithValue("@nameParam", name);
myCommand.Parameters.AddWithValue("@ageParam", age);

We’ve now added the parameters into the SQL command and we’ve given each parameter the value we got earlier. Finally:

myConnection.Open();
myCommand.ExecuteNonQuery();
myConnection.Close();

This opens the connection, runs the INSERT statement and closes the connection again. We’re using ExecuteNonQuery because we don’t expect any results back from SQL Server. If we were expecting data back (e.g. because we were using a SELECT statement) we could use ExecuteReader (for many rows/columns) or ExecuteScalar (for a single value).

This is a very basic example. I’ve not shown any error checking or exception handling. There is also the implicit assumption that all this code resides inside a button click event, which is considered poor practice for anything but a small or throwaway application.

Tip of the Day #4 (Connection Strings in Config files)

From .NET 2.0 onwards a new and improved configuration management system has been put in place. You can now add a <connectionStrings> element to the config file, use it to hold the connection strings to the database, and then retrieve them in a consistent way in your application. It supports multiple connection strings too if you need to access multiple databases.

The config file looks like this:

<configuration>
...
   <connectionStrings>
    <add name="Default" connectionString="Server=(local);database=MyDatabase"/>
  </connectionStrings>
...
</configuration>

From the .NET application you can access the connection string like this:

string connectionString =
    ConfigurationManager.ConnectionStrings["Default"].ConnectionString;

Just remember to add a reference to System.Configuration in your project and ensure that the code file is using the System.Configuration namespace as well.

Creating Many-to-Many joins

A topic that comes up from time to time in forums is how to join two tables together when there is a many-to-many relationship. Typical examples include teachers-to-students or articles-to-tags (to create a “tag cloud”).

If you have made any relationships in a database you will see that it is very easy to create a one-to-many join. For example, a web forum may have many posts, but a post only belongs in one forum.

To create a many-to-many relationship you need to create an intermediate table. This is a table that each side of the many-to-many can have a one-to-many relationship with. The following diagram shows the many-to-many relationship between a blog post and the tags on it. (This is not a full model, just enough to show the relationship.)

[Diagram: the many-to-many relationship between BlogPost and Tag]

The BlogPost has its primary key (BlogPostId), as does the Tag (TagId). Normally you would see that key being used as the foreign key in the other table; however, that wouldn’t work with a many-to-many relationship.

In order to join the two tables together an “intermediate table” needs to be created that just contains the two primary keys from either side of the relationship. Those two foreign keys make up a compound* primary key in the intermediate table.

It is normal to name the intermediate table after each of the tables that form the relationship. In this case it would be “BlogPostTag”, after BlogPost and Tag.
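
A minimal sketch of what that intermediate table might look like (the constraint name is only illustrative):

CREATE TABLE BlogPostTag
(
    BlogPostId int NOT NULL REFERENCES BlogPost(BlogPostId),
    TagId int NOT NULL REFERENCES Tag(TagId),
    -- the two foreign keys together form the compound primary key
    CONSTRAINT PK_BlogPostTag PRIMARY KEY (BlogPostId, TagId)
)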

In order to join a row in the BlogPost table to a row in the Tag table you only need to insert a new row in the BlogPostTag table with the keys from either side. e.g.

INSERT BlogPostTag (BlogPostId, TagId) VALUES (@blogPostId, @tagId);

In order to remove the relationship between a blog post and a tag you only need to delete the row from the intermediate table. e.g.

DELETE BlogPostTag WHERE BlogPostId = @blogPostId AND TagId = @tagId;

* a “compound key” is one which is made up of more than one column.

Spatial Operations in SQL Server 2008 (Katmai) – Union and Convex Hull

CODE EXAMPLES IN THIS POST WORK WITH THE NOVEMBER 2007 CTP (CTP 5) OF SQL SERVER 2008.

Say you would like to create a polygon out of a group of points. One way of doing this is to union the points together and then create a convex hull from those points. A convex hull is the smallest convex polygon that contains all the points of the geometries that it is made from. “The convex hull may be easily visualized by imagining an elastic band stretched open to encompass the given object; when released, it will assume the shape of the required convex hull.” [Wikipedia:Convex Hull]

It is possible to create a convex hull from just two points; however, in this case you will end up with a linestring rather than a polygon, because a polygon requires a minimum of 3 points.

DECLARE @a geometry
DECLARE @b geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)',0),
       @b = geometry::STGeomFromText('POINT(10 10)', 0);

SELECT @a.STUnion(@b).STConvexHull().ToString();

Results in: LINESTRING (10 10, 0 0)

With an additional point a polygon can be created.

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)',0),
       @b = geometry::STGeomFromText('POINT(10 10)', 0),
       @c = geometry::STGeomFromText('POINT(20 0)', 0);

SELECT @a.STUnion(@b).STUnion(@c).STConvexHull().ToString();

Results in: POLYGON ((20 0, 10 10, 0 0, 20 0))

What you’ll notice is that the polygon has 4 points, but we only gave 3 to start with. That is because the first and last point in the polygon are the same.

If you were to look at the geometry that had been created with just the union operations before the convex hull was made then you’ll see it is a MultiPoint: MULTIPOINT ((10 10), (20 0), (0 0))

Unioning different types of geometry together, such as a point, a linestring and a polygon, will, if the geometries don’t overlap, result in a GeometryCollection. For instance the code:

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText(
            'POLYGON ((25 5, 15 15, 5 5, 25 5))',0),
       @b = geometry::STGeomFromText(
            'POINT(5 10)', 0),
       @c = geometry::STGeomFromText(
            'LINESTRING(20 20, 30 5)', 0);

SELECT  @a.STUnion(@b).STUnion(@c).ToString();

This will result in the following: GEOMETRYCOLLECTION (POINT (5 10), LINESTRING (20 20, 30 5), POLYGON ((5 5, 25 5, 15 15, 5 5)))

Moving the point to a position within the polygon, such as POINT(15 10), will result in a geometry collection that does not contain a separate point. As the point is within the boundary of the polygon it does not need to be separately listed in the geometry collection. The actual geometry looks like this: GEOMETRYCOLLECTION (LINESTRING (20 20, 30 5), POLYGON ((5 5, 25 5, 15 15, 5 5)))

Moving the linestring to travel from 5,10 to 30,10 (through the polygon) results in a geometry collection with two linestrings: one that runs from 5,10 to the boundary of the polygon at 10,10, and a second that runs from the boundary of the polygon at 20,10 to the original end point at 30,10. The resulting geometry collection looks like this: GEOMETRYCOLLECTION (LINESTRING (30 10, 20 10), LINESTRING (10 10, 5 10), POLYGON ((5 5, 25 5, 20 10, 15 15, 10 10, 5 5)))

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText(
            'POLYGON ((25 5, 15 15, 5 5, 25 5))',0),
       @b = geometry::STGeomFromText(
            'POINT(15 10)', 0),
       @c = geometry::STGeomFromText(
            'LINESTRING(5 10, 30 10)', 0);

SELECT @a.STUnion(@b).STUnion(@c).ToString();

Getting started with Spatial Data in SQL Server 2008

THIS POST REFERS TO THE NOVEMBER 2007 CTP (CTP 5) OF SQL SERVER 2008

This post is probably going to be a wee bit random. After the running around over the last couple of weeks with the MSDN event (SharePoint for Developers) in Edinburgh, trying to get the Developer Day Scotland website up, an invite to a VBUG event in Reading, the Community Leaders Day at the IoD, DDD6 and Andrew Fryer‘s TechNet event 8 Reasons to migrate to SQL Server 2008, I’ve finally got round to trying CTP5 of SQL Server 2008. I actually installed it in a virtual machine within hours of it becoming available for download, but it is only now I’m getting round to trying it out.

First off, let’s start with the way spatial data is held in SQL Server. There are two spatial types, geometry and geography. Although they sound very similar there is a fair difference between them and it is probably best not to confuse them. However, both types are very well named.

Geometry is a simple two-dimensional grid with X,Y coordinates. The British National Grid is an example of this. I would guess geometry would be most useful in systems where data comes in a specific planar/flat-earth projection, or where mapping of small areas (such as the internals of a building) is needed. Lengths and areas for geometry are easy to work out. The coordinates will have a unit of measurement attached; for instance the National Grid in the UK is in metres, so the distance between any two points can be worked out by simple Pythagorean maths and the result returned in the same unit of measurement as the coordinates.
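
As a quick sketch of that (using SRID 0, so the units are whatever the coordinates are in):

DECLARE @a geometry
DECLARE @b geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)', 0),
       @b = geometry::STGeomFromText('POINT(3 4)', 0);

-- Planar distance: the 3-4-5 triangle, so this returns 5
SELECT @a.STDistance(@b);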

Geography fits the spatial data on a sphere with lat/long coordinates. This is a better choice for international data or for countries where the land mass is simply too big to fit in one planar projection. However, it is important to realise that lat/long coordinates still depend on a reference system: there are various schemes for fitting a lat/long position to a place on the earth, and it is important to know which is being used, otherwise data from different sources may not match up. It is not so simple to calculate distances and areas on a geography type as the distance between two coordinates changes depending on where those coordinates are. For example, a line spanning 5º from east to west gets shorter the closer it is to a pole, with the largest distance at the equator.

According to the documentation geography also has some other limitations. No geography object may be greater in size than a hemisphere. Attempting to load such an object into the database will generate an ArgumentException, and if a spatial operation produces a geography greater than a single hemisphere then the result will be null.

Finally, before getting on with some code, a note on SRIDs (Spatial Reference Identifiers). Each piece of spatial data must be tagged with an SRID. Geometry types can have an SRID of 0 (which means undefined) but geographies must have a defined SRID. By default geography types use an SRID of 4326, which equates to WGS84. Spatial operations can only occur between spatial types with the same SRID; the result of a spatial operation between two pieces of data with different SRIDs is null.
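
A small sketch of that last rule:

DECLARE @a geometry
DECLARE @b geometry

-- The SRIDs deliberately differ (0 and 4326)
SELECT @a = geometry::STGeomFromText('POINT(0 0)', 0),
       @b = geometry::STGeomFromText('POINT(10 10)', 4326);

-- No error is raised; the result of the union is simply null
SELECT @a.STUnion(@b) AS UnionResult;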

With that brief introduction to geometry and geography, how do you create spatial data in the database?

CREATE TABLE Town
(
    TownId int NOT NULL,
    Name nvarchar(256),
    TownGeom geometry)

To populate the column there are a number of ways of getting the data in. Currently SQL Server supports WKT (Well Known Text), WKB (Well Known Binary) and GML (Geography Markup Language). For other formats, converters will need to be written. The following example shows WKT:

INSERT Town
VALUES(1, 'Pitcardine',
       geometry::STGeomFromText(
       'POLYGON ((0 0, 370 0, 370 160, 200 250, 0 250, 0 0))', 0));

It is also possible to use the more precise method STPolyFromText. Naturally the parser will be stricter about what WKT it accepts when using the more specialised methods. For example, if the WKT for a linestring is sent to the STPolyFromText method the error would look like this:

Msg 6522, Level 16, State 1, Line 2
A .NET Framework error occurred during execution of user defined routine or aggregate 'geometry':
System.FormatException: 24142: Expected POLYGON at position 1. The input has LINESTR.
System.FormatException:
   at Microsoft.SqlServer.Types.OpenGisWktReader.RecognizeToken(String token)
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePolygonTaggedText()
   at Microsoft.SqlServer.Types.OpenGisWktReader.ReadPolygon()
   at Microsoft.SqlServer.Types.SqlGeometry.STPolyFromText(SqlChars polygonTaggedText, Int32 srid)
.

At present it doesn’t seem to be possible to return the column as GML as the method isn’t found. The documentation for the method doesn’t work either so I suspect it is a feature that isn’t ready yet.

What is a DAL (Part 4)

As has been mentioned previously, one of the purposes of the DAL is to shield the application from the database. That said, what happens if a DAL throws an exception? How should the application respond to it? In fact, how can it respond to an exception that it should not know about?

If something goes wrong with a query in the database an exception is thrown. If the database is SQL Server then a SqlException is thrown. If it isn’t SQL Server then some other exception is thrown. Or the DAL may be performing actions against a completely different type of data source, such as an XML file, a plain text file or a web service. If the application knows nothing about the back end database (or data source) then how does it know which exception to respond to?

In short, it doesn’t. It can’t know which of the myriad of possible exceptions that could be thrown will be and how to respond to it. The calling code could just catch(Exception ex) but that is poor practice. It is always best to catch the most specific exception possible.

The answer is to create a specific exception that the DAL can throw: a DalException that the calling code can catch. The original exception is still available as an InnerException on the DalException.

using System;
using System.Runtime.Serialization;

namespace Dal
{
    [Serializable]
    public class DalException : Exception
    {
        public DalException()
            : base()
        {
        }

        public DalException(string message)
            : base(message)
        {
        }

        public DalException(string message, Exception innerException)
            : base(message, innerException)
        {
        }

        protected DalException(SerializationInfo info, StreamingContext context)
            : base(info, context)
        {
        }
    }
}

The DAL will catch the original exception, create a new one based on the original and throw the new exception.

public DataSet GetPolicy(int policyId)
{
    try
    {
        SqlDataAdapter da =
            (SqlDataAdapter)this.BuildBasicQuery("GetPolicy");
        da.SelectCommand.Parameters.AddWithValue("@id", policyId);
        DataSet result = new DataSet();
        da.Fill(result);
        return result;
    }
    catch (SqlException sqlEx)
    {
        DalException dalEx = BuildDalEx(sqlEx);
        throw dalEx;
    }
}

The code for wrapping the original exception in the DAL exception can be refactored into a separate method so it can be used repeatedly. Depending on what it needs to do, it may be possible to put that as a protected method on one of the abstract base classes.

private DalException BuildDalEx(SqlException sqlEx)
{
    string message = string.Format("An exception occurred in the Policy DAL\r\n" +
        "Message: {0}", sqlEx.Message);
    DalException result = new DalException(message, sqlEx);
    return result;
}

The stored procedure now runs how fast?

A few days ago I was having some problems with a stored procedure taking too long. On our test server the stored procedure ran acceptably in a few seconds, so it came as somewhat of a shock when trying the application out on the client’s test server that the same code was timing out after 20 minutes. Now, given the number of times this stored procedure was going to be called, the data aggregation and extraction utility it was part of would take somewhere in the region of 2 weeks to run.

So, I asked a colleague to have a look at my stored proc to see if a second set of eyes could see where the problem might lie. One of his suggestions was to pull the main select (which ran to about 2 screens of text) apart and run the subqueries independently so that we could look at the different parts in more detail in order to determine where the bottleneck was. So I did that and discovered that the individual parts ran a lot quicker than the whole. So, I wondered, what if I just dump the results of these extracted subqueries into table variables and plug the table variables into the main select? Curiously, it went back to taking a few seconds to run the stored procedure.

It would seem that the query optimiser got itself in a bit of a fankle over joining the subqueries up with the main query. By extracting them and putting their results into table variables, only around 7,000 rows were used in the final join rather than the nearly 200 million (of which all but ~7,000 were discarded during the join operation). This speed improvement is very important as the live server is likely to have many millions more rows than it did when we last copied it to use as the test system.

When operating on very large datasets with joins onto subqueries, it can be useful to extract the subqueries into temporary tables or table variables in order to speed up the query; as a bonus the stored procedure will be easier to read.
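
As a sketch of the shape of that refactoring (the table and column names are invented for illustration; this is not the original query):

DECLARE @subTotals TABLE (KeyId int PRIMARY KEY, Total money)

-- Run the former subquery once, keeping only the rows that survive
INSERT INTO @subTotals (KeyId, Total)
SELECT KeyId, SUM(Amount)
FROM SomeLargeTable
GROUP BY KeyId

-- The main select now joins onto a few thousand rows
-- instead of re-evaluating the subquery against millions
SELECT m.KeyId, m.Description, s.Total
FROM MainTable AS m
INNER JOIN @subTotals AS s ON s.KeyId = m.KeyId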

NOTE: This was rescued from the Wayback Machine. The original was dated Sunday, 31st October 2004.

The benefits of Stored Procedures

There are a number of ways to access data in SQL Server, or any enterprise DBMS. There are lots of books that discuss getting data in and out of databases and the best ways to do that. Many advocate the use of stored procedures to ensure the safety of the data.

The three main benefits that I see of stored procedures are:

  • Abstraction
  • Security
  • Performance

Stored procedures add an extra layer of abstraction into the design of a software system. This means that, so long as the interface on the stored procedure stays the same, the underlying table structure can change with no noticeable consequence to the application that is using the database.

For instance, if the database has to be denormalised to get a little extra performance in certain situations then the stored procedures can handle the additional updates and inserts necessary to ensure the integrity of the data across the tables. Without this, each of the callers would have to ensure that these changes had taken place. Of course, the use of stored procedures does not in any way grant a waiver from properly designing the data model, but it can help if the perfect normalised model has to give way for performance improvements.
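
A sketch of the idea, with invented table names, might look like this:

CREATE PROCEDURE UpdateCustomerName
@CustomerID nvarchar(5),
@Name nvarchar(40)
AS
BEGIN TRANSACTION
UPDATE Customers SET CompanyName = @Name
    WHERE CustomerID = @CustomerID
-- the caller never needs to know a denormalised copy exists elsewhere
UPDATE CustomerSummary SET CompanyName = @Name
    WHERE CustomerID = @CustomerID
COMMIT TRANSACTION
GO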

This layer of abstraction also helps put up an extra barrier to would-be intruders. If access to the data in SQL Server is only ever permitted via stored procedures then permissions do not need to be explicitly set on any of the tables, and none of the tables ever need to be exposed directly to outside applications. For an outside application to modify the database, it must go through stored procedures.
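
In SQL Server terms that barrier looks something like this (the account name is illustrative, and GetQuarterlyStatement is the procedure shown later in this post):

-- The application account may run the procedure...
GRANT EXECUTE ON GetQuarterlyStatement TO AspNetAccount
-- ...but no SELECT, INSERT, UPDATE or DELETE is granted on the tables,
-- so the account cannot touch Customers or Orders directly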

Stored procedures can be written to validate any input that is sent to them to ensure the integrity of the data beyond the simple constraints otherwise available on the tables. Parameters can be checked for valid ranges. Information can be cross checked with data in other tables.
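
A sketch of that kind of validation inside a procedure (the table key and the valid range are assumptions for illustration):

CREATE PROCEDURE UpdatePersonAge
@PersonId int,
@Age int
AS
-- Enforce a rule that a simple column constraint might not express
IF @Age < 0 OR @Age > 130
BEGIN
    RAISERROR('Age must be between 0 and 130', 16, 1)
    RETURN
END
UPDATE Person SET AgeField = @Age WHERE PersonId = @PersonId
GO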

Even if you think that someone attempting to crack into a website will never get this far, from a security perspective anything that can reduce the attack surface is beneficial.

Performance can be improved by the use of stored procedures. They are precompiled, so when they are run there is no additional lag while the SQL is parsed, compiled and execution plans drawn up; that work is done once and reused, rather than being repeated every time the SQL is sent to the server.

Another area in which stored procedures improve performance is that they push all the work onto the server in one go. A stored procedure can perform a series of queries and return many tables in, what is to the outside world, one operation. This saves the calling process from making many requests and the additional time of several network roundtrips. It also means that, if one set of data being returned depends on the results of a previous set retrieved through the same stored procedure, the data only has to flow from the database server to the application. If stored procedures were not being used, the data from the first database call would have to be sent back to the database for the second call in order for it to continue retrieving the information needed by the application.

For instance, let’s say that Northwind Traders sends out a quarterly statement to its customers, and that for each statement certain information needs to be extracted from the database. The tables Customers, Orders and Order Details are used. This information could be retrieved in several steps by calling the database for each set of information as it is needed to generate the statements. First with SELECT * FROM Customers WHERE CustomerID = @CustomerID, which gets the details for the head of the statement. Then SELECT * FROM Orders WHERE CustomerID = @CustomerID AND OrderDate>=@StartDate AND OrderDate<=@EndDate to get the details of each individual order by that customer. And finally a series of calls (one for each of the Order records that were retrieved) like SELECT * FROM [Order Details] WHERE OrderID = @OrderID.

Assuming that the customer in question is “Rattlesnake Canyon Grocery” and the period for the statement is Q1 1998, that is 5 roundtrips to the database and 5 times the database has to parse some SQL. This could be done by a single stored procedure that takes only one trip to the database and is precompiled.

CREATE PROCEDURE GetQuarterlyStatement
@CustomerID nvarchar(5),
@StartDate datetime,
@EndDate datetime
AS
SELECT * FROM Customers
    WHERE CustomerID=@CustomerID
SELECT * FROM Orders
    WHERE CustomerID=@CustomerID
    AND OrderDate>=@StartDate
    AND OrderDate<=@EndDate
    ORDER BY OrderDate DESC
SELECT [Order Details].* FROM [Order Details]
    INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID
    WHERE CustomerID=@CustomerID
    AND OrderDate>=@StartDate
    AND OrderDate<=@EndDate
    ORDER BY OrderDate DESC
GO

The stored procedure is now doing in one trip what previously took 5 trips. Of course, this example is somewhat contrived for brevity; in a real application there would be joins to the product tables, the columns would be listed rather than using SELECT * and so on.

NOTE: This was rescued from the Wayback Machine. The original date was Friday, 1st October, 2004.

Original comments:

A drawback of using stored procedures is portability. For each targeted DBMS, you may have to rewrite SP code. If a client side abstraction such as ODBC or ADO is used, one can keep a common code base, making CM and Release Engineering activities easier. Of course it comes with a price – performance.

10/12/2004 12:23 AM | Jörgen

 

Jörgen, while that is a potential drawback, in my experience when moving from one DBMS to another the client side code has to change anyway, regardless of client side abstraction. However, with a well structured DAL, the impact can be kept to a minimum.

10/12/2004 12:28 AM | Colin Angus Mackay

SQL Injection Attacks

Every day I see messages on various forums asking for help with SQL. Nothing wrong with that. People want to understand how something works, or have a partial understanding but something is keeping them from completing their task. However, I frequently also see messages that have SQL statements being built in C# or VB.NET that are extremely susceptible to injection attack. Sometimes it is from the original poster and, while they really need to learn to defend their systems, that is fine as they are trying to learn. Nevertheless, a proportion of the people responding to these questions give advice that, if followed, opens up gaping security holes in the original poster’s system.

Consider this following example:

C#

static DataSet GetCustomersFromCountry(string countryName)
{
    SqlConnection conn = new SqlConnection("Persist Security Info=False;"+
        "Integrated Security=SSPI;database=northwind;server=(local)");
    string commandText = string.Format("SELECT * FROM Customers WHERE Country='{0}'",
        countryName);
    SqlCommand cmd = new SqlCommand(commandText, conn);
    SqlDataAdapter da = new SqlDataAdapter(cmd);
    DataSet ds = new DataSet();
    da.Fill(ds);
    return ds;
}

VB.NET

Function GetCustomersFromCountry(ByVal countryName As String) As DataSet
    Dim conn As SqlConnection = New SqlConnection("Persist Security Info=False;" + _
        "Integrated Security=SSPI;database=northwind;server=(local)")
    Dim commandText As String = String.Format( _
        "SELECT * FROM Customers WHERE Country='{0}'", _
        countryName)
    Dim cmd As SqlCommand = New SqlCommand(commandText, conn)
    Dim da As SqlDataAdapter = New SqlDataAdapter(cmd)
    GetCustomersFromCountry = New DataSet
    da.Fill(GetCustomersFromCountry)
End Function

What happens here is that whatever the value of countryName is will be inserted (injected, if you prefer) directly into the SQL string. More often than not I see examples of code on forums where there has been absolutely no checking done and the developer has used countryNameTextBox.Text directly in the string format or concatenation statement. In these cases just imagine what the effect of various unrestricted text box entries might be.

For instance, imagine the values a malicious user might put in the text box on a web form. What if they type ';DROP TABLE Customers;-- ?

That would expand the full SQL Statement passed by the .NET application to be

SELECT * FROM Customers WHERE Country='';DROP TABLE Customers; -- '

So, no more customers (at least in the database… But how long in real life?)

Some people might then say: sure, but who in their right mind would give that kind of access on a SQL Server to the ASP.NET account? If you ask that question then you cannot have seen the number of people who post code with connection strings clearly showing that, firstly, they are using the sa account for their web application and that, secondly, by posting their problem to a forum they have given the password of their sa account to the world.

Some others might say: yes yes yes, but wouldn’t an attacker have to know what the overall SQL statement is before they can successfully inject something? Not so, I say. If you look at code posted on forums it becomes obvious that the vast majority of values from textboxes are inserted right after an opening apostrophe, like the example above. Based on that assumption, all an attacker needs to do is close the apostrophe, add a semi-colon and then inject the code they want. Finally, just to make sure that any remaining SQL from the original statement is ignored, they add a couple of dashes (comment markers in SQL).

These defenders-of-bad-SQL-because-you-can-never-completely-secure-your-system-anyway-so-why-bother will often follow up with, okay okaay! But the attacker would have to know the structure of the database as well! Well, maybe not. Normally there are common table names. I’m sure most people that have been dealing with databases for a few years will have come across many databases with tables of the same names: Customers, Users, Contacts, Orders and Suppliers are common business table names. If that doesn’t work it may be possible to inject an attack on sysobjects. Often an attacker just gets lucky, or notices a quirky output when entering something unusual, and uses that to work on cracking the web site or database.

So here I present three tips for improving the security of your SQL Server database. In no particular order, they are:

  • Use parameterised queries.
  • Log in using an appropriate account and grant only the permissions necessary.
  • Use stored procedures.

* Using parameterised queries is really very simple, and it can make your code easier to read and therefore to maintain. Parameters have other advantages too (for instance you can receive values back from parameters, not just use them for sending information into the query). The previous code example can be changed very easily to use parameters. For instance:

C#

static DataSet GetCustomersFromCountry(string countryName)
{
    SqlConnection conn = new SqlConnection("Persist Security Info=False;"+
        "Integrated Security=SSPI;database=northwind;server=(local)");
    string commandText = "SELECT * FROM Customers WHERE Country=@CountryName";
    SqlCommand cmd = new SqlCommand(commandText, conn);
    cmd.Parameters.Add("@CountryName",countryName);
    SqlDataAdapter da = new SqlDataAdapter(cmd);
    DataSet ds = new DataSet();
    da.Fill(ds);
    return ds;
}

VB.NET

Function GetCustomersFromCountry(ByVal countryName As String) As DataSet
    Dim conn As SqlConnection = New SqlConnection("Persist Security Info=False;" + _
        "Integrated Security=SSPI;database=northwind;server=(local)")
    Dim commandText As String = "SELECT * FROM Customers WHERE Country=@CountryName"
    Dim cmd As SqlCommand = New SqlCommand(commandText, conn)
    cmd.Parameters.Add("@CountryName", countryName)
    Dim da As SqlDataAdapter = New SqlDataAdapter(cmd)
    GetCustomersFromCountry = New DataSet
    da.Fill(GetCustomersFromCountry)
End Function

* The application should be set up to use a specific account when accessing the SQL Server. That account should then be given access to only the things it needs. For instance:

GRANT SELECT ON Customers TO AspNetAccount

It is generally unwise to GRANT permission ON someObject TO PUBLIC because then everyone has the permission.

* My final tip is to use only stored procedures for selecting and modifying data, because then the code that accesses the tables is controlled on SQL Server. You then do not need to grant access directly to the tables, only to the stored procedures that are called. The extra protection then comes by virtue of the fact that the only operations that can be performed are those that the stored procedures allow. They can perform additional checks and ensure that relevant related tables are correctly updated.
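
For example, the country lookup from earlier could be wrapped up like this, granting only EXECUTE permission to the application account (a sketch; the account name is illustrative):

CREATE PROCEDURE GetCustomersFromCountry
@CountryName nvarchar(30)
AS
-- The only statement the application can cause to run against Customers
SELECT * FROM Customers WHERE Country = @CountryName
GO

GRANT EXECUTE ON GetCustomersFromCountry TO AspNetAccount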

NOTE: This was rescued from the Wayback Machine. The original post was dated Saturday, 25th September 2004.

Original comments:

Excellent post Colin. I’d always wondered what a SQL Injection attack was without actually bothering to Google for it – and now I know.

9/26/2004 6:08 AM | Rob

this is great job, colin. i’ve just know the words ‘SQl injection’ but dont know what it exactly means. now i’ve got it. i can defend my own database.

thanks

10/3/2004 3:41 PM | Fired Dragon

I’ve seen people who’ve read this article thinking they can’t do it because you’ve only given .NET syntax. You should probably emphasize that ‘classic’ ADO also has Command objects with a Parameters collection, as does the more obscure underlying OLE DB object model. You can also use parameters with ODBC. There’s no excuse – parameters are cleaner, more secure, and less prone to error than building a SQL string. The slight disadvantage is that the code becomes a little less portable – SQL Server uses the @param notation, Oracle uses :param, and the OleDbCommand and OdbcCommand both use positional parameters marked with a ?

12/18/2004 1:01 AM | Mike Dimmick

I just can’t get this to work. I do exactly as the example shows, and the query doesn’t replace the parameters with the values to search for. The query came up with nothing, until I entered a row in the database with the name of the parameter as a field. It finds that, so it just doesn’t parse the command. I don’t get it.

7/27/2005 10:46 AM | Vesa Ahola

Since I cannot see your code I cannot see what is going wrong. However, I wrote a longer article about the subject over on codeproject.com. Perhaps that may help. See:

http://www.codeproject.com/cs/database/SqlInjectionAttacks.asp

7/27/2005 4:28 PM | Colin Angus Mackay

Normalising the data model

Sometimes I see on forums someone who is trying to get some SQL statement to yield data in a particular way but the data model is just thwarting their attempts, or if they do get something to work the SQL statement that does the job is horrendously complex. This tends to happen because the data is not normalised (or “normalized” if you are using the American spelling) to the third normal form. Denormalised data models tend to happen for two reasons: firstly, because the modeller is inexperienced and does not realise the faux pas they have made in the model, and, secondly, because the modeller has found that a properly normalised data model just doesn’t have the performance needed to do the job required.

The Scenario

In the example scenario that I am going to present, a private education company is looking to build a system that helps track their tutors and students. So as not to be overwhelming I am only going to concentrate on one aspect of the system – the tutor. A tutor may be multilingual and can teach in a variety of languages, and they may also be able to teach a number of subjects. The Tutor table has joins to a table for languages and a table for the subjects. The model looks like this:

[Diagram: the partially denormalised data model]

As you can see there are 3 joins from Tutors to Languages and 4 joins from Subjects to Tutors. This makes joins between these tables particularly complex. For example, to find out the languages that a tutor speaks a query like this would have to be formed:

SELECT  l1.Name AS LanguageName1,
        l2.Name as LanguageName2,
        l3.Name as LanguageName3
FROM Tutors AS t
LEFT OUTER JOIN Languages AS l1 ON l1.LanguageID = t.Language1
LEFT OUTER JOIN Languages AS l2 ON l2.LanguageID = t.Language2
LEFT OUTER JOIN Languages AS l3 ON l3.LanguageID = t.Language3
WHERE t.TutorID = @TutorID

So, what happens if the tutor is fluent in more than three languages? Either the system cannot accept the fourth language, or it will have to be changed to accommodate it. If the latter option is chosen, imagine the amount of work needed to make that change.

A similar situation occurs with the join to the Subjects table.

Solution

A better way to handle this sort of situation is with a many-to-many join. Many database systems cannot directly create a many-to-many join between two tables and must create an intermediate table. For those database systems that appear to be able to model a many-to-many join directly (GE-Smallworld comes to mind) what is actually happening is that an intermediate table is being created in the background that isn’t normally visible and the database takes care of this automatically.

The resulting data model will look like this:

[Diagram: the normalised data model]
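
A sketch of one of the intermediate tables (TutorSubject follows the same pattern; the constraint name is illustrative):

CREATE TABLE TutorLanguage
(
    TutorID int NOT NULL REFERENCES Tutors(TutorID),
    LanguageID int NOT NULL REFERENCES Languages(LanguageID),
    -- the two foreign keys form the compound primary key
    CONSTRAINT PK_TutorLanguage PRIMARY KEY (TutorID, LanguageID)
)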

This allows a tutor to register any number of languages or subjects. It also means that any joins on the data are easier as there are no duplicate joins for each Language or Subject. The above SELECT statement can be rewritten as:

SELECT  l.Name AS LanguageName
FROM Tutors AS t
INNER JOIN TutorLanguage as tl ON tl.TutorID = t.TutorID
INNER JOIN Languages as l ON tl.LanguageID = l.LanguageID
WHERE t.TutorID = @TutorID

This will result in one row being returned for each language rather than all the languages being returned in one row. It is possible to pivot the results back to one row, but currently in SQL Server 2000 that would add more complexity to the query than I am willing to discuss in this article. If you want to know how to pivot results in SQL Server 2000 then see the page on Cross-Tab Reports in the SQL Server books online. SQL Server 2005 will allow PIVOTed results directly. For more information on the differences between the SQL Server 2000 and 2005 ways of doing things see: Pivot (or Unpivot) Your Data – Windows IT Pro

Migrating existing data

Naturally, if you have existing data using the denormalised schema and you want to migrate it to the normalised schema then you will need to be careful about the order in which changes are made lest you lose your data.

  1. Create the intermediate table.
  2. Change any stored procedures using the denormalised schema to the normalised schema.
    • You may also need to change code outside the database. If you find yourself needing to do this then I strongly recommend that you read about the benefits of stored procedures.
  3. Perform an insert for each of the denormalised joins into the intermediate table.
  4. Remove the old joins.

If possible the above should be scripted so that the database changes occur as quickly as possible as, depending on your situation, you may have to take your production system off-line while making the change. Testing the changes in a development environment first should ensure that the scripts are written well and don’t fall over when being run on the production database.

To move the denormalised Language joins to the normalised schema some SQL like this can be used.

INSERT INTO TutorLanguage
    SELECT TutorID, Language1 AS LanguageID
    FROM Tutors
    WHERE Language1 IS NOT NULL
UNION
    SELECT TutorID, Language2 AS LanguageID
    FROM Tutors
    WHERE Language2 IS NOT NULL
UNION
    SELECT TutorID, Language3 AS LanguageID
    FROM Tutors
    WHERE Language3 IS NOT NULL

It can, of course, be written as a series of individual INSERT INTO…SELECT statements rather than one large UNIONed SELECT.

NOTE: This was rescued from the Google Cache. The original date was Sunday 3rd April 2005.
