Aggregate of columns, not rows

In SQL Server it is very easy to aggregate a value across many rows. You simply just write something like this:

SELECT MAX(SomeColumn)
FROM SomeTable

This will return one row with the aggregate value (in this case MAXimum) of the column.

However, what if you want to do this across columns. What if you have a table with two or more columns you want to aggregate for each row?

You can do this with a tedious case statement that just gets more and more cumbersome for each additional column you have, especially if any of the columns are nullable. Or, you can create a subquery with a table in it and you aggregate the table in the subquery exposing the final value to your main query already aggregated.

Here’s an example:

SELECT Id, 
       (SELECT MAX(num) FROM (VALUES(st.ANumber), (st.AnotherNumber)) AS AllNumbers(num)) AS MaxNumber
FROM SomeTable st

The second line contains the subquery which is then exposed as a column in the final result set.

The subquery effectively pivots the columns into rows, then aggregates the rows. Just be careful with where you put the brackets so that it interprets them as separate rows rather than columns.

This also deals with NULL values quite effectively too, since the aggregate function will ignore any null value it finds.

DROP/CREATE vs ALTER on SQL Server Stored Procedures

We have a number of migration scripts that modify the database as our applications progress. As a result we have occasion to alter stored procedures, but during development the migrations may be run out of sequence on various test and development databases. We sometimes don’t know if a stored procedure will exist on a particular database because of the state of various branches of development.

So, which is better, dropping a stored procedure and recreating it, or just simply altering the existing one, and if we’re altering it, what needs to happen if it simply doesn’t exist already?

DROP / CREATE

This is the code we used for the DROP/CREATE cycle of adding/updating stored procedures

IF EXISTS (SELECT * FROM sys.objects WHERE type = 'P' AND name = 'MyProc')
BEGIN
       DROP PROCEDURE MyProc;
END
GO
 
CREATE PROCEDURE MyProc
AS
	-- Body of proc here
GO

CREATE / ALTER

We recently changed the above for a new way of dealing with changes to stored procedures. What we do now is detect if the procedure exists or not, if it doesn’t then we create a dummy. Regardless of whether the stored procedure previously existed or not, we can now ALTER the stored procedure (whether it was an existing procedure or the dummy one we just created

That code looks like this

IF NOT EXISTS (SELECT * FROM sys.objects WHERE type = 'P' AND name = 'MyProc')
BEGIN
       EXEC('CREATE PROCEDURE [dbo].[MyProc] AS BEGIN SET NOCOUNT ON; END')
END
GO
 
ALTER PROCEDURE [dbo].[MyProc]
AS
	-- Body of proc here
GO

This looks counter-intuitive to create a dummy procedure just to immediately alter it but it has some advantages over drop/create.

If an error occurs with drop/create while creating the stored procedure and there had been a stored procedure to drop originally you are now left without anything. With the create/alter in the event of an error when altering the stored procedure the original is still available.

Altering a stored procedure keeps any security settings you may have on the procedure. If you drop it, you’ll lose those settings. The same goes for dependencies.

Tip of the day: How to tell why your app couldn’t log on to SQL Server

When you get a log in failure on SQL Server the message you get back from SQL Server Management Studio, or in a .NET Exception is vague for security. They don’t want to give away too much information just in case.

For example, the exception message will be something like “Login failed for user ‘someUser’.” which doesn’t give you much of a clue as to what is actually happening. There could be a multitude of reasons that login failed.

If you want more information about why a log-in failed you can open up the event viewer on the machine that SQL Server is installed on and have a look. You’ll find a more detailed message there.

The wider messages may be things like:

  • “Login failed for user ‘someUser’. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]”
  • Login failed for user ‘someUser’. Reason: Password did not match that for the login provided. [CLIENT: <local machine>]
  • Login failed for user ‘someUser’. Reason: Failed to open the explicitly specified database. [CLIENT: <local machine>]
    Note: This could be because the database doesn’t exist, or because the user doesn’t have permissions to the database.

Optimising clustered indexes in SQL Server 2008

I’ve just found a script on another blog to go through all the clustered indexes in a SQL Server database and rebuild them in order to reduce fragmentation and improve the disk IO needed to get the data. The original script didn’t take into account tables in different schemas so I updated it. I also added some metrics to it so I could get a sense of how long the operation takes on each table.

The script can take quite a while to run. On my database it took over 6 minutes just to initially run the query to work out what needed rebuilding, and each index can take several seconds (or possibly more if you have a lot of data) on its own.

The new script is here:

SET NOCOUNT ON
DECLARE @Schema SYSNAME;
DECLARE @Table SYSNAME;
DECLARE @Index SYSNAME;
DECLARE @Rebuild NVARCHAR(4000)
DECLARE @StartTime DATETIME = GETUTCDATE();

PRINT (CONVERT(NVARCHAR(100), GETUTCDATE(), 113) + ' : Rebuild all indexes with over 10% fragmentation.')

DECLARE DB CURSOR FOR 
SELECT SS.name [schema], SO.Name [table], SI.Name [index]
FROM SYS.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED')
INNER JOIN SYS.objects SO ON SO.object_id=SYS.dm_db_index_physical_stats.object_id
INNER JOIN SYS.schemas SS ON SO.schema_id=SS.schema_id
INNER JOIN SYS.indexes SI ON SI.index_id=SYS.dm_db_index_physical_stats.index_id AND 
SI.object_id=sys.dm_db_index_physical_stats.object_id 
-- Begin select only clustered indexes Index_id = 1
AND SI.Index_id = 1
-- End select only clustered indexes Index_id = 1
WHERE avg_fragmentation_in_percent > 10.0 AND SYS.dm_db_index_physical_stats.index_id > 0
ORDER BY SO.Name 

OPEN DB
FETCH NEXT FROM DB INTO @Schema, @Table, @Index
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @Rebuild = 'ALTER INDEX ' + @Index + ' ON ' + @Schema + '.' + @Table + ' REBUILD'

  PRINT (CONVERT(NVARCHAR(100), GETUTCDATE(), 113) + ' : ' + @Rebuild)

  -- Comment out the following line to see what tables would be affected without rebuilding the indexes
  EXEC SP_EXECUTESQL @Rebuild

  FETCH NEXT FROM DB INTO @Schema, @Table, @Index
END
CLOSE DB
DEALLOCATE DB

DECLARE @Duration DATETIME = GETUTCDATE() - @StartTime;
PRINT (CONVERT(NVARCHAR(100), GETUTCDATE(), 113) + ' : Finished. Duration = '+CONVERT(NVARCHAR(100), @Duration, 114))

SET NOCOUNT OFF

Rewiring the users and logins in SQL Server

As a developer I find that I’m frequently backing up and restoring SQL Server databases between servers for development and testing purposes. However, each time I do the link between the login (a server concept) and the user (a per database concept) gets broken.

There is a stored procedure in SQL Server to wire it all back up again and I keep forgetting what it is. So here it is (my aide memoir):

sp_change_users_login: It maps an a database user to a SQL Server login.

The quick and easy way is as follows:

sp_change_users_login 'Update_One', 'myUserName', 'myLoginName'

 

If you really must do dynamic SQL…

I may have mentioned in previous posts and articles about SQL Injection Attacks that dynamic SQL (building SQL commands by concatenating strings together) is a source of failure in the security of a data driven application. It becomes easy to inject malicious text in there to cause the system to return incorrect responses. Generally the solution is to use parameterised queries

However, there are times where you may have no choice. For example, if you want to dynamically reference tables or columns. You can’t do that as the table name or column name cannot be replaced with a parameter. You then have to use dynamic SQL and inject these into a SQL command.

The problem

It is possible for SQL Server to do that concatenation for you. For example:

CREATE PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @sql nvarchar(max) =
		'SELECT ' + @ColumnName +
		' FROM ' + @TableName +
		' WHERE Id = '+cast(@Id as nvarchar(20));
	EXEC(@sql)
END
GO

This is a simple stored procedure that gets some data dynamically. However, even although everything is neatly parameterised it is no protection. All that has happened is that the location for vulnerability (i.e. the location of the construction of the SQL) has moved from the application into the database. The application is now parameterising everything, which is good. But there is more to consider than just that.

Validating the input

The next line of defence should be verifying that the table and column names passed are actually valid. In SQL Server you can query the INFORMATION_SCHEMA views to determine whether the column and tables exist.

If, for example, there is a table called MainTable in the database you can check it with a query like this:

SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'MainTable'

And it will return:

INFORMATION_SCHEMA.TABLES

There is a similar view for checking columns. For example:

INFORMATION_SCHEMA.COLUMNS

As you can see, the INFORMATION_SCHEMA.COLUMNS view also contains sufficient detail on the table so that when we implement it we only have to make one check:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + @ColumnName +
            ' FROM ' + @TableName +
            ' WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

Formatting the input

The above is only part of the solution, it is perfectly possible for a table name to contain characters that mean it needs to be escaped. (e.g. a space character or the table may share a name with a SQL keyword). To escape a table or column name it is enclosed in square brackets, so a table name of My Table becomes [My Table] or a table called select becomes [select].

You can escape table and column names that wouldn’t ordinarily require escaping also. It makes no difference to them.

The code now becomes:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT [' + @ColumnName + '] ' +
            'FROM [' + @TableName + '] ' +
            'WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

But that’s not quite the full story.

Really formatting the input

What if you have a table called Cra]zee Table? Okay – Why on earth would you have a table with such a stupid name? It happens, and it is a perfectly legitimate table name in SQL Server. People do weird stuff and you have to deal with it.

At the moment the current stored procedure will simply fall apart when presented with such input. The call to the stored procedure would look like this:

EXEC GetData 1, 'Cra]zee Table', 'MadStuff'

And it gets past the validation stage because it is a table in the system. The result is a message:

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'Table'.

The SQL produced looks like this:

SELECT [MadStuff] FROM [Cra]zee Table] WHERE Id = 1

By this point is should be obvious why it failed. The SQL Parser interpreted the first closing square bracket as the terminator for the escaped section.

There are other special characters in SQL that require special consideration and you could write code to process them before adding it to the SQL string. In fact, I’ve seen many people do that. And more often than not they get it wrong.

The better way to deal with that sort of thing is to use a built in function in SQL Server called QUOTENAME. This will ensure the column or table name is properly escaped. The stored procedure we are now building now looks like this:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + QUOTENAME(@ColumnName) +
            ' FROM ' + QUOTENAME(@TableName) +
            ' WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

Things that can be parameterised

There is still something that can be done to this. The Id value is being injected in to the SQL string, yet it is something that can quite easily be parameterised.

The issue at the moment is that the SQL String is being executed by using the EXECUTE command. However, you cannot pass parameters into this sort of executed SQL. You need to use a stored procedure called sp_executesql. This allows parameters to be defined and passed into the dynamically created SQL.

The stored procedure now looks like this:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + QUOTENAME(@ColumnName) +
            ' FROM ' + QUOTENAME(@TableName) +
            ' WHERE Id = @Identifier';
        EXEC sp_executesql @sql, N'@Identifier int',
                           @Identifier = @Id
    END
END
GO

This is not quite the end of the story. There are performance improvements that can be made when using sp_executesql. You can find out about these in the SQL Server books-online.

And finally…

If you must use dynamic SQL in stored procedures do take care to ensure that all the data is validated and cannot harm your database. This is an area in which I tread very carefully if I have no other choice.

Try and consider every conceivable input, especially inputs outside of the bounds of your application. Remember also, that defending your database is a multi-layered strategy. Even if you have the best firewalls and security procedures elsewhere in your system a determined hacker may find a way though your other defences and be communicating with the database in a way in which you didn’t anticipate. Assume that an attacker has got through your other defences, how do you provide the data services to your application(s) yet protect the database?

Tip of the Day #5 (SQL Server memory usage)

You can limit the amount of memory that SQL Server uses by using the sp_configure stored procedure. By limiting the amount of memory that SQL Server is permitted to use it means that more memory is available to other applications or other instances of SQL Server. In fact books on-line recommends setting the minimum and maximum memory used on each instance of SQL Server running on the same machine as SQL Server does not make any attempts to balance memory usage across instances.

In order to use this you must be in an advanced mode. To set this up use:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE 

 

Next, to make the actual change you need the following:

EXEC sp_configure 'max server memory (MB)', 512
RECONFIGURE WITH OVERRIDE 

 

The above example will set the maximum amount of memory the server will use to 512MB. The RECONFIGURE WITH OVERRIDE is necessary in order for the change to take effect immediately. If it is missed out then the change won’t take place until the SQL Server is restarted.

If you want to check that the change has taken place you can use the following:

EXEC sp_configure 'max server memory (MB)' 

 

This will just display the current setting. You will get a result set that looks something like this:

SQL Server 2005 memory options result set

The congif_value is the value that the SQL Server is currently configured with. However, it may not be what is currently in force. The run_value shows you what is currently in force.

SQL Server 2005 memory options dialogIf you don’t want to type so much SQL yourself, then you can do the same in the SQL Server Management Studio. Right-click the server in the object explorer and select “properties” from the context menu. This will bring you up a dialog with all the server level properties in it. Go to the “memory” page and you can set the values that you want there. There are a couple of radio buttons that will allow you to switch between the currently configured value and the running value. By pressing Okay the updated value is applied to the server immediately.

For more information:

Technorati Tags: ,

Spatial Operations in SQL Server 2008 (Katmai) – Union and Convex Hull

CODE EXAMPLES IN THIS POST WORK WITH THE NOVEMBER 2007 CTP (CTP 5) OF SQL SERVER 2008.

Say you would like to create a polygon out of a group of points. One way of doing this is to union the points together then create a convex hull from those points. A convex hull is a polygon that contains all the points of the geometries that it is made from. “The convex hull may be easily visualized by imagining an elastic band stretched open to encompass the given object; when released, it will assume the shape of the required convex hull.” [Wikipedia:Convex Hull]

It is possible to create a convex hull from just two points, however in this case you will end up with a linestring rather than a polygon because a polygon requires a minimum of 3 points.

DECLARE @a geometry
DECLARE @b geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)',0),
       @b = geometry::STGeomFromText('POINT(10 10)', 0);

SELECT @a.STUnion(@b).STConvexHull().ToString();

Results in: LINESTRING (10 10, 0 0)

With an additional point a polygon can be created.

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)',0),
       @b = geometry::STGeomFromText('POINT(10 10)', 0),
       @c = geometry::STGeomFromText('POINT(20 0)', 0);

SELECT @a.STUnion(@b).STUnion(@c).STConvexHull().ToString();

Results in: POLYGON ((20 0, 10 10, 0 0, 20 0))

What you’ll notice is that the polygon has 4 points, but we only gave 3 to start with. That is because the first and last point in the polygon are the same.

If you were to look at the geometry that had been created with just the union operations before the convex hull was made then you’ll see it is a MultiPoint: MULTIPOINT ((10 10), (20 0), (0 0))

graph1Unioning different types of geometry together, such as a point, linestring and polygon (see figure on the right) will, if the geometries don’t overlap, result in a GeometryCollection. For instance the code:

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText(
            'POLYGON ((25 5, 15 15, 5 5, 25 5))',0),
       @b = geometry::STGeomFromText(
            'POINT(5 10)', 0),
       @c = geometry::STGeomFromText(
            'LINESTRING(20 20, 30 5)', 0);

SELECT  @a.STUnion(@b).STUnion(@c).ToString();

 

Will result in the following: GEOMETRYCOLLECTION (POINT (5 10), LINESTRING (20 20, 30 5), POLYGON ((5 5, 25 5, 15 15, 5 5)))

Moving the point to a position within the polygon, such as POINT(15 10) will result in a geometry collection that does not contain a separate point. As the point is within the boundary of the polygon it does not need to be separately listed in the geometry collection. The actual geometry looks like this: GEOMETRYCOLLECTION (LINESTRING (20 20, 30 5), POLYGON ((5 5, 25 5, 15 15, 5 5)))

graph2Moving the linestring to travel from 5,10 to 30,10 (through the polygon) results in a geometry collection with two linestrings (see figure on the left). One that runs from 5,10 to the boundary of the polygon at 10,10 and the second that runs from the  boundary of the polygon at 20,10 to the original end point at 30,10. The resulting MultiGeometry looks like this: GEOMETRYCOLLECTION (LINESTRING (30 10, 20 10), LINESTRING (10 10, 5 10), POLYGON ((5 5, 25 5, 20 10, 15 15, 10 10, 5 5)))

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText(
            'POLYGON ((25 5, 15 15, 5 5, 25 5))',0),
       @b = geometry::STGeomFromText(
            'POINT(15 10)', 0),
       @c = geometry::STGeomFromText(
            'LINESTRING(5 10, 30 10)', 0);

SELECT @a.STUnion(@b).STUnion(@c).ToString();

Other posts in this series:

Contradictory messages

While attempting to create a database project in Visual Studio 2008 against a SQL Server 2008 database I got a rather odd error message. The dialog used to create the project requests information about the SQL Server database. It clearly states “The server version must be 2005 or later”. No problem, I thought. So I put in the information about my SQL Server and database in the dialog and tested the connection. So far so good. But as soon as I hit the “Okay” button I got a new message. Apparently, “Only servers up to Microsoft SQL Server 2005 are supported.”

Contradictions