Creating Many-to-Many joins

A topic that comes up from time to time in forums is how to join two tables together when there is a many-to-many relationship. Typical examples include teachers-to-students or articles-to-tags (to create a “tag cloud”)

If you have made any relationships in a database you will see that it is very easy to create a one-to-many join. For example, a web forum may have many posts, but a post only belongs in one forum.

To create a many-to-many relationship you need to create an intermediate table. This is a table that each side of the many-to-many can have a one-to-many relationship with. The following diagram shows the many-to-many relationship between a blog posts and the tags on it (This is not a full model, just enough to show the relationship)

Many-to-many

The BlogPost has its primary key (BlogPostId), as does the Tag (TagId). Normally you would see that key being used as the foreign key in the other table, however that wouldn’t work with a many-to-many relationship.

In order to join the two tables together an “intermediate table” needs to be created that just contains the two primary keys from either side of the relationship. Those two foreign keys make up a compound* primary key in the intermediate table.

It is normal to name the intermediate table after the each table that forms the relationship. In this case it would be “BlogPostTag” after BlogPost and Tag.

In order to join a row in the BlogPost table to a row in the Tag table you only need to insert a new row in the BlogPostTag table with the keys from either side. e.g.

INSERT BlogPostTag VALUES(@blogPostId, @tagId);

In order to remove the relationship between a blog post and a tag you only need to delete the row from the intermediate table. e.g.

DELETE BlogPostTag WHERE BlogPostId = @blogPostId AND TagId = @tagId;

Technorati Tags: database,joins,relationships,many-to-many,er diagram

* a “compound key” is one which is made up of more than one column.

4 Comments

Shog9 says:

30 November, -0001 at 00:00

How would you go about handling deletion of child records (in scenarios where there’s a “many parents <-> many children” relationship)?What are your feelings on blowing normalization entirely and just storing duplicate tags directly in the BlogPost table (maybe in the case of a blogger who expects to post thousands of posts each minute but is so boring he’ll have few readers)?What are your feelings on suffixing the name of the intermediate table with “XRef”?

Colin Angus Mackay says:

30 November, -0001 at 00:00

First, which end is the child on? In order to delete rows on one end of the relationship you must first delete the relevant rows from the intermediate table that refer to the about-to-be-deleted row. Then you delete the actual row you wanted to delete in the first place.Posting the tags directly in the blog post (and denormalising the model) would be an acceptable “optimisation” if you didn’t want to build a tag cloud or to be able to look up blog posts by tag. You could probably turn on full text indexing and get some of that functionality back, though. And the building of a tag cloud could be a scheduled batch operation. But the above was really just an example of a many-to-many join situation. It wasn’t a discussion on what would be the best in a particular situation.I don’t see a problem in putting a suffix of XRef on the end of the name of the intermediate table. However, in all the systems I’ve worked on the style was simply to concatenate the names of the tables on either side in some way and leave it there.

Pete O'Hanlon says:

30 November, -0001 at 00:00

And there you have one of my pet hates. Deletes from a database. It’s just such a messy kludge, and I’ve seen many a careless developer trip over cascading deletes leading to the “Dude where’s my data?” syndrome. Oh well.

Stuff that's in my head says:

30 November, -0001 at 00:00

Stats for February

Share this:

Related

4 Comments

Leave a Comment Cancel reply