Thursday 2 November 2006

SQL 101 2: Relational Joins

This is the second in my series of posts designed to teach people the basics of SQL and it’s use in interrogating databases.

Relational Joins
Data can be linked to other data. A sales order is linked to it’s lines, it is also linked to a customer. These links are known as relational joins. When attempting to retrieve linked data you have to tell the computer what are the key fields that you want to use to establish the link.

The link is established through Primary and Foreign Keys. In my own databases I generally provide a second and arguably more intuitive link. More on this later.

Consider the following Example, which shows a simplified Sales Order Header Table, and a simplified Customer Table:

Table: GUSER.F_SOHDR
SOH_PRIMARYKEY                        Sales Order Primary Key
SOH_NUMBER                        Sales Order Number
SOH_DATE                                Date
SOH_DELIVDATE                        Delivery Date
SOH_CUCODE                        Customer Code
SOH_FK_CUSTOMER                Foreign Key to Customer Table

Table: GUSER.F_CUSTOMER
CUS_PRIMARYKEY                        Customer Primary Key
CUS_CODE                                Customer Code
CUS_NAME                                Customer Name

As you can see the customer name is not stored against the sales order header record. (There is a very valid reason for this and it concerns normalisation, see elsewhere in this document for an explanation of normalisation.)

Now if you wish to produce a SQL Query that shows all sales orders for the first week in March, that shows the order number, order date, customer code, customer name and net value, then you need to pull in the customer name from the customer table. This means that you have to apply a join in your query, and for this first example we are going to do the join via the sales order foreign key to customer (SOH_FK_CUSTOMER) and the customer primary key (CUS_PRIMARYKEY). The query looks like this:

SELECT SOH_NUMBER,SOH_DATE,SOH_CUCODE,CUS_NAME,SOH_NET
FROM GUSER.F_SOHDR,GUSER.F_CUSTOMER
WHERE SOH_FK_CUSTOMER = CUS_PRIMARYKEY
AND
SOH_DATE BETWEEN '2004 MAR 01' AND '2004 MAR 05'

The Important things to notice here are that we have had to specify in the FROM section the fact that we wish to pull in data from more than one table. The other thing we have had to do is specify that the way we want to link the sales order to the customer is through the Primary Key / Foreign Key mechanism, you can see this as the first part of the WHERE section.

As I mentioned earlier, within my own databases I generally provide a second way of linking the data between tables, through the use of a Common or Convenient code. In this example we could have used the Customer Code stored against the order (SOH_CUCODE) and the Customer Code stored in the customer table (CUS_CODE), to link the records. This is demonstrated in the example shown below, it returns exactly the same data as the first example.

SELECT SOH_NUMBER,SOH_DATE,SOH_CUCODE,CUS_NAME,SOH_NET
FROM GUSER.F_SOHDR,GUSER.F_CUSTOMER
WHERE SOH_CUCODE = CUS_CODE
AND
SOH_DATE BETWEEN '2004 MAR 01' AND '2004 MAR 05'


The only difference between the two examples is the in the first part of the WHERE section, where I have specified the column names used to join the two tables.

NOTE: If you specify more than one table in the FROM section of your query and do not specify how to link the tables in your WHERE section, then you end up with what is know as a Cartesian Product. (THIS IS A BAD THING). A Cartesian Product is mathematically a binary operation in which two objects are combined in an ‘everything in combination with everything’ fashion. This can cause a serious system slowdown, and will return nonsensical data.

An understanding of joins is essential for you to progress to more advanced SQL Queries, and to allow you to collate and interrogate data successfully.

"Temptation rarely comes in working hours. It is in their leisure time that men are made or marred." - W N Taylor

No comments: