The other day someone was showing me how Advanced Query Tool (AQT) can be used to compare the data in two tables. I started wondering if such a thing could be done using generic SQL or if one had to write a program to compare the tables. I found that such a thing could be done in DB2 using the
EXCEPT function (beginning with v9 on z/OS, I think??).
To compare all the columns/rows in TABLE1 to those in TABLE2 you can use the
EXCEPT function as follows:
SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2;
The above SQL will display all rows from TABLE1 that don’t match or exist on TABLE2. The number of columns being compared – and their data type – have to match and you can’t compare certain column types (such as CLOB/BLOB/XML). The nice thing about this, is the
EXCEPT function takes care of all the comparison details for you unlike when trying to use the
WHERE NOT EXISTS clause.
If at the same time you also want to know what columns/rows from TABLE2 don’t match or exist in TABLE1, reverse the order of the tables in a second
EXCEPT select query and join the two results with a
UNION clause. Example:
(SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2) UNION (SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1);
This will list out all the rows in TABLE1 that don’t exist or match on TABLE2 *AND* all the rows in TABLE2 that don’t match or exist in TABLE1. The problem with this SQL is it is difficult to tell from the output which table the row comes from. For example, if the row exists on both tables but there is a column value mismatch, the row will get printed twice – once for each
EXCEPT query. If a row doesn’t exist in the other table it will be listed once. Which table does it come from? To make it easier to identify which select/table the row is from, you can add a hard coded identifier to the select clause like the following:
SELECT J1.*, 'TABLE1' AS SRC_TBL FROM ( SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2 ) AS J1 UNION SELECT J2.*, 'TABLE2' AS SRC_TBL FROM ( SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1 ) AS J2 WITH UR;
There are many other options to fine tune the comparison such as using a
WHERE condition to only compare certain rows, or comparing specific columns by including the columns in the select clause – remember the number of columns selected from each table and their data type must match. You can also include a
FETCH FIRST n ROWS ONLY clause to avoid ‘run away’ comparisons.
I’m also assuming that the two tables being compared have the same primary/unique key defined. It’s not required, but if the table does not have a primary/unique key, it becomes difficult to sort the results in a way that makes the comparison easy to decipher.
Also take a look at the
INTERSECT function which can be used to identify all the rows that DO match.
You can read more about these functions on IBM’s website.
Have a great day!