How-to Compare Two DB2 Tables Using EXCEPT SQL Function

The other day someone was showing me how Advanced Query Tool (AQT) can be used to compare the data in two tables. I started wondering if such a thing could be done using generic SQL or if one had to write a program to compare the tables. I found that such a thing could be done in DB2 using the EXCEPT function (beginning with v9 on z/OS, I think??).

To compare all the columns/rows in TABLE1 to those in TABLE2 you can use the EXCEPT function as follows:

SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE2;

The above SQL will display all rows from TABLE1 that don’t match or exist on TABLE2. The number of columns being compared – and their data type – have to match and you can’t compare certain column types (such as CLOB/BLOB/XML). The nice thing about this, is the EXCEPT function takes care of all the comparison details for you unlike when trying to use the WHERE NOT EXISTS clause.

If at the same time you also want to know what columns/rows from TABLE2 don’t match or exist in TABLE1, reverse the order of the tables in a second EXCEPT select query and join the two results with a UNION clause. Example:

(SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE2)
UNION
(SELECT * FROM TABLE2
EXCEPT
SELECT * FROM TABLE1);

This will list out all the rows in TABLE1 that don’t exist or match on TABLE2 *AND* all the rows in TABLE2 that don’t match or exist in TABLE1. The problem with this SQL is it is difficult to tell from the output which table the row comes from. For example, if the row exists on both tables but there is a column value mismatch, the row will get printed twice – once for each EXCEPT query. If a row doesn’t exist in the other table it will be listed once. Which table does it come from? To make it easier to identify which select/table the row is from, you can add a hard coded identifier to the select clause like the following:

SELECT J1.*, 'TABLE1' AS SRC_TBL
FROM (
      SELECT * FROM TABLE1
      EXCEPT
      SELECT * FROM TABLE2
      ) AS J1
UNION
SELECT J2.*, 'TABLE2' AS SRC_TBL
FROM (
      SELECT * FROM TABLE2
      EXCEPT
      SELECT * FROM TABLE1
      ) AS J2
WITH UR;

There are many other options to fine tune the comparison such as using a WHERE condition to only compare certain rows, or comparing specific columns by including the columns in the select clause – remember the number of columns selected from each table and their data type must match. You can also include a FETCH FIRST n ROWS ONLY clause to avoid ‘run away’ comparisons.

I’m also assuming that the two tables being compared have the same primary/unique key defined. It’s not required, but if the table does not have a primary/unique key, it becomes difficult to sort the results in a way that makes the comparison easy to decipher.

Also take a look at the INTERSECT function which can be used to identify all the rows that DO match.

You can read more about these functions on IBM’s website.

Have a great day!

7 thoughts on “How-to Compare Two DB2 Tables Using EXCEPT SQL Function”

  1. I’ve come across the except as well, thought is was just what i needed to compare static data between 2 test-environments.

    My problem is however that i need to compare remote tables. On different machines.
    DBA has created Aliasses, and being logged in to machine A as dbuser A, i CAN do a select on both tables, using the alias for the remote db.

    The select -except statement however, fails on:
    ‘-512: STATEMENT REFERENCE TO REMOTE OBJECT IS INVALID’

    i.e.
    ‘select * from db1.schema1.table_a’ (local table)=OK

    ‘select * from db1.schema2.table_a’ (local alias for remote db2.schema2.table_a)=OK

    but
    ‘select * from db1.schema1.table_a except select * from db1.schema2.table_a’
    or even
    ‘(select * from db1.schema1.table_a) except (select * from db1.schema2.table_a)’
    throws error -512

    I find this strange as a straight select does work for the remote table via alias.
    Probably nothing related to except statement?

    Thnx,
    Alex

    1. Since the EXCEPT compares not only the data, but the data types, my assumption would be that the function is not able to collect data using the remote system’s metadata views/tables. This would make sense since the function is tied to the system it is sourced on and has no knowledge of the remote system. Just a thought. Bestest!

    2. Did you ever solve this problem, Alex? I have the same requirement. Short of doing a DB2 download for a hundred tables and transmitting them to one node for compare using non DBMS tools, I can’t figure it out.

  2. Loved this, worked for me 🙂 I’m a tester, doing a parallel test run with before and after code, this made it far easier to see the differences.

  3. Very useful info! But is there any way to print NULL or blanks if one row doesn’t exists in one of the sides?

  4. Is there a way to do a calculation on the two tables comparing, trying to find the delta if there is a difference in the numbers?

Leave a Reply

Your email address will not be published. Required fields are marked *