How-to Connect Excel to an External Data Source

For quick and dirty SQL reports, it’s hard to beat dedicated tools such as WinSQL, SQuirrel SQL, or AQT.  However, if you find yourself running the same query on a regular basis and then manually importing the results into Microsoft Excel and customizing the look and feel of the results within Excel (alignment, formatting, titles, etc.), you may find it quicker to connect Excel directly to the external data source (database) and letting Excel run your SQL to retrieve the data directly.

The first thing to do is establish the data source connection. Select the Data tab on the ribbon and click on the From Other Sources command in the Get External Data ribbon group.

From the pop-up menu, select “From Microsoft Query.”  This will bring up the Choose Data Source panel. (Note:  I had trouble when the ‘Use the Query Wizard to create/edit queries’ item was checked so I keep it unchecked.)

Excel Choose Data Source

Select the desired connection from the list of available connections.  If you don’t see the connection needed, you will have to manually add/configure the data source first.  Clicking OK will start the Microsoft Query application and display a connect to database dialog.

Microsoft Query Connect to Database

Enter your log-in credentials for the selected database connection and click OK.  It may take a while to connect to the data source the first time.  Once connected, Microsoft Query assumes you want to create your SQL query using the graphical/visual method (similar to MS Access) and prompts you to select the tables for your query.

Microsoft Query Add Tables

We want to key the SQL directly so close the Add Tables panel and click on the SQL button from the Microsoft Query tool bar to bring up the SQL editor.

Don’t spend too much time at this point getting your SQL perfect – the SQL can be modified/edited later from within Excel.  Just key something close enough to return some data.  For example:

Microsoft Query SQL editor

Once you have your SQL entered, click OK.  Since we didn’t use the graphical SQL editor, Microsoft gives us the following Microsoft Query warning

Microsoft Query warning

Click OK to continue.  Assuming you have no SQL errors, you should see the results of your SQL in Microsoft Query.

Microsoft Query results

Now we are ready to move the data, SQL, and connection info back to Excel.  From the File menu in Microsoft Query, select Return Data to Microsoft Office Excel.

Microsoft Query return to excel

This will close Microsoft Query and take you to an Import Data panel in Excel.  Review the options as needed and click OK.

Excel Import Data

Your results are now displayed in a new table (if that is the import option you selected – the default).

Excel results1

If you save your workbook, the data connection and SQL is saved along with it.  When you want to rerun your SQL, select the Refresh command from the Connections ribbon group, or right-click in the Table and select Refresh from the pop-up menu.  You may be prompted for your user id and password to execute the SQL.

Before you start changing/tweaking your SQL (e.g., adding/removing columns, column aliases, changing the order of the columns, sort, etc.), review the “Data formatting and layout” options on the Properties panel.  To view the properties, click on the Properties command button located in the Connections ribbon group on the Data tab.

Excel Properties command button

Excel External Data Properties

I recommend turning off the Preserve column sort/filter/layout and Preserve cell formatting options until you have finalized the report.  Once you have the SQL and all column formatting the way you want (you can use any of the column formatting options for dates, currency, highlighting, alignment, etc.),  you will want to turn on Preserve column sort/filter/layout and Preserve cell formatting. If you have manually sized your columns, and want to keep them that way when you refresh the data, turn off Adjust column width.  The best way to learn how these options affect the view is by trial and error.

When you are ready to edit/modify the SQL, select the Connections command button in the Connections ribbon group of the Data tab to open the Workbook Connections panel.

Excel Connections ribbon command

Excel Workbook Connections

On the Workbook Connections panel, select the connection (a workbook can have more than one connection) then click on Properties.  On the Connection Properties panel, select the Definition tab.  The SQL is listed in the “Command text:” box.

I find it easier to edit the SQL if I maximize Connection Properties panel (click-drag on bottom right corner of panel to expand). In this example I added column aliases to the SQL. If your alias contains a space, you can use the ” (double quotes) around the alias.  I’m not sure what the SQL limitations are; I have been able run some pretty complex queries (multiple table joins, sub-selects, where conditions, etc.) without trouble.

When you are done, click OK.  You may be prompted to enter your user id and password for the database connection.  Follow the refresh instructions above to update the report.

Excel results 2

TIP:  Once you have a workbook/spreadsheet with a data connection, you could just start with that spreadsheet instead of going through Microsoft Query.  That is, pull up the spreadsheet, save it as new workbook and then edit the SQL for the new report.  If your new report requires a different data source connection, however, I recommend starting from the beginning.

Have Fun!!

How-to Compare Two DB2 Tables Using EXCEPT SQL Function

The other day someone was showing me how Advanced Query Tool (AQT) can be used to compare the data in two tables. I started wondering if such a thing could be done using generic SQL or if one had to write a program to compare the tables. I found that such a thing could be done in DB2 using the EXCEPT function (beginning with v9 on z/OS, I think??).

To compare all the columns/rows in TABLE1 to those in TABLE2 you can use the EXCEPT function as follows:

SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE2;

The above SQL will display all rows from TABLE1 that don’t match or exist on TABLE2. The number of columns being compared – and their data type – have to match and you can’t compare certain column types (such as CLOB/BLOB/XML). The nice thing about this, is the EXCEPT function takes care of all the comparison details for you unlike when trying to use the WHERE NOT EXISTS clause.

If at the same time you also want to know what columns/rows from TABLE2 don’t match or exist in TABLE1, reverse the order of the tables in a second EXCEPT select query and join the two results with a UNION clause. Example:

(SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE2)
UNION
(SELECT * FROM TABLE2
EXCEPT
SELECT * FROM TABLE1);

This will list out all the rows in TABLE1 that don’t exist or match on TABLE2 *AND* all the rows in TABLE2 that don’t match or exist in TABLE1. The problem with this SQL is it is difficult to tell from the output which table the row comes from. For example, if the row exists on both tables but there is a column value mismatch, the row will get printed twice – once for each EXCEPT query. If a row doesn’t exist in the other table it will be listed once. Which table does it come from? To make it easier to identify which select/table the row is from, you can add a hard coded identifier to the select clause like the following:

SELECT J1.*, 'TABLE1' AS SRC_TBL
FROM (
      SELECT * FROM TABLE1
      EXCEPT
      SELECT * FROM TABLE2
      ) AS J1
UNION
SELECT J2.*, 'TABLE2' AS SRC_TBL
FROM (
      SELECT * FROM TABLE2
      EXCEPT
      SELECT * FROM TABLE1
      ) AS J2
WITH UR;

There are many other options to fine tune the comparison such as using a WHERE condition to only compare certain rows, or comparing specific columns by including the columns in the select clause – remember the number of columns selected from each table and their data type must match. You can also include a FETCH FIRST n ROWS ONLY clause to avoid ‘run away’ comparisons.

I’m also assuming that the two tables being compared have the same primary/unique key defined. It’s not required, but if the table does not have a primary/unique key, it becomes difficult to sort the results in a way that makes the comparison easy to decipher.

Also take a look at the INTERSECT function which can be used to identify all the rows that DO match.

You can read more about these functions on IBM’s website.

Have a great day!