Automated Upsert Scripts for ExecSQL

“Upsert” is a colloquial term for the standard SQL MERGE operation.

ExecSQL is a SQL script processor that includes metacommands that allow the script to interact with the user and the file system.

“Automated” means, in this case, that the upsert operation is driven by information contained in the database’s information schema. These upsert scripts will therefore work, without customization, with any tables in any database in any DBMS that supports the information schema. All that need be done to run these scripts is to list the names of the tables to be updated and identify the schemas containing the base tables and staging tables.

Although some DBMSs provide a version of the MERGE statement (e.g., Postgres’ INSERT…ON CONFLICT statement), the execsql upsert scripts provide the following features that are not found in native implementations of the MERGE operation:

  • They can be applied to any table in any database without modification.
  • They can be applied to multiple tables simultaneously, and will perform the upsert operations in top-down order to maintain referential integrity among tables.
  • Prior to performing the upsert operation, they check for null values in columns that must be non-null.
  • Prior to performing the upsert operation, they check all staging tables for duplicate primary key values.
  • Prior to performing the upsert operation, they check foreign keys in the staging tables against both the corresponding base tables and against any other appropriate staging tables.
  • They will not attempt to perform the upsert operation on any table if there are any violations of the non-null checks, primary key checks, or foreign key checks.
  • They produce a table that either a) summarizes the number of rows that violated each type of non-null, primary key, and foreign-key check, or b) summarizes the number of rows updated and the number of rows inserted for each table.
  • Optionally, they will display all the changes to be made in a GUI interface, prompting the user to approve each update and insert operation.
  • Optionally, they will record all operations carried out in a custom log file; this log may include the SQL statements executed and the data values that were added or changed.
  • If an execsql console is active, they will use the console’s status bar and progress bar to indicate the activity underway.

These upsert scripts do not replicate all of the features that DBMS-specific implementations of the MERGE operation may have, but provide an alternative to make data QA and data loading simple when the upsert operation can be carried out solely through the use of UPDATE and INSERT statements.

In addition to the upsert operation, a script is included to perform only an update of the primary key columns of a table. This is also driven by the information schema, and will work without revision on any table of any database for any supported DBMS.

DBMS Support

Because they use the information schema views provided by the DBMS, these upsert scripts can potentially be used with any DBMS that supports the information schema. However, these scripts also use additional non-standard SQL functions, such as aggregate functions to concatenate strings, that differ across DBMSs. The method of staging new data can also differ between DBMSs. Consequently, these scripts need to be tailored for different DBMSs. Currently, versions of the upsert scripts are available for the following DBMSs:

  • PostgreSQL (9.0 or later)
  • MariaDB / MySQL (5.1 or later)
  • MS-SQL Server (2008 r2 or later)