Staging Upsert Output

Overview

The Staging Upsert Output step first looks up a row in a table using one or more lookup keys. If the row can’t be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated if the “Variation number” is larger the the existing one.

Staging Upsert

Configuration Options

Option	Description
Step name	Name of the step
Connection	Select predefined database connection
Target schema	The name of the Schema for the table to write data to. This is important for data sources that allow for table names with dots ’.’ in it
Target table	Name of the target table

Lookup Keys

The transform uses one or more fields from the input stream as lookup keys to find existing rows in the target table. These keys should match the primary key or unique index columns in the target table for optimal performance.

Variation Number

The variation number is a field in both the input stream and the target table that determines whether an update should occur. The update logic works as follows:

If the row is not found → Insert the new row
If the row is found and all fields are identical → No action (skip)
If the row is found and fields differ:
- If input variation number > existing variation number → Update the row
- If input variation number ≤ existing variation number → No action (skip)

This mechanism allows you to control which version of data should be kept when multiple updates occur, useful for handling data from different sources or processing runs.

Best Practices

Performance Optimization

Index lookup keys: Create indexes on the lookup key columns in the target table
Batch processing: Process data in batches when possible to reduce database load
Commit size: Use appropriate commit sizes to balance performance and transaction safety

Data Quality

Validate lookup keys: Ensure lookup key values are not null before processing
Handle duplicates: Ensure input data has unique lookup key combinations
Variation number logic: Design your variation number logic carefully to ensure correct update behavior

Common Patterns

This transform is commonly used in combination with:

Get Timestamp: To get the last processed timestamp for incremental loads
Filter Rows: To filter data before upserting
Deem Java Expression: To calculate variation numbers or transform data before upserting

Transforms Overview - Overview of all transforms
Indexed Table Output - Alternative table output method
Bulk Loader - Bulk load to databases
Get Timestamp - Get timestamp for staging pipelines