Using the Diff Detector Component

The Diff Detector Component is an SSIS data flow pipeline component that can be used to compare fields from an 'Old Data' input and 'New Data' input. Rows from the inputs are matched using a business key (simple or compound key) and compared to each other to determine if the rows are unchanged, changed, deleted from the old data input or added in the new data input.

The component has 2 inputs:

Input Name Description
Old Data The old data input is for existing data that needs to be checked for changes.
New Data The new data input is for new data that is current, and may need to be used to update the old data.

The component has 4 outputs:

Output Name Row Type Description
Unchanged Rows Old Data Row Old data input rows that match all the compared fields of new data input rows that they were compared to.
Changed Rows Old Data and New Data Row A combination of old data input rows and new data input rows that had matching business keys, but did not match on all the compared fields.  Old data rows have the prefix 'Old.'.
Deleted Rows Old Data Row Old data input rows that did not have a matching key in the new data input.
New Rows New Data Row New data input rows that did not have a matching key in the old data input.

The component includes the following three pages to configure how you want to read data.

  • General
  • Comparison Settings
  • Error Handling

General page

The General page of the Diff Detector Component allows you to specify the general settings of the component. 

SSIS Diff Detector

Fields to Compare grid

The Fields To Compare grid will display all the available fields from the old data input.  Each of these fields can be paired to a field in the new data input.  For every field that has been paired it is possible to designate the field as a business key. A compound business key is created from all the business key fields that are selected, and the compound key must be unique within each input. The compound business key is used to determine which rows to compare in the old data input and new data input.  When fields of different types are compared, the component attempts to convert them to a common type, and compare them.  For example when comparng a decimal number, 5.00 and an integer value 5 the values will be considered equal.  This does not apply to key fields.  Key fields must be the same value and the same type to be considered equal.

Note: Fields that are different types can be compared, but a warning may be shown if the comparison combination is unsupported.  The component supports the following comparisons of different data types:


Data Type 1 Data Type 2 Notes
Any String Type Any String Type
Any Numeric Type Any Numeric Type Internally numbers are converted to a common compatable type and compared.
Any String Type Boolean The component attempts to parse the string to a boolean.  True/False and Yes/No strings are supported. This is a case insensitive comparison.
Any Numeric Type Boolean The component treats 0 as False and 1 as True.
Similarity Column

The similarity column allows a similarity threshold to be set for paired fields. These fields will be treated as a match if their similarity is above the threshold.  Similarity can be set between 1% which would be not similar at all, and 100% which would be an exact match. Business Key fields must always be a 100% match.

Filter

The old data input columns that are visible can be filtered by entering text in the Filter text box.

Hide Unpaired Fields

When the Hide Unpaired Fields checkbox is checked only paired input columns will be shown.

Hide Paired Fields

When the Hide Paired Fields checkbox is checked only unpaired input columns will be shown.

Refresh Component  Button

Clicking the Refresh Component button will reload input columns from the old data and new data inputs.

Pair Unpaired Fields Button

Clicking the Pair Unpaired Fields button will attempt to pair all the old data input fields to new data input fields with matching names.

Clear All Pairings Button

Clicking the Clear All Pairings button will clear all pairings between old data and new data fields.

Comparison Setting

The Comparison Settings page of the Diff Detector Component allows you to specify the settings that will be used when comparing fields.

SSIS Diff Detector - Comparison Settings

Culture Identifier

Specifies the culture identifier that should be used when comparing strings.

Ignore Leading and Trailing White Space

When Ignore Leading and Trailing White Space is checked leading and trailing white space of string fields will be ignored during comparison.

Ignore Case

When the Ignore Case is checked case will be ignored when comparing string fields.

Changed Output Columns - Columns from New Data Input

Selecting the 'Columns from New Data Input' option will cause the columns from the new data input to appear in the changed rows output.

Changed Output Columns - Columns from Old Data Input

Selecting the 'Columns from Old Data Input' option will cause the columns from the old data input to appear in the changed rows output.  These columns will have the prefix 'Old.'

Error Handling page

The Error Handling page allows you to specify how errors should be handled when they happen. 

SSIS Diff Detector - Error Handling

There are three options available.  

  1. Fail on error
  2. Redirect rows to error output
  3. Ignore error

When the Redirect rows to error output option is selected, rows that failed to write to the Diff Detector Component will be redirected to the 'Error Output' output of the component. As indicated in the screenshot below, the blue output connection represents rows that were successfully written, and the red 'Error Output' connection represents rows that were erroneous. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by server or the component itself.