Using the Diff Detector Component

The Diff Detector Component is an SSIS data flow pipeline component that can be used to compare fields from an 'Old Data' input and 'New Data' input. Rows from the inputs are matched using a business key (simple or composite key) and compared to each other to determine if the rows are unchanged, changed, deleted from the old data input or added in the new data input.

The component has 2 inputs:

Input Name Description
Old Data The old data input is for existing data that needs to be checked for changes.
New Data The new data input is for new data that is current, and may need to be used to update the old data.

The component has 4 outputs:

Output Name Row Type Description
Unchanged Rows Old Data Row Old data input rows that match all the compared fields of new data input rows that they were compared to.
Changed Rows Old Data and New Data Row A combination of old data input rows and new data input rows that had matching business keys, but did not match on all the compared fields.  Old data rows have the prefix 'Old.'.
Deleted Rows Old Data Row Old data input rows that did not have a matching key in the new data input.
New Rows New Data Row New data input rows that did not have a matching key in the old data input.

The component includes the following three pages to configure how you want to read data.

  • General
  • Comparison Settings
  • Error Handling

General page

The General page of the Diff Detector Component allows you to specify the general settings of the component. 

SSIS Diff Detector

Fields to Compare grid

The Fields To Compare grid will display all the available fields from the old data input.  Each of these fields can be paired to a field in the new data input. For every field that has been paired it is possible to designate the field as a business key. A composite business key is created from all the business key fields that are selected, and the composite key must be unique within each input. The composite business key is used to determine which rows to compare in the old data input and new data input.  When fields of different types are compared, the component attempts to convert them to a common type, and compare them. For example, when comparing a decimal number, 5.00 and an integer value 5 the values will be considered equal. This does not apply to key fields. Key fields must be the same value and the same type to be considered equal.

Note: Fields that are different types can be compared, but a warning may be shown if the comparison combination is unsupported.  The component supports the following comparisons of different data types:


Data Type 1 Data Type 2 Notes
Any String Type Any String Type
Any Numeric Type Any Numeric Type Internally numbers are converted to a common compatible type and compared.
Any String Type Boolean The component attempts to parse the string to a boolean.  True/False and Yes/No strings are supported. This is a case insensitive comparison.
Any Numeric Type Boolean The component treats 0 as False and 1 as True.
Match Type Column

The match type for paired fields can be set to either Exact Match, the generic Fuzzy Match, or a more specific match type. Exact match columns must be matched exactly to be considered matches. Fuzzy Match columns must meet a similarity threshold. If you select Fuzzy Match as the match type the similarity threshold will be enabled. Only string types such as VARCHAR, NVARCHAR, NTEXT and TEXT can be used with a fuzzy match filter, a warning will be shown if any other type is selected. The available match types are:

Exact Match - Fields must be exactly the same to be considered a match. The Similarity Threshold slider is disabled because matches must always have a similarity of 100%.

Fuzzy Match - Strings are considered a match if their similarity is above the specified similarity threshold.

First Name Match - Strings are treated as first names, and common nicknames are treated as the same name. For example, Bill and William would be an exact match.

Phone Number Match - Strings are parsed to attempt to identify a 10-digit phone number which is used for comparison, ignoring any other characters.

Street Address Match - Strings are treated as street addresses, and common street name prefixes and directions are treated as equivalent.  For example N. and North, or Avenue and Av. would be treated as exact matches.

US Zip Code Match - Strings are parsed to attempt to identify a 5 or 9 digit zip code which is used for comparison, ignoring other characters.

Company Name - Strings are treated as company names, and common company suffixes such as INC, CORP and LLC are ignored when comparing.

Similarity Threshold Column

When a match type other than Exact Match is selected for a paired fields the similarity threshold slider will be enabled. The similarity threshold is a value between 1% and 100% that determines how similar the values need to be do be considered matches with a value of 1% being the least similar and 100% being an exact match.

Advanced Settings Column

The advanced properties displays which advanced settings have been selected. If you edit this column, and click on the button a dialog will launch that allows you to edit the advanced settings.

Advanced Settings - Trim Leading and Trailing Whitespace

Selecting the Trim Leading and Trailing Whitespace option will remove all whitespace at the beginning and end of the primary field and the lookup field when comparing.

Advanced Settings - Ignore Case

Selecting the Ignore Case option will ignore case of the primary field and the lookup field when comparing.

Advanced Settings - Ignore All Whitespace

Selecting the Ignore All Whitespace option will remove all whitespace from the primary field and the lookup field when comparing.

Advanced Settings - Ignore All Punctuation

Selecting the Ignore All Punctuation option will remove all punctuation from the primary field and the lookup field when comparing.

Advanced Settings - Treat Empty String as Null (since v6.0)

Enabling this option allows you to specify whether to treat empty string as NULLs when performing comparison in Diff Detector. 

Filter

The old data input columns that are visible can be filtered by entering text in the Filter text box.

Hide Unpaired Fields

When the Hide Unpaired Fields checkbox is checked only paired input columns will be shown.

Hide Paired Fields

When the Hide Paired Fields checkbox is checked only unpaired input columns will be shown.

Refresh Component  Button

Clicking the Refresh Component button will reload input columns from the old data and new data inputs.

Pair Unpaired Fields Button

Clicking the Pair Unpaired Fields button will attempt to pair all the old data input fields to new data input fields with matching names.

Clear All Pairings Button

Clicking the Clear All Pairings button will clear all pairings between old data and new data fields.

Advanced Settings

The Advanced Settings page of the Diff Detector Component allows you to specify additional settings that will be used when comparing fields.

SSIS Diff Detector - Comparison Settings

Culture Identifier

Specifies the culture identifier that should be used when comparing strings.

Enable Columns for Changed Output - Columns from New Data Input

Selecting the 'Columns from New Data Input' option will cause the columns from the new data input to appear in the changed rows output.

Enable Columns for Changed Output - Columns from Old Data Input

Selecting the 'Columns from Old Data Input' option will cause the columns from the old data input to appear in the changed rows output.  These columns will have the prefix 'Old.'

Enable Columns for Unchanged Output - Columns from New Data Input 

Selecting the 'Columns from New Data Input' option will cause the columns from the new data input to appear in the unchanged rows output.

Enable Columns for Unchanged Output - Columns from Old Data Input

Selecting the 'Columns from Old Data Input' option will cause the columns from the old data input to appear in the unchanged rows output.  These columns will have the prefix 'Old.'

Duplicate Key Handling - Allow Duplicate Keys in Old Data (since v6.0)

The 'Allow Duplicate Keys in Old Data' option would only use the first row for each set of duplicates for the comparison.

Duplicate Key Handling - Allow Duplicate Keys in New Data (since v6.0)

The component would compare the each row in the set of duplicates in New Data input with the matched row in Old Data input when option "Allow Duplicate Keys in New Data" enabled.

Error Handling page

The Error Handling page allows you to specify how errors should be handled when they happen. 

SSIS Diff Detector - Error Handling

There are three options available.  

  1. Fail on error
  2. Redirect rows to error output
  3. Ignore error

When the Redirect rows to error output option is selected, rows that failed to write to the Diff Detector Component will be redirected to the 'Error Output' output of the component. As indicated in the screenshot below, the blue output connection represents rows that were successfully written, and the red 'Error Output' connection represents rows that were erroneous. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by server or the component itself.