Using the Diff Detector Component

The Diff Detector Component is an SSIS data flow pipeline component that can be used to compare fields from an 'Old Data' input and 'New Data' input. Rows from the inputs are matched using a business key (simple or composite key) and compared to each other to determine if the rows are unchanged, changed, deleted from the old data input or added in the new data input.

The component has 2 inputs:

Input Name Description
Old Data The old data input is for existing data that needs to be checked for changes.
New Data The new data input is for new data that is current, and may need to be used to update the old data.

The component has 4 outputs:

Output Name Row Type Description
Unchanged Rows Old Data Row Old data input rows that match all the compared fields of new data input rows that they were compared to.
Changed Rows Old Data and New Data Row A combination of old data input rows and new data input rows that had matching business keys, but did not match on all the compared fields. Old data rows have the prefix 'Old.'.
Deleted Rows Old Data Row Old data input rows that did not have a matching key in the new data input.
New Rows New Data Row New data input rows that did not have a matching key in the old data input.

The component includes the following three pages to configure how you want to read data.

  • General
  • Comparison Settings
  • Error Handling

General Page

The General page of the Diff Detector Component allows you to specify the general settings of the component.

SSIS Diff Detector

Fields to Compare grid

The Fields To Compare grid will display all the available fields from the old data input. Each of these fields can be paired to a field in the new data input. For every field that has been paired it is possible to designate the field as a business key. A composite business key is created from all the business key fields that are selected, and the composite key must be unique within each input. The composite business key is used to determine which rows to compare in the old data input and new data input. When fields of different types are compared, the component attempts to convert them to a common type, and compare them. For example, when comparing a decimal number, 5.00 and an integer value 5 the values will be considered equal. This does not apply to key fields. Key fields must be the same value and the same type to be considered equal.

Note: Fields that are different types can be compared, but a warning may be shown if the comparison combination is unsupported. The component supports the following comparisons of different data types:

Data Type 1 Data Type 2 Notes
Any String Type Any String Type
Any Numeric Type Any Numeric Type Internally numbers are converted to a common compatible type and compared.
Any String Type Boolean The component attempts to parse the string to a boolean. True/False and Yes/No strings are supported. This is a case insensitive-comparison.
Any Numeric Type Boolean The component treats 0 as False and 1 as True.
Match Type Column

The match type for paired fields can be set to either Exact Match, the generic Fuzzy Match, or a more specific match type. Exact match columns must be matched exactly to be considered matches. Fuzzy Match columns must meet a similarity threshold. If you select Fuzzy Match as the match type the similarity threshold will be enabled. Only string types such as VARCHAR, NVARCHAR, NTEXT and TEXT can be used with a fuzzy match filter, a warning will be shown if any other type is selected. The available match types are:

  • Exact Match: Fields must be exactly the same to be considered a match. The Similarity Threshold slider is disabled because matches must always have a similarity of 100%.
  • Fuzzy Match: Strings are considered a match if their similarity is above the specified similarity threshold.
  • Address Match: Strings are treated as street addresses, and common street name prefixes and directions are treated as equivalent. For example N. and North, or Avenue and Av. would be treated as exact matches.
  • First Name Match: Strings are treated as first names, and common nicknames are treated as the same name. For example, Bill and William would be an exact match.
  • Company Name: Strings are treated as company names, and common company suffixes such as INC, CORP and LLC are ignored when comparing.
  • Phone Number Match: Strings are parsed to attempt to identify a 10-digit phone number which is used for comparison, ignoring any other characters.
  • US Zip Code Match: Strings are parsed to attempt to identify a 5 or 9 digit zip code which is used for comparison, ignoring other characters.
  • Country Match (since v7.1): Strings are treated as country names, and country codes are matched with English short country names. For example, Canada and CA would be an exact match.
  • US State Match (since v7.1): Strings are treated as states, and codes are matched with the subdivision names. For example, New York and NY would be an exact match.
Similarity Column

When a match type other than Exact Match is selected for paired fields, the similarity threshold slider will be enabled. The similarity threshold is a value between 1% and 100% that determines how similar the values need to be do be considered matches with a value of 1% being the least similar and 100% being an exact match.

Comparison Settings Column

The comparison properties display which comparison settings have been selected. If you edit this column and click on the button a dialog will launch that allows you to edit the comparison settings.

Trim Leading and Trailing Whitespace

Selecting the Trim Leading and Trailing Whitespace option will remove all whitespace at the beginning and end of the primary field and the lookup field when comparing.

Ignore Case

Selecting the Ignore Case option will ignore the case of the primary field and the lookup field when comparing.

Ignore All Whitespace

Selecting the Ignore All Whitespace option will remove all whitespace from the primary field and the lookup field when comparing.

Ignore All Punctuation

Selecting the Ignore All Punctuation option will remove all punctuation from the primary field and the lookup field when comparing.

Treat Empty String as Null (since v6.0)

Enabling this option allows you to specify whether to treat an empty string as NULLs when performing a comparison in Diff Detector.

Filter

The old data input columns that are visible can be filtered by entering text in the Filter text box.

Hide Unpaired Fields

When the Hide Unpaired Fields checkbox is checked only paired input columns will be shown.

Hide Paired Fields

When the Hide Paired Fields checkbox is checked only unpaired input columns will be shown.

Refresh Component Button

Clicking the Refresh Component button will reload input columns from the old data and new data inputs.

Pair Unpaired Fields Button

Clicking the Pair Unpaired Fields button will attempt to pair all the old data input fields to new data input fields with matching names.

Clear All Pairings Button

Clicking the Clear All Pairings button will clear all pairings between old data and new data fields.

Expression fx Button

Clicking the fx button to launch SSIS Expression Editor to enable dynamic updates of the property at run time.

Generate Documentation Button

Clicking the Generate Documentation button to generate a Word document that describes the component's metadata including relevant mapping, and so on.

Advanced Settings Page

The Advanced Settings page of the Diff Detector Component allows you to specify additional settings that will be used when comparing fields.

SSIS Diff Detector - Advanced Settings

String Comparison
Culture Identifier

Specifies the culture identifier that should be used when comparing strings.

Enable Columns for Changed Output
Columns from New Data Input

Selecting the 'Columns from New Data Input' option will cause the columns from the new data input to appear in the changed rows output.

Columns from Old Data Input

Selecting the 'Columns from Old Data Input' option will cause the columns from the old data input to appear in the changed rows output. These columns will have the prefix 'Old.'

Enable Columns for Unchanged Output
Columns from New Data Input

Selecting the 'Columns from New Data Input' option will cause the columns from the new data input to appear in the unchanged rows output.

Columns from Old Data Input

Selecting the 'Columns from Old Data Input' option will cause the columns from the old data input to appear in the unchanged rows output. These columns will have the prefix 'Old.'

Duplicate Key Handling (since v6.0)
Allow Duplicate Keys in Old Data

The 'Allow Duplicate Keys in Old Data' option would only use the first row for each set of duplicates for the comparison.

Allow Duplicate Keys in New Data

The component would compare each row in the set of duplicates in the New Data input with the matched row in the Old Data input when the option "Allow Duplicate Keys in New Data" is enabled.

Error Handling Page

The Error Handling page allows you to specify how errors should be handled when they happen.

SSIS Diff Detector - Error Handling

There are three options available.

  1. Fail on error
  2. Redirect rows to error output
  3. Ignore error

When the Redirect rows to error output option is selected, rows that failed to write to the Diff Detector Component will be redirected to the 'Error Output' output of the component. As indicated in the screenshot below, the green output connection represents rows that were successfully written, and the red 'Error Output' connection represents erroneous rows. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by the server or the component itself.