Using the Diff Detector Component
The Diff Detector Component is an SSIS data flow pipeline component that can be used to compare fields from a primary and secondary input. Rows from the inputs are matched using a primary key (simple or compound key) and compared to each other to determine if the rows are unchanged, changed, deleted from the primary data source or added in the secondary data source.
The component has 4 outputs:
|Output Name||Row Type||Description|
|Unchanged Rows||Primary Row||Primary source rows that match all the compared fields of secondary source rows with the same key.|
|Changed Rows||Primary Row||Primary source rows that did not match all the compared fields of secondary source rows with the same key.|
|Deleted Rows||Primary Row||Primary source rows that did not have a matching key in the secondary source.|
|New Rows||Secondary Row||Secondary source rows that did not have a matching key in the primary source.|
The component includes the following three pages to configure how you want to read data.
- Comparison Settings
- Error Handling
The General page of the Diff Detector Component allows you to specify the general settings of the component.
- Fields to Compare grid
The Fields To Compare grid will display all the available fields from the primary input source. Each of these fields can be paired to a field in the secondary source. For every field that has been paired it is possible to designate these fields as a primary key. A compound primary key is created from all the primary fields that are selected, and the compound key must be unique within each source. The compound primary key is used to determine which rows to compare in the primary and secondary source.
The primary input columns that are visible can be filtered by entering text in the Filter text box.
- Hide Unpaired Fields
When the Hide Unpaired Fields checkbox is checked only paired input columns will be shown.
- Hide Paired Fields
When the Hide Paired Fields checkbox is checked only unpaired input columns will be shown.
- Refresh Component Button
Clicking the Refresh Component button will reload input columns from the primary and secondary sources.
- Pair Unpaired Fields Button
Clicking the Pair Unpaired Fields button will attempt to pair all the primary input fields to secondary input fields with matching names.
- Clear All Pairings Button
Clicking the Clear All Pairings button will clear all pairings between primary and secondary fields.
The Comparison Settings page of the Diff Detector Component allows you to specify the settings that will be used when comparing fields.
- Culture Identifier
Specifies the culture identifier that should be used when comparing strings.
- Ignore Leading and Trailing White Space
When Ignore Leading and Trailing White Space is checked leading and trailing white space of string fields will be ignored during comparison.
- Ignore Case
When the Ignore Case is checked case will be ignored when comparing string fields.
- Similarity Threshold
The similarity threshold allows fuzzy matching of string fields. Set the threshold to a value between 1.00 (exact match) and 0.00 (not similar at all) to allow fields to compared based on their similarity.
Error Handling page
The Error Handling page allows you to specify how errors should be handled when they happen.
There are three options available.
- Fail on error
- Redirect rows to error output
- Ignore error
When the Redirect rows to error output option is selected, rows that failed to write to the Diff Detector Component will be redirected to the 'Error Output' output of the component. As indicated in the screenshot below, the blue output connection represent rows that were successfully written, and the red 'Error Output' connection represents rows that were erroneous. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by server or the component itself.