The Data Quality and Comparison components are SSIS components that help developers keep their data consistent and clean.

The SSIS Productivity Pack currently includes three components in this category.

The data quality and comparison components available within the SSIS Productivity Pack are listed below, each with a link to its Help Manual:

  • Data Profiler
    • An SSIS data flow component that analyzes data and compares rows from upstream data sources. Rows from each input are passed through the component to the corresponding output. Once all rows have been processed, the component writes a single row containing the results of the data analysis to the "DataProfiler Output".
  • Diff Detector
    • Enables the comparison of two sources: a primary source and a secondary source. Rows from the two inputs are matched using a business key (simple or compound) and compared to each other to determine whether each row is unchanged, changed, deleted from the primary data source, or added in the secondary data source.
  • Duplicate Detector
    • Compares rows within a data source to identify duplicate rows based on an approximate (fuzzy) or exact match. The component creates two outputs: Unique Rows and Duplicate Rows. The Duplicate Rows output has four additional fields: Richness Score, Richness Rank, Similarity Score, and GroupID.
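
The key-based comparison described for the Diff Detector can be illustrated outside SSIS. The following is a minimal Python sketch of the general idea, not the component's implementation. The function name `diff_rows` and the bucket names `primary_only` and `secondary_only` are hypothetical, and how the real component labels rows found on only one side is not asserted here.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def diff_rows(
    primary: Sequence[Row],
    secondary: Sequence[Row],
    key_columns: Sequence[str],
) -> Dict[str, List[Row]]:
    """Classify rows by matching on a simple or compound business key.

    Assumption: keys are unique within each source. If a key repeats,
    the last row with that key wins in this sketch.
    """

    def key_of(row: Row) -> Tuple[object, ...]:
        # A compound key is just a tuple of the key column values.
        return tuple(row[column] for column in key_columns)

    primary_by_key = {key_of(row): row for row in primary}
    secondary_by_key = {key_of(row): row for row in secondary}

    result: Dict[str, List[Row]] = {
        "unchanged": [],
        "changed": [],
        "primary_only": [],    # key found only in the primary source
        "secondary_only": [],  # key found only in the secondary source
    }

    for key, primary_row in primary_by_key.items():
        secondary_row = secondary_by_key.get(key)
        if secondary_row is None:
            result["primary_only"].append(primary_row)
        elif secondary_row == primary_row:
            result["unchanged"].append(primary_row)
        else:
            # Same key, but at least one non-key value differs.
            result["changed"].append(primary_row)

    for key, secondary_row in secondary_by_key.items():
        if key not in primary_by_key:
            result["secondary_only"].append(secondary_row)

    return result


if __name__ == "__main__":
    primary = [
        {"id": 1, "name": "Ann"},
        {"id": 2, "name": "Bob"},
        {"id": 3, "name": "Cy"},
    ]
    secondary = [
        {"id": 1, "name": "Ann"},
        {"id": 2, "name": "Bobby"},
        {"id": 4, "name": "Dee"},
    ]
    for bucket, rows in diff_rows(primary, secondary, ["id"]).items():
        print(bucket, rows)
```

Indexing both sources by key keeps the comparison to a single pass over each side, which is the usual choice when both inputs fit in memory.
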

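The exact and fuzzy matching described for the Duplicate Detector can be sketched in a similar way. This is a minimal Python illustration under stated assumptions, not the component's algorithm. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, the names `detect_duplicates`, `group_id`, and `similarity_score` are hypothetical, and the Richness Score and Richness Rank fields are left out because their definitions are not given here.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def similarity(a: str, b: str) -> float:
    # Ratio in [0.0, 1.0]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, a, b).ratio()


def detect_duplicates(
    rows: Sequence[Row],
    column: str,
    threshold: float = 1.0,
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (unique_rows, duplicate_rows) by comparing one column.

    threshold=1.0 gives an exact match; lower values allow fuzzy matches.
    Assumption: each row is compared with the first row of every existing
    group and joins the first group that meets the threshold.
    """
    groups: List[List[Row]] = []
    for row in rows:
        for group in groups:
            if similarity(str(row[column]), str(group[0][column])) >= threshold:
                group.append(row)
                break
        else:
            # No existing group matched, so this row starts a new group.
            groups.append([row])

    unique_rows: List[Row] = []
    duplicate_rows: List[Row] = []
    group_id = 0
    for group in groups:
        if len(group) == 1:
            unique_rows.append(group[0])
            continue
        group_id += 1
        first_value = str(group[0][column])
        for row in group:
            duplicate_rows.append({
                **row,
                "group_id": group_id,
                "similarity_score": similarity(str(row[column]), first_value),
            })
    return unique_rows, duplicate_rows


if __name__ == "__main__":
    customers = [
        {"name": "Acme Corp"},
        {"name": "Acme Corp."},
        {"name": "Globex"},
    ]
    unique_rows, duplicate_rows = detect_duplicates(customers, "name", threshold=0.9)
    print("unique:", unique_rows)
    print("duplicates:", duplicate_rows)
```

In this sketch every member of a multi-row group lands in the duplicate output, including the first row of the group. Whether the real component routes one "best" row per group elsewhere is not something this example claims.
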
YouTube video: Getting started with SSIS Productivity Pack - Data Quality and Comparison