Find and Merge CRM Duplicates With Ease Using SSIS Productivity Pack

04 November 2016
Chen Huang
Data quality is critical to your application system's user adoption. It is important to make sure that the data in your application system is accurate, complete, and consistent across the application. However, identifying and eliminating duplicate data is not a trivial job without a proper tool, it can be an extraordinary effort, and it can be a frustrating process when not done properly. Living in the data integration space, we understand how important it is to keep your data clean. For this particular reason, our team has developed a comprehensive solution that can help work with the situation, in a productive way.

When it comes to data quality, Microsoft Dynamics CRM (or Dynamics 365 for Sales as the latest name) provides some out-of-box tools that can help work with duplicate data. It allows you to set up Duplicate Detection Rules which can help reduce the opportunity of importing duplicate records, but this doesn’t guarantee eliminating the duplicates that are already in the system, and also your CRM user could always choose to ignore those duplicate warning by entering duplicate data through CRM UI. In addition to that, the out-of-box duplicate detection capabilities are limited to certain matching options which makes it very inflexible.

With that being said, we saw a big need for a better duplicate detection tool, so we recently introduced a Duplicate Detector component in our SSIS Productivity Pack, which is a collection of premium SSIS components that help achieve more when using SSIS as your integration platform.

First, let’s have a quick look at the Duplicate Detector component.

What is Duplicate Detector?

Duplicate Detector is an SSIS component which can be used to scan incoming rows from a data source and identify duplicate rows. It comes with 7 different matching types to meet your different duplicate detection needs. (Click here for the detailed matching algorithm of the component)

SSIS Duplicate Detector

You would typically connect the Duplicate Detector component after an SSIS source component which reads data from the source system that you would like to analyze the duplicates. You will notice that the component comes with two outputs, which are Duplicate Rows and Unique Rows.

The Duplicate Rows output has four additional fields which can be used for data quality analysis:

  • GroupID – A unique GroupID value is given to each group of duplicate records.
  • Richness Score - Each row is given a richness score which represents data quality with the value = (number of fields are not NULL )/(total number of fields).  
  • Richness Rank - Each row in a duplicate group is given a rank based on the richness score.
  • Similarity Score – The similarity percentage of the duplicate rows compared to the top-ranked duplicate row in the group.

Find and Merge CRM Duplicates

Finding CRM duplicates and merging them accordingly becomes a simplified process using KingswaySoft Duplicate Detector and CRM Destination Component.

By taking advantage of the data quality analysis feature of our Duplicate Detector component, the top-ranked duplicate record containing the richest information within the group, can be considered the target record that is being merged into.

  • RichnessRank == 1: the target records that is being merged into.
  • RichnessRank!= 1: the subordinate records of the merging.

Then we use Left Join on GroupCode to find the GUID of the target duplicate record and the GUID of the subordinate records.

The last step is to merge the duplicates in CRM by using CRM destination component, the configuration is extremely simple with just a few clicks to map the input columns to the destination fields. 

The data flow would be something like this:

manage dynamics crm duplicates

Sample SSIS Package

A sample SSIS package can be download here.

The sample package was prepared in an SQL Server 2008 R2 format, and it should automatically upgrade if you are using any newer version of SSIS (you might need to do some layout adjustment to make it look better).

comments powered by Disqus