Composing and Decomposing Data in SSIS

02 October 2020
Mohamed Abdalkarim

In a perfect world, integrating data from one application or database system to another should be a straight-forward and simple process. For instance, if you are integrating client information, you should map the client's name in the first system to the name field in the second one with ease. This simple approach should also apply to the client's address, telephone number, email address and any other data collected. In theory, this should all work out well; however, there are situations where it is not always that effortless.

Let us consider this example:

Delimited Data and Tabular Data

Table A has multiple records, and each record belongs to a different set of groups. However, the group assignment for a certain record is comma-separated data values in a single cell representing all the groups a record can be a member of. This format is not compatible with the target integration system; therefore, data transformation is required so that each row can represent only one group assignment as shown in Table B.

A real-life example of this scenario could be a client with multiple phone numbers. The numerous telephone details are stored within a single field on the source application, separated by a comma. As for the destination application or database, each telephone number needs to be in its own field. This makes it challenging to extract the data into a typical tabular form to integrate successfully. 

This blog post will demonstrate how using our latest components within the SSIS Productivity Pack, Composition and Decomposition, users can parse each individual value from a delimited text form with ease and vice versa.

Using Decomposition Component

Continuing with our multiple telephone number example mentioned earlier, here is an SSIS demonstration where each contact could have multiple phone numbers. All the phone numbers for a given contact are included in a single cell, separated by a comma. By enabling Data Viewer, we can examine the input records' format and see how this data is represented.

Default Output Data Viewer of Delimited Data

As shown above, this may not be the standard tabular format we expect to integrate with another database system or application with a defined schema. Using the out-of-the-box components of SSIS, a possible solution for this problem would be to use a SQL script to turn the comma-separated string into individual rows from the Source itself. However, this would not work if the source system is not a database. A second possible solution would be using a Script component to parse this data; however, this will require writing your own C# or VB code, which can be complicated and harder to maintain in integration packages.

As of version 20.1 of our SSIS Productivity Pack, we have added new Composition and Decomposition components, which can be used to work with these types of data formats. The Decomposition component is a data flow transformation component used to split delimited input values into multiple rows. Back to our example, attach the source component to the Decomposition component. Configure the component as follows:

Decomposition Component Editor

First, select the Input Column that will include the multiple data values; in this case, this is the PhoneNumber column. Next, select the format of the column by working with the Source Type option. Choose from the following options:

  • Delimited Text - If the input is in delimited text format.
  • JSON Array - If the input is a JSON array.
  • XML Elements - If the input is XML format.

In this case, the input is a "Delimited Text" and thus, configure the Delimited Text settings based on the format expected. Finally, in the Output Settings section, choose to have the parsed data values in its own new Column, or replace the existing Input Column with the new data values.

Once the Decomposition Component has been configured, enable Data Viewer to examine the format of the output. As you can see, the data has been parsed and split into different rows.

Decomposition Component Data Flow Results

Now each record will represent a contact along with a single Phone Number value. If there is a contact with multiple phone numbers, there will be multiple records for the same contact; each contains a single phone number.

Using Composition Component

The Composition component performs the exact opposite use case of the example discussed earlier. Here the dataset includes repeating rows for the same contact, each with a different phone number. The goal is to group these data values in a single cell based on a given key. So the input data is represented in the following format:

Default Output Data Viewer of Tabular Data

In this case, use our Composition component, a data flow transformation component used to combine/merge multiple rows into one row based on a defined set of keys. Therefore, attach the source component to our Composition component and configure the component as follows:

Composition Component Editor

Select the Input Column that will include the data values to group together. In this scenario, this is the PhoneNumber column. Next, select the column's format by working with the Destination Type option consisting of the same three settings mentioned earlier (Delimited Text, JSON Array, XML Elements).

Select the Key Columns which will be the criteria of which values should be grouped. In this example, it is the ID field. This means that records that share the same value for the ID field will have their PhoneNumber values grouped and separated by a comma.

There are also additional options that can be configured based on the format of the input data. For instance, the "Input Records Are Scrambled" option can be enabled if the input records are not sorted based on the Key Field. Additionally, the Output Settings section can output the combined data values in its own new Column or replace the existing Input Column with the new data values.

With the Composition component configured, enable Data Viewer to examine the output format to show how the data has been grouped in a single row based on a common ID value.

Composition Component Data Flow Results

This way, each record will represent a contact and all the PhoneNumber values that belong to it based on the ID field as a key.

Closing Notes

With our new Composition and Decomposition components, working with delimited or tabular-formatted data is now easier than ever before. Whether looking to extract each individual data value into a typical tabular format from a delimited-text formatted cell (use Decomposition) or group data values together into a single cell where values are separated by a delimiter (use Composition), our SSIS Productivity Pack will provide you with the tools needed to integrate virtually any application or database system together, regardless of how the data is formatted.

comments powered by Disqus