Using the Premium Slowly Changing Dimension Component
The Premium Slowly Changing Dimension Component is an SSIS data flow pipeline component that can be used to monitor changes in the value of dimension fields that change slowly. Slowly changing dimensions are dimensions with values that are relatively static and change infrequently and unpredictably. The component can process three types of slowly changing dimensions: Fixed Attributes which are never supposed to change, Changing Attributes which can change but don't have a history, and Historical Attributes which can change, and a history of all previous values is kept.
The component requires two inputs: a Current Dimension Input with data that should be monitored for changes, and a New Source Input with potentially changed data.
The component includes the following four pages to configure how you want to process data.
- Change Type Settings
- Advanced Settings
- Error Handling
The General page of the Premium Slowly Changing Dimension Component allows you to specify the general settings of the component.
- Fields to Compare Grid
The Fields To Compare grid will display all the available fields from the Current Dimension Input. Each of these fields can be paired to a field in the New Source Input. For every field that has been paired it is possible to designate these fields as a business key. A compound business key is created from all the key fields that are selected. The compound business key is used to determine which rows to compare in the Current Dimension Input and New Source Input. Selecting a Column Type for a mapped pair of fields will enable options on the Change Type Settings page.
The possible column types are:
- Fixed Attribute (Type 0) (Retain Original) - Attributes that are not supposed to change. An error row will be directed if this attribute changes, and an unchanged row will be directed if it does not change.
- Changing Attribute (Type 1) (Overwrite) - Attributes that are supposed to change. A changed row will be directed if this row changes, and an unchanged row will be directed if it does not change.
- Historical Attribute (Type 2) (Track History) - Attributes that are supposed to change and a history kept. If this attribute changes a changed row will be be directed to expire the current row, and a new row will be directed to represent the new current row.
- Surrogate Key - A surrogate key column has a numeric value that is unique in the current dimension input. Having a surrogate key can be useful when performing inserts and updates later in the workflow. The component can automatically increment this value in new rows that it creates.
- Current Record - The current record column type is used to identify a record as current or expired. The values that represent current and expired are defined on the change type settings page.
- Start Date - The start date column type is used to identify the column in the record that represents the start date for the period in which it is current.
- End Date - The end date column type is used to identify the column in the record that represents the end date for the period in which it is current. The format of the end date is defined on the change type settings page.
Note: Fields that are different types can be compared, but a warning may be shown if the comparison combination is unsupported. The component supports the following comparisons of different data types:
Data Type 1 Data Type 2 Notes Any String Type Any String Type Any Numeric Type Any Numeric Type Internally numbers are converted to a common compatable type and compared. Any String Type Boolean The component attempts to parse the string to a boolean. True/False and Yes/No strings are supported. This is a case insensitive comparison. Any Numeric Type Boolean The component treats 0 as False and 1 as True.
The current dimension input columns that are visible can be filtered by entering text in the Filter text box.
- Hide Unpaired Fields
When the Hide Unpaired Fields checkbox is checked only paired input columns will be shown.
- Hide Paired Fields
When the Hide Paired Fields checkbox is checked only unpaired input columns will be shown.
- Refresh Component Button
Clicking the Refresh Component button will reload input columns from the current dimension input and new source input.
- Map Unmapped Fields Button
Clicking the Map Unmapped Fields button will attempt to pair all the current dimension input fields to new source input fields with matching names.
- Clear All Mappings Button
Clicking the Clear All Mappings button will clear all pairings between current dimension input fields and new source input fields.
Change Type Setting
The Change Type Settings page has settings that are applicable to change types selected on the General page.
Changing Attribute Settings
If you have set one of the paired dimensions on the General page to change type "Changing Attribute" the changing attribute settings option will be enabled, the option is labelled "Change all the matching records, including outdated records, when changes are detected in a changing attribute". If this option is checked all records from primary input with a matching business key in secondary input will be updated with the new dimension value.
If this option is not checked, the component will attempt to identify the current record and only update that record. The current record for changing attributes can only be identified if dimensions on the General page have column types Start Date/End Date or Current Record.
- Historical Attribute Settings
If you have set the column type of one of the dimensions on the General page to Start Date/End Date or Current Record the historical attribute settings option will be enabled. There are two ways to identify the current and expired records, using a single column, or using start and end dates.
- Historical Attribute Settings - Single Column
Values used to detect current or expired columns can be used on string, numeric or boolean columns. The component will attempt to convert the strings that you specify to the type of the column to identify or mark the record as current or expired.
- Value when current - The value of the record when current.
- Value when expired - The value of the record when expired.
- Historical Attribute Settings - Date Columns
- Variable to set date values - The variable that is used to get the current date for setting start date and end date values in new and updated rows.
- End Date Value for Current Records - The value to use when setting the end date for current records, options are Null Value, Dec 31, 9999 or a custom date.
The Advanced Settings page contains additional settings to control the behavior of the premium slowly changing dimension component.
- Culture Identifier
Specifies the culture identifier that should be used when comparing strings.
- Ignore Leading and Trailing White Space
When Ignore Leading and Trailing White Space is checked leading and trailing white space of string fields will be ignored during comparison.
- Ignore Case
When the Ignore Case is checked case will be ignored when comparing string fields.
- Surrogate Key Settings
If you have selected a surrogate key column two additional options must be specified. In both cases the value can be specfied, loaded from a variable or auto detected.
- Surrogate Key Start Value - The value to use for the surrogate key in the first new row. If this value is set to Auto Detect the component will look for the highest surrogate key value in the current dimension input and use this as the start value.
- Surrogate Key Increment - The value to increment the surrogate key by for each new row. If this value is set to Auto Detect the component will look for the smallest increment between surrogate key values in the current dimension input and use this as the increment.
- Enable ChangeType Column for Default Output
When the ChangeType checkbox is checked the type of change, either 'Fixed', 'Updated' or 'New' will be included in an additional output column for each output.
Error Handling Page
The Error Handling page allows you to specify how errors should be handled when they happen.
There are three options available.
- Fail on error
- Redirect rows to error output
- Ignore error
When the Redirect rows to error output option is selected, rows that failed to write to the Premium Slowly Changing Dimension component will be redirected to the 'Error Output' output of the component. The blue output connection represents rows that were successfully written, and the red 'Error Output' connection represents rows that were erroneous. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by server or the component itself.
The scenario used for this tutorial has one Supplier dimension table and a source file. Supplier dimension table will be updated using a source file.
The source file contains 5 records that will apply changes to the dimension table when processed.
|S0000001||ABCD Company||46 Lori Street||AL|
|S0000002||XYZ Corporation||1156 Balist Road||MA|
|S0000003||Test New Company||6540 Cole Alley||GA|
|S0000004||Blue Account||216 Reinger Shores||NE|
|S0000005||M Organization||153 Baff View||NJ|
The Supplier Dimension is a table in target database. Initially this table contains records for four Suppliers.
Building Slowing Changing Dimension SSIS Package
We are going to build an SSIS package which reads the data from Supplier Dimension and source file and load data into Premium Slowly Changing Dimension component to perform the comparison and configure what action we want to take when change occurs.
The genaral page in Premium Slowly Changing Dimension is used to define what the various columns of the dimension table are used for. Set column types for the columns as shown below.
We have set the SupplierName as Historical Attribute so that it will output an expired record to the ‘Changed Rows’ output, and a new record to the ‘New Rows’ output which can be used to update the dimension table. The State attribute is set to Change Attribute thus existing row in the dimension table is updated to contain the new attribute values.
In the Change Type Settings page, we define the values write to Current Record, Start Date and End Date columns.
The Advanced Settings page is used to specify how surrogate keys are generated and additional string comparison rules. Also we have the option to enable DetectedChangeType column which can be used as a reference of what action has been applied to the record.
After finishing the configuration in Premium Slowly Changing Dimension component, then we need to write the output data into Supplier Dimension table as the shown below.
Below is a snap shot of the output data after processing.
|5||S0000004||Blue Account||NE||1||2016-05-08||No Change|
|1||S0000001||ABC Company||AL||0||2017-01-13||2017-03-07||Expired (Type2)|
|4||S0000003||Test Company||NY||0||2016-10-26||2017-03-07||Expired (Type2)|
|6||S0000001||ABCD Company||AL||1||2017-03-07||New Record(Type2)|
|7||S0000003||Test New Company||GA||1||2017-03-07||New Record(Type2)|
|8||S0000005||M Organization||NJ||1||2017-03-07||New Record|