Using the XML Source Component

The XML Source component is an SSIS source component that can be used to retrieve XML documents from an HTTP URL or a local file. The components will then break up the structure and produce column data which can be consumed by downstream SSIS pipeline components.

Data Source Page

The data source page determines where your XML documents are coming from. The connection manager property can either be set to an HTTP Connection Manager, or a local file.

Local File

SSIS XML Source - File

  • Local File Path - The path on the file system to the XML document that will be extracted.
  • Encoding - The encoding of the document that will be retrieved.

HTTP Connection Manager

SSIS XML Source - HTTP

If an HTTP connection manager is selected the UI will display request settings that are used to determine the content of HTTP requests.

HTTP Method
The HTTP method that will be used to send requests.
Relative Path
The relative path is the second part of the URL that will be used to make the HTTP request.  It is combined with the base path from the connection manager to form the full path. The relative path can contain query string parameters, but they can also be added individually in the grid below.
Query String Parameters
Query string parameters that will be appended to the full path when sending HTTP requests.
Request Headers
Request headers that will be combined with the headers in the connection manager and included with HTTP requests.
Request Body
The body of the HTTP request.
Request Again Rule
The request again rules allows paged data to be retrieved.  After each request the request again rule is evaluated and if it evaluates to 'True' another request will be sent.  If the rule evaluates to false or is left blank no more requests will be sent.
Max Identical Requests
The max identical requests property will raise an error if a certain number of identical requests are sent.  The purpose of this feature is to prevent accidentally sending the same request in a never ending loop.  If 0 is selected the feature will be disabled and an unlimited of identical requests can be sent.
Preview Button
The preview button launch the Preview dialog which will send the first request based on the current settings of the component, and show a preview of the response. If there are more requests based on the value of the request again rule they can be sent and the response previewed by clicking the 'Next' button. The Preview dialog is a good way to identify HTTP configuration problems without running the package.
Expression Editor Preceding Request Tokens

Many of the properties on this page can be set as either a text merged field or an expression.  Properties default to text merge mode, but can be changed to expression mode in the Expression Editor. The expression editor for a property can be launched by clicking the fx button. The expression editor has some predefined tokens:

RequestCount
The number of requests that have been sent.  This will be 0 on the first request and incremented by 1 after each request.
Response Body
This is the body of the response from the previous request. On the first request is will be empty.
GetValueFromPreviousResponseHeaders
This is a function that is used to the value of HTTP headers from the previous response. This function is only available in expression mode.
Expression Editor Text Merge Mode

SSIS XML Source - Expression Editor Text Merge Mode

In text merge mode, property values will be whatever text was entered, but with any variables or special tokens replaced with their actual values before each HTTP request is sent.

Expression Editor Expression Mode

SSIS XML Source - Expression Editor Expression Mode

In expression mode the property is set to an expression which is evaluated before each request to determine the value of the field. Expressions can contain variables and special tokens, but also functions and conditional logic. The functions in the Value Lookup Functions category may be particularly useful for this component, as they can be used to look up values and Json documents and HTTP headers.

Design Page

This page allows you to import an example XML document or an XSD schema either from a local file, or using your HTTP settings.  The imported document is used to generate the expected hierarchy of incoming XML documents. From there you can add and remove nodes manually to refine the structure.

SSIS XML Source - Design

Once you've added some nodes by either importing or manually adding, you may need to refine the structure by editing the node properties. The following properties will have an effect on the extract process:

  • XML Name - This is the name of the element or attribute as it is in the expected XML. Notice when this property is changed the XPath and Data Name properties are automatically updated.
  • Data Name - This is the name of the Output (green rows) or Column (yellow rows). Notice that some rows do not have a Data Name property. This is because these are nodes that define hierarchy and have no immediate value to extract. These nodes are important however because it affects the XPath of child nodes.
  • Type - There are 6 different XML node types to choose from:
    • Complex Element - This node contains other nodes of any other type. 
    • Repeating Complex Elements   - This is a Complex Element that may repeat multiple times at the current XPath position.
    • Element - This node is a basic element containing a value.
    • Repeating Elements (Column)   - This node is a basic element containing a value that may repeat multiple times at the current XPath position. The output column will contain delimited text of all the values. You can change the qualifier and delimiter values of the delimited text in the Columns Page.
    • Repeating Elements (Output)   - This node is a basic element containing a value that may repeat multiple times at the current XPath position. This type will create a new output with a record for each occurrence of the element.
    • Attribute - This node is an attribute of the parent node.
  • Prefix - This contains a list of prefixes defined in the Namespaces table in the General Page, along with two special values:
    • <<Inherit>> - This node has the same prefix used by its parent. If there is no parent, then there is no prefix.
    • <<None>> - This node will use no prefix, even if its parent does.
  • XPath - This property is not editable but instead generated based on the above properties.

General Page

Configure Input and Output settings, along with defining namespaces.

SSIS XML Source - General

XML Column

Specifies the input column that contains the XML to extract.

Null Mode

Specifies what represents a NULL value in the XML. There are three options:

  • Not Found - If an element cannot be found then the value is NULL.
  • Empty String - If an element cannot be found or is empty then the value is NULL.
  • xsi:nil - If an element cannot be found or the element has an attribute of "xsi:nil" set to "True" then the value is NULL.
Outputs

In the Outputs data grid, you can check and uncheck outputs to specify if you wish to have these outputs in your component. You can also configure the following properties:

  • Output Name - The name of the output
  • Key Field - This specifies the identifier column for this output. This is important for linking outputs with each other. By default, the Key Field is set to '_RowIndex' which is a special field that contains the current count of this output element. This is useful because many times an XML object will have no key field because relationships are defined hierarchically. During runtime, the value of this field will go into the _ParentKeyField field of all of this output's children. 
Namespaces

Add and remove namespaces (by clicking the '+' and '-' buttons) and assign them prefixes. Note, the prefix does not need to match that of the incoming XML. It is simply used as an alias in XPaths that use this namespace.

Columns Page

Configure column settings for each output.

SSIS XML Source - Columns

Select the output whose columns you wish to configure in the top left combo box. Select a column in the data grid on the left to populate its properties in the property grid on the right. There are a couple of special columns to take note of:

  • _RowIndex - This column contains the current count of this output element. 
  • _ParentKeyField - This column contains the value of this records parent key field.