Using the XML Source Component

The XML Source component is an SSIS source component that can be used to retrieve XML documents from an HTTP URL or a local file. The components will then break up the structure and produce column data which can be consumed by downstream SSIS pipeline components.

Data Source Page

The data source page determines where your XML documents are coming from. The connection manager property can either be set to an HTTP Connection Manager, or a local file.

Connection: Local File

SSIS XML Source - File

Local File Path

The path on the file system to the XML document that will be extracted.

Encoding

The encoding of the document that will be retrieved.

Connection: HTTP Connection Manager

SSIS XML Source - HTTP

If an HTTP connection manager is selected the UI will display request settings that are used to determine the content of HTTP requests.

HTTP Method

The HTTP method that will be used to send requests.Available methods include:

  • GET
  • POST
  • PUT
  • DELETE
  • PATCH
Relative Path

The relative path is the second part of the URL that will be used to make the HTTP request.  It is combined with the base path from the connection manager to form the full path. The relative path can contain query string parameters, but they can also be added individually in the grid below.

Query String Parameters

Query string parameters that will be appended to the full path when sending HTTP requests.

Request Headers

Request headers that will be combined with the headers in the connection manager and included with HTTP requests.

Request Body

The body of the HTTP request.

Import WSDL

Since v7.0, XML Source component supports importing WSDL to generate SOAP request. Clicking the Import WSDL to load WSDL file and build your SOAP request.

SSIS XML Source - Import WSDL

WSDL

The WSDL specifies the location of the WSDL that you are trying to read from. This location can be either a location on the file system, or a URL. For example a value of: c:\wsdl\mathservice.wsdl or http://localhost:28854/MathService.svc?wsdl will both work. There is a button labelled "..." beside the source WSDL field that will launch a dialog let you select a file from the file system. If you select a WSDL file using the open file dialog provided by the "..." button the WSDL file will automatically be loaded upon clicking OK.

Load WSDL

The Load WSDL button will try to load a WSDL file from the location specified in the Source WSDL field, and it will populate the other fields in the editor with information from the loaded WSDL file. Use this button after you have entered the path of a WSDL file into the source WSDL field. If you select a WSDL file from the open file dialog provided by the "..." button the WSDL file will automatically be loaded upon clicking OK.

SOAP Version

The SOAP Version option indicates the SOAP version that the target web service uses. Our component will automatically detect the SOAP version while loading the WSDL file.

Service

Select a Service from the list of services contained in the web service.

Binding

Select a Binding from the list of bindings contained in the service.

Operation

Select an Operation from the list of operations contained in the binding.

Custom Header

Custom headers can be specified in the Custom Headers grid.

WS Addressing Headers

WS-Addressing Headers can be used to specify message routing data within SOAP request header per the target web service requires. The following options are available:

  • Unique Message ID (MessageID)
  • Message Destination (To): Message destination URI.
  • Source Endpoint (From): Source endpoint that dispatched this message
  • Reply Endpoint (ReplyTo): Reply endpoint to which reply messages should be dispatched
  • Fault Endpoint (FaultTo): Fault endpoint to which fault messages should be dispatched
  • Action (Action): Required action.
  • Relationship (RelatesTo): Message ID to which the message relates
Namespaces

Namespace can be specified in the Namespaces grid.

Generate Request/Response button

The Generate Request/Response button allows you to populate the SOAP request/response based on the latest changes you have made in above settings.

Request

The body of the SOAP request generated by above settings.

Response

The body of the SOAP response generated by above settings.

Import

The Import option allows you to determine which part is going to be imported to XML Source component. There are five options available:

  • Request Body
  • Response Body (Design Page)
  • Http Headers
  • Http Method
  • Nampespaces
Request Again Rule

The request again rules allows paged data to be retrieved. After each request the request again rule is evaluated and if it evaluates to 'True' another request will be sent.  If the rule evaluates to false or is left blank no more requests will be sent.

Max Identical

The max identical property will raise an error if a certain number of identical requests are sent. The purpose of this feature is to prevent accidentally sending the same request in a never ending loop. If 0 is selected the feature will be disabled and an unlimited of identical requests can be sent.

Preview Button

The preview button launch the Preview dialog which will send the first request based on the current settings of the component, and show a preview of the response. If there are more requests based on the value of the request again rule they can be sent and the response previewed by clicking the 'Next' button. The Preview dialog is a good way to identify HTTP configuration problems without running the package.

Import Settings Button

Click Import Settings button to import the configurations from a file.

Export Settings Button

Click Export Settings button to save the configurations of the component to a JSON file later for reuse.

Expression fx Icon

Click the blue fx icon to launch SSIS Expression Editor to enable dynamic update of the property at run time.

Generate Documentation Icon

Click the Generate Documentation icon to generate a Word document which describes the component's metadata including relevant mapping, and so on.

Expression Editor

Many of the properties on this page can be set as either a text merged field or an expression. Properties default to text merge mode, but can be changed to expression mode in the Expression Editor. The expression editor for a property can be launched by clicking the fx button. The expression editor has some predefined tokens:

RequestCount

The number of requests that have been sent. This will be 0 on the first request and incremented by 1 after each request.

ResponseBody

This is the body of the response from the previous request. On the first request is will be empty.

GetValueFromPreviousResponseHeaders

This is a function that is used to the value of HTTP headers from the previous response. This function is only available in expression mode.

ValueExistsFromPreviousResponseHeaders

This function returns a boolean specifying if a value of the header with a specified name exists from the response of the preceding request.

AccessToken

Returns the access token for sending requests. The token file needs to be generated by OAuth 2 Token Generated and specified in the HTTP Connection Manager's Authentication page.

Expression Editor Text Merge Mode

SSIS XML Source - Expression Editor Text Merge Mode

In text merge mode, property values will be whatever text was entered, but with any variables or special tokens replaced with their actual values before each HTTP request is sent.

Expression Mode

SSIS XML Source - Expression Editor Expression Mode

In expression mode the property is set to an expression which is evaluated before each request to determine the value of the field. Expressions can contain variables and special tokens, but also functions and conditional logic. The functions in the Value Lookup Functions category may be particularly useful for this component, as they can be used to look up values and Json documents and HTTP headers.

Design Page

This page allows you to import an example XML document or an XSD schema either from a local file, or using your HTTP settings. The imported document is used to generate the expected hierarchy of incoming XML documents. From there you can add and remove nodes manually to refine the structure.

SSIS XML Source - Design

Once you've added some nodes by either importing or manually adding, you may need to refine the structure by editing the node properties. The following properties will have an effect on the extract process:

  • XML Name: This is the name of the element or attribute as it is in the expected XML. Notice when this property is changed the XPath and Data Name properties are automatically updated.
  • Data Name: This is the name of the Output (green rows) or Column (yellow rows). Notice that some rows do not have a Data Name property. This is because these are nodes that define hierarchy and have no immediate value to extract. These nodes are important however because it affects the XPath of child nodes.
  • Type: There are 6 different XML node types to choose from:
    • Complex Element: This node contains other nodes of any other type. 
    • Repeating Complex Elements: This is a Complex Element that may repeat multiple times at the current XPath position.
    • Element: This node is a basic element containing a value.
    • Repeating Elements (Column): This node is a basic element containing a value that may repeat multiple times at the current XPath position. The output column will contain delimited text of all the values. You can change the qualifier and delimiter values of the delimited text in the Columns Page.
    • Repeating Elements (Output): This node is a basic element containing a value that may repeat multiple times at the current XPath position. This type will create a new output with a record for each occurrence of the element.
    • Attribute: This node is an attribute of the parent node.
  • Prefix: This contains a list of prefixes defined in the Namespaces table in the General Page, along with two special values:
    • <<Inherit>>: This node has the same prefix used by its parent. If there is no parent, then there is no prefix.
    • <<None>>: This node will use no prefix, even if its parent does.
  • XPath: This property is not editable but instead generated based on the above properties.

General Page

Configure Input and Output settings, along with defining namespaces.

SSIS XML Source - General

Null Mode

Specifies what represents a NULL value in the XML. There are three options:

  • Not Found: If an element cannot be found then the value is NULL.
  • Empty String: If an element cannot be found or is empty then the value is NULL.
  • xsi:nil: If an element cannot be found or the element has an attribute of "xsi:nil" set to "True" then the value is NULL.
Outputs

In the Outputs data grid, you can check and uncheck outputs to specify if you wish to have these outputs in your component. You can also configure the following properties:

  • Output Name: The name of the output
  • Key Field: This specifies the identifier column for this output. This is important for linking outputs with each other. By default, the Key Field is set to '_RowIndex' which is a special field that contains the current count of this output element. This is useful because many times an XML object will have no key field because relationships are defined hierarchically. During runtime, the value of this field will go into the _ParentKeyField field of all of this output's children. 
Namespaces

Add and remove namespaces (by clicking the '+' and '-' buttons) and assign them prefixes. Note, the prefix does not need to match that of the incoming XML. It is simply used as an alias in XPaths that use this namespace.

Columns Page

Configure column settings for each output.

SSIS XML Source - Columns

Select the output whose columns you wish to configure in the top left combo box. Select a column in the data grid on the left to populate its properties in the property grid on the right.

The Columns Page grid consists of:
  • Column Name: Column that will be retrieved.
  • Data Type: The data type of this field.
  • Properties window for the field listed
    • Column Type: Description of the selected column.
    • Name: Specify the column name.
    • CodePage: Specify the Code Page of the field.
    • Data type: The data type can be changed according.
    • Length: Specify the Length of the fields. If the data type specified is a string, the length specified here would be the maximum size. If the data type is not a string, the length will be ignored.
    • Precision: Specify the number of digits in a number.
    • Scale: Specify the number of digits to the right of the decimal point in a number.

There are a couple of special columns to take note of:

  • _RowIndex: This column contains the current count of this output element. 
  • _ParentKeyField: This column contains the value of this records parent key field.