Using the XML Extract Component

The XML Extract Component is a transformation component used to extract column data from XML documents. Configure your outputs by importing an example XML, Importing an XSD schema, or manually adding/removing nodes. From there just configure the columns for each output and the component is ready to extract data from XML documents from the Input.

Design Page

This page allows you to import an example XML document or an XSD schema to generate the expected hierarchy of incoming XML documents. From there you can add and remove nodes manually to refine the structure.

XML Extract Editor - Design Page

Once you've added some nodes by either importing or manually adding, you may need to refine the structure by editing the node properties. The following properties will have an effect on the extract process:

  • XML Name - This is the name of the element or attribute as it is in the expected XML. Notice when this property is changed the XPath and Data Name properties are automatically updated.
  • Data Name - This is the name of the Output (green rows) or Column (yellow rows). Notice that some rows do not have a Data Name property. This is because these are nodes that define hierarchy and have no immediate value to extract. These nodes are important however because it affects the XPath of child nodes.
  • Type - There are 6 different XML node types to choose from:
    • Complex Element - This node contains other nodes of any other type. 
    • Repeating Complex Elements  - This is a Complex Element that may repeat multiple times at the current XPath position.
    • Element - This node is a basic element containing a value.
    • Repeating Elements (Column)  - This node is a basic element containing a value that may repeat multiple times at the current XPath position. The output column will contain delimited text of all the values. You can change the qualifier and delimiter values of the delimited text in the Columns Page.
    • Repeating Elements (Output)  - This node is a basic element containing a value that may repeat multiple times at the current XPath position. This type will create a new output with a record for each occurrence of the element.
    • Attribute - This node is an attribute of the parent node.

  • Prefix - This contains a list of prefixes defined in the Namespaces table in the General Page, along with two special values:
      • <<Inherit>> - This node has the same prefix used by its parent. If there is no parent, then there is no prefix.
      • <<None>> - This node will use no prefix, even if its parent does.
  • XPath - This property is not editable but instead generated based on the above properties.
General Page

Configure Input and Output settings, along with defining namespaces.

XML Extract Editor - General Page

XML Column

Specifies the input column that contains the XML to extract.

Null Mode

Specifies what represents a NULL value in the XML. There are three options:

  • Not Found - If an element cannot be found then the value is NULL.
  • Empty String - If an element cannot be found or is empty then the value is NULL.
  • xsi:nil - If an element cannot be found or the element has an attribute of "xsi:nil" set to "True" then the value is NULL.

In the Outputs data grid, you can check and uncheck outputs to specify if you wish to have these outputs in your component. You can also configure the following properties:

  • Output Name - The name of the output
  • Key Field - This specifies the identifier column for this output. This is important for linking outputs with each other. By default, the Key Field is set to '_RowIndex' which is a special field that contains the current count of this output element. This is useful because many times an XML object will have no key field because relationships are defined hierarchically. During runtime, the value of this field will go into the _ParentKeyField field of all of this output's children. 

Add and remove namespaces (by clicking the '+' and '-' buttons) and assign them prefixes. Note, the prefix does not need to match that of the incoming XML. It is simply used as an alias in XPaths that use this namespace.

Columns Page

Configure column settings for each output.

XML Extract Editor - Columns Page

Select the output whose columns you wish to configure in the top left combo box. Select a column in the data grid on the left to populate its properties in the property grid on the right. There are a couple of special columns to take note of:

  • _RowIndex - This column contains the current count of this output element. 
  • _ParentKeyField - This column contains the value of this records parent key field.
Error Handling

This page allows you to specify how errors should be handled when they happen.

XML Extract Editor - Error Handling Page

There are three options available. 
  1. Fail on error
  2. Redirect rows to error output
  3. Ignore error

When the Redirect rows to error output option is selected, rows that failed to be sent will be redirected to the 'Error Output' output of the Transformation Component. As indicated in the screenshot below, the blue output connection represents rows that were successfully sent, and the red 'Error Output' connection represents rows that were erroneous. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by the component.

XML Extract Editor - Error Output

  • Element - This node is a basic element containing a value.