The XML Extract Component is a transformation component used to extract column data from XML documents. Configure its outputs by importing an example XML document, importing an XSD schema, or manually adding and removing nodes. Then configure the columns for each output, and the component is ready to extract data from the XML documents in its input.
This page allows you to import an example XML document or an XSD schema to generate the expected hierarchy of incoming XML documents. From there you can add and remove nodes manually to refine the structure.
Once you have added nodes, either by importing or manually, you may need to refine the structure by editing node properties. The following properties affect the extract process:
- XML Name - The name of the element or attribute as it appears in the expected XML. When this property changes, the XPath and Data Name properties are updated automatically.
- Data Name - The name of the Output (green rows) or Column (yellow rows). Some rows have no Data Name property because they only define hierarchy and hold no value to extract. These nodes still matter, however, because they affect the XPath of their child nodes.
- Type - There are five XML node types to choose from:
  - CollectionElement - This node contains a list of elements. The XPath of its child element ends with '[n]', which indicates that the child repeats. This node can have only one non-attribute child.
  - ComplexElement - This node usually just defines hierarchy; it is simply a node that contains other nodes.
  - ComplexElements - A ComplexElement that repeats multiple times at the current XPath position, as indicated by the '[n]' at the end of its XPath.
  - Element - A basic element containing a value.
  - Attribute - An attribute of the parent node.
- Prefix - A list of the prefixes defined in the Namespaces table on the General page, along with two special values:
  - <<Inherit>> - The node uses the same prefix as its parent. If there is no parent, no prefix is used.
  - <<None>> - The node uses no prefix, even if its parent does.
- XPath - This property is read-only; it is generated from the properties above.
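The way XML Name, Type, and Prefix combine into the generated XPath can be sketched in Python. This is a minimal illustration under stated assumptions, not the component's implementation: the `Node` class and the `build_xpaths` and `resolve_prefix` functions are hypothetical, the attribute is given an explicit <<None>> prefix, and '[n]' is placed on the repeating node as the Type descriptions suggest.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical constants mirroring the two special Prefix values.
INHERIT = "<<Inherit>>"
NONE = "<<None>>"


@dataclass
class Node:
    """Hypothetical model of one row in the node hierarchy."""
    xml_name: str
    node_type: str  # CollectionElement, ComplexElement, ComplexElements, Element, Attribute
    prefix: str = INHERIT
    children: List["Node"] = field(default_factory=list)


def resolve_prefix(node: Node, parent_prefix: Optional[str]) -> Optional[str]:
    """Apply the <<Inherit>> and <<None>> rules to get the node's effective prefix."""
    if node.prefix == INHERIT:
        return parent_prefix  # None when there is no parent (or the parent has no prefix)
    if node.prefix == NONE:
        return None
    return node.prefix


def build_xpaths(node: Node, parent_path: str = "", parent_prefix: Optional[str] = None,
                 parent_type: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (xpath, node_type) pairs for a node and all of its descendants."""
    prefix = resolve_prefix(node, parent_prefix)
    name = f"{prefix}:{node.xml_name}" if prefix else node.xml_name
    if node.node_type == "Attribute":
        step = "@" + name
    else:
        step = name
        # The child of a CollectionElement repeats, and a ComplexElements node repeats itself.
        if parent_type == "CollectionElement" or node.node_type == "ComplexElements":
            step += "[n]"
    path = f"{parent_path}/{step}"
    result = [(path, node.node_type)]
    for child in node.children:
        result.extend(build_xpaths(child, path, prefix, node.node_type))
    return result


# Example hierarchy: a collection of orders, each with an id attribute and a Customer value.
orders = Node("Orders", "CollectionElement", prefix="o", children=[
    Node("Order", "ComplexElement", children=[
        Node("id", "Attribute", prefix=NONE),
        Node("Customer", "Element"),
    ]),
])

for xpath, node_type in build_xpaths(orders):
    print(xpath, node_type)
# /o:Orders CollectionElement
# /o:Orders/o:Order[n] ComplexElement
# /o:Orders/o:Order[n]/@id Attribute
# /o:Orders/o:Order[n]/o:Customer Element
```

Note how the Order node, left at <<Inherit>>, picks up the "o" prefix from its parent, while the attribute set to <<None>> does not.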
Configure input and output settings and define namespaces.
Specifies the input column that contains the XML to extract.
- Null Mode - Specifies what represents a NULL value in the XML. There are three options:
  - Not Found - If an element cannot be found, the value is NULL.
  - Empty String - If an element cannot be found or is empty, the value is NULL.
  - xsi:nil - If an element cannot be found, or the element has an xsi:nil attribute set to "true", the value is NULL.
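The three Null Mode options differ only in which cases, beyond a missing element, count as NULL. The sketch below models that decision with Python's standard `xml.etree.ElementTree` module. The `extract_value` function and the mode strings are hypothetical stand-ins rather than the component's API, and the sketch assumes that only the xs:boolean true values "true" and "1" mark an element as nil.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree reports namespaced attributes in {uri}local form.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def extract_value(parent: ET.Element, path: str, null_mode: str) -> Optional[str]:
    """Hypothetical helper: return the element's text, or None when the null mode says NULL."""
    elem = parent.find(path)
    if elem is None:
        return None  # all three modes treat a missing element as NULL
    text = elem.text or ""  # an empty element has text None; normalize to ""
    if null_mode == "Empty String" and text == "":
        return None
    if null_mode == "xsi:nil" and elem.get(XSI_NIL, "").strip() in ("true", "1"):
        return None
    return text


doc = ET.fromstring(
    '<Order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<Note></Note><Ship xsi:nil="true"/></Order>'
)
for mode in ("Not Found", "Empty String", "xsi:nil"):
    print(mode, [extract_value(doc, tag, mode) for tag in ("Missing", "Note", "Ship")])
# Not Found [None, '', '']
# Empty String [None, None, None]
# xsi:nil [None, '', None]
```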
In the Outputs data grid, check or uncheck an output to include it in or exclude it from the component. You can also configure the following properties:
- Output Name - The name of the output.
- Key Field - The identifier column for this output, used to link outputs with each other. By default, the Key Field is set to '_RowIndex', a special field that contains the current count of this output's elements. This default is useful because XML data often has no explicit key field; relationships are expressed through the hierarchy instead. At runtime, the value of this field is written to the _ParentKeyField column of all of this output's children.
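To show how the default '_RowIndex' key links a parent output to its children, the sketch below flattens a small, made-up order document into two hypothetical outputs, Order and Item, using Python's `xml.etree.ElementTree`. It assumes the running count starts at 1 and spans the whole document; the component's actual numbering may differ, and `flatten` is illustrative only.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Orders>
  <Order><Customer>Alice</Customer>
    <Item><Sku>A1</Sku></Item>
    <Item><Sku>B2</Sku></Item>
  </Order>
  <Order><Customer>Bob</Customer>
    <Item><Sku>C3</Sku></Item>
  </Order>
</Orders>
"""


def flatten(root: ET.Element):
    """Hypothetical flattening into two outputs linked by _RowIndex / _ParentKeyField."""
    orders, items = [], []
    item_index = 0  # running count of Item elements across the whole document
    for order_index, order in enumerate(root.findall("Order"), start=1):
        # The Order output's key field is its _RowIndex (the default Key Field).
        orders.append({"_RowIndex": order_index, "Customer": order.findtext("Customer")})
        for item in order.findall("Item"):
            item_index += 1
            items.append({
                "_RowIndex": item_index,
                "_ParentKeyField": order_index,  # the parent's key value flows down
                "Sku": item.findtext("Sku"),
            })
    return orders, items


orders, items = flatten(ET.fromstring(xml_text.strip()))
for row in orders:
    print("Order", row)
for row in items:
    print("Item ", row)
```

Joining Item rows to Order rows on `_ParentKeyField = _RowIndex` reconstructs the original hierarchy, even though the XML itself contains no key field.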
Add and remove namespaces with the '+' and '-' buttons, and assign each one a prefix. The prefix does not need to match the one used in the incoming XML; it is only an alias in XPaths that use this namespace.
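Because a prefix is only an alias for a namespace URI, any prefix bound to the right URI matches. Python's `xml.etree.ElementTree` illustrates the same idea: the document below uses the prefix `ord`, while the query binds the same URI to `o`. The URI `http://example.com/orders` is a placeholder, and ElementTree supports only a subset of XPath, so this is a sketch of the concept rather than of the component.

```python
import xml.etree.ElementTree as ET

# The incoming document uses the prefix "ord" for this namespace URI...
xml_text = (
    '<ord:Orders xmlns:ord="http://example.com/orders">'
    '<ord:Order><ord:Id>42</ord:Id></ord:Order>'
    '</ord:Orders>'
)
root = ET.fromstring(xml_text)

# ...but the query binds the same URI to a different alias, "o".
namespaces = {"o": "http://example.com/orders"}
ids = [e.text for e in root.findall("o:Order/o:Id", namespaces)]
print(ids)  # ['42']
```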
Configure column settings for each output.
Select the output whose columns you want to configure in the combo box at the top left. Select a column in the data grid on the left to show its properties in the property grid on the right. Two special columns are worth noting:
- _RowIndex - This column contains the current count of this output's elements.
- _ParentKeyField - This column contains the value of this record's parent key field.
This page allows you to specify how errors are handled when they occur. Three options are available:
- Fail on error
- Redirect rows to error output
- Ignore error
When the Redirect rows to error output option is selected, rows that fail are redirected to the 'Error Output' output of the transformation component. As indicated in the screenshot below, the blue output connection represents rows that were processed successfully, and the red 'Error Output' connection represents rows that failed. The 'ErrorMessage' column of the 'Error Output' may contain the error message reported by the component.
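The three error-handling options can be modeled as a simple routing decision. The following Python sketch is hypothetical: `process_rows`, the mode strings, and the dictionary-based rows are illustrative, and treating 'Ignore error' as skipping the row without redirecting it is an assumption about the component's behavior.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def process_rows(rows: List[str], mode: str) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Hypothetical router: parse each row's XML and apply the error-handling mode."""
    output: List[Dict[str, str]] = []
    error_output: List[Dict[str, str]] = []
    for row in rows:
        try:
            root = ET.fromstring(row)
        except ET.ParseError as exc:
            if mode == "Fail on error":
                raise  # stop at the first bad row
            if mode == "Redirect rows to error output":
                error_output.append({"Input": row, "ErrorMessage": str(exc)})
            # "Ignore error": assumed here to skip the row without redirecting it
            continue
        output.append({"RootElement": root.tag})
    return output, error_output


rows = ["<Order/>", "<Order>", "<Invoice/>"]
ok, errors = process_rows(rows, "Redirect rows to error output")
print(ok)  # [{'RootElement': 'Order'}, {'RootElement': 'Invoice'}]
print(errors[0]["Input"], "->", errors[0]["ErrorMessage"])
```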