Using the HDFS Destination Component

The HDFS Destination Component is an SSIS data flow pipeline component that writes data to HDFS. You can use it to create, delete, move, or append to HDFS objects, depending on which actions the object type supports. The component has three configuration pages:

  • General
  • Columns
  • Error Handling

The General page is used to specify general settings for the HDFS Destination Component. The Columns page allows you to map the columns from upstream components to HDFS fields in the destination object. The Error Handling page allows you to specify how errors should be handled when they occur.

General Page

The General page allows you to specify general settings for the component.

HDFS Destination Editor

Connection Manager

The HDFS Destination Component requires a Hadoop connection. The Hadoop Connection Manager option will show all HDFS connection managers that have been created in the current SSIS package or project.

Storage Service Object

This option allows you to specify the object you want to work with, whether it is a File or a Directory.

Action

The Action option allows you to specify how data should be written to HDFS. Seven actions are currently supported:

  • Create: Creates a new File or Directory.
  • Move: Moves a File or Directory.
  • Append (available only when working with Files): Adds new data to the end of a file.
  • Truncate (available only when working with Files): Removes data from the tail of a file.
  • Concat (available only when working with Files): Merges the data of multiple files.
  • Create Symbolic Link: Creates a link to a File or Directory.
  • Delete: Deletes a File or Directory.
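
The actions above resemble operations in Hadoop's WebHDFS REST API. This page does not say which protocol the component uses internally, so the following Python sketch is only an illustration under that assumption. It builds the HTTP method and URL that an equivalent WebHDFS call would use. The NameNode address and the function name webhdfs_request are hypothetical.

```python
from urllib.parse import quote, urlencode

# Hypothetical NameNode address; replace with your own cluster's endpoint.
BASE_URL = "http://namenode.example:9870/webhdfs/v1"

# Actions that the editor offers only for Files.
FILE_ONLY = {"Append", "Truncate", "Concat"}

# Assumed mapping from editor actions to WebHDFS (HTTP method, op) pairs.
OPS = {
    "Move": ("PUT", "RENAME"),
    "Append": ("POST", "APPEND"),
    "Truncate": ("POST", "TRUNCATE"),
    "Concat": ("POST", "CONCAT"),
    "Create Symbolic Link": ("PUT", "CREATESYMLINK"),
    "Delete": ("DELETE", "DELETE"),
}


def webhdfs_request(action, object_type, path, **params):
    """Return (method, url) for an action on a File or Directory."""
    if action in FILE_ONLY and object_type != "File":
        raise ValueError(f"{action} is only available for Files")
    if action == "Create":
        # Files are created with CREATE, directories with MKDIRS.
        op = "CREATE" if object_type == "File" else "MKDIRS"
        method = "PUT"
    else:
        method, op = OPS[action]
    query = urlencode({"op": op, **params})
    return method, f"{BASE_URL}{quote(path)}?{query}"
```

For example, webhdfs_request("Truncate", "File", "/logs/app.log", newlength=0) yields a POST request with op=TRUNCATE, while requesting Append on a Directory raises an error, mirroring the file-only restriction in the list above.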

Refresh Component Button

Clicking the Refresh Component button causes the component to retrieve the latest metadata and update each attribute accordingly.

Map Unmapped Fields Button

Clicking this button makes the component try to map any unmapped HDFS attributes by matching their names against the input columns from upstream components. This is useful when your source component has recently gained new columns: the button automatically establishes the association between input columns and unmapped destination attributes.
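
The name-matching idea can be sketched in Python as follows. This is an illustration only, not the component's actual logic: the function name, the field names in the example, and the choice of case-insensitive matching are all assumptions.

```python
def map_unmapped_fields(mappings, input_columns):
    """Return a copy of mappings with unmapped fields matched by name.

    mappings: dict of destination field -> input column name, or None if unmapped.
    input_columns: list of column names available from upstream components.
    """
    # Assumption: names are compared case-insensitively.
    by_name = {column.lower(): column for column in input_columns}
    result = dict(mappings)
    for field, column in mappings.items():
        if column is None:
            # Only unmapped fields are touched; existing mappings are kept.
            result[field] = by_name.get(field.lower())
    return result


# Hypothetical field and column names, for illustration only.
current = {"Path": "SourcePath", "Content": None, "Owner": None}
updated = map_unmapped_fields(current, ["SourcePath", "content"])
```

In this example "Content" picks up the upstream column "content", "Owner" stays unmapped because no column has that name, and the existing "Path" mapping is left alone.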

Clear All Mappings Button

Clicking this button resets all mappings in the destination component.

Expression fx Button

Click the fx button to launch the SSIS Expression Editor, which enables dynamic updates of the property at run time.

Generate Documentation Button

Click the Generate Documentation button to generate a Word document that describes the component's metadata, including the relevant mappings.

Columns Page

The Columns page of the HDFS Destination Component allows you to map the columns from upstream components to the HDFS destination fields.

The Columns page displays a grid that contains four columns as shown below.

HDFS Destination Editor

  • Input Column: You can select an input column from an upstream component for the corresponding HDFS field.
  • Destination Field: The HDFS field to which you are writing data.
  • Data Type: This column indicates the type of value for the current field.
  • Unmap: Use this column to unmap the field from its upstream input column or, if the field is not currently mapped, to map it to an upstream input column with a matching name.

Error Handling Page

The Error Handling page allows you to specify how errors should be handled when they happen.

HDFS Destination Editor

There are three options available.

  1. Fail on error
  2. Redirect rows to error output
  3. Ignore error

When the Redirect rows to error output option is selected, rows that fail to be written to HDFS are redirected to the 'Error Output' output of the Destination Component. As indicated in the screenshot below, the blue output connection represents rows that were successfully written, and the red 'Error Output' connection represents rows that failed. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by HDFS or by the component itself.

Error Output

Note: Use extra caution when selecting the Ignore error option, because the component will remain silent about any errors that occur.
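
The difference between the three options can be sketched in Python. This is a simplified model under stated assumptions, not the component's implementation: the ErrorHandling enum, the write_rows function, and the write callback are hypothetical names, and rows are modelled as plain dictionaries.

```python
from enum import Enum


class ErrorHandling(Enum):
    FAIL = "Fail on error"
    REDIRECT = "Redirect rows to error output"
    IGNORE = "Ignore error"


def write_rows(rows, write, mode):
    """Write each row with write(row); return (written, error_output)."""
    written, error_output = [], []
    for row in rows:
        try:
            write(row)
            written.append(row)
        except Exception as exc:
            if mode is ErrorHandling.FAIL:
                # Fail on error: stop at the first failing row.
                raise
            if mode is ErrorHandling.REDIRECT:
                # Redirect: keep the row and attach the reported message.
                error_output.append({**row, "ErrorMessage": str(exc)})
            # Ignore error: the row is dropped without any record.
    return written, error_output
```

With REDIRECT, every failed row remains available downstream together with its message. With IGNORE, failed rows leave no trace, which is why that option calls for extra caution.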