Using the Azure Blob Storage Source Component

The Azure Blob Storage Source Component is an SSIS data flow pipeline component that reads data from Azure Blob Storage.

The component includes the following two pages to configure how you want to read data.

  • General
  • Columns

General Page

The General page of the Azure Blob Storage Source Component allows you to specify the general settings of the component.

SSIS Azure Blob Storage Source

Source Item Settings

Connection Manager

The Azure Blob Storage Source Component requires a connection manager to connect to Azure Blob Storage. The Connection Manager drop-down lists all connection managers that are available to your current SSIS package.

Source Item Path

The Source Item Path specifies the location of the file or folder that you want to read from. Click the ellipsis button ('...') to open a browse dialog and select an item.

Item Selection Mode

The Item Selection Mode setting specifies which sub-items (if any) you wish to retrieve. The available modes are:

  • Selected Item: Retrieves only the item specified by the Source Item Path option.
  • Recursive: Retrieves the selected item (specified by the Source Item Path option) and all sub-items recursively.
  • Recursive (Files Only): Works like the Recursive mode but returns only files.
  • Selected Level (Files Only): Retrieves the selected item and all files immediately under the folder specified by the Source Item Path option.
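
The difference between the modes can be illustrated with a small, self-contained sketch. It is not the component's implementation: the `ITEMS` listing, the `select` helper, and the convention that folder paths end with "/" are all hypothetical. For Selected Level (Files Only), the sketch assumes only the files directly inside the folder are returned and does not model whether the folder itself is also emitted.

```python
# Hypothetical listing: paths ending with "/" are folders, all others are files.
ITEMS = [
    "reports/",
    "reports/summary.csv",
    "reports/2023/",
    "reports/2023/q1.csv",
    "reports/2023/q2.csv",
]


def is_file(path):
    return not path.endswith("/")


def select(items, source, mode):
    """Return the items each mode would yield for `source` (illustrative only)."""
    if mode == "Selected Item":
        return [p for p in items if p == source]
    # Everything at or below the source path.
    under = [p for p in items if p.startswith(source)]
    if mode == "Recursive":
        return under
    if mode == "Recursive (Files Only)":
        return [p for p in under if is_file(p)]
    if mode == "Selected Level (Files Only)":
        # Assumption: keep only files with no further "/" below the source path.
        return [p for p in under if is_file(p) and "/" not in p[len(source):]]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("Selected Item", "Recursive",
             "Recursive (Files Only)", "Selected Level (Files Only)"):
    print(mode, "->", select(ITEMS, "reports/", mode))
```

With "reports/" as the source path, the two recursive modes reach into "reports/2023/", while the selected-level mode stops at "reports/summary.csv".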

Misc

Page Size

The Page Size option lets you specify the maximum number of blobs to return. The default is 1000.
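
If Page Size acts as a per-request limit, which is how paged listing APIs commonly work (an assumption here, not something this page states), a large container is read in several round trips. The sketch below shows that general pattern with an in-memory list; `list_page` and `list_all` are hypothetical helpers, not part of the component or of any Azure library.

```python
def list_page(all_blobs, page_size, marker=0):
    """Hypothetical stand-in for one listing request: returns (page, next_marker)."""
    page = all_blobs[marker:marker + page_size]
    next_marker = marker + page_size
    # A marker of None signals that there are no more pages.
    return page, (next_marker if next_marker < len(all_blobs) else None)


def list_all(all_blobs, page_size=1000):
    """Yield every blob name by requesting pages until no marker remains."""
    marker = 0
    while marker is not None:
        page, marker = list_page(all_blobs, page_size, marker)
        yield from page


blobs = [f"blob-{i}" for i in range(2500)]
pages_needed = -(-len(blobs) // 1000)  # ceiling division
print(pages_needed, len(list(list_all(blobs, page_size=1000))))
```

Under this assumption, a smaller page size means more requests with less data per response, and a larger one means fewer, bigger responses.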

Include Metadata

The Include Metadata option specifies whether blob metadata is returned in the response.

Note: Azure Blob Storage supports custom metadata, represented as HTTP headers. To read custom metadata from Azure Blob Storage, you need to add a new column on the Columns page so that the custom metadata value can be retrieved correctly. The column name must exactly match the HTTP header name, which is usually in the x-ms-meta-{MetadataName} format, for example x-ms-meta-FileVersion.
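
As a small illustration of the naming rule in the note, the following sketch derives the column name from a metadata name and back. The helper names are hypothetical; only the x-ms-meta- prefix comes from the note above.

```python
METADATA_PREFIX = "x-ms-meta-"


def metadata_column_name(metadata_name):
    """Build the column name to add on the Columns page for a metadata entry."""
    return f"{METADATA_PREFIX}{metadata_name}"


def metadata_name_from_column(column_name):
    """Recover the metadata name from a column name, or raise if the prefix is missing."""
    if not column_name.startswith(METADATA_PREFIX):
        raise ValueError(f"not a custom metadata column: {column_name}")
    return column_name[len(METADATA_PREFIX):]


print(metadata_column_name("FileVersion"))
```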

Include Snapshots

The Include Snapshots option specifies whether snapshots are included in the response.

Include Copy

The Include Copy option specifies whether metadata related to any current or previous Copy Blob operation is included in the response.

Include Uncommitted Blobs

The Include Uncommitted Blobs option specifies whether blobs for which blocks have been uploaded, but which have not been committed, are included in the response.
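
The four Include options resemble the `include` query parameter of the Azure Blob Storage List Blobs REST operation, which accepts tokens such as `metadata`, `snapshots`, `copy`, and `uncommittedblobs`. Whether the component maps its checkboxes to those tokens exactly this way is an assumption; the sketch below only shows how such a mapping could build a query string, and `list_blobs_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Checkbox label -> `include` token. The tokens are documented for the List Blobs
# REST operation; the mapping to the component's checkboxes is an assumption.
INCLUDE_TOKENS = {
    "Include Metadata": "metadata",
    "Include Snapshots": "snapshots",
    "Include Copy": "copy",
    "Include Uncommitted Blobs": "uncommittedblobs",
}


def list_blobs_query(options):
    """Build a List Blobs style query string from the enabled options."""
    tokens = [token for label, token in INCLUDE_TOKENS.items() if options.get(label)]
    params = {"restype": "container", "comp": "list"}
    if tokens:
        # Multiple tokens are comma-separated; urlencode escapes the comma.
        params["include"] = ",".join(tokens)
    return urlencode(params)


print(list_blobs_query({"Include Metadata": True, "Include Copy": True}))
```

Leaving every option unchecked omits the `include` parameter entirely, so the listing returns only the base blob properties.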

Refresh Component Button

Clicking the Refresh Component button causes the component to retrieve the latest metadata and update each field to its most recent metadata.

Expression fx Button

Click the fx button to launch the SSIS Expression Editor, which lets you update the property dynamically at run time.

Generate Documentation Button

Click the Generate Documentation button to generate a Word document that describes the component's metadata, including relevant mappings.

Columns Page

The Columns page of the Azure Blob Storage Source component shows you all available attributes from the object that you specified on the General page.

SSIS Azure Blob Storage Source - Columns Page

At the top left of the grid is a checkbox that toggles the selection of all available fields. This is a quick way to check or uncheck all fields at once.

The Columns Page grid consists of:

  • Azure Blob Storage Field: Column that will be retrieved from the current item (file or folder).
  • Data Type: The data type of this field. If a data type field is grey and looks like a button, clicking it cycles through common data types for that field.
  • Properties window for the field listed. These values are configurable:
    • Name: Specify the Column name.
    • Data Type: The data type can be changed as needed.
    • Length: If the data type is a string, the length specified here is the maximum size. If the data type is not a string, the length is ignored.
    • Precision: Specify the total number of digits in a number.
    • Scale: Specify the number of digits to the right of the decimal point in a number.
    • CodePage: Specify the Code Page of the field.
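
To make the Precision and Scale properties concrete: the number 12345.678 has a precision of 8 (total digits) and a scale of 3 (digits to the right of the decimal point). The sketch below computes both for a decimal literal using Python's standard `decimal` module; `precision_and_scale` is a hypothetical helper for illustration and is unrelated to how the component infers column properties.

```python
from decimal import Decimal


def precision_and_scale(value):
    """Return (precision, scale) for a finite decimal literal given as a string."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    if exponent >= 0:
        # Whole number: no fractional digits; a positive exponent adds trailing zeros.
        return len(digits) + exponent, 0
    scale = -exponent
    # Values such as 0.05 store a single digit, so precision is at least the scale.
    return max(len(digits), scale), scale


print(precision_and_scale("12345.678"))
```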

Note: As a general best practice, you should only select the fields that are needed for the downstream pipeline components.