Using the Azure Data Lake Storage Connection Manager

The Azure Data Lake Storage Connection Manager is an SSIS connection manager component that can be used to establish connections with Azure Data Lake Storage (Gen1 / Gen2).

To add an Azure Data Lake Storage connection to your SSIS package, right-click the Connection Manager area in your Visual Studio project and choose "New Connection..." from the context menu. The "Add SSIS Connection Manager" window will open. Select the "Azure Data Lake Storage" item to add the new Azure Data Lake Storage Connection Manager.

The Azure Data Lake Storage Connection Manager contains the following two pages, which configure how you connect to Azure Data Lake Storage.

  • General
  • Advanced Settings

General Page

The General page on the Azure Data Lake Storage Connection Manager allows you to specify general settings for the connection.

Authentication
Azure Data Lake Storage

This option allows you to select the type of Azure Data Lake Storage you are trying to connect to. Available options are:

  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
Storage Endpoints Domain (Available only for Azure Data Lake Storage Gen2)

This option allows you to specify the location of the Azure Data Lake Storage Gen2 account you are trying to connect to. Available options are:

  • Azure
  • Azure China
  • Azure Germany
  • US Government
  • Other (enter below)
Storage Account

This option allows you to specify the name of the Azure Data Lake Storage account you are trying to connect to.
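As a rough illustration of how the Storage Endpoints Domain and Storage Account settings could combine into a Gen2 service URL, here is a minimal Python sketch. The helper name `gen2_endpoint` and the suffix table are assumptions made for this example (they are not taken from the toolkit), so verify the suffixes against your own cloud environment.

```python
# Assumed DFS endpoint suffixes for each Storage Endpoints Domain choice.
# These are illustrative values, not settings read from the toolkit.
DFS_SUFFIXES = {
    "Azure": "dfs.core.windows.net",
    "Azure China": "dfs.core.chinacloudapi.cn",
    "Azure Germany": "dfs.core.cloudapi.de",
    "US Government": "dfs.core.usgovcloudapi.net",
}


def gen2_endpoint(storage_account, domain="Azure", custom_suffix=None):
    """Build the base URL of a Gen2 account (hypothetical helper)."""
    if domain == "Other (enter below)":
        # "Other" requires the user to type the endpoint suffix manually.
        if not custom_suffix:
            raise ValueError("a custom endpoint suffix is required for 'Other'")
        suffix = custom_suffix
    else:
        suffix = DFS_SUFFIXES[domain]
    return f"https://{storage_account}.{suffix}"


print(gen2_endpoint("mystorageaccount"))
print(gen2_endpoint("mystorageaccount", "Azure China"))
```

The account name `mystorageaccount` is a placeholder.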

Authentication Mode

This option allows you to select the type of authentication you want to use in order to connect to your Azure Data Lake Storage instance. Available options are:

  • OAuth Authorization Code
  • OAuth Client Credentials (service-to-service authentication)
  • Shared Key (Available only for Azure Data Lake Storage Gen2)
  • SAS Token
OAuth Authorization Code
Get Token

This button completes the entire OAuth authentication process inside of the toolkit. All you need to do is log in to the service endpoint and authorize our app to generate your token.

Tenant Id

The Tenant Id option allows you to specify the unique ID which identifies the tenant you are connecting to.

Client Id

The Client Id option allows you to specify the unique ID which identifies the application making the request.

Client Secret

The Client Secret option allows you to specify the client secret belonging to your app.

Redirect Url

The Redirect Url option allows you to specify the Redirect Url to complete the authentication process.

Generate Token (In App)...

This button completes the entire OAuth authentication process inside the toolkit. All you need to do is log in to the service endpoint and authorize our app to generate your token.

Generate Token (In Browser)...

This button completes the OAuth authentication process using your default browser. After you click this button, follow the steps in the dialog to generate your token.

Path to Token File

The Path to Token File option allows you to specify the location of the token file on the file system.

Token File Password

The Token File Password option allows you to specify the password for the token file.
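To show how the Tenant Id, Client Id, and Redirect Url values typically fit into an OAuth authorization code flow, the sketch below builds the kind of sign-in URL that the Microsoft identity platform (v2.0 endpoint) expects. It is an illustration only: the helper name `build_authorize_url` and the default scope are assumptions, and the toolkit's own token generation may work differently.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorize_url(tenant_id, client_id, redirect_url,
                        scope="https://storage.azure.com/user_impersonation offline_access"):
    """Build an authorization-code sign-in URL (hypothetical helper).

    The default scope is an assumption for Azure Storage; adjust it to
    match the permissions granted to your app registration.
    """
    params = {
        "client_id": client_id,
        "response_type": "code",       # request an authorization code
        "redirect_uri": redirect_url,  # must match the app registration
        "scope": scope,
    }
    base = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize"
    return f"{base}?{urlencode(params)}"


# Placeholder values for illustration only.
url = build_authorize_url("my-tenant-id", "my-client-id", "https://localhost/callback")
print(url)
```

After the user signs in, the identity provider redirects to the Redirect Url with a `code` parameter, which is then exchanged for the token.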

OAuth Client Credentials (service-to-service authentication)
Tenant Id

The Tenant Id option allows you to specify the unique ID which identifies the tenant you are connecting to.

Client Id

The Client Id option allows you to specify the unique ID which identifies the application making the request.

Client Secret

The Client Secret option allows you to specify the client secret belonging to your app.
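For orientation, the following sketch shows how the Tenant Id, Client Id, and Client Secret are commonly combined in a client credentials token request against the Microsoft identity platform (v2.0 endpoint). It only builds the request and does not send it. The helper name `build_token_request` and the default scope are assumptions for this example, not the toolkit's documented behavior.

```python
from urllib.parse import urlencode, parse_qs


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://storage.azure.com/.default"):
    """Return (url, form_body) for a client credentials token request.

    Hypothetical helper; the default scope is an assumption for Azure
    Storage and may differ for your account type.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",  # service-to-service flow
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body


# Placeholder values for illustration only.
token_url, form_body = build_token_request("my-tenant-id", "my-client-id", "my-secret")
print(token_url)
```

The form body would be sent as an `application/x-www-form-urlencoded` POST, and no user interaction is required, which is why this mode suits unattended package execution.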

SAS URL (For SAS Token authentication)

The Azure Data Lake Storage Gen2 APIs support authorization with a SAS token. This option allows you to specify the SAS URL that carries the token.
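A SAS URL is generally a resource URL followed by the SAS token as its query string. The sketch below separates the two parts so that you can inspect fields such as the permissions (`sp`) and the expiry time (`se`). The helper name `split_sas_url` and the sample URL are made up for illustration, and the signature value is a placeholder.

```python
from urllib.parse import urlsplit, parse_qs


def split_sas_url(sas_url):
    """Split a SAS URL into (resource_url, sas_token, token_fields).

    Hypothetical helper for inspecting a SAS URL before using it.
    """
    parts = urlsplit(sas_url)
    if not parts.query:
        raise ValueError("the URL does not contain a SAS token")
    resource_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # Keep the first value of each query parameter for easy lookup.
    fields = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return resource_url, parts.query, fields


# Sample URL with placeholder account, file system, and signature.
sample = ("https://mystorageaccount.dfs.core.windows.net/myfilesystem"
          "?sv=2022-11-02&sp=rl&se=2030-01-01T00:00:00Z&sig=PLACEHOLDER")
resource, token, fields = split_sas_url(sample)
print(resource)
print(fields["sp"], fields["se"])
```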

Shared Key (Available only for Azure Data Lake Storage Gen2)

Azure Data Lake Storage Gen2 APIs support authorization with an Azure Storage Shared Key which can be specified using this option.

Misc
Bulk Download Behavior (since v23.1) (Available only for Azure Data Lake Storage Gen2)

The Bulk Download Behavior option can be set to one of the following three values:

  • Allow external file modifications

  • Fail on external file modifications

  • Prevent external file modifications

Upload Chunk Size (in MB)

The Upload Chunk Size option allows you to specify the size, in MB, of the chunks that a large file is divided into so that it can be uploaded sequentially.

Download Chunk Size (in MB) (since v21.1)

The Download Chunk Size option allows you to specify the size, in MB, of the parts in which large files are downloaded from Azure Data Lake Storage.
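To make the effect of a chunk size concrete, the following sketch computes the sequence of (offset, length) pieces that a file of a given size would be split into. It is a generic illustration, not the toolkit's implementation. The helper name `chunk_ranges` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
def chunk_ranges(total_bytes, chunk_size_mb):
    """Return a list of (offset, length) pairs covering total_bytes.

    Hypothetical helper; assumes 1 MB = 1024 * 1024 bytes.
    """
    if chunk_size_mb <= 0:
        raise ValueError("chunk size must be a positive number of MB")
    chunk_bytes = chunk_size_mb * 1024 * 1024
    return [
        (offset, min(chunk_bytes, total_bytes - offset))
        for offset in range(0, total_bytes, chunk_bytes)
    ]


# A 10 MB file with a 4 MB chunk size yields two full chunks and one 2 MB remainder.
mb = 1024 * 1024
for offset, length in chunk_ranges(10 * mb, 4):
    print(offset // mb, length // mb)
```

A smaller chunk size means more requests per file, while a larger one means each request carries more data and takes longer to complete or retry.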

Timeout (secs)

The Timeout (secs) option allows you to specify a timeout value in seconds for the connection. The default value is 120 seconds.

Retry on Intermittent Errors

This option is designed to help recover from intermittent outages or service disruptions so that the integration does not have to stop because of temporary issues. Enabling this option allows service calls to be retried upon certain types of failure. A service call may be retried up to 3 times before an exception is raised. Retries occur after 0 seconds, 15 seconds, and 60 seconds.

Warning: Although we have carefully designed this feature so that retries happen only when they are deemed safe, in rare cases a retried service call could result in the creation of duplicate data.
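The retry schedule described above (up to 3 retries, after 0, 15, and 60 seconds) can be sketched as follows. This is a generic illustration of the pattern rather than the toolkit's actual code. The names `TransientError` and `call_with_retries` are hypothetical, and which failures count as retryable is an assumption left to the caller.

```python
import time

RETRY_DELAYS = (0, 15, 60)  # seconds to wait before each of the 3 retries


class TransientError(Exception):
    """Stands in for a failure that is considered safe to retry."""


def call_with_retries(service_call, delays=RETRY_DELAYS, sleep=time.sleep):
    """Run service_call, retrying on TransientError per the delay schedule."""
    try:
        return service_call()  # initial attempt
    except TransientError as exc:
        last_error = exc
    for delay in delays:
        sleep(delay)  # wait before the next retry
        try:
            return service_call()
        except TransientError as exc:
            last_error = exc
    raise last_error  # all retries exhausted


# Demo: a call that fails twice, then succeeds. The sleep function is
# replaced with list.append so the waits are recorded instead of slept.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary outage")
    return "ok"


waits = []
print(call_with_retries(flaky, sleep=waits.append), waits)  # prints: ok [0, 15]
```

Because a retried call may already have succeeded on the server before the failure was reported, operations that create data are the ones where duplicates can arise.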

Test Connection

After all the connection information has been provided, you may click the Test Connection button to test if the connection settings entered are valid.

Advanced Settings Page

The Advanced Settings page on the Azure Data Lake Storage Connection Manager allows you to specify some advanced and optional settings for the connection.

Proxy Server Settings
Proxy Mode

The Proxy Mode option allows you to specify how you want to configure the proxy server setting. There are three options available:

  • No Proxy
  • Auto-detect (Using system configured proxy)
  • Manual
Proxy Server

The Proxy Server option allows you to specify the name of the proxy server for the connection.

Port

The Port option allows you to specify the port number of the proxy server for the connection.

Username (Proxy Server Authentication)

The Username option (under Proxy Server Authentication) allows you to specify the proxy user account.

Password (Proxy Server Authentication)

The Password option (under Proxy Server Authentication) allows you to specify the proxy user's password.

Note: The Proxy Password is not included in the connection manager's ConnectionString property by default. This is by design for security reasons. However, you can include it in your ConnectionString if you want to parameterize your connection manager. The format would be ProxyPassword=myProxyPassword; (make sure you have a semicolon as the last character). It can be anywhere in the ConnectionString.
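If you build the ConnectionString value programmatically (for example, before passing it to a package parameter), a small helper like the one below can append the proxy password in the format described in the note. The helper name `with_proxy_password` and the `Key1`/`Key2` entries are placeholders invented for this sketch. Only the `ProxyPassword=...;` format comes from the note above, and passwords that themselves contain semicolons are not handled.

```python
def with_proxy_password(connection_string, proxy_password):
    """Append ProxyPassword=<value>; to a connection string (hypothetical helper)."""
    base = connection_string.strip()
    # Make sure the existing string ends with a semicolon before appending.
    if base and not base.endswith(";"):
        base += ";"
    # The entry itself must also end with a semicolon.
    return f"{base}ProxyPassword={proxy_password};"


# "Key1=Value1;Key2=Value2" stands in for the real connection string.
print(with_proxy_password("Key1=Value1;Key2=Value2", "myProxyPassword"))
```

Since the password then appears in plain text, consider storing the resulting value in a sensitive parameter or another protected configuration source.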