Using the CosmosDB Source Component

The CosmosDB Source Component is an SSIS data flow pipeline component that can be used to read data from CosmosDB.

The component includes the following four pages to configure how you want to read data.

  • General.
  • Document Designer.
  • Columns.
  • Advanced.

General

The General page of the CosmosDB Source Component allows you to specify the general settings of the component. 

CosmosDB Source Editor

Connection Manager

The CosmosDB Source Component requires a connection in order to connect to a CosmosDB instance. The Connection Manager drop-down will show a list of all CosmosDB connection managers that are available to your current SSIS package.

Database

This option lists all the available Databases in the CosmosDB instance. After selecting the Database you wish to read from, the Collection drop down will be populated with the available Collections in the selected Database.

Collection

This option lists all the Collections available in the selected Database.

Partition Key

The Partition key is required when reading documents or attachments in a partitioned collection.

Advanced Settings

This option navigates to the Advanced Page of the CosmosDB Source Component.

Partition Key Range Id

Requests can be executed against specific partition key ranges. This is used to process the change feed in parallel across multiple consumers. Please note that Partition Key Range Id cannot be specified along with Partition Key.
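
The two rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not the component's implementation: the function names and the range ids "0" to "4" are hypothetical, and real partition key range ids come from the CosmosDB service.

```python
def validate_partition_settings(partition_key=None, partition_key_range_id=None):
    """Reject the one combination the text above says is not allowed."""
    if partition_key is not None and partition_key_range_id is not None:
        raise ValueError(
            "Specify either Partition Key or Partition Key Range Id, not both."
        )


def assign_ranges(range_ids, consumer_count):
    """Distribute partition key range ids across consumers round-robin."""
    assignments = [[] for _ in range(consumer_count)]
    for index, range_id in enumerate(range_ids):
        assignments[index % consumer_count].append(range_id)
    return assignments


# Hypothetical range ids, split across two parallel consumers.
plan = assign_ranges(["0", "1", "2", "3", "4"], consumer_count=2)
for consumer, ranges in enumerate(plan):
    print(f"consumer {consumer} reads ranges {ranges}")
```

Each consumer would then run its own source with one of its assigned Partition Key Range Id values, so that no two consumers read the same range.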

Source Type

The Source Type option allows you to specify whether you want to read Documents, Document Change Feeds, or Attachments.

  • Documents: Retrieves Data from Documents.
  • Document Change Feeds: Retrieves the captured changes in data from CosmosDB based on a token or a certain time.
  • Attachments: Retrieves attachments of a document resource.

Document Resource Id

This option allows you to specify the Id associated with the resource you are trying to retrieve (Only available when working with Attachments Source Type).

Input Variable Type

This option allows you to specify how you want to retrieve Document Change Feeds (Only available when working with Document Change Feeds Source Type).

  • Start Time: Retrieves the captured changes in data from CosmosDB based on a date time.
  • Continuation Token: Retrieves the captured changes in data from CosmosDB based on token.

Input Variable

This option lists all the available parameters or user variables in your package which will hold the input of the Document Change Feeds operation (Only available when working with Document Change Feeds Source Type).

Output Variable

This option lists all the available parameters or user variables in your package which will hold the output of the Document Change Feeds operation (Only available when working with Document Change Feeds Source Type).
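
The interplay between the Input Variable and Output Variable can be sketched with a simulated change feed. Everything here is an assumption made for illustration: the feed is an in-memory list, the token is simply a position in that list (real CosmosDB continuation tokens are opaque strings), and the start time is treated as inclusive.

```python
from datetime import datetime, timezone


def read_change_feed(changes, start_time=None, continuation_token=None):
    """Return (changed documents, token to resume from) for a simulated feed."""
    if start_time is not None and continuation_token is not None:
        raise ValueError("Use either a start time or a continuation token.")
    if continuation_token is not None:
        position = int(continuation_token)  # simulated token: index into the feed
    elif start_time is not None:
        # First change at or after the start time (feed is ordered by time).
        position = next(
            (i for i, change in enumerate(changes) if change["modified"] >= start_time),
            len(changes),
        )
    else:
        position = 0
    return changes[position:], str(len(changes))


def at(hour):
    """Helper for readable sample timestamps."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


feed = [
    {"id": "a", "modified": at(8)},
    {"id": "b", "modified": at(10)},
    {"id": "c", "modified": at(12)},
]

# First run: Input Variable Type = Start Time.
first_batch, output_variable = read_change_feed(feed, start_time=at(9))

# A new change arrives between package runs.
feed.append({"id": "d", "modified": at(14)})

# Second run: Input Variable Type = Continuation Token, fed from the last output.
second_batch, output_variable = read_change_feed(
    feed, continuation_token=output_variable
)
print([doc["id"] for doc in first_batch], [doc["id"] for doc in second_batch])
```

Storing the Output Variable value after each run and passing it back in as a Continuation Token on the next run is what lets each execution pick up only the changes it has not yet seen.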

Query

This textbox allows you to specify a query in order to retrieve and filter your data from CosmosDB. This option is only available for Documents and Attachments Source Types.

Expression fx Icon

Click the blue fx icon to launch the SSIS Expression Editor, which enables dynamic updates of the property at run time.

Generate Documentation Icon

Click the Generate Documentation icon to generate a Word document which describes the component's metadata including relevant mapping, and so on.

Document Designer

The Document Designer page allows you to build the design of the document you are trying to read, or import the design from an existing document. This page is only available for Document and Document Change Feeds Source Types.

The Document Designer includes the following two tabs:

  • Details View.
  • Additional Settings.

In the Details View tab, the top part of the page is used to manually configure the nodes in the design:

  • Add Node: This button will add a new node to your Document design.
  • Remove Nodes: This button will remove a node from your Document design.
  • Direction buttons: These buttons can be used to rearrange the position of the nodes.
  • Rename Nodes: This option allows you to specify how the node name should be represented.
    • Use Qualified Names: When this option is selected, the output/column name will be set to the fully qualified node name based on the node location in the document.
    • Use Short Names: When this option is selected, the output/column name will be set to the given Node Name directly.
  • Filter Columns: This option allows you to show or hide certain Columns in the grid.
    • Show Basic Columns: When this option is selected, only basic columns will be shown in the grid.
    • Show All Columns: When this option is selected, all available columns will be shown in the grid.
  • Filter Nodes: This option allows you to filter the list of nodes shown in the grid by typing a keyword in the textbox.
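
The difference between the two Rename Nodes options can be illustrated with a small Python sketch. It is not the component's code: the function name is hypothetical, and the "." separator used to build qualified names is an assumption, since the actual separator may differ.

```python
def column_names(node, path=(), qualified=True):
    """Collect output/column names for the value nodes of a nested document."""
    names = []
    for key, value in node.items():
        current = path + (key,)
        if isinstance(value, dict):
            # Object node: descend and carry the path along.
            names.extend(column_names(value, current, qualified))
        else:
            # Value node: use the full path or just the node name.
            names.append(".".join(current) if qualified else key)
    return names


document = {"id": "1", "address": {"city": "Toronto", "zip": "M5V"}}

print(column_names(document, qualified=True))   # ['id', 'address.city', 'address.zip']
print(column_names(document, qualified=False))  # ['id', 'city', 'zip']
```

Short names are easier to read, but two nodes with the same name at different levels of the document would collide, which is where qualified names help.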

The Details View grid consists of:

  • Node Type: This option allows you to specify the type of the Node in your document design. There are four options available:
    • Array.
    • Object.
    • Value.
    • Raw: This type can be used when trying to retrieve data under a node exactly as it is in the document.
  • Node Name: The Name of the Node in the document.
  • Output/Column Name: The name which will be set for the output or the column of a node.
  • Is Repeated: This option allows you to specify if a node is repeated within a document. (Available when Show All Columns is selected)
  • Output Type: The type of output for a node such as a Column or a Secondary Output depending on the Node Type.
  • Output Settings: This option allows you to specify the settings of each output such as the datatype of Value Node Types.

In the Additional Settings tab, you will find the following options:

  • Null Mode: This option allows you to specify the handling of Null values.
  • 'Is Repeated' Text Qualifier: This option allows you to specify the Text Qualifier used in a document when the Is Repeated property is set to True for one or more nodes. There are four options available:
    • Double-quote (").
    • Single-quote (').
    • Tick (`).
    • None.
  • 'Is Repeated' Text Delimiter: This option allows you to specify the Text Delimiter used in a document when the Is Repeated property is set to True for one or more nodes. There are seven options available:
    • Newline (\n).
    • Carriage Return (\r).
    • Semicolon (;).
    • Colon (:).
    • Comma (,).
    • Tab (\t).
    • Vertical Bar (|).
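
As a rough illustration of how a qualifier and a delimiter work together, the sketch below assumes that the values of a repeated node end up concatenated into a single string, each value wrapped in the qualifier and separated by the delimiter. That behavior and the function name are assumptions made for the example, not a description of the component's internals.

```python
def join_repeated(values, delimiter=",", qualifier='"'):
    """Concatenate repeated values, wrapping each one in the text qualifier."""
    return delimiter.join(f"{qualifier}{value}{qualifier}" for value in values)


tags = ["red", "green", "blue"]

print(join_repeated(tags))                                # "red","green","blue"
print(join_repeated(tags, delimiter="|", qualifier=""))   # red|green|blue
```

A qualifier matters when a value may itself contain the delimiter; with None selected (the empty string here), such a value could not be told apart from two separate values.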

Import

This option allows you to import the design of your document from one of the following four sources:

  • Designer Settings: Import the design from an existing .designer.settings file.
  • Document (CosmosDB): Import the design based on the retrieved document from the connection manager.
  • JSON (Local File): Import the design based on a JSON file on your local file system.
  • JSON Schema (Local File): Import the design based on a JSON Schema file on your local file system.

CosmosDB Document Importer

When you select the Document (CosmosDB) Import option, the CosmosDB Document Importer window opens. It allows you to specify a query, and the design of the Source Component is then set based on the retrieved documents.

  • Documents to scan: This option allows you to specify the maximum number of retrieved documents which will be used to set the design of the Source Component. Setting this option to 0 will read all the retrieved documents.
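
The effect of the Documents to scan setting can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, only top-level properties are inspected, and Python type names stand in for the component's data types.

```python
from itertools import islice


def infer_columns(documents, documents_to_scan=0):
    """Build a column -> type mapping from the scanned documents (0 scans all)."""
    if documents_to_scan == 0:
        sample = documents
    else:
        sample = islice(documents, documents_to_scan)
    columns = {}
    for document in sample:
        for key, value in document.items():
            # Keep the type seen first for each property.
            columns.setdefault(key, type(value).__name__)
    return columns


retrieved = [
    {"id": "1", "name": "Widget"},
    {"id": "2", "name": "Gadget", "price": 9.5},
]

print(infer_columns(retrieved, documents_to_scan=1))  # {'id': 'str', 'name': 'str'}
print(infer_columns(retrieved, documents_to_scan=0))  # {'id': 'str', 'name': 'str', 'price': 'float'}
```

A small scan count is faster, but properties that appear only in later documents (such as price above) are missed, so the value is a trade-off between speed and completeness of the design.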

Export

Designer Settings: This option allows you to export the current document design to a .designer.settings file which can be used later to import the same design in a different component.

Columns Page

The Columns page of the CosmosDB Source Component shows you the available columns based on the settings in the Document Designer page.

The checkbox at the top left of the grid toggles the selection of all available fields, which is a convenient way to check or uncheck them all at once. The Columns Page grid consists of:

  • Include Field Checkbox: A checkbox that determines if the field will be available as an output column.
  • Column Name: Column that will be retrieved from the document.
  • Data Type: The data type of this field. 

Hide Unselected Fields

When the Hide Unselected Fields checkbox is checked, unselected output columns will be hidden.

Hide Selected Fields

When the Hide Selected Fields checkbox is checked, selected output columns will be hidden.

Filter

The output columns that are visible can be filtered by entering text in the Filter text box.

Note: As a general best practice, you should only select the fields that are needed for the downstream pipeline components. Do this in the Columns page using the checkboxes, or in the General page by removing the field from the query entirely.

Advanced Page

The Advanced page of the CosmosDB Source Component shows you additional options when retrieving data from CosmosDB.

Consistency Level

Select the consistency level required for the query or read feed operation from the drop-down. Available options are:

  • Null (Default).
  • Strong.
  • Bounded Staleness.
  • Session.
  • Eventual.
  • Consistent Prefix.

Disable RU/Minute Usage

This option controls whether Request Units (RUs) per minute capacity can be used to serve the query when the regular provisioned RUs per second are exhausted.

Enable Cross Partition Query

This option allows more than one request to be sent to execute the query in the CosmosDB service. More than one request is needed when the query is not scoped to a single partition key value.

Enable Low Precision Order By

This option can be used to enable low precision order by in the CosmosDB service.

Enable Scan In Query

This option can be used to enable scans for queries that could not otherwise be served because indexing was opted out on the requested paths.

Max Buffered Item Count

The maximum number of items that can be buffered client side during parallel query execution in CosmosDB service.

Max Degree Of Parallelism

The number of concurrent operations run client side during parallel query execution in CosmosDB service.

Max Item Count

The maximum number of items to be returned in the enumeration operation in the CosmosDB service.
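
Max Item Count is easiest to think of as a page size. The sketch below shows the idea with an in-memory list; it is only an illustration under that assumption, the function name is hypothetical, and the real service returns pages over separate requests.

```python
def read_in_pages(items, max_item_count):
    """Yield the items in pages of at most max_item_count each."""
    if max_item_count < 1:
        raise ValueError("max_item_count must be at least 1 in this sketch.")
    for start in range(0, len(items), max_item_count):
        yield items[start:start + max_item_count]


documents = ["doc1", "doc2", "doc3", "doc4", "doc5"]

for page_number, page in enumerate(read_in_pages(documents, max_item_count=2), start=1):
    print(page_number, page)
```

A larger value means fewer round trips for the same result set, while a smaller value keeps each response small.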

Populate Query Metrics

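This option enables or disables the Populate Query Metrics request option for document query requests in the CosmosDB service. When enabled, the service returns query execution metrics with the response.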

Session Token

The session token for use with session consistency in the CosmosDB service.