Parsing Advanced JSON Structures with Ease Using SSIS Productivity Pack

16 February 2021
KingswaySoft Team

Many applications these days offer a REST API to facilitate data extracting or processing. Many of those modern REST APIs use JavaScript Object Notation (JSON) format to serialize and deserialize data when exchanging data between the client and the server (essentially the service side). This provides some great interoperability between the server and clients on different platforms. In working with those REST APIs in an ETL process, it becomes imperatively important to have a proper strategy to work with such data structure in an efficient manner. Previously we have written a blog post that demonstrates How to Work with RESTful Endpoints in your SSIS ETL Process, in which we talked about a relatively simple JSON parsing scenario which utilizes the Import JSON (Web) function button in our JSON Source/Extract component to import the JSON document served by the API service endpoint lively, this method requires a very minimal effort and it can be used to work with most of the JSON structures with merely a few mouse clicks. Once the document structure has been imported, the component should be mostly ready for us. We have however come into some advanced JSON structure which requires some more tweaking in order to work for its perfect use case. In this blog post, we will look at a few more complex JSON structures that we have encountered in the past and show you how you can best process them using the KingswaySoft JSON Source/Extract components to extract the data in SSIS.

Example #1: Array values

In our first example, we will be dealing with JSON array values. Let’s take a look at the following sample JSON file:

{
	"id": 51,
	"name": "John Smith",
	"roles": [
		"admin",
		"contributor",
		"editor"
	]
}

The JSON structure consists of one object which has three properties: id, name and roles. The roles property is an array of string values.

For a JSON document like this, we can configure JSON Source/Extract component as the following:

As shown in the screenshot above, the roles property is set to Array node type with a Value type subnode. Note the Value type node is set to be a repeated node. For a JSON design like this, the array values will be outputted in delimited text format. 

The roles property can also be outputted as a separate output by setting the Output Type to Output on the roles Array node.

With a JSON design like the screenshot above, the JSON Source/Extract component generates two outputs. You may need to pay attention to two special columns - _RowIndex and _ParentKeyField:

  • _RowIndex: This column contains the current row index of this output, this field can be used as a key field when the node has no primary key field.
  • _ParentKeyField: This column contains the value of this records parent key field.

The _ParentKeyField column in the roles output indicates the relationship between parent and child rows.

Example #2: Multi-dimensional array

Multi-dimensional JSON array stores an array inside another JSON array. Given an example below is a sample JSON document with contacts multi-dimensional array.

{
	"name": "contacts",
	"contacts": [
		[
			"John",
			"Smith",
			"[email protected]"
		],
		[
			"Joe",
			"Doe",
			"[email protected]"
		]
	]
}

 The JSON Document Designer in JSON Source/Extract component can be configured like something below:

The contacts property is configured to Array node type and set its output type as an Output. The contacts array output contains an Array subnode of string values. As you can see from the screenshot above, we have named the three Value nodes with friendly columns names of firstname, lastname and email. Let's set up a dummy destination component (in our case, we used a Premium Derived Column component) and enable the Data Viewer to check the contacts output result. Note that this is one of the outputs created by the JSON source component, it works as a child output.

Example #3: Key-Value pairs

Now we will be getting into something that is even more interesting. Below is a sample JSON structure that contains some key-value pairs. The properties array consists of 3 objects and each object has 2 properties: name and value.

{
	"id": 51,
	"properties": [
		{
			"name": "firstname",
			"value": "John"
		},
		{
			"name": "lastname",
			"value": "Smith"
		},
		{
			"name": "email",
			"value": "[email protected]"
		}
	]
}

 For a JSON structure like this, we would like to pivot the properties values and use name as column name in the output. Based on that, the JSON Source/Extract component can be configured as below:

The properties node is set to Array node type with an output type of Key Value Pivot. The subnode Object contains two Value nodes – name and value, the name Value node is configured as the key field. When Key Value Pivot mode is used, we can define the pivot key values in Output Settings to let the JSON Source/Extract component to create output columns for each value.

Let’s check the JSON Source/Extract output result:

As you can observe from the above screenshot, we can see that the JSON Source/Extract Component is working according to our expectation and converted our source data to our required format.

Previously we had a blog post of Extracting Key/Value Pairs JSON Structure into Tabular Format which demonstrates key-value pairs JSON structure in mode details.

Example #4: Property Name as Column Value

The sample JSON document below shows is a JSON structure that id value is used as an object name.

{
	"contacts": {
		"51": {
			"firstname": "John",
			"lastname": "Smith",
			"email": "[email protected]"
		},
		"101": {
			"firstname": "Joe",
			"lastname": "Doe",
			"email": "[email protected]"
		}
	}
}

For this JSON structure, we want to have the object name outputted as a property at the same level as firstname, lastname and email. The JSON Source/Extract component can be configured like this:

The contacts node is an Object node type and set to output and an Output. Note that the subnode Object is configured to Property Name As Column Value output type and given a friendly name of id. The following screeshot shows the output results where we can find the component works correctly to output the JObject name as the column value.

Closing Note

In this blog post, we went through some relatively complex JSON designs and demonstrated how to work with such document structures in an efficient manner using the KingswaySoft JSON Source component. The same strategies can be used to work with the JSON Extract component as well which share the same infrastructure of JSON data processing in our SSIS Productivity Pack product. With a few tweaks of the components, we can achieve a proper reading purpose of such complex documents. Ultimately, our objective is to provide an ETL solution that can work with virtually any complex JSON documents with ease without having to write a single line of code in SSIS.

We hope you find this blog helpful.

comments powered by Disqus