How Do I Process JSON data

Background

JSON is the de-facto format for semi-structured data storage and interchange. The basic data types in JSON are number, string, boolean, object, and array.

 Objects and arrays allow you to have nested data structures. This makes JSON ideal for storing any data structure, but also requires extra work to handle.

JSON data can be stored as either of the following:

  •  files where each file contains a single object
  •  files that contain multiple objects delimited by newlines

 MongoDB documents are read as JSON objects. Many Rest API sources also return JSON objects, arrays, or line-delimited Jsons.

Processing JSON Objects in Integrate.io ETL - Tutorial

Integrate.io ETL allows you to process JSON objects and extract data from them in various ways.

Throughout this tutorial, we’ll be using the following JSON example (with indentation and line breaks to make it human-readable):

{
    "id": 152,
    "name": "Adam",
    "surname": "Thompson",
    "birthdate": "1980-05-02T00:00:00Z",
    "active": true,
    "billing_address": {
        "street": "780 California st.",
        "city": "San Francisco",
        "zipcode": "94103",
        "country": "USA"
    },
    "products": [
        {
            "product_id": 27
            ,"plan_id": 6
        },
        {
            "product_id": 22
            ,"plan_id": 7
        }
    ],
    "location_data":
    {
        "city": "Pittsburgh",
        "country": "USA",
        "coordinates":
        {
            "longitude": 80.0154,
            "latitude": 40.3964
        }
    }
}

Loading JSON Data from files

When loading JSON data from files, simply select the JSON record type and proceed to the next step to select the input fields.

Integrate.io ETL automatically detects fields at the first level of the JSON hierarchy. Complex types such as JSON objects or arrays are automatically transformed into Integrate.io ETL’s complex data types - map and bag respectively, which allows you to easily work with these fields as you will see below.

Integrate.io ETL also allows you to use JSONPath to traverse JSON objects in a dynamic fashion. In some cases, you may prefer to use JsonPath to extract data from nested objects and arrays at the first level of the hierarchy. To do this, change the data type at the source from map or bag to string.

Navigating objects

In our example, location_data is a JSON object at the first level of the hierarchy and is automatically transformed into a map. The following examples show how to extract field values from a map in a Select component:

location_data#‘city’ returns the string value 'Pittsburgh'

location_data#‘coordinates’ returns the nested JSON object as a string value '{"longitude": 80.0154,"latitude": 40.3964}', which you may also want to break into its elements. There are two ways to process deeper nested objects:

  1. You can transform the JSON string into a map again, and extract a field value:
JsonStringToMap(location_data#'coordinates') -> coordinates
coordinates#'longitude' -> longitude
  1. You can use JsonExtractScalar to extract a field value:
JsonExtractScalar(location_data#'coordinates','$.longitude')

Generally speaking, you should transform a JSON object to a map if you’re extracting multiple fields from it. Use JsonExtractScalar if you are only extracting a single field or a deeply nested field.

Sometimes, fields in an object are unknown and may be used as key-value arrays. In this case, you may want to extract the object’s key names and values and flatten the object:

billing_address = Flatten(MapToBag(billing_address))
key = billing_address#'key'
value = billing_address#'value'

Navigating arrays

JSON arrays at the first level of the hierarchy are transformed into bags. Bags are collections of data items that you can either transform to bag functions or flatten out to items using the Flatten operator so that each item in the array becomes a separate record.

If the JSON array items are JSON objects (and not scalar values like string, number, boolean), the result of flattening an array are JSON object strings from which you can extract data from again by using JsonStringToMap or JsonExtract* functions. For example:

product = Flatten(products)
product = JsonStringToMap(product)
product_id = product#'product_id'
plan_id = product#'plan_id'

JsonExtract* functions also allow you to extract data from specific items within the array. If the product field is defined as string instead of Json array, you can use the following JsonPath expression to get the first product id:

ToJson(products)
JsonExtractScalar
(products,'$.[0].product_id')

Checking if a string is a valid JSON

The JsonExtract* functions fail if the first argument is not a valid JSON. JsonStringToMap and JsonStringToBag are less stringent and simply return null if the argument is not a valid JSON. Apply the appropriate function to a string and compare it to null in a Filter component or in a CASE statement (within a Select component) in order to check if a field is a valid JSON.

Example 1:

CASE WHEN JsonStringToMap('{"id":1,"name":"John"}') is null THEN 'not valid' ELSE 'valid json object' END

returns ‘valid JSON object’

Example 2:

CASE WHEN JsonStringToMap('{"id":1,"name"') is null THEN 'not valid' ELSE 'valid json object' END

returns ‘not valid’

Example 3:

CASE WHEN JsonStringToBag('[1,2,3]') is null THEN 'not valid' ELSE 'valid json array' END

returns ‘valid JSON array’

  Example 4:

CASE WHEN JsonStringToBag('{"id":1,"name":"John"}') is null THEN 'not valid' ELSE 'valid json array' END

returns ‘not valid’ (because it’s not a json array).