API Ingestion Guide with Integrate.io

Introduction

Integrate.io’s universal REST API source component lets you access data from a wide range of applications and systems. This guide walks you through the process of accessing your API data. We will start by reading the documentation for the API you are trying to extract data from, find the pertinent information, and then configure the REST API source component with that information.

Steps for accessing API data

  1. Find the API documentation. Many APIs have their documentation publicly available on the internet, although occasionally some will only be accessible to users after logging in to the platform/console. If you can’t find it, reach out to the customer support of your application.

  2. Seek out the information and credentials you need. We need a few pieces of information from the documentation. They can typically be found in two places: an Authentication section, and a section that describes the types of data available from the API, the specific locations where that data is available (i.e., endpoints), and how to request it:

    • How does the API require us to pass our authentication credentials?

    • What is the URL of the API endpoint (i.e., the specific location) we want to retrieve data from? For reference, the URL for an API endpoint is typically made up of the base API URL followed by the endpoint path. In our Stripe API example below, the base URL is https://api.stripe.com and the endpoint we're going to use is /v1/customers. The full URL of the API endpoint we're going to access is https://api.stripe.com/v1/customers.

    • What method does that endpoint use (such as GET, POST, or PUT)?

    • Does the API require any headers, query parameters, or body? 

    • How is the data returned?

  3. Configure the REST API source component inside an Integrate.io ETL package. See the following examples for instructions.
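
The checklist in step 2 can be written down as a small data structure before you open the component. The following is only a sketch: the `ApiRequestSpec` class and its field names are hypothetical and are not part of Integrate.io. It shows how the base URL and the endpoint combine into the full URL, using the Stripe values mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urljoin


@dataclass
class ApiRequestSpec:
    """Facts gathered from an API's documentation (hypothetical helper)."""
    base_url: str
    endpoint: str
    method: str = "GET"                                # GET, POST, PUT, ...
    headers: dict = field(default_factory=dict)        # required headers, if any
    query_params: dict = field(default_factory=dict)   # required parameters, if any
    body: Optional[str] = None                         # request body, if any
    response_format: str = "json"                      # how the data is returned

    @property
    def url(self) -> str:
        # Full endpoint URL = base API URL + endpoint path.
        return urljoin(self.base_url, self.endpoint)


stripe_customers = ApiRequestSpec(
    base_url="https://api.stripe.com",
    endpoint="/v1/customers",
)
print(stripe_customers.url)
```

Each field corresponds to one of the questions in step 2, so an empty field is a reminder to go back to the documentation.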


Ingesting API data in Integrate.io ETL - Tutorial

The following three sample API calls build upon one another in complexity while demonstrating the flexibility of the REST API source component and covering the majority of the types of APIs that Integrate.io customers call within the platform. We will refer to the steps above as we walk through the process of accessing the data from each of the following APIs.

#1 Stripe API - GET method with an API key

My company uses Stripe for billing and I’m trying to ingest that data into my data warehouse for better visibility for my Customer Success team. I search for “Stripe API documentation” and find it at stripe.com/docs/api. I see “Authentication” in the left sidebar. This section of the documentation gives me the information I need. It explains that Stripe uses an API key for authentication, and that there are two ways you can pass that API key to the API:

  1. Using Basic Authentication, where the API key is the username and no password is required.

  2. As the token in an Authorization header, like this: Authorization : Bearer <your API key>
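
To make the two options concrete, here is a minimal Python sketch of the Authorization header each one produces. The function names and the key `sk_test_placeholder` are made up for illustration; substitute your own key. Basic Authentication encodes `username:password` in Base64, so an API key with a blank password becomes the key followed by a colon.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """API key as the Basic Authentication username, with a blank password."""
    credentials = f"{api_key}:".encode("utf-8")  # "username:" with empty password
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def bearer_auth_header(api_key: str) -> dict:
    """API key as the token in an Authorization header."""
    return {"Authorization": f"Bearer {api_key}"}


API_KEY = "sk_test_placeholder"  # placeholder, not a real key
print(basic_auth_header(API_KEY))
print(bearer_auth_header(API_KEY))
```

Both headers identify the same key; which one you use only changes how the REST API component is configured.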

thumbnail image

Next, I need to find the particular object/table where the data I want is stored. I’m interested in customer data and I see Customers in the left sidebar. When I click on it, it expands to show me actions that are available with this data and takes me to this page where I can see the endpoints that are associated with those actions:

thumbnail image

I want a list of all the customers. I can find that if I scroll down:

thumbnail image

This section of the documentation will give me the rest of the information I need:

  • The URL is https://api.stripe.com/v1/customers

  • The method is GET

  • There are no required parameters or headers (required headers would appear in the curl command in the dark gray box on the right, after a -H flag). All of the parameters that are listed on the left are optional.

  • The data from the API is returned in a JSON object (indicated by the curly brackets). You can also verify that in the Introduction section of the Stripe API documentation.
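
Put together, the details above describe a request like the one in this sketch, which builds the request with Python's standard library but does not send it. The helper name and the placeholder key are assumptions for illustration, and `limit` is used as an example of one of the optional parameters; check the Stripe documentation for the parameters you actually need.

```python
import urllib.request
from urllib.parse import urlencode

API_KEY = "sk_test_placeholder"  # placeholder, not a real key
URL = "https://api.stripe.com/v1/customers"


def build_list_customers_request(api_key, limit=None):
    """Build (but do not send) a GET request for the customer list."""
    url = URL
    if limit is not None:
        # Optional parameters go in the query string of a GET request.
        url = f"{URL}?{urlencode({'limit': limit})}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


request = build_list_customers_request(API_KEY, limit=3)
print(request.get_method(), request.full_url)
```

With a real key, passing this request to `urllib.request.urlopen` would return the JSON response described above.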

Now I can configure the REST API source component in a dataflow package within Integrate.io. If I’m going to pass the API key as the username in Basic Authentication, I select “Basic” in Step 1 of the REST API component and place the API key in the username field, leaving the password field blank:

thumbnail image

Alternatively, if I’m going to pass the API key as the token in the Authorization header, I select “None” in Step 1 and click “Add” below the word Headers in Step 2. Then I fill in “Authorization” on the key side and “Bearer <my API key>” on the value side. This is also where I’ll place the URL and select GET in the method dropdown:

thumbnail image

Then I scroll down to the Response section, select JSON since I know that’s how my API will return its data, and click Next.

thumbnail image

I can see the four fields being returned by the API. In the Data Preview, I can see the customer data that I’m most interested in is inside the data bag (i.e. array).

thumbnail image

I go back to Step 2 and edit the Base record JSONPathExpression selection from Object to Custom. The component automatically adds $.data[*] to the field. This path tells the component to parse all the fields inside the data bag/array. Click Next.
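
To see what the $.data[*] path selects, consider this Python sketch. The sample response is invented for illustration: it only mimics the shape described above, with four top-level fields and the customer records inside the data array. The field values are not real Stripe output.

```python
import json

# Invented sample that mimics the shape of the response described above.
sample_response = json.loads("""
{
  "object": "list",
  "url": "/v1/customers",
  "has_more": false,
  "data": [
    {"id": "cus_001", "email": "a@example.com"},
    {"id": "cus_002", "email": "b@example.com"}
  ]
}
""")

# Base record "Object": one record whose fields are the top-level keys.
top_level_fields = list(sample_response.keys())

# Base record "$.data[*]": one record per element of the data array.
customer_records = sample_response["data"]

print(top_level_fields)
for record in customer_records:
    print(record["id"], record["email"])
```

Switching the base record to $.data[*] is what turns a single wrapper record into one row per customer.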

thumbnail image

The component is now parsing the customer fields that I want. I click Select All and Save the component. I have completed the configuration of the REST API source component to ingest Stripe API data.

thumbnail image

#2 Pendo API - POST method with an API key

In this next example, I need to extract data from Pendo. I search for “Pendo API documentation” on the internet and find it at https://engageapi.pendo.io. I see “Getting Started” in the left sidebar. This section of the documentation tells me that I need an Integration Key in order to access the API and walks me through the steps I need to follow to get one.

thumbnail image

Next, I know I want aggregation data so I click on “Aggregation” in the sidebar. I see “Aggregation Endpoint Examples” so I click on that and look at the “Accounts - Today” example:

thumbnail image

This section of the documentation will give me the rest of the information I need. It tells me:

  • The API key is passed in an X-Pendo-Integration-Key header.

  • The URL is https://app.pendo.io/api/v1/aggregation

  • The method is POST. A POST method means we are sending data to the API, so the request includes a Body field and a Content-type header that tells the API what format the data in the Body field is in. Common types are:

    • JSON which requires an application/json content-type header 

    • Url-encoded which requires an application/x-www-form-urlencoded content-type header

  • There are two required headers and a body field:

    • X-Pendo-Integration-Key : <PENDO_INTEGRATION_KEY>

    • Content-type : application/json

    • You can click on View More to see the entire Body field of this example call. 

  • Within the Body field, we are passing a request to return the data in JSON format.

I can see the curl command in the dark gray box on the right and how each of those pieces of information is being passed to the API.
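
The same request can be sketched in Python to show where each piece goes. The request is built but not sent. The integration key is a placeholder, and the body below is only an illustrative guess at the shape of an aggregation request that asks for a JSON response; copy the real body from the example in the Pendo documentation.

```python
import json
import urllib.request

PENDO_INTEGRATION_KEY = "your-integration-key"  # placeholder
URL = "https://app.pendo.io/api/v1/aggregation"

# The two required headers.
headers = {
    "X-Pendo-Integration-Key": PENDO_INTEGRATION_KEY,
    "Content-Type": "application/json",
}

# Illustrative body only (assumed shape); use the documented example body.
body = {
    "response": {"mimeType": "application/json"},
    "request": {"pipeline": [{"source": {"accounts": None}}]},
}
body_bytes = json.dumps(body).encode("utf-8")

# A POST request carries the body as its data.
request = urllib.request.Request(
    URL, data=body_bytes, headers=headers, method="POST"
)
print(request.get_method(), request.full_url)
```

The Content-Type header and the encoding of the body must agree: a JSON body goes with application/json.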

Now I can configure the REST API source component in a dataflow package within Integrate.io.

thumbnail image

After configuring this part of the component, the rest of the steps are the same as in the first example with the Stripe API.

#3 Dataloop API - OAuth2.0 with an expiring JWT

In this example, I need to extract data from the Dataloop API. I search for “Dataloop API documentation” on the internet and find it at https://dataloop.ai/docs/rest-api-connection. The Dataloop API documentation opens right up to API Authentication. I read that for an M2M (machine-to-machine) login from an external system (like Integrate.io ETL), the API requires a JWT (JSON Web Token) to be passed with every request for data. The documentation tells me that I need an Integration Key in order to access the API and walks me through the steps I need to follow to get one.

thumbnail image

Reading on, I find out that I need to pass my username and password to the API in order to get a JWT, but the JWT will expire in 24 hours.

thumbnail image

Assuming a successful response, the body of the API response will contain an access token, an id token, and a refresh token. I will need to pass the id token.

thumbnail image

This means that I need to call the token endpoint before I can make a request for data. Given these requirements, the setup for ingesting from this API will involve a couple of additional steps compared to the first two examples.

OAuth 2.0 flows like the one the Dataloop API uses are becoming increasingly common. OAuth 2.0 uses Access Tokens. An Access Token is a piece of data that represents the authorization to access resources on behalf of the end user. At a high level, the process follows this flow:

  1. The Client requests authorization (authorization request) from the Authorization server, supplying the client id and secret as identification and defining the grant type. Grants are the set of steps a Client has to perform to get resource access authorization. A system like the Integrate.io ETL platform will often use the Client Credentials grant type, which is used for non-interactive applications (e.g., automated processes and microservices).

  2. The Authorization server authenticates the Client and responds with either an Authorization Code or Access Token, depending on the grant type. With the Client Credentials grant type it will be an access token.

  3. With the Access Token, the Client requests access to the resource from the Resource server.

For more information on how OAuth2.0 works, see this article.
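
The three steps above can be sketched in Python for the Client Credentials grant type. Everything specific here is hypothetical: the two example.com URLs, the client id and secret, and the token value are placeholders, and no request is actually sent. The sketch only shows what the Client sends in step 1 and step 3.

```python
import urllib.request
from urllib.parse import urlencode

TOKEN_URL = "https://auth.example.com/oauth/token"  # hypothetical
RESOURCE_URL = "https://api.example.com/v1/items"   # hypothetical


def build_token_request(client_id, client_secret):
    """Step 1: ask the Authorization server for an access token."""
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_resource_request(access_token):
    """Step 3: present the access token to the Resource server."""
    return urllib.request.Request(
        RESOURCE_URL,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


token_request = build_token_request("my-client-id", "my-client-secret")
# Step 2 happens on the server; here we pretend it returned this token.
access_token = "example-access-token"
resource_request = build_resource_request(access_token)
print(token_request.get_method(), resource_request.get_method())
```

Individual APIs vary in how they expect the credentials and the token to be passed, so treat the documentation of the API you are calling as the authority.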

Now we will configure this in Integrate.io ETL: In order to facilitate calling the authorization endpoint before making a request for the data, I will use a Curl function in the variables section. When a job runs, the variables evaluate first. Thus, I will make a call to the authorization endpoint, retrieve my access token, and then pass the token variable into the dataflow which executes next. Let me show you how.

First I open the variables section (the button with the three dots on it in the upper right section of the dashboard). I create a package variable to hold each of the credentials I will be passing to the authorization endpoint. For instance, in our Dataloop example above, I need to pass in a username, password, and type. Keep in mind that these values all need to be string data types in order to be passed into the Curl function, so enclose them in single quotes. Next, I will create a variable for the URL and one for the Content-type header. The header must be formatted as JSON, so I follow this syntax: {"key":"value"}

thumbnail image

Next I will create a variable named “body” that will hold the JSON body field with our credentials that we’re sending to the authorization endpoint. After naming the variable, I click on the pencil and paper icon in the Expression field to open the Expression Editor. 

thumbnail image

I will use a CONCAT function to combine the literal string parts with the variables I’m using to store the credentials. Make sure you remember to enclose each literal string in single quotes and separate the arguments with commas. See this doc for more information on the CONCAT function.
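
The effect of the CONCAT expression can be illustrated in Python. This is an analogy rather than Integrate.io expression syntax: literal string parts are joined with the credential variables to form one JSON string. The credential values are placeholders, and the field names simply follow the username, password, and type mentioned above.

```python
import json

# Placeholder credentials standing in for the package variables.
username = "user@example.com"
password = "example-password"
login_type = "example_type"

# Equivalent of CONCAT: literal pieces joined with the variable values.
body = (
    '{"username":"' + username
    + '","password":"' + password
    + '","type":"' + login_type + '"}'
)
print(body)

# Sanity check: the concatenated string must parse as valid JSON.
parsed = json.loads(body)
print(parsed["username"])
```

A misplaced quote or comma in one of the literal pieces produces invalid JSON, which is the most common reason the authorization call fails, so it is worth validating the resulting string once.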

thumbnail image

Next I will create the variable where I’ll write the Curl function to call the authorization endpoint. You can see the syntax for this function in the tool tip. The username and password parameters mentioned here are for Basic authentication, so I won’t be using them. The only two required parameters are the URL and method. The Curl function will return a map that includes headers, body, and status. After the closing parenthesis of the function you’ll see #'body'. That syntax tells the function that I only want the body of the API response. See this doc for more information on the Curl function.

thumbnail image

I will create one more variable to parse out the id_token from the body of the API response. The documentation above tells me it will come back in a JSON object with an access token, an id token, and a refresh token. I will use the JsonExtractScalar function to parse out the id_token. See this doc for more information on the JsonExtractScalar function.
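
In Python terms, this extraction step amounts to parsing the response body and reading one key. The sketch below is an analogy for what the variable does, not the JsonExtractScalar function itself. The sample body is invented, and the key names access_token and refresh_token are assumed by analogy with id_token.

```python
import json

# Invented sample of the authorization response body.
response_body = (
    '{"access_token": "example-access", '
    '"id_token": "example.id.token", '
    '"refresh_token": "example-refresh"}'
)

# Parse the JSON object and keep only the id_token value.
id_token = json.loads(response_body)["id_token"]
print(id_token)
```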

thumbnail image

Here are the eight variables I created. Keep in mind that the variables evaluate from top to bottom, so I can’t use a variable in a subsequent step if I haven’t declared it above.

thumbnail image

Now that I have my id_token I’m ready to make a request for data. I will use this sample request described here in the API documentation:

thumbnail image

Finally I can configure the REST API source component and call the id_token variable. The $ is used in the Integrate.io ETL platform to indicate that something is a variable.

thumbnail image

When the job runs, the variables will evaluate and by the time this component in the dataflow executes, that id_token variable will be replaced with the id_token value. However, while you are setting up the REST API source component you need it to call the API and return the data in order to be able to complete the setup and save the component. Place a hardcoded token value in place of the $id_token so that the component will load. Then once you are finished building the package, go back and replace the hardcoded value with the variable name and re-save.
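
The substitution described above works much like a string template. The Python sketch below is only an analogy: it does not reflect how Integrate.io implements variables, and the "Bearer $id_token" header value is an assumed example of where the variable might be referenced.

```python
from string import Template

# Assumed example of a header value that references the variable.
header_template = Template("Bearer $id_token")

# While building the package: a hardcoded token so the component can load.
design_time_value = header_template.substitute(id_token="hardcoded.test.token")

# When the job runs: the variable has already been evaluated.
id_token = "example.id.token"
run_time_value = header_template.substitute(id_token=id_token)

print(design_time_value)
print(run_time_value)
```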

Conclusion

We hope that after reading through this guide you have a better idea of how to set up your dataflow packages to access data from your applications and systems via their REST APIs. If you still have questions, don't hesitate to reach out to our fantastic support team via the live chat in the platform. We're here to help!