Integrate.io ETL can access your Google Drive data. This article details creating the Google Drive connection in Integrate.io ETL.
To create a Google Drive connection in Integrate.io ETL:
- Click the Connections icon (lightning bolt) on the top left menu.
- To create a connection, click New connection.
- Choose Google Drive.
- Click Authenticate. (If required, sign in to Google Drive)
- Click Connect to authorize Integrate.io ETL access to your Google Drive account.
- In the new new Google Drive connection window, name the connection and click Create Google Drive connection.
To modify Google Drive connections in Integrate.io ETL:
- Click the Connections icon (lightning bolt) on the top left menu.
- Click a connection to re-connect if your password has changed or the token has expired.
-
To delete a Google Drive connection, click on the three vertical dots on the far right of the connection listing and select the Delete connection option.
To build a data pipeline in Integrate.io ETL to read files (.txt or .csv) stored in Google Drive:
REST API Component: get_file_ids
- The first call to the Google Drive API will be to search for the files you want to extract and get their ids.
- Authentication - Click Connection and then select your Google Drive connection. If you haven't created your Google Drive connection yet, click + New and follow the instructions found above.
- URL - Enter the URL for the Google Drive API: https://www.googleapis.com/drive/v3/files?q=$query_string. Make sure the method is set to GET. Replace the variable $query_string with your own URL encoded query string. The query string is how you pass your search parameters to the API in order to find the files you want to download. For other examples of query strings, see this doc.
- Response - Type $.files[*] in the Base record JSONPath Expression field.
- Input fields - Click + icon to the right of the 'id' field to move it over to the Selected fields. Then click Save.
Select Component: get_files
- The second call to the Google Drive API will be to pass those ids to the API and get the files that match them.
- Type this expression in the Expression Editor: CCurl(CONCAT('https://www.googleapis.com/drive/v3/files/', id, '?alt=media'), 'GET', '', '', '$google_drive_connection_id')#'body'. Replace the variable $google_drive_connection_id with your own unique Google Drive connection id string. This can be found in the Connections menu -- find your GoogleDrive connection and there will be a unique connection id string beneath the connection name. Click on it to copy it. (It will look something like this: GOOGLEDRIVE_CONNECTION_10818)
Select Component: flatten_split_records
- Pass the api_response_body field into the function STRSPLITTOBAG along with the newline character as the delimiter regex and the result of that into Flatten (see these docs for more information on the STRSPLITTOBAG or Flatten functions) - Flatten(STRSPLITTOBAG(api_response_body, '\n')). Click Save.
Filter Component: remove_header
- Select the NOT condition from the dropdown menu. Then select the line field from the dropdown menu on the Expression side, then select the Text matches (regex) in the Operator dropdown menu, and lastly type .*add_time.* in the Value/Expression field. This will remove the record which contains the column headers from the data set.
Select Component: split_line
- Pass the line field into the CSVSPLIT function with the comma selected as the delimiter regex (for more information on the CSVSPLIT function see this article) - CSVSPLIT(line, ','). Click Save.
Select Component: extract_fields
- After the previous component executes the line field is a tuple. You can parse the tuple into its individual items using the syntax field_name.$0, field_name.$1, field_name.$2, etc. You can see this in the examples of this doc. When you parse the tuple, the items are returned as byte array datatypes. Explicitly cast them to the correct datatype. In the template, all fields have been cast to chararray (string). If you prefer a different datatype, change chararray to one of the datatypes that Integrate.io supports. See this doc for supported datatypes and methods of casting and parsing. Last, add fields up to the number of columns in your schema by clicking the + icon to the right of any of the fields and rename the aliases according to your column names. For instance, if field_0 is the 'id' column, type 'id' in the alias for that field.
Destination Component: drive_data
- The template shows a Redshift destination component; however, if you'd prefer to use a different destination, delete the Redshift component and select a destination component of your choice.
- Choose target connection- Select your target connection. If you haven't created your connection yet, click+ New.
- Destination properties - Fill in the values for target schema and table, select an operation type, pre or post-action SQL, and advanced options.
-
Schema mapping - Click Auto-fill to bring in all of the fields. If you've selected a Merge operation type, click the Key box next to the merge key field(s). Click Save.