Skip to main content

Documentation Index

Fetch the complete documentation index at: https://www.integrate.io/docs/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Integrate.io uses BigQuery’s streaming insert API to ingest data into BigQuery. This requires billing to be enabled on the Google Cloud project.

Requirements

  • Billing must be enabled on the Google Cloud project.
  • Ensure you have permission to create a service account with the BigQuery Data Editor role on the project.

Setting up BigQuery for ELT & CDC

Create service account

Creating a service account for FlyData to use requires the BigQuery Data Editor role.
1
Go to your project’s service accounts page and click + CREATE SERVICE ACCOUNT.
Google Cloud service accounts page with Create Service Account button
2
Enter the Service account name and add a description.
Entering service account name and description
Click CREATE AND CONTINUE.
3
Select the BigQuery Data Editor role.
Selecting the BigQuery Data Editor role for the service account
4
Select the BigQuery Job User role.
Click CONTINUE.
5
Click DONE.
6
The service account should have been created successfully. Find it in the service accounts list and click it.
Service accounts list showing the newly created account
7
Go to the KEYS tab.
8
Click ADD KEY and select CREATE NEW KEY.
Keys tab with Add Key dropdown showing Create New Key option
9
Select JSON and click CREATE.
Selecting JSON key type for the service account credentials
10
A JSON file should have been downloaded to your system. Upload it.We will fetch the client_email and private_key properties from the JSON credential file.

Limitations

BigQuery doesn’t support deduplication. Instead, the following columns are added to the destination table for future deduplication on the customer’s side. __offset - Offset column. Keeps track of the offset in incremental order to determine which existing row is the latest.
__op - Operation column. Tells what kind of operation is performed for a particular row (0 - Created, 1 - Inserted, 2 - Updated, 3 - Deleted)
This query is to list the de-duplicated records:
SELECT customer.*
FROM (SELECT id,
             ARRAY_AGG(
                     t ORDER BY t.__offset DESC LIMIT 1
                 )[OFFSET(0)] customer
      FROM \`euphoric-effect-296205.northwind.customer\` t
      GROUP BY id);
This query is to upsert a staging table that Integrate.io is syncing to, into a production table used for querying:
MERGE INTO \`northwind\`.customers o
    USING (SELECT event.*
           FROM (SELECT customer_id,
                        ARRAY_AGG(
                                t ORDER BY t.__offset DESC LIMIT 1
                            )[OFFSET(0)] event
                 FROM customers_staging_1 t
                 GROUP BY customer_id)) as s
    ON o.customer_id = s.customer_id
    WHEN MATCHED AND s.__op = 3 THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET o.company_name = s.company_name, o.__op = s.__op
    WHEN NOT MATCHED THEN
        INSERT ROW
After deleting tables on BigQuery, we have to wait (5+ minutes) to let the change propagate to all the machines in the BigQuery cluster. Otherwise, it might lead to some rows missing. This happens on resync requests. Please contact support@integrate.io to assist with the resync timing of user tables.

Data Type Mapping - BigQuery

IP Allowlist

Initial Sync Process

Last modified on April 20, 2026