Skip to main content

Overview

Integrate.io uses BigQuery’s streaming insert API to ingest data into BigQuery. This requires billing to be enabled on the Google Cloud project.

Requirements

  • Billing must be enabled on the Google Cloud project.
  • Ensure you have permission to create a service account with the BigQuery Data Editor role on the project.

Setting up BigQuery for ELT & CDC

Create service account

Creating a service account for FlyData to use requires the BigQuery Data Editor role.
1
Go to your project’s service accounts page and click + CREATE SERVICE ACCOUNT.
Google Cloud service accounts page with Create Service Account button
2
Enter the Service account name and add a description.
Entering service account name and description
Click CREATE AND CONTINUE.
3
Select the BigQuery Data Editor role.
Selecting the BigQuery Data Editor role for the service account
4
Select the BigQuery Job User role.
Click CONTINUE.
5
Click DONE.
6
The service account should have been created successfully. Find it in the service accounts list and click it.
Service accounts list showing the newly created account
7
Go to the KEYS tab.
8
Click ADD KEY and select CREATE NEW KEY.
Keys tab with Add Key dropdown showing Create New Key option
9
Select JSON and click CREATE.
Selecting JSON key type for the service account credentials
10
A JSON file should have been downloaded to your system. Upload it.We will fetch the client_email and private_key properties from the JSON credential file.

Limitations

BigQuery doesn’t support deduplication. Instead, the following columns are added to the destination table for future deduplication on the customer’s side. __offset - Offset column. Keeps track of the offset in incremental order to determine which existing row is the latest.
__op - Operation column. Tells what kind of operation is performed for a particular row (0 - Created, 1 - Inserted, 2 - Updated, 3 - Deleted)
This query is to list the de-duplicated records:
SELECT customer.*
FROM (SELECT id,
             ARRAY_AGG(
                     t ORDER BY t.__offset DESC LIMIT 1
                 )[OFFSET(0)] customer
      FROM \`euphoric-effect-296205.northwind.customer\` t
      GROUP BY id);
This query is to upsert a staging table that Integrate.io is syncing to, into a production table used for querying:
MERGE INTO \`northwind\`.customers o
    USING (SELECT event.*
           FROM (SELECT customer_id,
                        ARRAY_AGG(
                                t ORDER BY t.__offset DESC LIMIT 1
                            )[OFFSET(0)] event
                 FROM customers_staging_1 t
                 GROUP BY customer_id)) as s
    ON o.customer_id = s.customer_id
    WHEN MATCHED AND s.__op = 3 THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET o.company_name = s.company_name, o.__op = s.__op
    WHEN NOT MATCHED THEN
        INSERT ROW
After deleting tables on BigQuery, we have to wait (5+ minutes) to let the change propagate to all the machines in the BigQuery cluster. Otherwise, it might lead to some rows missing. This happens on resync requests. Please contact support@integrate.io to assist with the resync timing of user tables.

Data Type Mapping - BigQuery

IP Allowlist

Initial Sync Process

Last modified on April 20, 2026