> ## Documentation Index
> Fetch the complete documentation index at: https://www.integrate.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# BigQuery destination for ELT & CDC

> Configure Google BigQuery as a destination in Integrate.io ELT & CDC. Set up your BigQuery dataset to receive replicated source data.

## Overview

Integrate.io uses BigQuery's [streaming insert API](https://cloud.google.com/bigquery/streaming-data-into-bigquery) to ingest data into BigQuery. This requires billing to be enabled on the Google Cloud project.

### Requirements

* Billing must be enabled on the Google Cloud project.
* Ensure you have permission to create a service account with the `BigQuery Data Editor` role on the project.

### Setting up BigQuery for ELT & CDC

#### Create service account

Creating a service account for FlyData to use requires the BigQuery Data Editor role.

<Steps>
  <Step>
    Go to your project's [service accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts) and click `+ CREATE SERVICE ACCOUNT`.

    <Frame>
      <img src="https://mintcdn.com/integrateio/MGWLTifrhXADDsEf/images/cdc/destinations/image-18.png?fit=max&auto=format&n=MGWLTifrhXADDsEf&q=85&s=fe78c185efd399f6c532d8b0608ccea5" alt="Google Cloud service accounts page with Create Service Account button" width="540" height="224" data-path="images/cdc/destinations/image-18.png" />
    </Frame>
  </Step>

  <Step>
    Enter the `Service account name` and add a description.

    <Frame>
      <img src="https://mintcdn.com/integrateio/MGWLTifrhXADDsEf/images/cdc/destinations/image-19.png?fit=max&auto=format&n=MGWLTifrhXADDsEf&q=85&s=c087333bde3211be0752b4b85a0e4002" alt="Entering service account name and description" width="546" height="352" data-path="images/cdc/destinations/image-19.png" />
    </Frame>

    Click `CREATE AND CONTINUE`.
  </Step>

  <Step>
    Select the `BigQuery Data Editor` role.

    <Frame>
      <img src="https://mintcdn.com/integrateio/MGWLTifrhXADDsEf/images/cdc/destinations/image-20.png?fit=max&auto=format&n=MGWLTifrhXADDsEf&q=85&s=004f79f72af01af3fcd493ef52d74265" alt="Selecting the BigQuery Data Editor role for the service account" width="545" height="533" data-path="images/cdc/destinations/image-20.png" />
    </Frame>
  </Step>

  <Step>
    Select the `BigQuery Job User` role.\
    Click `CONTINUE`.
  </Step>

  <Step>
    Click `DONE`.
  </Step>

  <Step>
    The service account should have been created successfully. Find it in the [service accounts list](https://console.cloud.google.com/iam-admin/serviceaccounts) and click it.

    <Frame>
      <img src="https://mintcdn.com/integrateio/MGWLTifrhXADDsEf/images/cdc/destinations/image-21.png?fit=max&auto=format&n=MGWLTifrhXADDsEf&q=85&s=71753920929aee56c2301b4380ddcb3a" alt="Service accounts list showing the newly created account" width="1200" height="55" data-path="images/cdc/destinations/image-21.png" />
    </Frame>
  </Step>

  <Step>
    Go to the `KEYS` tab.
  </Step>

  <Step>
    Click `ADD KEY` and select `CREATE NEW KEY`.

    <Frame>
      <img src="https://mintcdn.com/integrateio/MGWLTifrhXADDsEf/images/cdc/destinations/image-22.png?fit=max&auto=format&n=MGWLTifrhXADDsEf&q=85&s=dcde905418951172e774a604c4184fdd" alt="Keys tab with Add Key dropdown showing Create New Key option" width="585" height="413" data-path="images/cdc/destinations/image-22.png" />
    </Frame>
  </Step>

  <Step>
    Select `JSON` and click `CREATE`.

    <Frame>
      <img src="https://mintcdn.com/integrateio/MGWLTifrhXADDsEf/images/cdc/destinations/image-23.png?fit=max&auto=format&n=MGWLTifrhXADDsEf&q=85&s=8659a1b316e9774bd6e73570d860483e" alt="Selecting JSON key type for the service account credentials" width="599" height="390" data-path="images/cdc/destinations/image-23.png" />
    </Frame>
  </Step>

  <Step>
    A JSON file should have been downloaded to your system. Upload it.

    We will fetch the `client_email` and `private_key` properties from the JSON credential file.
  </Step>
</Steps>

#### Limitations

BigQuery doesn’t support deduplication. Instead, the following columns are added to the destination table for future deduplication on the customer’s side.

**\_\_offset** - Offset column. Keeps track of the offset in incremental order to determine which existing row is the latest.\
**\_\_op** - Operation column. Tells what kind of operation is performed for a particular row (0 - Created, 1 - Inserted, 2 - Updated, 3 - Deleted)

This query is to list the de-duplicated records:

```sql theme={null}
SELECT customer.*
FROM (SELECT id,
             ARRAY_AGG(
                     t ORDER BY t.__offset DESC LIMIT 1
                 )[OFFSET(0)] customer
      FROM \`euphoric-effect-296205.northwind.customer\` t
      GROUP BY id);
```

This query is to upsert a staging table that Integrate.io is syncing to, into a production table used for querying:

```sql theme={null}
MERGE INTO \`northwind\`.customers o
    USING (SELECT event.*
           FROM (SELECT customer_id,
                        ARRAY_AGG(
                                t ORDER BY t.__offset DESC LIMIT 1
                            )[OFFSET(0)] event
                 FROM customers_staging_1 t
                 GROUP BY customer_id)) as s
    ON o.customer_id = s.customer_id
    WHEN MATCHED AND s.__op = 3 THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET o.company_name = s.company_name, o.__op = s.__op
    WHEN NOT MATCHED THEN
        INSERT ROW
```

After deleting tables on BigQuery, we have to wait (5+ minutes) to let the change propagate to all the machines in the BigQuery cluster. Otherwise, it might lead to some rows missing. This happens on resync requests. Please contact [support@integrate.io](mailto:support@integrate.io) to assist with the resync timing of user tables.

## Related

<CardGroup cols={2}>
  <Card title="Data Type Mapping - BigQuery" icon="arrow-right" href="/cdc/data-type-mapping-bigquery" horizontal />

  <Card title="IP Allowlist" icon="arrow-right" href="/cdc/ip-list" horizontal />

  <Card title="Initial Sync Process" icon="arrow-right" href="/cdc/initial-sync-process" horizontal />
</CardGroup>
