> ## Documentation Index
> Fetch the complete documentation index at: https://www.integrate.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# BigQuery source for ELT & CDC

> Configure Google BigQuery as a data source in Integrate.io ELT & CDC. Replicate BigQuery datasets and tables to other warehouse destinations.

## Overview

|                           |                                                                                                                     |
| :------------------------ | :------------------------------------------------------------------------------------------------------------------ |
| **Description**           | Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data |
| **Link**                  | [https://cloud.google.com/bigquery](https://cloud.google.com/bigquery)                                              |
| **Type**                  | Full Table Sync + Change Data Capture (CDC)                                                                         |
| **Supported Replication** | Initial Sync, Continuous Sync                                                                                       |
| **Authentication Type**   | Service Account (JSON Key)                                                                                          |

### Required Roles

Your service account needs these three roles in Google Cloud:

| Role                           | Why it's needed                                 |
| :----------------------------- | :---------------------------------------------- |
| **BigQuery Data Viewer**       | Allows reading data from your tables            |
| **BigQuery Job User**          | Allows running queries to detect changes        |
| **BigQuery Read Session User** | Allows fast data extraction during initial sync |

### Required Configuration

To capture updates and deletes (not just inserts), you need to enable **change history** on your BigQuery tables. See the section below for instructions.

### Enabling Change History

To enable change history on a table for continuous sync with CHANGES TVF:

**Single table:**

```sql theme={null}
ALTER TABLE \`project.dataset.your_table\`
SET OPTIONS (enable_change_history = TRUE);
```

**All tables in a dataset:**

```sql theme={null}
DECLARE stmt STRING;

FOR tbl IN (
  SELECT table_name
  FROM \`project.dataset.INFORMATION_SCHEMA.TABLES\`
  WHERE table_type = 'BASE TABLE'
)
DO
  SET stmt = FORMAT("""
    ALTER TABLE \`project.dataset.%s\`
    SET OPTIONS (enable_change_history = TRUE)
  """, tbl.table_name);

  EXECUTE IMMEDIATE stmt;
END FOR;
```

### Continuous Sync Modes

| Mode            | What it captures              | Setup required                       |
| :-------------- | :---------------------------- | :----------------------------------- |
| **All Changes** | Inserts, updates, and deletes | Enable change history on your tables |
| **Append Only** | New rows only (inserts)       | No setup required                    |

### Limitations

* **Pipeline downtime**: If your pipeline is stopped for more than 7 days, you may need to perform a full resync. BigQuery only retains change data for up to 7 days by default.
* **Change history required for updates/deletes**: To capture updates and deletes, you must enable change history on your tables. Without it, only new rows (inserts) will be replicated.
* **Nested data**: Complex nested fields (STRUCT, ARRAY) are converted to JSON strings in the destination.
* **Pipeline restart required**: If you enable change history on a table that's already syncing, restart the pipeline for the changes to take effect.

### Tables Without Primary Keys

BigQuery tables that don't have a primary key are fully supported. When a table has no primary key, the system automatically generates a surrogate key column called `_iio_id`.

**How it works:**

* `_iio_id` is a **deterministic SHA-256 hash** computed from all column values in each row
* No user configuration is required; surrogate key generation is enabled by default

Tables without primary keys use **Append Only** replication. Only new rows (inserts) are captured. Updates and deletes are not tracked since there is no stable row identifier in the source table.
