Overview
| Description |
Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data |
| Link |
https://cloud.google.com/bigquery |
| Type |
Full Table Sync + Change Data Capture (CDC) |
| Supported Replication |
Initial Sync, Continuous Sync |
| Authentication Type |
Service Account (JSON Key) |
Required Roles
Your service account needs these three roles in Google Cloud:
| Role |
Why it's needed |
| BigQuery Data Viewer |
Allows reading data from your tables |
| BigQuery Job User |
Allows running queries to detect changes |
| BigQuery Read Session User |
Allows fast data extraction during initial sync |
Required Configuration
To capture updates and deletes (not just inserts), you need to enable change history on your BigQuery tables. See the section below for instructions.
Enabling Change History
To enable change history on a table for continuous sync with CHANGES TVF:
Single table:
ALTER TABLE `project.dataset.your_table`
SET OPTIONS (enable_change_history = TRUE);
All tables in a dataset:
DECLARE stmt STRING;
FOR tbl IN (
SELECT table_name
FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_type = 'BASE TABLE'
)
DO
SET stmt = FORMAT("""
ALTER TABLE `project.dataset.%s`
SET OPTIONS (enable_change_history = TRUE)
""", tbl.table_name);
EXECUTE IMMEDIATE stmt;
END FOR;
Continuous Sync Modes
| Mode |
What it captures |
Setup required |
| All Changes |
Inserts, updates, and deletes |
Enable change history on your tables |
| Append Only |
New rows only (inserts) |
No setup required |
Limitations
- Pipeline downtime: If your pipeline is stopped for more than 7 days, you may need to perform a full resync. BigQuery only retains change data for up to 7 days by default.
- Change history required for updates/deletes: To capture updates and deletes, you must enable change history on your tables. Without it, only new rows (inserts) will be replicated.
- Nested data: Complex nested fields (STRUCT, ARRAY) are converted to JSON strings in the destination.
- Pipeline restart required: If you enable change history on a table that's already syncing, restart the pipeline for the changes to take effect.
Tables Without Primary Keys
BigQuery tables that don't have a primary key are fully supported. When a table has no primary key, the system automatically generates a surrogate key column called _iio_id.
How it works:
_iio_id is a deterministic SHA-256 hash computed from all column values in each row
- No user configuration is required — surrogate key generation is enabled by default
Tables without primary keys use Append Only replication. Only new rows (inserts) are captured. Updates and deletes are not tracked since there is no stable row identifier in the source table.