Sources - BigQuery

Overview

Description Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data
Link https://cloud.google.com/bigquery
Type Full Table Sync + Change Data Capture (CDC)
Supported Replication Initial Sync, Continuous Sync
Authentication Type Service Account (JSON Key)

Required Roles

Your service account needs these three roles in Google Cloud:

Role Why it's needed
BigQuery Data Viewer Allows reading data from your tables
BigQuery Job User Allows running queries to detect changes
BigQuery Read Session User Allows fast data extraction during initial sync

Required Configuration

To capture updates and deletes (not just inserts), you need to enable change history on your BigQuery tables. See the section below for instructions.

Enabling Change History

To enable change history on a table for continuous sync with CHANGES TVF:

Single table:

ALTER TABLE `project.dataset.your_table`
SET OPTIONS (enable_change_history = TRUE);

All tables in a dataset:

DECLARE stmt STRING;

FOR tbl IN (
  SELECT table_name
  FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
  WHERE table_type = 'BASE TABLE'
)
DO
  SET stmt = FORMAT("""
    ALTER TABLE `project.dataset.%s`
    SET OPTIONS (enable_change_history = TRUE)
  """, tbl.table_name);

  EXECUTE IMMEDIATE stmt;
END FOR;

Continuous Sync Modes

Mode What it captures Setup required
All Changes Inserts, updates, and deletes Enable change history on your tables
Append Only New rows only (inserts) No setup required

Limitations

  • Pipeline downtime: If your pipeline is stopped for more than 7 days, you may need to perform a full resync. BigQuery only retains change data for up to 7 days by default.
  • Change history required for updates/deletes: To capture updates and deletes, you must enable change history on your tables. Without it, only new rows (inserts) will be replicated.
  • Nested data: Complex nested fields (STRUCT, ARRAY) are converted to JSON strings in the destination.
  • Pipeline restart required: If you enable change history on a table that's already syncing, restart the pipeline for the changes to take effect.

Tables Without Primary Keys

BigQuery tables that don't have a primary key are fully supported. When a table has no primary key, the system automatically generates a surrogate key column called _iio_id.

How it works:

  • _iio_id is a deterministic SHA-256 hash computed from all column values in each row
  • No user configuration is required — surrogate key generation is enabled by default

Tables without primary keys use Append Only replication. Only new rows (inserts) are captured. Updates and deletes are not tracked since there is no stable row identifier in the source table.