Sources

Sources - BigQuery

Overview

Description	Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data
Link	https://cloud.google.com/bigquery
Type	Full Table Sync + Change Data Capture (CDC)
Supported Replication	Initial Sync, Continuous Sync
Authentication Type	Service Account (JSON Key)

Required Roles

Your service account needs these three roles in Google Cloud:

Role	Why it's needed
BigQuery Data Viewer	Allows reading data from your tables
BigQuery Job User	Allows running queries to detect changes
BigQuery Read Session User	Allows fast data extraction during initial sync

Required Configuration

To capture updates and deletes (not just inserts), you need to enable change history on your BigQuery tables. See the section below for instructions.

Enabling Change History

To enable change history on a table for continuous sync with CHANGES TVF:

Single table:

ALTER TABLE `project.dataset.your_table`
SET OPTIONS (enable_change_history = TRUE);

All tables in a dataset:

DECLARE stmt STRING;

FOR tbl IN (
  SELECT table_name
  FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
  WHERE table_type = 'BASE TABLE'
)
DO
  SET stmt = FORMAT("""
    ALTER TABLE `project.dataset.%s`
    SET OPTIONS (enable_change_history = TRUE)
  """, tbl.table_name);

  EXECUTE IMMEDIATE stmt;
END FOR;

Continuous Sync Modes

Mode	What it captures	Setup required
All Changes	Inserts, updates, and deletes	Enable change history on your tables
Append Only	New rows only (inserts)	No setup required

Limitations

Pipeline downtime: If your pipeline is stopped for more than 7 days, you may need to perform a full resync. BigQuery only retains change data for up to 7 days by default.
Change history required for updates/deletes: To capture updates and deletes, you must enable change history on your tables. Without it, only new rows (inserts) will be replicated.
Nested data: Complex nested fields (STRUCT, ARRAY) are converted to JSON strings in the destination.
Pipeline restart required: If you enable change history on a table that's already syncing, restart the pipeline for the changes to take effect.

Tables Without Primary Keys

BigQuery tables that don't have a primary key are fully supported. When a table has no primary key, the system automatically generates a surrogate key column called _iio_id.

How it works:

_iio_id is a deterministic SHA-256 hash computed from all column values in each row
No user configuration is required — surrogate key generation is enabled by default

Tables without primary keys use Append Only replication. Only new rows (inserts) are captured. Updates and deletes are not tracked since there is no stable row identifier in the source table.

ELT & CDC
Knowledge Base

ELT & CDC Knowledge base

Getting Started

9 Articles

60 Articles

Destinations

7 Articles

Monitoring

7 Articles

Security

5 Articles

Transforms

1 Articles

Data Observability

10 Articles

Sources - BigQuery

Overview

Required Roles

Required Configuration

Enabling Change History

Continuous Sync Modes

Limitations

Tables Without Primary Keys

ELT & CDC Knowledge Base

ELT & CDC Knowledge base

Getting Started

9 Articles

Sources

60 Articles

Destinations

7 Articles

Monitoring

7 Articles

Security

5 Articles

Transforms

1 Articles

Data Observability

10 Articles

Sources - BigQuery

Overview

Required Roles

Required Configuration

Enabling Change History

Continuous Sync Modes

Limitations

Tables Without Primary Keys

See Also

ELT & CDC
Knowledge Base