Overview
Many data warehouse tables (e.g., BigQuery views, external tables, or tables created without constraints) do not define primary key columns. Without a primary key, Integrate.io cannot track rows for deduplication or incremental sync. The Surrogate Key feature solves this by automatically generating a deterministic_iio_id column, a SHA-256 hash computed from all user data columns in each row. This hash serves as a synthetic primary key, enabling these tables to be selected and replicated.
Supported Sources
| Source | Supported | Notes |
|---|---|---|
| BigQuery | Yes | Enabled by default |
How It Works
Hash Generation
For each row in a table without a primary key, Integrate.io:
The hash is deterministic: the same row data always produces the same
_iio_id, regardless of which BigQuery API is used to read it.
Example
Consider a BigQuery tableusers with no primary key:
Source Table (BigQuery)
| name | |
|---|---|
| Alice Johnson | alice@example.com |
| Bob Smith | bob@example.com |
| Carol Davis | null |
_iio_id for each row.
Destination Table (after replication)
| _iio_id | name | |
|---|---|---|
bd5d05a59bdf4667... | Alice Johnson | alice@example.com |
22cae61171dea987... | Bob Smith | bob@example.com |
486af6f1101a528b... | Carol Davis | null |
_iio_id column is marked as the primary key in the destination schema, enabling the destination sink to create the appropriate constraint.
How each hash is computed
Columns are sorted alphabetically:email, name.
For Alice Johnson, the hash input is:
SHA-256 → bd5d05a59bdf4667ea52b80fa88d2900c371850a825f538aaf388f6123065333
For Carol Davis, the null email is represented as a null byte (\\0):
SHA-256 → 486af6f1101a528b970ae6a33dae64ba65ae9f1eef7b3e1d609cd41ee258df1c
Limitations
- Append-only sync: Surrogate key tables only capture new rows. Updates and deletes are not tracked because there is no stable identifier to match rows across syncs.
- Schema changes: If columns are added or removed from the source table, the
_iio_idhash for existing rows will change (since different columns are included in the hash). This effectively treats them as new rows after the schema change.