Automatic surrogate primary key generation for source tables that lack real primary keys.
Overview
Many data warehouse tables (e.g., BigQuery views, external tables, or tables created without constraints) do not define primary key columns. Without a primary key, Integrate.io cannot track rows for deduplication or incremental sync.
The Surrogate Key feature solves this by automatically generating a deterministic _iio_id column — a SHA-256 hash computed from all user data columns in each row. This hash serves as a synthetic primary key, enabling these tables to be selected and replicated.
Supported Sources
| Source |
Supported |
Notes |
| BigQuery |
Yes |
Enabled by default |
How It Works
Hash Generation
For each row in a table without a primary key, Integrate.io:
- Collects all user data columns (excludes only the
_iio_id column itself)
- Sorts column names alphabetically for deterministic ordering
- Normalizes each value to a consistent string representation
- Computes a SHA-256 hash of the joined string
- Outputs a 64-character lowercase hex string as
_iio_id
The hash is deterministic — the same row data always produces the same _iio_id, regardless of which BigQuery API is used to read it.
Example
Consider a BigQuery table users with no primary key:
Source Table (BigQuery)
This table has no primary key columns, so Integrate.io automatically generates _iio_id for each row.
Destination Table (after replication)
The _iio_id column is marked as the primary key in the destination schema, enabling the destination sink to create the appropriate constraint.
How each hash is computed
Columns are sorted alphabetically: email, name.
For Alice Johnson, the hash input is:
alice@example.com \\x1F Alice Johnson
SHA-256 → bd5d05a59bdf4667ea52b80fa88d2900c371850a825f538aaf388f6123065333
For Carol Davis, the null email is represented as a null byte (\\0):
\\0 \\x1F Carol Davis
SHA-256 → 486af6f1101a528b970ae6a33dae64ba65ae9f1eef7b3e1d609cd41ee258df1c
Limitations
- Append-only sync: Surrogate key tables only capture new rows. Updates and deletes are not tracked because there is no stable identifier to match rows across syncs.
- Schema changes: If columns are added or removed from the source table, the
_iio_id hash for existing rows will change (since different columns are included in the hash). This effectively treats them as new rows after the schema change.