Surrogate Key (_iio_id)

Automatic surrogate primary key generation for source tables that lack real primary keys.

Overview

Many data warehouse tables (e.g., BigQuery views, external tables, or tables created without constraints) do not define primary key columns. Without a primary key, Integrate.io cannot track rows for deduplication or incremental sync.

The Surrogate Key feature solves this by automatically generating a deterministic _iio_id column — a SHA-256 hash computed from all user data columns in each row. This hash serves as a synthetic primary key, enabling these tables to be selected and replicated.

Supported Sources

Source	Supported	Notes
BigQuery	Yes	Enabled by default

How It Works

Hash Generation

For each row in a table without a primary key, Integrate.io:

Collects all user data columns (excludes only the _iio_id column itself)
Sorts column names alphabetically for deterministic ordering
Normalizes each value to a consistent string representation
Computes a SHA-256 hash of the joined string
Outputs a 64-character lowercase hex string as _iio_id

The hash is deterministic — the same row data always produces the same _iio_id, regardless of which BigQuery API is used to read it.

Example

Consider a BigQuery table users with no primary key:

Source Table (BigQuery)

name	email
Alice Johnson	alice@example.com
Bob Smith	bob@example.com
Carol Davis	null

This table has no primary key columns, so Integrate.io automatically generates _iio_id for each row.

Destination Table (after replication)

_iio_id	name	email
`bd5d05a59bdf4667...`	Alice Johnson	alice@example.com
`22cae61171dea987...`	Bob Smith	bob@example.com
`486af6f1101a528b...`	Carol Davis	null

The _iio_id column is marked as the primary key in the destination schema, enabling the destination sink to create the appropriate constraint.

How each hash is computed

Columns are sorted alphabetically: email, name.

For Alice Johnson, the hash input is:

alice@example.com \\x1F Alice Johnson

SHA-256 → bd5d05a59bdf4667ea52b80fa88d2900c371850a825f538aaf388f6123065333

For Carol Davis, the null email is represented as a null byte (\\0):

\\0 \\x1F Carol Davis

SHA-256 → 486af6f1101a528b970ae6a33dae64ba65ae9f1eef7b3e1d609cd41ee258df1c

Limitations

Append-only sync: Surrogate key tables only capture new rows. Updates and deletes are not tracked because there is no stable identifier to match rows across syncs.
Schema changes: If columns are added or removed from the source table, the _iio_id hash for existing rows will change (since different columns are included in the hash). This effectively treats them as new rows after the schema change.

ELT & CDC
Knowledge Base

ELT & CDC Knowledge base

Getting Started

9 Articles

Sources

60 Articles

Destinations

7 Articles

Monitoring

7 Articles

Security

5 Articles

Transforms

1 Articles

Data Observability

10 Articles

Surrogate Key (_iio_id)

Overview

Supported Sources

How It Works

Hash Generation

Example

Source Table (BigQuery)

Destination Table (after replication)

How each hash is computed

Limitations

ELT & CDC Knowledge Base

ELT & CDC Knowledge base

Getting Started

9 Articles

Sources

60 Articles

Destinations

7 Articles

Monitoring

7 Articles

Security

5 Articles

Transforms

1 Articles

Data Observability

10 Articles

Surrogate Key (_iio_id)

Overview

Supported Sources

How It Works

Hash Generation

Example

Source Table (BigQuery)

Destination Table (after replication)

How each hash is computed

Limitations

See Also

ELT & CDC
Knowledge Base