Surrogate Key (_iio_id)

Automatic surrogate primary key generation for source tables that lack real primary keys.

Overview

Many data warehouse tables (e.g., BigQuery views, external tables, or tables created without constraints) do not define primary key columns. Without a primary key, Integrate.io cannot track rows for deduplication or incremental sync.

The Surrogate Key feature solves this by automatically generating a deterministic _iio_id column — a SHA-256 hash computed from all user data columns in each row. This hash serves as a synthetic primary key, enabling these tables to be selected and replicated.

Supported Sources

Source Supported Notes
BigQuery Yes Enabled by default

How It Works

Hash Generation

For each row in a table without a primary key, Integrate.io:

  1. Collects all user data columns (excludes only the _iio_id column itself)
  2. Sorts column names alphabetically for deterministic ordering
  3. Normalizes each value to a consistent string representation
  4. Computes a SHA-256 hash of the joined string
  5. Outputs a 64-character lowercase hex string as _iio_id

The hash is deterministic — the same row data always produces the same _iio_id, regardless of which BigQuery API is used to read it.

Example

Consider a BigQuery table users with no primary key:

Source Table (BigQuery)

name email
Alice Johnson alice@example.com
Bob Smith bob@example.com
Carol Davis null

This table has no primary key columns, so Integrate.io automatically generates _iio_id for each row.

Destination Table (after replication)

_iio_id name email
bd5d05a59bdf4667... Alice Johnson alice@example.com
22cae61171dea987... Bob Smith bob@example.com
486af6f1101a528b... Carol Davis null

The _iio_id column is marked as the primary key in the destination schema, enabling the destination sink to create the appropriate constraint.

How each hash is computed

Columns are sorted alphabetically: email, name.

For Alice Johnson, the hash input is:

alice@example.com \\x1F Alice Johnson

SHA-256bd5d05a59bdf4667ea52b80fa88d2900c371850a825f538aaf388f6123065333

For Carol Davis, the null email is represented as a null byte (\\0):

\\0 \\x1F Carol Davis

SHA-256486af6f1101a528b970ae6a33dae64ba65ae9f1eef7b3e1d609cd41ee258df1c

Limitations

  • Append-only sync: Surrogate key tables only capture new rows. Updates and deletes are not tracked because there is no stable identifier to match rows across syncs.
  • Schema changes: If columns are added or removed from the source table, the _iio_id hash for existing rows will change (since different columns are included in the hash). This effectively treats them as new rows after the schema change.