PostgreSQL source for ELT & CDC


Type	Write-ahead log (WAL) Replication
Supported Replication	Initial Sync, Continuous Sync
Authentication Type	Password Authentication

Setting up PostgreSQL CDC for ELT & CDC

ELT & CDC uses WAL based logical replication with the pgoutput decoding plugin for PostgreSQL sources. It requires a database user with the LOGIN and REPLICATION roles. The sync user does not need to own your tables, and does not need CREATE permission on the database or schema.

Supported providers

Requirements

PostgreSQL versions 10.x or above.
A database user with the LOGIN and REPLICATION roles.
Tables with a PRIMARY KEY.
Master (primary) instances of PostgreSQL. Log based replication only works for master instances.

Features

Feature	Supported	Notes
Full (Historical) sync	Yes
Incremental sync	Yes
Replicate DELETE	Yes
UPSERT	Yes
Append only mode	Yes	Can be specified at a table level
Exclude tables	Yes
Exclude columns	Yes
SSL support	Yes
SSH tunnel	Yes	SSH tunnel guide

How setup works

ELT & CDC supports two setup models for PostgreSQL. Which one applies is determined automatically from the privileges of the user you connect with.

Model	When it applies	What you do
Connector-managed	The sync user has `CREATE` on the database and schema, owns the synced tables, and owns any existing publication.	Nothing. The connector creates and maintains the publication, replication slot, and heartbeat table automatically.
Customer-managed (no ownership)	The sync user lacks one or more of the privileges above.	Run the setup SQL once (see below). The connector never changes table ownership; it only reads, streams from the replication slot, and ticks the heartbeat.

When you configure the pipeline, Integrate.io inspects the live database and the connecting user’s capabilities, then generates only the SQL you still need to run. Statements that are already satisfied (existing grants, publication, slot, or heartbeat table) are omitted. If the user can fully self-manage, no manual setup is required.

Setup steps (customer-managed)

Use these steps when the sync user does not own the tables. Run the SQL as a privileged user (for example, the table owner or an administrator).

Create the sync user

Create a dedicated user with LOGIN, then assign the REPLICATION role.

CREATE ROLE integrateio WITH PASSWORD '<password>' LOGIN;
ALTER ROLE integrateio WITH REPLICATION;

On Amazon RDS and Aurora, grant replication through the managed role instead:

GRANT rds_replication TO integrateio;

Grant read privileges

Grant the sync user access to the schema and read access to the tables you want to replicate. No CREATE privilege is required.

GRANT USAGE ON SCHEMA <schema> TO integrateio;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO integrateio;
GRANT REFERENCES ON ALL TABLES IN SCHEMA <schema> TO integrateio;

Create the heartbeat table

Create the heartbeat table the connector uses to keep the replication slot advancing while your synced tables are idle, and grant the sync user write access to it.

CREATE TABLE IF NOT EXISTS <schema>.integrateio_heartbeat (last_heartbeat timestamptz PRIMARY KEY);
GRANT INSERT, UPDATE, DELETE ON <schema>.integrateio_heartbeat TO integrateio;

Create the publication

Create a publication that includes your synced tables and the heartbeat table. Choose the form that matches your privileges and PostgreSQL version (see the table below).

CREATE PUBLICATION <publication_name> FOR TABLE <schema>.<table_1>, <schema>.<table_2>, <schema>.integrateio_heartbeat;

Create the replication slot

Create the logical replication slot the connector streams changes from using the pgoutput plugin.

SELECT pg_create_logical_replication_slot('<slot_name>', 'pgoutput');

Publication forms

The publication can be created in three forms. Integrate.io detects an existing publication’s form and uses it; for new ones, pick based on the privileges available.

Form	SQL	Requirements	Notes
List	`CREATE PUBLICATION ... FOR TABLE <tables>`	PostgreSQL 10+, table owner	Explicitly lists tables and the heartbeat table. Supports `ALTER PUBLICATION ... ADD TABLE`.
Schema	`CREATE PUBLICATION ... FOR TABLES IN SCHEMA <schema>`	PostgreSQL 15+, superuser	Auto-includes current and future tables in the schema (including the heartbeat table).
All tables	`CREATE PUBLICATION ... FOR ALL TABLES`	Superuser	Auto-includes current and future tables in the database.

Heartbeat

PostgreSQL uses the confirmed_flush_lsn to determine which WAL segments can be safely removed, so if it’s not updated because the tables in your publication aren’t changing, the WAL storage can grow significantly. To ensure that the confirmed_flush_lsn advances even when there are no transactions for the selected tables, the connector creates (or uses) a heartbeat table. It inserts a single record and updates it every 10 minutes while the pipeline is running.

CREATE TABLE IF NOT EXISTS <schema>.integrateio_heartbeat (last_heartbeat timestamptz PRIMARY KEY);

This table is not replicated to your destination and its updates do not count towards your usage.

Frequently asked questions (FAQs)

Does the sync user need to own my tables?

No. ELT & CDC supports a customer-managed model that requires only LOGIN and REPLICATION on the user, plus USAGE on the schema and SELECT/REFERENCES on the tables. Table ownership and CREATE on the database or schema are not required. If the user happens to have those privileges, the connector manages the publication, slot, and heartbeat table for you automatically.

What SQL do I need to run?

Integrate.io inspects your database during setup and generates only the statements that are still missing (grants, heartbeat table, publication, and replication slot). If everything is already in place, or the user can fully self-manage, no manual SQL is needed.

What happens when I de-select a table?

For a customer-managed publication, the connector only adds missing tables; it never drops tables you previously published, because it does not own the publication. A de-selected table left in the publication is harmless (it adds some extra WAL decoding overhead but causes no data loss). To remove the overhead, drop it manually with ALTER PUBLICATION <publication_name> DROP TABLE <schema>.<table>;.

How does Integrate.io handle Postgres replication slots?

The connector creates a replication slot when the initial sync starts, and then listens for changes from it once the initial sync completes. When you delete or archive a pipeline, the connector removes the replication slot from your Postgres database if it can still access it.

How are large JSONB objects handled?

Large JSONB objects map to TEXT/VARCHAR/STRING columns and might be truncated, depending on the destination.For tables that use TOAST storage (typically tables with large JSONB or TEXT columns), set their REPLICA IDENTITY to FULL so the connector can sync the data correctly during continuous sync.

Getting Started

Pipelines

Sources

Destinations

Monitoring

Security

Transforms

Data Observability

PostgreSQL source for ELT & CDC

Setting up PostgreSQL CDC for ELT & CDC

Supported providers

Requirements

Features

How setup works

Setup steps (customer-managed)

Publication forms

Heartbeat

Frequently asked questions (FAQs)

Amazon RDS PostgreSQL

Aurora PostgreSQL

Self-hosted PostgreSQL

Google Cloud SQL for PostgreSQL

SSH Tunnel

PostgreSQL PrivateLink

​Setting up PostgreSQL CDC for ELT & CDC

​Supported providers

​Requirements

​Features

​How setup works

​Setup steps (customer-managed)

​Publication forms

​Heartbeat

​Frequently asked questions (FAQs)

​Related

Amazon RDS PostgreSQL

Aurora PostgreSQL

Self-hosted PostgreSQL

Google Cloud SQL for PostgreSQL

SSH Tunnel

PostgreSQL PrivateLink

Setting up PostgreSQL CDC for ELT & CDC

Supported providers

Requirements

Features

How setup works

Setup steps (customer-managed)

Publication forms

Heartbeat

Frequently asked questions (FAQs)

Related