> ## Documentation Index
> Fetch the complete documentation index at: https://www.integrate.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# PostgreSQL source for ELT & CDC

> How to set up PostgreSQL as a CDC source in Integrate.io ELT & CDC, covering the no-ownership customer-managed setup, WAL requirements, and heartbeat.

|                           |                                   |
| :------------------------ | :-------------------------------- |
| **Type**                  | Write-ahead log (WAL) Replication |
| **Supported Replication** | Initial Sync, Continuous Sync     |
| **Authentication Type**   | Password Authentication           |

## Setting up PostgreSQL CDC for ELT & CDC

ELT & CDC uses [WAL based logical replication](https://www.postgresql.org/docs/10/logical-replication.html) with the `pgoutput` decoding plugin for PostgreSQL sources. It requires a database user with the `LOGIN` and `REPLICATION` roles. The sync user does **not** need to own your tables, and does **not** need `CREATE` permission on the database or schema.

### Supported providers

* [Amazon RDS PostgreSQL](/cdc/amazon-rds-postgresql)
* [Amazon Aurora PostgreSQL](/cdc/aurora-postgresql)
* [Azure Single Node PostgreSQL](/cdc/azure-postgresql)
* [Self-hosted (generic) PostgreSQL](/cdc/self-hosted-postgresql)
* [Google Cloud SQL for PostgreSQL](/cdc/google-cloud-sql-for-posgresql)

### Requirements

* PostgreSQL versions **10.x** or above.
* A database user with the `LOGIN` and `REPLICATION` roles.
* Tables with a `PRIMARY KEY`.
* Master (primary) instances of PostgreSQL. Log based replication only works for master instances.

### Features

| Feature                | Supported | Notes                               |
| :--------------------- | :-------- | :---------------------------------- |
| Full (Historical) sync | Yes       |                                     |
| Incremental sync       | Yes       |                                     |
| Replicate DELETE       | Yes       |                                     |
| UPSERT                 | Yes       |                                     |
| Append only mode       | Yes       | Can be specified at a table level   |
| Exclude tables         | Yes       |                                     |
| Exclude columns        | Yes       |                                     |
| SSL support            | Yes       |                                     |
| SSH tunnel             | Yes       | [SSH tunnel guide](/cdc/ssh-tunnel) |

## How setup works

ELT & CDC supports two setup models for PostgreSQL. Which one applies is determined automatically from the privileges of the user you connect with.

| Model                               | When it applies                                                                                                   | What you do                                                                                                                                                 |
| :---------------------------------- | :---------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Connector-managed**               | The sync user has `CREATE` on the database and schema, owns the synced tables, and owns any existing publication. | Nothing. The connector creates and maintains the publication, replication slot, and heartbeat table automatically.                                          |
| **Customer-managed (no ownership)** | The sync user lacks one or more of the privileges above.                                                          | Run the setup SQL once (see below). The connector never changes table ownership; it only reads, streams from the replication slot, and ticks the heartbeat. |

When you configure the pipeline, Integrate.io inspects the live database and the connecting user's capabilities, then generates **only the SQL you still need to run**. Statements that are already satisfied (existing grants, publication, slot, or heartbeat table) are omitted. If the user can fully self-manage, no manual setup is required.

## Setup steps (customer-managed)

Use these steps when the sync user does not own the tables. Run the SQL as a privileged user (for example, the table owner or an administrator).

<Steps>
  <Step title="Create the sync user">
    Create a dedicated user with `LOGIN`, then assign the `REPLICATION` role.

    ```sql theme={null}
    CREATE ROLE integrateio WITH PASSWORD '<password>' LOGIN;
    ALTER ROLE integrateio WITH REPLICATION;
    ```

    On Amazon RDS and Aurora, grant replication through the managed role instead:

    ```sql theme={null}
    GRANT rds_replication TO integrateio;
    ```
  </Step>

  <Step title="Grant read privileges">
    Grant the sync user access to the schema and read access to the tables you want to replicate. No `CREATE` privilege is required.

    ```sql theme={null}
    GRANT USAGE ON SCHEMA <schema> TO integrateio;
    GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO integrateio;
    GRANT REFERENCES ON ALL TABLES IN SCHEMA <schema> TO integrateio;
    ```
  </Step>

  <Step title="Create the heartbeat table">
    Create the heartbeat table the connector uses to keep the replication slot advancing while your synced tables are idle, and grant the sync user write access to it.

    ```sql theme={null}
    CREATE TABLE IF NOT EXISTS <schema>.integrateio_heartbeat (last_heartbeat timestamptz PRIMARY KEY);
    GRANT INSERT, UPDATE, DELETE ON <schema>.integrateio_heartbeat TO integrateio;
    ```
  </Step>

  <Step title="Create the publication">
    Create a publication that includes your synced tables and the heartbeat table. Choose the form that matches your privileges and PostgreSQL version (see the table below).

    ```sql theme={null}
    CREATE PUBLICATION <publication_name> FOR TABLE <schema>.<table_1>, <schema>.<table_2>, <schema>.integrateio_heartbeat;
    ```
  </Step>

  <Step title="Create the replication slot">
    Create the logical replication slot the connector streams changes from using the `pgoutput` plugin.

    ```sql theme={null}
    SELECT pg_create_logical_replication_slot('<slot_name>', 'pgoutput');
    ```
  </Step>
</Steps>

### Publication forms

The publication can be created in three forms. Integrate.io detects an existing publication's form and uses it; for new ones, pick based on the privileges available.

| Form       | SQL                                                    | Requirements                | Notes                                                                                        |
| :--------- | :----------------------------------------------------- | :-------------------------- | :------------------------------------------------------------------------------------------- |
| List       | `CREATE PUBLICATION ... FOR TABLE <tables>`            | PostgreSQL 10+, table owner | Explicitly lists tables and the heartbeat table. Supports `ALTER PUBLICATION ... ADD TABLE`. |
| Schema     | `CREATE PUBLICATION ... FOR TABLES IN SCHEMA <schema>` | PostgreSQL 15+, superuser   | Auto-includes current and future tables in the schema (including the heartbeat table).       |
| All tables | `CREATE PUBLICATION ... FOR ALL TABLES`                | Superuser                   | Auto-includes current and future tables in the database.                                     |

## Heartbeat

PostgreSQL uses the `confirmed_flush_lsn` to determine which WAL segments can be safely removed, so if it's not updated because the tables in your publication aren't changing, the WAL storage can grow significantly.

To ensure that the `confirmed_flush_lsn` advances even when there are no transactions for the selected tables, the connector creates (or uses) a heartbeat table. It inserts a single record and updates it every 10 minutes while the pipeline is running.

```sql theme={null}
CREATE TABLE IF NOT EXISTS <schema>.integrateio_heartbeat (last_heartbeat timestamptz PRIMARY KEY);
```

This table is not replicated to your destination and its updates do not count towards your usage.

### Frequently asked questions (FAQs)

<AccordionGroup>
  <Accordion title="Does the sync user need to own my tables?">
    No. ELT & CDC supports a customer-managed model that requires only `LOGIN` and `REPLICATION` on the user, plus `USAGE` on the schema and `SELECT`/`REFERENCES` on the tables. Table ownership and `CREATE` on the database or schema are not required. If the user happens to have those privileges, the connector manages the publication, slot, and heartbeat table for you automatically.
  </Accordion>

  <Accordion title="What SQL do I need to run?">
    Integrate.io inspects your database during setup and generates only the statements that are still missing (grants, heartbeat table, publication, and replication slot). If everything is already in place, or the user can fully self-manage, no manual SQL is needed.
  </Accordion>

  <Accordion title="What happens when I de-select a table?">
    For a customer-managed publication, the connector only **adds** missing tables; it never drops tables you previously published, because it does not own the publication. A de-selected table left in the publication is harmless (it adds some extra WAL decoding overhead but causes no data loss). To remove the overhead, drop it manually with `ALTER PUBLICATION <publication_name> DROP TABLE <schema>.<table>;`.
  </Accordion>

  <Accordion title="How does Integrate.io handle Postgres replication slots?">
    The connector creates a replication slot when the initial sync starts, and then listens for changes from it once the initial sync completes. When you delete or archive a pipeline, the connector removes the replication slot from your Postgres database if it can still access it.
  </Accordion>

  <Accordion title="How are large JSONB objects handled?">
    Large JSONB objects map to `TEXT`/`VARCHAR`/`STRING` columns and might be truncated, depending on the destination.

    For tables that use `TOAST` storage (typically tables with large `JSONB` or `TEXT` columns), set their [`REPLICA IDENTITY`](https://www.postgresql.org/docs/10/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY) to `FULL` so the connector can sync the data correctly during continuous sync.
  </Accordion>
</AccordionGroup>

## Related

<CardGroup cols={2}>
  <Card title="Amazon RDS PostgreSQL" icon="arrow-right" horizontal href="/cdc/amazon-rds-postgresql" />

  <Card title="Aurora PostgreSQL" icon="arrow-right" horizontal href="/cdc/aurora-postgresql" />

  <Card title="Self-hosted PostgreSQL" icon="arrow-right" horizontal href="/cdc/self-hosted-postgresql" />

  <Card title="Google Cloud SQL for PostgreSQL" icon="arrow-right" horizontal href="/cdc/google-cloud-sql-for-posgresql" />

  <Card title="SSH Tunnel" icon="arrow-right" horizontal href="/cdc/ssh-tunnel" />

  <Card title="PostgreSQL PrivateLink" icon="arrow-right" horizontal href="/cdc/postgresql-privatelink-set-up" />
</CardGroup>