> ## Documentation Index
> Fetch the complete documentation index at: https://www.integrate.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# PostgreSQL source for ELT & CDC

> How to set up and configure PostgreSQL as a CDC source in Integrate.io ELT & CDC, including supported providers, WAL requirements, and heartbeat setup.

| **Description**           | The World's Most Advanced Open Source Relational Database |
| :------------------------ | :-------------------------------------------------------- |
| **Type**                  | Write-ahead log (WAL) Replication                         |
| **Supported Replication** | Initial Sync  Continuous Sync                             |
| **Authentication Type**   | Password Authentication                                   |

## Setting up PostgreSQL CDC for ELT & CDC

ELT & CDC uses [WAL based logical replication](https://www.postgresql.org/docs/10/logical-replication.html) - `pgoutput`decoding for PostgreSQL database source and requires a user with `LOGIN` and `REPLICATION` role. It also requires the user to have CREATE permission on the schema and database to be synced.

### Supported providers

* [Amazon RDS PostgreSQL](/cdc/amazon-rds-postgresql)
* [Amazon Aurora PostgreSQL](/cdc/aurora-postgresql)
* [Azure Single Node PostgreSQL](/cdc/azure-postgresql)
* [Self-hosted (generic) PostgreSQL](/cdc/self-hosted-postgresql)
* [Google Cloud SQL for PosgreSQL](/cdc/google-cloud-sql-for-posgresql)

### Requirements

* PostgreSQL  versions 10.x or above.
* Database user with CREATE permission on the schema and database to be synced.
* Tables with PRIMARY KEY .
* Sync user should be the OWNER of the tables.
* The ownership requirement is for adding the tables to the PUBLICATION that we create in order for receiving replication events from the database. For more info, click [here](https://www.postgresql.org/docs/10/sql-alterpublication.html).
* Master instances of Postgres (log based replication only works for master instances).

### Features

| Feature                | Supported | Notes                                |
| :--------------------- | :-------- | :----------------------------------- |
| Full (Historical) sync | Yes       |                                      |
| Incremental sync       | Yes       |                                      |
| Replicate DELETE       | Yes       |                                      |
| UPSERT                 | Yes       |                                      |
| Append only mode       | Yes       | Can be specified at a table level    |
| Exclude tables         | Yes       |                                      |
| Exclude columns        | Yes       |                                      |
| SSL Support            | Yes       |                                      |
| SSH tunnel             | Yes       | [SSH Tunnel Guide](/cdc/ssh-tunnel/) |

### Heartbeat

PostgreSQL uses the `confirmed_flush_lsn` to determine which WAL segments can be safely removed, so if it's not updated because the tables in your publication aren't changing, the WAL storage can indeed grow significantly.

To ensure that the `confirmed_flush_lsn` advances even when there are no transactions for the selected tables, we created a heartbeat table, insert a single record and update it every 10 minutes while the pipeline is running.

```sql theme={null}
create table if not exists <pipeline schema>.integrateio_heartbeat ( last_heartbeat timestamptz primary key);
```

This table will not be replicated to your destination and the updates will not count towards your usage.

### Frequently Asked Questions (FAQs)

<AccordionGroup>
  <Accordion title="How do you handle Postgres replication slots?">
    We create a replication slot when the initial sync starts. After the initial sync completes, we start listening for changes from it. Once a pipeline is deleted or archived by you, we try to remove the replication slot from your Postgres DB provided that we can still access it.
  </Accordion>

  <Accordion title="How do you handle large JSONB objects?">
    Large JSONB objects are mapped as `TEXT`/`VARCHAR`/`STRING` columns and might be truncated (depending on destination). Note about the `TOAST` table - if there are tables which use the toast tables (usually tables with huge `JSONB` or `TEXT` columns) then their `REPLICA IDENTITY` ([https://www.postgresql.org/docs/10/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY](https://www.postgresql.org/docs/10/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY)) needs to be set to FULL for us to properly sync the data in case of continuous sync.
  </Accordion>
</AccordionGroup>

## Related

<CardGroup cols={2}>
  <Card title="Amazon RDS PostgreSQL" icon="arrow-right" href="/cdc/amazon-rds-postgresql" horizontal />

  <Card title="Aurora PostgreSQL" icon="arrow-right" href="/cdc/aurora-postgresql" horizontal />

  <Card title="Self-hosted PostgreSQL" icon="arrow-right" href="/cdc/self-hosted-postgresql" horizontal />

  <Card title="Google Cloud SQL for PostgreSQL" icon="arrow-right" href="/cdc/google-cloud-sql-for-posgresql" horizontal />

  <Card title="SSH Tunnel" icon="arrow-right" href="/cdc/ssh-tunnel" horizontal />

  <Card title="PostgreSQL PrivateLink" icon="arrow-right" href="/cdc/postgresql-privatelink-set-up" horizontal />
</CardGroup>
