> ## Documentation Index
> Fetch the complete documentation index at: https://www.integrate.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# ETL: Amazon Redshift Source

> Configure the Amazon Redshift source component to read data from Redshift tables and views in your Integrate.io ETL data pipeline.

## Connection Setup

<Frame>
  <iframe className="w-full aspect-video rounded-xl" src="https://fast.wistia.com/embed/iframe/uakqa0l2qt" title="Connect Amazon Redshift" allow="autoplay; fullscreen" allowFullScreen />
</Frame>

Connect Integrate.io ETL to your [Redshift](https://www.integrate.io/blog/guide-to-amazon-redshift/) cluster in order to write data to it, or read data from it. Watch this short [video walkthrough](https://share.getcloudapp.com/p9uGqxDO) or read below for further instructions.

<Frame>
  <iframe className="w-full aspect-video rounded-xl" src="https://fast.wistia.com/embed/iframe/lzhxfigdvk" title="Allowing Xplenty access to your Redshift cluster" allow="autoplay; fullscreen" allowFullScreen />
</Frame>

## In Redshift console

<Steps>
  <Step>
    Start by navigating to the Redshift console --> **Clusters** screen and clicking on the **Properties** tab.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-118.jpg?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=d7bb0ff1a9b8d1fdeb268790003c5ce8" alt="Redshift cluster Properties tab in the AWS console" width="1417" height="761" data-path="images/connectivity-and-security/image-118.jpg" />
    </Frame>
  </Step>

  <Step>
    Scroll down to the Network and Security section. Make sure that the cluster is set with the value for **Publicly Accessible** to Yes. Then, click on the **VPC Security Group** to verify and/or modify the security rules.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-119.jpg?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=aacc0a4a91607b474675d004a2c33c38" alt="Network and Security section with Publicly Accessible setting and VPC Security Group" width="876" height="442" data-path="images/connectivity-and-security/image-119.jpg" />
    </Frame>
  </Step>

  <Step>
    In the Security Group screen, select the **Inbound** tab.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-120.jpg?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=32706baf60ba59e3efc5f9f651ede637" alt="Security Group Inbound rules tab" width="1407" height="710" data-path="images/connectivity-and-security/image-120.jpg" />
    </Frame>
  </Step>

  <Step>
    There should be rules for the IP addresses listed [here](/etl/integrateio-etls-ip-list/). If those rules need to be altered or don't exist, click Edit.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-121.jpg?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=b8fde7b675110bc63f4c38db3311e517" alt="Inbound rules with IP address entries for Integrate.io access" width="1174" height="416" data-path="images/connectivity-and-security/image-121.jpg" />
    </Frame>
  </Step>

  <Step>
    Edit any existing rules or click **Add Rule** to add a new rule. For each rule, select the type of database and enter the Redshift port. Then click **Save**.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-122.jpg?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=a9713ab5716686ebafbf150ee15feab6" alt="Adding a new inbound rule with database type and Redshift port" width="1120" height="453" data-path="images/connectivity-and-security/image-122.jpg" />
    </Frame>
  </Step>
</Steps>

**To configure your cluster to only accept SSL encrypted connections:**

First, access the parameter group and edit it to set require\_ssl to true.

<Frame>
  <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-123.webp?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=f3452dd56103d5e73145a31c3725ccf0" alt="Redshift parameter group with require_ssl set to true" width="1200" height="743" data-path="images/connectivity-and-security/image-123.webp" />
</Frame>

Then, Navigate to Config --> Workload management. If you created the cluster with a default parameter group, create a new parameter group and modify the cluster to associate to that parameter group. Click **Edit** on the cluster homepage, then go to **Database Configurations** to associate the parameter group with the current cluster.

<Frame>
  <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-124.webp?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=cc70dbd69d7bd47f68334c4c72f1264f" alt="Associating a parameter group with the Redshift cluster" width="1200" height="544" data-path="images/connectivity-and-security/image-124.webp" />
</Frame>

### To verify or modify the security rules of an instance on EC2-Classic (without VPC)

In your [Redshift Cluster](https://www.integrate.io/blog/data-warehousing-for-dummies-a-beginners-guide-to-setting-up-an-amazon-redshift-data-warehouse/) Security Group, modify a rule or add a new rule for each IP address listed [here](/etl/integrateio-etls-ip-list/):

<Frame>
  <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-125.png?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=2893914deae154891666615517123916" alt="EC2-Classic Security Group with CIDR/IP authorization fields" width="823" height="287" data-path="images/connectivity-and-security/image-125.png" />
</Frame>

* In the Connection Type dropbox, choose CIDR/IP.
* In the CIDR/IP to Authorize field, enter the IP addresses from [this list](/etl/integrateio-etls-ip-list/).

## Create a Redshift user

* [Create a Redshift user](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_USER.html).
* Grant it the following permissions:
  * If you intend to only append data into a table, give the user minimal permissions required to execute the COPY command.
  * If you intend to merge data into a table, give the user minimal permissions required to execute the COPY command, create a table, and insert and update to your target tables.
  * Note that truncate requires Integrate.io ETL to either be the owner of the target table or have superuser access.

## To define a connection in Integrate.io ETL to Amazon Redshift

<Steps>
  <Step>
    Click **Connections** (the lightning bolt) on the left side of the dashboard.
  </Step>

  <Step>
    Click **New connection**.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-126.webp?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=618fc4ceaa2bf27510d9ddf0b200233f" alt="Connections page with New connection button" width="1200" height="829" data-path="images/connectivity-and-security/image-126.webp" />
    </Frame>
  </Step>

  <Step>
    Select **Amazon Redshift** from the Analytical Database category.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-127.webp?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=ab4c9b6e413ef3ecfee14d9048ac465e" alt="Selecting Amazon Redshift from the Analytical Database category" width="1200" height="829" data-path="images/connectivity-and-security/image-127.webp" />
    </Frame>
  </Step>

  <Step>
    Type a name for the connection.
  </Step>

  <Step>
    If you allow direct access from Integrate.io ETL's IP addresses, enter the hostname and port. If direct access is not allowed, read more about setting a tunnel connection [here](/etl/allowing-integrateio-etl-access-to-my-server-behind-a-firewall/).
  </Step>

  <Step>
    Enter the user name and password you created for Integrate.io ETL to use.
  </Step>

  <Step>
    Enter the default database to use. If you leave it empty, the user's default database will be used.
  </Step>

  <Step>
    Set the region to the AWS region in which the Redshift cluster was created. If the region requires AWS Signature v4 (see list [here](https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region)) you may need our support team's help with allowing Integrate.io ETL access to read from this Amazon Redshift connection.
  </Step>

  <Step>
    Click **Test connection**. If the credentials are correct, a message that the cloud storage connection test was successful appears.
  </Step>

  <Step>
    Click **Create connection** and the connection is created and appears in the list of connections.

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-128.webp?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=9f96db9364a47827ebca744f3312a5ea" alt="Completed Redshift connection form with test confirmation" width="1200" height="829" data-path="images/connectivity-and-security/image-128.webp" />
    </Frame>

    <Frame>
      <img src="https://mintcdn.com/integrateio/XpeHmuMzUiEmfOEz/images/connectivity-and-security/image-129.webp?fit=max&auto=format&n=XpeHmuMzUiEmfOEz&q=85&s=1f6a2c7989b15df0d0ce05b63714be1c" alt="Newly created Redshift connection in the connections list" width="1200" height="829" data-path="images/connectivity-and-security/image-129.webp" />
    </Frame>
  </Step>
</Steps>

***

<Frame>
  <iframe className="w-full aspect-video rounded-xl" src="https://fast.wistia.net/embed/iframe/uakqa0l2qt" title="Using components: Amazon Redshift Source" allow="autoplay; fullscreen" allowFullScreen />
</Frame>

Use the Amazon Redshift source component to read data stored in an Amazon Redshift table, view or using a query. The source component uses Amazon Redshift's UNLOAD statement to pull data into files in Amazon S3 and then reads the files.

<Frame>
  <img src="https://mintcdn.com/integrateio/2ttHYDu3EKov-VoY/images/creating-packages/using-components-amazon-redshift-source/image-1.png?fit=max&auto=format&n=2ttHYDu3EKov-VoY&q=85&s=3ca2492f6a47f7da9e40fe59bb4d5dd1" alt="Amazon Redshift source component in the pipeline designer" width="1200" height="828" data-path="images/creating-packages/using-components-amazon-redshift-source/image-1.png" />
</Frame>

## Connection

Select an existing Amazon Redshift connection or create a new one (for more information, see [Allowing Integrate.io ETL access to my Redshift cluster](/etl/allowing-integrateio-etl-access-to-my-redshift-cluster/).)

## Source Properties

<Frame>
  <img src="https://mintcdn.com/integrateio/2ttHYDu3EKov-VoY/images/creating-packages/using-components-amazon-redshift-source/image-2.png?fit=max&auto=format&n=2ttHYDu3EKov-VoY&q=85&s=2b786fcbe7fa8f91967a310ea9f22833" alt="Redshift source properties with access mode, schema, table, and where clause" width="1200" height="828" data-path="images/creating-packages/using-components-amazon-redshift-source/image-2.png" />
</Frame>

* **Access mode** - select table to extract an entire table/view or query to execute a query.
* **Source schema** - the source table's schema. If empty, the default schema is used.
* **Source table/view** - the table or view name from which the data will be imported.
* **where clause** - optional. You can add predicates clauses to the WHERE clause as part of the SQL query that is built in order to get the data from the database. Make sure to skip the keyword WHERE.
  | Good | `prod_category = 1 AND prod_color = 'red'`       |
  | :--- | :----------------------------------------------- |
  | Bad  | `WHERE prod_category = 1 AND prod_color = 'red'` |
* **Query** - type in a SQL query. Make sure to name all columns uniquely.
* **Null string** - NULL values in string columns will be replaced with the string specified here. By default NULL values will appear like empty strings.

## Source Schema

<Frame>
  <img src="https://mintcdn.com/integrateio/2ttHYDu3EKov-VoY/images/creating-packages/using-components-amazon-redshift-source/image-3.png?fit=max&auto=format&n=2ttHYDu3EKov-VoY&q=85&s=ecb0f0b7c991fd15bf768718b3f4f659" alt="Source schema with field selection and data type mapping" width="1200" height="828" data-path="images/creating-packages/using-components-amazon-redshift-source/image-3.png" />
</Frame>

After defining the source table/view/query select the fields to use in the source.

With table access mode, the fields you select are used to build the query that will be executed to read the data.

With query access mode, select all the fields that are defined in the query and make sure to use the same column names

Define the data type for the field. Use the following table when matching Redshift data types to Integrate.io ETL data types.

| **Amazon Redshift**     | **Integrate.io ETL** |
| :---------------------- | :------------------- |
| varchar, nvarchar, text | String               |
| smallint, int           | Integer              |
| bigint                  | Long                 |
| decimal, real           | Float                |
| double precision        | Double               |
| timestamp, date         | DateTime             |
