How to Set Up Amazon Redshift on AWS

Table of Contents

Dashboard - View the dashboard containing metrics for the clusters
Clusters - Create clusters, view list of clusters, and perform cluster-related actions
Queries - A list of recent queries and the SQL text for each. This option is useful for troubleshooting problem queries
Editor - Run database queries
Datashares - Authorize accounts to access datashares
Config - Configure connections to Redshift from SQL client tools. Here is where you configure JDBC and ODBC options.
Marketplace - Provides information and other tools or Amazon Web Services to complement your Redshift clusters
Advisor - Provides recommendations for configuration changes to help prioritize your optimization efforts
Alarms - Create alarms on clusters to notify you of performance issues or to track metrics over a specified period
Events - Track events and get reports on events such as date event occurred, description, or event source
What's new - Shows information about new Amazon Redshift features and product updates

In a digitally powered economy, having access to data can help companies navigate market changes, perform customer analytics and adjust their strategy to meet demand. Unfortunately, most of the data company’s generate are unstructured and siloed across various departments in the organization. According to Forbes, 95 percent of businesses cite the need to manage unstructured data as a problem for their business. Amazon Redshift is a powerful data warehousing solution that stands out among competitors. The platform helps companies bring together their data in meaningful ways to support informed decision-making. Here is a brief tutorial on how to set up Amazon Redshift for your organization.

Amazon Redshift Key Concepts

Amazon Redshift is a petabyte-scale data warehouse service based on PostgreSQL hosted in the AWS cloud. It is a suite of computing resources called nodes and clusters. Each cluster runs one or more databases. The platform supports a variety of client connections to on-premise and cloud systems. These tools support many business use cases including business intelligence (BR), reporting, big data, and analytics. A notable success story for the platform is iHeart Radio. The company leveraged Redshift to streamline its ETL workflow for over 130 data flows. Before we get started, let’s review a few concepts about the platform.

Connections

The platform supports client application connections via a variety of industry-standard connection methods. These methods include JDBC and ODBC drivers for PostgreSQL.

Compute Nodes

A node represents an individual compute resource that has a dedicated CPU, memory, and disk storage. AWS offers concurrency scaling to accommodate increased resource usage.

Leader Node

When there is more than one node, the system appoints one as a leader node. The leader node manages and coordinates with the worker nodes to carry out database operations. Once it receives a query, the leader compiles the code and distributes the code to worker nodes. Each worker node carries out a portion of the work to return results for the query.

Node Slices

Nodes are partitioned into what are known as slices. Each slice is allocated a portion of the node’s memory and disk space to accomplish the work assigned to it. Each slice on the node works in parallel to execute the operation.

Clusters

A cluster in Redshift is a collection of one or more nodes. Each cluster runs the Redshift engine and hosts one or more databases.

Internal Network

Redshift provides high-performance workload management by taking advantage of proximity and high-bandwidth connections over a private network. This reduces latency and allows the leader and worker nodes to communicate directly over an isolated network.

Databases

Each cluster houses one or more databases. The databases are stored on compute nodes.

Prerequisites for Setting Up Amazon Redshift on AWS

Before getting started, there are a couple of things you will need to do. The first step is signing up for AWS and second configuring your firewall.

Signup for Amazon Web Services

The first step is creating an AWS account by using this link.
Follow the on-screen instructions to complete registration.

Configure Firewall

The Redshift data warehouse is best set up with access through port 5439 on your firewall. If your client computer is behind a firewall, be sure this port is configured to allow access. You can allow access through another port if necessary. However, you won’t be able to change it for your cluster after it is created.

Setting Up Amazon Redshift on AWS

AWS Redshift can transfer datasets from a variety of data sources using some of the options noted below. However, for this tutorial, we’ll be showing step-by-step how to create an Amazon Redshift cluster to load data from an AWS S3 bucket.

Getting Familiar With the AWS Console

Upon logging into the Redshift console, you will see a dashboard that contains an overview of your clusters. It shows the total number of nodes, number of automated snapshots, clusters with their status. The navigation pane contains options for performing the necessary operations on your cluster. The navigation pane consists of:

Create a Cluster

The first step in getting set up is creating a cluster to store your databases.

Log in to the management console
Navigate to the Amazon Redshift console
On the upper right of the screen, select a region to create the cluster
Select the Clusters option from the left-hand navigation menu
Select Create Cluster
On the Cluster configuration page enter values for Cluster identifier, Node type, Nodes, and how you plan to use the cluster (Free Trial or Production).
In the database configuration section, specify the Admin user name and the Admin user password. The Database name and Database port are optional at this stage.

Create an IAM Role

An IAM role is required for any AWS service that needs to connect to your clusters. This role needs the appropriate permissions to access the resource. The new role that you create will be attached to your cluster.

Navigate to the IAM console
In the navigation pane, choose Role
From there choose Create Role
In the AWS Service group, select Redshift
Select Redshift-Customizable for Use Case
Select Next
Select Permissions
For Attach permission use case, select Customizable, then click Next: Permissions
Choose AmazonS3ReadOnlyAccess
Click Next: Tags
You can choose to enter tags that help add further details to the role
Click Next: Review
In the Role name field, enter a name for your role
Click Create Role

Grant Access to Query Editors

There are two methods for querying your Redshift database: one of the AWS-provided query editors or a SQL client tool. Using the query editor is the easiest way to get started with executing commands on the database. This tutorial will focus on using a query editor. The next step is attaching the appropriate policies to the IAM user that is used to access your cluster.

Navigate to the IAM console
Select Users
Select the user that needs access to the query editor and choose Add permissions
In the Policy Name field select: AmazonRedshiftQueryEditor and AmazonRedshiftReadOnlyAccess
Click next to review your configuration
Select Add Permissions

Create a Database

To create a database perform the following steps:

Open the query editor
Create your database using standard SQL (example: CREATE DATABASE SALESDB;)
From here, you can use the COPY command to load sample data or load data from Amazon S3.

How Integrate.io Can Help

If you are looking to use Redshift as a part of your ETL inbound data process, Integrate.io can help. As a low-code development platform with a suite of over 100 prebuilt API integrations, companies can create pipelines that access data from a variety of sources. The tool features a user-friendly drag-n-drop interface that doesn’t require technical knowledge to use. It empowers business users to use the tool to get the information they need without relying on IT resources. To get started, reach out to us today to schedule a 7-day demo or pilot.

Big Data

How to Set Up Amazon Redshift on AWS

Amazon Redshift Key Concepts

Connections

Compute Nodes

Leader Node

Node Slices

Clusters

Internal Network

Databases

Prerequisites for Setting Up Amazon Redshift on AWS

Signup for Amazon Web Services

Configure Firewall

Setting Up Amazon Redshift on AWS

Getting Familiar With the AWS Console

Create a Cluster

Create an IAM Role

Grant Access to Query Editors

Create a Database

How Integrate.io Can Help

Mastering SQL Queries in Excel

ETL Developer vs Data Engineer: Key Differences

How Enterprise Automation Transforms Workflows

Solutions

Support

Company

Language

How to Set Up Amazon Redshift on AWS

Amazon Redshift Key Concepts

Connections

Compute Nodes

Leader Node

Node Slices

Clusters

Internal Network

Databases

The Unified Stack for Modern Data Teams

Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer

Prerequisites for Setting Up Amazon Redshift on AWS

Signup for Amazon Web Services

Configure Firewall

Setting Up Amazon Redshift on AWS

Getting Familiar With the AWS Console

Create a Cluster

Create an IAM Role

Grant Access to Query Editors

Create a Database

How Integrate.io Can Help

Related Readings

Mastering SQL Queries in Excel

ETL Developer vs Data Engineer: Key Differences

How Enterprise Automation Transforms Workflows

Subscribe To The Stack Newsletter

Solutions

Support

Company

Language

Subscribe To
The Stack Newsletter