What Is AWS Redshift?

AWS Redshift is a cloud data warehouse from Amazon Web Services (AWS) and part of its widely used cloud computing platform. Data warehouses are storage solutions for large amounts of data. Data can come from any number of sources, such as apps or a data lake, and for modern businesses those sources are increasing all the time. Data warehouses store this data so that businesses can analyze it for strategic insights. AWS Redshift lets companies store large amounts of data in the cloud on groups of nodes called “clusters,” which allow for parallel queries and speedy access.

History of AWS Redshift

Redshift is a physics term: the increase in wavelength of light and other electromagnetic radiation. It’s linked to Hubble’s Law and the Doppler Effect and, shockingly, has nothing to do with Amazon Redshift. Amazon actually named its world-beating data warehouse Redshift in a slightly snarky attempt to distance itself from Oracle, the conventional database provider. According to respected ezine AWS for Business, Redshift literally refers to shifting away from Oracle and its bold, red logo.

The ability to process data in parallel builds on the Massively Parallel Processing (MPP) technology of ParAccel, an analytics database company in which Amazon was a key investor. The idea was to create something more than a database: Redshift not only stores data but also analyzes huge amounts of it in parallel, running multiple queries simultaneously and sometimes scanning hundreds of millions of rows in a few seconds.

Underneath the AWS branding is a heavily modified system based on PostgreSQL, the open-source relational database management system (RDBMS). This means Redshift can accept connections from most SQL-based applications and business intelligence tools, typically over JDBC or ODBC, although not every PostgreSQL feature is supported.
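Because Redshift is based on PostgreSQL, a PostgreSQL-compatible driver can usually connect to it with ordinary connection settings. The sketch below only builds those settings and does not open a connection. The helper names, the endpoint, the database name, and the user are all hypothetical placeholders; 5439 is Redshift's default port, though a cluster can be configured to use another.

```python
# Hypothetical helpers: they assemble connection settings for a
# PostgreSQL-compatible driver. Nothing here talks to a real cluster.
REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default; a cluster may use another port


def redshift_conn_params(host, dbname, user, port=REDSHIFT_DEFAULT_PORT):
    """Return libpq-style connection keywords for a Redshift endpoint."""
    if not host or not dbname or not user:
        raise ValueError("host, dbname and user are all required")
    return {
        "host": host,
        "port": int(port),
        "dbname": dbname,
        "user": user,
        "sslmode": "require",  # ask the driver to encrypt the connection
    }


def to_dsn(params):
    """Render the parameters as a space-separated 'key=value' string."""
    return " ".join(f"{key}={value}" for key, value in params.items())


params = redshift_conn_params(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    dbname="dev",       # placeholder database name
    user="analyst",     # placeholder user
)
dsn = to_dsn(params)
print(dsn)
```

A driver such as psycopg2 could be given these keywords together with a password supplied from a secure source; that step is left out here so the sketch runs without any third-party package or credentials.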

Amazon announced Redshift in 2012, and the hype at the time was about how much less it would cost businesses to process huge amounts of data. The media reported that the cost of handling a terabyte of data for a year could drop from $25,000 to $1,000, with no surprise charges once a package was agreed. Scalability was also a key selling point, with businesses able to add or remove nodes on demand depending on whether they were dealing with gigabytes or with data at petabyte scale.

Today, Amazon estimates Redshift has over 15,000 users, including brands such as McDonald's, Pfizer, and Lyft.

AWS Redshift – The Benefits

One of the primary reasons Amazon Redshift is so widely used by large enterprises is the sheer volume of data it can handle and its fast query performance. Redshift can deal with petabytes of data at a time. A petabyte is 1,024 terabytes in binary units, or roughly a quadrillion bytes. Redshift uses columnar storage, which lets analytical queries read only the columns they need and is often more efficient and cost-effective than row-based storage for that kind of workload.
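To see why columnar storage suits analytical queries, consider a toy comparison. This is a minimal sketch using in-memory Python structures, not Redshift's actual storage engine; the table, the field names, and the "values read" counters are invented for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_layout(records):
    """A row store touches every field of every row it scans."""
    return sum(len(record) for record in records)


def values_read_column_layout(columns, needed):
    """A column store touches only the columns the query asks for."""
    return sum(len(columns[name]) for name in needed)


columns = to_columns(rows)

# Equivalent of: SELECT SUM(amount) FROM orders
total_amount = sum(columns["amount"])
row_reads = values_read_row_layout(rows)
col_reads = values_read_column_layout(columns, ["amount"])
print(total_amount, row_reads, col_reads)
```

In this toy table the aggregate touches three values in the column layout against nine in the row layout, and the gap widens as tables gain more columns.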

Redshift is also a fully managed data warehouse service, with plenty of automation for processes like backups, configuration, and security updates.

The Redshift cluster format allows data to be “sliced” and divided across nodes into ever more granular data sets, which queries can then work on in parallel. This also allows multiple users to access different parts of the data warehouse and work on that data simultaneously. Because clusters are built from compute nodes, AWS Redshift is highly scalable: when your workloads increase, simply add more nodes. With the option of concurrency scaling, extra capacity helps keep data analysis fast as the number of simultaneous queries grows. Machine learning capabilities allow Redshift to improve query performance by retaining information about common queries, so in essence this data warehouse service is constantly tuning itself.
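The idea of slicing data so that work can proceed in parallel can be sketched in a few lines. This is an illustration only: the CRC32 hash is a stand-in chosen because it is deterministic, not the function Redshift uses, and the function names and sample rows are hypothetical.

```python
import zlib
from collections import defaultdict


def slice_for(key, num_slices):
    """Map a distribution key to a slice number (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_field, num_slices):
    """Group rows by the slice their key hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_field], num_slices)].append(row)
    return slices


def parallel_sum(slices, field):
    """Each slice computes a partial sum; the partials are then combined."""
    partials = [sum(row[field] for row in rows) for rows in slices.values()]
    return sum(partials)


orders = [
    {"customer_id": cid, "amount": amount}
    for cid, amount in [(101, 20), (102, 35), (101, 15), (103, 50), (102, 5)]
]

slices = distribute(orders, "customer_id", num_slices=4)
total = parallel_sum(slices, "amount")
print(total)
```

Rows with the same key always land on the same slice, so per-customer work needs no data movement, and adding slices spreads the same rows over more workers.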

AWS Redshift also, of course, connects easily to other AWS services like Amazon S3 or DynamoDB, allowing organizations that already use Amazon’s cloud computing platform to add Redshift to their data management suite with very little difficulty. Administrators can use security groups to restrict which users can reach a Redshift cluster, even if those users have access to other services via their AWS account. Redshift also connects to Amazon EMR, which is often used to run big data frameworks like Apache Hadoop or Spark.

Organizations that utilize Redshift for their data warehousing needs can also connect any number of tools. These include data analytics, data pipelines from databases or traditional data storage solutions, third-party apps and SaaS, or ETL solutions designed to collate and channel all a business’s data into one convenient destination.

App developers can use the AWS Redshift API. There are also the AWS console and the AWS CLI (command line interface), providing multiple ways to interact with Redshift and your business data.
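As one illustration of programmatic access, the sketch below condenses the kind of response returned by the Redshift DescribeClusters API call (available in the CLI as `aws redshift describe-clusters`). The sample response is invented for illustration and shows only a handful of fields. The `fetch_cluster_summaries` function assumes the third-party boto3 SDK is installed and AWS credentials are configured, so it is defined but not called here.

```python
def summarize_clusters(response):
    """Reduce a DescribeClusters-style response to a few key fields."""
    summaries = []
    for cluster in response.get("Clusters", []):
        # The endpoint can be missing while a cluster is still being created.
        endpoint = cluster.get("Endpoint") or {}
        summaries.append({
            "id": cluster["ClusterIdentifier"],
            "status": cluster.get("ClusterStatus"),
            "nodes": cluster.get("NumberOfNodes"),
            "host": endpoint.get("Address"),
            "port": endpoint.get("Port"),
        })
    return summaries


def fetch_cluster_summaries(region_name):
    """Call the live API. Assumes boto3 and AWS credentials are available."""
    import boto3  # third-party AWS SDK, imported lazily on purpose

    client = boto3.client("redshift", region_name=region_name)
    return summarize_clusters(client.describe_clusters())


# Invented sample data in the shape of a DescribeClusters response.
sample_response = {
    "Clusters": [
        {
            "ClusterIdentifier": "example-cluster",
            "ClusterStatus": "available",
            "NumberOfNodes": 2,
            "Endpoint": {
                "Address": "example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                "Port": 5439,
            },
        },
        {
            "ClusterIdentifier": "new-cluster",
            "ClusterStatus": "creating",
            "NumberOfNodes": 4,
        },
    ]
}

summary = summarize_clusters(sample_response)
for item in summary:
    print(item)
```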

AWS Redshift Use Cases

It’s clear that with Amazon Redshift, it’s easy to store your business data. But what do businesses actually use Redshift for?

The primary business use case is data collation. This might sound basic, but companies have needed to collect and store data safely for many years. In some cases, they also have to meet extremely stringent data protection regulations, such as the GDPR. While Redshift doesn’t automatically make a business compliant, it helps by ensuring that the company can store its data consistently and apply similar protocols across all data clusters. Administrators can adjust their identity and access management (IAM) settings to tighten security, and Redshift supports SSL encryption for the connections that carry SQL queries. Clusters can also be deployed in a virtual private cloud (VPC) as required.

Another critical business use case is gaining deep insights from granular data. Because Redshift can deal with very large amounts of data at a time, it allows the storage of raw, unaggregated data. The less condensed or grouped the stored data is, the more detail Business Intelligence (BI) tools can draw from it.

Redshift’s ability to handle large volumes of data also makes it ideal for corporations that, by their nature, deal with continuous, complex queries. It can equally handle very large numbers of relatively simple queries. An example is Nasdaq, the financial and tech corporation and owner of the Nasdaq Stock Exchange. Nasdaq receives billions of records such as quotes, orders, trades, or trade cancellations, all of which have to be accurately represented on the stock exchange. Because the market is always growing, a scalable solution is not optional. Nasdaq’s VP of software engineering reported that by using AWS Redshift and other AWS services, the company made a jump from dealing with 30 billion records to 70 billion every day.

Another reason businesses move to AWS Redshift is to cut costs. Maintaining an onsite or offsite physical data warehouse is costly and requires a whole team to manage and maintain it. Redshift is cloud-based, so it doesn’t take up any physical space or add strain on internal systems. It’s also managed, so there’s far less routine IT maintenance to worry about.

Integrate.io and AWS Redshift

It’s important to note that while AWS Redshift is an excellent data warehouse, your data integrity is only as good as the pipelines created between data sources and the data warehouse. Manually creating individual connections and managing them in-house is not only exhausting, it’s untenable as your business grows.

Integrate.io provides a scalable cloud-based ETL solution. ETL stands for Extract, Transform and Load. It refers to the process of extracting data from a variety of sources, including business-critical SaaS and even other storage solutions, transforming it into a consistent format, and loading it into a destination such as a data warehouse. Integrate.io’s integrations include:

  • Box
  • Eventbrite
  • GitHub
  • Google My Business
  • HubSpot
  • MailChimp
  • Salesforce
  • Shopify

...and over a hundred more, with more added regularly. Integrate.io takes AWS Redshift and turns it into a one-stop repository for every scrap of business data, making it an even more effective data warehouse to link to your business intelligence tools. Users can create a connection to Redshift with ease, using the intuitive graphical interface. You simply need a few details about your data clusters, which are easily obtainable from the Amazon Redshift console.

With Integrate.io, you can work data both ways from Redshift – either moving it out for additional processing or linking all your critical business data pipelines into your super-fast cloud-based data warehouse for effective aggregation.
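For readers new to ETL, the three steps can be sketched generically. This is not Integrate.io's implementation: the source records are invented, the function names are hypothetical, and an in-memory SQLite database from Python's standard library stands in for the data warehouse so the example runs without a Redshift cluster.

```python
import sqlite3


def extract():
    """Stand-in for pulling raw records from a SaaS source."""
    return [
        {"email": " Ada@Example.com ", "plan": "pro", "mrr": "49.00"},
        {"email": "bob@example.com", "plan": "free", "mrr": "0"},
        {"email": "", "plan": "pro", "mrr": "49.00"},  # no email, will be dropped
    ]


def transform(records):
    """Normalize emails, convert amounts to numbers, drop unusable rows."""
    cleaned = []
    for record in records:
        email = record["email"].strip().lower()
        if not email:
            continue
        cleaned.append((email, record["plan"], float(record["mrr"])))
    return cleaned


def load(conn, rows):
    """Write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (email TEXT, plan TEXT, mrr REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stands in for the warehouse
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT email, mrr FROM customers ORDER BY email"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to swap the source or the destination without touching the cleaning logic in between.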

If you would like to know more about how Integrate.io and Amazon Redshift work seamlessly together, get in touch and ask a member of our friendly team about a 7-day demo of the Integrate.io ETL product.

Glossary of Terms

A guide to the nomenclature of data integration technology.