Introduction

With the release of Query Priorities and Automatic WLM, the AWS team has given Amazon Redshift administrators a powerful new way to define and prioritize data workloads. Without using Workload Management (WLM), each query gets equal priority. The result is that some workloads may end up using excessive cluster resources and block business-critical processes.

We first covered Amazon Redshift WLM on our blog before Query Priorities were released, and found that the benefits of Automatic WLM’s ability to dynamically allocate memory for each query didn’t fully outweigh the limitations imposed by not being able to control which workloads should be prioritized.

The launch of Query Priorities for Automatic WLM addresses this main limitation head-on, and makes the process of defining your Redshift WLM more directly align with the process of defining your business data SLAs and objectives. In this post we’ll cover the Query Priority feature in detail as we walk through 3 simple steps for setting up Amazon Redshift WLM with Query Priorities.

Understanding Amazon Redshift Automatic WLM and Query Priorities

The idea behind Auto WLM is simple: rather than having to decide up front how to allocate cluster resources (i.e. concurrency and memory) to queries, Auto WLM allocates resources dynamically for each query it processes. In principle, this means that a small query will get a small amount of memory, and a large query will get a large amount of memory. Since Redshift is able to give each query the resources it needs (and ideally no more), it is better able to make the most efficient use of the cluster, and to prevent queries from going to disk. However, being able to allocate more memory to big queries comes at the expense of increased queue time while other queries wait for those resources to be freed up (see our previous post for more details).

Query Priority is a new optional feature of Auto WLM that you can (and should) enable if you’re using Auto WLM. Query Priority lets you create one or more Auto WLM query queues and assign a relative priority to each queue, which can be one of:

  • Highest
  • High
  • Normal
  • Low
  • Lowest

(There is also a Critical priority which only superusers can use to run one query at a time at a priority higher than any other query in the system).

Redshift uses these query priorities in three ways:

  1. When queries are submitted to the cluster, Redshift uses the priority to decide which queries should run and which should queue. Queries in lower priority queues will still run, but will queue longer on average than queries in higher priority queues.
  2. When queries run, higher priority queries are allocated more resources (i.e. memory) than lower priority queries.
  3. Queries in lower priority queues may get aborted more often to make room for queries in higher priority queues.

As with Manual WLM, queries are routed to Query Priority queues in one of two ways: either by database user group, or by “query group” (a label you can set in the database session using the SET query_group TO statement).

Three steps to Amazon Redshift Automatic WLM with Query Priority

The first two steps of configuring Auto WLM with Query Priority are the same as the first two steps we recommend when configuring your Manual WLM queues: setup individual users and define your workloads.

1. Setup individual users

As we discuss in detail in our Manual WLM configuration post, it’s important to create an individual database user for each application, person, or process that submits queries to your Redshift cluster. Sharing database user accounts between applications or users makes it impossible to isolate workloads or properly track performance of your queries and cluster (not to mention that it also has security implications). So start by making sure you have an individual user account created for each source of queries in your system.

2. Define your workloads and group your users

Once you have created or enumerated all of your individual database users, the next step is to group them into database workloads. For example, you might identify workloads like:

  • Loads: Data loaded into or unloaded from your cluster. These are COPY and UNLOAD statements.
  • Transforms: Batch jobs and scheduled transformations. INSERT, UPDATE and DELETE transactions
  • Ad-hoc: These are queries by analysts / dashboards. SELECT statements.

You may also decide to create more granular breakdown of workloads. But ultimately each workload should represent a single process and have a definable performance requirement (i.e. data SLA), even if your performance requirements are approximate (e.g. “I want our dashboard queries to be fast, but our transformations can be slower”). Create a database user group for each workload and add the corresponding users to the user groups. You may have one user per group, or hundreds.

3. Rank your workloads and assign priorities

Next, rank your workloads by priority. For example, you may decide that serving your ad-hoc queries quickly is the most important, followed by the execution of your transforms, followed by loading new data. In this case you might end up with:

Rank Workload User groups
1 Looker dashboards dashboards
2 Ad-hoc queries analysts
3 Transforms transforms
4 Data loads and unloads data_users

Finally, assign a priority to each rank from Highest to Lowest by deciding if it is acceptable for a workload to be delayed if it means the workloads with higher priorities will run more quickly. If two or more workloads have the same impact on your data pipeline and SLAs, give them the same priority.

Rank Workload User groups Priority
1 Looker dashboards dashboards Highest
2 Ad-hoc queries analysts, developers Highest
3 Transforms transforms High
4 Data loads and unloads data_users Normal

We recommend giving your default queue the priority of Lowest–this forces you to be explicit about your workload priorities and prevents any overlooked workload from bringing down the performance of your cluster.

Bonus step: Enable Concurrency Scaling

Finally, decide which, if any, of your workloads may need additional capacity to keep queue times down during periods of increased read-only queries, and enable Concurrency Scaling for those workloads (you can read more about it in our Concurrency Scaling blog post). Typical workflow candidates would be ad-hoc, dashboard, or reporting queries, where your query volume may fluctuate and increased queue times will directly impact your users.

Keep in mind that queries will only be routed to concurrency-scaling clusters if:

  • Queries are read-only
  • Queries don’t reference tables that use an interleaved sort key
  • Queries don’t reference user-defined temporary tables

Also keep in mind that AWS charges for capacity used for concurrency scaling, so you should only enable it if the increased performance of the workload justifies the cost.

Conclusion

In this post we’ve given guidelines for how to configure Amazon Redshift WLM with Query Priority queues. In our next post we’re going to show some real-world examples and performance results for Automatic WLM with Query Priority in action, so stay tuned!