Adding compression to large, uncompressed columns will have a big impact on cluster performance. Compression accomplishes two things:
- Reduce storage utilization. Because compression shrinks the on-disk footprint of your data, you’ll use less disk space on your cluster nodes.
- Improve query performance. Because there is less data to scan or join on, queries need less I/O, which makes them run faster.
We recommend using the Zstandard (ZSTD) encoding algorithm. This relatively new algorithm provides a high compression ratio and works across all Amazon Redshift data types. ZSTD works especially well with VARCHAR and CHAR fields that have a mixture of long and short strings. Also, unlike some of the other algorithms, ZSTD is unlikely to increase storage utilization.
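As a rough sketch of how ZSTD might be applied to existing columns, the Python helper below builds one `ALTER TABLE ... ALTER COLUMN ... ENCODE zstd` statement per column. The function name `zstd_statements`, the table `app_logs`, and its columns are hypothetical and chosen only for illustration. The helper only produces SQL text; it assumes you run the statements yourself with whatever SQL client you already use, and that you check the Amazon Redshift `ALTER TABLE` documentation for restrictions that apply to your tables.

```python
import re

# Only plain, unquoted identifiers are accepted in this sketch (an assumption
# made to keep the example simple and avoid building unsafe SQL).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def zstd_statements(table, columns):
    """Return Redshift statements that switch the given columns to ZSTD."""
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} ENCODE zstd;"
        for column in columns
    ]


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for statement in zstd_statements("app_logs", ["message", "user_agent"]):
        print(statement)
```

Before changing encodings, it may also be worth running `ANALYZE COMPRESSION` on the table to see which encodings Amazon Redshift itself suggests for each column.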
Below is a real-world example of applying ZSTD to three Amazon Redshift logging tables. The average storage reduction is over 50%!