Amazon Redshift is a powerful, managed data warehouse that enables organizations to analyze large amounts of data economically. As data volumes increase and queries become more complex, both Redshift's performance and its cost can suffer. According to a survey conducted by TDWI in partnership with a cloud services company, one of the top reasons companies migrate to cloud data warehousing is to take advantage of its flexible costs. To get the most out of your implementation, it is essential to employ strategies for optimizing Redshift cost and performance, including query tuning, cluster sizing, data loading, and other practices. This guide lays out proven strategies to ensure an optimal return on investment from your Redshift implementation, whether you're starting a new one or already have a cluster set up.
The Cost Optimization Process
Before we look at the strategies, we must understand the optimization process. Part one of this blog outlined how to design a well-architected Redshift ecosystem. The natural question that follows is how to keep it optimized as the enterprise grows.
The foremost thing to remember is that optimization is a continuous process. The cluster may have an optimal balance between cost and performance today, but as the organization grows, many factors can disrupt this balance, including but not limited to:
- Increase in data volume
- Change in business demand
- Technology evolution
Whatever the cause, regularly revisiting the enterprise data warehouse setup is crucial to keeping the system optimized.
Let us now look at some practices that can make your Redshift ecosystem more efficient and cost-optimized. The strategies are discussed in terms of the complexity of the effort involved versus the impact on performance. While some are straightforward, others require significant planning and may include architectural changes.
However, it is essential to remember that optimizing for cost alone is not the goal. A performant system with strategic cost optimizations is the ideal outcome.
1. Using Reserved instances
AWS allows us to purchase reserved nodes for steady-state workloads, with a commitment term of one or three years. There is also a choice of payment option:
- No upfront
- Partial upfront
- All upfront
The underlying AWS cost principle is, “The more you reserve and the more you pay upfront, the higher the discounts.” Simply put, the maximum discount (up to 70% compared with On-Demand pricing) comes from 3-year reserved instances paid all upfront.
This model is ideal for enterprises with large, steady workloads. Medium-scale establishments, or startups on a growth path, typically start with on-demand instances. Depending on usage duration and demand, such companies can periodically revisit their cost strategy and gradually move fixed workloads to reserved instances while keeping everything else on demand.
The other important point is that reserved instances are ideal for always-on clusters. Here’s a real-life use case we’ve seen:
One of our clients had spent around $110K over eight months on an on-demand, two-node ds2.8xlarge cluster. We cut their cost by 25% by moving to reserved instances with a one-year commitment and no upfront charges. It is that simple!
2. Pause and Resume
Pause and resume is another simple yet effective approach to optimization.
Consider this scenario: one of our clients was running an on-demand dc2.large cluster 24/7 as their development instance. We found that CPU utilization dropped to about 10% during non-business hours, weekends, and holidays, compared with 60% during business hours, so we suggested halting these resources when not in use. Simply enabling the ‘pause and resume’ option helped them cut costs by 50%.
What is happening here? Why does pausing an instance correspond to cost savings? When on-demand instances are paused, only storage charges are applicable! You do not have to pay for computing when no computation is happening. Organizations using on-demand instances exclusively will stand to gain the most by using this option.
3. Elastic Resize
Organizations want a data warehouse that scales quickly and does not force a trade-off between performance and concurrency. Elastic resize is the natural fit: it allows you to scale a cluster up or down within minutes.
For example, when one of our client workloads had to absorb a sudden surge of data on the TB scale, we simply used elastic resize. The nodes were scaled up within minutes to add compute and storage, with no advance capacity planning.
Organizations that have provisioned their clusters for peak capacity can see significant cost savings, on the order of 15%, simply by enabling elastic resize and scaling with demand instead.
However, keep a few factors in mind while using elastic resize:
- You cannot grow to more than double the existing number of nodes with elastic resize; beyond that, you need the classic resize option.
- Scaling down to the previous cluster size, or to fewer than half the current number of nodes, may or may not be possible depending on the storage requirement.
4. Concurrency Scaling
Another interesting feature that impacts Redshift performance is concurrency scaling, which is enabled at the workload management (WLM) queue level. WLM allows users to manage priorities within workloads flexibly; with its help, short, fast-running queries can be routed ahead of long-running ones.
The concurrency scaling feature automatically spins up transient clusters to serve the queued requests and scales them back down when they are no longer needed. This lets you support virtually unlimited concurrent users and queries without compromising query performance. Note that concurrency scaling primarily serves read (reporting) queries.
For every 24 hours the main cluster is in use, AWS credits the account with one hour of free concurrency scaling. These free hours can be accumulated and drawn down during predictable peaks. Once the accumulated credits are exhausted, concurrency scaling is billed per second at the cluster’s on-demand rate.
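To see how much of this free credit you are actually consuming, you can query the concurrency scaling usage history. Below is a minimal sketch, assuming the SVCS_CONCURRENCY_SCALING_USAGE system view (with its start_time and usage_in_seconds columns) is available in your Redshift version:

```sql
-- Daily concurrency scaling usage, to compare against the accrued free credits.
SELECT
    TRUNC(start_time)                        AS usage_day,
    ROUND(SUM(usage_in_seconds) / 3600.0, 2) AS scaling_hours
FROM svcs_concurrency_scaling_usage
GROUP BY TRUNC(start_time)
ORDER BY usage_day;
```

If the daily figure regularly exceeds the one free hour earned per day of cluster usage, the overage is what shows up as concurrency scaling charges.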
5. Using the latest RA3 instances
One of the most welcome recent Amazon Redshift features is the introduction of RA3 instances. RA3 instances allow compute and storage to scale independently, which means pricing for compute and storage is also decoupled.
In terms of benefits, we saw up to a 200% improvement in performance in a benchmarking exercise we recently conducted comparing RA3 with ds2.xlarge. Since RA3 nodes pair a large, high-performance SSD cache with storage backed by S3, it is worth validating whether RA3 better fits your storage and compute needs.
6. Right-sizing
Currently, Redshift provides three flavors of instances:
- Dense Compute – ideal for heavy reporting with minimal storage demands
- Dense Storage – ideal for high storage demands
- RA3 – compute and managed storage that scale independently
Choosing the optimal size of production instances helps lower costs by 15-20%. It is always good to size based on workload characteristics: CPU, I/O, disk, and network requirements.
7. Resolving long-running queries
The last of these optimization strategies is the most complex of the lot because it touches the design and architecture of the existing system.
The most common issues organizations face today are long-running queries and the inability to achieve parallelism. The underlying factors that cause this are listed below (a query sketch for identifying the slowest queries follows the list):
- Data skew across nodes
- A high amount of I/O
- Data not being compressed optimally
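Before tuning anything, it helps to identify the worst offenders. Here is a minimal sketch that lists the slowest recent user queries from the STL_QUERY system table (note that STL tables retain only a few days of history):

```sql
-- Ten slowest user queries from the last 24 hours.
SELECT
    query,
    TRIM(database)                        AS db,
    DATEDIFF(second, starttime, endtime)  AS elapsed_seconds,
    SUBSTRING(querytxt, 1, 80)            AS query_snippet
FROM stl_query
WHERE userid > 1                                   -- skip internal system activity
  AND starttime > DATEADD(day, -1, GETDATE())
ORDER BY elapsed_seconds DESC
LIMIT 10;
```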
To resolve these issues, pay close attention to distribution keys, sort keys, and compression encoding:
- Use proper distribution keys to minimize data movement across nodes.
- AUTO distribution can be considered for tables with fewer than 5 million records.
- For optimal performance, choose columns used in joins and filters as the sort key.
- Be cautious with interleaved sort keys, as they add maintenance overhead.
A newer encoding type, AZ64, is also worth adopting. AZ64 is Amazon’s proprietary compression encoding algorithm, designed for high compression ratios and faster query processing. Using AZ64, we have seen close to 30% storage savings and a 50% increase in performance compared with the LZO and ZSTD encodings. The DDL sketch below shows how these pieces fit together.
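Putting these pieces together, here is a minimal, illustrative DDL sketch. The table and column names are hypothetical; the point is how a distribution key, a compound sort key, and AZ64 encoding are declared:

```sql
-- Hypothetical fact table combining a distribution key, a compound sort key, and AZ64 encoding.
CREATE TABLE sales_fact (
    sale_id      BIGINT        ENCODE az64,
    customer_id  BIGINT        ENCODE az64,   -- frequent join column, used as the distribution key
    sale_date    DATE          ENCODE az64,   -- frequent filter column, leading sort key
    amount       DECIMAL(12,2) ENCODE az64,
    notes        VARCHAR(256)  ENCODE lzo     -- AZ64 does not apply to CHAR/VARCHAR columns
)
DISTSTYLE KEY
DISTKEY (customer_id)
COMPOUND SORTKEY (sale_date, customer_id);
```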
Along with these best practices, Redshift materialized views can also help speed up long-running queries, as in the sketch that follows.
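For instance, a recurring aggregation used by dashboards can be precomputed once and refreshed on a schedule. A minimal sketch, reusing the hypothetical sales_fact table from above:

```sql
-- Precompute a commonly repeated aggregation so reports avoid re-scanning the fact table.
CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT sale_date,
       customer_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM sales_fact
GROUP BY sale_date, customer_id;

-- Refresh periodically (or rely on auto refresh where your cluster supports it).
REFRESH MATERIALIZED VIEW daily_sales_mv;
```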
Redshift Best Practices For Performance Optimization
Here are some best practices for Redshift performance optimization:
1. Choose The Correct Data Distribution Key And Style
In Amazon Redshift, selecting the optimal distribution style is essential for query performance. When you load data into a table, its rows are distributed across the compute nodes according to the chosen distribution style. The goal is simple: retrieve data quickly and avoid unnecessary redistribution steps.
Below are the four distribution styles used in Amazon Redshift (a short DDL sketch follows the list):
- Key Distribution: Rows that share the same value in the distribution key column are placed on the same node, so joins on that column avoid cross-node data movement. If you are optimizing queries that join large fact tables to major dimension tables, this is usually the best choice; if not, consider one of the other styles.
- Even Distribution: A straightforward approach in which the leader node spreads rows across compute nodes in a round-robin fashion. When a table is not involved in joins, this style is often preferable.
- All Distribution: A full copy of the table is kept on every compute node, which speeds up joins but occupies more storage on the cluster, so it is best reserved for small, slowly changing tables.
- Auto Distribution: The default style in Amazon Redshift; it evaluates the size of your tables and chooses an appropriate style for you. This approach scales well, because as your data grows over time, Redshift automatically reassesses and adjusts the style to maintain performance.
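Here is a minimal DDL sketch showing how each style is declared; the table and column names are purely illustrative:

```sql
-- KEY: co-locate rows that join on customer_id.
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_date  DATE
)
DISTSTYLE KEY
DISTKEY (customer_id);

-- EVEN: round-robin distribution for a table with no dominant join column.
CREATE TABLE web_events (
    event_id   BIGINT,
    event_time TIMESTAMP
)
DISTSTYLE EVEN;

-- ALL: replicate a small, rarely changing dimension table to every node.
CREATE TABLE country_dim (
    country_code CHAR(2),
    country_name VARCHAR(64)
)
DISTSTYLE ALL;

-- AUTO: let Redshift choose and adjust the style as the table grows.
CREATE TABLE staging_customers (
    customer_id BIGINT,
    full_name   VARCHAR(128)
)
DISTSTYLE AUTO;
```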
2. Select The Correct Column Compression
When working with Amazon Redshift, remember that each data block stores the values of a single column for multiple rows. Column compression is recommended because it saves storage and disk I/O. If the compression strategy is not set properly when a table is created, data may be stored uncompressed, inflating the table size. To correct this, search for columns that are not compressed and use an ALTER statement to change their encoding:
"ALTER TABLE tablename ALTER COLUMN column_name ENCODE new_encode_type"
3. Archive Unused Data
As data within the warehouse grows, managing older datasets effectively becomes important. You can build an archival framework that, during off-peak hours, unloads older datasets from Redshift into Amazon S3 and then deletes them from the cluster. The framework follows a three-step process (a sketch of the core SQL follows the steps):
Step 1: Analyze and gather information on the data that can be archived. Details such as the table name and filter condition are recorded in a config table (if necessary).
Step 2: Using the UNLOAD command, the relevant data is exported from Redshift to S3 via a generated SQL SELECT statement.
Step 3: The same filter condition is used to delete the archived data from the Redshift table, and VACUUM DELETE is then run to reclaim the storage space.
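A minimal sketch of the core SQL for a single table, using hypothetical table and bucket names, an illustrative filter condition, and a placeholder IAM role ARN:

```sql
-- Step 2: export the old rows to S3 as Parquet.
UNLOAD ('SELECT * FROM public.sales_fact WHERE sale_date < ''2020-01-01''')
TO 's3://my-archive-bucket/sales_fact/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-unload-role'   -- placeholder role
FORMAT AS PARQUET;

-- Step 3: remove the archived rows and reclaim the space.
DELETE FROM public.sales_fact WHERE sale_date < '2020-01-01';
VACUUM DELETE ONLY public.sales_fact;
```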
For added convenience, we suggest enabling Glue crawlers and creating an external schema in Redshift so that users can still query the archived datasets if they need them again.
4. Use Spectrum For Infrequently Used Data
Amazon Redshift Spectrum is an effective way to work with infrequently used datasets, such as third-party data that users rarely access, by leaving them in S3 and querying them in place. Spectrum charges for the data scanned on each request, so it is important to compress the data and store it in Parquet format. These are the steps to follow when using Spectrum (a sketch of the external schema setup follows the steps):
1. Compress the data and store it in S3 in Parquet format, with each table in its own folder.
2. Create a Data Catalog database with AWS Glue and schedule a crawler, pointing the crawl path at the S3 folders.
3. Create an external schema in Redshift that references the Data Catalog database, so the S3 data can be queried as external tables.
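A minimal sketch of step 3, assuming a Glue Data Catalog database named spectrum_db, a hypothetical external table crawled from S3, and a placeholder IAM role ARN:

```sql
-- Map the Glue Data Catalog database into Redshift as an external schema.
CREATE EXTERNAL SCHEMA spectrum_ext
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-spectrum-role'   -- placeholder role
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Query the crawled S3 data alongside local Redshift tables.
SELECT COUNT(*) FROM spectrum_ext.third_party_events;
```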
5. Take Frequent Action on Redshift Advisor
It is advisable to inspect the Redshift Advisor regularly for recommendations based on the performance and usage of your cluster. The Advisor offers suggestions such as adding distribution keys to improve query performance, running VACUUM SORT on tables, optimizing storage with compression encoding, and, where appropriate, initializing interleaved sort keys.
Bonus: Leverage Auto Management along with optimization strategies
Having discussed the optimization strategies, we have one additional tip: pairing them with Redshift’s Auto workload management (WLM) feature leads to an even more effective system. Redshift is gradually moving toward auto management, where machine learning manages your workload dynamically.
Auto WLM applies machine learning to manage memory and concurrency, helping maximize query throughput. Through WLM, Redshift manages memory and CPU utilization based on usage patterns. Auto WLM also lets you prioritize your queries, ensuring that high-priority queries execute first even as new queries are continually submitted.
To leverage auto management:
- Ensure that auto ANALYZE, auto table sort, and auto VACUUM DELETE are enabled.
- If you use interleaved sort keys, you must still run VACUUM REINDEX yourself, as in the sketch below.
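A minimal sketch for checking interleaved sort key skew and re-sorting, assuming the SVV_INTERLEAVED_COLUMNS system view is available and using a hypothetical table name:

```sql
-- How skewed the interleaved sort keys have become since the last reindex.
SELECT tbl, col, interleaved_skew, last_reindex
FROM svv_interleaved_columns
ORDER BY interleaved_skew DESC;

-- Re-sort and rebalance an interleaved table when skew gets high.
VACUUM REINDEX public.events_interleaved;
```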
If you manage your workloads manually instead, some best practices are as follows (a monitoring query sketch follows the list):
- Do not create more than four queues.
- Use query monitoring rules (QMR) to catch badly behaving queries.
- Keep total query concurrency across queues to 15 or fewer.
- Leave at least 5% of memory unallocated.
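To see whether your manual queue configuration is holding queries up, you can compare queue wait time with execution time per queue. A minimal sketch using the STL_WLM_QUERY system table (times are in microseconds; the service class numbering of user queues varies by configuration):

```sql
-- Average queue wait vs. execution time per WLM service class over the last day.
SELECT service_class,
       COUNT(*)                                     AS queries,
       ROUND(AVG(total_queue_time) / 1000000.0, 1)  AS avg_queue_seconds,
       ROUND(AVG(total_exec_time)  / 1000000.0, 1)  AS avg_exec_seconds
FROM stl_wlm_query
WHERE service_class > 5        -- user-defined queues usually sit above the system classes
  AND queue_start_time > DATEADD(day, -1, GETDATE())
GROUP BY service_class
ORDER BY avg_queue_seconds DESC;
```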
Conclusion
With an ongoing, well-thought-out optimization plan, an organization can and should keep its Redshift data warehouse performant and its costs down even as the warehouse evolves. By choosing Agilisium’s Redshift Optimization Program, organizations can take a holistic look at their Redshift workloads to identify challenges and address them thoroughly and rapidly through a phased approach, extracting more value from AWS Redshift in the process.
That’s not all: maintaining the balance between cost, agility, and performance is built into the program, allowing organizations to reap continual benefits.