Proven strategies to optimize Amazon Redshift for cost and performance
According to a survey conducted by TDWI in partnership with a cloud services company, one of the top reasons that companies migrate to cloud data warehousing is to take advantage of the flexible costs they offer. However, cost optimization is the most common challenge enterprises face as changing workloads affect the cost and performance of even the most well-built data warehouse. In this blog, let’s talk about proven optimization strategies that can help enterprises get the most out of their Amazon Redshift investment.
The Optimization Processes
Before we can look at the strategies, we need to look at the optimization process. In part one of this blog, we laid out how to design a well-architected Redshift ecosystem. The natural question that follows is ‘How’ to optimize it as an enterprise grows.
The foremost thing to remember is that optimization is a continuous process. For instance, today, the cluster can have optimal balance between costs and performance. As mentioned earlier, as the organization grows, there could be many factors that disrupt this balance, including but not limited to,
- Increase in data volume
- Change in the Business demand
- Technology evolution
Either way, regularly revisiting enterprise data warehouse set up is crucial to an optimized system.
Let us now look at some of the practices that can make your Redshift ecosystem more efficient and cost-optimized. The optimization strategies are discussed in terms of the complexity of effort involved vs. the impact on performance. While some of them are easy and straightforward, others require significant planning and may include architecture changes.
However, it is essential to remember that optimizing for cost alone is not the goal. A performant system with strategic cost optimizations is the ideal outcome.
1. Using Reserved instances
AWS allows us to purchase reserve nodes for steady-state workloads by choosing the duration of commitment between 1 to 3 years. There is also the option of choosing the cost mode, which can be
- No upfront
- Partial Upfront
- Or All upfront
The underlying AWS cost principle here simply is,“The more you reserve and the more you pay upfront, the higher the discounts.” Simply put, maximum discounts (up to 70 % compared to On-Demand instances) can be obtained when opting for 3-year reserved instances with all upfront payment.
This costing is ideal for enterprises with immense workloads. However, medium-scale establishments, or startups on the path of growth, typically start with on-demand instances. Depending on the duration of use and demand, such smaller companies can periodically revisit their cost strategy and make a gradual move to reserved instances for fixed workloads, while still on-demand instances for all other applications.
The other important point to note here is that reserved instances are ideal when used for always-on instances. Here’s a real-life use case we’ve seen,
One of our clients spent around $110K for on-demand DS2.8XLarge, with two node instances for the past eight months. We saved 25% of their cost merely by moving to reserved instances with one year of commitment and no upfront charges. It is that simple!
2. Pause and Resume
Pause and resume is another simple yet effective approach to optimization.
Consider this scenario – one of our clients was using on-demand DC2.Large, 24*7 as part of their development instance. When we found that the CPU utilization dropped to 10% during non-business hours /weekends or holidays compared to 60% during business hours, we suggested halting these resources when not in use. Just enabling the ‘pause and resume’ option helped them save cost by 50%.
What is happening here? Why does pausing an instance correspond to cost savings? When on-demand instances are paused, only storage charges are applicable! You do not have to pay for compute when no computation is happening. Organizations using on-demand instances exclusively will stand to gain the most by using this option.
3. Elastic Resize
Organizations prefer a data warehouse that is faster to scale and do not want to compromise between performance and concurrency. The perfect solution to that is elastic resize. Elastic resize allows you to scale a cluster up or down within minutes.
For example, when faced with a need to handle a sudden surge of data on the TB scale in one of our client workloads, we simply used the elastic resize option! The nodes were scaled up instantly to increase compute and storage without planning for capacity in advance.
Suppose organizations have currently planned their resources for their peak capacity of clusters by just enabling the elastic resize option. In that case, they can obtain significant cost savings, to the order of 15%.
However, you need to keep a few factors in mind while using the elastic resize option
- You cannot scale more than double the number of existing nodes using elastic resize.You will need to use the classic resize option.
- Scaling down of instances to the previous size of clusters or less than half the number of clusters may or may not be possible depending on the storage requirement.
4. Concurrency Scaling
Another interesting feature that impacts Redshift performance is the Concurrency Scaling, which is enabled at the workload management (WLM) queue level. The WLM allows users to manage priorities within workloads in a flexible manner. With the help of this feature, short, fast-running queries can be moved to the top of long-running queues.
The concurrency scaling feature automatically spins up transient clusters, serves the requests in the queue, and automatically scales down the clusters. You can support virtually unlimited concurrent users and queries, without compromising on query performance. What we need to know here is that Concurrency scaling is used only for reporting purposes.
For every 24 hours, the cluster is in use; the account is credited with 1 hour of free concurrency scaling by AWS. These free concurrency hours can be accumulated and used effectively during predictable workloads. On exceeding the accumulated hours, concurrency scaling is billed at per second cost of cluster price
5. Using the latest RA3 instances
One of the most welcome Amazon Redshift features in recent times has been the introduction of RA3 instances. RA3 instances allow us to scale compute and storage, independently, which means that the pricing is also loosely coupled with compute and storage.
In terms of benefits, we have seen up to 200% improvement in performance during the benchmarking exercise, which we recently conducted between RA3 vs. DS2.Xlarge. Since RA3 nodes are based on a large cache capacity, with high-performance SSD backed by S3, it is good to validate your instances to see if the RA3 can be a better fit for your storage and compute needs.
Currently, Redshift provides 3 flavors of instances,
- Dense Compute – ideal for high reporting and minimal storage demands
- Dense storage – used for high storage
- RA3 instances
Choosing the optimal size of production instances helps in lowering the cost between 15-20%. It is always good to start sizing based on workloads depending on CPU, I/O, disk, and network requirements.
7. Resolving long-running queries
The last optimization strategy is the most complex of the lot because it impacts the current design & architecture of the existing system.
The most common issue that organizations face today is long-running queries and inability to achieve parallelism. The underlying factors that cause this are:
- Data skew across nodes.
- A high amount of I/O
- Data not being compressed optimally
To resolve this, we need to look at the key strategies to improve performance and pay attention to distribution keys, sort keys, and encoding aspects.
- Use proper distribution keys, which help minimize data movement across nodes.
- AUTO distribution can be considered for tables with less than 5 Million records.
- For Optimal performance, choose columns used in Joins/filters as sort key.
- Be cautious in using Interleaved Sort Keys as it will add more overhead.
A new encoding type AZ64 has been included. AZ64 is Amazon’s proprietary compression encoding algorithm targets high compression ratios and better processing of queries. Using the AZ64, we see close to 30% storage benefits and a 50% increase in performance compared with LZO and ZSTD coding methods.
Along with these best practices, we can also look at the benefits of Redshift materialized views to improve long-running queries.
Bonus: Leverage Auto Management along with optimization strategies
Having discussed optimization strategies, we have an additional tip. Leveraging Redshift’s Auto workload management (WLM) feature in addition to the optimization strategies mentioned above leads to a more effective system. Redshift is gradually working towards Auto Management, where machine learning manages your workload dynamically.
Auto WLM involves applying machine learning techniques to manage memory and concurrency, thus helping maximize query throughput. Through WLM, Redshift manages memory and CPU utilization based on usage patterns. Auto WLM lets you prioritize your queries, ensuring that high-priority queries execute first as queries are continually submitted.
To leverage Auto Management,
- Ensure that AUTO ANALYZE, AUTO SORT, and AUTO VACUUM are enabled.
- If you are using interleaved sort keys, you need to run VACUUM REINDEX.
Nevertheless, if you choose to manage your workloads manually, some of the best practices are as follows.
- Do not create more than four queues
- Use QWR to monitor performance from bad queries
- Do not have more than 15 concurrent users
It is always a good practice to leave at least 5% memory unallocated.
With an ongoing, well thought out optimization plan, an organization should and can keep its Redshift DW performant and its costs down and even as the DW evolves. By choosing Agilisium’s Redshift Optimization Program, organizations can take a holistic look at their Redshift workloads to identify challenges and address them thoroughly and rapidly through a phased approach, extracting more value from AWS Redshift in the process.
That’s not all; maintaining the balance between cost, agility, and performance is built into the program allowing organizations to reap continual benefits.
Our in-house experts are proficient in cloud data warehousing, an ever-evolving technology space. We offer a comprehensive cost optimization package that has cut our clients’ Redshift costs in half! Facing a similar challenge? Please write to us.
The lack of a single source of truth, quality data and ad hoc manual reporting processes undermined top management’s visibility of integrated insights on sales, sales rep interactions, marketing reach, brand performance, market share, and territory management. Understandably, the client wanted to align information that has hitherto been in silos, to gain a 360-degree product movement view, to optimize sales planning and gain competitive edge.