Building a data warehouse on Redshift with the AWS Well-Architected Framework (WAF)
With thousands of customers, Redshift is the most widely adopted enterprise data warehouse (EDW). In the last 18 months alone, over 200 new features have been added to Redshift, helping it maintain an edge over its competitors in performance and cost predictability. As an AWS Advanced Consulting Partner for Data and Analytics, we have worked closely with the Redshift product team and have seen first-hand the wide range of advantages Redshift offers. However, our interactions with clients show that most organizations struggle to sustain the enthusiasm they had when they first migrated to Redshift.
The primary cause of this drop in enthusiasm lies with architects, who play a crucial role in designing enterprise solutions on the cloud. With limited time and bandwidth, the architect community struggles to keep up with the rapid pace of innovation: they have little opportunity to experiment, learn new features, or stay current with the latest best practices. Consequently, an organization's ability to extract maximum value from its existing Redshift investment is severely curtailed.
In this two-part blog series on Redshift optimization, we will cover: 1) the critical design considerations for building a data warehouse on Redshift that follows the AWS Well-Architected Framework (WAF), and 2) how organizations can optimize their EDW as business needs change rapidly, without compromising the WAF pillars.
Key Design Considerations for a Redshift EDW Built on the WAF
Whether you are building a completely new Redshift implementation or reviewing an existing setup, the design must incorporate the five pillars that AWS lays out in the Well-Architected Framework. Following the WAF helps build stable and efficient systems that organizations can rely on to stay agile and keep up with changing business needs. The five pillars, listed below, provide a consistent approach to evaluating and implementing designs that can scale with an organization's application needs over time.
- Cost Optimization
- Performance Efficiency
- Security
- Reliability
- Operational Excellence
Let us take a brief look at how an architect approaches an EDW’s design, taking each of these five pillars into consideration.
Under the Cost Optimization pillar, the key to eliminating or avoiding unnecessary costs is understanding how Redshift clusters consume resources. This means answering questions such as:
- How much cloud capacity is needed?
- Are you choosing the right size for the cluster?
- Is performance being compromised?
- Are the choices cost-optimal?
Understanding consumption is essential because it heavily influences how the Redshift environment is set up. Based on consumption and workload patterns, an architect may apply one or more of the following strategies:
- Right-sizing the cluster for optimal cost or speed rather than for peak workloads
- Applying pricing strategies such as reserved nodes for steady, predictable workloads
- Choosing the right node type, and resizing (changing node type or node count) as workloads change
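To make the reserved-versus-on-demand trade-off concrete, here is a minimal Python sketch of the break-even arithmetic. The hourly rates are hypothetical placeholders, not actual AWS prices, and 730 hours is used as an approximate month. The sketch assumes that reserved nodes are paid for every hour of the term whether or not the cluster is busy, while on-demand nodes are billed only for the hours the cluster runs (for example, a cluster that is paused outside business hours).

```python
# Minimal sketch: compare on-demand vs. reserved pricing for a Redshift cluster.
# All rates are HYPOTHETICAL placeholders; substitute current prices for your region.

HOURS_PER_MONTH = 730.0  # common approximation of hours in a month


def on_demand_monthly_cost(hourly_rate: float, nodes: int, hours_running: float) -> float:
    """On-demand nodes are billed only for the hours the cluster is running."""
    return hourly_rate * nodes * hours_running


def reserved_monthly_cost(effective_hourly_rate: float, nodes: int) -> float:
    """Reserved nodes are paid for every hour of the term, used or not."""
    return effective_hourly_rate * nodes * HOURS_PER_MONTH


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the month a cluster must run before reserving becomes cheaper."""
    return reserved_rate / on_demand_rate


def cheaper_option(on_demand_rate: float, reserved_rate: float,
                   nodes: int, hours_running: float) -> str:
    od = on_demand_monthly_cost(on_demand_rate, nodes, hours_running)
    rs = reserved_monthly_cost(reserved_rate, nodes)
    return "reserved" if rs < od else "on-demand"


if __name__ == "__main__":
    # Hypothetical: $1.00 per node-hour on demand vs. a $0.60 effective reserved rate.
    print(break_even_utilization(1.00, 0.60))  # 0.6: reserve if running > 60% of the month
    print(cheaper_option(1.00, 0.60, 4, 730))  # always-on cluster -> reserved
    print(cheaper_option(1.00, 0.60, 4, 200))  # mostly paused cluster -> on-demand
```

The same comparison is worth rerunning whenever workload patterns shift, because a cluster that was always on when it was reserved may later spend most of its time paused.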
An architect will also use benchmarking to determine optimal requirements and select the most cost-effective resources. Choosing appropriate instances and resources is the first step toward cost savings.
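Benchmarking pays off only when the results feed a clear decision rule. The Python sketch below shows one such rule. It assumes you have already run a representative query set against several candidate configurations. The rule keeps the configurations whose 95th-percentile runtime meets a target and then picks the cheapest of them. The configuration labels, runtimes, costs, and the choice of p95 as the metric are all illustrative assumptions.

```python
# Minimal sketch: pick the cheapest benchmarked configuration that meets a latency target.
# Configuration labels, runtimes, and costs below are HYPOTHETICAL.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BenchmarkResult:
    config: str            # hypothetical label, e.g. node type and node count
    p95_runtime_s: float   # 95th-percentile query runtime observed in the benchmark
    monthly_cost: float    # estimated monthly cost of running this configuration


def cheapest_meeting_sla(results: Iterable[BenchmarkResult],
                         sla_p95_s: float) -> Optional[BenchmarkResult]:
    """Return the lowest-cost configuration whose p95 runtime is within the SLA, or None."""
    candidates = [r for r in results if r.p95_runtime_s <= sla_p95_s]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.monthly_cost)


if __name__ == "__main__":
    results = [
        BenchmarkResult("config-small", p95_runtime_s=40.0, monthly_cost=1000.0),
        BenchmarkResult("config-medium", p95_runtime_s=25.0, monthly_cost=2500.0),
        BenchmarkResult("config-large", p95_runtime_s=18.0, monthly_cost=4000.0),
    ]
    best = cheapest_meeting_sla(results, sla_p95_s=30.0)
    print(best.config if best else "no configuration meets the SLA")
```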
The Performance Efficiency pillar provides guidance on measuring workload performance. When evaluating performance for a Redshift EDW, an architect tries to understand the running workloads and answer questions such as:
- How many workloads are run in parallel?
- How efficiently are they using the cluster?
- Is the data and workload distribution optimal?
- How is memory being utilized?
- Are there any unutilized or under-utilized resources?
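You can answer the question of whether data distribution is optimal with numbers. Redshift's SVV_TABLE_INFO system view reports a skew_rows column. It is the ratio of rows on the slice with the most rows to rows on the slice with the fewest, and a high value means some slices do far more work than others. The Python sketch below computes the same ratio from per-slice row counts you have already collected (how you collect them is out of scope) and flags tables above a threshold. The 4.0 threshold and the table names are hypothetical.

```python
# Minimal sketch: detect row-distribution skew from per-slice row counts.
# The ratio mirrors skew_rows in Redshift's SVV_TABLE_INFO view
# (rows on the fullest slice / rows on the emptiest slice).
# The threshold is a HYPOTHETICAL choice.
from typing import Dict, List, Sequence


def skew_ratio(rows_per_slice: Sequence[int]) -> float:
    if not rows_per_slice:
        raise ValueError("need at least one slice")
    fullest, emptiest = max(rows_per_slice), min(rows_per_slice)
    if emptiest == 0:
        # An empty slice next to a populated one is maximal skew; an all-empty table has none.
        return float("inf") if fullest > 0 else 1.0
    return fullest / emptiest


def skewed_tables(slices_by_table: Dict[str, Sequence[int]],
                  threshold: float = 4.0) -> List[str]:
    """Return the sorted names of tables whose skew ratio exceeds the threshold."""
    return sorted(t for t, counts in slices_by_table.items() if skew_ratio(counts) > threshold)


if __name__ == "__main__":
    sample = {
        "sales": [900, 100, 100, 100],   # hypothetical: poorly chosen distribution key
        "events": [250, 240, 260, 250],  # hypothetical: evenly distributed
    }
    print(skewed_tables(sample))  # ['sales']
```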
The overall idea is to measure the workload's performance, then optimize resources and scale them based on demand. As demand changes, regular performance reviews can unearth issues that need attention; we cover this in depth in our next blog. The focus is on a data-driven approach that makes acceptable trade-offs to arrive at a high-performing architecture.
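Scaling based on demand needs an explicit rule for what counts as too much or too little capacity. Below is a minimal Python sketch that turns two measurements, average CPU utilization and average queue wait time, into a recommendation. The metrics, the thresholds, and the reduction to three outcomes are assumptions for illustration. In Redshift, acting on a "scale up" or "scale down" result might mean an elastic resize or concurrency scaling, which this sketch does not invoke.

```python
# Minimal sketch: a threshold rule for deciding when to review cluster capacity.
# Metric choices and thresholds are HYPOTHETICAL; tune them from your own baselines.
from typing import Sequence


def scaling_decision(cpu_percent_samples: Sequence[float],
                     avg_queue_wait_s: float,
                     scale_up_cpu: float = 80.0,
                     scale_down_cpu: float = 30.0,
                     max_queue_wait_s: float = 60.0) -> str:
    if not cpu_percent_samples:
        raise ValueError("need at least one CPU sample")
    avg_cpu = sum(cpu_percent_samples) / len(cpu_percent_samples)
    if avg_cpu >= scale_up_cpu or avg_queue_wait_s > max_queue_wait_s:
        return "scale up"    # sustained pressure, or queries waiting too long in queues
    if avg_cpu <= scale_down_cpu and avg_queue_wait_s == 0:
        return "scale down"  # capacity is sitting idle and nothing is queued
    return "hold"
```

Keeping separate up and down thresholds leaves a "hold" band between them, so a cluster hovering near one boundary does not flip-flop between recommendations.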
The Security pillar aims to protect information, systems, and assets in the cloud. From creating a strong identity foundation and enabling traceability to applying security at every layer, the aim is to protect data both in transit and at rest.
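Protecting data in transit and at rest is easiest to enforce when the checks are automated. The Python sketch below audits three settings: whether the cluster is encrypted at rest, whether it is publicly accessible, and whether the require_ssl parameter forces encrypted client connections. It assumes the settings have already been fetched (for example, from the DescribeClusters API and the cluster's parameter group) into plain dictionaries. The field names follow those sources, but the dictionary shapes are this sketch's assumption.

```python
# Minimal sketch: audit encryption and network-exposure settings for a cluster.
# ASSUMPTIONS: `cluster` mirrors the Encrypted and PubliclyAccessible fields of a
# Redshift DescribeClusters response; `parameters` is a plain dict mapping
# parameter-group names to string values (a flattening this sketch assumes).
from typing import Dict, List


def audit_cluster(cluster: Dict[str, object], parameters: Dict[str, str]) -> List[str]:
    findings = []
    if not cluster.get("Encrypted", False):
        findings.append("data at rest is not encrypted")
    if cluster.get("PubliclyAccessible", False):
        findings.append("cluster is publicly accessible")
    if str(parameters.get("require_ssl", "false")).lower() != "true":
        findings.append("client connections are not required to use SSL")
    return findings
```

Running a check like this on a schedule turns a one-time design decision into something that is continuously verified.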
A critical security consideration is ensuring that an organization's network and data security design is well thought out and based on its policies and performance needs. Organizations must be able to control who can do what, prevent security incidents, and detect and respond immediately to security breaches. This requires a well-defined process that maintains confidentiality and complies with regulatory requirements.
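One way to make "who can do what" reviewable is to keep access rules as data under version control and generate grants from them. The Python sketch below uses hypothetical roles and schemas. It denies anything not explicitly granted and renders GRANT statements in the role-based form Redshift supports (GRANT ... ON ALL TABLES IN SCHEMA ... TO ROLE ...). Treat the generated SQL as a starting point to check against the Redshift GRANT documentation, not as ready-to-run output.

```python
# Minimal sketch: deny-by-default access rules kept as data, so they can be reviewed
# and versioned. Role and schema names are HYPOTHETICAL.
from typing import Dict, List, Set, Tuple

Grant = Tuple[str, str]  # (privilege, schema)

ROLE_GRANTS: Dict[str, Set[Grant]] = {
    "analyst": {("SELECT", "sales_mart")},
    "etl_service": {("SELECT", "staging"), ("INSERT", "sales_mart")},
}


def is_allowed(role: str, privilege: str, schema: str) -> bool:
    """Anything not explicitly granted is denied, including unknown roles."""
    return (privilege.upper(), schema) in ROLE_GRANTS.get(role, set())


def grant_statements(role: str) -> List[str]:
    """Render the role's grants as SQL; verify the syntax against the Redshift GRANT docs."""
    return [
        f"GRANT {priv} ON ALL TABLES IN SCHEMA {schema} TO ROLE {role};"
        for priv, schema in sorted(ROLE_GRANTS.get(role, set()))
    ]
```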
An architect also takes strategic advantage of the AWS Shared Responsibility Model, under which AWS is responsible for security of the cloud and the organization is responsible for security in the cloud. Another critical security component is continuous monitoring and auditing of the cloud deployment.
Business continuity is the key issue addressed by the Reliability pillar. When designing a Redshift architecture for reliability, an architect ensures that the following considerations are addressed:
- Is the Redshift ecosystem resilient to both internal and external failures?
- Is the solution highly available?
- Are relevant disaster recovery and backup procedures in place?
- Are the recovery measures tested?
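Backups matter only if they are recent enough and restores actually work. The Python sketch below encodes two of the questions above as checks. First, is the newest snapshot within a recovery point objective (RPO)? Second, has a restore been rehearsed recently? It assumes snapshot timestamps and the date of the last restore drill are gathered elsewhere and passed in. The 8-hour RPO and 90-day test interval are illustrative defaults, not recommendations.

```python
# Minimal sketch: check that backups meet a recovery point objective (RPO) and that
# restores are rehearsed. The RPO and test-interval defaults are HYPOTHETICAL.
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def backup_findings(snapshot_times: Sequence[datetime],
                    last_restore_test: Optional[datetime],
                    now: datetime,
                    rpo: timedelta = timedelta(hours=8),
                    restore_test_interval: timedelta = timedelta(days=90)) -> List[str]:
    findings = []
    if not snapshot_times:
        findings.append("no snapshots found")
    elif now - max(snapshot_times) > rpo:
        findings.append("latest snapshot is older than the RPO")
    if last_restore_test is None or now - last_restore_test > restore_test_interval:
        findings.append("restore has not been tested within the required interval")
    return findings
```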
Any ecosystem must be able to recover from infrastructure or service disruptions. A well-architected Redshift EDW is designed to detect most failures and heal itself automatically, creating a reliable ecosystem.
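At the application layer, self-healing often starts with retrying transient failures instead of failing a whole pipeline. The Python sketch below retries an operation with capped exponential backoff and full jitter. It is a generic pattern, not a Redshift API. TransientError is a hypothetical marker class, and you decide which driver errors are safe to retry. Retry only idempotent operations this way.

```python
# Minimal sketch: retry a transient failure with capped exponential backoff and full jitter.
# TransientError is a HYPOTHETICAL marker; map your driver's retryable errors onto it.
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised for failures that are worth retrying, e.g. a dropped connection."""


def with_retries(operation: Callable[[], T],
                 max_attempts: int = 5,
                 base_delay_s: float = 0.5,
                 max_delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error to monitoring and alerting
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # full jitter spreads out simultaneous retries
    raise AssertionError("unreachable: the loop always returns or raises")
```

The `sleep` parameter is injectable so that tests can skip the real delays.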
The Operational Excellence pillar of the Well-Architected Framework focuses on running and managing an organization's ecosystem continuously and effectively. An architect incorporates several techniques, processes, and strategies to achieve this, from performing operations as code and making small, frequent, reversible changes to refining procedures regularly. The aim is for all processes to run smoothly.
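Small, frequent, reversible changes are easier to enforce when the pipeline refuses anything that cannot be rolled back. The Python sketch below shows a check that a CI/CD job could run over schema migrations before deployment. Versions must be unique and in ascending order, and every migration must carry a rollback script. The Migration structure is hypothetical; most migration tools have their own equivalent.

```python
# Minimal sketch: a CI check that every schema change ships with a rollback.
# The Migration structure is HYPOTHETICAL; adapt it to your migration tool.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Migration:
    version: int
    up_sql: str    # applies the change
    down_sql: str  # reverses it


def validate_migrations(migrations: Sequence[Migration]) -> List[str]:
    problems = []
    versions = [m.version for m in migrations]
    if versions != sorted(set(versions)):
        problems.append("versions must be unique and in ascending order")
    for m in migrations:
        if not m.down_sql.strip():
            problems.append(f"migration {m.version} has no rollback")
    return problems
```

In a pipeline, a non-empty result would fail the build, for example by exiting with a non-zero status.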
Manual processes cannot keep up with today's pace of change. Hence, an architect relies on modern, automated practices such as DevOps and CI/CD, and evolves them as requirements change. Centralized monitoring and logging are built into the ecosystem to help analyze failures and errors and to provide the necessary recommendations. Learning from past performance, validating procedures, and anticipating failure are of prime importance in implementing a successful process.
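Centralized logging works best when every job emits logs in a consistent, machine-readable shape that a log aggregator can parse. The Python sketch below uses the standard logging module to emit one JSON object per line. The field names are an assumption, and shipping the output to a central store is left to whatever collector you already use.

```python
# Minimal sketch: structured (JSON-lines) logging with Python's standard library.
# Field names are an ASSUMPTION; align them with your log aggregator's schema.
import json
import logging
from typing import IO, Optional


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_json_logger(name: str, stream: Optional[IO[str]] = None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = make_json_logger("etl.sales_load")
    log.info("loaded %d rows into %s", 1200, "sales_mart.orders")
```

Calling `make_json_logger` twice with the same name attaches a second handler. A real setup would configure logging once at process startup.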
The first step for an organization to utilize Redshift to its fullest is to set up an ecosystem that is stable, agile, and robust by following the AWS Well-Architected Framework. Architects spend considerable time and effort weighing the five WAF pillars to achieve this. However, even the most well-architected EDW can suffer from rising costs and performance issues as it evolves to keep up with changing business needs. In our next blog, we will cover strategies to bring Redshift costs down while maintaining optimal performance, even as business requirements change.