Data warehouses are a significant hub in the data-to-insight journey, and frequently they are the last stop before downstream BI tools gather data for insights. However, with rapidly changing business needs, data warehouse workloads undergo continual tweaks in a reactive manner, and eventually, the performance takes a hit. Therefore, routine optimization is an important step to maintain a fairly performant data warehouse system. One of our Enterprise Clients, a Global Pharmaceutical giant whose business-critical Redshift workload was experiencing performance issues, approached Agilisium for a performance tuning exercise.
The performance of the Client’s 900 GB Redshift workload has been below expected levels, and they commissioned Agilisium to assess the workload and come up with recommendations. We conducted a comprehensive assessment of their Redshift workload against all 5 pillars of the AWS Well-Architected Framework (WAF) in under 3 days. This was made possible using our homegrown assessment toolkit – Agilisium’s Redshift Inspector.
On completion of the assessment, a Findings and Recommendations report with our observations/findings categorized by criticality, along with relevant recommendations was shared.
Given below are key recommendations from the Findings and Recommendations report, aligned towards the 5 pillars of the AWS WAF:
- Security: Though the VPC subnet was enabled for IP restriction, the Redshift Cluster was publicly accessible, and S3 VPC endpoint was not enabled.
- Almost always, Redshift Cluster should be set up in a private subnet. Even when configured consciously in public subnet, cluster exposure should be limited through security groups and ingress rules
- Default endpoint port 5439 needs to be leveraged for additional layer defense
- S3 VPC endpoint should be enabled to ensure controlled access to S3 buckets, objects and API functions that are in the same region as the VPC
- Cost: The client was running 12 DC2 instances on-demand.
- Our experts recommended that the client start running their workloads on reserved instances save between 50-70% of their on-demand instance cost.
- Performance: The assessment found tables with high skew values, 537 large-width columns, and unsorted percentage for 32 tables was more than 80%. In addition, Concurrency scaling was not enabled.
- Reduce data skew values to under 4, by revisiting SORT & DIST KEYS for selected tables
- Leverage time Series tables when data is in TB scale and split into a multiple-year format and leverage materialized views to combine data from all underlying tables
- Large width columns will lead to data spill to disk, avoid larger width columns to manage the memory usage effectively
- To minimize the broadcasting of the data, ensure colocation of columns used in Joins by choosing right SORT & DIST KEY
- Ensure AUTO SORT is enabled
- Cost:The client was running a 12 node DC2.large instance.
- The Client was already using Reserved Instances. Implementing performance optimization recommendations would lead to downsizing the cluster size by 1 node among other indirect cost savings. The total savings was estimated to be 10%
- Reliability:9 users had superuser access, and the 190 users had read only access
- Limit SUPERUSER access to two users and group the remaining users for enhanced reliability and resiliency
- Operational excellence: : DevOps processes for cluster management were not implemented and out-of-the-box auto management Redshift features were not leveraged.
- Enable Audit logging to track usage and make troubleshooting easier
- Automate Cluster management through Cloudformation or equivalents
- Setup auto management of workload to effectively sort data, gather statistics and reclaim deleted space
To fulfill SocialHi’5 need for a client self-service portal that was also easy to maintain, Agilisium’s 5-member expert team built a custom web application with a heavy focus on the visualization of campaign outcomes. They also developed in parallel a DevOps process to maintain, scale and operate this portal.
Web Application Architecture
A variety of AWS services and some open source technologies were used to build and run the web application. The web layer used the PHP framework, included a login and authentication system, and used AWS QuickSight to render its outcome dashboards.
The app layer was built on Python, and the backend services were run on Elastic Container Service (ECS) dockers with Auto Scaling and Auto Load Balancing (ALB) to ensure high availability of the portal. The database was run in a private subnet and used RDS MySQL as the database service.
As mentioned earlier, SocialHi5 necessitated that the solution offered was easy to maintain, scale, and operate. To that end, Agilisium’s DevOps engineers developed a 2-part DevOps process focusing on
- CI/CD for web application development
- Infrastructure Provisioning for maintenance.
Continuous Integration/Continuous Deployment (CI/CD Process)
All application (Web & App Tier) maintenance was articulated via AWS’s Code Pipeline. AWS’s Code Commit, Code Deploy, and Code Build services were invoked to automate the enhancement and maintenance of the self-service portal.
CI/CD Process Flow: Web Tier
CI/CD Process Flow: Web Tier
All infrastructure was hosted on an exclusive SocialHi5 Virtual Private Cloud (VPC), to add an extra layer of confidentiality. AWS CloudFormation templates were used to spin up and maintain a host of AWS services utilized for the self-service portal.
Serverless Web application hosting: EC2, ECS, RDS, S3, SSM, VPC, NAT Gateway, ALB with Autoscaling Group, LAMBDA, Certificate Manager, Route53 were some of the services used to get the portal live.
Security: Web Application Firewall (WAF) was used with Cross-site scripting, Geo match, and SQL injection rules to protect from common cyber threats in conjunction with the AWS inspector service.
Monitoring and Logging: CloudWatch, OpsWorks, Config & Inspector services were also invoked to cover configuration management, logging, and monitoring of the application and infrastructure.
- 50% faster Redshift queries
- 10% lower Redshift running cost
- Highly resilient and secured Redshift workload aligned with AWS best practices.