Case Study
Microsoft SQL Server analytics workload migration to AWS for a leading retail company

Client Scenario

Our client is a privately held company that offers a range of high-quality products and services to consumers. They grow, harvest, bottle and market a diverse range of products including fruits, nuts, flowers, water, wines and juices.

The customer is currently running their Data & Analytics workloads on-premises. They are exploring AWS for future Data & Analytics and Data Science needs.

The long-term goal is to build a Data lake and implement Data & Analytics and Data Science services on AWS Cloud.

The near-term goal is to export data from the on-premises MS SQL Server to AWS Cloud to build analytics and reporting.

Challenges

  • Almost always, the Redshift cluster should be set up in a private subnet. Even when it is deliberately configured in a public subnet, cluster exposure should be limited through security groups and ingress rules
  • Changing the default endpoint port (5439) provides an additional layer of defense
  • An S3 VPC endpoint should be enabled to ensure controlled access to S3 buckets, objects, and API functions in the same region as the VPC
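These exposure checks can be automated. Below is a minimal sketch, assuming a client object that follows the shape of boto3's Redshift `describe_clusters` response (the cluster identifier is hypothetical); it flags a cluster that violates the first two points:

```python
# Sketch: audit a Redshift cluster's network exposure. The client object is
# expected to mirror boto3's Redshift API; a stub works equally well for tests.

def audit_cluster_exposure(redshift_client, cluster_id):
    """Return a list of findings for a cluster that is exposed too broadly."""
    cluster = redshift_client.describe_clusters(
        ClusterIdentifier=cluster_id)["Clusters"][0]
    findings = []
    if cluster.get("PubliclyAccessible"):
        findings.append("cluster is publicly accessible")
    # An empty VPC security group list means no ingress restriction is applied.
    if not cluster.get("VpcSecurityGroups"):
        findings.append("no VPC security groups attached")
    return findings
```

Run periodically (for example from a scheduled Lambda), such a check keeps a misconfigured cluster from going unnoticed.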

Solution Considerations

  • Data migration from On-Prem to Cloud
  • Target was to move the data into Amazon Redshift
  • DMS was considered, with an intermediate transfer to S3, but the handshake between MS SQL Server and S3, and between S3 and Redshift, was not seamless
  • AWS Glue was considered, but the speed of data processing was a concern
  • Kinesis Data Firehose was considered, but was expensive because its pricing is based on 5 KB blocks of data

To fulfill SocialHi5’s need for a client self-service portal that was also easy to maintain, Agilisium’s five-member expert team built a custom web application with a heavy focus on the visualization of campaign outcomes. In parallel, they developed a DevOps process to maintain, scale, and operate this portal.

Web Application Architecture


A variety of AWS services and some open-source technologies were used to build and run the web application. The web layer used a PHP framework, included a login and authentication system, and used Amazon QuickSight to render its outcome dashboards.

The app layer was built on Python, and the backend services ran as Docker containers on Elastic Container Service (ECS) with Auto Scaling and an Application Load Balancer (ALB) to ensure high availability of the portal. The database ran in a private subnet on RDS MySQL.

DevOps Process:

As mentioned earlier, SocialHi5 required that the solution be easy to maintain, scale, and operate. To that end, Agilisium’s DevOps engineers developed a two-part DevOps process focusing on

  • CI/CD for web application development
  • Infrastructure provisioning for maintenance

Continuous Integration/Continuous Deployment (CI/CD Process)

All application (Web & App Tier) maintenance was orchestrated via AWS CodePipeline. AWS CodeCommit, CodeDeploy, and CodeBuild were invoked to automate the enhancement and maintenance of the self-service portal.

CI/CD Process Flow: Web Tier


CI/CD Process Flow: App Tier


Infrastructure provisioning

All infrastructure was hosted in an exclusive SocialHi5 Virtual Private Cloud (VPC) to add an extra layer of confidentiality. AWS CloudFormation templates were used to spin up and maintain the host of AWS services utilized for the self-service portal.

Web application hosting: EC2, ECS, RDS, S3, SSM, VPC, NAT Gateway, ALB with an Auto Scaling group, Lambda, Certificate Manager, and Route 53 were some of the services used to get the portal live.

Security: Web Application Firewall (WAF) was used with cross-site scripting, geo-match, and SQL injection rules to protect against common cyber threats, in conjunction with the Amazon Inspector service.

Monitoring and logging: CloudWatch, OpsWorks, Config, and Inspector services were also invoked to cover configuration management, logging, and monitoring of the application and infrastructure.


Solution Highlights

Data Migration
  • A Windows Server instance was set up on Amazon EC2
  • S3 Sync was used to move files from on-premises to the Windows Server
  • Amazon S3 was set up to receive the data transferred from the Windows Server
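The S3 Sync step decides what to copy by comparing local files against what already exists in the bucket. A minimal sketch of that decision follows; using file size as the sole comparison key is a simplification (the real `aws s3 sync` also compares timestamps):

```python
def files_to_upload(local_files, remote_files):
    """Return paths that are missing from the bucket or whose size changed.

    Both arguments are dicts of {relative_path: size_in_bytes}.
    """
    return sorted(
        path for path, size in local_files.items()
        if remote_files.get(path) != size
    )
```

Only the returned paths need to be transferred, which is what makes repeated syncs of a large dataset cheap.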
Monitoring & Logging
  • AWS Systems Manager is set up as the configuration management server
  • Patching of servers is handled by AWS Systems Manager
  • Amazon CloudWatch metrics are enabled to track the health of solution components
  • Logging is enabled for the AWS Lambda functions to measure latency
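Latency measurement of this kind needs very little code: inside Lambda, anything printed to stdout lands in CloudWatch Logs automatically, so a small decorator is enough (the log line format here is an assumption, not the project's actual format):

```python
import time

def timed(fn):
    """Measure a function's latency and print it; in Lambda, stdout is
    shipped to CloudWatch Logs automatically."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        latency_ms = (time.perf_counter() - start) * 1000.0
        print(f"latency_ms={latency_ms:.1f}")
        return result
    return wrapper
```

Applied as `@timed` on a handler, the printed values can then be extracted with a CloudWatch Logs metric filter.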
Deployment Automation
  • AWS CloudFormation is used to deploy to all environments (Dev, QA, Integration, Production)
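One template, four environments: a hedged sketch of how per-environment stack names and parameters might be derived before calling CloudFormation's `create_stack`. The application name, parameter keys, and sizing policy below are all assumptions for illustration:

```python
ENVIRONMENTS = ["dev", "qa", "integration", "production"]

def stack_name(env, app="analytics"):
    """Deterministic stack name per environment (app name is hypothetical)."""
    return f"{app}-{env}"

def stack_parameters(env):
    """CloudFormation-style parameter list; giving only production the
    larger instance size is an assumed sizing policy."""
    size = "large" if env == "production" else "small"
    return [
        {"ParameterKey": "Environment", "ParameterValue": env},
        {"ParameterKey": "InstanceSize", "ParameterValue": size},
    ]
```

Deriving names and parameters from a single list keeps the four environments from drifting apart, which is the point of templated deployment.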


Data processing
  • An AWS Lambda function was implemented to convert the CSV files stored in S3 to Parquet
  • AWS Glue was implemented to process the converted Parquet files and make them readable through Amazon Redshift Spectrum
  • Redshift Spectrum was enabled to help with ad-hoc queries, and Power BI reports were built to view snapshot data for trend analysis
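The conversion step might look like the sketch below. The bucket layout, the `parquet/` key prefix, and the presence of a pandas/pyarrow Lambda layer are all assumptions; only the key-mapping helper runs without AWS:

```python
import io

def parquet_key(csv_key):
    """Map an incoming CSV object key to its Parquet destination key
    (the "parquet/" prefix is a hypothetical layout choice)."""
    base = csv_key.rsplit(".", 1)[0]
    return f"parquet/{base}.parquet"

def handler(event, context, s3_client=None):
    """S3-triggered Lambda: read each CSV object, rewrite it as Parquet."""
    import pandas as pd  # assumed to be provided via a Lambda layer
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
        frame = pd.read_csv(body)
        buffer = io.BytesIO()
        frame.to_parquet(buffer, index=False)  # pyarrow does the encoding
        s3_client.put_object(Bucket=bucket, Key=parquet_key(key),
                             Body=buffer.getvalue())
```

Writing Parquet under a separate prefix lets the Glue crawler and Redshift Spectrum external table point at columnar data only, never at the raw CSVs.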
Security
  • IAM best practices and principles are followed
  • Least-privilege access is granted
  • Unique non-root credentials are provided
  • Programmatic access is used for API calls
  • Security groups are defined to restrict traffic
  • All data stores are in private subnets
  • Amazon KMS is used to encrypt data at rest
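Encryption at rest with KMS is applied at write time. A minimal sketch, assuming boto3's `put_object` parameters (the key alias is hypothetical), of forcing every S3 write through SSE-KMS:

```python
def put_encrypted(s3_client, bucket, key, body, kms_key="alias/analytics-data"):
    """Write an object with server-side encryption under a KMS key.
    The default alias is a placeholder, not the project's real key."""
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=kms_key,
    )
```

Funneling writes through one helper makes the encryption settings a code-review item rather than something each caller must remember.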

Solution Architecture

Results and Benefits
  • With all the data now available on AWS, the client can move further toward their long-term goal of building Data & Analytics and Data Science services
  • Scalability and elasticity are built into the solution on AWS Cloud
  • With the pay-as-you-go model, the Total Cost of Ownership of the solution is significantly reduced
  • Optimal performance with Lambda:
    • A 180 MB file is processed in 25 seconds
    • 3,273,300 records of 60-70 bytes each
    • Close to 1.4 billion records processed per day
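A quick arithmetic check ties these figures together; the implied file count per day is an inference from the reported numbers, not a reported number itself:

```python
records_per_file = 3_273_300
seconds_per_file = 25

# Per-invocation throughput: 3,273,300 / 25 = 130,932 records per second.
records_per_second = records_per_file // seconds_per_file

# Roughly how many such files per day would yield ~1.4 billion records.
files_per_day = round(1.4e9 / records_per_file)
```

So a single Lambda invocation sustains about 131 thousand records per second, and on the order of 428 files per day accounts for the daily total.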