Case Study
DevOps-Enabled Serverless Backups for a Multinational Networking Company


Our client is a multinational manufacturer of software-defined networking products, such as routers, switches, and network management and IT security products. Its marketing team uses a cloud-based analytics application, built on Hortonworks Data Platform (HDP), to create predictive models for sales forecasting.

The client’s marketing data, i.e., web traffic and user behaviour information, is collected from multiple sources and stored in a data lake on a big data technology stack consisting of HDFS, Spark and Hive. The underlying HDP provides the cluster storage.

The client’s HDP cluster is hosted on Amazon EC2 instances, and its individual nodes are attached to EBS (Elastic Block Store) volumes for storage. The data is then stored permanently in Amazon S3 buckets.

The Challenge

The existing analytics application faced several challenges that impacted its normal operation:

HDP Cluster Deployment

Setting up the server infrastructure for the clusters was manual and time-consuming, and server migration was equally slow. Infrastructure setup and migration took two people nearly 4-5 days to complete.

Backup and Server Administration

  • Regular backup operations on the EC2 servers could not be carried out because of AWS-defined threshold limits (up to 10,000 backup files) on AMIs (Amazon Machine Images) and EBS volumes.
  • The daily scheduled backups consumed a large amount of storage (40-50 EBS volumes), and resources were heavily used for storing the client’s marketing data (7-8 nodes per HDP cluster, with 5-6 EBS volumes per node).
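
The figures in this list can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: it assumes one backup file per EBS volume per daily run and no clean-up at all, neither of which the case study states, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the backup figures quoted above.
# Assumptions (not stated in the case study): one backup file per EBS
# volume per daily run, and nothing is ever cleaned up.

BACKUP_FILE_LIMIT = 10_000  # threshold quoted in the case study


def volumes_per_cluster(nodes: int, volumes_per_node: int) -> int:
    """Total EBS volumes attached to one HDP cluster."""
    return nodes * volumes_per_node


def days_until_limit(volumes_per_run: int, limit: int = BACKUP_FILE_LIMIT) -> int:
    """Number of whole daily runs that fit under the limit."""
    return limit // volumes_per_run


if __name__ == "__main__":
    low = volumes_per_cluster(7, 5)
    high = volumes_per_cluster(8, 6)
    print(f"Volumes per cluster: {low}-{high}")
    print(f"Days until limit: {days_until_limit(50)}-{days_until_limit(40)}")
```

Under these assumptions, a single cluster has 35 to 48 volumes, roughly in line with the 40-50 quoted, and daily backups would reach the 10,000 limit after about 200 to 250 days.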

Cost Escalations

The costs of the EBS volumes were rising significantly because of higher space consumption and regular backup activity, with monthly costs of up to $20,000 for the backup storage services (AWS EC2 and EBS).

Our Solution

To manage clusters and backups cost-effectively, Agilisium implemented a Big Data DevOps solution that supports the client’s infrastructure setup, application deployment and migration process.

Key Services Used

  • AWS Lambda functions were used to a) trigger scheduled backup operations, b) create Amazon Machine Images (AMIs) and c) automate the clean-up of obsolete EBS and RDS (Relational Database Service) snapshots.
  • Amazon CloudWatch was used for authorized login processes, monitoring and management systems.
  • Daily scheduled data backups were carried out from HDFS to Amazon S3.
  • Ambari APIs were invoked from Ansible to automate the HDP configuration, setup and installation.
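
To give a feel for what an Ambari API call looks like, here is a minimal sketch of the request that moves one HDP service to a target state. The case study does not say which Ambari calls were made, and the original automation ran from Ansible (for example through its uri module), so Python is used here only to show the shape of the request. The host name, cluster name and function name are hypothetical.

```python
import json

# Hypothetical values for illustration; none of them come from the case study.
AMBARI_URL = "http://ambari.example.internal:8080"
CLUSTER = "marketing_hdp"


def build_service_state_request(service: str, state: str) -> dict:
    """Describe the Ambari REST call that moves a service to `state`.

    Nothing is sent here; the caller (an Ansible task, a script, ...)
    would issue the request with its own HTTP client and credentials.
    """
    body = {
        "RequestInfo": {"context": f"Set {service} to {state}"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    return {
        "method": "PUT",
        "url": f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}/services/{service}",
        # Ambari expects this header on requests that modify state.
        "headers": {"X-Requested-By": "ambari"},
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    request = build_service_state_request("HDFS", "STARTED")
    print(request["method"], request["url"])
    print(request["body"])
```

Returning a plain description rather than sending the request keeps the sketch free of network access and credentials, which would be supplied by whatever tool actually runs it.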

HDP Cluster Deployment

Ansible was used to manage the AWS cloud environment and key services such as EC2 and S3, to support seamless scaling of the client’s application instances, and to automate cluster deployments and updates to the HDP components and services. HDP cluster deployment time fell substantially, from 5 days to 1 day, which in turn decreased the overall Time to Market (TTM) by 80%.

Backup and Server Administration

  • AWS Lambda functions were implemented to schedule and execute backups of the EC2 instances and AMIs in use.
  • These functions also cleared backups older than 1 year, which reduced backup failures.
  • Automated checks against the threshold limits were run on the EBS and AMI backups.
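
The retention and threshold logic described in this list could look roughly like the sketch below. It is not the client's Lambda code: the backup records are plain dictionaries with hypothetical field names, the 90% warning margin is an assumption, and in a real function the records would come from the AWS API rather than a hard-coded list.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # "1 year" from the case study
BACKUP_LIMIT = 10_000            # threshold quoted in the case study
WARN_PERCENT = 90                # hypothetical early-warning margin


def expired(backups: list[dict], now: datetime) -> list[dict]:
    """Return the backups created before the retention cutoff."""
    cutoff = now - RETENTION
    return [b for b in backups if b["created"] < cutoff]


def near_limit(count: int) -> bool:
    """True when the backup count reaches the warning margin of the limit."""
    return count * 100 >= BACKUP_LIMIT * WARN_PERCENT


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical sample records standing in for real snapshot metadata.
    backups = [
        {"id": "snap-old", "created": now - timedelta(days=400)},
        {"id": "snap-new", "created": now - timedelta(days=3)},
    ]
    for backup in expired(backups, now):
        print("would delete", backup["id"])
    print("near limit:", near_limit(len(backups)))
```

Keeping the selection logic separate from the AWS calls makes it easy to test the retention rule on its own before wiring it to a scheduled trigger.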

Cost Escalations

The DevOps-enabled Big Data deployment reduced costs by 25%, i.e., about $5,000 per month, through savings on EC2 and EBS backup storage.
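
The two savings figures agree with the monthly backup storage cost quoted under The Challenge. The short check below assumes the upper bound of $20,000 is the baseline, which the case study implies but does not state outright; the function name is hypothetical.

```python
def monthly_saving(baseline_cost: float, saving_rate: float) -> float:
    """Saving per month for a given baseline cost and fractional saving."""
    return baseline_cost * saving_rate


if __name__ == "__main__":
    baseline = 20_000  # assumed baseline: the "up to $20,000" monthly cost
    saving = monthly_saving(baseline, 0.25)
    print(f"Saving per month: ${saving:,.0f}")
    print(f"Remaining cost per month: ${baseline - saving:,.0f}")
```

With that baseline, a 25% saving works out to $5,000 per month, leaving about $15,000 in monthly backup storage cost.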

Results and Benefits

With the DevOps-enabled implementation, the analytics application saw the following benefits:

Faster Time to Market (TTM)

Faster infrastructure setup and support reduced the time needed to spin up HDP clusters by 60-70%. In addition, the application’s responses to service change requests became 30-60% faster, supporting a more efficient data analysis process.

Operational Efficiency

After implementation, the application ran without excess resource consumption thanks to automated backup scheduling and execution, gained better administrative control through effective management of historical backups, and delivered cost savings through increased efficiency.