
Overview
The Media and Entertainment (M&E) domain is crowded with more players than ever before. Increased access to high-speed internet has seen OTT streaming services emerge as strong contenders to traditional media studios. In this war for eyeballs, actionable business insights derived from processing big data are the most crucial weapon of all. Hence, big data processing technologies like Hadoop play an indispensable part in the technology stacks of M&E enterprises like our client, an entertainment conglomerate founded in 1923 and headquartered in Burbank, California, USA.
A customer insights application powered by Hortonworks was built for the client’s Data Science team. For over a year, the application was managed by Agilisium, while the client’s Data Engineering team supported and served requests to develop new queries for analysis.
Over time, the application began to show latency. Agilisium’s ongoing relationship with the client and its data analytics expertise enabled it to proactively recommend one significant change, which cut the client’s data-to-insight time from days (on Hortonworks) to hours.
The Challenge
The primary change recommended by Agilisium’s experts was that the client migrate from their existing Hortonworks data platform to a cloud service or product that better suited their current needs. Hortonworks was failing to meet the client’s needs for three main reasons:
- Average utilization hovered around 30%, even with both the Data Science and Data Engineering teams running data processing jobs on Hortonworks. However, it peaked at 80% to 90% during month-end activities. This meant that the client invariably paid peak utilization fees.
- There was a high volume of requests from both the Data Science and Data Engineering teams to the DevOps team. Because these service requests were operational in nature, the DevOps turnaround time (TAT) was at least a few days and could stretch to a few weeks.
- Overhead costs increased, as two full-time resources were required to spin up clusters, monitor jobs and ensure that all teams followed DevOps processes while pushing jobs onto Hortonworks.
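The cost effect of the first point can be illustrated with a rough back-of-the-envelope sketch. Only the utilization percentages (about 30% on average, 80% to 90% at month-end) come from the case above; the node count, hourly rate and length of the month-end window are hypothetical values chosen purely for illustration, and the sketch assumes the same per-node rate under both billing models.

```python
# Illustrative only: node count, hourly rate and the month-end window are
# hypothetical values, not figures from the client engagement.
PEAK_NODES = 100          # assumed cluster size needed at full capacity
RATE_PER_NODE_HOUR = 1.0  # assumed flat price per node-hour
HOURS_IN_MONTH = 720      # 30-day month
MONTH_END_HOURS = 48      # assumed length of the month-end peak


def nodes_needed(utilization_pct: int, peak_nodes: int = PEAK_NODES) -> int:
    """Nodes required at a given utilization, rounded up using integer math."""
    return -(-utilization_pct * peak_nodes // 100)


def fixed_capacity_cost() -> float:
    """Cluster sized for peak and billed for every hour of the month."""
    return PEAK_NODES * HOURS_IN_MONTH * RATE_PER_NODE_HOUR


def elastic_cost(normal_pct: int = 30, peak_pct: int = 85) -> float:
    """Cluster that is billed only for the nodes needed in each hour."""
    normal_hours = HOURS_IN_MONTH - MONTH_END_HOURS
    node_hours = (normal_hours * nodes_needed(normal_pct)
                  + MONTH_END_HOURS * nodes_needed(peak_pct))
    return node_hours * RATE_PER_NODE_HOUR


if __name__ == "__main__":
    fixed = fixed_capacity_cost()
    elastic = elastic_cost()
    print(f"fixed capacity: {fixed:.0f}, elastic: {elastic:.0f}")
    print(f"share of fixed spend avoided: {1 - elastic / fixed:.0%}")
```

Under these assumed numbers, most of the fixed-capacity spend pays for idle nodes outside the month-end window, which is the gap an auto-scaling, pay-as-you-go platform is meant to close.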
Consequently, it was a challenge for the client to realize ROI from Hortonworks. In addition, Agilisium predicted that the client’s data processing needs would only increase in number and complexity over the coming quarters. When the findings were presented, the client appreciated Agilisium’s proactive recommendations and requested a best-fit solution.
Our Solution
The team of experts from Agilisium tackled the challenge systematically. Firstly, they built comparative POCs for the client’s use case using three hand-picked technologies (Databricks, AWS EMR and Qubole), all in just under 12 weeks. Secondly, the POCs were presented to the client, who chose EMR for its elastic auto-scaling compute power, pay-as-you-go pricing, lack of licensing fees and easy-to-use management console, which also addressed the issue of high overhead.
Thirdly, although the migration looked simple on paper because Hortonworks and EMR are built on the same two open-source technologies, Hadoop and Spark, in reality the two platforms are significantly different. Therefore, each data processing job on Hortonworks had to be carefully refactored for EMR. The client had stipulated that the migration be executed by their in-house resources working closely with Agilisium’s team. The client’s team leveraged Agilisium’s expertise in both products to the hilt, easing the migration process.
Finally, while the client’s team handled the migration of data processing jobs, Agilisium worked on ensuring that EMR’s connection to the rest of the architecture was stable and user friendly. This involved:
- Simplifying the complex DevOps & CI/CD processes developed originally for Hortonworks to match EMR’s features.
- Moving data storage from HDFS to S3, dramatically bringing down costs.
- Using tools like Jenkins, Ansible, Terraform and Airflow to automate the jobs flowing through the EMR platform, instead of complicating the architecture with point-to-point integrations.
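As one small illustration of what moving storage from HDFS to S3 implies for job refactoring, the sketch below rewrites HDFS paths in a job's configuration into S3 URIs. It is a minimal example under stated assumptions and not the client's actual tooling: the helper name `hdfs_to_s3`, the bucket `example-bucket` and the `warehouse` prefix are all hypothetical.

```python
from urllib.parse import urlsplit


def hdfs_to_s3(uri: str, bucket: str, prefix: str = "") -> str:
    """Map an hdfs:// URI to an s3:// URI, keeping the original path as the key.

    Hypothetical helper: the bucket and prefix are supplied by the caller.
    URIs that do not use the hdfs scheme are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        return uri
    # Drop the namenode host and port; join the optional prefix and the path.
    segments = (prefix.strip("/"), parts.path.lstrip("/"))
    key = "/".join(s for s in segments if s)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    # Example job inputs; every path and name here is made up.
    job_inputs = [
        "hdfs://namenode:8020/data/events/2020",
        "hdfs:///data/profiles",
        "s3://example-bucket/already/migrated",
    ]
    for path in job_inputs:
        print(path, "->", hdfs_to_s3(path, "example-bucket", "warehouse"))
```

In practice, a rewrite like this would be only one part of refactoring a job, alongside changes to cluster configuration and scheduling.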
At the end of a 12-week migration effort, all of the client’s data processing jobs had been migrated from Hortonworks to AWS EMR, giving the client big data processing capabilities at pace, with the following results:
- The Total Cost of Ownership (TCO) decreased significantly, thanks to the use of open-source technology and EMR’s pay-as-you-go pricing with no licensing fee.
- On average, the DevOps team’s TAT fell by 70% to 80%, going from days to hours. As a result, Data Scientists could run continuous analysis on business data to obtain customer insights.
- Time-to-market for application deployment fell by about 90%, without loss in performance.