Agilisium’s AWS Big Data solution enables speed-of-thought insights for Universal Music Group
Data Convergence in AWS Cloud Computing
Highlights
Data reconciliation time reduced from 48 hours to 7 hours
70% lower storage cost with cloud DW
Query deadlocks avoided
Quick results for data analytics
Data security (encryption in S3)
Client Profile
The client, Universal Music Group (UMG), a global music conglomerate and one of the big three music companies internationally, faced challenges in deriving insights from the enormous volume of data shared by its streaming partners.
Business challenges
- The existing MS-SQL based on-premise data warehouse (DW) was struggling to scale up and process the exponentially growing volume of incoming data from streaming partners (Spotify, Apple, YouTube, etc.). This slowed all downstream processes proportionately and delayed key tactical business decisions.
- Because of the slow processing time of the existing system, deep-dive analyses such as Sales as of LYSD (last year same day) were impossible.
- Upgrading the existing system was ruled out due to the potentially exorbitant licensing cost of additional servers/tools to handle the new volume of data.
- UMG was looking for a cost-efficient, scalable solution that would not undermine speed and business agility.
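To make the LYSD comparison mentioned above concrete, here is a minimal sketch of how a "last year same day" date might be computed. The case study does not say whether UMG aligns on the calendar date or on the weekday, so both variants are shown; the function names are hypothetical and not taken from the project.

```python
from datetime import date, timedelta


def lysd_calendar(d: date) -> date:
    """Same calendar date one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Only raised for Feb 29 when the previous year is not a leap year.
        return d.replace(year=d.year - 1, day=28)


def lysd_weekday_aligned(d: date) -> date:
    """Same weekday about one year earlier: exactly 52 weeks (364 days) back."""
    return d - timedelta(weeks=52)


if __name__ == "__main__":
    today = date(2023, 3, 15)
    print(lysd_calendar(today), lysd_weekday_aligned(today))
```

The weekday-aligned variant is often preferred for sales or streaming comparisons because weekends and weekdays tend to behave differently; which one applies here is an assumption.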
Agilisium Solution
- Agilisium devised a cloud-based, elastically scalable architecture that offers faster analytics and business agility in a cost-efficient manner.
- Agilisium was onboarded to consult on, guide and execute the migration of the MS-SQL DW to Redshift. Initially, a data lake was created with the same table structure as the on-prem system. This made it easy to integrate around 300 existing MicroStrategy reports with minimal or no impact on end users. Subsequently, the data was migrated from the MS-SQL DW to Redshift. The streaming data was first enriched using Hive on EMR and loaded into S3 as multi-part files. The processed data was then moved into Redshift via a data pipeline using the bulk-load COPY command. Best practices such as enriching the source data and using the bulk-load COPY command allowed for a rapid, high-quality data migration.
- In addition to the migration, Agilisium reworked the entire data flow to better serve multiple downstream services such as Qubole, which enabled UMG analysts to query the raw data as needed for deeper analytics, leveraging the data lake built by Agilisium. At the end of the year-long migration, a total of 200+ TB had been moved at a rate of 2 TB/day, with an additional 250+ million records being added to the new Redshift DW every
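The bulk-load step described above can be sketched roughly as follows. This is an illustration only, not the project's actual pipeline code: the table name, S3 prefix and IAM role ARN are hypothetical placeholders, and the pipe-delimited, gzip-compressed file format is an assumption, since the case study does not state how the multi-part files were encoded. The sketch relies on the documented behavior of the Redshift COPY command, which loads every file sharing the given S3 prefix in parallel.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, delimiter="|", gzip=True):
    """Build a Redshift COPY statement that loads all files under an S3 prefix.

    All arguments are illustrative; the file format (delimiter, gzip) is assumed.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",          # every object with this prefix is loaded
        f"IAM_ROLE '{iam_role_arn}'",   # role that lets Redshift read the bucket
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")            # input files are gzip-compressed
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_copy_statement(
        table="streams_enriched",
        s3_prefix="s3://example-bucket/enriched/2020-01-01/part-",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

Splitting the enriched output into multiple part files under one prefix lets Redshift spread the load across its slices, which is one reason bulk COPY is typically much faster than row-by-row inserts.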
Tech stack used
EMR
EC2
Qubole
AWS Data Pipeline (an AWS workflow orchestration service, comparable to Airflow)
Java
Python
GitHub
Redshift
Business outcome
- Near-real-time synchronization of records helped UMG gain insights at the speed of thought. Month-end data reconciliation time was reduced from 48 hours to under 7 hours.
- The cloud DW with flexible storage costs 70% less month on month than the client’s previous setup.
- Cost-effective deep-dive analysis is now possible thanks to sub-second response times to downstream services. This gave UMG a better understanding of consumption patterns and affinity, helping it decide where to focus and invest ad dollars.