Leveraging AWS Redshift makes analytics-at-pace a reality for Universal Music Group
Empower the Analytic pace of UMG
weekly maintenance time reduced
of reports generated faster
reduced in maintenance
dramatically reducing query response times
Re-orchestrated data improved performance.
The client Universal Music Group (UMG) - a global music conglomerate and one of the big three music companies internationally - faced challenges in divining insights from the enormous amounts of data shared by its streaming partners.
- The NA-RD stored all their sales data on an Oracle DW and used Business Objects as the reporting tool to fetch and process data from this DW and generate reports. However, the Oracle DW was organized so that 14 billion records resided in a single large 14 TB table in a DW of 18 TB. This DW organization placed the NA-RD at the risk of a catastrophic data loss.
- Another negative consequence of the current setup was that of increased latency. Even crucial daily reports took upwards of 6 hours to be generated. The DW management team took over a week to complete the process of monthly data loading and reconciliation. In addition to the latency, the data quality of the loaded data was compromised.
- Consequently, business teams treated the generated reports as merely a draft, manually re- checking the data points and correcting them to arrive at usable insights.
- Agilisium’s solution was to migrate the on-premise Oracle DW to RedShift, improve latency and accuracy of this 18 TB DW – that ingested data from multiple databases and generated critical daily reports – all without impacting the 200 reports that were generated.
- Agilisium with its expertise in AWS, developed a RedShift DW solution architecture that tackled the twin issues of the existing Oracle DW – performance degradation and poor data quality. This was accomplished via,
RE-PLATFORMING OF DATA.
- The data from the Oracle DW was migrated to AWS RedShift with SnapLogic connectors in just 5 weeks. The data from Oracle was extracted into 1GB compressed files and copied on to AWS Snowball, which was then physically shipped to AWS. Subsequently, the COPY command with parallelism was used to execute the copy of data into S3 buckets. From S3, the data was copied on to RedShift DW inside UMG’s private cloud environment.
- The huge table holding 14 billion records was strategically broken up into 8 tables holding 3.5 billion records each, dramatically reducing query response times. Since this DW’s primary use case is for reporting, AWS’s Time Series table variation was used, and the data in each smaller table corresponded to a particular year. As a result, scaling up in the future would be simple, with a new table added on for each year.
RE-ORCHESTRATION OF DATA FLOW
- The Royalties DW pulled data from two other databases within UMG-NA. The data flow within the architecture was re-orchestrated with an eye on improved performance.
- The loading of table data to the Redshift DW had validations built-in to ensure that any discrepancy in the records immediately triggered an alert to the business user who could then raise a service request to correct the concerned data warehouse record. This check ensured that the reports were accurate and eliminated the need for the business user to validate the report manually.
- Moving pre-validated data into a staging table as and when data was available for loading ensured that the 5 – 6 days long month-end table load times were significantly reduced. After the staging table data was moved to the DW, further validation was done by checking the No. of records against a Control-lock table to ensure data quality.
REDUCING IMPACT ON BUSINESS USERS
- Even though the restructured tables on Redshift had a different table structure and a completely different way of loading data into said tables, a view matching the previous table structure was built as the top layer. This view layer was the layer connected to the reporting tool. Therefore, though everything in the underlying architecture had changed dramatically, the disruption to the end-user was nil.
- The weekly maintenance times reduced over 85% from 40-50 hours to 6 hours.
- On average, reports were generated 49% faster.