Data warehousing is a crucial step in the Data-to-Insight journey of enterprises. There are now several DW products in the market tailormade for various business needs. But, managing a DW and taking complete advantage of its various features to maximize ROI is still a challenge. Our client is a global music conglomerate and one of the big three music companies internationally. The client’s North American Royalties Department (NA-RD) was finding it challenging to extract value from its existing DW solution.
The existing DW for reporting and analysis was built on Oracle. This DW was used to generate the NA-RD’s daily and monthly Internal Sales Reports (ISR) to provide critical insights to internal stakeholders at the department. However, generating reports using this DW was time-consuming, and the generated reports were error-prone for a variety of reasons.
The client chose to migrate to Amazon Redshift given that it is a pay-as-you-go, licensing cost free cloud based DW service with autoscaling capabilities offered by industry leading Amazon Web Services. Agilisium, having worked with UMG for a long time and having excellent understanding of its system was entrusted with the responsibility of migrating its existing Oracle DW to Redshift to enable analytics-at-pace.
The NA-RD stored all their sales data on an Oracle DW and used Business Objects as the reporting tool to fetch and process data from this DW and generate reports. However, the Oracle DW was organized so that 14 billion records resided in a single large 14 TB table in a DW of 18 TB. This DW organization placed the NA-RD at the risk of a catastrophic data loss.
Another negative consequence of the current setup was that of increased latency. Even crucial daily reports took upwards of 6 hours to be generated. The DW management team took over a week to complete the process of monthly data loading and reconciliation. In addition to the latency, the data quality of the loaded data was compromised.
Consequently, business teams treated the generated reports as merely a draft, manually re-checking the data points and correcting them to arrive at usable insights.
Agilisium’s solution was to migrate the on-premise Oracle DW to RedShift, improve latency and accuracy of this 18 TB DW – that ingested data from multiple databases and generated critical daily reports – all without impacting the 200 reports that were generated.
Agilisium with its expertise in AWS, developed a RedShift DW solution architecture that tackled the twin issues of the existing Oracle DW – performance degradation and poor data quality. This was accomplished via,
Re-platforming of data
The data from the Oracle DW was migrated to AWS RedShift with SnapLogic connectors in just 5 weeks. The data from Oracle was extracted into 1GB compressed files and copied on to AWS Snowball, which was then physically shipped to AWS. Subsequently, the COPY command with parallelism was used to execute the copy of data into S3 buckets. From S3, the data was copied on to RedShift DW inside UMG’s private cloud environment.
The huge table holding 14 billion records was strategically broken up into 8 tables holding 3.5 billion records each, dramatically reducing query response times. Since this DW’s primary use case is for reporting, AWS’s Time Series table variation was used, and the data in each smaller table corresponded to a particular year. As a result, scaling up in the future would be simple, with a new table added on for each year.
Re-orchestration of data flow
The Royalties DW pulled data from two other databases within UMG-NA. The data flow within the architecture was re-orchestrated with an eye on improved performance.
The loading of table data to the Redshift DW had validations built-in to ensure that any discrepancy in the records immediately triggered an alert to the business user who could then raise a service request to correct the concerned data warehouse record. This check ensured that the reports were accurate and eliminated the need for the business user to validate the report manually.
Moving pre-validated data into a staging table as and when data was available for loading ensured that the 5 – 6 days long month-end table load times were significantly reduced. After the staging table data was moved to the DW, further validation was done by checking the No. of records against a Control-lock table to ensure data quality.
Reducing impact on Business users
Even though the restructured tables on Redshift had a different table structure and a completely different way of loading data into said tables, a view matching the previous table structure was built as the top layer. This view layer was the layer connected to the reporting tool. Therefore, though everything in the underlying architecture had changed dramatically, the disruption to the end-user was nil.
- The weekly maintenance times reduced over 85% from 40-50 hours to 6 hours.
- On average, reports were generated 49% faster.