The client, a U.S. multimedia conglomerate, was looking for ways to monitor how its digital content is consumed, in order to decide where to invest ad dollars and grow viewership and top-line revenue.
With the client's rapidly growing global network and the slew of digital platforms in use, data volume and variety exploded.
The existing reporting process was ad hoc, manual, time-consuming, and error-prone, which translated into delayed and incomplete insights. This paved the way for instinct-driven decisions and undermined efforts to increase viewership and stay competitive.
The lack of integrated insights limited key business leaders' ability to make truly data-driven strategic decisions. A new, scalable analytics platform with a robust automated data integration framework, data governance, and audit processes was envisaged to achieve the following:
- Enable internal teams (Marketing, Product, Production, Social Media) to shift from instinct-driven to insight-driven decision making by providing anytime access to all automatically processed data.
- Automate data processing and loading at the lowest grain, in order to:
- Help answer questions such as how many viewers transitioned among digital platforms and how the number of full episode viewers (FEP) changed.
- Enable key business decision makers to gain a holistic view with minimal dependence on IT.
To address these challenges, an elastically scalable analytics platform was built on the AWS Cloud. Key features of the platform include:
- A data lake was designed in AWS S3, serving a dual purpose: a) as a single source of truth, storing data from heterogeneous sources in its native form; b) as a platform to unearth insights from raw semi-structured and unstructured data through Redshift Spectrum.
- A massively parallel processing (MPP) data warehouse was designed on AWS Redshift for expedited insights from structured data. Data was pre-processed and loaded into Redshift to reduce IT dependency for analytics reports.
- A purpose-built data validation, cleansing, and integration framework was designed. Talend Integration Cloud and custom scripts fetched data from discrete sources and loaded OTT service, social impression, and user behaviour data into the data lake and analytics platform for reporting and data science activities.
- Data lineage documents were maintained in Confluence to effortlessly map data elements to their source system(s).
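To illustrate the kind of record-level check the validation and cleansing framework performs before data lands in the lake, here is a minimal Python sketch. The field names (`viewer_id`, `platform`, `watched_at`) and the normalization rules are illustrative assumptions, not the client's actual schema or logic.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mandatory fields for a viewing event; the real schema
# would come from the source-system contract.
REQUIRED_FIELDS = {"viewer_id", "platform", "watched_at"}

def cleanse(record: dict) -> Optional[dict]:
    """Return a normalized record, or None if it fails validation."""
    if not REQUIRED_FIELDS <= record.keys():
        return None  # reject records with missing mandatory fields
    try:
        # Normalize the event timestamp to ISO-8601 for consistent partitioning.
        ts = datetime.fromisoformat(str(record["watched_at"]))
    except ValueError:
        return None  # reject unparseable timestamps
    return {
        "viewer_id": str(record["viewer_id"]).strip(),
        "platform": str(record["platform"]).strip().lower(),
        "watched_at": ts.isoformat(),
    }
```

Rejected records would typically be routed to a quarantine area for the audit process rather than silently dropped.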
Technologies used: Talend Integration Cloud (with Big Data), custom scripts in Java, Python, and Linux shell, S3 data lake, Redshift, DOMO, Databricks, Redshift Spectrum, Apache Spark.
Team size – 2 (Onsite), 1 (Offshore)
Cluster details – 8xlarge, M3.xlarge. Current data volume is 5+ TB, with an inflow of approximately 65 GB/day.
Project Duration – 6 months (ongoing)
Delivery model – Hybrid
How we worked
The project scope was defined by the client, and the solution was jointly delivered using Agile methodology.
Agilisium worked closely with the client's team to design and deliver the solution. Daily scrum calls kept key stakeholders apprised of progress at every stage, while weekly status reports and project-tracking tools provided enhanced visibility.
- 4x faster data integration: Custom scripts on Linux and Python quadrupled data integration speed, which translated into $50,000/year in cost savings.
- A 360-degree view is now a few clicks away: With auto-processed data available at the lowest grain from all data sources, slicing and dicing it to unearth insights takes just a few clicks.
- No more guesswork: With anytime availability of all auto-processed data, internal teams can now cut the guesswork out of decision making.
- Advanced analytics ready: Data in both S3 and Redshift can be leveraged for downstream, scalable predictive analytics at the speed of thought.
- Agile data platform: The data lake, with high-performing Parquet file formats, provides the flexibility and portability to move from one visualization and analytics platform to another with minimal changes.
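The portability described above rests on laying out lake files in a partition convention that multiple engines understand. The sketch below shows one such Hive-style, date-partitioned key layout in Python; the bucket name, zone name, and partition scheme are assumptions for illustration, not the client's actual layout.

```python
from datetime import date

# Hypothetical bucket and curated-zone prefix for the Parquet lake layer.
BUCKET = "media-analytics-lake"

def curated_key(source: str, event_date: date, part: int) -> str:
    """Build a Hive-style partitioned S3 key (source=.../dt=.../part-N),
    a convention both Redshift Spectrum and Spark recognize for
    partition pruning."""
    return (
        f"curated/source={source}/"
        f"dt={event_date.isoformat()}/"
        f"part-{part:05d}.parquet"
    )
```

Because the convention is shared, the same Parquet files can be registered as a Spectrum external table or read directly from Databricks/Spark without rewriting the data, which is what makes switching analytics front-ends a low-effort change.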