Case Study
Niche solutioning: Unified data analytics & machine learning architecture for Gaming Analytics

Overview

The global gaming market is growing at an exponential pace. It was estimated to be worth 139 billion dollars a year, with more than 2.4 billion people playing video games as of 2019. Increased access to broadband internet in the recent past has only propelled online gaming at an explosive pace.

One of the globally popular genres of internet gaming is the multiplayer online game. One revenue model employed by gaming companies is the freemium game. Players start playing for free and do not pay to win; however, they can buy ancillaries such as armor to enhance their skills, extend gaming time, and so on. In addition to improving the overall gaming experience, this feature is also the primary source of revenue generation via micro-transactions. Consequently, the game must be able to offer hyper-relevant, in-game merchandise to players.

A similar revenue model was used by our client, an American video game developer based in Los Angeles, California, best known for a multiplayer online battle arena game released in 2009. This hugely popular game serves over 100 million peak monthly players. The game’s streaming data was analyzed using a custom platform built in-house, which was failing to keep up with the game’s growing demands. Therefore, the client was looking to improve its gaming data analytics.

The Challenge

The source data for offering relevant in-game purchases was the thousands of data points of in-game player activity generated per user per second during a match. In the existing system, this data was streamed using a custom streaming engine built on open-source Apache Kafka. The raw data was stored in Vertica (an SQL database) and analyzed for insights via Databricks.

However, because of the expanding number of users, the current system faced the following issues:

  • Data was siloed and duplicated, leading to latency in downstream data analytics
  • The streaming data had reached the petabyte level, and storage costs were escalating
  • It was no longer cost-effective to analyze the streaming data on Databricks
  • Modifications to the streaming engine to handle the increased load were complicated and time-consuming

To overcome these issues, the gaming company considered moving its data analytics and Machine Learning (ML) to the cloud. AWS introduced Agilisium to the client via the APN Consulting Partner program, owing to Agilisium’s niche expertise in AWS analytics services. A team of our experts was tasked with architecting a solution and delivering a proof of concept (POC).

Solution Highlights

Unified Analytics Approach:

To meet the unique streaming data analytics and subsequent data science needs of the gaming company, Agilisium’s experts developed a conceptual architecture called the Unified Analytics Approach (UAA).

The Unified Analytics Approach had three pillars:

  • Single source of truth – bring all data under one roof
  • Data surfacing – make the data consumable through cataloging
  • Tap in – all other tools (data warehouse, big data, BI, and data science) tap into the single source of truth

UAA use case: Proof of Concept for Gaming Analytics

The UAA was applied to build the Proof of Concept (POC) for gaming analytics on the AWS cloud. The siloed data in the existing system led to latency and needless data duplication. The first pillar of the UAA addressed this issue.

All in-game player interactions, financial transactions, and chat history were captured into streams via the Amazon Kinesis Data Streams service, which ingested events faster than the existing custom engine. This data functioned as the single source of truth: all tools in the system interacted with only this data for analytics, which eliminated duplication.
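
To make the single-source-of-truth idea concrete, the sketch below shows how a game server might publish one in-game event to a shared stream with boto3. The region, stream name, and event fields are illustrative assumptions, not details from the client’s system.

```python
import json

import boto3

# Assumed region and stream name, for illustration only.
kinesis = boto3.client("kinesis", region_name="us-west-2")

def publish_player_event(event: dict) -> None:
    """Write a single in-game event to the shared Kinesis stream."""
    kinesis.put_record(
        StreamName="player-events",              # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["player_id"],         # keeps one player's events ordered on one shard
    )

publish_player_event({
    "player_id": "p-123",
    "event_type": "item_view",
    "item_id": "armor-42",
    "ts": "2019-06-01T12:00:00Z",
})
```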

For this POC, the streams were then processed for two purposes:

  • Storage layer – data preparation for further analytics (Data users)

    Firstly, the single source of truth – the raw streaming data – needed to be stored. For the purpose of this POC, the data ingestion benchmark was set at 20,000 messages/second. Despite the high benchmark, the in-game data was rapidly streamed via Kinesis Data Streams, ingested by Amazon Kinesis Data Firehose, and stored in Amazon S3 buckets, dramatically reducing ingestion time and storage costs.
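
    As a rough sketch of how such a Kinesis-to-S3 delivery path can be wired up, the boto3 call below creates a Firehose delivery stream that reads from a Kinesis stream and buffers compressed objects into S3. All ARNs, names, and buffering values are placeholder assumptions.

    ```python
    import boto3

    firehose = boto3.client("firehose", region_name="us-west-2")

    # ARNs and names below are placeholders for the account's real resources.
    firehose.create_delivery_stream(
        DeliveryStreamName="player-events-to-s3",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:us-west-2:123456789012:stream/player-events",
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-read-kinesis",
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-write-s3",
            "BucketARN": "arn:aws:s3:::gaming-analytics-raw",
            "Prefix": "raw/player-events/",
            # Buffer up to 128 MB or 60 s per object to cut S3 request overhead.
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 60},
            "CompressionFormat": "GZIP",  # compression helps keep storage costs down
        },
    )
    ```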

    Secondly, to make such vast quantities of data available for further analytics, it was cataloged via crawlers in AWS Glue (a serverless ETL service). Gaming systems require near-real-time analytics; therefore, to keep this data up to date, Glue’s event-driven ETL pipelines cataloged newly ingested data almost instantly.
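
    A minimal sketch of the cataloging step, assuming hypothetical names, roles, and paths: a Glue crawler is pointed at the raw S3 prefix and started on demand. In an event-driven pipeline, the start call would typically come from a trigger such as a Lambda fired by new S3 objects.

    ```python
    import boto3

    glue = boto3.client("glue", region_name="us-west-2")

    # Crawler, role, database, and path are illustrative placeholders.
    glue.create_crawler(
        Name="player-events-crawler",
        Role="arn:aws:iam::123456789012:role/glue-crawler-role",
        DatabaseName="gaming_analytics",
        Targets={"S3Targets": [{"Path": "s3://gaming-analytics-raw/raw/player-events/"}]},
    )

    # Kicked off whenever new data lands, so the catalog stays near real time.
    glue.start_crawler(Name="player-events-crawler")
    ```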

    Finally, the cleansed, validated, and formatted data was stored in a Redshift/Athena (SQL-based) warehouse layer and made available for use by downstream applications. This data cleansing step ensured that all downstream querying, analysis, and reporting applications arrived at accurate insights, as they seamlessly accessed high-quality data.
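
    Once cataloged and cleansed, the data can be queried in place. Below is a hedged example of a downstream query via Athena; the database, table, and result bucket are assumptions for illustration.

    ```python
    import boto3

    athena = boto3.client("athena", region_name="us-west-2")

    # Database, table, and output location are illustrative placeholders.
    resp = athena.start_query_execution(
        QueryString="""
            SELECT player_id, COUNT(*) AS purchases
            FROM purchases_curated
            WHERE event_date = DATE '2019-06-01'
            GROUP BY player_id
            ORDER BY purchases DESC
            LIMIT 100
        """,
        QueryExecutionContext={"Database": "gaming_analytics"},
        ResultConfiguration={"OutputLocation": "s3://gaming-analytics-query-results/"},
    )
    print("Query started:", resp["QueryExecutionId"])
    ```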

  • Processing layer – In-game purchase recommendations (End users)

    As mentioned above, micro-transactions are how the game generates revenue. To this end, the client used an in-game purchase recommendation engine to create personalized recommendations for millions of users. The raw in-game player activity data was the source data for training the recommendation engine. To make the data usable by an ML model, it was processed in two stages:

    • The raw data was aggregated using the Kinesis Data Analytics service’s window functions over 30-minute intervals (as specified by the client); a sketch of this tumbling-window aggregation follows this list. The aggregates were then ingested by Kinesis Data Firehose and stored in S3 buckets.
    • Data laid out for object storage or relational databases cannot be used as-is to train ML models. Hence, the aggregated data was restructured into the training dataset for the client’s recommendation engine. This dataset was built by deploying open-source libraries such as TensorFlow, MXNet, SparkML, and scikit-learn on Amazon SageMaker.
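
    The sketch below illustrates, in plain Python, what the 30-minute tumbling-window aggregation does: each event is assigned to the half-hour window its timestamp falls in, and counts are rolled up per player and event type. This models the logic only; in the POC the equivalent aggregation ran inside Kinesis Data Analytics, not in application code.

    ```python
    from collections import defaultdict
    from datetime import datetime, timezone

    WINDOW_SECONDS = 30 * 60  # client-specified 30-minute tumbling window

    def window_start(ts: datetime) -> datetime:
        """Floor a timestamp to the start of its 30-minute window."""
        epoch = ts.timestamp()
        return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)

    def aggregate(events):
        """Count events per (window, player, event_type)."""
        counts = defaultdict(int)
        for e in events:
            counts[(window_start(e["ts"]), e["player_id"], e["event_type"])] += 1
        return dict(counts)

    events = [
        {"ts": datetime(2019, 6, 1, 12, 5, tzinfo=timezone.utc), "player_id": "p-123", "event_type": "item_view"},
        {"ts": datetime(2019, 6, 1, 12, 25, tzinfo=timezone.utc), "player_id": "p-123", "event_type": "item_view"},
        {"ts": datetime(2019, 6, 1, 12, 40, tzinfo=timezone.utc), "player_id": "p-123", "event_type": "purchase"},
    ]
    print(aggregate(events))  # the first two events share the 12:00-12:30 window
    ```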

    This dataset was used to train the engine, which then generated the in-game purchase recommendations. These recommendations reached gamers across the globe with sub-millisecond latency via CloudFront edge locations. The personalized recommendations increased the probability of a user making an in-game purchase and enhanced the player experience.
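
    As a rough illustration of the SageMaker training step, the snippet below launches a TensorFlow training job with the SageMaker Python SDK (v2 argument names). The script name, IAM role, instance type, and S3 paths are assumptions; MXNet, SparkML, or scikit-learn estimators could be substituted the same way.

    ```python
    import sagemaker
    from sagemaker.tensorflow import TensorFlow

    session = sagemaker.Session()

    # Entry point, role, and paths are hypothetical placeholders.
    estimator = TensorFlow(
        entry_point="train_recommender.py",   # assumed training script
        role="arn:aws:iam::123456789012:role/sagemaker-exec",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="2.3",
        py_version="py37",
        sagemaker_session=session,
    )

    # Train on the aggregated, restructured dataset staged in S3.
    estimator.fit({"train": "s3://gaming-analytics-curated/training/"})
    ```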

Architecture

Results and Benefits

The POC showcased the tremendous improvements that AWS offered the client over the existing system, including, but not limited to:

  • A robust, nimble, scalable analytics platform, as Agilisium’s expert architects leveraged best practices such as the AWS Well-Architected Framework while designing the architecture
  • Data liberation from silos and a single source of truth for the entire platform
  • Dramatically reduced storage costs even at petabyte scale on AWS S3
  • Dynamic, scalable streaming data aggregation via Kinesis Data Analytics
  • Highly accurate, refined model dataset built on Sagemaker
  • CloudFront edge locations enabled sub-millisecond latency in the global delivery of in-game purchase recommendations

Our in-house experts are proficient in cloud gaming data analytics, an emerging niche technology space. Agilisium helps businesses stay ahead of the pack by helping them take the data-to-insight leap. Facing a similar challenge? Please write to us.