Mutual funds are among the top investment vehicles, with assets growing six-fold over the last decade and continued growth expected in the coming years. On any given day, investors conduct tens of thousands of transactions, ranging from buying, selling, and switching funds to updating personal information. The onus of maintaining these transactions lies with the mutual fund aggregators (who also function as the Registrar and Transfer Agent).
The aggregator in turn charges the investor a variety of fees, some fixed and others variable, towards managing the funds and transactions. These fees, which are an important source of revenue for the aggregators, include advisory fees, marketing & distribution fees, brokerage fees, trailer fees, custodial fees, transfer agency fees, legal fees and accounting fees.
Of particular interest here is the trailer fee, a payment made by the mutual fund manager to a broker for selling the fund to an investor and providing investment advice and services. Unlike other service fees, which are largely standardized, trailer fees vary constantly:
- Trailer fees apply only as long as the broker's clients hold investments in the manufacturer's mutual funds.
- They vary by the mutual fund held by the investor and the purchase option under which the investment is made.
- Rates change during the year, so the duration and date of the fee calculation matter.
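The interplay of holding period and date-varying rates can be sketched as a toy day-weighted calculation. The schedule, dates, and rates below are purely illustrative assumptions, not the client's actual fee structure:

```python
from datetime import date

# Hypothetical annual trailer-fee schedule: each rate applies from its
# effective date until the next entry's effective date.
RATE_SCHEDULE = [
    (date(2023, 1, 1), 0.0050),   # 0.50% p.a. from Jan 1
    (date(2023, 7, 1), 0.0045),   # revised mid-year to 0.45% p.a.
]

def trailer_fee(holding_value, start, end, schedule=RATE_SCHEDULE):
    """Day-weighted trailer fee for a holding over [start, end)."""
    total = 0.0
    for i, (effective, rate) in enumerate(schedule):
        # The rate applies until the next schedule entry (or the period end).
        next_eff = schedule[i + 1][0] if i + 1 < len(schedule) else end
        lo, hi = max(start, effective), min(end, next_eff)
        days = (hi - lo).days
        if days > 0:
            total += holding_value * rate * days / 365
    return total

# A 100,000 holding spanning the mid-year revision accrues fees at
# 0.50% for the first half of the year and 0.45% for the second.
fee = trailer_fee(100_000, date(2023, 1, 1), date(2024, 1, 1))
```

Because the fee is accrued per day at whichever rate is in force, both the dates and the duration of the holding change the result, which is what makes the real computation sensitive to when it is run.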
Given this complexity of calculation and the need for transparency, mutual fund aggregators' IT systems come under tremendous strain when generating periodic reports.
Our client, a financial services giant and a prominent mutual fund aggregator, encountered just such a challenge with its growing investor base. With an IT system running on-premises for data aggregation and analysis, they faced significant delays in the computation of brokerage fees and in regulatory reporting. Agilisium was brought in as a consultant to offload ETL processing to the cloud and drastically reduce the processing time of the trailer fee computation.
The mutual fund database held records from the inception of each fund, dating back to the '90s. Computing the trailer fee for this continuously growing database on a monthly basis pushed the on-premises Oracle-based system to its limits:
- The system had to ingest data volumes of the order of 500 GB, spread across 10 to 15 tables holding 600 million records.
- The ingested transactional records were diverse in nature, requiring varied methods of processing.
- Each run of the trailer fee computation took a solid 52 hours, and any error or failure in the process required a complete re-run from scratch.
- Furthermore, the computation overloaded the existing database, blocking all other processes for extended stretches of time. This also created a backlog of transactions, leaving the database out of sync.
The client's requirement was a system with a faster turnaround time, typically a few hours. To address this need, Agilisium proposed using AWS infrastructure to move the data to the cloud, perform the required ETL processing, and bring the results back to the Oracle database. The idea was realized as a PoC for one specific mutual fund, which required about 20 hours of processing in the Oracle database. Agilisium worked with the AWS Professional Services Team in a primarily onsite delivery model over a period of four months to complete the PoC implementation.
- A VPN gateway served as the link between the on-premises Oracle database and the AWS platform.
- To migrate the data from Oracle to the cloud, it was extracted using AWS Database Migration Service (DMS) and copied to Amazon Simple Storage Service (S3) buckets in Parquet format. This provided high data availability and kept the on-premises database fully operational during processing.
- The stored procedures in the Oracle database were re-engineered as Spark jobs, which read the data from S3 and performed the actual computation on Amazon Elastic MapReduce (EMR).
- Finally, with the help of Sqoop on EMR, the results in S3 were written back to the Oracle database. Sqoop writes data in multiple parallel threads, effectively speeding up the process.
- To enable any-time querying of the transactional data for other analytics purposes, access to S3 was provided via the query service Amazon Athena. Table definitions for the data were held in the AWS Glue Data Catalog, populated by a Glue crawler.
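The Sqoop export step relies on splitting the result set across parallel writers. A minimal sketch of that idea in plain Python (with a stand-in sink, since the actual JDBC insert into Oracle is environment-specific):

```python
from concurrent.futures import ThreadPoolExecutor

# Sqoop-style parallel export, sketched in plain Python: the result set
# is split into roughly equal partitions and each partition is written
# by its own thread. `write_partition` stands in for a batched JDBC
# insert into the target database.
def export_parallel(rows, num_threads, write_partition):
    chunk = -(-len(rows) // num_threads)  # ceiling division
    partitions = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(write_partition, partitions))

# Demo sink: collect the "written" rows in memory.
written = []
export_parallel(list(range(10)), 4, lambda part: written.extend(part))
```

The design point is that each thread owns a disjoint partition, so writers never contend for the same rows; Sqoop achieves the same with its mapper-count setting.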
The trailer fee computation process was completely re-engineered to suit the new platform. The procedure was performed in three stages.
Getting the right data
The transaction data covered different types of funds: standard, SIP (systematic investment plan), switch and dividend-based.
- First, the eligible data, containing only funds active for the month, had to be extracted from the data dump.
- Second, the computation logic had to be decoded based on transaction type, and the data was categorized into the four types accordingly. While standard one-time transactions and monthly investments were straightforward to handle, transactions that involved a dividend fund, or a mid-way change in the fund itself, required complex processing logic.
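The categorization step might look like the following sketch; the field names and routing rules are assumptions for illustration, not the client's actual schema:

```python
# Route each transaction into one of the four processing buckets
# described above. Switch and dividend transactions get the complex
# processing paths; standard and SIP are straightforward.
def categorize(txn):
    if txn.get("switched_mid_period"):
        return "switch"        # fund changed mid-way: complex logic
    if txn.get("dividend_fund"):
        return "dividend"      # dividend-based fund: complex logic
    if txn.get("sip"):
        return "sip"           # recurring monthly investment
    return "standard"          # one-time purchase

txns = [
    {"id": 1},
    {"id": 2, "sip": True},
    {"id": 3, "dividend_fund": True},
    {"id": 4, "switched_mid_period": True},
]
buckets = {}
for t in txns:
    buckets.setdefault(categorize(t), []).append(t["id"])
```

Splitting the data this way up front lets each bucket carry its own computation logic downstream.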
Finding the brokerage rate
The brokerage rate/percentage had to be computed for every individual transaction. The rate was determined by a brokerage structure with 13 different levels, defined on parameters such as purchase date, scheme, broker, asset and manager level. There was no one-to-one relationship between a fund purchase and a brokerage structure, and the rate changed continuously with market value.
To arrive at the rate, the computation was performed at two levels:
- Since any level of the brokerage structure could apply to an individual transaction, a priority was defined for each brokerage level.
- A best-match algorithm then matched each record to a specific brokerage level based on the priorities defined earlier.
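The priority-driven best match can be sketched as follows. The levels, predicates, and rates here are illustrative stand-ins for the client's 13-level structure:

```python
# Each brokerage level carries a predicate over transaction attributes
# and a priority; among all levels that match a transaction, the one
# with the highest priority (lowest number) wins.
BROKERAGE_LEVELS = [
    # (priority, predicate, annual_rate) -- illustrative values only
    (1, lambda t: t["scheme"] == "EQ-GROWTH" and t["broker"] == "B042", 0.0060),
    (2, lambda t: t["scheme"] == "EQ-GROWTH", 0.0050),
    (3, lambda t: True, 0.0040),  # catch-all default level
]

def best_match_rate(txn):
    matches = [(prio, rate) for prio, pred, rate in BROKERAGE_LEVELS if pred(txn)]
    return min(matches)[1]  # lowest priority number is the best match

# A transaction matching all three levels resolves to the most specific one.
rate = best_match_rate({"scheme": "EQ-GROWTH", "broker": "B042"})
```

Defining an explicit priority ordering is what makes the match deterministic even when several levels apply to the same transaction.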
From around 700 to 1,000 possible combinations, the above processes reduced the output to around 270 million records. The entire process was run in four parallel threads, one for each category of transaction data.
The output was compared with the original Oracle output, consisting of 127 columns, to verify the correctness of the computation process. However, almost 20% of the results did not match. On closer inspection, the following issues were found and subsequently addressed:
- Numbers are handled very differently in Oracle and Spark. Automated conversion of numbers from the native Oracle types to Spark types caused decimal-precision mismatches, resulting in compounding errors.
- Complex transactions like the switch, although small in number, required fine-tuning.
- Some transactions were not accounted for owing to breaks in the timeline, requiring them to be processed over different time durations.
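The precision issue can be reproduced in miniature: summing a value like 0.1 many times in binary floating point drifts away from the exact fixed-point result that Oracle's NUMBER type would produce. This is a generic illustration of the failure mode, not the client's actual data:

```python
from decimal import Decimal, getcontext

# Oracle NUMBER supports up to 38 significant digits; pin the Decimal
# context to match.
getcontext().prec = 38

# 0.1 has no exact binary representation, so the float sum drifts.
float_total = sum(0.1 for _ in range(1_000))             # binary float
exact_total = sum(Decimal("0.1") for _ in range(1_000))  # fixed-point

drift = abs(Decimal(repr(float_total)) - exact_total)
```

The practical remedy in such a migration is to fix the decimal precision and scale explicitly on the Spark side to match the Oracle column definitions, rather than relying on automatic type conversion.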
Most of these issues were not straightforward and required manual intervention at the level of individual transactions. Further, after every change the entire process had to be completely re-validated to ensure the logic mapped successfully.
Once validated, the output was written back to the Oracle database. With the knowledge gained from the PoC, the validation process has since been automated.
- The time taken for the ETL processing has been reduced from 20 hours to 2 hours.
- The accuracy of data matching between Oracle and Spark is now 99.99%.
- With cloud resources utilized only when needed, costs are hugely reduced.
- Isolating the data processing has freed up the on-premises database for continued use.