Case Study
Upgrading to Databricks Delta helps a Pharma major streamline Sales planning and forecasting


The process of taking a pharmaceutical drug from patent to consumers is complex. A drug patent is valid for only 20 years, and much of that term is spent developing the drug and obtaining FDA approval. The onus is therefore on sales leaders to market the drug effectively before the patent expires and generic versions enter the market. This requires a thorough understanding of the Life Sciences market, the sales allocation process and the trends affecting sales.

Our client, a Biotechnology and Life Sciences major, needed to revamp its existing data solution because the sales allocation efficiency it achieved was unsatisfactory. With the help of Agilisium, a high-performance engineering platform based on Databricks Delta was set up, which delivered significant benefits for the client. The approach adopted is outlined below.

The Challenge

The client’s existing system ingested sales data, including territory-wise customer data from field sales, in batches. A typical data pipeline passed through 5 levels of classification before flowing into a data warehouse, where the data was made available to the visualization layer. This made the allocation of sales data complex. As data volumes grew and the need arose to process data from streaming sources, several challenges surfaced.

  • For every batch of data ingested, the system required 5 hours for processing.
  • Data on the order of 20GB needed to be processed per day, which meant ingesting, processing and validating millions of records spread across 100+ tables, a load the existing setup could not handle.
  • Further, the data flow cycle consisted of disparate processes, which led to poor data quality. This ultimately affected decisions on supply chain, product distribution and sales.
  • Due to various business needs, sales resources were reallocated frequently. This was a laborious manual process that required the client to identify multiple sales parameters and modify the supply chain equation accordingly.

Given the multi-stage data pipeline, the process was time-consuming. Further, the system supported only SQL-based querying, which limited it to reactive analytics. The business required a system that could proactively analyze data volumes on the order of tens of GBs and provide business insights.

Solution Highlights

Agilisium, in collaboration with the client, embarked on an exclusive 4-week consulting jumpstart project and recommended that the client revamp the existing tech stack and re-engineer the data pipelines. An efficient architecture involving Databricks and AWS was designed. Over a period of 9 months, Agilisium employed a hybrid delivery model to build a scalable and efficient big data platform.


  • The existing setup was completely transitioned to a single platform leveraging Databricks with Delta Lake, which provided a unified end-to-end platform from data integration to analytics. Databricks aided massive-scale data engineering and collaboration between the engineering and data science teams.
  • With Delta on S3 serving as the Data Lake,
    • Data could now be delivered to the system in both streaming and batch formats.
    • Data integrity and read efficiency could be maintained, thanks to features like ACID Transactions, Time Travel, Compaction and Z-Ordering.
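To make the ACID Transactions and Time Travel features above more concrete, here is a toy, in-memory sketch of an append-only commit log of the kind a versioned table relies on. It is not Delta Lake's API or its implementation. The class ToyTableLog, its methods and the file names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """One atomic log entry: the files it adds and the files it removes."""
    added: frozenset
    removed: frozenset


@dataclass
class ToyTableLog:
    """Hypothetical in-memory stand-in for a versioned table log."""
    commits: list = field(default_factory=list)

    def commit(self, added=(), removed=()):
        """Append one commit as a whole and return its version number."""
        missing = set(removed) - self.snapshot()
        if missing:
            # Reject the entire commit so readers never see a partial change.
            raise ValueError(f"cannot remove unknown files: {sorted(missing)}")
        self.commits.append(Commit(frozenset(added), frozenset(removed)))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` (inclusive); default is the latest."""
        if version is None:
            version = len(self.commits) - 1
        if not -1 <= version < len(self.commits):
            raise IndexError(f"no such version: {version}")
        live = set()
        for entry in self.commits[: version + 1]:
            live -= entry.removed
            live |= entry.added
        return live


# Illustrative usage with made-up file names.
log = ToyTableLog()
log.commit(added={"batch_001.parquet", "batch_002.parquet"})  # version 0
log.commit(added={"stream_001.parquet"})                      # version 1
log.commit(                                                   # version 2
    added={"compacted_001.parquet"},
    removed={"batch_001.parquet", "batch_002.parquet", "stream_001.parquet"},
)
latest = log.snapshot()         # only the compacted file remains
as_of_v0 = log.snapshot(0)      # "time travel": the two original batch files
```

In this sketch, batch and streaming writes both land as ordinary commits, compaction is simply a commit that swaps many small files for one larger file, and reading an older version means replaying fewer log entries.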


  • Data workflows were orchestrated with Airflow as Directed Acyclic Graphs (DAGs), which helped schedule and manage pipeline runs effectively. The Airflow scheduler executed tasks in the order of their specified dependencies, allowing for flexibility in the engineering process.
  • The data pipeline was custom designed, integrating the Sales Allocation logic right from the first stage. The new design allocated Sales resources efficiently, reducing the overhead involved.
  • In addition to sales allocation, Data Science models like Random Forest, K-means clustering, ARIMA and others were used to arrive at an effective prediction strategy for region-wise and season-wise sales. In effect, proactive and predictive analytics were enabled.
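The DAG-based orchestration mentioned above comes down to running each task only after the tasks it depends on. The sketch below illustrates that ordering using Python's standard graphlib module rather than Airflow itself. The task names and the run_pipeline helper are hypothetical and are not taken from the client's pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_sales": set(),
    "validate_sales": {"ingest_sales"},
    "allocate_sales": {"validate_sales"},
    "train_forecast": {"allocate_sales"},
    "refresh_dashboard": {"allocate_sales", "train_forecast"},
}


def run_order(dag):
    """Return an execution order in which every task follows its dependencies."""
    # Raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag, actions):
    """Call each task's action (if any) in dependency order; return that order."""
    order = run_order(dag)
    for task in order:
        actions.get(task, lambda: None)()
    return order


order = run_order(PIPELINE)
```

Because every task in this example depends on the one before it, only one order is valid: ingest, validate, allocate, train, then refresh. A real scheduler adds retries, scheduling intervals and parallel execution on top of this ordering.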


  • To ensure data compatibility between the original and the new system, Agilisium worked with a mixed approach – refactoring some of the existing pipelines, redesigning others, and completely remodeling the rest.
  • Given that data privacy and security are of paramount importance in Life Sciences, Agilisium ensured data security throughout the process by implementing data governance measures in line with PHI requirements, HIPAA and other industry compliance standards. AWS services such as IAM, CloudWatch, CloudTrail and others were leveraged for this.
  • To understand, visualize and make sense of the data, a powerful dashboarding setup was established using Tableau.


Results and Benefits
  • The sales team benefitted immensely from the new architecture, gaining powerful insights such as Y-O-Y sales outlook, sales forecasts and market share by therapeutic area. The team could now run smart campaigns leading to better sales opportunities.
  • The solution simplified the data pipeline, reducing the batch processing time from 5 hours to 90 minutes.
  • Processing is now determined to be at 4x that of the original system.
  • Increased sales revenue by more than 20%.
  • Achieved up to 20% total cost of ownership (TCO) reduction.
  • Helped achieve the highest level of sales allocation accuracy, successfully reducing non-sales interactions.
  • Helped segment customers by recency, frequency and sales value, so that the sales team could accurately target the right customers.
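
Segmenting customers by recency, frequency and sales value, as in the last point above, can be sketched in a few lines. This is a minimal illustration, not the model used in the engagement. The customer names, amounts, thresholds and segment labels are all made-up assumptions.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Aggregate (customer, order_date, amount) rows into per-customer values."""
    last_seen = {}
    frequency = defaultdict(int)
    sales_value = defaultdict(float)
    for customer, order_date, amount in transactions:
        if customer not in last_seen or order_date > last_seen[customer]:
            last_seen[customer] = order_date
        frequency[customer] += 1
        sales_value[customer] += amount
    return {
        c: {
            "recency_days": (as_of - last_seen[c]).days,
            "frequency": frequency[c],
            "sales_value": sales_value[c],
        }
        for c in last_seen
    }


def segment(rfm, max_recency_days=30, min_frequency=2, min_value=1000.0):
    """Label one customer using illustrative (assumed) thresholds."""
    recent = rfm["recency_days"] <= max_recency_days
    frequent = rfm["frequency"] >= min_frequency
    valuable = rfm["sales_value"] >= min_value
    if recent and frequent and valuable:
        return "priority"
    if recent or frequent:
        return "nurture"
    return "dormant"


# Hypothetical sample data.
transactions = [
    ("clinic_a", date(2024, 3, 1), 600.0),
    ("clinic_a", date(2024, 3, 20), 700.0),
    ("clinic_b", date(2023, 11, 5), 250.0),
    ("clinic_c", date(2024, 3, 25), 120.0),
]
table = rfm_table(transactions, as_of=date(2024, 3, 31))
segments = {customer: segment(values) for customer, values in table.items()}
# segments == {"clinic_a": "priority", "clinic_b": "dormant", "clinic_c": "nurture"}
```

Fixed thresholds keep the example readable. In practice, the cut-offs are often derived from quantiles of the data or from a clustering model such as K-means.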