How a Fortune 500 pharma company reduced data warehousing costs by 30% by implementing Databricks SQL
How implementing Databricks SQL analytics enabled instant access to processed data, using Spark features and a Databricks Photon cluster for faster query execution.
Highlights
30%
Improvement in Tableau dashboard refresh time with Databricks SQL analytics
30%
Reduction in cost with Databricks SQL analytics
100%
Of the Redshift data loading cost eliminated
45%
Of project delivery time saved through a detailed pre-assessment
300+
Dashboards migrated from Redshift to SQL Analytics.
Client Profile
Amgen is one of the world’s leading biotechnology companies, committed to unlocking the potential of biology for patients suffering from serious illnesses by discovering, developing, manufacturing, and delivering innovative human therapeutics.
Business challenges
- Maintaining the same dataset in multiple places, both in Delta Lake and in the data warehouse, created additional costs for the client.
- Redshift took longer than Databricks SQL to run SQL queries, which slowed Tableau dashboard refreshes.
- The company had difficulty accessing data across different systems and accessing ETL output as soon as it was produced.
Agilisium Solution
- All reports are built as Tableau dashboards. Previously, these dashboards connected to the AWS Redshift warehouse to pull processed data after ETL.
- ETL jobs loaded data into Redshift to make it available to the dashboards.
- All ETL jobs in the application run in Databricks, and Databricks SQL (DBSQL) can connect to both Delta Lake and Tableau.
- With Databricks SQL analytics, the dashboards query the processed data in Delta Lake directly, so the same data no longer has to be kept in two places.
- We migrated 300+ dashboards from Redshift to SQL Analytics.
Tech-stacks Used
Databricks SQL
AWS Redshift
Tableau
SQL Analytics
Cloud Fetch
Apache Spark
Business outcome
- Implementing Databricks SQL analytics improved Tableau dashboard refresh time by 30% and reduced cost by 30%.
- The additional cost of the Databricks cluster used to load data into Redshift was eliminated.
- Processed data is accessible for analytics as soon as the ETL queries finish running in Spark, and because the data is queried on the same Spark-based platform, features such as Dynamic Partition Pruning and in-memory computation help improve performance.
- The solution uses Databricks Photon clusters, which run queries faster.
- Databricks SQL enables high-bandwidth data transfer to BI tools with Cloud Fetch, which lets extracts and queries with large results transfer much faster, reducing compute uptime.
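Dynamic Partition Pruning, listed among the Spark features above, is easiest to see in a toy model. The following is a minimal pure-Python sketch of the idea, not Spark's implementation: the `sales` and `regions` tables, their rows, and the `join_and_sum` helper are all hypothetical. The filter sits on the dimension table, so the fact table's partitions can only be skipped once the matching join keys are known at runtime. In Spark 3.0 and later, the real optimization is controlled by the `spark.sql.optimizer.dynamicPartitionPruning.enabled` setting.

```python
from dataclasses import dataclass

# Hypothetical fact table: one list of rows per value of the partition
# column `region_id`, mimicking a table partitioned by region_id.
SALES_PARTITIONS: dict[int, list[dict]] = {
    1: [{"region_id": 1, "amount": 100}, {"region_id": 1, "amount": 50}],
    2: [{"region_id": 2, "amount": 70}],
    3: [{"region_id": 3, "amount": 30}, {"region_id": 3, "amount": 20}],
    4: [{"region_id": 4, "amount": 10}],
}

# Hypothetical dimension table. The query filters on `country`, not on the
# fact table's partition column, so static pruning cannot skip anything.
REGIONS = [
    {"region_id": 1, "country": "US"},
    {"region_id": 2, "country": "DE"},
    {"region_id": 3, "country": "US"},
    {"region_id": 4, "country": "JP"},
]


@dataclass
class ScanResult:
    total: int
    partitions_scanned: list[int]


def join_and_sum(country: str, prune: bool) -> ScanResult:
    """Toy equivalent of: SELECT SUM(amount) FROM sales JOIN regions
    USING (region_id) WHERE regions.country = :country"""
    # 1. Evaluate the dimension-side filter first.
    matching_ids = {r["region_id"] for r in REGIONS if r["country"] == country}

    # 2. With dynamic pruning, the surviving join keys become a runtime
    #    filter on the fact table's partition column; without it, every
    #    partition is read.
    if prune:
        to_scan = [p for p in SALES_PARTITIONS if p in matching_ids]
    else:
        to_scan = list(SALES_PARTITIONS)

    # 3. Scan only the selected partitions and apply the join condition.
    total = 0
    for p in to_scan:
        for row in SALES_PARTITIONS[p]:
            if row["region_id"] in matching_ids:
                total += row["amount"]
    return ScanResult(total=total, partitions_scanned=sorted(to_scan))


if __name__ == "__main__":
    print(join_and_sum("US", prune=False))
    print(join_and_sum("US", prune=True))
```

Both calls return the same total (200); only the pruned run skips partitions 2 and 4. That reduction in data read, rather than any change in the result, is where the performance benefit comes from.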