Overview
Excelra collaborated with a US-based precision medicine biotech company to develop scalable, AI-enabled pipelines for cancer cohort data analysis using real-world evidence and somatic testing data. The objective was to accelerate oncology treatment identification through efficient data integration, AI-driven insights, and high-throughput analysis. Excelra addressed key challenges around multi-source data complexity, slow processing speeds, and regulatory compliance by implementing advanced analytics workflows, predictive modeling, and gene enrichment tools. The modular Python-based solution handled over 5 million patient records, reduced analysis time by 98%, and empowered researchers with robust, reproducible insights for personalized cancer therapies.

Our client
A US-based precision medicine biotech company that specializes in tissue testing to identify patients suitable for precision oncology screening had generated a large volume of data, including patient data, genomics data, NGS data, and liquid biopsy data. The client wanted help integrating this data and deriving insights from it through pipelines for cancer cohort analysis and visualization based on somatic testing data and real-world evidence.

Client’s challenge
1. Handling Large-Scale, Complex Data
The client was generating vast amounts of data, including patient records, genomics data, tumor data for multiple cancer types (lung, breast, CRC, prostate), liquid biopsy data, and high-throughput screening outputs.
This posed challenges in data integration, analysis, and deriving meaningful insights, necessitating advanced approaches for data management and processing.
2. Need for AI-Driven Insights
Traditional methods of data analysis were slow and inefficient for identifying relevant cohorts (e.g., based on mutation types, biomarker expression, treatment response).
The client required AI-powered analytics to derive predictive insights and support personalized treatment strategies.
3. Scalability & Compliance Requirements
The client needed a scalable solution that could support millions of patient records.
Ensuring compliance with HIPAA, GDPR, and other regulatory standards was crucial.

Client’s goals
Our client’s goal was to identify patients qualifying for precision oncology treatments by leveraging real-world evidence (RWE) and somatic testing data. Their existing analysis pipelines suffered from slow processing times and were unable to handle large amounts of data coming from multiple sources and in varied data types. The client needed scalable and efficient pipelines for cancer cohort data analysis and visualization.

Our approach
1. Initial Setup & Data Integration
Excelra worked closely with the client’s team to understand their data landscape and specific needs. The following steps were taken to ensure seamless data integration:
Data Access & Ingestion: Collaborated with the client to securely access their data, such as somatic testing results and health outcomes data.
Cohort Selection & Parameter Identification: Identified the key parameters for selecting patient cohorts, including cancer types, therapies, and characteristics such as age, treatment duration, and testing periods.
Data Cleaning & Standardization: Applied advanced data engineering techniques to clean, pre-process, and integrate data into the client environment, ensuring consistency and reliability.
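The cohort selection parameters listed above can be illustrated with a minimal sketch. It is not the client's implementation: the `PatientRecord` fields, the `CohortCriteria` class, and the sample values are hypothetical names chosen for illustration, and the example uses only the Python standard library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical schema; the real fields are not described in this case study.
    patient_id: str
    cancer_type: str
    therapy: str
    age: int
    treatment_days: int
    test_date: date


@dataclass(frozen=True)
class CohortCriteria:
    # One field per selection parameter named in the text.
    cancer_type: str
    therapies: frozenset
    min_age: int
    max_age: int
    min_treatment_days: int
    test_from: date
    test_to: date

    def matches(self, record):
        """Return True when a record satisfies every criterion."""
        return (
            record.cancer_type == self.cancer_type
            and record.therapy in self.therapies
            and self.min_age <= record.age <= self.max_age
            and record.treatment_days >= self.min_treatment_days
            and self.test_from <= record.test_date <= self.test_to
        )


def select_cohort(records, criteria):
    """Keep only the records that match the cohort criteria."""
    return [r for r in records if criteria.matches(r)]


# Made-up sample data, for illustration only.
records = [
    PatientRecord("P001", "lung", "drug_a", 64, 120, date(2021, 3, 1)),
    PatientRecord("P002", "lung", "drug_b", 58, 30, date(2021, 6, 15)),
    PatientRecord("P003", "breast", "drug_a", 49, 200, date(2020, 11, 2)),
    PatientRecord("P004", "lung", "drug_a", 71, 95, date(2019, 1, 20)),
]
criteria = CohortCriteria(
    cancer_type="lung",
    therapies=frozenset({"drug_a"}),
    min_age=50,
    max_age=80,
    min_treatment_days=90,
    test_from=date(2020, 1, 1),
    test_to=date(2022, 12, 31),
)
cohort = select_cohort(records, criteria)
print([r.patient_id for r in cohort])
```

Keeping the criteria in a single immutable object is one way to make a cohort definition reusable and easy to record alongside the results, which supports reproducibility.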
2. Advanced Data Interpretation & AI-Driven Insights
Once the data was integrated, Excelra deployed AI/ML-powered analytics to derive meaningful insights:
Enrichment Statistics & Predictive Modelling: Applied statistical techniques to analyze single loci variations and monotherapy response patterns.
Cohort Selection & Stratification: Developed AI-driven algorithms to classify patients into precise subgroups based on genetic and clinical indicators, covering both monotherapies and combination therapies.
Gene & Pathway Enrichment: Leveraged bioinformatics tools to analyze gene-level and pathway enrichment using public data sources and assess the implications for targeted therapies. This added biological pathway context to the cohort data.
Scalability for Large-Scale Data: Optimized data pipelines to support over 5 million patient records, ensuring rapid analysis without compromising accuracy.
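To make the idea of an enrichment statistic concrete, here is a minimal sketch of one common choice: a one-sided Fisher's exact test, computed as a hypergeometric upper tail. The case study does not name the statistical test that was used, so this choice is an assumption, and the function name, argument names, and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(variant_responders, responders, variant_carriers, cohort_size):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    Returns the probability of seeing at least `variant_responders` variant
    carriers among `responders` patients drawn at random from a cohort of
    `cohort_size` patients that contains `variant_carriers` carriers.
    """
    non_carriers = cohort_size - variant_carriers
    upper = min(responders, variant_carriers)
    total_ways = comb(cohort_size, responders)
    # math.comb(n, k) returns 0 when k > n, so impossible tables add nothing.
    tail_ways = sum(
        comb(variant_carriers, i) * comb(non_carriers, responders - i)
        for i in range(variant_responders, upper + 1)
    )
    return tail_ways / total_ways


# Made-up counts: 20 patients, 10 carry the variant, 10 responded to a
# monotherapy, and 8 of the responders carry the variant.
p = enrichment_p_value(
    variant_responders=8, responders=10, variant_carriers=10, cohort_size=20
)
print(f"p = {p:.4f}")
```

A small p-value suggests that the variant is over-represented among responders. When many loci are tested in this way, a multiple-testing correction would normally be applied before interpreting the results.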
3. Final Delivery & Training
To ensure smooth adoption and usability, Excelra provided:
Advanced analytics workflows: Developed a comprehensive suite of Jupyter notebooks and Python modules for cancer cohort analysis, designed to handle large-scale datasets with robust scalability and performance.
Comprehensive Testing & Optimization: Rigorous validation and benchmarking to confirm accuracy and efficiency.
User Training & Documentation: Delivered detailed training sessions and user manuals to enable the client’s team to independently operate and refine the analysis workflows.
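One general technique for keeping memory use flat on multi-million-record datasets is to stream the records in fixed-size chunks and aggregate incrementally. The sketch below illustrates that idea only; the case study does not describe how the delivered modules achieve scalability, and `iter_chunks`, `count_by_cancer_type`, and the `cancer_type` key are hypothetical names.

```python
from collections import Counter
from itertools import islice


def iter_chunks(records, chunk_size):
    """Yield lists of up to `chunk_size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def count_by_cancer_type(records, chunk_size=100_000):
    """Count records per cancer type without loading them all at once."""
    totals = Counter()
    for chunk in iter_chunks(records, chunk_size):
        totals.update(record["cancer_type"] for record in chunk)
    return totals


# Made-up stream of records; in practice this could be a file or database cursor.
stream = ({"cancer_type": t} for t in ["lung", "breast", "lung", "CRC", "lung"])
totals = count_by_cancer_type(stream, chunk_size=2)
print(dict(totals))
```

Because only one chunk is held in memory at a time, the same function works whether the source yields five records or five million.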

Our solution
Excelra developed fully functional, AI-enabled cancer cohort analysis notebooks and Python modules for automated and efficient data processing.
Increased efficiency: The solution improved data processing efficiency by 98%, reducing manual effort and cutting analysis time to approximately 4 hours.
Data Processing: The pipeline was also capable of handling large amounts of data coming from multiple sources and in varied data types.
Scalability and Compliance: The solution scaled to accommodate increasing data loads without compromising performance, while ensuring compliance with HIPAA, GDPR, and PII cloud security standards.

Conclusion
This case study exemplifies Excelra’s ability to deliver scalable, AI-powered solutions that drive innovation in precision medicine and diagnostics. By leveraging advanced data analytics and domain-specific expertise, Excelra enables pharmaceutical and healthcare organizations to extract meaningful insights from vast datasets.
Through our innovative approach, Excelra ensures that our solutions are not only highly adaptable to different therapeutic areas but also capable of evolving with the rapidly advancing field of biomedical research. By integrating AI-driven methodologies with high-quality curated data, Excelra empowers researchers and clinicians to make informed decisions, ultimately driving advancements in personalized treatment strategies and improving patient outcomes.