What is the molecular code in the context of antibody discovery? (Educational)

The 'molecular code' refers to the precise amino acid sequence, particularly within the Variable Light (VL) and Variable Heavy (VH) regions of an antibody. This sequence directly determines the antibody’s structure, stability, and its binding affinity and specificity for a target antigen. Deciphering this code is essential for rational antibody design and engineering.

Why is Target Affinity Analysis crucial for AI-driven drug discovery? (Educational)

Target Affinity Analysis measures the strength of the interaction between an antibody and its target. In AI-driven discovery, these quantitative measurements serve as key training data for machine learning models. High-quality affinity data helps a model predict which sequence modifications are likely to improve binding, stability, and overall therapeutic efficacy, which accelerates lead optimization.

How did Excelra's Data Curation Services enable the client's AI platform? (Case-Study)

The client needed a comprehensive, structured dataset of Therapeutic Monoclonal Antibodies (mAbs) linked to Structure-Activity Relationship (SAR) data. Excelra’s Data Curation Services extracted over 24,000 data rows from 538 patents, covering sequence details (VL/VH), binding affinity, and stability parameters. This analysis-ready data formed the training set used to enhance the client’s AI/ML algorithm for identifying new targets.

What key data variables were included in the curated dataset for Antibody Sequence Mining? (Case-Study)

The dataset included critical details for Antibody Sequence Mining, such as: Clone Details (specificity, orientation, target), full Sequence Details with emphasis on Variable Light (VL) and Variable Heavy (VH) regions, Binding Affinity (quantitative results), and Stability Parameters (thermostability, pharmacokinetics). This granular detail was vital for the client's AI model development.

What makes Excelra a preferred partner for complex data curation projects like this? (Service-based)

Excelra’s expertise lies in transforming complex, unstructured life science data, especially from patents and scientific literature, into high-quality, structured, and actionable datasets. Our systematic approach, detailed protocol documentation, and deep domain knowledge go beyond standard data extraction to deliver the accuracy and completeness required for training AI/ML models in drug discovery.

Does Excelra’s curation capability extend beyond monoclonal antibodies? (Service-based)

Yes, our comprehensive data curation capabilities extend to various therapeutic modalities beyond monoclonal antibodies (mAbs). We also provide expert curation for Oligomers, siRNAs, Peptides, and Antibody-Drug Conjugates (ADCs), ensuring a broad range of training data sets for clients engaged in diverse areas of drug discovery and development.

Unveiling the Molecular Code: Antibody Sequence Mining and Target Affinity Analysis

Overview

The challenge in AI Drug Discovery is often not the algorithm itself, but the quality and structure of the training data. This case study details how we partnered with a leading APAC-based AI/ML client to provide comprehensive Data Curation Services, extracting clone, sequence, and target affinity information from patents. This process, centered on Antibody Sequence Mining, delivered a structured dataset of Therapeutic Monoclonal Antibodies (mAbs) that served as the foundation for training the client’s AI models. It enabled the client to accelerate their efforts in precision medicine and target identification, particularly in immuno-oncology.

Our client

The client is an AI Drug Discovery company based in the APAC region and operating in the AI/ML industry. Using their proprietary AI workflow platform, they generate insights spanning customized target identification through lead generation, supporting the development of commercially valuable drugs from in-house and partnership projects.

Client’s challenge

The client needed a comprehensive, high-quality training set focused specifically on Therapeutic Monoclonal Antibodies (mAbs) and their associated Structure-Activity Relationship (SAR) data. This dataset was essential for enhancing their AI/ML algorithm to identify new targets in the highly competitive field of immuno-oncology. No such specialized, tailored dataset was available, which limited target identification, compromised AI/ML performance, and delayed innovation. (Read more about data preparation for predictive modeling in our whitepaper on selecting and preparing data for AI/ML predictive modeling.)

Client’s goals

The primary goal was to obtain a robust, comprehensive, and meticulously structured dataset of therapeutic monoclonal antibodies and their target binding data. This dataset was intended to:

Enhance their proprietary AI/ML algorithm’s capability.
Enable the precise identification of new targets within the field of immuno-oncology.
Accelerate the overall drug discovery process towards developing novel, personalized treatments.

Our approach

The client partnered with us on the strength of our track record in data curation and in transforming large volumes of data into actionable insights. Our approach followed these steps:

Defining the project scope

Clearly outlining specific objectives and deliverables to ensure complete alignment with the client’s unique requirements.

Identifying data sources

Conducting thorough research to pinpoint relevant and reliable data sources, primarily patents, containing the necessary information on Therapeutic Monoclonal Antibodies (mAbs) and their binding targets.

Creation of a data extraction template

Developing a structured template that included all mandatory data variables to guarantee consistency and completeness during the curation process.
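As an illustration only, a template row with mandatory variables could be modeled along the lines below. This is a minimal sketch, not the template used in the project: the class name `CurationRecord`, every field name, and the `validate` helper are hypothetical, and the sequences in the usage example are short placeholder strings rather than curated data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CurationRecord:
    """One row of a hypothetical extraction template (field names are assumptions)."""
    patent_id: str
    clone_name: str
    target: str
    vh_sequence: str
    vl_sequence: str
    affinity_value: Optional[float] = None
    affinity_unit: Optional[str] = None
    assay_method: Optional[str] = None
    thermostability_c: Optional[float] = None


# Fields assumed to be mandatory for this sketch.
MANDATORY = ("patent_id", "clone_name", "target", "vh_sequence", "vl_sequence")
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate(record: CurationRecord) -> List[str]:
    """Return a list of human-readable problems; an empty list means the row passes."""
    problems = []
    for name in MANDATORY:
        if not getattr(record, name).strip():
            problems.append(f"missing mandatory field: {name}")
    for name in ("vh_sequence", "vl_sequence"):
        seq = getattr(record, name).strip().upper()
        bad = sorted(set(seq) - AMINO_ACIDS)
        if bad:
            problems.append(
                f"{name} contains non-standard residues: {''.join(bad)}"
            )
    return problems


# Usage: a row with an empty clone name and an ambiguous residue in the VH sequence.
row = CurationRecord("PATENT-1", "", "PD1", "EVQLX", "DIQ")
for problem in validate(row):
    print(problem)
```

Checking each row against the template at entry time is one way to keep consistency and completeness measurable during curation, rather than discovering gaps after delivery.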

Data variable identification

Pinpointing the data variables essential to the client’s AI/ML model, ensuring the curated dataset was aligned with the desired outcomes. (Explore our Data Curation services.)

Protocol documentation

Creating detailed documentation to maintain high-quality standards throughout the data curation process.

Data delivery

Delivering the curated data in Excel format upon completion of each target curation, for ease of integration with the client’s existing systems.

Our solution

Our solution centered on building a proprietary repository of Therapeutic Monoclonal Antibodies (mAbs) mapped to their binding targets, curated from patent data. This repository contained the key data sections needed for Antibody Sequence Mining and Target Affinity Analysis:

Clone details

Information on specific clones, including specificity, orientation, and binding targets.

Sequence details

Detailed sequence information for each monoclonal antibody, with a focus on the variable light (VL) and variable heavy (VH) regions. These regions are critical for understanding molecular characteristics and functional properties.

Binding affinity information

Quantitative results and methodologies related to the strength and specificity of the antibody-target interactions.
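Affinity values in patents are often reported as dissociation constants (KD) in differing concentration units, so a dataset like this typically benefits from a common scale before modeling. The sketch below shows one conventional normalization: converting KD to molar and then to pKD, defined as -log10 of KD in molar. The unit table and the function names `kd_to_molar` and `pkd` are assumptions for illustration and do not describe the project's actual pipeline.

```python
import math

# Conversion factors to molar for common concentration units (assumed unit labels).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def kd_to_molar(value: float, unit: str) -> float:
    """Convert a reported KD to molar; reject non-positive values and unknown units."""
    if value <= 0:
        raise ValueError("KD must be positive")
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"unsupported unit: {unit}")
    return value * UNIT_TO_MOLAR[unit]


def pkd(value: float, unit: str) -> float:
    """Return pKD = -log10(KD in molar); larger values mean tighter binding."""
    return -math.log10(kd_to_molar(value, unit))


# Usage: the same affinity expressed in two units maps to the same pKD.
print(round(pkd(1.0, "nM"), 3))
print(round(pkd(1000.0, "pM"), 3))
```

A log scale such as pKD is a common choice for model training because reported affinities span several orders of magnitude.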

Stability parameters

Insights into thermostability and pharmacokinetics, essential for drug development considerations.

In total, we curated 538 patents, yielding 24,222 data rows and 9,705 antibody sequences. The key targets included immuno-oncology targets (PD1, PDL1, CTLA4, TIGIT, CD3, CD52) and others such as PCSK9, TNF-alpha, BCMA, BAFF, VEGFA, CD202, EGFR, HER2, and C5.

[Figure: Key statistics from the Therapeutic Monoclonal Antibodies dataset curation: 9,705 total antibody sequences and 24,222 data rows curated.]

[Figure: Antibody Sequence Mining and Target Affinity Analysis process diagram for AI Drug Discovery.]

Conclusion

The project’s success was driven by our systematic approach and deep expertise in Data Curation Services. Extracting the required antibody sequences and target affinity data from patents under a well-defined scope and documented protocol supported a high level of accuracy and completeness. The resulting dataset of Therapeutic Monoclonal Antibodies (mAbs) provides insight into amino acid sequences and binding affinities, directly supporting the client’s AI Drug Discovery platform. High-quality, structured data of this kind is crucial for the advancement of precision and personalized medicine. (To see another example of how structured data enables AI/ML, view our case study: Structured and analysis-ready data for AI/ML-based drug discovery.)
