Structured and analysis-ready data for AI/ML-based drug discovery

Employing AI/ML techniques to identify small molecules for therapeutic development.

Client’s requirement:

The client required high-quality, harmonized and structured small-molecule datasets covering comprehensive chemical, biological and pharmacological data. The end goal was to integrate these standardized datasets into the client’s internal AI/ML platform to train algorithms for virtual hit identification.

Our approach:

Excelra’s Global Online Structure Activity Relationship Database (GOSTAR™) provides a 360-degree view of millions of compounds, linking their chemical structures to biological, pharmacological and therapeutic information. Heterogeneous, unstructured data captured from various sources is transformed into a structured relational database format in GOSTAR™. All content in GOSTAR™ is captured manually and passes through a 3-step quality control process. These normalized, structured datasets, covering structure-activity relationships (SAR), physicochemical properties and ADMET parameters, were integrated into the client’s internal platform to train AI/ML algorithms for model building and activity/property prediction in support of hit identification and lead optimization.
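
As a rough illustration of what turning heterogeneous activity data into a structured, analysis-ready form can involve, the sketch below normalizes potency values reported in mixed concentration units to nanomolar and derives a negative-log (pIC50-style) value for model training. It is a minimal example under stated assumptions and does not represent the GOSTAR™ schema or Excelra’s curation pipeline: the field names, the `ActivityRecord` type, the `harmonize` function and the set of supported units are all hypothetical.

```python
import math
from dataclasses import dataclass

# Conversion factors to nanomolar. The supported unit set is an
# illustrative assumption, not a description of any real schema.
_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


@dataclass(frozen=True)
class ActivityRecord:
    """Hypothetical harmonized record for one compound/target measurement."""
    compound_id: str
    target: str
    endpoint: str
    value_nm: float   # potency expressed in nanomolar
    p_value: float    # -log10 of the molar concentration


def harmonize(raw: dict) -> ActivityRecord:
    """Convert one raw measurement (hypothetical field names) to a common form."""
    # Treat the micro sign and the Greek mu as the ASCII "u".
    unit = raw["unit"].strip().replace("\u00b5", "u").replace("\u03bc", "u")
    if unit not in _TO_NM:
        raise ValueError(f"unsupported unit: {raw['unit']!r}")
    value_nm = float(raw["value"]) * _TO_NM[unit]
    if value_nm <= 0:
        raise ValueError("potency must be positive")
    # -log10(value in M) = -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    p_value = 9.0 - math.log10(value_nm)
    return ActivityRecord(
        compound_id=raw["compound_id"],
        target=raw["target"],
        endpoint=raw["endpoint"],
        value_nm=value_nm,
        p_value=p_value,
    )


if __name__ == "__main__":
    raw_rows = [
        {"compound_id": "C1", "target": "T1", "endpoint": "IC50", "value": "0.5", "unit": "uM"},
        {"compound_id": "C2", "target": "T1", "endpoint": "IC50", "value": "10", "unit": "nM"},
    ]
    for row in raw_rows:
        print(harmonize(row))
```

Putting every measurement on a single unit and a log scale before training keeps values from different sources directly comparable. Rejecting unknown units rather than guessing means ambiguous records are surfaced for manual review.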

In addition to GOSTAR™, Excelra’s expertise in Cheminformatics and Data Curation Services ensured seamless harmonization of complex datasets, empowering AI-driven discovery workflows.

For similar initiatives, explore our work on Activity Landscape Analysis for Compound Datasets, where structured insights accelerated compound prioritization.
