Structured and analysis-ready data for AI/ML-based drug discovery

Employing AI/ML techniques to identify small molecules for therapeutic development.

Client’s requirement:

The client required high-quality, harmonized, and structured datasets of small molecules, encompassing comprehensive chemical, biological, and pharmacological data. The end goal was to integrate these standardized datasets into the client’s internal AI/ML platform and use them to train algorithms for virtual hit identification.

Our approach:

Excelra’s Global Online Structure Activity Relationship Database (GOSTAR) provides a 360-degree view of millions of compounds, linking their chemical structure to biological, pharmacological, and therapeutic information. The heterogeneous and unstructured data captured from various sources is transformed into a structured relational database format in GOSTAR.

All content in GOSTAR is captured manually and passes through a three-step quality control process. The resulting normalized and structured datasets cover structure-activity relationships (SAR), physicochemical properties, and ADMET parameters. They were integrated into the client’s internal platform to train AI/ML algorithms for model building and activity/property prediction, supporting hit identification and lead optimization.
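
As an illustration of how harmonized SAR records might be prepared for model training, the sketch below converts activity values reported in mixed units to a single negative-log molar scale (for example, IC50 to pIC50). It is a minimal sketch under stated assumptions, not GOSTAR's schema or the client's pipeline: the field names, the unit table, and the sample records are all hypothetical.

```python
import math

# Assumed conversion factors from common concentration units to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_activity(value, unit):
    """Return -log10 of a concentration-type activity expressed in molar."""
    if value <= 0:
        raise ValueError("activity value must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def harmonize(records):
    """Keep records with a known unit and a positive value; add a p_activity field."""
    rows = []
    for rec in records:
        if rec["unit"] not in UNIT_TO_MOLAR or rec["value"] <= 0:
            continue  # skip records that cannot be placed on a common scale
        row = dict(rec)
        row["p_activity"] = round(to_p_activity(rec["value"], rec["unit"]), 2)
        rows.append(row)
    return rows


# Made-up records for illustration only (hypothetical field names and values).
sample = [
    {"compound_id": "CMPD-1", "target": "TARGET-A", "value": 10.0, "unit": "nM"},
    {"compound_id": "CMPD-2", "target": "TARGET-A", "value": 1.0, "unit": "uM"},
    {"compound_id": "CMPD-3", "target": "TARGET-A", "value": 5.0, "unit": "ug/mL"},
]

for row in harmonize(sample):
    print(row["compound_id"], row["p_activity"])
```

Placing all activities on one scale lets a model treat values from different sources as directly comparable. Records in mass-based units such as ug/mL are skipped here because converting them to molar requires the molecular weight, which this sketch does not model.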