Authors: Ngoc Pham (Scientific Manager II, Bioinformatics)
Drug repurposing, the practice of identifying new therapeutic uses for existing drugs, has emerged as a critical strategy for accelerating treatment development while reducing costs and risks. Traditional drug discovery takes 10–15 years and costs over $2.6 billion per approved drug [1], while repurposed drugs can reach patients 3–12 years faster with significantly lower costs [2]. Yet despite this promise, the process of identifying repurposing candidates remains fundamentally bottlenecked by manual analysis.
A researcher investigating a single disease must query multiple biomedical databases — ChEMBL, Open Targets, PubMed, ClinicalTrials.gov — synthesize hundreds of abstracts, evaluate safety profiles, cross-reference molecular mechanisms, and rank candidates based on complex criteria.
This manual process can take weeks to months for a single target-disease combination. And critically, it is not scalable: pharmaceutical companies need to evaluate hundreds of potential drug repurposing opportunities simultaneously.
This is where agentic AI for drug repurposing transforms the landscape. These systems employ autonomous agents that plan, reason, and execute complex workflows, mining biomedical databases and generating actionable insights in hours rather than weeks. The emergence of agentic AI in life sciences represents one of the most significant shifts in how pharmaceutical companies approach early-stage discovery: not by replacing researchers, but by extending what a research team can realistically evaluate in a given timeframe. We will explore how this technology is reshaping pharmaceutical research. For a broader introduction to how autonomous AI agents are being deployed across the life sciences sector, see Excelra’s overview of AI Agents: Transforming Intelligent Workflows in Life Sciences.
Understanding agentic AI in drug repurposing
Agentic AI differs fundamentally from traditional machine learning approaches. Rather than requiring explicit step-by-step instructions, these systems can set goals and determine their own execution paths. In drug repurposing specifically, they orchestrate multiple specialized agents, each designed to handle specific aspects of the repurposing workflow.
What makes agentic AI different from conventional AI tools? Traditional AI tools require explicit instructions for each step and cannot adapt to unexpected data. Agentic AI systems, by contrast, can:
- Plan autonomously: break down complex objectives into actionable steps without requiring manual task decomposition for each new repurposing question
- Make dynamic routing decisions: determine the best processing strategy based on candidate characteristics, clinical maturity, and available data
- Iterate and refine: self-critique results and adjust analytical approaches without human intervention at each cycle
- Coordinate across data sources: seamlessly integrate information from diverse biomedical databases, literature repositories, and safety databases
This autonomous planning and coordination capability is what distinguishes agentic AI from earlier generations of AI tools used in drug discovery, and it has direct implications for how pharmaceutical organizations should think about data readiness before deploying such systems. Excelra’s whitepaper on Data Readiness for AI in Pharma and Biotech addresses exactly this prerequisite: how the quality and structure of your underlying data determine whether agentic AI delivers value or amplifies existing data problems.
A typical agentic drug repurposing workflow involves several specialized agents working in concert:
- Planning Agent: decomposes the repurposing objective into structured tasks with defined outputs for each downstream agent
- Data Mining Agent: executes sequential queries across biomedical databases including ChEMBL and Open Targets
- Routing Agent: classifies candidates and determines optimal enrichment strategies based on clinical development stage and mechanism of action
- Retrieval Agent: searches scientific publications and assesses evidence quality for each candidate-disease association
- Safety Agent: evaluates adverse event profiles using FAERS data and FDA label information
- Evaluation Agent: scores candidates using iterative refinement loops that incorporate feedback from other agents
This multi-agent architecture mirrors how research teams collaborate, with each specialist contributing a distinct analytical layer, but it operates at machine speed across hundreds of candidates simultaneously. Agent behavior can, however, vary between runs depending on model stochasticity and decision pathways, which is why proper validation and human oversight remain essential components of any production agentic AI drug repurposing system.
Figure 1. System-level view of the agentic drug repurposing pipeline showing how planning, retrieval, extraction, scoring, and evaluation agents pass structured outputs between stages.
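The hand-off of structured outputs between agents can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production architecture: `PipelineState`, `run_pipeline`, and the two stub agents are hypothetical names, and the stubs stand in for the real database and LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Shared structured state passed from agent to agent (hypothetical schema)."""
    objective: str
    candidates: list = field(default_factory=list)
    log: list = field(default_factory=list)


Agent = Callable[[PipelineState], PipelineState]


def run_pipeline(state: PipelineState, agents: list) -> PipelineState:
    """Run (name, agent) pairs in order and record which agents have run."""
    for name, agent in agents:
        state = agent(state)
        state.log.append(name)
    return state


def data_mining_agent(state: PipelineState) -> PipelineState:
    # Stub: stands in for the ChEMBL / Open Targets queries.
    state.candidates = [{"drug": "drug_a"}, {"drug": "drug_b"}]
    return state


def safety_agent(state: PipelineState) -> PipelineState:
    # Stub: stands in for the FAERS / FDA label review.
    for candidate in state.candidates:
        candidate["safety_reviewed"] = True
    return state


final = run_pipeline(
    PipelineState(objective="repurpose inhibitors of target X"),
    [("data_mining", data_mining_agent), ("safety", safety_agent)],
)
print(final.log)
```

Keeping the state in one typed object makes each agent's inputs and outputs explicit, which in turn makes individual agents easier to test and validate in isolation.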
Core workflow patterns driving efficiency
Agentic AI systems for drug repurposing leverage four fundamental workflow patterns, described below, that dramatically improve efficiency and accuracy across the candidate evaluation process.
Sequential workflows for biomedical database mining
The data mining process requires careful orchestration across multiple databases. Agentic systems implement a three-step sequential workflow designed to maximize both recall and precision in identifying viable drug repurposing candidates:
Step 1: Query ChEMBL for compounds binding to the target protein, filtering by bioactivity thresholds. The choice of threshold significantly impacts results: pChEMBL ≥ 6.0 (1 µM) is more suitable for exploratory searches, while ≥ 7.0 (100 nM) or ≥ 8.0 (10 nM) thresholds are preferable for well-characterized targets where higher selectivity is needed. In practice, starting at 6.0 and iteratively tightening based on hit volume often yields the best balance between coverage and specificity.
Step 2: Query Open Targets for clinically validated drugs targeting the same protein. Note that “clinically validated” definitions vary: Open Targets typically means Phase 2+ clinical evidence, but individual organizations often apply stricter internal criteria. Companies should expect to re-validate these associations against their own thresholds, particularly for safety and mechanism-of-action confirmation.
Step 3: Merge and deduplicate results, creating a unified candidate list ranked by binding affinity and development stage — the foundational input for all downstream agentic evaluation steps.
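The three steps above can be sketched as a filter, merge, and rank over already-fetched records. This is a hedged sketch: the record fields (`chembl_id`, `pchembl`, `max_phase`), the function names, and the choice to rank by development stage first and affinity second are all assumptions for illustration, and the actual database queries are left out. The only fixed fact used is that pChEMBL is the negative base-10 logarithm of the molar potency, so 6.0 corresponds to 1 µM.

```python
def pchembl_to_nanomolar(pchembl: float) -> float:
    """Convert a pChEMBL value (-log10 of molar potency) to nanomolar."""
    return 10 ** (9 - pchembl)


def merge_candidates(chembl_hits, open_targets_hits, min_pchembl=6.0):
    """Filter ChEMBL hits by potency, merge with Open Targets drugs, dedupe, rank."""
    merged = {}
    # Step 1: keep the most potent measurement per compound above the threshold.
    for hit in chembl_hits:
        if hit["pchembl"] < min_pchembl:
            continue
        key = hit["chembl_id"]
        best = merged.get(key)
        if best is None or hit["pchembl"] > best["pchembl"]:
            merged[key] = {"chembl_id": key, "pchembl": hit["pchembl"], "max_phase": 0}
    # Step 2: add clinical-stage information, creating entries for new compounds.
    for drug in open_targets_hits:
        key = drug["chembl_id"]
        entry = merged.setdefault(
            key, {"chembl_id": key, "pchembl": None, "max_phase": 0}
        )
        entry["max_phase"] = max(entry["max_phase"], drug["max_phase"])
    # Step 3: rank by development stage, then by affinity (assumed ordering).
    return sorted(
        merged.values(),
        key=lambda c: (c["max_phase"], c["pchembl"] or 0.0),
        reverse=True,
    )


# Placeholder identifiers, not real ChEMBL IDs.
chembl_hits = [
    {"chembl_id": "A", "pchembl": 7.2},
    {"chembl_id": "A", "pchembl": 8.1},
    {"chembl_id": "B", "pchembl": 5.5},
    {"chembl_id": "C", "pchembl": 6.4},
]
open_targets_hits = [
    {"chembl_id": "C", "max_phase": 4},
    {"chembl_id": "D", "max_phase": 2},
]
ranked = merge_candidates(chembl_hits, open_targets_hits)
print([c["chembl_id"] for c in ranked])
```

Tightening the threshold iteratively, as suggested in Step 1, amounts to calling `merge_candidates` again with a higher `min_pchembl` when the hit list is too long.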
For organizations building or evaluating drug repurposing data assets, Excelra’s accelerated drug repurposing using advanced analytics whitepaper provides a detailed methodology for how structured analytical approaches — including database mining and candidate scoring — have been applied in real repurposing programs.
Prompt chaining for literature assessment
Evaluating scientific literature for drug repurposing evidence requires a two-stage chain where search results feed directly into LLM-based analysis:
Chain Step 1: Execute targeted PubMed searches based on drug-target-disease combinations, retrieving abstracts that cover preclinical evidence, clinical trial outcomes, and mechanistic studies.
Chain Step 2: Feed retrieved abstracts to a large language model that assesses strength of evidence, identifies supporting studies, and classifies support level across a four-tier scale: none, weak, moderate, or strong. A critical limitation to acknowledge here is that abstracts often omit key details — dosing regimens, adverse event frequencies, or subgroup analyses found only in full text. This can lead to incomplete or misleading evidence assessments for drug repurposing candidates, though full-text retrieval introduces significant additional complexity and cost.
This prompt chaining pattern allows the agentic system to combine the precision of structured database queries with the nuanced interpretation capabilities of modern LLMs — producing evidence assessments at a scale that would typically require weeks of expert review.
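A minimal sketch of the two-stage chain follows. The search and LLM steps are injected as plain callables so that no specific API is implied: `search`, `llm`, `assess_literature`, and the prompt wording are hypothetical. The one design choice worth noting is that the model's answer is checked against the four-tier scale rather than trusted as-is.

```python
from typing import Callable

SUPPORT_LEVELS = ("none", "weak", "moderate", "strong")


def assess_literature(
    drug: str,
    target: str,
    disease: str,
    search: Callable[[str], list],
    llm: Callable[[str], str],
) -> dict:
    """Chain step 1 (search) into chain step 2 (LLM classification)."""
    query = f"{drug} AND {target} AND {disease}"
    abstracts = search(query)
    if not abstracts:
        # Nothing retrieved: skip the LLM call entirely.
        return {"query": query, "n_abstracts": 0, "support": "none"}
    prompt = (
        f"Classify the strength of evidence that {drug} could treat {disease} "
        f"via {target}. Answer with one word: none, weak, moderate, or strong.\n\n"
        + "\n\n".join(abstracts)
    )
    answer = llm(prompt).strip().lower()
    # Reject anything outside the four-tier scale instead of guessing.
    support = answer if answer in SUPPORT_LEVELS else "unparsed"
    return {"query": query, "n_abstracts": len(abstracts), "support": support}


# Stubs standing in for a real literature search and a real model call.
def fake_search(query: str) -> list:
    return ["Abstract one ...", "Abstract two ..."]


def fake_llm(prompt: str) -> str:
    return " Moderate\n"


result = assess_literature("drug_a", "target_x", "disease_y", fake_search, fake_llm)
print(result["support"])
```

Results marked `unparsed` can be retried or routed to human review, which keeps malformed model output from silently entering the scoring stage.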
LLM-Based routing for adaptive processing
Not all drug repurposing candidates require the same depth of analysis. The routing agent dynamically selects enrichment strategies based on candidate characteristics, allocating computational resources in proportion to the scientific and commercial value of each candidate:
| Candidate Profile | Literature Strategy | Safety Strategy |
| --- | --- | --- |
| Phase 3–4 approved drugs with known mechanism of action | Target-focused literature search | Comprehensive: FAERS + FDA labels |
| Phase 1–2 investigational compounds | Disease-focused literature search | FAERS only |
| Preclinical molecules | Broad search across databases | Basic safety screening |
Table 1. Routing logic that adapts literature and safety analysis depth to each candidate’s clinical maturity, preventing unnecessary compute on lower-priority drug repurposing candidates.
This intelligent routing reduces unnecessary computation while ensuring high-priority candidates receive thorough evaluation — typically achieving a 40–60% reduction in processing time compared to uniform approaches that apply the same depth of analysis regardless of candidate maturity [3].
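The routing table can be expressed as a small lookup. The sketch below is a deterministic stand-in for the LLM-based router, useful as a baseline or fallback; it keys only on the maximum clinical phase and ignores the mechanism-of-action condition, which is a simplifying assumption. `Route`, `route_candidate`, and the strategy labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    literature: str
    safety: str


def route_candidate(max_phase: int) -> Route:
    """Map a candidate's maximum clinical phase to enrichment strategies."""
    if max_phase >= 3:
        # Phase 3-4: target-focused literature, FAERS plus FDA labels.
        return Route("target_focused", "faers_and_fda_labels")
    if max_phase >= 1:
        # Phase 1-2: disease-focused literature, FAERS only.
        return Route("disease_focused", "faers_only")
    # Preclinical: broad search, basic safety screening.
    return Route("broad_search", "basic_screening")


for phase in (4, 2, 0):
    print(phase, route_candidate(phase))
```

Comparing the LLM router's decisions against a rule table like this one is a cheap way to spot routing drift between runs.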
The Evaluator-Optimizer pattern: Iterative refinement
Perhaps the most sophisticated pattern in agentic drug repurposing is the evaluator-optimizer loop, which mimics how research teams iteratively refine their analytical frameworks and candidate rankings over multiple review cycles.
The process works as follows:
- Initial Scoring: calculate composite scores from five components — target binding affinity, literature support, safety profile, development stage, and disease relevance — each weighted by the routing agent based on available data quality
- Critique Generation: an evaluator agent reviews current rankings and identifies potential issues — for example, flagging a high-affinity compound ranked low due to sparse literature when strong mechanistic rationale exists in the scientific record
- Optimization: an optimizer agent suggests score adjustments with explicit justifications, creating an auditable record of why each candidate’s ranking changed
- Re-ranking: apply adjustments and re-sort the full candidate list, surfacing compounds that initial scoring may have undervalued
- Approval Check: repeat the critique-optimize cycle until rankings are deemed appropriate or maximum iterations are reached
This iterative approach yields more robust drug repurposing candidate rankings than single-pass scoring, particularly when candidates have complementary strengths across different evaluation dimensions. The explicit justification trail also makes the output more defensible to scientific and regulatory audiences.
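The critique, optimize, and re-rank cycle can be sketched as a bounded loop. This is an illustration under assumptions: `refine_rankings` and the stub `evaluate` and `optimize` functions are hypothetical, the stubs replace what would be LLM calls, and an empty critique is taken to mean the ranking is approved.

```python
def refine_rankings(candidates, evaluate, optimize, max_iterations=3):
    """Repeat critique -> adjust -> re-rank until approved or out of iterations."""
    # Copy each record so the caller's data is not mutated.
    ranked = sorted((dict(c) for c in candidates), key=lambda c: c["score"], reverse=True)
    audit = []
    for iteration in range(1, max_iterations + 1):
        critique = evaluate(ranked)
        if not critique:
            break  # evaluator raised no issues: ranking approved
        adjustments = optimize(ranked, critique)
        for cand in ranked:
            if cand["name"] in adjustments:
                delta, reason = adjustments[cand["name"]]
                cand["score"] += delta
                # Record every change with its justification.
                audit.append(
                    {"iteration": iteration, "name": cand["name"],
                     "delta": delta, "reason": reason}
                )
        ranked.sort(key=lambda c: c["score"], reverse=True)
    return ranked, audit


def evaluate(ranked):
    # Stub critique: flag potent compounds that still score low.
    return [c["name"] for c in ranked if c["pchembl"] >= 8.0 and c["score"] < 0.6]


def optimize(ranked, critique):
    # Stub optimizer: propose a fixed bump with an explicit justification.
    return {name: (0.2, "high affinity despite sparse literature") for name in critique}


candidates = [
    {"name": "A", "score": 0.65, "pchembl": 6.5},
    {"name": "B", "score": 0.50, "pchembl": 8.3},
]
ranked, audit = refine_rankings(candidates, evaluate, optimize)
print([c["name"] for c in ranked], audit)
```

The `audit` list is the justification trail: each entry states which candidate moved, by how much, in which iteration, and why.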
Figure 2. Evaluator-optimizer feedback loop showing how deterministic scoring, evaluator critiques, optimizer adjustments, and re-ranking iterate until convergence.
The evaluator-optimizer pattern has parallels in how Excelra approaches modular AI design for life sciences workflows more broadly. For additional context on how modular, composable AI architectures are being applied in pharmaceutical R&D, see our blog on Modular AI: Enhancing Efficiency and Impact in Life Sciences.
Practical challenges: Where agentic AI drug repurposing implementations go wrong
While the workflow patterns described above are technically sound, real-world agentic AI implementations in drug repurposing frequently encounter pitfalls that undermine their scientific and commercial value. Here are the critical pain points organizations face:
Insufficient Validation: the most common failure is insufficient validation of LLM-generated outputs. Teams often assume that because an agent retrieved and analyzed documents, its extracted drug candidates or evidence classifications are accurate. In practice, LLMs regularly hallucinate PubMed IDs, misclassify evidence tiers, or propose drug classes rather than specific compounds. Without rigorous validation layers cross-checking every extracted claim against source documents, these drug repurposing systems become unreliable. Building robust validation often requires as much engineering effort as the core pipeline itself.
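One inexpensive validation layer is to check that every PubMed ID an agent cites was actually among the documents it retrieved. The sketch below assumes the pipeline keeps the list of retrieved PMIDs; `validate_citations` and the report keys are hypothetical names, and the IDs in the example are placeholders. It does not confirm that a cited paper supports the claim, only that the citation was not invented.

```python
def validate_citations(claimed_pmids, retrieved_pmids):
    """Split cited PMIDs into verified, malformed, and not-in-sources groups."""
    retrieved = {str(p).strip() for p in retrieved_pmids}
    report = {"verified": [], "malformed": [], "not_in_sources": []}
    for raw in claimed_pmids:
        pmid = str(raw).strip()
        if not pmid.isdigit():
            # PMIDs are numeric; anything else is a formatting or extraction error.
            report["malformed"].append(pmid)
        elif pmid in retrieved:
            report["verified"].append(pmid)
        else:
            # Well-formed but never retrieved: treat as a possible hallucination.
            report["not_in_sources"].append(pmid)
    return report


report = validate_citations(
    claimed_pmids=["11111111", "22222222", "PMID-33"],
    retrieved_pmids=["11111111", "44444444"],
)
print(report)
```

Any entry under `not_in_sources` or `malformed` should block the associated claim from reaching the scoring stage until a human or a second check resolves it.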
Data Quality Over AI Sophistication: agentic AI amplifies data quality issues rather than solving them. Many organizations discover too late that 60–70% of their implementation effort goes to data harmonization and cleaning rather than to the AI components themselves. Inconsistent naming conventions, stale databases, and unstructured fields degrade agent performance immediately, regardless of how sophisticated the underlying models are.
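As a small illustration of the naming problem, the sketch below normalizes drug names before deduplication. It is deliberately simplistic: the `SYNONYMS` table, `normalize_drug_name`, and `dedupe_by_name` are hypothetical, and a real pipeline would map names to stable identifiers from a curated vocabulary instead of relying on string cleanup.

```python
import re

# Tiny illustrative synonym table; a real one would come from a curated vocabulary.
SYNONYMS = {"paracetamol": "acetaminophen", "asa": "aspirin"}


def normalize_drug_name(raw: str) -> str:
    """Lowercase, collapse separators, and map known synonyms to one name."""
    name = raw.strip().casefold()
    name = re.sub(r"[\s_\-]+", " ", name)
    return SYNONYMS.get(name, name)


def dedupe_by_name(records):
    """Keep the first record seen for each normalized drug name."""
    seen = {}
    for record in records:
        key = normalize_drug_name(record["name"])
        seen.setdefault(key, {**record, "name": key})
    return list(seen.values())


records = [
    {"name": "Paracetamol", "source": "db_one"},
    {"name": " acetaminophen ", "source": "db_two"},
    {"name": "Aspirin", "source": "db_one"},
]
print(dedupe_by_name(records))
```

Without this step, the same compound enters the pipeline twice and is scored, enriched, and billed for twice.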
This is a recurring challenge across AI implementations in pharma and biotech. Excelra’s blog on why pharma’s AI future depends on data foundations examines how organizations that invest in data infrastructure before deploying agentic AI systems consistently achieve better outcomes than those that attempt to build the two in parallel.
Stochasticity versus Stakeholder Expectations: organizations struggle to explain to stakeholders why running the same drug repurposing query twice produces different candidate rankings. LLM stochasticity means agent behavior varies between runs, conflicting with expectations of software determinism. This becomes particularly problematic for regulatory submissions or when stakeholders question why a ranking changed between reporting cycles. Proper implementation requires probabilistic thinking and ensemble approaches — not treating agents like deterministic functions.
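One simple ensemble approach is to run the pipeline several times and report each candidate's mean rank together with its spread. The sketch below assumes every run ranks the same set of candidates; `aggregate_runs` is a hypothetical name, and mean rank is only one of several reasonable aggregation rules.

```python
from statistics import mean, pstdev


def aggregate_runs(runs):
    """Combine rankings from repeated runs into mean rank and rank spread."""
    positions = {}
    for ranking in runs:
        for position, name in enumerate(ranking, start=1):
            positions.setdefault(name, []).append(position)
    summary = [
        {"name": name, "mean_rank": mean(p), "rank_sd": pstdev(p), "n_runs": len(p)}
        for name, p in positions.items()
    ]
    # Sort by mean rank; break ties by name so the output is reproducible.
    return sorted(summary, key=lambda row: (row["mean_rank"], row["name"]))


runs = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "B", "C"],
]
summary = aggregate_runs(runs)
for row in summary:
    print(row)
```

Reporting `rank_sd` alongside the rank gives stakeholders a direct answer to "why did this change?": a candidate with a large spread was never stably ranked in the first place.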
Uncontrolled API Costs: LLM API costs can spiral unexpectedly in production drug repurposing pipelines. A poorly optimized system might embed the same document hundreds of times, make redundant extraction calls, or fail to cache intermediate results. Without intelligent caching strategies, batch processing, and cost monitoring implemented from day one, monthly infrastructure costs can reach levels that make the entire agentic AI approach economically unviable compared to traditional manual analysis.
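A content-addressed cache is one way to avoid paying for the same extraction twice. The sketch below is a minimal in-memory version under stated assumptions: `CachedExtractor` is a hypothetical wrapper, the wrapped `extract` callable stands in for an LLM or embedding call, and a production system would persist the cache and include the prompt and model version in the key.

```python
import hashlib
from typing import Callable


class CachedExtractor:
    """Wrap an expensive per-document call with a cache keyed by content hash."""

    def __init__(self, extract: Callable[[str], str]):
        self._extract = extract
        self._cache = {}
        self.calls = 0  # number of times the expensive call actually ran

    def __call__(self, document: str) -> str:
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._extract(document)
        return self._cache[key]


# Stub standing in for a paid LLM extraction call.
extractor = CachedExtractor(lambda text: text.upper())
for doc in ["abstract one", "abstract two", "abstract one"]:
    extractor(doc)
print(extractor.calls)
```

The `calls` counter doubles as a basic cost metric: comparing it with the total number of requests shows how much redundant work the cache is absorbing.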
The practical reality is that agentic AI for drug repurposing is less about sophisticated agent design and more about engineering discipline, validation rigor, and data quality. Successful organizations treat it as a software engineering challenge with AI components — not an AI challenge with some software around it.
For a concrete example of how data quality and AI readiness challenges are addressed in practice before deploying AI-driven pipelines, see Excelra’s case study on structured and analysis-ready data for AI/ML-based drug discovery — which demonstrates how curated, AI-ready datasets underpin reliable downstream analytics.
Conclusion: The future of AI-Driven drug discovery
Agentic AI for drug repurposing represents an augmentation of human expertise — allowing researchers to focus on high-level strategy and scientific judgment while intelligent systems handle data integration, biomedical database mining, and evidence synthesis at scale. The multi-agent architecture explored throughout this article demonstrates how breaking complex repurposing problems into specialized, collaborative components yields more robust results than monolithic AI approaches.
As these agentic AI systems mature, we anticipate integration of real-world evidence from electronic health records, predictive modeling of clinical trial outcomes, and automated hypothesis generation for novel drug-target-disease combinations. The key challenge is not technical capability — modern LLMs and biomedical databases provide the necessary infrastructure for sophisticated drug repurposing pipelines. The challenge is thoughtful workflow design that respects the nuances of pharmaceutical research, regulatory expectations, and organizational readiness.
For organizations exploring agentic AI implementation in their drug repurposing programs, we recommend starting with well-defined repurposing questions where ground-truth validation data already exists, establishing clear success metrics before deployment, and maintaining human oversight during initial production runs. While the foundational technology shows significant promise, successful integration into discovery pipelines requires careful attention to validation, data quality, and organizational change management.
Excelra’s experience building data-driven drug repurposing programs — including collaborations with leading pharmaceutical companies — positions us uniquely to support organizations at this intersection of data science and AI-driven discovery. To understand how Excelra approaches drug repurposing as a scientific and data challenge, explore our dedicated data-driven drug repurposing blog and our work in AI agents in life sciences.
References
[1] Wouters, O. J., McKee, M., & Luyten, J. (2020). Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009-2018. JAMA, 323(9), 844–853. https://doi.org/10.1001/jama.2020.1166
[2] Pushpakom, S., Iorio, F., Eyers, P. A., et al. (2019). Drug repurposing: progress, challenges and recommendations. Nature Reviews Drug Discovery, 18(1), 41–58. https://doi.org/10.1038/nrd.2018.168
[3] McKinsey & Company. Harnessing agentic AI in life sciences companies.
[4] Seal, S., Huynh, D. L., Chelbi, M., Khosravi, S., Kumar, A., Thieme, M., … & Spjuth, O. (2025). AI Agents in Drug Discovery. arXiv preprint arXiv:2510.27130.
What is agentic AI and how does it differ from traditional AI in drug discovery?
Agentic AI refers to AI systems that can set goals, plan multi-step workflows, and execute those plans autonomously — adapting their approach based on intermediate results without requiring explicit step-by-step human instruction. In drug discovery, traditional AI tools are typically task-specific: a model trained to predict binding affinity does exactly that and nothing more. Agentic AI systems, by contrast, can orchestrate multiple specialized agents across an entire drug repurposing workflow — querying biomedical databases, synthesizing literature evidence, evaluating safety profiles, and ranking candidates — all within a single automated pipeline. The key distinction is autonomy and coordination: agentic systems can decide which databases to query, how deeply to evaluate each candidate, and when to seek additional evidence, rather than following a fixed script.
How does agentic AI accelerate drug repurposing specifically?
Agentic AI accelerates drug repurposing by automating the most time-intensive steps in candidate identification and evaluation. A researcher manually investigating a single disease-target combination might spend weeks querying databases like ChEMBL and Open Targets, reviewing PubMed literature, evaluating FAERS safety data, and ranking candidates against multiple scientific criteria. An agentic AI system can execute this same workflow across hundreds of target-disease combinations simultaneously, completing in hours what would take months of manual effort. The efficiency gain comes from three sources: parallel processing across multiple candidates, automated evidence synthesis using large language models, and intelligent routing that allocates analytical depth in proportion to each candidate’s scientific priority — preventing unnecessary computation on low-priority compounds.
What biomedical databases do agentic AI systems use for drug repurposing?
The most commonly used biomedical databases in agentic AI drug repurposing pipelines include ChEMBL — the primary source for compound bioactivity data and target-binding information — Open Targets for clinically validated drug-target associations supported by genetic and clinical evidence, PubMed for scientific literature and evidence synthesis, ClinicalTrials.gov for development stage and clinical outcome data, and FAERS (FDA Adverse Event Reporting System) for safety and adverse event profiles. The agentic architecture matters because different candidates require different database combinations: a Phase 3 drug needs comprehensive FAERS analysis and FDA label review, while a preclinical molecule may only require broad ChEMBL and literature screening. The routing agent dynamically determines which databases each candidate requires, optimizing both cost and analysis quality.
What are the biggest risks of using agentic AI in drug repurposing?
The most significant risk is insufficient validation of LLM-generated outputs. Because language models can confidently generate plausible-sounding but factually incorrect information — hallucinating PubMed identifiers, misclassifying evidence tiers, or proposing drug classes instead of specific compounds — every claim extracted by an agentic system must be cross-checked against source documents before it is used in decision-making. Other major risks include data quality amplification: agentic AI does not fix poor data, it amplifies existing inconsistencies across databases. Stochastic outputs create reproducibility challenges that conflict with regulatory expectations of deterministic software. And uncontrolled LLM API costs can make production pipelines economically unviable without careful caching and batch processing architecture. Successful agentic AI drug repurposing implementations treat these as engineering problems, not AI problems.
How much does it cost to implement an agentic AI drug repurposing pipeline?
Implementation costs vary widely depending on scope, data infrastructure, and validation requirements. At the low end, a focused proof-of-concept system evaluating a single target class might cost $50,000–$200,000 in engineering and data preparation — with ongoing LLM API costs of a few thousand dollars per month if properly optimized. Enterprise-scale systems covering multiple therapeutic areas, with full validation frameworks and integration into existing R&D data infrastructure, typically require $500,000 to several million dollars in initial investment. The most overlooked cost driver is data harmonization: organizations consistently report that 60–70% of actual implementation effort goes to cleaning, standardizing, and curating input data rather than to the AI architecture itself. Organizations that underestimate this consistently overspend and underdeliver.
Can agentic AI drug repurposing outputs be used in regulatory submissions?
Agentic AI outputs can inform regulatory submissions, but they cannot be submitted directly without significant human review, validation, and documentation. Regulatory agencies including the FDA and EMA require that any computational evidence supporting drug development decisions be reproducible, validated against ground truth, and accompanied by transparent methodology documentation. The stochastic nature of LLM-based agents — where the same query can produce different outputs in different runs — is fundamentally incompatible with the determinism that regulatory reviewers expect. Organizations using agentic AI to support repurposing decisions should implement ensemble approaches, maintain complete audit trails of agent decisions and data sources, and treat AI-generated rankings as input to human expert review rather than final conclusions. The FDA’s emerging framework for AI/ML in drug development provides additional guidance on evidence standards.
Building Your Agentic AI Drug Repurposing Capability?
Excelra combines deep pharmaceutical domain expertise with AI engineering capabilities to help organizations design, validate, and deploy agentic AI pipelines for drug repurposing and broader drug discovery applications. Whether you are assessing data readiness, designing multi-agent workflows, or building out candidate evaluation frameworks, our team is ready to help.
