


Excelra’s GOBIOM is a comprehensive biomarker database comprising exploratory, preclinical and clinical biomarker intelligence. By providing critical insights into diagnosis, prognosis, treatment response, safety, efficacy and toxicity, GOBIOM facilitates data-driven clinical and therapeutic decision making to accelerate drug development.

GOBIOM data coverage

The GOBIOM database is meticulously curated by our team of scientific experts, who excerpt, enhance and enrich biomarker data from various sources using a cutting-edge curation application. A variety of biomarker datasets, encompassing experimental, pre-clinical, clinical, pharmacological and non-pharmacological data and treatment outcomes, are captured using controlled vocabularies and ontologies.

Figure 1: GOBIOM Database – Statistical Highlights

Figure 2 (a): GOBIOM Database – Biomarker coverage by Nature

Figure 2 (b): GOBIOM Database – Biomarker coverage by Utility

Figure 2 (c): GOBIOM Database – Biomarker coverage by Study Population

GOBIOM updates

Figure 3: GOBIOM statistics on new biomarkers added (Nov'20-Jan'21)

GOBIOM provides information on proteomic, genomic, biochemical, imaging, physiological, clinical scoring scales and cellular biomarkers across 18 different therapeutic areas, covering over 6000 therapeutic indications. Below are the statistics on the updated biomarkers by their nature.

Figure 4: Biomarker statistics by nature

Biomarker data are curated and mapped to their reported utilities, such as diagnosis, prognosis, disease progression, treatment response, surrogate, efficacy, susceptibility/risk of disease and safety/toxicity. Below are the statistics on the updated biomarkers by their application.

Figure 5: Biomarker statistics by application

GOBIOM is updated fortnightly with the latest biomarker data for indications across different therapeutic areas. Below are the highlights of updated indications by biomarker count.

Figure 6: Top indications by number of biomarkers updated

Content coverage

The GOBIOM database is composed of many different types of content, from scientific literature to publicly available datasets which include:

  • Clinical Trial Registries
  • Journals and Scientific Reviews
  • Conferences
  • Patents
  • FDA/EMEA Reports
  • Company Websites
  • Public Sources

Curation process

The content in GOBIOM is assessed and selected for indexing based on the information it covers, such as experimental, pre-clinical, clinical, pharmacological and non-pharmacological data and treatment outcomes. Records identified for indexing are manually curated and reviewed by a team of scientific experts trained in translational science. The processed records are enriched using controlled vocabularies and ontologies to standardize them and create a relational data model connecting biomarkers to diseases and drugs.


Drug resistance and inadequate response to commonly administered drugs across different therapeutic indications pose a major challenge to clinicians and researchers, leading to unnecessary health care costs and additional research expenditure. Despite marked improvements in new therapies, many patients experience disease progression or recurrence, underscoring the need for early detection of drug response or resistance by evaluating biomarkers. Certain biomarkers are predictive of drug response and have high potential for use in general clinical applications for personalized medicine.

Although choosing the right biomarkers associated with drug response and drug resistance is a major challenge for researchers, it is essential for designing effective patient stratification strategies. Excelra’s GOBIOM (Global Online Biomarker Platform) helps in understanding the clinical implications of genetic alterations and their relationship with the gene-drug-disease system by interpreting scientific literature. The GOBIOM biomarker database collates genomic variation data from disparate sources and stores them in a highly structured, easily accessible form to help researchers gain new insights into tumor biology and predict patients’ responses to treatment.

Variant analysis platform in GOBIOM to query gene variants

The variant analyzer platform in GOBIOM enables users to study variants within the context of the gene and to visualize gene variant data in the UCSC Genome Browser. For example, a gene coordinate from the database in GRCh37 or GRCh38 format serves as the input to the UCSC Genome Browser, which displays various annotation tracks beneath the genome coordinate positions, allowing rapid visual correlation of different types of genomic data.
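As a rough illustration of how a stored gene coordinate might be handed to the UCSC Genome Browser, the sketch below builds a browser link from a chromosome, start/end positions and a genome build. It assumes the public `hgTracks` URL with its `db` and `position` query parameters, and the usual correspondence of GRCh37 to the UCSC assembly `hg19` and GRCh38 to `hg38`. The function name `ucsc_browser_url` and the example coordinates are hypothetical and are not taken from GOBIOM.

```python
from urllib.parse import urlencode

# Assumed mapping from Genome Reference Consortium build names to the
# UCSC assembly identifiers expected in the `db` query parameter.
UCSC_ASSEMBLY = {"GRCh37": "hg19", "GRCh38": "hg38"}


def ucsc_browser_url(chrom: str, start: int, end: int, build: str = "GRCh38") -> str:
    """Build a UCSC Genome Browser link for a gene coordinate (hypothetical helper)."""
    if build not in UCSC_ASSEMBLY:
        raise ValueError(f"unsupported genome build: {build}")
    if start < 1 or start > end:
        raise ValueError("expected 1 <= start <= end")
    # UCSC positions use the "chrN:start-end" form, so add the prefix if missing.
    chrom = chrom if chrom.startswith("chr") else f"chr{chrom}"
    params = {"db": UCSC_ASSEMBLY[build], "position": f"{chrom}:{start}-{end}"}
    # Keep ":" unescaped so the link stays human-readable.
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params, safe=":")


if __name__ == "__main__":
    # Illustrative coordinates only.
    print(ucsc_browser_url("17", 7661779, 7687538, build="GRCh38"))
```

The same coordinates produce different links for the two builds, which is why the build must travel with the coordinate. A production integration would also check that the range lies within the chromosome length for the chosen assembly.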

Figure 1: Queried gene coordinates are visualized as a two-layered chromosomal view, where the outer layer represents the chromosome number and the inner layer represents the location of the gene coordinate.

The 'Chromosomal View Tool' in GOBIOM enables search of gene variants by Entrez ID, Chromosome number and Disease name

Gene variants in GOBIOM are represented as a three-layered chromosomal view, where the outer layer represents the chromosome number, the middle layer represents the GRCh38 genomic build and the inner layer represents the GRCh37 genomic build. Curated data pertaining to gene variants can be accessed by clicking on the gene coordinate data points in the chromosomal view visualization.

Figure 2: Chromosomal view platform in GOBIOM enables search of gene variants by Entrez ID, Chromosome number and Disease name.

Figure 3: Gene variants search by Disease name. The image on the right side represents gene variants in Breast cancer.

In this manner, focused biomarker databases like GOBIOM can be a very useful resource for identifying biomarkers predictive of drug response or resistance, further facilitating selection of the right patient population, i.e. those most likely to respond to treatment.


GOBIOM is built from exploratory, preclinical and clinical content to provide critical insights into diagnosis, prognosis, treatment response, safety, efficacy and toxicity. GOBIOM is the biomarker intelligence database to drive clinical and therapeutic decisions and to accelerate drug development.

GOBIOM Data Coverage

The GOBIOM database is meticulously curated by our team of scientific experts, who excerpt, enhance and enrich biomarker data from various sources using a cutting-edge curation application. A variety of biomarker datasets, encompassing experimental, pre-clinical, clinical, pharmacological and non-pharmacological data and treatment outcomes, are captured using controlled vocabularies and ontologies.

Figure 1: GOBIOM Database – Statistical Highlights

GOBIOM Updates

Figure 2: Updates between Mar’21 & June’21

GOBIOM+ provides information on Proteomic, Genomic, Biochemical, Imaging, Physiological, Clinical Scoring scales and Cellular biomarkers across 18 different therapeutic areas, covering over 6000 therapeutic indications. Below are the statistics on the updated biomarkers by their nature (Fig 3).

Figure 3: Biomarker Statistics by Nature

Biomarker data are curated and mapped to their reported utilities, such as diagnosis, prognosis, disease progression, treatment response, surrogate, efficacy, susceptibility/risk of disease and safety/toxicity. Below are the statistics on the updated biomarkers by their application (Fig 4).

Figure 4: Biomarker Statistics by Application

GOBIOM is updated fortnightly with the latest biomarker data for indications across different therapeutic areas. Below are the highlights of updated indications by biomarker count (Fig 5).

Figure 5: Top indications by number of biomarkers updated between Mar’21 and June’21

Content Coverage

The GOBIOM database is composed of many different types of content, from scientific literature to publicly available datasets.

  • Clinical Trial Registries
  • Journals and Scientific Reviews
  • Conferences
  • Patents
  • FDA/EMEA Reports
  • Company Websites
  • Public Sources

Curation Process

The content is assessed and selected for indexing based on the information it covers, such as experimental, pre-clinical, clinical, pharmacological and non-pharmacological data and treatment outcomes. Records identified for indexing are manually curated and reviewed by a team of scientific experts trained in translational science. The processed records are enriched using controlled vocabularies and ontologies to standardize them and create a relational data model connecting biomarkers to diseases and drugs.


GOBIOM (Global Online Biomarker Database) is the world’s largest manually curated database of validated and putative biomarkers, providing critical insights into the relationship between biomarkers and diseases. As a one-stop platform, GOBIOM provides clinical and pre-clinical information on biochemical, genomic, imaging, metabolite, clinical scoring scale and cellular markers spanning 18 different therapeutic areas. The database covers ~3400 therapeutic indications with reported utilities such as diagnosis, prognosis, disease progression, treatment response, surrogate, efficacy and toxicity.

A proprietary tetrahedron model is adopted in the database framework, linking biomarkers, indications, drugs, targets and study populations to simplify biomarker data analysis. Information in the GOBIOM database is gathered from diverse data sources, including clinical trials, scientific conferences, regulatory-approved documents, literature databases and patents.

GOBIOM Features

The primary emphasis lies in connecting biomarker data with analytics, enabling faster access to precise, actionable information. Querying the database is simplified by different search options: Quick search, Basic search, Advanced search and Batch query. Search results are coupled with visualizations to help users derive meaningful insights. A critical highlight is the set of comprehensive, structured reports, such as the biomarker report, disease report, drug report and biomarker indication report, with integrated filtering and search options.

The database has customized sections for Variant-PGx, disease enrichment markers, marker-marker correlation, biomarker expression profiles, drug labels and approved diagnostics. A host of analytics tools, such as comparative analysis, heatmap and cluster analysis, gene variant analysis, dashboards and chromosomal view, offer deeper insights, intelligence and innovation in biomarker discovery. Equipped with an advanced microservice architecture, GOBIOM offers added value in terms of high-quality content, a user-friendly UI, intuitive reports, powerful analytics and eye-catching visualizations.

Real world applications of GOBIOM

GOBIOM can help address the following real-world questions in clinical research and drug development:

  • Insights on translational, emerging and established biomarkers
  • Identify proteins/genes targeted by drugs
  • Identification of biological pathways in which biomarkers are involved
  • Select patients who are likely to respond to treatment
  • Identify efficacy biomarkers of drugs for effective designing of clinical trials
  • Select safety biomarkers to understand the toxicity profile of drugs
  • Identification of biomarkers predictive of drug resistance and failure of treatment
  • Find biomarkers indicative of disease progression and for monitoring of disease
  • Understand the competitive landscape in a given therapeutic area or a specific indication
  • Selection of pre-clinical/animal models that are used to evaluate safety and efficacy of biomarkers
  • Insights on probable off-label use of approved drugs in other indications
  • Track pre-clinical to clinical translation of a biomarker

Who does GOBIOM benefit?

GOBIOM caters to the biomarker research needs of a diverse scientific community comprising systems biologists, clinical/pre-clinical scientists, translational researchers, clinical development teams, pharmacology/toxicology researchers and diagnostics development teams across varied customer segments, including Pharma, Diagnostics, Academics, Research institutes, Biotechnology and AI/ML companies. It is a powerful platform addressing critical questions in the R&D cycle, from drug discovery to clinical development.


The drug development process is complex and expensive, owing in part to the lack of access to appropriate data. Clinical research should address the entire disease process, from risk through diagnosis to treatment and outcome. The true value of a biomarker database should be measured by the critical questions it can address to help clinical trials succeed and accelerate drug development. The high-quality GOBIOM biomarker intelligence platform can be used for informed clinical and therapeutic decision making.


Genesis of the concept of FAIRification

The information age (mid-20th century onwards) witnessed a boom in data generation and digitization. The current century is an era not only of data creation but also of data analytics, which yields the true value of data. Using data effectively to create value for all is an evolving concept that led to the inception of ‘FAIR’ data practices. FAIRification is guided by the 15 principles outlined by Wilkinson et al. (2016), which aim to enhance the findability, accessibility, interoperability and reusability of data. It is a way of connecting and harnessing the power of the data being generated to maximize its utility.

Each principle has a core set of values that reflect its utility. The Findability principle gives data and metadata a universal identity, providing recognition for research. Accessibility operates on accountability: it reflects the openness of data providers in granting access to their data. Interoperability provides ease and equal accessibility to the data. Reusability enables multiple uses of the data, thereby increasing its impact. With these core values, FAIRification can have a significant positive impact on the mindset of data creators, making the community more inclusive and the utilization of data more meaningful (2).

Figure 1: FAIR guiding principles as outlined by Wilkinson et al. 2016 (1)

Need of the hour (Challenges, Necessity and Current approaches)

Conventionally, data lakes and data warehouses were constructed to manage and disseminate high-quality data consistently and with ease. However, protected, siloed data turned out to be a major impediment to knowledge discovery and innovation among relevant stakeholders. The lack of community-standard ontologies to normalize heterogeneous data, along with the high costs and intense effort needed to regulate digital processes and resources, also proved to be key challenges to productivity and cross-domain collaboration. The need of the hour was efficient data management, governance and availability. Implementing the FAIR guiding principles breaks down data silos and makes data available to both humans and machines.

FAIRifying data is a high priority at multiple levels:

  • Unlocking scientific transformation – Shift the workforce culture from ‘my data’ to ‘a valued corporate asset’ and foster FAIR data sharing among multiple stakeholders. Scientific queries can then be answered more rapidly, in an ad hoc and flexible manner.
  • Reducing expenses – According to a recent EU report, not having FAIR data costs the European economy an estimated €10.2bn per year (3). FAIRification is therefore urgently required to reduce the costs and risks of data discovery, enhance rewards and generate long-term return on investment (ROI).
  • Increasing strategic value addition – Harness the power of AI/ML on FAIR data to accelerate the creation of new, valuable data assets and associations. Time across the value chain from R&D to outcomes can be significantly reduced, productivity increased and drug pipelines accelerated.
  • Minimizing data wrangling – The time, cost and effort invested in gathering, selecting, cleaning and transforming raw data into high-quality, standardized, analysis-ready formats can be reduced by using FAIRified data.

The FAIR principles are aspirational rather than prescriptive and serve as guidelines in the FAIRification process. Current approaches evaluate FAIRness through maturity indicators and quantifiable metrics applied to data, metadata and the associated infrastructure (4). Findability and accessibility can be scored at the metadata level in a single pass, but assessing interoperability and reusability may require several intensive iterations. The FAIRification process can be broadly divided into the following steps:

  • Retrieve non-FAIR data: Access the data to be FAIRified.
  • Analyze retrieved data: Examine the data content with respect to concepts, structure, relationships between data elements, data identification methodologies and analysis, provenance, etc.
  • Define a semantic model for the data: Use community, purpose- and domain-specific ontologies and controlled vocabularies to describe and define dataset entities, concepts and relations in an accurate, unambiguous and machine-actionable format.
  • Make data linkable: Transform the non-FAIR data into linkable data by applying the semantic model defined in the previous step using Semantic Web and Linked Data technologies. This ensures interoperability and reuse, facilitating integration of the data with other types of data and systems.
  • Assign a license: Ensure that data license information is included; otherwise reuse of the data may be hampered.
  • Define metadata for the dataset: Ensure that the data are described by proper, rich metadata to support all aspects of FAIR data assessment.
  • Deploy the FAIR data resource: Deploy or publish the FAIRified data, together with relevant metadata and a license, so that the metadata can be indexed by search engines and the data can be accessed, even if authentication and authorization are required.
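The license and metadata steps in this workflow can be pictured with a small metadata record. The sketch below assumes the schema.org `Dataset` vocabulary serialized as JSON-LD, one common way of making dataset metadata indexable by search engines. The required-field list, the helper `dataset_metadata` and every value (including the example.org identifiers) are hypothetical placeholders, not part of the GO FAIR workflow itself.

```python
import json

# Fields treated as mandatory in this sketch; the selection is an assumption
# loosely mirroring the "assign license" and "define metadata" steps.
REQUIRED_FIELDS = ("name", "identifier", "license", "description", "url")


def dataset_metadata(**fields: str) -> dict:
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    if missing:
        # Refuse to publish a record without a license or identifier.
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return {"@context": "https://schema.org", "@type": "Dataset", **fields}


record = dataset_metadata(
    name="Example biomarker dataset",
    identifier="https://example.org/datasets/0001",  # hypothetical persistent ID
    license="https://creativecommons.org/licenses/by/4.0/",
    description="Illustrative record; all values are placeholders.",
    url="https://example.org/datasets/0001/download",
)

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
```

Rejecting records that lack a license or identifier at creation time is one simple way to keep the "assign license" step from being skipped; a fuller implementation would also link the record to the ontology terms defined in the semantic model.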

Figure 2: FAIRification workflow adapted from GO FAIR (4).

Utility and benefits

The benefits of FAIRifying data are multi-pronged. For research communities, the obvious benefits include seamless data acquisition, semantic calibration, integration and data analytics (5). This cuts down research and development time and promotes a virtual knowledge network among the scientific community, leading to significant advances in domain knowledge in a relatively short period of time. The benefits to the biopharmaceutical sector are substantial as well. These include reduced drug discovery time through data sharing and clear data reuse policies across the sector, increased innovation in personalized medicine through the use of real-world data, and the availability of high-quality data for AI/ML-based analytics.

From a business point of view, the impact is threefold: financial, operational and customer-oriented (6).

  • Financial impact: A study by Barua A et al. measuring the impact of effective data on business estimates that improving the usability of data by a mere 10% increases sales per employee by 14.4%. Also, reducing the effort and time needed to make data useful for the user significantly improves productivity per employee (6).
  • Operational impact: More effective utilization of assets, and more accurate planning and forecasting (6).
  • Customer-oriented impact: A better ability to innovate in relatively short periods of time (6).

Thus, processes like FAIRification, which are intimately involved in improving data sharing and usability, would have an overall long-term positive impact on business (6).

These benefits are now being recognized across sectors, and many organizations are investing time and effort in FAIRification with long-term goals in mind.

Excelra’s approach

Excelra recognizes the importance and value of FAIRifying data. Evaluating FAIRness is the fundamental step towards FAIRification. Considering the benefits to the biopharmaceutical sector, Excelra has, as a first step, devised a streamlined process with a customizable questionnaire and standard operating procedure (SOP) for evaluating a given database for compliance with the FAIR principles. Databases and their associated data and metadata are assessed by domain experts. A specialized quantitative assessment helps our partners better understand the extent of compliance. A detailed report with recommendations is provided, outlining the steps towards better compliance with the FAIR guidelines.
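To make the idea of a quantitative FAIR assessment concrete, the sketch below scores a database against the 15 principles of Wilkinson et al. The weights (1 for compliant, 0.5 for partially compliant, 0 for non-compliant) and the helper `fair_score` are illustrative assumptions; Excelra's actual questionnaire and scoring scheme are not described here.

```python
from enum import Enum

# The 15 FAIR principles as enumerated by Wilkinson et al. (2016).
PRINCIPLES = (
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
)


class Compliance(Enum):
    # Illustrative weights only; not Excelra's scoring scheme.
    COMPLIANT = 1.0
    PARTIAL = 0.5
    NON_COMPLIANT = 0.0


def fair_score(assessment: dict[str, Compliance]) -> float:
    """Return a 0-100 score; every principle must be assessed exactly once."""
    unknown = set(assessment) - set(PRINCIPLES)
    missing = set(PRINCIPLES) - set(assessment)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)} missing={sorted(missing)}")
    total = sum(assessment[p].value for p in PRINCIPLES)
    return round(100 * total / len(PRINCIPLES), 1)


if __name__ == "__main__":
    # Hypothetical database: partially compliant with F2, not compliant with A2.
    example = {p: Compliance.COMPLIANT for p in PRINCIPLES}
    example["F2"] = Compliance.PARTIAL
    example["A2"] = Compliance.NON_COMPLIANT
    print(fair_score(example))  # 90.0
```

Requiring every principle to be assessed avoids silently inflating a score when a questionnaire item is skipped. A per-principle breakdown, rather than a single number, is what would feed the recommendations in a report.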

Figure 3: Schema for FAIR evaluation of a given database.

Case study

As a first step towards understanding the FAIRness of existing databases, 12 public databases covering proteins, drugs, genes, pathways and diseases were evaluated for compliance with the FAIR principles. The assessment was based on the methodology developed by Excelra, described in the previous section. The databases assessed are listed below:

  • Proteins – PDB, BindingDB, UniProt
  • Drugs/chemical entities – PharmGKB, ChEMBL, PubChem, DrugBank
  • Genes – NCBI Gene, Ensembl, GWAS Catalog
  • Pathways – Reactome
  • Diseases – DisGeNET

The questionnaire and SOP were extensively utilized for scoring and evaluating each database. Based on the assessment and scoring the results were collated.

Figure 4: Summary of the quantitative assessment of 12 public databases for FAIR compliance.

Salient features from the analysis are:

  • All databases are compliant with more than 13 of the 15 principles.
  • All the databases, irrespective of their themes, incorporate relevant descriptive metadata elements. These are usually compliant with the FAIR principles related to metadata and their associated identifiers.
  • Findable – The 12 databases evaluated are public and are compliant with the ‘Findable’ principle. PharmGKB and GWAS Catalog are partially compliant with F2 as they lack certain metadata types.
  • Accessible – Most of the databases evaluated provide data dumps and are compliant with the ‘Accessible’ principle. However, 5 databases are not compliant with the A2 principle, which requires that metadata remain accessible even when the data are no longer available. This indicates that some databases do not make earlier versions available. Lack of retrospective data and/or lack of evidence of the existence of data leads to A2 non-compliance.
  • Interoperable – The databases evaluated in the current study are compliant with the ‘Interoperable’ principle. ChEMBL, however, is not I3 compliant, as its chemical entity output does not include references to related metadata.
  • Reusable – Certain databases are only partially compliant with R1, as their ‘About’ pages do not provide all the requisite information. All the databases are compliant with the R1.1, R1.2 and R1.3 principles.

As FAIR is a new, evolving concept that has emerged over the past decade, many databases are yet to be fully FAIR compliant. Nevertheless, most life sciences databases frequently used in both academia and industry are largely FAIR compliant. Their extensive use and proven utility over the years suggest how FAIR compliance has helped them.

Excelra’s edge

The key differentiators that set Excelra apart from other experts in FAIRness evaluation can be summarized in the following points:

  • Excelra has 18+ years of experience in data sciences, with 60+ PhDs in a 600+ talent pool. We are associated with 90+ clients across the globe and provide expert support in various capacities to 15 of the top 20 pharma companies.
  • The organization is equipped with data, deep domain expertise and data science capabilities.
  • Vast experience in related services, including but not limited to data curation, data annotation, data validation, ontology management, data wrangling, and data management & integration.
  • Domain experts are well acquainted with a diverse range of data types, from discovery to real-world data, drawn from various sources.
  • Excelra possesses a wide variety of in-house data analysis tools.
  • Excelra is well versed in delivering tailored end-to-end database solutions to various pharma, biotech, healthcare and AI/ML companies.
  • Finally, a multidisciplinary blend of math, computation and life sciences expertise under one roof enables Excelra to offer customized FAIRification solutions with a quick turnaround time.

Future outlook

FAIRifying data will thus accelerate data-driven scientific and knowledge discovery. By adopting FAIR data, the scientific and industrial communities will be able to capitalize on new-age technologies such as AI/ML to further reduce cost and time.


As the novel coronavirus pandemic continues to spread relentlessly, COVID-19 has caused unprecedented social, economic and health disruption across the globe. Furthermore, there are currently no specific antiviral drugs or vaccines to effectively treat or prevent COVID-19. Developing new drugs or vaccines from scratch is a lengthy process and is thus impractical for meeting the immediate global challenge.

Drug Repurposing - A New Hope

In the current crisis, repurposing existing drugs to achieve fast and reliable outcomes can help identify potent treatments against the novel coronavirus, and is the only realistic option available. Researchers worldwide have developed several drug repositioning approaches, including virtual screening procedures that dock against databases of FDA-approved drugs, SARS-CoV-2-human protein-protein interaction mapping, disease-related molecular networks, deep learning of drug-target interactions, iterative network building, text mining, etc.

Various classes of drugs have been identified in these drug repurposing programs, including many that have important physiological and/or immunological effects such as those that affect viral proteases, viral envelope proteins, replication machinery, neurotransmitter regulation, cytokine signalling, immune modulation, kinase signalling, lipid metabolism, protein processing and DNA synthesis or repair.

However, most of this crucial data is dispersed across numerous publications, reports, databases and knowledge-repositories.

Excelra's COVID-19 Drug Repurposing Database

At this critical juncture, we at Excelra have decided to extend support in solidarity to the ongoing global scientific efforts aimed at identifying safe and effective therapeutic options to treat those affected by the novel coronavirus disease. To this end, our expert scientific team has consolidated the COVID-19 Drug Repurposing Database.

This ‘open-access’ database presents a landscape of ‘Approved’ small molecules and biologics with known preclinical, pharmacokinetic, pharmacodynamic, and toxicity profiles; which can rapidly enter either Phase 2 or 3 or may even be used directly in clinical settings against COVID-19. The database additionally includes information on promising drug candidates that are in various ‘clinical, pre-clinical and experimental’ stages of drug discovery and development.

Supported with referenced literature covering the holistic landscape of drug, disease, target, and mechanism of action; we aim to provide critical insights into SARS-CoV-2 biology and mechanism of COVID-19 disease pathogenesis.


Here is a glimpse into the platform showcasing the features described above:

We hope that these drug repositioning approaches and identified drugs can help the global biotech and pharma community design rapid clinical trials for developing treatments against COVID-19.

Drug Repurposing at Excelra is powered by our Global Repurposing Integrated Platform (GRIP) that combines proprietary repurposing databases, algorithms, analytics tools and a visualization engine. The database within GRIP has been built by amassing chemical data (over 7 million chemical entities), biological data and clinical data (over 200,000 data points) which together contribute to more than 10 million associations among ‘drug-disease-target’ triads.

About Excelra:

Excelra’s data and analytics solutions empower innovation in life sciences across the value chain from discovery to market. The Excelra Edge comes from a seamless amalgamation of proprietary curated data assets, deep domain expertise and data science. The company’s multifaceted teams harmonize and analyse large volumes of disparate unstructured data using cutting-edge technologies. We galvanize data-driven decisions to unlock operational efficiencies to accelerate drug discovery and development. Over the past 18 years, Excelra has been the preferred data and analytics partner to over 150 global clients including 15 of the top 20 large Pharma companies.


COVID-19, a pandemic caused by the novel SARS-CoV-2 virus, has rapidly spread across the world in an unprecedented and devastating manner. Currently there are over 5 million confirmed cases and over 360,000 deaths, as reported by the World Health Organization. Furthermore, there are currently no approved drugs or vaccines to specifically treat SARS-CoV-2 infection. In these turbulent times, fervent research efforts are underway at several global biopharma companies, research institutes, hospitals and government organizations, with the single-minded aim of developing novel drugs and vaccines to treat COVID-19 patients safely and effectively.

Excelra’s COVID-19 Biomarker Database is our contribution to the global scientific community, to help identify biomarkers from published clinical trials against the novel coronavirus disease. The database is a collection of manually curated clinical biomarkers, meticulously annotated by our data scientists, to support the development of drugs and vaccines for treating COVID-19. With the number of clinical trials increasing by the day, identifying and selecting potential biomarkers for inclusion in clinical trials is of paramount importance for the success of COVID-19 clinical studies.

This ‘Open-Access’ biomarker database is excerpted from our GOBIOM platform, the world’s largest biomarker intelligence database. To provide insights into the relationship between biomarkers and SARS-CoV-2 infection and to simplify biomarker data analysis, the biomarkers are classified by nature as Proteomic, Genomic, Biochemical, Cellular, Physiological, Imaging and Scoring scales. Furthermore, to identify the ‘utility’ of biomarkers in a given clinical study, each biomarker is mapped to its FDA-NIH BEST (Biomarkers, EndpointS, and other Tools) category: pharmacodynamic/response, diagnostic, susceptibility/risk, monitoring, prognostic, predictive or safety (depicted in Figure 1).

Figure 1: *Numbers indicate clinical trial count

Each biomarker category is further mapped to the biomarker's Context of Use (COU), a summary of the potential utility of the biomarker in the given study and of the drug/vaccine being investigated in the clinical trial. With direct links to ‘referenced literature’, each biomarker is further sorted into primary, secondary or tertiary categories based on its involvement in disease pathophysiology.

Biomarker Intelligence from Meta-Analysis of COVID-19 Research & Clinical Trials

Meta-analysis of clinical trials in our COVID-19 Biomarker Database identified SARS-CoV-2 RNA, C-reactive protein, Interleukin-6, D-Dimer, TNF-Alpha, Ferritin, Immunoglobulin G, Lactate dehydrogenase, Interleukin-10 and Immunoglobulin M as the most researched biomarkers in clinical trials (Figure 2).
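A ranking like this comes down to counting, for each biomarker, the number of distinct clinical trials that evaluate it. The sketch below shows that counting step on a handful of hypothetical (trial, biomarker) pairs; the trial IDs, marker names and the helper `top_biomarkers` are placeholders, and the real database schema is not described in this post.

```python
from collections import Counter

# Hypothetical (trial_id, biomarker) pairs; real input would come from the
# curated database.
RECORDS = [
    ("TRIAL-A", "marker-1"),
    ("TRIAL-A", "marker-2"),
    ("TRIAL-B", "marker-1"),
    ("TRIAL-B", "marker-1"),  # repeated mention within the same trial
    ("TRIAL-C", "marker-3"),
    ("TRIAL-C", "marker-2"),
    ("TRIAL-D", "marker-1"),
]


def top_biomarkers(pairs: list[tuple[str, str]], n: int = 3) -> list[tuple[str, int]]:
    """Rank biomarkers by the number of distinct trials evaluating them."""
    distinct = set(pairs)  # count each trial at most once per biomarker
    counts = Counter(biomarker for _, biomarker in distinct)
    # Sort by descending trial count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    print(top_biomarkers(RECORDS))
    # [('marker-1', 3), ('marker-2', 2), ('marker-3', 1)]
```

De-duplicating pairs first matters: without it, a biomarker mentioned several times in one trial would be over-counted relative to the "number of clinical trials" measure used in Figure 2.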

Figure 2: *Values represent number of clinical trials evaluating the biomarker

All the aforementioned information has been captured and presented in a user-friendly dashboard that allows quick search and filtering options. Below is a snapshot of the user-interface. (Figure 3)

Figure 3: COVID-19 Biomarker Database Dashboard

The database will be regularly updated with new biomarkers from clinical trials published in

We hope that our COVID-19 Biomarker Database can help the global biotech and pharma community to expedite the development of treatments to combat COVID-19.


Any data that requires technological and infrastructural investment in order to yield meaningful insights is defined as “Big Data.” The main factors contributing to Big Data are the exponential growth of data due to increased usage and the need to integrate these datasets to gain valuable insights. A good example is the data generated in drug discovery processes (1).

This blog provides insights into the various types of Big Data in drug discovery and highlights how Big Data, combined with machine learning (ML) approaches, can fast-track the drug discovery process.

What is Big Data in drug discovery?

Big Data in drug discovery refers to the data collected from the biological, chemical, pharmacological and clinical domains (2). These datasets are characterized by rapid generation, large size, complexity, heterogeneity and high value, with commercial opportunities. Some of the large datasets used in drug discovery are highlighted below:

Biology datasets:

Biological data helps researchers understand the mechanisms underlying a disease state, predict and validate potential target proteins for therapeutics, develop new bioassay techniques for identifying treatment modalities associated with potential targets, predict how treatments will interact with the body when given to a patient and, finally, design effective clinical trials (2).

Biological data types include drug target data, omics data (genomic, transcriptomic, proteomic and metabolomic data), exome data, GWAS data, gene expression data, data from disease-relevant animal and cellular models, and gene knockout or knockdown data.

Chemistry datasets:

Chemistry datasets are useful in the design of high-throughput screening libraries, which assist in identifying and validating therapeutic targets in silico. These datasets also help predict the molecular properties required of drug compounds and provide insight into how those molecules interact with biological macromolecules (3).

Chemistry data types include chemical structure representations, chemical line notations or identifiers (SMILES and InChI), molecular property descriptors, topological descriptors, topographical descriptors, structure-activity relationship (SAR) data and compound-specific biological data.

Pharmacology datasets:

Pharmacological data in drug discovery covers compounds or drugs tested in animal models, together with assay data on protein targets in cell- or tissue-based models, allowing the effects of compounds to be investigated at different levels of biological complexity (4).

Pharmacological data types include absorption, distribution, metabolism, elimination and toxicity (ADMET) data, as well as functional in vitro and in vivo assay properties.

Clinical datasets:

Clinical datasets in drug discovery provide valuable information derived from patient data (5).


Clinical data types include safety and efficacy data, treatment response and side-effect profiles, patient stratification data, competitive landscape data and trial design data.


The information contained in all of these large and complex datasets offers opportunities to explore and understand the mechanisms associated with a disease state, and to find ways to prevent and treat such conditions.

What is artificial intelligence and what are its applications in drug discovery?

Scientists working in drug discovery research worldwide generate voluminous pharmaceutical Big Data that is, by nature, multisource and multidimensional. It is becoming increasingly difficult not only to stay informed about all the available literature, but also to properly parse and integrate this Big Data into one’s own workflows across research projects. To overcome the hurdles associated with Big Data in drug discovery, pharmaceutical and information technology companies have adopted artificial intelligence (AI) technologies that provide robust solutions for fast-tracking the drug discovery process.

Artificial intelligence (AI) refers to machines that exhibit human cognitive skills, such as the ability to learn and to solve problems (6). AI encompasses technologies such as machine learning and deep learning. Machine learning methods are well established for learning and predicting novel properties, while deep learning methods show great promise in drug design owing to their powerful generalization and feature-extraction capabilities. Both have made remarkable progress in usefulness and applicability, and offer opportunities across all stages of drug discovery (7, 8).

Some of the applications of artificial intelligence in drug discovery include:

  • Protein design and function
    • Prediction of protein folding
    • Prediction of protein-protein interactions
  • Hit discovery
    • Generation of chemical libraries or new molecule fingerprints
    • Virtual screening
    • Drug repurposing
  • Hit to lead optimization
    • Generating models for de novo design of drugs
    • Prediction using QSAR models
    • Prediction of molecular descriptors
    • Prediction of topological & topographical descriptors
  • Prediction of ADMET properties
    • Prediction of pharmacokinetic parameters like ADME properties
    • Prediction of toxicity properties
    • Pharmacodynamics modeling
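To make the QSAR item above more concrete: a QSAR model relates molecular descriptors to a measured biological activity, then uses that relationship to predict the activity of new compounds. The following is a minimal sketch, assuming a single descriptor and a linear relationship fitted by ordinary least squares. The function names `fit_linear_qsar` and `predict` and the toy descriptor/activity values are hypothetical illustrations, not real measurements. Production QSAR models typically combine many descriptors with machine learning methods and rigorous validation.

```python
from statistics import fmean


def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of: activity = intercept + slope * descriptor."""
    if len(descriptor) != len(activity) or len(descriptor) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(descriptor), fmean(activity)
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, descriptor_value):
    """Predict activity for a new compound from its descriptor value."""
    intercept, slope = model
    return intercept + slope * descriptor_value


# Toy, hypothetical training set: one descriptor (e.g. logP) vs. activity (e.g. pIC50).
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.0, 5.5, 6.0, 6.5]
model = fit_linear_qsar(logp, pic50)
print(model)                # (4.5, 0.5)
print(predict(model, 5.0))  # 7.0
```

Swapping in a multivariate regression or a machine learning model follows the same fit-then-predict pattern; only the model inside changes.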

Challenges and limitations associated with Big Data & AI in drug discovery

Some of the major challenges associated with Big Data in drug discovery include data generation, data integration, data quality, and data storage and management (2). Other critical challenges include poor reproducibility and lack of standardization of data, difficulties with data formats for chemical structure representations, missing original data, lack of contextual information, insufficient availability of disease-relevant human data in some disease areas, the curse of dimensionality, bias in data, gaps in the fundamental understanding of many diseases, issues with clinical translation in target discovery and validation, and complexities in managing entity namespaces and ontologies. In addition, protecting patient data and de-identifying personal data are legitimate concerns for data storage and management.

Although AI technologies are promising new techniques, studies applying them still have some limitations (8). Processing and analyzing large amounts of data can affect the reliability of the resulting models. Interpreting complex data, such as data associated with biological mechanisms, is another limitation of models generated using these methods.

How Excelra can support your AI-based drug discovery programs

Standardized and high-quality datasets are essential for AI/ML based drug discovery programs. Excelra’s GOSTAR is the world’s largest medicinal chemistry intelligence database providing comprehensive and structured SAR data for more than 8 million compounds. Available as a ‘one-stop data source’ for in silico drug discovery, GOSTAR captures a variety of small molecule activities encompassing SAR, physicochemical, metabolic, ADME and toxicological profiles into a relational database format.

GOSTAR datasets are built with industry-accepted ontologies and can be delivered in flexible file formats such as:

  • Flat files
  • Hierarchical files
  • Databases (Oracle, MySQL, etc.)
  • Semantic format

“10 of the Top 20 pharma companies utilize GOSTAR to support their drug discovery programs”


Introduction to drug target biology and its role in drug discovery

Drug discovery and development is a long, expensive and risky process. On average, it takes roughly 12 years and around $2.6 billion to develop a new drug (1). The United States Food and Drug Administration (FDA), considered the world’s foremost pharmaceutical regulatory authority, requires comprehensive evidence of every new drug’s safety and efficacy for market approval. However, the FDA does not require information on the drug target’s identity or biology to grant approval (2). Nonetheless, drug safety and efficacy are directly linked to the target biology that fundamentally drives the drug’s mechanism of action (MoA). Ultimately, the drug target plays an important role in disease pathophysiology.

Notably, safety depends not only on drug-related parameters but also on the drug target profile. Target expression in diseased and healthy tissues, and the phenotypes and pathways perturbed on target inhibition or activation, have a significant impact on target safety. The target’s role in disease pathophysiology and its expression pattern also have a strong bearing on drug efficacy; hence, comprehensive knowledge of target-associated diseases and of proof-of-concept studies is mandatory. A Target Dossier is a compilation of all the information that is critical for assessing target safety and risk, and for developing suitable mitigation strategies. The dossier also helps to prepare, plan and execute a data-driven drug discovery and development program.

Strategies to advance drug discovery

  1. Phenotypic screening

  2. Target-based screening

In phenotypic screening, or classical pharmacology, drugs are identified without prior knowledge of, or bias towards, a specific molecular target. The pharmacological actions of a drug are identified in cells, tissues or animals. Although this is an important approach, it is beyond the scope of this blog. The second highly successful strategy is target-based drug discovery (3).


“Of 113 first in class drugs approved by the US FDA from 1999 to 2013, 70% were identified through target-based drug discovery” (3) (4)


Target-based drug discovery can be used for the discovery and development of small molecules, antibodies, and protein- and peptide-based drugs. In target-based drug discovery, knowledge of the biological (drug) target and its role in disease pathophysiology is essential before a pharmaceutical company starts lead discovery (3). At this stage, a comprehensive and unbiased Target Dossier can support better project planning, execution and early decision-making. A document containing comprehensive, 360-degree target information on aspects such as target structure and experimental models can be used to plan high-throughput assays that screen compound libraries for “hit” identification.

Figure 1: Advantages of Target-based drug discovery approaches

Drug discovery is often associated with high risk, resource crunch and time constraints. Pharmaceutical innovation is also limited by the fact that drug companies tend to work on the same proven drug targets, which leads to increased competition and “me too” drugs (5). Competition for proven drug targets is higher than for novel targets, a large proportion of which are in preclinical stages of development (5). Competition for novel targets also increases as they progress through clinical development, with relatively little competition in preclinical stages and high competition by phase III (6). As a target advances through the clinical phases, its validity increases, and so does the level of competition (5). Hence, due diligence on a target of interest is a mandatory exercise for a pharmaceutical company before embarking on the long, expensive and high-risk endeavour of drug discovery and development. Clinical translatability, that is, ensuring that a drug’s activity against the target is efficacious with minimal side effects, marks the crucial difference between success and failure in drug development.

Figure 2: Properties of a good drug target

A comprehensive Target Dossier considers all the properties of a good target. If a target fails on one or more parameters, this can have a negative impact on the drug discovery and development process. Hence, the sooner a pharmaceutical company has a complete drug target overview, the faster it can make a go/no-go decision and save precious resources.

Need of the hour

Challenges Faced by Pharmaceutical Companies:

  • Increased competition for proven targets
  • Increased competition for novel targets as project moves up the development ladder
  • High risk of failure for development of novel targets
  • Resource and time constraints

Opportunities for Pharmaceutical Companies:

  • An innovative and promising drug target assessment
  • Portfolio diversification and expansion
  • Molecular target assessment – Structure, function and associated phenotypes
  • Early knowledge of associated risk
  • Better planning and risk mitigation strategies
  • Ideas on target-related/stratification biomarkers
  • Potential diseases for the drug target
  • Early proof-of-concept
  • Assayability of a drug target
  • Druggability assessment
  • Competitors: First in class assessment
  • Current approaches to comprehensive target analysis involve computational, public database and literature-based analyses

Excelra's Custom Target Dossier Services

Excelra is strongly positioned to deliver tailor-made target assessment dossiers based on the unique requirements of our global Biotech and Pharma clients. The dossier is a compendium of information on the complete target profile, including the structural, systemic and functional aspects of a protein and the gene encoding it.

Figure 3: Excelra provides a 360-degree view of a target to facilitate critical ‘Go/No-go decision making’

Target Dossier Case-study


A comprehensive target dossier on two new targets of interest (isoforms) for their comparative assessment.

Target Profiling Workflow

Figure 4: Comparison of two novel potential targets in the immune-oncology field pursued by a large pharma company

Solution & Recommendation

The role and mechanism of action (MoA) of both isoforms were established in the disease of interest. One isoform was recommended over the other because it would yield a first-in-class therapy, had no current evidence of clinical testing in humans, and, based on crystal structure analysis, would be easier to target.

The Excelra Edge

Excelra’s USPs & Differentiators

  • Projects led by a subject matter expert
  • Regular project updates
  • Target structure-function relationship, crystal structure analysis and homology modelling
  • Extensive assessment of target safety and risk mitigation strategies
  • Comorbidity analysis
  • Target druggability analysis
  • Comparative analysis of two or more targets
  • Proprietary structure activity relationship and biomarker databases GOSTAR & GOBIOM

Concluding remarks

A better understanding of disease, the availability of suitable assays and experimental models, and clarity on the basic molecular mechanisms lead to the successful identification of a promising drug target (7). A comprehensive scientific document on a potential drug target can assist in the early evaluation of potential safety concerns and opportunities for commercialization. It can also reduce the possibility of attrition, accelerate drug development and aid early decision-making.


In 2021, the FDA approved many novel products that serve previously unmet medical needs or significantly help to improve patient quality of life. The broad indication-wise distribution of CDER’s 2021 drug approvals indicates notable advances in drug discovery (1, 2).

New Drug Approvals & Drugs in Pipeline (FDA) for 2021*

Table 1. Approved Drug List

Table 2. Drugs in Pipeline

*This information is current as of July 31, 2021; listed alphabetically by trade name.

Significant drug launches of 2021

  • Verquvo (Vericiguat, MERCK SHARP DOHME, 01/19/2021)
    Mitigates the risk of cardiovascular death and hospitalization for chronic heart failure
  • Cabenuva (Cabotegravir and Rilpivirine (Co-Packaged), VIIV HLTHCARE, 01/21/2021)
    Treats HIV
  • Lupkynis (Voclosporin, AURINIA, 01/22/2021)
    Treats lupus nephritis
  • Tepmetko (Tepotinib, EMD SERONO INC, 02/03/2021)
    Treats non-small cell lung cancer
  • Ukoniq (Umbralisib Tosylate, TG THERAPS, 02/05/2021)
    Treats marginal zone lymphoma and follicular lymphoma
  • Evkeeza (Evinacumab-Dgnb, REGENERON PHARMACEUTICALS, 02/11/2021)
    Treats homozygous familial hypercholesterolemia
  • Cosela (Trilaciclib Dihydrochloride, G1 THERAP, 02/12/2021)
    Mitigates chemotherapy-induced myelosuppression in small cell lung cancer
  • Amondys 45 (Casimersen, SAREPTA THERAPS INC, 02/25/2021)
    Treats Duchenne muscular dystrophy
  • Nulibry (Fosdenopterin Hydrobromide, ORIGIN, 02/26/2021)
    Reduces the risk of mortality in molybdenum cofactor deficiency Type A
  • Pepaxto (Melphalan Flufenamide Hydrochloride, ONCOPEPTIDES AB, 02/26/2021)
    Treats relapsed or refractory multiple myeloma
  • Azstarys (Serdexmethylphenidate Chloride; Dexmethylphenidate Hydrochloride, COMMAVE THERAP, 03/02/2021)
    Treats attention deficit hyperactivity disorder
  • Fotivda (Tivozanib Hydrochloride, AVEO PHARMS, 03/10/2021)
    Treats renal cell carcinoma
  • Ponvory (Ponesimod, JANSSEN PHARMS, 03/18/2021)
    Treats relapsing forms of multiple sclerosis
  • Zegalogue (Dasiglucagon Hydrochloride, ZEALAND PHARMA, 03/22/2021)
    Treats severe hypoglycemia
  • Qelbree (Viloxazine Hydrochloride, SUPERNUS PHARMS, 04/02/2021)
    Treats attention deficit hyperactivity disorder
  • Nextstellis (Drospirenone; Estetrol, MAYNE PHARMA, 04/15/2021)
    Prevents pregnancy
  • Jemperli (Dostarlimab-Gxly, GLAXOSMITHKLINE, 04/22/2021)
    Treats endometrial cancer
  • Zynlonta (Loncastuximab Tesirine-Lpyl, ADC Therapeutics SA, 04/23/2021)
    Treats certain types of relapsed or refractory large B-cell lymphoma
  • Empaveli (Pegcetacoplan, APELLIS PHARMS, 05/14/2021)
    Treats paroxysmal nocturnal hemoglobinuria
  • Rybrevant (Amivantamab-Vmjw, JANSSEN BIOTECH, 05/21/2021)
    Treats a subset of non-small cell lung cancer
  • Pylarify (Piflufolastat F-18, PROGENICS PHARMS INC, 05/26/2021)
    Identifies prostate-specific membrane antigen-positive lesions in prostate cancer
  • Lumakras (Sotorasib, AMGEN INC, 05/28/2021)
    Treats certain types of non-small cell lung cancer
  • Truseltiq (Infigratinib Phosphate, QED THERAP, 05/28/2021)
    Treats cholangiocarcinoma in patients whose disease meets certain criteria
  • Lybalvi (Olanzapine; Samidorphan L-Malate, ALKERMES INC, 05/28/2021)
    Treats schizophrenia and certain aspects of bipolar I disorder
  • Brexafemme (Ibrexafungerp Citrate, SCYNEXIS, 06/01/2021)
    Treats vulvovaginal candidiasis
  • Aduhelm (Aducanumab-Avwa, BIOGEN INC, 06/07/2021)
    Treats Alzheimer’s disease
  • Rylaze (Asparaginase Erwinia Chrysanthemi (Recombinant)-Rywn, JAZZ PHARMS, 06/30/2021)
    Treats acute lymphoblastic leukemia and lymphoblastic lymphoma in patients who are allergic to E. coli-derived asparaginase products, as a component of a chemotherapy regimen
  • Kerendia (Finerenone, BAYER HEALTHCARE PHARMACEUTICALS INC, 07/09/2021)
    Reduces the risk of kidney and heart complications in chronic kidney disease associated with type 2 diabetes
  • Fexinidazole (Fexinidazole, DNDI, 07/16/2021)
    Treats human African trypanosomiasis caused by the parasite Trypanosoma brucei gambiense
  • Rezurock (Belumosudil, KADMON PHARMS LLC, 07/16/2021)
    Treats chronic graft-versus-host disease after failure of at least two prior lines of systemic therapy
  • Bylvay (Odevixibat, ALBIREO PHARMA INC, 07/20/2021)
    Treats pruritus in progressive familial intrahepatic cholestasis
  • Twyneo (Tretinoin and benzoyl peroxide, SOL-GEL TECHNOLOGIES LTD, 07/26/2021)
    It is a topical retinoid and antibacterial fixed-dose combination for the treatment of acne vulgaris in adults and children 9 years of age and older
  • Saphnelo (Anifrolumab, AstraZeneca, 07/30/2021)
    It is a type I interferon (IFN) receptor antagonist indicated for the treatment of adult patients with moderate to severe systemic lupus erythematosus (SLE), who are receiving standard therapy

Significant Drug launches in Pipeline for 2021

  • Oteseconazole (VT-1161, MYCOVIA PHARMACEUTICALS INC)
    It is an investigational oral antifungal in development for the treatment of recurrent vulvovaginal candidiasis (RVVC)
    It is an orally bioavailable, broad-spectrum penem β-lactam antibiotic in development for the treatment of infections caused by multi-drug resistant bacteria
  • Brixadi (Buprenorphine, BRAEBURN INC)
    It is a long-acting partial opioid agonist injection formulation in development for the treatment of opioid use disorder
  • Tenapanor (ARDELYX INC)
    It is a sodium/hydrogen exchanger 3 (NHE3) inhibitor in development for the control of serum phosphorus in adult patients with chronic kidney disease (CKD) on dialysis or Hyperphosphatemia of Renal Failure
  • Libervant (Diazepam, AQUESTIVE THERAPEUTICS INC)
    It is a buccal film formulation of the approved benzodiazepine diazepam in development for the management of seizure clusters
  • Roxadustat (FG-4592, FIBROGEN INC)
    It is a first-in-class, orally administered small molecule hypoxia-inducible factor prolyl hydroxylase (HIF-PH) inhibitor in development for the treatment of anaemia of chronic kidney disease (CKD)
    It is an investigational, potential first-in-class anti-thymic stromal lymphopoietin (TSLP) monoclonal antibody in development for the treatment of severe asthma
  • LV-101 (Carbetocin intranasal, LEVO THERAPEUTICS INC)
    It is an oxytocin analog in development as a treatment for hyperphagia and behavioral distress associated with Prader-Willi syndrome (PWS)
  • Teplizumab (PROVENTION BIO INC)
    It is an investigational anti-CD3 monoclonal antibody (mAb) in development for the delay or prevention of clinical type 1 diabetes (T1D) in at-risk individuals
    It is a novel, oral angio-immuno kinase inhibitor in development for the treatment of pancreatic and non-pancreatic neuroendocrine tumors (“NET”)
  • Lenacapavir (GILEAD SCIENCES INC)
    It is an investigational, long-acting HIV-1 capsid inhibitor in development for the treatment of HIV-1 infection in heavily treatment-experienced (HTE) people with multi-drug resistant (MDR) HIV-1 infection
    It is an investigational RNAi therapeutic in development for the treatment of the polyneuropathy of hereditary transthyretin-mediated (hATTR) amyloidosis in adults
  • Pedmark (Sodium thiosulfate, FENNEC PHARMACEUTICALS INC)
    It is a cisplatin neutralizing agent in development for the protection against hearing loss in pediatric patients receiving cisplatin chemotherapy
    It is a protein kinase-R (PKR) activator in development for the treatment of adults with pyruvate kinase (PK) deficiency
  • Arimoclomol (ORPHAZYME A/S)
    It is an investigational heat-shock protein amplifier in development for the treatment of Niemann-Pick disease Type C (NPC)
  • Ruxolitinib (INCYTE DERMATOLOGY)
    It is a JAK1/JAK2 inhibitor formulated for topical application in development for the treatment of atopic dermatitis and vitiligo
  • Zimhi (Naloxone hydrochloride, ADAMIS PHARMACEUTICALS CORPORATION)
    It is a high-dose formulation of the approved opioid antagonist naloxone in development for the treatment of opioid overdose
    It is a topical aryl hydrocarbon receptor (AhR) modulating agent in development for the treatment of plaque psoriasis and atopic dermatitis
  • Plinabulin (BEYONDSPRING INC)
    It is a selective immunomodulating microtubule-binding agent (SIMBA) in development for use in combination with granulocyte colony-stimulating factor (G-CSF) for the prevention of chemotherapy-induced neutropenia (CIN)