Posts By :

Sudip B

Ontologies and The FAIR data principles

From medicine to physics to history, and all the sciences in between, everything nowadays is “data” that evolves by the second. This continuous, exponential growth creates the need for a solid infrastructure that supports the use and re-use of scholarly data across all fields. In 2016, a diverse group of stakeholders representing academia, industry, funding agencies, and scholarly publishers jointly published a measurable set of principles known as “The FAIR Data Principles”. Here, “data” means all kinds of digital objects generated in research: raw research data, code, software, presentations, and so on. Each letter in “FAIR” stands for one of four foundational principles, which are elaborated in a total of 15 guiding principles that describe how the FAIRness of data can be achieved through technical implementation.

The FAIR principles, which aim to make data “Findable”, “Accessible”, “Interoperable”, and “Re-usable”, stem from the open science movement’s idea of re-using data from new perspectives. They act as a guideline for scholars and researchers to improve the reusability of their data. Researchers can thus not only publish articles and books but also provide the original data that shaped their work. Unlike earlier peer initiatives that centered on the human scholar, the FAIR Principles emphasize improving the ability of machines to find and use data automatically, in addition to supporting human scholars. “Good data management” is not a goal in itself; rather, it is a key step on the road to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data are published.

Unfortunately, the current digital model used for scholarly data publication prevents us from benefiting fully from research investments. Consequently, science funders, publishers, and governmental agencies are adopting data management and stewardship plans for data generated in publicly funded experiments. Beyond proper collection, annotation, and archival, data stewardship includes the idea of “long-term care” of valuable digital assets, so that data can be discovered and re-used in downstream investigations, either alone or in tandem with newly generated data.

To summarize, to better understand the “FAIR Data Principles”, keep the following in mind:

  1. Both humans and machines are intended users of the data.
  2. The FAIR principles apply to both “data” and “metadata”.
  3. The FAIR principles are not exclusive to open data.
  4. The FAIR principles are not rules set in stone; they continue to evolve to accommodate the future needs of different fields and new data.

The FAIR Data Principles:

1. Findable:

The first step in (re)using data is simply to find them. This means that the data can be discovered by both humans and machines. The data are referenced with unique and persistent identifiers (e.g. DOIs or Handles), and the metadata include the identifier of the data they describe. This principle is further subdivided:

  • F1. (Meta)data are assigned a globally unique and persistent identifier.
  • F2. Data are described with rich metadata (defined by R1 below).
  • F3. Metadata clearly and explicitly include the identifier of the data they describe.
  • F4. (Meta)data are registered or indexed in a searchable resource.
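
As a rough illustration of F1 to F4, the sketch below registers a metadata record in a toy in-memory index and finds it again by keyword. The class names, field names, and example identifiers are assumptions made for illustration; they do not come from the FAIR principles themselves or from any real repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str        # F1: globally unique, persistent identifier of the metadata
    data_identifier: str   # F3: identifier of the data this record describes
    title: str             # F2: descriptive metadata
    keywords: list[str] = field(default_factory=list)


class SearchableIndex:
    """Toy in-memory stand-in for a searchable resource (F4)."""

    def __init__(self) -> None:
        self._records: dict[str, MetadataRecord] = {}

    def register(self, record: MetadataRecord) -> None:
        # Refuse duplicates so that each identifier stays unique.
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._records[record.identifier] = record

    def search(self, term: str) -> list[MetadataRecord]:
        # Case-insensitive match against the title and the keywords.
        needle = term.lower()
        return [
            r for r in self._records.values()
            if needle in r.title.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]
```

A real deployment would mint identifiers through a registration service and expose the index to search engines; the in-memory dictionary only stands in for that.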

2. Accessible:

The data are archived in long-term storage and can be made available using standard technical procedures. This does not mean that the data have to be openly available to everyone; rather, information on how the data can be retrieved (or not) must be available. Once the required data are found, the user needs to know how they can be accessed, possibly including authentication and authorization. This principle is further subdivided:

  • A1. (Meta)data are retrievable by their identifier using a standardized communications protocol.
  • A1.1 The protocol is open, free, and universally implementable.
  • A1.2 The protocol allows for an authentication and authorization procedure, where necessary.
  • A2. Metadata are accessible, even when the data are no longer available.
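
The accessibility rules can be sketched with a toy repository in which restricted data require a token (A1.2) and metadata remain retrievable after the data are withdrawn (A2). Everything here (the `Repository` class, the token check, the `data_available` flag) is a hypothetical simplification; a real service would sit behind an open protocol such as HTTPS and a proper authentication system.

```python
from __future__ import annotations


class AccessDenied(Exception):
    """Raised when a caller is not authorized to retrieve restricted data."""


class Repository:
    """Toy in-memory repository used only to illustrate A1.2 and A2."""

    def __init__(self) -> None:
        self._metadata: dict[str, dict] = {}
        self._data: dict[str, bytes] = {}
        self._allowed_tokens: dict[str, set[str]] = {}

    def deposit(self, identifier: str, metadata: dict, payload: bytes,
                allowed_tokens: set[str] | None = None) -> None:
        self._metadata[identifier] = dict(metadata, data_available=True)
        self._data[identifier] = payload
        if allowed_tokens is not None:
            self._allowed_tokens[identifier] = set(allowed_tokens)

    def get_metadata(self, identifier: str) -> dict:
        # A2: metadata stay retrievable even after the data are withdrawn.
        return dict(self._metadata[identifier])

    def get_data(self, identifier: str, token: str | None = None) -> bytes:
        # A1.2: restricted records require an authorized token.
        allowed = self._allowed_tokens.get(identifier)
        if allowed is not None and token not in allowed:
            raise AccessDenied(identifier)
        if identifier not in self._data:
            raise LookupError(f"data for {identifier} are no longer available")
        return self._data[identifier]

    def withdraw_data(self, identifier: str) -> None:
        # Remove the payload but keep the metadata as a "tombstone".
        self._data.pop(identifier, None)
        self._metadata[identifier]["data_available"] = False
```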

3. Interoperable:  

The data can be exchanged and used across different applications and systems. The data also need to be integrated with other data from the same research field or from other research fields. This is made possible by using metadata standards, standard ontologies, and controlled vocabularies, in addition to meaningful links between the data and related digital research objects. In addition, the data need to interoperate with different applications or workflows for analysis, storage, and processing. This principle is further subdivided:

  • I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.
  • I2. (Meta)data use vocabularies that follow FAIR principles.
  • I3. (Meta)data include qualified references to other (meta)data.
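
One way to picture I1 to I3 is to write metadata as subject-predicate-object statements whose predicates come from shared vocabularies. The sketch below is a minimal illustration using plain Python tuples rather than a real RDF library; the `ex:` identifiers are hypothetical, while `dcterms:title`, `dcterms:license`, and `prov:wasDerivedFrom` are terms from the Dublin Core and PROV-O vocabularies.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str  # term taken from a shared vocabulary (I1, I2)
    obj: str


# Hypothetical dataset descriptions; the "ex:" identifiers are made up.
GRAPH = [
    Statement("ex:dataset-1", "dcterms:title", "Raw measurements"),
    Statement("ex:dataset-2", "dcterms:title", "Cleaned measurements"),
    # I3: a qualified reference says *how* two objects are related,
    # not merely that they are related.
    Statement("ex:dataset-2", "prov:wasDerivedFrom", "ex:dataset-1"),
]


def linked(statements: List[Statement], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [s.obj for s in statements
            if s.subject == subject and s.predicate == predicate]
```

Because the predicate names the relationship, a machine can follow `prov:wasDerivedFrom` links to trace a dataset back to its source without human interpretation.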

4. Reusable:

The purpose of FAIR is to optimize the reuse of data. The data must be well documented and provide substantial information about the context of data creation. Furthermore, the data should abide by community standards and include clear terms and conditions on how the data may be accessed and reused. This allows others to assess and validate the results of the original study, hence ensuring reproducibility, or to design new projects based on the original results. Reusable data encourage collaboration and avoid duplication of work. To accomplish this, metadata and data must be well-described so that they can be replicated and/or combined in different settings. This principle is further subdivided:

  • R1. (Meta)data are richly described with a plurality of accurate and relevant attributes.
  • R1.1. (Meta)data are released with a clear and accessible data usage license.
  • R1.2. (Meta)data are associated with detailed provenance.
  • R1.3. (Meta)data meet domain-relevant community standards.
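
A simple completeness check gives a feel for what R1.1 to R1.3 ask of a metadata record. The field names below (`license`, `provenance`, `community_standard`) are assumptions chosen for this sketch, not a prescribed schema, and a real assessment would also judge the quality of each value rather than only its presence.

```python
from typing import Dict, List

# Hypothetical field names mapped to the sub-principle they support.
REQUIRED_FIELDS = {
    "license": "R1.1: clear and accessible data usage license",
    "provenance": "R1.2: detailed provenance",
    "community_standard": "R1.3: domain-relevant community standard",
}


def missing_reuse_fields(metadata: Dict[str, str]) -> List[str]:
    """Return the R1.x requirements that the record leaves empty or absent."""
    return [reason for key, reason in REQUIRED_FIELDS.items()
            if not metadata.get(key)]
```

For example, a record that only declares a license would be reported as missing provenance and a community standard.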

Ontology and the FAIR data principles:

Definition: In computer science, an “ontology” is defined as “a formal explicit specification of a shared conceptualization of a domain of interest”. In other words, an ontology is an attempt to adopt, for a given academic discipline, a commonly shared data model expressed in a formalism that is compatible with the technologies in use.

Ontologies play a vital role in providing an open, consistent, stable identifier for a given “thing” and in providing consensus in the scientific community as to what that “thing” actually is. Ontologies also describe how types, or classes, of things are interconnected.

Fundamentally, ontologies encapsulate the scientific knowledge of a particular domain, which is why building them is so challenging: scientists must agree with each other! Ontologies are designed by humans but are readable by machines: they provide a mechanism for explaining to a machine how humans understand the things that exist in the world and how those things are related. The term “ontology” is often used loosely to indicate any kind of controlled vocabulary, list, or taxonomy; in fact, these artifacts are better thought of as lying on a wide spectrum of increasingly strong semantics, from a collection of terms (tags) that enhance categorization through to a formal description of a domain with classes and relationships. An ontology is more than just a standard name and ID. Following a standard is extremely useful because it allows the scholar to align data both inside and outside the organization. In other words, this is the Interoperable part of FAIR.
Through an ontology, the researcher has access to all the additional knowledge that experts have built up, enabling the scholar to form meaningful questions at different levels and layers of a particular subject.

Ontologies are the key to adopting “FAIR”:

When working with data, one should be mindful of the “FAIR” principles. In essence, FAIR is about making data re-usable; but how can we do this if we are not using the same language? For example, an investigator searching for articles relating to “stroke” would miss references to “brain attack” or “cerebrovascular accident”. Thus, ontologies are the key to unlocking FAIR: data can only be reused if they are well written, classified, described, and of high quality. Ontologies play a crucial role in several of the FAIR data principles, especially in supporting data “Interoperability” and “Reusability”. The need for ontologies (vocabularies) is highlighted in the following principles:

I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles.
R1.3. (Meta)data meet domain-relevant community standards.
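
The “stroke” example can be sketched as a synonym-aware search: every label attached to the same concept is treated as equivalent when matching article titles. The concept identifier `EX:0001` and the tiny vocabulary are hypothetical stand-ins for a real ontology, which would hold many thousands of concepts and richer relationships.

```python
from typing import Dict, List

# Hypothetical mini-vocabulary: one concept ID mapped to all of its labels.
CONCEPTS: Dict[str, List[str]] = {
    "EX:0001": ["stroke", "brain attack", "cerebrovascular accident"],
}


def expand(query: str) -> List[str]:
    """Return every label sharing a concept with the query, or just the query."""
    q = query.lower()
    for labels in CONCEPTS.values():
        if q in labels:
            return labels
    return [q]


def search(titles: List[str], query: str) -> List[str]:
    """Return the titles that mention the query or any of its synonyms."""
    labels = expand(query)
    return [t for t in titles if any(label in t.lower() for label in labels)]
```

With this expansion, a search for “stroke” also returns titles that mention only “brain attack” or “cerebrovascular accident”, which a plain keyword match would miss.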

Furthermore, ontologies are also relevant to “Findability” (F2), which requires describing data with rich metadata, and to “Accessibility” (A1), which requires that metadata be retrievable by a unique identifier. Since ontologies are often the byproduct of research activities, or essential entities in many areas of research, the FAIR principles must be applied to them, irrespective of whether ontologies are being used to describe data or

Our Services

Information is the cornerstone of all rational decision-making. Without the proper information and data, individuals, institutions, communities, and governments are unable to systematically make optimal decisions or understand the effects of their actions. In the past decades, information technology has played a crucial role in automating and increasing the number of information spaces. At the same time, access to information has improved. Despite these advances, most of these automated spaces have remained independent, fragmented components in large and increasingly complex silo-based architectures. The problem is that, nowadays, many of the important questions in large organizations, governments, and scientific communities can only be answered by connecting different pieces of information dispersed across these silos.

The time and effort required to curate, integrate, and analyze silos of fragmented data are enormous for a single researcher or group and demand a significant amount of human work, which is slow, costly, and error-prone. This is where the value of the FAIR data principles lies. We at Excelra can expedite your research and all data-related work through our programs, which abide by the FAIR principles. Harnessing the expertise and services provided by Excelra can optimize your workflow and make it quicker, more efficient, and more accurate.



Biologics – the biotech drugs transforming medicine

Biologics, also known as biological products, are medicines derived from living organisms such as humans, animals, or microorganisms via highly complex manufacturing processes and administered under closely monitored conditions. This is in contrast to traditional non-biologic pharmaceutical drugs, which are synthesized in a laboratory through chemical processes without the use of components of living matter. Cancer, infectious diseases, and autoimmune diseases are among the ailments that biologics are used to prevent, treat, or cure (Fig. 1).

Figure 1. Biologic medicines in development by therapeutic category [1].

Biologics include a wide variety of products such as monoclonal antibodies, vaccines, gene and cell therapies, and recombinant proteins (Fig. 2).

Figure 2. Biologic medicines in development by product category [1].

Monoclonal antibodies are by far the most researched category of biologics with at least 338 therapeutic mAbs currently being developed by pharmaceutical companies [1].

Monoclonal antibody (mAb) – the bestselling category of biological products

Antibody engineering has advanced significantly since the United States Food and Drug Administration (US FDA) approved the first monoclonal antibody in 1986 [2]. Therapeutic antibodies currently on the market are safe, with fewer adverse effects owing to their high specificity. Consequently, antibody drugs have become the leading class of newly developed drugs in recent years. Eight of the top ten bestselling drugs worldwide in 2018 were biologics. In 2018, the global therapeutic monoclonal antibody market was worth roughly US$115.2 billion, with revenues expected to reach US$300 billion by 2025 (Fig. 3) [3].

Figure 3. Timeline from 1975 showing the successful development of therapeutic antibodies and their applications [3].

As of December 2019, US FDA had approved 79 therapeutic mAbs, including 30 for cancer treatment [4].

Best-selling biotech drugs worldwide

AbbVie’s Humira and Merck’s Keytruda are among the top-selling biotechnology drugs in the world, generating 19.6 billion and 11.1 billion U.S. dollars, respectively, in 2019 (Fig. 4) [5]. Oncology, autoimmune/immunology, hematology, ophthalmology, and dermatology were the top five therapy areas in 2019. Oncologic treatments accounted for six of the top-selling drugs in 2019, making oncology the most targeted field [6].

Figure 4. Top selling biotech drugs worldwide in 2019 [6].

Bristol Myers Squibb, AbbVie, Pfizer, and Roche are four pharmaceutical companies with more than one best-selling drug in 2019. Bristol Myers Squibb had the most top-selling drugs (Eliquis, Opdivo, and Revlimid) in 2019, accounting for 63% of the company’s total revenue. AbbVie’s revenues in 2019, by comparison, relied heavily on its main products (Humira and Imbruvica), which accounted for 72% of the company’s total revenues [6]. The United States spent approximately 45 billion U.S. dollars on biotechnology research and development. In addition, the United States had approximately 34% of the world’s share of biotechnology patents filed in 2014, while Germany filed 8% of global biotech patents [5].

The Rise of Biosimilars

A biosimilar is a biologic that is similar to another biologic medicine (known as a reference product) that has already been approved by the FDA in the United States. In terms of safety, purity, and potency, biosimilars are very similar to the reference product, but there may be minor differences in clinically inactive components. The biologics and biosimilars industry in the United States is fast expanding, and as new medications are introduced, the benefits for patient access and cost management will continue to grow. There are 18 biosimilars on the market in the United States as of November 2020, competing against seven reference biologics, with ten more FDA-approved biosimilars expected to hit the market in the coming years [7].

Biosimilars save money in the long run, with higher savings coming from newer launches competing against more expensive drugs. As of July 2020, the mean Average Sales Price (ASP) of biosimilars was 8.1 percent to 45.1 percent lower than that of their originator products (including insulins) [7]. Biosimilar savings reached an annualized 6.5 billion U.S. dollars in the second quarter of 2020, and savings are expected to exceed 100 billion U.S. dollars over the next five years [8].

A biopharmaceutical product knowledge base is the need of the hour

Antibodies are the most successful class of biotherapeutics because of their binding versatility [9]. With the rapid growth of therapeutic antibody research, the chances of a specific antibody being the only one against a certain antigen are decreasing. Understanding the methods used to produce competing antibodies, as well as their pros and cons, can be extremely helpful in moving therapeutic antibodies forward. Data from clinical trials dominate the scientific literature on therapeutic antibodies, rather than the details of pre-clinical development that is underway for nearly two-thirds of all therapeutic antibodies. The information on the latter could only be obtained from patents. Many researchers are put off by patents’ opaque and archaic language but hidden in the text of these files are details about antibody sequences, assay techniques, epitopes, and much more. Patent applications are usually the first public disclosure of novel antibodies, often months or even years before conference papers or clinical trials. Researchers can identify novel antibodies in early stages of development months or years before they are formally announced by mining the patent literature.

There are very few databases that harvest this information. The IMGT Monoclonal Antibody Database and WHOINNIG are two non-commercial resources for antibody research. Other databases that aren’t unique to antibodies, such as ChEMBL, DrugBank, and KEGG DRUG, also capture WHO data. Most databases deliver additional metadata for their therapeutic entries, such as clinical trial status, companies involved in development, target specificity, and alternative names. While these archives include sequence information, it is currently not possible to query them by sequence or to bulk-download relevant collections of therapeutic sequences for direct bioinformatic analysis.

Excelra is strongly positioned to deliver tailor-made curation on chemically defined antibodies (i.e., antibodies with a known primary amino-acid sequence) connected with their antigenic target, which can be either a protein or a chemical entity.

For more information and to connect with our scientific teams, write to us on:


1. PhRMA [Pharmaceutical Research and Manufacturers of America] (2013). Medicines in Development: Biologics. 2013 Report. Accessed 10 Jul 2021.
2. Ecker, D. M., Jones, S. D., Levine, H. L. The therapeutic monoclonal antibody market. MAbs. 2015, 7, 9–14.
3. Lu, R. M., Hwang, Y. C., Liu, I. J. et al. Development of therapeutic antibodies for the treatment of diseases. J Biomed Sci. 2020, 27 (1), 1-30.
4. The Antibody Society (2019). In: Approved antibodies. Accessed 10 Jul 2021.
5. Statista (2020). Select top selling biotech drugs worldwide in 2019. Accessed 10 Jul 2021.
6. PharmaIntelligence (2020). Top 10 Best-Selling Drugs of 2019. Accessed 10 Jul 2021.
7. IQVIA Institute Report (2020). Biosimilars in the United States 2020 – 2024. Accessed 10 Jul 2021.
8. IQVIA Institute Report (2020). Biosimilars in the United States 2020 – 2024. Accessed 10 Jul 2021.
9. Kaplon, H., Muralidharan, M., Schneider, Z., Reichert, J. M. Antibodies to watch in 2020. MAbs. 2020, 12(1), 1703531.


The ongoing COVID-19 pandemic continues to spread rapidly across the globe. As of 30th June 2020, the World Health Organization (WHO) has reported over 10 million cases and half a million deaths across 188 countries. Current clinical research is focused on accelerating the development of drugs and vaccines for the treatment of SARS-CoV-2 infection. In this scenario, identifying and selecting the right biomarkers is of paramount importance and plays a critical role in optimizing clinical trial design and successful drug development. Finding biomarker data is cumbersome: the data are often scattered across the public literature, and considerable effort goes into annotating data from numerous disparate and dispersed sources.

Excelra’s COVID-19 Biomarker Database is an ‘Open-Access’ resource excerpted from our GOBIOM platform – the world’s largest biomarker intelligence database. The COVID-19 Biomarker Database is a compilation of manually curated biomarkers from published clinical trials evaluating potential drugs or biologics for the treatment of SARS-CoV-2. The database additionally includes the FDA-NIH recommended BEST (Biomarkers, EndpointS and other Tools) classification of biomarkers, supported with direct links to the referenced literature. In the following sections, we present crucial statistics on the global COVID-19 clinical trial landscape, with data and visualizations from the COVID-19 Biomarker Database.

Clinical Trials Analysis

Clinical Trials Characteristics

Out of the 2351 registered clinical trials on COVID-19, 785 clinical trials with biomarker information were included in the COVID-19 Biomarker Database. The included clinical trials comprised 627 (80%) interventional studies and 158 (20%) observational studies.
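
The percentage split above follows directly from the trial counts. The short sketch below recomputes it; the function name and the rounding to whole percentages are choices made for this illustration.

```python
from typing import Dict


def percentage_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Convert raw counts into whole-number percentage shares of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts of trials with biomarker information, as reported in the text.
shares = percentage_shares({"interventional": 627, "observational": 158})
print(shares)  # {'interventional': 80, 'observational': 20}
```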


Across the 785 clinical trials with biomarker information included in the database, the majority of biomarkers are proteomic, followed by scoring scale, physiological, cellular, biochemical, genomic, and imaging biomarkers.

Figure 1: Biomarker count by type

The majority of the biomarkers included in the trials are used to assess Pharmacodynamic/Response for the drugs under investigation.

Figure 2: Numbers indicate clinical trial count

Clinical Trial Phase

Analysis of clinical trials in the COVID-19 Biomarker Database suggests that the majority of trials are in Phase II (52%), followed by Phase III (28%), Phase I (14%), and Phase IV (6%).

Figure 3: No. of clinical trials by Phase. *Clinical trial counts represent clinical trials with biomarker information.

Study Population Size

Further analysis of the sample sizes in the clinical trials reveals that the largest share of trials (28%) recruit 100-500 patients.

Figure 4: Study population distribution in clinical trials

Drugs/Therapies Investigated

Overall, 535 clinical trials were registered for testing the therapeutic benefits of potential drugs/vaccines, including 133 (25%) clinical trials for monotherapy and 404 (75%) clinical trials for combination therapy.

Figure 5: Numbers represent clinical trial counts

Research Landscape

Numerous clinical trials have been registered by industry, academia, and research institutes to evaluate Pharmacodynamic/Response for the drugs under investigation.


Below are the top contributors by number of studies:

Figure 6: Numbers represent clinical trial counts


Analysis of the data in the COVID-19 Biomarker Database shows that the primary focus of current clinical trial research is to evaluate Pharmacodynamic/Response for the drugs under investigation. Of the studies evaluating Pharmacodynamic/Response, the majority (75%) include combination drug therapies. Given that the largest share of clinical trials (28%) include 101-500 patients, many of these studies are likely to provide evidence on the efficacy and safety of the investigated therapy.

It is noteworthy that our analysis was limited to clinical trials registered in

Excelra’s COVID-19 Biomarker Database has been released in support of the ongoing global scientific efforts, aimed at developing safe and effective therapeutic options to treat the novel coronavirus disease.