

Ontologies and the FAIR Data Principles

From Medicine to Physics, to History, and all the sciences in between, everything nowadays is “data” that keeps evolving by the second. This continuous, exponential growth across the sciences creates the need for a solid infrastructure supporting the use and re-use of scholarly generated data in every field. In 2016, a diverse group of stakeholders representing academia, industry, funding agencies, and scholarly publishers collectively put forward a measurable set of principles known as “The FAIR Data Principles”. By “data” we mean all kinds of digital objects generated in research: raw research data, code, software, presentations, etc. Each letter in “FAIR” stands for one of four major principles, which are broken down into a total of 15 guiding principles that describe how FAIRness of data can be achieved through technical implementation.

The FAIR principles, to make data “Findable”, “Accessible”, “Interoperable”, and “Re-usable”, stem from the open science movement’s notion of being able to re-use data from new perspectives. They are meant to guide scholars and researchers in improving the reusability of their data: researchers can then publish not only articles and books but also the original data that shaped their work. It is worth noting here that the FAIR Data Principles differ from earlier initiatives that centered on the human scholar: in addition to human scholars, the FAIR principles emphasize improving the ability of “machines” to automatically find and use the data.

“Good data management” is not a purpose in itself; rather, it is a key component on the road to knowledge discovery and innovation, and to the subsequent integration and reuse of data and knowledge by the community after the data are published. Unfortunately, the current digital model used for scholarly data publication limits the ability to benefit fully from research investments. Consequently, science funders, publishers, and governmental agencies are introducing data management and stewardship plans for data generated in publicly funded experiments. Beyond proper collection, annotation, and archival, data stewardship includes the idea of ‘long-term care’ of valuable digital assets, so that data can be discovered and re-used in downstream investigations, either alone or in combination with newly generated data.

To summarize, to better understand the FAIR Data Principles, one should keep the following in mind:

  1. Humans AND Machines are targeted for Data use and re-use.
  2. The FAIR principles apply both to data and to metadata (i.e., the data that describe the data).
  3. The FAIR principles are not exclusive to open data only.
  4. The FAIR principles are not rules set in stone; rather, they continuously evolve and are refined as needed to accommodate the future needs of different fields and new kinds of data.

The FAIR Data Principles:

1. Findable:

The first step in (re)using data is simply to find them. This means that the data can be discovered by both Humans and Machines. The data are referenced with unique and persistent identifiers (e.g. DOIs or Handles), and the metadata include the identifier of the data they describe. This principle is further subdivided (a short sketch follows the list):

  • F1. (Meta)data are assigned a globally unique and persistent identifier.
  • F2. Data are described with rich metadata (defined by R1 below).
  • F3. Metadata clearly and explicitly include the identifier of the data they describe.
  • F4. (Meta)data are registered or indexed in a searchable resource.
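To make F1–F4 concrete, here is a minimal, purely illustrative Python sketch: a metadata record that carries a globally unique, persistent identifier, explicitly names the identifier of the data set it describes, and is placed in a searchable index. The DOI, field names, and dataset are hypothetical; a real registration would go through a repository or registry rather than an in-memory dictionary.

    # Illustrative sketch of F1-F4; the DOI and all field values are hypothetical.
    import json

    metadata = {
        # F1: a globally unique and persistent identifier (hypothetical DOI)
        "identifier": "https://doi.org/10.1234/example-dataset",
        # F2: rich descriptive metadata
        "title": "Plasma proteomics of stroke patients (illustrative)",
        "creators": ["Doe, J.", "Roe, R."],
        "keywords": ["stroke", "proteomics", "biomarkers"],
        # F3: the metadata clearly include the identifier of the data they describe
        "dataIdentifier": "https://doi.org/10.1234/example-dataset",
    }

    # F4: register or index the record in a searchable resource.
    # Here the "index" is an in-memory dictionary keyed by the identifier;
    # in practice it would be a data repository or catalogue.
    search_index = {metadata["identifier"]: metadata}

    def find(term: str):
        """Return indexed records whose metadata mention the search term."""
        return [m for m in search_index.values() if term.lower() in json.dumps(m).lower()]

    print(find("stroke")[0]["identifier"])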

2. Accessible:

The data are archived in long-term storage and can be made available using standard technical procedures. However, this does not mean that the data have to be openly available to everyone; rather, information on how the data can (or cannot) be retrieved must be available. Once the required data are found, the user needs to know how they can be accessed, possibly after authentication and authorization. This principle is further subdivided (see the sketch after the list):

  • A1. (Meta)data are retrievable by their identifier using a standardized communications protocol.
  • A1.1 The protocol is open, free, and universally implementable.
  • A1.2 The protocol allows for an authentication and authorization procedure, where necessary.
  • A2. Metadata are accessible, even when the data are no longer available.
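As a sketch of A1 and A1.1, the snippet below retrieves metadata for an identifier over plain HTTPS, an open, free, and universally implementable protocol. It assumes the third-party “requests” package and the content-negotiation service offered by DOI registration agencies; the DOI itself is the hypothetical one from the previous sketch, so in practice the request would simply report that nothing is available.

    # Sketch of A1/A1.1: retrieve (meta)data by identifier over HTTPS.
    # Assumes the third-party `requests` package; the DOI is hypothetical.
    import requests

    identifier = "https://doi.org/10.1234/example-dataset"  # hypothetical DOI

    # DOI registration agencies support content negotiation, returning
    # machine-readable metadata instead of the human landing page.
    response = requests.get(
        identifier,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )

    if response.ok:
        print(response.json().get("title"))
    else:
        # A1.2: where necessary, the same protocol supports authentication and
        # authorization, e.g. an "Authorization: Bearer <token>" header.
        print("Metadata not openly retrievable; status:", response.status_code)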

3. Interoperable:  

The data can be exchanged and used across different applications and systems. The data also need to be integrated with other data from the same research field or from other research fields. This is made possible by using metadata standards, standard ontologies, and controlled vocabularies, as well as meaningful links between the data and related digital research objects. In addition, the data need to interoperate with different applications or workflows for analysis, storage, and processing. This principle is further subdivided (illustrated after the list):

  • I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.
  • I2. (Meta)data use vocabularies that follow FAIR principles.
  • I3. (Meta)data include qualified references to other (meta)data.
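A minimal sketch of I1–I3, assuming the third-party rdflib package: the same hypothetical record is expressed in RDF, a formal, accessible, and broadly applicable knowledge-representation language, using shared vocabularies (DCAT and Dublin Core terms) and a qualified link to a related record.

    # Sketch of I1-I3: describe the hypothetical dataset in RDF using shared
    # vocabularies and link it to other (meta)data. Assumes the rdflib package.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCAT, DCTERMS, RDF

    dataset = URIRef("https://doi.org/10.1234/example-dataset")    # hypothetical
    related = URIRef("https://doi.org/10.1234/companion-dataset")  # hypothetical

    g = Graph()
    g.bind("dcat", DCAT)
    g.bind("dcterms", DCTERMS)

    # I1: a formal, shared language for knowledge representation (RDF) ...
    g.add((dataset, RDF.type, DCAT.Dataset))
    # I2: ... using vocabularies that are themselves published and FAIR
    g.add((dataset, DCTERMS.title, Literal("Plasma proteomics of stroke patients (illustrative)")))
    # I3: a qualified reference to other (meta)data
    g.add((dataset, DCTERMS.relation, related))

    print(g.serialize(format="turtle"))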

4. Reusable:

The purpose of FAIR is to optimize the reuse of data. The data must be well documented and provide substantial information about the context of their creation. Furthermore, the data should follow community standards and include clear terms and conditions on how they may be accessed and reused. This allows others to assess and validate the results of the original study, ensuring reproducibility, or to design new projects based on the original results. Reusable data encourage collaboration and avoid duplication of work. To accomplish this, metadata and data must be well described so that they can be replicated and/or combined in different settings. This principle is further subdivided (a brief sketch follows the list):

  • R1. (Meta)data are richly described with a plurality of accurate and relevant attributes.
  • R1.1. (Meta)data are released with a clear and accessible data usage license.
  • R1.2. (Meta)data are associated with detailed provenance.
  • R1.3. (Meta)data meet domain-relevant community standards.
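To illustrate R1.1–R1.3, the hypothetical record can be extended with an explicit usage license and provenance. The license URL points at the standard Creative Commons deed; the provenance fields only mimic the kind of detail a PROV-style record would carry, and the file-format standard named is just one example of a domain-relevant community standard.

    # Sketch of R1.1-R1.3; field names and values are illustrative only.
    reuse_metadata = {
        "identifier": "https://doi.org/10.1234/example-dataset",  # hypothetical
        # R1.1: a clear and accessible data usage license
        "license": "https://creativecommons.org/licenses/by/4.0/",
        # R1.2: detailed provenance - who generated the data, how, and when
        "provenance": {
            "generatedBy": "LC-MS/MS proteomics pipeline v2.3 (hypothetical)",
            "attributedTo": "Example University Proteomics Core (hypothetical)",
            "generatedAtTime": "2021-11-05",
        },
        # R1.3: a domain-relevant community standard used for the data files
        "conformsTo": "mzML (HUPO PSI mass spectrometry data standard)",
    }

    for key, value in reuse_metadata.items():
        print(f"{key}: {value}")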

Ontology and the FAIR Data Principles:

Definition: In computer science, an “ontology” is, by definition, “a formal explicit specification of a shared conceptualization of a domain of interest”. In other words, an ontology is an attempt to adopt, for a given discipline, a commonly shared data model expressed in a generalized form that the technologies in use can all agree on.

Ontologies play a vital role in providing an open, consistent, and stable identifier for a given “thing”, and in establishing consensus in the scientific community as to what that “thing” actually is. Ontologies also describe how types, or classes, of things are interconnected.

Fundamentally, ontologies encapsulate the scientific knowledge of a particular domain; this is why building ontologies is so challenging: scientists must actually agree with each other! As mentioned before, ontologies are designed by Humans but are readable by Machines: they provide a mechanism to explain to a Machine how Humans understand the things that exist in the world and how they are related. The term “ontology” often gets used loosely to mean any kind of controlled vocabulary, list, or taxonomy; in fact, these artifacts are better thought of as lying on a wide spectrum of increasingly strong semantics, ranging from a collection of terms (tags) used to enhance categorization through to a formal description of a domain with classes and relationships. An ontology is more than just a standard name and ID. Following a standard is extremely useful because it allows the scholar to align data both inside and outside the organization; in other words, this is the Interoperable part of FAIR.
Through an ontology, the researcher also gains access to the additional knowledge formed by experts, enabling the scholar to ask meaningful questions at different levels and layers of a particular subject.
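The toy example below, written with the rdflib package and a hypothetical namespace rather than any real ontology, shows what this looks like in practice: each class gets an open, stable identifier and a label, and the ontology records how the classes relate to one another in a form a machine can follow.

    # Minimal sketch of an ontology fragment: stable identifiers for classes
    # plus machine-readable relationships between them. The namespace and
    # class IRIs are hypothetical, not taken from any real ontology.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("https://example.org/disease-ontology/")  # hypothetical

    g = Graph()
    g.bind("ex", EX)

    # Every "thing" gets an open, consistent, stable identifier and an agreed label ...
    g.add((EX.Stroke, RDF.type, OWL.Class))
    g.add((EX.Stroke, RDFS.label, Literal("stroke")))
    g.add((EX.CerebrovascularDisease, RDF.type, OWL.Class))
    g.add((EX.CerebrovascularDisease, RDFS.label, Literal("cerebrovascular disease")))

    # ... and the ontology states how classes of things are interconnected.
    g.add((EX.Stroke, RDFS.subClassOf, EX.CerebrovascularDisease))

    # A machine can now follow the hierarchy without any medical knowledge.
    for parent in g.objects(EX.Stroke, RDFS.subClassOf):
        print("stroke is a kind of:", g.value(parent, RDFS.label))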

Ontologies are the key to adopting “FAIR”:

When working with data, one should be mindful of the FAIR principles. In essence, FAIR is about making data re-usable; but how can this be done if we are not all using the same language? For example, an investigator searching for articles relating to ‘stroke’ would miss references to ‘brain attack’ or ‘cerebrovascular accident’ (see the sketch after the list below). Thus, ontologies are key to achieving FAIR: data can only be reused if they are well written, classified, described, and of high quality. Ontologies play a crucial role in several of the FAIR data principles, especially in supporting data Interoperability and Reusability. The need for ontologies (vocabularies) is highlighted in the following principles:

I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles.
R1.3. (Meta)data meet domain-relevant community standards.
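The sketch below shows how an ontology can rescue the ‘stroke’ search above through query expansion. It assumes a local OWL/RDF copy of a disease ontology (the file name is hypothetical) that records synonyms with the oboInOwl hasExactSynonym annotation property, a convention used by many biomedical ontologies, and it matches plain-text labels for simplicity.

    # Sketch of ontology-driven query expansion for the 'stroke' example.
    # Assumes rdflib and a local copy of a disease ontology (file name hypothetical)
    # whose synonyms use the oboInOwl 'hasExactSynonym' annotation property.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import RDFS

    HAS_EXACT_SYNONYM = URIRef("http://www.geneontology.org/formats/oboInOwl#hasExactSynonym")

    g = Graph()
    g.parse("disease_ontology.owl")  # hypothetical local ontology file

    def expand_query(term: str) -> set:
        """Collect the term plus all exact synonyms of matching classes."""
        expanded = {term}
        for cls in g.subjects(RDFS.label, Literal(term)):
            for synonym in g.objects(cls, HAS_EXACT_SYNONYM):
                expanded.add(str(synonym))
        return expanded

    # With a suitable ontology loaded, a search for 'stroke' would also pick up
    # records annotated as 'brain attack' or 'cerebrovascular accident'.
    print(expand_query("stroke"))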

Furthermore, ontologies are also relevant to Findability, since F2 requires that data be described with rich metadata, and to Accessibility, since A1 requires that (meta)data be retrievable by their identifier. And because ontologies are often themselves a by-product of research activities, or essential entities in many areas of research, the FAIR principles must also be applied to the ontologies, irrespective of whether they are being used to describe data or metadata.

Our Services

Information is the cornerstone of all rational decision-making. Without proper information and data, individuals, institutions, communities, and governments are unable to systematically make optimal decisions or understand the effects of their actions. In the past decades, information technology has played a crucial role in automating and multiplying information spaces, and access to information has improved alongside. Despite these advances, most of these automated spaces have remained independent, fragmented components in large and increasingly complex silo-based architectures. The problem is that, nowadays, many of the important questions facing large organizations, governments, and scientific communities can only be answered by connecting the different pieces of information dispersed over these silos.

The time and effort required to curate, integrate, and analyze these silos of scattered data are enormous for a single researcher or group, and demand a significant amount of human effort, which is slow, costly, and error-prone. This is where the value of the FAIR data principles lies. We at Excelra can expedite your research, and all the data-related work around it, through programs that abide by the FAIR principles. Harnessing Excelra’s expertise and services can optimize your workflow and make it faster, more efficient, and more accurate.

References:

1. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

2. Jacobsen, A., de Miranda Azevedo, R., Juty, N., et al. FAIR Principles: Interpretations and Implementation Considerations. Data Intelligence 2 (1-2), 10–29 (2020). https://doi.org/10.1162/dint_r_00024

3. Bikakis, A., Markhoff, B., Mosca, A., Jean, S., Hyvönen, E., Beretta, F. A challenge for historical research: Making data FAIR using a collaborative ontology management environment (OntoME). Semantic Web 12 (2), 279–294 (2021). https://doi.org/10.3233/SW-200416

4. What is fair? FAIR. (n.d.). Retrieved from https://www.howtofair.dk/what-is-fair/#

5. Fair principles. GO FAIR. (2022, January 21). Retrieved from https://www.go-fair.org/fair-principles/

6. Poveda-Villalón, M., Espinoza-Arias, P., Garijo, D., Corcho, O. Coming to Terms with FAIR Ontologies. In: Keet, C.M., Dumontier, M. (eds) Knowledge Engineering and Knowledge Management (EKAW 2020). Lecture Notes in Computer Science, vol 12387. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61244-3_18

7. Lomax, J. (2019, June 11). How to use ontologies to unlock the full potential of your scientific data – part 1. SciBite. Retrieved May 14, 2022, from https://www.scibite.com/news/how-to-use-ontologies-to-unlock-the-full-potential-of-your-scientific-data-part-1/

8. Domingue, J., Fensel, D., Hendler, J.A. (eds). Handbook of Semantic Web Technologies. Vol. 1. Foundation and Technologies. Springer, Berlin/Heidelberg (2011). https://doi.org/10.1007/978-3-540-92913-0

Using Legacy and Current Data To Accelerate Drug Development

A thought-leadership article.

Dr. Kavita Lamror

Director – Value Evidence Services

“Information is the oil of the 21st century, and analytics is the combustion engine.” – Peter Sondergaard

Randomized controlled trials (RCTs) are the gold standard of evidence for establishing the value of an intervention and obtaining regulatory approval. However, patient-level RCT data from legacy trials are often not readily available for further hypothesis testing and analysis. As interest in retrospective analysis of clinical data grows, it is worth noting that most of these data exist in the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) format, and that permissions need to be sought for aggregation and transformation.

In addition to RCT data, the United States Food and Drug Administration (US FDA) and the European Medicines Agency (EMA) have created guidelines to accommodate real-world evidence (RWE) generated from claims, registry, and electronic medical record (EMR) data sets for demonstrating the efficacy, safety, and effectiveness of drugs. Such data, if analysed as per the approved guidelines, are accepted for regulatory and reimbursement approvals. However, real-world data (RWD) are coded in multiple formats and need further processing. Various organizations have therefore started transforming these data into the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) format for RWE generation. This further substantiates the added “value” of the drug.

As the pharmaceutical industry faces rising research and development (R&D) costs per drug brought to market, there is a compelling need to optimize existing data assets and shorten the drug development life cycle. Heterogeneity of treatment effect (HTE) is another challenge for the industry. HTE is defined as the difference between patient outcomes measured from post-launch RWD and the results observed in pre-launch RCTs. It may occur because of real-world risk exposures not accounted for in the target population, and it can significantly affect the accessibility and acceptance of the drug by the end-user. While probable confounding and risk factors can be identified from existing published research, it is sometimes difficult to estimate the impact of unobserved exposures on the treatment effect. This evidence might be hidden in the vast amounts of data already existing in RCTs and RWD.

If the data from RCTs and RWD could be aggregated in an analyzable format, they could be used for clinical trial planning, segmented patient targeting, prediction of clinical outcomes, improving efficiencies in health-care systems, and tracking safety outcomes with greater accuracy. Legacy and ongoing RCT data can be transformed by mapping CDISC SDTM data sets into the OMOP CDM used for RWD, as sketched below.
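As a deliberately simplified sketch of that mapping, the snippet below converts a toy CDISC SDTM DM (demographics) extract into the OMOP CDM person table using pandas. The SDTM variable names and OMOP column names follow the respective standards, but the records are invented and the concept lookup is reduced to a two-entry dictionary; a production pipeline would use the full OMOP standardized vocabularies and cover many more domains.

    # Simplified sketch: map a toy SDTM DM (demographics) domain to the OMOP CDM
    # 'person' table with pandas. Records and concept lookups are illustrative only.
    import pandas as pd

    # Toy SDTM DM extract (variable names follow the SDTM standard).
    dm = pd.DataFrame({
        "USUBJID": ["STUDY1-001", "STUDY1-002"],
        "SEX": ["M", "F"],
        "BRTHDTC": ["1975-04-02", "1982-11-19"],
    })

    # Minimal concept lookup; real mappings come from the OMOP standardized vocabularies.
    GENDER_CONCEPTS = {"M": 8507, "F": 8532}  # OMOP concept IDs for male/female

    birth_dates = pd.to_datetime(dm["BRTHDTC"])

    person = pd.DataFrame({
        "person_id": range(1, len(dm) + 1),
        "person_source_value": dm["USUBJID"],
        "gender_concept_id": dm["SEX"].map(GENDER_CONCEPTS),
        "gender_source_value": dm["SEX"],
        "year_of_birth": birth_dates.dt.year,
        "month_of_birth": birth_dates.dt.month,
        "day_of_birth": birth_dates.dt.day,
    })

    print(person)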

Excelra’s Approach

Excelra understands the need of the scientific community to aggregate, extract-transform-load, standardize, visualize, and analyse this data. As key stakeholders in the research community move towards “findability, accessibility, interoperability, and reusability” (FAIR) data standards for improving biopharma productivity, our data scientists can help create scalable clinical data repositories for interacting with data in a convenient and efficient manner on an automated platform. An acute understanding of data provenance and lineage is key to successful insight generation and our skilled team leaves no stone unturned while transforming big data into effortless data engines for bespoke client solutions.

Excelra’s “Molecule to Market” processes are compliant with the Health Insurance Portability and Accountability Act (HIPAA), the European Union General Data Protection Regulation (EU GDPR), and 21 Code of Federal Regulations (CFR) Part 11, ensuring implementation of best practices while transforming confidential data into meaningful insights that accelerate your drug discovery needs.

The COVID-19 Vaccine Landscape

The COVID-19 pandemic continues to spread rapidly across the globe. As of 7 February 2021, the World Health Organization (WHO) had reported over 105 million cases and over 2 million deaths across 219 countries. Current clinical research is focused on accelerating the development of drugs and vaccines for the treatment of SARS-CoV-2 infection.

With the number of clinical trials increasing by the day, identification and selection of potential biomarkers for inclusion in clinical trials is of paramount importance for the success of COVID-19 clinical studies. Excelra’s COVID-19 Database, an open-access biomarker database, is our contribution to the global scientific community, helping to identify biomarkers from published clinical trials against the novel coronavirus disease.

Vaccines in development

A broad range of candidate COVID-19 vaccines are being investigated globally using various platforms. A handful of vaccines have been approved by various regulatory authorities and many more remain in development at both clinical and pre-clinical stages.

Currently, there are 63 candidate vaccines in clinical development and 175 in pre-clinical development. Twenty candidate vaccines are in Phase 3 clinical trials, and 9 vaccines have been authorized across several countries.

Figure 1: Vaccines in clinical and pre-clinical development.

The vaccines can be broadly categorized into virus vaccines (live attenuated virus, inactivated virus), protein-based vaccines (protein subunits, virus-like particles), viral vector vaccines (replicating vector, non-replicating vector), and nucleic acid vaccines (DNA vaccines, RNA vaccines).

Among the candidate vaccines in clinical development, the most common are protein-based vaccines, followed by non-replicating vector vaccines and inactivated virus vaccines.

Figure 2: Vaccines in clinical development.

Similarly, protein-based vaccines, followed by RNA-based vaccines and non-replicating vector vaccines, are the most researched types in pre-clinical development.

Figure 3: Vaccines in pre-clinical development

Vaccines Approved

Below is a list of all vaccines that have achieved regulatory authorization or approval across different countries.

Table 1: List of approved vaccines.

GOBIOM’s free COVID-19 biomarker database

Excelra’s COVID-19 Biomarker Database is a collection of manually curated clinical biomarkers, meticulously annotated by our data scientists, to support the development of drugs and vaccines for treating COVID-19.