Ontologies and The FAIR data principles
From medicine to physics to history, and all the sciences in between, nearly everything today is “data”, and that data keeps growing by the second. This continuous, rapid growth creates the need for a solid infrastructure that supports the use and re-use of scholarly data across all fields. In 2016, a diverse group of stakeholders representing academia, industry, funding agencies, and scholarly publishers jointly published a concise and measurable set of principles known as “The FAIR Data Principles”. Here, “data” means all kinds of digital objects generated in research: raw research data, code, software, presentations, etc. Each letter in “FAIR” stands for one of four foundational principles, which are elaborated in a total of 15 guiding principles that describe how the FAIRness of data can be achieved through technical implementation.
The FAIR principles, which aim to make data “Findable”, “Accessible”, “Interoperable”, and “Re-usable”, stem from the open science movement's idea of re-using data in new contexts. They act as a guideline for scholars and researchers who want to improve the reusability of their data. Researchers can hence publish not only articles and books but also the original data that shaped their work. The FAIR Data Principles differ from earlier peer initiatives, which centered on the human scholar: FAIR also emphasizes improving the ability of “machines” to find and use data automatically. “Good data management” is not a goal in itself; rather, it is a key step on the road to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data are published.
Unfortunately, the current digital ecosystem for scholarly data publication prevents us from getting the maximum benefit from research investments. Consequently, science funders, publishers, and governmental agencies increasingly require data management and stewardship plans for data generated in publicly funded experiments. Beyond proper collection, annotation, and archival, data stewardship includes the idea of “long-term care” of valuable digital assets, so that data can be discovered and re-used for downstream investigations, either alone or in combination with newly generated data.
In summary, to better understand the “FAIR Data Principles”, keep the following in mind:
- Both humans and machines are intended to use and re-use the data.
- The FAIR principles apply to both “data” and “metadata”.
- The FAIR principles are not limited to open data.
- The FAIR principles are not rules set in stone; they continue to evolve and are adapted as needed to accommodate the future needs of different fields and new kinds of data.
The FAIR Data Principles:
Findable: The first step in (re)using data is to find them. This means that the data can be discovered by both humans and machines. The data are referenced with unique and persistent identifiers (e.g. DOIs or Handles), and the metadata include the identifier of the data they describe. This principle is further subdivided:
- F1. (Meta)data are assigned a globally unique and persistent identifier.
- F2. Data are described with rich metadata (defined by R1 below).
- F3. Metadata clearly and explicitly include the identifier of the data they describe.
- F4. (Meta)data are registered or indexed in a searchable resource.
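As a rough illustration of F1 to F4, the sketch below models a metadata record that carries its own identifier and the identifier of the data it describes, and registers it in a toy in-memory index. The class names, field names, and the `10.1234` identifiers are hypothetical and chosen only for this example; a real deployment would rely on a persistent-identifier service and a proper repository or search engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataRecord:
    identifier: str        # F1: identifier of the metadata record (made-up value below)
    data_identifier: str   # F3: identifier of the data being described
    title: str             # F2: descriptive metadata
    keywords: List[str] = field(default_factory=list)


class SearchIndex:
    """Toy in-memory stand-in for a searchable registry (F4)."""

    def __init__(self) -> None:
        self._records: Dict[str, MetadataRecord] = {}

    def register(self, record: MetadataRecord) -> None:
        # Identifiers must be unique within the index.
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._records[record.identifier] = record

    def search(self, term: str) -> List[MetadataRecord]:
        # Case-insensitive match on the title or on an exact keyword.
        term = term.lower()
        return [
            r for r in self._records.values()
            if term in r.title.lower() or any(term == k.lower() for k in r.keywords)
        ]


index = SearchIndex()
index.register(MetadataRecord(
    identifier="doi:10.1234/example-metadata",   # hypothetical identifier
    data_identifier="doi:10.1234/example-data",  # hypothetical identifier
    title="Example cohort measurements",
    keywords=["stroke", "cohort"],
))

hits = index.search("stroke")
print([h.data_identifier for h in hits])
```

The point of the sketch is that a human or a machine can go from a search term to the identifier of the data without any out-of-band knowledge.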
Accessible: Once the required data are found, the user needs to know how they can be accessed, possibly including authentication and authorization. The data are archived in long-term storage and can be made available using standard technical procedures. This does not mean that the data have to be openly available to everyone; rather, information on whether and how the data can be retrieved must be available. This principle is further subdivided:
- A1. (Meta)data are retrievable by their identifier using a standardized communications protocol.
- A1.1. The protocol is open, free, and universally implementable.
- A1.2. The protocol allows for an authentication and authorization procedure, where necessary.
- A2. Metadata are accessible, even when the data are no longer available.
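The access rules above can be sketched as a small decision function. This is a minimal illustration under assumed names (`Holding`, `retrieve`, and the `restricted` flag are hypothetical), not a description of any particular repository: metadata are always returned, while the data are returned only if they still exist and the caller is allowed to see them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Holding:
    metadata: dict
    data: Optional[bytes]     # None once the data themselves are no longer available
    restricted: bool = False  # True if access requires authorization


def retrieve(holding: Holding, authorized: bool = False) -> Tuple[dict, Optional[bytes], str]:
    """Return (metadata, data or None, explanatory note) for one holding."""
    if holding.data is None:
        # A2: metadata stay accessible even when the data are gone.
        return holding.metadata, None, "data no longer available; metadata retained"
    if holding.restricted and not authorized:
        # A1.2: the caller learns how access works instead of getting the data.
        return holding.metadata, None, "authorization required"
    return holding.metadata, holding.data, "ok"


open_set = Holding({"title": "Open set"}, b"1,2,3")
closed_set = Holding({"title": "Restricted set"}, b"4,5,6", restricted=True)
withdrawn = Holding({"title": "Withdrawn set"}, None)

for h in (open_set, closed_set, withdrawn):
    print(retrieve(h))
```

Note that a restricted dataset can still be FAIR: what matters is that the conditions of access are stated and machine-readable.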
Interoperable: The data can be exchanged and used across different applications and systems. The data also need to be integrated with other data from the same or other research fields. This is made possible by using metadata standards, standard ontologies, and controlled vocabularies, together with meaningful links between the data and related digital research objects. In addition, the data need to interoperate with different applications or workflows for analysis, storage, and processing. This principle is further subdivided:
- I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.
- I2. (Meta)data use vocabularies that follow FAIR principles.
- I3. (Meta)data include qualified references to other (meta)data.
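One common way to approach I1 to I3 is to express metadata in JSON-LD with terms from a shared vocabulary. The sketch below uses schema.org terms (`Dataset`, `name`, `isBasedOn`) as one possible choice; the `example.org` identifiers are placeholders, and the helper `qualified_references` is a hypothetical name introduced only for this illustration.

```python
import json

# A metadata record in JSON-LD form; the example.org URLs are placeholders.
record = {
    "@context": "https://schema.org/",  # I1/I2: shared, published vocabulary
    "@type": "Dataset",
    "@id": "https://example.org/dataset/42",
    "name": "Example cohort measurements",
    # I3: the link says *how* the two datasets are related, not just that they are.
    "isBasedOn": {"@id": "https://example.org/dataset/17"},
}


def qualified_references(doc: dict) -> dict:
    """Map each relation name to the identifier of the object it points to."""
    return {
        key: value["@id"]
        for key, value in doc.items()
        if isinstance(value, dict) and "@id" in value
    }


print(json.dumps(record, indent=2))
print(qualified_references(record))
```

Because the relation is named with a vocabulary term, another system can interpret the link without knowing anything about the application that produced the record.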
Reusable: The ultimate purpose of FAIR is to optimize the reuse of data. The data must be well documented and provide substantial information about the context in which they were created. Furthermore, the data should abide by community standards and include clear terms and conditions on how they may be accessed and reused. This allows others to assess and validate the results of the original study, hence ensuring reproducibility, or to design new projects based on the original results. Reusable data encourage collaboration and avoid duplication of work. To accomplish this, metadata and data must be well described so that they can be replicated and/or combined in different settings. This principle is further subdivided:
- R1. (Meta)data are richly described with a plurality of accurate and relevant attributes.
- R1.1. (Meta)data are released with a clear and accessible data usage license.
- R1.2. (Meta)data are associated with detailed provenance.
- R1.3. (Meta)data meet domain-relevant community standards.
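A simple pre-publication check can flag metadata that lack the information needed for reuse. The sketch below is only an illustration: the field names `license`, `provenance`, and `conforms_to` are assumptions made for this example, and a real check would follow whatever metadata schema the community in question uses.

```python
from typing import Dict, List

# Hypothetical field names mapped to the principle each one supports.
REQUIRED_FOR_REUSE: Dict[str, str] = {
    "license": "R1.1",
    "provenance": "R1.2",
    "conforms_to": "R1.3",
}


def missing_reuse_fields(metadata: dict) -> List[str]:
    """List the reuse-related fields that are absent or empty."""
    return [
        f"{name} ({principle})"
        for name, principle in REQUIRED_FOR_REUSE.items()
        if not metadata.get(name)
    ]


draft = {
    "title": "Example cohort measurements",
    "license": "CC-BY-4.0",
    "provenance": "",  # present but empty, so it is reported as missing
}

print(missing_reuse_fields(draft))
```

Such a check cannot judge whether the provenance is accurate or the license appropriate; it only confirms that the information has been supplied.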
Ontology and the FAIR data principles:
Definition: In computer science, an “ontology” is “a formal explicit specification of a shared conceptualization of a domain of interest”. In other words, an ontology is an attempt to adopt, for a given academic discipline, a commonly shared data model expressed in a formalism that is compatible with the technologies in use.
Ontologies play a vital role in providing an open, consistent, stable identifier for a given “thing” and in building consensus in the scientific community as to what that “thing” actually is. Ontologies also describe how types, or classes, of things are interconnected.
Fundamentally, ontologies encapsulate the knowledge of a particular scientific domain, which is why building them is so challenging: scientists must agree with each other! Ontologies are designed by humans but are readable by machines: they provide a mechanism for explaining to a machine how humans understand the things that exist in the world and how those things are related. The term “ontology” is often used loosely for any kind of controlled vocabulary, list, or taxonomy. These artifacts are better thought of as lying on a spectrum of increasingly strong semantics, from a collection of terms (tags) that aids categorization through to a formal description of a domain with classes and relationships. An ontology is therefore more than a standard name and ID. Following a standard is nevertheless extremely useful, because it allows scholars to align data both inside and outside their organization; this is the “Interoperable” part of FAIR.
Through an ontology, researchers gain access to the additional knowledge that domain experts have encoded, which enables them to ask meaningful questions at different levels of a particular subject.
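The idea of asking questions at different levels can be illustrated with a tiny class hierarchy. The hierarchy below is hand-made for this example and is not taken from any published ontology; the names `SUBCLASS_OF`, `ancestors`, and `descendants` are likewise hypothetical.

```python
from typing import Dict, List

# child class -> parent class (a hand-made, illustrative hierarchy)
SUBCLASS_OF: Dict[str, str] = {
    "ischemic stroke": "stroke",
    "hemorrhagic stroke": "stroke",
    "stroke": "cerebrovascular disease",
    "cerebrovascular disease": "disease",
}


def ancestors(term: str) -> List[str]:
    """Walk upwards from a class to the root of the hierarchy."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def descendants(term: str) -> List[str]:
    """Collect every class that sits below the given class."""
    found = []
    for child, parent in SUBCLASS_OF.items():
        if parent == term:
            found.append(child)
            found.extend(descendants(child))
    return found


print(ancestors("ischemic stroke"))
print(descendants("cerebrovascular disease"))
```

A flat list of tags could not answer either question; it is the explicit class relationships that let a query be broadened or narrowed.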
Ontologies are the key to adopting “FAIR”:
When working with data, one should be mindful of the “FAIR” principles. In essence, FAIR is about making data re-usable, but how can data be re-used if we are not all using the same language? For example, an investigator searching for articles relating to “stroke” would miss references to “brain attack” or “cerebrovascular accident”. Thus, ontologies are the key to achieving FAIR: data can only be reused if they are well written, classified, described, and of high quality. Ontologies play a crucial role in several of the FAIR data principles, especially in supporting data “Interoperability” and “Reusability”. The need for ontologies (vocabularies) is highlighted in the following principles:
- I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.
- I2. (Meta)data use vocabularies that follow FAIR principles.
- R1.3. (Meta)data meet domain-relevant community standards.
Furthermore, ontologies are also relevant to “Findability” (F2 requires data to be described with rich metadata) and to “Accessibility” (A1 requires (meta)data to be retrievable by their identifier). Since ontologies are often the byproduct of research activities, or essential entities in many areas of research, the FAIR principles must be applied to the ontologies themselves, irrespective of whether they are being used to describe data or metadata.
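The “stroke” search example given earlier in this section can be sketched as a synonym-aware lookup. The synonym table and the article titles below are made up for illustration, and the function names `expand` and `search` are hypothetical; in practice the synonyms would come from a maintained ontology or controlled vocabulary.

```python
from typing import Dict, List, Set

# preferred label -> all labels treated as equivalent (illustrative only)
SYNONYMS: Dict[str, Set[str]] = {
    "stroke": {"stroke", "brain attack", "cerebrovascular accident"},
}

# Made-up article titles used only for this example.
ARTICLES: List[str] = [
    "Early rehabilitation after stroke",
    "Brain attack: recognising the warning signs",
    "Outcomes following cerebrovascular accident",
    "Managing seasonal allergies",
]


def expand(term: str) -> Set[str]:
    """Return every label that shares a concept with the search term."""
    term = term.lower()
    for labels in SYNONYMS.values():
        if term in labels:
            return labels
    return {term}


def search(articles: List[str], term: str) -> List[str]:
    """Find titles that mention the term or any of its synonyms."""
    labels = expand(term)
    return [t for t in articles if any(label in t.lower() for label in labels)]


print(search(ARTICLES, "stroke"))
```

Without the synonym table, the same search would return only the first title; with it, all three stroke-related articles are found regardless of which label the investigator types.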
Information is the cornerstone of all rational decision-making. Without the proper information or data, individuals, institutions, communities, and governments are unable to systematically make optimal decisions or understand the effects of their actions. In the past decades, information technology has played a crucial role in automating and multiplying information spaces, and access to information has improved at the same time. Despite these advances, most of these automated spaces have remained independent, fragmented components in large and increasingly complex silo-based architectures. The problem is that, nowadays, many of the important questions in large organizations, governments, and scientific communities can only be answered by connecting pieces of information dispersed across these silos.
The time and effort required to curate, integrate, and analyze silos of fragmented data are enormous for a single researcher or group, and the work demands a significant amount of human effort, which is slow, costly, and error-prone. This is where the value of the FAIR data principles lies. We at Excelra can expedite your research and all related data work through our programs that abide by the FAIR principles. Harnessing the expertise and services provided by Excelra can optimize your workflow and make it faster, more efficient, and more accurate.
1. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
2. Annika Jacobsen, Ricardo de Miranda Azevedo, Nick Juty, Dominique Batista, Simon Coles, Ronald Cornet, Mélanie Courtot, Mercè Crosas, Michel Dumontier, Chris T. Evelo, Carole Goble, Giancarlo Guizzardi, Karsten Kryger Hansen, Ali Hasnain, Kristina Hettne, Jaap Heringa, Rob W.W. Hooft, Melanie Imming, Keith G. Jeffery, Rajaram Kaliyaperumal, Martijn G. Kersloot, Christine R. Kirkpatrick, Tobias Kuhn, Ignasi Labastida, Barbara Magagna, Peter McQuilton, Natalie Meyers, Annalisa Montesanti, Mirjam van Reisen, Philippe Rocca-Serra, Robert Pergl, Susanna-Assunta Sansone, Luiz Olavo Bonino da Silva Santos, Juliane Schneider, George Strawn, Mark Thompson, Andra Waagmeester, Tobias Weigel, Mark D. Wilkinson, Egon L. Willighagen, Peter Wittenburg, Marco Roos, Barend Mons, Erik Schultes; FAIR Principles: Interpretations and Implementation Considerations. Data Intelligence 2020; 2 (1-2): 10–29. doi: https://doi.org/10.1162/dint_r_00024
3. Antonis Bikakis, Beatrice Markhoff, Alessandro Mosca, Stéphane Jean, Eero Hyvönen, and Francesco Beretta. 2021. A challenge for historical research: Making data FAIR using a collaborative ontology management environment (OntoME). Semantic Web 12, 2 (2021), 279–294. https://doi.org/10.3233/SW-200416
4. What is fair? FAIR. (n.d.). Retrieved from https://www.howtofair.dk/what-is-fair/#
5. Fair principles. GO FAIR. (2022, January 21). Retrieved from https://www.go-fair.org/fair-principles/
6. Poveda-Villalón, M., Espinoza-Arias, P., Garijo, D., Corcho, O. (2020). Coming to Terms with FAIR Ontologies. In: Keet, C.M., Dumontier, M. (eds) Knowledge Engineering and Knowledge Management. EKAW 2020. Lecture Notes in Computer Science(), vol 12387. Springer, Cham. https://doi.org/10.1007/978-3-030-61244-3_18
7. Lomax, J. (2019, June 11). How to use ontologies to unlock the full potential of your scientific data – part 1. SciBite. Retrieved May 14, 2022, from https://www.scibite.com/news/how-to-use-ontologies-to-unlock-the-full-potential-of-your-scientific-data-part-1/
8. J. Domingue, D. Fensel and J.A. Hendler (eds), Handbook of Semantic Web Technologies. Vol. 1. Foundation and Technologies, Springer, Berlin/Heidelberg, 2011. doi:10.1007/978-3-540-92913-0