Skip to main content

Genesis of the concept of FAIRification

The information age (mid-20th century onwards) witnessed a boom in data generation and digitization. The current century is an era of not only data creation but also data analytics which yields the true value of data. The effective utilization of data to enhance value for all is an evolving concept which resulted in the inception of ‘FAIR’ data practices. FAIRification comprises 15 guiding principles outlined by Wilkinson et al (2016) which are aimed at enhancing the findability, accessibility, interoperability, and reusability of data. It is a way of connecting and harnessing the power of data being generated to maximize its utility. Learn more about FAIR concepts in our blog on FAIRification for scientific data connectivity.

Each principle has a core set of values which reflect its utility. The Findability principle assigns the data/metadata a universality providing recognition for research. Accessibility reflects the openness amongst data providers to share data. Interoperability provides ease and equal accessibility. Reusability empowers multiple uses of the data thereby increasing its impact.

Figure 1: FAIR guiding principles as outlined by Wilkinson et al 20161

Need of the hour (Challenges, Necessity and Current approaches)

Conventionally, data lakes and warehouses were constructed to manage and disseminate high-quality data consistently. Protected, siloed data became a major impediment to knowledge discovery and innovation. Lack of community standard ontologies to normalize heterogeneous data and high costs for regulating digital processes proved to be key challenges. Efficient data management, governance, and availability became necessary. Implementation of the FAIR guiding principles helps break down data silos, making data available to both humans and machines.

Aligning FAIRified data is needed at multiple levels with high priority for:

Unlocking scientific transformation – A shift from ‘my data’ to a corporate asset fosters FAIR data sharing.

Reducing expenses – Not having FAIR data costs an estimated €10.2bn/year to the European economy.

Increasing strategic value addition – Harness the power of AI/ML on FAIR data to accelerate creation of valuable data assets.

Minimize data wrangling – Decreases time and effort for preparing data into analysis-ready formats.

Current approaches evaluate FAIRness using maturity indicators and quantifiable metrics applied to data, metadata, and infrastructure. While scoring for findability and accessibility is attained at the metadata level, interoperability and reusability may need multiple iterations.

FAIR principles are anecdotal and act as guidelines in the FAIRification process. The current approaches evaluate FAIRness through crucial maturity indicators and quantifiable metrics applied to data, metadata, and associated infrastructure4. Scoring for findability and accessibility is attained at the metadata level in one go, but assessment of interoperability and reusability might entail intensive iterations. The FAIRification process can be broadly categorized into the following steps:

  • Retrieve non-FAIR data: Access data to be FAIRified
  • Analyze retrieved data: Examine data content with respect to concepts, structure, relationships between different data elements, different data identification methodologies and analysis, provenance, etc.
  • Define semantic model for data: Use community, purpose, and domain-specific ontologies and controlled vocabularies to describe and define dataset entities, concepts, and relations in an accurate, unambiguous and machine-actionable format
  • Make data linkable: The non-FAIR data can be transformed into linkable data by applying the semantic model defined in step 3 using Semantic Web and Linked Data technologies. This ensures interoperability and reuse, facilitating the integration of the data with other types of data and systems
  • Assign license: Ensure data license information is included, else reuse of data might get hampered
  • Define metadata for the dataset: Ensure that the data is described by proper and rich metadata to support all aspects of FAIR data assessment
  • Deploy FAIR data resource: Deploy or publish the FAIRified data, together with relevant metadata and a license, so that the metadata can be indexed by search engines and the data can be accessed, even if authentication and authorization is required.

    Figure 2: FAIRification workflow adapted from GO FAIR4.

These steps align closely with best practices in data curation, ontology management, and enterprise-scale data governance.

Utility and benefits

FAIRifying data yields multiple benefits:

Research communities: Seamless data acquisition, integration, and analytics expedite discoveries.

Biopharmaceutical sector: Reduced time for drug discovery and high-quality datasets for analytics.

Business impact:

Financial: Improved data usability increases productivity and revenue per employee.

Operational: Better asset utilization and forecasting.

Customer-oriented: Faster innovation cycles.

Organizations increasingly invest in FAIRification to capitalize on data value and emerging technologies like AI/ML.

From the point of view of business, the impact is three-fold namely; financial, operational and customer oriented6.

  • Financial impact: A study by Barua A et al measuring the impact of effective data on business shows that upon improving the usability of data by a mere 10% there is an estimated increase in sales per employee by 14.4%. Also, reduction in the effort and time to make data useful for the user results in a significant improvement in the productivity per employee6.
  • Operational impact: It involves effective utilization of assets, accurate planning and forecasting6.
  • Customer-oriented impact: It results in a better ability to innovate in relatively short periods of time6.

Thus, processes like FAIRfication which are intimately involved in improving data sharing and usability would have an overall long-term positive impact on business6.

These benefits are now being recognized across sectors and many organizations are investing time and effort towards FAIRification with a long-term astute goal in mind.

Excelra’s approach

Excelra understands the importance of FAIRifying data. Evaluating FAIRness is the first step toward FAIRification. With 18+ years in data science and domain expertise, Excelra has developed a streamlined process with a customized SOP and quantitative assessments of database compliance to FAIR principles. Domain experts, supported by deep technical experience in data curation, bioinformatics, and semantics data services, help partners understand FAIR compliance and recommended improvements.

Figure 3: Schema for FAIR evaluation of a given database.

Case study: FAIRness assessment of public life sciences databases

As a first step to understand database FAIRness, 12 public databases covering proteins, drugs, genes, pathways, and diseases were evaluated using our FAIR assessment methodology:

  • Proteins: PDB, Binding DB, UniProt
  • Drugs/Chemicals: PharmGKB, ChEMBL, PubChem, DrugBank
  • Genes: NCBI Gene, Ensembl, GWAS Catalog
  • Pathways: Reactome
  • Diseases: DisGeNET

Results showed widespread compliance with metadata-related FAIR principles. Interoperability and reusability compliance varied across resources, highlighting the evolving nature of FAIR adoption.

The questionnaire and SOP were extensively utilized for scoring and evaluating each database. Based on the assessment and scoring the results were collated.

Figure 4: Summary of the quantitative assessment of 12 public databases for FAIR compliance.

Salient features from the analysis are:

  • All databases are compliant to >13 of the 15 principles.
  • All the databases irrespective of their themes have relevant descriptive metadata elements incorporated. These are usually compliant to the FAIR principles related to metadata and their associated identifiers.
  • Findable- The 12 databases evaluated are public and are compliant to the ‘Findable’ principle. PharmaGKB and GWAS catalog are partially compliant to F2 as they lack certain metadata types.
  • Accessible- Most databases evaluated, have data dumps accessibility. These are compliant to the ‘Accessibility’ principle. However, 5 databases are not compliant with the A2 principle. This principle discusses the availability of metadata even after the data is no longer available. This indicates that some databases do not make versions of their databases available. Lack of retrospective data and or lack of evidence of the existence of data leads to A2 non-compliance.
  • Interoperable- Databases evaluated in the current study show compliance to the principles of ‘Interoperability’. ChEMBL however, is not I3 compliant as the chemical entity output does not include references to related metadata
  • Reusability- Certain databases are partially compliant to R1. It is observed that the ‘About’ page of these databases do not provide all the requisite information. All the databases are compliant with the R1.1, R1.2 and R1.3 principles.

FAIR being a new evolving concept which has come up in the past decade, many databases are yet to be fully FAIR compliant. Although it is observed that most databases in the life sciences domain frequently utilized by both academicians and in industries are mostly FAIR compliant. The extensive use of these and their proven utility across years is also a proof to how being FAIR compliant has helped them.

Excelra’s edge

Excelra’s edge in FAIRification evaluation

  • 18+ years of data science experience with 60+ PhDs across a 600+ talent pool
  • Global collaborations spanning biotech to top pharma
  • Deep experience in scientific informatics and database solutions
  • Expertise in data annotation, validation, ontology management, and integration
  • In-house analytical and domain expert blend for tailored FAIR solutions

Future outlook

FAIRifying data will accelerate knowledge discovery and reduce time and cost across scientific research and industrial innovation. By adopting FAIR data, communities will leverage advanced analytic methods such as AI/ML to unlock future breakthrough opportunities and enhance data reuse across disciplines.