Contributors: Snehal Dilip Karpe, Santosh Behera, Lingaraja Jena, Mahendra Pal Singh, Govardhan Kothapalli, Malarvizhi A, Veronica Medikare, Sidharth Shankar Jha, Uzma Saeed, Jitesh Pillai, Puneet Saxena, Chandra Sekhar Pedamallu

Introduction

Data has become one of the most valuable assets for modern-day drug discovery. Data, along with cutting-edge technologies like AI and big data analytics, is enabling the drug discovery process in ways never seen before; for example, AI-driven innovation may soon save $54 billion in R&D costs annually 1. Recognizing the value of data, the Pharma sector may spend more than US$4.5 billion on digital transformation by 2030 2.

Biomedical data is unique to the bio-pharmaceutical sector and can range from omics and clinical data to patient health records. Omics data, especially next-generation sequencing data, is increasing exponentially thanks to recent advances in sequencing technologies. It is still underutilized, yet it is well-positioned to be leveraged as a data asset that can propel unprecedented growth 3,4.

Figure 1: Relevance of Omics Data at Different Stages of Drug Discovery Pipeline

Omics data provides a snapshot of the molecular events at play in any given sample and can therefore be used at all stages of the drug development process in the Pharma and Biotech sectors, to different effect at each stage (Figure 1). Advances in genomic and transcriptomic sequencing technologies allow us to answer a plethora of questions at every stage of the discovery process, making scientific interventions more precise. At the discovery stage, omics data can help bio-pharma companies with target identification, patient stratification, disease and drug biomarker discovery, combination partner prediction, etc. In phase I clinical trials, omics data can aid dose escalation studies, where the focus is on identifying the optimal dosage. In addition, the accompanying omics data may provide clues about the reasons behind toxicity and safety findings. In phase II clinical trials, omics-based biomarker signatures can be used to enrich for responsive patients. In phase III clinical trials, longitudinal omics data can detect early response or resistance. Even after regulatory approval, omics data and omics-based biomarkers are used to develop companion diagnostics or to check the suitability of the drug for repurposing against other indications.

There is no doubt that modern sequencing technologies, with their low running costs, give researchers considerable ammunition, but they also generate huge amounts of data. Recognizing the technology’s potential and its current ease of access, the scientific community has made various data hubs available, such as GEO (Gene Expression Omnibus) and ArrayExpress, to name a few. This has ushered in contributions from many groups that work in silos yet share a common goal with the rest of the world.

Key success factors that influence the usage of these datasets are data quality and metadata quality and completeness. Omics datasets collected from various sources (e.g., publications and data repositories like GEO and ArrayExpress) therefore require cleanup, transformation, formatting, and arrangement. A collection of such datasets (data plus metadata) can become a true “asset” to biopharma companies. These collections of curated datasets are called data assets.

Figure 2: Framework for Data Asset Creation in Biomedical Domain

Figure 2 depicts the framework, which includes the guidelines, processes, and tools for data collection, storage, analysis, and utilization that are used to build data assets in the biomedical domain. The framework has several critical steps that need to be followed to build invaluable data assets.

Objective of the study:

Data asset creation starts with an objective that an organization wants to achieve under a specific portfolio. Typical objectives include repurposing drugs, studying drug effects in different indications, assessing competing marketed drugs, or evaluating the right indication for a drug of interest. Let’s say a pharma company is interested in finding and ranking the indications that are best suited to treatment with its proprietary drug asset. It may be able to leverage the known asset-specific biomarkers, apply them to publicly available omics data, and classify the patients into putative responders and non-responders. This patient stratification, and the resulting percentage of responders, can assist in indication prioritization. A data asset created for such goals is mainly composed of omics datasets focused on disease signatures and collected from public domains. Another objective could be a competitive assessment of various drugs against an indication of interest. Here, the company may be interested in gathering information on all the approved drugs and clinical candidates, their mechanisms of action, and allied omics datasets that capture the treatment effects of these drugs on the indication of interest. In this way, a precisely defined objective drives the scope and further planning.
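
The indication-ranking idea above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names, the z-scored expression values, the mean-score rule, and the cutoff are hypothetical and do not come from any real biomarker or dataset.

```python
from statistics import mean

# Hypothetical biomarker signature: genes assumed to be high in responders.
SIGNATURE = ["GENE_A", "GENE_B"]
THRESHOLD = 1.0  # assumed cutoff on the mean signature z-score

# Toy z-scored expression per patient, grouped by indication (made-up values).
cohorts = {
    "indication_1": [
        {"GENE_A": 1.8, "GENE_B": 1.2},
        {"GENE_A": 0.1, "GENE_B": -0.4},
        {"GENE_A": 1.5, "GENE_B": 0.9},
        {"GENE_A": 2.0, "GENE_B": 1.1},
    ],
    "indication_2": [
        {"GENE_A": 0.2, "GENE_B": 0.3},
        {"GENE_A": 1.4, "GENE_B": 1.6},
        {"GENE_A": -0.5, "GENE_B": 0.0},
        {"GENE_A": 0.3, "GENE_B": -1.0},
    ],
}


def is_putative_responder(patient):
    """Call a patient a putative responder if the mean signature score is high."""
    return mean(patient[gene] for gene in SIGNATURE) >= THRESHOLD


def responder_fraction(patients):
    """Fraction of patients in a cohort classified as putative responders."""
    return sum(is_putative_responder(p) for p in patients) / len(patients)


# Rank indications by the fraction of putative responders, highest first.
ranking = sorted(
    ((name, responder_fraction(patients)) for name, patients in cohorts.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, fraction in ranking:
    print(f"{name}: {fraction:.0%} putative responders")
```

In practice the classification rule would come from the validated asset-specific biomarker, and the cohorts from curated public datasets rather than hard-coded values.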

Defining the scope:

This includes the steps required to conduct the data asset creation process smoothly within the set timelines and budget. For creating data assets based on omics data, various factors should be considered. Based on the defined objective, one may choose to gather data assets focused on sample properties such as the type of indication, intervention, tissue type, cell line, target, population, etc. From the process point of view, one may decide on specific technologies (e.g. genomics, proteomics, metabolomics), further specifications like platforms (e.g. microarray, RNA-seq), and even file formats and preprocessing levels (e.g. fastq, raw counts, TPM). These choices become the inclusion and exclusion criteria for selecting the datasets. For example, an organization may be interested in gathering data assets for creating a disease landscape for breast cancer subtypes. It needs to choose whether it is interested only in omics datasets with samples from breast tissue, or whether it will also gather datasets with samples from blood, which could help develop diagnostic, prognostic, or drug efficacy markers that can be used through a simple blood test.
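
As a rough illustration, inclusion and exclusion criteria of this kind can be written down as an explicit filter over dataset metadata. The record fields, the criteria values, and the accession-like identifiers below are all hypothetical placeholders chosen for the breast cancer example, not a recommended scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    accession: str   # placeholder identifiers, not real accessions
    indication: str
    tissue: str
    platform: str
    n_samples: int


# Assumed scope for a breast cancer disease-landscape asset.
INCLUDE_INDICATIONS = {"breast cancer"}
INCLUDE_TISSUES = {"breast", "blood"}   # blood kept for blood-test markers
INCLUDE_PLATFORMS = {"RNA-seq", "microarray"}
MIN_SAMPLES = 10                        # assumed exclusion threshold


def in_scope(record):
    """Return True if a dataset meets every inclusion criterion."""
    return (
        record.indication in INCLUDE_INDICATIONS
        and record.tissue in INCLUDE_TISSUES
        and record.platform in INCLUDE_PLATFORMS
        and record.n_samples >= MIN_SAMPLES
    )


candidates = [
    DatasetRecord("DS0001", "breast cancer", "breast", "RNA-seq", 120),
    DatasetRecord("DS0002", "breast cancer", "blood", "microarray", 45),
    DatasetRecord("DS0003", "breast cancer", "breast", "RNA-seq", 6),
    DatasetRecord("DS0004", "lung cancer", "lung", "RNA-seq", 80),
]

selected = [record.accession for record in candidates if in_scope(record)]
print(selected)
```

Keeping the criteria in one place makes the scope auditable and easy to revise when the objective changes.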

Data acquisition – Dataset survey, sources, and accessibility:

Based on the scope and further planning, data identification and extraction are performed. If the asset includes in-house generated data, the team should follow all regulatory and ethical requirements for sample collection and generate the data using scientifically approved methods or protocols. For public datasets, the team checks the availability of relevant data in the public domain and curates it using the standard procedures planned. This includes finding the right set of resources from which to collect the data; for example, GEO (Gene Expression Omnibus) and ArrayExpress are the first stop for gathering public domain omics datasets. If one is specifically interested in single-cell expression datasets, the Single Cell Expression Atlas also needs to be scanned. Many high-throughput proteomics datasets are not available in the public databases mentioned above, but they may be found through the websites of specific platforms like Olink 5 or SomaLogic 6, or as supplementary information of individual publications. As per the defined scope, the availability of data with the required sampling and processing properties can be checked in the metadata of the datasets and individual samples. Data can be fetched through programmatic interfaces, e.g. using R modules.

There is also a possibility that some datasets are not publicly available, or not available even upon request. Moreover, some datasets listed in GEO may not be linked to a publication. All these add to the challenges of accessibility and further enrichment of the data.
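
The text mentions R modules for programmatic access; as one alternative sketch in Python, a search of GEO DataSets can be expressed as a request to the NCBI E-utilities ESearch endpoint. The search term below is only illustrative, and the helper name `build_geo_search_url` is our own. The snippet only builds the request URL; the commented lines show how it might be sent when network access is available.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; "gds" is the Entrez database for GEO DataSets.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term, retmax=20):
    """Build an ESearch URL for GEO DataSets (hypothetical helper)."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query only; real queries should follow the defined scope.
url = build_geo_search_url("breast cancer AND Homo sapiens[Organism]")
print(url)

# To run the search (requires network access; mind NCBI usage guidelines):
#   import json
#   from urllib.request import urlopen
#   with urlopen(url) as response:
#       ids = json.load(response)["esearchresult"]["idlist"]
```

The returned identifiers would then be used to retrieve dataset summaries and metadata for screening against the inclusion and exclusion criteria.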

Data enrichment:

To convert this dispersed data to actionable knowledge, metadata related to these entities needs to be gathered and managed efficiently. Several tools can help in this quest (Figure 3).

Figure 3: Examples of Available Tools and Frameworks for Metadata Management and Integration

Metadata from various resources and different authors may follow different vocabularies and ontologies. The collected data therefore undergoes cleaning and organization into the desired format. Controlled vocabularies, ontologies, and semantics should be employed so that the data is tagged with relevant and consistent labels. Metadata greatly affects how the data will be used and reused 7. Hence, utmost care should be taken over the quality and integrity of datasets, as both are crucial for reliable and reproducible research findings.

  1. Data quality: Data quality can be assessed in terms of accuracy, completeness, and validity. While collecting data from various resources, one should check that the correct set of datasets and their associated information has been downloaded completely. The validity of the data can be ensured by checking whether it was generated using scientifically approved methods. Once the available metadata has been tagged correctly, any missing information or unusual observations should be rechecked.
  2. Data integrity: Data integrity can be assessed in terms of traceability and authenticity. Traceability ensures that the collected data can be traced back to its source. Once the source is identified, it is highly advisable to check authenticity by confirming that the data is associated with a publication or has been deposited in a well-curated platform.
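
A minimal sketch of these checks follows, assuming a small hand-written vocabulary and a hypothetical set of required metadata fields. A real asset would map to established ontologies rather than the toy dictionary used here.

```python
# Assumed controlled vocabulary: free-text variants -> preferred label.
TISSUE_VOCAB = {
    "breast": "breast",
    "breast tissue": "breast",
    "mammary gland": "breast",
    "blood": "blood",
    "whole blood": "blood",
    "peripheral blood": "blood",
}

# Hypothetical required fields; "source" supports traceability.
REQUIRED_FIELDS = ("sample_id", "tissue", "disease", "source")


def harmonize(sample):
    """Return a cleaned copy of the sample metadata plus a list of issues."""
    clean = dict(sample)
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not str(sample.get(field) or "").strip():
            issues.append(f"missing {field}")
    # Consistency: map the free-text tissue label to the controlled term.
    raw_tissue = str(sample.get("tissue") or "").strip().lower()
    if raw_tissue:
        if raw_tissue in TISSUE_VOCAB:
            clean["tissue"] = TISSUE_VOCAB[raw_tissue]
        else:
            issues.append(f"unmapped tissue: {raw_tissue}")
    return clean, issues


samples = [
    {"sample_id": "S1", "tissue": "Mammary Gland", "disease": "breast cancer", "source": "repository_A"},
    {"sample_id": "S2", "tissue": "Whole blood ", "disease": "", "source": "repository_B"},
    {"sample_id": "S3", "tissue": "tumour", "disease": "breast cancer", "source": None},
]

results = [harmonize(sample) for sample in samples]
for clean, issues in results:
    print(clean["sample_id"], clean["tissue"], issues or "ok")
```

Samples with a non-empty issue list are the ones to recheck manually, as the data quality step recommends.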

Data Interoperability and Integration

Initially, the data can be integrated at the level of a single data type or technology within the data asset. For example, all the RNA-seq samples coming from various datasets should be standardized/normalized so that any sample or group can be analyzed together with, or compared against, any other sample or group without erroneous results.

At later stages, the data may be integrated across different data types. This involves combining data from various sources, such as clinical trials, laboratory research, and patient records, which is essential in Pharma and Biotech for creating a comprehensive, unified view of research and patient outcomes. It relies on ETL (Extract, Transform, Load) processes to combine data from disparate systems such as electronic health records (EHRs), laboratory information management systems (LIMS), and clinical trial management systems (CTMS). Interoperability ensures that these systems can communicate and exchange data efficiently, facilitating smoother workflows and enabling holistic analysis across the organization 8,9.

At both stages, interoperability and integration are crucial:

  1. Interoperability: The most important task in building a data asset is harmonizing data obtained from various sources, which makes the data ready to be used alongside other datasets. This can be achieved through processes such as preprocessing, normalization, transformation, annotation, formatting, and data sharing. Preprocessing cleans the raw data to remove errors, duplicates, and irrelevant information, as defined during planning. Merging datasets from different studies requires the data to be consistent across studies, i.e. in their data ranges and in their normalization/preprocessing steps. Therefore, standardizing data formats and values across datasets is crucial. Once the data is merged, co-normalization or batch correction is needed to ensure that no study-level variation remains in the dataset. The effectiveness of the normalization and/or batch correction procedures needs to be evaluated with analysis and visualization options like PCA. Moreover, most technologies used to generate the datasets attach their own identifiers to the data, such as probe set IDs in microarray data. These probe set IDs are not biologically meaningful unless they are annotated or mapped to the corresponding genes. Therefore, adding metadata and context to the data, such as gene annotations, protein functions, and phenotypic information, is one of the critical steps in data asset creation. Tools like Nextflow 10, Snakemake 11, Galaxy 12, Common Workflow Language (CWL) 13, and Apache Airflow 14 are widely used in bioinformatics for automating and managing the complex data analysis pipelines required to ensure interoperability. These tools enhance scalability, reproducibility, and ease of use, making it possible to handle large volumes of biomedical data efficiently and effectively. By leveraging these tools, researchers can focus on extracting meaningful insights from their data while ensuring rigorous standards of analysis and reproducibility.
  2. Linking data sources/Data integration: When data has been collected across multiple technologies, it is advisable to integrate these datasets to create a more meaningful asset, as in multi-omics data integration. This means combining data from different sources to create comprehensive and unified datasets, and it often involves resolving conflicts, merging overlapping data, and aligning data from various platforms and experiments.
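
Two of the harmonization steps described above, identifier annotation and removal of study-level shifts, can be sketched on toy data. The probe-to-gene map and expression values are hypothetical, and per-study mean-centering is used here only as a crude stand-in for a dedicated batch-correction method; it is not a recommendation, since it would also remove genuine biological differences that happen to coincide with study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical probe-to-gene annotation (real maps come from platform files).
PROBE_TO_GENE = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}


def collapse_to_genes(probe_values):
    """Average probes mapping to the same gene; drop unannotated probes."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = PROBE_TO_GENE.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def center_by_study(samples):
    """Subtract each study's per-gene mean (simplified batch adjustment)."""
    per_study = defaultdict(list)
    for sample in samples:
        per_study[sample["study"]].append(sample["expr"])
    study_means = {
        study: {gene: mean(expr[gene] for expr in exprs) for gene in exprs[0]}
        for study, exprs in per_study.items()
    }
    return [
        {
            "study": sample["study"],
            "expr": {
                gene: value - study_means[sample["study"]][gene]
                for gene, value in sample["expr"].items()
            },
        }
        for sample in samples
    ]


# Made-up log-scale intensities; study_2 sits on a visibly higher baseline.
raw = [
    {"study": "study_1", "probes": {"probe_1": 4.0, "probe_2": 6.0, "probe_3": 2.0}},
    {"study": "study_1", "probes": {"probe_1": 6.0, "probe_2": 8.0, "probe_3": 4.0}},
    {"study": "study_2", "probes": {"probe_1": 10.0, "probe_2": 12.0, "probe_3": 9.0, "probe_x": 1.0}},
    {"study": "study_2", "probes": {"probe_1": 12.0, "probe_2": 14.0, "probe_3": 11.0}},
]

gene_level = [{"study": s["study"], "expr": collapse_to_genes(s["probes"])} for s in raw]
corrected = center_by_study(gene_level)
for sample in corrected:
    print(sample["study"], sample["expr"])
```

After centering, samples from both studies share a common baseline, which is the kind of effect one would then confirm visually, for example with a PCA plot colored by study.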

Validation

It is important to ensure that most of the biological signal is retained after these steps. This can be done by correlating the results from the harmonized/integrated data with those generated from the individual datasets, as reported in the individual publications. There may be multiple challenges here. The tools, methodologies, statistical processes, contrasts, etc. used in individual publications may be outdated relative to current scientific practice or may be inappropriate for the desired outcome. In such scenarios, the analysis workflow applied to the integrated data may also have to be applied to selected individual datasets to ensure that the two sets of results are not drastically different from each other. Additionally, the results from integrated data can be cross-validated by comparing them with those from a third resource that has not been included in the data asset.
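
One simple way to express this comparison is to correlate effect sizes for the genes shared between the two analyses. The sketch below uses made-up log2 fold changes and a hand-written Pearson correlation; the gene names and values are hypothetical, and what counts as acceptable agreement is a project-specific judgment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up log2 fold changes: as reported in a publication vs. from the integrated asset.
published = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5, "GENE_D": 1.5}
integrated = {"GENE_A": 1.8, "GENE_B": -0.7, "GENE_C": 0.4, "GENE_D": 1.1, "GENE_E": 0.2}

# Compare only genes present in both result sets.
shared = sorted(published.keys() & integrated.keys())
r = pearson([published[g] for g in shared], [integrated[g] for g in shared])

# Fraction of shared genes whose direction of change agrees.
sign_agreement = sum(
    (published[g] > 0) == (integrated[g] > 0) for g in shared
) / len(shared)

print(f"genes compared: {len(shared)}, Pearson r = {r:.3f}, sign agreement = {sign_agreement:.0%}")
```

The same comparison can be repeated against an external resource that is not part of the data asset, as suggested above.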

Data storage, analysis and visualization:

After collecting and preprocessing the data, it is good practice to keep all the generated data in the same format, e.g. tab-separated values (.tsv) or comma-separated values (.csv). Most organizations prefer to keep the data in a machine-readable format so that it can be used in downstream analysis with ease. Once the data asset is generated, it is important to select a compact storage format, such as a binary format like h5ad, so that the data occupies less memory. Various specialized tools are available to store, access, visualize, and analyze complex integrated biological data.
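
As a small illustration of keeping outputs in one consistent, machine-readable format, the sketch below writes a toy expression table as TSV and reads it back using only the Python standard library. The column names and values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Assumed column layout for every table in the asset.
FIELDS = ["sample_id", "study", "GENE_A", "GENE_B"]

rows = [
    {"sample_id": "S1", "study": "study_1", "GENE_A": 5.0, "GENE_B": 2.0},
    {"sample_id": "S2", "study": "study_2", "GENE_A": 11.0, "GENE_B": 9.0},
]


def write_tsv(table_rows, handle):
    """Write rows as tab-separated values with a fixed header order."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(table_rows)


buffer = io.StringIO()  # stands in for open("expression.tsv", "w", newline="")
write_tsv(rows, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)

# Reading back: note that values come back as strings and need explicit typing.
parsed = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print(parsed[0]["GENE_A"])
```

Text formats like this are easy to inspect and exchange, while binary formats are the better fit once tables become large.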

  1. A variety of options are available for data storage, from relational or NoSQL databases to data warehouses, data marts, data vaults, data lakes, data lakehouses, etc. Depending on the needs of the business, the legacy storage systems in place, and their adaptability to change, companies can opt for the solution that suits them best and that enables further applications of Business Intelligence and Machine Learning. Popular tools for business analytics and visualization like R Shiny, Spotfire, Tableau, Power BI, etc. can also be tuned to ingest biomedical data effectively 15.
  2. Tools like TileDB 16 are used for complex data storage and analysis. TileDB can store large genomic datasets and provide fast, efficient access for analysis. Its flexible structure allows for the integration of different types of omics data, supporting comprehensive biomedical research.
  3. Rosalind 17 is a cloud-based platform designed to facilitate biomedical data analysis and collaboration. It provides an intuitive interface for researchers to upload, analyze, and visualize data without needing advanced computational skills. Rosalind offers tools for various bioinformatics analyses, including sequencing data processing, differential expression analysis, and pathway analysis. It also enables researchers to share data and results easily, fostering collaboration.

Key Considerations for Data Asset Management and Usage

Data asset management in biomedical research is critical for effective handling, storing, and analyzing large volumes of complex biological data. It helps to enhance research efficiency, support reproducibility, facilitate decision-making, accelerate scientific discoveries, and maintain a competitive edge. A set of best practices can guide drug developers and stakeholders to transform their data assets into strategic resources and drive innovation, enhance patient outcomes, and ensure regulatory compliance 18,19 (Figure 4).

Figure 4: Best Practices for Managing and Using Data Assets in Pharma and Biotech

Biomedical Data Assets: Applications, Resources and Tools

Bio-pharmaceutical companies leverage data assets to explore areas such as uncovering a drug’s mechanism of action (MoA), analyzing disease landscapes, and identifying molecular signatures for targets, drugs, and diseases (Figure 5). All these data assets are linked to various applications, resources, and tools (Figure 6).

Figure 5: Various Applications of Biomedical Data Assets

Genomic, proteomic, and metabolomic data, along with the associated metadata, can be used to develop a powerful data asset for target identification and prioritization, biomarker identification, patient stratification, disease prioritization, and the study of the molecular and pathway signatures of a disease. Another crucial application of data assets is the elucidation of a drug’s mechanism of action (MoA). Integrating diverse genomics, transcriptomics, and proteomics datasets may help researchers gain deep insights into how the drug interacts with its target proteins and how this affects drug efficacy and safety 20. This knowledge helps refine drug development and predict potential side effects. Additionally, advanced data analysis techniques like machine learning unlock novel insights into drug combination opportunities. Drug repurposing, or repositioning, is the process of discovering new therapeutic uses for existing drugs. Of late, drug repurposing has gained traction due to the challenges associated with the traditional drug development process 21. Data assets play a central role here by consolidating datasets like drug repositories, clinical trials, and biomedical literature (Figures 5 and 6). Computational analysis of multi-omics datasets and of data from various repositories facilitates the identification of potential alternate indications for existing drugs, thus expediting the discovery of new treatment options and reducing costs.

Figure 6: Resources and Tools Used in Different Application Areas of Biomedical Research

Understanding the disease landscape, encompassing factors like prevalence, genetics, and treatment options, is key to successfully repurposing existing drugs for new therapeutic applications. Competitive data assessment informs developers about potential market size, patient populations, research priorities, potential competitors in a disease space, and any unmet medical needs. It also indicates whether the drug under development will be a first-in-class or a best-in-class drug, thus helping developers make strategic decisions.

In the fight against disease, researchers utilize a trio of powerful tools: disease signatures, target signatures, and drug signatures. Disease signatures, unique molecular fingerprints of a specific illness, aid in diagnosis, treatment selection, and understanding the disease itself. These signatures are built from genetic, epigenetic, and proteomic markers. Meanwhile, target signatures focus on a specific molecule or pathway known to be involved in the disease process. By analyzing these target signatures, researchers can identify potential drugs that can modulate the target’s activity, guiding drug discovery efforts. Finally, drug signatures reflect the cellular changes caused by a particular drug based on responder profile, offering insights into its mechanism of action. This information, like a disease signature but reflecting the drug’s influence, is crucial for understanding a drug’s effectiveness.

Overall, data assets are revolutionizing drug discovery by providing researchers with a wealth of information. From elucidating drug mechanisms to uncovering new treatment options, these assets are propelling innovation and improving patient outcomes across the healthcare landscape.

Conclusion

A drug discovery program starts with the identification of novel and effective drug development technologies and biological targets for developing new drugs that address an unmet clinical need. Discovering and evaluating the potential therapeutic benefit of a drug target is founded not only on experimental, mechanistic, and pharmacological studies but also on a theoretical assessment of molecular druggability, an early evaluation of potential safety measures, and considerations regarding opportunities for commercialization and options for generating IP. Traditional approaches are inadequate for large-scale exploration of novel drug targets, as they are expensive, time-consuming, and laborious. In recent years, various computational strategies for predicting potential druggable proteins have emerged; these commonly use the sequence, structural, and functional features of proteins as input, along with system-level properties such as network topological features. Despite this success, many promising and experimentally validated targets unfortunately remain outside the scope of current drug modalities. Next-generation technologies, including targeted protein degradation, protein stabilizers (RESTORACs), improved drug delivery systems, and the targeting of protein-protein interactions (PPI), intrinsically disordered regions, and protein-DNA binding, may provide significant assistance in addressing these undruggable targets.

References

  1. Kevin Gawora. Fact of the Week: Artificial Intelligence Can Save Pharmaceutical Companies Almost $54 Billion in R&D Costs Each Year | ITIF. Information Technology & Innovation Foundation https://itif.org/publications/2020/12/07/fact-week-artificial-intelligence-can-save-pharmaceutical-companies-almost/ (2020).
  2. ABI Research. Pharma Industry to Spend $4.5 Billion on Digital Transformation by 2030. PR Newswire (2021).
  3. Ahmed, Z., Wan, S., Zhang, F. & Zhong, W. Artificial intelligence for omics data analysis. BMC Methods 1, 1–4 (2024).
  4. Chen, B. et al. Harnessing big ‘omics’ data and AI for drug discovery in hepatocellular carcinoma. Nature Reviews Gastroenterology & Hepatology 17, 238–251 (2020).
  5. Scientific publications — Olink®. Olink® Part of Thermo Fisher Scientific https://olink.com/knowledge/publications (2024).
  6. Publications – Resources – Life Science Research – SomaLogic. SomaLogic Operating Co., Inc. https://somalogic.com/publications/ (2024).
  7. Caliskan, A., Dangwal, S. & Dandekar, T. Metadata integrity in bioinformatics: Bridging the gap between data and knowledge. Comput Struct Biotechnol J 21, 4895–4913 (2023).
  8. Huser, V., Sastry, C., Breymaier, M., Idriss, A. & Cimino, J. J. Standardizing data exchange for clinical research protocols and case report forms: An assessment of the suitability of the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM). J Biomed Inform 57, 88–99 (2015).
  9. Torab-Miandoab, A., Samad-Soltani, T., Jodati, A. & Rezaei-Hachesu, P. Interoperability of heterogeneous health information systems: a systematic literature review. BMC Med Inform Decis Mak 23, (2023).
  10. A DSL for parallel and scalable computational pipelines | Nextflow. nextflow https://www.nextflow.io/ (2024).
  11. Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Res 10, 33 (2021).
  12. Galaxy. Galaxy https://usegalaxy.org/ (2024).
  13. Home | Common Workflow Language (CWL). commonwl https://www.commonwl.org/ (2024).
  14. Apache Airflow. Apache Airflow https://airflow.apache.org/ (2024).
  15. Clinical Data Visualizations at BMS Merge Trial and Real-World Data. Bio-IT World https://www.bio-itworld.com/news/2020/10/26/clinical-data-visualizations-at-bms-merge-trial-and-real-world-data (2020).
  16. TileDB Open-Source. TileDB https://tiledb.com/open-source/life-sciences/ (2024).
  17. Discovery Platform & Data Hub for Scientists | ROSALIND®. Rosalind https://www.rosalind.bio/ (2024).
  18. Holdsworth, J. What Is Data Management? | IBM. IBM https://www.ibm.com/topics/data-management (2024).
  19. Albrecht, B. et al. Top ten observations from 2022 in life sciences digital and analytics. McKinsey & Company https://www.mckinsey.com/industries/life-sciences/our-insights/top-ten-observations-from-2022-in-life-sciences-digital-and-analytics (2023).
  20. Woo, J. H. et al. Elucidating Compound Mechanism of Action by Network Perturbation Analysis. Cell 162, 441–451 (2015).
  21. Jourdan, J. P., Bureau, R., Rochais, C. & Dallemagne, P. Drug repositioning: a brief overview. J Pharm Pharmacol 72, 1145–1151 (2020).
