Semantic Metadata Catalog for Metadata Management

A reference point within the Enterprise Data Lakes

Traditional data cataloging at the data storage level creates dispersed data silos across the enterprise. An intelligent, automated data catalog linked to diverse, distributed data stores enables effective data governance through real-time orchestration of people, processes, and technology, so that an organization can leverage its data as an enterprise asset.

Excelra’s Semantic Metadata Catalog is designed to help automate the processing of organization-wide data, creating an Enterprise Data Lake that draws maximum advantage from the organization’s data assets. Semantic metadata is deeply interlinked and richly contextualized. Adding semantic metadata to the metadata content allows a higher level of abstraction, enabling a programmatic approach to cross-departmental functions and the use of dispersed data assets for a more holistic view of relationships. Efficient enterprise data collaboration helps maximize value by presenting data in formats that are easy to comprehend, which in turn supports the partnership between business and IT.

Content is leveraged because of the people, places, organizations, brands, and topics that it mentions, rather than just its structural metadata (e.g. file format, file size, creation date).

The Semantic Metadata Catalog is based on conceptual resources and REST. In effect, every resource of interest to an organization exists as a certain type (such as an employee, a product, or a location). Depending on the granularity of the model and the size of the organization, there may be hundreds of these types, but there is typically an inheritance structure from which a general taxonomy of entity types can be built.
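The idea of typed resources arranged in an inheritance structure can be illustrated with a minimal Python sketch. The class names, the example types, and the REST-style path layout below are assumptions made for illustration; they are not taken from Excelra’s product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EntityType:
    """A conceptual resource type; `parent` is None for a root type."""
    name: str
    parent: Optional[str] = None


class Taxonomy:
    """Hypothetical in-memory taxonomy of entity types."""

    def __init__(self) -> None:
        self._types: Dict[str, EntityType] = {}

    def add(self, name: str, parent: Optional[str] = None) -> None:
        # A parent must be registered before any of its children.
        if parent is not None and parent not in self._types:
            raise KeyError(f"unknown parent type: {parent}")
        self._types[name] = EntityType(name, parent)

    def ancestors(self, name: str) -> List[str]:
        """Return the chain of parent types, nearest first."""
        chain: List[str] = []
        current = self._types[name].parent
        while current is not None:
            chain.append(current)
            current = self._types[current].parent
        return chain

    def is_a(self, name: str, candidate: str) -> bool:
        """True if `name` is `candidate` or inherits from it."""
        return name == candidate or candidate in self.ancestors(name)

    def resource_path(self, name: str, identifier: str) -> str:
        """Build a REST-style path for one resource (assumed layout)."""
        if name not in self._types:
            raise KeyError(f"unknown type: {name}")
        return f"/{name.lower()}/{identifier}"


# Example types are illustrative only.
taxonomy = Taxonomy()
taxonomy.add("Resource")
taxonomy.add("Agent", parent="Resource")
taxonomy.add("Employee", parent="Agent")
taxonomy.add("Product", parent="Resource")
```

With this structure, a query for every "Agent" can also match an "Employee", because the type check walks the inheritance chain rather than comparing names directly.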

Sources of metadata can be systems, end users, and metadata APIs. An essential building block of metadata management is to simplify and automate the enterprise information inventory, and to keep it updated from different databases as part of a future metadata management strategy.

Semantic Data Catalogs are often very useful when there is a large number of heterogeneous, non-RDF databases. This is typically the case with biopharma data.

Semantic Metadata Catalogs are also well suited to storing information for real-world catalogs. Most catalogs are highly referential in nature, with extensive categorization, links to resources, and a need for consistent annotation. Some aspects of catalog entries, such as transactional content, are a less natural fit, but these can generally be stored externally and linked to by reference. This content can also be retrieved when output is generated, either within or after a semantic query.

Semantic catalogs essentially retrieve links to data, not necessarily the data itself. The catalog does not automatically translate from one source to another, though having a semantic data catalog is a necessary precursor for this to happen. Schema-to-schema mapping (also known as ontology-to-ontology mapping) is a surprisingly complex process, very much akin to translating between languages.
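A catalog that returns links rather than data can be sketched as follows. This is a minimal illustration under stated assumptions: the `CatalogEntry` fields, the `Catalog` class, and the example URIs are all hypothetical, and a real catalog would typically sit on a graph store rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """One catalog record: a pointer to data plus its annotations."""
    resource_id: str
    entity_type: str
    source_uri: str  # where the data lives; the payload is never copied
    structural: Dict[str, str] = field(default_factory=dict)  # format, size, ...
    mentions: List[str] = field(default_factory=list)  # people, topics, ...


class Catalog:
    """Hypothetical catalog that answers queries with links only."""

    def __init__(self) -> None:
        self._entries: List[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def find_by_mention(self, term: str) -> List[str]:
        """Return source URIs of entries whose annotations mention `term`."""
        wanted = term.casefold()
        return [
            entry.source_uri
            for entry in self._entries
            if any(m.casefold() == wanted for m in entry.mentions)
        ]


# Example entries; the URIs are placeholders, not real systems.
catalog = Catalog()
catalog.register(CatalogEntry(
    resource_id="assay-001",
    entity_type="AssayResult",
    source_uri="s3://example-bucket/assays/assay-001.csv",
    structural={"format": "csv"},
    mentions=["Aspirin", "COX-1"],
))
catalog.register(CatalogEntry(
    resource_id="trial-007",
    entity_type="ClinicalTrial",
    source_uri="https://example.org/trials/trial-007",
    structural={"format": "json"},
    mentions=["Aspirin"],
))
links = catalog.find_by_mention("aspirin")
```

The caller receives two URIs and decides which source systems to contact; the catalog itself holds only the semantic and structural annotations.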

Systems that manage mappings from one ontology to another differ from semantic data catalogs, and they constitute a crucial step towards a universal data conversion engine.

Excelra has extensive domain capabilities across biology, chemistry, and the clinical and commercial space in developing standard ontologies for linking enterprise research data. An Enterprise Data Lake strategy is regularly challenged by the variants of data catalog that come into play when organizations deal with differing but conceptually overlapping ontologies. This is typically a problem for a given data catalog type environment. However, because of acquisitions, the enterprise still ends up with multiple ontologies that overlap and need to be translated. In this case, the goal is usually to create a single ontology; while the source ontologies are still in use, an intermediate stage is needed to manage the translation until they can be phased out.
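One way to picture the intermediate translation stage is a mapping table from source-ontology predicates to a single target ontology, with unmapped terms set aside for human review. The sketch below assumes exactly that; the prefixes (`acme:`, `labx:`, `target:`), the predicate names, and the `translate` helper are invented for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical mappings from two acquired ontologies to one target ontology.
PREDICATE_MAP: Dict[str, str] = {
    "acme:compoundName": "target:name",
    "labx:cmpd_label": "target:name",
    "acme:molWeight": "target:molecularWeight",
}


def translate(
    triples: List[Triple], mapping: Dict[str, str]
) -> Tuple[List[Triple], List[Triple]]:
    """Rewrite known predicates; collect unknown ones for manual review."""
    translated: List[Triple] = []
    needs_review: List[Triple] = []
    for subject, predicate, obj in triples:
        if predicate in mapping:
            translated.append((subject, mapping[predicate], obj))
        else:
            # No mapping yet: leave the triple untouched and flag it.
            needs_review.append((subject, predicate, obj))
    return translated, needs_review


source_triples: List[Triple] = [
    ("acme:c1", "acme:compoundName", "Aspirin"),
    ("labx:77", "labx:cmpd_label", "Aspirin"),
    ("labx:77", "labx:batch_code", "B-19"),
]
translated, needs_review = translate(source_triples, PREDICATE_MAP)
```

Real ontology alignment also has to reconcile classes, units, and cardinalities, so a flat predicate map like this covers only the simplest case.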

Excelra’s solution strategy takes into account the importance of maintaining this intermediate layer with the required level of semi-automation, along with a dynamic UI framework that lets non-technical end users manage it with ease. While it is possible to integrate everything into the database, with a semantic data catalog approach it is preferable to bring in intermediate information, transform it, and cache it as appropriate for subsequent queries. This provides a mechanism for importing triples from source files at query time; the triples are put into an intermediate graph, queried, and cached. Once the triples become stale, the graph is deleted.
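The import, query, cache, and expire cycle described above can be sketched with a small time-based cache. This is an illustration under assumptions, not Excelra’s implementation: `IntermediateGraphCache`, the `loader` callback, the fixed time-to-live rule for staleness, and the demo file name are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class IntermediateGraphCache:
    """Loads triples at query time and drops them once they are stale."""

    def __init__(
        self,
        loader: Callable[[str], List[Triple]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # reads triples from a source file or system
        self._ttl = ttl_seconds  # assumed staleness rule: fixed time-to-live
        self._clock = clock
        self._graphs: Dict[str, Tuple[float, List[Triple]]] = {}

    def _graph(self, source: str) -> List[Triple]:
        now = self._clock()
        cached = self._graphs.get(source)
        if cached is not None and now - cached[0] >= self._ttl:
            del self._graphs[source]  # stale: delete the intermediate graph
            cached = None
        if cached is None:
            cached = (now, self._loader(source))  # import at query time
            self._graphs[source] = cached
        return cached[1]

    def query(self, source: str, predicate: Optional[str] = None) -> List[Triple]:
        """Return triples from `source`, optionally filtered by predicate."""
        return [
            t for t in self._graph(source)
            if predicate is None or t[1] == predicate
        ]


# Demo with a fake clock and a loader that records each import.
calls: List[str] = []


def load_demo(source: str) -> List[Triple]:
    calls.append(source)
    return [
        ("ex:aspirin", "ex:targets", "ex:COX1"),
        ("ex:aspirin", "ex:name", "Aspirin"),
    ]


fake_now = [0.0]
cache = IntermediateGraphCache(load_demo, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.query("compounds.ttl", predicate="ex:targets")  # triggers an import
second = cache.query("compounds.ttl")  # served from the cached graph
fake_now[0] = 61.0
third = cache.query("compounds.ttl")  # graph was stale, so it is re-imported
```

Injecting the clock keeps the expiry behavior testable without sleeping; in production the default `time.monotonic` would be used.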

The catalog entries make it possible to pick and choose the information to work with, while the system retrieves data from the appropriate systems without the end user having to worry about the source system. Data lineage is always established through the required audit logs, without adding complexity for end users.

Transformation of data is not always reversible, which makes it a challenge to comply with FAIR principles in an automated process. If, for instance, a transformation creates an attribute whose value depends on the state of two or more variables, disentangling that logic (which is not purely functional) can be extraordinarily complex, if not outright impossible; one example is calculating the average of a set of values and passing that average as the value of an attribute. However, because the transformation and the associated target property are known, the attribute can be recalculated whenever its source changes.
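The averaging example can be made concrete with a short sketch. It assumes a hypothetical `DerivedAttribute` record that stores the transformation together with its source identifiers and target property; the identifiers and values are invented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence


@dataclass
class DerivedAttribute:
    """Lineage record: which sources feed which target, and how."""
    target_property: str
    source_ids: List[str]
    transform: Callable[[Sequence[float]], float]

    def compute(self, sources: Dict[str, float]) -> float:
        """Apply the stored transformation to the current source values."""
        return self.transform([sources[i] for i in self.source_ids])


# Hypothetical source values keyed by identifier.
sources = {"assay-1": 4.0, "assay-2": 6.0, "assay-3": 8.0}
average = DerivedAttribute(
    target_property="mean_value",
    source_ids=["assay-1", "assay-2", "assay-3"],
    transform=mean,
)

before = average.compute(sources)  # average of 4.0, 6.0 and 8.0

# The average alone cannot be inverted: many inputs give the same result.
same_average = mean([2.0, 10.0]) == mean([5.0, 7.0])

# When a source changes, the stored transformation is simply re-applied.
sources["assay-3"] = 11.0
after = average.compute(sources)  # average of 4.0, 6.0 and 11.0
```

The target value cannot be traced back to its inputs on its own, but keeping the transformation and its source identifiers alongside the target property is enough to keep the value current.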

This is also critical for working with both content management and digital asset management systems. The assets themselves are generally not stored within the same database as the catalog. Instead, these systems surface enough metadata, performing entity extraction and storing the resulting annotations within the Semantic Data Catalog. This also helps in resolving master data, as it makes it possible to identify both the resource identifiers and the associated relationships.

Excelra understands the need for phase-wise data processing steps and data logs, with the flexibility to change metadata during interim processing, and the UI functionality has been built with all these requirements in mind.

Enterprise IoT Integration

Enterprise IoT is about networks and the relationships between resources, not just in terms of simple properties but also in terms of factors such as security, actions, and discovery. Increasingly, IoT systems use semantic graphs to keep this complex web of interconnections manageable and easy to traverse and query. This approach is cost-effective and FAIR compliant, and it presents an effective data unification layer along with structured representation through Knowledge Graphs.

Bringing semantic technologies into metadata management makes data smarter for content discovery as well as for knowledge discovery and transfer. Semantic metadata allows systems to automatically assign topics and categories to resources and to infer further context from that information.

Excelra’s key strength lies in understanding semantic metadata and leveraging it to create and consume richer, more interconnected, well-structured, and retrievable resources that can have a direct impact on an organization’s profits and performance. We have built our data understanding, expertise, and design by working with the biopharma industry for 18 years.

Semantic Data Catalogs are not widely embraced because of their associated complexity. Excelra’s solution offers an intelligent, dynamic UI that serves non-technical users as well, allowing the organization to adopt and retain the solution more easily. An effective and efficient Semantic Data Catalog is a key component of Excelra’s Enterprise Data Strategy, making metadata the most valuable enterprise asset.

The effectiveness of semantic technology usually comes down to maintaining data discipline and governance. This is why effective metadata management is less about tools than it is about the process.

Life Sciences Tech Trends 2021

COVID-19 has had an unprecedented adverse impact on global health and the economy at large. While the global economy navigates the financial and operational challenges, the life sciences industry is at the center of attention as the world awaits an effective vaccine to defeat the pandemic. The response so far has been remarkable: governments, organizations, regulators, researchers, and academia have come together like never before to create and share knowledge, supply resources, and provide access to technical skills and technologies. It is noteworthy that drug giant Pfizer and BioNTech got their joint SARS-CoV-2 vaccine approved in less than 8 months, a clear example of how such collaborations are bearing fruit. In this article we reflect on the opportunities, strategies, and technologies that will continue to impact the life sciences industry in 2021 and beyond.

1. AI-powered drug R&D combined with MLOps

AI-powered drug discovery is enabling big pharma and biotech companies to change the traditional approach to R&D, which often takes 11 to 15 years at a cost now exceeding $3 billion. From drug target identification and lead compound screening through preclinical and clinical trials, AI can streamline R&D efforts by integrating and processing vast datasets to derive actionable insights, with the potential to greatly improve the success rate of drug development.

As running multiple ML models at scale becomes increasingly difficult, MLOps offers an automated way of developing, deploying, and refining them over time. It refers to the application of DevOps practices to ML models, from development through deployment.

Here are some use-cases where AI/ML tools are used in drug discovery and development:

Figure 1: AI/ML use-cases in drug discovery & development

2. Hyper-automation across the value chain from molecule to market

Gartner predicted that by 2024, organizations will cut their operational costs by 30% by adopting hyper-automation techniques along with redesigning their business processes. The life sciences industry has been relatively slow to catch this wave but is quickly picking up, as some of the major players have already integrated hyper-automation into their business strategy.

Potential areas span discovery and research, development, manufacturing, sales and marketing, and supply chain and distribution. Given the highly regulated nature of the industry, automation could potentially revolutionize compliance management and patient service and bring further supply chain improvements. For example, automation in drug R&D can aid the identification of biomarkers and DNA/RNA sequencing. On the manufacturing side, it can help with continuous plant monitoring, fast-track decision systems, optimize inventory, and lead to better management of market demand.

Figure 2: Key enablers for hyper-automation

3. Use of advanced analytics

Pharma companies are transforming their traditional approach to developing medicines in order to deliver highly personalized drugs that offer the right treatment at the right time. This is being done by analyzing millions of deep, broad, and disconnected data sets about a patient, coming from EHRs, digital data, imaging, and multi-omics technologies.

By combining the power of AI with real-world data, hypotheses can be generated at scale to improve lab testing efficiency, helping organizations understand disease and drug effectiveness, speed up the search for new indications for existing drugs, and optimize pricing decisions based on the value being delivered.

Figure 3: Types of data analytics methods

4. Creating value from next-generation real world evidence

Recent advances in Real World Evidence (RWE) analytics have made pharma organizations look beyond purely descriptive analysis, which helps with basic patient profiling and cohort comparisons. Advanced predictive models, in combination with ML, probabilistic models, and unsupervised algorithms, help in understanding patient characteristics, disease progression, patient response, and potential risks. This enables doctors to intervene at the right time and deliver the right care.

Figure 4: RWE use-cases across the pharma value chain

5. Quantum computing as pharma’s next big disruptor

The life sciences sector has the potential to benefit significantly from quantum computing. The majority of challenges in the life sciences industry are computationally complex, be it finding relationships among sequences, structures, and functions or determining how different molecules interact, from the drug to the body. By 2023, 20% of organizations will be budgeting for quantum computing projects, compared to 1% today. This holds great potential for transforming data-heavy processes, speeding up drug discovery R&D, or simulating clinical trials. Some of the use cases are:

Figure 5: Quantum computing use-cases in drug discovery

6. Adopting FAIRIFICATION: Breaking down data silos, developing machine-ready data

Adopting FAIR data principles goes a long way, and COVID-19 has made the industry realize the great potential of collaboration for producing effective drugs and better patient outcomes. FAIRIFICATION helps organizations combine external datasets with proprietary information to gain novel insights using graph technologies.

Adopting better methods for data capture and structuring, using complex models for data storage, and utilizing advanced platforms to auto-identify data relationships are signs that an organization is prepared for a tomorrow in which data is ready for machine consumption.

Figure 6: Consumption patterns of data lakes

Conclusion

As global economies continue to recover and look to innovation to bridge technological gaps, COVID-19 has accelerated the process and made us more adaptable and responsive. Organizations are realigning their traditional corporate strategies and adopting a platform-first strategy that is dynamic enough to handle a wide range of uncertainties.