Semantic Metadata Catalog for Metadata Management
A reference point within the Enterprise Data Lakes
Traditional techniques of Data Cataloging at a data storage level creates dispersed data silos across the enterprise. An intelligent automated Data Catalog linked to diverse and distributed data storages will enable effective data governance through real time orchestration of people, processes, and technology; enabling an organization to leverage their data as an Enterprise Asset.
Excelra’s Semantic Metadata Catalog is specially designed to help automate and process organization-wide data, creating an Enterprise Data Lake to gain maximum advantage of its enterprise assets. Semantic Metadata is deeply interlinked, richly contextualized and has multiple interconnectivity. The addition of semantic metadata to the metadata content allows a higher-level of abstraction, enabling the creation of programmatic approach to cross-departmental functions and the use of dispersed data assets for a more holistic relationship view. Efficient enterprise data collaboration helps maximize the value in formats that are easy to comprehend, enabling business IT partnership.
Content is leveraged because of the people, places, organizations, brands, topics that it mentions, rather than just structural metadata itself (e.g. file format, file size, creation date etc.).
Semantic Metadata Catalog is based upon conceptual resources and REST. In effect, every resource of interest to an organization exists as a certain type (such as an employee, a product, a location etc.). Depending upon the granularity of the model and the size of the organization, there may be hundreds of these types, but there is typically an inheritance structure that can create a general taxonomy of entity types.
Source of metadata can be systems, end users and metadata API’s. An essential brick in metadata management, is to simplify and automate an enterprise information inventory, as well as update them from different databases as part of a future meta data management strategy.
Semantic Data Catalogs are often very useful for large number of heterogeneous, non-RDF based databases. This is typically the case with Biopharma data.
Semantic Metadata catalogs are ideal for storing information for real world catalogs as well. Most catalogs are, highly referential in nature, with lots of categorization, links to resources, and the need for consistent annotation. Certain aspects of catalog entries are less ideal, such as transactional content, but these can generally be stored externally and then linked to by reference. It should also be worth noting that this content data can also be retrieved as part of the generation of output either within or after a semantic query.
It is noteworthy that semantic catalogs essentially retrieve links to data, not necessarily data itself. The catalog does not automatically translate from one source to another, though having a semantic data catalog is a necessary precursor for this to happen. Schema to schema mapping (also known as ontology to ontology mapping) is a surprisingly complex process, very much akin to translating between language
These data points differ from semantic data catalogs because they are managing mappings from one ontology to another, and constitute a pretty crucial step towards a universal data conversion engine.
Excelra has extensive domain capabilities around biology, chemistry, clinical and commercial space in developing standard ontologies for linking enterprise research data. The Enterprise Data Lake strategy always has a challenge due to variants of data catalog that usually come into play with organizations that are dealing with differing but conceptually overlapping ontologies. This is typically a problem for a given data catalog type environment. However, because of acquisitions, the enterprise still ends up with multiple ontologies that overlap and need to be translated. In this case, there is usually the goal of creating a single ontology and although the source ontologies are still in use, an intermediate stage is needed to manage the translation until they can be phased out.
Excelra’s solution strategy takes into account the importance of maintaining this intermediate layer with the required level of semi-automation along with a dynamic UI framework for non-technical end users to manage this with ease. While it is possible to integrate everything into the database, by using a semantic data catalog approach, the best would be to bring in intermediate information, transforming it, and caching as appropriate for subsequent queries. This provides a mechanism for importing triples from source files at the time of querying, which can then be put into an intermediate graph, queried, and cached. Once the triples become stale, the graph is deleted.
The catalog entries enable us to effectively pick and choose the information to work with, while allowing the system to retrieve data from the appropriate systems without having the end user to worry about the source system. The data lineage is always established through the required audit logs without making it complex to the end users.
Transformation of data is not always reversible and hence becomes a challenge in complying with FAIR principles in an automated process. If, for instance, a transformation creates an attribute with different values based upon the state of two or more variables, disentangling that logic (which is not purely functional) can be extraordinarily complex if not outrightly impossible (for instance, calculating the average from a set of values and passing that average as the value of an attribute). However, knowing the transformation we can recalculate, there should be a change in the attribute in the source, as we have the transformation and the associated target property.
This is also critical for working with both content and digital asset management systems. The assets themselves are generally not stored within the same database as the catalog. Instead, they surface enough metadata, performing entity extraction of metadata and storing this annotational information within the Semantic Data Catalog. This also helps in resolving master data, as this makes it possible to identify both the resource identifiers and the associated relationships.
Excelra understands the need to have phase-wise data processing steps and data logs with the required flexibility to change in meta data, in the interim processing, and hence the UI functionality has been built to consider all these requirements.
Enterprise IoT Integration
Enterprise IoT is about networks and the relationships between resources, not just in terms of simple properties but in terms of such factors as security, actions, discovery and related areas. Increasingly IoT systems are making use of semantic graphs to keep the complex web of interconnectedness manageable and easy to traverse and query. It makes it cost-effective, FAIR compliant and presents an effective data unification layer along with structured representation through Knowledge Graphs.
Bringing semantic technologies into the process of metadata management ensures that data is smarter for content as well as knowledge discovery and transfer. This data allows systems to automatically assign topics and categories to resources and further infer context from that information.
Excelra’s key strength lies in understanding semantic metadata and leveraging it to create and consume more interconnected, richer, well-structured and retrievable resources that can have a direct impact on an organization’s profits and performance. We have built our data understanding, expertise and design, having worked with the Bio-pharma industry over a period of 18 years.
Semantic Data Catalogs are not widely regarded due to the associated complexity. Excelra’s solutioning offers an intelligent and Dynamic UI based interface, which acts as a solution for non-technical users as well, allowing the organization to better adapt and retain to a solution. An effective and efficient Semantic Data Catalog is a key component of Excelra’s Enterprise Data Strategy, making meta data the most valuable Enterprise Asset.
The effectiveness of semantic technology usually comes down to maintaining data discipline and governance. This is why effective metadata management is less about tools than it is about the process.
- Posted In:
- Life Sciences Tech