Why use ontologies
Ontology being a relatively new technology, how does its application bring business value that is not covered by current solutions, e.g., schemata and information models, and what are its consequences?
This section is the result of a discussion with the REPRO group members on the question of why we apply ontologies in the first place, as opposed to alternatives, and why we think that investing in this technology will pay off in the end. The discussion results have been further analysed, but their final elaboration into this section does not imply the end of the discussion.
The architectural principle of Separation of Concerns is a design principle for separating a computer program into distinct modules such that each addresses a specific set of concerns from the perspective of an identified stakeholder. It is a form of modularity achieved by encapsulating information inside a section of code that has a well-defined interface. From this, a well-designed piece of software emerges as a coherent set of separate modules, each of which can live its own life-cycle relatively independently from the other modules. Re-use of modules improves the quality of the overall application landscape and enables independent maintenance and evolution, which lowers the total life-cycle costs of each module.
Ontologies address a specific concern from the perspective of the end-users. The main concern addressed by an ontology is the representation, in terms of the system's concepts, of real-world meaning according to the perspective of its end-users. In that way, it adds to the business value, as follows:
Business-IT Alignment: ontologies bridge the usual gap between reality and the software system;
Localisation: Being the single tangible artefact about semantics, ontologies centralise semantics. This allows for much improved maintenance and evolution of semantics, and it improves the quality of the overall software system by reducing complexity and increasing coherence and consistency in applying semantics;
Technology independence: Ontologies disentangle the semantic specification of the data from the message syntax that is enforced by communication technology. Consequently, semantics emerge as a separate component that can be implemented independently from various information and communication technologies (a minimal sketch follows this list). Not only does this allow for reuse of the same semantic component in our enterprise application landscape, it can even be reused by applications outside our enterprise in the context of our business network;
Stability: Ontologies are focussed on describing the domain of application, as opposed to prescribing the system design. Since the domain of application is closely tied to the core of the business, ontologies directly concern the business and, hence, remain rather stable;
Simple evolution: having modularised semantics into a separate component, we can introduce governance on it, allowing its extension and evolution independently from other software components;
Model Driven Development: Based on SoC, an ontology provides the specification of the domain knowledge in a model-driven environment for knowledge-based systems.
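To make the technology-independence point concrete, the following is a minimal sketch (as referenced in the list above), assuming Python with the rdflib library (version 6 or later); the http://example.org/sales# namespace and the class and instance names are illustrative assumptions, not part of any published vocabulary. It builds one tiny semantic component and renders it in three different message syntaxes without touching the semantics themselves.

```python
# A minimal sketch, assuming Python with rdflib >= 6 installed.
# The namespace http://example.org/sales# and all names in it are
# illustrative only.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/sales#")

g = Graph()
g.bind("ex", EX)

# The semantic component: one small vocabulary term plus one fact about it.
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.acme, RDF.type, EX.Customer))
g.add((EX.acme, RDFS.label, Literal("ACME Ltd.")))

# The same semantics, rendered in three different message syntaxes.
print(g.serialize(format="turtle"))
print(g.serialize(format="xml"))      # RDF/XML
print(g.serialize(format="json-ld"))  # JSON-LD ships with rdflib >= 6
```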
Ontologies describe that part of reality that is relevant to talk about, to know about and to be informed about. In that sense, ontologies cover a domain independently from the involved applications. Applications are developed to show a certain behaviour and to facilitate the user in achieving her objectives in the domain of application. This represents a natural separation of concerns: descriptive capabilities are covered by ontologies, while functional capabilities are implemented by software. Where ontologies observe and describe the relevant state of affairs in the domain of application, applications act upon that state of affairs to achieve a certain purpose. This makes ontologies and applications complementary to each other, where the former are used by the latter, and the latter influence (by acting upon the domain of application) the former. This implies that one ontology (or parts thereof) can be reused to inform many applications about the state of affairs in the domain.
Your domain might have been modelled before with a different viewpoint or purpose. Look for existing ontologies, analyse their level of granularity, and select and evaluate them for possible full or partial reuse.
The scope of ontologies is therefore in clear contrast with that of databases: the table of the database is of relevance only within the context of the application, whereas an ontology specifies the context of use for the application. In that sense, ontologies do not replace databases; ontologies serve a different purpose altogether, viz. representing the state of affairs in reality. And although databases have been and still are being used with exactly that purpose in mind, they are not very good at it. So ontologies will focus on the capability to describe reality, whereas the use of databases will be restricted to what they can do best: storing and retrieving data with high performance while maintaining data consistency.
An ontological model is of a descriptive nature as opposed to, e.g., UML models, which are of a prescriptive nature. This has the following consequences:
Common understanding: ontologies assure a focus on explanation and understanding as opposed to construction and realisation. Ontologies offer a shared perspective on reality;
Domain focus: ontologies specifically describe a domain as opposed to prescribing data structures (for data exchange);
By analysing the ontological model, one analyses the abstraction of the domain;
Faithful to reality: agreement to use the shared vocabulary in a coherent and consistent manner, e.g., by agreeing on its ontological commitment through its foundational ontology;
Reuse: Ontologies specify domain semantics from a perspective that is much less liable to changes over time than technology and data models.
Explicit semantics: an ontology allows implicit assumptions and knowledge to be expressed explicitly;
Formal and computational: By their underlying logical language, i.e., OWL, ontologies are artefacts that are interpretable by machines, without extra effort. This enables inferencing;
Validation: Due to their support for inferencing and reasoning, ontologies allow data to be validated against the ontological model, surfacing new knowledge or facts as well as any inconsistencies that are introduced (a minimal sketch follows this list);
Heterogeneity: Variation in terminology and in semantic scope and/or granularity is a natural consequence of independent software development. It hampers interoperability and demands semantic standards; consequently, heterogeneity has always been considered a bug. However, the introduction of explicit specifications that are formal and computational allows data specifications to be connected and processed on demand for purposes of transcription and for establishing consistency and validity. This allows us to focus again on representing the domain of application in a way that is faithful to our perspective. The resulting ontological model declares this particular perspective, to be used for transcription, correction or other purposes deemed necessary for establishing interoperability. Heterogeneity can now be considered a feature again, as opposed to a bug, allowing semantics to become as accurate as necessary. Key to mutual understanding is not only establishing what is agreed upon, but also explicating our disagreements, and particularly how we disagree specifically.
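As an illustration of the items on explicit semantics, machine interpretation and validation above, here is a minimal sketch assuming Python with the rdflib and owlrl libraries; the http://example.org/org# namespace and the names Manager, Employee and alice are made up for the example. A single subclass axiom plus one asserted fact is enough for the reasoner to derive a new fact without any case-specific code.

```python
# A minimal sketch, assuming Python with rdflib and owlrl installed
# (pip install rdflib owlrl). The namespace http://example.org/org#
# and the names Manager, Employee and alice are illustrative only.
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/org#")

g = Graph()
g.bind("ex", EX)

# Explicit semantics: every Manager is an Employee.
g.add((EX.Manager, RDFS.subClassOf, EX.Employee))
# One asserted fact.
g.add((EX.alice, RDF.type, EX.Manager))

# Machine interpretation: compute the RDFS deductive closure in place.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# The inferred fact "alice is an Employee" is now part of the graph.
print((EX.alice, RDF.type, EX.Employee) in g)  # True
```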
Due to their computational character, ontologies allow for specification-based data-processing machines, removing the need for specific and repeated case-based implementations of software. We show some example patterns of use:
Reasoning: Developing a reasoner once and using it for a wide variety of logical models over and over again embodies a pattern that has existed in AI for decades (e.g., Prolog). The main distinction between those earlier approaches and the contemporary Semantic Web approach is the latter's foundation on a family of standards: the type of logic, the serialisation of the logical model, constraint specification, provenance inclusion, data serialisation and more. As a largely independent standardisation body, W3C has been able to design and specify a body of standards that is both encompassing and modularised, allowing for easy uptake in industrial/commercial use. Ontologies are one of the main artefacts in this body of standards;
Data pipelines: Application of data-driven methods requires that the data is cleansed and homogeneous before AI is applied. Inserting an ontology between data sources and AI algorithms allows for a universal, specification-based approach to data homogenisation and cleansing, as opposed to case-by-case implementation of software. This pattern of use can be repeated easily for different data sources and AI applications, demanding a once-only implementation of the underlying generic machinery, which adapts to the specifics of the data source and the AI process on the basis of the specified ontologies;
Semantic mediation: Ontologies provide the potential to reconcile the semantic differences between two ontologies, and hence between the particular native perspectives of the related applications on the shared domain of application. Based on an explicit specification of the reconciliation, semantic interoperability emerges;
Federative data services: One of the fundamental standards of W3C is the Resource Description Framework (RDF). It defines the subject-property-object triple as its atomic data structure, and combines that with a unique naming convention for its individual data elements, i.e., introducing namespaces and URIs to identify your particular subject, property or object, each with its own specification. The triple with the unique naming convention is a major differentiator from other data formats in the sense that no implicit data context is assumed. Moreover, by demanding URI dereferencing for each data element, meaning that the URI should be resolvable as a URL, there is not even an implicit assumption about the existence of a local container for the data itself. Since ontological data are expressed in RDF, there is no need for data centralisation or duplication anymore. Application of federated data services is a logical consequence that follows from its native support by the RDF data structure, whereas central storage and/or data exchange are a logical consequence of all other data formats. Keeping data at their source, together with their own semantics specification, will become the rule as opposed to the exception;
Knowledge derivation: Specifying data by ontologies enables the combination of data from different sources (see Data pipelines above). Adding reasoning to the mix allows us to specify new rules that produce new facts from existing facts. As opposed to previous approaches, we now do not need to implement specific knowledge engines for specific use cases, but can integrate generic components and, by specification, derive new knowledge (a minimal sketch follows below).
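The following minimal sketch, again assuming Python with rdflib, combines two of the patterns above: the atomic subject-property-object triples of the Federative data services item, and the specification-based derivation of new facts of the Knowledge derivation item. The http://example.org/family# namespace, the hasParent/hasGrandparent properties and the rule itself are illustrative assumptions.

```python
# A minimal sketch, assuming Python with rdflib installed. The namespace
# http://example.org/family# and its properties are illustrative only.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/family#")

g = Graph()
g.bind("ex", EX)

# Atomic subject-property-object triples; every element is a URI in its
# own namespace, so no implicit data context is assumed.
g.add((EX.john, EX.hasParent, EX.mary))
g.add((EX.mary, EX.hasParent, EX.ann))

# A declarative rule, specified rather than programmed:
# a parent's parent is a grandparent.
rule = """
PREFIX ex: <http://example.org/family#>
CONSTRUCT { ?x ex:hasGrandparent ?z }
WHERE     { ?x ex:hasParent ?y . ?y ex:hasParent ?z }
"""

# Derive new facts from existing facts and add them to the graph.
for triple in g.query(rule):
    g.add(triple)

print((EX.john, EX.hasGrandparent, EX.ann) in g)  # True
```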
Why should we not apply ontologies, or at least defer their introduction into our own enterprise ICT landscape for three, maybe five years? Their progress is slow, their content is complicated, and they promise a silver bullet that requires yet another layer on top of the abstraction layer. We elaborate on each of the three.
Firstly, although the Semantic Web and ontologies have existed for almost two decades, they are still considered a new IT paradigm. One can argue that this paradigm is a failure, considering that things in ICT always move very rapidly, whereas this paradigm's history already outlives many ICT innovations that can now be considered legacy technology. There certainly is a risk that this paradigm won't make it, particularly considering its slow progression, the rough edges in its supporting technology and standards, and the components that are still missing altogether. When can we expect to see its breakthrough?
Secondly, this paradigm represents a rather complicated way of looking at software, ICT architecture, applications and their interrelations. This stuff is more complicated than the stuff regular ICT is built from. Will it be possible to bring the complexity down to a level that is comprehensible for the mere mortals amongst us, pushed down into the infrastructure with a few but powerful semantic services that are widely used? Or will it remain a highly specialised branch of ICT, applied only in the niche of those that absolutely require it?
At the beginning of this page (The Good), this paradigm promises business value. This is what ICT has always been promising over the last half century. Indeed, ICT now enables much more functionality, shows beautiful societal improvements and has invaded and changed our lives to levels we could not have foreseen. At the same time, the costs of ICT never decrease. How would this now be different with the Semantic Web and ontologies? What proof exists in support of the claims about future improvements? Is this the only way forward, or do alternatives exist that might be less royal but still effective?
As a consequence of The Bad above, we currently see a lot of immature tooling, products that only realise half of what one expects, and things that break down before they can be deployed in operations. Despite being a natural consequence of new technology, this seems to endure longer than with other new technologies, and why is that? Part of the answer should be sought in the pace of scientific progress; many aspects of the Semantic Web, ontologies and reasoning are still under academic investigation, demanding that the software tooling industry frequently adapt its products, or wait long before a stable product can be developed. This is worsened by the fact that the W3C standardisation process on these matters is equally dependent on academic progress, and can only begin to produce results, viz. standards, when science produces a comprehensive and stable outcome. Since tools are expected to interoperate with each other, they are dependent on the standards stabilising as well.
Another consequence is caused by the complexity of the matter. It requires climbing a steep learning curve before one can start to become productive with this new paradigm. At the same time, the learning curve is extended by the addition of new, equally complicated aspects that result from further academic research.
Despite their ugliness, these two consequences can be expected to pass, eventually. In fact, the production of this Handbook is yet another sign that the new paradigm is stabilising to a level that can be considered "good enough" for its practical application in new architectures, methods and related artefacts, here the engineering of an ontology.