How to become an ontology expert?

An overview of relevant literature on ontology engineering as required and optional readings.

Introduction & goals

This chapter is intended for people with a technical background who have had only a first acquaintance with ontologies. We guide the reader to the major introductory works on ontologies. Our aim is not to present an introduction that is sufficient in and of itself; we merely offer an approach, based on first-hand experience, to becoming an ontology expert.

We recommend tackling two approaches simultaneously: 1) a theoretical approach and 2) a modelling approach. The theoretical approach will be more natural to people with a strong background in formal logic. The modelling approach will be more natural to computer science majors. We consider the two approaches complementary, and both indispensable for becoming an ontology expert. The theoretical background is then applied to use cases such as the Semantic Web.

In this chapter we look at ontologies from a high level by explaining the definition of ontology that most studies agree upon. This is followed by an introduction to the relevant literature on both the theoretical approach and the modelling approach. The final section presents some examples of applications where ontologies have been successfully applied.

Definition of ontology

In 1993, Gruber defined an ontology as an "explicit specification of a conceptualization", slightly adjusted by Borst in 1997 into a "formal specification of a shared conceptualization", and combined by Studer in 1998 to read "a formal, explicit specification of a shared conceptualization". We can break this definition down into three parts: 1) it is shared, implying a consensus rather than an individual perspective, 2) it is a formal specification, turning it into a machine-readable format, and 3) it is a conceptualisation, denoting a simplified representation of the world abstracted with a particular purpose in mind.

Elaborating on this definition of ontology, firstly, an ontology represents a conceptualisation, because it intends to describe a particular domain by identifying the classes, relations, individuals, and types of values the domain consists of. Take for example the pizza domain. We may define a pizza as something that has both a base and a topping. The base necessarily has a crust that may be filled with cheese, hot dogs, or chocolate. We may describe a distasteful pizza as any pizza that has pineapple as one of its toppings.
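The pizza conceptualisation above can be sketched as plain data structures. This is only an illustration in Python, not an ontology language; the names `Base`, `Pizza`, `is_pizza` and `is_distasteful` are our own assumptions, and the classes, relations and values of the domain are mapped onto dataclasses, fields and strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Base:
    """A pizza base; its crust may be filled (e.g. "cheese", "hot dogs", "chocolate")."""
    crust_filling: Optional[str] = None


@dataclass
class Pizza:
    """Something that has a base and a list of toppings."""
    base: Optional[Base]
    toppings: List[str] = field(default_factory=list)


def is_pizza(candidate: Pizza) -> bool:
    # The definition requires both a base AND at least one topping.
    return candidate.base is not None and len(candidate.toppings) > 0


def is_distasteful(candidate: Pizza) -> bool:
    # A distasteful pizza is any pizza with pineapple among its toppings.
    return is_pizza(candidate) and "pineapple" in candidate.toppings
```

For instance, `is_distasteful(Pizza(Base(), ["ham", "pineapple"]))` evaluates to `True`, whereas a pizza without toppings does not even qualify as a pizza under this definition.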

Secondly, defining an ontology as a formal specification implies that the end product has to be in a computer-readable language. It is neither a graphical model, nor a rigid mathematical proof, nor a research paper, but a text in a formal syntax. By formal we refer to a modelling language with a logical footing, which allows us to express logical constructs such as subsumption (a bike and a car are both vehicles), disjointness (the coffee and the cup that holds it are two different things) and logical rules, a.k.a. constraints (if it rains, I do not take the bike but travel by car), and which facilitates logical reasoning over the facts and rules that we store (it rains, hence I'll take the car).
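The three constructs named above (subsumption, disjointness and rules) can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not a Description Logic reasoner; all identifiers, such as `SUBCLASS_OF` and `it_rains`, are hypothetical.

```python
# Subsumption: each class maps to its direct superclass.
SUBCLASS_OF = {"Bike": "Vehicle", "Car": "Vehicle"}

# Disjointness: no individual may belong to both classes of a pair.
DISJOINT = [frozenset({"Coffee", "Cup"})]

# Rules: if all premises hold, the conclusion holds as well.
RULES = [({"it_rains"}, "travel_by_car")]


def superclasses(cls):
    """Return cls followed by every class that subsumes it."""
    found = [cls]
    while found[-1] in SUBCLASS_OF:
        found.append(SUBCLASS_OF[found[-1]])
    return found


def is_consistent(asserted_classes):
    """Check that the classes asserted for one individual violate no disjointness axiom."""
    inferred = {sup for cls in asserted_classes for sup in superclasses(cls)}
    return not any(pair <= inferred for pair in DISJOINT)


def infer(facts):
    """Apply the rules until no new fact can be derived (forward chaining)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived
```

Here `superclasses("Bike")` yields `["Bike", "Vehicle"]`, an individual asserted to be both `Coffee` and `Cup` is reported as inconsistent, and `infer({"it_rains"})` adds `travel_by_car` to the known facts.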

Thirdly, the ontology has to be shared, such that each system can recognise a message that uses the concepts from the ontology. If the ontology were not shared, it would be as if each system spoke a different language. The story of the Tower of Babel shows that multilingual systems do not work.

Required reading

Introduction to Logic

Engineering an ontology concerns describing the domain of application in a formal logic. Any logic suffices, since it is primarily used to unequivocally specify a model about reality as opposed to supporting reasoning. No matter your background in logic, we recommend the use of Logic, Language and Meaning, vol. 1, as either a primer or reference work (or both) to the world of, well, logic, language and meaning; it is verbose in its explanations, examples and exercises, yet also very comprehensive in its topics, and it carries an excellent index.

Any logic is appropriate for use as a modelling language; however, there is a balance to consider. A more expressive modelling language can discern more differences, which makes for a stronger logic. However, the stronger the logic, the higher the computational complexity, because more consequences have to be calculated before a conclusion can be reached. This complexity can easily render reasoning over the ontology undecidable. This is where Description Logic finds its place, as a balance between computational complexity and language expressiveness.

The mainstream ontologies are therefore founded upon Description Logic (DL). This logic distinguishes concepts, relations, individuals, and restrictions, and its purpose is most clearly described in the 2017 textbook An Introduction to Description Logic. The logic is called Description Logic because each concept is uniquely specified through some formal description, which is validated against a set-theoretic reflection of what it constitutes in our conceptualisation of reality. For example, a pizza may be described as anything that has both a base AND a topping.
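In standard DL notation, the pizza description could be written as follows. The concept and role names (Pizza, Base, Topping, Pineapple, hasBase, hasTopping) are our own choice for illustration; the second axiom formalises the distasteful pizza from the definition section.

```latex
\begin{align*}
\mathit{Pizza} &\equiv \exists \mathit{hasBase}.\mathit{Base} \sqcap \exists \mathit{hasTopping}.\mathit{Topping} \\
\mathit{DistastefulPizza} &\equiv \mathit{Pizza} \sqcap \exists \mathit{hasTopping}.\mathit{Pineapple}
\end{align*}
```

The set-theoretic reading is given by an interpretation $\mathcal{I}$ with domain $\Delta^{\mathcal{I}}$, which maps each concept to a subset of the domain and each role to a binary relation over it:

```latex
\begin{align*}
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(\exists r.C)^{\mathcal{I}} &= \{\, d \in \Delta^{\mathcal{I}} \mid \text{there is an } e \in \Delta^{\mathcal{I}} \text{ with } (d,e) \in r^{\mathcal{I}} \text{ and } e \in C^{\mathcal{I}} \,\}
\end{align*}
```

Under this reading, the extension of Pizza is exactly the set of domain elements that are related to some Base via hasBase and to some Topping via hasTopping.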

Chapters 1 and 2 of An Introduction to Description Logic introduce the logic with some clear examples. Quite a few proofs are given, since the writers are trained as logicians. It is not required for our purposes to fully understand each proof. Also, the penultimate section of chapter 2 already gets quite technical and goes beyond what we consider necessary. The first few sections of chapter 3 are also relevant, because these explain why extensions strictly increase the expressivity of DL. This is important to understand for optimisation purposes at a later stage.

The Protégé project has developed an extensive manual that uses the pizza domain as an example in ontology engineering. Background knowledge on the pizza example may be found in the Handbook on Ontologies and in An Introduction to Description Logic. Note that we have extended the pizza example in this handbook where appropriate.

We should also consider principles of modelling in the conceptual phase, such as the minimal ontological commitment; and the optimal way to model recurring structures such as part-whole relations, unity versus plurality, composition and constitution, and properties and qualities. A large body of philosophical literature has already been devoted to this topic, but we only need to cover the conclusions for our modelling purposes.

Conceptual modeling is not only about logic. It is also about understanding what is said, and discovering the actual message behind it.

Required reading/exercise

  • Chapter 1 “Introduction”, chapter 2 “A Basic Description Logic”, and chapter 3 “A Little Bit of Model Theory” from Introduction to Description Logic, excluding 2.6 and 3.3 - 3.5

  • Selected exercises from chapters 2 and 3: dltextbook.org -> resources

  • Chapters 2-4 (inclusive) from Introduction to Ontology Engineering

Background knowledge to the Computational Phase

At some stage an operational ontology has been produced that meets the competency requirements set in the preparatory phase. When the ontology only serves as a semantic standard, specifying the germane entities in the domain of application with all their constraints, the process ends here for the ontology engineer. However, when it should serve as a reflection and registration of the state of affairs in the domain of application, it will be used as one of the operational components of the application (or applications). In this case, besides meeting the competency requirements, it should also meet the software (quality) requirements placed on the applications that make use of it. This is not a trivial task: on the one hand, the developer needs to ensure that the resulting ontology has desirable computational properties, such as fast reasoning and decidability, while still meeting the competency demands; on the other hand, the developer must ensure that this does not invalidate any of the software (quality) requirements. This implies that the ontology engineer either gains knowledge about software architectures or collaborates closely with a software architect on this matter.

The implemented ontology is represented in the Web Ontology Language (OWL), which is de facto the only ontology standard. In syntactic terms it is an extension of the Resource Description Framework (RDF), while in logical terms it is a representation of Description Logic: expressive enough to achieve practical semantic accuracy, but not so expressive that reasoning is no longer computable. OWL is also a W3C recommendation.
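To make the relation between OWL and RDF concrete, the sketch below writes one OWL axiom ("every pizza has some base") as RDF triples and prints them in the N-Triples syntax. The RDF, RDFS and OWL namespace IRIs are the standard W3C ones; the `http://example.org/pizza#` namespace, the blank node label `_:r1` and the helper functions are assumptions made for this illustration only.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/pizza#"  # hypothetical namespace for the example

# "Every pizza has some base": Pizza is a subclass of the anonymous class
# (blank node _:r1) of all things that have at least one hasBase value in Base.
TRIPLES = [
    (EX + "Pizza", RDF + "type", OWL + "Class"),
    (EX + "Base", RDF + "type", OWL + "Class"),
    (EX + "hasBase", RDF + "type", OWL + "ObjectProperty"),
    ("_:r1", RDF + "type", OWL + "Restriction"),
    ("_:r1", OWL + "onProperty", EX + "hasBase"),
    ("_:r1", OWL + "someValuesFrom", EX + "Base"),
    (EX + "Pizza", RDFS + "subClassOf", "_:r1"),
]


def term(value):
    """Render a blank node label as-is and an IRI between angle brackets."""
    return value if value.startswith("_:") else "<" + value + ">"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one N-Triples statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

The point of the sketch is that an OWL ontology is, syntactically, nothing more than an RDF graph that uses the OWL vocabulary; in practice an editor such as Protégé produces this serialisation for you.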

Required reading/exercise

Applications of ontology

The main application of OWL and the underlying DL is the Semantic Web. This idea was proposed by Tim Berners-Lee as an improvement to the World Wide Web, which he himself had invented. Whereas the WWW holds enormous amounts of HTML-encoded data that are structured only for the sake of human readability, the Semantic Web is system-oriented and stores ontological metadata, such that each software service can extract information from another service through a so-called SPARQL endpoint without human intervention.

The difference between the DBpedia initiative and the regular Wikipedia illustrates this nicely. The DBpedia initiative constructed an ontology containing over 500 classes and nested sub-classes covering the range of Wikipedia articles. The ontology therefore contains diverse classes like ArtificialSatellite and DisneyCharacter. Each and every class has a set of properties defined for it. Another service can therefore construct a query at runtime on, for example, the birth date of some famous person, because the ontology prescribes which property holds that piece of information.
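As a sketch of such a runtime query, the Python snippet below builds a SPARQL query for a person's birth date and the corresponding request URL for the public DBpedia endpoint. It assumes the `dbo:birthDate` property of the DBpedia ontology and the endpoint address `https://dbpedia.org/sparql`; the function names are hypothetical, the resource name is assumed to be a simple one without characters that need escaping, and the request itself is deliberately not sent.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"  # assumed address of the public DBpedia endpoint


def birth_date_query(resource):
    """Build a SELECT query for the birth date of a DBpedia resource.

    `resource` is the local name of the resource, e.g. "Tim_Berners-Lee".
    """
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        f"SELECT ?birthDate WHERE {{ dbr:{resource} dbo:birthDate ?birthDate }}"
    )


def request_url(resource):
    """Encode the query as the `query` parameter of an HTTP GET request."""
    return ENDPOINT + "?" + urlencode({"query": birth_date_query(resource)})


if __name__ == "__main__":
    print(birth_date_query("Tim_Berners-Lee"))
    print(request_url("Tim_Berners-Lee"))
```

A client would fetch the resulting URL with an HTTP library of its choice and ask for a result format through the `Accept` header. The service never needs to inspect a Wikipedia page: knowing the ontology is enough to know where the birth date is stored.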

When the technique is applied not on the Semantic Web but on the scale of a business use case, it is usually called (Open) Linked Data. It is applied to share data both within and among companies in a certain domain of application, e.g., Smart Industry. The Smart Home industry, for example, uses the SAREF ontology, which defines messages between various smart objects.

Required reading

Optional reading

  • The Semantic Web Primer – Grigoris Antoniou, Paul Groth, Frank van Harmelen and Rinke Hoekstra

  • Nederlandse Parels van Linked Data Toepassingen 2019 – Platform Linked Data Nederland (link)

  • Tim Berners-Lee TED-talk (link)

When you are familiar with DL and OWL, the next goal is to learn how to query an ontology. SPARQL (SPARQL Protocol And RDF Query Language; indeed, the first letter of the abbreviation stands for the abbreviation itself!) is the W3C-standard query language for graph-like structures. SPARQL is flexible enough to ask Boolean questions, extract variables, or construct a new graph (see: Allemang11).
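The three kinds of question correspond to the SPARQL query forms ASK, SELECT and CONSTRUCT. The sketch below mimics them in plain Python over a tiny in-memory set of triples, restricted to a single triple pattern. It is not SPARQL and not an RDF library; the graph contents and the function names `match`, `ask`, `select` and `construct` are assumptions made for illustration.

```python
# A tiny graph of (subject, predicate, object) triples; all names are hypothetical.
GRAPH = {
    ("margherita", "type", "Pizza"),
    ("margherita", "hasTopping", "mozzarella"),
    ("hawaii", "type", "Pizza"),
    ("hawaii", "hasTopping", "pineapple"),
}


def match(pattern, graph=GRAPH):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in graph:
        binding = {}
        for wanted, actual in zip(pattern, triple):
            if wanted.startswith("?"):
                # A variable binds to the value, or must agree with an earlier binding.
                if binding.setdefault(wanted, actual) != actual:
                    break
            elif wanted != actual:
                break
        else:
            yield binding


def ask(pattern):
    """Boolean question: does at least one triple match the pattern?"""
    return any(True for _ in match(pattern))


def select(variable, pattern):
    """Extract the distinct values bound to one variable, sorted for readability."""
    return sorted({binding[variable] for binding in match(pattern)})


def construct(template, pattern):
    """Build a new graph by filling the template with each matching binding."""
    return {
        tuple(binding.get(term, term) for term in template)
        for binding in match(pattern)
    }
```

For example, `ask(("hawaii", "hasTopping", "pineapple"))` answers a Boolean question, `select("?p", ("?p", "type", "Pizza"))` extracts all pizzas, and `construct(("?p", "type", "DistastefulPizza"), ("?p", "hasTopping", "pineapple"))` derives a new graph that classifies every pizza with pineapple as distasteful.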

Required reading

Summary

Summarising this chapter, we consider it vital to develop an understanding of both the conceptual phase and the computational phase. The theory of DL concerns the former, whereas modelling in OWL can be considered to concern both. Concerning the computational phase, a working foundational knowledge of software architecture is necessary. The first few chapters of Introduction to Description Logic and the OWL pizza tutorial should be studied in parallel. The ideas can then be used in Semantic Web and Linked Data applications, the practices of which can be found in Semantic Web for the Working Ontologist.

For the resources below, either a link or a digital copy can be found at our DS-Wiki Knowledge page. You might also want to persuade your colleagues to hand over the physical copies of books that they "might have" in their possession.

Handbook of Description Logic. The introductory and general sections of this handbook, which is a collection of chapters by separate authors, have been superseded by the Introduction to Description Logic textbook. Some of the more technical chapters may be interesting if the reader requires detailed knowledge on that exact topic. We do not recommend taking up this book unless there is a specific interest in one of its chapters.

Introduction to Description Logic. The recent textbook. We consider this a very strong technical introduction to DL. It also has a chapter dedicated to the link between DL and OWL.

Introduction to Ontology Engineering. A recent textbook on ontology engineering by C. Maria Keet, who is rightly famous in the discipline. The book combines material from Introduction to Description Logic with a more hands-on engineering approach, to show how the theory is useful in the Semantic Web practice.

Handbook of Semantic Web technologies. Dedicated to the computational part of knowledge representation, linked data, and ontologies. We particularly recommend the SPARQL chapter, because the other books lack an introduction to SPARQL. The introductory chapters to RDF and OWL are inferior to those in Handbook and Introduction.

OWL-pizza tutorial. Great tutorial which does not just introduce the reader to Protégé, but also introduces any operator that may be used for ontology construction.

Ontology engineering with Ontology Design Patterns. A very useful though relatively unknown book. It provides the reader with insights into ontology engineering methods using design patterns, i.e., semi-foundational ontologies that lie between the foundational ontologies, such as the Unified Foundational Ontology (UFO) or General Foundational Ontology (GFO), and the subject domain ontology.

Ontological foundations for structural conceptual models. Guizzardi. An extensive survey of foundational ontologies and the starting point of UFO and OntoUML. May be a bit too much to read in its entirety. A summary is included in this ontology handbook.
