Chapter 5: Generic vs specific
Question: How generic/specific should properties/classes be?
Category: Generic vs specific
Problem owner: Robin, Linda and Roos
Date: 16 Jun 2020
Problem
Modelling is a continuous trade-off between what information should be included in the model and what can be abstracted away. Which concepts should become a class or a property, and which concepts do not fit in the model? The modeller needs to find a balance: generalising too much results in concepts that are too abstract to provide differentiation where necessary, whereas too much specialisation results in details that are hardly used, if ever, and only overcomplicate things. For example, if both a company and a person have an email address, should the model provide both with their own email address concept, or does it make sense to abstract this into a shared concept email address or even a more abstract address? What characteristics of the domain context indicate the proper answer for that particular situation?
Example use case: modelling a Covid-19 test passport
For the Covid-19 test passport, the driving scenario is that its owner can prove having tested negative in order to be allowed to enter, e.g., a country. The main modelling issue is which information is required to support the criteria on which access to the geographic area is granted. To that end, the passport needs multiple categories of information: 1. information about the test and the test result, 2. information about the tested person, and 3. information about the organisation that vouches for this result (this can be a laboratory, a hospital, a doctor, etc.).
In this use case you need personal information about a person (2) and information about an organisation (3). Assume that the most important information to identify a person is their name. Coincidentally, this applies to the organisation as well. For the modelling, you can then choose a single datatype property: hasName. As you learn more about your use case, you notice that the name of the person requires more specifics: it needs to be identical to the name in their passport, split into a given name and a family name. Organisations do not carry given and family names, which invalidates the initial design choice to share a single datatype property. For the purpose of identifying the person, it is a necessary criterion to differentiate between the hasGivenName and hasFamilyName properties, whereas for identifying the organisation it is sufficient to represent only the hasName property.
One can now argue whether the hasGivenName and hasFamilyName properties should be designed as subproperties of the hasName property or remain datatype properties in their own right. The answer is left as an exercise to the reader.
Our suggestion would be to look closely at the things in the domain that are referred to, and consider their equivalences as well as their differences. Are the former sufficiently strong to support the subproperty variant, or are the latter so significant that they justify separate concepts?
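To make the practical difference between the two variants concrete, the following is a minimal sketch in plain Python, not an ontology implementation. It assumes that hasGivenName and hasFamilyName are declared as subproperties of hasName, and it mimics the usual subproperty inference: a value asserted for a subproperty also counts as a value of the superproperty. The individuals person1 and lab1, their name values, and the helper functions are hypothetical and serve only as illustration.

```python
# Subproperty declarations (child -> parent); an assumption for this sketch.
SUBPROPERTY_OF = {
    "hasGivenName": "hasName",
    "hasFamilyName": "hasName",
}

# Asserted facts as (subject, property, value) triples; all values are made up.
TRIPLES = {
    ("person1", "hasGivenName", "Alex"),
    ("person1", "hasFamilyName", "Smith"),
    ("lab1", "hasName", "Example Lab"),
}


def with_super_properties(prop):
    """Return prop together with every property it specialises."""
    result = []
    while prop is not None:
        result.append(prop)
        prop = SUBPROPERTY_OF.get(prop)
    return result


def values(subject, prop):
    """Return the values of prop for subject, including values asserted
    through any subproperty of prop."""
    return sorted(
        value
        for (s, p, value) in TRIPLES
        if s == subject and prop in with_super_properties(p)
    )


print(values("person1", "hasName"))       # given and family name together
print(values("person1", "hasGivenName"))  # only the given name
print(values("lab1", "hasName"))          # the organisation's single name
```

With the subproperty variant, a generic question about names still reaches the person's given and family name. If the properties were modelled as unrelated datatype properties (an empty SUBPROPERTY_OF in this sketch), the same generic question would return nothing for the person, and every application would have to know about the specific properties.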
Discussed options
When competency questions (CQs) are defined, the level of detail for the concepts is implied by the CQs. If all competency questions can be answered, then the ontology contains sufficient and correct information. You can then assume that the domain has been modelled completely and at the right level of abstraction. For example, it should not be left to the individuals to add extra semantics afterwards, e.g., by having to infer from their values whether they are large or small. Their semantics must be apparent from the modelled concepts.
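The idea of using competency questions as a coverage check can be sketched as follows. This is a simplified illustration under stated assumptions: the model is reduced to a mapping from class names to property names, and each question simply lists the class/property pairs it needs. The questions and the properties hasResult and hasTestDate are hypothetical; they do not come from the actual Covid-19 test passport model.

```python
# A toy model: class name -> set of properties defined for that class.
MODEL = {
    "Person": {"hasGivenName", "hasFamilyName"},
    "Organisation": {"hasName"},
    "Test": {"hasResult"},
}

# Each competency question lists the (class, property) pairs needed to answer it.
COMPETENCY_QUESTIONS = {
    "What is the family name of the tested person?": [("Person", "hasFamilyName")],
    "Which organisation vouches for the result?": [("Organisation", "hasName")],
    "On which date was the test taken?": [("Test", "hasTestDate")],
}


def unanswerable(model, questions):
    """Return the questions that need a class/property pair missing from the model."""
    return [
        question
        for question, needs in questions.items()
        if any(prop not in model.get(cls, set()) for cls, prop in needs)
    ]


for question in unanswerable(MODEL, COMPETENCY_QUESTIONS):
    print("Model too generic or incomplete for:", question)
```

A question that cannot be answered points at a missing or too generic concept. The reverse check, looking for properties that no question ever needs, can hint at specialisation that is not motivated by the use case.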
If you don't have a clear motivation to make things more specific, then don't. For example, if a company and a person both have an email address and there is no extra need to distinguish between them, then sharing a property neither produces conflicting data nor forces the application to cope with their difference. Still, in many cases we experience that this distinction is difficult to substantiate. For classes, a rule of thumb for adding further distinctions seems to lie in whether or not incoming or outgoing properties exist for the class. On the other hand, the danger of overgeneralising is lurking. For example, at first sight sharing the same hasName property for both the name of a chemical atom and the name of a person seems acceptable; both are names and both can be represented as a string. However, for a particular application their semantics can be so different from each other that it is important to capture the difference explicitly, despite the cost of additional application complexity to cope with the difference. A useful question to clarify this is how the application is going to use the distinction between the names: is it only about displaying the name on the screen, or is it about drawing the conclusion that the name represents a chemical?
The modelling choices that are made in the initial phase of (drawing up) the conceptual ontology determine the further course of action. Therefore, choices made in modelling must take into account future expansion and application of the model. In addition, it is important to be consistent and to follow through on the choices made. This underlines the importance of the design principles chosen at the start, because modifying or reversing an earlier choice increases costs, uses resources and increases the risk of modelling errors.
Outcome
In short, there is no single clear approach to generic or specific modelling. The use case determines whether the modelling approach should be generic or more specific.
When in doubt about making some concepts more generic or specific, do not specialise. Leave it up to future discussions whether additional distinctions are required or not. This rule of thumb follows the agile principle to "decide as late as possible", and carries the same argumentation (see, e.g., here). The foundation for this principle lies in acknowledging that each and every decision reduces your solution space and, hence, your future flexibility and range of engineering choices.
For the FIT project, due to its technical characteristics it was necessary for every property to have exactly one domain and one range. This was a clear design constraint for the ontology and eventually led to more specific classes and properties. Overarching superclasses and superproperties were used as a necessary design pattern to accommodate this constraint. And although unrelated to the generalisation/specialisation discussion, the domain and range constraint seriously hampered the reusability of the resulting ontology.
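How a one-domain constraint pushes a model towards overarching superclasses can be sketched as follows. This is an illustration only and not the FIT project's implementation: the class names, the superclass Agent, and the helper functions are assumptions. Given the classes that use a property, the sketch looks for the most specific single class that covers all of them.

```python
# Class hierarchy (child -> parent); "Agent" is a hypothetical superclass.
SUPERCLASS_OF = {
    "Person": "Agent",
    "Organisation": "Agent",
}


def ancestors(cls):
    """Return cls followed by its superclasses, nearest first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS_OF.get(cls)
    return chain


def single_domain(classes):
    """Return the most specific class that covers every class in `classes`,
    or None when the hierarchy offers no common (super)class."""
    classes = list(classes)
    if not classes:
        return None
    for candidate in ancestors(classes[0]):
        if all(candidate in ancestors(cls) for cls in classes):
            return candidate
    return None


print(single_domain(["Person"]))                  # used by one class only
print(single_domain(["Person", "Organisation"]))  # needs the shared superclass
print(single_domain(["Person", "Test"]))          # no common class available
```

When no common class exists, the modeller either has to introduce a new superclass or split the property into more specific ones, which is the kind of pressure towards extra classes and properties described above.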