Chapter 4: Codelists
Question: How to create codelists in an ontology?
Problem owner: Robin
Date: 14 Jul 2020
Many data models have some predefined lists of possible values for a certain concept, referred to as codelists. Sometimes these codelists have predefined codes to pursue uniform use in a sector, or just to make explicit what the possible codes are regarding a certain situation. These codelists can be defined nationally or even internationally (think of codelists about currencies and countries), but can also be determined at sector level. But how can we best model these codelists in an ontology?
Several options on modelling codelists in ontologies were discussed. The options were applied to an example from the FIT project on modelling kinds of staffing positions, such as a secondment and a temporary staffing position. The options 1, 2, 3a and 3b are shown in the picture and will be described below.
Option 1 is about defining the types of staffing positions as subclasses of the StaffingPosition class. We agreed in the discussion that it is only useful to make such subclasses when specialisation applies, i.e., it is necessary to create some extra properties within the subclasses. If this is not the case, then option 2 seems the better option. In the picture there are only three subclasses, but are these subclasses complete? What if new subclasses emerge? Are all subclasses disjoint, or are subsubclasses necessary to realise the desired distinction? What about the number of subclasses; will these not run out of control with the consequence that this solution in the end won't produce the desired effect? In conclusion, subclassing is a mechanism on its own with its own proper use, and cannot be considered a proper implementation for codelists.
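To make option 1 concrete, here is a minimal sketch in plain Python, with RDF triples written as tuples instead of using an RDF library. The `ex:` names are hypothetical stand-ins for the FIT example; only the two kinds named in the text are used, since the third subclass in the picture is not named here.

```python
# Triples as (subject, predicate, object) tuples; all ex: names are illustrative only.
RDFS_SUBCLASS = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"

option1_triples = {
    # Each kind of staffing position is its own subclass.
    ("ex:Secondment", RDFS_SUBCLASS, "ex:StaffingPosition"),
    ("ex:TemporaryStaffingPosition", RDFS_SUBCLASS, "ex:StaffingPosition"),
    # A concrete position is typed directly with one of the subclasses.
    ("ex:position42", RDF_TYPE, "ex:Secondment"),
}


def kinds_of(graph, superclass):
    """Return the sorted names of the subclasses declared for the given superclass."""
    return sorted(s for s, p, o in graph if p == RDFS_SUBCLASS and o == superclass)


print(kinds_of(option1_triples, "ex:StaffingPosition"))
```

In this sketch the "codelist" exists only implicitly, as whatever subclasses happen to be declared, and every new code changes the class hierarchy itself. That is the concern raised above.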
Using class individuals, corresponding with option 2, was chosen as the best option. In the example this means that a separate StaffingPositionType class is created. It contains a 'flat' codelist of different codes as predefined individuals, which are linked to the StaffingPosition class by use of a new property. Using this option, adding new codes (as individuals) is easy and scalable, and clear restrictions can be made on the number of codes that may be used. It is possible to model option 2 as a separate SKOS ontology, although application of SKOS makes more sense with more complicated codelists than the one exemplified, where codes are hierarchical or otherwise comparable through, e.g., relations such as broader and narrower. Keep it as simple as possible, and do not prepare for more complicated future scenarios that might never emerge. That being said, the use of SKOS concepts and concept schemes has added value, if only for its broad application and clear approach.
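Option 2 can be sketched in the same style: plain Python tuples standing in for RDF triples. The property name `ex:hasStaffingPositionType` and the individual names are assumptions made for illustration; the text only says that "a new property" links the two classes.

```python
# Triples as (subject, predicate, object) tuples; all ex: names are illustrative only.
RDF_TYPE = "rdf:type"
HAS_TYPE = "ex:hasStaffingPositionType"  # hypothetical name for the new property

option2_triples = {
    # The codelist: a flat set of individuals of one codelist class.
    ("ex:secondment", RDF_TYPE, "ex:StaffingPositionType"),
    ("ex:temporary", RDF_TYPE, "ex:StaffingPositionType"),
    # A concrete position points at one code through the property.
    ("ex:position42", RDF_TYPE, "ex:StaffingPosition"),
    ("ex:position42", HAS_TYPE, "ex:secondment"),
}


def codes(graph, codelist_class):
    """Return the set of individuals declared as members of the codelist class."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == codelist_class}


def invalid_uses(graph, prop, codelist_class):
    """Return sorted (subject, value) pairs whose value is not a declared code."""
    allowed = codes(graph, codelist_class)
    return sorted((s, o) for s, p, o in graph if p == prop and o not in allowed)


print(invalid_uses(option2_triples, HAS_TYPE, "ex:StaffingPositionType"))
```

Adding a code is then a single extra triple declaring a new individual of `ex:StaffingPositionType`; neither the class hierarchy nor the property definition has to change.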
The last possible option discussed was option 3. This is about restricting the values of the range of a property. This option can be split into:
Option 3a: restricting the possible values of the range by specifying a value type with a limited value range, e.g., enumeration;
Option 3b: restricting the property using an (anonymous) class construction for each allowed type.
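As a rough illustration of option 3a only, the sketch below treats the range of a property as a closed enumeration of literal values and checks statements against it. It is plain Python, not OWL: the property name and the values are hypothetical, and an OWL reasoner would treat an out-of-range value differently from this simple validation check.

```python
# Option 3a, sketched: the range of a property is a closed set of allowed values.
# The property name and the values are illustrative only.
ALLOWED_RANGE = {
    "ex:staffingPositionKind": frozenset({"secondment", "temporary"}),
}


def violations(statements, allowed_range):
    """Return the (subject, property, value) statements whose value is outside the enumerated range."""
    return [
        (s, p, v)
        for s, p, v in statements
        if p in allowed_range and v not in allowed_range[p]
    ]


statements = [
    ("ex:position42", "ex:staffingPositionKind", "secondment"),
    ("ex:position43", "ex:staffingPositionKind", "freelance"),
]

print(violations(statements, ALLOWED_RANGE))
```

Here the list of codes lives inside the definition of the property's range, so extending the codelist means editing that definition.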
Regardless of which option is chosen, forward compatibility is something to think about when making a choice on modelling codelists. The choice made will affect future use. Therefore it is important to ask questions like: what will happen when more knowledge should be added? Does the chosen modelling structure allow for additions to the codelist? And although additional restrictions can be added, doing so adds complexity to the code. Moreover, a codelist remains a list, and acknowledging that characteristic explicitly, as in option 2, represents a more principled approach than the more implicit approach of value restrictions.
The result of the discussion is that option 2, using class individuals, is the best option in most cases. Option 1, using subclasses, might be useful when extra properties should be related to the subclasses.