3.1: Competency Questions
Introduction
Competency Questions are for ontology engineering what requirements are for software engineering. These "ontology requirements" collectively describe what we demand from the ontology; since this demand is not about functionality but about knowledge, the term "competency" is used instead. As in software engineering, competency questions, or CQs for short, also provide the baseline for the evaluation of the final product, the ontology. Applying CQs in your ontology engineering process thus structures the initial modelling phase and provides a clear benchmark afterwards.
If we take the ontology engineering viewpoint, the domain analyst / ontology engineer is often confronted with the following problems:
The added business value is unclear.
The domain expert does not grasp the ontology.
The domain expert cannot evaluate the ontology.
It is unclear when modelling is done: is the current model sufficient, or is a more detailed version required?
The end user doesn't know SPARQL.
It is unclear how to evaluate a conceptual model.
In all these cases Competency Questions contribute to a solution. The following properties of CQs address the corresponding problems:
CQs are formulated in natural language:
This helps to make the business value explicit: you can address it directly in your CQs (issue #1 above);
This makes them understandable for the domain expert. By their nature they also provide an interface to the ontology, avoiding confronting the domain expert with the "ugly logical details" of the ontology (issue #2 above);
The domain expert actively collaborates in formulating the CQs:
the answers to CQs are used as the baseline for ontology evaluation afterwards (issue #3 above).
CQs represent a sense of completeness:
Each CQ results in a modelette: a small model that must satisfy the CQ. Once the modelette satisfies the CQ, you are done engineering this CQ (issue #4 above);
Additional modelling is not necessary once all CQs are satisfied and their modelettes are integrated into the conceptual model (issue #4 above);
CQs can be easily translated into SPARQL, providing their modelettes are done:
The only thing that the end user needs to evaluate, are the CQ's. When these can provide the information that they require, their evaluation objective has been achieved (issues #5 & #6 above);
Introducing (test) data provides an opportunity for automated evaluation of the correctness and completeness of the conceptual model during the evaluation phase (issue #6 above).
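To make the CQ-to-SPARQL translation concrete, here is a minimal sketch for an atomic CQ from the pizza example elaborated in Section 3.2.2. Everything under the pz: prefix is an illustrative assumption, not an existing vocabulary:

```sparql
# Atomic CQ: "What are the toppings of this pizza?"
# pz:margherita and pz:hasTopping are hypothetical names.
PREFIX pz: <http://example.org/pizza#>

SELECT ?topping WHERE {
  pz:margherita pz:hasTopping ?topping .
}
```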
The following sections give more insight into the why, what, and how of Competency Questions.
Section 3.2.1: Why do we need Competency Questions?
Functional requirements
Competency questions enable the ontology engineering process to start from the functional requirements. Through their natural question-response formulation they act as a pure input-output mechanism, without the need to consider any internal design, representation, or other engineering constructs. This makes CQs a stable set of demands throughout the ontology engineering process. Consequently, during their individual implementation and collective integration into the conceptual model, they allow for iterative improvements, ad infinitum, functionally as well as extra-functionally (e.g., performance), while maintaining a consistent measure of functional correctness.
Type of validation
Competency Questions are a black-box validation method for the ontology. They observe the ontology from an input/output perspective and answer whether the ontology correctly responds to the questions we have about the subject domain of interest (DoI). The CQs can be fulfilled both by an ontology that strictly follows philosophical considerations, viz. a foundational ontology, and by one that does not, viz. a domain, task, or application ontology.
A set of Competency Questions constructs a scenario of tests that has to be fulfilled for the ontology to be considered verified. Other methods merely produce guidelines for ontology construction, which either yield engineering (quality) criteria or an iterative method aimed at improving the ontology without an end condition. Competency questions provide exactly that end condition, and above all, do so in terms of the ontology's purpose, i.e., providing an accurate description of a part of reality.
Promise of semi-automatic evaluation
Competency questions also hold the promise of semi-automatic evaluation. Ren et al. [Ren14] describe automatically checking the satisfiability and unsatisfiability of CQs, and simultaneously checking the existence of concepts and relations implicitly asserted through the linguistic presuppositions of a CQ. This, however, needs an additional mapping from concepts mentioned in the CQs to concepts defined in the ontology.
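As an impression of what such a check could look like: the CQ "What is the spiciness of this pizza?" presupposes that a pizza class and a spiciness property exist. [Ren14] uses reasoner-based checks; the ASK query below is a far simpler approximation, reusing the illustrative pz: names from the sketch above:

```sparql
# Does the ontology declare the concepts the CQ presupposes?
PREFIX pz:  <http://example.org/pizza#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

ASK {
  pz:Pizza     a owl:Class .
  pz:spiciness a owl:DatatypeProperty .
}
```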
Reducing complexity
Applying CQs enforces a divide-and-conquer approach. Where the complexity of the total domain makes it unclear where to start engineering, breaking it down into many atomic CQs simplifies the engineering process to the implementation of a single CQ as a single modelette, or the integration of a modelette into the conceptual model. Moreover, due to the three levels of abstraction into which CQs can be grouped (atomic, medium, business), the higher-abstraction CQs cannot be resolved without taking lower-abstraction CQs into account. This, too, provides guidance for the engineering process.
Structured & iterative engineering process
Each CQ represents a particular functional requirement as a knowledge demand, to be implemented as a distinct modelette. The total engineering work is defined by the backlog of CQs, and each CQ represents a single engineering task. This clearly marks completeness as well: completeness of a single modelette, and completeness of their collective integration into the conceptual model.
Representation follows domain vocabulary
CQs are formulated in terms that stem directly from the domain. Consequently, the model that responds to a particular CQ uses terminology that is easily recognisable for the domain experts.
Semantic conflicts emerge naturally
Vocabulary is not limited to a single CQ but is reapplied across CQs. Hence, concepts used in the modelette that responds to one CQ will be reused in other modelettes. Where a semantic conflict lives implicitly between the different uses across distinct CQs, it will emerge when the modelettes are integrated: the same concept turns out to be used or applied differently in the contexts of the two modelettes.
Versioning
Clearly, the granularity of the conceptual model is directed by the CQs. Therefore, the level of evolution can latch onto that very same granularity. CQs, then, are the natural choice for the level of versioning in ontology engineering, and the scope of revisions will primarily address changes that apply to a CQ.
Section 3.2.2: What are Competency Questions?
In this section we first give an example of some CQs in the pizzeria domain. Then we briefly describe which types of questions do not count as CQs. We conclude with a summary of the categorisation developed in the literature.
CQs define the input-output behaviour of an ontology as a black box. In theory it should be sufficient to have a graphical interface with a button for each CQ, such that the operator only has to press a button to extract the knowledge associated with that specific competency question. The operator then gets a digestible chunk of knowledge extracted from the huge set of triples, represented as a table: the columns identify the particular elements discerned by the question, and each row provides the data that comply with the question's constraints.
At the risk of overstressing the point, we repeat that a CQ represents a functional requirement. Let us give some examples from the pizza domain. Say we have extended the example pizza ontology of toppings and crusts with a pizza vendor concept that contains contact information such as addresses and a list of all pizzas the vendor sells. The suggested business case is to automatically extract special pizzas per city, in the sense that there may be some toppings or pizzas that are only sold by one vendor in the entire city, province, or even country. So, the ontology can construct some measure of uniqueness.
High-level competency questions describing the functional requirements of the ontology would be (see the SPARQL sketch below):
Which vendors in location X have a pizza P that has a topping T unique for that location?
Which pizza, if any, is sold by only one vendor in the entire country?
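A minimal SPARQL sketch for the second question, again with the illustrative pz: names:

```sparql
# High-level CQ: "Which pizza, if any, is sold by only one
# vendor in the entire country?"
PREFIX pz: <http://example.org/pizza#>

SELECT ?pizza WHERE {
  ?vendor pz:sells ?pizza .
}
GROUP BY ?pizza
HAVING (COUNT(DISTINCT ?vendor) = 1)
```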
Examples of competency questions: in the extended pizza ontology (with vendors), CQs can be categorised as follows:
Low or atomic-level:
What is the price of this pizza?
What are the toppings of this pizza?
What is the spiciness of this pizza?
In which city is this vendor located?
How many calories does this pizza have?
General feature: in the eventual ontology these CQs will probably model a property/object relation, and not much more. The low-level competency questions ask about the data. Ontologically: this level describes property/object relations. Use-case: it describes the data (a sketch follows below).
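Several atomic CQs about the same pizza can be bundled into a single query. A minimal sketch with the illustrative pz: names; the OPTIONAL clauses keep the query robust when a value is missing:

```sparql
# Atomic CQs bundled: price, spiciness, and calories of one pizza.
PREFIX pz: <http://example.org/pizza#>

SELECT ?price ?spiciness ?calories WHERE {
  pz:margherita pz:price ?price .
  OPTIONAL { pz:margherita pz:spiciness ?spiciness . }
  OPTIONAL { pz:margherita pz:calories  ?calories . }
}
```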
Middle or connectivity-level:
What is the most expensive pizza of a particular vendor?
Which topping is on only one pizza?
In which cities is pizza X being sold?
In which cities is a pizza sold that has only two toppings?
General feature: these CQs describe relations between several concepts along several property/object relations. Ontologically: this level describes relations among at least three concepts. Use-case: it combines information into meaningful wholes (see the sketch below).
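A middle-level sketch connecting vendor, pizza, and price (illustrative pz: names; pz:vendorV stands for a particular vendor):

```sparql
# Middle-level CQ: "What is the most expensive pizza of vendor V?"
PREFIX pz: <http://example.org/pizza#>

SELECT ?pizza ?price WHERE {
  pz:vendorV pz:sells ?pizza .
  ?pizza     pz:price ?price .
}
ORDER BY DESC(?price)
LIMIT 1
```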
High or business-level:
Which vendor in city Y has the cheapest Margherita pizza?
Which vendor in city Y has a unique topping?
Which city has a vendor that sells a pizza with toppings X, Y and Z?
What is the least used topping for a vendor V?
General feature: the high-level CQs are the questions that really show the added value of the ontology; in this case the added value is financial, but it can also be otherwise. Ontologically: these are not fundamentally different from middle-level CQs; it is only that the semantics of these questions make the answer useful for the end result. Use-case: these questions are exactly what has to be answered to deliver added value, and they describe exactly why an ontology is a useful tool in this use case (see the sketch below).
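A high-level sketch for the uniqueness question, which combines location, sales, and topping knowledge and uses negation to express "unique in that location" (illustrative pz: names):

```sparql
# High-level CQ: "Which vendor in city Y has a unique topping?"
PREFIX pz: <http://example.org/pizza#>

SELECT DISTINCT ?vendor ?topping WHERE {
  ?vendor pz:locatedIn pz:cityY ;
          pz:sells     ?pizza .
  ?pizza  pz:hasTopping ?topping .
  FILTER NOT EXISTS {
    ?other pz:locatedIn pz:cityY ;
           pz:sells     ?otherPizza .
    ?otherPizza pz:hasTopping ?topping .
    FILTER (?other != ?vendor)
  }
}
```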
CQs - what are they not?
Sometimes questions such as "should I model concept X as a datatype property or an object property?" are called CQs, but this is plain wrong: such questions are about the design of the ontology, not about the knowledge it should contain. Also, most why-questions are not CQs, because an ontology usually does not contain a notion of causality, making why-questions hard to answer. Moreover, why-questions are not fit for SPARQL, so they have no place in ontology engineering in any way, shape, or form; this holds regardless of their possible CQ-nature. The same goes for who-questions.
Categorisation of CQs
A relatively recent paper [Ren14] categorised CQs collected from a number of web sources. The CQs are classified on type of question, polarity, predicate arity, relation type, etc. The authors conclude with twelve CQ-patterns, of which the simplest can be further subdivided into eight patterns. All CQ-patterns are based on the class expressions, relations, or individuals occurring in the CQ. For example, Which X y's Z?, with X and Z concepts and y an (object) property; or What X has the y Z?, with X a concept, y an ordering (e.g., lowest), and Z a datatype property.
We should not see these CQ-patterns as a strict prescription of all allowed CQs; we may construct CQs that are not an instance of any of them if the business case asks for it. In this sense, we should see Ren's CQ-patterns as atomic CQs that we can combine to form the CQs that describe our use case. Our example CQ looks like an instance of CQ11, but may be even a little more complicated. So, these patterns can be used to automatically verify the ontology on a low level, but high-level CQs are needed to verify the ontology on the application level.
Finally, we would like to give some directions towards a new type of CQ that does not merely return datatype values but returns another set of ontology individuals. These new individuals are constructed based on knowledge extracted from the ontology. This type of CQ makes full use of SPARQL CONSTRUCT queries, rather than just SELECT or ASK queries.
Example in Knowledge Base Explanation Ontology
A typical example was used in the Knowledge Base Explanation ontology. Explainable Plasido creates a tree-like ontology instance that contains a lot of information on the derivation process, including several derivation sequences that are a by-product of Jena classes and Plasido's built-in rules. These derivation sequences are added to the ontology partly for completeness' sake and partly because it is easier to implement. Several competency questions need to inspect the explanation tree and can disregard quite a lot of the knowledge residing in it, but the overall tree structure needs to be retained. Therefore, the competency 'instruction' constructs a new tree containing the same overall hierarchy as the original, but with several intermediate nodes removed. It then returns a new graph pattern containing the tree, rather than just a set of datatype instances.
This type of CQ may more accurately be termed a Competency Instruction, Competency Construction (Question), or some other name. It may also be used to build a general SPARQL CONSTRUCT query for a business case where the ontology is not merely used for extraction, but where users also need to insert knowledge into the knowledge base. This procedure hides ontology fundamentals such as part-whole relations from the user, and allows a user to enter just the variables to be inserted into the SPARQL CONSTRUCT query.
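In the pizza setting, a construct-style CQ could collapse the intermediate pizza node into direct vendor-topping edges, analogous to pruning intermediate nodes from the explanation tree. A minimal sketch; pz:offersTopping is invented here purely for illustration:

```sparql
# Construct-style CQ: derive a direct vendor->topping graph,
# hiding the intermediate pizza node from the consumer.
PREFIX pz: <http://example.org/pizza#>

CONSTRUCT { ?vendor pz:offersTopping ?topping . }
WHERE {
  ?vendor pz:sells      ?pizza .
  ?pizza  pz:hasTopping ?topping .
}
```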
Section 3.2.3: How to produce domain-relevant Competency Questions?
CQs should be approached from a business perspective: each CQ should correspond to a (sub)question that a user would be likely to query the graph for. This approach can be taken top-down and bottom-up, where top-down starts with queries covering a large number of edges at once, whereas the bottom-up approach first covers queries selecting only one edge.
The optimal approach is to start from both directions and work towards the middle: we add the low-level queries that the high-level queries consist of, and we introduce further high-level queries in each evaluation iteration. The high-level approach gives the CQs that should be answerable in the completed product; the low-level CQs ensure that the ontology is correct on a detailed level. Let us again use the extended pizza ontology (with vendors) as an example. The high-level competency question still is:
Which vendors in location X have a pizza P that has a topping unique for that location?
Being able to answer this CQ assumes we have already established a link between vendor and location, between pizza and toppings, between pizza and vendor, etc. Some low-level CQs would therefore be:
Which pizzas P contain topping T?
Which vendors V reside in location L?
Which pizzas P are sold by Vendor V?
All three CQs describe the same pattern: they query the KB on whether it contains a concept that is linked through some object property to another concept. To verify such a CQ, three things should hold: 1) both concepts and the property should exist; 2) the property should be satisfiable; and 3) the negation of the property should also be satisfiable, so that we know the CQ is not trivially true. The sketch below approximates these checks.
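[Ren14] performs such checks with a reasoner at the schema level; over a knowledge base populated with test data, they can be roughly approximated with ASK queries. A sketch with the illustrative pz: names (three separate queries sharing the same prefixes):

```sparql
PREFIX pz:  <http://example.org/pizza#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# 1) Both concepts and the property exist (schema level):
ASK { pz:Pizza a owl:Class . pz:Topping a owl:Class .
      pz:hasTopping a owl:ObjectProperty . }

# 2) The property is satisfiable: some pizza has some topping.
ASK { ?p a pz:Pizza ; pz:hasTopping ?t . }

# 3) Not trivially true: some pizza-topping pair is not linked.
ASK { ?p a pz:Pizza . ?t a pz:Topping .
      FILTER NOT EXISTS { ?p pz:hasTopping ?t . } }
```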
If we add a price or calorie count to a pizza, we can construct further low-level CQs to extract those values:
How much does pizza P cost?
How many calories does pizza P have?
In both cases the CQ checks the existence of concept P and the existence of a datatype property cost/calories.
This last price CQ can be useful if we extend the high-level CQ to also include the price of that particular pizza. Or we could extend the query to ask for the cheapest pizza that has a topping unique for a type of location (e.g., city).
The low-level CQs are also important for test-driven development. [Ren14] describes a method that automatically checks whether low-level CQs are still satisfiable during the development of more complex high-level CQs. Whenever a change to the metamodel makes a rule unsatisfiable or trivially satisfied, the user is immediately notified.
Competency Questions in practice
When working with clients and domain experts, it can be hard to think up competency questions from scratch. As explained above, first the domain and the use of the ontology should be clear, and only then should competency questions be formed. But how do we do this? In this section we describe a process to help you set up competency questions. In addition, this process makes sure that you have valuable information to start building the ontology.
We can split the process of coming up with competency questions and gathering information into three steps: (1) divergence, (2) convergence, and (3) forming questions.
In step 1 we explore what the domain is about: what concepts come up when you think of this domain? We gather as much information as we can; you want the domain experts to overload you with information about their domain. This can be done in several ways, for instance by letting the experts write down everything they come up with on a whiteboard, making a mind map, or writing a story about their domain.
In step 2, we narrow this bulk of information down again. Here we want to find out what the most important information about the domain is. What is the core? Which terms and notions are essential? These questions can be answered in several ways: you can let the experts rank the concepts that came up in step 1 and keep only the highest-ranking ones, or you can give them the assignment to give a short elevator pitch about the domain. In this step you can also begin sorting the concepts. An easy first step is to categorise them into verbs and nouns or noun phrases; this way you have activities (verbs) and entities (nouns and noun phrases).
In step 3, you gather the converged set of concepts from step 2 and let them inspire questions. You can use the patterns described above to fill important concepts into questions, but you can also start free format: what are the questions that the domain experts have about their domain? Low-level questions can also lead to high-level questions; asking why the domain expert wants to know something is a way of getting to a higher-level question.
When you have completed these steps and ended up with multiple competency questions of different levels, you are almost ready to move forward in the conceptual phase. An important last step is to check whether these questions are satisfactory to the domain experts. Are they satisfied if all these questions can be answered through the ontology, or do they want more out of it? If the latter is the case, go back to step 3 or an earlier step, depending on how much more they want out of the ontology. If they are satisfied, you can move on to the other steps of the conceptual phase. Keep in mind that you might have to revisit the competency questions over time: as the ontology develops, new insights are gained that can lead to new or different competency questions.
Section 3.2.4: Competency questions during engineering
In the previous section we described how to arrive at a set of competency questions. In this section we provide an example with the ontology Cornelis developed for the V1713 'Achteraf is mooi wonen' project. We show how CQs can also have a function during OE, rather than just for stating requirements. The example shows the whole cycle of OE.
Example from project V1713 AIMW
Iterative starting process:
Describe intended use
The intended use concerned surveillance of people suspected of setting 5G masts on fire.
State additional requirements
There had to be enough data to support some kind of recommendation on additional information that might be useful given a fire or a given suspect.
Modules: we graphically linked the modules, as shown below.
News media item
Tweet / social media item
People
Infrastructure (e.g., 5G mast)
Process (i.e., possible reactions to a 5G mast fire)
Geography (locations)
Cars
Competency questions
For each module we defined competency questions, all divided into the three CQ levels.
Intermezzo: various stages of the infographic
Containing just the modules
Add links between modules that are related
Add relevant concepts per module
Add relations between the relevant concepts per module
Describe the relations briefly in natural language
Tag the modules with suggested reuse candidates
Tag the concepts with concepts from the reuse candidates
Tag relations with relations from the reuse candidates
Example continued: Role of competency questions
We arrived at a set of CQs, ordered per module. Take the tweet module, for example:
Atomic/low level:
What is the text of this message?
Which user posted this message?
When was this message posted?
From which location was this message posted?
To which article or message is this a reaction?
Connectivity/middle level:
Is the reaction positive or negative?
From which province was this message posted?
What are the keywords of the article that this message reacts to?
Business/high level:
Which messages by this user were posted within 1 km of a transmission mast?
How often per time interval does this user react to an article with a certain keyword? (see the sketch below)
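A rough sketch of the second business-level CQ, assuming the message creation date is recorded with dcterms:created and article keywords with a hypothetical aimw:keyword property; aimw:user42 is likewise an illustrative name:

```sparql
# Business-level CQ: "How often per month does this user react
# to an article with keyword '5G'?"
PREFIX sioc:    <http://rdfs.org/sioc/ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX aimw:    <http://example.org/aimw#>

SELECT ?month (COUNT(?msg) AS ?reactions) WHERE {
  ?msg  sioc:has_creator aimw:user42 ;
        sioc:reply_of    ?article ;
        dcterms:created  ?date .
  ?article aimw:keyword "5G" .
  BIND (SUBSTR(STR(?date), 1, 7) AS ?month)   # "YYYY-MM"
}
GROUP BY ?month
```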
Then we considered reuse for each ontology module. For several modules we found interesting candidates to reuse: for example, sioc for the tweet module, foaf for people, organisation for organisations, ma-ont for drone images, and possibly geonames for locations.
The next step is to consider how well each reuse candidate maps to the modules and, importantly, also to the competency questions. Let's take the CQs for the tweet module again. We annotated each CQ with a relation from the reuse candidate to check how it fits. You can also annotate a CQ with a newly created property that extends a property of the reuse candidate.
Functional requirements social media model (Twitter & Facebook) v0.1:
Atomic/low level:
Which Twitter messages are there? (rdf:type aimw:Tweet)
What is the text of this message? (sioc:content)
Which user posted this message? (sioc:has_creator)
When was this message posted? (sioc:createdAt)
From which location was this message posted? (geo:lat & geo:long, or geo:lat_long)
To which article or message is this a reaction/retweet? (sioc:reply_of > aimw:retweet aimw:Tweet)
On which internet platform was this message posted? (sioc:has_space)
To which other users is this message addressed? (aimw:mentionsUser sioc:UserAccount) [edit: sioc:addressed_to -> aimw:mentionsUser]
Which hashtags are used in this tweet? (aimw:containsHashtag)
[Where does this user come from? (aimw:roughLocation xsd:string)] [edit: roughLocation is in principle for userAccount, but still under discussion, see also below]
Connectivity/middle level:
Is the reaction positive or negative? (aimw:sentiment)
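These annotations translate almost mechanically into SPARQL. A minimal sketch covering the first three atomic CQs; the aimw: names are project-specific, while the sioc: names come from the SIOC vocabulary:

```sparql
# Atomic tweet CQs combined: which tweets exist, their text,
# and their creator, following the annotations above.
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX aimw: <http://example.org/aimw#>

SELECT ?tweet ?text ?creator WHERE {
  ?tweet a aimw:Tweet ;
         sioc:content     ?text ;
         sioc:has_creator ?creator .
}
```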
This also immediately shows which relations to model in the ontology. Each CQ becomes a single addition to the ontology.
This opens up many possibilities for automatic testing and verification, and opportunities to improve the OE workflow: each commit can be linked to a specific CQ or module, and each module can be interpreted as a separate ontology that may be reused in another project. In the next chapter we take a look at (semi-)automatic verification.
Literature
Falbo, R. de A. (2014). SABiO: Systematic Approach for Building Ontologies. In G. Guizzardi, O. Pastor, Y. Wand, S. De Cesare, F. Gailly, M. Lycett, & C. Partridge (Eds.), Proceedings of the 1st Joint Workshop ONTO.COM / ODISE on Ontologies in Conceptual Modeling and Information Systems Engineering, co-located with FOIS 2014 (p. 14). Rio de Janeiro, Brazil: CEUR-WS.org.
An extensive method for modelling ontologies that reserves a significant role for CQs.
Potoniec, J., et al. (2020). Dataset of ontology competency questions to SPARQL-OWL queries translations. Data in Brief, 29.
Description of a Competency Question data set containing, for several ontologies, a set of natural language-SPARQL tuples. This paper describes the data set; the Wiśniewski (2019) paper describes the analysis.
Ren, Y., Parvizi, A., Mellish, C., Pan, J. Z., Deemter, K. van, & Stevens, R. (2014). Towards competency question-driven ontology authoring. In V. Presutti, C. D’Amato, F. Gandon, M. D’Aquin, S. Staab, & A. Tordai (Eds.), European Semantic Web Conference - The Semantic Web: Trends and Challenges (ESWC 2014) (Vol. LNCS-8465, pp. 752–767). Anissaras, Crete, Greece: Springer Link. https://doi.org/10.1007/978-3-319-07443-6_50
An earlier paper on categorising CQs, with some ideas for automated evaluation of ontologies based on their CQs.
Wiśniewski, D., et al. (2019). Analysis of Ontology Competency Questions and their formalizations in SPARQL-OWL. Web Semantics: Science, Services and Agents on the World Wide Web, 59.
Companion paper to Potoniec (2020). This paper analyses the data set to find (linguistic) patterns in the kinds of CQs, to benefit CQ construction.