Wikidata:WikiProject Ontology/Problems
Problem cases in existing knowledge representation by Wikidata
The purpose of this page is to identify specific classes, metaclasses, or other groups of classes in Wikidata that appear to have significant ontological problems, and to work out how we should resolve them. Difficult problems should probably be discussed on their own subpage.
items used as classes
Statements with instance of (P31) should only have classes as values. This rule is expressed as a property constraint, so violations should be listed at Wikidata:Database_reports/Constraint_violations/P31.
subclass/instance of loops
Ontologies should generally be tree-like. That is, they should have a partial order: "instance of" should point from an item to a group (class) of items, while "subclass of" should point from a smaller class to a larger class. Treating the instance-of and subclass-of relationships as edges of a directed graph, the resulting graph should be acyclic: there should be no loops. Unfortunately a few loops have made it into Wikidata; this section documents and addresses them.
Subclass/Instance of loops in wikidata - autogenerated lists
- Wikidata:WikiProject Ontology/Problems/subclass of self
- Wikidata:WikiProject Ontology/Problems/subclass of subclass of self
- Wikidata:WikiProject Ontology/Problems/3rd-order subclass of self
- Wikidata:WikiProject Ontology/Problems/4th-order subclass of self
- Wikidata:WikiProject Ontology/Problems/5th-order subclass of self
- Wikidata:WikiProject Ontology/Problems/6th-order subclass of self
- Wikidata:WikiProject Ontology/Problems/instance of self
- Wikidata:WikiProject Ontology/Problems/subclass of instance of self
- Wikidata:WikiProject Ontology/Problems/instance of 2nd-order subclass of self
- Wikidata:WikiProject Ontology/Problems/instance of 3rd-order subclass of self
- Wikidata:WikiProject Ontology/Problems/instance of 4th-order subclass of self
- Wikidata:WikiProject Ontology/Problems/instance of 5th-order subclass of self
General looping instance/subclass combination
This SPARQL query could in principle catch all the general loop cases, but it times out on the current WDQS:
SELECT ?item WHERE {
  ?item (wdt:P31|wdt:P279)+ ?item .
}
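Because the general query times out, one option is to run the same check offline. The following is a minimal Python sketch, assuming the P31 and P279 statements have already been exported as (source, target) pairs; the function name and the toy data are hypothetical and do not reflect real Wikidata statements.

```python
from collections import defaultdict

def find_loop_members(edges):
    """Return the items that can reach themselves via one or more edges.

    `edges` holds (source, target) pairs, one per P31 or P279 statement.
    """
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)

    members = set()
    for start in list(graph):
        # Iterative depth-first search from the direct targets of `start`.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                members.add(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return members

# Hypothetical toy data, not real Wikidata statements.
edges = [
    ("A", "B"),  # A subclass of B
    ("B", "C"),  # B instance of C
    ("C", "A"),  # C subclass of A: closes a loop
    ("D", "A"),  # D points into the loop but is not part of it
    ("E", "E"),  # E subclass of itself
]
loop_members = find_loop_members(edges)
```

This runs one search per item, which is fine for a sample but slow on a full dump; a strongly-connected-components algorithm would likely be a better fit at that scale.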
spurious high-order metaclasses
Ontologies should generally not be very deep. If the instances of a class are themselves classes, the class is a "metaclass". First-order metaclasses are not unusual. Second-order metaclasses (whose instances are first-order metaclasses) should be quite rare. Higher-order metaclasses may be needed but should be extremely rare; variable-order metaclasses may also be helpful (some definition of "class" would be one) but should also be rare.
Problems with higher order metaclasses
A class can be identified as such by itself being a subclass of another class, by having another class as its subclass, or most directly by having instances. This provides three different mechanisms for detecting higher-order classes as well, as the following queries illustrate.
- Wikidata:WikiProject Ontology/Problems/3rd order metaclasses by subclass
- Wikidata:WikiProject Ontology/Problems/3rd order metaclasses by superclass
- Wikidata:WikiProject Ontology/Problems/3rd order metaclasses by instance
These are only looking at direct instance-of relationships up the hierarchy. The most general query along these lines would look like, for example:
SELECT DISTINCT ?item WHERE {
  ?metametaclass wdt:P31 ?item .
  ?metaclass wdt:P31/wdt:P279* ?metametaclass .
  ?class wdt:P31/wdt:P279* ?metaclass .
  ?otherclass wdt:P279 ?class .
}
However this times out in WDQS.
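As a rough offline counterpart to the query above, the sketch below walks the same pattern over local lists of (item, class) pairs for P31 and (subclass, superclass) pairs for P279. It is only an illustration under those assumptions: the function names and the toy items are hypothetical.

```python
from collections import defaultdict

def superclasses_or_self(item, p279_up):
    """All classes reachable from `item` over zero or more subclass-of edges."""
    found, stack = {item}, [item]
    while stack:
        for parent in p279_up.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def third_order_metaclasses(p31, p279):
    p279_up = defaultdict(set)
    for child, parent in p279:
        p279_up[child].add(parent)

    # Classes identified by having a direct subclass (?otherclass wdt:P279 ?class).
    level = {parent for _, parent in p279}
    # Two upward hops of "instance of, then any number of subclass of".
    for _ in range(2):
        level = {
            ancestor
            for item, cls in p31
            if item in level
            for ancestor in superclasses_or_self(cls, p279_up)
        }
    # Final direct instance-of hop (?metametaclass wdt:P31 ?item).
    return {cls for item, cls in p31 if item in level}

# Hypothetical toy data, not real Wikidata statements.
p279 = [("sub", "cls"), ("meta", "meta_parent")]
p31 = [
    ("individual", "cls"),
    ("cls", "meta"),
    ("meta_parent", "metameta"),
    ("metameta", "top"),
]
high_order = third_order_metaclasses(p31, p279)
```

Here only `top` is reported, since it is the only item sitting three instance-of levels above something that has a subclass.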
'concept'
concept (Q151885) comes up as a high-level metaclass in many subclass of/instance of trees; for example:
- champagne (Q134862) instance of wine (Q282) subclass of ... liquid (Q11435) instance of fundamental state of matter (Q15831576) subclass of ... state (Q3505845) instance of concept (Q151885)
but concept then has two more levels above it that cross the instance-of (metaclass) leap:
- concept (Q151885) subclass of mental representation (Q2145290) instance of symbol (Q80071) subclass of ... depicting object (Q1166770) instance of physical object (Q223557)
symbol (Q80071) itself also appears frequently near the top of the ontology trees. This should probably be cleaned up.
classes with too many subclasses
Classes can have millions of instances (human (Q5) being a typical example in Wikidata). But in order to be useful abstractions, the direct subclasses of a given class should be relatively limited in number, so that they form a reasonably understandable tree of groupings of whatever the class contains. A list of classes with more than 1000 direct subclasses is kept for this purpose (see the "too many subclasses" subpage under All Subpages).
Anti-patterns from Multi-Level Modeling Theory
See "Applying a Multi-Level Modeling Theory to Assess Taxonomic Hierarchies in Wikidata" by F. Brasileiro et al. This paper was discussed on English Project Chat in March 2016. It lists several specific anti-patterns to check for, with associated SPARQL queries:
Anti-pattern 1
An item is an instance of a class, but is also classified (perhaps via several intermediate classes) as a subclass of the same class. This often indicates that "instance of" has been used where "subclass of" makes more sense; alternatively it may mean the class in question should be considered a metaclass whose instances are classes. There are a lot of issues like this in Wikidata right now.
- Wikidata:WikiProject Ontology/Problems/instance and subclass of same class
- Wikidata:WikiProject Ontology/Problems/instance and subclass of subclass of same class
- Wikidata:WikiProject Ontology/Problems/instance and 3-level subclass of same class
- Wikidata:WikiProject Ontology/Problems/instance and 4-level subclass of same class
- Wikidata:WikiProject Ontology/Problems/instance and 5-level subclass of same class
- Wikidata:WikiProject Ontology/Problems/instance and 6-level subclass of same class
The most general form to find these problems is:
SELECT ?metaclass ?metaclassLabel (COUNT(*) AS ?count) WHERE {
  ?class wdt:P31 ?metaclass ;
         wdt:P279+ ?metaclass .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
} GROUP BY ?metaclass ?metaclassLabel ORDER BY DESC(?count)
but this general query times out. Some of the specific autogenerated lists may also be empty because of time-outs; when the queries work they all show many problems of this sort.
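The same pattern can be checked offline on a local export. This is a minimal sketch, assuming P31 and P279 statements are available as lists of pairs; the helper names and the color-themed toy data are hypothetical and not taken from live Wikidata.

```python
from collections import Counter, defaultdict

def strict_superclasses(item, parents):
    """Classes reachable from `item` over one or more subclass-of edges."""
    found, stack = set(), list(parents.get(item, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found

def antipattern_1_counts(p31, p279):
    """Count, per class, the items that are both instance and subclass of it."""
    parents = defaultdict(set)
    for child, parent in p279:
        parents[child].add(parent)
    counts = Counter()
    for item, cls in set(p31):
        if cls in strict_superclasses(item, parents):
            counts[cls] += 1
    return counts

# Hypothetical toy data, not real Wikidata statements.
p279 = [("red", "primary color"), ("primary color", "color"), ("blue", "color")]
p31 = [("red", "color"), ("blue", "color"), ("green", "color")]
counts = antipattern_1_counts(p31, p279)
```

In this toy data `red` and `blue` are flagged against `color`, while `green` is not, because it is only an instance.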
Anti-pattern 2
This is where a subclass C has two superclasses A and B that are related to one another by an instance-of relationship (B is an instance of A).
- Wikidata:WikiProject Ontology/Problems/pattern 2 direct superclasses
- Wikidata:WikiProject Ontology/Problems/pattern 2 indirect superclasses case 1
- Wikidata:WikiProject Ontology/Problems/pattern 2 indirect superclasses case 2
The general form for this query (which again times out) is:
SELECT ?classA ?classALabel (COUNT(*) AS ?count) WHERE {
  ?classC wdt:P279+ ?classA ;
          wdt:P279+ ?classB .
  ?classB wdt:P31 ?classA .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
} GROUP BY ?classA ?classALabel ORDER BY DESC(?count)
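An offline version of this check might look like the sketch below. It assumes local lists of P31 and P279 pairs; all names and the toy items are hypothetical.

```python
from collections import defaultdict

def strict_superclasses(item, parents):
    """Classes reachable from `item` over one or more subclass-of edges."""
    found, stack = set(), list(parents.get(item, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found

def antipattern_2_cases(p31, p279):
    """Return (C, A, B) triples where C is a subclass of A and B, and B is an instance of A."""
    parents = defaultdict(set)
    for child, parent in p279:
        parents[child].add(parent)
    instance_pairs = set(p31)
    cases = set()
    for c in list(parents):
        ups = strict_superclasses(c, parents)
        for b, a in instance_pairs:
            if a in ups and b in ups:
                cases.add((c, a, b))
    return cases

# Hypothetical toy data, not real Wikidata statements.
p279 = [("C", "A"), ("C", "B"), ("D", "A")]
p31 = [("B", "A")]
cases = antipattern_2_cases(p31, p279)
```

Grouping the resulting triples by their second element would give the per-class counts that the query above asks for.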
Also note this inconclusive RFC on color class relationships from 2016 (color (Q1075) is one of the classes appearing most often in these lists).
Disjointness issues tracking
- Some complex constraints track disjointness issues (see the documentation of the union/disjoint properties: P2738).
Anti-pattern 3
Conflicting instance-of relations: C is an instance of A and B, but B is also an instance of A. The following query would fetch these cases:
SELECT ?classA (COUNT(*) AS ?count) WHERE {
  ?classC wdt:P31 ?classA ;
          wdt:P31 ?classB .
  ?classB wdt:P31 ?classA .
} GROUP BY ?classA ORDER BY DESC(?count)
but again it times out. However, the paper mentioned above does list some specific cases to look into, and reports over 7000 cases in all:
Central Park (Q160409) is considered an instance of both urban park (Q22746) and park (Q22698), while urban park is also an instance of park. This anti-pattern often occurs in chains with terms such as: award (Q618779), Chinese surname (Q1093580), family name (Q101352), Voivodeship road (Q1259617), Mikroregion (Q11781066) and natural region (Q1970725).
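The Central Park case can be reproduced on a small local sample. The sketch below assumes P31 statements are available as (item, class) pairs; the three pairs simply restate the example as described above and have not been checked against current Wikidata, and the function name is hypothetical.

```python
from collections import defaultdict

def antipattern_3_cases(p31):
    """Return (C, A, B) triples where C is an instance of A and B, and B is an instance of A."""
    classes_of = defaultdict(set)
    for item, cls in p31:
        classes_of[item].add(cls)
    cases = set()
    for c, classes in classes_of.items():
        for b in classes:
            # Classes that both C and B are direct instances of.
            for a in classes & classes_of.get(b, set()):
                cases.add((c, a, b))
    return cases

# Statements restating the example in the text (not verified against live data).
p31 = [
    ("Q160409", "Q22746"),  # Central Park instance of urban park
    ("Q160409", "Q22698"),  # Central Park instance of park
    ("Q22746", "Q22698"),   # urban park instance of park
]
cases = antipattern_3_cases(p31)
```

Each triple names the item, the more general class A, and the intermediate class B, which makes it easier to decide which of the two instance-of statements should be dropped or changed.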
Other noted problems
From wikidata project chat April 20 2016: "SQID as a tool for editors"
- Classes that have large numbers of direct subclasses (say >300) and also have a small number of direct instances. These seem to indicate modelling issues in almost all cases (in the case of large numbers of both direct classes and instances, this is again the problem of items that are subclasses and instances of another class at the same time). Moreover, almost all cases where a class has more than 100 direct subclasses suggest that some more subclasses could be useful to hierarchically group things into smaller collections.
- Subclasses of Q5 that have an instance. You can see them in the class browser, or on the Q5 class page. Most of them should be changed, e.g., using occupation (P106).
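The first check in the list above can be approximated offline by counting direct subclasses and direct instances per class. This is only a sketch: the function name is hypothetical, the 300 threshold comes from the text, and the cut-off for "a small number" of instances is an assumption chosen for illustration.

```python
from collections import Counter

def suspicious_classes(p31, p279, min_subclasses=300, max_instances=5):
    """Classes with many direct subclasses but few direct instances.

    `max_instances` is an assumed cut-off; the text only says "a small number".
    """
    subclass_counts = Counter(parent for _, parent in set(p279))
    instance_counts = Counter(cls for _, cls in set(p31))
    return sorted(
        cls
        for cls, n in subclass_counts.items()
        if n > min_subclasses and instance_counts[cls] <= max_instances
    )

# Hypothetical toy data, with thresholds lowered so the effect is visible.
p279 = [(f"sub{i}", "big") for i in range(3)] + [("x", "small")]
p31 = [("i1", "big")]
flagged = suspicious_classes(p31, p279, min_subclasses=2, max_instances=1)
```

On real data the thresholds would need tuning, and the flagged classes still need a human look to decide whether intermediate subclasses are missing.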
Diamond-inheritance-like problems, and disjointness
Some disjointness statements entail diamond-inheritance problems. For example, in April 2024[1] a statement said that a vehicle is either an aircraft or a boat, but water-based aircraft (Q20035742) was both. How to resolve this is an open problem.
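A conflict of this kind can be detected mechanically once the disjoint groups are known. The sketch below assumes the subclass statements and the disjointness declarations have been exported into plain Python structures; the function names are hypothetical, and the data paraphrases the example above (assuming "was both" means a subclass of both) rather than quoting actual statements.

```python
from collections import defaultdict
from itertools import combinations

def strict_superclasses(item, parents):
    """Classes reachable from `item` over one or more subclass-of edges."""
    found, stack = set(), list(parents.get(item, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found

def disjointness_violations(p279, disjoint_groups):
    """Return (item, a, b) where item is a subclass of both a and b, declared disjoint."""
    parents = defaultdict(set)
    for child, parent in p279:
        parents[child].add(parent)
    violations = set()
    for item in list(parents):
        ups = strict_superclasses(item, parents)
        for group in disjoint_groups:
            for a, b in combinations(sorted(group), 2):
                if a in ups and b in ups:
                    violations.add((item, a, b))
    return violations

# Paraphrase of the example in the text, not actual Wikidata statements.
p279 = [
    ("water-based aircraft", "aircraft"),
    ("water-based aircraft", "boat"),
    ("aircraft", "vehicle"),
    ("boat", "vehicle"),
]
violations = disjointness_violations(p279, [{"aircraft", "boat"}])
```

Detecting the conflict does not say which side is wrong: either the disjointness statement is too strong or one of the subclass statements is.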
Mathematical and real-world objects mixed in the ontology
A surface can be a real-world object or a mathematical one; as of April 2024 the two are mixed together in the ontology[2]. How to resolve this is an open question.
All Subpages
- WikiProject Ontology/Problems/3rd-order subclass of self
- WikiProject Ontology/Problems/3rd order metaclasses by instance
- WikiProject Ontology/Problems/3rd order metaclasses by subclass
- WikiProject Ontology/Problems/3rd order metaclasses by superclass
- WikiProject Ontology/Problems/4th-order subclass of self
- WikiProject Ontology/Problems/5th-order subclass of self
- WikiProject Ontology/Problems/6th-order subclass of self
- WikiProject Ontology/Problems/Anti-pattern 1
- WikiProject Ontology/Problems/Anti-pattern 2
- WikiProject Ontology/Problems/High order metaclasses
- WikiProject Ontology/Problems/Loops
- WikiProject Ontology/Problems/instance and 3-level subclass of same class
- WikiProject Ontology/Problems/instance and 4-level subclass of same class
- WikiProject Ontology/Problems/instance and 5-level subclass of same class
- WikiProject Ontology/Problems/instance and 6-level subclass of same class
- WikiProject Ontology/Problems/instance and subclass of same class
- WikiProject Ontology/Problems/instance and subclass of subclass of same class
- WikiProject Ontology/Problems/instance of 2nd-order subclass of self
- WikiProject Ontology/Problems/instance of 3rd-order subclass of self
- WikiProject Ontology/Problems/instance of 4th-order subclass of self
- WikiProject Ontology/Problems/instance of 5th-order subclass of self
- WikiProject Ontology/Problems/instance of self
- WikiProject Ontology/Problems/instances of instances of physical object
- WikiProject Ontology/Problems/pattern 2 direct superclasses
- WikiProject Ontology/Problems/pattern 2 indirect superclasses case 1
- WikiProject Ontology/Problems/pattern 2 indirect superclasses case 2
- WikiProject Ontology/Problems/subclass of instance of self
- WikiProject Ontology/Problems/subclass of self
- WikiProject Ontology/Problems/subclass of subclass of self
- WikiProject Ontology/Problems/too many subclasses