Wikidata:WikiProject Ontology/Modelling

Ontological modelling with classes in Wikidata

TL;DR: Be careful when creating instance of (P31) and subclass of (P279) links for classes, including classes that do not currently have any members. One class can only be subclass of (P279) another if all members of the first class must be members of the second. When creating a class like watercraft (Q1229765) whose members are all individuals, the class should be instance of (P31) first-order class (Q104086571) if it does not have a superclass that already is.

The main items and properties in Wikidata that are used to structure the ontology are class (Q16889133), entity (Q35120), instance of (P31), and subclass of (P279).

Classes are those items that conceptually group together similar items, as human (Q5) groups together humans. The items in a class are known as its instances, and are explicitly related to the class via instance of (P31). Classes do not need to have many, or even any, instances in Wikidata, e.g., Honda Accord (Q463632) has few instances, if any, and quark (Q6718) has none. Classes do not need to have actual physical objects as instances, so unicorn (Q7246) and set (Q36161) are classes.

Classes are related to more-general classes using subclass of (P279), as human (Q5) subclass of (P279) person (Q215627). If a class is a subclass of another, then it is also a subclass of any more-general classes, so human (Q5) is a subclass of animal (Q729). It is not necessary to explicitly state these subclass relationships, so human (Q5) does not have animal (Q729) as a value for subclass of (P279) even though it is a subclass of animal (Q729).

Every item should be an instance of one or more classes, as Angela Merkel (Q567) instance of (P31) human (Q5). If an item is an instance of a class then it is also implicitly an instance of any more-general classes, so Angela Merkel (Q567) is an instance of person (Q215627). It is not necessary to explicitly state these instance relationships so Margaret Thatcher (Q7416) does not have animal (Q729) as a value for instance of (P31) even though it is an instance of animal (Q729).

entity (Q35120) is the class of all items, so all items are implicitly an instance of entity (Q35120) and all classes are implicitly subclasses of entity (Q35120). It is not necessary to explicitly state these relationships.

class (Q16889133) is the class of all classes, so all classes are implicitly an instance of class (Q16889133). Every item that is a value of instance of (P31) is a class. Every item that has a value for or is a value of subclass of (P279) is a class, so mathematical object (Q246672) is a class. It is thus not necessary for most classes to explicitly state that they are instances of class (Q16889133).
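The two rules above (a value of instance of (P31) is a class, and anything on either side of subclass of (P279) is a class) can be sketched in a few lines of Python. This is a minimal illustration over a hand-written list of triples taken from the examples in the text, not a live Wikidata extract; the `STATEMENTS` layout and the function name `implied_classes` are assumptions made for the sketch.

```python
P31, P279 = "P31", "P279"

# Toy (subject, property, value) triples based on examples in the text.
STATEMENTS = [
    ("Q567", P31, "Q5"),            # Angela Merkel instance of human
    ("Q5", P279, "Q215627"),        # human subclass of person
    ("Q463632", P31, "Q3231690"),   # Honda Accord instance of automobile model
]

def implied_classes(statements):
    """Collect every item that the statements imply is a class."""
    classes = set()
    for subject, prop, value in statements:
        if prop == P31:
            # Any value of instance of (P31) is a class.
            classes.add(value)
        elif prop == P279:
            # Both sides of subclass of (P279) are classes.
            classes.update((subject, value))
    return classes

print(sorted(implied_classes(STATEMENTS)))
```

In this toy data Honda Accord (Q463632) is not detected, because no statement in the list uses it as a value of P31 or on either side of P279. Detection of this kind only works for classes that actually take part in such statements.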

Classes can be instances of other classes, as Honda Accord (Q463632) is an instance of automobile model (Q3231690). metaclass (Q19478619) is the class of metaclasses (classes whose instances are all classes), so all metaclasses are implicitly instances of metaclass (Q19478619).

An item should normally not be both an instance of and a subclass of the same class. (So white (Q23444) should not be both a subclass and an instance of color (Q1075).) There are some exceptions, such as class (Q16889133) and metaclass (Q19478619). The instances of a class should normally not mix together groups of things and the things themselves and neither should the subclasses of a class. (So color (Q1075) should not have as subclasses both white (Q23444) and primary color (Q166902).)
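A check for the first guideline could look roughly like the following Python sketch. The triples are hypothetical and only reproduce the color example as an anti-pattern; they make no claim about what Wikidata currently states. The function name `instance_and_subclass_conflicts` is likewise an assumption. Only direct statements are compared, so conflicts that arise through inherited (implicit) relationships are not found.

```python
# Hypothetical triples reproducing the anti-pattern from the color example.
STATEMENTS = [
    ("Q23444", "P31", "Q1075"),    # white instance of color
    ("Q23444", "P279", "Q1075"),   # white subclass of color (the conflict)
    ("Q166902", "P279", "Q1075"),  # primary color subclass of color
]

def instance_and_subclass_conflicts(statements):
    """Return (item, class) pairs stated as both P31 and P279."""
    instance_pairs = {(s, v) for s, p, v in statements if p == "P31"}
    subclass_pairs = {(s, v) for s, p, v in statements if p == "P279"}
    return sorted(instance_pairs & subclass_pairs)

print(instance_and_subclass_conflicts(STATEMENTS))
```

Because class (Q16889133) and metaclass (Q19478619) are legitimate exceptions, a real check would presumably need an allow-list for such items.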

See below for a discussion of these guidelines.

Background

Classes (also known as concepts and sometimes types) form the backbone of most ontologies in computer science. The classes used are either part of an ontology language (as in Semantic Nets, Description Logics [1], and OWL [2]) or are defined on top of some lower level formal language (as in RDFS [3] or regular logics, e.g., Common Logic).

The basis for the class-instance relationship is the philosophical notion of a Type-token distinction; the intuition around this distinction seems clear, but coming up with a precise definition that meets the intuitive understanding is tricky and leads to further complications such as "occurrences" that seem to be neither type nor token but something of both.[4] Determining what class-instance relationships actually mean may depend on the specific discipline associated with the entity (physical, biological, geographic, cultural, linguistic, etc.), rather than on general theoretical grounds.[4] On the other hand, in some cases the multiple meanings associated with a given natural language term (the origin of most Wikidata items) may require apparently conflicting understandings of what that term represents, and splitting each such case into distinct entries would lead to an impractical explosion of items.[5]

The Cyc project bears some resemblance to Wikidata in trying to collect statements and properties on items within the scope of the entire body of human knowledge. An analysis of the class/metaclass hierarchy within Cyc by Doug Foxvog[6] demonstrates the likely need for both fixed- and variable-order metaclass levels, where a maximum of 4th-order (along the fixed-order organization) seemed sufficient.

What classes are

Classes bring together several related notions that help structure a view of the world.

Classes collect together a set of objects in the world (the set of instances of the class). For example, the class of bridges (bridge (Q12280)) includes 15 July Martyrs Bridge (Q4484) (Bosphorus Bridge) and Golden Gate Bridge (Q44440). Because classes form the backbone of the ontology, objects that are not instances of any class do not gain much advantage from the ontology.

Classes are related to other classes via generalization/specialization relationships. For example, the class of bridges (bridge (Q12280)) would be a generalization of the class of suspension bridges (suspension bridge (Q12570)) and a specialization of the class of architectural structures (architectural structure (Q811979)). If an object is an instance of a class (as Golden Gate Bridge (Q44440) is an instance of suspension bridge (Q12570)) and that class is a specialization of another class (as suspension bridge (Q12570) is a specialization of bridge (Q12280)) then the object is also an instance of the generalization (so Golden Gate Bridge (Q44440) is an instance of bridge (Q12280)).
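The inference described above can be sketched in Python as a walk along subclass links. The dictionaries below hold only the bridge statements mentioned in the text; their layout and the function names `superclasses` and `all_classes_of` are assumptions made for this illustration, not part of Wikidata.

```python
from collections import deque

# Toy statements taken from the bridge example, not a live Wikidata extract.
SUBCLASS_OF = {  # explicit subclass of (P279) statements
    "Q12570": ["Q12280"],    # suspension bridge -> bridge
    "Q12280": ["Q811979"],   # bridge -> architectural structure
}
INSTANCE_OF = {  # explicit instance of (P31) statements
    "Q44440": ["Q12570"],    # Golden Gate Bridge -> suspension bridge
}

def superclasses(cls):
    """Return every class reachable from cls by following P279 links."""
    seen = set()
    queue = deque(SUBCLASS_OF.get(cls, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(SUBCLASS_OF.get(current, []))
    return seen

def all_classes_of(item):
    """Return the explicit classes of item plus all their generalizations."""
    result = set()
    for cls in INSTANCE_OF.get(item, []):
        result.add(cls)
        result |= superclasses(cls)
    return result

print(sorted(all_classes_of("Q44440")))
```

Only the suspension bridge link is stated for Golden Gate Bridge (Q44440); bridge (Q12280) and architectural structure (Q811979) are derived.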

Classes can provide an intensional definition of their instances. For example, the class of suspension bridges could be defined as those bridges of suspension structural type.

Classes can provide a description of how information about their instances is described in the ontology. For example, the class of bridges could say that bridges have a location which is a geolocation, a structure type which is one of the structural types of bridges, and so on.

The instances of classes in an ontology do not need to be physical objects that exist in the real world. For example, colors can be instances of the class color (color (Q1075)) even though colors are related to human perception of light. Similarly, classes themselves can be instances of other classes (often called metaclasses or higher-order classes). For example, Honda Accord (Q463632) is a class, whose instances are actual physical cars (those made by the car manufacturer Honda with model designation Accord). Honda Accord (Q463632) is itself a car model, i.e., an instance of automobile model (Q3231690).

Because classes are so important to ontologies, mistakes in the setup of classes and their relationships to other classes have a large negative effect on the information represented using the ontology. For example, if bridge (Q12280) was incorrectly stated to be a specialization of building (Q41176), then all bridges would be incorrectly determined to be buildings.

There is a large difference between the instances of a class and its subclasses. The class of suspension bridges is not itself a bridge! Instead the class of suspension bridges has bridges as instances. This difference is easy to see here, but can be tricky in situations where the ultimate individuals (e.g., the Golden Gate Bridge) are not so easy to determine. (An easy way to distinguish between instances and subclasses is to ask yourself what you would count up if you wanted to know how many things belong to a class. If you wouldn't count it, then it is not an instance, but it could easily be a subclass. If you are uncertain what to count then you probably need to be more specific in what you want to be an instance of the class.)

Classes in Wikidata

The formal language for Wikidata [7] does not have any special facilities for defining classes. Instead, some core items and properties, class (Q16889133), entity (Q35120), instance of (P31), and subclass of (P279), have been created by the Wikidata community for use with classes. Other items, notably first-order class (Q104086571), second-order class (Q24017414), third-order class (Q24017465), variable-order class (Q23958852), and metaclass (Q19478619), can be used to further determine what kind of class is being defined.

Wikidata does not include information about every object in the world (WD:N), even for classes in Wikidata. For example, most humans are not in Wikidata, but human (Q5) is a class in Wikidata. Wikidata can thus naturally have classes that have no instances.

Wikidata is not limited to having information about actual physical objects. For example, Lassie (Q941640) is a fictional character. Classes are needed to describe these items, so instances of classes need not be actual physical objects. Similarly, set (Q36161) is a class of abstract objects.

There is nothing in Wikidata that depends on classes not being instances of other classes. There is thus no need to prevent classes in Wikidata from being instances of other classes.

There is no syntactic difference between classes in Wikidata and other items. Classes can only be determined by their participation in relationships that are reserved for classes. To ensure that all classes participate in such a relationship, it is usual to state that a class is an instance of class (Q16889133) even though it is possible to determine that an item is a class because it has or is a value of a subclass of (P279) statement. Similarly, to ensure that all classes of other classes can be recognized, it is usual to state that a metaclass is an instance of metaclass (Q19478619).

Class order in Wikidata

The Wikidata ontology has several metaclasses that provide information about the order of a class. A class is first-order if none of its members are classes; a class is second-order if all of its members are first-order classes; similarly for higher orders. first-order class (Q104086571) is the metaclass of all first-order classes. second-order class (Q24017414) is the class of all second-order classes. There is also third-order class (Q24017465), fourth-order class (Q24027474), fifth-order class (Q24027515), and fixed order class of higher order (Q24027526). fixed-order class (Q23959932) is the class of all fixed-order classes. variable-order class (Q23958852) is the class of all classes that do not have a fixed order.
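As a rough illustration of these definitions, the following Python sketch computes an order from a small membership table. Several simplifying assumptions are made that are not part of Wikidata: every key of `MEMBERS` is treated as a class, anything else as an individual (order 0), the listed members are taken to be representative, and the data has no cycles. The entries marked hypothetical are invented for the example.

```python
# Toy membership table: class -> its stated members (instances).
MEMBERS = {
    "Q12280": ["Q4484", "Q44440"],    # bridge: two individual bridges
    "Q463632": ["car-1"],             # Honda Accord: hypothetical individual car
    "Q3231690": ["Q463632"],          # automobile model: Honda Accord
    "mixed-class": ["Q44440", "Q12280"],  # hypothetical class mixing a bridge and a class
    "empty-class": [],                # hypothetical class with no stated members
}

def class_order(item, members):
    """Return 0 for an individual, n for a class of fixed order n,
    or None when no single order can be determined from the data."""
    if item not in members:
        return 0  # not a class in this table, so treated as an individual
    orders = {class_order(m, members) for m in members[item]}
    if len(orders) != 1 or None in orders:
        return None  # no members, mixed orders, or an undetermined member
    return orders.pop() + 1

for name in MEMBERS:
    print(name, class_order(name, MEMBERS))
```

In this table bridge (Q12280) and Honda Accord (Q463632) come out as first-order and automobile model (Q3231690) as second-order. The mixed class has no single order, which is the situation variable-order class (Q23958852) is meant to describe.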

Providing an order for a class, if it has one, provides useful information. For example, watercraft (Q1229765) is a first-order class because all its members are individual watercraft. As watercraft (Q1229765) is an instance of (P31) first-order class (Q104086571), there is information in Wikidata stating that all members of watercraft (Q1229765) are individuals and not classes. ship (Q11446) is also a first-order class in Wikidata, which can be inferred from it being a subclass of (P279) watercraft (Q1229765). Wikidata should prevent classes from being stated as members of watercraft (Q1229765), but it currently does not. Nonetheless, being a member of first-order class (Q104086571) is a useful signal to users as to what entities should not be made members of the class.
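Since Wikidata does not enforce this, a consumer could run a check of their own. The Python sketch below is one possible approach under stated assumptions: first-order status is inherited by subclasses (as in the ship example), and an item counts as a class if it is a value of P31 or appears in a P279 statement. The items `ship-1`, `ship-2`, and `ship-type-1` are hypothetical, as are the function names.

```python
FIRST_ORDER = "Q104086571"  # first-order class

# Toy triples; the "ship-..." items are hypothetical.
STATEMENTS = [
    ("Q1229765", "P31", FIRST_ORDER),   # watercraft is a first-order class
    ("Q11446", "P279", "Q1229765"),     # ship subclass of watercraft
    ("ship-1", "P31", "Q11446"),        # an individual ship: fine
    ("ship-type-1", "P31", "Q11446"),   # modelling error, see next line
    ("ship-2", "P31", "ship-type-1"),   # ship-type-1 has an instance, so it is a class
]

def first_order_classes(statements):
    """Classes stated as first-order, plus all their direct and indirect subclasses."""
    found = {s for s, p, v in statements if p == "P31" and v == FIRST_ORDER}
    changed = True
    while changed:  # repeat until no new subclass is added
        changed = False
        for s, p, v in statements:
            if p == "P279" and v in found and s not in found:
                found.add(s)
                changed = True
    return found

def is_class(item, statements):
    """True if item is a value of P31 or appears in a P279 statement."""
    return any(
        (p == "P31" and v == item) or (p == "P279" and item in (s, v))
        for s, p, v in statements
    )

def order_violations(statements):
    """Items that are classes but are stated as members of a first-order class."""
    first_order = first_order_classes(statements)
    return sorted(
        s for s, p, v in statements
        if p == "P31" and v in first_order and is_class(s, statements)
    )

print(order_violations(STATEMENTS))
```

Here `ship-type-1` is reported because it is stated as a member of ship (Q11446) while also having an instance of its own.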


Issues with classes in Wikidata

There are quite a few cases where instance of (P31) or subclass of (P279) are incorrectly used. See above for examples related to color, and see the Problems section for many more examples and some attempts at systematically tracking them.

Wikidata does not have any facilities for inference. This means that consumers of Wikidata information have to perform non-trivial queries to determine all the classes that an item is instance of and all the generalizations of a class. The alternative would be to explicitly have instance links to all the classes that an item is an instance of and subclass links to all the generalizations of a class. This would result in very many redundant links, but maybe there should be a bot that does this.
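One common form of such a query uses a SPARQL property path: one instance of (P31) step followed by zero or more subclass of (P279) steps. The Python sketch below only builds the query text and does not send it anywhere. The function name `all_classes_query` is an assumption, and the `wd:` and `wdt:` prefixes are assumed to be predefined by the endpoint, as they are on the Wikidata Query Service.

```python
import re

def all_classes_query(qid):
    """Build a SPARQL query for all explicit and inherited classes of an item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a valid item identifier: {qid!r}")
    return (
        "SELECT DISTINCT ?class WHERE {\n"
        # One P31 step, then zero or more P279 steps.
        f"  wd:{qid} wdt:P31/wdt:P279* ?class .\n"
        "}"
    )

print(all_classes_query("Q44440"))  # Golden Gate Bridge
```

The same path with `wdt:P279+` alone in place of `wdt:P31/wdt:P279*` would list all generalizations of a class rather than all classes of an item.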

There are several items in Wikidata that appear to be very similar to class (Q16889133), including class (Q217594) and class (Q5127848). Only class (Q16889133) is the class of all classes; the other items have different meanings and are not central to the Wikidata ontology.

Concepts and their names

taxon synonym (P1420) attempts to describe the relationships between names of concepts, but items in Wikidata generally describe underlying concepts rather than names. This tension will likely increase once Wikidata interfaces with Wiktionary.

References

  1. http://www.cambridge.org/us/academic/subjects/computer-science/programming-languages-and-applied-logic/description-logic-handbook-theory-implementation-and-applications-2nd-edition
  2. http://www.w3.org/2001/sw/wiki/OWL
  3. http://www.w3.org/TR/rdf-schema/
  4. http://plato.stanford.edu/entries/types-tokens/
  5. See the many examples presented for instance in Surfaces and Essences: Analogy as the Fuel and Fire of Thinking by Douglas Hofstadter and Emmanuel Sander (2013)
  6. https://www.researchgate.net/publication/231599269_Instances_of_Instances_Modeled_via_Higher-Order_Classes
  7. https://www.mediawiki.org/wiki/Wikibase/DataModel