User:Peter F. Patel-Schneider/ontology cleaning tasks

Here is an initial, incomplete list of tasks that would help improve the Wikidata ontology. Some of these are being addressed by members of the Ontology Cleaning Task Force.

Informal Description of the Wikidata Data Model

Status: Peter F. Patel-Schneider has talked to Lydia Pintscher about this. See also https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ontology#comprehensive_description_of_how_Wikidata_is_supposed_to_work, type of Wikidata property (Q107649491), the metaclass of properties, and fundamental Wikidata entity (Q115490628), a class of entities related to reasoning.

There is a page, https://www.wikidata.org/wiki/Wikidata:Data_model, describing how Wikidata is supposed to work, but it is somewhat dated and only one person was involved in writing it. A better description would be useful. Lydia Pintscher said that she might be able to have a technical writer improve a terse description.


Informal Description of the Wikidata Ontology

Status: Peter F. Patel-Schneider has talked to Lydia Pintscher about this

There does not appear to be a good informal description of how the Wikidata ontology should work. Such a description would give an overview of the Wikidata ontology and also list all the entities that are important for the Wikidata ontology and what they are supposed to mean. This list would include at least instance of (P31), subclass of (P279), subproperty of (P1647), class (Q16889133), first-order class (Q104086571), second-order class (Q24017414), third-order class (Q24017465), fourth-order class (Q24027474), fifth-order class (Q24027515), fixed-order class (Q23959932), variable-order class (Q23958852), is metaclass for (P8225), and metasubclass of (P2445).


Formal Description of the Wikidata Ontology

It would be useful to have a formal description to go along with the informal description above.
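As a hedged illustration only, a formal description might start from first-order rules such as the ones below. Here inst, sub, subprop, and metaFor stand for instance of (P31), subclass of (P279), subproperty of (P1647), and is metaclass for (P8225); holds(p, x, y) means that property p relates x to y; FirstOrder and Class stand for first-order class (Q104086571) and class (Q16889133). This notation is introduced just for the sketch, and the rules record one common reading of these entities rather than an agreed specification.

```latex
\begin{align*}
&\forall c, d, e.\; \mathit{sub}(c, d) \land \mathit{sub}(d, e) \rightarrow \mathit{sub}(c, e)\\
&\forall x, c, d.\; \mathit{inst}(x, c) \land \mathit{sub}(c, d) \rightarrow \mathit{inst}(x, d)\\
&\forall p, q, x, y.\; \mathit{subprop}(p, q) \land \mathit{holds}(p, x, y) \rightarrow \mathit{holds}(q, x, y)\\
&\forall m, c, x.\; \mathit{metaFor}(m, c) \land \mathit{inst}(x, m) \rightarrow \mathit{sub}(x, c)\\
&\forall c, x.\; \mathit{inst}(c, \mathsf{FirstOrder}) \land \mathit{inst}(x, c) \rightarrow \lnot\, \mathit{inst}(x, \mathsf{Class})
\end{align*}
```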


Additional ontology-related classes and properties

There is some important ontology-related information that cannot be stated in Wikidata. For example, it is not possible to state that two classes are disjoint (without creating a union class). It would be useful to describe important ontology facilities that are missing and propose a way of adding them to Wikidata.

See https://www.wikidata.org/wiki/User:Peter_F._Patel-Schneider/disjoint for an example proposal.


Tools to support the Wikidata Ontology

It would be useful to have tools that enforce the requirements arising from the use of entities that are important for the Wikidata ontology. For example, there appears to be nothing that enforces the requirements arising from the use of disjoint union of (P2738) or is metaclass for (P8225).

Tools could range from bots to additions to Wikibase software.
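As a hedged sketch of one check such a tool might run, the Python below looks for violations of is metaclass for (P8225) in a tiny in-memory graph. It assumes that "M is metaclass for C" means every instance of M should be a subclass of C, and that instance of (P31) propagates up subclass of (P279). The function names and the car example are hypothetical; a real tool would read statements from Wikidata rather than from Python dictionaries.

```python
# Hypothetical sketch: check "is metaclass for" (P8225) on a tiny in-memory graph.
# Assumption: "M is metaclass for C" is read as "every instance of M is a
# subclass of C". Labels stand in for Wikidata items; the data is made up.


def superclasses(cls, subclass_of):
    """Classes reachable from cls via subclass of (P279), including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance(item, cls, instance_of, subclass_of):
    """True if item is a direct instance of cls or of one of its subclasses."""
    return any(cls in superclasses(direct, subclass_of)
               for direct in instance_of.get(item, ()))


def metaclass_violations(metaclass_for, instance_of, subclass_of):
    """Return (metaclass, target, item) triples where item is an instance of
    the metaclass but is not (reflexively) a subclass of the target class."""
    problems = []
    for meta, target in metaclass_for:
        for item in instance_of:
            if (is_instance(item, meta, instance_of, subclass_of)
                    and target not in superclasses(item, subclass_of)):
                problems.append((meta, target, item))
    return problems


if __name__ == "__main__":
    instance_of = {"sedan": ["car type"], "hatchback": ["car type"],
                   "Fido": ["car type"]}  # "Fido" is a deliberate error
    subclass_of = {"sedan": ["car"], "hatchback": ["car"]}
    print(metaclass_violations([("car type", "car")], instance_of, subclass_of))
    # prints [('car type', 'car', 'Fido')]
```

A bot could run a check like this and report the flagged items rather than change them, since the fix may belong on either the instance of or the subclass of statement.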


Class Order

Status: Peter F. Patel-Schneider has been working on this

Wikidata has classes, such as first-order class (Q104086571), that can be used to state the order of classes. There are a great many violations of what these statements imply. Peter F. Patel-Schneider wrote a program that generates queries to find some of these violations; see https://www.wikidata.org/wiki/User:Peter_F._Patel-Schneider/order_violations for the current results. This program does not produce a complete list of violations and could be improved.

It would be useful to categorize and fix, or just fix, these violations.
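The following Python sketch illustrates two simple order checks. It is not the query-generating program mentioned above; it works on hypothetical in-memory data and assumes that an item counts as a class if it has instances or appears in a subclass of (P279) statement, that instances of a first-order class must not be classes, and that an item should be an instance of at most one fixed-order class. The function names are illustrative.

```python
# Hypothetical sketch: two class-order checks on a tiny in-memory graph.
# Assumptions (illustrative only):
# - an item "is a class" if it has instances or appears in a P279 statement;
# - instances of a first-order class must not be classes;
# - an item may be an instance of at most one fixed-order class.
# Only direct instance of (P31) statements are examined.

ORDER_CLASSES = ["first-order class", "second-order class", "third-order class"]


def classes_in(instance_of, subclass_of):
    """Heuristically collect every item that behaves like a class."""
    found = set()
    for targets in instance_of.values():
        found.update(targets)
    for sub, supers in subclass_of.items():
        found.add(sub)
        found.update(supers)
    return found


def order_violations(instance_of, subclass_of):
    """Return (kind, item, detail) tuples for each suspected order violation."""
    classes = classes_in(instance_of, subclass_of)
    members = {}  # class -> its direct instances
    for item, targets in instance_of.items():
        for target in targets:
            members.setdefault(target, set()).add(item)
    problems = []
    for item, targets in instance_of.items():
        orders = tuple(o for o in ORDER_CLASSES if o in targets)
        if len(orders) > 1:
            problems.append(("conflicting orders", item, orders))
        if "first-order class" in targets:
            for member in sorted(members.get(item, ())):
                if member in classes:
                    problems.append(("class instance", item, member))
    return problems


if __name__ == "__main__":
    instance_of = {
        "car": ["first-order class"],
        "my car": ["car"],
        "sedan": ["car"],  # deliberate error: sedan is a subclass, not an instance
        "car model": ["first-order class", "second-order class"],
    }
    subclass_of = {"sedan": ["car"]}
    for problem in order_violations(instance_of, subclass_of):
        print(problem)
```

The class heuristic matters: an item with no instances and no subclass of statements is treated as an individual, which can hide violations involving classes that are simply under-described.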


Disjointness

Status: Peter F. Patel-Schneider has been working on this

Wikidata has a property, disjoint union of (P2738), that states that classes are disjoint. Peter F. Patel-Schneider wrote a program that generates queries to find some of the violations of these disjointness statements. A report on the results is being generated.

It would be useful to categorize and fix, or just fix, these violations.
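As a hedged sketch of what checking disjoint union of (P2738) involves, the Python below treats the listed classes as pairwise disjoint and reports items that are instances of two of them (directly or through subclass of (P279)), as well as classes that are subclasses of two of them and are therefore necessarily empty. It is not the program mentioned above; the dictionary layout, the function names, and the number example are hypothetical.

```python
# Hypothetical sketch: find violations of pairwise disjointness implied by
# "disjoint union of" (P2738). disjoint_unions maps a class to the list of
# classes it is stated to be the disjoint union of. Data below is made up.
from itertools import combinations


def superclasses(cls, subclass_of):
    """Classes reachable from cls via subclass of (P279), including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(disjoint_unions, instance_of, subclass_of):
    """Return (kind, union, part_a, part_b, entity) tuples."""
    problems = []
    for union, parts in disjoint_unions.items():
        for a, b in combinations(parts, 2):
            # An item whose inherited types include both disjoint parts.
            for item, direct in instance_of.items():
                types = set()
                for cls in direct:
                    types |= superclasses(cls, subclass_of)
                if a in types and b in types:
                    problems.append(("instance", union, a, b, item))
            # A class below both parts can have no instances: usually an error.
            for cls in subclass_of:
                supers = superclasses(cls, subclass_of)
                if a in supers and b in supers:
                    problems.append(("subclass", union, a, b, cls))
    return problems


if __name__ == "__main__":
    unions = {"integer": ["even number", "odd number"]}
    instance_of = {"2": ["even number"], "3": ["odd prime"],
                   "7": ["odd number", "even number"]}  # "7" is a planted error
    subclass_of = {"odd prime": ["odd number"],
                   "even odd number": ["even number", "odd number"]}
    for problem in disjointness_violations(unions, instance_of, subclass_of):
        print(problem)
```

Separating "instance" from "subclass" results may help with categorization, since the two kinds of violation often have different causes.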


Autofix

Status: some people have ideas on what to do

Autofix is used to fix problems in the Wikidata ontology and to enforce certain kinds of constraints. See https://commons.wikimedia.org/wiki/File:DMD2023_-_A_better_way_to_enforce_a_data_model._Suggestions_to_improve_Autofix.pdf for a presentation on the problems with Autofix.

There are many potential topics in this area, ranging from characterizing what Autofix does to designing and implementing a replacement system.


Connections to other ontologies

Status: Andrea Westerinen is working on this

Wikidata has lots of connections to other ontologies.

In particular, there are links from Wikidata to many of the schema.org classes. It would be useful to check all these links for correctness and add links for at least the top several levels of the schema.org ontology.

There are also links to other ontologies and information imported from other ontologies. It would be useful to find and categorize these links and fix those that are incorrect.


Characterizing and potentially fixing unusual parts of the Wikidata ontology

Most of the Wikidata ontology is quite simple, with individuals being instances of classes and these classes being instances of metaclasses. But there are many places with violations of this simple ordering.

There are some classes that have a mixture of individuals and other classes as instances, for example color (Q1075) and disease (Q12136). Sometimes this is just the result of incorrect links, sometimes this is the result of bad modelling, and sometimes this is the result of good modelling. It would be useful to find these classes, determine what needs to be fixed, and make the fixes. Another place where the ordering breaks down is in the biochemistry domain.
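One way to find candidate classes of this kind is sketched below in Python. It assumes a heuristic definition of class (an item that has instances or appears in a subclass of (P279) statement) and uses made-up data; the function names are hypothetical. Flagged classes still need human review, since some mixtures are the result of good modelling.

```python
# Hypothetical sketch: find classes whose direct instances mix classes and
# individuals. An item counts as a class if it has instances or appears in a
# subclass of (P279) statement; this is a heuristic, not a Wikidata rule.


def classes_in(instance_of, subclass_of):
    """Heuristically collect every item that behaves like a class."""
    found = set()
    for targets in instance_of.values():
        found.update(targets)
    for sub, supers in subclass_of.items():
        found.add(sub)
        found.update(supers)
    return found


def mixed_classes(instance_of, subclass_of):
    """Map each class to (class members, non-class members) when it has both."""
    classes = classes_in(instance_of, subclass_of)
    members = {}  # class -> its direct instances
    for item, targets in instance_of.items():
        for target in targets:
            members.setdefault(target, set()).add(item)
    result = {}
    for cls, items in members.items():
        class_members = sorted(i for i in items if i in classes)
        plain_members = sorted(i for i in items if i not in classes)
        if class_members and plain_members:
            result[cls] = (class_members, plain_members)
    return result


if __name__ == "__main__":
    instance_of = {"red": ["color"], "crimson": ["color"],
                   "blue": ["color"], "Tibbles": ["cat"]}
    subclass_of = {"crimson": ["red"]}
    print(mixed_classes(instance_of, subclass_of))
```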

There are loops in the subclass hierarchy. Most or all of these are errors. They should be tracked down and addressed, with the errors fixed and the correct exceptions noted.

There are loops in the instance hierarchy. Most of these are errors. They should be tracked down and addressed, with the errors fixed and the correct exceptions noted.
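Both kinds of loops can be found with a standard strongly-connected-components search. The Python sketch below applies Tarjan's algorithm to a hypothetical dictionary that maps each item to the items it points to via subclass of (P279) or, in a separate run, via instance of (P31); it reports every component that contains a cycle. The function name and data are illustrative assumptions.

```python
# Hypothetical sketch: report loops in a hierarchy using Tarjan's
# strongly-connected-components algorithm. edges maps each item to the items
# it points to (e.g. via subclass of (P279), or separately via instance of (P31)).


def find_loops(edges):
    """Return sorted lists of items forming loops; a single item is reported
    only if it points to itself."""
    index, low = {}, {}
    stack, on_stack = [], set()
    loops = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in edges.get(node, ()):
            if succ not in index:
                visit(succ)
                low[node] = min(low[node], low[succ])
            elif succ in on_stack:
                low[node] = min(low[node], index[succ])
        if low[node] == index[node]:
            # node is the root of a strongly connected component: pop it.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1 or node in edges.get(node, ()):
                loops.append(sorted(component))

    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return sorted(loops)


if __name__ == "__main__":
    subclass_of = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"], "E": ["E"]}
    print(find_loops(subclass_of))  # prints [['A', 'B', 'C'], ['E']]
```

This recursive version is adequate for small extracts; for the full Wikidata subclass hierarchy an iterative variant, or a SPARQL property-path query such as one matching `?x wdt:P279+ ?x`, would avoid Python's recursion limit.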



Fixing problems in the Wikidata ontology

There is also a need to simply go through the Wikidata ontology, looking for incorrect links and fixing any that are found.