User:Peter F. Patel-Schneider/cleaning welcome

Thank you for your interest in the Wikidata ontology and its problems. Here is a larger account of the Wikidata ontology and recent efforts related to it for your information. This starts with the basics, mostly to provide a basis for the later parts of the account. If you are interested in working on the Wikidata ontology you might consider following the pointers in this account and looking at some of the information there. If you have any questions about this account you can contact me.


As in RDF, the Wikidata ontology is stated using regular predicates, notably instance of (P31) and subclass of (P279). This has some advantages over using special facilities (as Description Logics do) but also some disadvantages. A major disadvantage is that support for the Wikidata ontology is not part of the core Wikidata system but has to be added on to it. A related disadvantage is that ontology-related predicates can be added at any time, even without any support from the core system.

The Wikidata system does not provide much support for the ontology, not even for instance of (P31) and subclass of (P279). As a result it is all too easy to add ontology information to Wikidata in a way that should be at least flagged. For example, the Wikidata ontology has had inappropriate instance of (P31) and subclass of (P279) loops in it. As well, when adding ontology links the relevant portions of the resultant ontology are not shown. This results in subclass of (P279), and also instance of (P31), chains that are incorrect, as new links are often made without consideration of the new ontology ancestors of a class. Because edits to the Wikidata ontology have been made by many people, with varying degrees of ontology expertise, the lack of support has had a large impact on the quality of the Wikidata ontology.
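As a rough illustration of the kind of check that could flag such loops, here is a minimal sketch in Python. It assumes the subclass of (P279) links have already been extracted into a plain dictionary; the function name `find_subclass_loop` and the class names "A", "B" and "C" are hypothetical and chosen only for this example. Nothing like this is part of the core Wikidata system.

```python
from typing import Dict, List, Optional

def find_subclass_loop(superclasses: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one loop in a subclass-of graph, or None if there is none.

    `superclasses` maps each class to its direct superclasses
    (hypothetical input format, e.g. built from P279 statements).
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str, path: List[str]) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in superclasses.get(node, []):
            parent_state = state.get(parent, UNSEEN)
            if parent_state == IN_PROGRESS:
                # The parent is already on the current path, so we found a loop.
                return path[path.index(parent):] + [parent]
            if parent_state == UNSEEN:
                loop = visit(parent, path)
                if loop is not None:
                    return loop
        path.pop()
        state[node] = DONE
        return None

    for start in list(superclasses):
        if state.get(start, UNSEEN) == UNSEEN:
            loop = visit(start, [])
            if loop is not None:
                return loop
    return None

# Toy data only: A is a subclass of B, B of C, and C (wrongly) of A.
toy = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_subclass_loop(toy))
```

This is a plain depth-first search and uses recursion, so it is only suitable for small extracts; a check over the full ontology would need an iterative version or a query against a SPARQL service.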

One problem with analysis of the Wikidata ontology has been that Wikidata is stored and edited as a MediaWiki instance so there is no direct query interface. There are some tools available to visualize bits of the ontology but they are not used as much as they could be. Querying the Wikidata ontology is generally done using the Wikidata Query Service (WDQS) at https://query.wikidata.org/, which uses Blazegraph against an RDF dump of Wikidata. The problem is that Blazegraph struggles with ontology queries against Wikidata and many queries time out. Fortunately, querying against a recent dump of Wikidata can be done using QLever at https://qlever.cs.uni-freiburg.de/wikidata/ and this service can answer many queries that time out against the WDQS. Also, it is easy to set up a Wikidata query service using QLever yourself, using only a somewhat powerful desktop machine.
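To give an idea of what an ontology query looks like, here is a small sketch in Python that builds a SPARQL query for all the classes above a given class via subclass of (P279), and a request URL for it following the standard SPARQL protocol. The function names are hypothetical, `Q146` is just an arbitrary example identifier, and the sketch does not send the request. The WDQS endpoint is used here; the address of a QLever endpoint would have to be passed in instead and is not assumed here.

```python
import re
from urllib.parse import urlencode

# SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def superclass_query(qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for the transitive superclasses of a class."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT DISTINCT ?ancestor WHERE {\n"
        # P279+ follows one or more subclass-of links.
        f"  wd:{qid} wdt:P279+ ?ancestor .\n"
        "}\n"
        f"LIMIT {limit}"
    )

def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL (SPARQL protocol `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})

query = superclass_query("Q146", limit=10)
print(query)
print(request_url(WDQS_ENDPOINT, query))
```

The same query text can also be pasted directly into the web interface of either service, which is usually the easier way to start.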


The problems with the Wikidata ontology have been known for quite some time and a Wikidata project - https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology - was formed to assist in guiding the development of the ontology. This project was quite active for a few years but has not been very active recently. A problem with this project is that it does not appear to have created a clear statement of how the Wikidata ontology is supposed to be structured. In the project's pages there are quite a few descriptions of supposed issues in the Wikidata ontology but some of these are not current. An interesting bit of work would be to go through all the pages and bring them up to date.

Recently there have been attempts to investigate problems in the Wikidata ontology, most associated with recent Wikidata Data Quality days. See "Overview of ontology issues" in https://www.wikidata.org/wiki/Wikidata:Events/Data_Quality_Days_2021 for a description of some of the problem types in the Wikidata ontology. See "How do we deal with concurrent uses of different properties?" in https://www.wikidata.org/wiki/Wikidata:Events/Data_Quality_Days_2022 for a description of a particular problem with Wikidata that is related to the Wikidata ontology. A survey was developed to ask Wikidata users what they thought were important problems with the Wikidata ontology. See https://commons.wikimedia.org/wiki/File:Wikidata_ontology_issues_%E2%80%94_suggestions_for_prioritisation_2023.pdf for a presentation of the results of the survey. There is considerable discussion on the survey at https://www.wikidata.org/wiki/Wikidata_talk:Ontology_issues_prioritization. There is also a presentation on the results of the survey at WikidataCon 2023 - https://commons.wikimedia.org/w/index.php?title=File:WikidataCon_2023_Ontology_issues_in_Wikidata_-_Everything_in_neat_and_tidy_boxes_Not_quite!.pdf. At https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023 there are several relevant presentations, particularly https://commons.wikimedia.org/wiki/File:Wikidata_Challenges_in_Semantic_Web_Community.pdf and https://commons.wikimedia.org/wiki/File:DMD2023_-_A_better_way_to_enforce_a_data_model._Suggestions_to_improve_Autofix.pdf.


Even with all these activities it does not appear that there were any large efforts to fully investigate problems with the Wikidata ontology, to build tools to better support the Wikidata ontology, or to fix current problems with the Wikidata ontology. A few of us have started a task force to try to clean up parts of the Wikidata ontology. The task force has a main page at https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Cleaning_Task_Force and has had a few meetings, listed in the calendar at https://calendar.google.com/calendar/u/0/embed?src=90c4d393de6b606fdb90ba6da1f7c1cc7afdd084df7776c02e55170620807843@group.calendar.google.com. Participants in the task force are looking at problems with the upper part of the Wikidata ontology, correspondence between the Wikidata ontology and schema.org, disjointness in the Wikidata ontology, and violations of class order in the Wikidata ontology. The task force is mostly interested in domain-independent aspects of the Wikidata ontology, and not so much interested in particular domains in the Wikidata ontology. Feel free to join the task force or just sit in on some meetings.

There are many useful tasks in cleaning and improving the Wikidata ontology, ranging from theoretical analyses to techniques for reasoning to implementation of tools to direct editing.