Wikidata:WikidataCon 2019/Program/Sessions/Barriers to Using Wikidata as a Knowledge Base

ID: SUB-101 Barriers to Using Wikidata as a Knowledge Base
Speaker(s): Peter F. Patel-Schneider Timeblock: tb-saturday Start: 14:30 Slides: Barriers to Using Wikidata as a Knowledge Base.pdf
Room: Einstein Duration: 25min

At first glance, Wikidata appears to be an excellent source of general background knowledge: it is large, generally high in quality, includes a large ontology, and can be freely used. In practice, however, it is difficult to use Wikidata as is, or even with minor modifications, as a general-purpose knowledge base.

First, there are factual errors in Wikidata, such as the incorrect identification of items. These are unavoidable in a large information source, but reducing their number would be useful. There are also ontological errors, including confusion between instances and subclasses. Ontological errors cause severe problems when Wikidata is used as a knowledge base, because a single error generally affects a large amount of information.
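To illustrate why a single ontological error spreads so widely, here is a minimal sketch in Python. It assumes a toy data model in which items are linked to classes by "instance of" statements and classes to broader classes by "subclass of" statements; the item names, class names, and helper functions are all hypothetical and do not come from Wikidata itself. One class-level statement that should have been "instance of" is entered as "subclass of", and every instance below it inherits the wrong class.

```python
def superclasses(cls, subclass_of):
    """Return cls plus every class reachable through subclass-of edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def inferred_classes(item, instance_of, subclass_of):
    """Return every class an item belongs to, directly or by inheritance."""
    result = set()
    for cls in instance_of.get(item, ()):
        result |= superclasses(cls, subclass_of)
    return result


# Hypothetical toy data: two items that are instances of "dog".
instance_of = {
    "Rex": {"dog"},
    "Fido": {"dog"},
}

# Intended class hierarchy.
correct_subclass_of = {
    "dog": {"mammal"},
    "mammal": {"animal"},
}

# The same hierarchy with one error: "mammal" is stated to be a
# subclass of "taxon" (it should be an instance of "taxon").
erroneous_subclass_of = {
    "dog": {"mammal"},
    "mammal": {"animal", "taxon"},
}

# Every item below the erroneous edge is now wrongly classified as a taxon.
affected = sorted(
    item
    for item in instance_of
    if "taxon" in inferred_classes(item, instance_of, erroneous_subclass_of)
)
```

In this toy setting, one wrong edge near the top of the hierarchy misclassifies every item beneath it, which is the sense in which one ontological error affects much information.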

The ontology in Wikidata is very complex and not well organized, particularly in its upper levels. Several related classes can exist side by side, with different classes being used in different areas of Wikidata. The same holds for properties: there are multiple related properties, again with different ones being used in different areas. To exploit Wikidata information in a knowledge base, consumers need to know which class or property is used in each area, which limits general-purpose use.

Wikidata has no formally defined meaning. Consumers need to guess the intent of many classes and properties from very limited descriptions, which is particularly problematic for temporal qualifiers. A stronger formal theory would provide the basis for better guidance to contributors on how to enter new information in Wikidata.

The constraint mechanism does help to identify several kinds of errors. However, constraints are very weak, as they only point out potential problems. Some mechanism to exclude constraint violations would be useful. Further, the poor organization of the ontology makes it difficult to write good constraints. Constraints that are prescriptive, and that can actually affect Wikidata, would aid its use as a knowledge base.
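The difference between a constraint that only points out a potential problem and one that is prescriptive can be sketched as follows. This is an illustration under stated assumptions, not a description of how Wikidata implements constraints: statements are modelled as plain (subject, property, value) tuples, the check is a simple single-value rule, and the function names, exception class, and sample data are hypothetical.

```python
class ConstraintViolation(Exception):
    """Raised when a prescriptive constraint rejects a statement."""


def single_value_violations(statements, prop):
    """Return subjects that have more than one distinct value for prop."""
    values = {}
    for subject, p, value in statements:
        if p == prop:
            values.setdefault(subject, set()).add(value)
    return {s: v for s, v in values.items() if len(v) > 1}


def add_advisory(statements, new, prop):
    """Advisory behaviour: accept the statement and report any violations."""
    updated = statements + [new]
    return updated, single_value_violations(updated, prop)


def add_prescriptive(statements, new, prop):
    """Prescriptive behaviour: refuse a statement that causes a violation."""
    updated = statements + [new]
    violations = single_value_violations(updated, prop)
    if violations:
        raise ConstraintViolation(f"{prop!r} must be single-valued: {violations}")
    return updated


# Hypothetical sample data: one item with a date of birth already recorded.
existing = [("item-1", "date of birth", "1900-01-01")]
conflicting = ("item-1", "date of birth", "1901-01-01")

# The advisory path stores the conflicting value and merely flags it.
stored, flagged = add_advisory(existing, conflicting, "date of birth")
```

With the advisory function the conflicting statement ends up in the data and consumers must deal with it; with the prescriptive function it never enters the data in the first place.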

What is most needed to make Wikidata more useful as a source of general information, however, is a tightening of its ontological modelling, including at least a much better description of its major classes and properties and of how information combines in it. A formal theory for Wikidata data could then be developed, permitting strong tools that can find and fix errors in Wikidata and tie together information from different parts of Wikidata. Such tools could also discover information that is implicit in Wikidata and make it available for use.

Hopefully, a group of Wikidata contributors will form to produce better ontological modelling in Wikidata and produce a formal theory for Wikidata. To help, the Wikidata community could set up mechanisms to encourage fixing problems in Wikidata information over just adding new information.

Type: Presentation
Keywords: Ontology, data quality, Future of Wikidata
Notes: #WikidataCon2019_SUB-101
People planning to attend:
  1. --WiseWoman (talk) 14:39, 9 September 2019 (UTC)
  2. --[[kgh]] (talk) 15:29, 21 October 2019 (UTC)
  3. ...
Next session in this room: Surviving marriage using Wikidata