User:GerardM/DBpedia for Quality
Wikidata and DBpedia are both very similar and very different. Both rely on the data in Wikipedia, but the way they do this is distinctly different. Exactly because of this difference, DBpedia can be leveraged to improve the quality of the Wikipedias, of Wikidata and of DBpedia itself. The aim is to provide a tool that lets people concentrate on the differences between Wikidata and the DBpedias.
As with Wikipedia, there is no single DBpedia. Each DBpedia harvests information from one specific Wikipedia and enters the data into that DBpedia. For some Wikipedias, DBpedia uses the RSS feed to keep its information up to date. DBpedia also harvests data from Wikidata; in this process Wikidata properties are converted into DBpedia properties. The result is that information from a specific Wikipedia can be compared with information from Wikidata.
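The conversion described above can be sketched as follows: a Wikidata property is mapped to a DBpedia ontology property, after which the value Wikidata holds can be compared with the value DBpedia extracted from the Wikipedia infobox. The mapping entries and function names here are illustrative assumptions, not actual DBpedia mapping data.

```python
# Illustrative mapping from Wikidata properties to DBpedia ontology
# properties; the real mappings live in DBpedia's mapping infrastructure.
WIKIDATA_TO_DBPEDIA = {
    "P569": "dbo:birthDate",
    "P570": "dbo:deathDate",
}

def compare_statement(wikidata_value, wikipedia_value):
    """Compare one statement across the two sources.

    Returns 'match' when both sources agree, 'difference' when they
    disagree, and 'missing' when one source has no value at all.
    """
    if wikidata_value is None or wikipedia_value is None:
        return "missing"
    return "match" if wikidata_value == wikipedia_value else "difference"
```

Only the statements that come back as 'difference' need human attention, which is the reduction of the working set that this project is after.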
Many people say that the quality of Wikidata needs to improve. Where to start when there are 25,411,127 items? Many notions for improving quality have been expressed. This approach reduces the set of data that needs attention and increases the likelihood that an actual issue needs to be addressed.
By concentrating only on the differences, the Wikidata and Wikipedia communities are invited to improve their quality. There are a few scenarios:
- Wikidata is correct and Wikipedia is wrong; the Wikipedia article needs to be changed.
- Wikidata is wrong and Wikipedia is correct; the Wikidata item/statement needs to be changed.
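A difference flagged by the comparison cannot be resolved automatically; a human decides which of the two scenarios applies. A minimal sketch of the record such a tool could keep per difference, with field names that are assumptions rather than an actual data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Difference:
    item: str                # Wikidata item, e.g. "Q42"
    prop: str                # property being compared, e.g. "P569"
    wikidata_value: str
    wikipedia_value: str
    resolution: Optional[str] = None  # "wikipedia_fixed" or "wikidata_fixed"

    def resolve(self, corrected_project: str) -> None:
        # The reviewer records which project was corrected (the two
        # scenarios above); the entry can then leave the work list.
        self.resolution = corrected_project
```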
This is a first iteration. When this works, a second iteration can turn its attention to Wikipedia wiki links and check whether they match Wikidata statements. When they do, it is likely that everything is fine. It is possible that a wiki link points to a disambiguation page while the article implied by the Wikidata link exists. It is also possible that articles link to an article that does not match Wikidata; that could be a third iteration.
By using DBpedia in its many incarnations, it is possible for the language community of a Wikipedia to concentrate on their quality differences with Wikidata. When multiple communities do this, the quality of Wikidata will increasingly improve and become better than that of any of the Wikipedias. This is a result of the process but not the objective. The objective is that slowly but surely we will lift the quality we provide to our readers. It will pave the way for a new approach to our data: a quality different from that of any single project.
The quality of the DBpedia data will improve implicitly as a consequence and this is one explicit objective of this project.
Project plan
- First, choose one language whose Wikipedia is of sufficient size and uses the RSS feed for its changes. The next step is to compare the Wikidata DBpedia with the Wikipedia DBpedia. We want a report on the total number of matching links and the number of differing statements. This report will run regularly (once a month?) and will be an indicator of the development of overall quality.
- Users need to be able to select a subset of the data so that they can work on the data that is of interest to them.
- When Wikidata has no label, DBpedia may have a suggestion; alternatively, the user sees the issue in a similar way as in Reasonator (a red squiggly line under any label we have).
- The values for Wikidata and Wikipedia are shown side by side. The user may indicate which project has been corrected.
- DBpedia will be triggered (when applicable) to harvest the article / the item and this may remove the issue from the list.
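The regular report from the first step of the plan could be sketched like this: given the statements each DBpedia holds per item, count how many agree and how many differ. The input format (item, then property, then value, per source) is an assumption for illustration.

```python
def quality_report(wikidata_data, wikipedia_data):
    """Count matching and differing statements between the two sources.

    Statements that only one source has are skipped; there is nothing
    to compare yet.
    """
    matches = differences = 0
    for item, props in wikipedia_data.items():
        for prop, wp_value in props.items():
            wd_value = wikidata_data.get(item, {}).get(prop)
            if wd_value is None:
                continue  # no Wikidata value to compare against
            if wd_value == wp_value:
                matches += 1
            else:
                differences += 1
    return {"matching": matches, "differing": differences}
```

Running such a report once a month, as proposed above, would show whether the number of differing statements goes down over time.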