This is a list of various WDQS queries that point to potential problems in Wikidata. Some of them can be fixed automatically, others need manual review.
(Most of these queries are also mirrored somewhere on mw:Wikibase/Indexing/SPARQL Query Examples.)
Actors whose Spanish label ends with “ (actor)”
Wikipedia article titles must be unique, so when multiple items have the same title, the article title is often disambiguated by adding a property of the item in parentheses, e. g. en:Mercury (element) vs. en:Mercury (mythology) vs. en:Mercury (planet). For persons, their (main) profession is frequently used for this.
On Wikidata, labels do not need to be unique (the unique identifier is the Q number), and so these additions are unnecessary. However, they are often still present, since the label was imported (by a bot) from the title of the corresponding Wikipedia article. In English, these labels mostly seem to have been fixed, but other languages retain high numbers of such titles.
This query finds all actors whose Spanish label ends with “ (actor)”. Since it is vanishingly unlikely that the suffix is intentional, a bot could remove it from every label that the query returns. The query can also easily be adapted for other languages and professions.
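The label cleanup such a bot would perform could look roughly like the following. This is only a sketch: the function name, the regular expression, and the sample labels are made up for illustration, and writing the result back to Wikidata is left out entirely.

```python
import re

# Assumed suffix pattern: whitespace followed by "(actor)" at the very end.
ACTOR_SUFFIX = re.compile(r"\s+\(actor\)$")


def strip_profession_suffix(label: str, pattern: re.Pattern = ACTOR_SUFFIX) -> str:
    """Return the label without a trailing disambiguator such as " (actor)"."""
    return pattern.sub("", label)


# Made-up sample labels, not real query results.
labels = ["Juan Pérez (actor)", "Juan Pérez", "El actor (actor) secreto"]
cleaned = [strip_profession_suffix(label) for label in labels]
```

Only a suffix at the very end of the label is removed, so a parenthesis in the middle of a label is left alone. For other languages or professions, a different pattern can be passed in.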
Labels containing HTML escape sequences
This query finds all items where the label contains the text “&quot;”, which is the HTML entity for the double quotation mark. This is probably a bug in whatever bot created the item, and can be fixed automatically by replacing the entity with its value (the double quote). Other entities (amp, apos, lt, gt, etc.) can also be fixed.
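Replacing entities with their values does not need a hand-written table; in Python, for instance, the standard library's html.unescape does it. A minimal sketch (the function name and the sample labels are made up):

```python
import html


def fix_label(label: str) -> str:
    """Replace HTML entities such as &quot; or &amp; with their characters."""
    return html.unescape(label)


# Made-up sample labels, not real query results.
samples = ["&quot;Heroes&quot;", "Tom &amp; Jerry", "a &lt; b"]
fixed = [fix_label(label) for label in samples]
```

A label that was escaped twice (for example “&amp;quot;”) would need a second pass, so a bot should probably check its output for remaining entities.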
Mathematical formulae containing HTML escape sequences
Same issue as above: a broken data import.
British English spelling in English labels
Wikidata’s “en” language code is usually taken to mean American English, as far as I understand (though I can’t find a reference for this). Items described as “colour” should be changed to “color”, and a separate British English description “colour” should be added.
URLs in page(s) (P304) references
Person labels containing parentheses
This is a more general version of “Actors whose Spanish label ends with ‘ (actor)’” above. It does not limit the search to actors (merely to humans), and it matches any parentheses in the label. Since such labels are sometimes correct, this should not be fixed by a bot. However, a human could look over the list and fix any cases that stand out.
Instances of weapon
This query finds all items that are an instance of (some subclass of) weapon (Q728). Most of the results are not actually instances (individual objects); for example, the Carcano (Q858434) is only a class of rifles (one instance would be the John F. Kennedy assassination rifle (Q2012291)). These results should be changed to use subclass of (P279) instead of instance of (P31).
Populations should generally be whole numbers. This query finds populations that have a fractional component. One likely explanation is that the decimal separator and the thousands separator are switched between some locales (for example, English writes 1,000.0, whereas German writes 1.000,0): someone entered the population with thousands separators, which were then misinterpreted as decimal separators.
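To illustrate the separator mix-up, here is a rough sketch of how a reviewer might flag fractional populations and guess the intended value. The "exactly three decimal places" heuristic is my own assumption, not something the query does, and the function names are made up.

```python
from decimal import Decimal
from typing import Optional


def has_fraction(population: Decimal) -> bool:
    """True if the population is not a whole number."""
    return population != population.to_integral_value()


def guess_intended(population: Decimal) -> Optional[Decimal]:
    """Suggest a whole number if the value looks like a misread separator.

    Assumed heuristic: exactly three decimal places suggest that a value
    such as 1.234 was meant as 1234. Anything else gets no suggestion.
    """
    if has_fraction(population) and population.as_tuple().exponent == -3:
        return population.scaleb(3)  # shift the decimal point three places
    return None
```

The suggestion is only a hint for manual review: a value whose trailing zeros were dropped (1.23 for an intended 1,230) is not recognised, and a genuine average or estimate would be "corrected" wrongly.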
This query finds “odd” country (P17) statements. Many of these are arguably correct, because politics is simply complicated; however, some are obvious mistakes, such as a country of Persian (Q9168), which is the item for the Persian language, or of JA (Q224881), which is a disambiguation page and not an abbreviation for Japan (Q17).
Paintings on taxons
This query finds paintings where the painting surface is some taxon, e. g. Populus (Q25356). Usually, the intended statement is that the painting surface is this taxon’s wood, e. g. poplar wood (Q291034).
Authors who have worked together but whose Erdős numbers are more than 1 apart
One’s Erdős number is the lowest Erdős number of all scientists one has collaborated with, plus one. It follows that authors who have published a paper together should not be more than one apart in Erdős number. This query finds papers whose authors have Erdős numbers more than 1 apart.
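The consistency rule can be spelled out in a few lines. This sketch checks the authors of a single paper; the function name and the input format (a mapping from author to Erdős number) are assumptions for illustration, not how the query itself works.

```python
from itertools import combinations


def inconsistent_pairs(erdos_numbers: dict[str, int]) -> list[tuple[str, str]]:
    """Return pairs of co-authors whose Erdős numbers differ by more than 1."""
    return [
        (a, b)
        for a, b in combinations(sorted(erdos_numbers), 2)
        if abs(erdos_numbers[a] - erdos_numbers[b]) > 1
    ]


# Made-up authors of one hypothetical paper.
paper_authors = {"A": 1, "B": 2, "C": 4}
problems = inconsistent_pairs(paper_authors)
```

Here author C is flagged against both A and B: having published with A (Erdős number 1), C's number can be at most 2.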
Descriptions that are just the default description
When you edit an item’s label, description, and aliases, the text box for the description displays a default text (in English, “enter a description in English”). A few items, for whatever reason, have this default text entered and saved as the actual description. This is almost certainly an error. (You can adapt the query for other languages.)
Items that are simultaneously instance and subclass of the same class
Language statements that point to a country
There are a variety of statements whose object should always be a language; if it’s a country instead, that’s probably an easy-to-correct mistake.
People with statements where start and end time are over 100 years apart
Since humans only rarely live for over 100 years, it is likely that a statement about a person where the start time (P580) and end time (P582) qualifiers are over 100 years apart is an error (for example, entering 21999 instead of 1999, or 20013 instead of 2013). (Note that this is not always the case: according to the Japanese traditional order of succession, Emperor Kōan (Q312821) was actually in office for about 101 years.)
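As a rough illustration of the check, comparing only the years of the two qualifiers (the function name and the default limit of 100 are assumptions; the real qualifier values carry full timestamps):

```python
def suspicious_span(start_year: int, end_year: int, limit: int = 100) -> bool:
    """True if the end year is more than `limit` years after the start year."""
    return end_year - start_year > limit


# A typo such as 21999 instead of 1999 produces an enormous span.
typo_case = suspicious_span(1990, 21999)
normal_case = suspicious_span(1990, 1999)
```

Results still need manual review, since a long span is occasionally what the sources say.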
Capitals that aren’t capitals
Statements with reason for deprecation (P2241) that aren’t deprecated
A deprecated statement can have the reason for its deprecation specified with a reason for deprecation (P2241) qualifier. If a statement with this qualifier isn’t deprecated, something is probably amiss – either the statement is correctly no longer deprecated, in which case the qualifier should perhaps be removed, or the statement should be deprecated but isn’t for some reason, in which case its rank should perhaps be adjusted. (Notable exception: the Wikidata property example (P1855) statement on reason for deprecation (P2241) itself.)
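The same check can be run offline against an item's JSON, for example when going through a dump. The sketch below assumes the usual Wikibase entity JSON layout ("claims" mapping property IDs to lists of statements, each with "rank" and an optional "qualifiers" mapping); the function name and the sample entity are made up.

```python
def undeprecated_with_reason(entity: dict) -> list[str]:
    """Return IDs of statements that carry P2241 but are not deprecated."""
    hits = []
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            has_reason = "P2241" in statement.get("qualifiers", {})
            if has_reason and statement.get("rank") != "deprecated":
                hits.append(statement["id"])
    return hits


# Made-up, heavily trimmed entity; real statements also have a "mainsnak" etc.
sample_entity = {
    "claims": {
        "P17": [
            {"id": "Q1$a", "rank": "normal", "qualifiers": {"P2241": [{}]}},
            {"id": "Q1$b", "rank": "deprecated", "qualifiers": {"P2241": [{}]}},
            {"id": "Q1$c", "rank": "normal"},
        ]
    }
}
flagged = undeprecated_with_reason(sample_entity)
```

Only the first statement is flagged: the second is properly deprecated, and the third has no reason for deprecation qualifier at all.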
Some CEO statements have an object that isn’t a human. Most of the time, this is a misuse of the property – it should go “company – CEO – person”, but these cases are entered as “person – CEO – company” (with the intention of “CEO of”).
People who have a date as place of birth
Humans with male / female creature in statements
Some human items have male non-human organism (Q44148) or female non-human organism (Q43445) in statements. In sex or gender (P21), that should be male (Q6581097) or female (Q6581072); in other statements, it’s probably a mistake or vandalism.
Dates of birth with unknown year
“Point in time” properties, such as date of birth (P569), cannot contain a month and day with no year. If “February 10” is entered, it is interpreted as the month of February of the year 10 AD (with no day; precision: month). For the case of date of birth (P569), the property birthday (P3150) has been created, which can be used instead to link to an item for the birthday if the birthday is known but the year isn’t.
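A possible way to spot such values in exported data is to look for month-precision dates whose "year" could be a day of the month. This is only a heuristic sketch: the 1 to 31 range is my assumption, the function name is made up, and the input is assumed to be a Wikibase time value (a timestamp string such as "+0010-02-00T00:00:00Z" plus a precision, where 10 means month and 11 means day).

```python
import re

# Sign, year digits, month, day, then the time part.
TIME_PATTERN = re.compile(r"^([+-])(\d+)-(\d\d)-(\d\d)T")

MONTH_PRECISION = 10  # Wikibase precision value for "month"


def looks_like_day_month(time: str, precision: int) -> bool:
    """True if a month-precision date has a year that could be a day (1-31)."""
    match = TIME_PATTERN.match(time)
    if match is None or precision != MONTH_PRECISION:
        return False
    sign, year = match.group(1), int(match.group(2))
    return sign == "+" and 1 <= year <= 31
```

A genuine date in the first decades AD would also match, so for humans with other, modern dates this is a strong hint, but not proof.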