Wikidata:WikiProject Data Quality
WikiProject on data quality (Q1757694) in Wikidata.
Motivation
Ensuring data quality is of utmost importance: the goal of Wikidata is to “give more people more access to knowledge”, so the data needs to fit the needs of its consumers. The Wikidata community has already developed methods and tools that monitor relative completeness (e.g., the Recoin gadget), encourage link validation and correction (e.g., Mix’N’Match), and help editors observe recent changes and identify vandalism. Moreover, the community started a global discussion about the relevant dimensions of data quality in a recent RfC (RfC:DataQualityFramework). The RfC took a survey of Linked Data quality methods as its starting point, in order to better describe and categorize quality issues, add further quality aspects and dimensions, and ultimately develop a data quality framework for Wikidata.
Motivated by this, we organized a workshop on "Data Quality Management in Wikidata", which took place at Wikimedia Germany on January 18, 2019.
In this workshop, we discussed existing challenges of data quality in Wikidata and derived possible solutions for data quality monitoring and data quality assurance in the context of Wikidata.
Resources
Queries
People where an alias is equal to the label
The following query uses these:
- Items: human (Q5)
- Properties: instance of (P31)
SELECT ?item ?label WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item rdfs:label ?label FILTER(lang(?label) = 'en') .
  ?item skos:altLabel ?alias FILTER(LANG(?alias) = 'en')
  FILTER(?label = ?alias)
}
LIMIT 100
Labels containing markup
The following query uses these:
- Properties: instance of (P31)
# Labels containing markup tags
SELECT DISTINCT ?item ?label WHERE {
  ?item wdt:P31 wd:Q13442814;
        rdfs:label ?label.
  FILTER CONTAINS(?label, "</").
}
LIMIT 100
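The "</" filter in the query only surfaces candidates, so each hit still needs a look. As a possible follow-up, here is a minimal Python sketch that lists which tags are closed inside a label and produces a stripped version for review. It is not part of the project's tooling; the function names and the sample label are made up for illustration, and it assumes simple tags such as <i> or <sub> rather than arbitrary HTML.

```python
import re
from typing import List

# Mirrors the query's "</" filter: only closing tags count as evidence of markup.
CLOSING_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")


def closing_tag_names(label: str) -> List[str]:
    """Return the sorted, lower-cased names of all closing tags in the label."""
    return sorted({m.group(1).lower() for m in CLOSING_TAG.finditer(label)})


def strip_markup(label: str) -> str:
    """Remove opening and closing tags, but only for names that are closed in the label."""
    cleaned = label
    for name in closing_tag_names(label):
        # Matches <name>, </name> and <name attr="...">, case-insensitively.
        tag = re.compile(rf"</?{name}(?:\s[^<>]*)?>", re.IGNORECASE)
        cleaned = tag.sub("", cleaned)
    return cleaned


if __name__ == "__main__":
    sample = "Effects of <i>Escherichia coli</i> on H<sub>2</sub>O"  # made-up label
    print(closing_tag_names(sample))
    print(strip_markup(sample))
```

Restricting the removal to tag names that actually have a closing tag keeps mathematical text such as "a<b and c>d" untouched, at the cost of missing unclosed tags.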
Titles containing HTML escape sequences
The following query uses these:
- Properties: instance of (P31) , title (P1476)
SELECT ?item ?title WHERE {
  ?item wdt:P31 wd:Q13442814;
        wdt:P1476 ?title .
  FILTER CONTAINS(?title, "&").
}
LIMIT 100
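Filtering on "&" also returns titles with a legitimate ampersand, such as "AT&T". To separate those from real escape sequences, the results could be post-processed along the following lines. This is a sketch under stated assumptions: it relies on Python's standard html.unescape, only considers references that end in a semicolon, and the function names and sample titles are hypothetical.

```python
import html
import re
from typing import List

# Named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) character references,
# all terminated by a semicolon.
ENTITY_PATTERN = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_escape_sequences(title: str) -> List[str]:
    """Return the references in the title that html.unescape actually decodes."""
    return [
        m.group(0)
        for m in ENTITY_PATTERN.finditer(title)
        if html.unescape(m.group(0)) != m.group(0)
    ]


def decode_escape_sequences(title: str) -> str:
    """Decode only the semicolon-terminated references matched above."""
    return ENTITY_PATTERN.sub(lambda m: html.unescape(m.group(0)), title)


if __name__ == "__main__":
    for sample in ["Salt &amp; pepper", "AT&T and R&D"]:  # made-up titles
        print(sample, find_escape_sequences(sample), decode_escape_sequences(sample))
```

A title is worth fixing only when find_escape_sequences returns a non-empty list; a bare ampersand is left alone.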
PubMed article titles
PubMed encloses a title in square brackets to indicate that it has been translated into English from the original.
The following query uses these:
- Properties: instance of (P31) , PubMed ID (P698)
SELECT ?paper ?paperLabel ?pmid WHERE {
  ?paper wdt:P31 wd:Q13442814;
         wdt:P698 ?pmid;
         rdfs:label ?paperLabel
  FILTER(lang(?paperLabel)="en")
  FILTER(STRSTARTS(?paperLabel, "[")).
  FILTER(STRENDS(?paperLabel, "]")).
}
LIMIT 100
https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html#articletitle explains the brackets, and notes that a title may include other words that are not part of the title:
- A non-English title that has not yet been translated appears as just: [In Process Citation]
- Explanatory information is enclosed in parentheses, e.g.: (author's transl)
- Corporate/collective authors may appear at the end, e.g.: GISU. Interdisciplinary Group for the Study of Ulcer.
- Records in the OLDMEDLINE subset (<CitationSubset> = OM) may have: Not Available
https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html#verniculartitle: <VernacularTitle> holds the original, untranslated title.
- Titles in languages with a non-Roman alphabet are transliterated to Latin script
- https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html#language: <Language> holds the language of the article (and thus of the original title) as a three-letter ISO code
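The title conventions above can be turned into a small classifier for labels returned by the query. The following Python sketch is illustrative only: the category names are invented here, the sample titles are made up, and tolerating a trailing period after the closing bracket is an assumption rather than something stated on this page.

```python
def classify_pubmed_title(title: str) -> str:
    """Classify a PubMed-style title using the bracket conventions described above.

    Returns one of the hypothetical labels:
    "untranslated", "not_available", "translated" or "original".
    """
    stripped = title.strip()
    # Placeholder used for a non-English title that has not been translated yet.
    if stripped == "[In Process Citation]":
        return "untranslated"
    # Placeholder that may occur in the OLDMEDLINE subset.
    if stripped == "Not Available":
        return "not_available"
    # Assumption: a trailing period after the closing bracket is ignored.
    core = stripped.rstrip(".")
    if core.startswith("[") and core.endswith("]"):
        return "translated"
    return "original"


if __name__ == "__main__":
    for sample in ["[In Process Citation]", "[Treatment of ulcers]", "Treatment of ulcers"]:
        print(sample, "->", classify_pubmed_title(sample))
```

Checking the two placeholders before the bracket test matters, because "[In Process Citation]" would otherwise be counted as a translated title.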
Tools
- Overview of Data Quality Tools on Wikidata (2019-01-18) - presentation by Lydia Pintscher and Lucas Werkmeister at Workshop on Data Quality Management in Wikidata.
Publications
See Scholia on data quality for publications. Please add missing publications to Wikidata as publication items and add main subject (P921): data quality (Q1757694) for the items to be included in this list.
Events
Tracking errors in people
These reports should be checked weekly.
- Wikidata:Database reports/items with P569 greater than P570: people who died before they were born.
- Wikidata:Database reports/unmarked supercentenarians: people over 120 years old.
- Wikidata:Database_reports/identical_birth_and_death_dates: doppelgangers with the same birth and death dates.
- Wikidata:WikiProject Names/lists/people with a given name identical to their family name: people whose given name and surname are mixed up, usually Hungarians and Asian people, whose family name comes first.
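The four reports above each encode a simple consistency rule. As a rough illustration of those rules (not of how the database reports are actually generated), here is a Python sketch over a hypothetical Person record. It assumes day-precision dates in the common era, measures age up to the death date or a supplied reference date, and reads "doppelgangers" as distinct items that share both dates; all names and identifiers are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record; the real reports work on Wikidata statements."""
    item_id: str
    given_name: str
    family_name: str
    born: Optional[date] = None
    died: Optional[date] = None


def age_in_years(born: date, end: date) -> int:
    """Whole years elapsed between born and end."""
    return end.year - born.year - ((end.month, end.day) < (born.month, born.day))


def check_person(person: Person, today: date, max_age: int = 120) -> List[str]:
    """Return the per-person rules that the record violates."""
    issues = []
    if person.born and person.died and person.born > person.died:
        issues.append("died_before_birth")
    if person.born and age_in_years(person.born, person.died or today) > max_age:
        issues.append("over_max_age")
    if person.given_name and person.given_name == person.family_name:
        issues.append("given_name_equals_family_name")
    return issues


def find_doppelgangers(people: Iterable[Person]) -> List[List[str]]:
    """Group item ids that share the same (birth date, death date) pair."""
    groups = defaultdict(list)
    for person in people:
        if person.born and person.died:
            groups[(person.born, person.died)].append(person.item_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Real Wikidata dates carry a precision (year, month, day) and may lie before year 1, neither of which Python's date type models, so a production check would need a richer date representation.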
Tracking errors in identifiers
These are intended for external sources, so that they can monitor the errors we have found.
Related projects
- Events
Subpages
- WikiProject Data Quality/Issues
- WikiProject Data Quality/Issues/Duplicate P31 and P279 statements
- WikiProject Data Quality/Issues/P642
- WikiProject Data Quality/Issues/P642/Property labels
- WikiProject Data Quality/Issues/P642/Property labels/Finno-Ugric
- WikiProject Data Quality/Issues/P642/Strategy
- WikiProject Data Quality/Issues/P805
- WikiProject Data Quality/Participants
- WikiProject Data Quality/Wikidata lists
- WikiProject Data Quality/Wikidata lists/Items using Map of Copenhagen with surroundings.png
Participants
The participants listed below can be notified using the following template in discussions: {{Ping project|Data Quality}}
- JakobVoss (talk)
- ClaudiaMuellerBirn (talk)
- Criscod (talk)
- Daniel Mietchen (talk)
- Ettorerizza (talk)
- Ls1g (talk)
- Pasleim (talk)
- Hjfocs (talk) 17:24, 21 January 2019 (UTC)
- PKM (talk)
- 2le2im-bdc (talk) 20:30, 24 January 2019 (UTC)
- Vladimir Alexiev (talk) 16:37, 21 March 2019 (UTC)
- ElanHR (talk)
- User:Epìdosis (talk)
- Tris T7 TT me
- UJung (talk) 11:43, 24 August 2019 (UTC)
- Envlh (talk)
- SixTwoEight (talk)
- User:SCIdude (talk)
- Will (Wiki Ed) (talk)
- Mathieu Kappler (talk)
- So9q (talk) 19:33, 8 September 2021 (UTC)
- Zwolfz (talk)
- عُثمان (talk) 16:31, 5 April 2023 (UTC)
- M2k~dewiki (talk) 12:28, 24 September 2023 (UTC)
- —Ismael Olea (talk) 18:18, 2 December 2023 (UTC)
- Andrea Westerinen (talk) 23:33, 2 December 2023 (UTC)
- Peter Patel-Schneider
See also
- Wikidata:WikiProject Virtual Twins: can be seen as a data quality project, since a pair of twins mostly arises from chance and from the biases that exist in Wikidata coverage