Wikidata talk:WikiProject Data Quality

Looking for participants for the evaluation of a tool to diagnose incompleteness in Linked Data

Dear members of the Data Quality project,

As part of my PhD, I am currently developing a visualisation tool to help data producers, such as Wikidata contributors, manage incompleteness in Linked Data.
This is in line with tools such as Recoin. We will present our prototype at the Wikiworkshop in Taipei in April.

We are currently running an evaluation and are looking for participants. Basically, this consists of:

  • a first video call to understand the specific problems the participant may have, and to agree on a small set of data of interest for her/him to visualise in the tool (≈30 min)
  • a second video call once the data have been analysed (a few days later), to show how the tool works and get spontaneous feedback (≈45 min)
  • the participant then uses the tool on their own and gives us feedback such as:
    • how much it was used: if not at all or only a little, what problems were encountered; if a lot, which specific problems it helped solve
    • what could be improved

Feedback can be given through GitLab tickets or further video calls, as the participant prefers.

JakobVoss (talk) ClaudiaMuellerBirn (talk) Criscod (talk) Daniel Mietchen (talk) Ettorerizza (talk) Ls1g (talk) Pasleim (talk) Hjfocs (talk) 17:24, 21 January 2019 (UTC) PKM (talk) 2le2im-bdc (talk) 20:30, 24 January 2019 (UTC) Vladimir Alexiev (talk) 16:37, 21 March 2019 (UTC) ElanHR (talk) User:Epìdosis (talk) Tris T7 TT me UJung (talk) 11:43, 24 August 2019 (UTC) Envlh (talk) SixTwoEight (talk) User:SCIdude (talk) Will (Wiki Ed) (talk) Mathieu Kappler (talk) So9q (talk) 19:33, 8 September 2021 (UTC) Zwolfz (talk) عُثمان (talk) 16:31, 5 April 2023 (UTC) M2k~dewiki (talk) 12:28, 24 September 2023 (UTC) —Ismael Olea (talk) 18:18, 2 December 2023 (UTC) Andrea Westerinen (talk) 23:33, 2 December 2023 (UTC) Peter Patel-Schneider

  Notified participants of WikiProject Data Quality - Marie Ototoï (talk) 14:11, 13 February 2020 (UTC)

@Marie ototoi: I'd be interested in learning more about your project. --Daniel Mietchen (talk) 22:55, 6 March 2020 (UTC)
@Daniel Mietchen: You are welcome! Would you like to be a participant, or just to receive detailed information when it is available? (We are currently running the evaluation and writing the paper.) --talk 17:01, 7 March 2020 (UTC)

Items that need to be split

I have added a focus list for items that need to be split because they describe more than one thing: Wikidata items that need to be split (Q91615393). -- Discostu (talk) 12:32, 23 April 2020 (UTC)

Wikidata:Bibliography of Wikidata

Hi All,

I've created a Wikidata:Bibliography of Wikidata page; feel free to expand it if you are interested.

Best, --Adam Harangozó (talk) 13:08, 13 June 2020 (UTC)

New project page

Hi everyone,

Data quality is super important, and this wiki project is at the center of it. However, right now the project page is not very inviting or easy to understand: it does not quickly tell someone what to do or point them to all the useful tools that exist. With George and Bayan, I worked on a new project page that can serve as a better entry point to this wiki project and get more people to help out with data quality work on Wikidata. You can see the draft at User:LydiaPintscher/DataQualityDraft. I'd like to propose that we replace the current content with what is in the draft. What do you think? --LydiaPintscher (talk) 12:47, 30 April 2021 (UTC)

Oh, and the new page borrows heavily from Wikidata:WikiProject Counter-Vandalism. --LydiaPintscher (talk) 12:50, 30 April 2021 (UTC)
While I do sympathize with your plan, I don’t think that the draft is an improvement, to be honest. It’s not the layout per se (I personally prefer the current layout to the colorful boxes, but that’s my problem), but the content. Note that the following observations are my personal feelings about the draft:
  • “Data quality has many aspects. Specifically to Wikidata, there are four dimensions to data quality each with their own aspects: intrinsic, contextual, representational, and accessibility.”: I do understand the words themselves, but as a somewhat experienced user, I have no idea what this actually means. And it sounds fairly technical and theoretical, so it’s not that appealing to me.
  • The Tools section is just a long list. The list in Wikidata:WikiProject Counter-Vandalism isn’t perfect either, but at least it attempts to show me some sort of structure. I think it should be centered more on typical use cases: I notice some sort of problem, wonder how I could deal with it efficiently, and look at the list to find out which tool to use. The list should guide me to the correct tool for my endeavour.
  • After reading the page, I know little more about data quality issues than before. If it’s too abstract for me as a regular editor, it’s probably too abstract for beginners.
I think that those problems all come down to one central issue: We as a community don’t really have a common understanding of data quality. But that doesn’t have to be a problem if we can show diversity in our approaches in a meaningful way. Maybe you could provide 3 to 5 common issues / use cases / opportunities for improvement? Some fields I’m especially interested in (and I’m sure there are many, many others):
  • Finding and merging duplicates
  • Does Wikidata present the best obtainable version of the truth?
    • find inconsistent statements
    • find underreferenced statements
    • add references from high-quality resources
    • flag the best version according to existing data
    • Tools: Yeah, that’s a major problem: There aren’t really any helpful tools for that at the moment. There’s also little glory in that.
  • Try to be complete (that’s what WMDE seems to be heavily focused on right now)
    • check for constraints
    • ways to get good data into Wikidata (import from Wikimedia projects, MnM, …)
    • Tools: ORES, Item Quality Evaluator, InteGraality, …
Those are related, but also very different problems. Maybe the draft can somehow show this variety and tell users about the problem and then point them to tools or even “things you can do right now that take 2 minutes of your time” (or similar) --Emu (talk) 13:41, 30 April 2021 (UTC)

New paper on data quality in Wikidata

As per A Study of the Quality of Wikidata (Q107425133). I have not read it yet, just skimmed through it, but it seems very useful for our purposes. --Daniel Mietchen (talk) 02:44, 11 July 2021 (UTC)

Feedback requested on use of P642 "of"

Please see Wikidata:Project_chat#Feedback requested: Use of P642 "of" - it looks like my project ping didn't work. - PKM (talk) 23:08, 16 November 2021 (UTC)

Possible data quality issues in Mix'n'match synchronisation

See this thread. --Epìdosis 11:08, 5 January 2022 (UTC)

Data Quality Days: come and exchange on data quality processes on July 8-10

Hello everyone,

  Notified participants of WikiProject Data Quality

I'm happy to share with you the upcoming Wikidata event Data Quality Days 2022, taking place online on July 8th to 10th. Following up on previous and similar gatherings (Data Quality Days 2021, Data Reuse Days 2022), this event will focus on processes around data quality, and will provide a space to bring the Wikidata community and the Wikidata development team together. During 3 days of presentations, workshops and facilitated conversations, we will discuss how we are currently identifying and fixing incorrect data on Wikidata, how we could improve these processes to increase data quality, and what concrete measures we could put in place together, with policies, tools or documentation.

The event is open to everyone, no particular knowledge or experience needed, and will take place on the open source video conference platform Jitsi. On the event page, you will find some useful information, the list of sessions, and the list of participants where you can already sign up.

The program of the Data Quality Days 2022 is curated by the organizers (Léa Lacroix, Lydia Pintscher and Manuel Merz). Until June 19th, you can propose a presentation, a workshop or a discussion topic. These will be selected and grouped by the organizers, and the final schedule will be ready around June 27th.

If you have any questions, ideas or suggestions, or if you need support to propose a presentation, feel free to write on the talk page of the event or to reach out to me directly by email. I will also post updates on the talk page.

We're looking forward to discussing with you again about data quality! Cheers, Lea Lacroix (WMDE) (talk) 14:56, 2 June 2022 (UTC)

Model item quality, and related aspects

I have posted a question about the quality of model items at Property talk:P5869#Quality of model items, together with a suggestion for continuous improvement of said quality and its potential effect on the rest of Wikidata.

As I believe this to be an issue for years to come, I'd like to bring it up somewhere else than in a general chat forum with a mere week-long span of attention. Where do you think this discussion primarily should take place; on that same model item (P5869) property talk page, at Wikidata talk:Model items, or as part of this project at Wikidata:WikiProject Data Quality/Issues (perhaps in a dedicated P5869 subpage)?

Note that the discussion may come to involve additional property proposals to assist with general modelling issues in Wikidata. --SM5POR (talk) 07:59, 18 December 2022 (UTC)

Should there also be a subpage for Commons category (P373)?

I think the issues relating to P373 should also be on this WikiProject's investigation agenda; otherwise, a 6th PFD nomination may happen all too easily. Liuxinyu970226 (talk) 05:53, 7 August 2023 (UTC)

@Mike Peel:^^ Liuxinyu970226 (talk) 05:57, 7 August 2023 (UTC)
@Liuxinyu970226: Please, just get rid of the property already, it's a waste of maintenance effort now we have integration via the sitelinks. Thanks. Mike Peel (talk) 14:54, 13 August 2023 (UTC)

Data Modelling Days, online gathering, November 30 - December 2, 2023

Hello all,

  Notified participants of WikiProject Data Quality

Following the past events dedicated to data quality and data reuse, the Wikidata team wanted to host a new gathering dedicated to data modelling.

The Data Modelling Days will take place online over three days and will host a variety of discussions, workshops and practical sessions on the topics of Wikidata ontologies, EntitySchemas, modelling issues and various other challenges.

The event is open to everyone, regardless of your experience with modelling data on Wikidata. We particularly encourage people who are working on specific topics to join the event and present their modelling challenges.

If you know people or groups who are already discussing modelling issues on Wikidata, or would have something interesting to contribute, please share this message with them!

You can find more information on the dedicated page, where you can sign up and let us know what you are interested in. You can already propose discussions and workshops on the talk page until November 19th.

If you cannot attend, don’t worry, most sessions will be recorded, notes will be taken and slides will be shared.

We are looking forward to seeing you and learning more about your modelling challenges during the Data Modelling Days! If you have any questions, feel free to reach out to me. Best, Lea Lacroix (WMDE) (talk) 14:25, 9 October 2023 (UTC)

New property proposal

Wikidata:Property proposal/according to: although the topic might look obscure, it is directly related to data quality (specifically, identifying and fixing errors in data sources). I would greatly appreciate feedback. Ghuron (talk) 05:55, 7 November 2023 (UTC)

Coming up soon: Wikidata Data Modelling Days, online, November 30-December 2

Wikidata Data Modelling Days 2023

Hello all,

If you are regularly involved in adding, organizing or reusing data from Wikidata, you have certainly encountered questions or issues related to data modelling: how to describe and structure information in a consistent way on Wikidata. This is a big topic for the community at large, and that's why we will address it together during a 3-day online event, the Data Modelling Days, which will take place next week, on November 30th, December 1st and December 2nd.

During this online gathering, we will have lots of discussions on various topics that you can discover in the program: we will talk about Entity Schemas and how they can be useful to improve data quality and consistency on Wikidata, how to model heritage, gender, references or web fiction, the challenges encountered by people reusing Wikidata's data inside and outside the Wikimedia projects, how to model data on a fresh new Wikibase instance, and many other exciting topics.

Aside from attending sessions and joining the discussions, you can also join our Data Modelling Clinic sessions, where you can bring any topic you are working on, ask questions or ask the community for feedback or help. You will find these sessions on each day in the program.

The event takes place online on the video conference platform Jitsi. It is free and no registration is needed (although you are invited to add your name to the participants list). Most sessions will be recorded on video and have collaborative notes, and we will publish a list of outcomes and next steps for each session.

We are hoping to see a lot of you at the event!

If you have any questions, feel free to ask on the talk page or write to me directly. Best, Lea Lacroix (WMDE) (talk) 16:01, 24 November 2023 (UTC)

improving autofix

  Notified participants of WikiProject Data Quality

There was an excellent talk at Data Modelling Days about improving autofix. I'm interested in working on the design and implementation of a better way to enforce data models. Is anyone else interested? Is this the right place to discuss enforcement of data models? Peter F. Patel-Schneider (talk) 22:28, 5 December 2023 (UTC)

What's autofix? Lectrician1 (talk) 04:49, 10 January 2024 (UTC)
The presentation at https://commons.wikimedia.org/wiki/File:DMD2023_-_A_better_way_to_enforce_a_data_model._Suggestions_to_improve_Autofix.pdf has information on autofix. Basically, KrBot looks at information on talk pages and uses that to make changes to Wikidata. Peter F. Patel-Schneider (talk) 14:54, 10 January 2024 (UTC)