Wikidata talk:Events/Data Quality Days 2021

Discussions related to the Data Quality Days are welcome here. If you have any questions or need support (for example, to schedule and organize a session), feel free to contact Lea Lacroix (WMDE) directly (lea.lacroix@wikimedia.de or @Auregann on Telegram).

Presentations about data quality

Hello @Criscod, Jelabra:,

Would you be interested in facilitating a discussion about data quality, data modelling or data quality tools during the Data Quality Days? This would be quite informal, and would help give participants a common understanding of data quality and start some interesting discussions.

If you're interested, or if you have ideas for specific angles and topics, let me know :) Thanks in advance! Lea Lacroix (WMDE) (talk) 06:44, 25 August 2021 (UTC)

I'd be happy to help (between the 8th and the 10th of September). Cristina Sarasua (talk) 08:27, 25 August 2021 (UTC)
@Lea Lacroix (WMDE), Criscod, Manuel Merz (WMDE): It would be good if the discussion and/or presentation included expanding the corresponding items on Wikidata itself (see #Defining_data_quality about definitions, but similarly for tools and processes). Thus the benefit of the discussion would go beyond the participants and reach all of Wikidata. --- Jura 11:21, 30 August 2021 (UTC)

Session about patrolling

Hi @MisterSynergy, Pyfisch:

I was wondering if you would be interested in talking about your patrolling and anti-vandalism workflows during the Data Quality Days. This could be a workshop or an informal discussion, showing the tools that you use, your routine, and chatting with other people who care about data quality. This could take place at the time of your choice from September 8th to 15th, in a Jitsi call or any other tool you prefer.

If you're interested, let me know, so we can talk further about content and scheduling :) Thanks in advance, Lea Lacroix (WMDE) (talk) 06:47, 25 August 2021 (UTC)

@Lea Lacroix (WMDE): I haven't been active on Wikidata lately. There are a few quite good tools like [1] and [2] that make patrolling faster, but they aren't used enough to catch all vandalism or bad edits by IP and new users. I find it somewhat difficult to patrol vandalism because of the many languages involved and the lack of sources or reasons for changes. For example, many vandals like to change the gender of persons; however, there are legitimate cases where the old value was incorrect or the person changed their gender. Another reason why I am not patrolling as much as before is that a large part of the Wikidata community prefers unverified or even unverifiable and wrong data over no data at all. This makes patrolling and improving data quality even more difficult, as it moves the burden of proof to the person removing statements. (The counter-argument usually being that data consumers can check the sources and discard data they don't trust.) Therefore I have serious doubts whether data quality in Wikidata can be achieved, and whether patrolling changes by IP users and new users actually improves Wikidata data quality significantly. --Pyfisch (talk) 10:52, 28 August 2021 (UTC)
Alright, a response by me as well. I am considering talking about this in some form, but have not yet made a final decision whether I really should. I am going to contact you (Léa) via email within the next few days. —MisterSynergy (talk) 11:37, 28 August 2021 (UTC)
@MisterSynergy: Thanks a lot, looking forward to it! FYI, if a full session about patrolling workflows is too much, I also scheduled a session to present tools in ~10 min; if you prefer, you could also join that one. Lea Lacroix (WMDE) (talk) 11:26, 30 August 2021 (UTC)

Also pinging @Ymblanter: would you be interested in joining the Data Quality Days and possibly talking about your workflow or favorite tools related to data quality and patrolling? Thanks in advance! Lea Lacroix (WMDE) (talk) 11:26, 30 August 2021 (UTC)

No, I am sorry, I currently do not have time to prepare anything. I might join the audience depending on the exact schedule.--Ymblanter (talk) 18:41, 2 September 2021 (UTC)

Deduplication

.. could be a worthwhile topic (among WMF sites, specific to a dataset prior or after upload, review of potential duplicates within Wikidata). --- Jura 12:01, 28 August 2021 (UTC)

Wikidata's model of ranks

(Another topic suggestion)

It may seem a trivial topic to frequent contributors to Wikidata, but it's rarely understood by Wikipedia editors who only contribute occasionally. --- Jura 12:13, 28 August 2021 (UTC)

Problems with dates

Possible topic: Help:Dates and beyond .. (how dates are handled at Wikidata, on the Query Server, and elsewhere). --- Jura 12:17, 28 August 2021 (UTC)

Defining data quality

(Topic suggestion)

https://w.wiki/3yKU currently just finds 4 items. There probably should be at least 50. --- Jura 12:29, 28 August 2021 (UTC)


Implications of Wikidata's model of continuous and incremental growth for data quality

(Topic suggestion)

Less technically: Wikidata is a wiki .. --- Jura 12:45, 28 August 2021 (UTC)

Quality control on instances of Q5

Another topic suggestion: how quality controls on items about people are done. --- Jura 10:08, 29 August 2021 (UTC)

Hi @Jura1: and thanks for your suggestions. This one especially caught our attention; it would indeed be very interesting to have an editor present their workflow around quality control for people.
I was wondering if you would be willing to give such a presentation or lead a discussion around the topic during the Data Quality Days? Lea Lacroix (WMDE) (talk) 11:23, 30 August 2021 (UTC)
We could try to compile a summary in a collaborative way on a given day. Maybe with pointers on a flow talk page? --- Jura 12:20, 30 August 2021 (UTC)
  • From the suggestions, I think it's by far the most extensive. We have multiple checks on various aspects of items about people. Maybe we could try a summary of these in an editing session. --- Jura 08:56, 6 September 2021 (UTC)

Achieving completeness

(Topic suggestion)

How to achieve "completeness" for a series of items. What it means, how it's limited, how it's done, etc .. --- Jura 10:08, 29 August 2021 (UTC)


Data quality improvements for Wikipedia from Wikidata

(Topic suggestion)

How Wikidata leads to data quality improvements at Wikipedia .. features, tools, .. --- Jura 12:21, 30 August 2021 (UTC)


Quality controls after upload

(Topic suggestion)

  • Things to check once you have uploaded data.
  • Some things are easier to check once data is uploaded.
  • Revisiting your upload a month or year afterwards.

--- Jura 12:24, 30 August 2021 (UTC)

Data quality assurance/surveillance

(Topic suggestion)

What can be done to prevent data quality from worsening? Notifications? A noticeboard?

Kpjas (talk) 10:37, 31 August 2021 (UTC)

Thanks @Kpjas: for the suggestion. Would you be willing to facilitate a discussion on this topic? This would not mean that you have to have answers to all questions of course, but that you would make sure that the discussion takes place and help others to ask and answer questions. What do you think? Lea Lacroix (WMDE) (talk) 08:27, 1 September 2021 (UTC)
@Lea Lacroix (WMDE): I don't think I have comprehensive knowledge of the inner mechanisms of Wikidata, or of the tools and technical possibilities of the Wikibase system. My initial idea was that it would be helpful to have technical tools to notify subscribed users as well as the community at large about possibly disruptive edits (either accidental or vandalism, etc.). Another measure to prevent worsening of data quality could be semi-protection of items or particular statements that are properly referenced and unlikely to need further updates. Constraints are fine for signalling possible data errors or problematic data, but it would probably be better to have some handy tools built upon this mechanism. Kpjas (talk) 11:50, 1 September 2021 (UTC)

Boilerplate text

I am interested in talking about boilerplate (Q1651672) and how it can help improve data quality. If the Abstract Wikipedia project starts, the people involved will think about the structures in Wikidata and try to understand them so that it will be possible to generate text. In some areas, I do not know how to enter specific information. In such cases a form can help only partly, and I don't know of a tool for Wikidata that makes it possible to add statements, including qualifiers and sources, to an existing item through a form.

At the moment I am trying to create a script using R. With this script, it will be possible to write the information that should be added to Wikidata as a sentence with a predefined structure, and the relevant parts will then be extracted from that sentence. From my point of view, boilerplate text can help people understand the data structures better. If the structure of the data and the way to add it are known and easy enough to understand and use, then data quality will probably increase. Another point is how to help people who see a text in Abstract Wikipedia with a piece of information missing to add that information to Wikidata in an easy way. For both cases, from my point of view, a predefined structured sentence with gaps to fill in can help. I am interested in a conversation about this topic, but I cannot moderate it. If such a conversation happens, I will try to participate as far as I have time. I do not have time on Thursday of next week; on the other Data Quality Days 2021 I have time to participate.--Hogü-456 (talk) 20:22, 31 August 2021 (UTC)
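
As a rough illustration of the idea described above, here is a minimal sketch in Python (not the R script mentioned in the comment). The sentence pattern, the field names and the function extract_statement are all assumptions made up for this example; a real tool would also need to map the extracted parts to Wikidata items, properties, qualifiers and sources.

```python
import re

# Hypothetical boilerplate sentence with three gaps:
#   "<person> was born on <date> in <place>."
# The pattern and field names are illustrative assumptions only.
TEMPLATE = re.compile(
    r"^(?P<person>.+?) was born on (?P<date>\d{4}-\d{2}-\d{2}) in (?P<place>.+?)\.$"
)


def extract_statement(sentence):
    """Return the filled-in gaps as a dict, or None if the sentence
    does not follow the predefined structure."""
    match = TEMPLATE.match(sentence.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(extract_statement("Ada Lovelace was born on 1815-12-10 in London."))
    print(extract_statement("This sentence does not follow the template."))
```

A sentence that does not match the predefined structure is rejected rather than guessed at, which is one possible way to keep malformed input from reaching Wikidata.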

All day events

@Jura1: I've seen you adding these, but I'm not clear how they will work in practice. Will you be watching them and helping editors for all of the 24 hours (or more if you're not using UTC)? Will you be available to talk about issues with working on them somewhere? What makes these 24 hours special? Would a more focused meeting time to talk through the topics/help pages work better? Thanks. Mike Peel (talk) 19:04, 6 September 2021 (UTC)

  • If you want to do meetings on these, sounds good. Otherwise, it would help if people just have a look at them and add what they think is missing. Or add questions on their talk pages.
I don't think they need much if any moderation; people are used to working in the wiki format, at least most of us. --- Jura 19:18, 6 September 2021 (UTC)
I think it would help a lot if you would add suggestions of tasks/todos during these editing days on the related talk pages. It would give people inspiration to start editing. Lea Lacroix (WMDE) (talk) 08:13, 8 September 2021 (UTC)
Yes, I will add notices to the pages and their talk pages. If you have a clever wording in mind, don't hesitate .. --- Jura 09:25, 8 September 2021 (UTC)
Done that. Maybe we can pick topics for the 13th and 14th as well. --- Jura 09:58, 8 September 2021 (UTC)

End of the Data Quality Days 2021

Hello all,

Thanks everyone for participating in the Data Quality Days! Thanks to the people who proposed and facilitated sessions, prepared documentation, and followed up on the discussions.

You can find an overview of the sessions, with a link to the notes, slides and recording if any. The notes have been archived from the Etherpad to the wiki.

The outcomes page is here to keep track of the interesting things that happened during the Data Quality Days: if you did any work on documentation, queries, tools, proposed a new discussion, started a new project... please take a few minutes to indicate it on this page. This will help us a lot with measuring the outcomes of such an event.

Finally, if you have any feedback about the event itself, what you liked, what you missed, what should be improved for next iterations, and of course, ideas and wishes for future "XXX Days", feel free to leave a comment here, or to reach out to me directly! Lea Lacroix (WMDE) (talk) 06:34, 16 September 2021 (UTC)

Next steps - Clarifying property application for effective SPARQL queries

As a follow-up to the "Clarifying property application for effective SPARQL queries" discussion on modeling positions held and occupations (the "museum director" problem): perhaps a place to continue this discussion could be a task force within Wikidata:WikiProject Occupations and professions? This could be initiated in combination with the suggestion of follow-on sessions at Wikiconference North America and/or WikiDataCon. I am very interested in developing best practices to resolve this problem. - PKM (talk) 22:19, 16 September 2021 (UTC)

  Notified participants of WikiProject Occupations and professions

@ShiehJ, Uncommon fritillary, WatsonAmy:

- PKM (talk) 22:24, 16 September 2021 (UTC)
