User:From Hill To Shore/Principles of automatic editing on Wikidata

Wikidata is a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and anyone in the world.

Wikidata editors can contribute to the project by manually inputting new data or correcting data errors already in the system. They can also use automated tools, scripts and bots to speed up their actions. Neither automated nor manual edits are superior to the other. Both have their strengths and weaknesses and can bring benefit to Wikidata. However, in the same way that manual editors must respect Wikidata's policies and guidelines, so too must editors who make their edits using an automated process.

In a number of discussions, it has become clear that Wikidata editors have differing opinions on the responsibilities of editors who use automated tools and who should correct problems when they occur. This essay is a first attempt to set out some principles and guidelines for dealing with automated edits. This essay is not a policy or guideline but is instead a user's view on how to handle automated editing and any resulting errors. Other users may refer to this essay in their own discussions but should note that it is not currently supported or opposed by any consensus.

Founding principle

All datasets contain errors.

It is inevitable with all sets of data that a number of errors will be included. These may have been the result of a mistake in recording the original data, a misunderstanding of the source information, a mistake in an automated process or any number of other factors. The larger the set of data, the more likely it is to contain errors. Wikidata will always contain some level of error and the sources that Wikidata uses will also always contain some level of error.

As editors of Wikidata, we should try to resolve any errors that we encounter and do our best to minimise the introduction of any new errors into our database.

Advantages of automated editing

  • Automated edits are very good at inputting or correcting large amounts of data in a short space of time.
  • Automated edits are very good at dealing with monotonous, repetitive or tedious tasks.
  • Automated edits can replicate an original dataset precisely, removing human input error in the transfer to Wikidata.

Disadvantages of automated editing

  • Due to their speed, automated edits introduce errors from the source data more quickly than manual edits by a human editor.
  • The normal correction of data by a human editor is not present in an entirely automated process.
  • A mistake in how an automated edit is applied can affect thousands of data entries in a short space of time, whereas a manual error typically affects only a small number of items.

Principles of automated editing

  1. Contribution: Automated edits are, on the whole, a positive contribution to Wikidata.
  2. Validity: Automated edits are as valid as manual edits; both are prone to occasional errors that must be resolved.
  3. Mistakes: An editor who makes a mistake with an automated edit has a duty to fix their mistake (however, they are welcome to ask other editors to help them). A mistake is where the source data is correct but it has been imported to Wikidata in the wrong way or in the wrong place.
  4. Errors in source data: An error in the source data that is copied into Wikidata is not itself a mistake. This is just a data quality issue that can be corrected by any user (see instructions below on how to handle data errors).
  5. Consider error reports: An editor using automated edits must pay attention to any reports of errors they have made and consider ways to reduce those errors in future.
  6. Repeated errors: An editor who reimports erroneous data with an automatic edit (after they have been advised of the problem) is at fault. They must take steps to rectify their repeated error.
  7. Consensus: An editor using automated edits must respect consensus in the same way as editors making manual edits. If consensus forms to say that a certain type of automated edit must either stop or be adjusted to reduce errors, then the editor must take the actions required by consensus. Further discussions can be held to see if a new consensus forms but the disputed edits must cease until a new consensus allows them to resume. An editor who continues to use automated edits against consensus will be handled in the same way as manual editors who edit against consensus. After attempts to resolve the dispute on an informal level have failed, the issue will be reported to Administrators who will decide on an appropriate action. Sanctions may include a temporary or permanent removal of the individual's use of automated edits or a temporary or permanent block of their main account (this is not an exhaustive list of sanctions and Administrators may also rule that no action should be taken).

Dealing with errors imported through automated edits

  1. Assume good faith: It is important when dealing with automated edits that you assume good faith on the part of the editor who made the automated edit. They may have been dealing with thousands of pieces of information and may be unaware of the problem you have identified. You can assume that they did not intend to introduce errors but, as noted in the founding principle above, it is inevitable that a proportion of the edits will contain errors.
  2. Contact the editor: If you have spotted the error, it is important that you contact the user who ran the automatic process to explain the problem. Unless they know about the error, they can't be expected to do anything about it.
  3. Correct a mistake: The user of the automated process has a duty to correct any mistakes where their bot, script or tool has imported information in the wrong way or inserted it in the wrong place. However, they can ask for assistance from other users. A data error in the original source that has been copied across correctly is not a mistake.
  4. Correct a data error: Consider the error that has been introduced and decide on the best way to resolve the situation. The person who made the automated edit may be able to help you or you could ask at Project Chat. In most cases, the best solution would be to use the deprecation process to record the entry as invalid and prevent accidental reinsertion of the same error later. Reverting the error should be avoided in most cases as another automated process could reintroduce the same error later.
  5. Handling disputes: In some cases you and the editor who made the automated edit may disagree about whether a mistake or error has occurred, or about the best way to handle the situation. In this case it is a good idea to follow a dispute resolution process:
    1. Make sure to discuss the problem with the other user. Don't assume that an initial rejection is final; it may be that the other editor either doesn't understand the problem you are trying to explain or doesn't understand the effects of the problem. Make at least a couple of attempts to engage with the other user by yourself before bringing in other users.
    2. Seek advice from another user who may be able to suggest alternative ways to resolve the situation. If you are not sure who to ask, you can go straight to the next step.
    3. Seek advice from other users at Project Chat. Explain the problem and ask for suggestions to resolve the situation. Make sure to notify the other user that you have started a new discussion about the subject at Project Chat. It is possible that a discussion will involve multiple other users and a consensus on how to resolve the situation will form.
    4. If consensus on how to resolve the situation hasn't formed or either side disagrees with the consensus, you may escalate the issue to the Administrators' noticeboard. Make sure to notify the other user that you have started a new discussion about the subject at the noticeboard. Administrators will consider the situation and decide on an outcome. While introducing new users into the discussion may alter the balance of views and allow a new consensus to form, it should be noted that Administrators will often rule in support of the prevailing consensus. The more detailed the previous discussion and the broader the userbase that participated in it, the less likely it is for Administrators to rule against it, except in the case where the consensus has disagreed with an existing policy (a separate discussion may be held on whether the conflicting policy needs to be adjusted).
  6. Respect consensus: Once consensus has formed, both sides must respect the decision and stop editing against consensus (this may mean stopping the automated edit or stopping the reversion of the automated edit). Further discussions may lead to a new consensus in the future but until that happens, the current consensus must be respected.
  7. Dealing with breaches of consensus: If an editor continues to breach consensus, point this out to them on their talk page. If the breach was from an automated edit they may not be aware that they have gone against consensus. If the editor shows no signs of correcting the problem, you should then report the matter at the Administrators' noticeboard. Make sure to notify the other user that you have started a new discussion about the subject at the noticeboard. Administrators will decide whether to apply any sanctions and what form they should take.
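
The advice in step 4 above, to deprecate an erroneous entry rather than revert it, can be illustrated with a small sketch. This is a toy model and not Wikidata's API: the `Statement` and `Item` classes and the `import_value`, `revert` and `deprecate` helpers are hypothetical names invented for the illustration. It also assumes that a later automated import skips any value already recorded on the item at any rank, which real tools may or may not do.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    value: str
    rank: str = "normal"  # Wikidata ranks: "preferred", "normal", "deprecated"


@dataclass
class Item:
    statements: list = field(default_factory=list)

    def has_value(self, value: str) -> bool:
        # True if the value is recorded at any rank, including deprecated.
        return any(s.value == value for s in self.statements)


def import_value(item: Item, value: str) -> bool:
    """Hypothetical importer: add the value unless the item already records it."""
    if item.has_value(value):
        return False
    item.statements.append(Statement(value))
    return True


def revert(item: Item, value: str) -> None:
    """Remove the statement entirely, leaving no trace of the error."""
    item.statements = [s for s in item.statements if s.value != value]


def deprecate(item: Item, value: str) -> None:
    """Keep the statement but mark it as known to be wrong."""
    for s in item.statements:
        if s.value == value:
            s.rank = "deprecated"


def current_values(item: Item) -> list:
    """Values a data consumer would treat as valid (non-deprecated)."""
    return [s.value for s in item.statements if s.rank != "deprecated"]


# Case 1: the erroneous value is reverted, then the same source is imported again.
reverted = Item()
import_value(reverted, "1850")
revert(reverted, "1850")
reimported_after_revert = import_value(reverted, "1850")

# Case 2: the erroneous value is deprecated, then the same source is imported again.
deprecated = Item()
import_value(deprecated, "1850")
deprecate(deprecated, "1850")
reimported_after_deprecation = import_value(deprecated, "1850")

print(reimported_after_revert, current_values(reverted))
print(reimported_after_deprecation, current_values(deprecated))
```

Under these assumptions, the reverted item accepts the erroneous value a second time, while the deprecated statement remains on the item as a record that the value is known to be wrong and so blocks the repeat import.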
