Wikidata:Requests for permissions/Bot/MsynBot 2

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.

Approved --Ymblanter (talk) 20:05, 23 July 2018 (UTC)

MsynBot 2

MsynBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: MisterSynergy (talk • contribs • logs)

Task/s: batch editing (statements only) for constraint violation repairs

Code:

Function details: For quite a while now, I have been using batch editing to systematically repair various excessive constraint violation lists. These batches are typically one-time jobs which I pick up as the opportunity arises, and include tasks such as:

move claim to different value (in main value, qualifier, or reference)
move claim to different property (in main value, qualifier, or reference)
repair claims by replacement with better ones
repair claims with quantity data type issues (bounds, units, thousands separators, …)
and so on, there are basically no limits

In all cases, constraint violations are involved in some sense, and my efforts are focused on removing them. The edits are on a pretty technical level, i.e. I try not to involve myself in editorial decisions regarding the content itself. Occasionally I add complementary changes which are only indirectly related to the constraint violation.

Until now I have executed these tasks with different tools like QuickStatements, Petscan, or PAWS, depending on their editing capabilities. The number of affected items varies a lot from job to job, from a couple of hundred to more than 100k. Typically I start with a thorough investigation of the problem using the Wikidata Query Service and occasionally a consultation with an involved user, and continue with a selection of affected items, before I work out a solution which also considers current use of the data to be modified by e.g. Wikipedia modules. Then, I implement the solution by preparing specific tool input (e.g. QuickStatements or Python code for PAWS), and execute it as a batch. In case of bad constraints on the properties involved, I also check whether a solution involving the constraint definitions would be more appropriate as a fix. At the beginning of the execution phase, I carefully watch my account for reverts or talk page messages, and after a while I just look at the execution occasionally, e.g. via my mobile device while I am away from my desktop machine, as the edit rate limitations otherwise do not allow such tasks to be finished in a reasonable period of time. For the largest batch, I have already used a flooder flag on my regular account in the past.

As I consider this to be “batch editing” and not “bot editing per WD:Bots”, I used my regular account for these tasks, and I have collected around a million edits that way since the beginning of this year 2018 without running into serious problems regarding my edits. The biggest task this year was clearly the modification of a lot of quantity data type claims involving the use of bounds, but there were many others; as I said, I pick up those jobs as the opportunity arises, and most of them are much smaller (<1000 items). My efforts in the past years were not as extensive, as I wasn’t as used to batch editing back then.

Very recently, however, there was a complaint (and a block of my regular account to immediately stop batch editing) in connection with the use of the PAWS tool in one of my batches (see Topic:Ugw2nh22n00tfms5 and Wikidata:Administrators' noticeboard#User:MisterSynergy). Although separation of edits into an alternate bot account would complicate my editing a bit, I hereby ask for a bot flag for User:MsynBot for this kind of batch editing in order to get things done, without being threatened by further population of my block log. Unfortunately I cannot really provide code, tool input, or a more specific task description, as all of that is custom-made each time from scratch based on my experience with such tasks. If the bureaucrats want to see test edits, I could continue to execute some edits from the most recent batch which led to the trouble, since the edits I made were not criticized at all. --MisterSynergy (talk) 12:53, 16 July 2018 (UTC)

Support trustful user. --Pasleim (talk) 13:17, 16 July 2018 (UTC)

Support Matěj Suchánek (talk) 13:18, 16 July 2018 (UTC)

Support Although I wasn't too fond of the block put in place by Maarten, I also have his procedural concern of keeping pwb activity on bot accounts. Mahir256 (talk) 13:57, 16 July 2018 (UTC)

Support Sure, but a waste of time... --Succu (talk) 20:32, 16 July 2018 (UTC)

Oppose This seems far too generic to me - it's "bot flag to do ... stuff" rather than "to do <X>". I'd suggest continuing to use the main account for the smaller batch jobs, and asking for individual approval for bigger/continuous jobs (particularly those >100k edits). Thanks. Mike Peel (talk) 07:08, 18 July 2018 (UTC)

I understand your concern, but this is not a workable solution. There are basically three options:

We agree that this type of batch editing is fine for regular accounts regardless of the tool used, including PAWS, and I will continue to use my main account for all of these jobs. I would use the flooder flag for bigger jobs after an initial phase, and I can be sure that I am not threatened by further blocks of my account due to purely procedural concerns. (This is still my favorite option)
A bot flag is granted for my account (and others who want to do the same), then I would use the bot account for these tasks.
If neither happens, I will immediately cease my efforts to engage in this field and there will not be another million of repairs.

Since there is a lot of room for interpretation regarding automated editing, I need a clear solution without being threatened by blocks, and without having to make decisions about the size of a job (there are no continuous jobs anyway, these are always one-time jobs). Otherwise there will always be some admin who does not like my decision and adds a block again without a reasonable warning like this time. I am also explicitly not willing to go through this kind of bureaucracy each time I want to fix something on a very technical level. Please note that I do not plan to operate controversially, and that this process takes a couple of days each time, which is in most cases longer than the time for the fix even if 100k+ items are involved.

Anyway, I have made the task description above a little more specific in order to avoid confusion. Under this task I plan to edit statements only (no terms, no sitelinks), since this is the only field which is checked by the constraints system. I also mentioned “constraint violations”, although this has already been clearly stated in the lengthy description. --MisterSynergy (talk) 07:41, 18 July 2018 (UTC)

I'm curious to hear what others say here - maybe this kind of approach is fine here, and I've been overly granular with my requests for pi bot (talk • contribs • logs) tasks. My general understanding was that small semi-automated runs under your main account are fine, as you're taking responsibility for them if things go wrong and they're relatively easy to revert if they are small numbers. However, for larger/continuous tasks it makes sense to check with the community first that these tasks are definitely OK *before* starting to run them - so those couple of days of process are useful even if that does take longer than the job does to run. Thanks. Mike Peel (talk) 07:58, 18 July 2018 (UTC)

Thanks, I’m curious as well :-)
I have recently mentioned elsewhere that we do not clearly know what the purpose of the bot flag is (flag edits without human decision-making, or task approval, or ability to hide edits in watchlists, or whatever…). It is also not well-defined how bot-editing is conceptually different from automatic batch-editing which does not need approval and comes with the very same risks. I understand that I operate somewhere in the area between those two, but until now I have had reasons to position myself on the automatic batch-editing side. Regarding your bot, I think there is the difference that your tasks are apparently not one-time jobs, but that’s just a guess. --MisterSynergy (talk) 08:20, 18 July 2018 (UTC)

Ping crats @Lymantria, Vogone, Ymblanter: I don’t think there will be much more community input here, unfortunately. Which kind of consensus do you expect to see, particularly with respect to Mike Peel’s concerns above? Do you want to see test edits, or does it suffice to link to the edits of the most recent job which I ran under my regular account here (14:17 UTC and earlier; simple example: diff; more complex one due to the presence of another reference: diff)? In that job, the aim is to move reference claims from imported from Wikimedia project (P143) to based on heuristic (P887) if and only if the reference is on a sex or gender (P21) statement, the reference value is patronymic (Q110874), and there are no other reference qualifiers within the same reference. The input/Python code I use for that particular job is here for assessment. --MisterSynergy (talk) 10:00, 21 July 2018 (UTC)
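
Editorial illustration: the three conditions named in the comment above can be sketched as a pure function over the reference block of a Wikibase entity JSON document. This is not the operator's code, which was only linked and is not reproduced here. The function names, the example reference, and the requirement that the reference holds exactly one P143 value are assumptions made for illustration; the sketch works on plain dictionaries and performs no edits.

```python
import copy

SEX_OR_GENDER = "P21"
IMPORTED_FROM = "P143"
BASED_ON_HEURISTIC = "P887"
PATRONYMIC = "Q110874"


def item_snak(prop, qid):
    """Build a minimal item-valued snak in the Wikibase JSON layout."""
    return {
        "snaktype": "value",
        "property": prop,
        "datatype": "wikibase-item",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }


def qualifies(statement_property, reference):
    """Return True only if all conditions of the described job hold."""
    snaks = reference.get("snaks", {})
    # Must be a reference on a P21 statement that contains nothing but P143.
    if statement_property != SEX_OR_GENDER or set(snaks) != {IMPORTED_FROM}:
        return False
    p143 = snaks[IMPORTED_FROM]
    # Assumption: exactly one P143 value is expected in the reference.
    if len(p143) != 1 or p143[0].get("snaktype") != "value":
        return False
    return p143[0]["datavalue"]["value"].get("id") == PATRONYMIC


def move_reference(statement_property, reference):
    """Return a copy with P143 rewritten to P887, or None if not applicable."""
    if not qualifies(statement_property, reference):
        return None
    new_ref = copy.deepcopy(reference)  # leave the input untouched
    snak = new_ref["snaks"].pop(IMPORTED_FROM)[0]
    snak["property"] = BASED_ON_HEURISTIC
    new_ref["snaks"][BASED_ON_HEURISTIC] = [snak]
    new_ref["snaks-order"] = [BASED_ON_HEURISTIC]
    new_ref.pop("hash", None)  # the stored hash no longer matches the content
    return new_ref


# Hypothetical reference as it might appear on a P21 statement.
example = {
    "hash": "placeholder",
    "snaks": {IMPORTED_FROM: [item_snak(IMPORTED_FROM, PATRONYMIC)]},
    "snaks-order": [IMPORTED_FROM],
}
moved = move_reference(SEX_OR_GENDER, example)
```

Returning `None` for every non-matching reference keeps the "if and only if" condition explicit: a reference that carries any additional property, such as the more complex case mentioned above, is left alone by this sketch rather than partially rewritten.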

I am inclined to grant the bot flag; actually, we had a similar situation with the bot run by Multichill. I am travelling and will not have any internet access until tomorrow night and possibly even Monday morning; when I am back, if there are no new objections, I will grant the flag. --Ymblanter (talk) 21:34, 21 July 2018 (UTC)

I guess you are referring to Wikidata:Requests for permissions/Bot/BotMultichill from 2013, the only bot request I can find for that bot (it has ~10M edits). There have been some discussions regarding “blanket approvals” in that year as well, see Wikidata talk:Bots/Archive/2013. Nowadays, however, there is nothing about this question in the bot policy and I am not sure why it didn’t make it into the policy. It is pretty common, however, that bots apparently operate outside of their originally approved tasks, and apparently no one worries about that. --MisterSynergy (talk) 20:03, 23 July 2018 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.