Wikidata talk:Bots

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2020.

Maximum edits per minute

What maximum number of edits per minute should a bot with an approved task use? I think the Pywikibot default (put_throttle = 10), i.e. 10 seconds between consecutive edits, is too slow for mass edits. --Albert Villanova del Moral (talk) 06:41, 19 January 2018 (UTC)

The default Pywikibot throttle of 10 s between requests is definitely not used by many tools. While a Pywikibot script would edit at most 6 pages per minute, QuickStatements usually makes about 30 edits per minute, and some bots achieve more than a hundred edits per minute, as can be seen in Edit Groups. I think respecting maxlag is more effective in keeping Wikidata fast for human contributors. --Pyfisch (talk) 16:21, 10 December 2018 (UTC)
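To make the numbers above concrete, here is a minimal sketch of the throttle arithmetic and of a maxlag-aware retry loop. It is not Pywikibot's implementation: `send_request` is a hypothetical callable standing in for whatever HTTP client a bot uses, and the sketch assumes the server signals lag with an API error whose code is "maxlag" plus a Retry-After header.

```python
import time


def edits_per_minute(put_throttle_seconds):
    """Upper bound on edits per minute for a fixed delay between edits."""
    return 60.0 / put_throttle_seconds


def call_with_maxlag(send_request, max_retries=5, default_wait=5.0, sleep=time.sleep):
    """Retry a request for as long as the server reports a 'maxlag' error.

    send_request is a hypothetical callable returning (json_body, headers).
    The request itself is assumed to carry a maxlag parameter already.
    """
    for _ in range(max_retries):
        body, headers = send_request()
        if body.get("error", {}).get("code") != "maxlag":
            return body
        # Wait as long as the server recommends before trying again.
        sleep(float(headers.get("Retry-After", default_wait)))
    raise RuntimeError("server still lagged after all retries")


print(edits_per_minute(10))  # the Pywikibot default discussed above
print(edits_per_minute(2))   # roughly the QuickStatements rate mentioned above
```

The point of the design is that the delay adapts to server load instead of being a fixed pause: the bot runs fast when replication lag is low and backs off only when the servers ask it to.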

Revoke bot User:MatSuBot

Please stop User:MatSuBot from editing, as it just stupidly adds Wikipedia article titles as "labels" in different languages, which are not labels. That's why there are, for example, thousands of entries in the format "English Title (film)" or "Title (2018)" and so on in the film data (example [1]). The operator of the bot is not willing to change anything about this; he just writes (User_talk:MatSuBot) "there's nothing easier to remove them by hand", which is ridiculous because this would mean checking every single one of the bot's many thousands of contributions manually.

The biggest problem of Wikidata is that it is becoming a giant messy dump. There are already so many wrong entries and nobody is able or willing to clean up this mess by hand. If you don't act soon, the only solution will be to delete Wikidata completely and start again from scratch. --146.60.144.241 11:51, 1 January 2019 (UTC)

  • It's tricky to find the right balance between
  1. not having any label (despite a sitelink)
  2. labels that should be there as the article title
  3. labels that need some change
  4. labels that could be completely different
Various bots take slightly different approaches. Before Matsubot operated, we had too many of #1. I found Matej fairly receptive to suggestions on balancing #2 and #3. I'm not sure if there is a solution for #4 other than editing them afterwards. --- Jura 13:13, 1 January 2019 (UTC)
Some comments:
  • it just stupidly adds - misleading, see [2] (or [3] if you don't trust me)
    There may be many cases when removing the parenthesis is undesired. For example, category/template titles, songs, chemical substances, actual titles, some interesting cases etc. After the import, the bot tries to match the parenthesis to the description in the same language. This usually covers cases like "2018 film", "playwright", "Arizona" but not all of them, and first of all, the description actually needs to be there. So I hope these wrong additions will motivate users to insert the description and also clean up the label. I think it's worth it.
  • That's why there are for example thousands of entries... - I saw users who insert such labels by hand.
  • Jura very well revealed the motivation (ie. what was/wasn't before my bot).
  • I admit: I haven't got approval for this task, knowing there had been many bots doing the same thing before. My bot just does it regularly by scanning all of the previous week's sitelink additions (starting from 14 days ago). (Guess why there's such a delay...)
  • Previous "incidents": Topic:Uq6c5fvmxcl073xy, Topic:Uh7junmm0ep4bppw.
Matěj Suchánek (talk) 15:55, 1 January 2019 (UTC)
(Edit conflict) Not even suffixes like "... (film)" or "... (2018)" are removed. My experience with Wikidata is that it is becoming increasingly unreliable. And a lot of the easy-to-find inaccuracies come from bots. I have already corrected a lot of labels added by this bot (or maybe also by other bots), but this is very annoying. (Slow) humans can't correct the mistakes of (fast) robots; it has to be the other way round! --146.60.144.241 16:10, 1 January 2019 (UTC)
  • I had a look at English labels for films and fixed some (<100), not really more than last time (before MatSuBot). There are others that have "()" that should remain.[4] --- Jura 06:55, 2 January 2019 (UTC)
That's nice, but what about all the other languages? --146.60.145.101 23:44, 27 January 2019 (UTC)
You'd need to create a user account if you want to use QuickStatements to fix them. Also, some items might not have P31 yet, so these would need to be added first. --- Jura 02:50, 28 January 2019 (UTC)

Add continent to all Wikidata pages with state Ukraine

Could someone with a bot please add the statement „Continent“ = Europe to all Wikidata items that have „State“ = Ukraine? --Francis McLloyd (talk) 13:45, 16 February 2019 (UTC)

Let's not do that. All these items will have the country set to Ukraine (Q212), and that item has the continent on it. Multichill (talk) 15:40, 16 February 2019 (UTC)
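The reasoning above can be illustrated with a small sketch: the continent is looked up through the country instead of being copied onto every item. The data is a toy in-memory dictionary, not a live query. The property IDs P17 (country) and P30 (continent), the item Q46 (Europe) and the item ID "Q_EXAMPLE_VILLAGE" are assumptions added for illustration; only Q212 (Ukraine) comes from the discussion.

```python
# Toy stand-in for Wikidata statements: item -> property -> list of values.
STATEMENTS = {
    "Q212": {"P30": ["Q46"]},                 # Ukraine -> continent: Europe (assumed IDs)
    "Q_EXAMPLE_VILLAGE": {"P17": ["Q212"]},   # hypothetical item -> country: Ukraine
}


def continents_of(item_id, statements=STATEMENTS):
    """Return the item's own continent(s), or else those of its country."""
    own = statements.get(item_id, {}).get("P30", [])
    if own:
        return list(own)
    found = []
    for country in statements.get(item_id, {}).get("P17", []):
        for continent in statements.get(country, {}).get("P30", []):
            if continent not in found:
                found.append(continent)
    return found


print(continents_of("Q_EXAMPLE_VILLAGE"))
```

Because the continent can be derived in one hop, storing it on every item would only create redundant statements that need to be maintained separately.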

When is a bot account needed?

On the Wikidata:Bot requests page it says a bot account is not needed for mass edits with

  • PetScan for creating items from Wikimedia pages and/or adding same statements to items
  • QuickStatements for creating items and/or adding different statements to items
  • Harvest Templates for importing statements from Wikimedia projects
  • Descriptioner for adding descriptions to many items
  • OpenRefine to import any type of data from tabular sources

Do I need a bot account for edits made with wikidata cli if it is once off batch edits? Is there some threshold for the frequency of edits that require a bot account? And can a list of tools that do not need bot accounts be added to this page?

Iwan.Aucamp (talk) 05:48, 23 September 2019 (UTC)

@Iwan.Aucamp: non-bot accounts cannot edit at a rate of more than about 80 edits per minute (I'm not quite sure what the current limit is - see this discussion from last year though). If you try to edit too quickly you will be at least temporarily blocked. Bot accounts also need to respect service limits; the maxlag parameter in particular, as described on this page. In any case, to do more than 10,000 edits in a practical amount of time you need to use a bot account. ArthurPSmith (talk) 17:15, 23 September 2019 (UTC)
The current edit rate limits are 8/min for IPs and newbie accounts (younger than 4 days, AFAIK), and 90/min for normal registered users; all per [5], but not easy to read. Membership in the "bots" group comes with "noratelimit" rights, so that there are no edit rate limits in place in that case. Of course, bot accounts still have to obey mw:API:Etiquette, and in particular implement the maxlag parameter, to make sure that they do not use too much server capacity at times when the infrastructure runs at its limits. Many bot frameworks and tools have it implemented, but I have no idea about "wikidata-cli"; you need to ask the maintainers, or look it up in the docs.
Apart from that, there is indeed some uncertainty about the usage of a botflag here at Wikidata. The bot policy does not motivate its existence; in other words, it is not fully clear what should be achieved by using the bot flag on certain accounts, but not on others which show a similar edit pattern. If you are unsure, I suggest filing a request for a botflag at Wikidata:Requests for permissions/Bot with a description of the task, and seeing what the community thinks about it. --MisterSynergy (talk) 19:01, 23 September 2019 (UTC)
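For anyone writing a client, the limits quoted above (8/min, 90/min, none for bots) can be respected locally before the server refuses an edit. The following is only a sketch: the group names are made up for illustration, and it assumes a simple sliding 60-second window, which may not match how the server actually counts.

```python
from collections import deque

# Limits as quoted in this thread; None means "noratelimit".
EDITS_PER_MINUTE = {"ip_or_newbie": 8, "registered": 90, "bot": None}


class EditRateLimiter:
    """Client-side sliding-window limiter (an assumption, not server behaviour)."""

    def __init__(self, group):
        self.limit = EDITS_PER_MINUTE[group]
        self.timestamps = deque()

    def allow(self, now):
        """Return True and record the edit if it fits in the last 60 seconds."""
        if self.limit is None:
            return True
        # Forget edits that are at least a minute old.
        while self.timestamps and now - self.timestamps[0] >= 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True


limiter = EditRateLimiter("ip_or_newbie")
print([limiter.allow(float(second)) for second in range(10)])
```

Even with "noratelimit", a bot would still be expected to honour maxlag, so the `None` entry only removes the per-minute cap and nothing else.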

Request approval of Reinheitsgebot be revoked

User:Reinheitsgebot has been warned on several occasions of introducing the wrong calendar, usually Gregorian, when the correct calendar is something else, usually Julian. Here is one such warning. But the behavior continues. In view of the long-term unwillingness to do anything about this problem, I request the bot permission be revoked indefinitely. Jc3s5h (talk) 16:19, 19 December 2019 (UTC)

@Jc3s5h: the bot didn't introduce any calendar in your example, I look at another example you reverted and here the bot is also only sourcing it. So what's the problem? Multichill (talk) 16:49, 19 December 2019 (UTC)

One of the edits in question is this one, for Caroline Matilda of Great Britain. She was born in London, which in 1751 observed the Julian calendar. The edit asserts the birth date was 11 July 1751 Gregorian and gives the source as The Peerage. More exactly, the Wikidata item already falsely claimed 11 July 1751 Gregorian, and said it was imported from the Spanish Wikipedia, which, like all Wikipedias, is an unreliable source.

The Peerage is run by an amateur, but it does appear a lot of effort went into it; discussing the reliability of that site can wait for another time.

The Peerage's item does indeed state "b. 11 July 1751" but does not, in immediate proximity to the date, say what calendar was used. The site's FAQs do not say anything about calendars.

Digging deeper, we can look at a really famous person, Isaac Newton, on The Peerage; the death date given is 20 March 1727. As is documented by many sources, such as Encyclopedia Britannica, this is a Julian calendar date, but uses January 1 as the beginning of the new year, rather than 25 March as was the practice in England at that time. This shows that the date from the peerage for Caroline Matilda is most likely a Julian calendar date, not a Gregorian calendar date.

The persistent fault in the bot is that it is turned loose on sources which provide Julian calendar dates but falsely interprets them as Gregorian calendar dates. Jc3s5h (talk) 17:29, 19 December 2019 (UTC)
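To show how much the calendar label matters for the two dates discussed above, here is a small conversion sketch. It uses the standard Julian Day Number arithmetic and assumes the year starts on 1 January, as described for The Peerage; it has nothing to do with how the bot or Wikidata handles dates internally.

```python
from datetime import date


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to the (proleptic) Gregorian calendar."""
    # date.fromordinal() counts 0001-01-01 Gregorian as day 1, which is JDN 1721426.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)


print(julian_to_gregorian(1751, 7, 11))  # Caroline Matilda's birth date as given
print(julian_to_gregorian(1727, 3, 20))  # Newton's death date as given
```

In the 18th century the two calendars differ by 11 days, so a Julian date stored with a Gregorian label is off by that amount.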

So you know a date can be improved and instead of doing that, you revert the bot and come complaining here? I understand you want to improve quality, but this is not the way to do it. Multichill (talk) 18:50, 19 December 2019 (UTC)
My most recent reversion of this bot did improve the item. The bot added the death date 9 April 1626 Gregorian, supported by two sources, neither of which states which calendar the date is in. The bot made the incorrect inference that the dates from the sources were Gregorian. There was no need to add either of the sources added by the bot because the date, correctly labelled as Julian, was already supported by sufficient good sources. Jc3s5h (talk) 19:15, 19 December 2019 (UTC)
We have sourcing circumstances (P1480)=unspecified calendar (Q18195782) for a vague source.--GZWDer (talk) 20:22, 19 December 2019 (UTC)
sourcing circumstances (P1480)=unspecified calendar (Q18195782) is fine if no high quality source with an unequivocal date can be found. Part of this process involves a human editor searching through the source for any statement of what calendar is used, in places like a preface, FAQ, instructions to authors of a journal or encyclopedia, etc. Having found nothing there, the human would, if applicable, check the dates for several well-known people whose dates are not in question, such as Isaac Newton or George II of Great Britain. Only if such efforts do not resolve the doubts should sourcing circumstances (P1480)=unspecified calendar (Q18195782) be resorted to. It is inexcusable for a bot to add ambiguous dates to an item that already has unambiguous dates with citations to reliable sources. Jc3s5h (talk) 18:38, 20 December 2019 (UTC)
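For readers unfamiliar with how the calendar and the qualifier end up in the data, the following sketch builds a date statement as a plain dictionary in the general shape of Wikibase JSON. Treat it as an illustration under assumptions: the calendar model items Q1985727 (Gregorian) and Q1985786 (Julian) and the property P569 (date of birth) are not mentioned in this discussion, and the function names are hypothetical. Only P1480 and Q18195782 come from the thread.

```python
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # assumed calendar model item
JULIAN = "http://www.wikidata.org/entity/Q1985786"     # assumed calendar model item
SOURCING_CIRCUMSTANCES = "P1480"
UNSPECIFIED_CALENDAR = "Q18195782"


def time_value(year, month, day, calendar_model):
    """Day-precision time value that names its calendar explicitly."""
    return {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 11,  # 11 = day
        "calendarmodel": calendar_model,
    }


def date_claim(property_id, value, unspecified_calendar=False):
    """Build a statement; optionally qualify it with 'unspecified calendar'."""
    claim = {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"type": "time", "value": value},
        },
    }
    if unspecified_calendar:
        claim["qualifiers"] = {
            SOURCING_CIRCUMSTANCES: [{
                "snaktype": "value",
                "property": SOURCING_CIRCUMSTANCES,
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "id": UNSPECIFIED_CALENDAR},
                },
            }]
        }
    return claim


# P569 (date of birth) is an assumed property ID used only for illustration.
known = date_claim("P569", time_value(1751, 7, 11, JULIAN))
vague = date_claim("P569", time_value(1751, 7, 11, GREGORIAN), unspecified_calendar=True)
```

The first statement records a calendar that a human has verified; the second keeps the source's date but marks the calendar as undetermined instead of silently asserting Gregorian.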

The main problem with Reinheitsgebot is that bug reports are never answered. Sometimes the problems are fixed, sometimes not. That’s not how bots should be operated. --Emu (talk) 10:14, 20 December 2019 (UTC)

The false edits continue (https://www.wikidata.org/w/index.php?title=Q37388&diff=next&oldid=1079779129). Jc3s5h (talk) 18:25, 20 December 2019 (UTC)

The edits are not "false". The bot adds that "source X states this is the birth/death date". So what the bot adds is the truth. If that is the correct date in the correct calendar, it does not try to solve, and it should not.
Note that, if no birth/death date exists, the bot might add a (referenced) date, without specifying the calendar. I am happy to amend some qualifier that specifies "unknown calendar" if that helps; please point me to the "official" way of doing so. --Magnus Manske (talk) 09:25, 6 January 2020 (UTC)
Ah that's already in the above discussion. I'll look into it. --Magnus Manske (talk) 09:25, 6 January 2020 (UTC)
The fact that a source does not specify the calendar in immediate proximity to the date in a machine-readable manner does not mean it is an unspecified calendar. If the source's calendar policy can be discovered by a human reading the source, the calendar is not unknown. Through proper investigation, we can say the policy of "The Peerage" is to always start years on 1 January, but otherwise use whichever calendar, Julian or Gregorian, was in force for a birth or death that occurred in Britain. For births or deaths in other countries, further investigation would be needed.
When you screen-scrape a source that is intended to be read by a human, you, Magnus Manske, are responsible for the blunders your bot makes. Jc3s5h (talk) 14:52, 6 January 2020 (UTC)
In addition, "The Peerage" is a source created by one individual as a hobby. It seems to be well done, but one could argue it isn't a reliable source and nothing at all should be added from that source. In any case, if a source that has been written by a group of professionals and published by a reputable publisher is already cited, no mention of information from "The Peerage" should be added. Jc3s5h (talk) 15:03, 6 January 2020 (UTC)

Unattributed proxy edits

Sorry if this has been discussed before but, for bots that proxy edits on behalf of some user (presumably logged in via OAuth), is there any policy about identifying the user in the edit summary? I see some RFPs that explicitly undertake this (e.g. QuickStatementsBot), whereas others do not (e.g. Reinheitsgebot). If this is considered to be good practice, should we add it to the general requirements for "statements adding bots"? For further discussion of the benefits of identifying the user, see this thread. Cheers, Bovlb (talk) 21:43, 18 March 2020 (UTC)

Also, this thread. Bovlb (talk) 21:47, 18 March 2020 (UTC)
@Bovlb: I don't think Reinheitsgebot edits on behalf of anybody but the operator? ArthurPSmith (talk) 17:24, 19 March 2020 (UTC)
@ArthurPSmith: I had the impression that Reinheitsgebot proxies edits from the mix'n'match tool that the botop does not take responsibility for, but it was not my intention here to start another discussion specifically on Reinheitsgebot, but rather to establish a general principle about proxying bots (that would guide us in any specific discussions). My proposal here is that either a botop is generally responsible for the edits of a bot, or the bot attributes those edits to an editor linked in the edit summary, and there is no middle ground. Cheers, Bovlb (talk) 17:32, 19 March 2020 (UTC) (typo fixed)
Your proposal sounds good to me! We probably need to gather other opinions somewhere though? ArthurPSmith (talk) 17:36, 19 March 2020 (UTC)
I thought this would be the best place to gather opinions about bot policy, but perhaps I should advertise it somewhere more central. Bovlb (talk) 19:34, 19 March 2020 (UTC)
This isn't directly about proxying per se, but en:Wikipedia:Bot_policy#Bots_operated_by_multiple_users says, in relevant part, "Accounts used for approved bots that can make edits of a specific designated type, at the direction of more than one person, are not likely to be a problem, provided ... the Wikipedia user directing any given edit must always be identified, typically by being linked in the edit summary, and ... all bot operators must have the required skill and knowledge to operate the bot within community consensus." Bovlb (talk) 19:41, 19 March 2020 (UTC)

OK. Here is a specific proposal to add to "Bot requirements":

Bots that proxy edits
In order for a bot to make edits on behalf of another user, for which the botop is not responsible, then:
  • The user must be logged in via OAuth
  • The user must be identified and linked in the edit summary
  • The bot cannot be used to bypass a block

If this seems reasonable, would it need to be posted on Wikidata:Requests for comment? Bovlb (talk) 18:26, 20 March 2020 (UTC)

@Bovlb: So if a user is logged in via OAuth, then an application can use the OAuth tokens to make the edits directly as that user - for example Quickstatements generally does this. The case where the edits look like they come from the bot rather than the user is when, for whatever reason, the bot has "lost" the OAuth tokens (for example due to a system reset of some sort) but still wants to finish the batch of edits that the user provided. That also means maybe we don't need to worry about blocks - if the edits could be made as the user to start with, then they were not blocked and presumably the bot can continue to edit. Anyway, I would modify your first requirement to "The user must have logged in via OAuth to initiate or authorize the batch of edits". And yes, an RFC with an announcement on Project Chat would probably be best. ArthurPSmith (talk) 18:06, 23 March 2020 (UTC)
@ArthurPSmith: "if a user is logged in via OAuth, then an application can use the OAuth tokens to make the edits directly as that user" A good point. When you put it like that, it raises the question of why we need bots that make edits on behalf of a user that are not made directly as the user. Is this, perhaps, just a legacy thing, that we have existing bots that are still doing things "the old way"? If this is the case, then we don't want to encourage it, but it might still be helpful to have an explicit policy about it.
"an RFC with an announcement on Project Chat would probably be best" OK, but I'd like to debug the proposal first before rushing to a !vote. Bovlb (talk) 18:28, 23 March 2020 (UTC)
@Bovlb: As I understand it, in the case of Quickstatements what happens is that a user (authenticated via OAuth) initiates a "batch" which may have tens of thousands of planned edits, which then goes into a queue. As long as the Quickstatements server is still running, it retains those OAuth credentials and can make those edits as that user. However, if the server has to restart for some reason, it no longer has those OAuth credentials, but it still knows those edits were authorized by that particular user. It may also be that a number of weeks pass before the queued batch can be done, so even without a restart, the credentials may no longer be valid. Either way, if the edits are to be continued, then (as a convenience so users don't have to restart stalled batches via logging in again) the "Quickstatementsbot" takes over and runs these edits on behalf of the user. Now, that's Quickstatements; other bot edits may be handled quite differently... ArthurPSmith (talk) 18:36, 23 March 2020 (UTC)
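The fallback described above can be sketched roughly as follows. This is not the actual QuickStatements code: the `QueuedEdit` type, the `plan_edit` function and the exact wording of the attribution in the edit summary are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BOT_ACCOUNT = "QuickStatementsBot"


@dataclass
class QueuedEdit:
    requested_by: str  # user who authorized the batch via OAuth
    summary: str       # edit summary supplied with the batch


def plan_edit(edit: QueuedEdit, user_token: Optional[str]) -> Tuple[str, str]:
    """Return (account that performs the edit, edit summary to use)."""
    if user_token:
        # Credentials are still available: edit directly as the user.
        return edit.requested_by, edit.summary
    # Credentials were lost or expired: the bot takes over, but the summary
    # names and links the user who authorized the batch.
    attribution = f"on behalf of [[User:{edit.requested_by}|{edit.requested_by}]]"
    return BOT_ACCOUNT, f"{edit.summary} ({attribution})"


edit = QueuedEdit(requested_by="ExampleUser", summary="batch #1")
print(plan_edit(edit, user_token="still-valid"))
print(plan_edit(edit, user_token=None))
```

With this shape, every edit is traceable to a responsible person either through the account that made it or through the linked user name in the summary.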
@ArthurPSmith: Thanks for the background. I note that QuickStatementsBot (talk · contribs · logs) is currently blocked for "making edits without attributing them to the users responsible for them". This sort of thing is why I would like to see some explicit policy in this area. Cheers, Bovlb (talk) 18:41, 23 March 2020 (UTC)
Ah! Well, maybe this RFC isn't really needed if we're already following that policy? ArthurPSmith (talk) 18:48, 23 March 2020 (UTC)
@ArthurPSmith: Well, yes, although that block was a little controversial at the time and I just unblocked a bot that may be making unattributed proxy edits. I feel like it would be better to have a general principle to point to, rather than having to shoot from the hip on a case-by-case basis. Bovlb (talk) 20:33, 23 March 2020 (UTC)
In whatever case, I support the proposal of Bovlb as written. — regards, Revi 09:16, 25 March 2020 (UTC)
Suggestion: replace "cannot" with "must not". --Matěj Suchánek (talk) 10:24, 25 March 2020 (UTC)