Wikidata talk:Bots

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024.


When do you need to create a bot account?

This page currently doesn't mention any criteria for when a mass-editing operation would need a bot account. This is relevant because QuickStatements enables regular users to perform mass-editing operations. The only information I could find was on the QuickStatements help page, which currently says "Very large runs or potentially-controversial runs should go through the approval process described in Wikidata:Bots.", but that is not a well-defined criterion. Silver hr (talk) 20:52, 16 July 2022 (UTC)

Unattributed proxy edits

Picking up on this thread, I propose that we add:

Bots that proxy edits
For a bot to make edits on behalf of another user, for which the bot operator is not responsible, the following must hold:
  • The user must be logged in via OAuth
  • The user must be identified and linked in the edit summary
  • The bot cannot be used to bypass a block
  • The edits must not use the bot flag

Bovlb (talk) 18:03, 30 August 2022 (UTC)

I think we should rather phase out the use of proxy bot accounts completely. As far as I am aware, OAuth allows tools to make edits from the Wikimedia account of the tool user anyway.
Btw. we do have a related situation at User talk:Reinheitsgebot#Who is triggering edits of this account?. --MisterSynergy (talk) 18:32, 30 August 2022 (UTC)
Eliminating proxy edits entirely would also meet my needs. Bovlb (talk) 20:47, 30 August 2022 (UTC)
Using proxy bots used to have three advantages:
  • Before the introduction of OAuth, it was the only possibility. This is no longer true. (An extra grant needs to be requested through OAuth for the tool to be able to edit, but users should be comfortable with granting it if they want to use an editing tool.)
  • Their edits can be marked as bot edits. This is what you want to prohibit.
  • They can use higher API limits than ordinary users. This is what would remain, although I’m not sure if bots can actually take advantage of this, since they need to respect replication lag. It’s also a question if it’s an advantage or disadvantage that any logged-in user may quickly edit many pages.
  • Bots have two additional rights that autoconfirmed users don’t have: suppressredirect (not create redirects from source pages when moving pages) and nominornewtalk (not have minor edits to discussion pages trigger the new messages prompt). These apply only to wikitext pages, not to entities, so they’re mostly uninteresting for Wikidata bots, especially proxy bots.
Considering that the only advantage that would remain is the higher API limits, and even that is of questionable value, I’m also for entirely banning proxy bots. However, I think such an important policy change should be discussed at a more visible place, e.g. on Wikidata:Project chat, so that all interested people can take part. --Tacsipacsi (talk) 08:08, 31 August 2022 (UTC)
  • On "API limits": bot accounts have the right "apihighlimits" which allows them to read data from the API more efficiently in some scenarios. However, they do not have "noratelimit" any longer: the maximum edit rate for both bots and regular users is 90/minute. Bot accounts cannot edit quicker than regular ones. --MisterSynergy (talk) 08:58, 31 August 2022 (UTC)
    Oh right, so bots can only query a bit more quickly. Thanks to continuation, this is probably a negligible difference, and even if/when not, nothing stops the tool from querying through a bot account; we want to ban only proxied edits. Then there’s really no reason to use proxy bots. --Tacsipacsi (talk) 18:42, 1 September 2022 (UTC)
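
As a rough illustration of what the 90 edits/minute ceiling quoted above means in practice, the following minimal Python sketch computes a lower bound on the runtime of a batch. The function name and the sample batch sizes are hypothetical, and the sketch assumes the limit is applied as a flat per-minute cap; real runs may be slower, for example when a tool waits for replication lag.

```python
import math

# Edit-rate ceiling quoted in the discussion above (applies to bots and regular users alike).
EDITS_PER_MINUTE = 90


def min_runtime_minutes(n_edits, rate=EDITS_PER_MINUTE):
    """Return a lower bound, in whole minutes, on the time needed for n_edits.

    Assumes a flat per-minute cap; this is an illustration, not how
    MediaWiki necessarily enforces the limit internally.
    """
    if n_edits < 0:
        raise ValueError("n_edits must be non-negative")
    return math.ceil(n_edits / rate)


# Hypothetical batch sizes, chosen only for illustration.
small_batch = min_runtime_minutes(91)
large_batch = min_runtime_minutes(9000)
```

Because the bound is the same for a bot account and a regular account, it does not favour proxying edits through a bot.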

Formally describing bot tasks

We have quite a lot of bots. I recently created a bot to give a better overview of our various bots by scraping the bot User:* profiles for {{Bot}}. One bot can perform many different tasks. I would like to make the individual bot tasks discoverable by the properties involved, e.g. show me all bots that add official website (P856) as a main statement, or all bots that use point in time (P585) as a qualifier, or all bots that edit lexemes.

Currently bot tasks are only described in free text ... so this would require us to introduce a way to formally describe the tasks of a bot. I therefore suggest the introduction of a new tasks parameter for {{Bot}} which would accept a JSON array where each contained object has the following properties:

  • description: English description of the task in plaintext (no wiki markup). Mentions of properties are automatically linkified.
  • space: The space in which the edit is performed. Acceptable values are the entity types (Item, Property, Lexeme, Sense, Form) or Wikitext, which denotes that the edit changes regular wikitext pages

Additionally a task may specify one of the following:

  • Tasks that add or remove claims can specify which properties they use with "properties": { "mainStatement": [...], "qualifier": [...], "reference": [...] }
  • "fingerprint": true specifies that the task edits labels, descriptions and/or aliases
  • "sitelinks": true specifies that the task adds or removes sitelinks
  • "sitelink_badges": true specifies that the task adds or removes sitelink badges

The JSON would reside directly in the wikitext, making it easy to scrape. For humans visiting the page, the JSON would be rendered via Module:BotTasks, as shown in the following examples.
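
To illustrate how such a tasks value could be used for discovery, here is a minimal Python sketch that parses one bot's task list and finds the tasks that use a given property in a given role. The sample JSON and the helper name tasks_using are hypothetical, and the sketch assumes that the property arrays hold plain property IDs such as "P856", which the proposal leaves open.

```python
import json

# Hypothetical value of the proposed `tasks` parameter for one bot.
# Assumption: property arrays contain plain property IDs.
TASKS_JSON = """
[
  {
    "description": "Add official website (P856) to items with a GitHub.com repository",
    "space": "Item",
    "properties": {"mainStatement": ["P856"], "reference": ["P854", "P813"]}
  },
  {
    "description": "Adds descriptions for various languages",
    "space": "Item",
    "fingerprint": true
  }
]
"""


def tasks_using(tasks, prop, role):
    """Return the tasks that list `prop` under `role`.

    `role` is one of "mainStatement", "qualifier" or "reference".
    Tasks without a "properties" object (e.g. fingerprint-only tasks)
    never match.
    """
    return [t for t in tasks if prop in t.get("properties", {}).get(role, [])]


tasks = json.loads(TASKS_JSON)

# "Show me all tasks that add official website (P856) as a main statement."
website_tasks = tasks_using(tasks, "P856", "mainStatement")

# Tasks that only touch labels, descriptions and/or aliases.
fingerprint_tasks = [t for t in tasks if t.get("fingerprint")]
```

Running the same filter over the scraped task lists of all bots would give the per-property overview described above.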

This is just my first idea of how to formally describe bot tasks ... feedback is very much welcome!

--Push-f (talk) 08:42, 8 December 2022 (UTC)

Examples

Github-wiki-bot

| Space | Description | Main statement properties | Qualifier properties | Reference properties |
|---|---|---|---|---|
| Item | Add software version identifier (P348) to items that have source code repository URL (P1324) set to a GitHub.com repository | software version identifier (P348) | publication date (P577) | reference URL (P854), retrieved (P813), title (P1476), publication date (P577) |
| Item | Add official website (P856) to items that have source code repository URL (P1324) set to a GitHub.com repository | official website (P856) | | reference URL (P854), retrieved (P813) |

Lingua Libre Bot

| Space | Description | Main statement properties | Qualifier properties | Reference properties |
|---|---|---|---|---|
| Item | Add pronunciation audio (P443) claims for records made on lingualibre.org | pronunciation audio (P443) | | reference URL (P854) |
| Form | Add pronunciation audio (P443) claims for records made on lingualibre.org | pronunciation audio (P443) | language of work or name (P407) | reference URL (P854) |

Mr.Ibrahembot

| Space | Description | Properties involved in the edit |
|---|---|---|
| Item | Adds descriptions for various languages | This task edits labels, descriptions and/or aliases. |

Discussion

Somehow this feels too static in my opinion:

  • My own bots currently have more than 10 tasks; I am also co-maintaining Deltabot and PLbot meanwhile, with more than 50 different scripts
  • Some tasks involve non-content namespaces
  • Some tasks involve actions such as "patrol", or "protect", "delete", etc. (admin-bot); some interact with sitelinks and badges, or terms in the widest sense; some may use "undo" or "rollback"
  • Some tasks may decide what to do on-the-fly

It would be quite an ask to provide a definite list of things the bots edit during operation. --MisterSynergy (talk) 09:24, 8 December 2022 (UTC)

I guess by non-content namespaces you mean regular wiki pages? I already accounted for those with "space": "Wikitext".
Right, I think it's okay if we leave out admin actions such as "patrol", "protect" and "delete" for now. Most bots aren't admin bots anyway.
I just added three other options, "fingerprint", "sitelinks" and "sitelink_badges". Note that I am not proposing to model these in detail (e.g. which bot edits which labels/descriptions/aliases in which languages, or which sitelinks are edited). I think it's good enough to be able to distinguish a bot that only edits properties from a bot that only edits something in the fingerprint or something about sitelinks.
I don't know what you mean by "terms in the widest sense".
So yes, I don't think this scheme has to cover everything. I think it's already valuable if it can describe most tasks of the average bot.
--Push-f (talk) 16:28, 8 December 2022 (UTC)