Wikidata:Requests for permissions/Bot


To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks. Then transclude that page onto this page, like this: {{Wikidata:Requests for permissions/Bot/RobotName}}.

Old requests go to the archive.

Once consensus is obtained in favor of granting the botflag, please post requests at the bureaucrats' noticeboard.


Bot Name Request created Last editor Last edited
QuebecLiteratureBot 2021-02-27, 01:05:19 Jura1 2021-03-08, 08:56:10
Lockalbot 1 2021-02-10, 18:52:05 Ymblanter 2021-02-13, 21:21:07
taxonbot 2021-02-08, 20:27:35 Jura1 2021-03-08, 09:04:05
Openaccess cma 2021-02-05, 15:24:35 Multichill 2021-03-08, 17:31:44
Marco 1 2021-02-02, 12:29:35 ZabeMath 2021-02-09, 17:59:12
ZabesBot 1 2021-01-26, 14:10:36 Lymantria 2021-02-07, 09:37:00
NicereddyBot 6 2021-01-23, 05:25:25 VIGNERON 2021-03-08, 11:22:35
BorkedBot 4 2021-01-22, 21:52:55 Lymantria 2021-03-08, 06:29:11
BubblySnowBot 2021-01-22, 07:35:45 Ymblanter 2021-02-13, 21:28:14
GZWDer (flood) 6 2021-01-12, 20:03:14 GZWDer 2021-02-09, 12:46:53
DutchElectionsBot 2020-12-28, 15:14:47 Dajasj 2021-03-09, 09:41:50
JarBot 5 2020-12-19, 05:40:32 Jura1 2021-03-08, 09:13:38
Cewbot 4 2020-11-27, 03:49:43 Kanashimi 2020-12-04, 06:09:53
Datatourismebot 2020-11-23, 23:14:16 Conjecto 2020-11-23, 23:14:16
Fab1canBot 2020-11-01, 14:00:02 Fab1can 2020-11-01, 14:00:02
BorkedBot 3 2020-10-29, 02:49:29 Lymantria 2021-02-06, 10:36:48
romedi 1 2020-10-24, 13:52:40 Lymantria 2020-11-01, 09:55:26
FischBot 8 2020-10-05, 23:40:58 Pyfisch 2020-10-16, 09:25:10
RegularBot 3 2020-09-09, 01:09:09 Ladsgroup 2020-10-10, 18:52:50
RegularBot 2 2020-08-08, 07:28:57 Mike Peel 2021-01-03, 19:28:53
RegularBot 2020-08-04, 13:25:12 Jura1 2020-08-15, 17:02:52
Orcbot 2020-07-30, 14:17:20 EvaSeidlmayer 2021-01-14, 20:30:18
OpenCitations Bot 2020-07-29, 13:23:50 Diegodlh 2021-02-15, 22:05:49
TwPoliticiansBot 2020-07-12, 14:31:33 TwPoliticiansBot 2020-07-12, 14:31:33
T cleanup bot 2020-06-21, 17:39:23 Jura1 2020-12-07, 21:23:34
OlafJanssenBot 2020-06-11, 21:45:35 Lymantria 2020-06-26, 08:07:22
Recipe Bot 2020-05-20, 14:21:59 Haansn08 2020-09-27, 09:40:31
LouisLimnavongBot 2020-05-14, 13:09:17 Hazard-SJ 2020-11-03, 06:51:31
BsivkoBot 3 2020-05-08, 13:25:37 Bsivko 2020-05-08, 13:28:25
BsivkoBot 2 2020-05-08, 12:50:25 Jura1 2020-05-19, 10:37:06
DeepsagedBot 1 2020-04-14, 06:16:52 Pamputt 2020-08-03, 18:35:01
Uzielbot 2 2020-04-07, 23:49:11 Jura1 2020-05-16, 13:23:42
WordnetImageBot 2020-03-18, 12:17:03 DannyS712 2020-07-07, 12:03:42
Lamchuhan-hcbot 2020-03-24, 08:06:07 GZWDer 2020-03-24, 08:06:07
GZWDer (flood) 3 2018-07-23, 23:08:28 1234qwer1234qwer4 2021-01-25, 14:02:11
MusiBot 2020-02-28, 01:01:19 Premeditated 2020-03-18, 09:43:03
AitalDisem 2020-01-14, 15:48:04 Hazard-SJ 2020-10-07, 06:10:56
BsivkoBot 2019-12-28, 19:38:23 Bsivko 2020-05-08, 12:35:10
Antoine2711bot 2019-07-02, 04:25:58 MisterSynergy 2020-10-29, 21:32:21

QuebecLiteratureBot

QuebecLiteratureBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Mchlggnn (talk • contribs • logs)


Task/s: Populate Wikidata with information about the works published by Canadian writers.

Code: Will be added later.

Function details: The bot will create new items for each book published by an author (very few exist at this moment in Wikidata).

Information about published books will mainly come from BAnQ, which is the legal deposit institution for works published in the territory of Quebec (Canada), and from the Infocentre littéraire des écrivains. The added data will distinguish between the work, which is the abstract concept of the book, and every manifestation of this work, that is, the published editions, each associated with the following information: date of publication, publisher, number of pages, etc.

For an example, you can consult the links we added to the writer Sylvain Trudel. You can see that he is the author of the two following works:

The first one is associated with three editions, using the property has edition or translation:

The first two correspond, respectively, to the first edition of the original book and a later edition with another publisher. The last one is an English translation. Note that each manifestation (that is, an edition) is indicated with a date, which distinguishes it from the work itself.


--Michel Gagnon (talk) 01:05, 27 February 2021 (UTC)

  • this needs more detail to be approved. BrokenSegue (talk) 14:01, 27 February 2021 (UTC)
  • I added some detail and a complete example. Mchlggnn 13:13, 2 March 2021 (UTC).
  • So essentially all works and editions subject to dépôt légal in Quebec would be imported?
  1. How many would this be?
  2. How many would be added each month?
I'm not sure Wikidata is up to that. Maybe input from Wikidata:Contact the development team should be sought. --- Jura 08:56, 8 March 2021 (UTC)

taxonbot

taxonbot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Thomasstjerne (talk • contribs • logs)

Task/s: Add BOLD Systems taxon IDs to taxon pages

Code: https://github.com/thomasstjerne/taxon-wikibot

Function details: Traverses the BOLD checklist dataset in GBIF, which is matched to the GBIF Backbone taxonomy. For each record in the BOLD checklist, the corresponding record in the GBIF backbone is used to locate the taxon in Wikidata (through the GBIF ID), and then the BOLD Systems taxon ID is inserted. --Thomasstjerne (talk) 20:26, 8 February 2021 (UTC)
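
A minimal Python sketch of the lookup described above, for illustration only (the bot's actual code is in the linked repository; the property IDs used here, P846 for the GBIF ID and P3606 for the BOLD Systems taxon ID, are assumptions):

  import pywikibot
  from pywikibot.data import sparql

  repo = pywikibot.Site("wikidata", "wikidata").data_repository()

  def add_bold_id(gbif_id, bold_id):
      # Locate the taxon item through its GBIF ID (assumed to be P846).
      rows = sparql.SparqlQuery().select(
          'SELECT ?item WHERE { ?item wdt:P846 "%s" . }' % gbif_id)
      if not rows or len(rows) > 1:
          return  # no match, or an ambiguous match: skip this record
      item = pywikibot.ItemPage(repo, rows[0]["item"].split("/")[-1])
      item.get()
      if "P3606" in item.claims:  # assumed BOLD Systems taxon ID property
          return  # already present
      claim = pywikibot.Claim(repo, "P3606")
      claim.setTarget(bold_id)
      item.addClaim(claim, summary="Add BOLD Systems taxon ID matched via GBIF ID")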

Maybe you should reconsider your bot's name. --Succu (talk) 21:17, 8 February 2021 (UTC)
  • You have fewer than 100 edits. Is it safe to give you control of a bot flag? Also, that bot username isn't registered, which is odd. BrokenSegue (talk) 01:04, 9 February 2021 (UTC)
  • We already have a bot named TaxonBot, so I would prefer another name. --Ameisenigel (talk) 05:29, 9 February 2021 (UTC)
  • Should I update the bot name on this page (to keep discussion history) or should I create a new request page for the new bot name? Thomasstjerne (talk) 18:05, 10 February 2021 (UTC)
    Please update the name, and someone would move the page--Ymblanter (talk) 20:18, 10 February 2021 (UTC)
    • It seems this task is already taken by SuccuBot. Lymantria (talk) 07:17, 24 February 2021 (UTC) and yourself without bot. Lymantria (talk) 09:11, 26 February 2021 (UTC)
  • Quite messy: https://www.wikidata.org/w/index.php?title=Q14659481&action=history . Had SuccuBot been approved for this task in the meantime? --- Jura 09:04, 8 March 2021 (UTC)

Openaccess cma

Openaccess cma (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Ethanholda (talk • contribs • logs)

Task/s: Creates new items and updates existing items for artworks in the Cleveland Museum of Art (Q657415) collection

Code: https://github.com/ClevelandArtGIT/openaccess-wikidata

Function details: The Cleveland Museum of Art has been an active contributor to Wikidata (see Wikidata:WikiProject Cleveland Museum of Art), and this bot will be run by staff. This bot uses Pywikibot to update the items for artworks from the Cleveland Museum of Art (or create missing ones), and to keep the Wikidata items in sync over time. It loads artwork metadata from the CMA API, and then checks the accession number against a SPARQL query to determine whether there is an existing Wikidata item. Once it has the QID (or not), it either (1) creates the item from scratch or (2) checks the statements in the existing item to see whether any updates need to be made to values or missing claims. It is currently programmed to add statements with the following properties, mapped from CMA data:

In addition, any statements added from the CMA data will contain a reference to the catalog record (i.e. a reference URL (P854)) where the artwork is described.
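
As a rough, illustrative sketch of the create-or-update decision described above (the bot's real logic is in the linked repository), assuming the accession number is stored as inventory number (P217) qualified by collection (P195):

  import requests

  SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

  def find_item_by_accession(accession_number):
      query = """
      SELECT ?item WHERE {
        ?item p:P217 ?stmt .
        ?stmt ps:P217 "%s" ;
              pq:P195 wd:Q657415 .   # collection = Cleveland Museum of Art
      }""" % accession_number
      data = requests.get(SPARQL_ENDPOINT,
                          params={"query": query, "format": "json"},
                          headers={"User-Agent": "cma-sync-sketch/0.1"}).json()
      bindings = data["results"]["bindings"]
      if not bindings:
          return None  # no existing item: create one from scratch
      return bindings[0]["item"]["value"].rsplit("/", 1)[-1]  # QID: update path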

Some examples of items fully generated with this bot workflow:

Note, based on feedback, the code will be modified to no longer add instance of (P31) item of collection or exhibition (Q18593264) statements to items where there is a more specific P31 claim present. In addition, where an artwork is part of a larger work that is also described in Wikidata, the bot will add part of (P361) claims (as per this discussion). Please let me know if there is any feedback or changes you would like to see to the bot's current logic and/or data modeling. Also pinging Multichill here for comment based on past discussion. Ethanholda (talk) 15:24, 5 February 2021 (UTC)

  •   Comment Just commenting to note that I have helped CMA with the code, which is designed with a similar common approach as some other PWB-based bots. Dominic (talk) 15:56, 5 February 2021 (UTC)
    • Sorry, that ping got lost somewhere between keyboard and chair. Would love to get this up and running again. Can you update the code to not overwrite existing descriptions? Will comment in detail later. Multichill (talk) 21:47, 22 February 2021 (UTC)
      • @Multichill: Sure, but to never do so? I understand not overwriting edits from Wikimedian (assuming that is the concern?), but the descriptions are generated programmatically from other fields, like title/creator/date, so what would you want to do when the institution changes the artwork title/creator/date. Dominic (talk) 22:02, 23 February 2021 (UTC)
        • I expect the bot to never overwrite a human edit. That means that you have to keep state if you want to update things like descriptions. For statements it should be easier because these probably reference the museum website. Is it possible that you restore the original descriptions on items like this one? Multichill (talk) 21:17, 7 March 2021 (UTC)
  •   Support synching is an important job. I looked quickly at the code (I'm not an expert in Python) and at the examples, I see nothing wrong. Cheers, VIGNERON (talk) 08:05, 27 February 2021 (UTC)
  • I went ahead and implemented the change to no longer update descriptions for now, as Multichill requested. Perhaps we can refine that later per my question, but he hasn't replied in a couple of weeks now, so I don't mind just doing that, because I'd rather that open question not hold anything up. Is there any other feedback? Dominic (talk) 20:13, 7 March 2021 (UTC)
    • I will approve the request in a couple of days, provided that no objections will be raised. Lymantria (talk) 21:08, 7 March 2021 (UTC)
  • Can you only include the inventory number in the description when this is needed for disambiguation?
  • Please also add location (P276) set to Cleveland Museum of Art (Q657415), that's currently missing.
  • Please also add operator (P137) set to Cleveland Museum of Art (Q657415) as qualifier to Commons compatible image available at URL (P4765)
  • If inception is set to something like "c. 1561" you can add it like this.
  • Bonus points for adding location of creation (P1071) (example)
  • Is the cronjob disabled at the moment? Then I can remove the block. Multichill (talk) 21:17, 7 March 2021 (UTC)
  • Can we see 50-200 test edits?--- Jura 08:51, 8 March 2021 (UTC)
    • If and when bot is unblocked, we can definitely run those test edits. --Ethanholda (talk) 16:59, 8 March 2021 (UTC)

Marco 1

Marco 1 (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: MAstranisci (talk • contribs • logs)

Task/s: I am a PhD student working on migrant authors and their narratives. I am using Wikidata to find them and their nationalities, but I noticed that few of them have this information, and I want to create new claims.

Code: I don't have a repository yet, but I aim to create one to deliver the code.

Function details: I use Python. I would like to create these claims by referring to the lists of authors by ethnicity or nationality.
Moreover, I found that the EUROVOC IDs (namely, P5437) point to an older version of the EUROVOC dataset, so I would like to update them.
--MAstranisci (talk) 12:29, 2 February 2021 (UTC)

  • Idea sounds good. Few questions, where are you sourcing the information from? can you show some sample edits? how many edits will this be total? BrokenSegue (talk) 15:51, 2 February 2021 (UTC)
  • Well, after I get all author IDs, names, sitelinks, and citizenships, I'm trying to fetch all URLs from this kind of page /wiki/List_of_Nigerian_writers to add the nationality/ethnicity on my local machine. Then I would do this bulk update. Consider that there are more or less 100k people that have the occupation novelist, writer or poet, and only 10k have a claim about nationality (while the citizenship information is widespread).

The EUROVOC ID of a country (for instance, Italy) currently points to http://publications.europa.eu/resource/authority/eurovoc/1519, which is no longer available. I want it to point to the actually maintained page http://publications.europa.eu/resource/authority/atu/ITA MAstranisci (talk) 17:35, 3 February 2021 (UTC)

Hey, could we start with you creating an account for your bot and then linking it above as well (see Wikidata:Bots#Bot accounts)? Regards --ZabeMath (talk) 17:58, 9 February 2021 (UTC)


NicereddyBot 6

NicereddyBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Nicereddy (talk • contribs • logs)

Task/s: Add StrategyWiki ID (P9075) to games based on data import from PCGamingWiki (Q17013880) (via PCGamingWiki ID (P6337)). Add Internet Game Database numeric game ID (P9043) to games as qualifiers on Internet Game Database game ID (P5794) using the IGDB API.

Code: I write my bot in Ruby. The first script for StrategyWiki is very similar to the script I wrote for importing WineHQ IDs a while ago. The second is a bit different from most of my other stuff, but still pretty standard overall: StrategyWiki script, IGDB Numeric ID Script

Function details: There are 1800 StrategyWiki pages linked on PCGamingWiki, and we have around 8000 PCGW IDs in Wikidata. If I had to estimate, I'd say somewhere between 1200-1500 StrategyWiki pages will be imported into Wikidata via my script, with fairly high accuracy since the data is all curated manually by the PCGW editor community. For IGDB numeric game IDs, I anticipate that I'll add qualifiers for >99% of all existing IGDB game IDs, so it'll be roughly equal to the number of Internet Game Database game ID (P5794) usages (currently 7700). The only ones that wouldn't get numeric IDs would be invalid IGDB game IDs, but there shouldn't be many of those.

I already tested both scripts using my own user account (see my Contributions page) to import 30-50ish IDs each. The IGDB numeric ID test import was done on January 23rd between 4:35AM and 4:41AM UTC, and the StrategyWiki test import was done on January 23rd between 5:00AM and 5:08AM UTC.

You'll note that I had imported some bad data at first (at about 4:15AM) on Alundra (Q861322) specifically. This was because the script originally didn't handle items where there was more than one IGDB ID. I resolved that issue and now apply the correct numeric ID to each claim correctly.

I also had a problem with the StrategyWiki import (before 5:00AM UTC) and then reverted the changes to each page manually. I don't anticipate any problems like that anymore since I fixed the bug that was causing that (PCGW's API returns StrategyWiki pages with the underscores replaced with spaces... for some reason, so I just do a simple replacement of spaces to underscores in my script).

Thanks :) --Nicereddy (talk) 05:25, 23 January 2021 (UTC)
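
For illustration, the title normalization described above amounts to no more than the following (the actual scripts are the Ruby ones linked above):

  def strategywiki_page_to_id(page_title):
      # PCGamingWiki's API returns StrategyWiki titles with underscores replaced
      # by spaces, so convert back before storing the StrategyWiki ID (P9075) value.
      return page_title.strip().replace(" ", "_")

  assert strategywiki_page_to_id("De Blob") == "De_Blob"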

  • Hadn't seen the Ruby API before. Code looks good. My only suggestions would be to add a reference (e.g. stated in PCGamingWiki) and an edit summary. Otherwise pretty clear support here. Honestly, at just 1500 additions I would say you don't need to seek approval (people do larger QuickStatements imports without any approval). BrokenSegue (talk) 06:00, 23 January 2021 (UTC)
  • Thanks for the proposal! If others are looking for edits: Example StrategyWiki edit and Example IGDB edit.
  • Couple of suggestions:
    • Since you’re adding qualifiers the IGDB statements, if you happen to have it in the API query you’re doing anyway, could you add named as (P1810) as qualifier? I find it handy to find matching errors in the future (although for IGDB it’s not super-critical because the name can be inferred from the slug fairly easily).
    • As BrokenSegue says, can you add a reference? I guess with stated in (P248) − something like:
  StrategyWiki ID: De_Blob (with a stated in (P248) reference)

Thanks! Jean-Fred (talk) 10:00, 25 January 2021 (UTC)

@Jean-Frédéric, BrokenSegue: I've updated my script to include the reference information now. See the edits on User:NicereddyBot from 19:15 to 19:19 on February 6th, 2021 for examples. (link to Contributions page). The Ruby gem doesn't support edit summaries as far as I can tell, so I can't add those unfortunately. - Nicereddy (talk) 19:24, 6 February 2021 (UTC)


Could this bot also import:

@Nicereddy: --Trade (talk) 20:56, 21 February 2021 (UTC)

@Trade: It should be able to, although I doubt it'd get many (if any) MobyGames records. The other three should be doable, though they're not as straightforward as some of the other properties I've been importing thus far, since they're not available via Semantic MediaWiki and not in a consistent format inside the infobox. Nicereddy (talk) 05:49, 22 February 2021 (UTC)
@Jura1: Is there a reason to use the dedicated property over the reference URL? I'm not really aware of any standard practice in that regard. Nicereddy (talk) 04:40, 24 February 2021 (UTC)
Not really, it depends whether someone created/proposed one or not. Help:Sources#Databases describes the reference format. --- Jura 08:51, 24 February 2021 (UTC)
Just since I don't think it's a super important change to make and it's been a month, I'd prefer not to change that at this point. Nicereddy (talk) 03:18, 25 February 2021 (UTC)
Well, it's a bot. If it can't correctly format statements, it shouldn't be approved. That you hadn't checked Help:Sources in over a month is a bit odd. --- Jura 07:07, 25 February 2021 (UTC)
Just noticed that it was even you who had proposed the property: Wikidata:Property_proposal/PCGamingWiki_ID. If, in the meantime, you found that there is a problem with it and that it should be deleted, please request its deletion. --- Jura 07:10, 25 February 2021 (UTC)
@Jura1: The property is used on over 9000 items, it works fine. But I don't see a reason to use it in the references field here, or at least why that should block the bot from being approved at this point. Nicereddy (talk) 00:27, 7 March 2021 (UTC)
@Jura1: I'd think that the part of Help:Sources you refer to speaks about databases, while PCGamingWiki (Q17013880) is an instance of website (Q35127). IMHO the format of source is acceptable. Lymantria (talk) 06:26, 8 March 2021 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  • Interesting question. I think external-id properties are only meant for stable identifiers, such as those found in databases, online or offline. So either the property is made for such and should be used, or, if it's not a stable identifier, we should delete the property. If the operator considers that Help:Sources doesn't apply to them, this should be discussed elsewhere. Unfortunately, we already have too many references to clean up (see Wikidata:Bot requests) and adding more isn't helpful. Anyway, given that the operator hasn't done the required 50 test edits in almost 2 months and we can't observe that the bot operates correctly, I can't support it at this point and I think we can close this request as stale. --- Jura 08:45, 8 March 2021 (UTC)
  •   Support, the important part is to add the identifier; the formatting of the reference for this identifier is a minor detail (most external-ids don't have nor need a reference, and adding one is already far better than the usual standard) that shouldn't block the bot from running. Cheers, VIGNERON (talk) 11:22, 8 March 2021 (UTC)

BorkedBot 4

BorkedBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: BrokenSegue (talk • contribs • logs)

Task/s:

Code: Available in the User:BorkedBot page.

Function details: This is a pretty broad request and I will narrow it down if there are objections but I'm hoping past uses of my bot and the simplicity of the task will let that slide. The code for all this isn't written and this task won't begin until my current tasks are done. Given that I'm still waiting on my last RFP/BOT from October I figured doing this early and broad would be best. But in principle this is a simple task.

  1. Find all uses of the identifier without these qualifiers.
  2. Hit the relevant API.
  3. Populate the qualifier.

--BrokenSegue (talk) 21:52, 22 January 2021 (UTC)
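
As a generic illustration of the three steps above (not the bot's actual code; the property IDs are placeholders, since the concrete identifier and qualifiers are not fixed in this request):

  import pywikibot
  from pywikibot.data import sparql

  EXTERNAL_ID = "P1651"  # placeholder, e.g. YouTube video ID
  QUALIFIER = "P1810"    # placeholder, e.g. named as

  repo = pywikibot.Site("wikidata", "wikidata").data_repository()

  def lookup_in_external_api(external_id_value):
      # Hypothetical stub for step 2: query the relevant API for this ID.
      return "label from the external API"

  # Step 1: find all uses of the identifier without the qualifier.
  rows = sparql.SparqlQuery().select("""
  SELECT ?item ?value WHERE {
    ?item p:%s ?stmt . ?stmt ps:%s ?value .
    FILTER NOT EXISTS { ?stmt pq:%s [] . }
  }""" % (EXTERNAL_ID, EXTERNAL_ID, QUALIFIER))

  for row in rows or []:
      label = lookup_in_external_api(row["value"])     # step 2
      item = pywikibot.ItemPage(repo, row["item"].split("/")[-1])
      item.get()
      for claim in item.claims.get(EXTERNAL_ID, []):   # step 3
          if claim.getTarget() == row["value"] and QUALIFIER not in claim.qualifiers:
              qual = pywikibot.Claim(repo, QUALIFIER)
              qual.setTarget(label)
              claim.addQualifier(qual, summary="Populate missing qualifier")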

  Question What qualifiers are you planning to use on YouTube video ID (P1651)? And would it be possible to import the last update (P5017) from Fandom? --Trade (talk) 15:43, 9 February 2021 (UTC)
@Trade: I was thinking named as (P1810) and language of work or name (P407) though maybe title (P1476) is more appropriate here. And I'm unsure if last update is worth it given how fast I imagine it'll go stale? I don't know if updating our records every time an edit happens there (or even quarterly) is worth it. BrokenSegue (talk) 16:01, 9 February 2021 (UTC)
I was thinking named as (P1810)/language of work or name (P407), author name string (P2093) (name of channel), publication date (P577), duration (P2047) (seconds), number of viewers/listeners (P5436) and point in time (P585). 'And I'm unsure if last update is worth it given how fast I imagine it'll go stale?' I mean they all go stale; that's why your bot updates the YouTube channel qualifiers every third month. --Trade (talk) 16:19, 9 February 2021 (UTC)
fair enough. i can do all that assuming the api returns it. not sure of the utility though. BrokenSegue (talk) 16:32, 9 February 2021 (UTC)

I am ready to approve this request soon, provided that no objections will be raised. Lymantria (talk) 06:29, 8 March 2021 (UTC)

GZWDer (flood) 6

GZWDer (flood) (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)

Semi-automatic edits using various tools, with the following conditions:

  • The relevant WikiProject will be notified for any task with more than 1000 edits; exceptions:
    • Fixing existing bot errors (by this bot, in future or prior tasks), and bot errors by others (with a notice to the user causing them, if the error is likely to recur)
    • Any requests posted by others at Wikidata:Bot requests (but I will wait some days, using common sense, to allow others to comment)
  • A discussion clearly stating the intended bot work will be started on the relevant WikiProject talk page or in Project chat seven days before the proposed edits if 1. more than 100000 edits are involved; 2. more than 10000 items will be created; or 3. the task involves creating new lexemes, forms or senses. A new bot approval may be required if recommended by other users in the discussion. Exceptions:
    • Fixing existing bot errors (by this bot, in future or prior tasks)
    • Fixing bot errors by others, with the agreement of that user
    • If there is consensus to do the task elsewhere

In all cases the bot may do no more than 100 edits as demo.

Code: varies among tasks

Function details:

Created as a request from User_talk:GZWDer. For a long time, this account has acted as a dedicated semi-automatic editing account to make edits of the main account more

Any issues about previous edits should be reported to User:GZWDer/issues, not here. When raising a concern, be prepared to respond to replies to your responses.--GZWDer (talk) 20:03, 12 January 2021 (UTC)

@Multichill, So9q, Jura1: Please notice other participants of Telegram chat.--GZWDer (talk) 20:04, 12 January 2021 (UTC)
Done. Thanks for taking the time to write this and prepare the cleanup. I wish you good luck.--So9q (talk) 20:35, 12 January 2021 (UTC)
I see now that you requested rights to modify other objects unrelated to the cleanup. I suggest you remove that and save it for later, until the cleanup is done and the community's trust in you has been restored. Also it might be a good idea to change the request title to clearly state "cleanup". I hope that a successful cleanup will go a long way to clearing your reputation. I suggest you ask previous critics on their talk pages to comment on this proposal. A sincere apology about the actions that got you blocked earlier would probably also help rebuild trust. ;-)--So9q (talk) 20:40, 12 January 2021 (UTC)
I do need to collect (from other users) what needs to be cleaned up at User:GZWDer/issues before any specific action. I will avoid any edits that are potentially controversial.--GZWDer (talk) 21:45, 12 January 2021 (UTC)
I asked the operator to not do any edits with an unauthorized bot and I shared something about trust "A bot flag is a statement of trust: Trust to do correct edits, trust to not abuse it, trust to fix issues, trust that you will clean up the mess that might be caused by a robot, etc. It looks like multiple people lost that trust in you to the point that the authorization was revoked. It's going to be hard to regain that trust. Running an unauthorized bot (or running a bot under your main account) will only make matters worse."
After that message the operator decided to edit anyway so I changed the block to a complete one (already had a partial block for months). I lost trust in this operator. I'm not sure this operator should operate any bots. Multichill (talk) 11:52, 13 January 2021 (UTC)
+1 on Multichill. Also I didn't like the "please write your complaints in my dedicated subpage" approach. I might not be assuming good faith, and I'd be very happy to be disproved, but it sounded to me very much like GZWDer didn't want to understand what was the problem with their edits. --Sannita - not just another it.wiki sysop 13:00, 13 January 2021 (UTC)
@Multichill, Sannita: Let's go ahead and draft a plan to fix the issues. I did identify some issues at that subpage, but I am not sure whether others may find more. Until then I will refrain from running any bots or scripts (approved or not, under any account) not related to fixing issues, but (unless I provide a list for others to work on, or run scripts on the main account) it does need a dedicated usable bot account (whether it has a bot flag or not).--GZWDer (talk) 13:10, 13 January 2021 (UTC) struck 16:19, 8 February 2021 (UTC)
Note I have collected all issues from the 2020 threads onto the issue page, but I am not sure if I missed any.--GZWDer (talk) 13:32, 13 January 2021 (UTC)

Note I have revised the bot task description. Previously others recommended that I only do the cleanup, but I believe I have done all that I can do without a discussion (though some things need discussion about how to fix them).--GZWDer (talk) 16:19, 8 February 2021 (UTC)

Thanks for revising the description. I   Oppose any bot work from you until everything from before is cleaned up (if you want to use a bot for cleanup, please make a clear request for that).
I would like a clear bot request for every single source you wish to import from after the cleanup is done. Broad blanket requests are not in the best interest of Wikidata IMO. When you are in good standing, making and getting a bot request approved should not take long. If it does, it's because it needs discussion, and that's a good thing. I want high-quality data in WD, and low-quality imports are not my preferred way to get to that goal. So9q (talk) 09:37, 9 February 2021 (UTC)
@So9q: "until everything from before is cleaned up" - I already closed many of issues in User:GZWDer/issues (which I can fix unilaterally). Some issues needs discussion. For the second point, see the ongoing discussion at Wikidata:Project_chat#QuickStatement_Approval? (my opinion is described in task description).--GZWDer (talk) 12:46, 9 February 2021 (UTC)

DutchElectionsBot

DutchElectionsBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Dajasj (talk • contribs • logs)

Task/s: Import and update information about candidacies in Dutch elections, based on a manually curated Excel file.

Code:

Function details: I'm collecting election results of Dutch elections for each candidate, so I can add them to Wikidata. For this I have created an Excel file with all the relevant information, and have matched the items myself manually (so political party ID, person ID, election ID). With this code I want to update Wikidata after I have updated my own file. It will check whether a candidacy statement with a specific election as its target exists and whether any information is missing. If so, it will add that, including references. If not, it will just skip it and should not override anything. --Dajasj (talk) 15:14, 28 December 2020 (UTC)
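
A minimal sketch of the "skip or add only what is missing" check described above, assuming candidacies are modelled with candidacy in election (P3602) and votes received (P1111) as on the elections WikiProject (illustrative only, not the actual code):

  import pywikibot

  repo = pywikibot.Site("wikidata", "wikidata").data_repository()

  def candidacy_claim(person_qid, election_qid):
      """Return the existing P3602 claim targeting the given election, or None."""
      person = pywikibot.ItemPage(repo, person_qid)
      person.get()
      for claim in person.claims.get("P3602", []):
          target = claim.getTarget()
          if target is not None and target.getID() == election_qid:
              return claim
      return None

  # claim = candidacy_claim("Q<person>", "Q<election>")   # hypothetical IDs
  # if claim is None:  add a new candidacy statement with qualifiers and references
  # elif "P1111" not in claim.qualifiers:  add only the missing votes-received data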

how many edits and can you show a sample edit? BrokenSegue (talk) 22:37, 28 December 2020 (UTC)
Here is the list of contributions I did for testing and showcasing: Special:Contributions/DutchElectionsBot. I'm still working on the data because it takes a lot of time. But I think it would be at least in the thousands. Dajasj (talk) 22:43, 28 December 2020 (UTC)
cool seems good.   Support. I am confused what the series ordinal (P1545) is supposed to mean but that wasn't added by your edit BrokenSegue (talk) 22:50, 28 December 2020 (UTC)
  Comment Excuse me if it is obvious (I do not speak Dutch), but I cannot find the number of votes nor the candidate's name in the URL referenced in this diff: https://www.wikidata.org/w/index.php?title=Q28861081&type=revision&diff=1330211216&oldid=1320368353 . --Haansn08 (talk) 00:25, 29 December 2020 (UTC)
@BrokenSegue It refers to the position on the party list per Wikidata:WikiProject_elections documentation.
@Haansn08 You're correct, but it is linked at the bottom as Uitslag Tweede Kamerverkiezing 2017 in ODS-bestand ("Results of the 2017 House of Representatives election as an ODS file"); I thought it would make sense to refer to this summary post, so if people want to look it up they can select their own preferred format. Or would it be preferable to link directly to the file? Dajasj (talk) 07:41, 29 December 2020 (UTC)
Looks like a nice project!
Multichill (talk) 12:11, 29 December 2020 (UTC)
Thanks for all the feedback! I only saw that User:Andrawaag added all candidates for 2017, but did not see that it was done by a bot?
It was indeed not a bot, but I did align the work back then with an ongoing project on the Finnish election(s). Initially, I had a different EntitySchema, but then I was made aware of the project in Finland and adapted the work to their schema to allow for some interesting comparisons. Afterwards I did not find the time to finalize the work with the final outcome, which is an intrinsic part of the Finnish schema. So great idea to bot-ify the efforts. I would recommend talking to @Susannaanas: (et al.) about their schema to see if any collaboration would be possible. In 2017 they had an extensive model, and there might be more synergies. Are you aware of the related SPARQL queries? --Andrawaag (talk) 18:06, 30 December 2020 (UTC)
Thanks for the work you did, it was really helpful to get me started! I rely on their model, so as far as I know that should be the same. I created this as an example today: Q104585675. However, I am a bit new to Wikidata and its API, so in the short term I will refrain from automatically creating items, so I won't add missing persons like you (and the Finnish) did yet. And yeah, I saw the queries; they were part of my inspiration (besides that, I would like to use this over at the Dutch Wikipedia). I hope in the future more complex queries across elections will be possible. Dajasj (talk) 18:41, 30 December 2020 (UTC)
  • Regarding represents (P1268), I know it is giving constraint violations and there was some discussion around it over on the Elections Project, without any result. But I saw it being used that way in many other projects, such as Wikidata:WikiProject Finnish Elections. I see no other option, so a new property or updated constraints would be necessary then.
  • I know multiple candidates can have the same position. As far as I can see this should not be a problem, although User:Andrawaag only did one per position. I will look into this.
  • I won't add (missing) candidates automatically for now. Maybe in the future, but I would file a new request
  • So far I am looking at European Parliament 2019, Eerste Kamer 2019, Tweede Kamer 2012&2017, Municipal elections 2018. But I hope to go further back in the future. Would that be an issue?
  • I will expand the references! :) Dajasj (talk) 12:42, 29 December 2020 (UTC)
I expanded the references, and ran an additional ten tests (and some errors). Let me know if anything else requires improvement! Dajasj (talk) 00:25, 30 December 2020 (UTC)
  • You are correct, it was just an oversight. Thanks for adding the additional property scope statement. Arlo Barnes (talk) 20:08, 30 December 2020 (UTC)
  • Usually right after the elections, people will create articles for the people who got elected, but didn't have an article yet. Effeietsanders is quite active in that area.
  • Would be great to have more elections. Anyone actually getting elected should be notable enough for an item here. So probably after doing the municipal elections, you also filled some gaps for the other elections.
Reminds me that we still have these overviews: User:Sjoerddebruin/Dutch politics/Tweede Kamerverkiezingen 2017, 2012 & 2010 and a lot of notes at User:Sjoerddebruin/Dutch politics. @Sjoerddebruin: time to pick this up again for the next elections? Multichill (talk) 10:42, 30 December 2020 (UTC)
Anyone actually getting elected should be notable enough for an item here, do you also refer to elected officials on the provincial or municipal level? Dajasj (talk) 10:50, 30 December 2020 (UTC)
  Support good project, it's nice to see election data on Wikidata --Haansn08 (talk) 17:45, 29 December 2020 (UTC)
Thanks :) Dajasj (talk) 00:25, 30 December 2020 (UTC)
I am going to approve the bot in a couple of days provided no objections have been made.--Ymblanter (talk) 20:49, 30 December 2020 (UTC)
  • I was trying to figure out if the ".0" on "votes received" had some meaning, but, looking at the reference, it seems to be bot generated. Personally, I'd try to create two items for the references and use these with stated in (P248), but I suppose your approach is fine too. BTW, we have candidate number (P4243) which might work better than the "series ordinal" qualifier. --- Jura 21:29, 31 December 2020 (UTC)
* I'm not sure why the .0 appears. Should be fixable. I will look into it!
* Could you expand on what you mean with two items for the references? I'm not sure I understand.
* candidate number (P4243) seems to refer to an identifier of a candidate, not the position on the party list. So as far as I can tell, that would not be the right usage of that qualifier. Dajasj (talk) 14:56, 1 January 2021 (UTC)
  • About the ref: sample
  • candidate number (P4243): I think it's just meant to be a list number, like the ones you are using. It has string datatype and #1-#10 are very frequent as values. --- Jura 00:08, 3 January 2021 (UTC)
Thanks for the example! It looks nice, is that the preferred method?
Wikidata:WikiProject elections also describes it as a unique number, and series ordinal as position on party list. Dajasj (talk) 08:41, 3 January 2021 (UTC)
The number I checked was used consistently across references for the same candidate (in this election). Not sure where the text on the WikiProject comes from. Maybe it predates the creation of the property. The proposal seems to match your use case: Wikidata:Property proposal/candidate number. This probably explains why we have so many numbers 1 through 10 as values.
Help:Sources explains what is preferred. The question is which format matches your ref. What can help determine this might be whether it's a static document or a dynamic webpage. --- Jura 10:34, 3 January 2021 (UTC)
But if we look at the example of that property, it has a very high candidate number. 147 per party is incredibly high. But maybe that's how the Finnish organise their election. But furthermore, Q28777227 only contains series ordinal and is an example from the same project. So maybe we can continue this particular discussion on Wikidata_talk:WikiProject_elections, and if consensus is reached, create a bot to make its usage across all elections consistent. Dajasj (talk) 11:14, 3 January 2021 (UTC)
Sometimes new properties are created and not necessarily everything updated at once. Quote from the proposal discussion for P4243: "it is a number of the candidate on the slate for the local election"- --- Jura 11:25, 3 January 2021 (UTC)
But then this quote confuses me again -> "A necessary property to disambiguate people who have no birth dates recorded.". And on the project page Wikidata:WikiProject_elections it was emphasised as recently as October 2020[1] that series ordinal is for position on party list. So wouldn't it be preferable to discuss this with them? Dajasj (talk) 11:34, 3 January 2021 (UTC)
I would prefer series ordinal too because it's not a number assigned to a candidate. Multichill (talk) 19:05, 12 January 2021 (UTC)
I would prefer P4243 because it's a number assigned to a candidate. --- Jura 19:07, 12 January 2021 (UTC)
Could you perhaps elaborate on this? What do you mean: why is it positive that it is a number assigned to a candidate? Dajasj (talk) 22:40, 12 January 2021 (UTC)
It's a conclusion I drew when going through the references you provided to figure out if ".0" has some meaning. As you seem to disagree, I suppose you'd have plenty of samples where the "number of the candidate on the slate for the local election" refers to different people, or not? --- Jura 13:42, 13 January 2021 (UTC)
Sorry for my late reply. But in Dutch elections, candidates only get a number on the candidate list of their party. Thus for every party there is a candidate with number 1 (and the other numbers, depending on the length of their candidate list). Dajasj (talk) 19:54, 3 February 2021 (UTC)
That's ok for P4243. --- Jura 09:07, 8 March 2021 (UTC)

No longer needed Dajasj (talk) 16:22, 26 February 2021 (UTC)

  • @Dajasj: Can you clean up your edits through the non-bot account based on the review above? --- Jura 09:07, 8 March 2021 (UTC)
Do you mean removing the ".0"? Dajasj (talk) 09:21, 8 March 2021 (UTC)
If so, I fixed that. Or do you mean the P4243 thing? (I only just saw your addition) Dajasj (talk) 09:30, 8 March 2021 (UTC)
  • Well, any points raised above should be addressed. --- Jura 08:08, 9 March 2021 (UTC)
Okay, I thought maybe you had specific points in mind. I believe I changed everything except series ordinal -> candidate number. Problem is, I have not been the only person who has used this. So that could lead to inconsistencies. Furthermore, do you have any suggestions for tools to swap only the qualifier, and not remove and replace the complete statement? Thanks in advance. Dajasj (talk) 09:41, 9 March 2021 (UTC)

JarBot 5

JarBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: جار الله (talk • contribs • logs)

Task/s: Adding "Monolingual text" (P2096) to images (P18) from arwiki.

Code: pywikibot

Function details: Hello, we use infoboxes based on Wikidata and we want to remove the duplicate file link from the article, but we don't want to lose the caption. 1, 2. --جار الله (talk) 05:40, 19 December 2020 (UTC)
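
A minimal sketch of the requested qualifier edit, assuming the Arabic caption has already been extracted from the arwiki infobox (illustrative only):

  import pywikibot

  repo = pywikibot.Site("wikidata", "wikidata").data_repository()

  def add_caption(qid, filename, caption_ar):
      item = pywikibot.ItemPage(repo, qid)
      item.get()
      for claim in item.claims.get("P18", []):         # image
          if claim.getTarget().title(with_ns=False) != filename:
              continue
          if "P2096" in claim.qualifiers:              # media legend
              return  # keep any existing caption untouched
          qualifier = pywikibot.Claim(repo, "P2096")
          qualifier.setTarget(pywikibot.WbMonolingualText(caption_ar, "ar"))
          claim.addQualifier(qualifier, summary="Add media legend from arwiki")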

  • Sorry if I'm confused I think there's maybe a language barrier here. But is the text you are importing licensed under the public domain? If not you cannot import it. Right? BrokenSegue (talk) 17:01, 19 December 2020 (UTC)
@BrokenSegue: No, the text isn't licensed under the public domain. I heard "mass import" is unacceptable, but importing a few might be reasonable; the task includes only around 1,700 texts.--جار الله (talk) 17:59, 19 December 2020 (UTC)
@جار الله: I'm no copyright expert but you're gonna have to argue the text is simple enough to not be copyright-able. I cannot comment as I do not speak arabic. Otherwise I have no objections and support. BrokenSegue (talk) 18:01, 19 December 2020 (UTC)
@BrokenSegue: Now there are fewer than 1,500, including 130 where the text is the same as the label or the page name. If not all 1,500 can be imported, at least we can import the monolingual text that is the same as the label or the page name, as that is in the public domain.--جار الله (talk) 05:23, 30 December 2020 (UTC)
If the legend is the same as the label or page name, I don't think it's needed as a qualifier on Wikidata. --- Jura 09:13, 8 March 2021 (UTC)

Cewbot 4

Cewbot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Kanashimi (talk • contribs • logs)

Task/s: Import new articles from online resources.

Code: https://github.com/kanasimi/wikibot

Function details: Please refer to Wikidata:Bot_requests#weekly import of new articles (periodic data import). The task will import new articles from PubMed Central, about 30K articles every week. Maybe imports from other resources in the future. --Kanashimi (talk) 03:49, 27 November 2020 (UTC)

PubMed ID (P698) will be used to avoid duplicates for articles from PubMed Central. For other resources, the identifier, article title, and author(s) will be checked. --Kanashimi (talk) 20:04, 27 November 2020 (UTC)
  • 1. You should check the DOI too (but some articles do not have a DOI). 2. What source will you use to resolve the authors? Many do not provide enough information (i.e. ORCID) to resolve them.--GZWDer (talk) 05:09, 1 December 2020 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  • Thanks for doing these. I don't think complex author resolution is needed, but if it can be done, why not. OTOH, inclusion of the journal or other publication venue would be useful. Previous imports sometimes skipped them when an item wasn't created (meaning the bot or its operator needs to create it when one is encountered). User:Research_Bot/issues lists a few past problems. GZWDer's talk page has a few others. --- Jura 17:12, 1 December 2020 (UTC)
@Jura1: Thank you. User:Research_Bot/issues is very useful. @GZWDer: I will also try DOI (P356). If there is no information about the author(s), I will skip the check. --Kanashimi (talk) 00:24, 2 December 2020 (UTC)
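
A sketch of the duplicate check discussed above (the bot's own implementation is in the linked repository): look up PubMed ID (P698), and optionally DOI (P356), before creating a new item:

  import requests

  SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

  def existing_article(pmid=None, doi=None):
      clauses = []
      if pmid:
          clauses.append('{ ?item wdt:P698 "%s" . }' % pmid)
      if doi:
          clauses.append('{ ?item wdt:P356 "%s" . }' % doi.upper())  # P356 values are upper-case
      if not clauses:
          return None
      query = "SELECT ?item WHERE { %s } LIMIT 1" % " UNION ".join(clauses)
      data = requests.get(SPARQL_ENDPOINT,
                          params={"query": query, "format": "json"},
                          headers={"User-Agent": "dedup-check-sketch/0.1"}).json()
      hits = data["results"]["bindings"]
      return hits[0]["item"]["value"] if hits else None  # existing item URI, or None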

Datatourismebot

Datatourismebot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Conjecto (talk • contribs • logs)

Task/s: Update of the property Datatourisme ID during the internal reconciliation process

Code: We will mainly use the Wikidata Toolkit java library

Function details: --Conjecto (talk) 23:14, 23 November 2020 (UTC)

Fab1canBot

Fab1canBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Fab1can (talk • contribs • logs)

Task/s: Adding country to islands that have no country

Code:

Function details: The bot takes the island's country from its Wikipedia pages in other languages and adds it to Wikidata --Fab1can (talk) 14:00, 1 November 2020 (UTC)
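
No code is given in the request, so the following is only a very rough sketch of the idea: read a "country" parameter from the island's infobox on a linked Wikipedia (template and parameter names vary per wiki, and the extracted country name would still have to be resolved to a Wikidata item before a country (P17) claim could be added):

  import mwparserfromhell
  import pywikibot

  def infobox_country(item, lang="en"):
      # Follow the item's sitelink to the given Wikipedia, if any.
      item.get()
      dbname = lang + "wiki"
      if dbname not in item.sitelinks:
          return None
      page = pywikibot.Page(pywikibot.Site(lang, "wikipedia"),
                            item.getSitelink(dbname))
      # Look for an infobox template with a "country" parameter.
      for template in mwparserfromhell.parse(page.text).filter_templates():
          if (str(template.name).strip().lower().startswith("infobox")
                  and template.has("country")):
              return template.get("country").value.strip_code().strip()
      return None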

romedi 1

romedi 1 (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Scossin (talk • contribs • logs)

Task/s: Add the relation "?entity is a medication" ("?entity wdt:P31 wd:Q12140"). I detected missing statements for molecules in commercialized drugs; for example, https://www.wikidata.org/wiki/Q353551 is a medication.

Code: https://github.com/scossin/RomediApp https://www.romedi.fr

Function details: addMedicationStatement(entity): RDFstatement --Scossin (talk) 13:52, 24 October 2020 (UTC)
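
For illustration, with Pywikibot the described addMedicationStatement boils down to something like the following (the requester's own implementation is in the linked repository):

  import pywikibot

  repo = pywikibot.Site("wikidata", "wikidata").data_repository()
  MEDICATION = pywikibot.ItemPage(repo, "Q12140")

  def add_medication_statement(qid):
      item = pywikibot.ItemPage(repo, qid)
      item.get()
      for claim in item.claims.get("P31", []):     # instance of
          if claim.getTarget() == MEDICATION:
              return  # statement already present
      claim = pywikibot.Claim(repo, "P31")
      claim.setTarget(MEDICATION)
      item.addClaim(claim, summary="Add instance of (P31) medication (Q12140)")

  # add_medication_statement("Q353551")   # the example item mentioned above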

Please register the bot and run some 50-250 test edits. Lymantria (talk) 09:55, 1 November 2020 (UTC)

FischBot 8

FischBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: Pyfisch (talk • contribs • logs)

Task/s: Remove date of birth, date of death statements sourced only by VIAF.

Code: not public at this point

Function details: The bot works from a list of VIAF records linked to Wikidata items. For each item it removes all date of birth (P569) and date of death (P570) statements with the source stated in (P248): Virtual International Authority File (Q54919). In case the statement has additional sources, only the reference is removed.

I already removed (edits) those dates from VIAF that were marked "flourished". They were wrongly imported as dob/dod. Other dob/dod statements imported from VIAF may be correct, but VIAF is not a suitable source as it discards date information found in other authority files and incorporates information from Wikidata. Some common errors are: missing circa, wrong precision (year instead of century), flourished dates not marked as such and more. In the future the dates should be added directly from the relevant authority control file.
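
A sketch of the removal rule described above (illustrative only, not the bot's actual code): drop the whole statement when VIAF is the only reference, otherwise drop just the VIAF reference block:

  import pywikibot

  repo = pywikibot.Site("wikidata", "wikidata").data_repository()
  VIAF = pywikibot.ItemPage(repo, "Q54919")

  def is_viaf_reference(source):
      # A reference block whose stated in (P248) points to VIAF.
      return any(ref.getTarget() == VIAF for ref in source.get("P248", []))

  def clean_item(qid):
      item = pywikibot.ItemPage(repo, qid)
      item.get()
      for prop in ("P569", "P570"):                # date of birth / date of death
          for claim in item.claims.get(prop, []):
              viaf_refs = [s for s in claim.sources if is_viaf_reference(s)]
              if not viaf_refs:
                  continue
              if len(viaf_refs) == len(claim.sources):
                  # VIAF is the only source: remove the whole statement.
                  item.removeClaims([claim], summary="Remove date sourced only by VIAF")
              else:
                  # Other sources exist: remove only the VIAF reference block(s).
                  for source in viaf_refs:
                      parts = [c for claims in source.values() for c in claims]
                      claim.removeSources(parts, summary="Remove VIAF reference")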

Some examples of dubious or wrong dates from VIAF (query):

@Magnus Manske: you originally added most of these statements. @Jura1, Epìdosis: from prior discussion at Wikidata:Bot requests.

If there is no objection to the removal of these statements I will start the bot on Friday. --Pyfisch (talk) 23:40, 5 October 2020 (UTC)

  •   Strong support --Epìdosis 06:19, 6 October 2020 (UTC)
  • Thanks for doing these. Most helpful. From the above query, I checked a few rdfs exports at VIAF. They generally have the date and one or several sources where it could come from. Generally, it can be found on one of them. Sometimes this is LoC or idref (both tertiary sources), but it could also be ISNI or dbpedia, which would probably make VIAF a quintary source. Obviously, others can have the same problem, e.g. a LoC entry has several references without the dates being attributed to one of them. To sum it up: I'd also deprecate (if no other ref is present) or remove these references/statements. --- Jura 10:40, 6 October 2020 (UTC)
    BTW, when we will import dates from VIAF members, the first ones I would consider are the following: GND ID (P227), Library of Congress authority ID (P244), Bibliothèque nationale de France ID (P268), IdRef ID (P269). --Epìdosis 22:34, 7 October 2020 (UTC)
  • @Magnus Manske: would these be re-imported by some tool? --- Jura 10:40, 6 October 2020 (UTC)
    I really fear these statements would be (at least in part) reimported by @Reinheitsgebot: from MnM catalog 2050. An option in MnM should be inserted: it should be possible to mark a catalog as not suitable for the automatic addition of references based on it; this option would also be very useful for CERL Thesaurus ID (P1871) (= catalog 1640), which isn't an independent source too, and for other catalogs. --Epìdosis 12:48, 6 October 2020 (UTC)
  • Unfortunately   Support As explained above, and as I have seen in items, there are too many bad claims in this import from VIAF. --Shonagon (talk) 16:25, 6 October 2020 (UTC)
  •   Comment After some thought, I think it's preferable to keep the statements that were correctly imported from VIAF and only deprecate them when the statements are known to be incorrect. VIAF's approach isn't much different from other tertiary sources mentioned above, i.e. LOC, CERL or GND would be preferable with their secondary source, notably for GND that has become a wiki.
    The statements Pyfisch removed in the initial batch were different: there we knew bots had imported them incorrectly into Wikidata. --- Jura 17:28, 8 October 2020 (UTC)
    There are a bunch of dates already labeled "circa" by VIAF, but this qualifier is missing for these dates on Wikidata. In addition, dates that are stated as "19.." or "20th century" in the sources VIAF uses are recorded in VIAF as 1950 and imported into Wikidata. This issue equally applies to dates with decade precision. While I can't be sure that the data for people with "date of birth: 1950" in VIAF is wrong, as there are people who were actually born in 1950, it is very likely. --Pyfisch (talk) 09:24, 16 October 2020 (UTC)

RegularBot 3

RegularBot (talk • contribs • new items • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)

Task/s: Automatically import articles from Russian Wikinews

Code: Using harvest_template.py, and newitem.py

Function details: Recently some bots created many pages in Russian Wikinews. ~10000 articles per day is expected (but once the initial import in Russian Wikinews is completed, it may be 100-300 per day). This task involves: (The following may be done separately)

  •   Oppose there is still need from cleanup from previous bot runs. The above seems to run without approval. --- Jura 09:46, 9 September 2020 (UTC)
    • Specific issues may be easy to fix; see the earliest edits of this account. But please point them out.--GZWDer (talk) 10:28, 9 September 2020 (UTC)
      • I think you have done sufficient test edits. I have asked this account to be blocked until any of its planned tasks are approved, especially as the operator thinks approval isn't needed. --- Jura 10:33, 9 September 2020 (UTC)
        • So other than "without approval", are there any issue about this specific task and bot's edits?--GZWDer (talk) 11:56, 13 September 2020 (UTC)
@SCIdude, Charles Matthews: For this task only, I do not expect duplicates.--GZWDer (talk) 10:36, 13 September 2020 (UTC)
@Edoderoo: For this task only, eventually more statements will be added (see "Function details").--GZWDer (talk) 10:38, 13 September 2020 (UTC)
I am not going to do this task without an approval, but does the community have more comments?--GZWDer (talk) 11:55, 13 September 2020 (UTC)
@Jura1: Do you have more comments about this task in particular? I feel that we should not have issues with this task.--GZWDer (talk) 14:15, 14 September 2020 (UTC)
What are your thoughts about the applicability to your bot/flood/etc accounts of "The bot operator is responsible for cleaning up any damage caused by the bot" (see Wikidata:Bots#Bot_accounts)? --- Jura 15:56, 14 September 2020 (UTC)
@Jura1: Existing issues are being fixed (please point them out). I do not expect issues from this task.--GZWDer (talk) 16:02, 14 September 2020 (UTC)
Can you do a sum up of recently raised issues and provide ways we can check that they are fixed? You can't just open a new bot request and expect people re-repeat every problem you are meant to fix every time. --- Jura 16:04, 14 September 2020 (UTC)
@Jura1: Inspired by User:Research Bot/issues (another bot making a large number of issues), I have created User:GZWDer/issues. Feel free to expand it.--GZWDer (talk) 17:17, 14 September 2020 (UTC)
@Jura1: Did you identify any other issues?--GZWDer (talk) 13:04, 15 September 2020 (UTC)
@Jura1: Do you have any concern?--GZWDer (talk) 11:21, 17 September 2020 (UTC)
Why is User_talk:GZWDer/2019#Prefixes_in_labels still not fixed? It was discussed at [2] just recently, and yet you still fixed just one type out of thousands of problematic labels; check Q75645721, Q76226338, Q75911351. You probably repaired less than the other people who helped you.
Also, the Peerage import led to the addition of information about countless minors and other non-notable persons. As even a supporter of that import brought up, there is no consensus for such publications on Wikidata. These still need to be selected and proposed for deletion.
As you kept running this bot without approval, I think it's better blocked indefinitely. --- Jura 06:51, 20 September 2020 (UTC)
  •   Oppose (Repeating what I said in another RFP) This user has no respect for the infrastructure's capacity in any way; these accounts, along with two others, have been making Wikidata basically unusable (phab:T242081) for months now. I think all of this user's other approvals should be revoked, not to add more on top. (Emphasis: This edit is done in my volunteer capacity) Amir (talk) 06:55, 20 September 2020 (UTC)
Comment: I think that would go too far. But I have thought for some time now that community regulation of bot editing should be put on a more organised footing. And I say this as someone who makes many runs of automated edits (channelled through QuickStatements). We need better definitions of good practice, and clearer enforcement.
Currently, we try to deal with ramifying issues and loose specifications with threaded discussions, spread over many pages. The whole business needs to be taken in hand. Structure is required, so that the community can manage the bots and the place is not simply an adventure playground for them. Charles Matthews (talk) 07:06, 20 September 2020 (UTC)
  • Obviously, everybody makes errors or might overlook some aspects once in a while, but most other operators are fairly reliable and try to clean up behind them. --- Jura 07:13, 20 September 2020 (UTC)
I don't know why you say that. Systematic problems with constraint violation is an area where major bots simply ignore the bot policy and good practice. Charles Matthews (talk) 07:40, 20 September 2020 (UTC)
do you have a sample? --- Jura 07:43, 20 September 2020 (UTC)
I work on cleaning up Wikidata:Database reports/Constraint violations/P486. If you graphed the "Unique value" violations over time (first section), you would see that they climbed gradually to over 3.1K. This was largely the work of one bot, whose owner was ignoring the issue. I had those edits, which were over-writing corrections, stopped in mid-2019. No bot corrections were made subsequently: I remove the violations by hand, and they are down to 40% of the peak. There are other properties where similar problems continue, to this day. Charles Matthews (talk) 08:02, 20 September 2020 (UTC)
If you don't think the operator's response on User talk:ProteinBoxBot is adequate, I'd ask for a block. It's not ok that it overwrites your edits. --- Jura 08:27, 20 September 2020 (UTC)
Well, I wouldn't. I discussed the matter at WikidataCon in Berlin, as a dispute that needed to be resolved. I came to an understanding, face-to-face, and that was the pragmatic thing to do. That is really my point: no principles were documented, no fixes agreed; the whole thing was done with bare hands. Since I have considerable dispute resolution experience on enWP, I could see that was the way to go. There is no formal dispute resolution here on Wikidata, and the problems are complex. There is a two-dimensional space, one dimension being the range of issues, and the other the fixes. While informal dispute resolution is better in at least 90% of cases, the piecemeal approach and lack of documentation is not OK, and something should be done about it. We are talking about the difference between 2015, when people were grateful to have bot operators working away, and 2020, when Amir can talk as above, which is an informed judgement. I don't think reducing the "fix" dimension to blocks and bans is adequate, though: that is my Arbitration Committee experience talking. Charles Matthews (talk) 08:43, 20 September 2020 (UTC)
If you are happy with the outcome, why bring it up here? Either the bot operates as it should or it doesn't. --- Jura 08:52, 20 September 2020 (UTC)
I didn't say I was happy. You did ask for a sample. I'm coming from a direction that sees more nuance, more human factors. What is said in Wikidata:Bots is "Monitor constraint violation reports for possible errors generated or propagated by your bot", which implies self-regulation. I think, having dealt with GZWDer also in a major dispute, that language is too weak, and hard to enforce. Charles Matthews (talk) 09:07, 20 September 2020 (UTC)
I haven't read that bot's talk page in detail, but if it overwrites other editors' contributions that fix things, this is a major problem that has nothing to do with constraint violations. On some wikis, they end up blocking the operator over such conduct. --- Jura 09:16, 20 September 2020 (UTC)
Well indeed. And if such code is still in use, it is because of inertia in replacing older, Python-based libraries, I would guess, when there are certainly better solutions available. Which is a desirable change. That issue is at least in part about implementing change, and overcoming reluctance to change code whose development costs should now have been fully depreciated. In the case of ProteinBoxBot, there is contract work done here, but not properly declared here as it should be under the Wikimedia general terms of use (IMO). As far as I'm concerned this is all a can of worms. The legacy code issue clearly does apply to GZWDer, too. When I talk about the inadequacy of a piecemeal approach, these are some of the considerations I have in mind. An on/off switch for bot editing really is crude if we want to get to the root of things. We may see things very differently, but this is what is on my mind when I argue for more "structure". Charles Matthews (talk) 09:33, 20 September 2020 (UTC)
WikidataIntegrator (as also used by ProteinBoxBot) has plenty of issues and bugs that I applied more than one dozen local fixes (not pulled to upstream as some are just hacks or task-specific), but removing existing statements is a fundamental problem. In the long-term future I plan to get rid of it completely, but I do not know when I can work for an alternative.--GZWDer (talk) 16:03, 20 September 2020 (UTC)
Maybe you should look at what Magnus Manske has been doing with Rust for the past 18 months. Charles Matthews (talk) 19:09, 20 September 2020 (UTC)
@Charles Matthews: Magnus's Rust bot still has many serious issues, like Topic:V2fzk650ojg2n6l1 and [3]. Currently Magnus has not brought it to a usable state. The code cannot be used without substantial fixes.--GZWDer (talk) 21:05, 20 September 2020 (UTC)
What I said about inertia. Charles Matthews (talk) 04:26, 21 September 2020 (UTC)
@Ladsgroup: I said this task will not run with more than 60 edits every minute. Do you still oppose?--GZWDer (talk) 15:43, 20 September 2020 (UTC)
@Ladsgroup: Do you have any comment on discussions above?--GZWDer (talk) 22:00, 21 September 2020 (UTC)
  •   Oppose What's the point in adding that many Russian Wikinews articles? Do they really need to be imported? Is there any chance any of them will ever need to be linked?--Hjart (talk) 07:20, 20 September 2020 (UTC)
    • There is so much more to Wikidata than only linking to other languages that I do not know where to start answering your question. Edoderoo (talk) 07:39, 20 September 2020 (UTC)
      • So that concern is not valid. @Hjart:.--GZWDer (talk) 15:36, 21 September 2020 (UTC)
  •   Oppose do we really need more items that are instances of Wikinews article? --Sabas88 (talk) 07:01, 23 September 2020 (UTC)
    • Providing metadata is just one of many purposes of Wikidata. Ideally, every Wikimedia article (other than Wiktionary) should be expected to have an item. @Sabas88:--GZWDer (talk) 18:41, 23 September 2020 (UTC)
@Ladsgroup, Hjart, Sabas88: Do you have any further comments?--GZWDer (talk) 02:08, 29 September 2020 (UTC)
You have pinged me three times and posted on my talk page as well. Given that WMDE is going to remove noratelimit from bots, your bot hopefully won't cause more issues, but to me you have lost your good standing with regard to respecting the infrastructure's capacity. Amir (talk) 18:52, 10 October 2020 (UTC)

RegularBot 2Edit

RegularBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: GZWDer (talkcontribslogs)

Task/s: Mass creation of items from Wikimedia articles and categories.

Code: Using a modified version of newitem.py

Function details: I intend to move all mass sitelink import features to this bot. Feel free to raise your concerns. --GZWDer (talk) 07:28, 8 August 2020 (UTC) @Tagishsimon, SCIdude, Hjart, Jheald, Edoderoo, Animalparty: @Charles Matthews, Voyagerim, Sabas88, Jean-Frédéric, Ymblanter: Should we import unconnected pages from Wikipedia at all?

  • Previously this backlog was cleaned up annually or semi-annually
  • In my opinion, even if importing new items will result in duplicates, not importing them at all defeats the purpose of Wikidata
    • Not importing results in a large and ever-growing backlog, including many hidden duplicates, with few people fixing them (such as the cebwiki one)
    • Importing them at least allows users to find them using various tools (especially once the item is improved)
    • Some wikis have people cleaning up unconnected pages, but many wikis do not
  • Alternatively, we only import pages of a certain age (the default of newitem.py is 14 days since creation and 7 days since the last edit; there are bots importing from nlwiki and cswiki using such settings, but I use a 1/0 setting); a minimal sketch of this age rule is included after this post

--GZWDer (talk) 07:46, 8 August 2020 (UTC)
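A minimal sketch of the age rule under discussion (plain Python; the function and parameter names are illustrative assumptions, not the actual newitem.py code):

from datetime import datetime, timedelta

def should_create_item(page_created, last_edited, min_page_age_days=14, min_last_edit_days=7, now=None):
    # Return True if an unconnected page is old and stable enough for a bot
    # to create a Wikidata item for it (the "14/7" rule discussed above).
    now = now or datetime.utcnow()
    old_enough = now - page_created >= timedelta(days=min_page_age_days)
    stable_enough = now - last_edited >= timedelta(days=min_last_edit_days)
    return old_enough and stable_enough

# The 1/0 setting proposed above creates items almost immediately:
reference = datetime(2020, 8, 10)
created, edited = datetime(2020, 8, 1), datetime(2020, 8, 7)
print(should_create_item(created, edited, 14, 7, now=reference))  # False
print(should_create_item(created, edited, 1, 0, now=reference))   # True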

DiscussionEdit

  •   Support if you use the default 14/7 - I'm editing such items right now. However, it would help if you could make a quick page that lists tools to find them. I used PetScan for the titles and a WD dump, but this is out of reach for most people. --SCIdude (talk) 07:59, 8 August 2020 (UTC)
    • Note I will use different settings for different wikis, and I don't think 14/7 is the right solution (especially for actively edited pages). For most wikis I suggest 1/0 with a template skip list, unless there is someone actively cleaning up unconnected pages who suggests a different setting (in many wikis there is not).--GZWDer (talk) 08:38, 8 August 2020 (UTC)
      • Before you import unconnected pages and especially if you do 1/0 you should definitely talk with affected communities. You should not just import them without notifying anyone. --Hjart (talk) 08:59, 8 August 2020 (UTC)
  •   Oppose without much better efforts to identify matches to existing items, and (if new items must be created) to much more comprehensively import information to new items, to make them properly identifiable from their statements. Also support delay to see whether new items get moved, merged, added to, or deleted. And above all, please can we encourage creators of new articles to link their own new articles, rather than assume that a bot (might) come and do it for them. Oppose any bot doing this, unless there is such an information campaign on major wikis. Jheald (talk) 09:05, 8 August 2020 (UTC)
    • @Jheald: "encourage creators of new articles to link their own new articles" - but as long as there are users not doing them, there will always be backlog of unconnected pages. Moreover, Many tools (PetScan, HarvestTemplate, projectmerge, etc.) require an existing item to work and thus for the best effect items should be created beforehand.--GZWDer (talk) 09:15, 8 August 2020 (UTC)


  •   Oppose Duplicates will only be found if there are many properties filled in on these items. Creating empty duplicates is moving the problem from area A to area B, and you can even debate how big the issue is if some pages from the Zulu wiki (or any other low-volume wiki) do not get connected to any Wikidata item within a week. This should ALSO be discussed with the communities of these wikis. Edoderoo (talk) 09:18, 8 August 2020 (UTC)
    • @Edoderoo: This also means that if a page is never connected, a duplicate is never found. Creating an item allows the use of tools like projectmerge or KrBot's number-based merging, or, more crudely, a search; in addition, when more data are added, duplicates will surface (User_talk:Ghuron/Archives/2018#Extra_US_presidents). In most wikis, there are no people taking care of unconnected pages; even in the most active ones, like nlwiki, a bot doing so is still required ([4]).--GZWDer (talk) 09:25, 8 August 2020 (UTC)
      • For me it is NOT an issue if page XYZ on the Zulu wiki is not connected to Wikidata or to any other wiki. And again, you didn't answer the part about creating EMPTY items not being any help in finding duplicates. So how many properties will your bot add, and how will it define those properties? I know from personal experience that a bot adding properties leads to other HUGE issues and tremendous additional manual double-checking work. Edoderoo (talk) 09:38, 8 August 2020 (UTC)
      • Agree. Duplicates do not exist in the meta-space of WD plus WP. It is either a WD duplicate or not. Before you create the item there is no WD duplicate. --SCIdude (talk) 09:42, 8 August 2020 (UTC)
        • @Edoderoo: In my opinion a hidden duplicate is still a duplicate. This task only covers creating new items; adding statements is another thing. (User:NoclaimsBot is a good idea, but we need to generalize it - it currently only works in a few wikis and there is no workflow for suggesting a new template or category to add.) @SCIdude: This basically means we will have an infinitely growing number of unconnected pages until someone takes care of them (unlikely in smaller wikis), which defeats the purpose of Wikidata as a centralized interlanguage-link store.--GZWDer (talk) 09:49, 8 August 2020 (UTC)
          • Creating blank shitty items that have no value at all can be done by anyone using PetScan. But if your bot cannot add any value, I am absolutely against it. You had better put your energy into creating value instead of shitty volume. Edoderoo (talk) 10:09, 8 August 2020 (UTC)
            • Not every page will be handled by a human in due time (see the BotMultichillT example).--GZWDer (talk) 10:21, 8 August 2020 (UTC)
              • @GZWDer: regarding NoclaimsBot: more wikis can be added, it's easy, and anyone can add other templates. Categories produced too many false positives so I didn't implement that part. Multichill (talk) 13:18, 8 August 2020 (UTC)
  • @Mike Peel: I think PiBot is approved for some activities in this kind of area. Any thoughts on best practice, minimum requirements, what other bots are active, and how to function appropriately in this area? Jheald (talk) 09:35, 8 August 2020 (UTC)
    • I'm generally in support of this bot; I come across quite a lot of items that it has usefully created. I think waiting for 2 weeks after page creation is a good idea if the items are going to be blank - pi bot creates them within an hour for humans, but it's also matching pages with existing ones wherever possible, plus it's importing basic data about the humans from enwp at the same time. It would be nice if there was a way of finding matches between unconnected pages and existing items, to avoid duplicates, but this is tricky. I have some scripts that search Wikidata for the page title to find matches, skipping items that already have a sitelink, but they then need a human to check through the matches to see if they are correct. Thanks. Mike Peel (talk) 10:03, 8 August 2020 (UTC)
      • Yeah, though Pi bot is only run on some wikis.--GZWDer (talk) 10:21, 8 August 2020 (UTC)
  •   Oppose for wikis that mass-generate articles automatically like cebwiki (not sure if there are others?). Support for other wikis, with the current age restrictions (14/7). − Pintoch (talk) 10:10, 8 August 2020 (UTC)
    • Another point is that if items are not created at all, few people will notice the existence of the local pages. Many people have imported locations from various sources, with many duplicates of cebwiki items. If duplicates (specifically hidden ones) are going to happen, let them happen as early as possible.--GZWDer (talk) 10:21, 8 August 2020 (UTC)
  • Oppose until a satisfactory explanation is offered about why it is better than the failed proposal Wikidata:Requests for permissions/Bot/GZWDer (flood) 2. – The preceding unsigned comment was added by Jc3s5h (talk • contribs) at 10:53, 8 August 2020 (UTC).
    • Duplicates will exist, but creating items will make them visible and allow people to actively work on them. In previous years there were significant imports from other sources without regard to possible duplicates.--GZWDer (talk) 11:17, 8 August 2020 (UTC)
  • I wrote newitem.py to make sure the backlog doesn't get too large. You want to give users the time to connect articles to existing items or create new items with some statements. If that doesn't happen in time, the bot comes along to create an empty item so that we don't get an ever-growing backlog. The current settings for the Dutch Wikipedia are: the article has to be at least 49 days old and the last edit has to be at least 35 days ago. Running this bot with creation set to 1 day and last edit set to 0 days is ridiculous. Why this rush?   Oppose with those crazy settings. Multichill (talk) 13:18, 8 August 2020 (UTC)
    • @Multichill: What is the purpose of a last-edit threshold?--GZWDer (talk) 14:20, 8 August 2020 (UTC)
      • If you're going to run this on many wikis I would take a conservative approach and set creation around 30 days and last edit around 45 days. Multichill (talk) 14:28, 8 August 2020 (UTC)
        • @Multichill: I still do not get the point of the last-edit threshold. Many local wiki users do not care about Wikidata, and I can find many articles in DYK, GA or FA without a Wikidata item connected.--GZWDer (talk) 14:33, 8 August 2020 (UTC)
          • To increase the chance that the article is in a stable state. For example, you don't want to create items for articles that are nominated for deletion. Multichill (talk) 14:39, 8 August 2020 (UTC)
            • @Multichill: FA, GA and DYK articles are usually not "stable" in this sense, nor are articles on current hot topics. Instead, the bot skips pages containing certain defined templates.--GZWDer (talk) 15:55, 8 August 2020 (UTC)
  •   Oppose in the same sense as Multichill. I have a page with some tools I use successfully to find matches of particular interest to me. If a duplicate item is created, rather than an available match being used, then this PetScan query, related to s:Dictionary of National Biography, will not pick it up (within its scope) until it is merged here, which I know can take a year. I use this PetScan query to patrol for likely candidates, and this works well. When an item for a person needs to be created, I use https://mix-n-match.toolforge.org/ first, to see whether I can create an item with some identifiers on it, as a start. These workflows have worked fine for me. They have been disrupted by the short lead time: there may be some trade-off here, but I will not be sure about the effect of creating the needing-to-be-merged-here items until they start appearing in the medium term.
Where, please, is the urgency of a change in the status quo? Waiting a few weeks is sensible, in the general case. There needs to be a clear explanation of what is currently broken and requiring to be fixed. Charles Matthews (talk) 14:08, 8 August 2020 (UTC)
  • @Charles Matthews: Currently there is no bot at all to clean up the backlog in most wikis. If possible, I can run the enwiki one with the default settings, which will at least significantly reduce the backlog. For other wikis, alternative settings may be used.--GZWDer (talk) 14:22, 8 August 2020 (UTC)
Three points here:
  1. As is usual in discussions with you, you do not answer the question directly, but start another line of discussion. While this may be politeness in some ways, it does not correspond to the needs of wiki culture.
  2. When I read this diff from your user talk page, I thought that you simply don't understand the merging issue for duplicates. Technically, merging is fairly easy on Wikidata. But to do it responsibly, particularly for human (Q5) items but also in other areas such as medical terms, is hard work.
  3. When you ask for a bot permission that gives you many options that you might use, I'm inclined to refuse. I think you should specify what you will do, not talk about what you might do. If you define some backlogs you want to clear, and say how you might clear them, that might be OK.
Charles Matthews (talk) 14:48, 8 August 2020 (UTC)
@Charles Matthews: For this task only, it has only one job - fully automatically creating new items from Wikimedia pages. For point #2: it is not a good thing either to have many hidden duplicates in an infinitely growing backlog of unconnected pages that nobody is cleaning up. Things are even worse when many new items are created, which increases the number of hidden duplicates. (Mix'n'Match can only find pages in one wiki.) --GZWDer (talk) 15:21, 8 August 2020 (UTC)
Well, I gave a link to a serious dispute on your user talk page. This dispute actually needs to be resolved. It changes the situation. Let me explain: I do not always agree with the idea that bot tasks should be completely specified. It is usually better if the bot operator agrees to stay within the bot policy. The dispute is about good practice in the creation of duplicates here, which is not currently mentioned in the bot policy.
But the way you are arguing is likely to have the result that not creating too many duplicates is added to the bot policy. Because many people disagree with you. In the end, disputes are resolved by addressing the issues.
A possible solution here is to divide up the language codes into groups, and try to get some agreement on how long to wait for each group of codes. If you can give details of wikis that "nobody is cleaning up", probably that could be a basis for discussion. If you are really saying this is a "long tail" problem, where there is more in the "infinitely growing backlog" of "hidden duplicates", as you call it, than we all know, then we do need to understand how fat the tail is. If there are 250 out of ~300 wikipedias involved, generally the smallest, then maybe it is comprehensible as an issue. The ceb wikipedia is clearly an edge case, and we should exclude it at present. Charles Matthews (talk) 16:14, 8 August 2020 (UTC)
@Charles Matthews: See User:GZWDer_(flood)/Automatic_creation_schedule, currently involving 151 different wikis (including all Wikipedias with more than 10000 articles except four). See also here for the number of unconnected pages older than two weeks per wiki and here for history of backlog in enwiki.
Previously, when PetScan was used, pages with a title the same as the label of an existing item were skipped by default. However, I don't think this is good practice, as the list of skipped pages itself grows indefinitely. So I decided to import all of them; duplicates can be found once items are created (especially when more statements are added).--GZWDer (talk) 16:45, 8 August 2020 (UTC)
It seems you are missing the point of what I am saying, and also the point that almost everyone here is opposing. Charles Matthews (talk) 18:41, 8 August 2020 (UTC)
@Charles Matthews: Creating items from unconnected pages will result in duplicates unless they are checked one by one, which is not possible in an automatic process. The duplication does not disappear just because the items are not created. So let it happen, which will surface the work that needs to be done, unless significantly many people are doing it another way (i.e. cleaning up unconnected pages manually).--GZWDer (talk) 18:47, 8 August 2020 (UTC)
To be clear, you need to engage here with criticism. If your attitude is "all or nothing", then clearly at this time you get nothing. Charles Matthews (talk) 19:03, 8 August 2020 (UTC)
@Charles Matthews: Do you agree to automatically create new items for articles older than a threshold (14 days by default, but perhaps a bit longer for wikis with users actively handling unconnected pages)? Duplicates will happen regardless (there is no automatic way to prevent them), but unconnected pages are likely to be abandoned (i.e. not actively handled) if they are not handled within a specific timeframe. In other words, we currently have two workflows (handle unconnected pages before any automatic import, and handle them after import), and this proposes a cut-off point at which the cost of creating items late (i.e. being unable to use tools that require existing items, and possible duplicates with recently created items) outweighs the gain of waiting (i.e. fewer duplicates at creation, and time for human handling).--GZWDer (talk) 13:42, 9 August 2020 (UTC)
@GZWDer: I can agree to a two-phase system, in which (phase I) newly-created wikipedia items are left for a period, and then (phase II) automatic creation of a Wikidata item takes place. In all your suggestions, it seems to me, you make phase I too short. I agree that there is a kind of trade-off here, and that we can accept some duplicates caused in phase II. That doesn't mean that phase II of automated creation has to be blind to the duplication issue. I don't think it is a good idea to apply the same workflow to all wikipedias. (And I would say, as a Wikisource editor, there is much work to do there, also.) Charles Matthews (talk) 13:59, 9 August 2020 (UTC)
@Charles Matthews: Originally I also wanted to cover the Wikisources; as more controversy is expected (and was met in the past), the schedule currently only includes the Chinese one. So for phase II there are some options:
  1. Create items for older articles en masse, as I originally proposed.
  2. Increase the interval between creations, e.g. create items once a year, which is what I did between 2014 and 2020 - this does not solve all issues.
  3. Not creating the items at all. This will result in an infinitely growing backlog, which I am strongly worried about (even for cebwiki). In the future, other users will also create items covering the same topics without noticing the local articles.
  4. Manually checking each article - this requires language skills and is not always scalable
  5. Create them subset by subset (I imported more than 40000 individual English Wikisource articles)
  6. And other ideas?--GZWDer (talk) 14:10, 9 August 2020 (UTC)

@GZWDer: In the big picture, this really is not a simple issue. Here is a table for what seems to be needed.

Plan for | Phase I | Phase II | Comments
A: smaller wikipedias | | |
B: larger wikipedias | | |
C: other wikis | | |

There is a comments column because: firstly, there are points about scope (deciding about "smaller", "larger" and ceb); secondly, there are issues about item creation in Phase II. Charles Matthews (talk) 08:03, 10 August 2020 (UTC)

  Oppose Looks to me like you want permission to run a new bot which basically just does the same thing your old bot did. It's clear to me that you will not get a go for something like that. If you want permission for a new bot, you will need to build something that addresses our concerns substantially better than your old one.--Hjart (talk) 16:12, 8 August 2020 (UTC)

This is what is being planned:

Plan for | Setting | Comments
Group 1: All Wikipedias other than those listed below | 14/0 (14/7 is also OK but not my preference) | Should we split this into larger and smaller wikis with different settings?
Group 2: Some Wikipedias such as dawiki | 21/0 (or 21/7, 30/7) | Wikis with active users handling Wikidata. Alternatively, each wiki may use a custom setting.
Group 3: zhwiki, zhwikisource, all Wikinews | 1/0 | If agreed, any Wikidata actions (i.e. improvement and merging) can happen after item creation. The client sitelink widget functions regardless of whether an item exists.
nlwiki, cswiki | Not to be done |
cebwiki | Planned mass import regardless of duplicates, then treated as Group 1 | Leaving pages unconnected indefinitely will result in more and more duplicates
arzwiki | Currently skipped, but eventually to be done | There are currently many articles created based on Wikidata but not connected to Wikidata; this is being fixed
Wikisource (other than zh): non-subpages in the main namespace and pages in the Author namespace | Treated as Group 1 (by default) or 2 |
Wikisource (other than zh): subpages | Manual batch import on a case-by-case basis |
  • Wikisource (other than zh) is not planned initially
  • "Pages" includes articles and categories, but non-Wikipedia categories are not planned initially

--GZWDer (talk) 08:39, 10 August 2020 (UTC)

  •   Oppose until previous problems are fixed, e.g. User_talk:GZWDer#Mass_creation_of_items_without_labels. --- Jura 12:15, 11 August 2020 (UTC)
  •   Oppose if the behaviour of the flood bot isn't addressed. Could you please add some heuristics before creating the item, for example checking for similarly named items, and if there's already something with 50% similarity, leaving it in a list to review manually? --Sabas88 (talk) 08:56, 14 August 2020 (UTC)
    • @Sabas88: I am afraid that this will still result in a backlog. For some time, while the Wikidata item creator tool was functional (it has since been deprecated in favor of PetScan), it checked for and skipped any page whose title was the same as the label of an existing item. After several runs, the list of skipped pages became longer and longer. I do not think any human checking beforehand is scalable. Anyway, new pages are held for several days, and users may create items or link them to existing ones in that time. A page is unlikely to be taken care of once a significant period has passed since its creation.--GZWDer (talk) 23:17, 14 August 2020 (UTC)
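For illustration, a rough sketch of the kind of heuristic Sabas88 suggests above, assuming the standard wbsearchentities API and a simple string-similarity ratio (the 0.5 threshold, the function name and the review-list handling are assumptions, not an agreed implementation):

import difflib
import requests

API = "https://www.wikidata.org/w/api.php"

def similar_existing_items(title, lang, threshold=0.5, limit=10):
    # Return (QID, label, similarity) for existing items whose label resembles
    # the page title; such pages could be queued for manual review instead of
    # getting a brand-new item.
    params = {"action": "wbsearchentities", "search": title, "language": lang,
              "type": "item", "limit": limit, "format": "json"}
    hits = requests.get(API, params=params).json().get("search", [])
    matches = []
    for hit in hits:
        label = hit.get("label", "")
        ratio = difflib.SequenceMatcher(None, title.lower(), label.lower()).ratio()
        if ratio >= threshold:
            matches.append((hit["id"], label, round(ratio, 2)))
    return matches

# Pages with any match above the threshold would be skipped and listed for review:
print(similar_existing_items("Douglas Adams", "en"))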
  •   Comment Is there any way we can stem this problem at the source - when a user creates a new page on a language wiki, can we get the UI to immediately try to link it to an existing wikidata item, encourage the user to select the right one? Is there maybe a phabricator task for this? This sort of bot action really can't be the correct long-term solution for this problem. I have run across many, many, maybe over a hundred, such page creations that should have been linked to an obvious existing wikidata item, and required a later item merge on Wikidata. ArthurPSmith (talk) 17:53, 18 August 2020 (UTC)
I like the idea by @ArthurPSmith: very much. @Lydia Pintscher (WMDE): what do you think? Would that be possible to implement? From my point of view, one problem is that a lot of creators of articles, categories, navigational items, templates, disambiguations, lists, commonscats, etc. are either not aware of the existence of Wikidata or forget to connect a newly created article etc. to an already existing object, or to create a new one if it does not yet exist (which leads to a lot of duplicates that have to be merged manually if this creation or connection is not done by hand but by a bot instead). An additional step after saving a newly created article etc. could present to the user a list of Wikidata objects (e.g. a list of persons with the same name; this could use a similar algorithm to the duplicate check / suggestion list in PetScan, duplicity example 1 and duplicity example 2) that might be matching, or the option to create a new one if none matches. Thanks a lot! --M2k~dewiki (talk) 22:47, 28 August 2020 (UTC)
Also ping to @Lucas Werkmeister (WMDE), Mohammed Sadat (WMDE), Lea Lacroix (WMDE): for info --M2k~dewiki (talk) 22:50, 28 August 2020 (UTC)
Also ping to @Lantus, MisterSynergy, Olaf Studt, Bahnmoeller: In addition, I think User:Pi bot, operated by @Mike Peel:, does a great job of connecting to existing objects or creating new ones where none exist for items regarding people (currently only for the English Wikipedia; until June 2020, for about one year, also for the German Wikipedia; thanks a lot to Mike! In my opinion this should be reactivated for the German Wikipedia as well). Of course, the algorithm could be improved, for example by also considering various IDs (like GND, VIAF, LCCN, IMDb, ...). The algorithm is described here: User_talk:Mike_Peel/Archive_2#Matching_existing_wikidata_objects_with_unconnected_articles.

Since this very fundamental problem of connecting articles to existing objects or creating new objects for unconnected pages (when, by whom, how to avoid duplicates, ...) for hundreds of newly created articles per day in different language versions has been discussed for years now, the above proposal by ArthurPSmith could be a solution to it. It might be combined with specialized bots like Mike's Pi bot for people (and maybe others for movies, geographic objects, lists, categories, ...).

Also see

Also for the information of @Derzno, Jean-Frédéric, Mfchris84, Giorgio Michele, Ordercrazy: another problem regarding item creation and duplicates is that there are a lot of already existing entries, e.g. for French or German monuments, churches, etc., which contain the monument ID. E.g. for Bavarian monuments there are currently 160,000 Wikidata objects. But if a user connects a newly created article to an (unconnected) commonscat for this monument (using "add other language" in the left navigation), an additional Wikidata object is created, so there is one object containing the sitelinks to the article and the commonscat and another one with the monument ID. Currently the only solution is to connect a newly created commonscat for a monument as soon as possible to the already existing Wikidata object with the monument ID; then, if a user connects an article to this commonscat, the existing Wikidata object will be used, otherwise a new one with only the two sitelinks will be created. For example, in 2020 so far about 1,000 new commonscats for Bavarian monuments have been created which have not been connected to the already existing Wikidata objects by the creators of the commonscats.

Also see:

Hello @Lantus, Olaf Studt, Bahnmoeller: I have now created these two pages:

The first one might help to find and connect unconnected articles, categories, templates, ... from de-WP to existing objects, or to create new Wikidata objects. The second one might help to enhance existing objects with IDs (using HarvestTemplates for GND, VIAF, LCCN/LCAuth, IMDb, ...) or other properties (e.g. using PetScan based on categories). Parts of the functionality of these two pages might sooner or later be implemented in (specialized) bots. --M2k~dewiki (talk) 01:23, 29 August 2020 (UTC)

The problem with Bavaria is fairly complex and a mixture of many different issues. It started with the bot transfer in 2017, followed by certain other root causes. In the end the datasets are in very bad shape and they are a nightmare to clean up. Day by day I find new surprises. Currently I'm working hard to get a couple of these issues fixed. On top of these bot issues we need to find a way to stop people, in the same way as was worked out on the German Wikipedia: some folks keep pulling things together again and again to bring them in line with articles, so the P4244 issue list keeps being filled up with duplicates, wrong usage and violations. I have no idea how we can stop this, but personally I have given up discussing with people who have no mindset for database design and definitions. Most are living in their own world. Anyhow, crying doesn't help and I'm doing my best to work down the P4244 issues. To be honest, this is a job for months and many items need to be checked manually. So I'm not happy to be loaded through the back door with new issues from a bot task. --Derzno (talk) 03:31, 29 August 2020 (UTC)
  • As I have said many times: in most wikis, there are not enough people to take care of unconnected pages. If possible, I can postpone the item creation (the plan is 14 days after article creation), but the backlog must be cleaned up eventually.--GZWDer (talk) 04:11, 29 August 2020 (UTC)
@Derzno: the problem is not only related to Bavarian monuments, but affects all cases in all languages and all language versions of Wikipedia and all sorts of object types (e.g. movies with totally different names in different languages, chemical compounds, ...) where datasets have been imported before but not connected to articles, commonscats, etc. How would a user find the right object among the 90 million existing objects (Special:Statistics)? What if a user looks for "Burgstall Humpfldumpf", does not find it and therefore creates a new object for this article/commonscat combination, while there might already exist an object "Bodendenkmal D-123-456-789" or an object for the Japanese or Russian translation? Duplicates may eventually be identified and merged through identical IDs (like GND, LCCN/LCAuth, VIAF, IAAF, IMDb, monument IDs like the Palissy ID for French monuments, the BLf-ID for Bavarian monuments, the DenkXWeb Objektnummer for Hesse state monuments, BLDAM for monuments from Brandenburg, LfDS for monuments from Saxony, P2951 for Austrian monuments, the CAS number for chemical compounds, ...). How could this matching by ID (currently there are more than 8,000 properties, a lot of them IDs) be handled on a large scale, i.e. for the hundreds of new articles created every day in several language versions of Wikipedia that need to be connected to possibly already existing objects? (A sketch of an ID-based lookup follows below this thread.) --M2k~dewiki (talk) 07:21, 29 August 2020 (UTC)
So we create these items (including duplicates) first, and someone will improve them; duplicates are discovered and merged. Originally I expected this to become the primary workflow for unconnected pages - this is why I previously ran the bot at 1/0 instead of the default 14/7. There are people who take care of new Wikipedia articles; previously my expectation was to move Wikidata handling entirely to after item creation. Using a delay is expected by many users, but the work (i.e. clearing unconnected pages) should eventually be done. In other words, I give people some time to make the Wikidata connection, and after the time limit, new items are created automatically.--GZWDer (talk) 09:05, 29 August 2020 (UTC)
Also see Wikidata:Contact_the_development_team#Connecting_newly_created_articles_to_existing_objects_resp._creating_new_object_-_additional_step_when_creating_articles,_categories,_etc. (difflink). --M2k~dewiki (talk)
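Regarding the ID-based matching raised above: a minimal sketch, assuming a SPARQL lookup per external ID against the public query service (the property and ID value are examples only; real use would need batching and rate limiting):

import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def items_with_external_id(prop, value):
    # Return the QIDs of items carrying the given external-ID statement,
    # e.g. prop="P227" (GND ID). An article or commonscat whose template
    # carries this ID could then be attached to the existing item instead
    # of a new one being created.
    query = f'SELECT ?item WHERE {{ ?item wdt:{prop} "{value}" . }}'
    r = requests.get(SPARQL_ENDPOINT,
                     params={"query": query, "format": "json"},
                     headers={"User-Agent": "id-matching-sketch/0.1"})
    bindings = r.json()["results"]["bindings"]
    return [b["item"]["value"].rsplit("/", 1)[-1] for b in bindings]

print(items_with_external_id("P227", "118529579"))  # example GND ID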
  •   Oppose. Moving a backlog from place A to place B makes sense only if place B has a more active community or better tools, but this does not seem to be the case. I often encounter duplicates created by this kind of bots many years ago, and at least on my home wiki some of them might have been spotted earlier if they had simply remained unconnected. --Silvonen (talk) 16:18, 12 September 2020 (UTC)
    • Without a bot of this kind, pages can be left unconnected indefinitely, which is not optimal for users of PetScan, HarvestTemplate and projectmerge, or even for users trying to find an item about the topic (they will never know about a local unconnected page, and it is almost impossible to check each of 900 wikis to see whether a topic exists). If a specific wiki does not have enough people to handle all unconnected pages, we have a reason to mass-create them (after a period). (Yes, for wikis with some active users doing so, we can postpone creation; but even nlwiki requires a bot to clean up the backlog.)--GZWDer (talk) 10:34, 13 September 2020 (UTC)
  •   Oppose. I'm sceptical in general, but actively hostile to anything run by this user ever since I found a group of hundreds of duplicate items that were so easy to connect to their originals, I managed to do it with Quickstatements. Bots currently have several orders of magnitude more capacity here than active manual editors. With that in mind, running a bot that carelessly adds wrong/duplicate items that require manual correction is wasting the more precious resource. A semi-automated workflow would be preferable, and might have an easier time attracting users if my impression is correct that creating new items is the subjectively more rewarding experience compared to correcting existing items. --Matthias Winkelmann (talk) 00:22, 13 September 2020 (UTC)
    • "new items is the subjectively more rewarding" - yes for the reason I stated above. It requires some work to clean up duplicates, but bring them to Wikidata will allow more users noticing it, especially for wikis with few users handling them locally.--GZWDer (talk) 10:34, 13 September 2020 (UTC)
@Charles Matthews: You have not commented on this plan yet.--GZWDer (talk) 10:35, 13 September 2020 (UTC)
@GZWDer: I commented on 8 August that the urgency of item creation here for newly-created articles on wikipedias is not as great as you are assuming. My view remains the same. Certainly for enWP, which most concerns me, waiting longer and adding more value to items that are created is a good idea. So I will not support a plan of this kind. Charles Matthews (talk) 10:47, 13 September 2020 (UTC)
@Charles Matthews: For wikis with active users handling unconnected pages, the bot may wait a bit longer. But a page is less likely to ever be connected if it has not been connected for a while (a trade-off must be chosen on this point), and not creating items also impedes the use of many tools (as I responded to Silvonen).--GZWDer (talk) 11:04, 13 September 2020 (UTC)
@GZWDer: Clearly, there are a number of trade-offs to consider here. But since we don't agree about those trade-offs, we are not so likely to agree on a plan. I am arguing from my actual workflow, starting with PetScan (queries on User:Charles Matthews/Petscan). I become involved in article writing, such as w:Sir James Wright, 1st Baronet, through using queries. Using those queries is positive for my work on enWS and enWP. I think Wikidata is important in integrating Wikimedia projects, so I do not oppose the principle of automated creation of items here. But I do oppose doing it too quickly. Charles Matthews (talk) 11:15, 13 September 2020 (UTC)
  • hmm Wikidata:Requests for permissions/Bot/JonHaraldSøbyWMNO-bot 2 - this is one of the reasons I proposed to mass import pages from Cebuano Wikipedia (and other wikis): others will import something similar, so importing them earlier will reduce the number of duplicates.--GZWDer (talk) 14:14, 28 September 2020 (UTC)
  • I didn't read the whole discussion, but shouldn't this be handled on the Wikipedia side? After a user saves their article, a window with a reminder to connect the article to a Wikidata item should pop up, or something similar. Eurohunter (talk) 16:15, 21 December 2020 (UTC)
  •   Oppose Frankly, I am getting a bit tired of all these one-sitelink item creations. From a Wikipedia point of view, statements should not be taken from the Wikipedias (and especially not with tools or bots that don't reuse the existing citations), and the length of the backlog does not matter at all. On a priority list on Wikipedia this backlog of unconnected pages is always going to be low down, as it should be.--Snaevar (talk) 18:19, 21 December 2020 (UTC)
@Eurohunter: also see meta:Community Wishlist Survey 2021/Wikidata/Creation of new objects resp. connecting to existing objects while avoiding duplicates. --M2k~dewiki (talk) 18:28, 21 December 2020 (UTC)
@M2k~dewiki: Just wanted to vote but it ended. Eurohunter (talk) 20:05, 21 December 2020 (UTC)
  • In case it wasn't clear earlier, I   Support this bot request. Duplicates are an issue (I frequently merge items created by this bot), so I think it is best if the bot waits for a few days before creating the item, but not running it creates a backlog of unconnected items that gets in the way of matching new items. Pi bot also now imports various statements (such as commons category links and descriptions, hopefully coordinates soon) for non-humans, but only if the item already exists - and again, not having the Wikidata item creates backlogs for those tasks. @GZWDer: I know you don't like it, but could you adopt the '14/7' rule please, and clear the backlog? Thanks. Mike Peel (talk) 19:11, 28 December 2020 (UTC)
  • So:
Plan for | Setting | Comments
Default: All Wikipedias, and Wikisource non-subpages | 14/0 or 14/7 | -
Some specific wikis (please comment below) | TBD |
All Wikinews | 1/0 | If approved, this will succeed Wikidata:Requests for permissions/Bot/RegularBot 3
nlwiki, cswiki | Not to be done |
cebwiki | Items will be created with at least one identifier (or source) other than Geonames. The actual code is to be developed. |
arzwiki | Currently skipped | Will be re-evaluated if bot creation of articles is stopped
  • @Jheald, Edoderoo, Pintoch, Jc3s5h, Charles Matthews, Hjart: Please comment, if you want a different configuration, either in general, or in a specific wiki.--GZWDer (talk) 19:26, 28 December 2020 (UTC)
    @GZWDer: Being pragmatic (what has a chance of being approved?), I suggest that you just look at Wikipedias for this task, go with 14/7 with a list of excluded Wikipedias, and leave the rest for other bot tasks. Thanks. Mike Peel (talk) 19:33, 28 December 2020 (UTC)
    @GZWDer: I've said it before, but since you don't seem to understand it, I guess it needs to be said again: you need to actively ask every single Wikipedia for permission before running any bots on them. The Danish Wikipedia, for example, has had people handling unconnected pages for years, and I guess many other Wikipedias have too. At the very least, don't touch dawiki. Thanks --Hjart (talk) 22:27, 28 December 2020 (UTC)
  • @Hjart: Does your community run a bot that cleans up the very old backlog? If not, I will run it on 30-day-old pages. P.S. You did not respond to my comment at Wikidata:Requests_for_permissions/Bot/RegularBot 3.--GZWDer (talk) 22:31, 28 December 2020 (UTC)
    @GZWDer: Yes, we do have such a bot. And from watching some German activity, I guess they do too. Again, please ask every single community before doing anything to their backlogs. And don't touch dawiki at all. --Hjart (talk) 22:38, 28 December 2020 (UTC)
    OK. --GZWDer (talk) 22:39, 28 December 2020 (UTC)
  • I still oppose this, as I am not confident the operator can respect the views of the community on this. General lack of trust in them given the history in this area. If this task is important, someone else will step in to do it, no one is (or should be) irreplaceable. − Pintoch (talk) 21:43, 30 December 2020 (UTC)
  • There clearly isn't yet a meeting of minds here. Charles Matthews (talk) 11:10, 2 January 2021 (UTC)

RegularBotEdit

RegularBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: GZWDer (talkcontribslogs)

Task/s: Doing fully-automatic and periodic tasks (see below)

Code: See /data/project/largedatasetbot-regular in Toolforge

Function details:

Current tasks are:

Potential future tasks:

Before these tasks are moved to a dedicated bot account, they are performed via the GZWDer (flood) account. They need to be moved to a new account to cope with phab:T258354. (The request is not ready to be approved until we have decided how to use the user group.) --GZWDer (talk) 13:24, 4 August 2020 (UTC)

  Support Definitely preferable to run this under a bot account, thanks! ArthurPSmith (talk) 18:37, 4 August 2020 (UTC)
  •   Oppose until problems from previous runs of the operator's bots are fixed, e.g. User_talk:GZWDer#Mass_creation_of_items_without_labels. --- Jura 12:16, 11 August 2020 (UTC)
    • @GZWDer: I don't think it's adequate that the response to problems raised on the operator's talk page is limited to noting that the same tool won't be used again and that another bot will clean it up (which hasn't happened in four months). Can you present a plan to investigate previous problems with your bots and a way to track their resolution? I don't expect you to fix them all yourself (you can place requests on Wikidata:Bot_requests), but similar problems need to be identified and you need to ensure they don't recur. It's not ok that you bork items for Wikisource (leaving it to others to clean up) and then years later you do the same again. --- Jura 05:59, 12 August 2020 (UTC)
      • @Jura1: Do you find any examples that are not fixed?--GZWDer (talk) 14:40, 12 August 2020 (UTC)
        • I had found 2000. I don't think it's for Matej or myself to fix or to check whether identified problems are fixed or not. It's really up to you to do that. Can you do that and come back? --- Jura 06:16, 13 August 2020 (UTC)
          • @Jura1: For example?--GZWDer (talk) 08:20, 13 August 2020 (UTC)
            • Sample for what? --- Jura 09:01, 13 August 2020 (UTC)
              • @Jura1: Do you find any items that have such issue and are not fixed?--GZWDer (talk) 09:04, 13 August 2020 (UTC)
                • The comment linked above pointed to 2000 of them. --- Jura 09:12, 13 August 2020 (UTC)
                • @Jura1: But they are fixed.--GZWDer (talk) 09:18, 13 August 2020 (UTC)
                  • The question for you is whether your bot(s)/account(s) created more similarly defective items and whether they have all been fixed since. Further, whether all other defects raised to you have been followed up. --- Jura 09:22, 13 August 2020 (UTC)
                    • I don't think so. Feel free to point to an example if it is not the case.--GZWDer (talk) 09:24, 13 August 2020 (UTC)
                      • The problem with the label of Q75877437 raised in 2019 is still unresolved (and probably thousands of similar ones). --- Jura 09:30, 13 August 2020 (UTC)
                        • @Jura1: See https://w.wiki/ZaU --GZWDer (talk) 23:37, 14 August 2020 (UTC)
                        • Ty. When I wrote that comment in 2019 I thought it was helpful to include an entire regex of cases that needed fixing. I fixed some, others fixed more, but there are still some left. Maybe we should add it to Wikidata:Bot_requests. --- Jura 17:02, 15 August 2020 (UTC)
  • Comment If this request is approved (on which I am giving no opinion), it should only be on the firm condition that the bot creates no new items under this task -- i.e. any job that might involve creating items would need to be submitted as a new bot request, for separate discussion, and should not go ahead under this approval. Any such new bot request would need to set out in detail for consideration what actions were being proposed to avoid the creation of duplicates, and how the new items would be properly populated with enough statements to make them well identifiable. Given GZWDer's previous tendency to be rather "relaxed" on both these scores in the past (at least in the eyes of many), I believe this limitation, and requirement in future of specific approval before any such tasks, to be necessary. Jheald (talk) 13:59, 12 August 2020 (UTC)
    • This task will only create items from Prime Page; I hope there will be no duplicates.--GZWDer (talk) 14:39, 12 August 2020 (UTC)

OrcbotEdit

Orcbot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: EvaSeidlmayer (talkcontribslogs)

Task/s: The bot makes use of author-publication matches from the ORCID database to match existing publication items and author items in Wikidata.

Code: https://github.com/EvaSeidlmayer/orcid-for-wikidata

Function details: The bot aims to match author items and publication items on the basis of the ORCID database. Only author items and publication items that already exist in Wikidata are matched.

The 2019 ORCID data release contains eleven archive files. For the first archive file we were able to detect:

  • 457K Wikidata publication-items (3.8M publications in total)
  • 425K publication-items do not have any author-item registered
  • 32K publications are identified in Wikidata with registered authors
  • of those 32K publication-items:
    • 3.7K author items listed in Wikidata are correctly allocated to their publication items (11.7%)
    • 4.2K author items listed in Wikidata are not yet allocated to publication items (24.6%)
    • The other authors are not registered in Wikidata yet.

These are the numbers for only the *first* of *eleven* ORCID files. It would be great to introduce the matching of authors to publications on the basis of ORCID.

  • @EvaSeidlmayer: Thanks for working on this. One thing I don't see in your github README or your statement here is how you plan to match up the authors with the existing author name string (P2093) entries for these articles - or is the plan just to add the author (P50) entries with no qualifiers and not removing the existing name strings? Matching name strings is quite tricky, especially as given names are often abbreviated, some parts of names may be left out, joined together in different ways, etc. Not to mention name changes... And there can be two authors on the same paper with the same surname, or partially matching surnames. These issues have tripped up a number of automated approaches here in the past. There are also issues with duplicate or otherwise erroneous ORCID records, which have also tripped things up - for example there have been some major imports of this sort of author data from Europe PMC which, when there are duplicate ORCID's, lists both, resulting in an offset for all the author numbers (series ordinal (P1545) qualifiers) after that point. Anyway, this is definitely useful, but can be harder than it seems. ArthurPSmith (talk) 17:50, 30 July 2020 (UTC)
  • @EvaSeidlmayer, ArthurPSmith:   Support. It is true that the bot task is harder than described. However, it is important to begin even if the bot is not complete. She can further develop the bot's source code later. Just a brief note: I advise Ms. Eva to create a user page for the bot on Meta. --Csisc (talk) 18:25, 30 July 2020 (UTC)
  • @Csisc: I created a user page on Meta. However, now I don't see how I can create/retrieve an API token for the bot. Is there any documentation?
Please also make some test edits.--Ymblanter (talk) 19:15, 12 August 2020 (UTC)
  • @Ymblanter: I tried to make some test edits in the test Wikidata instance, taking into account the different property numbers as well. However, I was told I do not have the bot right for pushing to the test instance. Where can I get the bot right for the test instance? --Eva (talk) 16:03, 26 August 2020 (CET)
    Sorry, I am not sure I understand the question. Can you make about 50 test edits here? You do not need the bot flag for the test edits.--Ymblanter (talk) 18:58, 26 August 2020 (UTC)
    @Ymblanter: Hm. Strange. I checked again, the instance I refer to is the test instance: "wb config instance / https://test.wikidata.org/w/api.php" But when I push only *one* file (such as "wb create-entity Q123.json") I get: "{ assertbotfailed: assertbotfailed: You do not have the "bot" right, so the action could not be completed...." What did I do wrong? --Eva (talk) 09:44, 27 August 2020 (CET)
    Unfortunately, I do not know. You may want to ask at a better watched place such as the Project Chat--Ymblanter (talk) 18:27, 27 August 2020 (UTC)
  • I managed to do some test edits with Orcbot in the test instance. In order to connect them with Orcbot subsequently by adding author statements (P242 on the test instance) to article items, I created some scientific article items and author items manually.
    • authors:
      • Josepha Barrio Q212734
      • Shuai Chen Q212749
      • Raphael de A da Silva Q212755
    • articles:
      • Prevalence of Functional Gastrointestinal Disorders in Children and Adolescents in the Mediterranean Region of Europe. Q212738
      • Dietary Saccharomyces cerevisiae Cell Wall Extract Supplementation Alleviates Oxidative Stress and Modulates Serum Amino Acids Profiles in Weaned Piglets Q212750
      • Amino-acid transporters in T-cell activation and differentiation. Q212751
      • Dietary L-glutamine supplementation modulates microbial community and activates innate immunity in the mouse intestine. Q212752
      • Insight in bipolar disorder: a comparison between mania, depression and euthymia using the Insight Scale for Affective Disorders. Q212753
      • Changes in absolute theta power in bipolar patients during a saccadic attention task. Q212754


The articles now have an author statement, which was missing before. The template for the connection looks like this: {"id": "Q212754", "claims": {"P242": {"value": "Q212755", "qualifier": [{"P80807": "('Rafael', 'de Assis da Silva')"}]}}}

@Csisc, Ymblanter: What is the next step to establish the Orcbot? --Eva (talk) 14:04, 1. September 2020 (CET)

Could you please do a few edits here (they may be the same as on test wikidata if appropriate).--Ymblanter (talk) 20:06, 1 September 2020 (UTC)
@EvaSeidlmayer: Can you write down the message in red issued by the compiler? --Csisc (talk) 09:49, 2 September 2020 (UTC)
@Csisc: Not sure if this is the message expected, but this is what I get when I try to log in after I reset the credentials: "invalid json response body at http://www.wikidata.org/w/api.php?action=login&format=json reason: Unexpected token < in JSON at position 0" This is the red part. However, first I am asked to "use a BotPassword instead of giving this tool your main password". --Eva (talk) 14:02, 2. September 2020 (CET)
@EvaSeidlmayer: Try to use requests.post instead. See https://www.wikidata.org/w/api.php?action=help&modules=login for login documentation. --Csisc (talk) 14:17, 3 September 2020 (UTC)
Hey @Csisc:, when I'm logged in as EvaSeidlmayer@Orcbot using abc1def2ghi3jkl4mno5pqr6stuv7wxyz as the password, I receive this message: "permissiondenied: You do not have the permissions needed to carry out this action." I use Wikidata-CLI for the interaction. --Eva (talk) 22:43, 4. September 2020 (CET)
@EvaSeidlmayer: Try to use Orcbot as a username (just the bot username). You can also change to Wikidata Integrator (https://pypi.org/project/wikidataintegrator/). --Csisc (talk) 11:35, 8 September 2020 (UTC)
It worked after I updated the bot password to include "edit existing pages". :) Afterwards, I was able to do the test edits:

The authors are now registered (P50) to their publications:

Q48080592 Changes in absolute... → Q47701823 Raphael de A da Silva
Q40249319 Insight in bipolar... → Q47701823 Raphael de A da Silva
Q43415493 The complete picture of changing pediatric inflammatory... → Q85231573 Josefa Barrio
Q37721105 Dietary Saccharomyces cerevisiae... → Q61824599 Shuai Chen 
Q41082700 Amino-acid transporters..  → Q61824599 Shuai Chen
Q51428341 Dietary L-glutamine supplementation.. → Q61824599 Shuai Chen

@Csisc:, sorry it took so much time! --Eva (talk) 09:27, 9. September 2020 (CET)

@EvaSeidlmayer: This is an honour for me. --Csisc (talk) 15:00, 9 September 2020 (UTC)

What is the next step to get this approved? NMaia (talk) 13:28, 24 November 2020 (UTC)

I still do not see test edits--Ymblanter (talk) 19:56, 25 November 2020 (UTC)
@EvaSeidlmayer: Did you make the test edits by running the bot script with your account, e.g. this edit to add an author? I notice that you didn't add stated as (P1932) or series ordinal (P1545) qualifiers and the author name string (P2093) claim for the same author was not removed. Will Orcbot make these edits when importing data?
Presuming Orcbot is going to add stated as qualifiers, will the name formatting be consistent with an item's existing author and author name string statements? Since the large imports of scholarly article (Q13442814) bibliographic data were, to the best of my knowledge, primarily from PubMed and CrossRef, there is a risk that using a different source (i.e. ORCID) could result in inconsistent data, such as a combination of initialised and full given names. It won't be an issue when adding authors to new publication items created by Orcbot. But it might be preferable to handle existing items differently and copy data from the existing author name string to a new author claim. Simon Cobb (User:Sic19 ; talk page) 01:08, 8 January 2021 (UTC)
Hey @Sic19:, thank you for thinking along! Regarding the problem of potentially different name formatting from PubMed, CrossRef and ORCID, OrcBot requests all labels and aliases for an author QID (which is supposed to be registered as author (P50) of an article, per the ORCID public data file). OrcBot uses the following command to do this:
wb d author_QID  | jq -r '.labels,(.aliases|.[])|.[].value' | sort | uniq 

Then, OrcBot compares all of these spellings with the names stated in author name string (P2093). By this means, OrcBot makes sure that the series ordinal from author name string (P2093) can be transferred correctly to author (P50). Does this solve your objection? Did I understand you correctly? Eva (User:EvaSeidlmayer ; talk page) 19:07, 14 January 2021 (UTC)
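As an illustration of such a comparison, here is a minimal Python sketch (an assumption about the approach, not OrcBot's actual code; it presumes the labels/aliases and the P2093 name strings have already been fetched):

import re

def normalize(name):
    # Lower-case, strip punctuation and split into tokens, so that
    # "R. de Assis da Silva" and "Rafael de Assis da Silva" can be compared.
    return re.sub(r"[^\w\s]", " ", name).lower().split()

def name_matches(author_spellings, name_string):
    # True if the P2093 name string is compatible with one of the author
    # item's labels/aliases (initials are allowed for given names).
    target = normalize(name_string)
    for spelling in author_spellings:
        candidate = normalize(spelling)
        if candidate[-1:] != target[-1:]:   # surnames must agree
            continue
        if all(any(t == c or (t[0] == c[0] and (len(t) == 1 or len(c) == 1))
                   for c in candidate[:-1]) for t in target[:-1]):
            return True
    return False

spellings = ["Rafael de Assis da Silva", "R de Assis da Silva"]
print(name_matches(spellings, "R. de A. da Silva"))   # True
print(name_matches(spellings, "S. Chen"))             # False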


Hey, sorry for the late response. Yes, OrcBot runs as EvaSeidlmayer; I can change this if necessary. When Rdmpage pointed out the issues around series ordinal (P1545) and author name string (P2093), I stopped OrcBot (in November 2020). I am currently working on an improvement to OrcBot which involves transferring the series ordinal information from author name string (P2093) to author (P50). Afterwards, the author name string (P2093) statement will be deleted, as some tools cannot deal with both statements (author (P50) and author name string (P2093)) at the same time. Eva (User:EvaSeidlmayer ; talk page) 18:44, 14 January 2021 (UTC)
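A minimal pywikibot sketch of the kind of edit described above (an illustration of the approach, not OrcBot's code; the matching step is assumed to have already identified the right author item, and the helper name is hypothetical):

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def move_name_string_to_author(article_qid, name_string_claim, author_qid):
    # Replace one P2093 (author name string) claim by a P50 (author) claim,
    # copying the P1545 (series ordinal) qualifier and recording the original
    # spelling as P1932 (stated as).
    article = pywikibot.ItemPage(repo, article_qid)
    article.get()

    author_claim = pywikibot.Claim(repo, "P50")
    author_claim.setTarget(pywikibot.ItemPage(repo, author_qid))
    article.addClaim(author_claim, summary="add author, replacing author name string")

    stated_as = pywikibot.Claim(repo, "P1932")
    stated_as.setTarget(name_string_claim.getTarget())   # keep the original spelling
    author_claim.addQualifier(stated_as)

    for ordinal in name_string_claim.qualifiers.get("P1545", []):
        series = pywikibot.Claim(repo, "P1545")
        series.setTarget(ordinal.getTarget())
        author_claim.addQualifier(series)

    article.removeClaims([name_string_claim], summary="replaced by author (P50) claim")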

OpenCitations BotEdit

OpenCitations Bot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Csisc (talkcontribslogs)

Task/s: Adding references and citation data for scholarly publications found in Wikidata using Wikidata tools and OpenCitations.

Code: Not developed

Function details:

  • This bot retrieves the Wikidata IDs and DOIs of scholarly publications using WDumper. Then, it uses the REST API of OpenCitations to retrieve the DOIs of the references and citing works of each publication. Finally, the obtained DOIs are converted to Wikidata IDs using the WDumper-based dump, and the final output is automatically added to Wikidata through the QuickStatements API as cites work (P2860) statements. A rough sketch of this pipeline is included below.
  • The license of the OpenCitations data is CC0.

--Csisc (talk) 13:23, 29 July 2020 (UTC)
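A rough sketch of the pipeline described above, assuming the COCI REST endpoint of OpenCitations and a pre-built DOI-to-QID mapping extracted from the WDumper dump (the mapping file, its format and the example identifiers are assumptions); the output lines follow the tab-separated QuickStatements format:

import csv
import requests

COCI_REFS = "https://opencitations.net/index/coci/api/v1/references/{doi}"

def load_doi_to_qid(path):
    # Load a two-column CSV (doi, qid) prepared from the WDumper dump.
    with open(path, newline="") as f:
        return {doi.lower(): qid for doi, qid in csv.reader(f)}

def quickstatements_for(doi, qid, doi_to_qid):
    # Yield QuickStatements lines adding cites work (P2860) for every
    # reference of `doi` that already has a Wikidata item.
    refs = requests.get(COCI_REFS.format(doi=doi)).json()
    for ref in refs:
        cited_qid = doi_to_qid.get(ref["cited"].lower())
        if cited_qid:
            yield f"{qid}\tP2860\t{cited_qid}"

# Hypothetical usage (placeholder identifiers):
# doi_to_qid = load_doi_to_qid("wdumper_dois.csv")
# for line in quickstatements_for("10.1234/example.doi", "Q12345678", doi_to_qid):
#     print(line)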

We had another bot doing this work a while ago, is it no longer operational? Or was there a reason it stopped? Also, since each article usually has a dozen or more references, sometimes many times that, it would be better to add the references in a single update, rather than one at a time as would be necessary through QuickStatements. Other than that, yes it would be good to get this added to Wikidata. ArthurPSmith (talk) 17:51, 29 July 2020 (UTC)
@ArthurPSmith: What I found is that there are many publications in Wikidata not linked to their reference publications, although reference data about them are available in OpenCitations. I can use SPARQL to restrict the work to scholarly publications that do not have any references yet. --Csisc (talk) 09:34, 30 July 2020 (UTC)
@ArthurPSmith: We had User:Citationgraph bot and User:Citationgraph bot 2 work on this. Both stopped operating in 2018, since their operator, User:Harej, had rearranged his priorities. Yes, it would make sense to add all cites work (P2860) statements for an item in one go, e.g. via Wikidata Integrator. Not sure how the bot should handle citations of things for which Wikidata does not have an entry yet — perhaps with "no value" and "stated as", so that the information can be converted later as needed. --Daniel Mietchen (talk) 09:02, 7 September 2020 (UTC)
@ArthurPSmith: @Daniel Mietchen: Just wanted to mention that Citationgraph bot seems to be back online and Citationgraph bot 2 would follow along soon. I wish we had something like Scroll To Text Fragment widely supported as a web standard, or some paragraph-based anchoring, so I could point to the exact paragraph in this long thread (look for "Harej" there instead). --Diegodlh (talk) 22:05, 15 February 2021 (UTC)
@Csisc: Hi! I understand this was part of the Wikicite grant proposal you presented last year. I'm sorry it wasn't approved. Do you plan to develop the bot anyway? Now that Elsevier has made their citations open in Crossref, I understand COCI coverage will see a dramatic increase the next time it is published (last time was 07 Dec 20, before Elsevier's announcement). Thank you! --Diegodlh (talk) 04:53, 28 January 2021 (UTC)
Diegodlh: Of course, I still intend to develop the bot. However, we need a server to host it. If the bot can be hosted, I do not mind developing it. Elsevier's agreement to include its citation data in the OpenCitations corpus will certainly allow a trustworthy coverage of citation data in the Wikidata graph. --Csisc (talk) 12:55, 31 January 2021 (UTC)
Hi, @Csisc:! Thanks for answering. Sorry, I'm relatively new at this. Couldn't it be hosted on Toolforge? --Diegodlh (talk) 18:49, 1 February 2021 (UTC)
@Diegodlh: I am studying this. The matter with Toolforge is that the Cloud can be easily blocked. --Csisc (talk) 12:25, 2 February 2021 (UTC)
  • I am very excited about this project. The reason my old bot shut down was, among other factors, the scaling issues. I was no longer able to get a reliable mapping of Wikidata items and DOIs from the Wikidata Query Service. The use of WDumper addresses that nicely. For data sources I also recommend PubMed Central. Harej (talk) 21:40, 9 September 2020 (UTC)
Please develop the code and make some test edits.--Ymblanter (talk) 19:30, 10 September 2020 (UTC)
  • Just as an observation, I have been trying to produce a dump of DOIs on Wikidata, and the task has yet to complete after seven days and as of writing is going to take months to complete. However I am developing an alternative strategy for producing lists of identifiers and hope to share more later. Harej (talk) 22:51, 6 October 2020 (UTC)
    • I have generated a dataset of Wikidata items with DOIs as of the 20 August 2020 dump. This should definitely help you get started. Harej (talk) 21:51, 7 October 2020 (UTC)
Ymblanter, Harej: I thank you for your answers. I will consider your comments and develop the bot over the next several months. --Csisc (talk) 12:55, 31 January 2021 (UTC)
Great, I will be looking forward.--Ymblanter (talk) 20:09, 31 January 2021 (UTC)

TwPoliticiansBotEdit

TwPoliticiansBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Jd3main (talkcontribslogs)

Task/s: Import data of politicians in Taiwan.

Code: We are still working on the code. The GitHub link will be added soon.

Function details: We plan to crawl data from the database of the Central Election Commission (link). When there are potential errors or duplications, this bot might skip these data and report them to the operator. --TwPoliticiansBot (talk) 14:31, 12 July 2020 (UTC)

T cleanup botEdit

T cleanup bot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Jura1 (talkcontribslogs)

Task/s: clean up leftovers from the last incident, once the discussion is closed

Code:

Function details: Help cleanup as needed ----- Jura 17:39, 21 June 2020 (UTC)

Non-admins can not delete items.--GZWDer (talk) 20:02, 21 June 2020 (UTC)
@Jura1: Is this request still active? If so, please provide a permanent link to the relevant discussion. Hazard-SJ (talk) 06:06, 7 October 2020 (UTC)
  • I think some cleanup is still needed. I keep coming across duplicates. Supposedly beyond the identified ones, there are plenty more. Given that @Epìdosis: wants to work with them, we might as well keep them. It seems that most other users don't care or filter them out as well. --- Jura 18:57, 7 December 2020 (UTC)
    • Duplicates still need to be merged in big numbers (thousands), but this needs a human check, as discussed. I cannot provide much more help, but I wouldn't delete anything, as the items are well sourced and, despite duplication, contain valuable information. In fact, this is not the only import whose items are both well sourced and contain a high percentage of duplicates, and unfortunately the only way to act in these cases is to merge the duplicates, mostly manually. --Epìdosis 21:00, 7 December 2020 (UTC)
      • If you want to keep them and clean them up, let's close this. Just bear in mind that what you might consider well sourced can be a wiki or tertiary source, with potentially similar problems to Wikipedia in general. --- Jura 21:22, 7 December 2020 (UTC)

OlafJanssenBotEdit

OlafJanssenBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: OlafJanssen (talkcontribslogs)

Function: Replace dead, outdated or non-persistent links to websites of the KB (national library of the Netherlands) in Wikidata with up-to-date and/or persistent URLs

Code: https://github.com/KBNLwikimedia/WikimediaKBURLReplacement and https://github.com/KBNLwikimedia/WikimediaKBURLReplacement/tree/master/ScriptsMerlijnVanDeen/scripts,

Function details: This article explains what the bot currently does on Dutch Wikipedia (bot edits on WP:NL are listed here). I want to be able to do the same URL replacements on Wikidata, which is why I'm requesting this bot flag. The bot flag for this type of task is already enabled on Dutch Wikipedia; see here for the approval.

--OlafJanssen (talk) 21:45, 11 June 2020 (UTC)

I will approve this task in a couple of days, provided that no objections are raised. Lymantria (talk) 09:45, 20 June 2020 (UTC)

@Lymantria, OlafJanssen:

  • I don't really see it making useful edits. It's somewhat pointless to edit Listeria lists ([5], etc.) and one should avoid editing archive pages [6][7]. --- Jura 10:16, 24 June 2020 (UTC)
  • Discussion should take place at User talk:OlafJanssen. Lymantria (talk) 10:21, 24 June 2020 (UTC)
  • I think it should be un-approved. Shall I make a formal request? --- Jura 10:26, 24 June 2020 (UTC)
    • No. Let's reopen this discussion. Lymantria (talk) 08:07, 26 June 2020 (UTC)

Recipe BotEdit

Recipe Bot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: U+1F360 (talkcontribslogs)

Task/s: Crawl https://www.allrecipes.com and insert items into Wikidata with structured data (ingredients and nutrition information).

Code: TBD. I haven't written the bot yet. I would like to get feedback before doing so.

Function details:

  • Crawl https://www.allrecipes.com and retrieve the structured data (example) for a recipe (see the sketch after this list).
  • Parse the list of ingredients and nutrition information; halt if any items are not parsed cleanly.
  • See if a Wikidata item already exists (unlikely, but a good safety check).
  • Create an item for the recipe with the title, structured information (ingredients and nutrition information), and a URL to the full work.
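A minimal sketch of the retrieval step, assuming the recipe pages embed schema.org/Recipe metadata as JSON-LD (field names follow schema.org and should be checked against the actual pages):

import json
import requests
from bs4 import BeautifulSoup

def recipe_metadata(url):
    # Pull the schema.org/Recipe JSON-LD block out of a recipe page, if present.
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", type="application/ld+json"):
        if not script.string:
            continue
        data = json.loads(script.string)
        # Some pages wrap the recipe in a list or an @graph container.
        nodes = data if isinstance(data, list) else data.get("@graph", [data])
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") in ("Recipe", ["Recipe"]):
                return {
                    "name": node.get("name"),
                    "ingredients": node.get("recipeIngredient", []),
                    "nutrition": node.get("nutrition", {}),
                }
    return None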

--U+1F360 (talk) 14:21, 20 May 2020 (UTC)

Admittedly I'm not familiar with WD's bot policy, but this does not seem useful: creating empty items would be useless, and there is no place on any project where we should be mass-posting recipes. Praxidicae (talk) 13:05, 22 May 2020 (UTC)
@Praxidicae: The items wouldn't be empty; they would contain metadata about the recipes. It would allow users to query recipes based on the ingredients, nutrition information, cook time, etc. U+1F360 (talk) 13:42, 22 May 2020 (UTC)
When I said empty, I meant with respect to other projects. Wikidata shouldn't serve as a cookbook. This is basically creating a problem that doesn't exist. Praxidicae (talk) 13:44, 22 May 2020 (UTC)
Who says Wikidata shouldn't serve as a cookbook? I would love to query Wikidata for recipes using e.g. the ingredients I have at home. --Haansn08 (talk) 09:21, 27 September 2020 (UTC)
@Praxidicae: Most items on Wikidata do not have any sitelinks. I asked in the project chat if adding recipes was acceptable, and at least on a conceptual level that seems fine? I believe it would meet point #2 under Wikidata:Notability. I'm not sure how it's any different from Wikidata:WikiProject_sum_of_all_paintings. U+1F360 (talk) 13:52, 22 May 2020 (UTC)
I fundamentally disagree I guess. This is effectively using Wikidata as a mini project imo. Praxidicae (talk) 13:53, 22 May 2020 (UTC)
I feel like the ship has sailed on that question (unless I'm missing something). U+1F360 (talk) 13:55, 22 May 2020 (UTC)
It wasn't a question, it's me registering my objection to this request. Which I assume is allowed...Praxidicae (talk) 13:57, 22 May 2020 (UTC)
@Praxidicae: Of course it is. :) I guess my point is that the "problem" is that our coverage of recipes is basically non-existent. I'd like to create a bot to expand that coverage. A recipe is a valuable creative work. Of course I don't expect people to write articles about recipes (seems rather silly). In the same way, we are adding every (notable) song to Wikidata... that's a lot of music. U+1F360 (talk) 14:01, 22 May 2020 (UTC)
Which is what I find problematic. There have been proposals in the past to start a recipe based project and they have been rejected each time by the community. This is effectively circumventing that consensus. Not to mention this already exists and I also have concerns about attribution when wholesale copying from allrecipes. Praxidicae (talk) 14:03, 22 May 2020 (UTC)
What about the copyright side? Their Terms of Use specifies that the copyrights are held by the copyright owners (users) and there is no indication of free license in the website. Recipes are not mere facts, numbers and/or IDs. Also, there is no indication of "why Wikidata needs this info". — regards, Revi 14:18, 22 May 2020 (UTC)
I'll kick myself for asking, but U+1F360, sell this to me. Explain the copyright details, explain the instructional sections, explain how alternative ingredients will work, explain how differences in measurement units in different countries will work. This is your opportunity. Sell it to all of us. Nick (talk) 15:01, 22 May 2020 (UTC)
Let me attempt to answer "all" the questions. :) For some background, I was recently trying to find recipes based on the ingredients I have on hand. Sure, you can do a full-text search on Google, but if you have 2 potato (Q16587531), it doesn't tell you whether the recipes require 2 or fewer potato (Q16587531), just that they mention the word. :/ Also, not to mention all the other ingredients you may need that you may not have (especially during a global pandemic). I was looking for just a database of recipes (not the recipes themselves), and as far as I could find, that doesn't exist (at least not in a structured form). I also thought of many other questions which are difficult (if not impossible) to answer without such a dataset, like: What is the most common ingredient in English-language recipes? What percentage of recipes are vegetarian? Questions like this are unanswerable without a dataset of known recipes. As far as copyright is concerned, according to the US Copyright Office:

A mere listing of ingredients is not protected under copyright law. However, where a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a collection of recipes as in a cookbook, there may be a basis for copyright protection. Note that if you have secret ingredients to a recipe that you do not wish to be revealed, you should not submit your recipe for registration, because applications and deposit copies are public records. See Circular 33, Works Not Protected by Copyright.

At least in the United States, the "metadata" about a recipe (ingredients, nutrition information, cook time, etc.) cannot be copyrighted and therefore exists in the public domain. Since it's unclear whether the directions of a recipe are under copyright or not, I think it's safest to leave all directions in the source. As an example, let's say we have a cookbook like How to Cook Everything (Q5918527): should we not catalog every recipe from the book in Wikidata? I would think this would be valuable information, no? In my mind this is the same difference as an album like Ghosts V: Together (Q88691681), which has a list of tracks like Letting Go While Holding On (Q93522041). I am not suggesting that we create a wiki of freely licensed recipes; as @Praxidicae: mentioned, that has been proposed and rejected many times. This is the same thing as music albums with songs or TV shows with episodes. Now we could make up a threshold of notability for recipes. Does it need to be printed in a book? Does it need at least 3 reviews if on allrecipes? I'm not sure what makes a recipe notable or not, but in my mind they are valuable works of art that should be cataloged. U+1F360 (talk) 17:05, 22 May 2020 (UTC)
I realized I missed a few questions in there. Alternative ingredients should be marked with a qualifier of some kind. Measurements should remain in whatever unit is in the referenced source (as we do with all other quantities on Wikidata). The measurements could be converted when a query is performed or a recipe is retrieved. U+1F360 (talk) 17:51, 23 May 2020 (UTC)
I manually created a little example, Oatmeal or Other Hot Cereal (Q95245657), from a cookbook that I own. Open to suggestions on the data model! U+1F360 (talk) 23:02, 23 May 2020 (UTC)
Here is another example: Chef John's Buttermilk Biscuits (Q95382239). Please let me know what you think and what should change (if anything). U+1F360 (talk) 17:47, 24 May 2020 (UTC)
I like the idea of having recipes in Wikidata. The examples show we need more properties/qualifiers to better describe recipes. --Haansn08 (talk) 09:40, 27 September 2020 (UTC)

LouisLimnavongBotEdit

LouisLimnavongBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: LouisLimnavong (talkcontribslogs)

Task/s: Bot to get birthplace and nationality for a list of artists.

Code:

import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Khalid")
item = pywikibot.ItemPage.fromPage(page)

Function details: --LouisLimnavong (talk) 13:08, 14 May 2020 (UTC)

@LouisLimnavong: It looks like creating this request was your only edit across all Wikimedia projects (and was over 5 months ago). If this request is still valid, please clarify what the task would be. Hazard-SJ (talk) 06:51, 3 November 2020 (UTC)

BsivkoBotEdit

BsivkoBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Bsivko (talkcontribslogs)

Task/s:

  • The term "видеоигра" is not the correct common Russian description for "video game". We can see this on ruwiki for Q7889, its root (and child) categories, and in other cases. However, despite the language difference, the first term was imported in bulk by bot(s?) (example). To fix this mistake we can use the same bot approach, and at the same time fill in descriptions for items that have none.

Example of the first case and for the second one.

Code:

  • I use pywikibot, and there is a function which checks for the presence of the mistake and fixes it. For items without a Russian description, it prepares a short one:

def process_wikidata_computergame(title):
    # get_wikidata_item() is the bot's own helper that resolves a ruwiki title
    # to its Wikidata item.
    item = get_wikidata_item("ru", title)
    if not item:
        return
    if 'ru' in item.descriptions:
        # Fix the wrong term in an existing Russian description.
        if "видеоигра" in item.descriptions['ru']:
            item.descriptions['ru'] = item.descriptions['ru'].replace("видеоигра", "компьютерная игра")
            item.editDescriptions(descriptions=item.descriptions,
                                  summary=u'"компьютерная игра" is a common term for "videogame" in Russian')
    else:
        # No Russian description yet: add one if the item is a video game (Q7889).
        p31_claims = item.claims.get('P31', [])
        if p31_claims and p31_claims[0].target and p31_claims[0].target.id == 'Q7889':
            item.descriptions['ru'] = "компьютерная игра"
            item.editDescriptions(descriptions=item.descriptions,
                                  summary=u'added Russian description')


Function details: --Bsivko (talk) 13:25, 8 May 2020 (UTC)

  • The bot works in the background alongside other article processing, and it does not do a broad scan. Bsivko (talk) 13:25, 8 May 2020 (UTC)

BsivkoBotEdit

BsivkoBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Bsivko (talkcontribslogs)

Task/s:

Code:

  • I use pywikibot, and there is a piece of software which gets the property value, makes a request to the corresponding URL, fetches the page text, recognizes that the article is absent, and switches the claim to deprecated if the absence keywords are found:
import pywikibot
import requests

def url_checking(title, page):
    try:
        item = pywikibot.ItemPage.fromPage(page)
    except pywikibot.exceptions.NoPage:
        return
    if item:
        item.get()
    else:
        return
    if not item.claims:
        return
    id_macros = "##ID##"
    # Each entry: external-ID property, URL pattern, text marking a missing
    # article, and the edit summary used when deprecating the claim.
    cfg = [
        {
            'property': 'P2924',
            'url': 'https://bigenc.ru/text/' + id_macros,
            'empty_string': 'Здесь скоро появится статья',
            'message': 'Article in Great Russian Encyclopedia is absent'
        },
        {
            'property': 'P4342',
            'url': 'https://snl.no/' + id_macros,
            'empty_string': 'Fant ikke artikkelen',
            'message': 'Article in Store norske leksikon is absent'
        },
        {
            'property': 'P6081',
            'url': 'https://ria.ru/spravka/00000000/' + id_macros + '.html',
            'empty_string': 'Такой страницы нет на ria.ru',
            'message': 'Article in RIA Novosti is absent'
        },
    ]
    for single in cfg:
        if single['property'] not in item.claims:
            continue
        for claim in item.claims[single['property']]:
            if claim.getRank() == 'deprecated':
                continue
            value = claim.getTarget()
            url = single['url'].replace(id_macros, value)
            print("url:" + url)
            r = requests.get(url=url)
            print("r.status_code:" + str(r.status_code))
            if r.status_code == 200 and single['empty_string'] in r.text:
                claim.changeRank('deprecated',
                                 summary=single['message'] + " (URL: '" + url + "').")


Function details:

  • The bot works in the background while processing other articles on ruwiki, so it does not do a broad scan. Also, there are not that many bad URLs, and therefore the activity is low (a few contributions per month). Bsivko (talk) 12:49, 8 May 2020 (UTC)

DeepsagedBot 1Edit

DeepsagedBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Deepsaged (talkcontribslogs)

Task/s: import Russian lexemes and senses from ru.wiktionary.org

Code:

Function details: --Deepsaged (talk) 06:16, 14 April 2020 (UTC)

@Deepsaged: please make some test edits--Ymblanter (talk) 19:06, 14 April 2020 (UTC)
@Ymblanter: done: создать (L297630), сотворить (L301247), небо (L301348) DeepsagedBot (talk) 17:26, 28 May 2020 (UTC)

It is not possible to import senses from any Wiktionary project because licences are not compatible. Wikidata is released under CC-0 while Wiktionary senses are protected by CC by-sa licence. Pamputt (talk) 18:34, 3 August 2020 (UTC)

Uzielbot 2Edit

Uzielbot 2 (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Uziel302 (talkcontribslogs)

Task/s: mark broken links as deprecated

Code: https://github.com/Uziel302/wikidatauploadjson/blob/master/deprecatebrokenlinks

Function details: simple wbeditentity calls to mark broken official links as deprecated. I did a few examples on my bot account; all the proposed edits are of the same nature. I detect broken links based on the HTTP response (no response/400/404 are considered broken). --Uziel302 (talk) 23:49, 7 April 2020 (UTC)
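For illustration only (not the operator's actual code), a sketch of a check that re-tests a link before treating it as broken; the number of attempts and the delay are arbitrary placeholders:

import time
import requests

BROKEN_STATUSES = {400, 404}

def looks_broken(url, attempts=3, delay=3600):
    # Treat a URL as broken only if every attempt ends in no response,
    # HTTP 400 or HTTP 404; any other outcome counts as reachable.
    for attempt in range(attempts):
        try:
            status = requests.head(url, allow_redirects=True, timeout=30).status_code
        except requests.RequestException:
            status = None
        if status is not None and status not in BROKEN_STATUSES:
            return False
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before re-checking, to rule out temporary outages
    return True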

No response/400/404 may have multiple reasons, including temporary maintenance, content that is only accessible when signed in, content only accessible in some countries, internet censorship, etc.--GZWDer (talk) 06:14, 8 April 2020 (UTC)
GZWDer, which of these edge cases are not relevant in manual checking? How is it possible to really detect broken links? And if no such option exists, should we ban "reason for deprecation: broken link"? Uziel302 (talk) 17:23, 8 April 2020 (UTC)
This means you should not flag them as broken links without checking them multiple times.--GZWDer (talk) 17:26, 8 April 2020 (UTC)
GZWDer, no problem, how many is multiple? Uziel302 (talk) 21:45, 8 April 2020 (UTC)
I am the main bot writer on Hebrew Wikipedia and have written over 500 bots over the years. I can testify that broken links are a big problem and we need to resolve it at the source. I discussed it with Uziel302 prior to him writing here and I am convinced the method suggested here is the preferred one. Let's move forward to clean up these broken links so they do not bother us any more. בורה בורה (talk) 09:18, 13 April 2020 (UTC)
@GZWDer: Would you react to the question? Is there a benchmark for considering a link broken? Repetitive checks with a minimal number of checks and a minimal time span? Lymantria (talk) 08:30, 16 May 2020 (UTC)
I don't think it should set them to deprecated. You could add "end cause" = "404". --- Jura 13:23, 16 May 2020 (UTC)

WordnetImageBotEdit

WordnetImageBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: WordnetImageBot (talkcontribslogs)

Task/s:

This bot is part of a Final Degree Project in which I link the offset/code of the words in WordNet with the words and images of Wikidata.

Code:

Is to be done.

Function details: --WordnetImageBot (talk) 12:16, 18 March 2020 (UTC)

Link words and images with the words of WordNet; that is, add an exact match (P2888) URL to those words that haven't got a link to WordNet yet. If a word in Wikidata doesn't have an image, this bot will add the image.
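A minimal pywikibot sketch of the two additions described above; the WordNet URL pattern and the image source are assumptions for illustration only:

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
commons = pywikibot.Site("commons", "commons")

def link_to_wordnet(qid, synset_id, image_name=None):
    # Add an exact match (P2888) URL for the WordNet synset and, if the item
    # has no image (P18) yet, add one.
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    if "P2888" not in item.claims:
        claim = pywikibot.Claim(repo, "P2888")
        # Assumed URL pattern for a WordNet synset identifier.
        claim.setTarget(f"http://wordnet-rdf.princeton.edu/id/{synset_id}")
        item.addClaim(claim, summary="Link item to its WordNet synset")
    if image_name and "P18" not in item.claims:
        claim = pywikibot.Claim(repo, "P18")
        claim.setTarget(pywikibot.FilePage(commons, f"File:{image_name}"))
        item.addClaim(claim, summary="Add image for the WordNet synset")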

Please, make some test edits and create the bot's user page containing {{Bot}}. Lymantria (talk) 06:36, 27 April 2020 (UTC)
@Andoni723: reminder to make the test edits --DannyS712 (talk) 12:03, 7 July 2020 (UTC)

Taigiholic.adminbotEdit

Taigiholic.adminbot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Taigiholic (talkcontribslogs)

Task/s: interwiki linking and revising

Code: using pywikibot scripts

Function details:

  • The account is already an adminbot on nanwiki, whose work is mainly interwiki linking and revising in a semi-automatic and batch way.
  • The operator owns another bot named User:Lamchuhan-bot, which mainly works on interwiki linking for NEW ARTICLES only (such work will mainly be done by this new requesting account once it gets the flag), with no revising work.
  • This request is only for a "normal bot" flag on this site, not for an "adminbot".

Thanks.--Lamchuhan-hcbot (talk) 00:17, 16 March 2020 (UTC)

Thanks.--Lamchuhan (talk) 00:19, 16 March 2020 (UTC)

I think I sort of understand what the task is, but could you please be more specific?--Jasper Deng (talk) 06:59, 16 March 2020 (UTC)

@Jasper Deng: The bot will run by using pywikibot scripts on nanwiki. Some of the tasks will have to run interwiki actions such as:

item.setSitelink(sitelink={'site': 'zh_min_nanwiki', 'title': 'XXXXX'}, summary=u'XXXXX')

Thanks.--Lamchuhan (talk) 07:34, 16 March 2020 (UTC)

Please register the bot account and make some test edits.--Ymblanter (talk) 20:24, 18 March 2020 (UTC)

GZWDer (flood) 3Edit

GZWDer (flood) (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: GZWDer (talkcontribslogs)

Task/s: Creating items for all Unicode characters

Code: Unavailable for now

Function details: Creating items for 137,439 characters (probably excluding those not in Normalization Forms):

  1. Label in all languages (if the character is printable; otherwise only Unicode name of the character in English)
  2. Alias in all languages for U+XXXX and in English for Unicode name of the character
  3. Description in languages with a label of Unicode character (P487)
  4. instance of (P31)Unicode character (Q29654788)
  5. Unicode character (P487)
  6. Unicode hex codepoint (P4213)
  7. Unicode block (P5522)
  8. writing system (P282)
  9. image (P18) (if available)
  10. HTML entity (P4575) (if available)
  11. For characters in Han script also many additional properties; see Wikidata:WikiProject CJKV character

For characters that already have items, the existing items will be updated.
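As an illustration of what such items could contain, a small Python sketch deriving some of the values listed above from the standard unicodedata module (mapping a codepoint to its Unicode block item and writing system is assumed to happen elsewhere, and the P4213 value format is an assumption):

import unicodedata

def character_statements(codepoint):
    # Core statement values for one codepoint, following the list above.
    char = chr(codepoint)
    try:
        unicode_name = unicodedata.name(char)   # e.g. 'GREEK CAPITAL LETTER OMEGA'
    except ValueError:
        unicode_name = None                     # unassigned or unnamed codepoint
    return {
        "label": char if char.isprintable() else unicode_name,
        "alias_en": [a for a in (f"U+{codepoint:04X}", unicode_name) if a],
        "P31": "Q29654788",                     # instance of: Unicode character
        "P487": char,                           # Unicode character
        "P4213": f"U+{codepoint:04X}",          # Unicode hex codepoint (format assumed)
    }

# Example: character_statements(0x03A9) describes GREEK CAPITAL LETTER OMEGA.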

Question: Do we need only one item for characters with the same normalized forms, e.g. Ω (U+03A9, GREEK CAPITAL LETTER OMEGA) and Ω (U+2126, OHM SIGN)?--GZWDer (talk) 23:08, 23 July 2018 (UTC)

CJKV characters belonging to CJK Compatibility Ideographs (Q2493848) and CJK Compatibility Ideographs Supplement (Q2493862), such as 著 (U+FA5F) (Q55726748) and 著 (U+2F99F) (Q55738328), will need to be split from their normalized form, e.g. (Q54918611), as each of them has different properties. KevinUp (talk) 14:03, 25 July 2018 (UTC)

Request filed per suggestion on Wikidata:Property proposal/Unicode block.--GZWDer (talk) 23:08, 23 July 2018 (UTC)

  Support I have already expressed my wish to import such dataset. Matěj Suchánek (talk) 09:25, 25 July 2018 (UTC)
  Support @GZWDer: Thank you for initiating this task. Also, feel free to add yourself as a participant of Wikidata:WikiProject CJKV character. [8] KevinUp (talk) 14:03, 25 July 2018 (UTC)
  Support Thank you for your contribution. If possible, I hope you will also add other code (P3295) values such as JIS X 0213 (Q6108269) and Big5 (Q858372) to the items you create or update. --Okkn (talk) 16:35, 26 July 2018 (UTC)
  •   Oppose the use of the flood account for this. Given the problems with unapproved defective bot runs under the "GZWDer (flood)" account, I'd rather see this done with a new account named "bot", as per policy.
    --- Jura 04:50, 31 July 2018 (UTC)
  • Perhaps we could do a test run of this bot with some of the 88,889 items required by Wikidata:WikiProject CJKV character and take note of any potential issues with this bot. @GZWDer: You might want to take note of the account policy required. KevinUp (talk) 10:12, 31 July 2018 (UTC)
  • This account has had a bot flag for over four years. While most bot accounts contain the word "bot", there is nothing in the bot policy that requires it, and a small number of accounts with the bot flag have different names. As I understand it, there is also no technical difference between an account with a flood flag and an account with a bot flag, except for who can assign and remove the flags. - Nikki (talk) 19:14, 1 August 2018 (UTC)
  • The flood account was created and authorized for activities that aren't actually bot activities, while this new task is one. Given that defective bot tasks have already been run with the flood account, I don't think any actual bot tasks should be authorized for it. It's enough that I already had to clean up tens of thousands of GZWDer's edits.
    --- Jura 19:46, 1 August 2018 (UTC)
I am ready to approve this request, after a (positive) decision is taken at Wikidata:Requests for permissions/Bot/GZWDer (flood) 4. Lymantria (talk) 09:11, 3 September 2018 (UTC)
  • Wouldn't these fit better into Lexeme namespace? --- Jura 10:31, 11 September 2018 (UTC)
    There is no language with all Unicode characters as lexemes. KaMan (talk) 14:31, 11 September 2018 (UTC)
    Not really a problem. language codes provide for such cases. --- Jura 14:42, 11 September 2018 (UTC)
    I'm not talking about language code but language field of the lexeme where you select q-item of the language. KaMan (talk) 14:46, 11 September 2018 (UTC)
    Which is mapped to a language code. --- Jura 14:48, 11 September 2018 (UTC)
Note I'm going to be inactive for real life issue, so this request is   On hold for now. Comments still welcome, but I'm not able to answer it until January 2019.--GZWDer (talk) 12:08, 13 September 2018 (UTC)
  Support I wonder why the information isn't in Wikidata for such a long time when many less notable subjects have complete data. --Midleading (talk) 02:38, 31 July 2020 (UTC)
  Oppose This user has no respect for the infrastructure's capacity in any way; these accounts, along with two others, have been making Wikidata basically unusable (phab:T242081) for months now. I think all other approvals of this user should be revoked, not to add more on top. (Emphasis: This edit is done in my volunteer capacity) Amir (talk) 17:26, 17 August 2020 (UTC)
Repeating from another RFP: given that WMDE is going to remove noratelimit from bots, your bot hopefully won't cause more issues, but to me you have lost your good standing with regard to respecting the infrastructure's capacity. Amir (talk) 18:53, 10 October 2020 (UTC)
While this is open, it is important not to merge letter and Unicode character items, like Nikki did with ɻ (Q56315451) and ɻ (Q87497973), ƾ (Q56316849) and ƾ (Q87497496), ʎ (Q56315460) and ʎ (Q87498018), etc.; the whole goal of this project is to keep them apart. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 14:02, 25 January 2021 (UTC)


MusiBotEdit

MusiBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Joaquinserna (talkcontribslogs)

Task/s: Query the Genius.com API and add Genius artist ID (P2373) and Genius artist numeric ID (P6351) statements to the respective artist in case they haven't been added before.

Code: Not provided. Using WikidataJS for SPARQL querying and claim editing.

Function details: Sequentially query Genius.com for every possible ArtistID, search Wikidata for any singer (Q177220) or musical group (Q215380) with the same label as the Genius artist name, check whether it has Genius artist ID (P2373) and Genius artist numeric ID (P6351) statements, then add them if necessary.
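A minimal sketch of that loop, assuming the Genius REST API's /artists/{id} endpoint and a SPARQL check for candidates without an existing ID (label escaping, pagination and authentication setup are omitted):

import requests

GENIUS_API = "https://api.genius.com/artists/"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def genius_artist_name(artist_id, access_token):
    # Fetch the artist name for a numeric Genius ID.
    response = requests.get(GENIUS_API + str(artist_id),
                            headers={"Authorization": f"Bearer {access_token}"},
                            timeout=30)
    response.raise_for_status()
    return response.json()["response"]["artist"]["name"]

def candidate_items(label):
    # Singers (via occupation P106) or musical groups (via instance of P31)
    # with this English label and no Genius artist ID (P2373) yet.
    query = """
    SELECT ?item WHERE {
      ?item rdfs:label "%s"@en .
      { ?item wdt:P106 wd:Q177220 } UNION { ?item wdt:P31 wd:Q215380 }
      FILTER NOT EXISTS { ?item wdt:P2373 [] }
    }""" % label
    response = requests.get(SPARQL_ENDPOINT,
                            params={"query": query, "format": "json"},
                            timeout=60)
    return [b["item"]["value"]
            for b in response.json()["results"]["bindings"]]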

Discussion:

I already did a successful test here, forcing Wikidata Sandbox (Q4115189) to be the updated item; Genius ArtistID no. 1 is Cam'ron (Q434913), which already has Genius artist ID (P2373) and Genius artist numeric ID (P6351).

Joaquín Serna (talk) 01:11, 28 February 2020 (UTC)

Could you please make a few more test edits, and on real items?--Ymblanter (talk) 20:00, 3 March 2020 (UTC)
: Done, you can check it out here Joaquín Serna (talk)
Add Genius artist numeric ID (P6351) as a qualifier to Genius artist ID (P2373). If you gather the data from Genius API, use Genius API (Q65660713) as reference. Optionally if you could also add has quality (P1552)verified account (Q28378282) for "Verified Artist" that would be great. - Premeditated (talk) 09:42, 18 March 2020 (UTC)

AitalDisemEdit

AitalDisem (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Aitalvivem (talkcontribslogs)

Task/s: This bot is made to create a sense (with an item) for every Occitan lexeme in Wikidata. It is the direct continuation of AitalvivemBot. It is a web application presented as a game. The program will get (for each lexeme) its French translation from Lo Congres's data (unfortunately that data is private, so we can't insert it into Wikidata) and look for every item having the Occitan word or its translation in its label. Then it will use the collaborative work of the community to select the right senses and, once validated, insert them into Wikidata. This program has the same goal as Michael Schoenitzer's MachtSinn but uses a translation database. I am also trying to make this program simple to adapt to other languages, with complete documentation.

Code: You can find my code and documentation here

Function details: It would take too long to list every function of this program (you can find them in the documentation here and here), but overall this bot will:

  • get information about lexemes, senses and items
  • create senses
  • add items to senses using Property:P5137

The program will also verify that it has enough positive responses from users before inserting a sense. All the details about the process of a game, the reliability test for a user, and the verification performed before inserting a sense are in the documentation.
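A minimal sketch of the two write operations, assuming the wbladdsense and wbcreateclaim API modules (session login and CSRF token handling are omitted; parameter names and response fields should be checked against the Wikibase Lexeme API documentation):

import json
import requests

API = "https://www.wikidata.org/w/api.php"

def add_sense_with_item(session, csrf_token, lexeme_id, gloss_fr, item_qid):
    # 1) Add a new sense with a French gloss to the lexeme.
    result = session.post(API, data={
        "action": "wbladdsense",
        "lexemeId": lexeme_id,
        "data": json.dumps({"glosses": {"fr": {"language": "fr", "value": gloss_fr}}}),
        "token": csrf_token,
        "format": "json",
    }).json()
    sense_id = result["sense"]["id"]          # e.g. "L41768-S1"
    # 2) Link the new sense to its item with item for this sense (P5137).
    session.post(API, data={
        "action": "wbcreateclaim",
        "entity": sense_id,
        "property": "P5137",
        "snaktype": "value",
        "value": json.dumps({"entity-type": "item",
                             "numeric-id": int(item_qid.lstrip("Q"))}),
        "token": csrf_token,
        "format": "json",
    })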

--Aitalvivem (talk) 15:48, 14 January 2020 (UTC)

  •   Support This seems like a nice approach to collaborative translation. ArthurPSmith (talk) 17:45, 15 January 2020 (UTC)
  • Can you provide some test edits (say 50-100)? Lymantria (talk) 10:30, 18 January 2020 (UTC)
    • @Lymantria: Hi, I did two test runs on 30 lexemes; for each lexeme I did two edits: one to add a sense and the other to add an item to this sense.
Here is the list of the lexemes : Lexeme:L41768, Lexeme:L44861, Lexeme:L57835, Lexeme:L57921, Lexeme:L235215, Lexeme:L235216, Lexeme:L235217, Lexeme:L235219, Lexeme:L235221, Lexeme:L235222, Lexeme:L235223, Lexeme:L235225, Lexeme:L235226, Lexeme:L235227, Lexeme:L235228, Lexeme:L235229, Lexeme:L235231, Lexeme:L235232, Lexeme:L235234, Lexeme:L235235, Lexeme:L235236, Lexeme:L235239, Lexeme:L235240, Lexeme:L235242, Lexeme:L235243, Lexeme:L235244, Lexeme:L235245, Lexeme:L235246, Lexeme:L235247, Lexeme:L235248
The first test failed because of a stupid mistake of mine in the configuration file of the program. For the second test I had a problem when adding the item for Lexeme:L235226 because there were quotation marks in the description of the item, so I fixed the problem, ran it again and everything went well.--Aitalvivem (talk) 10:11, 21 January 2020 (UTC)
I take it the test edits are the ones by the IP? Lymantria (talk) 08:31, 22 January 2020 (UTC)
Yes, I used the bot account to connect to the API, but I don't know why it shows the IP instead of the bot's account--Aitalvivem (talk) 09:53, 22 January 2020 (UTC)
I would like to see you succeed in doing so. Lymantria (talk) 12:57, 26 January 2020 (UTC) (@Aitalvivem: 07:02, 29 January 2020 (UTC))
@Aitalvivem: Any progress? Lymantria (talk) 08:34, 16 May 2020 (UTC)
@Aitalvivem: Seems like there hasn't been any progress on this? Hazard-SJ (talk) 06:10, 7 October 2020 (UTC)
  • Note for closing bureaucrats: IPBE granted for 6 months per Special:Permalink/1102840151#IP blocked; please switch to permanent IPBE when you approve it. (Or the bot master should consider using Wikimedia Cloud Services where you don't get any IP blocks and a server env for use.) — regards, Revi 14:14, 22 January 2020 (UTC)

BsivkoBotEdit

BsivkoBot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Bsivko (talkcontribslogs)

I already have BsivkoBot on ruwiki. Do I need to register another account here, or should I link the existing bot? (Right now, the user account "BsivkoBot" is not registered.)

Task/s:

  • Check Britannica URL (P1417) and clean up invalid claims. The reason is that there are a lot of claims which link to nowhere. For instance, Q1143358 has P1417 set to sports/shortstop, which goes to https://www.britannica.com/sports/shortstop, where we get "Britannica does not currently have an article on this topic".

Code:

  • I use pywikibot, and currently I have a piece of software which gets P1417, makes a request to the URL, fetches the page text, recognizes that the article is absent, and stops with a permissions exception:
   item = pywikibot.ItemPage.fromPage(page)
   if item:
       item.get()
       if item.claims:
           if 'P1417' in item.claims:
               brit_value = item.claims['P1417'][0].getTarget()
               brit_url = "https://www.britannica.com/" + brit_value
               r = requests.get(url=brit_url)
               if r.status_code == 200:
                   if "Britannica does not currently have an article on this topic" in r.text:
                       item.removeClaims(item.claims['P1417'], summary=f"Article in Britannica is absent (URL: '{brit_url}').")
               pass

Afterwards, I'm going to make test runs and integrate it with the other bot functions (I work with external sources on ruwiki, and in some cases auto-captured links from Wikidata are broken, which leads to user complaints).

Function details: --Bsivko (talk) 19:37, 28 December 2019 (UTC)

Please create an account for your bot here and make some test edits.--Ymblanter (talk) 21:26, 28 December 2019 (UTC)
I logged in as BsivkoBot via ruwiki and went to Wikidata, which created the account (the user BsivkoBot now exists here). After that, I made a couple of useful edits by hand (not with the bot), so BsivkoBot can do something on the project. Next, I tried to run the code above and the exception changed to a different one:

{'error': {'code': 'failed-save', 'info': 'The save has failed.', 'messages': [{'name': 'wikibase-api-failed-save', 'parameters': [], 'html': {'*': 'The save has failed.'}}, {'name': 'abusefilter-warning-68', 'parameters': ['new editor removing statement', 68], 'html': {'*': 'Warning: The action you are about to take will remove a statement from this entity. In most cases, outdated statements should not be removed but a new statement should be added holding the current information. The old statement can be marked as deprecated instead.'}}], 'help': 'See https://www.wikidata.org/w/api.php for API usage. ..

I checked that it is possible to remove the claim manually, so the problem is on the bot side. Could you please help me: is it a permissions problem, or should the code be different? (As I see it, this requires write rights, but I do not see any rights now.) Bsivko (talk) 00:15, 29 December 2019 (UTC)
I changed the logic to set the deprecated rank instead, and it was a success! The bot changed the rank, and the link disappeared for users in our article. After a test run, the code is the following:
       if item.claims:
           if 'P1417' in item.claims:
               for claim in item.claims['P1417']:
                   brit_value = claim.getTarget()
                   brit_url = "https://www.britannica.com/" + brit_value
                   r = requests.get(url=brit_url)
                   if r.status_code == 200:
                       if "Britannica does not currently have an article on this topic" in r.text:
                           claim.changeRank('deprecated', summary="Article in Britannica is absent (URL: '" + brit_url + "').")
               pass

Currently, it works. I'll integrate it into production. Bsivko (talk) 11:58, 29 December 2019 (UTC)

@Bsivko: The above error means you will require a confirmed flag for your bot.--GZWDer (talk) 21:03, 29 December 2019 (UTC)
Ok, I've got it, thank you for the explanation! I already implemented the function and rank changing is enough, it resolved the problem. Bsivko (talk) 21:10, 29 December 2019 (UTC)
Note your edits may be controversial. You should reach a consensus for such edits. (I don't support such edits, but someone may.)--GZWDer (talk) 21:48, 29 December 2019 (UTC)
I understand. I just started the discussion on chat. Please, join. Bsivko (talk) 00:20, 30 December 2019 (UTC)
  • Strong oppose per my comments when this was discussed in 2016. These are not "invalid claims". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:07, 30 December 2019 (UTC)
  • @Bsivko: BsivkoBot appears to be editing without authorization; there isn't support for more edits here, and I don't see another permissions request. Please stop the bot, or I will have to block it --DannyS712 (talk) 02:58, 8 May 2020 (UTC)
  • As I see it, the current topic is still under discussion, and the functions above are switched off until it is resolved. For the extra functionality I'll open another request. Bsivko (talk) 12:35, 8 May 2020 (UTC)

antoine2711botEdit

antoine2711bot (talkcontribsnew itemsSULBlock logUser rights logUser rightsxtools)
Operator: Antoine2711 (talkcontribslogs)

Task/s: This robot will add data in the context of Digital discoverability project of the RDIFQ (Q62382524).

Code: The work is done with OpenRefine, maybe a bit of QuickStatements, and maybe some API calls from Google Sheets.

Function details: Transfer data for 280 movies from an association of distributors.

--Antoine2711 (talk) 04:25, 2 July 2019 (UTC)

@Antoine2711: Is your request still supposed to be active? Do you have test/example edits? Lymantria (talk) 07:18, 17 August 2019 (UTC)
@Lymantria: No, it's mainly batch operations that I run myself. There is nothing automated yet, and there won't be for the project I've been working on for the last 9 months. --Antoine2711 (talk) 05:15, 2 March 2020 (UTC)
  • @Antoine2711, Lymantria: It seems to be unfinished, many items created are somewhat empty and not used by any other item: [9]. @‎Nomen ad hoc: had listed one of them for deletion. If the others aren't used or completed either, I think we should delete them. Other than that: lots of good additions. --- Jura 12:57, 8 September 2019 (UTC)
@Lymantria, Jura: The data I added all comes from a clean data set provided by distributors. I tried to do my best, but I might not have done everything perfectly. Someone spotted an empty item, and I added the missing data. If there are any others, I will make the same corrections.
My request for the bot is still pertinent as I will do other additions. What information do I need to provide for my bot request? --Antoine2711 (talk) 16:44, 8 September 2019 (UTC)
@Jura1: Sorry for not responding earlier. These people are team members on a movie, and I needed to create a statement with P3092 and also a qualifier, object has role (P3831); in the case of Ronald Fahm (Q65116570), he's a hairdresser (Q55187). Ideally, I should also be able to push that into the Dxx, the description of this person. But I must be careful. I created around 1500 persons, and I might have 200 still not connected. Do you see anything else? --Antoine2711 (talk) 03:37, 25 February 2020 (UTC)
Yesterday I looked into this request again and noticed that the problems I had identified 5 months ago were still not fixed. If you need help finding all of them, I could do so. --- Jura 08:46, 25 February 2020 (UTC)
@Jura1: yes, anything you see that I didn't do well, tell me, and I'll correct it. If I create 500 items and make just 1% errors, that's still 5 bad item creations. So even if I'm careful, I'm still learning and making mistakes. I try to correct them as fast as I can, and if you can help me pinpoint problems, I'll fix them, like I did with everyone here. If you have SPARQL queries (or other ways of finding lots of data), let me know and don't hesitate to share them with me. Regards, Antoine --Antoine2711 (talk) 06:37, 2 March 2020 (UTC)
  • There's a deletion request for one of these items at Wikidata:Requests for deletions#Q65119761. I've mentioned a likely identifier for that one. Instead of creating empty items it would be better to find identifiers and links between items before creating them. For example Peter James (Q65115398) could be any of 50 people listed on IMDB - possibly nm6530075, the actor in Nuts, Nothing and Nobody (Q65055294)/tt3763316 but the items haven't been linked and they are possibly not notable enough for Wikidata. Other names in the credits there include Élise de Blois (Q65115717) (probably the same person as the Wikidata item) and Frédéric Lavigne (Q65115798) (possibly the same one but I'm not certain) and several with no item so I'm not sure if this is the source of these names. With less common names there could be one item that is then assumed to be another person with the same name. Peter James (talk) 18:29, 9 September 2019 (UTC)
Hi @Peter James: I think that most of these are now linked. For a few hundred, I still need to add the occupation. I also think that, for the ones with little information, I'm going to state the movies they worked on, which might help to identify those persons. I've also created links to given name and surname, but that doesn't help much for identification. Note that I added a lot of IMDb IDs, and those are excellent. Do you have suggestions for me? Regards, Antoine --Antoine2711 (talk) 04:53, 2 March 2020 (UTC)
  • I suggest we block this bot until we see a plan of action for cleanup of the problems already identified. --- Jura 09:44, 24 February 2020 (UTC)
Cleanup seems to be ongoing. --- Jura 09:36, 25 February 2020 (UTC)
  •   Comment I fixed writing system (P282) on several hundreds of family name items created yesterday, e.g. at [10]. --- Jura 09:42, 1 March 2020 (UTC)
I didn't know that alphabet latin (Latin alphabet (Q41670)) and alphabet latin (Latin script (Q8229)) were actually 2 different things. Thank you for pointing that out. --Antoine2711 (talk) 04:25, 2 March 2020 (UTC)
  • When doing that, I came across a few "last" names that aren't actually family names, e.g. H. Vila (for Rodrigo H. Vila), and listed them for deletion. You might want to double-check all the others. --- Jura 10:02, 1 March 2020 (UTC)
Yes, thanks for pointing that out. I'm also cleaning that up. --Antoine2711 (talk) 04:25, 2 March 2020 (UTC)
@Antoine2711: you seem to be running an unauthorized bot that is doing weird edits. Please explain. Multichill (talk) 15:46, 7 March 2020 (UTC)
@Multichill: Hi, yes, I did see those errors. I was cleaning that up yesterday and will continue today. This was an edit batch of 3,000 with a 3% mistake rate. Even if 3% is not a lot, at those quantities I must be very careful. Unfortunately, I'm still learning. Please note that everything this bot does is supervised and launched by a human decision (which may be imperfect…). Regards, Antoine --Antoine2711 (talk) 17:12, 7 March 2020 (UTC)