Wikidata:Bot requests

Bot requests

If you have a bot request, add a new section using the button and state exactly what you want. To reduce processing time, first discuss the legitimacy of your request with the community in the Project chat or on the relevant WikiProject's talk page. Please refer to previous discussions justifying the task in your request.

For botflag requests, see Wikidata:Requests for permissions.

Tools available to all users which can be used to accomplish the work without the need for a bot:

  1. PetScan for creating items from Wikimedia pages and/or adding the same statements to items (note: PetScan edits are made through QuickStatements)
  2. QuickStatements for creating items and/or adding different statements to items
  3. Harvest Templates for importing statements from Wikimedia projects
  4. OpenRefine to import any type of data from tabular sources
  5. WikibaseJS-cli to write shell scripts to create and edit items in batch
  6. Programming libraries to write scripts or bots that create and edit items in batch
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/04.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 15 days.

The nl descriptions lacking the space sign (2023-11-02)

Request date: 2 November 2023, by: Wolverène

Task description

Please see here. The descriptions formatted as <...> in sterrenbeeldSchorpioen (= something in the constellation Scorpius) should obviously read <...> in sterrenbeeld Schorpioen. Please help with fixing these in bulk: there are more than 79,000 descriptions with this typo, and I cannot fix them all manually. :) I hope it's not a hard issue. Thanks in advance.
P.S. I do not know whether this issue occurs only with the Scorpius-related items or also elsewhere. It would be great if Dutch speakers could check with the search bar... --Wolverène (talk) 06:07, 2 November 2023 (UTC)
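A minimal sketch of the text fix requested above, assuming the typo is always the word "sterrenbeeld" fused directly onto a capitalised constellation name. The function name `fix_description` is hypothetical, and the sketch only transforms a string; fetching and saving descriptions is left out.

```python
import re

# Assumption: the missing space always sits between "sterrenbeeld" and an
# upper-case letter, as in "sterrenbeeldSchorpioen".
FUSED = re.compile(r"\bsterrenbeeld(?=[A-Z])")


def fix_description(description: str) -> str:
    """Insert the missing space; already-correct descriptions are unchanged."""
    return FUSED.sub("sterrenbeeld ", description)
```

Because the pattern only matches when a capital letter follows immediately, it would also catch other constellations with the same typo and can be rerun safely on text that is already fixed.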

Discussion

Hi Wolverène, is this what you want, just to be sure before mass-editing? -Framawiki (please notify !) (talk) 10:01, 28 December 2023 (UTC)

Yes, this is. But there are 10 results as I see, so it is not hard to fix them manually. Regards, --Wolverène (talk) 11:19, 17 April 2024 (UTC)
There are thousands of them, so we still need help. --Wolverène (talk) 11:22, 17 April 2024 (UTC)
Request process

Request to update links to https (2023-11-26)

Request date: 26 November 2023, by: LA2

Link to discussions justifying the request

No discussion.

Task description

Update all links from http://runeberg.org/ to https:... because my website finally switched to https.
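A rough sketch of the URL rewrite, assuming that only the scheme changes and that the host and path stay exactly as they were. The function name `upgrade_url` and the example path in the usage note are made up for illustration.

```python
import re

# Assumption: the https address is identical to the http one apart from the scheme.
HTTP_RUNEBERG = re.compile(r"^http://(runeberg\.org(?:/.*)?)$")


def upgrade_url(url: str) -> str:
    """Switch http://runeberg.org/... to https; leave every other URL untouched."""
    return HTTP_RUNEBERG.sub(r"https://\1", url)
```

For example, `upgrade_url("http://runeberg.org/example/")` gives `"https://runeberg.org/example/"`, while links on other hosts and links that are already https come back unchanged.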

Licence of data to import (if relevant)
Discussion

Isn't this fixed by https://www.wikidata.org/w/index.php?title=Property:P3154&diff=prev&oldid=2032325658? --Azertus (talk) 00:47, 12 April 2024 (UTC)

Unfortunately not, because runeberg.org external links are also used in thousands of references. For example, dates of birth and death for Otto Torell (Q19588642). I will take a look at this. William Avery (talk) 15:48, 14 April 2024 (UTC)

I assume that where a reference url has been entered as a qualifier, like this, the qualifier should be changed to a reference. William Avery (talk) 20:41, 14 April 2024 (UTC)

Request process

Accepted by William Avery (talk) 15:48, 14 April 2024 (UTC) and under process

Request to automatically rank statements based on precision (2023-12-05)

Request date: 6 December 2023, by: Jahl de Vautban

Link to discussions justifying the request
  • This request originates from the Project chat discussion at Wikidata:Project_chat#Concept_of_bot_edits. Several contributors expressed concerns about bots adding "redundant", "useless", or generally "less precise" data when syncing with other databases. To address this problem, one proposed solution involves deploying a bot that assigns a preferred rank to the more precise statements, particularly for the following properties:
  1. date of birth (P569) and date of death (P570)
  2. place of birth (P19) and place of death (P20)
  3. occupation (P106)
  4. possibly instance of (P31), but that seems less of a priority.

The purpose of this bot request is to find a resolution to the aforementioned issues, and the task description may be subject to adjustments based on ongoing discussions. Pinging @Frettie, Vojtěch_Dostál: who expressed their willingness to work on this, but anybody is welcome. Thank you all for your time!

Task description

The proposed bot should prioritize ranking statements based on their relative precision values. Given that the mentioned properties operate differently in determining precision, the following approaches could be implemented:

If, in any of these scenarios, one or more statements prove to be more precise, the bot should rank them as preferred and add the qualifier reason for preferred rank (P7452) = most precise value (Q71536040).
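As an illustration for the date properties only, the selection step could look roughly like the sketch below. It relies on the precision code stored with Wikibase time values (9 = year, 10 = month, 11 = day). The `DateStatement` class, its field names and `statements_to_prefer` are hypothetical, and treating a preferred statement without any stated reason as "ranked for another reason" is a conservative assumption, not something agreed in this request.

```python
from dataclasses import dataclass
from typing import Optional

MOST_PRECISE = "Q71536040"  # most precise value


@dataclass
class DateStatement:
    statement_id: str                       # hypothetical identifier
    precision: int                          # 9 = year, 10 = month, 11 = day
    rank: str = "normal"                    # "preferred", "normal" or "deprecated"
    preferred_reason: Optional[str] = None  # value of P7452, if present


def statements_to_prefer(statements: list[DateStatement]) -> list[str]:
    """Return the ids of the statements that would receive preferred rank."""
    # Skip the whole group if something is preferred for any other reason
    # (assumption: a missing reason also counts as "other").
    for s in statements:
        if s.rank == "preferred" and s.preferred_reason != MOST_PRECISE:
            return []
    candidates = [s for s in statements if s.rank != "deprecated"]
    if not candidates:
        return []
    best = max(s.precision for s in candidates)
    # Nothing to do when all candidates are equally precise.
    if all(s.precision == best for s in candidates):
        return []
    return [s.statement_id for s in candidates if s.precision == best]
```

The sketch does not check that the more precise date agrees with the less precise one, and it ignores references and other qualifiers; both would need to be settled before any real run.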

Discussion
  •   Comment My personal condition for supporting this bot request is that any statement set with a preferred rank should be sourced, preferably with a more robust reference than imported from Wikimedia project (P143). However, for the sake of practicality, I am open to forgoing this condition if it imposes an undue burden on the bot.
Regarding the general workflow, as highlighted by Vojtěch Dostál, it's important to address changes in subclasses and, less frequently, alterations in administrative locations. While I am not a bot operator and cannot confirm technical feasibility, the following workflow is proposed:
1. The bot scans all normal and preferred rank statements alike.
2. The bot examines their timestamp/P131/P279, and formulates a potential ranking.
3. The bot compares its ranking with existing ranks. Two possibilities arise:
4a. If the rankings are identical, the bot takes no action and proceeds to the next property or Qid.
4b. If the rankings differ, the bot revises the ranking according to its own assessment.
This workflow is general and assumes that the sole valid reason for setting a preferred rank is precision, which isn't true. Therefore, as the bot scans statements, if it encounters a reason for preferred rank (P7452) with a value other than most precise value (Q71536040), it should skip the property and move to the next. --Jahl de Vautban (talk) 20:18, 6 December 2023 (UTC)
Two remarks:
  • I don't think sources on the involved claims should be ignored. If this was done, someone would surely complain at a later point, and the bot operator would have to deal with another potentially controversial situation.
  • Re. P106: the subclass hierarchy is not consistent over several levels, thus I recommend ranking only those occupation claims whose values have a direct subclass hierarchy (or maybe 1 intermediate class maximum).
MisterSynergy (talk) 07:12, 7 December 2023 (UTC)
Several questions which immediately popped into my head:
  1. What if the current ranking had been set by a manual edit for whatever reason? Is it really OK to overwrite someone's manual ranking? Is it OK to eliminate qualifiers such as most precise value (Q71536040) from downranked statements?
  2. What about statements which have qualifiers? Particularly statement disputed by (P1310) but also many others?
  3. What if one of the "to-be-preferred" statements is not sourced? Should no ranking be done at all in this item, or should the "next best" statement with a source be preferred instead?
I will try to think about more potential issues over the next few days. Thanks for starting a request. Vojtěch Dostál (talk) 13:40, 7 December 2023 (UTC)
For #1 and #2 I think most situations can be resolved by the bot not attempting anything on (groups of) statements where a rank is set as preferred with reason for preferred rank (P7452) other than most precise value (Q71536040). Its job isn't to find the best value, only the most precise in a pool of values where no ranking exists. I do expect that such a bot will run into similar conflicts as Frettiebot because people don't understand what a bot does or what Wikidata's mission is in collecting and ranking data; we can leave out statements with qualifiers in the first implementation and see if it's worth it to include them later. For #3 my initial stance was (and still is) not to consider them because they might as well not exist. --Jahl de Vautban (talk) 12:00, 9 December 2023 (UTC)


When it comes to professions, how can we be sure that the more precise/granular value is also always the "best" profession - i.e. the one that describes the person most accurately and should be ranked as preferred? Since we also use the property to store side activities, I'm not sure there's a good way to prevent such a side activity from getting ranked higher than the actual important profession. For example a sportsperson that also wrote a single novel could have writer (Q36180), author (Q482980) and novelist (Q6625963) values, where novelist (Q6625963) would be the most precise and preferable value. But setting it to preferred rank might also elevate the value above the sportsperson value(s) - giving a distorted image of what the person is known for. Or an actor that worked on movies, television and stage might have a general actor (Q33999) value and then a bot adds stage actor (Q2259451) based on a reliable source. The statement isn't factually wrong, well sourced and more precise, but setting it to a preferred rank would be misleading if the actor didn't primarily perform on stage. How can cases like these be detected and avoided? --2A02:810B:580:11D4:781F:BF70:D629:BF3 20:14, 16 December 2023 (UTC)
Plainly we can't, and I don't expect the bot to be able to know that. Aside from precision, we could, as you said, use the most generic value as preferred, or the most recent, or the one that has the most references, or the one with the most trusted source, etc. The problem could also occur with dates, as the most precise might not be the best. This is why I have written before that in its initial assessment of the statements, if the bot encounters a reason for preferred rank (P7452) with a value other than most precise value (Q71536040), it shouldn't attempt to do anything. Now surely most items don't have any form of ranking for the P106 statements and it's evident that the first round of edits will encounter some backlash. I fear this is inevitable, but altogether I don't think it is a bad thing if it makes people think of what could be the best practices of modelling when you have several conflicting statements. As I have said elsewhere, in this case, the bot would merely be exposing a problem that needs to be tackled. --Jahl de Vautban (talk) 10:58, 31 December 2023 (UTC)
@Jahl de Vautban One additional issue would be with future imports. I just realized now that future importers would have to up-rank their precise occupations using the same algorithm as proposed here. Otherwise, their edits with normal rank would look like additions of 'less precise' occupations. This is not a reason to block this bot request altogether, but something to take into account. It will make proper editing a bit more complex. Vojtěch Dostál (talk) 11:27, 2 January 2024 (UTC)
@Vojtěch Dostál I had envisaged that the bot would rerun on a regular basis, as the subclasses for occupations may change over time, and thus that people could keep adding data without much thinking of the ranking; but I leave the technical best practices to those who know them. As a general side note, I feel like I am becoming the main supporter of this bot request, whereas I only started it to make the discussion in the Project chat go forward; a pity so few of the participants there showed up here. --Jahl de Vautban (talk) 11:54, 2 January 2024 (UTC)
Indeed, regular runs would make it more feasible. Vojtěch Dostál (talk) 12:13, 2 January 2024 (UTC)
Request process

Request to update P1991 (2023-12-17)

Request date: 17 December 2023, by: Artoria2e5

Link to discussions justifying the request
Task description
For every item matching instance of (P31)=taxon (Q16521) and taxon rank (P105)=species (Q7432), replace the value of LPSN URL (P1991) using a regular expression:
https?://www.bacterio.net/genus/([a-z]+)(?:\.html)?#([a-z]+) → https://lpsn.dsmz.de/species/$1-$2
(My regex is rusty, but I hope it gets the point across. The P=Q constraints are really just a sanity check.)
Licence of data to import (if relevant)
Discussion

I also added a regex constraint. It will be very noisy until the links are updated, sorry... --Artoria2e5 (talk) 05:58, 17 December 2023 (UTC)

@Artoria2e5 There are only 9 items which contain "genus" in their LPSN URL (P1991) value. Did you mean to have .../species/... in your regex? Can you additionally provide one item where the value is currently wrong? Thanks Vojtěch Dostál (talk) 08:39, 17 December 2023 (UTC)
@Vojtěch Dostál: My bad, the match should be https?://www.bacterio.net/([a-z]+)(?:\.html)?#([a-z]+). If it has "genus" in it, it's already updated. Artoria2e5 (talk) 10:19, 17 December 2023 (UTC)
@Artoria2e5 Like this? Vojtěch Dostál (talk) 16:14, 17 December 2023 (UTC)
Vojtěch Dostál Exactly! Artoria2e5 (talk) 04:40, 18 December 2023 (UTC)
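For reference, the corrected match from the reply above can be written as a small Python substitution. This is only a sketch: escaping the dots and anchoring the pattern with `^` and `$` are additions made here, the function name `update_lpsn_url` is hypothetical, and the URL in the usage note is a made-up example of the old shape.

```python
import re

# Corrected pattern from the discussion, with dots escaped and anchors added.
OLD_SPECIES_URL = re.compile(
    r"^https?://www\.bacterio\.net/([a-z]+)(?:\.html)?#([a-z]+)$"
)


def update_lpsn_url(url: str) -> str:
    """Rewrite an old genus-page#species link to the new species URL."""
    return OLD_SPECIES_URL.sub(r"https://lpsn.dsmz.de/species/\1-\2", url)
```

With this, `update_lpsn_url("http://www.bacterio.net/bacillus.html#subtilis")` gives `"https://lpsn.dsmz.de/species/bacillus-subtilis"`, and values that do not match the old shape are returned unchanged.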

There is an associated task to update the higher levels. The links still work, so it is not as urgent. Still a lot of them though: 2674!

SELECT ?item ?itemLabel ?taxrank ?taxrankLabel ?value ?result (STRLEN(STR(?value)) AS ?stringlength) ?snak ?rank
WHERE
{
	{
		SELECT ?item ?taxrank ?value ?result ?snak ?rank
		WHERE
		{
			{
				?item p:P1991 [ ps:P1991 ?value; wikibase:rank ?rank ].
				BIND("mainsnak" AS ?snak) .
			}.
			{
				?item p:P105 [ ps:P105 ?taxrank ].
			}.
			BIND( REGEX( STR( ?value ), "^(https://[a-z.]+/(domain|uncategorized|(infra|super|sub)?(phylum|class|order|family|genus|species|tribe|kingdom|division))/[-a-z0-9]+)$" ) AS ?regexresult ) .
			FILTER( ?regexresult = false ) .
			BIND( IF( ?regexresult = true, "pass", "fail" ) AS ?result ) .
			BIND( REGEX( STR( ?value ), "^([^#]+)$" ) AS ?regexresult2 ) .
			FILTER( ?regexresult2 = true ) .
			FILTER( ?item NOT IN ( wd:Q4115189, wd:Q13406268, wd:Q15397819 ) ) .
		} 
		LIMIT 10000
	} .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}

--Artoria2e5 (talk) 10:25, 17 December 2023 (UTC)

At least all species URLs are now replaced. Vojtěch Dostál (talk) 19:56, 18 December 2023 (UTC)
Request process

Request to refine main subject .. (2024-01-01)

Request date: 1 January 2024, by: Combinato

Link to discussions justifying the request
Task description

Replace Q6501338 with Q181923 as the main subject of scholarly articles in Wikidata if the article's title contains "attention-deficit/hyperactivity disorder". Combinato (talk) 13:43, 1 January 2024 (UTC)


Licence of data to import (if relevant)
Discussion

I don't think the imprecise (but not incorrect) attention (Q6501338) main subjects should really be removed. @Combinato: edits like this are a perfect fit for the Topic Curator tool; give it a try! I'm not sure it's a good fit for a bot request, so maybe this could be closed... --Azertus (talk) 10:54, 12 April 2024 (UTC)


Request process

Request to replace the superseded WorldCat Identities ID for humans with the WorldCat Entities ID, or delete the superseded WorldCat Identities ID when the WorldCat Entities ID is present (2024-01-09)

Request date: 9 January 2024, by: Peaceray

Link to discussions justifying the request

This is a link to the query to find a human with WorldCat Identities ID (superseded) but no WorldCat Entities:

Task description
  1. For a human with both a WorldCat Identities ID (superseded) (P7859) & a WorldCat Entities ID (P10832), delete the superseded WorldCat Identities ID.
  2. For a human with a WorldCat Identities ID (superseded) (P7859) but no WorldCat Entities ID (P10832), query that WorldCat Identities ID. WorldCat will redirect most, but certainly not all, of the time to the new WorldCat Entities ID. If found, insert the WorldCat Entities ID & delete the superseded WorldCat Identities ID.
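The two cases above could be planned per item along these lines. This is a sketch under assumptions: `plan_worldcat_edits` and the shape of its result are hypothetical, and `resolve` stands in for whatever lookup follows WorldCat's redirect from an Identities ID to an Entities ID (it returns `None` when there is no redirect); that lookup itself is not implemented here.

```python
from typing import Callable, Optional


def plan_worldcat_edits(
    identities_ids: list[str],
    entities_ids: list[str],
    resolve: Callable[[str], Optional[str]],
) -> dict[str, list[str]]:
    """Decide which P10832 values to add and which P7859 values to remove."""
    plan: dict[str, list[str]] = {"add_entities": [], "remove_identities": []}
    # Case 1: an Entities ID already exists, so the superseded IDs can go.
    if entities_ids:
        plan["remove_identities"] = list(identities_ids)
        return plan
    # Case 2: try to resolve each superseded ID; keep it when nothing is found.
    for old_id in identities_ids:
        new_id = resolve(old_id)
        if new_id is not None:
            if new_id not in plan["add_entities"]:
                plan["add_entities"].append(new_id)
            plan["remove_identities"].append(old_id)
    return plan
```

Keeping the network lookup behind a callable means the decision logic can be checked against a plain dictionary before any edit is made.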
Licence of data to import (if relevant)
Discussion
I agree. P7859 statements should only be deleted if a P10832 is there.
There is an additional complication. I have placed multiple instances of WorldCat Identities ID (superseded) (P7859) when each applied to the same item. WorldCat has not been careful about consolidating duplicate items; this not only applies to the Identities ID but to the OCLC control number (P243) as well. However, since clicking through on WorldCat Identities IDs that have no corresponding WorldCat Entities ID leads to a page not found message, we should consider whether we should delete all WorldCat Identities ID (superseded) (P7859) statements when a WorldCat Entities ID (P10832) is present.
Also, perhaps we should not limit this to humans. I just thought that humans were the most important to process first. Peaceray (talk) 19:34, 9 January 2024 (UTC)


Request process

Request to replace "unisex" with "uniszex" in Hungarian (2024-01-18)

Request date: 18 January 2024, by: Adam78

The Hungarian-language (HU) description "unisex név" should be replaced with "uniszex név" (2346 instances). (Source for the spelling in the spelling dictionary) Thanks in advance.
Discussion

Hi Adam78, this is basically the same task as the one I'm currently requesting a bot flag for, so I'll amend my request. --Azertus (talk) 20:01, 13 April 2024 (UTC)


Request process

Accepted by (Azertus (talk) 20:01, 13 April 2024 (UTC)) and under process

Request to create/update members of the European Parliament - 9th parliamentary term (2024-01-20)

Request date: 22 January 2024, by: Fcairn

Link to discussions justifying the request
Task description

Import the data described here: https://www.wikidata.org/wiki/Wikidata:Dataset_Imports/Member_of_the_European_Parliament

Licence of data to import (if relevant)
Discussion

@Fcairn: the page you linked is still blank? --Azertus (talk) 10:57, 12 April 2024 (UTC)

Request process

Request to replace URL to idref.fr by the dedicated property P269 (2024-01-31)

Request date: 31 January 2024, by: Jahl de Vautban

Link to discussions justifying the request
  • No previous discussion
Task description

The bot should replace occurrences of reference URL (P854) = http://www.idref.fr/$1 or reference URL (P854) = https://www.idref.fr/$1 used in references with stated in (P248) = IdRef (Q47757534) and IdRef ID (P269) = $1.
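A sketch of the conversion step, assuming an IdRef identifier is eight digits followed by a digit or "X" and that the URL carries nothing after the identifier except an optional trailing slash. The function name `convert_reference` and the dictionary it returns are hypothetical, and the identifier in the usage note is a placeholder.

```python
import re
from typing import Optional

# Assumption: IdRef IDs look like 8 digits plus a final digit or "X".
IDREF_URL = re.compile(r"^https?://www\.idref\.fr/(\d{8}[\dX])/?$")


def convert_reference(url: str) -> Optional[dict[str, str]]:
    """Map an idref.fr reference URL to its P248/P269 replacement, else None."""
    match = IDREF_URL.match(url)
    if match is None:
        return None
    return {"P248": "Q47757534", "P269": match.group(1)}
```

For instance, `convert_reference("https://www.idref.fr/123456789")` yields the pair stated in (P248) = Q47757534 and IdRef ID (P269) = 123456789, while URLs of any other shape return `None` so that the reference is left alone.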

Licence of data to import (if relevant)
Discussion

This should ensure a more robust referencing in the (unlikely) case of IdRef changing its URL. --Jahl de Vautban (talk) 17:56, 31 January 2024 (UTC)

Request process

Request to Google Cache URLs (2024-02-11)

Request date: 11 February 2024, by: GreenC

Link to discussions justifying the request
Task description
  • See work done at Enwiki: https://en.wikipedia.org/wiki/Wikipedia:Link_rot/URL_change_requests#Google_cache
    • There is a tool for parsing GC links to find the original source URL: https://github.com/greencardamom/Googcacheparse
    • Each Google Cache link logically has 4 possible outcomes:
      1. Source URL is live: Remove the GC link and replace with the source URL
      2. Source URL is dead and archive URL is not available: Remove the GC link and replace with the source URL and a dead link template
      3. Source URL is dead and archive URL is available: Remove the GC link and replace with the archive URL of the source URL
      4. Source URL is dead and archive URL of the Google Cache URL is available: Replace with the archive of the Google Cache URL

Make any adjustments to the above for Wikidata. Experience has shown #1 is most common and #4 is least common.
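The four outcomes can be read as a single decision function, sketched below. The order in which the checks are applied (live source first, then an archive of the source, then an archive of the cache link, and the dead-link marker as a last resort) is an interpretation of the list above, and `resolve_cache_link` with its parameters is hypothetical. Since Wikidata has no dead link template, the sketch returns a flag instead.

```python
from typing import Optional


def resolve_cache_link(
    source_url: str,
    source_is_live: bool,
    source_archive: Optional[str] = None,
    cache_archive: Optional[str] = None,
) -> tuple[str, bool]:
    """Return (replacement URL, mark_as_dead) for one Google Cache link."""
    if source_is_live:                 # outcome 1
        return source_url, False
    if source_archive is not None:     # outcome 3
        return source_archive, False
    if cache_archive is not None:      # outcome 4
        return cache_archive, False
    return source_url, True            # outcome 2
```

How the liveness check and the archive lookups are done is left open; the tool linked above covers extracting the source URL from a Google Cache link.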

Licence of data to import (if relevant)
Discussion

Since I already did this for Enwiki and have a bot programmed for it, I might be of help. I don't have time to develop a bot for Wikidata. If someone wants to send me data in a file extracted from Wikidata, which I process, then you can take the output file and feed it back into Wikidata, that would be fine also.

Request process

Request to replace "unisex név" with "uniszex név" in Hungarian descriptions (2024-04-04)

Request date: 4 April 2024, by: Adam78

"unisex név" in the Hungarian-language description field should be replaced with "uniszex név" (2.380 occurrences). Thanks in advance.
Discussion
Request process
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. This seems to be a duplicate. I've accepted the first request. --Azertus (talk) 08:22, 15 April 2024 (UTC)

Request to replace lintel (Q1370517) with translation (Q7553) in field of work (P101) (2024-04-08)

Request date: 8 April 2024, by: Adam78

The Czech word překlad is ambiguous; it means both lintel (Q1370517) and translation (Q7553). Unfortunately the wrong value has been added to many entries. For example, I just corrected this, but there are 500+ more occurrences.
Please replace all its values where it was entered for field of work (P101). We'll see how many instances remain and whether those that remain can be fixed manually or in another way (possibly with another property filter).
Discussion

@Adam78: I fixed about 400 occurrences. Can you have a look and see if there are more? You mentioned 500+ but it all seems to be OK now (https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Q1370517&namespace=0&limit=500). Vojtěch Dostál (talk) 06:54, 9 April 2024 (UTC)

Request process

Accepted by (Vojtěch Dostál (talk) 06:54, 9 April 2024 (UTC)) and under process

Thank you very much! My estimate was a bit inexact. It seems all are okay now. Adam78 (talk) 09:35, 9 April 2024 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Vojtěch Dostál (talk) 19:05, 9 April 2024 (UTC)