Wikidata:Bot requests

If you have a bot request, add a new section using the button and describe exactly what you want. To reduce processing time, first discuss the legitimacy of your request with the community in the Project chat or on the relevant WikiProject's talk page. Please refer to previous discussions justifying the task in your request.

For botflag requests, see Wikidata:Requests for permissions.

Tools available to all users which can be used to accomplish the work without the need for a bot:

  1. PetScan for creating items from Wikimedia pages and/or adding same statements to items
  2. QuickStatements for creating items and/or adding different statements to items
  3. Harvest Templates for importing statements from Wikimedia projects
  4. OpenRefine to import any type of data from tabular sources
  5. WikibaseJS-cli to write shell scripts to create and edit items in batch
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2021/03.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 2 days.

cleanup of English descriptions for people

Special:Search/"born:" "died:" haswbstatement:P31=Q5 finds some 10,000 items, many of which could use a cleanup of the description. --- Jura 14:33, 29 August 2019 (UTC)

  • Would still be good to have. --- Jura 21:19, 26 March 2020 (UTC)
@Jura1, Ham II: Should we fill the description with something like "Person" or just remove the description? --Kanashimi (talk) 08:55, 8 June 2020 (UTC)
For descriptions like "born 1874; born in ; died 1933; died in" a replacement of "(1874-1933)" would do [1]. If the dates are present as statements, one could remove them entirely and add something else in another run. Please don't add "person" or "human" as English description. --- Jura 08:59, 8 June 2020 (UTC)
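The replacement described above could be sketched like this (a minimal sketch: the pattern only covers the fully raw form quoted above, and the function name is made up for illustration):

```python
import re

# Only matches the fully raw "born YYYY; born in ; died YYYY; died in" form
# shown above; anything else is left untouched.
RAW_DATES = re.compile(r"^born (\d{4}); born in ; died (\d{4}); died in$")

def clean_description(desc: str) -> str:
    """Return "(birth-death)" for raw date-only descriptions, else the input unchanged."""
    m = RAW_DATES.match(desc.strip())
    if m:
        return f"({m.group(1)}-{m.group(2)})"
    return desc
```

A later run could then drop the parenthetical dates entirely for items where P569/P570 statements are present.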
It seems too many of the descriptions must be semantically analyzed to be understood, and it is not easy to extract the information. Sorry, I cannot continue the task. :( --Kanashimi (talk) 11:39, 8 June 2020 (UTC)

I don't think this can be easily automated by bot because descriptions vary a lot. Some sort of machine learning might be able to achieve that, but simple RegEx search and replace will only either find too few cases, or create a lot of errors if it's too broad. Also I'd oppose automatic removal of dates from descriptions - there is sometimes a good reason for them to be there, eg. for simple disambiguation. Vojtěch Dostál (talk) 09:24, 5 November 2020 (UTC)

The search above gives much cleaner results (the request is from August 2019), but it's still needed. Searching with some offset still finds the more problematic ones [2]. The idea is not to suppress these entirely, but to avoid raw data in the description. Also, I don't think full dates of birth are needed. Samples:
I think they are fairly straightforward to fix. Sample: [3]. This is especially true as P569/P570 are already present.
To clean this up, we should probably identify the source of these strings and ensure that people who generate them (or defined MxM catalogues) attempt to fix them. --- Jura 12:46, 6 November 2020 (UTC)
Ah, yes, these strings. I think they are created by Magnus Manske's Reinheitsgebot from reconciled VIAF groups without a Wikidata item. Vojtěch Dostál (talk) 14:00, 9 November 2020 (UTC)
Some catalogues have cleaner output than others. I think it depends on who defined them and/or what data was available. --- Jura 14:09, 9 November 2020 (UTC)

Permissions for new qualifiers "latest start date" and "earliest end date"

The new qualifiers latest start date (P8555) and earliest end date (P8554) should be valid anywhere start time (P580) and end time (P582) are valid. Can someone set a Bot to find the appropriate items and add these as valid qualifiers?

@Jheald, ArthurPSmith, Gamaliel, Sic19: FYI! - PKM (talk) 23:04, 18 September 2020 (UTC)

@PKM: Here's a query for the properties to update: https://w.wiki/cit Not sure whether QuickStatements can do this.
We should probably make sure that earliest date (P1319) and latest date (P1326) are both also permitted. Jheald (talk) 08:58, 19 September 2020 (UTC)
+1 - PKM (talk) 18:51, 19 September 2020 (UTC)

@PKM: You can use this Petscan query to do that. Vojtěch Dostál (talk) 09:34, 5 November 2020 (UTC)

@Vojtěch Dostál: Thank you! I should be able to figure that out. - PKM (talk) 05:59, 6 November 2020 (UTC)


@PKM: Can you double check your edits? I don't think many additions are of much use, e.g. at TikTok username (P7085), Commons quality assessment (P6731). In the future, please avoid making large scale changes to property constraints without prior discussion on project chat. --- Jura 05:47, 8 November 2020 (UTC)

Lucas Werkmeister (WMDE)
Jarekt - mostly interested in properties related to Commons
MisterSynergy
John Samuel
Sannita
Yair rand
Jon Harald Søby
Pasleim
Jura
PKM
ChristianKl
Sjoerddebruin
Salgo60
Fralambert
Manu1400
Was a bee
Malore
Ivanhercaz
Peter F. Patel-Schneider
Pizza1016
Ogoorcs
ZI Jony
Eihel
cdo256
  Notified participants of WikiProject property constraints --- Jura 05:48, 8 November 2020 (UTC)

It doesn't seem very problematic to me. Vojtěch Dostál (talk) 14:02, 9 November 2020 (UTC)
It's a problem when people start using them instead of the ones that are expected. --- Jura 14:23, 9 November 2020 (UTC)
But the rationale here is that latest start date (P8555) and earliest end date (P8554) are expected wherever start time (P580) and end time (P582) are expected... Vojtěch Dostál (talk) 16:03, 9 November 2020 (UTC)
Actually, one could use it always instead of the others. --- Jura 13:13, 11 November 2020 (UTC)

Cleanup VIAF dates

Task description

There are a series of imports of dates that need to be fixed, please see Topic:Un0f1g1eylmopgqu and the discussions linked there, notably Wikidata:Project_chat/Archive/2018/10#Bad_birthdays with details on how VIAF formats them. --- Jura 05:28, 14 November 2018 (UTC)

Discussion
  • Is anyone interested in working on this problem? I think it's a real issue, but it needs attention from someone who can parse the VIAF records and that's certainly not me. - PKM (talk) 21:33, 16 March 2019 (UTC)
  • Yeah, it would be good. --- Jura 12:25, 19 July 2019 (UTC)
  • Still open, I think. --- Jura 21:19, 26 March 2020 (UTC)
  • I have no knowledge of coding, but I can provide some information on parsing the VIAF data. Birth and death dates are recorded using MARC tag 997. Using Ludwig van Beethoven (Q12368917) as an example, the corresponding Wikidata record at VIAF gives a 997 MARC tag value of "‡a 1712 1773 lived 0105 1224 ‡9 1". Our record says he was born on 5 January 1712, which corresponds with the 1712 and 0105 elements of the tag. Our record says he died on 24 December 1773, which corresponds with the 1773 and 1224 elements of the tag. The word in the middle of the tag represents the quality of the data; the options are "lived" (the dates represent the period of life), "circa" (the dates are approximations) and "flourished" (the dates represent the work period; see work period (start) (P2031) and work period (end) (P2032)). From Hill To Shore (talk) 09:55, 25 August 2020 (UTC)
I just realised that I should give an example of a living individual so that you can see how to interpret null values. Using Prince William, Duke of Cambridge (Q36812) as an example, the corresponding Wikidata record at VIAF gives a 997 MARC tag value of "‡a 1982 0 lived 0621 0‏ ‎‡9 1‏". Our record shows he was born on 21 June 1982, which corresponds with the 1982 and 0621 elements of the tag. The "lived" tells you that it is a birth date and not a floruit or approximate date. The 0 and 0 in the death year and death date fields tells you that either the person is still alive or that the data source doesn't have a value recorded. From Hill To Shore (talk) 10:05, 25 August 2020 (UTC)
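The field layout described above could be turned into a small parser along these lines (a sketch: the function name, and the assumption that the subfield always has exactly five space-separated tokens, are mine):

```python
def parse_viaf_997(subfield_a: str):
    """Parse the ‡a subfield of a VIAF 997 field into birth/death dates.

    Assumed layout, per the examples above:
    "<birth-year> <death-year> <quality> <birth-MMDD> <death-MMDD>"
    where "0" marks a missing value and quality is lived/circa/flourished.
    """
    year_b, year_d, quality, md_b, md_d = subfield_a.split()

    def to_date(year, mmdd):
        if year == "0":
            return None                      # still alive or no value recorded
        if mmdd == "0" or len(mmdd) != 4:
            return (int(year), None, None)   # year-only precision
        return (int(year), int(mmdd[:2]), int(mmdd[2:]))

    return {"quality": quality,
            "birth": to_date(year_b, md_b),
            "death": to_date(year_d, md_d)}
```

For example, the Q36812 value above parses to a birth of (1982, 6, 21) and a death of None.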
  • I am taking a stab at cleaning the dates up. I will be working from a database dump in MARC21-XML format. For each record I will extract the birth and death dates in addition to the Wikidata item number. Finally I will add the dates with a bot to Wikidata. --Pyfisch (talk) 13:58, 25 August 2020 (UTC)
  • Thanks for looking into this/cleaning it up.
I'd use "alive/floruit" (P:P1317) for "lived" and "flourished".
For "circa", why not use the qualifier sourcing circumstances (P1480)=circa (Q5727902). Ideally on a separate statement from the references w/o circa. --- Jura 14:08, 25 August 2020 (UTC)
Precise values are tagged with "lived", therefore they should not be tagged with P:P1317. I've changed my proposal to use sourcing circumstances (P1480) circa (Q5727902) for "circa" and "flourished" instead of decreasing precision, and, if only a single value with "flourished" exists, to add it as floruit (P1317) as you suggested. But I am not sure that floruit (P1317) = 1700 with a circa qualifier adequately captures that the person was active during the 18th century. --Pyfisch (talk) 14:49, 25 August 2020 (UTC)
  • "lived" are they really dates of birth/death?
    (If I recall correctly, the main problem with VIAF fields is that VIAF calls it "date of birth" and then uses another field to define it as the date a person was first reported as being alive, which is what "floruit/alive" actually is.)
    Accordingly I'd use P:P1317 as this is the Wikidata property for this type of information.
    "Work period" (at Wikidata) is somewhat different as it implies one knows much more about a creator/author and their work. It could end decades before they actually die. Accordingly, I'd also use P1317 for "flourished" values, but with two separate values. --- Jura 15:01, 25 August 2020 (UTC)
  • Values with "lived" should be real birth and death dates, although there are certainly errors in the dataset (e.g. Ebenezer Hewlett (Q18671012)). Check for example Constant Wurzbach or Daniel Kehlmann. The more I discuss "flourished", the more I am inclined to disregard these values entirely. I think they are useful for comparing different datasets and finding matches but don't seem that useful on their own. --Pyfisch (talk) 15:12, 25 August 2020 (UTC)
@Pyfisch: very good proposal! Just a few points:
  1. I'm not sure why sourcing circumstances (P1480) circa (Q5727902) should be used for the "flourished" label;
  2. references should certainly contain, after stated in (P248) and before retrieved (P813), the VIAF ID (P214) code (this is really important because VIAF clusters are often redirected or abandoned, so the user may wonder which cluster contains which information; this P214 statement, combined with P813, solves this problem of verifiability);
  3. I perfectly agree with the removal of statements whose only reference is stated in (P248) Virtual International Authority File (Q54919), with the removal of stated in (P248) Virtual International Authority File (Q54919) references if other references are present (N.B. not counting imported from Wikimedia project (P143)), and with updating existing statements where possible;
  4. However, I don't agree so much with the perspective of adding VIAF as an additional reference if the statement is already present: if the statement already has one or more references (N.B. not counting imported from Wikimedia project (P143)), I think references to VIAF should not be added, as they are superfluous (VIAF is not a source itself, but aggregates, sometimes not well, the dates from its clusters' members, so the real sources are the cluster members, not VIAF itself);
  5. Answering the first question: I think we should not add a birth date with precision decade if a more precise statement already exists; moreover, I think in general we should not add any birth or death date with lower precision than already existing statements (e.g. no precision "year" if a statement with precision "month" or "day" already exists)
  6. Answering the second question: I think statements with precision "century" for "flourished" are useful and should be added
--Epìdosis 15:07, 25 August 2020 (UTC)
@Epìdosis: Moved your comment above the proposal, to keep the discussion in one place.
  1. Values for "flourished" can be vague. E.g. 1850 meaning the person lived sometime in the 19th century. (By the way it could also be that the person flourished exactly in 1850 but you can't tell the difference in their format.) This is the reason I suggest using sourcing circumstances (P1480) circa (Q5727902).
  2. Good point, added to proposal.
  3. -
  4. That is a fair point and should reduce the number of edits needed, added to proposal.
  5. Accepted into proposal.
  6. Accepted into proposal.
  7. What are the use-cases for these flourished statements? If we have a convincing use for them I am more inclined to add them.
Thanks for your detailed comment. --Pyfisch (talk) 18:15, 25 August 2020 (UTC)
  • I've extracted all the dates of birth/death from the VIAF dump. There are 2.1 million records in VIAF with a linked Wikidata entry. Most of them contain some form of dob & dod; there are 26,405 records labeled "circa" and 37,384 records labeled "flourished". The "flourished" dates are most often full centuries (e.g. 1300) or half-centuries (e.g. 1950). Unexpectedly for me, many "circa" dates are year-month-day. I tried to track down the source of some dates and found them only in the Wikidata record. (See also: w:Wikipedia_talk:WikiProject_Biography#Date_of_birth_Catriona_Kelly) Adding the reference "VIAF" to dob & dod would be pointless if they got that information from us in the first place. Pyfisch (talk) 13:37, 27 August 2020 (UTC)
    @Pyfisch: I agree, if VIAF takes Wikidata as a source, it is pointless (and misleading) to add VIAF as a source; proposal: could you try 50 or 100 test edits adding dob & dod to items not yet having date of birth (P569) & date of death (P570)? In this way we would probably skip this problem. --Epìdosis 12:55, 30 August 2020 (UTC)

Pyfisch's bot run

(please continue the discussion above)

For VIAF records with birth date and/or death date and an associated Wikidata item add:

If the item already has a more precise birth/death date, don't add one from VIAF. If the item already has a sourced (excluding imported from Wikimedia project (P143)) birth/death date of equal precision, don't add VIAF as a source.

Add a reference to each statement:

Remove all existing instances of date of birth (P569), date of death (P570), work period (start) (P2031) and work period (end) (P2032) whose only reference is stated in (P248) Virtual International Authority File (Q54919). If they have additional references, only the VIAF reference is removed. When possible, existing statements are updated instead of being deleted and recreated. As usual, if a statement with an identical value already exists, VIAF is only added as an additional reference.
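The precision rule above can be sketched using Wikidata's numeric date precisions (7 = century, 8 = decade, 9 = year, 10 = month, 11 = day). Equal-precision statements are handled separately (reference-only), so the function below only answers whether a new statement should be created (the function name is made up for illustration):

```python
def should_add_date(viaf_precision, existing_precisions):
    """Create a new VIAF-sourced date statement only if it is strictly more
    precise than every existing statement on the item.

    Uses Wikidata's numeric precisions: 7=century, 8=decade, 9=year,
    10=month, 11=day. Equal precision means the statement already exists
    and VIAF would at most be added as a reference (handled elsewhere).
    """
    return all(viaf_precision > p for p in existing_precisions)
```

With no existing dates on the item, the function returns True, so the VIAF date would be added.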

Unresolved questions:

  • Should we add a birth date with precision decade, if a more precise statement already exists?
  • Are the statements with precision century for "flourished" useful, or should they be omitted?

 – The preceding unsigned comment was added by Pyfisch (talk • contribs).

Flourished dates removed

@Jura1, Epìdosis: I removed all dob/dod statements only sourced by VIAF that are actually "flourished" dates. (See EditGroup (not yet visible) or Contributions) I will not work on adding data from VIAF because VIAF only aggregates other sources that should be used instead and the VIAF dates are often bad. These dates lack explicit precision information and while a flourished/circa/lived label is available it is frequently unclear if this applies to dob, dod or both. Do you agree to add the dates from other sources and not from VIAF? --Pyfisch (talk) 19:32, 5 October 2020 (UTC)

There are plenty of other incorrect dates from VIAF but they are not listed as "flourished" in VIAF. Some examples: Adaeus (Q346767), Giacomo Marzari (Q23823709), Stanisław Zaborowski (Q9343414) (dob should be circa), see query. Should they be removed too? --Pyfisch (talk) 19:55, 5 October 2020 (UTC)
@Pyfisch: "I will not work on adding data from VIAF because VIAF only aggregates other sources that should be used instead and the VIAF dates are often bad": I perfectly agree. "Do you agree to add the dates from other sources and not from VIAF?" Yes. "There are plenty of other incorrect dates from VIAF but they are not listed as "flourished" in VIAF. [...] Should they be removed too?" Yes, of course, if they don't have sources other than VIAF. Thanks for the great job and keep us informed, --Epìdosis 20:24, 5 October 2020 (UTC)

Import Treccani IDs

Request date: 6 February 2019, by: Epìdosis

Task description

At the moment we have four identifiers referring to http://www.treccani.it/: Treccani's Dizionario biografico degli italiani ID (P1986), Treccani ID (P3365), Treccani's Enciclopedia Italiana ID (P4223), and Treccani's Dizionario di Storia ID (P6404). Each article of these works has, in the right column "ALTRI RISULTATI PER" ("other results for"), links to the articles on the same topic in the other works (e.g. Ugolino della Gherardesca (Q706003) has Treccani ID (P3365) conte-ugolino, and http://www.treccani.it/enciclopedia/conte-ugolino/ also links to the Enciclopedia Italiana (Treccani's Enciclopedia Italiana ID (P4223)) and the Dizionario di Storia (Treccani's Dizionario di Storia ID (P6404))). These cases are extremely frequent: many items have Treccani's Dizionario biografico degli italiani ID (P1986) but not Treccani ID (P3365)/Treccani's Enciclopedia Italiana ID (P4223); others have Treccani ID (P3365) but not Treccani's Enciclopedia Italiana ID (P4223); nearly no item has the recently created Treccani's Dizionario di Storia ID (P6404).

My request is: check each value of these identifiers in order to obtain values for the other three identifiers through the column "ALTRI RISULTATI PER".

Discussion

Fix local dialing code (P473) wrongly inserted

Request date: 7 November 2019, by: Andyrom75

Task description

Several entities have a wrong value for local dialing code (P473) according to the format as a regular expression (P1793) specified on it, [\d\- ]+, which, as clarified there, excludes characters such as ,/;()+

Two typical, easily identified examples of wrong values are the following:

  1. local dialing code (P473) that includes at the beginning the country calling code (P474)
  2. local dialing code (P473) that includes at the beginning the "optional" zero
  • Case 1 can be checked by looking for "+"; when present, the prefix should be compared with the relevant country calling code (P474) and, if it matches, removed
  • Case 2 can be checked by looking for "(" and ")" with zeros inside; if matched, the zero should be removed
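A minimal sketch of the two checks (the function name and the exact normalization steps are assumptions):

```python
import re

def fix_local_dialing_code(value: str, country_code: str) -> str:
    """Sketch of the two cleanups described above.

    1. Strip a leading "+<country calling code>" when it matches the
       item's country calling code (P474).
    2. Strip an "optional zero" written in parentheses, e.g. "(0)".
    """
    v = value.strip()
    # Case 1: leading country calling code, e.g. "+39 055" -> "055"
    if v.startswith("+" + country_code):
        v = v[len(country_code) + 1:].lstrip(" -")
    # Case 2: optional zero in parentheses, e.g. "(0)55" -> "55"
    v = re.sub(r"\(\s*0+\s*\)", "", v).strip()
    return v
```

Values already conforming to [\d\- ]+ pass through unchanged.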
Discussion
Request process

weekly import of new articles (periodic data import)

To avoid Wikidata getting stale, it would be interesting to import new papers on a weekly basis, maybe with a one-week delay, for repositories where this can be done.

@Daniel Mietchen: --- Jura 12:16, 7 August 2019 (UTC)

I'd certainly like to see this tested, e.g. for these two use cases:
  1. https://www.ncbi.nlm.nih.gov/pubmed/?term=zika
  2. all of PubMed Central, i.e. articles having a PMCID (P932), which point to a full text available from PMC.
--Daniel Mietchen (talk) 03:08, 23 August 2019 (UTC)
The disadvantage of skipping some might be that one wouldn't know if it's complete or not. --- Jura 17:00, 25 August 2019 (UTC)
  • Still good to have. --- Jura 21:19, 26 March 2020 (UTC)
@Jura1: Is it fine to import all articles weekly? It seems there will be about 31K articles every week. --Kanashimi (talk) 10:16, 23 June 2020 (UTC)
  • Yes, I still think that would be useful. --- Jura 11:58, 23 June 2020 (UTC)

@Daniel Mietchen: This seems like a WikiCite request; where do you think we should discuss this? I'm inclined to close the request here and discuss implementation elsewhere (somewhere where it may attract more attention). Vojtěch Dostál (talk) 14:08, 9 November 2020 (UTC)

  • @GZWDer: might be doing some of it. If it's something that can or should be done, I don't think the request should be removed from here. --- Jura 14:20, 9 November 2020 (UTC)

Bringing SourceMD fully back online would be good. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:05, 11 November 2020 (UTC)

@Jura1, Daniel Mietchen, Pigsonthewing, Vojtěch Dostál: Maybe I can do this. I just wonder if we should import all articles. --Kanashimi (talk) 00:29, 18 November 2020 (UTC)

@Kanashimi: We definitely should :) If we want to "create a large open bibliographic database within Wikidata" (see Wikidata:WikiProject Source MetaData), we don't have a choice. Vojtěch Dostál (talk) 19:51, 22 November 2020 (UTC)
I think "new" is different from "all". Anyway, the above suggests a one-week delay, but maybe the optimal one is different. --- Jura 08:26, 24 November 2020 (UTC)

@Jura1, Daniel Mietchen, Pigsonthewing, Vojtěch Dostál: Hi, I have made a new request at Wikidata:Requests for permissions/Bot/Cewbot 4. Please give me some suggestions, thank you. --Kanashimi (talk) 03:54, 27 November 2020 (UTC)

@Daniel Mietchen, Jura1: Is there a model entity so I can know what to import? --Kanashimi (talk) 04:51, 27 November 2020 (UTC)

Periodic update of identifiers' alphabetic sorting

Request date: 6 April 2020, by: Epìdosis

Link to discussions justifying the request
Task description

An admin-bot (as MediaWiki:Wikibase-SortedProperties is protected) should periodically (e.g. every week):

  • consult the list of all properties
  • choose only the properties with "external-ID" datatype
  • exclude all the properties which are present in one of the sections before "IDs with type "external-id" - alphabetical order"
  • edit the section "IDs with type "external-id" - alphabetical order" inserting a list of all the remaining properties, according to this format:

* Pnumber (English label)
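The list-building step could look like this (a sketch; the shape of the input tuples is an assumption, e.g. as fetched via SPARQL or the wbgetentities API):

```python
def build_sorted_section(properties, excluded):
    """Build the alphabetical external-ID section in the format above.

    `properties` is an iterable of (property-ID, datatype, English label)
    tuples; `excluded` is the set of property IDs already listed in the
    earlier sections of MediaWiki:Wikibase-SortedProperties.
    """
    rows = [(label, pid) for pid, datatype, label in properties
            if datatype == "external-id" and pid not in excluded]
    # Sort case-insensitively by English label, then emit "* Pnumber (label)".
    return "\n".join(f"* {pid} ({label})"
                     for label, pid in sorted(rows, key=lambda r: r[0].lower()))
```

The admin-bot would then replace the section's wikitext with this output on each weekly run.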

Discussion


Request process

Upgrade URLs to HTTPS, whenever possible

Request date: 4 May 2020, by: OsamaK

Task description

With over 85% of all web requests encrypted via HTTPS, the new normal has shifted to expecting web links to lead to an HTTPS version whenever a website supports it. There are two credible sources that maintain lists of supported websites: Chrome's HSTS Preload List (used by hundreds of millions of Chrome/Chromium users) and the atlas of EFF's HTTPS Everywhere (used by 2 million+ Chrome users and ~1 million Firefox users). Applying these lists would lead to the exact same versions of any given web page, but via HTTPS whenever possible. I would like to create a bot that upgrades the HTTP URLs of the following properties to HTTPS:

--OsamaK (talk) 18:24, 4 May 2020 (UTC)
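A sketch of the upgrade logic, with a toy host set standing in for the full HSTS-preload / HTTPS Everywhere lists (the real bot would load those from their published sources):

```python
from urllib.parse import urlsplit, urlunsplit

# Toy subset standing in for the HSTS preload list and the HTTPS
# Everywhere atlas; the real bot would load the full lists.
HTTPS_HOSTS = {"example.com", "www.wikidata.org"}

def upgrade_url(url: str) -> str:
    """Rewrite http:// to https:// only when the host is on a known-good list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in HTTPS_HOSTS:
        return urlunsplit(("https",) + tuple(parts)[1:])
    return url
```

URLs whose host is not on a list, and URLs that are already HTTPS, are returned unchanged, so the bot never produces a broken link for an unlisted site.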

Discussion
  •   Support I asked @Laurentius: to do something similar on Italian Wikipedia and it worked very well, maybe his bot is available also for Wikidata ;-) --Epìdosis 22:28, 14 June 2020 (UTC)
    Yes, I did exactly the same job on Italian Wikipedia (using the same sources of data: Chrome's HSTS preload list and HTTPS Everywhere). I could easily adapt my bot to Wikidata (it is actually easier than working on Wikipedia); I'd be happy to do it, and in any case   Support. - Laurentius (talk) 18:19, 28 June 2020 (UTC)
  • That's great! Thank you, Epìdosis, for bringing up this experience. At this stage, it's clear that enough time has passed for this bot approval. Since I don't yet have working code, could you, Laurentius, take this bot task on and start testing and documenting the initial tests here?--OsamaK (talk) 13:31, 3 July 2020 (UTC)
  •   Support --Haansn08 (talk) 12:16, 30 August 2020 (UTC)
  •   Support and @Laurentius: Hello Laurentius, are you planning to work on this? I am trying to clean up this page a bit :) Vojtěch Dostál (talk) 15:58, 9 November 2020 (UTC)
Request process

Translation of the Indian Women Entreprenuer in Gujarati

Request date: 7 May 2020, by: Haisebhai

Link to discussions justifying the request

https://www.wikidata.org/wiki/Wikidata:Dataset_Imports/Translation_of_the_Indian_Women_Entreprenuer_in_Gujarati#DATA

Task description


Licence of data to import (if relevant)
Discussion


Request process

@Haisebhai: What do you want us to do with these data? Import them as labels to the items? This can be done with the QuickStatements tool. Vojtěch Dostál (talk) 14:11, 9 November 2020 (UTC)

Getting number of cases (P1603) to rank most current figure as preferred


Request date: 6 June 2020, by: Sdkb

Link to discussions justifying the request
Task description

Following up from here, could number of cases (P1603), number of deaths (P1120), and number of recoveries (P8010) be made so that they automatically mark the most recent value as preferred?

Licence of data to import (if relevant)
Discussion


Request process

@Sdkb: This is a broader issue, isn't it? Should more properties work like this? Maybe this should be discussed at Project Chat again. Vojtěch Dostál (talk) 14:13, 9 November 2020 (UTC)

Create items for books at Commons

c:Category:Files from the Biodiversity Heritage Library includes plenty of books (thousands).

Ideally we would have items for most of them. Filenames generally include Internet Archive ID (P724). --- Jura 14:39, 12 July 2020 (UTC)

For these, merely document file on Wikimedia Commons (P996) should be added. --- Jura 06:48, 21 August 2020 (UTC)
  Support but ideally the task would be followed by a run on Commons to add digital representation of (P6243) to the SDC of those files, otherwise Commons cannot use those items.--Jarekt (talk) 15:30, 21 August 2020 (UTC)
  •   Support but be smart about it. Many works already have items. Many works also have multiple copies at the Internet Archive or elsewhere. Not every scanned copy of a book needs its own item, even if they have unique Internet Archive or BHL identifiers, as I've argued here. Another good idea for a (bot-assisted) drive would be to add the corresponding QID to the {{Book}} templates on Commons. And Jarekt: of course Commons can use those items even without P6243, just as it could have before SDC, or Wikidata itself, existed. Right? -Animalparty (talk) 20:06, 29 August 2020 (UTC)
Animalparty, adding digital representation of (P6243) to SDC and "add[ing] the corresponding QID to the {{Book}} templates on Commons" are equivalent operations, as c:template:Book gets the Wikidata item from digital representation of (P6243) if it is not provided to the template. --Jarekt (talk) 01:37, 30 August 2020 (UTC)

Complete Google Books ID (P675) from Internet Archive ID (P724)

Many books in the Internet Archive come from Google Books (generally indicated as source). Accordingly, we could probably complete Google Books ID (P675) from Internet Archive ID (P724).

Maybe the inverse could be done too (find IA based on GB). --- Jura 11:25, 18 July 2020 (UTC)

@Jura1: Can you give an example of a book item which could be completed like that? Vojtěch Dostál (talk) 09:41, 5 November 2020 (UTC)
SELECT ?item ?itemLabel ?value
{
  ?item wdt:P724 ?value .
  FILTER ( contains(?value, "goog") )
  FILTER NOT EXISTS { ?item wdt:P675 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"  }
}
LIMIT 100

Try it!

It probably works for all the above. Probably many more. --- Jura 10:26, 5 November 2020 (UTC)

@Jura1: Like this? Vojtěch Dostál (talk) 13:04, 5 November 2020 (UTC)
Looks good. I think I had just added P724 as a ref, but I suppose your approach is the more complete one.
BTW I'm not entirely sure how to identify the ones without "goog" in the id. --- Jura 12:55, 6 November 2020 (UTC)
@Jura1: OK. I'll import those. Should I ignore the constraint warning such as the one in Cur Deus homo (Q83522)? Or should I skip it if not instance of/subclass of version, edition, or translation (Q3331189)? Vojtěch Dostál (talk) 19:20, 6 November 2020 (UTC)
It seems the constraints of the two differ despite the fact that both identify the same thing. Personally, I think it helps to determine if something should be done with the IA identifier.
Will you test all 217324 IA identifiers without P675 or just the 407 goog ones? --- Jura 19:42, 6 November 2020 (UTC)
Hmm, OK. I can try it with all but the scraping will take ages, I'll tell you when it's done. Vojtěch Dostál (talk) 19:21, 8 November 2020 (UTC)

ORCID for Wikidata 3

Request date: 27 July 2020, by: EvaSeidlmayer

Regarding the reasonable objection by Andrew (see my thread of 3 June 2020) I reviewed the project for matching authors and papers in Wikidata. Now, only author items and publication items that already exist in Wikidata are matched.

The 2019 ORCID dump consists of eleven archive files. For the first archive file we were able to detect:

 457K Wikidata publication-items (3.8M publications in total)
 425K publication-items do not have any author-item registered
  32K publications are identified in Wikidata with registered authors
   of those 32K publication-items: 
             3.7K author-items listed in Wikidata are correctly allocated to their publication-items (11.7%)
             4.2K author-items listed in Wikidata are not yet allocated to publication-items (24.6%)
             The other authors are not registered in Wikidata yet.

These are the numbers for only the *first* of *eleven* ORCID files. It would be cool to introduce the matching of authors to publications on an ORCID basis.

Please check the GitHub repo: https://github.com/EvaSeidlmayer/orcid-for-wikidata

Cheers, Eva

Link to discussions justifying the request
Task description
Licence of data to import (if relevant)

The data provided by ORCID is licenced with CC0: https://orcid.org/about/what-is-orcid/principles

Discussion
Request process

Administrative division code of Chongqing Municipal District, People's Republic of China

Request date: 8 August 2020, by: RedLightPOP

Link to discussions justifying the request
Task description

开县 (Kai County), 梁平县 (Liangping County), and 武隆县 (Wulong County) were turned from counties into districts, and their administrative division codes have also changed. The China administrative division code in the data items should be changed as follows:

  • The beginning of the value needs to be changed from 50 02 34 to 50 01 54.
  • The beginning of the value needs to be changed from 50 02 28 to 50 01 55.
  • The beginning of the value needs to be changed from 50 02 32 to 50 01 56.
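A sketch of the mapping (pairing the old and new prefixes in the order of the bullets above is an assumption; whether the old statement is replaced or kept with a lower rank is a separate decision):

```python
# Old six-digit prefix -> new six-digit prefix, per the bullets above
# (pairing by bullet order is an assumption).
PREFIX_MAP = {
    "500234": "500154",
    "500228": "500155",
    "500232": "500156",
}

def new_division_code(old_code):
    """Return the updated administrative division code, or None if unaffected."""
    digits = old_code.replace(" ", "")
    if digits[:6] in PREFIX_MAP:
        return PREFIX_MAP[digits[:6]] + digits[6:]
    return None
```

Codes whose prefix is not in the map are left alone (the function returns None).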
Licence of data to import (if relevant)
Discussion

@RedLightPOP: The existing statements should not be changed. Instead, add a new statement with the newly-valid value and set it as BestRank. Vojtěch Dostál (talk) 12:43, 5 November 2020 (UTC)

Request process

Copy lemma to F1 (periodic creation of forms for lexemes)

(The request at Wikidata:Bot_requests/Archive/2018/10#Copy_lemma_to_F1 is still needed)

For lexemes without forms, could a bot copy the lemma to this form? Sample edit: https://www.wikidata.org/w/index.php?title=Lexeme:L8896&diff=772695679&oldid=772692662

Please skip any lexemes that already have forms. --- Jura 10:01, 12 March 2020 (UTC)
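The data payload for such a form could be built along these lines (a sketch assuming the wbladdform API action of Wikibase Lexeme; grammatical features are left empty, since they cannot be inferred from the lemma alone):

```python
def add_form_payload(lemma, language_code):
    """Build the `data` payload for a wbladdform call that copies the
    lexeme's lemma as the first form's representation.

    Grammatical features are intentionally left empty; a human (or a
    later run with language-specific rules) would have to fill them in.
    """
    return {
        "representations": {
            language_code: {"language": language_code, "value": lemma},
        },
        "grammaticalFeatures": [],
    }
```

The bot would serialize this dict as JSON and send it with the lexeme ID, skipping any lexeme that already has forms.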

Seems still useful. --- Jura 15:21, 11 September 2020 (UTC)

Creating pages for Portuguese Wikisource anthems

Request date: 23 September 2020, by: NMaia

Link to discussions justifying the request
Task description

The Portuguese Wikisource has a bunch of pages unconnected to Wikidata, and a large chunk of that is made up of anthems of Brazilian or Portuguese towns (e.g. Hino do município de Bujaru -- they usually start with Hino d*). It would be useful to have them imported into Wikidata with the statements instance of (P31) -> hymn (Q484692), language of work or name (P407) -> Portuguese (Q5146) and country (P17) -> Brazil/Portugal/Angola etc. The real usefulness of the bot would be looking up the items for cities and adding anthem (P85) to them, with the relevant item. NMaia (talk) 14:05, 23 September 2020 (UTC)

Licence of data to import (if relevant)
Discussion
Request process

fix ALLCAPS of items imported from MIC

Request date: 5 October 2020, by: Vladimir Alexiev

Link to discussions justifying the request

https://www.wikidata.org/wiki/Topic:Vv1zfojnvvo11oj8 initiated by @Jura1:

Task description

I have imported a bunch of items with MIC market code (P7534) (stock exchanges and the like), see https://editgroups.toolforge.org/b/OR/ab49ffaac2/.

Some of them come with ALLCAPS names or descriptions, so they are listed at https://www.wikidata.org/wiki/Wikidata:Database_reports/Complex_constraint_violations/P7534.

Can someone help with fixing the names and descriptions to "Title Case"? (I thought descriptions should be in "Sentence case" but very often they also contain the exchange name)

Please note that prepositions should be in lower case, e.g. "BOLSA DE COMERCIO DE SANTA FE" should become "Bolsa de Comercio de Santa Fe".

Licence of data to import (if relevant)
Discussion
  • The linked constraint page lists 10 items with all-caps labels and 10 items with all-caps descriptions. You should fix this small number by hand as writing a bot for this would take hours and the correct (automatic) handling of prepositions is difficult. --Pyfisch (talk) 10:14, 5 October 2020 (UTC)
  • 10 is just a selection. There are many more. --- Jura 10:18, 14 October 2020 (UTC)

The task is more difficult since there are many acronyms that must be left as is (e.g. APA, OTF, OTP, NASDAQ, STOXX). So the bot should only change (capitalize) the usual words found in a dictionary --Vladimir Alexiev (talk) 02:49, 11 December 2020 (UTC)
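The rule discussed here (capitalize ordinary words, lowercase prepositions after the first word, keep known acronyms untouched) could be sketched like this. The acronym and preposition sets are illustrative only; a real run would use a dictionary check as suggested above:

```python
# Illustrative capitalization fixer for ALLCAPS labels, per the
# discussion: keep acronyms, lowercase prepositions, title-case the rest.
ACRONYMS = {"NASDAQ", "STOXX", "APA", "OTF", "OTP", "MIC"}
LOWERCASE = {"de", "del", "la", "of", "and", "the"}

def fix_case(label: str) -> str:
    words = []
    for i, w in enumerate(label.split()):
        if w in ACRONYMS:
            words.append(w)          # known acronym: leave as-is
        elif i > 0 and w.lower() in LOWERCASE:
            words.append(w.lower())  # preposition/article, not first word
        else:
            words.append(w.capitalize())
    return " ".join(words)
```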

Request process

Research institutes in France

SELECT ?item ?itemLabel ?itemDescription ?v ?place ?placeLabel
{
    ?item p:P131 [ pq:P6375 ?v ; ps:P131 ?place ] .
    ?place wdt:P17 wd:Q142 . 
    SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" .}          
}

Try it!

When cleaning up P969 qualifiers, I came across some 2000 items about French institutions that seem to use P131 qualifiers incorrectly, sample: Q30261385#P131 or Q51784475#P131. I'm not entirely sure of the best way to fix it.

The above query should eventually find most/all of them.

@ArthurPSmith, OdileB: who created or copied some of them. --- Jura 10:32, 9 November 2020 (UTC)

These should definitely be converted to headquarters location (P159). The institutes are not buildings, hence they are not "located" there; rather, they are organizations which have headquarters there. Vojtěch Dostál (talk) 12:41, 9 November 2020 (UTC)
  • They seem to be localized facilities, so maybe location (P276) could work as well, but my interest in the question is rather limited ;) --- Jura 12:46, 9 November 2020 (UTC)
The P131 statement in at least the first of these examples was added by User:PintochBot, and then the qualifiers added by User:Pintoch. Not sure how they are supposed to be though? Info came from the "French national research structure identifier"? ArthurPSmith (talk) 17:31, 9 November 2020 (UTC)
If I remember correctly I hesitated on this because small research institutes like these are often marked as "facilities" by GRID, so for those P131 felt more appropriate. I have never been really satisfied with the tension between these two properties and will clearly not run after anyone who wants to change this. − Pintoch (talk) 10:37, 14 November 2020 (UTC)

Year-qualifier for "students count" (P2196) values

SELECT DISTINCT ?item ?itemLabel ?sl
{
  ?item wdt:P2196 ?value .
  FILTER NOT EXISTS { ?item p:P2196 / pq:P585 [] }
  ?item wikibase:sitelinks ?sl .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Try it!

The items above have student counts, but no point in time (P585)-qualifier (currently 2440 of 56228 items with the property). It would be good to find a way to add the year to these.

I noticed some come from dewiki infoboxes which don't include a year either. --- Jura 06:34, 29 November 2020 (UTC)

So @Jura1: where can we get the years from? Also, the way the query is written doesn't show the full extent of the problem. If a University has 1 claim with year and 100 without year, you won't count it --Vladimir Alexiev (talk) 02:55, 11 December 2020 (UTC)

I'm not sure about a possible source. Maybe another language Wikipedia infobox, Wikipedia article text or an external source.
Agree that more statements might need completion, but the above are most in need of it. --- Jura 07:16, 11 December 2020 (UTC)

Modules and templates: sync with source

To make a template work on Wikidata, I imported a series of modules from frwiki. See list at Topic:Vyn971t0krpm0n8m.

It would be helpful if there was a way to have these periodically checked for updates (daily or weekly?) and, if there is an update, re-imported from source.

Ideally, I suppose the modules here would be semi or fully protected.

Maybe we could define a badge for use on sitelinks to define the source and automatically sync copies. Sample: at Q14920430 source is frwiki and automatically copy at wikidatawiki. --- Jura 08:33, 29 November 2020 (UTC)

I mentioned that last point at Wikidata:Project_chat#New_badges_for_templates_and_modules. --- Jura 08:53, 29 November 2020 (UTC)

National Football League identifier

Request date: 3 December 2020, by: Sismarinho

NFL.com ID (former scheme) (P3539)
  • (Translated from French.) Hello, there is a problem with the NFL identifiers on many entries since the architecture of the NFL website changed. The identifier is now the person's name: for example, for Bronko Nagurski (Q927663) it is bronko-nagurski. Can a bot handle this request?
Task description
Licence of data to import (if relevant)
Discussion
  • Please propose a new property for the new scheme. Once done, this could be filled by bot or in some other way. --- Jura 07:45, 3 December 2020 (UTC)
Request process

Move publisher from qualifier to reference

Request date: 3 December 2020, by: 4ing

Link to discussions justifying the request
Task description

For several municipality of the Netherlands (Q2039348), publisher (P123) has erroneously been added as a qualifier to population (P1082). It should be moved to the reference. In addition, some references include point in time (P585), which should be converted to a qualifier. And finally, the most recent value for population (P1082) should be set to preferred rank. See Vlissingen (Q10084) (relevant version) as an example.
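The requested restructuring could be sketched as a plain JSON transform on a claim as returned by wbgetclaims (illustrative only, no live edits; the function name is made up, and the rank change is left out):

```python
# Sketch: move an erroneous publisher (P123) qualifier on a population
# (P1082) claim into the claim's reference, and move a point in time
# (P585) found in the reference up to the qualifiers, as requested.
def fix_population_claim(claim: dict) -> dict:
    quals = claim.setdefault("qualifiers", {})
    refs = claim.setdefault("references", [{"snaks": {}}])
    if "P123" in quals:
        refs[0]["snaks"]["P123"] = quals.pop("P123")
    for ref in refs:
        if "P585" in ref.get("snaks", {}):
            quals["P585"] = ref["snaks"].pop("P585")
    return claim
```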

Discussion


Request process

Preferred rank for areas of German municipalities

Request date: 3 December 2020, by: 4ing

Link to discussions justifying the request
Task description

Most urban municipality of Germany (Q42744322) and municipality of Germany (Q262166) have two values for area (P2046): one imported from other Wikimedia projects or DBpedia (Q465), and one with DESTATIS (Q764739) as reference (incl. reference URL (P854), title (P1476), archive URL (P1065) etc.). The latter also has point in time (P585) as qualifier. This latter value should be given preferred rank if not already done manually. Example: Borkum (Q25082).

Discussion


Request process

Replace imported from Wikimedia project (P143)=Minor Planet Center (Q522039)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q522039 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 48000 statements --- Jura 09:57, 6 December 2020 (UTC)
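The replacement itself is mechanical and could be sketched as below, operating on the snaks dictionary of a single reference as returned by wbgetclaims (illustrative only: the function name is made up, the snaks-order list would also need updating in a real run, and the target item is a parameter because the discussion notes a different value may be wanted):

```python
# Sketch: drop the imported from Wikimedia project (P143) snak from a
# reference and add a stated in (P248) snak pointing at a source item.
def rewrite_reference(snaks: dict, source_item: str) -> dict:
    if "P143" not in snaks:
        return snaks
    p143 = snaks.pop("P143")
    # reuse the old snaks, changing only the property
    snaks["P248"] = [{**snak, "property": "P248"} for snak in p143]
    # optionally swap the value item at the same time
    for snak in snaks["P248"]:
        snak["datavalue"]["value"]["id"] = source_item
    return snaks
```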


Replace imported from Wikimedia project (P143)=Historic England (Q19604421)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q19604421 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 4300 statements --- Jura 09:57, 6 December 2020 (UTC)

Replace imported from Wikimedia project (P143)=Terrassa Museum (Q4894452)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q4894452 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,ca". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 5100 statements --- Jura 09:57, 6 December 2020 (UTC)

@ESM: Can you fix your batch from 4 years ago? I'm looking at Q27651146 and in addition to the above, I see the following problems:

  • there's neither link nor image. If I cannot SEE this chair, what is the value of having this record in WD?
  • stupid title "Cadira inv 15283". How can I determine if I'm interested in this item from its number alone?
  • it IS a chair, it does not DEPICT a chair
  • made of Tissue? Doubt that very much since the creator is not Dr Hannibal

We don't need dumps of defective museum catalog exports in WD! Let's not turn the Sum of All Paintings idea into Sum of All Junk. Fix your stuff or I request a bot to delete all these items --Vladimir Alexiev (talk) 03:11, 11 December 2020 (UTC)

@Vladimir Alexiev: I'd have appreciated you being a bit less harsh. Will try mending things when I have time to do so. Feel free to do whatever you please with the data you don't like. --ESM (talk) 08:53, 14 December 2020 (UTC)
  • @ESM, Vladimir Alexiev: the item is fairly complete (I wish more were like that). There is no requirement that there is a link to website, but link to the publication used to source this could be helpful (see @MisterSynergy: comment below). "material used"="tissue" could use an "applies to part" qualifier and another item for "tissue" (currently: Q40397). depicts (P180) isn't useful if instance of (P31)="chair" is correct. I can help fix them. --- Jura 12:27, 14 December 2020 (UTC)
  • removed 29 depicts statements found with [4]. Also, about the labels: I'm not really convinced that "cadira" would be better than the current "Cadira inv 15283". --- Jura 12:36, 14 December 2020 (UTC)
  • @ESM: Sorry, guess I had a bad day.
  • @Jura1: title="chair"@en, descr="Terrassa museum, inventory number 15283" --Vladimir Alexiev (talk) 20:53, 15 December 2020 (UTC)

I can offer to make the edits as I already have bot code available that moves from imported from Wikimedia project (P143) to stated in (P248) and also changes the value item from Terrassa Museum (Q4894452) to something else. However, I do not want to create museum catalog items by myself as that often requires expertise that I do not have. So if you can set up such a museum catalog item, I can make the replacements in all the items efficiently—a task that would otherwise be quite difficult. —MisterSynergy (talk) 10:26, 14 December 2020 (UTC)

@MisterSynergy: Thank you very much for your offer and please excuse my silence for such a long time. Feel free to move all the references that use imported from Wikimedia project (P143) to stated in (P248). I'm not sure if we should change the value from Terrassa Museum (Q4894452) to something else like "Catalog of Museu de Terrassa's collection", which would have instance of (P31) = collection catalog (Q5146094). @Jura1: what do you think about this? The data (in this case and similar ones you spotted too) come from a database dump from the museum's collections management system, so I'm afraid there's no URL or similar tangible element to point as a source.
On the other hand, I'm aware of mistakes such as using tissue (Q40397) instead of woven fabric (Q1314278). They originated in translation mistakes, since both words in my language are written the same way and I failed to check the Qs before uploading the statements. I'm sorry about that and would like to mend it, even though I'm struggling to find the time to do so. --ESM (talk) 17:18, 15 February 2021 (UTC)

Replace imported from Wikimedia project (P143)=Municipal Institute of Museums of Reus (Q23687366)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q23687366 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 5800 statements --- Jura 09:57, 6 December 2020 (UTC)

  • @ESM: similar to the previous, would you make an item we could use as a value for stated in (P248)? --- Jura 13:25, 18 December 2020 (UTC)

Replace imported from Wikimedia project (P143)=Landesfilmsammlung Baden-Württemberg (Q24469969), Haus des Dokumentarfilms (Q1590879)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q24469969 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q1590879 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The two queries currently find ca. 7800 statements --- Jura 09:57, 6 December 2020 (UTC)


Replace imported from Wikimedia project (P143)=Museu d'Art Jaume Morera (Q5476145)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q5476145 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 7800 statements --- Jura 09:57, 6 December 2020 (UTC)


Replace imported from Wikimedia project (P143)=Istat (Q214195)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q214195 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 8000 statements --- Jura 09:57, 6 December 2020 (UTC)

Replace imported from Wikimedia project (P143)=Historical Commission of the Bavarian Academy of Sciences (Q1419226)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q1419226 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 8400 statements --- Jura 09:57, 6 December 2020 (UTC)

Replace imported from Wikimedia project (P143)=Natural History Museum (Q309388)

SELECT ?item ?itemLabel ?prop ?propLabel ?value ?valueLabel ?st
WHERE
{
  ?st prov:wasDerivedFrom/pr:P143 wd:Q309388 .
  hint:Prior hint:rangeSafe true .
  ?item ?p ?st .
  ?prop wikibase:claim ?p ; wikibase:statementProperty ?ps .
  ?st ?ps ?value 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Try it!

As imported from Wikimedia project (P143) should only be used with WMF projects, the above should be replaced with some other property, e.g. "stated in" or "publisher", and possibly a different value. The query currently finds ca. 23000 statements --- Jura 09:57, 6 December 2020 (UTC)


replace reference URL (P854) = Petscan

SELECT *
WHERE
{
  hint:Query hint:optimizer "None".
  ?ref pr:P854 ?value .
  FILTER( REGEX( STR( ?value ), "petscan" )  )
  ?statement prov:wasDerivedFrom ?ref;
}
LIMIT 200

Try it!

reference URL (P854) could be replaced with Wikimedia import URL (P4656). --- Jura 12:35, 6 December 2020 (UTC)

Sample edit [5]. Not sure how to do it with wikibase-cli --- Jura 13:40, 18 December 2020 (UTC)
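The selection rule matches the REGEX filter in the query above and could be sketched like this (illustrative only; the function name is made up):

```python
import re

# Sketch: decide whether a reference URL (P854) value is a PetScan link
# that should instead be stored as Wikimedia import URL (P4656).
PETSCAN = re.compile(r"petscan", re.IGNORECASE)

def target_property(url: str) -> str:
    """Return the property the URL should live under."""
    return "P4656" if PETSCAN.search(url) else "P854"
```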

Create person items from Wikisource entries (matr. Oxonienses)

SELECT ?item ?itemLabel ?itemDescription
{
	?item wdt:P1433 wd:Q19036877 . 
	FILTER NOT EXISTS { ?item wdt:P921 [] }
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

Try it!

For the above, it could be interesting to create an item for each person without main subject (P921) (currently 27400).

Sample: Everest, Robert (Q94820208) with s:Everest,_Robert → Robert Everest (Q104057081).

for info: @Miraclepine, Charles Matthews: --- Jura 19:20, 9 December 2020 (UTC)

Of course I have wondered about this. I think the proportion of people there who are not really notable would be at least 50%. That isn't a definitive argument, but I regard having to sift through numerous country vicars to do a disambiguation run as fairly undesirable. It would be rather better to have them in mix'n'match by some device. That would correspond to what has gone in with Cambridge alumni. Charles Matthews (talk) 19:39, 9 December 2020 (UTC)
  • I tried to find a way to use Mix'n'match for Wikisource entries, but people seemed to think that it's not desirable.
Maybe some filtering should be done beforehand, but it shouldn't be too complex to identify duplicates based on YOB and name once the items are created. --- Jura 19:44, 9 December 2020 (UTC)
  • Maybe ORCID is the better comparison in terms of notability. --- Jura 13:30, 18 December 2020 (UTC)

Cleaning of streaming media service URLs

Request date: 12 December 2020, by: Swicher

I'm not sure if this is the best place to propose it but when reviewing the urls of a query with this script:

import requests
from concurrent.futures import ThreadPoolExecutor

# Checks the link of an item, if it is down then saves it in the variable "novalid"
def check_url_item(item):
    # Some sites may return error if a browser useragent is not indicated
    useragent = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77'
    item_url = item["url"]["value"]
    print("Checking %s" % item_url, end="\r")
    req = requests.head(item_url, headers = {'User-Agent': useragent}, allow_redirects = True)
    if req.status_code == 404:
        print("The url %s in the element %s returned error" % (item_url, item["item"]["value"]))
        novalid.append(item)

base_query = """SELECT DISTINCT ?item ?url ?value
{
%s
  BIND(IF(ISBLANK(?dbvalue), "", ?dbvalue) AS ?value)
  BIND(REPLACE(?dbvalue, '(^.*)', ?url_format) AS ?url)
}"""
union_template = """  {{
    ?item p:{0} ?statement .
    OPTIONAL {{ ?statement ps:{0} ?dbvalue }}
    wd:{0} wdt:P1630 ?url_format.
  }}"""
properties = [
    "P2942", #Dailymotion channel
    "P6466", #Hulu movies
    "P6467", #Hulu series
]
# Items with links that return errors will be saved here
novalid = []

query = base_query % "\n  UNION\n".join([union_template.format(prop) for prop in properties])
req = requests.get('https://query.wikidata.org/sparql', params = {'format': 'json', 'query': query})
data = req.json()

# Schedule and run 25 checks concurrently; consuming the iterator waits
# for all checks to finish and surfaces any exceptions raised in workers
with ThreadPoolExecutor(max_workers=25) as check_pool:
    list(check_pool.map(check_url_item, data["results"]["bindings"]))

I have noticed that almost half are invalid. I do not know if in these cases it is better to delete or archive them but a bot should periodically perform this task since the catalogs of streaming services tend to be very changeable (probably many of these broken links are due to movies/series whose license was not renewed). Unfortunately I could only include Hulu and Dailymotion since the rest of the services have the following problems:

For those sites it is necessary to perform a more specialized check than a HEAD request (like using youtube-dl (Q28401317) for Youtube).

In the case of Hulu I have also noticed that some items can have two valid values in Hulu movie ID (P6466) and Hulu series ID (P6467) (see for example The Tower of Druaga (Q32256)) so you should take that into account when cleaning links.

Request process

Removing unnecessary disambiguation brackets

Request date: 18 December 2020, by: Bencemac

Link to discussions justifying the request
Task description

Please help me clean some Hungarian labels, where country (P17) is Hungary (Q28) and instance of (P31) is a subclass of church building (Q16970). Previously, the labels were correct and followed Help:Labels, but now, thanks to a user, they are totally messed up with unnecessary disambiguation brackets. I started undoing their wrong edits, but there are too many to handle them one by one.

Please change Name of the church (location) to Name of the church like this. Thanks in advance! Bencemac (talk) 09:22, 18 December 2020 (UTC)
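The label cleanup requested here could be sketched as a one-line regex (illustrative only; it assumes the disambiguator is always a single parenthetical at the end of the label):

```python
import re

# Sketch: strip one trailing parenthetical disambiguator from a label,
# e.g. "Name of the church (location)" -> "Name of the church".
def strip_disambiguator(label: str) -> str:
    return re.sub(r"\s*\([^()]*\)$", "", label)
```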

Discussion


Request process

Accepted by (Edoderoo (talk) 19:33, 29 December 2020 (UTC)) and under process
  Done Task completed (19:33, 29 December 2020 (UTC))

Add info about subject to items with generic title "obituary"

SELECT DISTINCT ?item ?itemLabel ?itemDescription ?pubvenueLabel
{
	?item wdt:P31 wd:Q13442814 . 
    { ?item rdfs:label "OBITUARY"@en } UNION { ?item rdfs:label "Obituary"@en }
	FILTER NOT EXISTS { ?item wdt:P921 [] }
    OPTIONAL { ?item wdt:P1433 ?pubvenue }
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

Try it!

It seems we have more than 2500 items where we lack info about the person (P921).

It would be helpful if the name and possibly the lifespan of the person could be added to the items description or elsewhere. --- Jura 11:33, 18 December 2020 (UTC)

Trailing space (" ") in labels

Somehow I thought it wasn't possible, but the Italian label at "Diva" had, and the one at Ariodante (Q22813616) still has, a trailing space. The edits are from 2015/2016. I think it would be good to clean this up. Not sure what would be the most efficient way to find them. --- Jura 17:48, 19 December 2020 (UTC)
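Since the query service normalizes literals, one way to find these would be to scan entity JSON (from a dump or wbgetentities) for untrimmed label values. A minimal sketch, with the entity dict standing in for real API output:

```python
# Sketch: return (language, value) pairs whose label has leading or
# trailing whitespace, given an entity's JSON as a dict.
def untrimmed_labels(entity: dict) -> list:
    return [
        (lang, d["value"])
        for lang, d in entity.get("labels", {}).items()
        if d["value"] != d["value"].strip()
    ]
```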


Malformed entries related to Semantic Scholar

SELECT ?item ?l WHERE { ?item wdt:P4012 [] ; rdfs:label ?l . FILTER( lang(?l) = "en" && REGEX( ?l, "^.+ [A-Z][A-Z]$") ) } LIMIT 2000

Try it!

The above finds ca. 2000 entries. I think it would be worth checking whether these need to be converted

from: <surname>, <initials of given name>
to: <initials of given name> <surname>

--- Jura 21:06, 20 December 2020 (UTC)
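The conversion could be sketched with the same pattern the query uses, treating the comma as optional since the matched labels above have none (illustrative only; the function name is made up):

```python
import re

# Sketch: reorder labels of the form "<surname>[,] <two initials>" so
# the initials come first; anything else is returned unchanged.
PATTERN = re.compile(r"^(.+?),? ([A-Z][A-Z])$")

def reorder_name(label: str) -> str:
    m = PATTERN.match(label)
    if not m:
        return label
    surname, initials = m.groups()
    return f"{initials} {surname}"
```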

Challenge taken. Script is running. I'll move the old style names to the alias, and reformat all existing label-languages. Edoderoo (talk) 13:43, 29 December 2020 (UTC)

  Done Your query now returns no results. Edoderoo (talk) 15:05, 29 December 2020 (UTC)

Bulk create items for given names in Russian (Cyrillic script)

SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
{
	?item wdt:P31/ wdt:P279? wd:Q202444 . 
	?item wdt:P282 wd:Q8209 . 
	?item wdt:P407 wd:Q7737 .
	FILTER NOT EXISTS { ?item wdt:P282 wd:Q8229 }  
	SERVICE wikibase:label { bd:serviceParam wikibase:language "ru,en" }
}

Try it!

Sample item: Q104431130

Currently the above only finds some 310 items (or 286 if one excludes the ones that incorrectly mix them with Latin script given name items).

A few more might be available, but incomplete.

I think it would be interesting to have a more complete dataset available. --- Jura 23:03, 22 December 2020 (UTC)

Replace pr:P1343 with pr:P248

SELECT ?item ?itemLabel ?value ?valueLabel ?statement
WHERE
{
	{
		SELECT DISTINCT ?item ?value ?statement
		WHERE
		{
			?ref pr:P1343 ?value .
			?statement prov:wasDerivedFrom ?ref .
			?item ?p ?statement .
		}
	} .
	FILTER( ?item NOT IN ( wd:Q4115189, wd:Q13406268, wd:Q15397819 ) ) .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}

Try it!

The query finds some 3570 references using described by source (P1343). As Epìdosis noted on Property_talk:P1343#Use_in_references, stated in (P248) would generally be the property to use. --- Jura 08:29, 23 December 2020 (UTC)

Thanks Jura. I had reported the problem directly to @Ladsgroup: who had fixed some of them with his bot, but evidently there are some more needing intervention. --Epìdosis 09:16, 23 December 2020 (UTC)
Yeah, I have been cleaning it up for a while now, I just need to constantly re-run it. Hopefully, it'll be done soon Amir (talk) 06:23, 24 December 2020 (UTC)
If new ones come up in bulk, the relevant user (or bot operator) should be advised.
Trying to figure out where they came from, I found https://www.wikidata.org/w/index.php?title=Q102075911 but the redirect was deleted. @Epìdosis: Why that?
Anyways, maybe Krbot could autofix occasional ones going forward. @Ivan_A._Krestinin: what do you think? --- Jura 18:50, 28 December 2020 (UTC)
Regarding Q102075911: according to Wikidata:Requests for comment/Redirect vs. deletion, "Deleting is however appropriate if an item has not been existed longer than 24 hours and if it's clear that it's not in use elsewhere."
I surely support the autofix and I thank again @Ladsgroup: for the cleaning! --Epìdosis 20:04, 28 December 2020 (UTC)
Very strange that RFC. Makes me wonder how User:Stryn determined the "consensus". --- Jura 08:58, 29 December 2020 (UTC)

Ontario public school contact info

Request date: 27 December 2020, by: Jtm-lis

Link to discussions justifying the request
Task description

https://www.wikidata.org/wiki/Wikidata:Dataset_Imports/_Ontario_public_school_contact_information

Licence of data to import (if relevant)
Discussion

Untitled requests about Wikinews

Request process

Request date: 29 December 2020, by: NMaia

Link to discussions justifying the request
Task description

For Wikinews article (Q17633526) entries, it would also be useful to add:

Licence of data to import (if relevant)
Discussion

NMaia (talk) 14:14, 29 December 2020 (UTC)

Request process

Malformed entries related to "Swiss National Sound Archives ID" (P6770)

SELECT ?item ?itemLabel ?qid 
{
  ?item wdt:P6770 ?value ; wdt:P31 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"  }
  BIND(xsd:integer ( strafter(str(?item), "Q")) as ?qid) 
  FILTER( ?qid > 	64000000 )
}

Try it!

The above finds approx. 1400 items created some time ago. Many invert family name and given names.

Samples: Q65032428, Q65035029. Some may have been partially fixed since 2019.

@AlessioMela: --- Jura 22:54, 29 December 2020 (UTC)

Unfortunately, there is no way for a bot to know whether the firstname/lastname is twisted or not. I can write a script to turn them all around, but everything that got fixed manually since 2019 will get twisted again. Theoretically, the script could browse through the history, parsing it to see whether the en-label got edited since the item was created; if someone can write a script like that, it might make sense to fix this by bot. Edoderoo (talk) 14:47, 30 December 2020 (UTC)
For the two samples above, it's fairly obvious that they are inverted.
It seems that P6770 generally provides a fairly straightforward format: "SURNAME, Given name". Maybe it could be checked against this or some other source.
@AlessioMela: can you check the data you have and fix it from that? --- Jura 11:15, 31 December 2020 (UTC)
Hi all, yes I can confirm the problem. As Edoderoo said, we can't act automatically. Even in the raw data there wasn't anything better. I think the major part of the items with P6770 have the correct name/family-name order. Unfortunately there was a batch of inverted names that I didn't recognize during the initial test or while the bot was running. --AlessioMela (talk) 14:55, 31 December 2020 (UTC)
@AlessioMela:, if you have the data available, can you add named as (P1810) qualifiers to P6770? If not, can you try to identify the batch with inverted names? There are just too many that are the wrong way round. --- Jura 14:59, 31 December 2020 (UTC)
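Since the source format mentioned above is "SURNAME, Given name", one detection heuristic would be to derive the expected label order from that string and compare it to the item label (illustrative only; function names are made up, and the source string would come from a named as (P1810) qualifier or the raw import data):

```python
# Sketch: convert a "SURNAME, Given name" source string into the
# expected "Given name Surname" label and compare against an item label.
def expected_label(source_name: str) -> str:
    surname, _, given = source_name.partition(", ")
    return f"{given} {surname.title()}".strip()

def label_matches(label: str, source_name: str) -> bool:
    return label == expected_label(source_name)
```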

Accademia delle Scienze di Torino multiple references

Request date: 30 December 2020, by: Epìdosis

Link to discussions justifying the request
Task description

Given the following query:

SELECT DISTINCT ?item
WHERE {
  ?item wdt:P8153 ?ast .
  ?item p:P570 ?statement.
  ?reference1 pr:P248 wd:Q2822396.
  ?reference2 pr:P248 wd:Q2822396.
  ?statement prov:wasDerivedFrom ?reference1.
  ?statement prov:wasDerivedFrom ?reference2.
  FILTER (?reference1 != ?reference2)
}

Try it!

In many items there are multiple references to date of death (P570) referring to Academy of Sciences of Turin (Q2822396)=Accademia delle Scienze di Torino ID (P8153). Cases:

  1. three references: maintain the first (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)+named as (P1810)), delete the second (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)), delete the third (stated in (P248)+retrieved (P813)) transferring the retrieved (P813) to the first
    1. three references bis: if the first is stated in (P248)+Accademia delle Scienze di Torino ID (P8153)+named as (P1810)+retrieved (P813), the second and the third get simply deleted
    2. three references ter: if there is a reference with reference URL (P854) containing the string "accademiadellescienze", it should be deleted; maintain the second (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)), delete the third (stated in (P248)+retrieved (P813)), transferring the retrieved (P813) to the maintained reference
  2. two references: maintain the first (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)), delete the second (stated in (P248)+retrieved (P813)), transferring the retrieved (P813) to the first

Repeat the above query substituting date of birth (P569) for date of death (P570). Cases:

  1. two references: keep the first (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)+named as (P1810)); delete the second (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)+retrieved (P813)), transferring its retrieved (P813) to the first
    1. two references bis: if the first is stated in (P248)+Accademia delle Scienze di Torino ID (P8153)+named as (P1810)+retrieved (P813), the second is simply deleted
    2. two references ter: if a reference has a reference URL (P854) containing the string "accademiadellescienze", delete it; keep the second (stated in (P248)+Accademia delle Scienze di Torino ID (P8153)+retrieved (P813))
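The case logic above can be sketched as follows. This is only an illustration, not Ladsgroup's actual bot: the dict-based reference model, the function names and the `TORINO` constant are assumptions made for the sketch.

```python
# Illustrative model: each reference is a dict mapping property IDs to values.
TORINO = "Q2822396"  # Academy of Sciences of Turin

def has(ref, *props):
    return all(p in ref for p in props)

def clean_refs(refs):
    """Return (reference_to_keep, references_to_delete) per the cases above."""
    delete = []
    # Case "ter": a reference URL pointing at accademiadellescienze is always deleted.
    for r in refs:
        if "accademiadellescienze" in r.get("P854", ""):
            delete.append(r)
    remaining = [r for r in refs if r not in delete]
    # Prefer the richest stated-in reference: P248+P8153+P1810, else P248+P8153.
    keep = None
    for r in remaining:
        if r.get("P248") == TORINO and has(r, "P8153", "P1810"):
            keep = r
            break
    if keep is None:
        for r in remaining:
            if r.get("P248") == TORINO and has(r, "P8153"):
                keep = r
                break
    # Delete the other references, moving retrieved (P813) onto the kept one.
    for r in remaining:
        if r is keep:
            continue
        if keep is not None and "P813" in r and "P813" not in keep:
            keep["P813"] = r["P813"]
        delete.append(r)
    return keep, delete
```

Note that the kept reference is mutated in place when a retrieved (P813) value is transferred onto it.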
Discussion

@Ladsgroup: as his bot is probably ready for doing this. --Epìdosis 11:56, 30 December 2020 (UTC)

Request process

Archivio Storico Ricordi multiple references

Request date: 1 January 2021, by: Epìdosis

Link to discussions justifying the request
  • ...
Task description

Given the following query:

SELECT DISTINCT ?item
WHERE {
  ?item wdt:P8290 ?asr .
  ?item p:P569 ?statement.
  ?reference1 pr:P248 wd:Q3621644.
  ?reference2 pr:P248 wd:Q3621644.
  ?statement prov:wasDerivedFrom ?reference1.
  ?statement prov:wasDerivedFrom ?reference2.
  FILTER (?reference1 != ?reference2)
}

Try it!

Typically date of birth (P569) has two references, one with stated in (P248)+Archivio Storico Ricordi person ID (P8290)+retrieved (P813) and the other with stated in (P248)+Archivio Storico Ricordi person ID (P8290); the second should always be deleted.

The same should be repeated, substituting place of birth (P19), date of death (P570) and place of death (P20) into the above query.
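The deletion rule amounts to the following sketch (the dict-based reference model and function name are illustrative assumptions, not the actual bot code):

```python
RICORDI = "Q3621644"  # Archivio Storico Ricordi

def refs_to_delete(refs):
    """Among duplicate Ricordi references on one statement, delete the bare ones.

    Keeps the reference that also carries retrieved (P813); returns the
    references lacking P813, which should be removed.
    """
    ricordi = [r for r in refs if r.get("P248") == RICORDI and "P8290" in r]
    if len(ricordi) < 2:
        return []  # nothing duplicated, nothing to delete
    return [r for r in ricordi if "P813" not in r]
```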

Discussion

@Ladsgroup: as his bot is probably ready for doing this. --Epìdosis 22:36, 1 January 2021 (UTC)

@Epìdosis: Started Amir (talk) 17:02, 27 February 2021 (UTC)
Request process

Accepted by (Amir (talk) 17:01, 27 February 2021 (UTC)) and under process
Task completed (14:11, 6 March 2021 (UTC))

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Amir (talk) 14:11, 6 March 2021 (UTC)

request to import podcast identifiers (2021-01-03)

Request date: 3 January 2021, by: Sdkb

Link to discussions justifying the request
Task description

Several properties have recently been created (see e.g. Castbox show ID (P9005) for podcast identifiers), which are being used for the new w:Template:Podcast platform links on Wikipedia. I was told to come here to get help importing the identifiers for a bunch of podcast items.

Licence of data to import (if relevant)
Discussion


Request process

request to remove "(täsmennyssivu)" and "(sukunimi)" from labels (2021-01-07)

Request date: 7 January 2021, by: 87.95.206.253

Link to discussions justifying the request
Task description

Please remove (täsmennyssivu) and (sukunimi) from Finnish-language primary labels. Either remove them completely or move to AKA sections.
Reason: "(täsmennyssivu)" is fi-wiki's equivalent of en-wiki's "(disambiguation)", and "(sukunimi)" of "(surname)". Those shouldn't be added to primary labels. I've notified two bot operators concerning additions of "(täsmennyssivu)" to labels: special:diff/1336656747 and Topic:W131u7lu86s1qpxl. Thanks!
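The requested cleanup can be sketched as a simple suffix strip, keeping the old label as an alias (the function name and the tuple return shape are assumptions for illustration; the example labels are hypothetical):

```python
import re

# Strip the Finnish disambiguators "(täsmennyssivu)" and "(sukunimi)"
# from the end of a label.
SUFFIX = re.compile(r"\s*\((?:täsmennyssivu|sukunimi)\)\s*$")

def fix_label(label):
    """Return (new_label, alias_to_add_or_None)."""
    new = SUFFIX.sub("", label)
    return (new, label) if new != label else (label, None)
```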

Licence of data to import (if relevant)
Discussion


Request process

Fix social media account inconsistencies (2021-01-09)

Request date: 9 January 2021, by: AntisocialRyan

Task description
Discussion

As the person who created Q87406427 for exactly this purpose, I believe it should be end cause (P1534) since an account being suspended does not make the statement invalid or deprecated. It just has an end time for when it stopped being true (if my Twitter account is suspended tomorrow, it was still my Twitter account from the point of creation until then). --SilentSpike (talk) 22:28, 4 February 2021 (UTC)

Request process

Admin bot for deletion of 100k non-notable items

Request date: 19 January 2021, by: Epìdosis

Link to discussions justifying the request
Task description

Bot-deletion of the following items:

Discussion

I ping @Ladsgroup: and @MisterSynergy: as I know they have admin-bots. Thanks in advance, --Epìdosis 16:50, 19 January 2021 (UTC)

  • Not sure whether we should delete those at all. You can find plenty of similar datasets with *very* limited obvious use for our project. I'd say the items do meet the notability policy, so a much clearer consensus should be reached IMO, and it should be clear how this consensus relates to other similar situations. We could probably delete millions of items for the same reason, but do we want to? —MisterSynergy (talk) 17:02, 19 January 2021 (UTC)
    @MisterSynergy: I know that the situation is unclear and that it regards tens of thousands of items, maybe more. For this reason I have opened a general discussion (the third link above) and I've waited for a week with no feedback, whilst in the Project chat (the first link above) there seemed to be wide consensus for deletion. If you want to notify the discussion in other pages, or open a RfC or whatever I obviously support this, as I perfectly agree about the necessity to reach a clear conclusion on this point. Thanks as always, --Epìdosis 17:18, 19 January 2021 (UTC)
    Well, I don't find it that clear. There are quite some complaints about the bot operator not having their import approved as a separate bot task, but User:GZWDer is right that this is not explicitly required anywhere. In general, we have never managed to cleanly define batch editing and its distinction from bot editing in our policies, and we never managed to update the bot policy in a way that suits Wikidata. It is still based on experiences made with bots in Wikipedias before Wikidata was launched, although this project relies on automated (bot) editing *much* more than Wikipedias do.
    To me, the discussions linked above seem to be fueled by the aversion to User:GZWDer that many users seem to feel. To be quite honest, I also do not like their behavior in most cases, as they aggressively edit in the gray area of our policies, and they are not very open to input from other users. This is genuinely a problem in a collaborative project, but as long as they do not clearly violate policies, which is not the case here in my opinion, I do not see a reason to suppress their contributions; please also mind that in my opinion Wikidata:Deletion policy does not allow use of the deletion tool here (I do consider these items notable according to WD:N).
    So, I think we should instead try to improve the policies so that this should not happen again, rather than to set a precedent for a sympathy-based use of the deletion tool. —MisterSynergy (talk) 20:01, 19 January 2021 (UTC)
    @MisterSynergy: OK, I perfectly agree about the need to update our bot editing policy in order to avoid post factum discussions about large batches of edits. However, my point isn't about GZWDer's conduct, but about the fact that, according to my interpretation, these items should be deleted because they don't respect WD:N. The discussion I opened was an attempt to find consensus about whether or not they respect WD:N, and no user contested my interpretation that they don't, so I have also edited Help:Sources accordingly. I think that two separate discussions would then be useful: one about bot policies; the other, already open but deserted as of now, about the possibility for encyclopedia articles without Wikisource sitelinks to fit WD:N (about which I am personally skeptical). --Epìdosis 21:45, 19 January 2021 (UTC)
    With the DOI claims, there is no doubt that they do meet the notability requirements. There also seem to be valid references on all (or at least most) of the claims. Notability is not an issue here; a deletion based on a not-notable claim would be completely at odds with our standard practice. —MisterSynergy (talk) 21:58, 19 January 2021 (UTC)
    @MisterSynergy: I partially disagree about their notability for one reason: as in these cases the DOI (P356) in fact coincides with the respective identifiers present in the items of the subjects of these articles, I agree with what @Bovlb: said in the Project chat: "Unless we're going to get a lot more information on these items, it seems to me that this sort of import would be better embodied in an identifier property." In general, my position is that the notability of items containing DOI (P356) can be taken for granted unless the DOI (P356) overlaps with an existing Wikidata property. I would prefer having a brief discussion somewhere about whether any item having DOI (P356) is notable, in order to finally add a statement DOI (P356) instance of (P31) Wikidata property for an identifier that suggests notability (Q62589316) and reach a general conclusion on this point. --Epìdosis 22:14, 19 January 2021 (UTC) P.S. As I'm not an expert on copyright, just a little confirmation: importing this sort of bibliographic metadata is CC0-compliant, isn't it?
    Well, the Benezit ID (P2843) identifier and the DOI are not identical. The identifier property was poorly managed in the past on Wikidata, but apparently mistakes/poor decisions have been made on the side of the external database as well which sort of contributed to the mess on the property page. The DOIs identify the encyclopedia articles, and the Benezit ID (P2843) identifiers identify the persons described in the articles. Of course, the URLs should *not* be identical, but Benezit ID (P2843) unfortunately uses DOI urls since March 2020 (Special:Diff/1130357608). The formatter URL should instead point to the URL which the DOI resolves to—unfortunately the identifiers would have to be changed as well then (i.e. rather make a new property and reset Benezit ID (P2843) to the pre-March 2020 state).
    The amount of content which is available in the items is not a relevant factor. There is no rule that there should be a "lot more information" available about something in order for it to be admissible here. —MisterSynergy (talk) 23:21, 19 January 2021 (UTC)

OK, thanks @MisterSynergy: for all the answers. As of now, it is quite clear that this problem certainly needs further discussions in other pages and I'm now convinced that probably the deletion is not to be performed anyway, as these items are notable because of DOI (P356) with few doubts. We can close, at least for now, this bot-request. Thanks again and good night, --Epìdosis 23:29, 19 January 2021 (UTC)

Request process

request to set preferred rank to population - Quebec (2021-01-20)

Request date: 21 January 2021, by: YanikB

Link to discussions justifying the request
Task description

Change population (P1082) statements to "Preferred rank" for the items returned by the query below. thx

SELECT ?item 
WHERE {
       {?item wdt:P31/wdt:P279* wd:Q3327873} UNION {?item wdt:P31 wd:Q81066200}
       ?item p:P1082 [ ps:P1082 ?population; pq:P459 wd:Q29051383; pq:P585  ?date  ] .
       FILTER (?date  >= "2020-07-01T00:00:00Z"^^xsd:dateTime )
}
LIMIT 1500

Try it!

Licence of data to import (if relevant)
Discussion

  being done with wikibase-cli (Q87194660).

Sample command: wd uc 'Q81223838$F17472A9-6A82-4A7C-986F-986BE806AA82' --rank preferred --summary 'set preferred per [[WD:RBOT]]'

@YanikB: --- Jura 13:50, 15 February 2021 (UTC)

@YanikB: it worked except when the population is zero. So there are 59 left, e.g. Q2879370#Q2879370$BE47EFD4-DDEE-4D27-A212-50A91BEDE8AD. --- Jura 18:59, 15 February 2021 (UTC)

@Jura1: Good job! I'll take care of the remaining 59 items. --Yanik B 19:06, 15 February 2021 (UTC)

Request process

request to fix labels of humans - disambiguator (2021-01-24)

English labels for humans shouldn't end with a ")".

The following finds some 175 of them, all with "politician" in the label.

SELECT *
{
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" .
    bd:serviceParam mwapi:gsrsearch 'inlabel:politician@en haswbstatement:P31=Q5' .
    bd:serviceParam mwapi:gsrlimit "max" .    
    bd:serviceParam mwapi:gsrnamespace "0" .    
    ?item wikibase:apiOutputItem mwapi:title  .    
  }
  ?item rdfs:label ?l.
  FILTER(REGEX(?l, "\\)$") && lang(?l)="en").  
}

Try it!

The usual fix would be to remove the disambiguator or make the label into an alias. The same can probably be done for other occupations/languages. --- Jura 16:10, 24 January 2021 (UTC)

request to correct P248 values in references (2021-02-04)

Request date: 4 February 2021, by: Trade

Link to discussions justifying the request
Task description
Replace stated in (P248) > Unterhaltungssoftware Selbstkontrolle (Q157754) with stated in (P248) > USK Classification Database (Q105272106)
Replace stated in (P248) > Entertainment Software Rating Board (Q191458) with stated in (P248) > ESRB Rating Database (Q105295303)
Replace stated in (P248) > Pan European Game Information (Q192916) with stated in (P248) > PEGI Rating Database (Q105296817)
Replace stated in (P248) > Australian Classification Board (Q874754) with stated in (P248) > Australian Classification database (Q105296839)
Replace stated in (P248) > Australian Classification (Q26708073) with stated in (P248) > Australian Classification database (Q105296839)
Replace stated in (P248) > British Board of Film Classification (Q861670) with stated in (P248) > BBFC database (Q105296939)

--Trade (talk) 12:51, 4 February 2021 (UTC)
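For reference, the replacement list above can be expressed as a lookup table. This is only a sketch of the data a bot might use when iterating over stated in (P248) reference values; the variable and function names are illustrative:

```python
# Old stated in (P248) value -> new database item, per the request above.
P248_REPLACEMENTS = {
    "Q157754": "Q105272106",    # USK -> USK Classification Database
    "Q191458": "Q105295303",    # ESRB -> ESRB Rating Database
    "Q192916": "Q105296817",    # PEGI -> PEGI Rating Database
    "Q874754": "Q105296839",    # Australian Classification Board -> database
    "Q26708073": "Q105296839",  # Australian Classification -> database
    "Q861670": "Q105296939",    # BBFC -> BBFC database
}

def new_p248(value):
    """Return the replacement value, or the original if no replacement applies."""
    return P248_REPLACEMENTS.get(value, value)
```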

Discussion

@Trade: As far as I can understand, the involved property is not stated as (P1932) but stated in (P248), in references:

Is it correct? --Epìdosis 13:09, 4 February 2021 (UTC)

Yes @Epìdosis:--Trade (talk) 13:10, 4 February 2021 (UTC)
OK, title edited. --Epìdosis 13:12, 4 February 2021 (UTC)
Request process

reference URL (P854) → Holocaust.cz person ID (P9109) (2021-02-05)

Request date: 5 February 2021, by: Daniel Baránek

Task description

After introducing Holocaust.cz person ID (P9109), reference URL (P854) in references can be replaced by this new identifier. The result of the edits should look like this. There are 285,282 such references. You can see all references, their reference URL (P854) value and the derived value for Holocaust.cz person ID (P9109) here:

SELECT ?ref ?url ?id WHERE {
  ?ref prov:wasDerivedFrom [ pr:P248 wd:Q104074149 ; pr:P854 ?url ].
  BIND (REPLACE(STR(?url),"^.*/([0-9]+)[-/].*$","$1") as ?id)
  }

Try it!
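The same extraction can be sketched in Python, mirroring the REPLACE expression in the query above (the helper name is an assumption, and the URLs in the test are hypothetical examples, not real holocaust.cz paths):

```python
import re

# Same pattern as the SPARQL REPLACE: take the run of digits that follows a "/"
# and is terminated by "-" or "/".
ID_RE = re.compile(r"^.*/([0-9]+)[-/].*$")

def holocaust_cz_id(url):
    """Derive the numeric person ID from a reference URL, or None if absent."""
    m = ID_RE.match(url)
    return m.group(1) if m else None
```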

Discussion


Request process

request to .. (2021-02-05)

Request date: 5 February 2021, by: Anupamdutta73

Link to discussions justifying the request
Task description

I would like a bot to add ages to persons and institutions.

I also need help with how to use bots.


Licence of data to import (if relevant)
Discussion


Request process

request to mass change stated in (P248) in references after new restrictions on property (2021-02-05)

Request date: 5 February 2021, by: Sapfan

Link to discussions justifying the request
  • Hi! In the past months, I have created over 1000 references, which included state archives as stated in (P248). Example - Karl Mikolaschek (Q97993619). However, about three weeks ago, as described in this ex-post discussion (Property_talk:P248#Please_explain_the_processing), the property stopped accepting archives and other institutions and there are at least hundreds of items showing an error.
  • After discussion on my talk page (User_talk:Sapfan#use_of_uvedeno_v_(P248)), one of our colleagues created Wikidata items for the archive collections (rather than archives themselves) and I would now like to ask for a mass transfer of references from the old value to the new one. If anyone has time and desire, then they can also make further changes described below.
Task description

1. Primary request - to correct reported errors

Can you please replace the old value of stated in (P248) in all references according to this mapping table:

If stated in (P248) is | and title (P1476) begins with | then change stated in (P248) to
Prague City Archives (Q19672898) | Archiv hl. m. Prahy, Matrika | Collection of Registry Books at Prague City Archives (Q105319160)
Prague City Archives (Q19672898) | Archiv hl. m. Prahy, Soupis pražských | List of residents in Prague 1830-1910 (1920) (Q105322358)
Moravian regional archive (Q12038677) | Moravský zemský archiv, Matrika | Collection of Registry Books at Moravian Regional Archive (Q102116996)
Státní oblastní archiv v Litoměřicích (Q18920590) | SOA Litoměřice, Matrika | Collection of Registry Books at Litoměřice State Archive (Q105319095)
Q21953079 | SOA Plzeň, Matrika | Collection of Registry Books at Pilsen State Archive (Q105319092)
Státní oblastní archiv v Praze (Q12056840) | SOA Praha, Matrika | Collection of Registry Books at Prague State Archive (Q105319086)
Regional State Archives in Třeboň (Q12056841) | SOA Třeboň, Matrika | Collection of Registry Books at Třeboň State Archive (Q105319089)
Státní oblastní archiv v Zámrsku (Q17156873) | SOA Zámrsk, Matrika | Collection of Registry Books at Zámrsk State Archive (Q105319097)
Zemský archiv v Opavě (Q10860553) | Zemský archiv v Opavě, Matrika | Collection of Registry Books at Opava Regional Archive (Q105319099)
Museum of Czech Literature (Q5979897) | Kartotéka Jaroslava Kunce | Kunc Jaroslav (Q82329263)

2. If someone has time, then he/she can enhance the data using the following mappings (note: this is nice-to-have, there is no error if it stays the way it is):

If stated in (P248) is | (formerly) | then the standard format of title (P1476) is | we can derive volume (P478) | inventory number (P217) | page(s) (P304)
Collection of Registry Books at Prague City Archives (Q105319160) | Prague City Archives (Q19672898) | Archiv hl. m. Prahy, Matrika zemřelých u sv. Ludmily na Vinohradech, sign. VIN Z6, s. 201 | between "sign." and "," (here: VIN Z6) | (N/A) | after ", s."
Collection of Registry Books at Moravian Regional Archive (Q102116996) | Moravian regional archive (Q12038677) | Moravský zemský archiv, Matrika zemřelých Brno - sv. Tomáš 17056, s. 59 | between the word "narozených", "oddaných" or "zemřelých" and "," (here: Brno - sv. Tomáš 17056) | (N/A) | after ", s."
Collection of Registry Books at Litoměřice State Archive (Q105319095) | Státní oblastní archiv v Litoměřicích (Q18920590) | SOA Litoměřice, Matrika zemřelých Z • inv. č. 4505 • sig. 96/20 • 1784 - 1831 • Dubany, Evaň, Libochovice, Poplze, Radovesice, Slatina, s. 92 | between "sig." and "•" | between "inv. č." and "•" | after ", s."
Collection of Registry Books at Pilsen State Archive (Q105319092) | Q21953079 | SOA Plzeň, Matrika narozených Rokycany 16, s. 54 | between the word "narozených", "oddaných" or "zemřelých" and "," (here: Rokycany 16) | (N/A) | after ", s."
Collection of Registry Books at Prague State Archive (Q105319086) | Státní oblastní archiv v Praze (Q12056840) | SOA Praha, Matrika narozených Lošany 20, s. 82 | between the word "narozených", "oddaných" or "zemřelých" and "," (here: Lošany 20) | (N/A) | after ", s."
Collection of Registry Books at Třeboň State Archive (Q105319089) | Regional State Archives in Třeboň (Q12056841) | SOA Třeboň, Matrika narozených Písek 18, s. 529 | between the word "narozených", "oddaných" or "zemřelých" and "," (here: Písek 18) | (N/A) | after ", s."
Collection of Registry Books at Zámrsk State Archive (Q105319097) | Státní oblastní archiv v Zámrsku (Q17156873) | SOA Zámrsk, Matrika zemřelých v Hradci Králové, sign. 51-7657, ukn 3069, s. 364 | between "sign." and "," (here: 51-7657) | from "ukn" to the next "," (here: ukn 3069) | after ", s."
Collection of Registry Books at Opava Regional Archive (Q105319099) | Zemský archiv v Opavě (Q10860553) | Zemský archiv v Opavě, Matrika narozených N • inv. č. 3142 • sig. Je III 4 • 1780 - 1792 • Bobrovník, Bukovice,…, s. 266 | between "sig." and "•" (here: sig. Je III 4) | between "inv. č." and "•" | after ", s."

Finally, many links in references pointing to Collection of Registry Books at Moravian Regional Archive (Q102116996) (formerly Moravian regional archive (Q12038677)) have broken after a website reorganization due to the Adobe Flash Player retirement. Some of the more frequently used ones could perhaps also be mass-replaced with new ones:

Existing reference URL (P854) portion | example before | should be replaced with | example after | belongs to volume (P478)
http://actapublica.eu/matriky/brno/prohlizec/10374/?strana= | http://actapublica.eu/matriky/brno/prohlizec/10374/?strana=9 | https://www.mza.cz/actapublica/matrika/detail/10266?image=216000010-000253-003381-000000-017056-000000-VR-B08429-nnnn0.jp2 (where nnnn = number after "strana=" padded with leading 0) | https://www.mza.cz/actapublica/matrika/detail/10266?image=216000010-000253-003381-000000-017056-000000-VR-B08429-00090.jp2 | Brno - sv. Tomáš 17056
http://actapublica.eu/matriky/brno/prohlizec/11303/?strana= | http://actapublica.eu/matriky/brno/prohlizec/11303/?strana=37 | https://www.mza.cz/actapublica/matrika/detail/11133?image=216000010-000253-003381-000000-017057-000000-VR-B08430-nnnn0.jp2 (where nnnn = number after "strana=" padded with leading 0) | https://www.mza.cz/actapublica/matrika/detail/11133?image=216000010-000253-003381-000000-017057-000000-VR-B08430-00370.jp2 | Brno - sv. Tomáš 17057

Thanks in advance! --Sapfan (talk) 21:30, 5 February 2021 (UTC)
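The URL rewrite described in the last table can be sketched as follows. The prefix-to-template mapping is copied from the table; the function name and the choice to leave unknown URLs untouched are assumptions for the sketch:

```python
# Old actapublica.eu URL prefix -> new www.mza.cz URL template, per the table.
URL_MAP = {
    "http://actapublica.eu/matriky/brno/prohlizec/10374/?strana=":
        "https://www.mza.cz/actapublica/matrika/detail/10266?image=216000010-000253-003381-000000-017056-000000-VR-B08429-{page}.jp2",
    "http://actapublica.eu/matriky/brno/prohlizec/11303/?strana=":
        "https://www.mza.cz/actapublica/matrika/detail/11133?image=216000010-000253-003381-000000-017057-000000-VR-B08430-{page}.jp2",
}

def rewrite_url(old_url):
    """Rewrite a broken actapublica.eu URL; pad the page number to 4 digits
    and append the trailing 0 (strana=9 -> 00090)."""
    for prefix, template in URL_MAP.items():
        if old_url.startswith(prefix):
            page = old_url[len(prefix):]
            return template.format(page=page.zfill(4) + "0")
    return old_url  # leave unknown URLs untouched
```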

Licence of data to import (if relevant)

(No licence - public domain data already present on WD)

Discussion
  • Wouldn't "publisher" be the more appropriate property for the organization?
"stated in" indicates the publication, not the archives or museum this comes from. This can be a (printed or online) catalogue or database. That the items described there are also part of a specific collection is another question.
Maybe Help:Sources#Databases explains it better. --- Jura 21:45, 5 February 2021 (UTC)
Actually, this is the reason for the request. Right now, "stated in" points to an organization. We now want to point to the collection (or database, as you write). Do you see any issue with it? What would be a more appropriate value for this property, if it is an archive collection? --Sapfan (talk) 22:04, 5 February 2021 (UTC)
"collection" is generally what is in an archive, not its catalog. A catalog describes the collection and can include data about a person.
Maybe Q101498693 (linked from Q105319160 mentioned above) given its P31 value.
Another way of looking at it would be to identify the publication the date of death "15 June 1920" is mentioned in (this is from the sample mentioned above, Q97993619#P570), just as one would in a reference at Wikipedia.
It's not really something that can be guessed, one needs to determine the publication. --- Jura 06:29, 6 February 2021 (UTC)
Hi Jura, thanks for trying to find the best reference. But I have doubts about the direction you are heading to.
  • Whenever someone is born, gets married or dies, a record is entered in an official book. This book is unique: it is not a publication. But yes, it can be identified by issuing authority, sequential number etc.
  • After 75, 100 or similar number of years, the public registry hands over the books to a designated public archive. There they are recorded and presented to the public - offline or online.
  • To identify it as a source, we need to make an archive citation. Some recommend to start with item name and end with archive name ([6]). Czech universities suggest to go "from broad to narrow", i.e. start with archive name and end with specific item (e.g., [7]). I follow the second approach in title (P1476).
  • Now, what to enter as stated in (P248). The equivalent of a "publication" would be a book title, such as "Death register JIL Z12 of St. Gilles Parish in Prague" (to take our example Karl Mikolaschek (Q97993619)). However, there is a distinction between a published book (such as Encyclopædia Britannica) and an archived registry: copies of E. B. can be found in many libraries, therefore we do not need to write where to find it. But a parish register is unique. You first need to know in which single archive it is stored and then how to find it there.
  • Therefore, the theoretically best approach would be to create a Wikidata item for each parish book. But this would lead to an explosion of items, most of which would be used only once or twice (or not at all, if we replicated the whole archive catalog). That is why I used to give the highest level - the archive name - as stated in (P248). But if we need a "thing" rather than a "(legal) person" in this role, then the most practical option is a "collection of birth, marriage and death records at a given archive". This is what I am proposing.
  • Central Registry of Archive Heritage (Q101498693) is unfortunately not the right object. (You could not know it - the description was only in Czech.). It is a list of archive collections, which include the birth/death registers we want to cite. But you cannot find the birth/death date of a single person there. You need to go to a specific parish book and page, which is a part of certain archive collections such as Collection of Registry Books at Prague City Archives (Q105319160).
  • That is why I still believe that we should enter the specific collection as stated in (P248) - because this is the first step in finding the info, similar to a book or magazine with many volumes. Once you are in that archive, you need to know the specific resource ID such as accession number (Q1417099) (in Wikidata entered as volume (P478)), inventory number (P217) and page(s) (P304) if available, in structured and/or unstructured (title (P1476)) format.

Still not convinced? Then please tell me which practical option to use. I do not think that the list of key archive collections in Czechia is the appropriate one - it would give even less information to the reader than the archive name we have now. Thanks! --Sapfan (talk) 08:14, 6 February 2021 (UTC)

  • Seems I had the wrong type of reference in mind. Sorry about that. Actually we lack an outline on how to cite church registries and similar contained in archives at Help:Sources.
We didn't get much past Wikidata:Property proposal/civil registration district.
If you want to write a short summary based on your explanation above, that could be most helpful. Depending on what you prefer, it can be fairly specific and let to others to be expanded/generalized. --- Jura 09:03, 6 February 2021 (UTC)
Thanks, Jura! I have just placed a citation proposal on Help_talk:Sources#Vital_records_and_other_archive_collections_as_sources. Let's see what the community says. Feel free to comment. I will update the mapping tables based on the outcome. --Sapfan (talk) 13:28, 6 February 2021 (UTC)
Request process

request to add sitelinks for new articles on cewiki (2021-02-06)

Hey! Please tell me how to add articles to Wikidata automatically. I mean articles such as the one with the interwiki be:Карбатоўка. Request date: 6 February 2021, by: Takhirgeran Umar

Link to discussions justifying the request
  • @Tagishsimon: are you doing this? You edited the sample. @Takhirgeran Umar: Normally Lysbot (what was the exact name?) would do things like that. Alternatively, you could upload the sitelinks to items once you created them. --- Jura 14:02, 15 February 2021 (UTC)
I linked them by hand. Unfortunately, there is no auxiliary tool. I found User:Tpt/viaf.js, but I didn't understand how it works.--Takhirgeran Umar (talk) 00:10, 25 February 2021 (UTC)
Task description
Licence of data to import (if relevant)
Discussion


Request process

request to fix Property:P395 for Spain (2021-02-08)

Request date: 8 February 2021, by: Jura1

Link to discussions justifying the request
Task description
Discussion

I see it's only for Spanish communities. I would put an end date on the property, and maybe downgrade their rank. What is the end date, actually? Edoderoo (talk) 08:24, 9 February 2021 (UTC)

  • The problem is that the codes were applied not only to (autonomous) communities, but to thousands of items.
w:Vehicle_registration_plates_of_Spain#Current_system mentions 18 September 2000, but I'd just use the year 2000. --- Jura 08:40, 9 February 2021 (UTC)
Request process

request to import DOI and ISBN as items when present in any Wikipedia article (2021-02-11)

Request date: 11 February 2021, by: So9q

Link to discussions justifying the request
Task description

The bot follows the event stream from WMF and checks every changed page for DOI or ISBN numbers. It then checks whether we already have an item for each number found and imports it if not.

Addition: I forgot to mention that I would also like to add all missing authors (if they have an ORCID) and the articles cited from the DOI in Wikipedia (1 hop away). That amounts to millions of items and probably most of the 86 million articles in existence, but the import rate is going to be lower because we use Wikipedia changes as a filter.
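The detection step could be sketched as follows. This is only an illustration: the event-stream consumption and the "does an item already exist" check are not shown, the function name is an assumption, and the regex is a commonly used DOI pattern (based on Crossref's recommendation), not the proposed bot's actual code:

```python
import re

# Match DOIs of the form 10.NNNN/suffix in wikitext; stop at whitespace,
# quotes and wiki markup characters.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>|{}]+', re.IGNORECASE)

def find_dois(wikitext):
    """Return all DOI-like strings found in a page's wikitext."""
    return DOI_RE.findall(wikitext)
```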

Source code (currently missing the upload part)

License of data to import (if relevant)

Not copyrighted (facts) from Wikipedia and Crossref (DOI) and Worldcat (ISBN).

Discussion
  • I like this. I have a long standing wish to import articles cited in important scholarly databases (and I consider Wikipedia one of them). --Egon Willighagen (talk) 15:36, 14 February 2021 (UTC)
  • This is fine due to the constrained scope of use on other Wikimedia projects (which scope, as I asserted in other fora, was scrapped by other people for some reason). The moment 'when present in any Wikipedia article' is removed as a limiting factor in your job is the moment I cease to support it, primarily due to the continued strain this will cause on the query servers. (I wonder if we should be pinging @GLederrey (WMF), CParle (WMF), ZPapierski (WMF): more for any further proposed jobs which are contentious for query service performance reasons, to get their opinions on the matter as those tasked with maintaining the query service.) Mahir256 (talk) 19:27, 14 February 2021 (UTC)
  • @mahir256: Thanks for the support. BTW I forgot to mention that I would like to add all missing authors and articles that are cited from the DOI in Wikipedia also. That amounts to millions of items I guess... Do you support that too? Not doing that means that the user cannot follow the science when they read the DOI item in WD which is one of the main advantages of adding it to WD in the first place isn't it? (that science articles are put into our rich context of links enabling very advanced queries compared to what you can do now in the proprietary databases).
Regarding the WDQS infrastructure and queries I'm not aware of any negative effects if we go from 30 mio articles to 86 mio articles. The p31-> scientific article already times out, and that is not affected. Someone in project chat suggested to query by year and then it does not time out. That might be affected if we have 2 mio articles for each year or something. Anyway timeouts are not a problem, it's a feature if I understood correctly. It tells the user "Hey you are trying to do something that WMF does not want to provide the infrastructure to do. Go ahead and set up WDQS yourself and download the data and run whatever queries you want without time limits at your own expense".
The problem from a usability POV is that the error messages of WDQS are pretty bad. Stack traces should NEVER be exposed to a user by default IMO; they are for developers. That's an important UI bug to fix IMO. See T275736 --So9q (talk) 06:40, 25 February 2021 (UTC)
  • Didn't this already happen (some separate WMF installation with all references), but then wasn't imported into Wikidata due to some size issue? --- Jura 22:10, 14 February 2021 (UTC)
@jura1: I never heard about that. I asked in the Wikidata group and no one seems to have done what I propose. We don't have a size issue in the infrastructure, as far as I'm aware. With maxlag in effect it probably won't affect WDQS either. I promise to slow the bot down so that we don't get bottleneck problems from too many write requests / new items created per minute.--So9q (talk) 22:56, 24 February 2021 (UTC)
Request process

request to add identifiers from FB (2021-02-11)

Thanks to a recent import, we currently have more than 1.2 million items where the only identifier is Freebase ID (P646). However, checking https://freebase.toolforge.org/, some of them have identifiers available there.

Samples:

See Wikidata:Project_chat#Freebase_(bis) for discussion.

Task description

Import identifiers where available. Map Freebase keys to Wikidata properties where a mapping is not yet available at Wikidata:WikiProject_Freebase/Mapping.
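A minimal sketch of the key-to-property step. The mapping excerpt below is hypothetical (the real table is maintained at Wikidata:WikiProject_Freebase/Mapping), and the output follows the tab-separated QuickStatements format; fetching values from the Freebase dump is not shown.

```python
from typing import Optional

# Hypothetical excerpt of a Freebase-key -> Wikidata-property mapping;
# the authoritative table lives at Wikidata:WikiProject_Freebase/Mapping.
KEY_TO_PROPERTY = {
    "/authority/viaf": "P214",        # VIAF ID
    "/authority/imdb/name": "P345",   # IMDb ID
}

def to_quickstatements(qid: str, key: str, value: str) -> Optional[str]:
    """Emit one QuickStatements line (QID <tab> property <tab> "value"),
    or None if the Freebase key has no mapped property yet."""
    prop = KEY_TO_PROPERTY.get(key)
    if prop is None:
        return None  # unmapped keys go to the mapping page for review
    return f'{qid}\t{prop}\t"{value}"'
```

Unmapped keys could be collected and reported so the mapping page can be extended before a second run.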

Discussion


Request process

request to update all ckbwiki article labels (2021-02-12)Edit

Request date: 12 February 2021, by: Aram

Link to discussions justifying the request
  • There is no discussion, because I don't think updating the labels requires one.
Task description

Often, when articles are moved, the Wikidata labels are not updated to the new article names. So we need to update all ckbwiki article labels on Wikidata. For example, I moved this page with my bot, but its label on Wikidata hasn't been updated yet. ckbwiki has 28,768 articles so far. Thanks!

Licence of data to import (if relevant)
Discussion
  • @Aram: You mean
    • the sitelinks (to ckbwiki),
    • or the labels (in ckb),
    • or both?
When page moves on ckbwiki aren't mirrored here, that generally means the user moving them hasn't created an account on Wikidata. You would need to log in to Wikidata with your bot account at least once. --- Jura 14:09, 15 February 2021 (UTC)

@Aram: --- Jura 14:10, 15 February 2021 (UTC)

@Jura1: Really? I didn't know that before. Thank you for the hint! Although it seems that my bot was logged in for this edit, the label has still not been updated. However, regarding your question, we only want to update the ckb labels. Thank you! Aram (talk) 15:06, 15 February 2021 (UTC)
  • It seems the account exists on wikidatawiki, so the sitelinks to ckbwiki are updated (since Feb 9), and edits like the one you mentioned above are no longer needed.
    However, this won't have any effect on the ckb label of the item at Wikidata. These need to be updated separately if deemed correct (by bot, QuickStatements or manually). --- Jura 15:36, 15 February 2021 (UTC)
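The separate label update mentioned above could be driven by QuickStatements. A minimal sketch, assuming the common convention of deriving the label from the sitelink title by dropping a trailing parenthetical disambiguator; whether that is always correct for ckbwiki titles would need checking before a real run.

```python
import re

def label_from_title(title: str) -> str:
    """Derive a label from a sitelink title by dropping a trailing
    parenthetical disambiguator, e.g. 'Example (film)' -> 'Example'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", title)

def quickstatements_label(qid: str, title: str) -> str:
    """Emit a QuickStatements command setting the ckb label of an item.
    Lckb is the QuickStatements column for 'label in language ckb'."""
    return f'{qid}\tLckb\t"{label_from_title(title)}"'
```

A careful bot would only overwrite labels that still equal the old sitelink title, leaving hand-edited labels alone.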
Thanks! Aram (talk) 20:14, 18 February 2021 (UTC)
Request process

request to change Belarusian language description from "спіс атыкулаў у адным з праектаў Вікімедыя" to "спіс артыкулаў у адным з праектаў Вікімедыя" in all the articles. The letter "р" was missing (2021-02-23)Edit

Request date: 23 February 2021, by: Belarus2578

Link to discussions justifying the request

There is no discussion; this is just an obvious typo. --Belarus2578 (talk) 05:01, 25 February 2021 (UTC)

Task description

Please change the Belarusian language description from "спіс атыкулаў у адным з праектаў Вікімедыя" to "спіс артыкулаў у адным з праектаў Вікімедыя" in all the articles. The letter "р" was missing. --Belarus2578 (talk) 06:47, 23 February 2021 (UTC)
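The fix itself is a one-for-one exact-match replacement, sketched below. Matching the full string rather than just the misspelled word keeps hand-written descriptions untouched; actually fetching and writing the descriptions (via the API, with maxlag set) is not shown.

```python
from typing import Optional

OLD = "спіс атыкулаў у адным з праектаў Вікімедыя"   # misspelled: missing "р"
NEW = "спіс артыкулаў у адным з праектаў Вікімедыя"  # corrected

def fix_description(desc: str) -> Optional[str]:
    """Return the corrected be description, or None if no change is needed.
    Only the exact auto-generated string is replaced, so hand-written
    descriptions are never modified."""
    return NEW if desc == OLD else None
```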

Licence of data to import (if relevant)
Discussion


Request process

request to fix descriptions "other organization" (2021-02-23)Edit

Request date: 23 February 2021, by: Jura1

Task description
  • There are some 8000 items which describe organizations as "other organization"
  • Remove "other " from these descriptions
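The two bullets above amount to a simple conditional rewrite, sketched here. An exact match on the full description is the conservative choice; whether other "other …" descriptions should also be trimmed would need separate discussion.

```python
from typing import Optional

def fix_other_description(desc: str) -> Optional[str]:
    """Drop the leading 'other ' from the description 'other organization'.
    Returns the new description, or None if the item needs no edit."""
    if desc == "other organization":
        return "organization"
    return None
```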
Discussion
  • This may have some logic in its source, but not in Wikidata or in whatever other context Wikidata descriptions are used. @ArthurPSmith: who may have created some or all of them [8]. --- Jura 19:43, 23 February 2021 (UTC)


Request process

FilesEdit

Request date: 3 March 2021, by: 41.115.9.76

Link to discussions justifying the request
Task description
Licence of data to import (if relevant)
Discussion


Request process