Wikidata:Requests for permissions/Bot/DifoolBot 2
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 20:07, 6 December 2023 (UTC)
DifoolBot 2
DifoolBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Difool (talk • contribs • logs)
Task/s: import VIAF ID based on Union List of Artist Names ID
Code: at GitHub
Function details: Wikidata currently contains 8594 pages with a non-deprecated Union List of Artist Names ID (P245) but without a VIAF ID (P214). I want to add a VIAF ID to those pages by querying the VIAF Authority Cluster endpoint using the name in Wikidata and checking whether a cluster exists with the same Union List of Artist Names ID as in Wikidata. Examples:
- Q273100 Eumachia, VIAF query
- Q6042627 Nicolás de Bussy, VIAF query
Here are 4 example edits [1]
--Difool (talk) 14:14, 8 November 2023 (UTC)
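The selection described under Function details could be expressed as a query against the Wikidata Query Service. The following is only a minimal sketch and not the bot's actual code: the function name `build_missing_viaf_query` and its `limit` parameter are hypothetical, and the query assumes the standard prefixes that the query service predefines.

```python
def build_missing_viaf_query(limit: int = 100) -> str:
    """Return a SPARQL query for items that have a non-deprecated
    ULAN ID (P245) statement and no VIAF ID (P214) statement at all.

    Hypothetical helper for illustration only.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Doubled braces are literal braces inside the f-string.
    return f"""
SELECT DISTINCT ?item ?ulan WHERE {{
  ?item p:P245 ?statement .
  ?statement ps:P245 ?ulan ;
             wikibase:rank ?rank .
  FILTER(?rank != wikibase:DeprecatedRank)
  FILTER NOT EXISTS {{ ?item p:P214 [] }}
}}
LIMIT {limit}
""".strip()


query = build_missing_viaf_query(limit=10)
```

The query goes through the statement nodes (`p:`/`ps:`) rather than `wdt:` so that the rank of each ULAN statement can be checked explicitly.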
- Support. The above link for the example edits is broken; I think this is what @Difool: intended. I think the task is needed; I requested something similar a few months ago here. However, I would ask for two small changes to the import: 1) add to the references based on heuristic (P887): inferred from VIAF ID containing an ID already present in the item (Q115111315) (in my opinion, reference URL (P854) is not necessary in the references, but adding it doesn't worry me particularly); 2) create a separate report (see e.g. User:Vojtěch Dostál/viaf already somewhere) for the cases in which the value of VIAF ID (P214) that the bot would theoretically add to the item is already present in another item, so that these cases can be solved manually (I would like to contribute to solving them). Thanks, --Epìdosis 15:31, 13 November 2023 (UTC)
- Hi Epidosis, thanks for the link correction. It's fine by me if the bot of @Vojtěch Dostál can do this task. I'll update the code using your comments anyway. Difool (talk) 02:55, 14 November 2023 (UTC)
- @Difool: I would download the links dump from https://viaf.org/viaf/data/ and process that. You can find some example code at https://github.com/multichill/toollabs/blob/master/bot/wikidata/nta_from_viaf.py#L191 . Multichill (talk) 21:17, 13 November 2023 (UTC)
- This is not yet in my code at GitHub, but given a Union List of Artist Names ID, for example 500124259, you can call https://viaf.org/viaf/sourceID/JPG%7C500124259 to retrieve the VIAF ID (JPG is the VIAF source code for the Union List of Artist Names). So you don't have to search VIAF by name, and a dump search is not necessary. Difool (talk) 03:08, 14 November 2023 (UTC)
- I've updated the code, new example edits are here. The duplicate report of the first 100 items is here. Difool (talk) 06:31, 17 November 2023 (UTC)
- @Difool: the new example edits and the duplicate report are both perfect; I fully support proceeding with the entire batch (and also repeating this process periodically, at least once a year, or more frequently if you prefer). --Epìdosis 10:47, 17 November 2023 (UTC)
- @Difool: to simplify the review of duplicates, I spent this afternoon clearing all the cases of the same ULAN ID in two items, which were about 100. If possible, when you start the bot, please use data updated as of tomorrow (or later), so that you don't add VIAF IDs on the basis of wrongly matched ULAN IDs that I removed today. Thanks again for your imminent bot activity, --Epìdosis 18:11, 18 November 2023 (UTC)
- @Epìdosis: Wow, excellent job! Interesting use of different scripts, judging by your contribution history. I made a list that shows the count of pages with a PID from a specific VIAF authority source, but without a VIAF ID. It might be interesting to look at those too, and maybe for this task too: it could (theoretically) be possible that a Wikidata page results in more than one VIAF ID, each based on a different VIAF authority source. Difool (talk) 02:01, 20 November 2023 (UTC)
- @Difool: sure, and thanks for your statistics page; it gives a very useful overview of the work to be done. In fact my original bot request (Wikidata:Bot requests#Request to periodically add VIAF IDs to humans (2022-11-06)) covered all the VIAF members, although we then decided to try with only a few. I remain convinced that the first limitation which I proposed one year ago is surely necessary, i.e. considering only items about humans (because VIAF non-personal clusters are often of very low quality); the other one, i.e. excluding items already having one or more VIAF IDs, could maybe be ignored, but I would prefer that we first edit the items with no VIAF ID and only later come to items that already have a VIAF ID. We can start with ULAN when the bot is authorized and then proceed, step by step, with other VIAF members; I'm sure this will help us enrich Wikidata and discover a lot of previous mistaken matches (and VIAF issues). --Epìdosis 09:57, 20 November 2023 (UTC)
- Comment: there is no objection; I think the bot could now be authorized (not only for ULAN, but also for other VIAF members, with the method agreed above with @Difool: (see last example edits) and already applied in the case of Wikidata:Bot requests#Request to periodically add VIAF IDs to humans (2022-11-06)). --Epìdosis 10:48, 3 December 2023 (UTC)
- I am going to approve the bot in a couple of days assuming no objections have been raised.--Ymblanter (talk) 19:45, 4 December 2023 (UTC)
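For readers following the method agreed in the thread, the sourceID lookup and the duplicate report could be sketched roughly as below. This is an illustration under stated assumptions, not the code in the operator's repository: all function and constant names are hypothetical, no network request is made, and it is an assumption that the lookup resolves to a cluster URL of the form `https://viaf.org/viaf/<number>`.

```python
import re
from urllib.parse import quote

VIAF_BASE = "https://viaf.org/viaf"
ULAN_SOURCE_CODE = "JPG"  # VIAF source code for ULAN, as stated in the discussion

# Reference suggested in the discussion: based on heuristic (P887) ->
# inferred from VIAF ID containing an ID already present in the item.
HEURISTIC_REFERENCE = {"P887": "Q115111315"}


def source_id_url(source_code, local_id):
    """Build the sourceID lookup URL; the "|" separator becomes %7C."""
    key = f"{source_code}|{local_id}"
    return f"{VIAF_BASE}/sourceID/{quote(key, safe='')}"


def viaf_id_from_cluster_url(url):
    """Extract the numeric cluster ID from a cluster URL, or return None.

    Assumption: the resolved URL looks like https://viaf.org/viaf/<digits>.
    """
    match = re.search(r"/viaf/(\d+)", url)
    return match.group(1) if match else None


def split_duplicates(proposed, existing):
    """Separate safe additions from cases needing manual review.

    proposed: item QID -> VIAF ID the bot would add
    existing: VIAF ID -> QID of an item that already holds that VIAF ID
    Returns (to_add, to_report); to_report holds (qid, viaf_id, holder).
    """
    to_add, to_report = {}, []
    for qid, viaf_id in proposed.items():
        holder = existing.get(viaf_id)
        if holder is not None and holder != qid:
            to_report.append((qid, viaf_id, holder))
        else:
            to_add[qid] = viaf_id
    return to_add, to_report


print(source_id_url(ULAN_SOURCE_CODE, "500124259"))
```

The sketch does not cover the case where two candidate items would receive the same new VIAF ID in one run; a real report would probably need to flag that as well.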