Wikidata:Requests for permissions/Bot/Openaccess cma
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --05:12, 16 April 2021 (UTC)
Openaccess cma
Openaccess cma (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Ethanholda (talk • contribs • logs)
Task/s: Creates new items and updates existing items for artworks in the Cleveland Museum of Art (Q657415) collection
Code: https://github.com/ClevelandArtGIT/openaccess-wikidata
Function details: The Cleveland Museum of Art has been an active contributor to Wikidata (see Wikidata:WikiProject Cleveland Museum of Art), and this bot will be run by staff. This bot uses Pywikibot to update the items for artworks from the Cleveland Museum of Art (or create missing ones) and keep the Wikidata items in sync over time. It loads artwork metadata from the CMA API, then checks the accession number against a SPARQL query to determine whether there is an existing Wikidata item. Once it has the QID (or not), it either (1) creates the item from scratch or (2) checks the statements in the existing item to see whether any values need updating or any claims are missing. It is currently programmed to add statements with the following properties, mapped from CMA data:
- instance of (P31)
- collection (P195)
- title (P1476)
- inventory number (P217)
- collection (P195)
- copyright status (P6216)
- determination method (P459)
- described at URL (P973)
- retrieved (P813)
- copyright license (P275)
- inception (P571)
- image (P18)
- author name string (P2093)
- Commons compatible image available at URL (P4765)
- author name string (P2093)
In addition, any statements added from the CMA data will contain a reference to the catalog record (i.e. reference URL (P854)) where the artwork is described.
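The lookup-then-decide step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: the function names `build_lookup_query` and `plan_action`, the exact query shape, and the accession number used in the usage line are all assumptions made for the example. It only builds the query text and decides what to do with the results; running the query and editing items would be left to Pywikibot.

```python
from typing import List, Optional, Tuple

CMA_QID = "Q657415"  # Cleveland Museum of Art


def build_lookup_query(accession_number: str, collection_qid: str = CMA_QID) -> str:
    """Build a SPARQL query for items in the collection with this inventory number.

    Assumption: the item carries collection (P195) and inventory number (P217)
    as direct statements. The real bot's query may differ.
    """
    # Escape backslashes and double quotes so the value is a valid SPARQL literal.
    escaped = accession_number.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P195 wd:{collection_qid} ;\n"
        f'        wdt:P217 "{escaped}" .\n'
        "}\n"
        "LIMIT 2"
    )


def plan_action(matching_qids: List[str]) -> Tuple[str, Optional[str]]:
    """Decide what to do given the QIDs returned by the lookup query."""
    if not matching_qids:
        return ("create", None)  # no item yet: create it from scratch
    if len(matching_qids) == 1:
        return ("update", matching_qids[0])  # check and sync its statements
    return ("skip", None)  # ambiguous match: leave for human review


# Hypothetical accession number, for illustration only.
print(build_lookup_query("1958.31"))
```

Skipping ambiguous matches rather than guessing is a design choice of this sketch; it keeps the bot from writing to the wrong item when two items share an inventory number.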
Some examples of items fully generated with this bot workflow:
- Sheet of Sketches (recto) (Q80038405)
- Geschichtete Formen (Q80035023)
- Views of Villages in Brabant and Campine: A Moated Village (Q80040492)
- The Lovers (Q79922234)
- Landscape with a Milkmaid (Q79922198)
- Rue Damiette, Rouen (Q80044627)
Note: based on feedback, the code will be modified to no longer add instance of (P31) = item of collection or exhibition (Q18593264) statements to items where a more specific P31 claim is present. In addition, where an artwork is part of a larger work that is also described in Wikidata, the bot will add part of (P361) claims (as per this discussion). Please let me know if there is any feedback or changes you would like to see to the bot's current logic and/or data modeling. Also pinging Multichill here for comment based on past discussion. Ethanholda (talk) 15:24, 5 February 2021 (UTC)
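As a rough illustration of the P31 rule in the note above, the check could look something like the sketch below. The function name `p31_values_to_add` and its list-of-QIDs interface are hypothetical and are not taken from the bot's repository; Q3305213 (painting) is used only as an example of a more specific class.

```python
from typing import Iterable, List

GENERIC_P31 = "Q18593264"  # item of collection or exhibition


def p31_values_to_add(existing_p31: Iterable[str], candidate_p31: Iterable[str]) -> List[str]:
    """Return the instance of (P31) values that should still be added to an item.

    The generic value is dropped whenever a more specific P31 value is already
    on the item or is among the candidates about to be added.
    """
    existing = set(existing_p31)
    candidates = list(candidate_p31)
    has_specific = any(qid != GENERIC_P31 for qid in existing.union(candidates))

    to_add = []
    for qid in candidates:
        if qid in existing:
            continue  # already present, nothing to do
        if qid == GENERIC_P31 and has_specific:
            continue  # a more specific class makes the generic one redundant
        to_add.append(qid)
    return to_add
```

For example, an item that already has instance of (P31) = painting (Q3305213) would get no new generic claim, while an item with no P31 at all would still receive the generic one.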
- Comment Just commenting to note that I have helped CMA with the code, which follows a common approach similar to that of some other PWB-based bots. Dominic (talk) 15:56, 5 February 2021 (UTC)
- Sorry, that ping got lost somewhere between keyboard and chair. Would love to get this up and running again. Can you update the code to not overwrite existing descriptions? Will comment in detail later. Multichill (talk) 21:47, 22 February 2021 (UTC)
- @Multichill: Sure, but to never do so? I understand not overwriting edits from Wikimedians (assuming that is the concern?), but the descriptions are generated programmatically from other fields, like title/creator/date, so what would you want to do when the institution changes the artwork title/creator/date? Dominic (talk) 22:02, 23 February 2021 (UTC)
- I expect the bot to never overwrite a human edit. That means that you have to keep state if you want to update things like descriptions. For statements it should be easier because these probably reference the museum website. Is it possible that you restore the original descriptions on items like this one? Multichill (talk) 21:17, 7 March 2021 (UTC)
- Support Syncing is an important job. I looked quickly at the code (I'm not an expert in Python) and at the examples, and I see nothing wrong. Cheers, VIGNERON (talk) 08:05, 27 February 2021 (UTC)
- I went ahead and implemented the change to no longer update descriptions for now, as Multichill requested. Perhaps we can refine that later per my question, but he hasn't replied in a couple of weeks now, so I don't mind just doing that because I'd rather that open question not hold anything up. Is there any other feedback? Dominic (talk) 20:13, 7 March 2021 (UTC)
- I will approve the request in a couple of days, provided that no objections are raised. Lymantria (talk) 21:08, 7 March 2021 (UTC)
- Can you only include the inventory number in the description when this is needed for disambiguation?
- Please also add location (P276) set to Cleveland Museum of Art (Q657415), that's currently missing.
- Please also add operator (P137) set to Cleveland Museum of Art (Q657415) as qualifier to Commons compatible image available at URL (P4765)
- If inception is set to something like "c. 1561" you can add it like this.
- Bonus points for adding location of creation (P1071) (example)
- Is the cronjob disabled at the moment? Then I can remove the block. Multichill (talk) 21:17, 7 March 2021 (UTC)
- @multichill: Cronjob is disabled so it should be safe to unblock. --Ethanholda (talk) 16:59, 8 March 2021 (UTC)
- Done. Multichill (talk) 17:31, 8 March 2021 (UTC)
- Can we see 50-200 test edits?--- Jura 08:51, 8 March 2021 (UTC)
- If and when bot is unblocked, we can definitely run those test edits. --Ethanholda (talk) 16:59, 8 March 2021 (UTC)
- If no comments on the test edits arise, I will approve the bot in a couple of days. Lymantria (talk) 07:47, 28 March 2021 (UTC)
- Can you link the edits you consider relevant test edits? I see some that still use instance of (P31)=item of collection or exhibition (Q18593264) which shouldn't be used. --- Jura 11:16, 28 March 2021 (UTC)
- Oh wait, test edits have been done. I see most of the suggestions have been implemented. @Ethanholda: any update on inception (P571) like this? I noticed that if you add this qualifier to described at URL (P973) that you have one less constraint violation. Might be worth adding. Multichill (talk) 17:15, 28 March 2021 (UTC)
- @Multichill: I looked at the language issue originally, but I don't think we can do it programmatically. Some number of artwork titles are definitely non-English, so we can't hardcode that for all of them. But the metadata also does not state explicitly the language in a way we could detect it by bot. I did implement the inception (P571) modification you suggested as well (see this example), but I did not account for BCE dates. Good catch, should be an easy fix. Dominic (talk) 16:32, 2 April 2021 (UTC)
- @Dominic: described at URL (P973) is about the linked page, so the qualifier is about the language of the linked page, not of the work. I doubt the Cleveland Museum of Art has any non-English pages we link to. Hardcoding is fine in this case. Thanks for the date fix. I think this bot is more than ready to be approved. Multichill (talk) 17:01, 2 April 2021 (UTC)
- @Multichill: Sorry, you're right, I must have been reading too fast and was thinking about something else entirely. That is no trouble at all adding that qualifier. Dominic (talk) 05:02, 7 April 2021 (UTC)
- @Multichill: I've made and tested all the edits to the code. Please let me know that the bot is approved and I will turn back on the cron. Thanks for all your help and feedback. --Ethanholda (talk) 23:07, 12 April 2021 (UTC)
- Who is operating this? Given that we already have tens of thousands of edits by Dominic that need to be cleaned up, I'm not really confident that they proceed with others. --- Jura 20:01, 2 April 2021 (UTC)
- I am not planning to operate this and do not have the password to the account. Ethan is the operator, as is stated above. I am assisting with the code, and have been contributing in the GitHub repo linked above. In any case, it's unclear to me what you are looking for, since how else would changes have been made to "clean up" tens of thousands of items if not by bot, which is not yet approved? Dominic (talk) 05:19, 7 April 2021 (UTC)
- I don't think you should be operating another bot until you have cleaned up your previous edits. --- Jura 07:33, 15 April 2021 (UTC)
Pinging Lymantria, as I think this is all ready to go. Dominic (talk) 18:44, 14 April 2021 (UTC)
- @Jura1, Multichill, Ethanholda: Are we ready indeed? Lymantria (talk) 05:17, 15 April 2021 (UTC)
- @Lymantria: as I said before: More than ready to be approved. Multichill (talk) 18:57, 15 April 2021 (UTC)
- Can we see the request link to the test edits? --- Jura 07:33, 15 April 2021 (UTC)
- To me it seems there is a sufficient number of edits to be viewed. Hence I don't understand your question. You ask for relevant edits among them, where I think it is up to you what you consider relevant. Lymantria (talk) 15:35, 15 April 2021 (UTC)
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Jura doesn't agree. Multichill (talk) 10:53, 25 April 2021 (UTC)
New sections
- Oppose given that the "test" edits are mostly broken and/or corrections. So it's not really clear what are the edits that should actually happen. The bot policy requires 50-200 test edits, but here most of this bot's edits are actually problematic. --- Jura 07:31, 16 April 2021 (UTC)