Wikidata:Requests for permissions/Bot/CJMbot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 09:32, 14 April 2024 (UTC)[reply]
CJMbot
CJMbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Meemoo_BE (talk • contribs • logs)
CJMbot lets users upload a CSV file in a certain format. The data inside this file is then validated and processed. New items will be created based on the data in the CSV file and existing items will be updated by adding statements and references.
Coming soon.
User uploads a CSV containing data of producers (mostly artists). This data is matched to Qids using the Openrife reconciliation api. The user gets an e-mail with the updated CSV file containing the Qids, for manual correction. After the manual check the user can upload the updated CSV file. This will add new statements to the matched Qids (if they didn't exist) and add a reference to them, or create a new item containing the data of the CSV. --CJMbot (talk) 13:31, 19 November 2020 (UTC)[reply]
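A minimal sketch of the second upload step described above, assuming the corrected CSV carries a column named `qid` (a hypothetical column name, as the actual CSV format is not given here). Rows with a Qid would be updated and rows without one would lead to a new item; this is an illustration, not the bot's actual code.

```python
import csv
import io


def split_rows(csv_text: str, qid_column: str = "qid"):
    """Split CSV rows into those matched to an existing item and those that are not.

    `qid_column` is a hypothetical column name used for illustration only.
    """
    to_update, to_create = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        qid = (row.get(qid_column) or "").strip()
        if qid:
            to_update.append(row)   # existing item: add missing statements and references
        else:
            to_create.append(row)   # no match after manual check: candidate for a new item
    return to_update, to_create


sample = "name,qid\nExample Artist,Q42\nAnother Artist,\n"
to_update, to_create = split_rows(sample)
```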
- What is "Openrife"? Where does the website for this bot live? Who has access to it? BrokenSegue (talk) 18:22, 27 January 2023 (UTC)[reply]
- @BrokenSegue I work for @Meemoo BE, the organisation that created CJMbot. Openrife was a typo and should be Openrefine. Collection staff from museum and archives use our public domain tool (in Dutch, but more info in English can be found here) of which the bot on Wikidata is an integral part. Beireke1 (talk) 08:36, 30 January 2023 (UTC)[reply]
- @Beireke1: this bot is creating a bunch of constraint violations. see for example the references on Veerle Persyn (Q116317378). can we not provide a URL or identifier for these references? I'm also unclear what it means for an organization to have works in a museum as in Leonard (Q116451171). also the items being created don't have descriptions and it seems you could generate valid descriptions. there's a reason we generally require bots to be approved before operation. BrokenSegue (talk) 14:50, 30 January 2023 (UTC)[reply]
- @BrokenSegue: The developers we asked to write this bot activated it more than two years ago and it has been working since without any problems. I didn't know that formal approval was needed or that it had not been obtained. The data always come from official archives or museums, but are not always publicly available online because of limitations of their systems and/or gradually evolving internal open data policies. If a URL is available, this is provided in the reference. If not, the source institution is named to provide provenance of the data. The Leonard (Q116451171) example is a case where the creator / rights holder of a collection item is not a person but an organisation, for example a fashion house in this case. You certainly have a point about the descriptions. I will contact the developers and see what we can do to improve that in future data uploads. Beireke1 (talk) 15:09, 30 January 2023 (UTC)[reply]
- @BrokenSegue The bot seems to be blocked at this point. Is the context above sufficient to unblock it, because it is quite an important part in the workflow of an ongoing project with museum and archives in Flanders (Belgium)? Beireke1 (talk) 15:17, 30 January 2023 (UTC)[reply]
- yes it was blocked by me for not being approved. Maybe we can unblock if we get a promise that someone will go and fix the errors later? Our bot policy is pretty clear Wikidata:Bots and this bot account doesn't even have a bot flag. this page was created more than two years ago but the request was never properly made. BrokenSegue (talk) 15:25, 30 January 2023 (UTC)[reply]
- I always follow up on the bot's activity, hence my deletion request for empty items that were created by the bot. As said above, I will also make sure that the bot can be changed so that descriptions are also provided. I would appreciate it if you could unblock the bot in the meantime. Isn't this page the request page? What else should have been done more than two years ago (19 November 2020, see first message on this page) to request permission? Beireke1 (talk) 08:50, 31 January 2023 (UTC)[reply]
- @Beireke1: the request page was made but it was not embedded onto the Wikidata:Requests for permissions/Bot page where people actually look. The instructions for requesting bot permissions says "To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks. Then transclude that page onto this page, like this: Wikidata:Requests for permissions/Bot/RobotName." The last step was not done. As I mentioned you are not following up on errors introduced by the bot because it is introducing constraint violations and you have not fixed them. Will you fix those errors? BrokenSegue (talk) 15:28, 2 February 2023 (UTC)[reply]
- @BrokenSegue: Are there other constraint violations that should be fixed apart from the fact that a museum is sometimes used as a value with stated in (P248)? Strangely enough however, an archive (Q166118) is allowed as a value and a museum is not. I don't understand the logic there. Why not just add instances of museum as allowed values? The alternative would be that we need to create a separate Wikidata item for each museum's database (that is not always available online in its entirety) just to bypass the constraint violation. Beireke1 (talk) 16:14, 2 February 2023 (UTC)[reply]
- it is unhelpful to say that a statement is asserted in a museum (Q33506). Is it stated in the museum exhibits? On their website? In their archive? How am I to lookup this reference? It's not helpful. but this isn't the only thing I would want to see improved. Can we not do better than Quinet (Q111968317)? Like even add a description? Or an external link? Or an identifier? This item is nearly useless. How is anyone to make use of something so vague. There's also a constraint violation on Leo Dierickx (Q111968260). BrokenSegue (talk) 16:22, 2 February 2023 (UTC)[reply]
- What is the constraint violation on Leo Dierickx (Q111968260)? I can't see any warning on that item. As mentioned before, having descriptions added to new items is something that we can do for future uploads. It would mean that the bot won't create any new Wikidata items anymore if a description can't be provided (to avoid items like Quinet (Q111968317)). I don't really see the difference between a museum and an archival institution as a reference though. Both are cultural institutions with collections that have metadata on their collections as a central aspect of their work. Sometimes an item doesn't have a lot of statements when it is created, but just the fact that a person has work in a certain collection is sometimes enough for others to add additional data at a later stage (see for example the history of Gustave Corthouts (Q106495397)). I agree with you that we need to define the bare minimum that is needed to create an item. I think that having an occupation and a description derived from that occupation is a good one. Beireke1 (talk) 08:53, 3 February 2023 (UTC)[reply]
- The constraint violation was on the qualifier for the copyright status as a creator. Someone "fixed" it in this edit. I'm not clear if that is a correct change. Also the quotations in the references are literally just a year which isn't helpful. And is there really nothing more specific to put in the reference other than the name of the museum? Also why is this person's last name all capitals Veerle Persyn (Q116317378)? BrokenSegue (talk) 20:05, 7 February 2023 (UTC)[reply]
- As copyright rules are not the same in all jurisdictions, I think that the property applies to jurisdiction (P1001) as a qualifier makes very much sense. If the quotations aren't helpful, we could skip adding them of course. The last name in capitals is a mistake in the source data. I will revise those manually to solve that problem. I see two more items with the same problem, but not more. Beireke1 (talk) 16:40, 8 February 2023 (UTC)[reply]
- I can't really say much about the items being created, since it's not an area I edit, but looking at items like Special:History/Q116451094, I think it should try to add the data in fewer edits.
- It should be possible to add all the data in the edit which creates the item, but, at the very least, it could add labels in the edit which creates the item (i.e. combine edits like Special:Diff/1820331590 and Special:Diff/1820331595), and add references in the same edit as the statement (i.e. combine edits like Special:Diff/1820331627 and Special:Diff/1820331647).
- The way it adds labels definitely needs to be fixed. It is adding them using edits like Special:Diff/1820331595 which claims to be clearing the item but is actually adding labels.
- - Nikki (talk) 13:43, 7 February 2023 (UTC)[reply]
- Thanks for your valuable and very specific feedback @Nikki. I'll check with the developer of the bot what they can do. Beireke1 (talk) 15:43, 7 February 2023 (UTC)[reply]
- Following from BrokenSegue's comments: most of the items about artists have no ID's whatsoever, and most of the time no data. This makes extremely borderline the acceptance of such items on Wikidata - they're in a museum catalogue, that's alright, but the museum should have a ID page for them and the item should link to it. Also, in the examples I saw, basically no description was added - another thing that must be fixed in next iterations.
- Given all of that, I would say that before granting bot status to CJMbot, the items should be thoroughly revisited - since you have primary access to data, you should do it, not a volunteer - and then, once the fixes have been made, the status can be granted. --Sannita - not just another it.wiki sysop 12:20, 8 February 2023 (UTC)[reply]
- I agree that what you say is the ideal situation @Sannita, but that's not the reality in most GLAMs. I would like to make a plea for not sitting down and waiting for every museum and archive to have persistent IDs and URIs for every object and/or creator before bringing their basic data about creators to Wikidata. Sometimes even items with few data are very useful for others to enrich at a later stage. See for example the history of Gustave Corthouts (Q106495397) as mentioned before. The kind of description that I can generate will have to be derived from the occupation of a creator. As a revision, I already added English descriptions to all newly created items by the bot (see this batch). Based on the community feedback above, I have the following adaptations to the bot on my list:
- Don't make exact duplicates of already existing references
- Only create new items if minimally these data can be added: label, description, instance of and occupation (for humans).
- Add descriptions (in Dutch and English) to newly created items. These can probably be a combination of occupation and (if known) birth and death date.
- Check if data can be added in fewer edits
- Agree @BrokenSegue@Nikki@Sannita? Beireke1 (talk) 16:35, 8 February 2023 (UTC)[reply]
- Do you at least have identifiers for the references you are adding? Is there a document ID or something that you can refer us to? Or a link? Or is "somewhere in this museum" the best you can do? Personally I don't care about doing it in fewer edits. Mainly I'm concerned that the data you are adding is so undifferentiated that there's no way for anyone to use it. "Some artist named X whose name appears somewhere in this museum". How is anyone to use this? BrokenSegue (talk) 16:41, 8 February 2023 (UTC)[reply]
- @BrokenSegue@Marsupium@Nikki@Sannita I think I worked out a solution to solve this, based on what I see as a good practice on other items. When a reference URL (P854) is available, we'll keep using that as a reference. If not, we would create a distinct item for the database maintained by the archive/museum where the dataset originates from (see for example Royal Library of Belgium (KBR) Online Catalogue (Q104828762)) and use it as a value on catalog (P972). We would combine this with the museum/archive identifier as a value on catalog code (P528) to make the statements verifiable. I believe that this answers important questions raised above regarding constraint violations and data verifiability. I would like to propose this in combination with these changes to the bot that I mentioned before:
- Don't make exact duplicates of already existing references
- Only create new items if minimally these data can be added: label (Len, Lfr, Lnl), description (Den, Dfr, Dnl), instance of (P31), occupation (P106) (for humans) and has works in the collection (P6379).
- Descriptions can be a combination of occupation and (if known) birth and death date.
- Check if data can be added in fewer edits
- Beireke1 (talk) 13:39, 17 February 2023 (UTC)[reply]
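The minimum-data rule and the generated descriptions proposed in the list above could be sketched roughly as follows. The field names (`label`, `instance_of`, `occupation`, `collection`) and the exact wording of the descriptions are assumptions made for illustration; the bot's real implementation is not shown on this page.

```python
from typing import Optional

# Hypothetical field names for one CSV row; the description itself is derived
# from the occupation, so it is not listed as a separate required input.
REQUIRED_FIELDS = ("label", "instance_of", "occupation", "collection")


def can_create_item(row: dict) -> bool:
    """Return True only if every minimally required field has a non-empty value."""
    return all(row.get(field) for field in REQUIRED_FIELDS)


def build_description(occupation: str,
                      birth_year: Optional[int] = None,
                      death_year: Optional[int] = None) -> str:
    """Combine the occupation with birth and death years when they are known."""
    if birth_year and death_year:
        return f"{occupation} ({birth_year}-{death_year})"
    if birth_year:
        return f"{occupation} (born {birth_year})"
    if death_year:
        return f"{occupation} (died {death_year})"
    return occupation


row = {"label": "Example Artist", "instance_of": "Q5",
       "occupation": "painter", "collection": "Q2215500"}
description = build_description(row["occupation"], 1850, 1920)
```

A row that lacks any of the required fields would simply be skipped rather than turned into a near-empty item.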
- ok I'll unblock the bot account so you can do some test edits to show us what this new version will look like. please do not start running the bot full time until this approval process is over though BrokenSegue (talk) 07:06, 18 February 2023 (UTC)[reply]
- To jump in as well: I'd disagree that items for humans with only "label, description, instance of and occupation" should be created. I work a lot with items for artists and that information is not enough to identify a person. There are often enough people with the same name and same occupation and we end up with items that don't make clear which concept they are about, which is a long-term burden for maintenance, also of other items. Also, items of not clearly identifiable entities don't meet Wikidata:Notability criterion 2. Since the potential items in question probably wouldn't meet criteria 1 or 3 either, they shouldn't get created. However, if an item has date of birth (and death for dead people) it's fine I think and can be created.
- Only clicking on some of the last 20 items the bot created I've found Q116234766, Q116234763, nothing apart from label and P31=Q5 and C. J. Bisschop (Q116215346) with clearly wrong date of birth and death. I'd prefer that a bot adding data of that quality not operate at all. If quality improves and there is willingness to fix any mess created in the future one could reconsider. To show such willingness it would be good to start to fix the messy data that is currently live. I'd be glad about such improvements! Best, --Marsupium (talk) 23:48, 14 February 2023 (UTC)[reply]
- @Marsupium The solutions proposed above should avoid data being added like the items that you point to. I made a request for deletion for Q116234766 and Q116234763 and manually improved the birth and death date statements on C. J. Bisschop (Q116215346) based on the source data. Beireke1 (talk) 13:44, 17 February 2023 (UTC)[reply]
- Thanks that's good. But that wasn't really my point. I found the items in a random sample of 20 items. Out of the 554 items the bot created 239 still aren't deleted. I guess data quality of the others might be more or less the same. I hope the "solutions above" won't repeat this. But since there is probably still a lot of messy data in the database that hasn't been cleaned up systematically, there is no reason to assume that in the future other mess that might appear will be taken care of. And that's a bad precondition for a bot in my eyes. So I'd propose to go through those 239 items and clean them up systematically and then continue running the bot. If there are no resources for that, then similar problems might come up in the future and a bot that no one can clean up after shouldn't work in the first place. Sorry for this other long text. Best, --Marsupium (talk) 16:46, 17 February 2023 (UTC)[reply]
- No problem, I'll take the time to revise those in the next weeks and at the same time we'll work on the bot improvements. Beireke1 (talk) 20:58, 21 February 2023 (UTC)[reply]
- @Nikki Where can I find documentation on how to add different statements + references in one edit? It doesn't seem to be possible with the Wikidata API at first sight, according to our developer. Beireke1 (talk) 11:28, 6 June 2023 (UTC)[reply]
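As a rough illustration of the single-edit approach suggested earlier in this thread: the Wikibase API module `wbeditentity` accepts one JSON document (its `data` parameter) that can carry labels, descriptions, and statements with their references together, and with `new=item` it creates the item in that same edit. The sketch below only builds such a JSON document and sends nothing; the helper names, the example label, and the example.org URL are assumptions for illustration, not CJMbot's code.

```python
import json


def item_snak(prop: str, qid: str) -> dict:
    """Snak whose value is another item, e.g. instance of (P31) -> Q5."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
            "type": "wikibase-entityid",
        },
    }


def string_snak(prop: str, text: str) -> dict:
    """Snak for string-valued properties such as reference URL (P854)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": text, "type": "string"},
    }


def statement(mainsnak: dict, reference_snaks: list) -> dict:
    """A statement that already carries its reference, so no second edit is needed."""
    snaks = {}
    for snak in reference_snaks:
        snaks.setdefault(snak["property"], []).append(snak)
    return {"mainsnak": mainsnak, "type": "statement", "rank": "normal",
            "references": [{"snaks": snaks}]}


def new_item_payload(labels: dict, descriptions: dict, statements: list) -> dict:
    """Everything for a new item in one document: labels, descriptions, claims."""
    return {
        "labels": {lang: {"language": lang, "value": value}
                   for lang, value in labels.items()},
        "descriptions": {lang: {"language": lang, "value": value}
                         for lang, value in descriptions.items()},
        "claims": statements,
    }


# Hypothetical example values; the URL is a placeholder, not a real source.
reference = [string_snak("P854", "https://example.org/record/123")]
payload = new_item_payload(
    labels={"en": "Example Artist", "nl": "Example Artist"},
    descriptions={"en": "painter (1850-1920)"},
    statements=[
        statement(item_snak("P31", "Q5"), reference),         # instance of: human
        statement(item_snak("P106", "Q1028181"), reference),  # occupation: painter
    ],
)
data_parameter = json.dumps(payload)  # would be passed as `data` to action=wbeditentity
```

The same `claims` structure can be sent for an existing item (using `id=Q…` instead of `new=item`), which would add several referenced statements in one edit.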
- Do I understand right that the approval is for users working at Meemoo, so that Meemoo is responsible for all the edits? ChristianKl ❪✉❫ 18:29, 18 February 2023 (UTC)[reply]
- The bot writes data on behalf of archives and museums whose staff use the public domain tool developed by meemoo. Beireke1 (talk) 20:52, 21 February 2023 (UTC)[reply]
@BrokenSegue@Nikki@Sannita@Marsupium@ChristianKl. I worked on revising all items previously created by the bot and we are now in the final stage of fixing the bot according to everything said above. We've run tests on test.wikidata.org such as this and this item. Which step should be taken to get final approval to get the bot working again on Wikidata? Beireke1 (talk) 11:41, 12 July 2023 (UTC)[reply]
- Currently the bot leaves no edit description. QuickStatements seems similar to what the tool from meemoo is supposed to do. It has the batch concept that allows to undo one batch with a single click and which also provides information about the user that's responsible for a given batch. From my perspective it would make sense if this tool would work the same way.
- Without any way to know who's responsible for an edit it will be hard to talk to people who in the future use the tool to create items we think are not notable or otherwise problematic. ChristianKl ❪✉❫ 11:57, 12 July 2023 (UTC)[reply]
- I agree that this would theoretically be the ideal situation, but that would require every user to have a Wikimedia account, which raises the barrier to using the tool as most users are new to the Wikimedia environment. It would also mean quite a lot of extra development for which there simply is no budget. Instead, @Meemoo BE clearly takes responsibility. To have more direct contact points, I just added my personal user account and that of my colleague to @Meemoo BE's talk page. I hope that this solves your concern about accountability and the possibility to talk to people if something should be discussed. Apart from that, could you also help me with the step(s) to be taken to get the bot approved? Beireke1 (talk) 09:12, 13 July 2023 (UTC)[reply]
- (Referring to https://test.wikidata.org/wiki/Q231706.) "Quotation" in the reference of floruit (P1317) looks a bit weird to me. Did you want object stated in reference as (P5997) perhaps?
- Is it feasible to add qualifiers such as catalog (P972): Collectie Stad Oostende and catalog code (P528): SM002305 to the references? --Azertus (talk) 09:38, 23 August 2023 (UTC)[reply]
- Hi @Azertus Good suggestions, that I already partly put into practice when manually correcting incomplete edits from the past (see for example L.V. Cauwenbergh (Q104056146)). I will check the development costs to make these changes to the bot. After that, which step should be taken to get final approval to get the bot working again on Wikidata? Beireke1 (talk) 08:18, 29 August 2023 (UTC)[reply]
- If you think all concerns have been addressed, I imagine you could ping an administrator... Maybe you should reconsider the fact that your users will not have Wikimedia accounts, though. It doesn't seem like too high a barrier. You're asking partner institutions' staff to use a tool of yours. The code for authenticating with a WM account is there (Meemoo's); it doesn't seem like it would take much to add the requirement of creating their own accounts. The lack of transparency could be the determining factor in whether or not your bot would be allowed; see e.g. this discussion I just encountered and which made me think of this proposal.
- But I do think the bot could be approved without that feature; perhaps on a trial basis? @BrokenSegue: Could they do a trial run at this point? Azertus (talk) 13:09, 8 September 2023 (UTC)[reply]
- @Beireke1: Also, maybe the Edit groups tool could be a good solution? As I understand it, you could group batches of edits and I think you could add a string identifying the operator (person or institution). That doesn't seem like it would entail much development, if any... Azertus (talk) 13:20, 8 September 2023 (UTC)[reply]
- Thanks for your constructive thoughts, @Azertus. The operating institution would currently already be visible, because it is mentioned in the reference to each statement made or updated. Regarding your first suggestion: the bot is operated by @Meemoo BE. That user was created together with the bot. Making every user authenticate with their own Wikimedia account to write data with the bot would require a development budget that we currently simply don't have. If it is generally seen as an improvement, we could try to budget this in the future though. Meemoo is a stable organisation with several employees linked on its user page who can be contacted in case of problems. I believe that this, in combination with the institutions, catalogs and catalog codes mentioned in the references, offers sufficient transparency, more than is often the case with batch edits via QuickStatements. Beireke1 (talk) 12:12, 11 September 2023 (UTC)[reply]
Based on the above feedback from the community, the bot has been redeveloped. I did a test with one new bot-created item (Alain Braekevelt (Q123945125)) that shows that the minimal requirements to create items are met. As the most active participants in the above discussion, could any of you @BrokenSegue @Nikki @Sannita @Azertus give the bot the correct permission to be fully operational again? Beireke1 (talk) 10:18, 21 December 2023 (UTC)[reply]
- @Beireke1: I had a look at Alain Braekevelt (Q123945125). The reference doesn't appear to be correct:
- collection (P195) (set to Museum van Deinze en de Leiestreek (Q2215500)) uncommon, but fine
- catalog code (P528) (set to "0644/BRA.a-1") looks like an inventory number (P217) to me (example other item)
- catalog (P972) set to Axiell Collections (Q110271756) is incorrect. Now you're saying we have the huge Axiell Collections catalog, but it's actually the software they're running. Seems irrelevant in this context so should just be removed. See Q19912042#P528 as example of correct catalog usage.
- Can you update the two example items and import another one? Do you have a link to the source code? Multichill (talk) 13:52, 24 December 2023 (UTC)[reply]
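The reference shape suggested in this comment (collection (P195) plus inventory number (P217), with catalog (P972) dropped) could be sketched as follows. This is only an illustration in the standard Wikibase JSON layout, not the bot's actual code; the helper names `item_snak`, `string_snak` and `build_reference` are hypothetical.

```python
import json


def item_snak(prop: str, qid: str) -> dict:
    """Return a Wikibase JSON snak whose value is an item, e.g. Q2215500."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {
                "entity-type": "item",
                "numeric-id": int(qid[1:]),  # "Q2215500" -> 2215500
                "id": qid,
            },
            "type": "wikibase-entityid",
        },
    }


def string_snak(prop: str, text: str) -> dict:
    """Return a Wikibase JSON snak whose value is a plain string."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": text, "type": "string"},
    }


def build_reference(collection_qid: str, inventory_number: str) -> dict:
    """Build a reference with collection (P195) and inventory number (P217).

    catalog (P972) is deliberately left out, following the feedback above.
    """
    return {
        "snaks": {
            "P195": [item_snak("P195", collection_qid)],
            "P217": [string_snak("P217", inventory_number)],
        },
        "snaks-order": ["P195", "P217"],
    }


# Values taken from the Alain Braekevelt (Q123945125) example discussed above.
reference = build_reference("Q2215500", "0644/BRA.a-1")
print(json.dumps(reference, indent=2))
```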
- Hi @Multichill, thanks for your feedback. You are completely right about inventory number (P217). I will ask our developer to change this. Regarding values for catalog (P972), I understand your concern. Removing it could be an option. The reason it was included in the reference was to maximise verifiability of the statements. I based it on an existing practice of others who work a lot with similar data on Wikidata. I will forward you my email correspondence about this. Beireke1 (talk) 09:15, 8 January 2024 (UTC)[reply]
- @Multichill I updated Alain Braekevelt (Q123945125) manually, making use of inventory number (P217), and added Albijn Van den Abeele (Q124256155) as a new bot-created item, so it is transparent that this requested change to the bot was made. Beireke1 (talk) 11:13, 12 January 2024 (UTC)[reply]
- @Multichill The code of the bot can be found in this Gitlab repository.
- @BrokenSegue @Nikki @Sannita @Azertus can we move forward from here with the bot permission? Beireke1 (talk) 08:30, 16 January 2024 (UTC)[reply]
- @BrokenSegue @Nikki @Sannita @Azertus @Multichill Gentle reminder... or could you at least point me to the people I should turn to? Many thanks! Beireke1 (talk) 16:49, 8 February 2024 (UTC)[reply]
- In the two mentioned examples, the dates should not contain spaces (I corrected them: 1; 2). Could this be fixed in the bot code? Thanks, --Epìdosis 17:16, 12 February 2024 (UTC)[reply]
- Another question (since Q124256155 was in fact a duplicate of the existing Albijn Van den Abeele (Q1862458)): which is the mechanism that is used to decide if a new item should be created or an existing item should be updated? If it is OpenRefine, it's strange that this duplicate was created, since the name and the dates are exactly the same. Could you maybe create a few other new items (e.g. 20) so that we can check if the matching system works correctly and doesn't generate too many duplicates? --Epìdosis 17:21, 12 February 2024 (UTC)[reply]
- Thanks for your reaction, @Epìdosis. I will ask our developer to change the way the descriptions are built up (leaving out the spaces). The tool queries the Wikidata API, and what is returned is always verified by a human. This duplicate could be due either to a human mistake or to a temporary flaw in the API that caused it not to return the expected result. Good idea to make a test batch with about 20 new items, but I'll do that after the above change to the bot code is made. Beireke1 (talk) 09:03, 13 February 2024 (UTC)[reply]
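The two points raised in this exchange (life dates in descriptions written without spaces, and reusing an existing item when name and dates are identical) can be sketched as below. This is a minimal illustration under stated assumptions, not the tool's actual matching logic: the `Person` record, the description pattern and the helpers `build_description` and `find_existing` are all hypothetical, and the sample data is invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record for a CSV row or a reconciliation candidate."""
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    qid: Optional[str] = None  # set only for candidates that already exist


def build_description(occupation: str,
                      birth_year: Optional[int],
                      death_year: Optional[int]) -> str:
    """Build a description whose date range contains no spaces."""
    if birth_year is None and death_year is None:
        return occupation
    birth = "" if birth_year is None else str(birth_year)
    death = "" if death_year is None else str(death_year)
    return f"{occupation} ({birth}-{death})"


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.casefold().split())


def find_existing(record: Person, candidates: Iterable[Person]) -> Optional[str]:
    """Return the QID of a candidate with the same name and dates, if any."""
    for candidate in candidates:
        if (normalize(candidate.name) == normalize(record.name)
                and candidate.birth_year == record.birth_year
                and candidate.death_year == record.death_year):
            return candidate.qid
    return None


# Invented sample data; "Q_EXISTING" is a placeholder, not a real item.
row = Person("Example  Person", 1900, 1980)
candidates = [Person("example person", 1900, 1980, qid="Q_EXISTING")]

match = find_existing(row, candidates)
print(match if match else "no match: create a new item")
print(build_description("painter", row.birth_year, row.death_year))
```

An exact match on name and dates is a deliberately strict rule; a human check of the candidates, as described above, would still be needed for near matches.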
- Is there a reason why multiple users need to operate behind a shared account instead of individual ones (which likely wouldn't need bot permissions if the tool used a tag or OAuth)? Abbe98 (talk) 12:15, 13 February 2024 (UTC)[reply]
- The reason was to keep the barrier as low as possible for professionals who work on collections without being Wikimedians. The data quality barriers to allow data to be pushed to Wikidata are fairly high, and were even raised following the community discussions above. I understand however that using a tag or OAuth would be a better solution from the Wikidata perspective. My suggestion is that I investigate this option in more detail (including the budget needed to make such a change), but that the existing data that I have put aside for more than half a year now can be uploaded to Wikidata. Agree? Beireke1 (talk) 18:14, 26 February 2024 (UTC)[reply]
- For me it's a good solution. Epìdosis 00:16, 27 February 2024 (UTC)[reply]
- Thanks for your reaction. I'll post it here when I have about 20 new items published. In the meantime, I'll check on the tag and/or OAuth options. Is this the best place to start in terms of documentation? Beireke1 (talk) 08:44, 4 March 2024 (UTC)[reply]
- @Beireke1: I'm not the best person to answer about tag and/or OAuth, since I have no direct experience; it's surely fine that you post here 20 new example items. Thanks! Epìdosis 08:56, 4 March 2024 (UTC)[reply]
- I'm also not the best person to ask, but this seems like a good place to start. And here's how Author Disambiguator (php, GPLv3) does authentication. --Azertus (talk) 13:15, 6 March 2024 (UTC)[reply]
- @Epìdosis After the requested change about the space in the descriptions was made, I uploaded a small batch of data in the tool, resulting in 2 new Wikidata items (A. Bosmans (Q125087458) and Anton Wittoeck (Q125086700)) and 10 edited ones (e.g. Albert Baertsoen (Q2830938)). Can I obtain the needed approval for the bot to continue processing the data from the project that I described above? I asked our developer to investigate the tag and OAuth options in more detail, so if the cost estimate is within our means we can make that switch later this year. Beireke1 (talk) 16:22, 22 March 2024 (UTC)[reply]
- Support in my opinion the example edits are OK and the approval can be granted. Thanks for the updates! Epìdosis 16:37, 22 March 2024 (UTC)[reply]
Thanks for your help and feedback, @Epìdosis. I hoped to get more community reaction here. Is there an admin who can grant approval? @BrokenSegue @Nikki @Sannita @Azertus @Multichill: as you were involved in the discussions above, I hope that one of you can do this or at least ping someone who can. Thanks! Beireke1 (talk) 08:17, 9 April 2024 (UTC)[reply]
- I will approve the bot in a couple of days provided no objections have been raised. Ymblanter (talk) 13:01, 10 April 2024 (UTC)[reply]
- +1! --Azertus (talk) 16:55, 12 April 2024 (UTC)[reply]