Wikidata:Requests for permissions/Bot/Maria research bot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Not done @Mahdimoqri: This request seems to be abandoned, please reopen it if that is not the case. Thanks. Mike Peel (talk) 20:13, 21 July 2020 (UTC)
maria research bot
Operator: Mahdimoqri (talk • contribs • logs)
Task/s: add missing articles and citations information for articles listed on PubMed Central
Code: https://github.com/moqri/wikidata_scientific_citations
Function details: --Mahdimoqri (talk) 06:15, 13 March 2018 (UTC)
- Support Mahir256 (talk) 22:37, 13 March 2018 (UTC)
- Comment This Fatameh-based script is useful for most of phase 1 and works fine for PubMed IDs and for some Crossref IDs as well, but it does not address the citation part from phase 2 onwards. --Daniel Mietchen (talk) 13:27, 14 March 2018 (UTC)
- Thanks Daniel Mietchen, I modified the description of the task here to confirm what the bot does at the moment. Mahdimoqri (talk) 15:52, 14 March 2018 (UTC)
- Support That looks good to me. --Daniel Mietchen (talk) 19:54, 14 March 2018 (UTC)
- Support The Fatameh edits from this bot seem fine so far. It is a nice simple script. I note some Fatameh artifacts in the titles, e.g., "*." in BOOKWORMS AND BOOK COLLECTING (Q50454030). But I suppose we have to live with that... -- Finn Årup Nielsen (fnielsen) (talk) 18:44, 14 March 2018 (UTC)
- I was going to write the same thing. Can we remove the trailing full stop (".")? I'm sure some bot could clean up the existing ones as well. --- Jura 20:37, 14 March 2018 (UTC)
- Thanks Finn Årup Nielsen (fnielsen) and Jura, I would be happy to add another script to remove asterisks or to fix any other issues you find, after the PMC items are added. Mahdimoqri (talk) 23:10, 14 March 2018 (UTC)
- For the final dot, can you remove this before adding it to the label/title statement? --- Jura 23:17, 14 March 2018 (UTC)
- Thanks Jura! Unfortunately, as far as I know, Fatameh does not have any out-of-the-box option for such changes. I'd recommend that a separate script be written just for this purpose, since there are currently 14 million other articles that have this problem (https://www.wikidata.org/w/index.php?tagfilter=OAuth+CID%3A+843&limit=50&days=7&title=Special:RecentChanges&urlversion=2). Daniel Mietchen might be interested in such a script too. Mahdimoqri (talk) 02:51, 15 March 2018 (UTC)
- @T Arrow, Tobias1984: could you fix Fatameh? --- Jura 07:21, 15 March 2018 (UTC)
- There is a task for it here: https://phabricator.wikimedia.org/T172383 Mahdimoqri (talk) 15:08, 15 March 2018 (UTC)
- Do any of the people who wrote the code actually follow phabricator? I tried to find the part of the code where the dot gets added/should be removed, but I was probably in the wrong module. Any ideas? --- Jura 05:16, 16 March 2018 (UTC)
- I'm just not checking it all that regularly. I've replied to the ticket. Fatameh relies on WikidataIntegrator to do most of the heavy lifting. This uses PubMed as the data source and (unfortunately?) they actually report all the titles as ending in a period (or other punctuation). I think we need to find a reference for the titles without the period rather than just changing all the existing statements. There was a short discussion on the WikiCite Mailing List as well. I'm happy to work on a solution but I'm not really sure what is the best way forward. T Arrow (talk) 09:26, 16 March 2018 (UTC)
- Jura, I added the fix for the trailing dots and asterisks in a separate script (fatameh_sister_bot). Any other issues that I can address to have your support? Mahdimoqri (talk) 06:22, 17 March 2018 (UTC)
Thanks all for providing feedback and offering solutions/help to address the issue with Fatameh. It seems it will be a fix either for Fatameh or a separate script. In either case, it is to be applied to all article items, which I believe could be done independently of this bot. Meanwhile, could you support and accept this bot so I can get it started and maybe set up a new bot for fixing other issues? Mahdimoqri (talk) 21:12, 16 March 2018 (UTC)
- Oppose I don't think we should approve another Fatameh-based bot until major concerns are fixed. --Succu (talk) 21:24, 16 March 2018 (UTC)
- Thanks for your feedback Succu. I just created a bot (Fatameh_sister_bot) that fixes the issue with the label for the items created using Fatameh. I'll make sure I run it on everything maria research bot creates to address the concern with the titles. Are there any other issues that I can address? Mahdimoqri (talk) 06:04, 17 March 2018 (UTC)
- @Succu: I also fixed this issue at the root in the Fatameh source code here, so new items are created without the trailing dot.
- Title statements would need the same fix, and some labels have already been duplicated into other languages (maybe this is taken care of, but I haven't seen any in the samples). --- Jura 09:35, 18 March 2018 (UTC)
- Thanks for the feedback Jura. The translated labels (if any) are added to labels. I will take care of the title statement now.
- @Jura1: the titles are also fixed and the code has been updated (https://github.com/moqri/wikidata_scientific_citations/blob/master/fatameh_sister_bot/fix_labels_and_titles.py). Any other issues that I can address to have your support for the bot?
- I think the cleanup bot/task can be authorized. --- Jura 12:30, 21 March 2018 (UTC)
- @Jura1: wonderful! This is the request for the cleanup bot: fatameh_sister_bot. Could you please state your support there, for a bot flag?
- I don't think edits like this one are OK, Mahdimoqri, because you are ignoring the reference given. And please wait with this kind of correction until you have the flag. --Succu (talk) 22:36, 22 March 2018 (UTC)
- @Succu: the title in the reference is not exactly correct. Please refer to this reference or this reference for the correct title. Would you like the bot to change the reference as well? (unsigned comment)
- The cleanup should be fine. It just strips an artifact PMD adds. --- Jura 09:24, 23 March 2018 (UTC)
- Translated titles are enclosed within brackets. This should be changed. The current version overwrites existing page(s) (P304) with incomplete values. --Succu (talk) 10:08, 18 March 2018 (UTC)
- @Succu: thanks for the feedback! I could not find any instance of either of the issues! Could you please reply with one instance of each of these two issues that is created by my bot so that I can address them? Mahdimoqri (talk) 03:17, 19 March 2018 (UTC)
- [Sexually-transmitted infection in a high-risk group from Montería, Colombia]. (Q50804547) is an example of the first issue. Removing only the brackets is not the solution. --Succu (talk) 22:30, 22 March 2018 (UTC)
- I will not import any items with translated titles (until there is a consensus on what the solution is). Mahdimoqri (talk) 14:06, 30 March 2018 (UTC)
- We should try to figure out how to handle them (e.g. import the original language and delete the "title" statement, possibly find the original title and add that as title and label in that language). For new imports, it would just need to skip adding the title statement and add a language of work or name (P407). --- Jura 09:24, 23 March 2018 (UTC)
- Or use original language of film or TV show (P364). Anyway, it should be made clear that the original title is not English. -- JakobVoss (talk) 14:45, 24 March 2018 (UTC)
- Should we attempt to add a statement that identifies them as not being in English before we actually manage to determine the original language? --- Jura 21:12, 24 March 2018 (UTC)
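For illustration only, the two title checks discussed in this thread (stripping the trailing full stop and stray asterisks, and spotting bracketed translated titles so they can be skipped rather than imported) might be sketched as below. This is not the code of fatameh_sister_bot or Fatameh; the function names `clean_title` and `is_translated_title` are hypothetical, and the bracket test is a simple heuristic that could misfire on titles that legitimately begin and end with square brackets.

```python
import re


def is_translated_title(title: str) -> bool:
    """Heuristic: the whole title is wrapped in [ ... ], optionally followed by a full stop."""
    return re.fullmatch(r"\[.*\]\.?", title.strip(), flags=re.DOTALL) is not None


def clean_title(title: str) -> str:
    """Remove one trailing full stop and any trailing asterisks (hypothetical helper)."""
    cleaned = title.strip()
    # Drop a single trailing full stop, but leave an ellipsis ("...") untouched.
    if cleaned.endswith(".") and not cleaned.endswith(".."):
        cleaned = cleaned[:-1].rstrip()
    # Drop stray trailing asterisks such as the "*." artifact noted above.
    return cleaned.rstrip("*").rstrip()


if __name__ == "__main__":
    examples = [
        "BOOKWORMS AND BOOK COLLECTING*.",
        "[Sexually-transmitted infection in a high-risk group from Montería, Colombia].",
    ]
    for raw in examples:
        if is_translated_title(raw):
            # Per the discussion, translated titles are skipped until there is consensus.
            print("skip (translated title):", raw)
        else:
            print("cleaned:", clean_title(raw))
```

Titles ending in "?" or "!" pass through unchanged, since only a lone trailing full stop and trailing asterisks are treated as artifacts here.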