Wikidata:Requests for permissions/Bot/NiraliBot
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 18:41, 13 July 2021 (UTC)
NiraliBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Nirali Sahoo (talk • contribs • logs)
Task/s: Import missing Soccerway player ID (P2369) from English Wikipedia articles
Code: import_soccerway_id.py
Function details: Goes through the articles in en:Category:Soccerway template with ID not in Wikidata, extracts the Soccerway IDs, and adds them to Wikidata as Soccerway player ID (P2369) identifiers. The main process involves the following steps:
- Extracts the Soccerway ID from the Wikipedia article and verifies its correctness against the official site (Soccerway)
- If the ID in the Wikipedia article is missing or incorrect, finds the correct ID on the official site
This script is part of the Outreachy program under Mike Peel, and I intend to keep running it biweekly even after the program ends. --Nirali Sahoo (talk) 16:38, 30 June 2021 (UTC)
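The extraction step described above could look roughly like the following. This is a minimal sketch and is not taken from import_soccerway_id.py: the function name `extract_soccerway_id`, the assumed template forms (`{{Soccerway|<id>}}` and `{{Soccerway|id=<id>|...}}`), and the assumed ID shape (slug, slash, digits) are all illustrative assumptions.

```python
import re

# Assumption: a player ID is a lowercase slug, a slash, then digits,
# e.g. "anumanthan-mohan-kumar/234754".
ID_PATTERN = re.compile(r"[a-z0-9-]+/\d+")

# Assumption: the template is used as {{Soccerway|<id>}} or {{Soccerway|id=<id>|...}}.
TEMPLATE_PATTERN = re.compile(r"\{\{\s*soccerway\s*\|([^{}]*)\}\}", re.IGNORECASE)


def extract_soccerway_id(wikitext):
    """Return the first player-shaped Soccerway ID in the wikitext, or None."""
    for match in TEMPLATE_PATTERN.finditer(wikitext):
        params = [p.strip() for p in match.group(1).split("|")]
        candidate = None
        # Prefer an explicit id= parameter.
        for param in params:
            key, sep, value = param.partition("=")
            if sep and key.strip().lower() == "id":
                candidate = value.strip()
                break
        # Otherwise fall back to the first positional parameter.
        if candidate is None:
            positional = [p for p in params if "=" not in p]
            candidate = positional[0] if positional else None
        if candidate:
            candidate = candidate.strip("/")
            # Reject anything that is not shaped like a player ID
            # (for instance a multi-segment competition path).
            if ID_PATTERN.fullmatch(candidate):
                return candidate
    return None
```

A shape check like this only filters out values that cannot be player IDs; it does not replace verifying the ID against the official site.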
- can we see some (maybe 50) test edits? the logic in checkAuthenticity has me worried. seems fairly heuristic and prone to breaking. why do we need this? i'm also worried that searchPlayer will return bad results when we have multiple players with the same name (which will definitely happen). BrokenSegue (talk) 18:07, 30 June 2021 (UTC)
- checkAuthenticity() was included because some Wikipedia articles contain misleading or incorrect IDs (for example, Afiq Noor, which has the ID anumanthan-mohan-kumar/234754, or articles that carry IDs for competitions, such as 2020 Uzbekistan First League).
As for the ambiguity caused by players sharing the same name, I added an extra check that matches the players' dates of birth.
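A date-of-birth check of this kind might be sketched as follows. This is an illustration under stated assumptions, not the bot's actual code: `pick_candidate`, the candidate dictionaries, and their `id` and `dob` keys are hypothetical, and the IDs in the usage example are made up.

```python
from datetime import date


def pick_candidate(candidates, wikidata_dob):
    """Return the ID of the only candidate whose birth date matches, else None.

    candidates: list of dicts with hypothetical keys "id" and "dob".
    wikidata_dob: the date of birth recorded on the Wikidata item, or None.
    """
    if wikidata_dob is None:
        # Nothing to compare against, so do not guess.
        return None
    matches = [c["id"] for c in candidates if c.get("dob") == wikidata_dob]
    # Accept only an unambiguous result; zero or several matches are skipped.
    return matches[0] if len(matches) == 1 else None


# Two same-named players (made-up IDs) told apart by date of birth.
results = [
    {"id": "john-smith/111", "dob": date(1990, 1, 1)},
    {"id": "john-smith/222", "dob": date(1995, 5, 5)},
]
chosen = pick_candidate(results, date(1995, 5, 5))
```

Returning None whenever the match is missing or ambiguous leaves the item for manual review rather than risking a wrong identifier.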
Kindly check Special:Contributions/NiraliBot for the test edits. Nirali Sahoo (talk) 08:04, 1 July 2021 (UTC)
- Ok that all seems sensible. Only other thing I'd suggest is that if we are importing the data from enwiki we should probably add a reference to the data indicating as much (e.g. as in the change I made at [1]). BrokenSegue (talk) 15:09, 1 July 2021 (UTC)
- Thank you. I modified the code to add the import information ([2], though this edit was made in the sandbox, without the check for duplicate ID in another page)
- @Nirali Sahoo: am I reading your code correctly? it seems like you are adding the "imported from enwiki" reference even when it's not imported and you did the search yourself. BrokenSegue (talk) 02:15, 2 July 2021 (UTC)
- Yes, you were right. I didn't think of it; apologies for the oversight on my part. I modified the code (import_soccerway_id.py) to add the source only if the ID is imported from the enwiki article. Nirali Sahoo (talk) 16:07, 2 July 2021 (UTC)
- thanks. Support. BrokenSegue (talk) 16:15, 2 July 2021 (UTC)
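The agreed behaviour, attaching the enwiki reference only when the ID really came from the article, can be sketched like this. It is a simplified illustration, not the code in import_soccerway_id.py: `build_statement` and the plain-dictionary layout are assumptions and do not mirror the Wikibase JSON format or any pywikibot call. The identifiers used are P2369 (Soccerway player ID), P143 (imported from Wikimedia project) and Q328 (English Wikipedia).

```python
SOCCERWAY_PLAYER_ID = "P2369"  # Soccerway player ID
IMPORTED_FROM = "P143"         # imported from Wikimedia project
ENGLISH_WIKIPEDIA = "Q328"     # English Wikipedia


def build_statement(soccerway_id, from_enwiki):
    """Describe the statement to add, as a plain dict (hypothetical layout).

    from_enwiki is True only when the ID was read from the enwiki article,
    and False when it was found by searching the official site.
    """
    statement = {
        "property": SOCCERWAY_PLAYER_ID,
        "value": soccerway_id,
        "references": [],
    }
    if from_enwiki:
        # Only claim the enwiki import when that is where the ID came from.
        statement["references"].append({IMPORTED_FROM: ENGLISH_WIKIPEDIA})
    return statement
```

Keeping the source decision in a single boolean makes it harder to attach the import reference by accident on the search path.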
- I am going to approve the bot in a couple of days provided no objections have been raised.--Ymblanter (talk) 19:22, 9 July 2021 (UTC)