Wikidata:Requests for permissions/Bot/APSbot 3
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 08:19, 12 June 2017 (UTC)
APSbot 3
APSbot
Operator: ArthurPSmith
Task/s: Create Wikidata items for GRID database entries that do not currently have one.
Code: GitHub repo (still under development). Note that the "create" script uses WikidataIntegrator instead of pywikibot.
Function details: GRID releases an update of its research organization database roughly every month. Over the past year or so we have been matching GRID records to Wikidata items via Mix n' Match and other correlation factors such as ISNI, or simply by matching on URL, name and/or country. There are now over 32,000 Wikidata items with a GRID ID (P2427) statement. We are fairly confident that few of the remaining 40,000 organizations in the GRID database have existing Wikidata items. The GRID team has also been putting considerable effort into cleanup and de-duplication, and the most recent release (22 May 2017) included several hundred new de-duplications, partly based on input from Wikidata users. The proposal now is to import the remaining GRID organizations into Wikidata using this bot account. The bot would create each item with an English label and description, add the GRID ID (P2427), provide an appropriate instance of (P31) value based on the GRID type classification, and add country (P17) and official website (P856) where available from the GRID data. Sample new items are here and here.
--ArthurPSmith (talk) 20:05, 6 June 2017 (UTC)
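As a rough illustration of the item-creation step described in the request, the sketch below turns one unmatched GRID record into the label, description and statements the bot would write. It is not the bot's actual code: the record field names (`id`, `name`, `type`, `country`, `website`), the type-to-class table, the description wording and the `plan_new_item` helper are all assumptions made for this example, and it only builds a plan rather than calling WikidataIntegrator.

```python
# Illustrative GRID type -> Wikidata class table for P31.
# These entries are assumptions for the sketch, not the bot's real mapping;
# verify each QID before using it.
GRID_TYPE_TO_P31 = {
    "Company": "Q4830453",   # business
    "Nonprofit": "Q163740",  # nonprofit organization
}
DEFAULT_P31 = "Q43229"  # organization, used when the GRID type has no mapping


def plan_new_item(record, country_qids):
    """Return the terms and claims for one GRID record without a Wikidata item.

    record       -- dict with hypothetical keys: id, name, type, country, website
    country_qids -- dict mapping a country name to its Wikidata QID
    """
    claims = [
        ("P2427", record["id"]),  # GRID ID
        ("P31", GRID_TYPE_TO_P31.get(record.get("type"), DEFAULT_P31)),
    ]

    # Country and website are optional in this sketch, as in the request.
    country = record.get("country")
    country_qid = country_qids.get(country)
    if country_qid:
        claims.append(("P17", country_qid))
    if record.get("website"):
        claims.append(("P856", record["website"]))

    # Placeholder description format, assumed for the example.
    description = f"organization in {country}" if country_qid else "organization"

    return {
        "labels": {"en": record["name"]},
        "descriptions": {"en": description},
        "claims": claims,
    }
```

Keeping the plan as plain data makes it easy to review a batch of proposed items before anything is written; the write step itself would then go through whichever client library the operator uses.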
- I think this bot request is very welcome and is the natural next step in our work with GRID. We should also stress that the way GRID is curated gives fairly good guarantees that the institutions it contains satisfy the inclusion criteria of Wikidata: these are institutions that appear as affiliations on research papers or in funding data, so they are likely to be described by many other resources.
- A few suggestions about the import:
- Why not also add the other external identifiers referenced by GRID, such as ISNI (P213)? This would help us monitor the rate of duplicates on the constraint violation reports.
- Some GRID records also have labels in other languages, or acronyms, that could be added to the items as labels and aliases (for example Département des Finances et des Relations Extérieures (Q30144179)). I don't know if that is easy to do with WikidataIntegrator though. But I think it would be worth the effort as it would increase the visibility of the items, which should also help avoid duplication.
- Obviously there are many other things we will want to add (for instance the relations with other institutions), but that can be done separately, as it should also cover the items that already have GRID IDs.
- Excited to see that we are getting closer to a 1-1 correspondence between GRID and Wikidata! − Pintoch (talk) 21:38, 6 June 2017 (UTC)
- I'm also excited about this work. Thank you for your time! Runner1928 (talk) 21:44, 6 June 2017 (UTC)
- Thanks! @Pintoch: I wasn't sure the ISNI relationships were particularly reliable, but I certainly could add them. Adding labels in other languages and aliases should also be straightforward; I'll look into that. WikidataIntegrator is very easy to use, although some bits of it seem a little overly customized for biomedical stuff... ArthurPSmith (talk) 14:56, 7 June 2017 (UTC)
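The additions discussed above (ISNI identifiers, labels in other languages, acronyms as aliases) could be collected along these lines. This is only a sketch under stated assumptions: the record layout (`labels` as a list of `lang`/`text` pairs, plus `aliases`, `acronyms` and `isni` lists) and the `extra_terms_and_ids` helper are hypothetical and do not reflect the real GRID export format or the bot's code.

```python
def extra_terms_and_ids(record):
    """Collect non-English labels, English aliases and ISNI claims.

    record is a dict with hypothetical keys:
      labels   -- list of {"lang": ..., "text": ...} entries
      aliases  -- list of alternative names
      acronyms -- list of acronyms
      isni     -- list of ISNI strings
    """
    # Wikidata allows one label per language, so keep the first one seen.
    labels = {}
    for entry in record.get("labels", []):
        labels.setdefault(entry["lang"], entry["text"])

    # Merge aliases and acronyms, dropping duplicates but keeping order.
    names = record.get("aliases", []) + record.get("acronyms", [])
    aliases = {"en": list(dict.fromkeys(names))} if names else {}

    # One ISNI (P213) claim per identifier listed for the record.
    claims = [("P213", isni) for isni in record.get("isni", [])]

    return {"labels": labels, "aliases": aliases, "claims": claims}
```

Treating acronyms as English aliases is a simplification made for the example; an acronym tied to a non-English name may belong under that language instead.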
Support --PokestarFan • Drink some tea and talk with me • Stalk my edits • I'm not shouting, I just like this font! 00:15, 7 June 2017 (UTC)