Wikidata:Property proposal/Ringgold identifier
Ringgold identifier
Originally proposed at Wikidata:Property proposal/Organization
Description | unique identifier for organisations in the publishing industry supply chain |
---|---|
Represents | Ringgold identifier (Q17016896) |
Data type | External identifier |
Domain | organisations |
Allowed values | \d{4,6} |
Example | Wellcome Trust (Q326276) → 5072 |
Source | ORCID API: https://members.orcid.org/api/tutorial-retrieve-data-using-public-api |
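The allowed-values pattern `\d{4,6}` can be checked mechanically. Below is a minimal Python sketch. It assumes the pattern must match the whole value rather than a substring, and that only ASCII digits count. The function name `is_valid_ringgold` is hypothetical.

```python
import re

# Allowed-values pattern from the proposal: 4 to 6 digits.
# re.ASCII stops \d from also matching non-ASCII digits (e.g. Arabic-Indic digits).
RINGGOLD_PATTERN = re.compile(r"\d{4,6}", re.ASCII)


def is_valid_ringgold(value: str) -> bool:
    """Return True if the entire string matches the allowed-values pattern."""
    return RINGGOLD_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["5072", "72971", "123", "1234567", "50 72"]:
        print(candidate, is_valid_ringgold(candidate))
```

Using `fullmatch` instead of `search` keeps a value like `1234567` from passing just because it contains six digits.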
- Motivation
(Add your motivation for this property here.) GZWDer (talk) 18:23, 16 January 2017 (UTC)
- Discussion
- Oppose - this is proprietary information and I do not believe it either belongs in Wikidata or is legally allowed to be placed here. ArthurPSmith (talk) 19:41, 17 January 2017 (UTC)
- Support - ORCID iDs (P496) are assigned to researchers, and the ORCID API can return the list of institutions a researcher is affiliated with. Very often these institutions come with a Ringgold ID, so it would be useful to include these identifiers in the institution items. This would enable us to create more links between researchers and institutions. I do not believe that using this source of Ringgold IDs could raise a legal issue, as the ORCID data dump is released under a license that is compatible with CC0. Pintoch (talk) 18:14, 18 January 2017 (UTC)
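The following minimal Python sketch illustrates the retrieval described above. It extracts Ringgold IDs from affiliation records that have already been fetched. The record layout and its field names (`organization`, `disambiguated-organization`, `disambiguation-source`, `disambiguated-organization-identifier`) are assumptions modelled on ORCID's affiliation format, so check them against the ORCID API documentation. The HTTP request is omitted, and the helper name `ringgold_affiliations` is hypothetical.

```python
from typing import Any


def ringgold_affiliations(affiliations: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (organisation name, Ringgold ID) pairs for Ringgold-disambiguated affiliations.

    ASSUMPTION: each record resembles an ORCID affiliation summary, with an
    "organization" object that may hold a "disambiguated-organization" block.
    """
    pairs: list[tuple[str, str]] = []
    for record in affiliations:
        org = record.get("organization") or {}
        disambiguated = org.get("disambiguated-organization") or {}
        # Keep only identifiers whose disambiguation source is Ringgold.
        if disambiguated.get("disambiguation-source") == "RINGGOLD":
            pairs.append((org.get("name", ""),
                          disambiguated["disambiguated-organization-identifier"]))
    return pairs


if __name__ == "__main__":
    # Hypothetical records; Wellcome Trust -> 5072 is the proposal's own example.
    sample = [
        {"organization": {"name": "Wellcome Trust",
                          "disambiguated-organization": {
                              "disambiguation-source": "RINGGOLD",
                              "disambiguated-organization-identifier": "5072"}}},
        {"organization": {"name": "Hypothetical Lab Without Disambiguation"}},
    ]
    print(ringgold_affiliations(sample))
```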
- Support per Pintoch. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:52, 18 January 2017 (UTC)
- Comment about the legal concerns: let me quote the Mix'n'Match FAQ: "Individual identifiers, such as numbers, can not be under copyright. If you are an institution based in Europe, the whole of your ID list may be under database copyright, but we are not copying the entire list in bulk; rather, volunteers add most of them individually, one at a time." I suppose that not all Ringgold identifiers are present in the ORCID dump, so an import from this source would not copy the whole database. − Pintoch (talk) 18:59, 18 January 2017 (UTC)
- Note: I am posting in my capacity as Wikimedian in Residence at ORCID. I have a statement from ORCID "Per our agreement with Ringgold, we are allowed to share the Ringgold identifiers and limited metadata (organization name, location) under CC0 license, just as the rest of ORCID data are available. We would not be using Ringgold otherwise. If someone gets a Ringgold ID out of ORCID, they are free to use it." Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:12, 19 January 2017 (UTC)
- Excellent. Here is the kind of data we can extract from the ORCID 2016 dump (formatted for the Mix'n'Match tool):
Ringgold ID | Name | Description |
---|---|---|
4919 | University College London | University College London, London, London, GB |
16724 | Universitat de Barcelona | Universitat de Barcelona, Barcelona, Catalunya, ES |
16765 | Universidad de Zaragoza | Universidad de Zaragoza, Zaragoza, Aragón, ES |
16778 | Universidad de Sevilla | Universidad de Sevilla, Seville, Andalucía, ES |
2152 | University of Cambridge | University of Cambridge, Cambridge, Cambridgeshire, GB |
1259 | University of Michigan | University of Michigan, Ann Arbor, MI, US |
14736 | Texas A&M University | Texas A&M University, College Station, TX, US |
1877 | University of Colorado Boulder | University of Colorado Boulder, Boulder, CO, US |
72971 | Universidade Técnica de Lisboa Instituto Superior Técnico | Universidade Técnica de Lisboa Instituto Superior Técnico, Lisboa, Lisboa, PT |
27106 | Karolinska Institutet | Karolinska Institutet, Stockholm, Stockholm, SE |
16734 | Universidad Complutense de Madrid | Universidad Complutense de Madrid, Madrid, Comunidad de Madrid, ES |
2281 | University of Melbourne | University of Melbourne, Melbourne, VIC, AU |
16719 | Universitat Autònoma de Barcelona | Universitat Autònoma de Barcelona, Bellaterra, Catalunya, ES |
16722 | Universidad Autónoma de Madrid | Universidad Autónoma de Madrid, Madrid, Madrid, ES |
4615 | Imperial College London | Imperial College London, London, London, GB |
6396 | University of Oxford | University of Oxford, Oxford, Oxfordshire, GB |
28133 | Universidade de São Paulo | Universidade de São Paulo, São Paulo, SP, BR |
... | | |
I'll publish the dataset (15,423 IDs) and the code I used if this property is created. − Pintoch (talk) 00:15, 20 January 2017 (UTC)
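Each description in the excerpt follows a fixed pattern: the name, then city, region and country code. Below is a minimal sketch that produces rows of that shape. It assumes tab-separated output, which is not confirmed here, so check the exact format the Mix'n'Match importer expects. It also assumes a hypothetical input tuple layout `(id, name, city, region, country)`. The function name `mixnmatch_tsv` is hypothetical, and this is not Pintoch's code.

```python
import csv
import io
from typing import Iterable


def mixnmatch_tsv(records: Iterable[tuple[str, str, str, str, str]]) -> str:
    """Render (id, name, city, region, country) tuples as ID / name / description rows.

    ASSUMPTION: tab-separated output; change the delimiter if the importer
    expects something else.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for ringgold_id, name, city, region, country in records:
        # Description mirrors the excerpt: "Name, City, Region, Country".
        description = ", ".join([name, city, region, country])
        writer.writerow([ringgold_id, name, description])
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        ("2152", "University of Cambridge", "Cambridge", "Cambridgeshire", "GB"),
        ("1259", "University of Michigan", "Ann Arbor", "MI", "US"),
    ]
    print(mixnmatch_tsv(rows), end="")
```

The `csv` module is used instead of a plain string join so that any field containing a tab or quote gets quoted instead of silently shifting columns.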
- Hmm, well, I have concerns about verifiability: what if some of the IDs provided by ORCID are wrong? How would we know? I've worked with some datasets containing Ringgold IDs where a substantial fraction (several percent) of the entered IDs were incorrect, or at least disagreed between two comparable sources. But I suppose it's better than nothing, and I'm glad that they worked out that license agreement. ArthurPSmith (talk) 16:24, 23 January 2017 (UTC)
- @ArthurPSmith: yeah, the ORCID dataset is quite noisy too. Unfortunately, ORCID has made UI design decisions that allow users to pollute the dataset with fake matches (see this GitHub issue). The excerpt from the dataset above lists matches in decreasing order of occurrence, so for the ones I have quoted we can be sure these are the right identifiers. We can always confirm an ID through the ORCID UI: add an institution to a (fake) profile on sandbox.orcid.org and check which Ringgold ID it gets (or manually call the AJAX URL that performs the autocompletion there). That's very hacky and quite annoying, but I'm not aware of any other open data source for this. If you still have access to your other datasets, do you think they could be used just for comparison?
- I wish ORCID exposed ISNI IDs instead of Ringgold IDs (since Ringgold seems to have aligned its own dataset with ISNI), because that would make all this a lot simpler… − Pintoch (talk) 17:39, 23 January 2017 (UTC)
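Ranking matches by how often they occur is a simple way to suppress user-introduced fake matches. For each organisation name, keep the Ringgold ID chosen most often. The following is a minimal Python sketch of that majority-vote idea. It is not the code used for the dataset, and the function name `best_id_per_name` is hypothetical.

```python
from collections import Counter


def best_id_per_name(pairs: list[tuple[str, str]]) -> dict[str, tuple[str, int]]:
    """For each organisation name, return (most frequent Ringgold ID, its count).

    Ties go to the pair seen first: Counter.most_common keeps first-encountered
    order among equal counts.
    """
    counts = Counter(pairs)
    best: dict[str, tuple[str, int]] = {}
    for (name, ringgold_id), n in counts.most_common():
        # most_common() is sorted by descending count, so the first hit per name wins.
        best.setdefault(name, (ringgold_id, n))
    return best
```

A plain majority vote still accepts a fake match when it is the only match recorded for a name. Keeping the count alongside the ID lets low-support matches be filtered out or checked by hand.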
- The datasets I have access to with Ringgold information are explicitly NOT open, nor allowed to be used for other purposes of this sort (per the license agreement with Ringgold when it was initially set up). They are also quite small, just a few thousand institutions at most, and still contain plenty of errors. We simply had to drop the conflicting identifiers, as we had no way to verify them once our Ringgold contract expired. But ISNI is hardly better: I've been comparing ISNIs between Orgref and GRID (both open datasets with tens of thousands of ISNIs), and there are a lot of disagreements there too. At least with ISNI you end up with a URL you can dereference to verify, although often there is not much more than a name there, which might not actually resolve the issue. ArthurPSmith (talk) 23:37, 23 January 2017 (UTC)
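The comparison above and the policy of dropping conflicting identifiers can be sketched in a few lines of Python. This sketch assumes both datasets have already been reduced to a mapping from a shared organisation key to an ISNI. How that key is obtained (for example, by matching GRID and Orgref records) is out of scope. The function names and all ISNI values in the test are hypothetical.

```python
def normalise_isni(isni: str) -> str:
    """Drop spaces and upper-case the check character so formatting differences don't count."""
    return isni.replace(" ", "").upper()


def compare_isnis(
    a: dict[str, str], b: dict[str, str]
) -> tuple[dict[str, str], dict[str, tuple[str, str]]]:
    """Split organisations present in both datasets into agreements and conflicts.

    ASSUMPTION: `a` and `b` map the same (hypothetical) organisation keys to ISNIs.
    """
    agreed: dict[str, str] = {}
    conflicts: dict[str, tuple[str, str]] = {}
    for key in a.keys() & b.keys():
        left, right = normalise_isni(a[key]), normalise_isni(b[key])
        if left == right:
            agreed[key] = left
        else:
            # Conflicting pairs can be reviewed by hand or simply dropped.
            conflicts[key] = (left, right)
    return agreed, conflicts
```

Normalising first matters because ISNIs are often printed in four groups of four characters, and the final check character may be an `X`. Without normalisation, a formatting difference would be reported as a disagreement.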
Update: I have finally used the disambiguation dialog to circumvent the issue of fake matches introduced by users; this also improves coverage a lot. The dataset can be found at https://doi.org/10.5281/zenodo.268334 . The first half of it is on Mix'n'Match (where ISNIs are used, to leverage the existing statements for that identifier). I have pulled ISNI identifiers from GRID and VIAF. I will soon add Ringgold statements based on the existing ISNIs. − Pintoch (talk) 12:18, 3 February 2017 (UTC)
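Adding Ringgold statements based on existing ISNIs amounts to a join between item→ISNI statements and an ISNI→Ringgold mapping. Below is a minimal Python sketch. It assumes both sides have already been exported as lists of pairs, and it skips any ISNI that maps to more than one Ringgold ID rather than guessing. The function name and all item and identifier values in the test are hypothetical. Producing the actual Wikidata edits is not shown.

```python
from collections import defaultdict


def ringgold_statements(
    item_isni: list[tuple[str, str]], isni_ringgold: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Join (item, ISNI) with (ISNI, Ringgold ID); return (item, Ringgold ID) pairs.

    ISNIs linked to several distinct Ringgold IDs are treated as ambiguous and skipped.
    """
    by_isni: dict[str, set[str]] = defaultdict(set)
    for isni, ringgold_id in isni_ringgold:
        by_isni[isni].add(ringgold_id)
    statements: list[tuple[str, str]] = []
    for item, isni in item_isni:
        # .get() on a defaultdict does not create missing keys.
        candidates = by_isni.get(isni, set())
        if len(candidates) == 1:
            statements.append((item, next(iter(candidates))))
    return statements
```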