Wikidata:WikiProject Authority control
This WikiProject aims to improve the quantity and quality of external identifiers present on Wikidata.
If you have ideas, please open a new thread on the WikiProject's talk page.
Overview of external identifiers
- For a description of the structure of external-id properties, see Wikidata:External identifiers
- For a list (obviously incomplete) of databases using Wikidata as authority control, see Wikidata:Wikidata for authority control
- For information about the progress of coreferencing on Wikidata in the years up to 2020, see the archive of this page
Most useful tools and gadgets
Main tools for coreferencing:
- Mix'n'match (Q28054658): catalogs of external ids can be imported and gradually matched to Wikidata items
- QuickStatements 2 (Q29032512): allows adding batches of statements to Wikidata items
Main gadgets for coreferencing:
- User:Magnus Manske/mixnmatch gadget.js: when you open an item, you see all the Mix'n'match entries automatically matched to the item and you can manually confirm the correct automatches
- User:Bargioni/MnM ext2.js: a small extension of the previous gadget that makes it easy to remove incorrect automatches (one click on X) and, where appropriate, to mark entries as not applicable to Wikidata (two clicks on X)
VIAF
For information about the relationship between Wikidata and Virtual International Authority File (Q54919), see Wikidata:VIAF and its subpages
AAT
The Art & Architecture Thesaurus (Q611299) from the Getty Research Institute (Q11203476) is a crucial multilingual thesaurus in cultural heritage, with 56,537 concepts as of 14 June 2023. See http://vocab.getty.edu/sparql
select (count(*) as ?c) { ?x a skos:Concept; skos:inScheme aat: }
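As of 11 September 2020, 22,000 AAT concepts were mapped to 21,773 Wikidata items, reported at the time as 54.4% coverage. That percentage predates the 14 June 2023 concept count above; measured against 56,537 concepts, 22,000 would be 38.9%. (Live query.)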
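Because the mapping counts above are dated, it can help to recompute them. The following is a minimal sketch, not the page's original live query: it assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the property P1014 (Art & Architecture Thesaurus ID). The function names and the User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(property_id: str) -> str:
    """Build a SPARQL query counting distinct identifier values and distinct items."""
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    # wdt: only sees "truthy" statements, so deprecated values are not counted.
    return (
        "SELECT (COUNT(DISTINCT ?id) AS ?ids) (COUNT(DISTINCT ?item) AS ?items) "
        f"WHERE {{ ?item wdt:{property_id} ?id }}"
    )


def run_count_query(property_id: str) -> dict:
    """Send the query to the endpoint and return both counts as integers."""
    params = urllib.parse.urlencode(
        {"query": build_count_query(property_id), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "coreferencing-count-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        row = json.load(response)["results"]["bindings"][0]
    return {name: int(cell["value"]) for name, cell in row.items()}
```

Calling `run_count_query("P1014")` should return a dictionary with the keys `ids` and `items`. The two numbers differ whenever one identifier is attached to several items or one item carries several identifiers, which is why figures such as "22,000 mapped to 21,773 items" come in pairs.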
AAT is actively coreferenced on Mix'n'match.
For historical information about the relationship between Wikidata and AAT, see the archived material at Coreference AAT.
GND
Maintenance lists: Wikidata:WikiProject Authority control/Tn
ULAN
The Union List of Artist Names (Q2494649), also from the Getty Research Institute (Q11203476), is a dataset of entities in the art world, primarily artists but also museums, galleries, organizations, and companies, with 312,079 entries as of 12 September 2020. See http://vocab.getty.edu/sparql
select (count(*) as ?c) { ?x a skos:Concept; skos:inScheme ulan: }
Of these, 79,529 (43.2% of the 183,912 entries loaded into Mix'n'match) are mapped to 88,415 Wikidata items, also as of 12 September 2020. (Live query.) 45,032 entries are preliminarily matched based on labels (names) and need to be verified (expect a high percentage of false positives in this group).
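The percentage above can be checked with a few lines of arithmetic. The sketch below is illustrative only (the function and constant names are made up for this example). It uses the counts quoted in this section and shows that 43.2% corresponds to the 183,912 ULAN entries present in Mix'n'match, not to all 312,079 ULAN entries.

```python
def coverage_percent(matched: int, total: int) -> float:
    """Share of `total` entries that are matched, as a percentage with one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * matched / total, 1)


ULAN_MATCHED = 79_529        # entries mapped to Wikidata items
ULAN_ALL = 312_079           # all ULAN entries
ULAN_IN_MIXNMATCH = 183_912  # entries loaded into Mix'n'match (humans)

print(coverage_percent(ULAN_MATCHED, ULAN_IN_MIXNMATCH))  # 43.2
print(coverage_percent(ULAN_MATCHED, ULAN_ALL))           # 25.5
```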
ULAN is actively coreferenced on Mix'n'match, but this dataset requires some manual review after import. Items to watch for:
- Mix'n'match contains only 183,912 of the ULAN items (those representing humans).
- A few ULAN names are formatted LAST NAME, FIRST NAME, and will be imported that way by Mix'n'match.
- Some punctuation in names will be imported with escape characters (//) by Mix'n'match; these need to be removed.
- Mix'n'match may import floruit (Q36424) or "active" dates as birth and death dates; these statements should be set to deprecated rank, ideally with <reason for deprecation> set to work period dates imported or interpreted as birth/death dates (Q80833195). The active dates can then be added correctly using floruit (P1317) or work period (start) (P2031) and work period (end) (P2032).
- ULAN is coreferenced in VIAF.
- ULAN contains values for sex or gender (P21) and occupation (P106), but these are not imported by Mix'n'match.
- ULAN contains many alternative names and spellings, which are not captured by Mix'n'match but can be very helpful for coreferencing to other sources. Adding these as aliases by hand is worthwhile.
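Two of the cleanup steps in the list above can be sketched in code. This is a hedged illustration, not an existing tool: the helper names are hypothetical, the name handling only covers the simple "family, given" case, and the output assumes the tab-separated QuickStatements V1 syntax with year-precision time values (the `/9` suffix). Setting the wrongly imported birth and death dates to deprecated rank is not covered here and is assumed to be done by hand.

```python
from typing import List, Optional


def natural_order(label: str) -> str:
    """Turn 'Family, Given' into 'Given Family'; leave anything else untouched."""
    parts = [part.strip() for part in label.split(",")]
    # Only the unambiguous two-part case is inverted; labels with no comma or
    # several commas (suffixes, titles) are returned as-is for manual review.
    if len(parts) != 2 or not all(parts):
        return label.strip()
    family, given = parts
    return f"{given} {family}"


def year_value(year: int) -> str:
    """Format a positive year as a year-precision time value."""
    if year <= 0:
        raise ValueError("this sketch only handles positive (CE) years")
    return f"+{year:04d}-00-00T00:00:00Z/9"


def work_period_commands(qid: str, start: Optional[int], end: Optional[int]) -> List[str]:
    """Build one command line per known work period boundary (P2031 / P2032)."""
    lines = []
    if start is not None:
        lines.append(f"{qid}\tP2031\t{year_value(start)}")
    if end is not None:
        lines.append(f"{qid}\tP2032\t{year_value(end)}")
    return lines
```

For example, `natural_order("Gogh, Vincent van")` gives `Vincent van Gogh`, and `work_period_commands("Q4115189", 1850, 1890)` gives two tab-separated lines that could be pasted into QuickStatements. Q4115189 is the Wikidata Sandbox item and the years are made up for the example.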
RKD artists
RKDartists (Q17299517) is a database of artist biographies from the Netherlands Institute for Art History (Q758610). The database is bilingual (Dutch and English). Of ~422K entries in the database, about 40% are redirects to other items and have been marked as "not applicable" to Wikidata. 92,346 entries are mapped to 88,415 Wikidata items (as of 15 September 2020). 55,389 entries are preliminarily matched based on labels (names) and need to be verified (expect a high percentage of false positives in this group).
RKDartists is actively coreferenced on Mix'n'match. The structured data in RKDartists is very robust, and once an RKDartists ID has been mapped to a human in Wikidata, a bot will automatically create statements for the available structured data, with references. The statements added by the bot include labels in some European languages, date and place of birth, date and place of death, occupation, floruit or work period start/end dates, and work locations with start/end times.
Note:
- Items created from an RKDartists ID using Mix'n'match may have Dutch text in the English (EN) description; these descriptions should be replaced.
- The database contains alternate forms and spellings of names, but these are not automatically added as aliases. Adding them manually will help in coreferencing to other datasets such as ULAN.
History
Please add references, blog posts, etc. on the topic here. For news prior to 2019, see the archive.
Tweet using tag #coreferencing.
- 2020: ...
Useful resources
Participants
Please become members of this project!
The participants listed below can be notified using the following template in discussions:
{{Ping project|Authority control}}
- Vladimir Alexiev (talk) 11:59, 13 March 2017 (UTC)
- Jonathan Groß (talk) 17:52, 26 March 2017 (UTC)
- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits
- Jneubert (talk) 13:47, 29 April 2017 (UTC)
- Sic19 (talk) 20:42, 12 July 2017 (UTC)
- Wikidelo (talk) 21:15, 8 May 2018 (UTC)
- ArthurPSmith (talk) 19:52, 22 August 2018 (UTC)
- PKM (talk) 19:40, 23 August 2018 (UTC)
- Ettorerizza (talk) 06:44, 8 October 2018 (UTC)
- Fuzheado (talk) 03:47, 19 December 2018 (UTC)
- Daniel Mietchen (talk) 16:30, 7 April 2019 (UTC)
- Iwan.Aucamp (talk) 21:48, 3 October 2019 (UTC)
- Epìdosis (talk) 23:49, 22 November 2019 (UTC)
- Sotho Tal Ker (talk) 00:52, 1 May 2020 (UTC)
- Bargioni (talk) 09:48, 02 May 2020 (UTC)
- Carlobia (talk) 14:34, 11 May 2020 (UTC)
- Pablo Busatto (talk) 03:22, 23 June 2020 (UTC)
- Matlin (talk) 10:53, 6 July 2020 (UTC)
- Msuicat (talk) 21:57, 27 August 2020 (UTC)
- Uomovariabile (talk) 10:04, 27 October 2020 (UTC)
- Silva Selva (talk) 17:21, 30 November 2020 (UTC)
- 1-Byte (talk) 15:52, 14 December 2020 (UTC)
- Alessandra.Moi (talk) 17:26, 16 February 2021 (UTC)
- CamelCaseNick (talk) 21:20, 20 February 2021 (UTC)
- Songceci (talk) 18:45, 24 February 2021 (UTC)
- moz (talk) 10:48, 8 March 2021 (UTC)
- AhavaCohen (talk) 14:41, 11 March 2021 (UTC)
- Kolja21 (talk) 17:37, 13 March 2021 (UTC)
- RShigapov (talk) 14:34, 19 September 2021 (UTC)
- Jason.nlw (talk) 15:15, 30 September 2021 (UTC)
- MasterRus21thCentury (talk) 20:22, 18 October 2021 (UTC)
- Newt713 (talk) 08:42, 13 March 2022 (UTC)
- Pierre Tribhou (talk) 08:00, 20 March 2022 (UTC)
- Powerek38 (talk) 17:21, 14 April 2022 (UTC)
- Ahatd (talk) 08:34, 4 August 2022 (UTC)
- JordanTimothyJames (talk) 00:54, 31 August 2022 (UTC)
- Silviafanti (talk) 17:07, 14 September 2022 (UTC)
- Back ache (talk) 02:03, 1 November 2022 (UTC)
- AfricanLibrarian (talk)
- M.roszkowski (talk) 10:44, 4 January 2023 (UTC)
- Rhagfyr (talk) 19:36, 9 January 2023 (UTC)
- Maxime
- Haseeb (talk) 13:10, 4 August 2023 (UTC)
- 沈澄心✉ 13:26, 15 November 2023 (UTC)