Wikidata:WikiProject Cultural heritage/Reports/Linking Private Collections Semi-Automatically

Wikidata Case Report: Linking Private Collections Semi-Automatically (P485)

27 January 2020

Project Team

  • Petra Maier, Michael Gasser (ETH Library Zurich)
  • Stephanie Bolliger, Julia Lyskawa, Lothar Schmitt (Central Library of Zurich)
  • Brigitte Bruederlin (Swiss National Library)

Aim of the project

During the second half of 2019 the Swiss Wikidata Hackday Series – a cooperation between three Swiss libraries (Swiss National Library, Central Library of Zurich, ETH Library Zurich) and the Bern University of Applied Sciences – took place for the first time. This format provided an opportunity for participants to explore the potential of Wikidata for libraries.

One project team saw considerable potential in using Wikidata to increase the visibility of private collections held by the three libraries. When the Hackday Series started, only very few of these collections were referenced from Wikidata items. This project was initiated to close this gap by making use of the “archives at” property (P485). The following project goals were defined:

  • Creation of missing Wikidata items for notable subjects where necessary
  • Enrichment of Wikidata items with link(s) to the respective archival finding aid(s)
  • Definition and documentation of a semi-automatic workflow covering the entire process, to be made available to other institutions

At the very beginning of the project, collections referring to organizations, institutions, or (former) parts of institutions were to be included as well. However, the focus was soon limited to persons and their private collections, for two reasons. First, private collections made up the overwhelming majority in our case. Second, historical changes in institutions (e.g. a former institute of a former department of a university) are complex to model in Wikidata and are best processed manually.

Project results

Workflow and documentation

The main focus of the project was the establishment of a data processing workflow and its documentation. The process consists of the following parts:

  • metadata extraction (e.g. from an archival information system)
  • quality assurance and data preparation
  • identification of existing Wikidata items
  • creation of new Wikidata items where necessary
  • addition of “archives at” references to Wikidata items
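
The identification step can be illustrated with a minimal Python sketch. It is not the project's own method, which relied on OpenRefine. It assumes a name-based lookup through the Wikidata `wbsearchentities` API module; the function names, the default language `de`, and the rule "accept only a single exact label match" are hypothetical choices made for illustration.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="de", limit=5):
    """Build a wbsearchentities URL that searches items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,  # assumption: German labels are searched first
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def pick_unique_match(candidates, name):
    """Return the QID if exactly one candidate label equals the name, else None.

    `candidates` is the list found under the "search" key of the API response;
    each entry is a dict with at least "id" and usually "label".
    """
    wanted = name.casefold().strip()
    hits = [
        c["id"]
        for c in candidates
        if c.get("label", "").casefold().strip() == wanted
    ]
    # Ambiguous or missing matches are left for manual review.
    return hits[0] if len(hits) == 1 else None
```

Records for which `pick_unique_match` returns `None` would either need manual disambiguation or become candidates for a new item. In practice, additional attributes such as life dates should be compared before accepting a match.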

Since several hundred private collections were to be linked, the process was automated as far as possible. The extensive use of the well-established tools OpenRefine and QuickStatements was of great help.
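
To give an impression of what the QuickStatements input for the last two steps might look like, the following Python sketch generates commands in the tab-separated QuickStatements V1 syntax. Several points are assumptions rather than the project's documented modelling: the finding aid link is attached as a source using "reference URL" (`S854`), new items are typed as human (`P31` → `Q5`), labels are set in German, and the function names and example URLs are hypothetical.

```python
ARCHIVES_AT = "P485"    # "archives at"
REFERENCE_URL = "S854"  # "reference URL" as a source (S prefix in V1 syntax)


def enrich_existing(qid, institution_qid, finding_aid_url):
    """Return one V1 command adding an "archives at" statement to an item."""
    return "\t".join(
        [qid, ARCHIVES_AT, institution_qid, REFERENCE_URL, f'"{finding_aid_url}"']
    )


def create_and_link(label, institution_qid, finding_aid_url, language="de"):
    """Return V1 commands that create a new item for a person and link it."""
    return "\n".join(
        [
            "CREATE",
            f'LAST\tL{language}\t"{label}"',  # label in the given language
            "LAST\tP31\tQ5",                  # assumption: the subject is a human
            enrich_existing("LAST", institution_qid, finding_aid_url),
        ]
    )


if __name__ == "__main__":
    # Q39934978 is the ETH Zurich University Archives; the URLs are placeholders.
    print(enrich_existing("Q42", "Q39934978", "https://example.org/finding-aid/1"))
    print(create_and_link("Erika Mustermann", "Q39934978", "https://example.org/finding-aid/2"))
```

The resulting text can be pasted into QuickStatements as a batch. A real batch for new items would normally also carry descriptions and further identifying statements.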

The project team consolidated its work in the short manual “How to Link Your Institution’s Collections to Wikidata?” The document is available in two versions:

Both versions have been published under a CC BY 4.0 licence. We hope that you find the manual useful, even though parts of it may have to be adapted to your or your institution’s specific needs.

Enrichment of Wikidata

Based on the process established, the three institutions were able to make the following contributions to Wikidata:

Institution                                | Number of new WD items | "archives at" enrichments only
ETH Zurich University Archives (Q39934978) | 371                    | 528
Zentralbibliothek Zürich (Q190260)         | 145                    | 277
Swiss National Library (Q201787)           | 63                     | 313
Total                                      | 579                    | 1,118

Outlook

We hope that our contribution to Wikidata helps to promote a wider and increasing use of the “archives at” property within the GLAM community. Making this possibility more widely known and demonstrating concrete applications are important next steps for us. This includes presentations and hands-on workshops within the local, national, and international communities.

Another issue is the long-term maintenance of the data. For example, how can we ensure that changes in the source system (updated and new entries) are reflected in Wikidata? Do we have to monitor changes in Wikidata (e.g. two WD items being merged) that might have an impact on the data in our source systems? Questions like these still need to be discussed and answered.
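
As one possible starting point for that discussion, and not as a settled answer, both directions of change could be detected with simple comparisons. The Python sketch below assumes that the source system can be exported as a mapping from record ID to finding aid URL, and that merged Wikidata items are available as a mapping from old QID to new QID; both data structures and the function names are hypothetical.

```python
def diff_exports(previous, current):
    """Compare two {record_id: finding_aid_url} exports of the source system."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "new": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        "changed": sorted(
            rid for rid in prev_ids & curr_ids if previous[rid] != current[rid]
        ),
    }


def follow_merges(stored_qids, redirects):
    """Replace stored QIDs that were merged away, given {old_qid: new_qid}.

    Only a single redirect step is followed; chains of merges would need
    repeated application.
    """
    return {rid: redirects.get(qid, qid) for rid, qid in stored_qids.items()}
```

The output of `diff_exports` would indicate which records need new or updated statements in Wikidata, while `follow_merges` would update the QIDs stored in the source system after a merge.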