User:Walkuraxx/e-scholarship

Der Bücherwurm, Carl Spitzweg, ca. 1850

Documentation of workflows for the ingestion of bibliographic data into Wikidata

Project description

The project documents possible workflows for ingesting bibliographic data into Wikidata in an easily comprehensible way. It is based on the workflows used by the library of the African Studies Centre Leiden (ASCL), which differ by material type: journal articles (from the Internet and the library catalogue) are added via Zotero and QuickStatements, and books (from the library catalogue) via OpenRefine and QuickStatements. See, for example, the titles on Somali literature, the publications in the African Studies Collection series, or the articles published in the Lagos historical review.

This documentation is produced thanks to the WikiCite e-scholarship programme 2020. The screencasts are produced with Kaltura Capture (recording) and Shotcut (editing).

The screencasts are not perfect, but I hope they will nevertheless be useful to many future contributors to WikiCite.

How to import scholarly articles to Wikidata using Zotero

 
Journals at the Library of the African Studies Centre in Leiden

Introduction

The ASCL started by adding journal articles to Wikidata because this route is probably the easiest way to import bibliographic data into Wikidata. The upload procedure consists of "prefabricated" parts, and no special technical skills are required.

Once the bibliographic data has been copied to the Zotero library (for example via Zotero's built-in RSS feed reader or the Zotero Connector, a browser extension that recognizes bibliographic data on websites), it can easily be exported in the Wikidata QuickStatements format and uploaded to Wikidata.

This workflow suits journal articles better than other material types, because Zotero's export file in the Wikidata QuickStatements format lacks essential information for anything other than journal articles. For example, the publisher and place of publication are missing from a book's export file. This information is missing because there is no direct link between Zotero and Wikidata: Zotero cannot retrieve the publisher's unique identifier (its Q-number).

Workflow in a nutshell

How to import scholarly articles to Wikidata using Zotero, screencast video, 2020
  1. Add the scholarly articles to your Zotero Library.
  2. Download the scholarly articles using the Wikidata QuickStatements export format.
  3. Check and amend the scholarly articles in a text editor.
    1. Check for duplicates.
    2. Add the source (P1433 published in).
    3. Check and, if necessary, correct the language of the title (P1476 title: und:"Title" > en:"Title").
    4. Check and, if necessary, correct strange or garbled characters.
  4. Upload the scholarly articles via QuickStatements 2.
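
Steps 3.2 and 3.3 can also be scripted once an export grows beyond a handful of articles. The following is a minimal Python sketch, not part of the screencast. It assumes the export uses the tab-separated QuickStatements command syntax, in which each new item starts with a CREATE line followed by LAST statements. The function name amend_quickstatements and the Q-number Q999 are hypothetical placeholders; use the Q-number of the actual journal.

```python
def amend_quickstatements(text, journal_qid, language="en"):
    """Set the title language and add a 'published in' statement per item.

    Assumes tab-separated QuickStatements commands where every new item
    begins with a CREATE line (an assumption about the export file).
    """
    # Split the export into one block of lines per item.
    blocks = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "CREATE" or not blocks:
            blocks.append([])
        blocks[-1].append(line)

    out = []
    for block in blocks:
        has_source = False
        for line in block:
            fields = line.split("\t")
            # P1476 (title): replace the undetermined language code "und".
            if len(fields) >= 3 and fields[1] == "P1476" and fields[2].startswith("und:"):
                fields[2] = language + fields[2][len("und"):]
            # Remember whether the item already has P1433 (published in).
            if len(fields) >= 2 and fields[1] == "P1433":
                has_source = True
            out.append("\t".join(fields))
        if not has_source:
            out.append("\t".join(["LAST", "P1433", journal_qid]))
    return "\n".join(out) + "\n"


# Hypothetical two-line export for a single article.
sample = 'CREATE\nLAST\tP1476\tund:"An example title"\n'
print(amend_quickstatements(sample, "Q999"))
```

The script only touches the title language and the source statement. Checking for duplicates and for garbled characters still needs a human eye before the upload.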

Documentation

and further

How to import books to Wikidata using a spreadsheet and OpenRefine

 
Books published in Somaliland, Collection African Studies Centre Leiden

Introduction

The ASCL uses the library software Alma, which can export bibliographic records to Excel. The library uses this Excel spreadsheet to upload monographs to Wikidata. The Alma Excel sheet also includes the monograph's call number (which a download in MARC21 would not). The Alma Excel file is less suitable for uploading journal articles and book chapters to Wikidata because source information, such as the journal title, is missing from the download file. If your library system generates a satisfactory and complete spreadsheet download, you could use this workflow for all material types.

Workflow in a nutshell

How to import books to Wikidata using a spreadsheet and OpenRefine, screencast video, 2020
  1. Import/Copy the spreadsheet to OpenRefine.
  2. Edit the bibliographic data in OpenRefine.
  3. Create a Wikidata schema in OpenRefine.
  4. Upload the data to Wikidata with QuickStatements 2.
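
To give an idea of what steps 3 and 4 produce, the following Python sketch turns one cleaned spreadsheet row into QuickStatements commands. It is an illustration only, not OpenRefine's own export. The column names (title, year, publisher_qid, place_qid) and the function name are hypothetical, and the choice of properties (P1476 title, P577 publication date, P123 publisher, P291 place of publication) is an assumption about what a schema for books might contain. Publisher and place must already be reconciled to Q-numbers.

```python
def book_to_quickstatements(row, language="en"):
    """Build QuickStatements commands for one book (hypothetical columns)."""
    # Double quotes inside the title would break the quoted value.
    title = row["title"].replace('"', "'")
    commands = [
        "CREATE",
        "\t".join(["LAST", "L" + language, '"' + title + '"']),       # label
        "\t".join(["LAST", "P1476", language + ':"' + title + '"']),  # title
    ]
    if row.get("year"):
        # The suffix /9 marks the date as precise to the year only.
        date = "+%04d-01-01T00:00:00Z/9" % int(row["year"])
        commands.append("\t".join(["LAST", "P577", date]))
    if row.get("publisher_qid"):
        commands.append("\t".join(["LAST", "P123", row["publisher_qid"]]))
    if row.get("place_qid"):
        commands.append("\t".join(["LAST", "P291", row["place_qid"]]))
    return commands


# Hypothetical row; Q111 and Q222 are placeholder Q-numbers.
example_row = {"title": "An example book", "year": "2012",
               "publisher_qid": "Q111", "place_qid": "Q222"}
print("\n".join(book_to_quickstatements(example_row)))
```

Rows without a reconciled publisher or place simply get no statement for that property, which mirrors leaving a cell unreconciled in OpenRefine.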

Documentation

and further

GREL functions / Jython used in screencast example

1. Combine cell values (GREL): value + cells['Column'].value

   Example (combine base URL + MMS ID):

   value + cells['MMS ID'].value

2. Extract the year of publication (= 4 digits) (Jython):

   import re
   # find every run of four digits and join the matches with "|"
   myregex = re.compile(r"\d{4}")
   return "|".join(myregex.findall(value))

3. Extract the place of publication (Jython):

   import re
   # match an opening parenthesis and everything up to the last colon after it
   myregex = re.compile(r".?\(.*:")
   return "|".join(myregex.findall(value))
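
The two regular expressions can be tried out in plain Python before pasting them into OpenRefine. The sketch below wraps them in functions because the bare return statement only works inside OpenRefine's Jython editor. The function names and the sample imprint string are hypothetical; the actual layout of the Alma column may differ.

```python
import re

YEAR_PATTERN = re.compile(r"\d{4}")
PLACE_PATTERN = re.compile(r".?\(.*:")


def extract_years(value):
    """Join every run of four digits with '|', as in the Jython snippet."""
    return "|".join(YEAR_PATTERN.findall(value))


def extract_place_raw(value):
    """Return the raw match: '(' and everything up to the last ':'."""
    return "|".join(PLACE_PATTERN.findall(value))


# Hypothetical cell value with the imprint in parentheses.
sample = "Example title (Hargeysa : Example Press, 2012)"
print(extract_years(sample))      # 2012
print(extract_place_raw(sample))  # ' (Hargeysa :' including the leading space
```

As the second result shows, the raw match still contains the parenthesis and the colon, so a further clean-up step is needed to isolate the place name.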

How to import library materials to Wikidata using MarcEdit and OpenRefine

 
Library of the African Studies Centre in Leiden

Introduction

The ASCL describes its bibliographic records in MARC21. To upload a set of records that consists of different material types to Wikidata, the library uses MarcEdit to convert a binary MARC21 file into a tab-delimited file. The tab-delimited file can then be edited further in OpenRefine and uploaded to Wikidata.

The screencast video focuses on the data conversion in MarcEdit, because data cleanup and reconciliation in OpenRefine are described in more detail in the second part of this documentation, How to import books to Wikidata using a spreadsheet and OpenRefine.

Workflow in a nutshell

How to import library materials to Wikidata using MarcEdit and OpenRefine, screencast video, 2020
  1. Convert a binary MARC21 file into a tab-delimited file with MarcEdit.
  2. Open the tab-delimited file in OpenRefine.
  3. Group the bibliographic records by material type using text facets.
  4. Clean and reconcile the bibliographic records by material type.
  5. Upload the records to Wikidata using QuickStatements 2.
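
Step 3 can be previewed outside OpenRefine to see which material types a file contains. The sketch below is a rough Python equivalent of a text facet. It assumes the tab-delimited file has a header row, and the column name material_type is hypothetical, since the real header depends on the MARC fields chosen for export in MarcEdit.

```python
import csv
import io
from collections import defaultdict


def group_by_material_type(tsv_text, column="material_type"):
    """Group rows of a tab-delimited export by the value in one column."""
    # QUOTE_NONE keeps quotation marks in titles as ordinary characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    groups = defaultdict(list)
    for row in reader:
        key = (row.get(column) or "").strip() or "(blank)"
        groups[key].append(row)
    return dict(groups)


# Hypothetical three-record export.
sample = "title\tmaterial_type\nFirst\tbook\nSecond\tarticle\nThird\tbook\n"
for material_type, rows in group_by_material_type(sample).items():
    print(material_type, len(rows))
```

Each group can then be cleaned and reconciled separately, as in step 4.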

Documentation

and further

Q&A

Feel free to drop me a question on the user talk page and I’ll do my best to answer it.