Wikidata:WikiProject LD4 Wikidata Affinity Group/Wikidata Working Hours/Wikidata Working Hour Summer-Fall Project 2022/2022-July-18 Wikidata Working Hour

July 18, 2022 Wikidata Working Hour

Monday, July 18, 2022 at 11:00am PT / 2:00pm ET / 18:00 UTC / 8:00pm CEST

Recording

View recording: https://stanford.zoom.us/rec/share/VRZwuI_4pseU2Ea79SSgEZB-sVePBzQvZjoTX7ingXeb3QR4cwm17hNlq8NmF2e8.ObOSYgx-vklJ0vsR?startTime=1658166925000

If you wish to download the files, you can use the "Download (4 files)" link on the upper right of the page linked above.

Logistics

To get the most out of this Working Hour, please install the most recent version of OpenRefine in advance. Detailed installation instructions are available at: https://docs.openrefine.org/manual/installing

Metrics

Log in to the Event Dashboard with your Wikimedia account to keep track of your edits today.

Background

Over the summer and into the early fall, the LD4 Wikidata Affinity Group will be offering a series of Wikidata Working Hours to give folks an opportunity to try out various Wikidata-related skills and tools by adding data about diverse children’s books from the Cooperative Children’s Book Center at the University of Wisconsin, Madison to Wikidata. Wikidata Working Hours provide hands-on Wikidata experience in a supportive space. We hope you will join us if you are interested in learning more about Wikidata, love children’s books, or have been looking for a fun Wikidata project to contribute to.

The second Wikidata Working Hour in the series will cover reconciliation in OpenRefine, so we can identify which authors from our spreadsheet of children's book metadata already exist in Wikidata and which need to be created.

You do not have to have attended the first Wikidata Working Hour in the series to attend this one. However, for this Working Hour, we will assume that you already have OpenRefine installed and can import data into it, so it may be helpful to review the recording of the previous Working Hour.

Agenda

  • Introduction to reconciliation in OpenRefine
    • Different reconciliation services
    • Why/how to reconcile data
    • Tips/tricks for using Wikidata reconciliation service
  • Editing task

Editing task

We'll be using OpenRefine and the Wikidata reconciliation service to identify which contributors (i.e., authors, illustrators, translators, etc.) listed in the CCBC data already have associated Wikidata items and augment our dataset with various fields from Wikidata.
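For readers curious about what OpenRefine sends to a reconciliation service behind the scenes, here is a minimal Python sketch of a batch of queries in the general shape used by the Reconciliation Service API. It is an illustration under stated assumptions, not a description of OpenRefine's internals: the function name, the contributor names, and the VIAF placeholder are all hypothetical, and the step of actually sending the batch to an endpoint is left out. The type Q5 (human) and property P214 (VIAF ID) are the ones used in the demo notes on this page.

```python
import json


def build_reconciliation_queries(names, type_id="Q5", viaf_ids=None, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ...

    names    -- contributor names to look up
    type_id  -- restrict candidates to this type (Q5 = human)
    viaf_ids -- optional dict mapping a name to a known VIAF ID,
                passed along as property P214 to help matching
    limit    -- maximum number of candidates requested per name
    """
    viaf_ids = viaf_ids or {}
    queries = {}
    for index, name in enumerate(names):
        query = {"query": name, "type": type_id, "limit": limit}
        viaf = viaf_ids.get(name)
        if viaf:
            # Extra property/value pair the service can use for scoring.
            query["properties"] = [{"pid": "P214", "v": viaf}]
        queries[f"q{index}"] = query
    return queries


# Hypothetical contributor names and a placeholder identifier (not real data).
names = ["Example Author", "Example Illustrator"]
payload = build_reconciliation_queries(
    names, viaf_ids={"Example Author": "PLACEHOLDER-VIAF-ID"}
)

# A client would typically send this JSON as the "queries" form field.
print(json.dumps(payload, indent=2))
```

Each entry in the batch corresponds to one cell in the Contributors column, which is why reconciling a large column can take a while.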

In the first Wikidata Working Hour, we split out individual contributors listed in the Contributors column in the original spreadsheets. You can claim a spreadsheet to work with today in this Google Drive folder: https://drive.google.com/drive/folders/1CqTihLQN-5fgrA2GpbgQyTST-E0fW12a. Claim one by editing the name of the spreadsheet and adding your initials to it (look for the files with names starting with OR-CCBC-Data-Cleanup- to take advantage of the work done in the last session).

Once you have reconciled your spreadsheet, you can upload it as a TSV file to this subfolder called Reconciled contributor files: https://drive.google.com/drive/folders/164B7AMMpcz_mj9ktVy4ewBqIz87lydhe

Detailed instructions from demo notes

  • Claim cleaned spreadsheet in Google Drive
  • Prepare spreadsheet
    • Open TSV file in OpenRefine
    • Fill down Title and Id (maintain relationship between contributors and works as we shift to contributor records)
      • Title > Edit cells > Fill down
      • Id > Edit cells > Fill down
    • (May want to rename Id as BookId to make it more clear that it refers to the book, not the contributor)
      • Id > Edit column > Rename column
    • Move Contributors to beginning
      • Contributors > Edit column > Move column to beginning
  • Reconciliation (non-Wikidata)
    • Contributors > Reconcile > Start reconciling > Add Standard Service…
      • This is where we enter the reconciliation endpoint URL
    • Go through and select matches
      • Right-click on potential match, see VIAF record and confirm that it’s same person
      • Depending on service, may be able to hover and get preview
      • (VIAF - multiple entries may correspond to same person; for this purpose, just selecting first)
    • Reconcile > Add entity identifiers column
  • Reconciliation (Wikidata)
    • Contributors > Reconcile > Start reconciling > Wikidata (en)
      • Match to human (Q5)
      • Include VIAF ID as property VIAF ID (P214)
      • Deselect auto-match
    • Select matches
      • If matches show up, see if any obviously are the person (e.g., same name with description “children’s book author”); select match if that’s the case
      • If no matches show up, still worth clicking “Search for match” to try again and see if anything turns up there
      • If still nothing, search Wikidata directly; the person may exist under a slightly different name (e.g., with or without a middle name) and have no aliases, and the search algorithm on Wikidata is more sophisticated than whatever the reconciliation service is using
    • Contributors > Reconcile > Add entity identifiers column
      • Show option to Reconcile > Use values as identifiers
    • Contributors > Edit column > Add columns from reconciled column
      • Library of Congress authority ID, VIAF ID, Goodreads ID, occupation
      • (Play around with pulling in different columns)
    • Add special properties from Wikidata
  • Export
    • If export as OpenRefine project, maintains reconciliation decisions
    • But know that we can easily get those again from the identifiers!
    • Save and upload TSV to subfolder on Google Drive
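To make the "Fill down" step in the notes above more concrete, here is a small Python sketch of the same idea outside OpenRefine: blank Title and Id cells take the value from the nearest non-blank cell above, so every contributor row keeps its link to a book. The sample rows and the fill_down helper are hypothetical and only mimic the layout described in the notes; this is not how OpenRefine implements the operation.

```python
import csv
import io


def fill_down(rows, columns):
    """Copy the last non-blank value downward in the given columns."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for column in columns:
            if new_row.get(column):
                last_seen[column] = new_row[column]
            elif column in last_seen:
                new_row[column] = last_seen[column]
        filled.append(new_row)
    return filled


# Hypothetical sample: the second row is an extra contributor for book 1,
# so its Id and Title cells are blank after contributors were split out.
SAMPLE_TSV = (
    "Id\tTitle\tContributors\n"
    "1\tExample Book\tExample Author\n"
    "\t\tExample Illustrator\n"
    "2\tAnother Book\tAnother Author\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
filled = fill_down(rows, ["Id", "Title"])

for row in filled:
    print(row["Id"], row["Title"], row["Contributors"], sep="\t")
```

Filling down only Id and Title, and not Contributors, matches the notes: the goal is to preserve which book each contributor belongs to.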

Resources