Wikidata:WikiProject PCC Wikidata Pilot/San Diego State University/Comic Arts Project Workflows/OpenRefine

Resources/Training


While the project lead will demonstrate how to use OpenRefine, you may also want to check out some of the OpenRefine tutorials listed here, which go into more depth. There is also a chance, especially when we move on to fictional characters, that we can also use OpenRefine to import statements, as demonstrated in this tutorial.

Awards


We will be using OpenRefine to write directly to Wikidata and/or QuickStatements to batch load comic book awards. The project lead will create Google Sheets using the IMPORTHTML function or APIs to grab award information from various websites and sources. Once this information is on our sheet with references, it will be loaded into OpenRefine, where we will run reconciliation to match award winners with their Q numbers in Wikidata and, where applicable, to match their award-winning works. Following that process, a schema will be built and the statements will be written into Wikidata.
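To illustrate what the sheet's HTML import does with an awards page, here is a minimal Python sketch that pulls the rows of an HTML table into CSV using only the standard library. It assumes a simple, well-formed table (every cell closed with `</td>` or `</th>`); the sample HTML and the names in it are hypothetical, and this is not part of the project's actual tooling.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse newlines and runs of spaces inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell (ignores whitespace between tags).
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical award page fragment.
SAMPLE = """
<table>
  <tr><th>Year</th><th>Name</th></tr>
  <tr><td>2005</td><td>Jane Example</td></tr>
</table>
"""
```

`table_to_csv(SAMPLE)` yields a header row plus one row per winner, which is the shape of sheet that gets loaded into OpenRefine.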

Process: OpenRefine Reconciliation

  1. Go to the Comic Arts Wikidata Project (Google Drive) and navigate to the folder "Awards ready for reconciling."
  2. Open the spreadsheet for the awards you want to work on (check first whether it has already been completed). If it's a short list, import the entire list, either by downloading and importing the Excel file or by copying and pasting from the clipboard; if copying and pasting, make sure that your columns have separate headings. Once your data is in, create the project and give it a name that works for you (for example, Inkpot pt.1).
  3. Next, reconcile the people. Click the arrow next to the name/person column and go to "Start reconciling." Make sure you reconcile against Wikidata (it might be your only option unless you've added more services). Reconcile against human (Q5), and if any other information is included (such as birth year), you can select that column to use as a relevant detail.
  4. Once reconciliation has finished, you will see that some names have been matched automatically, while others will either offer to create a new item (if there is no match) or show you multiple candidate matches. Hover over each candidate to see its brief description, or click a name to open its Wikidata item. Once you determine a match, click the single check mark to match the item.
  5. Once you've finished reconciling all your names, you can create your schema. On the right of your screen you should see an Extensions drop-down that includes Wikidata. Click Wikidata and go to "Edit Wikidata schema" to create your schema.
  6. Start by adding an item. Then drag the gray box labeled "name" or "person" into the first text box (the one that reads "type item or drag reconciled column here"); this is the person you're adding the statement to. Then click "add a statement" and build the statement as you would in Wikidata. For values that vary from row to row (year, reference URL), drag down the corresponding gray column box. For values that are the same for every person (the award, the point in time qualifier property, etc.), just select the appropriate Wikidata item or property. Note that for "retrieved" (in the reference alongside the reference URL), you can simply type TODAY in all caps.
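Behind step 3, OpenRefine talks to a reconciliation service that accepts a batch of queries, each with the name, the target type (Q5 for human), and optional property hints. The sketch below only builds such a batch as Python data; it assumes the query shape of the Reconciliation Service API (`query`, `type`, `properties` with `pid`/`v`) and uses P569 (date of birth) for the birth-year detail. The column names and sample people are hypothetical, and nothing is sent over the network.

```python
import json


def build_recon_queries(rows, type_qid="Q5", birth_year_col="birth_year"):
    """Build a batch of reconciliation queries keyed q0, q1, ...

    Each row is a dict with a "name" and, optionally, a birth year. When a
    birth year is present it is added as a P569 (date of birth) property
    hint, similar in spirit to choosing a relevant-detail column in OpenRefine.
    """
    queries = {}
    for i, row in enumerate(rows):
        query = {"query": row["name"], "type": type_qid}
        year = row.get(birth_year_col)
        if year:
            query["properties"] = [{"pid": "P569", "v": str(year)}]
        queries[f"q{i}"] = query
    return queries


# Hypothetical rows from an awards sheet.
ROWS = [
    {"name": "Jane Example", "birth_year": 1950},
    {"name": "John Sample"},
]

payload = json.dumps(build_recon_queries(ROWS))
```

Rows without a birth year still produce a plain name query, which is why matches for those rows tend to need more manual checking in step 4.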

QuickStatements

  1. Once your schema is complete, go again to Extensions: Wikidata and select "Export to QuickStatements." Once you export, a new browser tab should open with a series of statements that look something like this:
       Q380113    P166    Q4908985    P585    +2005-01-01T00:00:00Z/9    S854    "https://www.comics.org/award/21/"    S813    +2021-07-14T00:00:00Z/11
  2. Go to QuickStatements. Make sure that you are logged in, then paste in the QuickStatements-ready text from that tab. Then click "Import V1 commands." You will then see a screen listing the statements to be added; click "Run" to begin the process.
  3. Once the statements have run, you can check the work either by clicking the item names in QuickStatements or by looking at your personal contributions.

After confirming everything looks correct, go back to the spreadsheet in the Drive and mark whichever items you’ve worked on as done.
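The example export line reads as: the person's item, P166 (award received) with the award item, a P585 (point in time) qualifier at year precision (`/9`), then a source made of S854 (reference URL) and S813 (retrieved) at day precision (`/11`), all separated by tabs. OpenRefine writes these lines for you; the following hypothetical Python helper just rebuilds one such line from a spreadsheet row to make the format explicit.

```python
from datetime import date


def qs_date(d: date, precision: int) -> str:
    """Format a date as a QuickStatements time value (9 = year, 11 = day)."""
    return f"+{d.year:04d}-{d.month:02d}-{d.day:02d}T00:00:00Z/{precision}"


def award_statement(person_qid, award_qid, year, ref_url, retrieved):
    """Return one tab-separated QuickStatements V1 line.

    P166 = award received, P585 = point in time (qualifier),
    S854 = reference URL and S813 = retrieved (source).
    """
    fields = [
        person_qid,
        "P166", award_qid,
        # A year-only date is still written as a full timestamp, with precision 9.
        "P585", qs_date(date(year, 1, 1), 9),
        # String values such as URLs must be wrapped in double quotes.
        "S854", f'"{ref_url}"',
        "S813", qs_date(retrieved, 11),
    ]
    return "\t".join(fields)


line = award_statement(
    "Q380113", "Q4908985", 2005,
    "https://www.comics.org/award/21/", date(2021, 7, 14),
)
```

Pasting `line` into a new QuickStatements batch is equivalent to pasting the matching line from OpenRefine's export.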

Writing Directly to Wikidata

  1. For some awards, instead of using QuickStatements, we will write directly to Wikidata from OpenRefine. Follow all of the steps above, but instead of going to "Export to QuickStatements," go to "Upload edits to Wikidata."
  2. Sign in to your Wikidata account (do not have OpenRefine remember your password, as OpenRefine does not encrypt it), and your data will be entered directly into Wikidata.
  3. This method will be used when a person has won the same award in multiple years, e.g., the Eisner Awards.
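A repeat winner needs several P166 statements on the same item with the same award value, differing only in their P585 qualifier. Our assumption (not stated above) is that this is why direct upload is preferred here: QuickStatements tends to attach new qualifiers to an existing statement with the same property and value rather than create a separate statement. A quick pre-upload check, sketched in Python with hypothetical row fields and placeholder QIDs, is to list which person/award pairs occur in more than one year.

```python
from collections import defaultdict


def group_wins(rows):
    """Map each (person QID, award QID) pair to the sorted list of years won."""
    wins = defaultdict(set)
    for row in rows:
        wins[(row["qid"], row["award"])].add(row["year"])
    return {key: sorted(years) for key, years in wins.items()}


def repeat_winners(rows):
    """Keep only pairs won in more than one year, i.e. items that need
    several P166 statements differing only in their P585 qualifier."""
    return {key: years for key, years in group_wins(rows).items() if len(years) > 1}


# Placeholder identifiers for illustration only, not real award data.
ROWS = [
    {"qid": "Q111", "award": "Q999", "year": 2005},
    {"qid": "Q111", "award": "Q999", "year": 2007},
    {"qid": "Q222", "award": "Q999", "year": 2005},
]
```

Here `repeat_winners(ROWS)` singles out Q111, whose 2005 and 2007 wins should end up as two separate statements.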

Award Winning Works


The project lead will investigate processes for easily batch-loading MARC data into Wikidata for award-winning titles. This workflow will also require catalogers to go into OCLC and add 586 (awards note) fields. Potential issues for this workflow include differentiating between works, series, and manifestations.

This workflow is still being developed and owes a lot to the University of Washington, which developed a workflow for MARC to Wikidata.

  1. Begin by searching Wikipedia to see whether the award-winning works already exist (this is mainly for the Eisner Awards, many of whose winning works have a Wikipedia entry and therefore will be in Wikidata).
  2. Search OCLC for your titles and add them to your local save file.
  3. Using MarcEdit in 32-bit mode, extract all the records and save the file. Next, open MarcEdit and use "Export Tab Delimited" to export specific MARC fields. Which fields you export will depend on the type of work (monograph vs. serial, series, etc.), but they will likely include 245$a, 245$b, 245$c, 700$a, 260/264$b/$c, etc.
  4. After you've exported your tab-delimited file, you're ready to start using OpenRefine. Open your file in OpenRefine and begin cleaning and reconciling the data. (More detailed instructions are coming soon; see the University of Washington workflow for some possible steps for cleaning the data.)
    1. Steps from the University of Washington workflow that might be used include title/subtitle, create label, creator, and degree supervisor; the degree supervisor step can be adapted for cleaning up the 7xx fields, specifically for adding illustrators, colorists, and letterers.
  5. Once your data has been reconciled, upload your edits to Wikidata.
  • For some types of works, such as single issues, the OCLC workflow will not work. Also, depending on how detailed the MARC records are, it may be faster to skip this workflow altogether and manually update Wikidata with the appropriate information.
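The cleanup in step 4 mostly means stripping the ISBD punctuation that catalogers leave at the end of subfields (for example "Title :" and "subtitle /") and pairing each 7xx name with its relator term. The Python sketch below shows that idea on a tab-delimited export; it is not the University of Washington's method. It assumes a header row with columns named `245$a`, `245$b`, `700$a` and `700$e`, repeated 700 fields joined with `;`, and a relator term on every 700 field. All of these are hypothetical, so adjust them to match your actual MarcEdit export.

```python
import csv
import io
import re

# Trailing ISBD punctuation and spaces left on MARC subfields ("Title :", "Doe, Jane,").
TRAILING_PUNCT = re.compile(r"[\s/:;=,.]+$")


def strip_isbd(value: str) -> str:
    return TRAILING_PUNCT.sub("", value.strip())


def field(row, col):
    # DictReader gives None for columns missing from a short row.
    return row.get(col) or ""


def clean_rows(tsv_text: str, sep: str = ";"):
    """Yield one cleaned dict per record from a tab-delimited MARC export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        title = strip_isbd(field(row, "245$a"))
        names = [strip_isbd(n) for n in field(row, "700$a").split(sep) if n.strip()]
        roles = [strip_isbd(r) for r in field(row, "700$e").split(sep) if r.strip()]
        yield {
            # Assumption: the label is the title proper, without the subtitle.
            "label": title,
            "title": title,
            "subtitle": strip_isbd(field(row, "245$b")),
            # Pairs each added entry with its relator term (illustrator, colorist, letterer...).
            "contributors": list(zip(names, roles)),
        }


# Hypothetical one-record export.
TSV = (
    "245$a\t245$b\t700$a\t700$e\n"
    "Sample comic :\tan example /\tDoe, Jane,;Roe, Rick,\tillustrator.;colorist.\n"
)
```

Each cleaned record maps onto columns that can then be reconciled in OpenRefine (contributor names to humans, for instance) before the schema is built.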

Awards List


The following is a list of awards we would like to batch into Wikidata:

  • Harvey Award (in process)
  • Will Eisner Comics Industry Awards (awards given to people: completed in Sept.)
  • Ignatz Award
  • Dwayne McDuffie Award for Diversity in Comics (in process)
  • Dwayne McDuffie Award for Kids' Comics
  • Joe Shuster Canadian Comic Book Creator Awards
  • Doug Wright Awards for Canadian Cartooning
  • The Mike Wieringo Comic Book Industry Awards aka the Ringo Award (in process)
  • Reuben Award (complete)
  • Inkpot Award (mostly complete, will need to create items for some individuals)
  • Russ Manning Award (mostly complete, will need to create items for some individuals)
  • Bill Finger Award (complete)
  • Inkwell Award
  • International Manga Award
  • Japan Cartoonists Association Award
  • Kodansha Manga Award
  • Tezuka Award
  • Shogakukan Manga Award
  • Manga Taishō

Please note: we're starting with people and will then move on to works. As of October 2021, all awards for people have been added; we are working on creating items for ~200 people.

Alexander Street Press titles


If there is time, we will use OpenRefine on the Alexander Street Press Underground Comics Collection, roughly 360 titles. The Project Lead will divide the titles and demonstrate how to run the reconciliation process for Wikidata. Since many of these titles also need enhancements in OCLC, she will also demonstrate how to reconcile to the Name Authority File.

The spreadsheet includes the title, main author, publisher, OCLC number, and a link to the resource. The reconciliation process will be run on both the author and the publisher; however, since only the first author is listed in the spreadsheet, you will need to view the resource itself to add any additional authors/creators. While viewing the comic, you should also update the OCLC record.

Run the reconciliation process on either the main author or the publisher column. Go through the names on your sheet and determine whether each suggested Q number is a match.

Since we will have complete electronic access to these titles, we will be enhancing the records in OCLC. Many, though coded as full-level records, lack fields important to comic book researchers, such as a table of contents or a full list of creators/contributors.

Follow the SDSU Cataloging Comics Guide to update the records to RDA and make them all provider-neutral (see OCLC's guidelines and PCC's guidelines for guidance).

For Name Authority Records, follow the best practices for the 024 field.