Wikidata:WikiProject PCC Wikidata Pilot/University of Washington/Workflow, Trainings, and Resources/MARC21 to Wikidata Workflow

University of Washington Libraries Linked Data Team
Crystal Clements (cec23@uw.edu)
Contributions by Thad Guidry
Last revised February 22, 2022

Workflow

Export MARC Records (there are many right ways to do this)

As a batch through OCLC Connexion
As a published set through Alma
- You need the appropriate permissions in Alma to manage and run jobs on sets
- Create a set using Admin --> manage sets
- Run a job on the set to export the bibliographic records
- University of Washington
  - Select job "Export Bibliographic Records"
  - Save to UW FTP, subdirectory "MARC21" (You need permissions to access this file)

Convert Selected MARC Fields to a Tab-Delimited File (there are many right ways to do this)

Get the newest version of MarcEdit
Open the program "Export Tab Delimited"
- Set file paths for binary files going in, and tab-delimited files going out
- Leave default delimiter settings
- Add MARC fields to extract (this can be saved in a .txt file so you only have to manually enter them once for each type of entity)
- University of Washington export settings reflected in the table below (versions for MARC exported from Alma and OCLC Connexion available via GitHub)
Export

Create a Project in OpenRefine

Download the latest version of OpenRefine
- Note: OpenRefine version 3.4.1 runs with Java versions 8-15, but will not run on Java 16 or higher.
Select "choose files", and choose the tab-delimited file(s) you just created in MarcEdit and select "Next"
Name columns: title,subtitle,statedAs,creator,genre,submittedTo,inception,language,pages,degreeSupervisor,hdlURI,OCLCID,ISBN (for easy interoperability with our schema)
- ISBN is included here for other workflows, but is not applicable to our electronic theses and dissertations. Instructions for this data point are italicized and are missing from the JSON history. Thank you to Kyla Jemison from the University of Toronto Libraries for this addition.

MARC21 Field for Export	Column Name in OpenRefine
245 $a	title
245 $b	subtitle
245 $c	statedAs
100	creator
502 $b	genre
502 $c	submittedTo
502 $d	inception
008	language
300 $a	pages
700	degreeSupervisor
856 $u	hdlURI
035 (for Alma)...001 if exporting from Connexion	OCLCID
020 $a	ISBN

Instruct OpenRefine to discard the first row of data
Uncheck the option to 'Use character " to enclose cells containing column separators'
Name your project
"Create project"

Manipulate MARC Data in OpenRefine

Either:
Click "Apply" in the "Undo/Redo" tab
Paste the extracted JSON history from this GitHub repository into the text box
Click "Perform Operations"
Review your data
Or:
Follow the steps in the table below

OpenRefine Cleanup: GUI Instructions	OpenRefine Cleanup: Apply JSON
title and subtitle [Broken code needs to be replaced] Exception words = and, or, the, a, of, in, to, an, for, with, by, at, if, as, from, nor, but, yet, so, through, near, across, about, on, amid Edit cells → Common transforms → Trim leading and trailing whitespace Edit cells → Common transforms → Collapse consecutive whitespace Edit cells → Transform GREL value.chomp(" :") value.chomp(" /")	[ { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "title", "expression": "jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column title using expression jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "subtitle", "expression": "jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column subtitle using expression jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "title", "expression": "value.trim()", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column title using expression value.trim()" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "subtitle", "expression": "value.trim()", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column subtitle using expression value.trim()" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "title", "expression": "value.replace(/\\s+/,' ')", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column title using expression value.replace(/\\s+/,' ')" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "subtitle", "expression": "value.replace(/\\s+/,' ')", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column subtitle using expression value.replace(/\\s+/,' ')" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "title", "expression": "grel:value.chomp(\" :\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column title using expression grel:value.chomp(\" :\")" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "title", "expression": "grel:value.chomp(\" /\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column title using expression grel:value.chomp(\" /\")" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "subtitle", "expression": "grel:value.chomp(\" :\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column subtitle using expression grel:value.chomp(\" :\")" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "subtitle", "expression": "grel:value.chomp(\" /\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column subtitle using expression grel:value.chomp(\" /\")" } ]
Create label column “subtitle” → facet → customized facet → facet by blank (null or empty string) “False” “title” → edit column → add column based on this column New column name: “label” GREL cells.title.value + ": " + cells.subtitle.value “True” “title” → edit column → add column based on this column New column name: “label2” GREL value Remove facet “label” → edit column → join columns Select “label” and “label2” Skip nulls Write result in selected column	[ { "op": "core/column-addition", "engineConfig": { "facets": [ { "type": "list", "name": "subtitle", "expression": "isBlank(value)", "columnName": "subtitle", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": false, "l": "false" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "baseColumnName": "title", "expression": "grel:cells.title.value + \": \" + cells.subtitle.value", "onError": "set-to-blank", "newColumnName": "label", "columnInsertIndex": 1, "description": "Create column label at index 1 based on column title using expression grel:cells.title.value + \": \" + cells.subtitle.value" }, { "op": "core/column-addition", "engineConfig": { "facets": [ { "type": "list", "name": "subtitle", "expression": "isBlank(value)", "columnName": "subtitle", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "baseColumnName": "title", "expression": "grel:value", "onError": "set-to-blank", "newColumnName": "label2", "columnInsertIndex": 1, "description": "Create column label2 at index 1 based on column title using expression grel:value" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "label", "expression": "join ([cells['label'].value,cells['label2'].value],'')", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column label using expression join ([cells['label'].value,cells['label2'].value],'')" }, { "op": "core/column-reorder", "columnNames": [ "title", "label", "subtitle", "statedAs", "creator", "genre", "submittedTo", "inception", "language", "pages", "degreeSupervisor", "hdlURI", "OCLCID" ], "description": "Reorder columns" } ]
statedAs Edit cells → transform GREL value.chomp(".")	[ { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "statedAs", "expression": "grel:value.chomp(\".\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column statedAs using expression grel:value.chomp(\".\")" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "statedAs", "expression": "value.replace(\"by \",\"\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column statedAs using expression value.replace(\"by \",\"\")" } ]
creator Edit cells → transform GREL if(value.contains("http:"), "http://" + value.rpartition("http://")[2], value.rpartition(/\-[a-z]+/)[0].rpartition(/\,author/)[0].rpartition(/\d+/)[0].chomp(",") ) StatedAs → Facet → Customized facet → Facet by blank "False" Creator → Edit cells → Transform GREL if(value.contains(","), (cells.statedAs.value), (value))	[ { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "creator", "expression": "grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,author/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column creator using expression grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,author/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "creator", "expression": "grel:if(value.contains(\",\"),\n(cells.statedAs.value),\n(value))", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column creator using expression grel:if(value.contains(\",\"),\n(cells.statedAs.value),\n(value))" } ]
genre Hover over individual cells and click the blue “edit” button Change values to “master’s thesis”, “doctoral thesis”, or other appropriate term Selecting the option to “change all identical cells” expedites this process Make sure you do not miss any values Facet → custom text facet GREL (replace “doctoral thesis” with your most common value(s) value.contains("doctoral thesis")	[ { "op": "core/mass-edit", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "genre", "expression": "value", "edits": [ { "from": [ "Ph. D." ], "fromBlank": false, "fromError": false, "to": "doctoral thesis" } ], "description": "Mass edit cells in column genre" }, { "op": "core/mass-edit", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "genre", "expression": "value", "edits": [ { "from": [ "M.S." ], "fromBlank": false, "fromError": false, "to": "master's thesis" } ], "description": "Mass edit cells in column genre" } ]
inception Edit cells → transform GREL value.chomp(".")	[ { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "inception", "expression": "grel:value.chomp(\".\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column inception using expression grel:value.chomp(\".\")" } ]
Create Den (description in English) genre → edit column → add column based on this column New column name: “Den” GREL value + " by " + cells.statedAs.value + ", " + cells.inception.value	[ { "op": "core/column-addition", "engineConfig": { "facets": [], "mode": "row-based" }, "baseColumnName": "genre", "expression": "grel:value + \" by \" + cells.statedAs.value + \", \" + cells.inception.value", "onError": "set-to-blank", "newColumnName": "Den", "columnInsertIndex": 6, "description": "Create column Den at index 6 based on column genre using expression grel:value + \" by \" + cells.statedAs.value + \", \" + cells.inception.value" } ]
language Edit cells → transform GREL if(value.contains("eng"), "Q1860", value) Edit cells → Transform GREL if(value.contains("zxx"), null, value)	[ { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "language", "expression": "grel:if(value.contains(\"eng\"),\n\"Q1860\",\nvalue)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column language using expression grel:if(value.contains(\"eng\"),\n\"Q1860\",\nvalue)" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "language", "expression": "grel:if(value.contains(\"zxx\"),\nnull,\nvalue)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column language using expression grel:if(value.contains(\"zxx\"),\nnull,\nvalue)" } ]
pages Edit cells → replace Take out all the words, phrases, and punctuation in your data except for the numbers, commas between numbers, and words representing units (“page” or “leaf” for instance) For each unit type represented in your data: pages → facet → custom text facet value.contains("[word representing unit]") Example: value.contains("pages") “True” Edit column → Add column based on this column Name based on unit Example: “page” page (or new column name) → Edit cells → Transform GREL value.rpartition(" pages")[0].rpartition(/\d+/)[1] pages → Edit column → Add column based on this column New column name "romanNumbered[word representing unit]" Example: “romanNumberedPage” GREL if (value.contains (","), value.rpartition(", ") [0], null ) Remove “pages” and “leaves”, any other excess columns, leaving only one column per pagination type per numbering type (page,romanNumberedPage,leaf,romanNumberedLeaf) Convert roman numerals to arabic numbers Edit cells → transform Clojure (defn ro2ar [r] (->> (reverse (.toUpperCase r)) (map {\M 1000 \D 500 \C 100 \L 50 \X 10 \V 5 \I 1}) (partition-by identity) (map (partial apply +)) (reduce #(if (< %1 %2) (+ %1 %2) (- %1 %2))))) (ro2ar value)	[ { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "pages", "expression": "value.replace(\"1 online resource (\",\"\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column pages using expression value.replace(\"1 online resource (\",\"\")" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "pages", "expression": "value.replace(\") :\",\"\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column pages using expression value.replace(\") :\",\"\")" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "pages", "expression": "value.trim()", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column pages using expression value.trim()" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "pages", "expression": "value.replace(/\\s+/,' ')", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column pages using expression value.replace(/\\s+/,' ')" }, { "op": "core/column-addition", "engineConfig": { "facets": [ { "type": "list", "name": "pages", "expression": "grel:value.contains(\"pages\")", "columnName": "pages", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "baseColumnName": "pages", "expression": "grel:value", "onError": "set-to-blank", "newColumnName": "page", "columnInsertIndex": 11, "description": "Create column page at index 11 based on column pages using expression grel:value" }, { "op": "core/text-transform", "engineConfig": { "facets": [ { "type": "list", "name": "pages", "expression": "grel:value.contains(\"pages\")", "columnName": "pages", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "page", "expression": "grel:value.rpartition(\"pages\")[0].rpartition(/\\d+/)[1]", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column page using expression grel:value.rpartition(\"pages\")[0].rpartition(/\\d+/)[1]" }, { "op": "core/column-addition", "engineConfig": { "facets": [ { "type": "list", "name": "pages", "expression": "grel:value.contains(\"pages\")", "columnName": "pages", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "baseColumnName": "pages", "expression": "grel:if (value.contains (\",\"),\nvalue.rpartition(\", \") [0],\nnull\n)", "onError": "set-to-blank", "newColumnName": "romanNumberedPage", "columnInsertIndex": 11, "description": "Create column romanNumberedPage at index 11 based on column pages using expression grel:if (value.contains (\",\"),\nvalue.rpartition(\", \") [0],\nnull\n)" }, { "op": "core/column-removal", "columnName": "pages", "description": "Remove column pages" }, { "op": "core/text-transform", "engineConfig": { "facets": [ { "type": "list", "name": "pages", "expression": "grel:value.contains(\"pages\")", "columnName": "pages", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "romanNumberedPage", "expression": "clojure:(defn ro2ar [r]\n(->> (reverse (.toUpperCase r))\n(map {\\M 1000 \\D 500 \\C 100 \\L 50 \\X 10 \\V 5 \\I 1})\n(partition-by identity)\n(map (partial apply +))\n(reduce #(if (< %1 %2) (+ %1 %2) (- %1 %2)))))\n(ro2ar value)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column romanNumberedPage using expression clojure:(defn ro2ar [r]\n(->> (reverse (.toUpperCase r))\n(map {\\M 1000 \\D 500 \\C 100 \\L 50 \\X 10 \\V 5 \\I 1})\n(partition-by identity)\n(map (partial apply +))\n(reduce #(if (< %1 %2) (+ %1 %2) (- %1 %2)))))\n(ro2ar value)" } ]
degreeSupervisor Facet → Custom text facet GREL value.contains("degree supervisor") If “false” is an option in facet menu, look at contributors and remove any who are not committee chairs Remove facet Delete any empty columns that result from this process Edit column → split into multiple columns Based on separator ; First column Edit cells → transform GREL if(value.contains("http:"), "http://" + value.rpartition("http://")[2], value.rpartition(/\-[a-z]+/)[0].rpartition(/\,degree supervisor/)[0].rpartition(/\d+/)[0].chomp(",") ) Subsequent columns Facet → customized facet → blank “False” GREL if(value.contains("http:"), "http://" + value.rpartition("http://")[2], value.rpartition(/\-[a-z]+/)[0].rpartition(/\,degree supervisor/)[0].rpartition(/\d+/)[0].chomp(",") ) All degreeSupervisor columns (one at a time): Facet → custom text facet GREL value.contains(",") “True” Edit column → add column based on this column Column name: inverseName GREL value inverseName → edit column → split into several columns by separator “,” degreeSupervisor → edit cells → transform GREL cells["inverseName 2"].value + " " + cells["inverseName 1"].value Remove facet Delete inverseName columns	[ { "op": "core/column-split", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "degreeSupervisor", "guessCellType": true, "removeOriginalColumn": true, "mode": "separator", "separator": ";", "regex": false, "maxColumns": 0, "description": "Split column degreeSupervisor by separator" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "degreeSupervisor 1", "expression": "grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column degreeSupervisor 1 using expression grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)" }, { "op": "core/text-transform", "engineConfig": { "facets": [ { "type": "list", "name": "degreeSupervisor 1", "expression": "isBlank(value)", "columnName": "degreeSupervisor 1", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": false, "l": "false" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "degreeSupervisor 1", "expression": "grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column degreeSupervisor 1 using expression grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)" }, { "op": "core/column-addition", "engineConfig": { "facets": [ { "type": "list", "name": "degreeSupervisor 1", "expression": "grel:value.contains(\",\")", "columnName": "degreeSupervisor 1", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "baseColumnName": "degreeSupervisor 1", "expression": "grel:value", "onError": "set-to-blank", "newColumnName": "inverseName", "columnInsertIndex": 13, "description": "Create column inverseName at index 13 based on column degreeSupervisor 1 using expression grel:value" }, { "op": "core/column-split", "engineConfig": { "facets": [ { "type": "list", "name": "degreeSupervisor 1", "expression": "grel:value.contains(\",\")", "columnName": "degreeSupervisor 1", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "inverseName", "guessCellType": true, "removeOriginalColumn": true, "mode": "separator", "separator": ",", "regex": false, "maxColumns": 0, "description": "Split column inverseName by separator" }, { "op": "core/text-transform", "engineConfig": { "facets": [ { "type": "list", "name": "degreeSupervisor 1", "expression": "grel:value.contains(\",\")", "columnName": "degreeSupervisor 1", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": true, "l": "true" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "degreeSupervisor 1", "expression": "grel:cells[\"inverseName 2\"].value + \" \" + cells[\"inverseName 1\"].value", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column degreeSupervisor 1 using expression grel:cells[\"inverseName 2\"].value + \" \" + cells[\"inverseName 1\"].value" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "degreeSupervisor 1", "expression": "value.trim()", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column degreeSupervisor 1 using expression value.trim()" }, { "op": "core/column-removal", "columnName": "inverseName 1", "description": "Remove column inverseName 1" }, { "op": "core/column-removal", "columnName": "inverseName 2", "description": "Remove column inverseName 2" } ]
hdlURI If several values exist in some cells, split into several columns and remove excess columns Edit column → create new column based on this column Column name hdl GREL value.substring(22)	[ { "op": "core/column-addition", "engineConfig": { "facets": [], "mode": "row-based" }, "baseColumnName": "hdlURI", "expression": "grel:value.substring(22)", "onError": "set-to-blank", "newColumnName": "hdl", "columnInsertIndex": 14, "description": "Create column hdl at index 14 based on column hdlURI using expression grel:value.substring(22)" } ]
OCLCID (applies to Alma exports only; Connexion exports do not need editing) Split multi-valued cells using ; as delimiter On each resulting column: Facet → custom text facet GREL value.contains("(OCoLC)") False Edit cells → replace Regex .* Replace with blank Remove facet Edit column → join columns Select all OCLCID columns Separate each column with ; Write result in new column named OCLCID Delete joined columns Edit cells → replace (OCoLC) Replace with blank Eyeball your values to make sure the column only has one numeric value of about the right length	[ { "op": "core/column-split", "engineConfig": { "facets": [ { "type": "list", "name": "OCLCID 3", "expression": "grel:value.contains(\"(OCoLC)\")", "columnName": "OCLCID 3", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": false, "l": "false" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "OCLCID", "guessCellType": true, "removeOriginalColumn": true, "mode": "separator", "separator": ";", "regex": false, "maxColumns": 0, "description": "Split column OCLCID by separator" }, { "op": "core/text-transform", "engineConfig": { "facets": [ { "type": "list", "name": "OCLCID 3", "expression": "grel:value.contains(\"(OCoLC)\")", "columnName": "OCLCID 3", "invert": false, "omitBlank": false, "omitError": false, "selection": [ { "v": { "v": false, "l": "false" } } ], "selectBlank": false, "selectError": false } ], "mode": "row-based" }, "columnName": "OCLCID 3", "expression": "value.replace(/./,\"\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column OCLCID 3 using expression value.replace(/./,\"\")" }, { "op": "core/column-addition", "engineConfig": { "facets": [], "mode": "row-based" }, "baseColumnName": "OCLCID 1", "expression": "join ([coalesce(cells['OCLCID 1'].value,''),coalesce(cells['OCLCID 2'].value,''),coalesce(cells['OCLCID 3'].value,'')],';')", "onError": "keep-original", "newColumnName": "OCLCID", "columnInsertIndex": 16, "description": "Create column OCLCID at index 16 based on column OCLCID 1 using expression join ([coalesce(cells['OCLCID 1'].value,''),coalesce(cells['OCLCID 2'].value,''),coalesce(cells['OCLCID 3'].value,'')],';')" }, { "op": "core/column-reorder", "columnNames": [ "title", "label", "subtitle", "statedAs", "creator", "genre", "Den", "submittedTo", "inception", "language", "romanNumberedPage", "page", "degreeSupervisor 1", "hdlURI", "hdl", "OCLCID" ], "description": "Reorder columns" }, { "op": "core/text-transform", "engineConfig": { "facets": [], "mode": "row-based" }, "columnName": "OCLCID", "expression": "value.replace(\"(OCoLC)\",\"\")", "onError": "keep-original", "repeat": false, "repeatCount": 10, "description": "Text transform on cells in column OCLCID using expression value.replace(\"(OCoLC)\",\"\")" } ]
label Edit column → add column based on this column Column name: "newItem” GREL value newItem → edit column → move column to beginning	[ { "op": "core/column-addition", "engineConfig": { "facets": [], "mode": "row-based" }, "baseColumnName": "label", "expression": "grel:value", "onError": "set-to-blank", "newColumnName": "newItem", "columnInsertIndex": 2, "description": "Create column newItem at index 2 based on column label using expression grel:value" }, { "op": "core/column-move", "columnName": "newItem", "index": 0, "description": "Move column newItem to position 0" } ]
ISBN Edit cells → split multi-valued cells Delimiter ; Edit column → add column based on this column Column name: “ISBN13” GREL value.match(/.(\d{13})./)[0] Edit column → add column based on this column Column name: “ISBN10” GREL value.match(/.(\d{10})./)[0] For Wikidata schema: ISBN13 and ISBN10 correspond with Wikidata properties P212 and P957

Reconcile in OpenRefine

Reconcile the following columns against Wikidata:

newItem
creator
genre
submittedTo
language
degreeSupervisor

Plug Your Data into a Wikidata Schema

From the Extensions menu, choose "Import Wikidata Schema"

Our current schema is available in this Github repository, and can be copied and pasted into OpenRefine
You can edit this base schema or create your own by selecting "Edit Wikidata schema" from the Extensions menu

Review Schema and Data
Specifically,

Use the available buttons, which closely mirror the Wikidata user interface, to click and drag or type in properties and values to adjust the schema to your dataset
Make sure your schema makes the right statements, without duplicating statements, for items that already exist (reconciled values). Faceting and selectively including/excluding rows using flags/stars is helpful with this
Make sure you have included references where they are needed
Use “Preview” to make sure your statements are correct
Tip: If you’re having trouble dragging a reconciled column into your schema, go back and make sure that you have selected “create new item” or reconciled the values in each row
- If you still have trouble, make sure you have the latest version of OpenRefine

From the Extensions menu, choose “Upload edits to Wikidata” (or, if you prefer, “Export to Quickstatements” and use Quickstatements to upload your edits. This is especially handy if you need to submit your edits for bot review prior to uploading to Wikidata)
Review new items and statements loaded to Wikidata

Useful Resources

Companion GitHub Repository
OpenRefine User Manual
GREL Functions
Jython and Clojure
RegEx Cheat Sheet
University of Washington Libraries Application Profile for Electronic Theses and Dissertations
University of Washington Libraries Github Repository for this Workflow