Wikidata:WikiProject PCC Wikidata Pilot/University of Washington/Workflow, Trainings, and Resources/MARC21 to Wikidata Workflow
University of Washington Libraries Linked Data Team
Crystal Clements (cec23@uw.edu)
Contributions by Thad Guidry
Last revised February 22, 2022
Workflow
editExport MARC Records (there are many right ways to do this)
edit- As a batch through OCLC Connexion
- As a published set through Alma
- You need the appropriate permissions in Alma to manage and run jobs on sets
- Create a set using Admin --> manage sets
- Run a job on the set to export the bibliographic records
- University of Washington
- Select job "Export Bibliographic Records"
- Save to UW FTP, subdirectory "MARC21" (You need permissions to access this file)
Convert Selected MARC Fields to a Tab-Delimited File (there are many right ways to do this)
edit- Get the newest version of MarcEdit
- Open the program "Export Tab Delimited"
- Set file paths for binary files going in, and tab-delimited files going out
- Leave default delimiter settings
- Add MARC fields to extract (this can be saved in a .txt file so you only have to manually enter them once for each type of entity)
- University of Washington export settings reflected in the table below (versions for MARC exported from Alma and OCLC Connexion available via GitHub)
- Export
Create a Project in OpenRefine
edit- Download the latest version of OpenRefine
- Note: OpenRefine version 3.4.1 runs with Java versions 8-15, but will not run on Java 16 or higher.
- Select "choose files", and choose the tab-delimited file(s) you just created in MarcEdit and select "Next"
- Name columns: title,subtitle,statedAs,creator,genre,submittedTo,inception,language,pages,degreeSupervisor,hdlURI,OCLCID,ISBN (for easy interoperability with our schema)
- ISBN is included here for other workflows, but is not applicable to our electronic theses and dissertations. Instructions for this data point are italicized and are missing from the JSON history. Thank you to Kyla Jemison from the University of Toronto Libraries for this addition.
MARC21 Field for Export | Column Name in OpenRefine |
---|---|
245 $a | title |
245 $b | subtitle |
245 $c | statedAs |
100 | creator |
502 $b | genre |
502 $c | submittedTo |
502 $d | inception |
008 | language |
300 $a | pages |
700 | degreeSupervisor |
856 $u | hdlURI |
035 (for Alma)...001 if exporting from Connexion | OCLCID |
020 $a | ISBN |
- Instruct OpenRefine to discard the first row of data
- Uncheck the option to 'Use character " to enclose cells containing column separators'
- Name your project
- "Create project"
Manipulate MARC Data in OpenRefine
editEither:
Click "Apply" in the "Undo/Redo" tab
Paste the extracted JSON history from this GitHub repository into the text box
Click "Perform Operations"
Review your data
Or:
Follow the steps in the table below
OpenRefine Cleanup: GUI Instructions | OpenRefine Cleanup: Apply JSON |
---|---|
title and subtitle
|
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "title",
"expression": "jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column title using expression jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "subtitle",
"expression": "jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column subtitle using expression jython:import re\nexceptions = [\"and\", \"or\", \"the\", \"a\", \"of\", \"in\", \"to\", \"an\", \"for\", \"with\", \"by\", \"at\", \"if\", \"as\", \"from\", \"nor\", \"but\", \"yet\", \"so\", \"through\", \"near\", \"across\", \"about\", \"on\", \"amid\"]\ntitle = value\nlowercase_words = re.split(\" \", title.lower())\nfinal_words = [lowercase_words[0].capitalize()]\nfinal_words += [word if word in exceptions else word.capitalize() for word in lowercase_words[1:]]\nfinal_title = \" \".join(final_words)\nreturn (final_title)"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "title",
"expression": "value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column title using expression value.trim()"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "subtitle",
"expression": "value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column subtitle using expression value.trim()"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "title",
"expression": "value.replace(/\\s+/,' ')",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column title using expression value.replace(/\\s+/,' ')"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "subtitle",
"expression": "value.replace(/\\s+/,' ')",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column subtitle using expression value.replace(/\\s+/,' ')"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "title",
"expression": "grel:value.chomp(\" :\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column title using expression grel:value.chomp(\" :\")"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "title",
"expression": "grel:value.chomp(\" /\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column title using expression grel:value.chomp(\" /\")"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "subtitle",
"expression": "grel:value.chomp(\" :\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column subtitle using expression grel:value.chomp(\" :\")"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "subtitle",
"expression": "grel:value.chomp(\" /\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column subtitle using expression grel:value.chomp(\" /\")"
}
]
|
Create label column
|
[
{
"op": "core/column-addition",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "subtitle",
"expression": "isBlank(value)",
"columnName": "subtitle",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"baseColumnName": "title",
"expression": "grel:cells.title.value + \": \" + cells.subtitle.value",
"onError": "set-to-blank",
"newColumnName": "label",
"columnInsertIndex": 1,
"description": "Create column label at index 1 based on column title using expression grel:cells.title.value + \": \" + cells.subtitle.value"
},
{
"op": "core/column-addition",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "subtitle",
"expression": "isBlank(value)",
"columnName": "subtitle",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"baseColumnName": "title",
"expression": "grel:value",
"onError": "set-to-blank",
"newColumnName": "label2",
"columnInsertIndex": 1,
"description": "Create column label2 at index 1 based on column title using expression grel:value"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "label",
"expression": "join ([cells['label'].value,cells['label2'].value],'')",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column label using expression join ([cells['label'].value,cells['label2'].value],'')"
},
{
"op": "core/column-reorder",
"columnNames": [
"title",
"label",
"subtitle",
"statedAs",
"creator",
"genre",
"submittedTo",
"inception",
"language",
"pages",
"degreeSupervisor",
"hdlURI",
"OCLCID"
],
"description": "Reorder columns"
}
]
|
statedAs
|
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "statedAs",
"expression": "grel:value.chomp(\".\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column statedAs using expression grel:value.chomp(\".\")"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "statedAs",
"expression": "value.replace(\"by \",\"\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column statedAs using expression value.replace(\"by \",\"\")"
}
]
|
creator
"http://" + value.rpartition("http://")[2], value.rpartition(/\-[a-z]+/)[0].rpartition(/\,author/)[0].rpartition(/\d+/)[0].chomp(",") )
(cells.statedAs.value), (value)) |
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "creator",
"expression": "grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,author/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column creator using expression grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,author/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "creator",
"expression": "grel:if(value.contains(\",\"),\n(cells.statedAs.value),\n(value))",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column creator using expression grel:if(value.contains(\",\"),\n(cells.statedAs.value),\n(value))"
}
]
|
genre
|
[
{
"op": "core/mass-edit",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "genre",
"expression": "value",
"edits": [
{
"from": [
"Ph. D."
],
"fromBlank": false,
"fromError": false,
"to": "doctoral thesis"
}
],
"description": "Mass edit cells in column genre"
},
{
"op": "core/mass-edit",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "genre",
"expression": "value",
"edits": [
{
"from": [
"M.S."
],
"fromBlank": false,
"fromError": false,
"to": "master's thesis"
}
],
"description": "Mass edit cells in column genre"
}
]
|
inception
|
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "inception",
"expression": "grel:value.chomp(\".\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column inception using expression grel:value.chomp(\".\")"
}
]
|
Create Den (description in English)
|
[
{
"op": "core/column-addition",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"baseColumnName": "genre",
"expression": "grel:value + \" by \" + cells.statedAs.value + \", \" + cells.inception.value",
"onError": "set-to-blank",
"newColumnName": "Den",
"columnInsertIndex": 6,
"description": "Create column Den at index 6 based on column genre using expression grel:value + \" by \" + cells.statedAs.value + \", \" + cells.inception.value"
}
]
|
language
"Q1860", value)
null, value) |
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "language",
"expression": "grel:if(value.contains(\"eng\"),\n\"Q1860\",\nvalue)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column language using expression grel:if(value.contains(\"eng\"),\n\"Q1860\",\nvalue)"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "language",
"expression": "grel:if(value.contains(\"zxx\"),\nnull,\nvalue)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column language using expression grel:if(value.contains(\"zxx\"),\nnull,\nvalue)"
}
]
|
pages
value.rpartition(", ") [0], null )
(->> (reverse (.toUpperCase r)) (map {\M 1000 \D 500 \C 100 \L 50 \X 10 \V 5 \I 1}) (partition-by identity) (map (partial apply +)) (reduce #(if (< %1 %2) (+ %1 %2) (- %1 %2))))) (ro2ar value) |
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "pages",
"expression": "value.replace(\"1 online resource (\",\"\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column pages using expression value.replace(\"1 online resource (\",\"\")"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "pages",
"expression": "value.replace(\") :\",\"\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column pages using expression value.replace(\") :\",\"\")"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "pages",
"expression": "value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column pages using expression value.trim()"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "pages",
"expression": "value.replace(/\\s+/,' ')",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column pages using expression value.replace(/\\s+/,' ')"
},
{
"op": "core/column-addition",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "pages",
"expression": "grel:value.contains(\"pages\")",
"columnName": "pages",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"baseColumnName": "pages",
"expression": "grel:value",
"onError": "set-to-blank",
"newColumnName": "page",
"columnInsertIndex": 11,
"description": "Create column page at index 11 based on column pages using expression grel:value"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "pages",
"expression": "grel:value.contains(\"pages\")",
"columnName": "pages",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "page",
"expression": "grel:value.rpartition(\"pages\")[0].rpartition(/\\d+/)[1]",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column page using expression grel:value.rpartition(\"pages\")[0].rpartition(/\\d+/)[1]"
},
{
"op": "core/column-addition",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "pages",
"expression": "grel:value.contains(\"pages\")",
"columnName": "pages",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"baseColumnName": "pages",
"expression": "grel:if (value.contains (\",\"),\nvalue.rpartition(\", \") [0],\nnull\n)",
"onError": "set-to-blank",
"newColumnName": "romanNumberedPage",
"columnInsertIndex": 11,
"description": "Create column romanNumberedPage at index 11 based on column pages using expression grel:if (value.contains (\",\"),\nvalue.rpartition(\", \") [0],\nnull\n)"
},
{
"op": "core/column-removal",
"columnName": "pages",
"description": "Remove column pages"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "pages",
"expression": "grel:value.contains(\"pages\")",
"columnName": "pages",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "romanNumberedPage",
"expression": "clojure:(defn ro2ar [r]\n(->> (reverse (.toUpperCase r))\n(map {\\M 1000 \\D 500 \\C 100 \\L 50 \\X 10 \\V 5 \\I 1})\n(partition-by identity)\n(map (partial apply +))\n(reduce #(if (< %1 %2) (+ %1 %2) (- %1 %2)))))\n(ro2ar value)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column romanNumberedPage using expression clojure:(defn ro2ar [r]\n(->> (reverse (.toUpperCase r))\n(map {\\M 1000 \\D 500 \\C 100 \\L 50 \\X 10 \\V 5 \\I 1})\n(partition-by identity)\n(map (partial apply +))\n(reduce #(if (< %1 %2) (+ %1 %2) (- %1 %2)))))\n(ro2ar value)"
}
]
|
degreeSupervisor
"http://" + value.rpartition("http://")[2], value.rpartition(/\-[a-z]+/)[0].rpartition(/\,degree supervisor/)[0].rpartition(/\d+/)[0].chomp(",") )
"http://" + value.rpartition("http://")[2], value.rpartition(/\-[a-z]+/)[0].rpartition(/\,degree supervisor/)[0].rpartition(/\d+/)[0].chomp(",") )
|
[
{
"op": "core/column-split",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "degreeSupervisor",
"guessCellType": true,
"removeOriginalColumn": true,
"mode": "separator",
"separator": ";",
"regex": false,
"maxColumns": 0,
"description": "Split column degreeSupervisor by separator"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "degreeSupervisor 1",
"expression": "grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column degreeSupervisor 1 using expression grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "degreeSupervisor 1",
"expression": "isBlank(value)",
"columnName": "degreeSupervisor 1",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "degreeSupervisor 1",
"expression": "grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column degreeSupervisor 1 using expression grel:if(value.contains(\"http:\"),\n\"http://\" + value.rpartition(\"http://\")[2],\nvalue.rpartition(/\\-[a-z]+/)[0].rpartition(/\\,degree supervisor/)[0].rpartition(/\\d+/)[0].chomp(\",\")\n)"
},
{
"op": "core/column-addition",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "degreeSupervisor 1",
"expression": "grel:value.contains(\",\")",
"columnName": "degreeSupervisor 1",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"baseColumnName": "degreeSupervisor 1",
"expression": "grel:value",
"onError": "set-to-blank",
"newColumnName": "inverseName",
"columnInsertIndex": 13,
"description": "Create column inverseName at index 13 based on column degreeSupervisor 1 using expression grel:value"
},
{
"op": "core/column-split",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "degreeSupervisor 1",
"expression": "grel:value.contains(\",\")",
"columnName": "degreeSupervisor 1",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "inverseName",
"guessCellType": true,
"removeOriginalColumn": true,
"mode": "separator",
"separator": ",",
"regex": false,
"maxColumns": 0,
"description": "Split column inverseName by separator"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "degreeSupervisor 1",
"expression": "grel:value.contains(\",\")",
"columnName": "degreeSupervisor 1",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "degreeSupervisor 1",
"expression": "grel:cells[\"inverseName 2\"].value + \" \" + cells[\"inverseName 1\"].value",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column degreeSupervisor 1 using expression grel:cells[\"inverseName 2\"].value + \" \" + cells[\"inverseName 1\"].value"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "degreeSupervisor 1",
"expression": "value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column degreeSupervisor 1 using expression value.trim()"
},
{
"op": "core/column-removal",
"columnName": "inverseName 1",
"description": "Remove column inverseName 1"
},
{
"op": "core/column-removal",
"columnName": "inverseName 2",
"description": "Remove column inverseName 2"
}
]
|
hdlURI
|
[
{
"op": "core/column-addition",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"baseColumnName": "hdlURI",
"expression": "grel:value.substring(22)",
"onError": "set-to-blank",
"newColumnName": "hdl",
"columnInsertIndex": 14,
"description": "Create column hdl at index 14 based on column hdlURI using expression grel:value.substring(22)"
}
]
|
OCLCID (applies to Alma exports only; Connexion exports do not need editing)
|
[
{
"op": "core/column-split",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "OCLCID 3",
"expression": "grel:value.contains(\"(OCoLC)\")",
"columnName": "OCLCID 3",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "OCLCID",
"guessCellType": true,
"removeOriginalColumn": true,
"mode": "separator",
"separator": ";",
"regex": false,
"maxColumns": 0,
"description": "Split column OCLCID by separator"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "OCLCID 3",
"expression": "grel:value.contains(\"(OCoLC)\")",
"columnName": "OCLCID 3",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": false,
"l": "false"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "row-based"
},
"columnName": "OCLCID 3",
"expression": "value.replace(/.*/,\"\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column OCLCID 3 using expression value.replace(/.*/,\"\")"
},
{
"op": "core/column-addition",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"baseColumnName": "OCLCID 1",
"expression": "join ([coalesce(cells['OCLCID 1'].value,''),coalesce(cells['OCLCID 2'].value,''),coalesce(cells['OCLCID 3'].value,'')],';')",
"onError": "keep-original",
"newColumnName": "OCLCID",
"columnInsertIndex": 16,
"description": "Create column OCLCID at index 16 based on column OCLCID 1 using expression join ([coalesce(cells['OCLCID 1'].value,''),coalesce(cells['OCLCID 2'].value,''),coalesce(cells['OCLCID 3'].value,'')],';')"
},
{
"op": "core/column-reorder",
"columnNames": [
"title",
"label",
"subtitle",
"statedAs",
"creator",
"genre",
"Den",
"submittedTo",
"inception",
"language",
"romanNumberedPage",
"page",
"degreeSupervisor 1",
"hdlURI",
"hdl",
"OCLCID"
],
"description": "Reorder columns"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "OCLCID",
"expression": "value.replace(\"(OCoLC)\",\"\")",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column OCLCID using expression value.replace(\"(OCoLC)\",\"\")"
}
]
|
label
|
[
{
"op": "core/column-addition",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"baseColumnName": "label",
"expression": "grel:value",
"onError": "set-to-blank",
"newColumnName": "newItem",
"columnInsertIndex": 2,
"description": "Create column newItem at index 2 based on column label using expression grel:value"
},
{
"op": "core/column-move",
"columnName": "newItem",
"index": 0,
"description": "Move column newItem to position 0"
}
]
|
ISBN
For Wikidata schema: ISBN13 and ISBN10 correspond with Wikidata properties P212 and P957 |
Reconcile in OpenRefine
editReconcile the following columns against Wikidata:
edit- newItem
- creator
- genre
- submittedTo
- language
- degreeSupervisor
Plug Your Data into a Wikidata Schema
edit From the Extensions menu, choose "Import Wikidata Schema"
- Our current schema is available in this Github repository, and can be copied and pasted into OpenRefine
- You can edit this base schema or create your own by selecting "Edit Wikidata schema" from the Extensions menu
Review Schema and Data
Specifically,
- Use the available buttons, which closely mirror the Wikidata user interface, to click and drag or type in properties and values to adjust the schema to your dataset
- Make sure your schema makes the right statements, without duplicating statements, for items that already exist (reconciled values). Faceting and selectively including/excluding rows using flags/stars is helpful with this
- Make sure you have included references where they are needed
- Use “Preview” to make sure your statements are correct
- Tip: If you’re having trouble dragging a reconciled column into your schema, go back and make sure that you have selected “create new item” or reconciled the values in each row
- If you still have trouble, make sure you have the latest version of OpenRefine
From the Extensions menu, choose “Upload edits to Wikidata” (or, if you prefer, “Export to Quickstatements” and use Quickstatements to upload your edits. This is especially handy if you need to submit your edits for bot review prior to uploading to Wikidata)
Review new items and statements loaded to Wikidata
Useful Resources
editCompanion GitHub Repository
OpenRefine User Manual
GREL Functions
Jython and Clojure
RegEx Cheat Sheet
University of Washington Libraries Application Profile for Electronic Theses and Dissertations
University of Washington Libraries Github Repository for this Workflow