Wikidata:WikiProject PCC Wikidata Pilot/Fondren Library, Rice University
Aim and Scope
The project aims to expand the authority and subject work of a Rice Historic Images collection. The goals are to enrich both library and Wikidata records through reconciliation, then publish the results on Wikidata. The initial scope is limited to subjects within this collection, though it may be expanded to other collections or departments in the future (for example, music collections or current faculty).
Background
Rice has a collection of university- and Houston-related historic images. Part of the effort to digitize these holdings included the creation of a local thesaurus. The thesaurus could be expanded and made available for researchers online.
Timeline and Workflow
A rough timeline follows; a fair amount of the time will be spent researching data models:
1. Survey and expand Rice University entry - add schools/departments as needed
2. Survey and expand Houston area entities that may be relevant to the collection
3. Reconcile the local thesaurus against various sources available - ensure entities meet notability guidelines
4. Publish new entities/expand existing entities
5. Bring Wikidata URIs in as tags in the repository.
Contributors
- Archiverlandson (talk • contribs • logs)
Tasks
1. Complete - Survey and expand Rice University entry - add schools/departments as needed
- On a separate project I was able to obtain a listing of schools, departments, and current faculty. Because this flat file showed the relationships, expanding and relating schools and departments was a straightforward task. A few edits were made directly in Wikidata, but OpenRefine includes tools for batch editing. It works by creating an ontology (a schema, in OpenRefine's terminology) that dictates how values in columns translate to Wikidata statements. As of the beginning of September some tidying remains and a few inconsistencies need to be resolved, but the entities exist and can be linked to and fro. The inconsistencies stemmed from a couple of areas and are learning opportunities:
- 1. A mix of approaches to the work. I wanted to first get a feel for manual editing of entities in Wikidata (WD) and then move to OpenRefine (OR), so that I knew how actions in OR related to WD. The work done in OR is likely a bit more consistent because it applies statements for each row in a given column. It's a great tool provided your data is set up properly (though that can be time-consuming).
- 2. Developing an ontology/data model. I examined a few comparable entities on WD to get an idea of the properties I'd be working with. Not knowing what properties existed on WD was the first hurdle; implementing them consistently on everything I worked on was another. Keeping a tab open to a similar entity helped, but drawing from multiple examples meant more potential points of human failure and/or conflicting ways of implementing similar statements (see the has subsidiary (P355) vs. part of (P361) note under the University heading below). Going forward, I am working to develop a model before making batch edits.
2. Survey and expand Houston area entities that may be relevant to the collection
3. Reconcile the local thesaurus against various sources available - ensure entities meet notability guidelines
4. Publish new entities/expand existing entities
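The column-to-statement idea described under task 1 can be illustrated with a minimal Python sketch. This is not how OpenRefine works internally; it only mimics the notion of a schema that is applied to every row of a flat file. The column names (`school_qid`, `department_qid`) and the identifiers in the sample data are hypothetical placeholders, while P527 and P361 are the properties used in the data models on this page.

```python
import csv
import io

# Hypothetical flat file: one row per school/department pair.
# The identifiers are placeholders, not real Wikidata items.
FLAT_FILE = """school_qid,department_qid
Q_SCHOOL_A,Q_DEPT_1
Q_SCHOOL_A,Q_DEPT_2
Q_SCHOOL_B,Q_DEPT_3
"""

# Schema rules: (subject column, property, value column).
SCHEMA = [
    ("school_qid", "P527", "department_qid"),  # school has part(s) department
    ("department_qid", "P361", "school_qid"),  # department part of school
]


def rows_to_statements(text, schema):
    """Apply every schema rule to every row of the flat file."""
    statements = []
    for row in csv.DictReader(io.StringIO(text)):
        for subject_col, prop, value_col in schema:
            statements.append((row[subject_col], prop, row[value_col]))
    return statements


statements = rows_to_statements(FLAT_FILE, SCHEMA)
for statement in statements:
    print(statement)
```

Because each rule runs on every row, the resulting statements are uniform, which is the consistency benefit noted above; the cost is that the flat file has to be clean before the rules are applied.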
Example Tables:
Input
Thesaurus/Source/"Thesourceus" Example
Heading | Heading Notes | Source | LC Authority ID #s | Heading Type | Notes from UNIT |
---|---|---|---|---|---|
"Ricobot I" (Robot) | | LOCAL | none | Name, Corporate | |
Academic costume | e.g. caps & gowns. Can be subdivided geographically | LCSH | sh 85000298 | Topic | |
Adair, Linda--Faculty | | LOCAL | none | Name, Personal | |
The focus will need to be on headings. Strings will need some initial transformations: rearranging names into a "First Last" standard, removing subdivisions, etc. Heading Notes for local entries are largely unused. Source and Heading Type will be used to identify the priority work (LOCAL and "Name, Personal", respectively). Entries that have an LC ID should still be examined to see whether they are in Wikidata and, if so, whether their statements are close to the data models below. Notes from UNIT are internal notes on the development of the thesaurus and can mostly be ignored.
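The heading transformations described above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual cleanup procedure: the function names are hypothetical, and it assumes that a double hyphen always introduces a subdivision and that personal names contain a single "Last, First" comma, as in the sample rows.

```python
def normalize_heading(heading, heading_type):
    """Drop subdivisions and, for personal names, reorder to "First Last"."""
    # Assumption: subdivisions follow a double hyphen ("Adair, Linda--Faculty").
    base = heading.split("--", 1)[0].strip()
    if heading_type == "Name, Personal" and "," in base:
        # Assumption: exactly one comma separating surname and forename.
        last, first = (part.strip() for part in base.split(",", 1))
        return f"{first} {last}"
    return base


def is_priority(source, heading_type):
    """Priority work: local headings that are personal names."""
    return source == "LOCAL" and heading_type == "Name, Personal"


# Rows taken from the example table: (heading, source, heading type).
rows = [
    ('"Ricobot I" (Robot)', "LOCAL", "Name, Corporate"),
    ("Academic costume", "LCSH", "Topic"),
    ("Adair, Linda--Faculty", "LOCAL", "Name, Personal"),
]

for heading, source, heading_type in rows:
    print(normalize_heading(heading, heading_type), is_priority(source, heading_type))
```

Headings that carry dates or titles after the name would need extra handling before a rule like this could be applied in bulk.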
Wikidata Entry
General Rules
Follow Wikidata suggestions for qualifying statements where possible.
Provide a reference URL for most statements. Exceptions may be made when applying a classification statement such as instance of (P31).
Provide retrieved (P813) dates when referencing URLs.
References may be skipped in statements of fact.
When editing or creating an item related to this project, add on focus list of Wikimedia project (P5008) with the value WikiProject PCC Wikidata Pilot/Fondren Library, Rice University (Q100152473).
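As a rough illustration of the referencing rules, the sketch below models a statement as a plain Python dictionary and flags statements that lack a reference URL or a retrieved (P813) date. The dictionary layout and function names are hypothetical and do not correspond to any Wikidata tool's format. The property reference URL (P854) is not named in the rules above and is assumed here, as is the choice to exempt the focus-list statement (P5008) alongside instance of (P31). The "statements of fact" exception is not modelled because the rules do not define it.

```python
from datetime import date

REFERENCE_URL = "P854"  # reference URL (assumed; not named in the rules above)
RETRIEVED = "P813"      # retrieved

# Statements that may go unreferenced. P31 comes from the rules;
# exempting P5008 is this sketch's own assumption.
REFERENCE_EXEMPT = {"P31", "P5008"}


def make_statement(prop, value, url=None, retrieved=None):
    """Build a statement record, attaching a reference when a URL is given."""
    reference = {}
    if url is not None:
        reference = {
            REFERENCE_URL: url,
            RETRIEVED: (retrieved or date.today()).isoformat(),
        }
    return {"property": prop, "value": value, "reference": reference}


def check_statement(statement):
    """Return a list of rule violations for one statement record."""
    problems = []
    reference = statement["reference"]
    if not reference and statement["property"] not in REFERENCE_EXEMPT:
        problems.append("missing reference URL")
    if reference and RETRIEVED not in reference:
        problems.append("missing retrieved date")
    return problems


# Example with placeholder URLs.
website = make_statement(
    "P856", "https://example.org/",
    url="https://example.org/about", retrieved=date(2020, 9, 1),
)
print(check_statement(website))                       # no problems expected
print(check_statement(make_statement("P17", "Q30")))  # flagged: no reference
```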
University
Rice University already had a fairly complete entry, so the work focused on expanding the schools. The university is related to its schools with has subsidiary (P355), although other similar entries rely on part of (P361) for this relationship; this could be changed. Some centers had already been entered here; these were moved under the appropriate school.
Schools
Schools relate to university via parent organization (P749).
Schools relate to departments via has part(s) (P527).
Property | Value | Requirements | Notes |
---|---|---|---|
Label | String - Entity's name | Required | Label based off the official about page. |
Description | String - Describe the entity as related to the school. | Optional | |
Alias | String - Known variations of the Label | Optional | Generally acronyms that were also available on the about page (rare). |
instance of (P31) | faculty (Q180958) and organization (Q43229)* | Required | *organization (Q43229) can be replaced with a more specific school code, so long as its class can be traced back to organization (Q43229) |
inception (P571) | String - Year | Optional | |
named after (P138) | Q entity | Optional | Wikipedia has been helpful in identifying appropriate names/Q numbers |
country (P17) | United States of America (Q30) | Required | |
located in the administrative territorial entity (P131) | Houston (Q16555) | Required | |
coordinate location (P625) | Coordinates in Degrees, Minutes, Seconds | Optional | |
parent organization (P749) | Rice University (Q842909) | Required | |
affiliation (P1416) | Q entities | Optional | Multiple values allowed |
official website (P856) | String - URL | Required | Qualify with English |
has part(s) (P527) | Q entities | Required | Multiple values allowed; list all applicable departments |
Departments
Departments relate to schools via part of (P361).
Property | Value | Requirements | Notes |
---|---|---|---|
Label | String - Entity's name | Required | Naming convention "Rice University _____ Department" |
Description | String - Describe the entity as related to the department. | Optional | |
Alias | String - Known variations of the Label | Optional | Generally acronyms that were also available on the about page (rare). |
instance of (P31) | academic department (Q2467461) | Required | |
inception (P571) | String - Year | Optional | |
country (P17) | United States of America (Q30) | Required | |
located in the administrative territorial entity (P131) | Houston (Q16555) | Required | |
field of work (P101) | Q entities | Required | Choose the most specific Q entity available. Single/Multiple values based on degrees granted? |
part of (P361) | Q entity | Required | The Q ID of the parent school |
official website (P856) | String - URL | Required | Qualify with English |
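Since schools list their departments with has part(s) (P527) and departments point back with part of (P361), the two directions can drift apart when edits are made by hand. A minimal sketch of a consistency check is shown below. The input dictionaries and identifiers are hypothetical placeholders, and the sketch assumes each department belongs to exactly one school.

```python
def find_mismatches(schools, departments):
    """Compare both directions of the school/department relationship.

    schools:     {school_id: set of department ids listed under P527}
    departments: {department_id: school id given under P361}
    """
    problems = []
    # Every department's parent school should list it under has part(s).
    for dept, school in departments.items():
        if dept not in schools.get(school, set()):
            problems.append(f"{school} is missing has part(s) (P527) -> {dept}")
    # Every listed department should point back with part of.
    for school, parts in schools.items():
        for dept in sorted(parts):
            if departments.get(dept) != school:
                problems.append(f"{dept} is missing part of (P361) -> {school}")
    return problems


# Placeholder identifiers, not real Wikidata items.
schools = {"SCHOOL_A": {"DEPT_1", "DEPT_2"}, "SCHOOL_B": set()}
departments = {"DEPT_1": "SCHOOL_A", "DEPT_3": "SCHOOL_B"}

for problem in find_mismatches(schools, departments):
    print(problem)
```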
Faculty
Entries under Faculty should meet notability guidelines, reviewable at Wikidata:Notability.
Property | Value | Requirements | Notes |
---|---|---|---|
Label | String - Entity's name | Required | Names should be in standard format i.e. "First Last". Align with Wikipedia where possible. |
Description | String - Describe the entity as related to department. | Optional | Descriptions are generally all lowercase and omit punctuation. |
Alias | String - Known variations of the Label | Optional | Middle names/initials; do not rearrange the name format (e.g. to "Last, First"). |
instance of (P31) | A Q entity, usually human (Q5) | Required | |
occupation (P106) | Q entities, usually university teacher (Q1622272) | Required | Multiple values allowed, suggestions: professor (Q121594), researcher (Q1650915), dean (Q723682), director (Q1162163) and/or those based on field of work |
employer (P108) | Rice University (Q842909) | Required | Qualify with start/end dates if available. |
work location (P937) | Houston (Q16555) | Required | |
field of work (P101) | Q entities, academic discipline(s) | Required | Multiple values allowed. Base this off of choices made for occupation (P106) |
part of (P361) | Q entities, academic department(s) | Required | Multiple values allowed; overview of each department can be reviewed under each school |
official website (P856) | String - URL (Usually Rice if they're still employed, may be elsewhere) | Optional | |
Library of Congress authority ID (P244)* | String - LoC name authority ID | Required | |
VIAF ID (P214)* | String - VIAF ID | Optional | |
on focus list of Wikimedia project (P5008) | WikiProject PCC Wikidata Pilot/Fondren Library, Rice University (Q100152473) | Required | Does not add notability to a given entity, but will help us track/report outcomes. |
*Other identifiers should be added if they're available. ORCID may be a good fit for current faculty, but will be sparse for the HI Thesaurus.
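Before a batch upload, draft faculty records could be screened against the Required rows of the table above. The following is a minimal sketch under stated assumptions: the record layout (`label` plus a `claims` dictionary keyed by property ID) and the function name are hypothetical, and the draft record is an invented placeholder rather than a real person.

```python
# Required properties, taken from the Faculty table above.
FACULTY_REQUIRED = {
    "P31": "instance of",
    "P106": "occupation",
    "P108": "employer",
    "P937": "work location",
    "P101": "field of work",
    "P361": "part of",
    "P244": "Library of Congress authority ID",
    "P5008": "on focus list of Wikimedia project",
}


def missing_required(item):
    """List the required fields that a draft faculty record still lacks."""
    missing = []
    if not item.get("label"):
        missing.append("Label")
    claims = item.get("claims", {})
    for pid, name in FACULTY_REQUIRED.items():
        if not claims.get(pid):
            missing.append(f"{name} ({pid})")
    return missing


# Placeholder draft using values suggested in the table.
draft = {
    "label": "Example Person",
    "claims": {
        "P31": ["Q5"],
        "P106": ["Q1622272"],
        "P108": ["Q842909"],
        "P937": ["Q16555"],
    },
}

for field in missing_required(draft):
    print(field)
```

A check like this only confirms that a property is present; whether the chosen values are appropriate still needs human review.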