Wikidata:WikiProject PCC Wikidata Pilot/Fondren Library, Rice University

Aim and Scope

The project aims to expand the authority and subject work of a Rice Historic Images collection. The goals are to enrich both the library's data and Wikidata through reconciliation, then publish the results on Wikidata. The initial scope is the subjects within this collection, though it may be expanded to other collections or departments in the future (for example, music collections or current faculty).

Background

Rice has a collection of university- and Houston-related historic images. Part of the effort to digitize these holdings included the creation of a local thesaurus. The thesaurus could be expanded and made available for researchers online.

Timeline and Workflow

A rough timeline follows; a fair amount of time will be spent researching data models:
1. Survey and expand Rice University entry - add schools/departments as needed
2. Survey and expand Houston area entities that may be relevant to the collection
3. Reconcile the local thesaurus against various sources available - ensure entities meet notability guidelines
4. Publish new entities/expand existing entities
5. Bring Wikidata URIs in as tags in the repository.

Contributors

Tasks

1. Complete - Survey and expand Rice University entry - add schools/departments as needed

On a separate project I obtained a listing of schools, departments, and current faculty. Because this flat file showed the relationships, expanding and relating the schools and departments was a straightforward task. A few edits were implemented directly in Wikidata (WD), but OpenRefine (OR) includes tools for batch editing. It works by defining a schema, in effect an ontology, that dictates how values in columns translate to Wikidata statements. As of the beginning of September some tidying remains and a few inconsistencies need resolving, but the entities exist and can be linked to and from one another. The inconsistencies stemmed from a couple of areas and are learning opportunities:
1. A mix of approaches to the work. I wanted to get a feel for manually editing entities in WD first and then move to OR, so that I knew how actions in OR related to WD. The work done in OR is likely a bit more consistent because it applies the same statements to every row in a given column. It's a great tool provided your data is set up properly (though that can be time-consuming).
2. Developing an ontology/data model. I examined a few comparable entities on WD to get an idea of the properties I'd be working with. Not knowing what properties existed on WD was the first hurdle; applying them consistently to everything I worked on was another. Keeping a tab open to a similar entity helped, but drawing from multiple examples meant more potential points of human failure and/or conflicting ways of implementing similar statements (see the has subsidiary (P355) vs. part of (P361) note under the University heading below). Going forward, I am working to develop a model before making batch edits.
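
The column-to-statement idea described above can be sketched in a few lines of Python. This is only an illustration, not the OpenRefine schema used for the project: the flat-file layout, the column names, and the placeholder IDs such as Q_SCHOOL_A are assumptions, while the properties are the ones named in the data models on this page.

```python
import csv
import io

# Hypothetical flat file: one row per school/department pair.
# The Q_SCHOOL_* and Q_DEPT_* values are placeholders, not real Wikidata items.
FLAT_FILE = """school_qid,department_qid
Q_SCHOOL_A,Q_DEPT_1
Q_SCHOOL_A,Q_DEPT_2
Q_SCHOOL_B,Q_DEPT_3
"""

UNIVERSITY = "Q842909"  # Rice University, as given on this page


def rows_to_statements(text):
    """Turn each row into (subject, property, value) triples."""
    statements = set()
    for row in csv.DictReader(io.StringIO(text)):
        school, dept = row["school_qid"], row["department_qid"]
        statements.add((school, "P749", UNIVERSITY))  # parent organization
        statements.add((school, "P527", dept))        # has part(s)
        statements.add((dept, "P361", school))        # part of
    return sorted(statements)


statements = rows_to_statements(FLAT_FILE)
for subject, prop, value in statements:
    print(subject, prop, value)
```

Because every row passes through the same mapping, each column yields the same kind of statement for every row, which is the consistency benefit noted above.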

2. Survey and expand Houston area entities that may be relevant to the collection
3. Reconcile the local thesaurus against various sources available - ensure entities meet notability guidelines
4. Publish new entities/expand existing entities

Example Tables:

Input

Thesaurus/Source/"Thesourceus" Example

| Heading | Heading Notes | Source | LC Authority ID #s | Heading Type | Notes from UNIT |
| "Ricobot I" (Robot) | | LOCAL | none | Name, Corporate | |
| Academic costume | e.g. caps & gowns. Can be subdivided geographically | LCSH | sh 85000298 | Topic | |
| Adair, Linda--Faculty | | LOCAL | none | Name, Personal | |

The focus will need to be on headings. Heading strings will need some initial transformations: rearranging names into a "First Last" standard, removing subdivisions, etc. Heading Notes for local entries are largely unused. Source and Heading Type will be used to identify the priority work (LOCAL and "Name, Personal" respectively). Entries that have an LC ID should still be checked to see whether they are in Wikidata and, if so, whether their statements are close to the data models below. Notes from UNIT are internal notes on the development of the thesaurus and can mostly be ignored.
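
A minimal sketch of these transformations and the priority filter, assuming that subdivisions are always introduced by "--" and that personal names are a simple "Last, First" pair. The function names are hypothetical; the sample rows are the three from the example table.

```python
def normalize_heading(heading, heading_type):
    """Drop subdivisions; flip personal names from 'Last, First' to 'First Last'."""
    main = heading.split("--", 1)[0].strip()  # remove subdivisions
    if heading_type == "Name, Personal" and "," in main:
        last, first = (part.strip() for part in main.split(",", 1))
        return f"{first} {last}"
    return main


def is_priority(row):
    """Priority work: local headings for personal names."""
    return row["Source"] == "LOCAL" and row["Heading Type"] == "Name, Personal"


rows = [
    {"Heading": '"Ricobot I" (Robot)', "Source": "LOCAL", "Heading Type": "Name, Corporate"},
    {"Heading": "Academic costume", "Source": "LCSH", "Heading Type": "Topic"},
    {"Heading": "Adair, Linda--Faculty", "Source": "LOCAL", "Heading Type": "Name, Personal"},
]

priority_labels = [
    normalize_heading(row["Heading"], row["Heading Type"])
    for row in rows
    if is_priority(row)
]
print(priority_labels)  # → ['Linda Adair']
```

Headings that carry dates or suffixes after a second comma would need extra handling before the flip; this sketch does not attempt it.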

Wikidata Entry

General Rules

Follow Wikidata suggestions for qualifying statements where possible.
Provide a reference URL for most statements. Exceptions may be made if you're applying a classification statement such as instance of (P31).
Provide retrieved (P813) dates when referencing URLs.
References may be skipped in statements of fact.
If editing or creating an item related to this project, add on focus list of Wikimedia project (P5008) with the value WikiProject PCC Wikidata Pilot/Fondren Library, Rice University (Q100152473).
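
The reference rules can be checked mechanically before a batch upload. The sketch below assumes a simple dictionary layout for a statement (hypothetical, not a Wikidata API format) and treats only instance of (P31) as exempt; other exemptions are left to the editor's judgment. P854 is Wikidata's reference URL property.

```python
EXEMPT = {"P31"}  # classification statements; assumption: extend as needed


def reference_problems(statement):
    """Return a list of rule violations for one statement dictionary."""
    problems = []
    refs = statement.get("references", [])
    if statement["property"] not in EXEMPT and not refs:
        problems.append("missing reference URL (P854)")
    for ref in refs:
        if "P854" in ref and "P813" not in ref:
            problems.append("reference URL without retrieved date (P813)")
    return problems


# Illustrative statements; the example.org URLs are placeholders.
referenced = {
    "property": "P856",
    "value": "https://example.org/school",
    "references": [{"P854": "https://example.org/about", "P813": "2020-09-01"}],
}
bare = {"property": "P749", "value": "Q842909"}
classification = {"property": "P31", "value": "Q180958"}

for statement in (referenced, bare, classification):
    print(statement["property"], reference_problems(statement))
```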

University

Rice University already had a fairly complete entry, so the work here was to expand the schools. The university relates to its schools via has subsidiary (P355), though other similar entries rely on part of (P361) for the relationship; this could be changed. Some centers had already been entered on the university item; these were moved under the appropriate school.

Schools

Schools relate to university via parent organization (P749).
Schools relate to departments via has part(s) (P527).
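
Since these relationships come in inverse pairs (has subsidiary (P355) with parent organization (P749), has part(s) (P527) with part of (P361)), a quick check can list statements whose inverse is missing. This is a sketch under assumptions: the triple layout and the placeholder IDs Q_SCHOOL and Q_DEPT are hypothetical.

```python
INVERSE = {"P355": "P749", "P749": "P355", "P527": "P361", "P361": "P527"}


def missing_inverses(triples):
    """Return the inverse statements that are absent from the given triples."""
    present = set(triples)
    missing = set()
    for subject, prop, value in present:
        inverse = INVERSE.get(prop)
        if inverse and (value, inverse, subject) not in present:
            missing.add((value, inverse, subject))
    return sorted(missing)


triples = [
    ("Q842909", "P355", "Q_SCHOOL"),   # university has subsidiary school
    ("Q_SCHOOL", "P749", "Q842909"),   # school's parent organization
    ("Q_SCHOOL", "P527", "Q_DEPT"),    # school has part department
]
print(missing_inverses(triples))  # → [('Q_DEPT', 'P361', 'Q_SCHOOL')]
```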

| Property | Value | Requirements | Notes |
| Label | String - Entity's name | Required | Label based off the official about page. |
| Description | String - Describe the entity as related to the school. | Optional | |
| Alias | String - Known variations of the Label | Optional | Generally acronyms that were also available on the about page (rare). |
| instance of (P31) | faculty (Q180958) and organization (Q43229)* | Required | *organization (Q43229) can be replaced with a more specific school code, so long as its class can be traced back to organization (Q43229) |
| inception (P571) | String - Year | Optional | |
| named after (P138) | Q entity | Optional | Wikipedia has been helpful in identifying appropriate names/Q numbers |
| country (P17) | United States of America (Q30) | Required | |
| located in the administrative territorial entity (P131) | Houston (Q16555) | Required | |
| coordinate location (P625) | Coordinates in Degrees, Minutes, Seconds | Optional | |
| parent organization (P749) | Rice University (Q842909) | Required | |
| affiliation (P1416) | Q entities | Optional | Multiple values allowed |
| official website (P856) | String - URL | Required | Qualify with English |
| has part(s) (P527) | Q entities | Required | Multiple values allowed; list all applicable departments |

Departments

Departments relate to schools via part of (P361).

| Property | Value | Requirements | Notes |
| Label | String - Entity's name | Required | Naming convention "Rice University _____ Department" |
| Description | String - Describe the entity as related to the department. | Optional | |
| Alias | String - Known variations of the Label | Optional | Generally acronyms that were also available on the about page (rare). |
| instance of (P31) | academic department (Q2467461) | Required | |
| inception (P571) | String - Year | Optional | |
| country (P17) | United States of America (Q30) | Required | |
| located in the administrative territorial entity (P131) | Houston (Q16555) | Required | |
| field of work (P101) | Q entities | Required | Choose the most specific Q entity available. Single/Multiple values based on degrees granted? |
| part of (P361) | Q entity | Required | The Q ID of the parent school |
| official website (P856) | String - URL | Required | Qualify with English |

Faculty

Entries under Faculty should meet notability guidelines, reviewable at Wikidata:Notability.

| Property | Value | Requirements | Notes |
| Label | String - Entity's name | Required | Names should be in standard format i.e. "First Last". Align with Wikipedia where possible. |
| Description | String - Describe the entity as related to department. | Optional | Descriptions are generally all lowercase and omit punctuation. |
| Alias | String - Known variations of the Label | Optional | Middle names/initials; do not rearrange name format i.e. "Last, First". |
| instance of (P31) | A Q entity, usually human (Q5) | Required | |
| occupation (P106) | Q entities, usually university teacher (Q1622272) | Required | Multiple values allowed, suggestions: professor (Q121594), researcher (Q1650915), dean (Q723682), director (Q1162163) and/or those based on field of work |
| employer (P108) | Rice University (Q842909) | Required | Qualify with start/end dates if available. |
| work location (P937) | Houston (Q16555) | Required | |
| field of work (P101) | Q entities, academic discipline(s) | Required | Multiple values allowed. Base this off of choices made for occupation (P106) |
| part of (P361) | Q entities, academic department(s) | Required | Multiple values allowed; overview of each department can be reviewed under each school |
| official website (P856) | String - URL (Usually Rice if they're still employed, may be elsewhere) | Optional | |
| Library of Congress authority ID (P244)* | String - LoC name authority ID | Required | |
| VIAF ID (P214)* | String - VIAF ID | Optional | |
| on focus list of Wikimedia project (P5008) | WikiProject PCC Wikidata Pilot/Fondren Library, Rice University (Q100152473) | Required | Does not add notability to a given entity, but will help us track/report outcomes. |

*Other identifiers should be added if they're available. ORCID may be a good fit for current faculty, but will be sparse for the HI Thesaurus.
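
Before publishing, a draft faculty item can be checked against the Required rows of the table above. The sketch below assumes a hypothetical draft layout (a label plus a mapping from property ID to a list of values); it is not the format of any particular tool.

```python
# Property IDs marked Required in the Faculty table.
REQUIRED_FACULTY = ["P31", "P106", "P108", "P937", "P101", "P361", "P244", "P5008"]


def missing_required(item):
    """List required properties with no values, plus a missing label."""
    missing = [] if item.get("label") else ["label"]
    claims = item.get("claims", {})
    missing += [prop for prop in REQUIRED_FACULTY if not claims.get(prop)]
    return missing


draft = {
    "label": "Example Person",  # hypothetical faculty member
    "claims": {
        "P31": ["Q5"],           # human
        "P106": ["Q1622272"],    # university teacher
        "P108": ["Q842909"],     # Rice University
        "P937": ["Q16555"],      # Houston
        "P5008": ["Q100152473"], # this WikiProject
    },
}
print(missing_required(draft))  # → ['P101', 'P361', 'P244']
```

The same function could be reused for schools and departments by swapping in the Required rows from their tables.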