Wikidata:WikiProject PCC Wikidata Pilot/Smithsonian Libraries/Projects/Smithsonian Research Online
Aim and Scope
This project aims to use our knowledge of Smithsonian researchers to explore how they are represented in Wikidata. The main information sources will be Smithsonian Research Online, the Smithsonian Libraries program that tracks research output, and SI Profiles, our branded VIVO instance. We want to clearly identify those individuals and their related Smithsonian organizations. Ideally, we would also find a way to connect citations of scholarly works credited to each individual.
Background
Since 2008, the Smithsonian Libraries has been tracking the research output of the Smithsonian. More recently, a VIVO instance was launched. See SI Research Online and Smithsonian Profiles.
Timeline
- 22 November 2020: complete initial draft project plan
- Fall/Winter 2020/2021: determine priorities, finalize core fields
- Spring 2021: determine workflow and tools; test reconciliation and bulk loads
- Summer 2021: complete data investigation for missing values
Contributors
- Suzanne Pilsk (Pilsks) - Smithsonian Libraries and Archives Metadata Department
- Richard Naples (Drastician) - Smithsonian Libraries and Archives Metadata Department
- Amy Watson (WatsonAmy) - Smithsonian Libraries and Archives Resource Description Department
- Deborah Shapiro (BornOnTheCob) - Archives, Smithsonian Libraries and Archives
Workflow
Authors
- Identify authors in SRO who are already in Wikidata
- Identify authors in SRO who merit Wikidata items but are not yet represented (using the SI Profiles project)
- Determine which of the properties we deem core are missing in Wikidata for authors
Publications
- Relate existing publications in Wikidata to known Smithsonian authors
- Contribute publication data and associate with Smithsonian authors
Organizations
- Determine what Smithsonian organizations are already in Wikidata
- Determine what Smithsonian organizations are missing from Wikidata that will be critical for SRO
- Determine which of the properties we deem core are missing in Wikidata for SI organizations
Tasks
Orgs
- Export Profiles orgs.
- Compare them to what is in Wikidata with "part of" and "parent organization" = Smithsonian.
- Look at and reconcile those that are not a one-to-one match.
- Edit records so that all appropriate orgs have both "part of" and "parent organization".
- Set up a Google Sheet with all appropriate orgs and the matching data model properties.
- Add missing orgs:
  - Create a Google/Excel sheet with one row per Q item and property numbers (P#) as column headers, and fill in the data. Label = Len, Alias = Aen, Description = Den (verify with documentation).
  - Move the sheet into OpenRefine (do we reconcile, or do we use the Q# already in the sheet?)
  - Generate a schema in OpenRefine to send the data to Wikidata.
  - Check, double check, triple check, and work in small batches.
- Fill in gaps.
- Fix discrepancies.
- Add missing orgs.
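The comparison and reconciliation steps above (Profiles orgs vs. Wikidata items with "part of" and "parent organization" = Smithsonian) reduce to set arithmetic once everything is expressed as QIDs. The following is a minimal sketch, assuming the Profiles export has already been matched to QIDs and the two Wikidata sets come from a query; the function name and the QX1-style identifiers are hypothetical placeholders.

```python
def compare_org_sets(profiles, part_of, parent_org):
    """Sort org QIDs into buckets for the reconciliation steps.

    profiles:   QIDs of orgs exported from Profiles (assumed already matched)
    part_of:    QIDs with part of (P361) = Smithsonian Institution (Q131626)
    parent_org: QIDs with parent organization (P749) = Q131626
    """
    return {
        # Already carry both statements, as the data model requires
        "both": sorted(part_of & parent_org),
        # Have part of (P361) only: add parent organization (P749)
        "missing_parent_org": sorted(part_of - parent_org),
        # Have parent organization (P749) only: add part of (P361)
        "missing_part_of": sorted(parent_org - part_of),
        # In Profiles but linked to the Smithsonian by neither property
        "not_linked": sorted(profiles - (part_of | parent_org)),
    }


if __name__ == "__main__":
    report = compare_org_sets(
        profiles={"QX1", "QX2", "QX3", "QX4"},  # placeholder QIDs
        part_of={"QX1", "QX2"},
        parent_org={"QX1", "QX3"},
    )
    for bucket, qids in report.items():
        print(bucket, qids)
```

Each non-empty bucket other than "both" corresponds to a list of records to edit or reconcile by hand.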
People
- Work through the properties below to decide which ones we can provide data for
  - Examples: Andrea Quattrini (http://www.wikidata.org/entity/Q88485672), Victoria Funk (http://www.wikidata.org/entity/Q19060876)
- Pull all people from Profiles by duty station (matching our orgs)
- Check whether those people are already in Wikidata
- Add missing people
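The "check whether those people are already in Wikidata" step could be approached by matching on ORCID iD first and falling back to a normalized name. The sketch below assumes simplified input records (dicts with "name"/"label", "qid", and optional "orcid"); the record shape and function names are hypothetical, and name-only matches are deliberately flagged for human review rather than accepted.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def match_people(profiles, wikidata):
    """Return (matches, unmatched) for Profiles people vs. Wikidata candidates.

    matches: list of (profiles name, QID, method) tuples
    unmatched: names with no candidate or an ambiguous name match
    """
    by_orcid = {w["orcid"]: w for w in wikidata if w.get("orcid")}
    by_name = {}
    for w in wikidata:
        by_name.setdefault(normalize_name(w["label"]), []).append(w)

    matches, unmatched = [], []
    for person in profiles:
        orcid = person.get("orcid")
        if orcid and orcid in by_orcid:
            # Identifier match: strongest evidence
            matches.append((person["name"], by_orcid[orcid]["qid"], "orcid"))
            continue
        candidates = by_name.get(normalize_name(person["name"]), [])
        if len(candidates) == 1:
            # Unique name match: plausible, but a person must confirm it
            matches.append((person["name"], candidates[0]["qid"], "name-review"))
        else:
            # No candidate, or several with the same name: possibly missing
            unmatched.append(person["name"])
    return matches, unmatched
```

Names left in `unmatched` are candidates for the "add missing people" step, after a manual search.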
Decisions for data modeling
- employer (P108): both Smithsonian Institution and the specific museum
- occupation (P106): work on the list. The distinction between occupation (P106) and field of work (P101), and between profession (Q28640) and position (Q4164871), needs clarification.
- instance of (P31): work on the list of instance of values
Item Label, Description, and Aliases
Property | Value | Usage note |
---|---|---|
Label | Person's name as given on the Smithsonian website. | Researchers typically have a preferred name under which they publish and by which they are known professionally. Use that name in the format FirstName MiddleName LastName, Suffix. |
Description | For items without existing descriptions, model on the nearest neighbor (a comparable existing item) | Consistency decisions need to be documented |
Alias (For People) | Name string variants commonly found in publications | This includes the various ways researchers' names appear in literature and citation indexes. Alias examples: Lastname, Firstname M.; FM Lastname; etc. |
Alias (For SI Organizations) | Abbreviations and other name variants | Alternative forms of unit and department names |
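Generating the person alias variants from name parts can be scripted so they come out consistently. This is a minimal sketch: the first two patterns are the ones listed in the table, while the last two ("Lastname, F. M." and "Lastname FM") are assumed common citation-index forms, not project decisions. The function name is hypothetical.

```python
def alias_variants(first, last, middle=""):
    """Build name-string aliases for a person item.

    Patterns 1-2 come from the alias table; patterns 3-4 are assumed
    citation-index forms. Review the output before loading it.
    """
    f = first[0].upper()
    m = middle[0].upper() if middle else ""
    variants = [
        f"{last}, {first} {m}." if m else f"{last}, {first}",  # Lastname, Firstname M.
        f"{f}{m} {last}",                                      # FM Lastname
        f"{last}, {f}. {m}." if m else f"{last}, {f}.",        # Lastname, F. M. (assumed)
        f"{last} {f}{m}",                                      # Lastname FM (assumed)
    ]
    # Remove duplicates while keeping the original order
    return list(dict.fromkeys(variants))
```

For a hypothetical researcher, `alias_variants("Jane", "Doe", "Quincy")` yields "Doe, Jane Q.", "JQ Doe", "Doe, J. Q." and "Doe JQ".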
People Properties
Core for People
Property | Value | Notes |
---|---|---|
instance of (P31) | human (Q5) | Every individual must have human (Q5) |
Employer (P108) | Smithsonian Institution (Q131626) | |
Field of work (P101) | Department-level field, e.g., Botany Department: botany; NH-Entomology: entomology | Must be added for Natural History people (see Stanford's example). Do not add entomologist as occupation. In the future, if this succeeds, we may apply it to other museums (e.g., philatelist vs. historian) based on active directory. |
Position held (P39) | If significant | |
ORCID iD (P496) | | |
ISNI (P213) | | |
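Checking an existing person item against this core table is a simple lookup once the item's statements are flattened. The sketch below assumes a simplified mapping of property ID to a list of values (QIDs or strings), which is not the raw Wikidata JSON shape; the function and constant names are hypothetical. Position held (P39) is reported like the others even though it is only used "if significant", so its gaps need human judgment.

```python
CORE_PEOPLE_PROPERTIES = ["P31", "P108", "P101", "P39", "P496", "P213"]


def core_gaps(claims):
    """List core-table problems for one person item.

    claims: simplified {property ID: [values]} mapping (assumed shape).
    """
    # Core properties with no value at all
    gaps = [p for p in CORE_PEOPLE_PROPERTIES if not claims.get(p)]
    # Present, but without the value the core table requires
    if claims.get("P31") and "Q5" not in claims["P31"]:
        gaps.append("P31 lacks human (Q5)")
    if claims.get("P108") and "Q131626" not in claims["P108"]:
        gaps.append("P108 lacks Smithsonian Institution (Q131626)")
    return gaps
```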
Extended
Property | Value | Notes |
---|---|---|
Occupation (P106) | Museum directors get both (1) occupation = museum director and (2) employer with position held, including dates | |
date of birth (P569) | Only if publicly available | |
place of birth (P19) | Only if publicly available | |
date of death (P570) | Only if publicly available | |
place of death (P20) | Only if publicly available | |
educated at (P69) | where the person was educated, at any level | see also qualifier usage |
educated at (P69) qualifier: academic degree (P512) | academic degree earned at that institution | e.g., bachelor's degree (Q163727), master's degree (Q183816), doctorate (Q849697) |
educated at (P69) qualifier: start time (P580) | year in which the degree was started | |
educated at (P69) qualifier: end time (P582) | year in which the degree was completed | |
educated at (P69) qualifier: point in time (P585) | year in which the degree was awarded, if start date not known | |
educated at (P69) qualifier: academic major (P812) | academic major or discipline that was the focus of the degree | |
educated at (P69) qualifier: academic thesis (P1026) | item representing the doctoral dissertation (work) | |
educated at (P69) qualifier: doctoral advisor (P184) | primary advisor of the doctoral dissertation | |
award received (P166) qualifier: for work (P1686) | The work for which the award was given to its creator | For prizes and honors, not grants awarded |
VIAF ID (P214) | ||
notable work (P800) | Major academic works only | Possibly a lower priority |
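The educated at (P69) qualifier rules above can be expressed as one statement line. This sketch uses QuickStatements V1 syntax (tab-separated item, property, value, then qualifier pairs; year-precision dates written as `+YYYY-00-00T00:00:00Z/9`) as one possible bulk-load route alongside OpenRefine; verify the date syntax against the QuickStatements documentation before use. The function names and QX placeholders are hypothetical.

```python
def year(y):
    """QuickStatements-style time value with year precision (/9)."""
    return f"+{y:04d}-00-00T00:00:00Z/9"


def educated_at_line(person, school, degree=None, major=None,
                     start=None, end=None, awarded=None):
    """Build one educated at (P69) statement with optional qualifiers."""
    parts = [person, "P69", school]
    if degree is not None:
        parts += ["P512", degree]        # academic degree
    if major is not None:
        parts += ["P812", major]         # academic major
    if start is not None:
        parts += ["P580", year(start)]   # start time
    if end is not None:
        parts += ["P582", year(end)]     # end time
    if awarded is not None and start is None:
        # point in time only when the start date is not known (per the table)
        parts += ["P585", year(awarded)]
    return "\t".join(parts)
```

For example, `educated_at_line("QX1", "QX2", degree="Q849697", end=1995)` records a doctorate (Q849697) completed in 1995 for placeholder items QX1 and QX2.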
Graph of Smithsonian People
People whose employer, affiliation, or member of statement is the Smithsonian, shown in a graph based on co-authorship.
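The co-authorship graph can be derived by counting, for every publication, each pair of Smithsonian-linked authors. A minimal sketch, assuming publications arrive as lists of author QIDs and the Smithsonian-linked people are already known as a set; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications, smithsonian_people):
    """Count shared publications for each pair of Smithsonian people.

    publications: iterable of author-QID lists, one list per publication
    smithsonian_people: set of QIDs with a Smithsonian employer,
        affiliation, or member of statement
    """
    edges = Counter()
    for authors in publications:
        # Keep only Smithsonian-linked authors, deduplicated and sorted
        local = sorted(set(authors) & smithsonian_people)
        for a, b in combinations(local, 2):
            edges[(a, b)] += 1   # edge weight = number of shared publications
    return edges
```

Sorting each author list means a pair is always stored in the same order, so (A, B) and (B, A) accumulate on one edge.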
Organization Properties
Core
Property | Value | Notes |
---|---|---|
instance of (P31) | organization (Q43229), art museum (Q3196771), history museum (Q16735822), science museum (Q588140), museum (Q33506), research center (Q7315155), zoo (Q43501) | Every organization must have a non-human instance of value. Open question: museum, research facility, natural history museum, or art museum? Use art museum (Q3196771) for the institution, not the building (Q207694). See the vocabularies below for current terms used by SI organizations. |
inception (P571) | ||
parent organization (P749) | Smithsonian Institution (Q131626) | both parent organization and part of |
part of (P361) | Smithsonian Institution (Q131626) | both part of and parent organization |
official website (P856) | ||
ISNI (P213) | ||
located in the administrative territorial entity (P131) | Washington, D.C. (Q61), Manhattan (Q11299), Prince George's County (Q26807), Cambridge (Q49111), Anne Arundel County (Q488701), Panama City (Q3306) | |
Example: National Museum of Natural History
Property | Value | Notes |
---|---|---|
alias | NMNH, Smithsonian's National Museum of Natural History, Natural History Museum | |
instance of (P31) | natural history museum (Q1970365) | Repeat instance of, or use subclass? Pilsks (talk) 20:08, 5 November 2020 (UTC) |
inception (P571) | 17 March 1910 | |
parent organization (P749) | Smithsonian Institution (Q131626) | |
part of (P361) | Smithsonian Institution (Q131626) | |
official website (P856) | https://naturalhistory.si.edu/ | |
ISNI (P213) | 0000 0001 2364 2127 | |
located in the administrative territorial entity (P131) | Washington, D.C. (Q61) | |
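The NMNH example table translates directly into statement lines. This sketch renders it in QuickStatements V1 syntax (strings in double quotes, `Aen` for English aliases, day-precision dates as `+YYYY-MM-DDT00:00:00Z/11`) as one possible loading route; verify against the QuickStatements documentation. The NMNH item's QID is not given on this page, so the function takes it as a parameter, and "QX1" in the check below is a placeholder.

```python
NMNH_STATEMENTS = [
    ("P31", "Q1970365"),                    # instance of: natural history museum
    ("P571", "+1910-03-17T00:00:00Z/11"),   # inception, day precision
    ("P749", "Q131626"),                    # parent organization: Smithsonian
    ("P361", "Q131626"),                    # part of: Smithsonian
    ("P856", '"https://naturalhistory.si.edu/"'),  # official website
    ("P213", '"0000 0001 2364 2127"'),      # ISNI
    ("P131", "Q61"),                        # located in: Washington, D.C.
]
NMNH_ALIASES = [
    "NMNH",
    "Smithsonian's National Museum of Natural History",
    "Natural History Museum",
]


def quickstatements(qid, statements, aliases=()):
    """Render (property, value) pairs and English aliases as V1 lines."""
    lines = [f"{qid}\t{prop}\t{value}" for prop, value in statements]
    lines += [f'{qid}\tAen\t"{alias}"' for alias in aliases]
    return "\n".join(lines)
```

Keeping the statements as data makes it easy to check them against the table before anything is sent to Wikidata, in the spirit of "check, double check, triple check".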
Items Part of / Parent Organization is Smithsonian
Graph display of Organizations part of (P361) Smithsonian, 4 levels down
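The "4 levels down" view corresponds to a bounded breadth-first walk over part of (P361) links starting at Smithsonian Institution (Q131626). A minimal sketch, assuming the links have already been fetched into a parent-to-children mapping (a hypothetical input shape); it records the depth at which each organization first appears and guards against cycles in the data.

```python
from collections import deque


def parts_within(root, has_part, max_depth=4):
    """Return {qid: depth} for organizations within max_depth levels of root.

    has_part: mapping of QID -> QIDs whose part of (P361) is that QID.
    """
    depths = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue   # do not expand below the depth limit
        for child in has_part.get(node, []):
            if child not in seen:   # skip repeats and cycles
                seen.add(child)
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths
```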
Extended
Property | Value | Notes |
---|---|---|
director / manager (P1037) | qualifier: start time (P580) | |
country (P17) | United States of America (Q30), Panama (Q804) | |
street address (P6375) | ||
has part(s) (P527) | ||
has subsidiary (P355) | ||
VIAF ID (P214) | ||
Open Funder Registry funder ID (P3153) | ||
GRID ID (P2427) | | This should change to ROR |
Items Part of / Parent is Smithsonian (Extended)
Vocabularies
instance of (P31) for SI organizations already in Wikidata
part of (P361) Smithsonian, aka Smithsonian Organizations
These are organizations that have a part of (P361) statement with the value Smithsonian Institution (Q131626).
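Both vocabulary lists can be drawn from one Wikidata Query Service query: organizations with part of (P361) = Q131626, plus their optional instance of (P31) values. The sketch below builds the request URL for the public endpoint and tallies instance of labels from the standard SPARQL JSON results format; it does not perform the HTTP request, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode

SPARQL = """SELECT ?org ?orgLabel ?type ?typeLabel WHERE {
  ?org wdt:P361 wd:Q131626 .
  OPTIONAL { ?org wdt:P31 ?type . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""


def query_url(query, endpoint="https://query.wikidata.org/sparql"):
    """GET URL asking the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def tally_types(results):
    """Count instance of labels in SPARQL JSON results.

    An organization with several P31 values appears once per value.
    """
    counts = Counter()
    for row in results["results"]["bindings"]:
        label = row.get("typeLabel", {}).get("value", "(no instance of)")
        counts[label] += 1
    return counts
```

The tally is a quick way to see which instance of terms SI organizations currently use before settling the data model.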
Questions and Notes
Use Cases: Smithsonian Research Online
Inquiry | Query | Discussion |
---|---|---|
Find people who have authored something and are associated with the Smithsonian in some way. | https://w.wiki/3Kem https://w.wiki/3Kf5 https://w.wiki/3KfQ | Discussion |
Find scientists at the Smithsonian who work in botany | https://w.wiki/3J4d https://w.wiki/3J4f | Discussion |
Find all people affiliated with the Smithsonian and report their respective Smithsonian organization or research center, without returning the building of the same name (National Museum of Natural History: building vs. concept; National Museum of Women is only a concept right now) | https://w.wiki/3MtY | Problem statement |
Find all publications by authors associated with an organization, research center, or department of the Smithsonian (e.g., all publications by authors at the National Museum of Natural History) | https://w.wiki/3MsV | Discussion |
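The botany use case can be phrased as people whose employer (P108) is Smithsonian Institution and whose field of work (P101) is a given discipline, matching the core data model for people. This is one possible formulation, not the content of the linked queries; the function name is hypothetical, and Q441 is assumed to be the Wikidata item for botany (verify before use).

```python
def people_in_field_query(field_qid, employer_qid="Q131626"):
    """SPARQL for humans with the given employer and field of work."""
    return f"""SELECT DISTINCT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P108 wd:{employer_qid} ;
          wdt:P101 wd:{field_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""
```

Because field of work is modeled at the department level, swapping in another discipline's QID (e.g., entomology) reuses the same query for other Natural History departments.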
Project Year-end report
The purpose of participating in the PCC Wikidata Pilot Project was to model people and organizations associated with Smithsonian Research Online. Initially, the project had bold goals: creating Wikidata items for all currently published authors and their associated Smithsonian research facilities or museums, and connecting their publications. The data modeling produced interesting discussion around practical issues, clarity of meaning, and ethical issues. Staff came away from the project with practical and theoretical data modeling skills to apply to our local Wikibase installation once it is established. Staff used a variety of tools and services to query, reconcile, and push data. By the end of the year, we had narrowed our goals for enhancing Wikidata to ensuring accurate core information on Smithsonian museums and reviewing a select few researchers who already have a presence in Wikidata.