Wikidata:WikiProject PCC Wikidata Pilot/Smithsonian Libraries/Projects/Smithsonian Research Online

Aim and Scope

This project uses our knowledge of Smithsonian researchers to explore how they are represented in Wikidata. The main information sources are Smithsonian Research Online, the Smithsonian Libraries program that tracks research output, and SI Profiles, the Smithsonian's branded VIVO instance. We want to clearly identify these individuals and the Smithsonian organizations they are related to. Ideally, we would also find a way to connect citations of scholarly products to the individuals credited with them.

Background

Since 2008, the Smithsonian Libraries has tracked the research output of the Smithsonian. More recently, a VIVO instance was launched. See SI Research Online and Smithsonian Profiles.

Timeline

  • 22 November 2020: complete initial draft project plan
  • Fall/Winter 2020/2021: determine priorities, finalize core fields
  • Spring 2021: determine workflow and tools, and test reconciliation and bulk loads
  • Summer 2021: complete data investigation for missing values

Contributors

  • Suzanne Pilsk Pilsks - Smithsonian Libraries and Archives Metadata Department
  • Richard Naples Drastician - Smithsonian Libraries and Archives Metadata Department
  • Amy Watson WatsonAmy - Smithsonian Libraries and Archives Resource Description Department
  • Deborah Shapiro BornOnTheCob - Archives, Smithsonian Libraries and Archives

Workflow

Authors

  1. Identify authors in SRO that are already in Wikidata
  2. Identify authors in SRO that merit a Wikidata item but are not yet represented (using the SI Profiles Project)
  3. Determine which of the properties we deem core are missing in Wikidata for authors

Publications

  1. Relate existing publications in Wikidata to known Smithsonian authors
  2. Contribute publication data and associate with Smithsonian authors

Organizations

  1. Determine what Smithsonian organizations are already in Wikidata
  2. Determine what Smithsonian organizations are missing from Wikidata that will be critical for SRO
  3. Determine which of the properties we deem core are missing in Wikidata for SI organizations

Tasks

Orgs

  1. Export Profiles Orgs.
  2. Compare to what is in Wikidata as "part of" and "parent organization" = Smithsonian.
  3. Look at and reconcile those that are not a one-to-one match.
  4. Edit records so that all appropriate orgs get both "part of" and "parent organization".
  5. Set up Google Sheet with all appropriate orgs and matching data model properties.
    1. Add missing orgs
      1. Create a Google/Excel sheet with one row per Q item and P# properties as column headers, and fill in the data. Label=Len, Alias=Aen, Description=Den (verify with documentation)
      2. Move into OpenRefine (do we reconcile, or do we use Q# in the sheet?)
      3. Generate schema in OpenRefine to send to Wikidata
      4. Check, double check, triple check, and do it in small batches
    2. Fill in gaps
    3. Fix discrepancies
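
Steps 2 to 4 above come down to comparing two sets of items: those with "part of" (P361) Smithsonian and those with "parent organization" (P749) Smithsonian. The following is a minimal sketch in Python, assuming the two lists of QIDs have already been retrieved (for example from the Wikidata Query Service). The function names and the sample sets are hypothetical and only illustrate the comparison; they do not describe what is actually in Wikidata.

```python
from typing import Dict, Set

SMITHSONIAN = "Q131626"  # Smithsonian Institution


def build_org_query(prop: str, parent: str = SMITHSONIAN) -> str:
    """Return a SPARQL query listing items whose `prop` value is `parent`."""
    return (
        "SELECT ?org ?orgLabel WHERE {\n"
        f"  ?org wdt:{prop} wd:{parent} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def find_gaps(part_of: Set[str], parent_org: Set[str]) -> Dict[str, Set[str]]:
    """Report which items carry only one of the two statements."""
    return {
        "missing_parent_organization": part_of - parent_org,
        "missing_part_of": parent_org - part_of,
    }


# Hypothetical query results, for illustration only.
part_of_items = {"Q2860568", "Q1673980"}
parent_org_items = {"Q1673980", "Q148584"}
gaps = find_gaps(part_of_items, parent_org_items)
print(gaps)
```

The items in each gap set are the ones that would need the other statement added in step 4.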

People

  • Work through the properties below to decide which ones we can provide data for
Example: Andrea Quattrini, http://www.wikidata.org/entity/Q88485672
Example: Victoria Funk, https://www.wikidata.org/wiki/Q19060876 (http://www.wikidata.org/entity/Q19060876)
  1. Pull from Profiles all the people according to duty station (matching our orgs)
  2. Take those people and see if they are in Wikidata
  3. Add missing people
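
One way to approach step 2 is to look people up by an identifier rather than by name. The sketch below assumes that the Profiles export includes ORCID iDs, which this page does not confirm, and builds a SPARQL query that matches them against ORCID iD (P496). The function name is hypothetical and the iD in the example is a placeholder.

```python
import re
from typing import Iterable

# Four groups of four characters; the last character may be a check digit X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def build_orcid_lookup_query(orcids: Iterable[str]) -> str:
    """Return a SPARQL query finding Wikidata items for the given ORCID iDs."""
    cleaned = {o.strip() for o in orcids}
    valid = sorted(o for o in cleaned if ORCID_PATTERN.match(o))
    if not valid:
        raise ValueError("no well-formed ORCID iDs supplied")
    values = " ".join(f'"{o}"' for o in valid)
    return (
        "SELECT ?person ?personLabel ?orcid WHERE {\n"
        f"  VALUES ?orcid {{ {values} }}\n"
        "  ?person wdt:P496 ?orcid .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# Placeholder iD for illustration; malformed entries are skipped.
print(build_orcid_lookup_query(["0000-0002-1825-0097", "not-an-orcid"]))
```

People whose iD returns no match are candidates for step 3, although a missing match may also mean that the Wikidata item exists without an ORCID statement, so a name-based check is still needed.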

Decisions for data modeling

  1. employer (P108) - both the Smithsonian Institution and the individual museum
  2. occupation (P106) - Work on the list. Occupation (P106) vs field of work (P101), and profession (Q28640) vs position (Q4164871), need clarification.
  3. instance of (P31) - Work on the list of instance of values

Item Label, Description, and Aliases

Property | Value | Usage note
Label | Person's name as given on the Smithsonian website | Researchers typically have a preferred name under which they publish and by which they are known in the professional world. Use the name they typically go by, in "FirstName MiddleName LastName, suffix" format.
Description | For items without existing descriptions, model on nearest neighbor | Consistency decisions need to be documented
Alias (for people) | Name string variants commonly found in publications | This includes the various ways researchers' names appear in the literature and in citation indexes. Alias examples: Lastname, Firstname M.; FM Lastname; etc.
Alias (for SI organizations) | Abbreviations and other name variants | Alternative forms of unit and department names
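
The alias patterns for people listed above can be generated mechanically from the parts of a name. This is a small sketch, assuming the name has already been split into first, middle, and last parts; the function name and the sample name are hypothetical, and suffixes and multi-part surnames are not handled.

```python
from typing import List


def alias_variants(first: str, last: str, middle: str = "") -> List[str]:
    """Build citation-style aliases, leaving out the label form itself."""
    label = " ".join(part for part in (first, middle, last) if part)
    f, m = first[0], middle[:1]
    candidates = [
        f"{last}, {first} {m}." if m else f"{last}, {first}",  # Lastname, Firstname M.
        f"{f}{m} {last}",                                      # FM Lastname
        f"{last}, {f}.{m}." if m else f"{last}, {f}.",         # Lastname, F.M.
        f"{first} {last}",                                     # Firstname Lastname
    ]
    aliases: List[str] = []
    for candidate in candidates:
        # Skip duplicates and anything identical to the label.
        if candidate != label and candidate not in aliases:
            aliases.append(candidate)
    return aliases


print(alias_variants("Jane", "Doe", "Marie"))
```

Generated variants are only candidates; an alias should be added when that form is actually found in publications or citation indexes.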

People Properties

Core for People

Property | Value | Notes
instance of (P31) | human (Q5) | Individual must have human/person
employer (P108) | Smithsonian Institution (Q131626) |
field of work (P101) | | Must add department level (see Stanford's example) for Natural History people. Later we can assess how well this works before applying it to other museums (philatelist vs historian), based on Active Directory. Botany Department: botany; for NH-Entomology the field of work would be entomology. Do not add entomologist as an occupation.
position held (P39) | | If significant
ORCID iD (P496) | |
ISNI (P213) | |

Extended

Property | Value | Notes
occupation (P106) | | Directors of museums will use both 1) occupation = museum director and 2) employer - positions held (with dates)
date of birth (P569) | | Only if publicly available
place of birth (P19) | | Only if publicly available
date of death (P570) | | Only if publicly available
place of death (P20) | | Only if publicly available
educated at (P69) | where the person was educated, at any level | see also qualifier usage
educated at (P69) qualifier: academic degree (P512) | academic degree earned at that institution | e.g., bachelor's degree (Q163727), master's degree (Q183816), doctorate (Q849697)
educated at (P69) qualifier: start time (P580) | year in which the degree was started |
educated at (P69) qualifier: end time (P582) | year in which the degree was completed |
educated at (P69) qualifier: point in time (P585) | year in which the degree was awarded, if start date not known |
educated at (P69) qualifier: academic major (P812) | academic major or discipline that was the focus of the degree |
educated at (P69) qualifier: academic thesis (P1026) | item representing the doctoral dissertation (work) |
educated at (P69) qualifier: doctoral advisor (P184) | primary advisor of the doctoral dissertation |
award received (P166) qualifier: for work (P1686) | | This is for prizes and honors, not grants awarded. The qualifier specifies the work for which the award was given to its creator.
VIAF ID (P214) | |
notable work (P800) | | Only use this for major academic works. This might be a lower priority.
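
The date qualifiers on educated at (P69) depend on what is known about the degree. The sketch below shows one reading of the table above: use start time (P580) and end time (P582) when the start year is known, and fall back to point in time (P585) otherwise. The function name and the plain dictionary structure are hypothetical and are not tied to any particular upload tool.

```python
from typing import Dict, Optional


def educated_at_qualifiers(
    degree_qid: str,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
    awarded_year: Optional[int] = None,
) -> Dict[str, str]:
    """Choose the date qualifiers for one educated at (P69) statement."""
    qualifiers = {"P512": degree_qid}  # academic degree
    if start_year is not None:
        qualifiers["P580"] = str(start_year)  # start time
        if end_year is not None:
            qualifiers["P582"] = str(end_year)  # end time
    else:
        # Assumption: with no start year, a known end year is treated
        # as the year the degree was awarded.
        year = awarded_year if awarded_year is not None else end_year
        if year is not None:
            qualifiers["P585"] = str(year)  # point in time
    return qualifiers


# A doctorate (Q849697) with only the award year known.
print(educated_at_qualifiers("Q849697", awarded_year=1995))
```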

Graph of Smithsonian People

A graph, based on co-authorship, of people whose employer, affiliation, or member of value is the Smithsonian.
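
A graph of this kind could be produced roughly as sketched below. This is not the query behind the graph on this page; it assumes employer (P108), affiliation (P1416), and member of (P463) pointing directly at the Smithsonian Institution (Q131626), and author (P50) on the publication items. The function names and the sample edge are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def build_coauthor_query(
    org: str = "Q131626",
    props: Sequence[str] = ("P108", "P1416", "P463"),
) -> str:
    """Return a SPARQL query counting shared works per pair of people."""
    path = "|".join(f"wdt:{p}" for p in props)
    return (
        "SELECT ?a ?b (COUNT(DISTINCT ?work) AS ?works) WHERE {\n"
        f"  ?a {path} wd:{org} .\n"
        f"  ?b {path} wd:{org} .\n"
        "  ?work wdt:P50 ?a, ?b .\n"
        "  FILTER(STR(?a) < STR(?b))\n"  # keep each pair once
        "}\n"
        "GROUP BY ?a ?b"
    )


def to_adjacency(edges: Iterable[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Turn (person, person, shared works) rows into an undirected graph."""
    graph: Dict[str, Dict[str, int]] = defaultdict(dict)
    for a, b, works in edges:
        graph[a][b] = works
        graph[b][a] = works
    return dict(graph)


# Hypothetical result row: two placeholder people sharing three works.
print(to_adjacency([("person_A", "person_B", 3)]))
```

People affiliated only with a sub-unit of the Smithsonian would not be matched by this query as written.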

Organization Properties

Core

Property | Value | Notes
instance of (P31) | organization (Q43229); art museum (Q3196771); history museum (Q16735822); science museum (Q588140); museum (Q33506); research center (Q7315155); zoo (Q43501) | Organization must have something that is non-human. Museum? Research facility? Natural history museum, art museum? Q3196771 is the art museum institution (not the building; the building is Q207694). See below for current terms used by SI organizations.
inception (P571) | |
parent organization (P749) | Smithsonian Institution (Q131626) | both parent organization and part of
part of (P361) | Smithsonian Institution (Q131626) | both part of and parent organization
official website (P856) | |
ISNI (P213) | |
located in the administrative territorial entity (P131) | Washington, D.C. (Q61); Manhattan (Q11299); Prince George's County (Q26807); Cambridge (Q49111); Anne Arundel County (Q488701); Panama City (Q3306) |

Property | Value | Notes
alias | NMNH, Smithsonian's National Museum of Natural History, Natural History Museum |
instance of (P31) | natural history museum (Q1970365) | repeat instance or use subclass? Pilsks (talk) 20:08, 5 November 2020 (UTC)
inception (P571) | 17 March 1910 |
parent organization (P749) | Smithsonian Institution (Q131626) |
part of (P361) | Smithsonian Institution (Q131626) |
official website (P856) | https://naturalhistory.si.edu/ |
ISNI (P213) | 0000 0001 2364 2127 |
located in the administrative territorial entity (P131) | Washington, D.C. (Q61) |
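
The example values above can be laid out as one row of the upload sheet described under Tasks, with P# properties as column headers and Len, Den, and Aen for label, description, and alias. This sketch only shows the sheet layout. How each value must be encoded (quoting of strings, date precision, repeated alias columns) depends on the upload tool and should be verified with its documentation. The column order, the function name, the `qid` header, and the use of "National Museum of Natural History" as the label are assumptions.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["qid", "Len", "Den", "Aen", "P31", "P571", "P749", "P361", "P856", "P213", "P131"]


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Write the rows as CSV text, leaving unknown cells empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


nmnh = {
    "qid": "",  # the item's QID is not given in this section; fill in before use
    "Len": "National Museum of Natural History",
    "Aen": "NMNH",
    "P31": "Q1970365",    # natural history museum
    "P571": "1910-03-17",  # inception, 17 March 1910
    "P749": "Q131626",    # parent organization: Smithsonian Institution
    "P361": "Q131626",    # part of: Smithsonian Institution
    "P856": "https://naturalhistory.si.edu/",
    "P213": "0000 0001 2364 2127",
    "P131": "Q61",        # Washington, D.C.
}
sheet = rows_to_csv([nmnh])
print(sheet)
```

Keeping the sheet as plain CSV makes it easy to review in small batches before anything is sent to Wikidata.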

Items Part of / Parent Organization is Smithsonian

Core Properties Listeria


Graph display of Organizations part of (P361) Smithsonian, 4 levels down

Extended

Property | Value | Notes
director / manager (P1037) | | qualifier: start time (P580)
country (P17) | United States of America (Q30); Panama (Q804) |
street address (P6375) | |
has part(s) (P527) | |
has subsidiary (P355) | |
VIAF ID (P214) | |
Open Funder Registry funder ID (P3153) | |
GRID ID (P2427) | | This should change to ROR

Items Part of / Parent is Smithsonian (Extended)

Extended Properties Listeria

Vocabularies

instance of (P31) for SI organizations already in Wikidata

Item Notes
Native American organization (Q100731037) no
astronomical observatory (Q1254933) no
facility (Q13226383) yes - we want to do future work for research facilities like STRI facilities, Tennenbaum, Labs, etc.
research library (Q1438040) no
local museum (Q1595639) no
nonprofit organization (Q163740) no
archive (Q166118) yes - AAA etc.
history museum (Q16735822) yes (NMAH, NMAAHC, ??)
national museum (Q17431399) yes - all things that say "National" (NMAA?)
sculpture garden (Q1759852) no
institution (Q178706) no
natural history museum (Q1970365) yes
art museum (Q207694) yes
postal museum (Q2106220) yes
urban park (Q22746) no
museum building (Q24699794) no
art archive (Q27032254) no
research institute (Q31855) yes
museum (Q33506) yes
Asian art (Q3399573) no
work (Q386724) no
organization (Q43229) no - but ask group (is museum an instance of org so we don't need it, right?)
zoo (Q43501) no
aviation museum (Q4828724) yes
cultural institution (Q5193377) no
African-American museum (Q54934129) yes
tourist attraction (Q570116) no
library (Q7075) yes
architectural structure (Q811979) no

part of (P361) Smithsonian, aka Smithsonian Organizations

These are organizations that have property "part of (P361)" with Smithsonian Institution (Q131626).

Item Notes
Air & Space/Smithsonian (Q4697697)
Archives of American Art (Q2860568)
Archives of American Gardens (Q4787294)
Arthur M. Sackler Gallery (Q259767)
Bocas del Toro Research Station (Q4936149)
Bureau of American Ethnology (Q3791290)
Caribbean Coral Reef Ecosystems Program (Q5039373)
Center for Earth and Planetary Studies (Q5059600)
Center for Short-Lived Phenomena (Q5059869)
Charles A. Lindbergh Chair in Aerospace History (Q5074867)
Consortium for the Barcode of Life (Q5163381)
Fred Lawrence Whipple Observatory (Q1452194)
Freer Gallery of Art (Q1075126)
Global Volcanism Program (Q3108976)
Harvard–Smithsonian Center for Astrophysics (Q1133697)
National Museum of African American History and Culture (Q3073495)
National Museum of American History (Q148584)
National Museum of the American Indian George Gustav Heye Center (Q5539979)
National Numismatic Collection (Q6974600)
National Portrait Gallery (Q403987)
Renwick Gallery (Q876537)
S. Dillon Ripley Center (Q7387446)
Smithsonian Magazine (Q3487014) (Smithsonian Magazine)
Smithsonian Asian Pacific American Center (Q4806714)
Smithsonian Folklife Festival (Q7545604)
Smithsonian Institution Archives (Q2154134)
Smithsonian Institution Building (Q605937)
Smithsonian Libraries and Archives (Q1609326)
Smithsonian Jazz Masterworks Orchestra (Q20858340)
Smithsonian Latino Center (Q30264183)
Smithsonian Marine Station at Fort Pierce (Q7545615)
Smithsonian Migratory Bird Center (Q7545616)
Smithsonian Tropical Research Institute (Q1673980)
Steven F. Udvar-Hazy Center (Q1055100)
Systematic Entomology Laboratory (Q47067728)
Verville Fellowship (Q17066651)
Woodrow Wilson International Center for Scholars (Q872975)

Questions And Notes

Use Cases Smithsonian Research Online

Inquiry | Query | Discussion
Find people who have authored something and are associated with the Smithsonian in some way. | https://w.wiki/3Kem; https://w.wiki/3Kf5; https://w.wiki/3KfQ | Discussion
Find scientists at the Smithsonian who work in botany. | https://w.wiki/3J4d; https://w.wiki/3J4f | Discussion
Find all people affiliated with the Smithsonian and report their respective Smithsonian organization or research center, without bringing back the building of the same name (for example, National Museum of Natural History as building vs concept; National Museum of Women is only a concept right now). | https://w.wiki/3MtY | Problem statement
Find all publications with authors associated with an organization, research center, or department of the Smithsonian (for example, all publications by authors at the National Museum of Natural History). | https://w.wiki/3MsV | Discussion
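
The building vs concept problem in the third inquiry could be approached by filtering out organizations whose instance of (P31) value is a building class. The sketch below is not the query behind the short links above. It assumes that excluding museum building (Q24699794) and architectural structure (Q811979), both listed under Vocabularies, is an acceptable filter, and the function name is hypothetical.

```python
from typing import Sequence


def build_affiliation_query(
    org: str = "Q131626",
    excluded_classes: Sequence[str] = ("Q24699794", "Q811979"),
) -> str:
    """Return a SPARQL query for people and their Smithsonian unit, minus buildings."""
    excluded = " ".join(f"wd:{c}" for c in excluded_classes)
    return (
        "SELECT ?person ?personLabel ?unit ?unitLabel WHERE {\n"
        "  ?person wdt:P108|wdt:P1416 ?unit .\n"   # employer or affiliation
        f"  ?unit wdt:P361|wdt:P749 wd:{org} .\n"  # part of or parent organization
        "  FILTER NOT EXISTS {\n"
        f"    VALUES ?cls {{ {excluded} }}\n"
        "    ?unit wdt:P31 ?cls .\n"
        "  }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


print(build_affiliation_query())
```

One limitation of this approach: an item that models both the organization and its building in a single record would be dropped entirely, so such items would first need to be split or have their instance of values corrected.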

Project Year-end report

The purpose of participating in the PCC Wikidata Pilot Project was to model the people and organizations associated with Smithsonian Research Online. Initially the project had the bold goals of creating Wikidata items for all currently published authors and for their associated Smithsonian research facilities and museums, and of connecting publications to them. The data modeling prompted interesting discussion of practical issues, clarity of meaning, and ethical issues. Staff came away from the project with practical and theoretical data modeling skills that can be applied to our local Wikibase installation once it is established. Staff used a variety of tools and services to query, reconcile, and push data. By the end of the year we had narrowed our goals for enhancing Wikidata to two: making the core information on Smithsonian museums accurate, and reviewing a select few of the researchers who already have a presence in Wikidata.