
Wikidata:WikiProject Visual arts/Getty Vocabularies


This sub-project of the WikiProject Visual arts aims to align the information of the Getty Vocabulary Program (Q5554720) with Wikidata. The current approach is to use federated SPARQL queries that compare Getty and Wikidata data; their results are fed to Wikidata via QuickStatements. Bots work on this as well: Multichill's deal with labels, aliases and sex or gender (P21), and Magnus's with date of birth (P569) and date of death (P570).

As of mid-June 2018, the edits already made on Wikidata are mainly test edits.


Queries

The table contains queries that can be easily used as input for QuickStatements.
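This Python sketch shows what such QuickStatements input looks like. Python is used instead of SPARQL so that it runs offline.

It assumes the QuickStatements V1 syntax:
- columns are separated by tabs;
- string values are double-quoted;
- reference ("source") properties are written with `S` instead of `P`.

The first reference uses stated in (P248) ULAN (Q2494649), as in the TODO query further down, plus ULAN ID (P245). The helper name `qs_line` and all IDs in the example are hypothetical.

```python
"""Sketch: build one QuickStatements V1 command with a first ULAN reference.

Assumed V1 syntax: tab-separated columns, string values in double quotes,
reference ("source") properties written with S instead of P.
"""

ULAN_ITEM = "Q2494649"  # ULAN, used as the value of stated in (P248)


def qs_line(item: str, prop: str, value: str, ulan_id: str) -> str:
    """Return a V1 command adding `item prop value` with a ULAN reference.

    `value` is an item ID (e.g. the place for P19); `ulan_id` is the
    person's ULAN ID, added to the reference via ULAN ID (P245).
    """
    columns = [
        item, prop, value,
        "S248", ULAN_ITEM,       # reference: stated in (P248) ULAN
        "S245", f'"{ulan_id}"',  # reference: ULAN ID (P245), a quoted string
    ]
    return "\t".join(columns)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(qs_line("Q123", "P19", "Q456", "500000000"))
```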

Property Query 1st ref Comment
place of birth (P19)

QScsv nice-QS

  • places most frequently used as place of birth or death in ULAN whose TGN ID is not yet in Wikidata:
    PREFIX gvp: <http://vocab.getty.edu/ontology#>
    
    SELECT
    ?placeURL ?wdSearchURL ?name ?id ?count
      WHERE {
      SERVICE <http://vocab.getty.edu/sparql.json> {
        { SELECT ?place ?name ?id (COUNT(?place) as ?count) {
          ?agent gvp:biographyPreferred/(schema:birthPlace|schema:deathPlace)/^foaf:focus ?place .
          ?place gvp:prefLabelGVP/gvp:term ?name .
          ?place dc:identifier ?id .
          }
          GROUP BY ?place ?name ?id
        }
      }
      MINUS { ?item wdt:P1667 ?id . }
      BIND(URI(CONCAT("http://vocab.getty.edu/page/tgn/", ?id)) AS ?placeURL)
      BIND(URI(CONCAT("https://www.wikidata.org/wiki/Special:Search/", ENCODE_FOR_URI(?name))) AS ?wdSearchURL)
    }
    ORDER BY DESC(?count)
    LIMIT 30
    
    Try it! A similar query that also counts work locations (including their broader places):
    PREFIX gvp: <http://vocab.getty.edu/ontology#>
    PREFIX bio: <http://purl.org/vocab/bio/0.1/>
    
    SELECT
    ?placeURL ?wdSearchURL ?name ?id ?count
      WHERE {
      SERVICE <http://vocab.getty.edu/sparql.json> {
        { SELECT ?place ?name ?id (COUNT(?place) as ?count) {
          ?agent (gvp:biographyPreferred/(schema:birthPlace|schema:deathPlace)|bio:event/(schema:location|(schema:location/gvp:broaderExtended)))/^foaf:focus ?place .
          ?place gvp:prefLabelGVP/gvp:term ?name .
          ?place dc:identifier ?id .
          }
          GROUP BY ?place ?name ?id
        }
      }
      MINUS { ?item wdt:P1667 ?id . }
      BIND(URI(CONCAT("http://vocab.getty.edu/page/tgn/", ?id)) AS ?placeURL)
      BIND(URI(CONCAT("https://www.wikidata.org/wiki/Special:Search/", ENCODE_FOR_URI(?name))) AS ?wdSearchURL)
    }
    ORDER BY DESC(?count)
    LIMIT 30
    
    Try it!
place of death (P20)

QScsv nice-QS

sex or gender (P21)

QScsv nice-QS

  • count genders in ULAN:
    SELECT ?gender (COUNT(?gender) as ?count) {
      ?agent gvp:biographyPreferred/schema:gender ?gender .
      }
    GROUP BY ?gender
    
    Try it!
  • a rough query to spot contradictory gender statements in Wikidata and ULAN (nothing found as of 2018-06-10, up to the OFFSET given here; it could perhaps be sped up with query hints; better queries now exist among the complex constraints on Property talk:P245):
    PREFIX gvp: <http://vocab.getty.edu/ontology#>
    PREFIX aat: <http://vocab.getty.edu/aat/>
    
    SELECT
    ?item ?ulanID
    # ?itemLabel ?wdValueLabel ?ulanHuman
      WITH { SELECT ?item ?ulanID ?wdValue
      WHERE {
        ?item wdt:P245 ?ulanID .
        ?item wdt:P21 ?wdValue .
      } ORDER BY ?item LIMIT 5000 OFFSET 55000 } AS %items
    
      # now see what ULAN says to those statements
      WHERE { INCLUDE %items
      BIND(URI(CONCAT("http://vocab.getty.edu/ulan/", ?ulanID)) AS ?ulanURI)
    #  BIND(URI(CONCAT("http://vocab.getty.edu/page/ulan/", ?ulanID)) AS ?ulanHuman)
      SERVICE <http://vocab.getty.edu/sparql.json> {
        ?ulanURI foaf:focus ?ulanAgent .
        ?ulanAgent gvp:biographyPreferred ?ulanBio.
        ?ulanBio schema:gender ?ulanValue .
      }
      # translate: aat:300189557 (female) -> wd:Q6581072, aat:300189559 (male) -> wd:Q6581097
      # don't use aat:300400512 (unavailable; 61321 (2018-06-10)), aat:300400513 (other; 2 (2018-06-10))
      BIND(IF(?ulanValue = aat:300189559, wd:Q6581097, IF(?ulanValue = aat:300189557, wd:Q6581072, wd:Q1)) AS ?wdValueFromGetty)
      # ?wdValueFromGetty holds Wikidata items; wd:Q1 marks ULAN values that were not translated
      FILTER((?wdValueFromGetty != wd:Q1) && (?wdValue != ?wdValueFromGetty))
    #  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
    }
    LIMIT 500
    
    Try it!
work location (P937)

QScsv nice-QS

This does not add dates, but they are rough estimates anyway, cf. [1]
occupation (P106)

QScsv nice-QS

capital of (P1376)

QScsv nice-QS
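The gender comparison from the table can also be sketched offline in Python. It assumes the rows (item, ULAN ID, Wikidata P21 value, ULAN schema:gender value) have already been fetched.

The translation table is the one from the query comments. As there, values other than male or female (e.g. aat:300400512, "unavailable") are skipped. The comparison is made on the translated Wikidata value. The sample rows are hypothetical.

```python
"""Sketch: compare Wikidata sex or gender (P21) with ULAN schema:gender."""

# AAT concept -> Wikidata item, as in the query comments
AAT_TO_WD = {
    "aat:300189557": "wd:Q6581072",  # female
    "aat:300189559": "wd:Q6581097",  # male
}


def contradictions(rows):
    """Yield (item, ulan_id, wd_value, getty_value) where the two sides differ.

    Each row is (item, ulan_id, wd_value, ulan_aat_value).
    """
    for item, ulan_id, wd_value, ulan_value in rows:
        getty_value = AAT_TO_WD.get(ulan_value)
        if getty_value is None:
            continue  # unmapped AAT value (unavailable, other): not compared
        if wd_value != getty_value:
            yield item, ulan_id, wd_value, getty_value


# Hypothetical rows.
rows = [
    ("wd:Q1001", "500000001", "wd:Q6581097", "aat:300189559"),  # agree (male)
    ("wd:Q1002", "500000002", "wd:Q6581072", "aat:300189559"),  # disagree
    ("wd:Q1003", "500000003", "wd:Q6581097", "aat:300400512"),  # unavailable: skipped
]
print(list(contradictions(rows)))
```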

Edits done

TGN matching

Matching at a specific administrative level can be done with queries like this one (heavy adaptation is necessary for each case, also to avoid timeouts):

  • PREFIX gvp: <http://vocab.getty.edu/ontology#>
    
    SELECT DISTINCT
    ?placeURL
    # ?wdSearchURL
    ?name ?id
    ?item #?itemLabel
    ?itemType #?itemTypeLabel
    ?match ?matchLabel
    # ?matchType ?matchTypeLabel
    ?QSlink
      WHERE {
      SERVICE <http://vocab.getty.edu/sparql.json> {
        { SELECT ?place ?name ?id ?upperTgn {
          ?place gvp:prefLabelGVP/gvp:term ?name .
          BIND(<http://vocab.getty.edu/tgn/1000080> AS ?upperTgn) # e.g. Tuscany
          ?place gvp:broaderPreferred/gvp:broaderPreferred ?upperTgn .
          ?place gvp:placeTypePreferred <http://vocab.getty.edu/aat/300000774> . # e.g. province
          ?place dc:identifier ?id .
          }
        }
      }
      OPTIONAL { ?item wdt:P1667 ?id .
                 OPTIONAL { ?item wdt:P31 ?itemType . }
               }
      BIND(SUBSTR(STR(?upperTgn), 28) AS ?upperTgnId)
      OPTIONAL { 
                 MINUS { ?item wdt:P1667 ?id . }
                 ?match wdt:P131+ ?upper .
                 ?upper wdt:P1667 ?upperTgnId .
    #             ?match rdfs:label ?matchL . #FILTER(LANG(?matchL) = "en")
                 ?match skos:altLabel ?matchA . FILTER(LANG(?matchA) = "en")
                 FILTER((STR(?matchL) = ?name ) || (STR(?matchA) = ?name ))
    #             ?match wdt:P31 ?matchType .
                 ?match wdt:P31 wd:Q15089 . # e.g. province of Italy
               }
    #  hint:Query hint:optimizer "None".
      BIND(URI(CONCAT("http://vocab.getty.edu/page/tgn/", ?id)) AS ?placeURL)
    #  BIND(URI(CONCAT("https://www.wikidata.org/wiki/Special:Search/", ?name)) AS ?wdSearchURL)
        
      BIND(SUBSTR(STR(?match), 32) AS ?qid)
      BIND(URI(CONCAT("https://tools.wmflabs.org/quickstatements/index_old.html#v1=",
                      ?qid, '%09P1667%09"', ?id, '"')) AS ?preQSlink)
      BIND((IF(BOUND(?item),"",?preQSlink)) AS ?QSlink)
        
    #  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de". }
    }
    ORDER BY ?name
    LIMIT 300
    
    Try it!
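The matching idea of this query can be restated as a small Python sketch on hypothetical in-memory data. A Getty name is matched against Wikidata items that meet two conditions:
- their label or alias equals the name;
- they are located (P131+) somewhere inside the item carrying the upper TGN ID.

The Getty-side restrictions (exactly two broaderPreferred steps, the place type) are assumed to have been applied already. Function names and data are illustrative, not an existing tool. The QuickStatements link mirrors ?preQSlink, with the command percent-encoded by `urllib.parse.quote`.

```python
"""Sketch of the TGN matching idea on hypothetical in-memory data."""
from urllib.parse import quote

WD_ENTITY = "http://www.wikidata.org/entity/"
TGN_BASE = "http://vocab.getty.edu/tgn/"


def ancestors(item, located_in):
    """All items reachable from `item` via located_in edges (like P131+)."""
    seen = set()
    stack = list(located_in.get(item, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # also guards against cycles
            seen.add(parent)
            stack.extend(located_in.get(parent, ()))
    return seen


def candidates(name, upper_tgn_uri, names, located_in, tgn_ids):
    """Items named `name` (label or alias) located within the upper TGN place."""
    upper_id = upper_tgn_uri.removeprefix(TGN_BASE)  # like SUBSTR(STR(?upperTgn), 28)
    uppers = {item for item, tgn in tgn_ids.items() if tgn == upper_id}
    return sorted(
        item for item, item_names in names.items()
        if name in item_names and ancestors(item, located_in) & uppers
    )


def qs_link(item_uri, tgn_id):
    """QuickStatements V1 link adding TGN ID (P1667), like ?preQSlink above."""
    qid = item_uri.removeprefix(WD_ENTITY)  # like SUBSTR(STR(?match), 32)
    command = f'{qid}\tP1667\t"{tgn_id}"'
    return ("https://tools.wmflabs.org/quickstatements/index_old.html#v1="
            + quote(command))  # tab -> %09, double quote -> %22


# Hypothetical data: Q100 is in Q200, which is in Q300 (carrying TGN 1000080).
located_in = {WD_ENTITY + "Q100": [WD_ENTITY + "Q200"],
              WD_ENTITY + "Q200": [WD_ENTITY + "Q300"]}
tgn_ids = {WD_ENTITY + "Q300": "1000080"}
names = {WD_ENTITY + "Q100": {"Example"}, WD_ENTITY + "Q101": {"Example"}}

for item in candidates("Example", TGN_BASE + "1000080", names, located_in, tgn_ids):
    print(qs_link(item, "7000000"))
```

Q101 carries the same name but has no located-in chain to the upper place, so it is not proposed. As in the query, places whose TGN ID is already on an item would be left out before a link is offered.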

TODO

  • introduce TGN ID (P1667) checks:
  • add comments (Help:QuickStatements#Comments) with a link to WD:WPVA/Getty: Done
  • implement more actions than just "add first reference"
  • do /reference normalization for the Getty Vocabularies
  • ULAN schema:gender "unknown" should probably not be used, as it is at Q29473318#P21
  • after adding references, recheck the manual (non-templated) queries from (the version history of) this page; they returned 1 (P19) and 2 (P20) more results (the inner limit was 50000)
  • introduce checks for existing qualifiers, and don't add a reference if the qualifiers aren't sourced as well! I've checked old cases by hand with queries like
    SELECT ?item ?itemLabel ?someProp ?somePropLabel WHERE {
      hint:Query hint:optimizer "None".
      ?wdRef pr:P248 wd:Q2494649. # ULAN …
      ?wdRef pr:P577 ?bar.
      ?item p:P20 [ prov:wasDerivedFrom ?wdRef; ?qualProp ?qualValue ]. # … used as references for POD statements
      ?someProp wikibase:qualifier ?qualProp. # the property entity of the qualifier
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    LIMIT 30
    
    Try it! This is partly implemented: [2], but it doesn't work for cases like this, where the same value appears twice!
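A minimal Python sketch of the pre-check asked for above follows. It takes a conservative reading of the TODO. A reference is only added when both conditions hold:
- the (item, property, value) combination occurs exactly once, so the reference cannot land on the wrong one of two identical statements;
- the statement has no qualifiers at all, since a ULAN reference might not support them.

The statement records are hypothetical dicts, e.g. built from a SPARQL result.

```python
"""Sketch of the reference pre-check: skip ambiguous or qualified statements."""
from collections import Counter


def statement_key(s):
    """The (item, property, value) combination a reference would attach to."""
    return (s["item"], s["prop"], s["value"])


def safe_targets(statements):
    """Return keys that may safely get a first reference.

    Skipped: keys occurring more than once (the same value twice for one
    item and property) and statements that carry any qualifiers.
    """
    counts = Counter(statement_key(s) for s in statements)
    return [statement_key(s) for s in statements
            if counts[statement_key(s)] == 1 and not s["qualifiers"]]


# Hypothetical place of death (P20) statements.
statements = [
    {"item": "Q1", "prop": "P20", "value": "Q10", "qualifiers": {}},
    {"item": "Q2", "prop": "P20", "value": "Q11", "qualifiers": {"P585": "+1900"}},
    {"item": "Q3", "prop": "P20", "value": "Q12", "qualifiers": {}},
    {"item": "Q3", "prop": "P20", "value": "Q12", "qualifiers": {}},
]
print(safe_targets(statements))  # [('Q1', 'P20', 'Q10')]
```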

Ontology mapping

  • A subject-object relationship in the GVP, here with gvp:ulan1532_uncle-aunt_of as an example:
    SELECT DISTINCT ?subj ?subjTerm ?obj ?objTerm WHERE {
      ?subj gvp:ulan1532_uncle-aunt_of ?obj .
      # language filters inside OPTIONAL, so subjects/objects without an English label are kept
      OPTIONAL { ?subj xl:prefLabel/xl:literalForm ?subjTerm . FILTER(LANG(?subjTerm) = "en") }
      OPTIONAL { ?obj xl:prefLabel/xl:literalForm ?objTerm . FILTER(LANG(?objTerm) = "en") }
      }
    
    Try it!
  • Mappings recorded in Wikidata (some still have to be tested, since the GVP ontology specification is not very clear; as of 2018-06-10, an attempt has been made to map all ULAN properties there):
    # TODO: could include usage numbers from Getty!
    
    SELECT DISTINCT ?wdProp ?wdPropLabel ?relationshipStProp ?relationshipPropLabel ?gettyProp ?projectLabel ?singleValue
    # (URI(CONCAT("http://www.wikidata.org/entity/Q", STR(ROUND(RAND()*1000000)))) AS ?item)
    # Unfortunately, the random item generator doesn't work, that would enable the use of "Template:Wikidata list"
    WHERE {
      VALUES ?relationshipStProp { ps:P1628 ps:P2235 ps:P2236 }
      { ?wdProp ?a [ ?relationshipStProp ?gettyProp ] .
        FILTER(STRSTARTS(STR(?gettyProp), "http://vocab.getty.edu/ontology#"))
      } UNION {
        ?wdProp ?b [ ?relationshipStProp ?gettyProp ;
                     pq:P1535 ?project ] .
        FILTER(!(!(?project = wd:Q5554720) && !(EXISTS { ?project wdt:P361 wd:Q5554720 . })))
        # prevent wikitext problems with double pipe
      }
      ?relationshipProp wikibase:statementProperty ?relationshipStProp .
      BIND(EXISTS { ?wdProp wdt:P2302 wd:Q19474404 . } AS ?singleValue)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    ORDER BY ?gettyProp
    
    Try it!
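The mapping query's results can also be processed outside the query service. This Python sketch assumes the standard SPARQL 1.1 Query Results JSON format, as returned when JSON is requested. It turns the results into a lookup from GVP property local names to Wikidata properties. `SAMPLE` is a hand-written, hypothetical one-row response.

```python
"""Sketch: turn the JSON result of the mapping query into a lookup table."""
import json

GVP = "http://vocab.getty.edu/ontology#"
WD = "http://www.wikidata.org/entity/"
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# Hypothetical response in SPARQL 1.1 Query Results JSON format.
SAMPLE = json.dumps({
    "head": {"vars": ["wdProp", "gettyProp", "singleValue"]},
    "results": {"bindings": [{
        "wdProp": {"type": "uri", "value": WD + "P1038"},
        "gettyProp": {"type": "uri", "value": GVP + "ulan1532_uncle-aunt_of"},
        "singleValue": {"type": "literal", "datatype": XSD_BOOLEAN, "value": "false"},
    }]},
})


def term_value(term):
    """Python value of one RDF term; xsd:boolean literals become bool."""
    if term.get("datatype") == XSD_BOOLEAN:
        return term["value"] == "true"
    return term["value"]


def mapping_table(raw_json):
    """Map GVP local names to a list of (Wikidata PID, single-value constraint?)."""
    table = {}
    for row in json.loads(raw_json)["results"]["bindings"]:
        getty = term_value(row["gettyProp"]).removeprefix(GVP)
        pid = term_value(row["wdProp"]).removeprefix(WD)
        single = term_value(row["singleValue"]) if "singleValue" in row else False
        table.setdefault(getty, []).append((pid, single))
    return table


print(mapping_table(SAMPLE))
```

A list is kept per Getty property because the same GVP property may be linked to more than one Wikidata property.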

For the GVP ontology see http://vocab.getty.edu/ontology!

Some matches that need special modelling here:

Getty Wikidata
gvp:ulan1006_formerly_identified_with different from (P1889) + said to be the same as (P460) with deprecated rank
gvp:ulan1204_donor_was donated by (P1028) is similar
gvp:ulan1513_grandchild_of relative (P1038) with qualifier
gvp:ulan1514_gandparent_of "
gvp:ulan1532_uncle-aunt_of "
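The "relative (P1038) with qualifier" rows could be encoded as data. This Python sketch rests on three assumptions:
- the qualifier is kinship to subject (P1039);
- it states what the statement's value is to the item, so the kinship is inverted relative to the Getty property name;
- the `Q_*` values are placeholders for the real kinship items, which still have to be looked up.

The Getty property names are copied as spelled in the table.

```python
"""Sketch: 'relative (P1038) with qualifier' rows as QuickStatements V1 lines.

Assumptions: the qualifier is kinship to subject (P1039) and describes what
the statement's value is to the item; Q_* values are placeholders.
"""

# Getty property (spelled as in the table) -> kinship of the object to the subject
KINSHIP_OF_OBJECT = {
    "ulan1513_grandchild_of": "Q_GRANDPARENT",      # A grandchild of B: B is A's grandparent
    "ulan1514_gandparent_of": "Q_GRANDCHILD",       # A grandparent of B: B is A's grandchild
    "ulan1532_uncle-aunt_of": "Q_NEPHEW_OR_NIECE",  # A uncle/aunt of B: B is A's nephew/niece
}


def relative_command(subj_qid, getty_prop, obj_qid):
    """Return 'subj P1038 obj P1039 kinship', or None for other properties."""
    kinship = KINSHIP_OF_OBJECT.get(getty_prop)
    if kinship is None:
        return None  # not a kinship row; needs its own modelling
    return "\t".join([subj_qid, "P1038", obj_qid, "P1039", kinship])


print(relative_command("Q1", "ulan1532_uncle-aunt_of", "Q2"))
```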