User:Kbseah/queries

Linking errata to articles they correct

Titles of errata often follow a stereotyped format, e.g. "Correction to: Original article name". The format usually depends on the publisher, though it may vary within the same periodical over time.

We can extract the original article title with regex, find articles of the same title, and link them to their respective errata.
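As an offline illustration of the extraction step, here is a minimal Python sketch that strips a known erratum prefix from a title. The prefix list is taken from the examples on this page and is an assumption about which formats occur; the names `ERRATUM_PREFIXES` and `original_title` are hypothetical.

```python
import re

# Prefixes seen in the examples on this page; other publishers may use others.
ERRATUM_PREFIXES = (
    "Author Correction: ",
    "Publisher Correction: ",
    "Erratum: ",
    "Correction to: ",
)

# Anchor at the start and capture everything after the prefix.
_PREFIX_RE = re.compile(
    r"^(?:" + "|".join(re.escape(p) for p in ERRATUM_PREFIXES) + r")(.*)$",
    re.DOTALL,
)


def original_title(erratum_title):
    """Return the title of the corrected article, or None if no known prefix matches."""
    match = _PREFIX_RE.match(erratum_title)
    return match.group(1) if match else None
```

This is case-sensitive, like the `STRSTARTS` filter in the queries, so a title beginning with "ERRATUM: " would need its own entry in the list.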

Issues:

  • Some articles have short, non-unique titles. How can we avoid falsely linking errata to the wrong articles? Matching journal titles causes timeouts in some queries I've tried.
  • The article title and erratum title may differ in case, or represent special characters differently. How can fuzzy matching be done? MWAPI search seems to time out sometimes.
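One possible workaround for both issues is to do the comparison client-side on downloaded query results rather than in SPARQL. The following Python sketch normalizes titles before comparing them and accepts a match only when exactly one candidate fits. The normalization rules (case folding, dash and quote variants, whitespace, trailing full stop) are assumptions about what typically differs, and `normalize_title` and `unique_match` are hypothetical helper names.

```python
import unicodedata

# Map common typographic variants to ASCII equivalents (an assumed, partial list).
_CHAR_MAP = str.maketrans({
    "\u2010": "-", "\u2013": "-", "\u2014": "-",   # hyphen, en dash, em dash
    "\u2018": "'", "\u2019": "'",                  # single quotes
    "\u201c": '"', "\u201d": '"',                  # double quotes
})


def normalize_title(title):
    """Fold case, compatibility forms, dash/quote variants and whitespace."""
    text = unicodedata.normalize("NFKC", title).translate(_CHAR_MAP)
    text = " ".join(text.split())       # collapse runs of whitespace
    return text.casefold().rstrip(".")  # ignore case and a trailing full stop


def unique_match(target, candidates):
    """Return the single candidate whose normalized title equals target's, else None."""
    key = normalize_title(target)
    hits = [c for c in candidates if normalize_title(c) == key]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` when several candidates share a title means short, non-unique titles are skipped instead of being linked to the wrong article.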

The following query uses these:

  • Properties: instance of (P31), publisher (P123), title (P1476), published in (P1433), corrigendum / erratum (P2507)
    SELECT DISTINCT ?item ?title ?item2 ?item2Label ?journalLabel WITH {
      SELECT DISTINCT ?journal WHERE {
        ?journal wdt:P31 wd:Q5633421; # scientific journal
                 wdt:P123 wd:Q180419; # published by Nature Portfolio
                 wdt:P1476 ?jtitle.
        FILTER ( STRSTARTS(?jtitle, "Nature") ) . # with name starting with "Nature"
      }
    } AS %i 
    WHERE {
      INCLUDE %i
      FILTER (?journal != wd:Q24908540) . # Short article titles in this journal causing problems
      ?item wdt:P31 wd:Q1348305;
            wdt:P1433 ?journal;
            wdt:P1476 ?title.
      FILTER (STRSTARTS(?title, 'Author Correction: ')). # also seen: "Publisher Correction: ", "Erratum: "
      BIND(REPLACE(?title, "^Author Correction: (.*)$", "$1") AS ?substring) # extract title of original article
      ?item2 wdt:P1476 ?substring. # title matching (case sensitive!) 
      FILTER ( NOT EXISTS { ?item2 wdt:P2507 ?item } ) # not already linked as a corrigendum 
    #  ?item2 wdt:P1433 ?journal2.   # Trying to ensure that article is in same journal, as sanity check
    #  FILTER (?journal = ?journal2) # Why does this time out?
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
        ?item2 rdfs:label ?item2Label .
        ?journal rdfs:label ?journalLabel .
      }
    }
    


Generate Quickstatements directly

The following query uses these:

  • Properties: instance of (P31), published in (P1433), title (P1476), corrigendum / erratum (P2507)
    SELECT DISTINCT ?qid ?P2507 ?S887 ?comment WHERE {
      VALUES ?prefix { "CORRIGENDUM: " "Author Correction: " "Publisher Correction: " "ERRATUM: " }
      ?item wdt:P31 wd:Q1348305;
            wdt:P1433 wd:Q2261792;
            wdt:P1476 ?title.
      FILTER (STRSTARTS(?title, ?prefix)).
      BIND(REPLACE(?title, CONCAT("^", STR(?prefix), "(.*)$"), "$1") AS ?substring) # concat the prefix to make a regex
      ?item2 wdt:P1476 ?substring.
      FILTER ( NOT EXISTS { ?item2 wdt:P2507 ?item } )
      BIND (ENCODE_FOR_URI(REPLACE(STR(?item2), ".*Q", "Q")) AS ?qid) # article item
      BIND (ENCODE_FOR_URI(REPLACE(STR(?item), ".*Q", "Q")) AS ?P2507) # corrigendum/erratum
      BIND ("Q69652283" AS ?S887) # based on heuristic: inferred from title
      BIND ("link errata to articles by matching title" AS ?comment) 
    }
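If the results are downloaded rather than pasted directly, each row could be turned into a QuickStatements command with a small script. The Python sketch below assumes the tab-separated QuickStatements V1 syntax, with `S`-prefixed reference properties and an optional `/* ... */` comment; check this against the current QuickStatements help before use. The function name `quickstatements_line` is hypothetical.

```python
def quickstatements_line(qid, prop, value, source_prop=None, source_value=None, comment=None):
    """Build one tab-separated command from a result row (assumed V1 syntax)."""
    fields = [qid, prop, value]
    if source_prop and source_value:
        # Reference pair, e.g. S887 (based on heuristic) -> Q69652283
        fields += [source_prop, source_value]
    line = "\t".join(fields)
    if comment:
        line += f" /* {comment} */"
    return line


# Placeholder item IDs, for illustration only
print(quickstatements_line("Q1", "P2507", "Q2", "S887", "Q69652283",
                           comment="link errata to articles by matching title"))
```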
    


Taxa with POWO identifiers but not IPNI identifiers

POWO identifiers correspond to IPNI IDs, so where an item already has a POWO ID, it can be parsed to obtain the IPNI ID. The exception is temporary identifiers ending in "-4".
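The parsing rule can be sketched offline in Python as follows. It mirrors the `STRAFTER` and `STRENDS` steps of the query below; the function name `powo_to_ipni` is hypothetical, and the IDs in the usage lines are made-up values in the expected format.

```python
POWO_PREFIX = "urn:lsid:ipni.org:names:"


def powo_to_ipni(powo_id):
    """Return the IPNI ID embedded in a POWO ID, or None if it cannot be derived."""
    if not powo_id.startswith(POWO_PREFIX):
        return None   # not in the expected LSID form
    if powo_id.endswith("-4"):
        return None   # temporary POWO identifier, no IPNI counterpart
    return powo_id[len(POWO_PREFIX):]


# Made-up IDs, for illustration only
print(powo_to_ipni("urn:lsid:ipni.org:names:12345-1"))  # 12345-1
print(powo_to_ipni("urn:lsid:ipni.org:names:67890-4"))  # None
```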

The following query uses these:

  • Properties: parent taxon (P171), taxon rank (P105), Plants of the World Online ID (P5037), IPNI plant ID (P961)
    PREFIX gas: <http://www.bigdata.com/rdf/gas#>
    SELECT DISTINCT ?qid ?ipni ?comment
    WHERE
    {
      SERVICE gas:service {
        gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                    gas:in wd:Q21908; # parent taxon Malvales
                    gas:traversalDirection "Reverse" ;
                    gas:out ?item ;
                    gas:out1 ?depth ;
                    gas:maxIterations 10 ;
                    gas:linkType wdt:P171 .
      }
      ?item wdt:P105 wd:Q7432; # taxon rank species
            wdt:P5037 ?powo.
      FILTER ( NOT EXISTS { ?item wdt:P961 ?id. } )
      OPTIONAL { ?item wdt:P171 ?linkTo }
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
        ?item rdfs:label ?itemLabel .
      }
      FILTER ( !STRENDS(?powo, "-4") ) # POWO IDs ending in -4 are temporary, see: https://powo.science.kew.org/about 
      BIND (CONCAT('\"', STRAFTER(?powo, "urn:lsid:ipni.org:names:"), '\"') AS ?ipni ) # surround with """ for Quickstatements v2
      BIND ("parsed from existing POWO identifier" AS ?comment)
      BIND (ENCODE_FOR_URI(REPLACE(STR(?item), ".*Q", "Q")) AS ?qid)
    }
    

Taxa without English labels

The following query uses these:

  • Properties: parent taxon (P171), taxon name (P225)
    PREFIX gas: <http://www.bigdata.com/rdf/gas#>
    
    SELECT DISTINCT ?qid ?Len
    WHERE
    {
      SERVICE gas:service {
        gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                    gas:in wd:Q4589415; # Diaphoretickes
                    gas:traversalDirection "Reverse" ;
                    gas:out ?item ;
                    gas:out1 ?depth ;
                    gas:maxIterations 10 ;
                    gas:linkType wdt:P171 .
      }
      OPTIONAL { ?item wdt:P171 ?linkTo }
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
      }
      FILTER(
        NOT EXISTS {
          ?item rdfs:label ?itemLabelEn.
          FILTER(LANG(?itemLabelEn) = "en")
        }
      )
      ?item wdt:P225 ?Len.
      BIND (ENCODE_FOR_URI(REPLACE(STR(?item), ".*Q", "Q")) AS ?qid)
    }