Wikidata:SPARQL query service/WDQS graph split/Federated Queries Examples

This page is a compilation of queries identified during an analysis of the impact of the graph split on existing queries^[1] and an explanation on how to rewrite them with federation. Note that the way we interpreted the queries might not reflect the intent of the original author but our goal is to rewrite them in such a way that the query returns identical results on the global graph and the split graph with a federated query. To make federation mandatory the queries were sometimes slightly modified from what we originally found in the query logs.

Sitelinks lookup edit

The query below does a simple lookup of the sitelinks of a given entity.

SELECT * {
  ?wikimediaUrl schema:about       wd:Q330955  ;
                schema:isPartOf    ?partOf_    ;
                schema:inLanguage  ?inLanguage .
  bind(replace(str(?partOf_), '^https://[^.]+\\.([^.]+)\\.[^.]+/$', '$1') as ?partOf)
  bind(replace(str(?wikimediaUrl), '.*/wiki/([^/]+)$', '$1') as ?articleTitle)
} ORDER BY ?partOf ?inLanguage}}

This query does target a single entity and extracts its sitelinks. This query does not require any federation and can be used as-is if the target entity is known to be a scientific article. If not we can use a union to fetch the data from both graphs (here running from the main graph graph and federating from the scholarly article subgraph):

SELECT * {
  VALUES (?item) {(wd:Q330955) (wd:Q42)}
  {
    ?wikimediaUrl schema:about       ?item  ;
                  schema:isPartOf    ?partOf_    ;
                  schema:inLanguage  ?inLanguage .
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?wikimediaUrl schema:about       ?item       ;
                    schema:isPartOf    ?partOf_    ;
                    schema:inLanguage  ?inLanguage .
    }
  }
  bind(replace(str(?partOf_), '^https://[^.]+\\.([^.]+)\\.[^.]+/$', '$1') as ?partOf)
  bind(replace(str(?wikimediaUrl), '.*/wiki/([^/]+)$', '$1') as ?articleTitle)
} ORDER BY ?partOf ?inLanguage

Try it

The above query has been slightly modified to fetch two entities from the two sub-graphs.

Note: It is worth explaining that it is not appropriate to use the query service for such query, the wbgetentities API module or the REST api should be preferred instead.

Simple lookup by object on the truthy graph edit

The query below does try to find all the entities that have a direct claim^[2] to a given entity.

SELECT ?x ?xLabel ?r ?relLabel {
  ?x ?r wd:Q1542532 .
  ?rel wikibase:directClaim ?r
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
} ORDER BY ?r ?xLabel

Since the property definitions are in the main graph we may want to start the query from the main endpoint and federate the scholarly article one. Using the same approach as the previous query we leverage a UNION to capture the data from both splits. We also want to use the label service for both the direct property label and the entity that is linking the searched work. For this we pass again the label service to the federated query and bind the label artificially (BIND(?xLabel AS ?xLabel)) to indicate the label service that we want this variable as an output of the federated query:

SELECT ?x ?xLabel ?r ?relLabel {
  VALUES (?work) {(wd:Q1542532)}
  {
    ?x ?r ?work .
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?x ?r ?work .
      BIND(?xLabel AS ?xLabel)  
      SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
    }
  }
  ?rel wikibase:directClaim ?r
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
} ORDER BY ?r ?xLabel

Try it

Simple lookup by subject edit

The query below tries to fetch the label and the description of a given item or a lemma:

SELECT ?xLabel ?descr {
   VALUES (?x) {(wd:Q77051335) (wd:L17271) (wd:Q42)}
   OPTIONAL { ?x wikibase:lemma ?xLabel }
   OPTIONAL { ?x schema:description ?descr . filter(lang(?descr) = 'en') }
   SERVICE wikibase:label { bd:serviceParam  wikibase:language  'en,de,fr,rm' }
}

This query is slightly confusing because it can bind the ?xLabel variable via the BGP on wikibase:lemma or via the label service. It also contains two optional clauses which mean that nonexistent entities can be returned. For the purpose of this rewrite we slightly change the semantic of the query by removing the OPTIONAL and prefer a UNION. For federation we use the same technique as before by using a UNION:

SELECT ?xLabel ?descr {
  VALUES (?x) {(wd:Q77051335) (wd:L17271) (wd:Q42)}
  { 
    { ?x wikibase:lemma ?xLabel }
    UNION
    { ?x schema:description ?descr . filter(lang(?descr) = 'en') }
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?x schema:description ?descr . filter(lang(?descr) = 'en')
      BIND(?xLabel AS ?xLabel)
      SERVICE wikibase:label { bd:serviceParam  wikibase:language  'en,de,fr,rm' }
    }
  }
  SERVICE wikibase:label { bd:serviceParam  wikibase:language  'en,de,fr,rm' }
}

Try it

We removed the wikibase:lemma lookup from the scholarly subgraph, it would be pointless since there are no lemmas in this subgraph.

Note: It is worth explaining that it is not appropriate to use the query service for such query, the wbgetentities API module or the REST api should be preferred instead.

Lookup from mwapi results edit

The query below does source the list of items from the wikidata search API and returns the type of the item and one of its label.

 SELECT ?item ?itemLabel ?type ?typeLabel WHERE {
   SERVICE wikibase:mwapi {
       bd:serviceParam wikibase:api "EntitySearch" .
       bd:serviceParam wikibase:endpoint "www.wikidata.org" .
       bd:serviceParam mwapi:search "house of medici" .
       bd:serviceParam mwapi:language "en" .
       ?item wikibase:apiOutputItem mwapi:item .
       ?num wikibase:apiOrdinal true .
   }   ?item (wdt:P31|wdt:P279|wdt:P366) ?type
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  } ORDER BY ASC(?num) LIMIT 10

Similar to how we solved previous use cases we use UNION to collect the results from the scholarly articles subgraph:

 SELECT ?item ?itemLabel ?type ?typeLabel WHERE {
   SERVICE wikibase:mwapi {
       bd:serviceParam wikibase:api "EntitySearch" .
       bd:serviceParam wikibase:endpoint "www.wikidata.org" .
       bd:serviceParam mwapi:search "house of medici" .
       bd:serviceParam mwapi:language "en" .
       ?item wikibase:apiOutputItem mwapi:item .
       ?num wikibase:apiOrdinal true .
   }
   {
     ?item (wdt:P31|wdt:P279|wdt:P366) ?type
   } UNION {
     SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
       ?item (wdt:P31|wdt:P279|wdt:P366) ?type
       BIND(?itemLabel as ?itemLabel)
       SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
     }
   }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  } ORDER BY ASC(?num) LIMIT 10

Try it

Note: It is worth explaining that it is not appropriate to use the query service for such query, the wbsearchentities API module should be combined with the wbgetentities API module or the REST api on the returned results.

Simple count edit

The query below is issued from InteGraality (Q84572489) and is simply counting the number of items matching certain criteria:

SELECT (COUNT(*) as ?count) WHERE {
  ?entity wdt:P4101 wd:Q41506
  FILTER(EXISTS {
    ?entity p:P495 []
  })
}

If the count requires items in both sub graph to be added federation is required, two approaches can be taken here. First by doing the count on the individual graphs and then adding the results (probably the most efficient):

SELECT (?countMain + ?countSchol AS ?count) WHERE {
  {
    SELECT (COUNT(*) as ?countMain) WHERE {
      ?entity wdt:P4101 wd:Q41506 ;
              p:P495 [] .
    }
  }
  SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
    SELECT (COUNT(*) as ?countSchol) WHERE {
      ?entity wdt:P4101 wd:Q41506 ;
              p:P495 [] .
    }
  }
}

Try it

Or by transferring the individual results from the scholarly article sub-graph engine to the main graph engine and counting:

SELECT (COUNT(*) as ?count) WHERE {
  {
    ?entity wdt:P4101 wd:Q41506 ;
            p:P495 [] .
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?entity wdt:P4101 wd:Q41506 ;
              p:P495 [] .
    }
  }
}

Try it

Notes: It might be worth explaining that for the two properties used here the search API could perhaps have been a good fit using the hasbwstatement search keyword.

Joining papers and authors edit

The query below is emitted from a tool named duplicate_researchers does join research papers and their authors to identify duplicates:

SELECT (group_concat(DISTINCT ?qLabel;separator='|') AS ?labels) (group_concat(DISTINCT REPLACE(str(?q),'^.+/','');separator='|') AS ?qs) ?lastName ?firstLetter (count(DISTINCT ?q) AS ?cnt) {
  VALUES ?author { wd:Q64213594 }.
  ?paper wdt:P50 ?author.
  ?paper wdt:P50 ?q.
  ?q rdfs:label ?qLabel.
  FILTER(lang(?qLabel)='en').
  BIND(REPLACE(?qLabel, '^.* ', '') AS ?lastName).
  BIND(REPLACE(?qLabel, '^(.).*$', '$1') AS ?firstLetter)
} GROUP BY ?lastName ?firstLetter HAVING (?cnt>1)

Since authors and papers might not be on the same graph we have to use federation.

SELECT (group_concat(DISTINCT ?qLabel;separator='|') AS ?labels) (group_concat(DISTINCT REPLACE(str(?q),'^.+/','');separator='|') AS ?qs) ?lastName ?firstLetter (count(DISTINCT ?q) AS ?cnt) {
  VALUES ?author { wd:Q64213594 }.
  {
    ?paper wdt:P50 ?author.
    ?paper wdt:P50 ?q.
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?paper wdt:P50 ?author.
      ?paper wdt:P50 ?q.
    }
  }
  hint:Prior hint:runFirst true .
  ?q rdfs:label ?qLabel.
  FILTER(lang(?qLabel)='en').
  BIND(REPLACE(?qLabel, '^.* ', '') AS ?lastName).
  BIND(REPLACE(?qLabel, '^(.).*$', '$1') AS ?firstLetter)
} GROUP BY ?lastName ?firstLetter	HAVING (?cnt>1)

Try it

We note here that we have to help blazegraph by telling it to first grab the paper data from the federated service. The reason is that by default it might try to gather most of the data from the initial service before sending it to the federated one. Here for instance it might try to bind all possible ?q with ?q rdfs:label ?qLabel while it should first bind ?paper (the most limiting criteria) from the federated service. This is what hint:Prior hint:runFirst true is doing by telling blazegraph to run the prior statements before attempting to resolve ?q rdfs:label ?qLabel.

Finding duplicated external ids with a `group by` edit

The query below is issued from Mix'n'match (Q28054658) and attempts to find items that share the same external id.

SELECT ?extid (count(?q) AS ?cnt) (GROUP_CONCAT(?q; SEPARATOR = '|') AS ?items) {
  ?q wdt:P244 ?extid
}
GROUP BY ?extid HAVING (?cnt>1) ORDER BY ?extid

This query can be rewritten with federation naively using the same UNION technique:

SELECT ?extid (count(?q) AS ?cnt) (GROUP_CONCAT(?q; SEPARATOR = '|') AS ?items) {
  {
    ?q wdt:P244 ?extid
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?q wdt:P244 ?extid
    }
  }
}
GROUP BY ?extid HAVING (?cnt>1) ORDER BY ?extid

Try it

But when using a property that is more widely used than Library of Congress authority ID (P244) we hit limitations due to federation. For instance we can attempt to rewrite this query with described at URL (P973) (which will timeout if not properly optimized). The difficulty is that all triples matching ?q wdt:P973 ?extid have to be transferred and if this list is too long the grouping can't seem to be computed in a reasonable amount of time. One trick might be to attempt to limit this list by decomposing the query in multiple steps and optimize the grouping by excluding external ids that can't match the HAVING (?cnt>1) criteria.

Get the list of external id and Q id pairs that are duplicated in the federated service and that are not in the main service
Get the list of external id and Q id pairs from the federated service that are in the main service
Get the list of external id and Q id pairs from the main service

By this we exclude all the external id and Q id pairs that can't be duplicated (which is assumed to be the majority since that we are looking for anomalies).

SELECT ?extid (COUNT(?q) AS ?cnt) (GROUP_CONCAT(?q; SEPARATOR = '|') AS ?items) {
  {
    # First get the duplicates that are solely duplicated in query-scholarly-experimental
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      SELECT ?q ?extid {
        ?q wdt:P973 ?extid .
        {
          SELECT ?extid (COUNT(*) AS ?c) {
            ?q wdt:P973 ?extid
          } GROUP BY ?extid HAVING (?c > 1)
        }
      }
    }
    # exludes the external ids present in the main graph, these will be covered below
    MINUS { [] wdt:P973 ?extid }
  } UNION {
    # Get the possible candidates that could be duplicated between query-scholarly-experimental and main
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?q wdt:P973 ?extid
    }
    hint:Prior hint:runFirst true .
    # filter the external ids present in the main graph
    [] wdt:P973 ?extid
  } UNION {
    ?q wdt:P973 ?extid
  }
} GROUP BY ?extid HAVING (?cnt>1) ORDER BY ?extid

Try it

Please that this is one way to optimize and make this query when running with federation, other techniques might exist.

Property paths edit

The query below is emitted by Scholia (Q45340488) and tries to list co-occurring topics^[3]:

PREFIX target: <http://www.wikidata.org/entity/Q104450895>

SELECT ?count (CONCAT("/topics/Q104450895,", SUBSTR(STR(?topic), 32)) AS ?countUrl)
       ?topic ?topicLabel (CONCAT("/topic/", SUBSTR(STR(?topic), 32)) AS ?topicUrl)
       ?example_work ?example_workLabel (CONCAT("/work/", SUBSTR(STR(?example_work), 32)) AS ?example_workUrl)
WITH {
  SELECT (COUNT(DISTINCT(?work)) AS ?count) ?topic (SAMPLE(?work) AS ?example_work) WHERE {
    # Find works for the specific queried topic
	  ?work wdt:P921/( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+) target: .
    
    # Find co-occuring topics
    ?work wdt:P921 ?topic .
    
    # Avoid listing the queried topic
      FILTER (target: != ?topic)
  }
  GROUP BY ?topic
} AS %result
WHERE {
  # Label the results
  INCLUDE %result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,jp,nl,no,ru,sv,zh" . } 
}
ORDER BY DESC(?count)

Note that first we fixed a small issue in the original query by introducing DISTINCT around ?work^[4].

First problem we encounter is the property path ?work wdt:P921/( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+) target: .. Since ?work and target: might not be on the same graph we have introduce a variable holding the list of topics we are interested in (named ?topics in the rewritten example). Determining the topics has to run first to avoid timeouts so we add a hint to bind this variable first. The last problem we have to solve is given the construction of the query with a GROUP BY and the use of SAMPLE, the labels are fetched after the fact. Sadly for some reason I could not find a way yet to make the label service work in this scenario without a timeout:

PREFIX target: <http://www.wikidata.org/entity/Q104450895>

SELECT ?count (CONCAT("/topics/Q104450895,", SUBSTR(STR(?topic), 32)) AS ?countUrl)
       ?topic ?topicLabel (CONCAT("/topic/", SUBSTR(STR(?topic), 32)) AS ?topicUrl)
       ?example_work ?example_workLabel (CONCAT("/work/", SUBSTR(STR(?example_work), 32)) AS ?example_workUrl)
WITH {
  SELECT (COUNT(DISTINCT(?work)) AS ?count) ?topic (SAMPLE(?work) AS ?example_work) WHERE {
    { ?topics ( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+) target: }
    hint:Prior hint:runFirst true .
    {
      # Find works for the specific queried topic
	  ?work wdt:P921 ?topics .
      # Find co-occuring topics
      ?work wdt:P921 ?topic .
      # Avoid listing the queried topic
      FILTER (target: != ?topic)
    } UNION {
      SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
        # Find works for the specific queried topic
	    ?work wdt:P921 ?topics .
        # Find co-occuring topics
        ?work wdt:P921 ?topic .
        # Avoid listing the queried topic
        FILTER (target: != ?topic)
      }
    }
  }
  GROUP BY ?topic
} AS %result
WHERE {
  # Label the results
  INCLUDE %result
  OPTIONAL {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      # TODO Figure out a way to use the label service
      ?example_work rdfs:label ?example_workLabel .
      FILTER(LANG(?example_workLabel) = "en")
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,jp,nl,no,ru,sv,zh" . } 
}
ORDER BY DESC(?count)

Try it

Recent publications edit

The query below is configured to run via Synia (Q121294613) and presents the recent publications for a given author.

PREFIX target: <http://www.wikidata.org/entity/Q37860261>

SELECT
  (MIN(?dates) AS ?date)

  ?work ?workLabel (CONCAT("#work/", SUBSTR(STR(?work), 32)) AS ?workUrl)

  (GROUP_CONCAT(DISTINCT ?type_label; separator=", ") AS ?type)
  (SAMPLE(?pages_) AS ?pages)

  ?venue ?venueLabel
  (CONCAT("#venue/", SUBSTR(STR(?venue), 32)) AS ?venueUrl)

  (GROUP_CONCAT(DISTINCT ?author_label; separator=", ") AS ?authors)
  # (CONCAT("#authors/", GROUP_CONCAT(DISTINCT SUBSTR(STR(?author), 32); separator=",")) AS ?authorsUrl)
WHERE {
  { 
    ?work wdt:P50 target: ,
                  ?author .
    OPTIONAL {
      ?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en')
    }
    BIND(COALESCE(?author_label_, SUBSTR(STR(?author), 32)) AS ?author_label)
    OPTIONAL { ?work wdt:P1104 ?pages_ }
  }
  UNION
  {
    ?work wdt:P98 target: ,
                  ?author .
    OPTIONAL {
      ?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en')
    }
    BIND(CONCAT(COALESCE(?author_label_, SUBSTR(STR(?author), 32)), " (ed.)") AS ?author_label)
  }
  
  hint:Prior hint:runFirst true .
  
  OPTIONAL { ?work wdt:P31 ?type_ . ?type_ rdfs:label ?type_label . FILTER (LANG(?type_label) = 'en') }
  ?work wdt:P577 ?datetimes .
  BIND(xsd:date(?datetimes) AS ?dates)
  OPTIONAL { ?work wdt:P1433 ?venue }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?work ?workLabel ?venue ?venueLabel 
ORDER BY DESC(?date)

This query may sound complicated and how to apply federation is far from obvious. For complex queries like this it is advised to attempt a small refactoring to help the comprehension of the query structure especially on the inbound and outbound connections to the publication (here referenced as ?work). Here there are multiple things that might be simplified:

we do a union between to fetch works by author (P50) or editor (P98) while a property path expression wdt:P50|wdt:P98 might give similar results.
the attributes related to the work are fetched in various places while it could probably be done close to where the work is bound.

Here is a simplified version of the query:

PREFIX target: <http://www.wikidata.org/entity/Q37860261>

SELECT
  (MIN(?dates) AS ?date)

  ?work ?workLabel (CONCAT("#work/", SUBSTR(STR(?work), 32)) AS ?workUrl)

  (GROUP_CONCAT(DISTINCT ?type_label; separator=", ") AS ?type)
  (SAMPLE(?pages_) AS ?pages)

  ?venue ?venueLabel
  (CONCAT("#venue/", SUBSTR(STR(?venue), 32)) AS ?venueUrl)

  (GROUP_CONCAT(DISTINCT ?author_label; separator=", ") AS ?authors)
  # (CONCAT("#authors/", GROUP_CONCAT(DISTINCT SUBSTR(STR(?author), 32); separator=",")) AS ?authorsUrl)
WHERE {
  {
    ?work (wdt:P50|wdt:P98) target: ,
                            ?author ;
          wdt:P577 ?datetimes .
    OPTIONAL { ?work wdt:P1433 ?venue }
    OPTIONAL { ?work wdt:P1104 ?pages_ }
    OPTIONAL { ?work wdt:P31 ?type_ }
  }
  OPTIONAL { ?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en') }
  OPTIONAL { ?type_ rdfs:label ?type_label . FILTER (LANG(?type_label) = 'en') }
  BIND(COALESCE(?author_label_, SUBSTR(STR(?author), 32)) AS ?author_label)
  
  BIND(xsd:date(?datetimes) AS ?dates)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?work ?workLabel ?venue ?venueLabel 
ORDER BY DESC(?date)

The way publications are filtered and what attributes we extract from them are clearly identified and we can simply use the same UNION approach to fetch the works from both graphs:

PREFIX target: <http://www.wikidata.org/entity/Q37860261>

SELECT
  (MIN(?dates) AS ?date)

  ?work ?workLabel (CONCAT("#work/", SUBSTR(STR(?work), 32)) AS ?workUrl)

  (GROUP_CONCAT(DISTINCT ?type_label; separator=", ") AS ?type)
  (SAMPLE(?pages_) AS ?pages)

  ?venue ?venueLabel
  (CONCAT("#venue/", SUBSTR(STR(?venue), 32)) AS ?venueUrl)

  (GROUP_CONCAT(DISTINCT ?author_label; separator=", ") AS ?authors)
  # (CONCAT("#authors/", GROUP_CONCAT(DISTINCT SUBSTR(STR(?author), 32); separator=",")) AS ?authorsUrl)
WHERE {
  {
    ?work (wdt:P50|wdt:P98) target: ,
                            ?author ;
          wdt:P577 ?datetimes .
    OPTIONAL { ?work wdt:P1433 ?venue }
    OPTIONAL { ?work wdt:P1104 ?pages_ }
    OPTIONAL { ?work wdt:P31 ?type_ }
  } UNION {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?work (wdt:P50|wdt:P98) target: ,
                    ?author ;
            wdt:P577 ?datetimes .
      OPTIONAL { ?work wdt:P1104 ?pages_ }
      OPTIONAL { ?work wdt:P1433 ?venue }
      OPTIONAL { ?work wdt:P31 ?type_ }
      BIND(?workLabel AS ?workLabel)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  }
  OPTIONAL { ?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en') }
  OPTIONAL { ?type_ rdfs:label ?type_label . FILTER (LANG(?type_label) = 'en') }
  BIND(COALESCE(?author_label_, SUBSTR(STR(?author), 32)) AS ?author_label)
  BIND(xsd:date(?datetimes) AS ?dates)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?work ?workLabel ?venue ?venueLabel 
ORDER BY DESC(?date)

Try it

Number of articles with CiTO-annotated citations by year edit

This is another example from Scholia (Q45340488) and ultimately displays a bar chart on the scholia website^[5]:

select ?year (count(?work) as ?number_of_publications) ?role where {
  {
    select ?work (min(?years) as ?year) ?type_ ?venue where {
      ?work wdt:P577 ?dates ;
            p:P2860 / pq:P3712 / wdt:P31 wd:Q96471816 .
      bind(str(year(?dates)) as ?years) .
      OPTIONAL {
        ?work wdt:P31 wd:Q109229154 . bind("explicit" as ?type_)
        ?work wdt:P1433 ?venue_ . ?venue_ rdfs:label ?venue . FILTER (LANG(?venue) = "en")
        MINUS { ?venue_ wdt:P31 wd:Q1143604 }
      }
    }
    group by ?work ?type_ ?venue
  }
  bind(
    coalesce(
      if(bound(?type_), ?venue,
      'other source')
    ) as ?role
  )
}
group by ?year ?role
order by ?year

For this query we will attempt a quick rewrite to properly identify the links to and from the works (identified as ?work).

First thing to note is the property path p:P2860 / pq:P3712 / wdt:P31 wd:Q96471816 . that we have to unfold introducing a new variable we will name ?cit_intent.
Secondly we see that the venue label is fetched in an OPTIONAL, since the label of the venue might not be part of the same split we might delay loading its label after the inner SELECT in a separate OPTIONAL.

But there is something to be very careful about: we have to select and group by using the ?venue_ and that variable is not necessarily bound. To avoid the clause that fetches the labels to load all labels of all entity we no longer return an unbound ?venue_ variable using COALESCE:

# tool: scholia
select ?year (count(distinct(?work)) as ?number_of_publications) ?role where {
  ?cit_intent wdt:P31 wd:Q96471816 .
  {
    select ?work (min(?years) as ?year) ?type_ (coalesce(if(bound(?venue_), ?venue_, '__no_venue__')) AS ?venue_) where {
      ?work wdt:P577 ?dates ;
            p:P2860 / pq:P3712 ?cit_intent .
      bind(str(year(?dates)) as ?years) .
      OPTIONAL {
        ?work wdt:P31 wd:Q109229154 .
        bind("explicit" as ?type_)
        ?work wdt:P1433 ?venue_ . 
      }
    }
    group by ?work ?type_ ?venue_
  }
  OPTIONAL { ?venue_ rdfs:label ?venue . FILTER (LANG(?venue) = "en") }
  bind(
    coalesce(
      if(bound(?type_), ?venue,
      'other source')
    ) as ?role
  )
}
group by ?year ?role
order by ?year

Note that we dropped the clause MINUS { ?venue_ wdt:P31 wd:Q1143604 }^[6] and have introduced a distinct to avoid counting duplicated work (this is because we extracted the ?cit_intent out of the inner SELECT).

To rewrite this query with federation we again use the same UNION technique:

select ?year (count(distinct(?work)) as ?number_of_publications) ?role where {
  ?cit_intent wdt:P31 wd:Q96471816 .
  {
    select ?work (min(?years) as ?year) ?type_ (coalesce(if(bound(?venue_), ?venue_, '__no_venue__')) AS ?venue_) where {
      {
        ?work wdt:P577 ?dates ;
              p:P2860 / pq:P3712 ?cit_intent .
        OPTIONAL {
          ?work wdt:P31 wd:Q109229154 .
          bind("explicit" as ?type_)
          ?work wdt:P1433 ?venue_ .
        }
      } UNION {
        SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
          ?work wdt:P577 ?dates ;
                p:P2860 / pq:P3712 ?cit_intent .
          OPTIONAL {
            ?work wdt:P31 wd:Q109229154 .
            bind("explicit" as ?type_)
            ?work wdt:P1433 ?venue_ . 
          }
        }
      }
      bind(str(year(?dates)) as ?years) .
    }
    group by ?work ?type_ ?venue_
  }
  OPTIONAL { ?venue_ rdfs:label ?venue . FILTER (LANG(?venue) = "en") }
  bind(
    coalesce(
      if(bound(?type_), ?venue,
      'other source')
    ) as ?role
  )
}
group by ?year ?role
order by ?year

Try it

Publications in a WikiProject (Q16695773) that have a main subject that is an instance of a person edit

This example is taken from the NZThesisProject (Q111645234) project and is listing all publications belonging to this project that have a main subject (P921) that is a human (Q5):

SELECT DISTINCT ?Thesis ?ThesisLabel ?mainsubjectLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?Thesis wdt:P5008 wd:Q111645234.
  ?Thesis wdt:P921 ?mainsubject.
  ?mainsubject wdt:P31 wd:Q5.
}
LIMIT 100

Similar to previous examples we first identify the patterns that filter and extract the publications and its statements:

   ?Thesis wdt:P5008 wd:Q111645234.
   ?Thesis wdt:P921 ?mainsubject.

Note that the label of the publication is pulled implicitly by the label service and is easy to forget^[7].

The general case (if publications are spread across the general and the scholarly graph) can be rewritten as:

SELECT DISTINCT ?Thesis ?ThesisLabel ?mainsubjectLabel WHERE {
  {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?Thesis wdt:P5008 wd:Q111645234 ;
              wdt:P921 ?mainsubject.
      BIND(?Thesis as ?ThesisLabel) # we pull the label of the publication from the federated subgraph
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  } UNION {
    ?Thesis wdt:P5008 wd:Q111645234 ;
            wdt:P921 ?mainsubject.
  }
  hint:Prior hint:runFirst true .
  ?mainsubject wdt:P31 wd:Q5.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Here we UNION the publications from both graphs.

If the publication are not spread across multiple subgraphs the rewrite is slightly easier since we no longer need the UNION:

SELECT DISTINCT ?Thesis ?ThesisLabel ?mainsubjectLabel WHERE {
  SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
    ?Thesis wdt:P5008 wd:Q111645234 ;
            wdt:P921 ?mainsubject.
    BIND(?Thesis as ?ThesisLabel) # we pull the label of the publication from the federated subgraph
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }

  hint:Prior hint:runFirst true .
  ?mainsubject wdt:P31 wd:Q5.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100

Publications in a WikiProject (Q16695773) but where the linked author is not in that project edit

This is another example from the NZThesisProject (Q111645234) project:

SELECT DISTINCT ?person ?personLabel WHERE
{ ?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
  ?person wdt:P1026 ?thesis.       #the thesis has a linked author
  
 MINUS {    {  ?person wdt:P5008 wd:Q111645234.}  }  #person is not marked as in the thesis project yet
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}

In this example we filter the publications using a single triple patter: ?thesis wdt:P5008 wd:Q111645234. The general case (publications spread across the subgraphs):

SELECT DISTINCT ?person ?personLabel WHERE { 
  {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
      ?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
    }
  } UNION {
    ?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
  }
  hint:Prior hint:runFirst true .

  ?person wdt:P1026 ?thesis.       #the thesis has a linked author
  MINUS { ?person wdt:P5008 wd:Q111645234 . }  #person is not marked as in the thesis project yet
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}

Or the simplified version if the publications are only part of the scholarly subgraph:

SELECT DISTINCT ?person ?personLabel WHERE { 
  SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
    ?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
  }
  hint:Prior hint:runFirst true .

  ?person wdt:P1026 ?thesis.       #the thesis has a linked author
  MINUS { ?person wdt:P5008 wd:Q111645234 . }  #person is not marked as in the thesis project yet
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}

Notes edit

↑ wikitech:User:DCausse/WDQS_Graph_Split_Impact_Analysis
↑ Also known as truthy statements
↑ example for Q104450895
↑ Fixing https://github.com/WDscholia/scholia/issues/2388 which might be exacerbated because of federation
↑ https://scholia.toolforge.org/cito/#articles
↑ It was not necessary
↑ Please see Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits#Misplacing_the_label_service

[1] wikitech:User:DCausse/WDQS_Graph_Split_Impact_Analysis

[2] Also known as truthy statements

[3] xample for Q104450895

[4] Fixing https://github.com/WDscholia/scholia/issues/2388 which might be exacerbated because of federation

[5] ttps://scholia.toolforge.org/cito/#articles

[6] It was not necessary

[7] Please see Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits#Misplacing_the_label_service

[1]

[2]

[3]

[4]

[5]

[6]

[7]

Wikidata:SPARQL query service/WDQS graph split/Federated Queries Examples

Sitelinks lookup edit

Simple lookup by object on the truthy graph edit

Simple lookup by subject edit

Lookup from mwapi results edit

Simple count edit

Joining papers and authors edit

Finding duplicated external ids with a group by edit

Property paths edit

Recent publications edit

Number of articles with CiTO-annotated citations by year edit

Publications in a WikiProject (Q16695773) that have a main subject that is an instance of a person edit

Publications in a WikiProject (Q16695773) but where the linked author is not in that project edit

Notes edit

Finding duplicated external ids with a `group by` edit