Wikidata:SPARQL query service/WDQS graph split/Federated Queries Examples
This page is a compilation of queries identified during an analysis of the impact of the graph split on existing queries[1] and an explanation on how to rewrite them with federation. Note that the way we interpreted the queries might not reflect the intent of the original author but our goal is to rewrite them in such a way that the query returns identical results on the global graph and the split graph with a federated query. To make federation mandatory the queries were sometimes slightly modified from what we originally found in the query logs.
Sitelinks lookup
editThe query below does a simple lookup of the sitelinks of a given entity.
SELECT * {
?wikimediaUrl schema:about wd:Q330955 ;
schema:isPartOf ?partOf_ ;
schema:inLanguage ?inLanguage .
bind(replace(str(?partOf_), '^https://[^.]+\\.([^.]+)\\.[^.]+/$', '$1') as ?partOf)
bind(replace(str(?wikimediaUrl), '.*/wiki/([^/]+)$', '$1') as ?articleTitle)
} ORDER BY ?partOf ?inLanguage
This query does target a single entity and extracts its sitelinks. If you know if the entity is a scientific article or not this query can be used as-is by choosing the right endpoint to run it. If not we can use a union to fetch the data from both graphs (here running from the main graph graph and federating from the scholarly article subgraph):
SELECT * {
VALUES (?item) {(wd:Q330955) (wd:Q42)}
{
?wikimediaUrl schema:about ?item ;
schema:isPartOf ?partOf_ ;
schema:inLanguage ?inLanguage .
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?wikimediaUrl schema:about ?item ;
schema:isPartOf ?partOf_ ;
schema:inLanguage ?inLanguage .
}
}
bind(replace(str(?partOf_), '^https://[^.]+\\.([^.]+)\\.[^.]+/$', '$1') as ?partOf)
bind(replace(str(?wikimediaUrl), '.*/wiki/([^/]+)$', '$1') as ?articleTitle)
} ORDER BY ?partOf ?inLanguage
The above query has been slightly modified to fetch two entities from the two sub-graphs.
Note: It is worth explaining that it is not appropriate to use the query service for such query, the wbgetentities API module or the REST api should be preferred instead.
Simple lookup by object on the truthy graph
editThe query below does try to find all the entities that have a direct claim[2] to a given entity.
SELECT ?x ?xLabel ?r ?relLabel {
?x ?r wd:Q1542532 .
?rel wikibase:directClaim ?r
SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
} ORDER BY ?r ?xLabel
Since the property definitions are in the main graph we may want to start the query from the main endpoint and federate the scholarly article one. Using the same approach as the previous query we leverage a UNION
to capture the data from both splits.
We also want to use the label service for both the direct property label and the entity that is linking the searched work. For this we pass again the label service to the federated query and bind the label artificially (BIND(?xLabel AS ?xLabel)
) to indicate the label service that we want this variable as an output of the federated query:
SELECT ?x ?xLabel ?r ?relLabel {
VALUES (?work) {(wd:Q1542532)}
{
?x ?r ?work .
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?x ?r ?work .
BIND(?xLabel AS ?xLabel)
SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
}
}
?rel wikibase:directClaim ?r
SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
} ORDER BY ?r ?xLabel
Simple lookup by subject
editThe query below tries to fetch the label and the description of a given item or a lemma:
SELECT ?xLabel ?descr {
VALUES (?x) {(wd:Q77051335) (wd:L17271) (wd:Q42)}
OPTIONAL { ?x wikibase:lemma ?xLabel }
OPTIONAL { ?x schema:description ?descr . filter(lang(?descr) = 'en') }
SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,rm' }
}
This query is slightly confusing because it can bind the ?xLabel
variable via the BGP on wikibase:lemma
or via the label service. It also contains two optional clauses which mean that nonexistent entities can be returned. For the purpose of this rewrite we slightly change the semantic of the query by removing the OPTIONAL
and prefer a UNION
. For federation we use the same technique as before by using a UNION
:
SELECT ?xLabel ?descr {
VALUES (?x) {(wd:Q77051335) (wd:L17271) (wd:Q42)}
{
{ ?x wikibase:lemma ?xLabel }
UNION
{ ?x schema:description ?descr . filter(lang(?descr) = 'en') }
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?x schema:description ?descr . filter(lang(?descr) = 'en')
BIND(?xLabel AS ?xLabel)
SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,rm' }
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,rm' }
}
We removed the wikibase:lemma
lookup from the scholarly subgraph, it would be pointless since there are no lemmas in this subgraph.
Note: It is worth explaining that it is not appropriate to use the query service for such query, the wbgetentities API module or the REST api should be preferred instead.
Lookup from mwapi results
editThe query below does source the list of items from the wikidata search API and returns the type of the item and one of its label.
SELECT ?item ?itemLabel ?type ?typeLabel WHERE {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "EntitySearch" .
bd:serviceParam wikibase:endpoint "www.wikidata.org" .
bd:serviceParam mwapi:search "house of medici" .
bd:serviceParam mwapi:language "en" .
?item wikibase:apiOutputItem mwapi:item .
?num wikibase:apiOrdinal true .
} ?item (wdt:P31|wdt:P279|wdt:P366) ?type
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ASC(?num) LIMIT 10
Similar to how we solved previous use cases we use UNION
to collect the results from the scholarly articles subgraph:
SELECT ?item ?itemLabel ?type ?typeLabel WHERE {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "EntitySearch" .
bd:serviceParam wikibase:endpoint "www.wikidata.org" .
bd:serviceParam mwapi:search "house of medici" .
bd:serviceParam mwapi:language "en" .
?item wikibase:apiOutputItem mwapi:item .
?num wikibase:apiOrdinal true .
}
{
?item (wdt:P31|wdt:P279|wdt:P366) ?type
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?item (wdt:P31|wdt:P279|wdt:P366) ?type
BIND(?itemLabel as ?itemLabel)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ASC(?num) LIMIT 10
Note: It is worth explaining that it is not appropriate to use the query service for such query, the wbsearchentities API module should be combined with the wbgetentities API module or the REST api on the returned results.
Simple count
editThe query below is issued from InteGraality (Q84572489) and is simply counting the number of items matching certain criteria:
SELECT (COUNT(*) as ?count) WHERE {
?entity wdt:P4101 wd:Q41506
FILTER(EXISTS {
?entity p:P495 []
})
}
If the count requires items in both sub graph to be added federation is required, two approaches can be taken here. First by doing the count on the individual graphs and then adding the results (probably the most efficient):
SELECT (?countMain + ?countSchol AS ?count) WHERE {
{
SELECT (COUNT(*) as ?countMain) WHERE {
?entity wdt:P4101 wd:Q41506 ;
p:P495 [] .
}
}
SERVICE wdsubgraph:scholarly_articles {
SELECT (COUNT(*) as ?countSchol) WHERE {
?entity wdt:P4101 wd:Q41506 ;
p:P495 [] .
}
}
}
Or by transferring the individual results from the scholarly article sub-graph engine to the main graph engine and counting:
SELECT (COUNT(*) as ?count) WHERE {
{
?entity wdt:P4101 wd:Q41506 ;
p:P495 [] .
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?entity wdt:P4101 wd:Q41506 ;
p:P495 [] .
}
}
}
Notes: It might be worth explaining that for the two properties used here the search API could perhaps have been a good fit using the hasbwstatement search keyword.
Joining papers and authors
editThe query below is emitted from a tool named duplicate_researchers does join research papers and their authors to identify duplicates:
SELECT (group_concat(DISTINCT ?qLabel;separator='|') AS ?labels) (group_concat(DISTINCT REPLACE(str(?q),'^.+/','');separator='|') AS ?qs) ?lastName ?firstLetter (count(DISTINCT ?q) AS ?cnt) {
VALUES ?author { wd:Q64213594 }.
?paper wdt:P50 ?author.
?paper wdt:P50 ?q.
?q rdfs:label ?qLabel.
FILTER(lang(?qLabel)='en').
BIND(REPLACE(?qLabel, '^.* ', '') AS ?lastName).
BIND(REPLACE(?qLabel, '^(.).*$', '$1') AS ?firstLetter)
} GROUP BY ?lastName ?firstLetter HAVING (?cnt>1)
Since authors and papers might not be on the same graph we have to use federation.
SELECT (group_concat(DISTINCT ?qLabel;separator='|') AS ?labels) (group_concat(DISTINCT REPLACE(str(?q),'^.+/','');separator='|') AS ?qs) ?lastName ?firstLetter (count(DISTINCT ?q) AS ?cnt) {
VALUES ?author { wd:Q64213594 }.
{
?paper wdt:P50 ?author.
?paper wdt:P50 ?q.
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?paper wdt:P50 ?author.
?paper wdt:P50 ?q.
}
}
hint:Prior hint:runFirst true .
?q rdfs:label ?qLabel.
FILTER(lang(?qLabel)='en').
BIND(REPLACE(?qLabel, '^.* ', '') AS ?lastName).
BIND(REPLACE(?qLabel, '^(.).*$', '$1') AS ?firstLetter)
} GROUP BY ?lastName ?firstLetter HAVING (?cnt>1)
We note here that we have to help blazegraph by telling it to first grab the paper data from the federated service. The reason is that by default it might try to gather most of the data from the initial service before sending it to the federated one. Here for instance it might try to bind all possible ?q
with ?q rdfs:label ?qLabel
while it should first bind ?paper
(the most limiting criteria) from the federated service. This is what hint:Prior hint:runFirst true
is doing by telling blazegraph to run the prior statements before attempting to resolve ?q rdfs:label ?qLabel
.
Finding duplicated external ids with a group by
edit
The query below is issued from Mix'n'match (Q28054658) and attempts to find items that share the same external id.
SELECT ?extid (count(?q) AS ?cnt) (GROUP_CONCAT(?q; SEPARATOR = '|') AS ?items) {
?q wdt:P244 ?extid
}
GROUP BY ?extid HAVING (?cnt>1) ORDER BY ?extid
This query can be rewritten with federation naively using the same UNION technique:
SELECT ?extid (count(?q) AS ?cnt) (GROUP_CONCAT(?q; SEPARATOR = '|') AS ?items) {
{
?q wdt:P244 ?extid
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?q wdt:P244 ?extid
}
}
}
GROUP BY ?extid HAVING (?cnt>1) ORDER BY ?extid
Property paths
editThe query below is emitted by Scholia (Q45340488) and tries to list co-occurring topics[3]:
PREFIX target: <http://www.wikidata.org/entity/Q104450895>
SELECT ?count (CONCAT("/topics/Q104450895,", SUBSTR(STR(?topic), 32)) AS ?countUrl)
?topic ?topicLabel (CONCAT("/topic/", SUBSTR(STR(?topic), 32)) AS ?topicUrl)
?example_work ?example_workLabel (CONCAT("/work/", SUBSTR(STR(?example_work), 32)) AS ?example_workUrl)
WITH {
SELECT (COUNT(DISTINCT(?work)) AS ?count) ?topic (SAMPLE(?work) AS ?example_work) WHERE {
# Find works for the specific queried topic
?work wdt:P921/( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+) target: .
# Find co-occuring topics
?work wdt:P921 ?topic .
# Avoid listing the queried topic
FILTER (target: != ?topic)
}
GROUP BY ?topic
} AS %result
WHERE {
# Label the results
INCLUDE %result
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,jp,nl,no,ru,sv,zh" . }
}
ORDER BY DESC(?count)
Note that first we fixed a small issue in the original query by introducing DISTINCT
around ?work
[4].
First problem we encounter is the property path ?work wdt:P921/( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+) target: .
. Since ?work
and target:
might not be on the same graph we have introduce a variable holding the list of topics we are interested in (named ?topics
in the rewritten example).
Determining the topics has to run first to avoid timeouts so we add a hint to bind this variable first.
The last problem we have to solve is given the construction of the query with a GROUP BY
and the use of SAMPLE
, the labels are fetched after the fact.
Sadly for some reason I could not find a way yet to make the label service work in this scenario without a timeout:
PREFIX target: <http://www.wikidata.org/entity/Q104450895>
SELECT ?count (CONCAT("/topics/Q104450895,", SUBSTR(STR(?topic), 32)) AS ?countUrl)
?topic ?topicLabel (CONCAT("/topic/", SUBSTR(STR(?topic), 32)) AS ?topicUrl)
?example_work ?example_workLabel (CONCAT("/work/", SUBSTR(STR(?example_work), 32)) AS ?example_workUrl)
WITH {
SELECT (COUNT(DISTINCT(?work)) AS ?count) ?topic (SAMPLE(?work) AS ?example_work) WHERE {
{ ?topics ( wdt:P31*/wdt:P279* | wdt:P361+ | wdt:P1269+) target: }
hint:Prior hint:runFirst true .
{
# Find works for the specific queried topic
?work wdt:P921 ?topics .
# Find co-occuring topics
?work wdt:P921 ?topic .
# Avoid listing the queried topic
FILTER (target: != ?topic)
} UNION {
SERVICE wdsubgraph:scholarly_articles {
# Find works for the specific queried topic
?work wdt:P921 ?topics .
# Find co-occuring topics
?work wdt:P921 ?topic .
# Avoid listing the queried topic
FILTER (target: != ?topic)
}
}
}
GROUP BY ?topic
} AS %result
WHERE {
# Label the results
INCLUDE %result
OPTIONAL {
SERVICE wdsubgraph:scholarly_articles {
# TODO Figure out a way to use the label service
?example_work rdfs:label ?example_workLabel .
FILTER(LANG(?example_workLabel) = "en")
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,jp,nl,no,ru,sv,zh" . }
}
ORDER BY DESC(?count)
Recent publications
editThe query below is configured to run via Synia (Q121294613) and presents the recent publications for a given author.
PREFIX target: <http://www.wikidata.org/entity/Q37860261>
SELECT
(MIN(?dates) AS ?date)
?work ?workLabel (CONCAT("#work/", SUBSTR(STR(?work), 32)) AS ?workUrl)
(GROUP_CONCAT(DISTINCT ?type_label; separator=", ") AS ?type)
(SAMPLE(?pages_) AS ?pages)
?venue ?venueLabel
(CONCAT("#venue/", SUBSTR(STR(?venue), 32)) AS ?venueUrl)
(GROUP_CONCAT(DISTINCT ?author_label; separator=", ") AS ?authors)
# (CONCAT("#authors/", GROUP_CONCAT(DISTINCT SUBSTR(STR(?author), 32); separator=",")) AS ?authorsUrl)
WHERE {
{
?work wdt:P50 target: ,
?author .
OPTIONAL {
?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en')
}
BIND(COALESCE(?author_label_, SUBSTR(STR(?author), 32)) AS ?author_label)
OPTIONAL { ?work wdt:P1104 ?pages_ }
}
UNION
{
?work wdt:P98 target: ,
?author .
OPTIONAL {
?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en')
}
BIND(CONCAT(COALESCE(?author_label_, SUBSTR(STR(?author), 32)), " (ed.)") AS ?author_label)
}
hint:Prior hint:runFirst true .
OPTIONAL { ?work wdt:P31 ?type_ . ?type_ rdfs:label ?type_label . FILTER (LANG(?type_label) = 'en') }
?work wdt:P577 ?datetimes .
BIND(xsd:date(?datetimes) AS ?dates)
OPTIONAL { ?work wdt:P1433 ?venue }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?work ?workLabel ?venue ?venueLabel
ORDER BY DESC(?date)
This query may sound complicated and how to apply federation is far from obvious. For complex queries like this it is advised to attempt a small refactoring to help the comprehension of the query structure especially on the inbound and outbound connections to the publication (here referenced as ?work
).
Here there are multiple things that might be simplified:
- we do a union between to fetch works by author (P50) or editor (P98) while a property path expression
wdt:P50|wdt:P98
might give similar results. - the attributes related to the work are fetched in various places while it could probably be done close to where the work is bound.
Here is a simplified version of the query:
PREFIX target: <http://www.wikidata.org/entity/Q37860261>
SELECT
(MIN(?dates) AS ?date)
?work ?workLabel (CONCAT("#work/", SUBSTR(STR(?work), 32)) AS ?workUrl)
(GROUP_CONCAT(DISTINCT ?type_label; separator=", ") AS ?type)
(SAMPLE(?pages_) AS ?pages)
?venue ?venueLabel
(CONCAT("#venue/", SUBSTR(STR(?venue), 32)) AS ?venueUrl)
(GROUP_CONCAT(DISTINCT ?author_label; separator=", ") AS ?authors)
# (CONCAT("#authors/", GROUP_CONCAT(DISTINCT SUBSTR(STR(?author), 32); separator=",")) AS ?authorsUrl)
WHERE {
{
?work (wdt:P50|wdt:P98) target: ,
?author ;
wdt:P577 ?datetimes .
OPTIONAL { ?work wdt:P1433 ?venue }
OPTIONAL { ?work wdt:P1104 ?pages_ }
OPTIONAL { ?work wdt:P31 ?type_ }
}
OPTIONAL { ?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en') }
OPTIONAL { ?type_ rdfs:label ?type_label . FILTER (LANG(?type_label) = 'en') }
BIND(COALESCE(?author_label_, SUBSTR(STR(?author), 32)) AS ?author_label)
BIND(xsd:date(?datetimes) AS ?dates)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?work ?workLabel ?venue ?venueLabel
ORDER BY DESC(?date)
The way publications are filtered and what attributes we extract from them are clearly identified and we can simply use the same UNION
approach to fetch the works from both graphs:
PREFIX target: <http://www.wikidata.org/entity/Q37860261>
SELECT
(MIN(?dates) AS ?date)
?work ?workLabel (CONCAT("#work/", SUBSTR(STR(?work), 32)) AS ?workUrl)
(GROUP_CONCAT(DISTINCT ?type_label; separator=", ") AS ?type)
(SAMPLE(?pages_) AS ?pages)
?venue ?venueLabel
(CONCAT("#venue/", SUBSTR(STR(?venue), 32)) AS ?venueUrl)
(GROUP_CONCAT(DISTINCT ?author_label; separator=", ") AS ?authors)
# (CONCAT("#authors/", GROUP_CONCAT(DISTINCT SUBSTR(STR(?author), 32); separator=",")) AS ?authorsUrl)
WHERE {
{
?work (wdt:P50|wdt:P98) target: ,
?author ;
wdt:P577 ?datetimes .
OPTIONAL { ?work wdt:P1433 ?venue }
OPTIONAL { ?work wdt:P1104 ?pages_ }
OPTIONAL { ?work wdt:P31 ?type_ }
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?work (wdt:P50|wdt:P98) target: ,
?author ;
wdt:P577 ?datetimes .
OPTIONAL { ?work wdt:P1104 ?pages_ }
OPTIONAL { ?work wdt:P1433 ?venue }
OPTIONAL { ?work wdt:P31 ?type_ }
BIND(?workLabel AS ?workLabel)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
}
OPTIONAL { ?author rdfs:label ?author_label_ . FILTER (LANG(?author_label_) = 'en') }
OPTIONAL { ?type_ rdfs:label ?type_label . FILTER (LANG(?type_label) = 'en') }
BIND(COALESCE(?author_label_, SUBSTR(STR(?author), 32)) AS ?author_label)
BIND(xsd:date(?datetimes) AS ?dates)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?work ?workLabel ?venue ?venueLabel
ORDER BY DESC(?date)
Number of articles with CiTO-annotated citations by year
editThis is another example from Scholia (Q45340488) and ultimately displays a bar chart on the scholia website[5]:
select ?year (count(?work) as ?number_of_publications) ?role where {
{
select ?work (min(?years) as ?year) ?type_ ?venue where {
?work wdt:P577 ?dates ;
p:P2860 / pq:P3712 / wdt:P31 wd:Q96471816 .
bind(str(year(?dates)) as ?years) .
OPTIONAL {
?work wdt:P31 wd:Q109229154 . bind("explicit" as ?type_)
?work wdt:P1433 ?venue_ . ?venue_ rdfs:label ?venue . FILTER (LANG(?venue) = "en")
MINUS { ?venue_ wdt:P31 wd:Q1143604 }
}
}
group by ?work ?type_ ?venue
}
bind(
coalesce(
if(bound(?type_), ?venue,
'other source')
) as ?role
)
}
group by ?year ?role
order by ?year
For this query we will attempt a quick rewrite to properly identify the links to and from the works (identified as ?work
).
- First thing to note is the property path
p:P2860 / pq:P3712 / wdt:P31 wd:Q96471816 .
that we have to unfold introducing a new variable we will name?cit_intent
. - Secondly we see that the venue label is fetched in an
OPTIONAL
, since the label of the venue might not be part of the same split we might delay loading its label after the innerSELECT
in a separateOPTIONAL
.
But there is something to be very careful about: we have to select and group by using the ?venue_
and that variable is not necessarily bound. To avoid the clause that fetches the labels to load all labels of all entity we no longer return an unbound ?venue_ variable using COALESCE
:
# tool: scholia
select ?year (count(distinct(?work)) as ?number_of_publications) ?role where {
?cit_intent wdt:P31 wd:Q96471816 .
{
select ?work (min(?years) as ?year) ?type_ (coalesce(if(bound(?venue_), ?venue_, '__no_venue__')) AS ?venue_) where {
?work wdt:P577 ?dates ;
p:P2860 / pq:P3712 ?cit_intent .
bind(str(year(?dates)) as ?years) .
OPTIONAL {
?work wdt:P31 wd:Q109229154 .
bind("explicit" as ?type_)
?work wdt:P1433 ?venue_ .
}
}
group by ?work ?type_ ?venue_
}
OPTIONAL { ?venue_ rdfs:label ?venue . FILTER (LANG(?venue) = "en") }
bind(
coalesce(
if(bound(?type_), ?venue,
'other source')
) as ?role
)
}
group by ?year ?role
order by ?year
Note that we dropped the clause MINUS { ?venue_ wdt:P31 wd:Q1143604 }
[6] and have introduced a distinct
to avoid counting duplicated work (this is because we extracted the ?cit_intent
out of the inner SELECT
).
To rewrite this query with federation we again use the same UNION
technique:
select ?year (count(distinct(?work)) as ?number_of_publications) ?role where {
?cit_intent wdt:P31 wd:Q96471816 .
{
select ?work (min(?years) as ?year) ?type_ (coalesce(if(bound(?venue_), ?venue_, '__no_venue__')) AS ?venue_) where {
{
?work wdt:P577 ?dates ;
p:P2860 / pq:P3712 ?cit_intent .
OPTIONAL {
?work wdt:P31 wd:Q109229154 .
bind("explicit" as ?type_)
?work wdt:P1433 ?venue_ .
}
} UNION {
SERVICE wdsubgraph:scholarly_articles {
?work wdt:P577 ?dates ;
p:P2860 / pq:P3712 ?cit_intent .
OPTIONAL {
?work wdt:P31 wd:Q109229154 .
bind("explicit" as ?type_)
?work wdt:P1433 ?venue_ .
}
}
}
bind(str(year(?dates)) as ?years) .
}
group by ?work ?type_ ?venue_
}
OPTIONAL { ?venue_ rdfs:label ?venue . FILTER (LANG(?venue) = "en") }
bind(
coalesce(
if(bound(?type_), ?venue,
'other source')
) as ?role
)
}
group by ?year ?role
order by ?year
Publications in a WikiProject (Q16695773) that have a main subject that is an instance of a person
editThis example is taken from the NZThesisProject (Q111645234) project and is listing all publications belonging to this project that have a main subject (P921) that is a human (Q5):
SELECT DISTINCT ?Thesis ?ThesisLabel ?mainsubjectLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
?Thesis wdt:P5008 wd:Q111645234.
?Thesis wdt:P921 ?mainsubject.
?mainsubject wdt:P31 wd:Q5.
}
LIMIT 100
Similar to previous examples we first identify the patterns that filter and extract the publications and its statements:
?Thesis wdt:P5008 wd:Q111645234.
?Thesis wdt:P921 ?mainsubject.
Note that the label of the publication is pulled implicitly by the label service and is easy to forget[7].
The general case (if publications are spread across the general and the scholarly graph) can be rewritten as:
SELECT DISTINCT ?Thesis ?ThesisLabel ?mainsubjectLabel WHERE {
{
SERVICE wdsubgraph:scholarly_articles {
?Thesis wdt:P5008 wd:Q111645234 ;
wdt:P921 ?mainsubject.
BIND(?ThesisLabel as ?ThesisLabel) # we pull the label of the publication from the federated subgraph
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
} UNION {
?Thesis wdt:P5008 wd:Q111645234 ;
wdt:P921 ?mainsubject.
}
hint:Prior hint:runFirst true .
?mainsubject wdt:P31 wd:Q5.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
Here we UNION
the publications from both graphs.
If the publication are not spread across multiple subgraphs the rewrite is slightly easier since we no longer need the UNION
:
SELECT DISTINCT ?Thesis ?ThesisLabel ?mainsubjectLabel WHERE {
SERVICE wdsubgraph:scholarly_articles {
?Thesis wdt:P5008 wd:Q111645234 ;
wdt:P921 ?mainsubject.
BIND(?ThesisLabel as ?ThesisLabel) # we pull the label of the publication from the federated subgraph
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
hint:Prior hint:runFirst true .
?mainsubject wdt:P31 wd:Q5.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
Publications in a WikiProject (Q16695773) but where the linked author is not in that project
editThis is another example from the NZThesisProject (Q111645234) project:
SELECT DISTINCT ?person ?personLabel WHERE
{ ?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
?person wdt:P1026 ?thesis. #the thesis has a linked author
MINUS { { ?person wdt:P5008 wd:Q111645234.} } #person is not marked as in the thesis project yet
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
In this example we filter the publications using a single triple patter: ?thesis wdt:P5008 wd:Q111645234
.
The general case (publications spread across the subgraphs):
SELECT DISTINCT ?person ?personLabel WHERE {
{
SERVICE wdsubgraph:scholarly_articles {
?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
}
} UNION {
?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
}
hint:Prior hint:runFirst true .
?person wdt:P1026 ?thesis. #the thesis has a linked author
MINUS { ?person wdt:P5008 wd:Q111645234 . } #person is not marked as in the thesis project yet
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
Or the simplified version if the publications are only part of the scholarly subgraph:
SELECT DISTINCT ?person ?personLabel WHERE {
SERVICE wdsubgraph:scholarly_articles {
?thesis wdt:P5008 wd:Q111645234. #thesis is in the NZThesisProject
}
hint:Prior hint:runFirst true .
?person wdt:P1026 ?thesis. #the thesis has a linked author
MINUS { ?person wdt:P5008 wd:Q111645234 . } #person is not marked as in the thesis project yet
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
Notes
edit- ↑ wikitech:User:DCausse/WDQS_Graph_Split_Impact_Analysis
- ↑ Also known as truthy statements
- ↑ example for Q104450895
- ↑ Fixing https://github.com/WDscholia/scholia/issues/2388 which might be exacerbated because of federation
- ↑ https://scholia.toolforge.org/cito/#articles
- ↑ It was not necessary
- ↑ Please see Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits#Misplacing_the_label_service