User:D1gggg/Wikidata model and SPARQL

WDQS, the Wikidata Query Service (Q20950365), is an awesome tool for answering many of the questions we might have.

For a brief, illustrated introduction to the interface and your very first queries, see A gentle introduction to the Wikidata Query Service.

SPARQL 1.1 Query Language (Q32146616) is the language used in the Wikidata Query Service (Q20950365).


We will:

Whitespace is significant inside strings but has no meaning elsewhere.[sparqlspec 1] The WDQS editor indents lines for us automatically.

Let's go!

RDF graph (Q31386861)


Relations. Claims. object (Q488383).

subject > Wikidata property > object or value

RDF node

Internationalized Resource Identifier (Q424583), typed literal (Q31381203)[rdfconcepts 2] and blank node (Q3427875) are the kinds of RDF node (Q31465098) found in an RDF graph (Q31386861).[rdfconcepts 3]

Internationalized Resource Identifier

IRIs differ from typed literals both in RDF and in SPARQL.

typed literal

SPARQL treats a plain string and a language-tagged string as different literals:[sparqlspec 8]

SELECT ?node ?predicate WHERE {
  ?node ?predicate "Wikidata"
}

is different from

SELECT ?node ?predicate WHERE {
  ?node ?predicate "Wikidata"@en # @en is different from @en-gb and @en-ca
}

  • to get datatype of typed literal: DATATYPE("Wikidata")
  • to get language tag of language-tagged string: LANG("Wikidata"@en)
  • to construct typed literal with datatype: STRDT("Wikidata", xsd:string)
  • to construct language-tagged string: STRLANG("Wikidata", "en")
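The four functions above can be tried together in one query. This is a minimal sketch that uses only inline data, so it does not depend on the contents of Wikidata; the variable names are illustrative, and it assumes the xsd: prefix is predefined, as it is in WDQS.

```sparql
SELECT ?literal ?datatype ?lang ?typed ?tagged WHERE {
  # two literals that look alike but are different RDF nodes
  VALUES ?literal { "Wikidata" "Wikidata"@en }
  BIND(DATATYPE(?literal) AS ?datatype)               # datatype IRI of each literal
  BIND(LANG(?literal) AS ?lang)                       # language tag, empty for the plain string
  BIND(STRDT("Wikidata", xsd:string) AS ?typed)       # build a typed literal
  BIND(STRLANG("Wikidata", "en") AS ?tagged)          # build a language-tagged string
}
```

Under SPARQL 1.1 the plain string should report xsd:string and an empty language tag, while the language-tagged one should report rdf:langString and en.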

See also:

datatype in Wikidata RDF Dump Format

The following datatypes of typed literals can be seen in the Wikidata RDF Dump Format:


  • YEAR() to get year
  • MONTH() to get month
  • DAY() to get day
  • NOW() to get current date and time


  • ROUND(1950/100) returns 20 but ROUND(1949/100) returns 19, so ROUND is unsuitable for computing centuries; a more accurate solution is FLOOR((?year-1)/100)+1 (works well for the 1..2001 range)
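The difference between the two century formulas can be checked on a handful of years. This is a sketch on inline data only; the chosen years and variable names are illustrative.

```sparql
SELECT ?year ?rounded ?century WHERE {
  VALUES ?year { 1900 1901 1949 1950 2000 2001 }
  BIND(ROUND(?year / 100) AS ?rounded)              # naive attempt
  BIND(FLOOR((?year - 1) / 100) + 1 AS ?century)    # formula from the list above
}
ORDER BY ?year
```

For 1949 the ROUND expression should give 19 while the FLOOR expression gives 20; for 2001 ROUND should give 20 while the FLOOR expression gives 21.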

nodes in WDQS

RDF nodes in Wikidata RDF Dump Format (Q32786132) follow specific naming conventions.

# We can inspect complex parts of the data model at any time
SELECT ?property ?RDFNode (IF(isLiteral(?RDFNode), CONCAT("literal, datatype IRI:", STR(DATATYPE(?RDFNode))), IF(isIRI(?RDFNode), "IRI", IF(isBlank(?RDFNode), "blank node", "impossible?!!"))) as ?kindOfRDFNode)
WHERE {
  # prefixed subjects                                 or their IRIs
  #                                                      <>
  #                                                      <>
  #                                                      <>
  #                                                      <>
  #    wd:Q12418                                      or <>
  #    wd:P571                                        or <>
  #   wds:Q12418-8EDF7B01-3F71-4DA7-8B52-8C26242F0293 or <>
  # wdref:8f08ac3e0839bdbc4c6eb8d671e772deb12ba423    or <>
  #   wdv:817fac0649608d9ebd295b60135818d4 QuantityValue <>
  #   wdv:804d3164e16f5c568523ef7b563ee1af QuantityValue, Normalized
  #   wdv:800000d7a293881690f27762757ec940 wikibase:TimeValue
  #   wdv:800fbeee96e1b9bd5d91c1f66b25365d wikibase:GlobecoordinateValue
  wdv:788f87d431fffec0fc34235813459708 ?property ?RDFNode.
}


Entities that represent properties

Wikidata property (Q18616576)

It is possible to query the entities that represent properties: each one carries a wikibase:directClaim link to its wdt: predicate.

A property path in the second (predicate) position of a triple cannot simply be swapped for a property entity within that same triple [as opposed to Q31209160 and Q31209194]. It is possible with additional triples or other variable-forming constructs. One nuance is that the entity must be used outside the triple in which the resulting property is applied.
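One way to read this, sketched below, is to put a variable in the predicate position and connect it to the property entity with an extra wikibase:directClaim triple. The item wd:Q12418 is borrowed from the examples elsewhere on this page, the variable names are illustrative, and the prefixes are assumed to be the ones WDQS predefines.

```sparql
SELECT ?property ?propertyLabel ?value WHERE {
  wd:Q12418 ?predicate ?value .                  # every direct triple of the item
  ?property wikibase:directClaim ?predicate .    # keep wdt: predicates and find their property entity
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

The property entity (?property) appears in its own triple, outside the triple where the resulting predicate (?predicate) is applied.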

edges in WDQS

  • 0.. : optional (or semi-structured) parts
  • ..* : limitless
  • ..1 : at most one
edge (Q3297804)

from: Domain | from: * | to: Domain | to: * | predicate or prefix
sitelink (Q17587456) | 0..1** | Wikidata item (Q16222597) | 0..1 | schema:about
Wikidata entity (Q32753077) | 0..1 | statement node (Q17586663) | 0..* | p: prefix
statement node (Q17586663) | 0..1 | reference node (Q32753827) | 0..* | prov:wasDerivedFrom

Links to value node (Q32753852):
statement node (Q17586663) | 0..1 | value node (Q32753852) | 0..1 | psv: prefix
statement node (Q17586663) | 0..1 | value node (Q32753852) | 0..1 | pqv: prefix
reference node (Q32753827) | 0..1 | value node (Q32753852) | 0..1 | prv: prefix

wikibase:QuantityValue specific[WikibaseDumpRDF 1]:
statement node (Q17586663) | 0..1 | normalized value node (Q33126575) | 0..1 | psn: prefix
statement node (Q17586663) | 0..1 | normalized value node (Q33126575) | 0..1 | pqn: prefix
reference node (Q32753827) | 0..1 | normalized value node (Q33126575) | 0..1 | prn: prefix

* - multiplicity; ** - per language per project
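Several of these edges can be followed in a single query. The sketch below walks from an entity to its statement nodes and then on to value and reference nodes; wd:Q2807 and P1082 are taken from the population example on this page, the variable names are illustrative, and wikibase:quantityAmount and wikibase:quantityUnit are the predicates the RDF dump format uses on quantity value nodes.

```sparql
SELECT ?statement ?value ?amount ?unit ?reference WHERE {
  wd:Q2807 p:P1082 ?statement .                  # entity -> statement node
  ?statement ps:P1082 ?value .                   # statement node -> simple value
  OPTIONAL {
    ?statement psv:P1082 ?valueNode .            # statement node -> value node
    ?valueNode wikibase:quantityAmount ?amount ;
               wikibase:quantityUnit   ?unit .
  }
  OPTIONAL {
    ?statement prov:wasDerivedFrom ?reference .  # statement node -> reference node
  }
}
LIMIT 20
```

Both OPTIONAL blocks reflect the 0.. multiplicities in the table: a statement node does not have to have a reference node.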

Multiple values

Aggregate function (Q4115063)

Occasionally, a Wikidata property holds several values for the same item.

When we query for ?item wdt:mvproperty ?value, we can get several records, one per value, rather than a single record for the item. This differs from the object-oriented approach, where one record corresponds to one object.

In order to get one subject (or item) per record:

  • ignore such properties: the most radical way; do not place properties that return multiple values (wd:Q12418 wdt:P186 ?material) in the "SELECT" part of your query
  • SAMPLE aggregate[sparqlspec 11]: returns an arbitrary value (working query)
  • GROUP_CONCAT aggregate[sparqlspec 12]: working query; the simplest query with the label service wouldn't work
  • LIMIT 1 (when the item and property are known beforehand): a less radical way than the first, but it discards data as well: SELECT ?materialLabel { SELECT ?materialLabel WHERE { wd:Q12418 wdt:P186 ?material . SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } . } LIMIT 1 }; with a good hammer it is possible to fit a square into a circle
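A sketch of the two aggregate options for the same item and property follows. Because the label service does not combine easily with aggregates, this version reads English labels through rdfs:label directly; that choice, like the variable names, is an assumption of this sketch and not necessarily what the linked working queries do.

```sparql
SELECT ?item
       (SAMPLE(?material) AS ?anyMaterial)                                    # one arbitrary value
       (GROUP_CONCAT(DISTINCT ?materialName; separator=", ") AS ?materials)   # all values in one cell
WHERE {
  VALUES ?item { wd:Q12418 }
  ?item wdt:P186 ?material .
  ?material rdfs:label ?materialName .
  FILTER(LANG(?materialName) = "en")
}
GROUP BY ?item
```

Either aggregate yields one record per item; SAMPLE discards the other values, while GROUP_CONCAT keeps them as a single string.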

Wikidata property (Q18616576)

Practical implications of statements with different ranks

wikibase:rank of wds: | number of statement nodes wds: with such rank (seven scenarios) | scaling
wikibase:PreferredRank | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1
wikibase:NormalRank | 0 | 100 | 100 | 100 | 0 | 100 | 0 | 100
wikibase:DeprecatedRank | 0 | 0 | 10 | 0 | 10 | 10 | 10 | 10

below per above:
wdt: in Wikidata entity (Q32753077) | 1 | 100 | 1 | 1 | 1 | 100 | 0
p: between Wikidata entity and statement node | 1 | 100 | 111 | 101 | 11 | 110 | 10

Statement nodes with rdf:type wikibase:BestRank are with red border

A property in the Wikidata model is augmented with ranks (see Help:Ranks) and can be used in multiple positions (references, qualifiers).

Most Wikibase types have simple values.[WikibaseDumpRDF 2]

By simple values we mean anything from RDF node section [IRIs, xsd:string, language-tagged literals, literals with other types, blank nodes].

Simple values can be accessed with following prefixes, depending on where property was used:

  • from Entity - wdt: [historic and wrong values aren't accessible here, see the rank table above]
  • from Statement node to value of property - ps:
  • from Statement node to value of qualifier - pq:
  • from Reference node - pr:

Equivalent of wdt:

SELECT ?pop WHERE { wd:Q2807 wdt:P1082 ?pop }

# equivalent of wdt:
# wd:Q2807 wdt:P1082 ?pop
SELECT ?pop WHERE {
   wd:Q2807     p:P1082 ?popNode           . # returns every statement node
     ?popNode rdf:type  wikibase:BestRank  . # restricts them to "best" nodes, similar to wdt:
     ?popNode  ps:P1082 ?pop                 # extracts the value of the node
}

A common mistake is to mix wdt:P1082 with p:P1082 in one SELECT query: in most cases we should use only one of them, not both. Mixing wdt: and p: for different properties is fine.

When we switch from wdt: to p: (in order to use qualifiers), we should read the value with the ps: prefix, because it respects the current statement node. A common mistake is to use wdt: instead of ps:.
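As a sketch of that switch, the query below reads each population value together with a qualifier from the same statement node. The qualifier P585 (point in time) is an assumption chosen for illustration; the source does not say which qualifiers these statements carry.

```sparql
SELECT ?pop ?date WHERE {
  wd:Q2807 p:P1082 ?popNode .            # every population statement, regardless of rank
  ?popNode ps:P1082 ?pop .               # value of this particular statement
  OPTIONAL { ?popNode pq:P585 ?date . }  # qualifier of the same statement, if present
}
ORDER BY DESC(?date)
```

Writing wd:Q2807 wdt:P1082 ?pop on the second line instead would detach the value from ?popNode and pair every qualifier with every best-rank value.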

Group Graph Patterns

Johann Sebastian Bach (Q1339) had two wives. How can we see the children of Johann Sebastian Bach with his first wife, Maria Barbara Bach (Q57487)?

The simplest way to do this is to add a second triple with that restriction:

SELECT ?child ?childLabel
WHERE {
  ?child wdt:P22 wd:Q1339.     # Child  has father  Johann Sebastian Bach.
  ?child wdt:P25 wd:Q57487.    # Child  has mother     Maria Barbara Bach.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

A dot between triple patterns corresponds to an "and" conjunction; when consecutive patterns share the same subject, ";" can be used instead. Note: the final dot may be omitted, but some authors keep it so that lines stay interchangeable.

SPARQL punctuation

  • Each triple about a subject is terminated by a period;
  • Multiple predicates about the same subject can be separated by semicolons;
  • Multiple objects for the same subject and predicate can be separated by commas.
SELECT ?s1 ?s2 ?s3
WHERE {
  ?s1 p1 o1;             # s1
      p2 o2;             # s1
      p3 o31, o32, o33.  # s1
  ?s2 p4 o41, o42.       # s2
  ?s3 p5 o5;             # s3
      p6 o6.             # s3
}
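Applied to real data, the semicolon lets the Bach query from the previous section name its subject only once. This is a sketch of the same query with different punctuation, not a different query.

```sparql
SELECT ?child ?childLabel WHERE {
  ?child wdt:P22 wd:Q1339 ;     # father: Johann Sebastian Bach
         wdt:P25 wd:Q57487 .    # same subject, mother: Maria Barbara Bach
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

A comma works the same way one level down: it repeats both the subject and the predicate and changes only the object.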
Blank nodes

Relative clauses. Properties of the object.

Suppose we’re not actually interested in Bach’s children, but in his grandchildren.

For this task we would use child (P40), which points from parent to child and is gender-independent. Possible solution below:

SELECT ?grandChild ?grandChildLabel
WHERE {
  wd:Q1339 wdt:P40 ?child.                     #    Bach  has a child       ?child.
    ?child wdt:P40 ?grandChild.                #  ?child  has a child  ?grandChild.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
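Since the intermediate ?child is never shown, it can be replaced by an anonymous blank node written with square brackets. The query below is a sketch of the same grandchildren search in that form.

```sparql
SELECT ?grandChild ?grandChildLabel WHERE {
  # Bach has a child, "someone" who has a child ?grandChild
  wd:Q1339 wdt:P40 [ wdt:P40 ?grandChild ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

The brackets read like a relative clause: they describe the object by its own properties without giving it a variable name.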

Property paths

Property paths are a way to very tersely write down a path of properties between two items. Sequence path elements are separated with a forward slash (/).

A path element followed by + matches one or more repetitions of it, with no upper limit; * does the same but also matches zero repetitions, which makes the element optional. | can be used to provide alternatives.
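A short sketch of these operators, reusing Bach (Q1339) and child (P40) from the previous section. The alternatives are listed as comments so that the block stays a single runnable query.

```sparql
SELECT DISTINCT ?descendant ?descendantLabel WHERE {
  # wd:Q1339 wdt:P40/wdt:P40 ?x    would match grandchildren only (a sequence of two steps)
  # wd:Q1339 wdt:P40* ?x           would also match Bach himself (zero steps)
  # ?x (wdt:P22|wdt:P25) ?parent   would match the father or the mother (alternatives)
  wd:Q1339 wdt:P40+ ?descendant .  # one or more "child" steps: all known descendants
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```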

Duplicates and alternative claims

Duplicates are possible with relatively complex paths.

Another reason for this is alternative "routes":

The query ?item wdt:P31/wdt:P279* item6 will return 4 results: item1 twice and item2 twice.

Sometimes the data can be arranged with fewer redundant P279 and P31 claims, but not always.

The solution is to replace SELECT with SELECT DISTINCT.
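A sketch of that fix on a path that can reach the same class by several routes. The restriction to items located in the Louvre (wd:Q19675) is borrowed from a later example on this page and is only there to keep the result small.

```sparql
SELECT DISTINCT ?work WHERE {       # DISTINCT collapses one row per route into one row per item
  ?work wdt:P276 wd:Q19675 ;
        wdt:P31/wdt:P279* wd:Q838948 .
}
```

Without DISTINCT, an item with two P31 claims that both lead to work of art would be listed twice.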

Symmetric properties and self-references

In Wikidata, properties can refer to other items. Sometimes two items are required to link to each other: this is a symmetric property.

In practice this means that you might encounter:

A possible solution is to append FILTER (?item != wd:Q801551) after the triple in the group graph pattern.

Exercise: a query for all paintings with their painting surface?

Retrieving items with optional information (OPTIONAL)


A president can have a spouse, but this is optional. More generally, in Wikidata an entity can lack properties (as opposed to having explicit "no value" statements).

Let’s try to query books by Arthur Conan Doyle (Q35610) and also include the title (P1476), illustrator (P110), publisher (P123) and publication date (P577):

# First query, incorrect

SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
WHERE {
  ?book wdt:P50 wd:Q35610;
        wdt:P1476 ?title;
        wdt:P110 ?illustrator;
        wdt:P123 ?publisher;
        wdt:P577 ?published.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

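It only returns two results. Why is that? Every triple pattern in the group must match, so a book is listed only if it has a title, an illustrator, a publisher and a publication date.

That’s not what we want: we primarily want a list of all the books. If additional data is available, we’d like to include it, but we don’t want that to limit our list of results.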

The solution is to tell the SPARQL executor that those properties are optional:

  • wrap each group graph pattern in an OPTIONAL clause where desired: for example, ?book wdt:P1476 ?title. becomes OPTIONAL { ?book wdt:P1476 ?title }
  • OPTIONAL clauses can be (and should be) nested for every part of the graph where data could be missing
  • order matters: place OPTIONAL after the required patterns[1]
  • place OPTIONAL after VALUES
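Following these rules, the book query can be sketched as below: only the author triple stays mandatory, and each additional property gets its own OPTIONAL clause.

```sparql
SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published WHERE {
  ?book wdt:P50 wd:Q35610 .                     # required: written by Arthur Conan Doyle
  OPTIONAL { ?book wdt:P1476 ?title . }         # each optional part is independent of the others
  OPTIONAL { ?book wdt:P110  ?illustrator . }
  OPTIONAL { ?book wdt:P123  ?publisher . }
  OPTIONAL { ?book wdt:P577  ?published . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Putting all four triples into one OPTIONAL clause would behave differently: the clause would match only for books that have all four properties and would otherwise leave all four variables unbound.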

Instances and classes

Earlier, we noted that most Wikidata properties are “has” relations: has child, has father, has occupation. But sometimes (in fact, frequently), you also need to talk about what something is:

When we want to search for “all works of art”, it’s not enough to search for all items that are direct instances of work of art:

SELECT ?work ?workLabel
WHERE {
  ?work wdt:P31 wd:Q838948. # instance of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

That query returns only 2815 results, yet there are obviously more: over 868119 works of art! The problem is that it misses items like Gone with the Wind, which is only an instance of film, not of work of art. We need to tell SPARQL to take the following claim into account when searching:

One possible solution to this is the brackets syntax we talked about: Gone with the Wind is an instance of some class that is a subclass of “work of art”.
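In query form, that sentence might be sketched as follows; the LIMIT is an addition of this sketch to keep the result small.

```sparql
SELECT ?work ?workLabel WHERE {
  # ?work is an instance of "some class" that is a subclass of work of art
  ?work wdt:P31 [ wdt:P279 wd:Q838948 ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 42
```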

But this might not be what you want:

  1. We’re no longer including items that are directly instances of work of art. In other words, the subclass of relation in the path should be optional.
  2. We’re still missing items that are instances of some subclass of some other subclass of “work of art” – for example, Snow White and the Seven Dwarfs is an animated film, which is a film, which is a work of art. In this case, we need to follow two “subclass of” statements – but it might also be three, four, five, any number really.
  3. For some properties, the degree of nesting isn't known beforehand: not only might there be a deep chain of subclass of statements, but such a chain should also be combined (this wasn't covered yet) with short chains of a few subclass of statements. The more links, the more nesting, and the less readable the query becomes. Furthermore, a query that uses the simplest syntax or the brackets syntax won't match the layers of the underlying data exactly (3 levels in the query, but 4 in the data), and every time the data changes, you have to update the query to match it again.

More complex, but also more flexible solution:

# instance of any subclass of work of art

SELECT ?work ?workLabel
WHERE {
  ?work wdt:P31/wdt:P279* wd:Q838948. # one P31 and any number of P279 between the item and the class
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 42

Now you know how to search for all works of art, or all buildings, or all human settlements: the magic incantation wdt:P31/wdt:P279*, along with the appropriate class. This uses some more SPARQL features that I haven’t explained yet, but quite honestly, this is almost the only relevant use of those features, so you don’t need to understand how it works in order to use WDQS effectively.

subclass of (P279) is the most common transitive property (Q18647515), see others.

Wider or narrower results

Matching Alternatives. Negation.

Over time we will lose interest in some items as well-known, visited or done in any sense. It's time to exclude them (MINUS), or to include new items (UNION):

The following query uses these:

Features: ImageGrid (Q24515278)    

#defaultView:ImageGrid
SELECT ?item ?itemLabel ?image ?genreLabel ?movementLabel
{
           ?item wdt:P31/wdt:P279*          wd:Q838948   . # works of art
           ?item wdt:P276                   wd:Q19675    . # located in Louvre
           # 117 items
MINUS    { ?item wdt:P136                   wd:Q440928   } # except ONE sculptural genre (Q440928)
           # 116 items
MINUS    { ?item wdt:P136/wdt:P31/wdt:P279* wd:Q18783400 } # except ANY sculptural genre (Q18783400)
           # 113 items
OPTIONAL { ?item wdt:P18                   ?image       }
OPTIONAL { ?item wdt:P136                  ?genre       }
OPTIONAL { ?item wdt:P135                  ?movement    }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
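The query above only narrows the result with MINUS. A sketch of widening it with UNION follows; the second branch, collection (P195), is an assumption chosen for illustration and is not taken from the source.

```sparql
SELECT DISTINCT ?item ?itemLabel WHERE {
  { ?item wdt:P276 wd:Q19675 . }    # located in the Louvre
  UNION
  { ?item wdt:P195 wd:Q19675 . }    # or part of the Louvre's collection
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
```

An item that matches both branches is produced twice by UNION, which is why the sketch uses DISTINCT.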

Unknown or no values

concept of no-value in Wikibase (Q19798647). concept of unknown value in Wikibase (Q19798648).

This is rarely used.

When properties are:

  • known beforehand: solution involves checks IF(boolean condition, then, else) where conditions are as described above
  • unknown beforehand: solution is more complex
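For the first case, a property known beforehand, the nested IF checks might be sketched as follows. The sketch relies on two WDQS-specific conventions that the text above does not spell out, so treat them as assumptions: the function wikibase:isSomeValue() to detect "unknown value", and an rdf:type link to wdno:P25 to detect "no value". The choice of mother (P25) and the status strings are illustrative.

```sparql
SELECT ?child ?childLabel ?motherStatus WHERE {
  wd:Q1339 wdt:P40 ?child .
  OPTIONAL { ?child wdt:P25 ?mother . }
  OPTIONAL { ?child rdf:type wdno:P25 . BIND(true AS ?noValue) }   # explicit "no value"
  BIND(
    IF(BOUND(?noValue), "no value",
      IF(!BOUND(?mother), "not stated",
        IF(wikibase:isSomeValue(?mother), "unknown value", "known")))
    AS ?motherStatus)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

The order of the checks matters: wikibase:isSomeValue(?mother) is only reached once ?mother is known to be bound.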

Pagination (ORDER and LIMIT)

It’s quite common to care only about a few results: a first, first to, pioneer in; oldest, earliest; youngest, latest.

In order to get an answer our entities should be ordered and limited:

ORDER BY something sorts the results by something.
something can be any expression – for now, the only kind of expression we know are simple variables (?something), but we’ll see some other kinds later. This expression can also be wrapped in either ASC() or DESC() to specify the sorting order (ascending or descending). (If you don’t specify either, the default is ascending sort, so ASC(something) is equivalent to just something.)
LIMIT count cuts off the result list at count results,
where count is any natural number. For example, LIMIT 10 limits the query to ten results. LIMIT 1 only returns a single result.

(You can also use LIMIT without ORDER BY. In this case, the results aren’t sorted, so you don’t have any guarantee which results you’ll get. Which is fine if you happen to know that there’s only a certain number of results, or you’re just interested in some result, but don’t care about which one. In either case, adding the LIMIT can significantly speed up the query, since WDQS can stop searching for results as soon as it’s found enough to fill the limit.)

The query that returns the ten most populous countries:

SELECT DISTINCT ?country ?countryLabel ?population
# ideally we don't need a "DISTINCT" above
# we get multiple records because some items have multiple P31 statements that lead to a Q3624078
# we can trim duplicates as workaround (or inspect classification and P31 links)
#SELECT ?country ?countryLabel ?population
WHERE {
  ?country wdt:P31/wdt:P279* wd:Q3624078; #countries
           wdt:P1082         ?population. #with their population
  MINUS {
    ?country wdt:P576        ?ended.
  } # exclude "former" countries
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population) # most populous countries - descending population
LIMIT 10


In Wikidata, a sort order is defined for the following types of properties:

But not for:

Ways to reduce multiplicity

Sources of multiplicity are explained in edges and Wikidata properties.

We will start with an example about two competitors and their rewards. It is natural to win the same competition in different years. Let's see how to deal with this in queries:

SELECT ?e ?value WHERE { 
VALUES (?e ?value ?date) {
("James" "Belgium" "70")
("Mary" "worldwide" "71")
("Mary" "worldwide" "72")
("Mary" "worldwide" "73")
("Mary" "France" "76")
}
}

The following query uses these:

# we can return every event with respect to person
SELECT ?e (GROUP_CONCAT(?event) as ?events)
{
  SELECT ?e ?event WHERE { 
  VALUES (?e ?event ?date) {
  ("James" "Belgium" "70")
  ("Mary" "worldwide" "71")
  ("Mary" "worldwide" "72")
  ("Mary" "worldwide" "73")
  ("Mary" "France" "76")
  }
  }
}
GROUP BY ?e

In order to return dates we could use ordinary CONCAT as part of BIND() in WHERE or directly in SELECT (expr AS ?var):

The following query uses these:

#same: select awards with respect to person
SELECT ?e (GROUP_CONCAT(?v; separator=", ") as ?events)
{
  #different: return CONCAT(?event,"'",?date) as ?v
  SELECT ?e (CONCAT(?event,"'",?date) as ?v) WHERE { 
  VALUES (?e ?event ?date) {
  ("James" "Belgium" "70")
  ("Mary" "worldwide" "71")
  ("Mary" "worldwide" "72")
  ("Mary" "worldwide" "73")
  ("Mary" "France" "76")
  }
  } ORDER BY ASC(?date)
}
GROUP BY ?e

Sometimes we do not need all the details, only the "number of" or "total count" of something. The solution is to use an aggregate function (Q4115063), for example COUNT:

  • (COUNT(?v) as ?events) - number of events

DISTINCT is used to count distinct events.

The HAVING construct filters the results of grouping.

The following query uses these:

# participants ...
  SELECT ?e (COUNT(DISTINCT ?event) as ?events) WHERE
  {
    SELECT ?e ?event ?date WHERE { 
    VALUES (?e ?event ?date) {
    ("James" "Belgium" "70")
    ("Mary" "worldwide" "71")
    ("Mary" "worldwide" "72")
    ("Mary" "worldwide" "73")
    ("Mary" "France" "76")
    }
    }
  }
  GROUP BY ?e
  # with at least 2 different competitions
  # the aggregate is repeated so the condition also works where SELECT aliases are not visible to HAVING
  HAVING(COUNT(DISTINCT ?event) > 1) # () are mandatory here too

Note about "Bad Aggregate" messages

When we place ?materialLabel in the SELECT part of a grouped query, we must also list that variable in GROUP BY.

# Working query without ?materialLabel
SELECT ?material (COUNT(?painting) AS ?count)
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?material # nothing else here

# Working query with ?materialLabel
SELECT ?material ?materialLabel (COUNT(?painting) AS ?count)
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?material ?materialLabel # copied here to avoid message


Paintings along with their painting material
Guns by manufacturer

What is the total number of guns produced by each manufacturer?

Publishers by number of pages

What is the average (function: AVG) number of pages of books by each publisher?
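One possible answer, sketched under the assumption that books are identified simply by having a publisher (P123) and a number of pages (P1104); both the property choice and the variable names are this sketch's own.

```sparql
SELECT ?publisher ?publisherLabel (AVG(?pages) AS ?avgPages) WHERE {
  ?book wdt:P123  ?publisher ;
        wdt:P1104 ?pages .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?publisher ?publisherLabel   # the label is listed too, as explained in the note above
ORDER BY DESC(?avgPages)
```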

And beyond…

This guide ends here, SPARQL doesn’t. Same about extensions of RDF.

Some semantic software can be found here: - information is outdated for very active programs and projects.

Furthermore, there are other technologies built upon RDF such as RDF Schema (Q1751819) and Web Ontology Language (Q826165).


We would appreciate any comments about difficult parts of this article or any suggestions how to improve this page. Any other suggestions are welcome.