Wikidata:SPARQL tutorial

Translate this page; This page contains changes. Please contact a translation admin to mark them for translation.

WDQS, the Wikidata Query Service, is a powerful tool to provide insight into Wikidata's content. This guide will teach you how to use it. See also the interactive tutorial by Wikimedia Israel.

Before writing your own SPARQL query, look at {{Item documentation}} or any other generic SPARQL query template and see if your query is already included.

Before we start

While this guide may look very long and intimidating, please don’t let that scare you away! Just learning the SPARQL basics will get you a long way – even if you stop reading after #Our first query, you will already understand enough to construct many interesting queries. Each section of this tutorial will empower you to write even more powerful queries.

If you have never heard of Wikidata, SPARQL, or WDQS before, here’s a short explanation of those terms:

Wikidata is a knowledge database. It contains millions of statements, such as “the capital of Canada is Ottawa”, or “the Mona Lisa is painted in oil paint on poplar wood”, or “gold has a melting point of 1,064.18 degrees Celsius”.
SPARQL is a language to formulate questions (queries) for knowledge databases. With the right database, a SPARQL query could answer questions like “what is the most popular tonality in music?” or “which character was portrayed by the most actors?” or “what’s the distribution of blood types?” or “which authors’ works entered the public domain this year?”.
WDQS, the Wikidata Query Service, brings the two together: You enter a SPARQL query, it runs it against Wikidata’s dataset and shows you the result.

SPARQL basics

A simple SPARQL query looks like this:

SELECT ?a ?b ?c
WHERE
{
  x y ?a.
  m n ?b.
  ?b f ?c.
}

The SELECT clause lists variables that you want returned (variables start with a question mark), and the WHERE clause contains restrictions on them, mostly in the form of triples. All information in Wikidata (and similar knowledge databases) is stored in the form of triples; when you run the query, the query service tries to fill in the variables with actual values so that the resulting triples appear in the knowledge database, and returns one result for each combination of variables it finds.

A triple can be viewed as two vertices (alias 2 nodes, 2 resources) connected by an edge (an arc, a property) inside the vast directed (oriented) property multigraph which forms Wikidata. It can be read like a sentence (which is why it ends with a period), with a subject, a predicate, and an object:

SELECT ?fruit
WHERE
{
  ?fruit hasColor yellow.
  ?fruit tastes sour.
}

The results for this query could include, for example, “lemon”. In Wikidata, most properties are “has”-kind properties, so the query might instead read:

SELECT ?fruit
WHERE
{
  ?fruit color yellow.
  ?fruit taste sour.
}

which reads like “?fruit has color ‘yellow’” (not “?fruit is the color of ‘yellow’” – keep this in mind for property pairs like “parent”/“child”!).

However, that’s not a good example for WDQS. Taste is subjective, so Wikidata doesn’t have a property for it. Instead, let’s think about parent/child relationships, which are mostly unambiguous.

Our first query

Suppose we want to list all children of the baroque composer Johann Sebastian Bach. Using pseudo-elements like in the queries above, how would you write that query?

Hopefully you got something like this:

SELECT ?child
WHERE
{
  #  child "has parent" Bach
  ?child parent Bach.
  # (note: everything after a ‘#’ is a comment and ignored by WDQS.)
}

or this,

SELECT ?child
WHERE
{
  # child "has father" Bach 
  ?child father Bach. 
}

or this,

SELECT ?child
WHERE
{
  #  Bach "has child" child
  Bach child ?child.
}

The first two triples say that the ?child must have the parent/father Bach; the third says that Bach must have the child ?child. Let’s go with the second one for now.

So what remains to be done in order to turn this into a proper WDQS query? On Wikidata, items and properties are not identified by human-readable names like “father” (property) or “Bach” (item). (For good reason: “Johann Sebastian Bach” is also the name of a German painter, and “Bach” might also refer to the surname, the French commune, the Mercury crater, etc.) Instead, Wikidata items and properties are assigned an identifier. To find the identifier for an item, we search for the item and copy the Q-number of the result that sounds like it’s the item we’re looking for (based on the description, for example). To find the identifier for a property, we do the same, but search for “P:search term” instead of just “search term”, which limits the search to properties. This tells us that the famous composer Johann Sebastian Bach is Q1339, and the property to designate an item’s father is P:P22.

And last but not least, we need to include prefixes. For simple WDQS triples, items should be prefixed with wd:, and properties with wdt:. (But this only applies to fixed values – variables don’t get a prefix!)

Putting this together, we arrive at our first proper WDQS query:

SELECT ?child
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
}

child	childLabel
wd:Q57225	Johann Christoph Friedrich Bach
wd:Q76428	Carl Philipp Emanuel Bach
…

natural language	example	SPARQL	example
sentence	Juliet loves Romeo.	period	`juliet loves romeo.`
conjunction (clause)	Romeo loves Juliet and kills himself.	semicolon	`romeo loves juliet; kills romeo.`
conjunction (noun)	Romeo kills Tybalt and himself.	comma	`romeo kills tybalt, romeo.`
relative clause	Juliet loves someone who kills Tybalt.	brackets	`juliet loves [ kills tybalt ].`

Wikidata:SPARQL tutorial

Before we start

SPARQL basics

Our first query

Autocompletion

Advanced triple patterns

Instances and classes

Property paths

Qualifiers

ORDER and LIMIT

Exercise

Arthur Conan Doyle books

Chemical elements

Rivers that flow into the Mississippi

Rivers that flow into the Mississippi II

OPTIONAL

Expressions, FILTER and BIND

Data types

Operators

FILTER

BIND, BOUND, IF

COALESCE

Grouping

City populations

Painting materials

Guns by manufacturer

Publishers by number of pages

HAVING

Aggregate functions summary

wikibase:Label and aggregations

VALUES

And beyond…

See also

`ORDER` and `LIMIT`

`OPTIONAL`

Expressions, `FILTER` and `BIND`

`FILTER`

`BIND`, `BOUND`, `IF`

`COALESCE`

`HAVING`

`VALUES`