Wikidata:History Query Service

The Wikidata History Query Service was a work in progress by Tpt to provide a SPARQL endpoint for querying the Wikidata edit history. It provides a limited set of features. The querying UI is available at https://wdhqs.wmflabs.org/ and the SPARQL endpoint at https://wdhqs.wmflabs.org/sparql

It stored metadata about each item or property revision (contributor, timestamp, entity edited, previous/next revision of the given entity) and part of the revision content (direct claim relations and redirects). It allows querying the triples added and removed by a revision, as well as the full state of the Wikidata graph after any revision. The loaded data covers the period from the creation of Wikidata to July 1st, 2019. The public endpoint has a 5-minute timeout.

If you are not familiar with SPARQL or the Wikidata Query Service, have a look at Wikidata Query Help first.

Example queries

Number of Wikidata items of a given class at a given point in time

E.g. the number of humans in Wikidata on January 1st, 2015 at midnight (UTC).

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?globalState hist:globalStateAt "2015-01-01T00:00:00Z"^^xsd:dateTime .
  GRAPH ?globalState {
    ?item wdt:P31 wd:Q5
  }
}

Number of contributors who have changed values of a given property

E.g. for equivalent class (P1709) values:

PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(DISTINCT ?user) AS ?count) WHERE {
  # ?addOrDel is bound only to the graphs in which a wdt:P1709 value is added or removed
  GRAPH ?addOrDel {
    ?item wdt:P1709 ?value .
  }
  ?rev hist:additions|hist:deletions ?addOrDel ;
       schema:author ?user .
}

Statistics on the number of main snak additions by property for a given user

E.g. for user Tpt:

PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?prop (COUNT(DISTINCT ?revision) AS ?c) WHERE {
  ?revision schema:author "Tpt" ;
            hist:additions ?additionsGraph .
  GRAPH ?additionsGraph {
     ?topic ?prop ?o .
  }
} GROUP BY ?prop ORDER BY DESC(?c)

The most common replacements of one sex or gender (P21) value by another

PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?addedGender ?deletedGender (COUNT(?revision) AS ?count) WHERE {
  GRAPH ?additionsGraph {
    ?s wdt:P21 ?addedGender .
  }
  GRAPH ?deletionsGraph {
    ?s wdt:P21 ?deletedGender .
  }
  ?revision hist:additions ?additionsGraph ;
            hist:deletions ?deletionsGraph .
} GROUP BY ?addedGender ?deletedGender ORDER BY DESC(?count) LIMIT 10

Variation of the number of usages of sex or gender (P21) values per year

Only the values with at least 10 occurrences are displayed:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?value (GROUP_CONCAT(CONCAT(STR(?year), ": ", STR(?count)); SEPARATOR=" ") AS ?variation) WHERE {
  {SELECT (YEAR(?reference) AS ?year) ?value (COUNT(?item) AS ?count) WHERE {
    VALUES ?reference { 
      "2014-01-01T00:00:00Z"^^xsd:dateTime
      "2015-01-01T00:00:00Z"^^xsd:dateTime
      "2016-01-01T00:00:00Z"^^xsd:dateTime
      "2017-01-01T00:00:00Z"^^xsd:dateTime
      "2018-01-01T00:00:00Z"^^xsd:dateTime
    }
    ?globalState hist:globalStateAt ?reference .
    GRAPH ?globalState {
      ?item wdt:P21 ?value .
    }
  } GROUP BY ?reference ?value ORDER BY ?reference}
  FILTER(?count >= 10)
} GROUP BY ?value ORDER BY DESC(SUM(?count))

Evolution of the gender gap in Wikidata per year

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT 
	(YEAR(?reference) AS ?year)
	?maleCount ?femaleCount
	((?maleCount - ?femaleCount) AS ?absoluteGap)
	((?maleCount - ?femaleCount) / (?maleCount + ?femaleCount) AS ?relativeGap)
WHERE {
  {SELECT ?reference (COUNT(?item) AS ?maleCount) WHERE {
    VALUES ?reference { 
      "2014-01-01T00:00:00Z"^^xsd:dateTime
      "2015-01-01T00:00:00Z"^^xsd:dateTime
      "2016-01-01T00:00:00Z"^^xsd:dateTime
      "2017-01-01T00:00:00Z"^^xsd:dateTime
      "2018-01-01T00:00:00Z"^^xsd:dateTime
    }
    ?globalState hist:globalStateAt ?reference .
    GRAPH ?globalState {
      ?item wdt:P21 wd:Q6581097 .
    }
  } GROUP BY ?reference}
  {SELECT ?reference (COUNT(?item) AS ?femaleCount) WHERE {
    VALUES ?reference { 
      "2014-01-01T00:00:00Z"^^xsd:dateTime
      "2015-01-01T00:00:00Z"^^xsd:dateTime
      "2016-01-01T00:00:00Z"^^xsd:dateTime
      "2017-01-01T00:00:00Z"^^xsd:dateTime
      "2018-01-01T00:00:00Z"^^xsd:dateTime
    }
    ?globalState hist:globalStateAt ?reference .
    GRAPH ?globalState {
      ?item wdt:P21 wd:Q6581072 .
    }
  } GROUP BY ?reference}
} ORDER BY ?year

RDF model

Revisions

Example:

<http://www.wikidata.org/revision/1339>
        schema:about wd:Q90 ;
        schema:isBasedOn <http://www.wikidata.org/revision/1335> ;
        schema:dateCreated "2012-10-30T01:39:58Z"^^xsd:dateTime ;
        schema:author "TheBestContributor" ;
        hist:revisionId 1339 ;
        hist:additions <http://www.wikidata.org/revision/additions/1339> ;
        hist:deletions <http://www.wikidata.org/revision/deletions/1339> ;
        hist:globalState <http://www.wikidata.org/revision/global/1339> ;
        hist:previousRevision <http://www.wikidata.org/revision/1338> ;
        hist:nextRevision <http://www.wikidata.org/revision/1340> .

Revisions are described using the following properties:

schema:about
the entity that has been modified by this revision.
schema:isBasedOn
the previous revision of the same entity (it has no value if the revision created the entity).
schema:dateCreated
the time when the revision was saved.
schema:author
the user name of the editor who made the revision, or the IP address of the anonymous user.
hist:revisionId
the revision ID as an xsd:integer.
hist:additions
the IRI of the named graph containing the triples added by this revision.
hist:deletions
the IRI of the named graph containing the triples removed by this revision.
hist:globalState
the IRI of the named graph containing all the Wikidata triples after this revision has been saved.
hist:previousRevision
the IRI of the Wikidata revision just before this one.
hist:nextRevision
the IRI of the Wikidata revision just after this one.
hist:globalStateAt
gives, for a timestamp (object), the IRI of the global state graph at that moment (subject), i.e. the global state after the latest revision saved before or at this timestamp. Example:
<http://www.wikidata.org/revision/global/5587110> hist:globalStateAt "2013-02-02T00:00:00Z"^^xsd:dateTime
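
The properties above can be combined to walk through the history of a single entity. The following query is a minimal sketch that uses only properties from the list; the choice of wd:Q90 (taken from the example revision) and the LIMIT value are illustrative assumptions:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>

# List the earliest revisions of wd:Q90 with their author, timestamp and parent revision.
SELECT ?revision ?author ?date ?parent WHERE {
  ?revision schema:about wd:Q90 ;
            schema:author ?author ;
            schema:dateCreated ?date .
  # schema:isBasedOn has no value on the revision that created the entity
  OPTIONAL { ?revision schema:isBasedOn ?parent . }
} ORDER BY ?date LIMIT 20
```

The OPTIONAL clause keeps the creation revision in the results even though it has no parent.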

Content

The RDF representation of Wikidata entities is the same as the one used by query.wikidata.org. The triples are stored in named graphs. There are three named graphs per revision, detailed below.

Currently, for storage space reasons, only direct claim relations (wdt:PXXX) and redirect (owl:sameAs) triples are stored, but it is planned to make the full Wikidata RDF representation available in the future.
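
Since redirects are stored as owl:sameAs triples, they can be queried like any other stored triple. The following sketch looks for revisions that introduced a redirect by matching owl:sameAs triples inside additions named graphs (see the corresponding section of this page). It assumes, as in the standard Wikidata RDF export, that the triple points from the redirected entity to its target; the LIMIT value is arbitrary:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX hist: <http://wikiba.se/history/ontology#>

# Revisions that added a redirect, with the redirected entity and its target.
SELECT ?revision ?date ?from ?to WHERE {
  GRAPH ?additionsGraph {
    ?from owl:sameAs ?to .
  }
  ?revision hist:additions ?additionsGraph ;
            schema:dateCreated ?date .
} LIMIT 10
```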

Additions named graph

It contains all the triples that have been added by a given revision. For example, the following query returns all the triples added by revision 39984492:

PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?s ?p ?o WHERE {
  <http://www.wikidata.org/revision/39984492> hist:additions ?additionsGraph .
  GRAPH ?additionsGraph {
     ?s ?p ?o .
  }
}

Deletions named graph

It contains all the triples that have been removed by a given revision. For example, the following query returns all the triples removed by revision 39984492:

PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?s ?p ?o WHERE {
  <http://www.wikidata.org/revision/39984492> hist:deletions ?deletionsGraph .
  GRAPH ?deletionsGraph {
     ?s ?p ?o .
  }
}
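
The additions and deletions graphs of a revision can also be read together to obtain a complete diff. The following is a sketch of one possible way to do this; the ?change label and its "added"/"removed" values are names introduced here for illustration, not part of the service's vocabulary:

```sparql
PREFIX hist: <http://wikiba.se/history/ontology#>

# Full diff of revision 39984492: each triple is tagged as added or removed.
SELECT ?change ?s ?p ?o WHERE {
  VALUES ?revision { <http://www.wikidata.org/revision/39984492> }
  {
    ?revision hist:additions ?g .
    BIND("added" AS ?change)
  } UNION {
    ?revision hist:deletions ?g .
    BIND("removed" AS ?change)
  }
  GRAPH ?g {
    ?s ?p ?o .
  }
} ORDER BY ?change
```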

Global state named graph

It contains all the triples that existed in Wikidata after the revision was saved. For example, the following query returns all the triples with wd:Q42 as subject after revision 39984492:

PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?p ?o WHERE {
  <http://www.wikidata.org/revision/39984492> hist:globalState ?globalState .
  GRAPH ?globalState {
     wd:Q42 ?p ?o .
  }
}
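
Two global states can also be compared with each other. The sketch below lists the triples about wd:Q42 that are present at the start of 2018 but were absent at the start of 2014, using hist:globalStateAt to select both graphs. It relies on the standard SPARQL 1.1 FILTER NOT EXISTS construct; whether the endpoint evaluates it within the timeout is an assumption, and the two dates are arbitrary:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>

# Triples about wd:Q42 that exist in the 2018 state but not in the 2014 state.
SELECT ?p ?o WHERE {
  ?newState hist:globalStateAt "2018-01-01T00:00:00Z"^^xsd:dateTime .
  ?oldState hist:globalStateAt "2014-01-01T00:00:00Z"^^xsd:dateTime .
  GRAPH ?newState {
    wd:Q42 ?p ?o .
  }
  FILTER NOT EXISTS {
    GRAPH ?oldState {
      wd:Q42 ?p ?o .
    }
  }
}
```

Swapping the two timestamps gives the triples that were removed over the same period.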

Contrary to the SPARQL 1.1 specification, the following query binds the variable ?g only to the additions and deletions named graphs, not to the global state ones. This makes it possible to efficiently retrieve the revisions in which a triple has been inserted or deleted.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?o ?g WHERE {
  GRAPH ?g {
     wd:Q42 wdt:P227 ?o .
  }
}
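
Building on this behaviour, the graphs bound to ?g can be joined back to their revisions to see who changed a value and when. This is a sketch that reuses the hist:additions|hist:deletions property path from the example queries. Judging by the example revision in the RDF model section, the graph IRI itself shows whether the triple was added or removed, though relying on the IRI shape is an assumption:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# History of the wdt:P227 values of wd:Q42: one row per revision that added or removed a value.
SELECT ?o ?g ?revision ?author ?date WHERE {
  GRAPH ?g {
    wd:Q42 wdt:P227 ?o .
  }
  ?revision hist:additions|hist:deletions ?g ;
            schema:author ?author ;
            schema:dateCreated ?date .
} ORDER BY ?date
```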

Full list of prefixes

PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

Technical details

The source code of the system is on GitHub. The data is stored using RocksDB and the SPARQL evaluation is done by RDF4J. This design was chosen instead of Blazegraph to allow queries on the global state without having to explicitly store each named graph in the database.

Privacy Policy

The Wikidata History Query Service follows the Cloud Services Privacy policy. In addition, the service keeps a log of the valid SPARQL queries sent to it. This log is kept for 30 days. The queries are not connected to any other information (IP, request timestamp...) and they are not stored in the order in which the requests were received.