Wikidata:History Query Service

The Wikidata History Query Service is a work in progress by Tpt that provides a SPARQL endpoint for querying the Wikidata edit history. It provides a limited set of features. The query UI is available at https://wdhqs.wmflabs.org/

It currently stores metadata about each item or property revision (contributor, timestamp, entity edited, previous/next revision of the given entity) and part of the revision content (direct claim relations and redirects). It can be used to query the triples added and removed by a revision, and the full state of the Wikidata graph after any revision. The loaded data covers the period from the creation of Wikidata to July 1st, 2018. The public endpoint has a 5-minute timeout.

If you are not familiar with SPARQL or the Wikidata Query Service, first take a look at Wikidata Query Help.

Example queries

Number of Wikidata items of a given class at a given point in time

E.g. the number of humans in Wikidata on February 2nd, 2015 at midnight (UTC).

PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?rev schema:dateCreated "2015-02-02T00:00:00Z"^^xsd:dateTime ; 
       hist:globalState ?state .
  GRAPH ?state {
    ?item wdt:P31 wd:Q5
  }
}
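
The pattern above matches only if a revision was saved at exactly 2015-02-02T00:00:00Z. As a possible alternative, the sketch below picks the most recent revision saved during the minute before midnight and counts inside its global state. The one-minute window is an arbitrary assumption, and it is also an assumption that the endpoint evaluates the range filter within the timeout; the query has not been run against the service.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  {
    # Latest revision saved in the (assumed) one-minute window before midnight
    SELECT ?state WHERE {
      ?rev schema:dateCreated ?date ;
           hist:revisionId ?id ;
           hist:globalState ?state .
      FILTER(?date >= "2015-02-01T23:59:00Z"^^xsd:dateTime &&
             ?date <= "2015-02-02T00:00:00Z"^^xsd:dateTime)
    }
    # Revision ID breaks ties between revisions saved in the same second
    ORDER BY DESC(?date) DESC(?id)
    LIMIT 1
  }
  GRAPH ?state {
    ?item wdt:P31 wd:Q5
  }
}
```

If no revision falls inside the window, the subquery returns nothing and the count is 0, so the window may need widening.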

Number of contributors having changed values involving a given property

E.g. for equivalent class (P1709) values:

PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(DISTINCT ?user) AS ?count) WHERE {
  # binds ?addOrDel only to the additions and deletions graphs in which a wdt:P1709 value is added or removed
  GRAPH ?addOrDel {
    ?item wdt:P1709 ?value .
  }
  ?rev hist:additions|hist:deletions ?addOrDel ;
       schema:author ?user .
}
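
The same pattern can be grouped by user to see who made those changes. This is a sketch that reuses only the properties from the query above; the LIMIT of 20 is an arbitrary choice, and the query has not been run against the service.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?user (COUNT(DISTINCT ?rev) AS ?edits) WHERE {
  # additions or deletions graphs that contain a wdt:P1709 triple
  GRAPH ?addOrDel {
    ?item wdt:P1709 ?value .
  }
  ?rev hist:additions|hist:deletions ?addOrDel ;
       schema:author ?user .
}
GROUP BY ?user
ORDER BY DESC(?edits)
LIMIT 20
```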

Statistics on the number of main snak additions by property for a given user

E.g. for user Tpt:

PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?prop (COUNT(DISTINCT ?revision) AS ?c) WHERE {
  ?revision schema:author "Tpt" ;
            hist:additions ?additionsGraph .
  GRAPH ?additionsGraph {
     ?topic ?prop ?o .
  }
} GROUP BY ?prop ORDER BY DESC(?c)

RDF model

Revisions

Example:

<http://www.wikidata.org/revision/1339>
        schema:about wd:Q90 ;
        schema:isBasedOn <http://www.wikidata.org/revision/1335> ;
        schema:dateCreated "2012-10-30T01:39:58Z"^^xsd:dateTime ;
        schema:author "TheBestContributor" ;
        hist:revisionId 1339 ;
        hist:additions <http://www.wikidata.org/revision/additions/1339> ;
        hist:deletions <http://www.wikidata.org/revision/deletions/1339> ;
        hist:globalState <http://www.wikidata.org/revision/global/1339> ;
        hist:previousRevision <http://www.wikidata.org/revision/1338> ;
        hist:nextRevision <http://www.wikidata.org/revision/1340> .

Revisions are described using the following properties:

schema:about 
the entity that has been modified by this revision.
schema:isBasedOn 
the previous revision of the same entity (it has no value if the revision is the entity creation).
schema:dateCreated
the time when the revision was saved.
schema:author
the user name of the editor who made the revision, or the IP address of the anonymous user.
hist:revisionId
the revision ID as an xsd:integer.
hist:additions
the IRI of the named graph containing the triples added by this revision.
hist:deletions
the IRI of the named graph containing the triples removed by this revision.
hist:globalState
the IRI of the named graph containing all the Wikidata triples after this revision has been saved.
hist:previousRevision 
the IRI of the Wikidata revision just before this one.
hist:nextRevision 
the IRI of the Wikidata revision just after this one.
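
As an illustration of these properties, the following sketch lists the first revisions of wd:Q90 with their author, timestamp and parent revision. schema:isBasedOn is wrapped in OPTIONAL because the creation revision has no value for it. The LIMIT of 20 is an arbitrary choice, and the query has not been run against the service.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?rev ?id ?date ?author ?parent WHERE {
  ?rev schema:about wd:Q90 ;
       hist:revisionId ?id ;
       schema:dateCreated ?date ;
       schema:author ?author .
  # absent for the revision that created the entity
  OPTIONAL { ?rev schema:isBasedOn ?parent }
}
ORDER BY ?id
LIMIT 20
```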

Content

The Wikidata entity representation is the same as on query.wikidata.org. The triples are contained in named graphs. Each revision has three named graphs, detailed below.

Currently, for storage space reasons, only direct claim relations (wdt:PXXX) and redirection (owl:sameAs) triples are stored, but it is planned to make the full Wikidata RDF representation available in the future.
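
Since redirects are stored as owl:sameAs triples, the revisions that introduced a redirect can be looked up through the additions graphs. The sketch below assumes the usual Wikidata RDF direction, where the redirected entity is the subject and the redirect target is the object; the variable names and the LIMIT of 10 are illustrative, and the query has not been run against the service.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?rev ?source ?target WHERE {
  GRAPH ?additionsGraph {
    # assumed direction: ?source redirects to ?target
    ?source owl:sameAs ?target .
  }
  # keep only graphs that are the additions graph of some revision
  ?rev hist:additions ?additionsGraph .
}
LIMIT 10
```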

Additions named graph

It contains all the triples that have been added by a given revision. For example the following query returns all the triples added at revision 39984492:

PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?s ?p ?o WHERE {
  <http://www.wikidata.org/revision/39984492> hist:additions ?additionsGraph .
  GRAPH ?additionsGraph {
     ?s ?p ?o .
  }
}

Deletions named graph

It contains all the triples that have been removed by a given revision. For example the following query returns all the triples removed at revision 39984492:

PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?s ?p ?o WHERE {
  <http://www.wikidata.org/revision/39984492> hist:deletions ?deletionsGraph .
  GRAPH ?deletionsGraph {
     ?s ?p ?o .
  }
}
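
The additions and deletions of a revision can also be fetched together to obtain a diff-like view. This is a sketch only: the ?change labels "added" and "removed" are introduced here for illustration, and the query has not been run against the service.

```sparql
PREFIX hist: <http://wikiba.se/history/ontology#>

SELECT ?change ?s ?p ?o WHERE {
  {
    <http://www.wikidata.org/revision/39984492> hist:additions ?g .
    BIND("added" AS ?change)
  } UNION {
    <http://www.wikidata.org/revision/39984492> hist:deletions ?g .
    BIND("removed" AS ?change)
  }
  # ?g is either the additions or the deletions graph of the revision
  GRAPH ?g {
    ?s ?p ?o .
  }
}
```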

Global state named graph

It contains all the triples that existed in Wikidata after the revision has been saved. For example the following query returns all the triples of wd:Q42 after revision 39984492:

PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?p ?o WHERE {
  <http://www.wikidata.org/revision/39984492> hist:globalState ?globalState .
  GRAPH ?globalState {
     wd:Q42 ?p ?o .
  }
}

Contrary to the SPARQL 1.1 specification, the following query binds the variable ?g only to the additions and deletions named graphs, not to the global state ones. This makes it possible to efficiently retrieve the revisions in which a triple was inserted or deleted.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?o ?g WHERE {
  GRAPH ?g {
     wd:Q42 wdt:P227 ?o .
  }
}
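
Joining ?g back to its revision gives the history of a value. The sketch below reports, for each wdt:P227 value of wd:Q42, the revisions that added or removed it. The boolean ?added is introduced here for illustration (true for an addition, false for a deletion), and the query has not been run against the service.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?o ?rev ?date ?author ?added WHERE {
  # ?g is bound to additions and deletions graphs only
  GRAPH ?g {
    wd:Q42 wdt:P227 ?o .
  }
  ?rev hist:additions|hist:deletions ?g ;
       schema:dateCreated ?date ;
       schema:author ?author .
  # true when ?g is the additions graph of ?rev
  BIND(EXISTS { ?rev hist:additions ?g } AS ?added)
}
ORDER BY ?date
```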

Full list of prefixes

PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX hist: <http://wikiba.se/history/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

Technical details

The source code of the system is on GitHub. The data is stored using RocksDB and SPARQL evaluation is done by RDF4J. This design was chosen instead of Blazegraph to allow queries on the global state without having to explicitly store each named graph in the database.

Privacy Policy

The Wikidata History Query Service follows the Cloud Services Privacy policy. In addition, the service keeps a log of the valid SPARQL queries sent to it. This log is kept for 30 days. The queries are not connected to any other information (IP, request timestamp...) and their order in the log does not match the order in which the requests were received.