Wikidata:Database download


Wikidata offers copies of the available content for anyone to download.

Note that there are also several other methods for accessing structured content from Wikidata, which may not require a complete database dump.

There are several different kinds of data dumps available. Note that while JSON and RDF dumps are considered stable interfaces, XML dumps are not. Changes to the data formats used by stable interfaces are subject to the Stable Interface Policy.

JSON dumps (recommended)

JSON dumps containing all Wikidata entities in a single JSON array can be found under https://dumps.wikimedia.org/wikidatawiki/entities/. The entities in the array are not necessarily in any particular order, e.g., Q2 doesn't necessarily follow Q1. The dumps are being created on a weekly basis.

This is the recommended dump format. Please refer to the JSON structure documentation for information about how Wikidata entities are represented.

Hint: Each entity object (data item or property) is placed on a separate line in the JSON file, so the file can be read line by line, and each line can be decoded separately as an individual JSON object.
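The line-by-line reading described in the hint can be sketched as follows. This is a minimal illustration, not an official reader: the function name iter_entities and the two sample entities are made up for the example, and it assumes that the array brackets sit on their own lines and that entity lines end with a separating comma.

```python
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded entity per entity line of a JSON dump."""
    for line in lines:
        line = line.strip()
        # The dump is a single JSON array, so the opening "[" and the
        # closing "]" appear on lines of their own (assumption).
        if not line or line in ("[", "]"):
            continue
        # Entity lines are array elements, so all but the last one
        # end with a comma that must be removed before decoding.
        yield json.loads(line.rstrip(","))


# Hypothetical, heavily shortened sample in the shape of a dump file.
sample = [
    "[\n",
    '{"type": "item", "id": "Q1", "labels": {}},\n',
    '{"type": "property", "id": "P31", "labels": {}}\n',
    "]\n",
]
ids = [entity["id"] for entity in iter_entities(sample)]
print(ids)  # ['Q1', 'P31']
```

Because the function only needs an iterable of text lines, it can be fed an open file handle, so the whole dump never has to be held in memory.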

Note that the files use parallel compression, which means that some decompressors cannot reliably unpack them. On Windows you can use, e.g., Bzip2. On *nix systems, use lbzip2, which can decompress Bzip2 files in parallel. pbzip2 is not a good choice because it can only decompress in parallel those files that were compressed with pbzip2.
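For scripted processing, a compressed dump can also be streamed without unpacking it to disk first. The sketch below uses the bz2 module from the Python standard library, which accepts multi-stream input since Python 3.3. Whether it copes with every published dump file is an assumption, and it decompresses on a single thread, so it will be slower than lbzip2. The helper name count_lines and the tiny demo archive are invented for the example.

```python
import bz2
import os
import tempfile


def count_lines(path: str) -> int:
    """Count the lines of a .bz2 file while decompressing it on the fly."""
    count = 0
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for _ in handle:
            count += 1
    return count


# Build a tiny archive from two concatenated bzip2 streams to mimic
# a file that was compressed in independent chunks.
payload = bz2.compress(b'[\n{"id": "Q1"},\n') + bz2.compress(b'{"id": "Q2"}\n]\n')
with tempfile.NamedTemporaryFile(suffix=".bz2", delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

line_count = count_lines(demo_path)
os.remove(demo_path)
print(line_count)  # 4
```

The same file handle can be passed to any line-oriented reader in place of the counting loop.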

You can currently download a fairly recent dump using a torrent: wikidata-20240101-all.json.gz (130.53 GiB) on academictorrents.com (magnet).

  • JsonDumpReader is a PHP library for reading the dumps.
  • gitlab.com/tozd/go/mediawiki is a Go library for processing Wikipedia and Wikidata dumps.
  • WDSub is a Scala library that processes JSON Wikibase dumps and can generate subsets using entity schemas as inputs.
  • simple-wikidata-db is a JSON dump parser written in Python.
  • qwikidata supports JSON dumps and is written in Python.

RDF dumps

First, canonical RDF dumps using the Turtle and NTriples formats can be found under https://dumps.wikimedia.org/wikidatawiki/entities/. The mapping is described here. These full statement dumps are marked as all.

Secondly, so-called truthy dumps are provided. They use the NTriples format. They are in the same format as the full dumps, but only contain direct ("truthy", wdt: and wdtn:) values of best-rank statements. This also means they do not contain metadata such as qualifiers and references.
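As a rough illustration of what a truthy dump looks like to a consumer, the following sketch pulls the direct (wdt:) claims out of NTriples lines. It is deliberately naive: it splits on spaces and is not a complete NTriples parser. It assumes that wdt: expands to http://www.wikidata.org/prop/direct/, and the function name direct_claims is made up for the example.

```python
from typing import Iterable, Iterator, Tuple

# Assumed expansion of the wdt: prefix.
WDT = "http://www.wikidata.org/prop/direct/"


def direct_claims(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject IRI, property ID, raw object term) for wdt: triples."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate never contain spaces; the object may.
        subject, predicate, rest = line.split(" ", 2)
        if not predicate.startswith("<" + WDT):
            continue
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        # Strip "<" plus the prefix at the front and ">" at the end.
        yield subject.strip("<>"), predicate[len(WDT) + 1:-1], obj


sample = [
    "<http://www.wikidata.org/entity/Q42> "
    "<http://www.wikidata.org/prop/direct/P31> "
    "<http://www.wikidata.org/entity/Q5> .",
    "<http://www.wikidata.org/entity/Q42> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
]
claims = list(direct_claims(sample))
print(claims)
```

Only the first sample line survives the filter, because the second one uses a label predicate rather than a wdt: property.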

The -all dump files contain all entity information in Wikidata with the exception of order (of aliases, of statements, etc.), which is not naturally represented in RDF. The -truthy dump files encode the *best* statements (i.e. the ones with the highest non-deprecated rank of each given (subject, property) pair) as single RDF triples (qualifiers and references are omitted).
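The selection of best statements can be sketched in a few lines. This is an illustration of the rule as stated above, not the code that produces the dumps: for each (subject, property) pair, deprecated statements are dropped, and of the remainder only those with the highest rank present are kept. The tuple layout, the function name best_statements, and the sample identifiers are hypothetical. The rank names preferred, normal, and deprecated follow the Wikidata JSON format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Statement = Tuple[str, str, str, str]  # (subject, property, value, rank)


def best_statements(statements: Iterable[Statement]) -> Dict[Tuple[str, str], List[str]]:
    """Return the best-rank values for every (subject, property) pair."""
    grouped = defaultdict(list)
    for subject, prop, value, rank in statements:
        if rank != "deprecated":  # deprecated statements are never best
            grouped[(subject, prop)].append((value, rank))

    best = {}
    for key, entries in grouped.items():
        # Preferred beats normal; fall back to normal if nothing is preferred.
        has_preferred = any(rank == "preferred" for _, rank in entries)
        top = "preferred" if has_preferred else "normal"
        best[key] = [value for value, rank in entries if rank == top]
    return best


# Hypothetical statements; the IDs and values carry no real meaning.
sample = [
    ("Q1", "P1", "a", "normal"),
    ("Q1", "P1", "b", "preferred"),
    ("Q1", "P2", "c", "normal"),
    ("Q1", "P2", "d", "deprecated"),
    ("Q1", "P3", "e", "deprecated"),
]
result = best_statements(sample)
print(result)  # {('Q1', 'P1'): ['b'], ('Q1', 'P2'): ['c']}
```

A pair whose statements are all deprecated, such as (Q1, P3) here, contributes nothing to the result.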

Dumps of the Wikidata Lexeme namespace in Turtle and NTriples formats can be found in the same place, with the lexemes suffix.

For details on the RDF dump format please see the page RDF Dump Format. Also note the section "WDQS data differences" which explains the differences in the RDF formats of these dumps and the WDQS.

Partial RDF dumps

WDumper is a third-party tool to create custom Wikidata RDF dumps. Entities and statements may be filtered.

XML dumps

Complete XML dumps of Wikidata can be found at http://dumps.wikimedia.org/wikidatawiki.

Warning: The format of the JSON data embedded in the XML dumps is subject to change without notice, and may be inconsistent between revisions. It should be treated as opaque binary data. It is strongly recommended to use the JSON or RDF dumps instead, which use canonical representations of the data!

Incremental dumps of Wikidata are also available for download. These dumps contain the material that was added within the last 24 hours, so there is no need to download the full database dump. They are considerably smaller than the full database dumps.

These dumps are available here.

Old JSON and RDF dumps

Data model

The data model can be looked up here. The data model describes the fundamental building blocks of Wikidata's data.

Database schema

An overview of the database schema can be found on this page. (This is not the schema of the data in Wikidata.)

License

Wikidata will (soon) offer copies of the available content for download. These databases may be used for private or commercial purposes, for backups, or for offline use. All structured data from the main and property namespaces is available under the Creative Commons CC0 license. Text in the other namespaces is available under the Creative Commons Attribution/Share-Alike license; additional terms may apply. Media files and other content are available under other licenses, as stated on their description pages.

See also