User:Mahir256/Triples
There are about 13.6 billion triples on Wikidata. But what even *is* a triple, you might ask?
The best way to see how stuff on Wikidata is mapped to triples might be to show how many are added or removed when certain things are done:
Additions of triples
How many triples are added...
Scenario | Example triples | Count added |
---|---|---|
...when you add a label? | wd:Q42 rdfs:label "Douglas Adams"@en. | 1 |
...when you add a description? | wd:Q42 schema:description "ব্রিটিশ লেখক"@bn. | 1 |
...when you add an alias? | wd:Q42 skos:altLabel "डग्लस अडम्स"@hi. | 1 |
...when you add a statement? | wd:Q42 p:P31 wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 . wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 wikibase:rank wikibase:NormalRank . | 2 (minimum) |
...and that statement has an item[1] as a value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P31 wd:Q5 | (2 +) 1 |
...and that statement has a string[2] as a value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P1559 "Douglas Noël Adams"@en . | (2 +) 1 |
...and that statement has an external ID as a value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P214 "113230702" | (2 +) 1 |
...and that statement has an external ID as a value and that external ID has a formatter URL? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psn:P214 "https://viaf.org/viaf/113230702/" | ((2 +) 1 +) 1 |
...and that statement has a coordinate as a value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P119 "Point(0.0 50.0)"^^geo:wktLiteral . wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psv:P119 v:a10564107110b2d5739b8fe235cddf73 . v:a10564107110b2d5739b8fe235cddf73 a wikibase:GlobecoordinateValue . v:a10564107110b2d5739b8fe235cddf73 wikibase:geoLatitude "50.0"^^xsd:double . v:a10564107110b2d5739b8fe235cddf73 wikibase:geoLongitude "0.0"^^xsd:double . v:a10564107110b2d5739b8fe235cddf73 wikibase:geoPrecision "0.000277778"^^xsd:double . v:a10564107110b2d5739b8fe235cddf73 wikibase:geoGlobe wd:Q2 . | (2 +) 7 |
...and that statement has a quantity as a value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P2048 "+1.96"^^xsd:decimal . wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psv:P2048 v:a10564107110b2d5739b8fe235cddf73 . v:a10564107110b2d5739b8fe235cddf73 a wikibase:QuantityValue . v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityAmount "+1.96"^^xsd:decimal . v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityUnit wd:Q11573 . | (2 +) 5 |
...and that statement has a quantity as a value and that quantity isn't exact? | v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityUpperBound "+1.97"^^xsd:decimal . v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityLowerBound "+1.95"^^xsd:decimal . | ((2 +) 5 +) 2 |
...and that statement has a quantity as a value and that quantity isn't exact and the units of that quantity can be expressed in some normalized form? (e.g. furlongs → meters, stone → kilograms) | v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityNormalized v:85374998f22bda54efb44a5617d76e51 . v:85374998f22bda54efb44a5617d76e51 a wikibase:QuantityValue . v:85374998f22bda54efb44a5617d76e51 wikibase:quantityAmount "+1.96"^^xsd:decimal . v:85374998f22bda54efb44a5617d76e51 wikibase:quantityUnit wd:Q11573 . v:85374998f22bda54efb44a5617d76e51 wikibase:quantityUpperBound "+1.97"^^xsd:decimal . v:85374998f22bda54efb44a5617d76e51 wikibase:quantityLowerBound "+1.95"^^xsd:decimal . | (((2 +) 5 +) 2 +) 4 + 2 |
...and that statement has a time as a value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P569 "+1952-03-11T00:00:00Z"^^xsd:dateTime . wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psv:P569 v:a10564107110b2d5739b8fe235cddf73 . v:a10564107110b2d5739b8fe235cddf73 a wikibase:TimeValue . v:a10564107110b2d5739b8fe235cddf73 wikibase:timeValue "+1952-03-11T00:00:00Z"^^xsd:dateTime . v:a10564107110b2d5739b8fe235cddf73 wikibase:timePrecision "11"^^xsd:integer . v:a10564107110b2d5739b8fe235cddf73 wikibase:timeTimezone "0"^^xsd:integer . v:a10564107110b2d5739b8fe235cddf73 wikibase:timeCalendarModel wd:Q1985727 . | (2 +) 7 |
...and that statement has somevalue (unknown value) as its value? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P2021 _:genid1 | (2 +) 1 |
...and that statement has novalue as its value? | wd:Q42 a wdno:P6553 | (2 +) 1 |
...and that statement is "truthy" (is preferred, or is normal in the absence of preferred statements for that property)? | wd:Q42 wdt:P31 wd:Q5 . wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 a wikibase:BestRank . | (2 +) 2 |
...and the "truthy" statement is an external ID with a formatter URL? | wd:Q42 wdtn:P214 "https://viaf.org/viaf/113230702/" | ((((2 +) 1 +) 1 +) 2 +) 1 |
...when you add a qualifier? | wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 pq:P407 wd:Q1860 | 1 |
...and that qualifier has X as its value? | (see "...and that statement has X as its value?", substituting (2 +) with (1 +)) | |
...when you add a reference? | wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom wdref:87d0dc1c7847f19ac0f19be978015dfb202cf59a | 1 |
...and that reference has a property with X as its value? | (see "...and that statement has X as its value?", substituting (2 +) with (1 +)) | |
...when you add a sitelink? | <https://en.wikipedia.org/wiki/Douglas_Adams> a schema:Article . <https://en.wikipedia.org/wiki/Douglas_Adams> schema:about wd:Q42 . <https://en.wikipedia.org/wiki/Douglas_Adams> schema:inLanguage "en" . <https://en.wikipedia.org/wiki/Douglas_Adams> schema:isPartOf <https://en.wikipedia.org/> . <https://en.wikipedia.org/wiki/Douglas_Adams> schema:name "Douglas Adams"@en . | 5 |
...and that sitelink has a badge? | <https://en.wikipedia.org/wiki/Douglas_Adams> wikibase:badge wd:Q17437796 | (5 +) 1 |
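
The counts in the table can be added up mechanically. What follows is a minimal Python sketch, not anything taken from Wikibase: the function name, the value-type labels, and the keyword flags are made up for illustration, and the numbers are simply copied from the "Count added" column. It ignores the sharing of value triples described under "Net-zero triple changes" below, and the count for a normalized quantity without bounds is an inference from the "4 + 2" split in the table.

```python
# Per-value triple counts copied from the "Count added" column above
# (the part after "(2 +)"). The keys are hypothetical labels.
VALUE_TRIPLES = {
    "item": 1,
    "string": 1,
    "external-id": 1,
    "coordinate": 7,
    "quantity": 5,
    "time": 7,
    "somevalue": 1,
    "novalue": 1,
}

STATEMENT_BASE = 2  # the p: link plus the wikibase:rank triple


def triples_for_statement(value_type, *, formatter_url=False, inexact=False,
                          normalized=False, truthy=False):
    """Count the triples one new statement adds, ignoring hash sharing."""
    count = STATEMENT_BASE + VALUE_TRIPLES[value_type]
    if value_type == "external-id" and formatter_url:
        count += 1  # psn: triple
    if value_type == "quantity":
        if inexact:
            count += 2  # upper and lower bound
        if normalized:
            count += 4  # link, type, amount, unit of the normalized node
            if inexact:
                count += 2  # bounds repeated on the normalized node
    if truthy:
        count += 2  # wdt: triple plus wikibase:BestRank
        if value_type == "external-id" and formatter_url:
            count += 1  # wdtn: triple
    return count


if __name__ == "__main__":
    print(triples_for_statement("time"))
    print(triples_for_statement("external-id", formatter_url=True, truthy=True))
```

Qualifiers and reference claims would follow the same pattern with a base of 1 instead of 2, as the table notes.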
Net-zero triple changes
In some of the rows of the table above, there were some italicized triples. These are special because the value of the statement (more specifically, the set of predicate-object pairs among the italicized triples) is hashed to generate a unique string (a10564107110b2d5739b8fe235cddf73 in most of the examples above), which is then tied to all of those triples by setting it as their subject. Since this string is a hash, any other values of the same type (on statements, qualifiers, claims in references, normalized quantities) whose corresponding triple sets yield the same hash are not stored again; instead, the hash is merely linked to the statement, qualifier, or claim in a reference (the triples with "psv:" in them). As long as some statement, qualifier, claim in a reference, or normalized quantity has that exact value, there will be exactly one set of triples for it in the store.
More concretely, consider the date
- "20 January 2009" stored with day precision using the proleptic Gregorian calendar and time zone "0"
Every use of that exact date configuration, whether as the point in time (P585) of the inauguration of Barack Obama, the start time (P580) qualifier of Barack Obama's position held (P39) President of the United States (Q11696) claim, or the retrieved (P813) date of some reference, will point to the same hash.
That date's hash will be different from that of
- "19 January 2009" with day precision using the proleptic Gregorian calendar and time zone "0",
- "20 January 2009" with year precision using the proleptic Gregorian calendar and time zone "0",
- "20 January 2009" with day precision using the proleptic Julian calendar and time zone "0", and
- "20 January 2009" with day precision using the proleptic Gregorian calendar and time zone "1"
(bearing in mind that it's currently impossible to change the time zone).
Removing the P585 date mentioned above without changing anything else would remove 4 triples (the two listed for "...when you add a statement?" plus the "ps:" and "psv:" triples listed for "...and that statement has a time as a value?", whose subject is the statement rather than the hash), and conversely adding the same date as P585 to some other item without changing anything else would add 4 triples. On the other hand, if ("19 January 2009" with day precision using the proleptic Gregorian calendar and time zone "0") was not already stored in Wikidata somewhere, then adding that as a P585 statement would add 2 + 7 = 9 triples.
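
The sharing of value triples can be imitated with a set keyed on a content hash. The Python sketch below is only an illustration under stated assumptions: MD5 over the sorted predicate-object pairs stands in for whatever hash Wikibase really computes, and `value_node_id`, `add_value`, and `time_value` are hypothetical helper names. It tracks only the triples whose subject is the hash node, not the statement-level triples.

```python
import hashlib


def value_node_id(pairs):
    """Name a value node after a hash of its (predicate, object) pairs.

    Assumption: MD5 over the sorted pairs is a stand-in chosen for this
    sketch, not the hash Wikibase actually uses.
    """
    canonical = "\n".join(f"{p} {o}" for p, o in sorted(pairs))
    return "v:" + hashlib.md5(canonical.encode("utf-8")).hexdigest()


def add_value(store, pairs):
    """Add the value-node triples to `store` (a set); return the node name."""
    node = value_node_id(pairs)
    for predicate, obj in pairs:
        store.add((node, predicate, obj))  # re-adding an existing triple is a no-op
    return node


def time_value(timestamp, precision, calendar="wd:Q1985727", timezone=0):
    """Predicate-object pairs describing one time value."""
    return [
        ("a", "wikibase:TimeValue"),
        ("wikibase:timeValue", f'"{timestamp}"^^xsd:dateTime'),
        ("wikibase:timePrecision", f'"{precision}"^^xsd:integer'),
        ("wikibase:timeTimezone", f'"{timezone}"^^xsd:integer'),
        ("wikibase:timeCalendarModel", calendar),
    ]


if __name__ == "__main__":
    store = set()
    first = add_value(store, time_value("+2009-01-20T00:00:00Z", 11))
    size_after_first = len(store)
    second = add_value(store, time_value("+2009-01-20T00:00:00Z", 11))
    # Same configuration: same node, no new value triples.
    print(first == second, len(store) == size_after_first)
    # Year precision (9) instead of day precision (11): a different node.
    third = add_value(store, time_value("+2009-01-20T00:00:00Z", 9))
    print(third != first, len(store))
```

Using a set means the second use of an identical value costs nothing at the value-node level, which is the net-zero effect described above; any change to the date, precision, calendar, or time zone changes the hash and adds a fresh set of triples.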
A similar analysis to the above holds for references, where the triples for all property-value pairs in the reference are hashed to yield the object of the "prov:wasDerivedFrom" triples.
Removals of triples
Notwithstanding the conditions in "Net-zero triple changes" above, removing one of the objects mentioned in "Additions of triples" above will remove that many triples from the store.
There is one special case, however, which has not been dealt with above: when merging an item X into an item Y, a triple of the form "wd:X owl:sameAs wd:Y" is created, in addition to any triple removals due to equivalences or supersessions (which will differ depending on what tool you use to merge things and the settings for that tool).
Implications for future removals
As of ~21:00 UTC, 21 January 2022,
- there were 681,204,320 labels (~5.01%), 2,617,161,217 descriptions (~19.24%), and 168,219,562 aliases (~1.23%) in all of Wikidata. Introducing the "mul" language code would cut into the first number, while introducing automated descriptions for certain item classes into the Wikibase software would cut significantly into the second number.
- there were 52,819,797 unique quantity hashes (meaning at minimum 369,738,579 triples, ~2.71%, since this does not count the number of quantity values themselves) on Wikidata. Resolving phab:T181319 would help cut into this number.
- ditto for the 9,014,219 unique coordinate hashes (meaning at minimum 63,099,533 triples, ~0.46%).
- there were 32,843,784 unique reference hashes(!) that have stated in (P248) Europe PubMed Central (Q5412157). Most of these hashes have a P248 claim (1), a PMC publication ID (P932) claim (2), a reference URL (P854) link (1), and a retrieved (P813) value (1), resulting in at most 164,218,920 triples (~1.21%) among them. (Similar figures may be obtained for the 10,187,883 references with stated in (P248) PubMed Central (Q229883), 7,084,720 with Crossref (Q5188229), or 5,195,398 with PubMed (Q180686).) (As with the quantity and coordinate hashes, this does not count how many statements have any of these as a reference.) Finding a way to cut the size of these references in the triple store might be useful.
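
The triple totals in the list above are each a hash count multiplied by a fixed number of triples per hash. The short Python sketch below redoes that arithmetic. The per-hash multipliers of 7 for quantities and coordinates are inferred by dividing the stated totals by the hash counts, the multiplier of 5 for references is the sum 1 + 2 + 1 + 1 given in the text, and the 13.6 billion total is the rounded figure from the top of the page, so the shares it prints are approximate.

```python
TOTAL_TRIPLES = 13_600_000_000  # "about 13.6 billion", a rounded figure

# name: (unique hashes, triples per hash, total stated in the text)
FIGURES = {
    "quantity hashes": (52_819_797, 7, 369_738_579),
    "coordinate hashes": (9_014_219, 7, 63_099_533),
    "Europe PMC reference hashes": (32_843_784, 5, 164_218_920),
}


def implied_total(hashes, triples_per_hash):
    """Triples accounted for by a group of unique hashes."""
    return hashes * triples_per_hash


def share_of_store(triples, total=TOTAL_TRIPLES):
    """Approximate percentage of the whole store."""
    return 100 * triples / total


if __name__ == "__main__":
    for name, (hashes, per_hash, stated) in FIGURES.items():
        total = implied_total(hashes, per_hash)
        matches = "matches" if total == stated else "differs from"
        print(f"{name}: {total:,} triples ({matches} the stated figure), "
              f"about {share_of_store(total):.2f}% of the store")
```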
References
(Thanks to mw:Wikibase/Indexing/RDF_Dump_Format for being around when we need it!)