User:Tagishsimon/WDQS

Qualifiers and references

Getting qualifier or reference data for an item's properties is a bit confusing until it clicks. The key to the whole thing (for me, at least) is the data model diagram at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model. You need to stare at it until it makes sense, no matter how long it takes :)

Getting a property value is easy. You ask the query service for the wdt: value. You can see that on the diagram: start in the item circle, follow the wdt: arrow, and arrive at the simple value. You can also see from the same diagram that there's nowhere you can go from the simple value; you are at journey's end.

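For comparison, here is a minimal sketch of the wdt: route on its own, using the same Sweet Adelines competition items as the reports further down. It returns each truthy winner as a simple value, but there is no way to ask that value for its score, because the simple value is a dead end:

```sparql
SELECT ?item ?itemLabel ?winnerLabel
WHERE
{
  ?item wdt:P31 wd:Q57776091.  # instance of a Sweet Adelines International chorus competition
  ?item wdt:P1346 ?winner.     # truthy winner (P1346): a simple value, journey's end
  # ?winner pq:P1351 ?score.   # would match nothing: qualifiers hang off the statement, not the simple value
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```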
If you want a Property Qualifier, you need to ask for a pq: value. To do that, you need to follow the path from the item circle to the statement circle using the p: path. From the statement circle you can then take a variety of paths, such as the ps: path (which, for a truthy statement, gives you the same value as the wdt: path did) or the pq: path, which gives you the simple value for the qualifier.

The statement itself is just a long, unique URI cobbled together by Wikidata, whose only purpose is to give things like qualifiers and references somewhere to hang from.

So in report terms, it looks kinda like this (here I'm reporting on the set of YYYY Sweet Adelines International chorus competition items):

SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score
WHERE 
{
  ?item wdt:P31 wd:Q57776091.  # the item is an instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.        # it has a point in time (P585) value
  ?item p:P1346 ?statement.    # it has a p: value for P1346, which points to a statement - the unique URI
  ?statement ps:P1346 ?winner. # the statement has a ps: value i.e. the winner's identity
  ?statement pq:P1351 ?score.  # the statement has a pq:P1351 value, which is the score
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)

And there's no reason why you shouldn't show the statement itself in the results; seeing it might help:

SELECT ?item ?itemLabel (year(?when) as ?year) ?statement ?winnerLabel ?score
WHERE 
{
  ?item wdt:P31 wd:Q57776091.  # the item is an instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.        # it has a point in time (P585) value
  ?item p:P1346 ?statement.    # it has a p: value for P1346, which points to a statement - the unique URI
  ?statement ps:P1346 ?winner. # the statement has a ps: value i.e. the winner's identity
  ?statement pq:P1351 ?score.  # the statement has a pq:P1351 value, which is the score
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)

and you can also write the query more compactly with some clever square brackets, which do away with the need to name the statement explicitly (the [ ] stands in for ?statement):

SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score
WHERE 
{
  ?item wdt:P31 wd:Q57776091.  # the item is an instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.        # it has a point in time (P585) value
  ?item p:P1346 [ps:P1346 ?winner ; pq:P1351 ?score] .  # the [ ] is the statement node, left unnamed
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)

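One thing worth knowing about the queries above: every pattern in the WHERE clause must match, so a competition whose winner statement has no pq:P1351 score qualifier drops out of the results entirely. A sketch, assuming you would rather keep those rows with an empty score, wraps the qualifier in OPTIONAL:

```sparql
SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score
WHERE
{
  ?item wdt:P31 wd:Q57776091.   # instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.         # point in time (P585)
  ?item p:P1346 ?statement.     # the winner (P1346) statement node
  ?statement ps:P1346 ?winner.  # the winner's identity
  OPTIONAL { ?statement pq:P1351 ?score. }  # score qualifier if present; otherwise ?score stays unbound
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)
```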
Hope all that helps; ping me here or on Twitter for more. (References kinda work the same way, but there's some more complexity there. We could look at that when you've grokked this bit.) --Tagishsimon (talk) 01:29, 29 October 2018 (UTC)

Ah, go on then. Let's do references too. Here we need to hop from the ?statement to ?another_statement via a prov:wasDerivedFrom path. The ?another_statement isn't really a statement: it's another of the machine-generated long URIs, this time a reference node, cobbled together to provide somewhere from which to hang references. Once we've got the ?another_statement, we can ask for pr: (Property Reference) values, and in your example you have a reference URL (P854) and a retrieved date (P813).
Again, follow the logic on the diagram: we go from the item to a statement using p:, then from the statement to the reference node circle (which in the code below I've called ?another_statement) using prov:wasDerivedFrom, and from there to the simple values of the references using pr:, specifically pr:P854 to get the reference URL and pr:P813 to get the retrieved date.
SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score ?ref_url ?retrieved
WHERE 
{
  ?item wdt:P31 wd:Q57776091.  # the item is an instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.        # it has a point in time (P585) value
  ?item p:P1346 ?statement.    # it has a p: value for P1346, which points to a statement - the unique URI
  ?statement ps:P1346 ?winner. # the statement has a ps: value i.e. the winner's identity
  ?statement pq:P1351 ?score.  # the statement has a pq:P1351 value, which is the score
  ?statement prov:wasDerivedFrom ?another_statement.  # the statement has a reference node
  ?another_statement pr:P854 ?ref_url.                # the reference URL (P854)
  ?another_statement pr:P813 ?retrieved.              # the retrieved date (P813)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)
And again we can see the ?another_statement, for what it's worth:
SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score ?another_statement ?ref_url ?retrieved
WHERE 
{
  ?item wdt:P31 wd:Q57776091.  # the item is an instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.        # it has a point in time (P585) value
  ?item p:P1346 ?statement.    # it has a p: value for P1346, which points to a statement - the unique URI
  ?statement ps:P1346 ?winner. # the statement has a ps: value i.e. the winner's identity
  ?statement pq:P1351 ?score.  # the statement has a pq:P1351 value, which is the score
  ?statement prov:wasDerivedFrom ?another_statement.  # the statement has a reference node
  ?another_statement pr:P854 ?ref_url.                # the reference URL (P854)
  ?another_statement pr:P813 ?retrieved.              # the retrieved date (P813)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)
and we could scrunch things up bigly with square brackets if we felt like being flash:
SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score ?ref_url ?retrieved
WHERE 
{
  ?item wdt:P31 wd:Q57776091.  # the item is an instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.        # it has a point in time (P585) value
  ?item p:P1346 [ps:P1346 ?winner ; pq:P1351 ?score; prov:wasDerivedFrom [pr:P854 ?ref_url; pr:P813 ?retrieved] ] .  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)

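The reference queries above only return a winner when every pattern matches, so a statement with no reference, or a reference without a retrieved date, silently drops out; a statement with several references also produces one row per reference node. A sketch using nested OPTIONAL blocks keeps the winners and fills in whatever reference parts exist (the name ?reference is just my choice for the reference node):

```sparql
SELECT ?item ?itemLabel (year(?when) as ?year) ?winnerLabel ?score ?ref_url ?retrieved
WHERE
{
  ?item wdt:P31 wd:Q57776091.   # instance of a Sweet Adelines International chorus competition
  ?item wdt:P585 ?when.         # point in time (P585)
  ?item p:P1346 ?statement.     # the winner (P1346) statement node
  ?statement ps:P1346 ?winner.  # the winner's identity
  OPTIONAL { ?statement pq:P1351 ?score. }        # score qualifier, if any
  OPTIONAL {
    ?statement prov:wasDerivedFrom ?reference.    # a reference node; one row per reference
    OPTIONAL { ?reference pr:P854 ?ref_url. }     # reference URL (P854), if any
    OPTIONAL { ?reference pr:P813 ?retrieved. }   # retrieved date (P813), if any
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by desc(?year)
```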
Cuba - truthy & non-truthy statements

Consider the case of Cuba (Q241): have a look at its instance of (P31) statements. It's a country (Q6256). It is also a sovereign state (Q3624078).

In the following examples, I use VALUES ?item {wd:Q241} to constrain the report to look only at Cuba. There can be multiple values in a VALUES statement; it's useful for examining subsets of records where you know the QIDs. If you remove the VALUES statement you'll get, variously, all countries or all sovereign states.

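A sketch of a VALUES statement holding more than one QID; France (Q142) is added purely as an illustration, and the query lists every truthy P31 value of each item:

```sparql
SELECT ?item ?itemLabel ?class ?classLabel
WHERE
{
  VALUES ?item {wd:Q241 wd:Q142}   # Cuba and France; add or remove QIDs as needed
  ?item wdt:P31 ?class .           # every truthy instance of (P31) value
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```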
So, if we do a report looking for items that are countries (constrained to Cuba), we expect to see Cuba being returned:

SELECT ?item ?itemLabel 
WHERE 
{
  VALUES ?item {wd:Q241}
  ?item wdt:P31 wd:Q6256 .   # country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

But we get nothing.

If we do the same report and look for Sovereign States, it works:

SELECT ?item ?itemLabel 
WHERE 
{
  VALUES ?item {wd:Q241}
  ?item wdt:P31 wd:Q3624078.  # sovereign state
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

And the reason has to do with statement ranks, and the way they interact with the wdt: predicate.

In short, a rank can be assigned to any statement. In Cuba's P31 statements, just to the left of each entry, you'll see a single 'character' made up of an up-pointing arrow on top of a circle, on top of a down-pointing arrow. The country entry has the circle shaded. The sovereign state entry has the up arrow shaded.

The three parts represent the ranks: the up arrow is Preferred Rank, the circle is Normal Rank and the down arrow is Deprecated Rank. For Cuba, sovereign state has Preferred Rank; country has Normal Rank.

Now we come to truthy and non-truthy. A truthy statement is the statement which is probably the current right answer. A non-truthy statement may or may not be true, but for most purposes is probably less useful than the truthy statement. An example might be a railway station that has closed. It would probably be worth marking it as P31=railway station with Normal Rank (because it was once a station), but also P31=former railway station with Preferred Rank. If we ask WDQS what the truthy value of the item is, it will tell us "It's a former railway station". And for most purposes this is the truth.

When we do a report asking for a wdt: value, we are asking for the truthy statement(s). And that's what we did in the above two reports.

Wikidata / WDQS has rules which define which statement(s) are truthy when a property has multiple statements. Preferred Rank statements are truthy, and the presence of a Preferred Rank statement makes all the other statements non-truthy. If there is no Preferred Rank statement, all the Normal Rank statements are truthy. Deprecated Rank statements are never truthy. This is all defined here: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Statement_types

Back to Cuba. Sovereign state is truthy because it has Preferred Rank. Country is non-truthy, because it has Normal Rank and there is a Preferred Rank statement.

So the pattern ?item wdt:P31 wd:Q6256 (country) finds nothing for the Cuba example.

WDQS has many prefixes, such as wdt:, which are briefly listed here: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Prefixes_used

Prefixes allow us to drill into more detail in our data model.

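WDQS declares these prefixes for you. If you take a query somewhere else, for example another SPARQL endpoint holding a copy of the data, you may need to declare them yourself. A sketch with the common prefixes spelled out; the wikibase:label SERVICE is specific to WDQS, so it is left out, and the unused declarations are harmless:

```sparql
PREFIX wd:       <http://www.wikidata.org/entity/>
PREFIX wdt:      <http://www.wikidata.org/prop/direct/>
PREFIX p:        <http://www.wikidata.org/prop/>
PREFIX ps:       <http://www.wikidata.org/prop/statement/>
PREFIX pq:       <http://www.wikidata.org/prop/qualifier/>
PREFIX pr:       <http://www.wikidata.org/prop/reference/>
PREFIX prov:     <http://www.w3.org/ns/prov#>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?item ?class
WHERE
{
  VALUES ?item {wd:Q241}   # Cuba
  ?item wdt:P31 ?class .   # truthy instance of (P31) values only
}
```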
For Cuba, to establish that it is a country, knowing that the wdt: prefix will not help, we turn to p: and ps:.

Take a deep breath...

When we create a statement such as Cuba (Q241) instance of (P31) country (Q6256), WDQS creates a very particular sort of triple in the form of wd:Q241 p:P31 wds:Q241-????????????. We can see this for Cuba in this report:

SELECT ?item ?itemLabel ?statement
WHERE 
{
  VALUES ?item {wd:Q241}
  ?item p:P31 ?statement .   # every P31 statement, whatever its value or rank
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

And, for the avoidance of doubt, Wikidata / WDQS is all about triples: everything in WDQS is a triple. And a triple is nothing more than a subject, a predicate and an object. Although we are used to ?item (e.g. Cuba) wdt:P31 (instance of) wd:Q6256 (country) as a typical sort of triple, we now need to get used to slightly weirder triples such as those we see in the above report.

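To see that a statement node really is just the subject of yet more triples, here is a sketch that dumps every triple hanging off Cuba's P31 statements. Among the predicates you should spot ps:P31 (the value) along with the rank and any qualifiers or references:

```sparql
SELECT ?statement ?predicate ?object
WHERE
{
  VALUES ?item {wd:Q241}
  ?item p:P31 ?statement .          # item --p:--> statement node
  ?statement ?predicate ?object .   # every triple with the statement node as its subject
}
ORDER BY ?statement ?predicate
```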
The purpose of these weird triples is to allow us to hang truthy and non-truthy statements, or qualifiers, or references, onto an item. This is how it works for our Cuba / Country problem.

We can write a report which asks: does Cuba have a P31 statement, and is the value of that P31 statement 'country'? Like so:

SELECT ?item ?itemLabel 
WHERE 
{
  VALUES ?item {wd:Q241}
  ?item p:P31 ?statement .   # has a P31 statement
  ?statement ps:P31 wd:Q6256 .  # whose value is country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

So, to pause for a moment: a combination of p: and ps: can be used to return non-truthy values.

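Each statement node also carries its rank, via wikibase:rank. A sketch listing every P31 value for Cuba alongside its rank, which comes back as one of wikibase:PreferredRank, wikibase:NormalRank or wikibase:DeprecatedRank:

```sparql
SELECT ?item ?itemLabel ?value ?valueLabel ?rank
WHERE
{
  VALUES ?item {wd:Q241}
  ?item p:P31 ?statement .          # every P31 statement, whatever its rank
  ?statement ps:P31 ?value .        # the value of that statement
  ?statement wikibase:rank ?rank .  # its rank
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```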
p: is also the route to getting at qualifiers. Let's consider the head of state (P35) value for Cuba at https://www.wikidata.org/wiki/Q241#P35: we can see a start time (P580) qualifier there. How do we get at it? Like this:

SELECT ?item ?itemLabel ?value ?valueLabel ?start_time
WHERE 
{
  VALUES ?item {wd:Q241}
  ?item p:P35 ?statement .   # has a P35 statement
  ?statement ps:P35 ?value .        # the head of state
  ?statement pq:P580 ?start_time .  # the start time (P580) qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

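Head of state statements often carry an end time (P582) qualifier too. A sketch that makes both qualifiers OPTIONAL, so statements lacking either one still appear; the commented-out FILTER assumes that a missing end time means the person is still in office, which is a convention rather than a guarantee:

```sparql
SELECT ?item ?itemLabel ?value ?valueLabel ?start_time ?end_time
WHERE
{
  VALUES ?item {wd:Q241}
  ?item p:P35 ?statement .                       # has a head of state (P35) statement
  ?statement ps:P35 ?value .                     # the head of state
  OPTIONAL { ?statement pq:P580 ?start_time . }  # start time (P580) qualifier, if any
  OPTIONAL { ?statement pq:P582 ?end_time . }    # end time (P582) qualifier, if any
  # Uncomment to keep only statements with no end time, i.e. presumably the current holder:
  # FILTER NOT EXISTS { ?statement pq:P582 ?any_end . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```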
Let me know when you want to take this further and drill down to References.

Oh. And you might now look at the diagram at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model and consider the country versus sovereign state issue, as well as a qualifier. Understanding this diagram is the most important thing you need to do. :)

In the diagram, let's start at the Item bubble. That in our example is Cuba. We can see that we can follow a wdt: arrow which takes us to a 'simple value' bubble. That is our Truthy value - for Cuba and P31, Sovereign State. Notice that you cannot go anywhere from the simple value bubble ... you cannot get at qualifiers nor references.

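The wdt: arrow and the p:/ps: route are linked: wdt: only reaches the values of best-rank (truthy) statements, and in the RDF those statements are also typed wikibase:BestRank. A sketch that rebuilds Cuba's truthy P31 values the long way round; it should return the same values as ?item wdt:P31 ?value:

```sparql
SELECT ?item ?itemLabel ?value ?valueLabel
WHERE
{
  VALUES ?item {wd:Q241}
  ?item p:P31 ?statement .          # every P31 statement
  ?statement a wikibase:BestRank .  # keep only the best-rank, i.e. truthy, statements
  ?statement ps:P31 ?value .        # and read off their values
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```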
Now let's think about the Cuba P31 country business. Start at the Item bubble. You can move right using the p: arrow to get to a Statement bubble. From the Statement bubble, we can use ps: to get to the simple value (country), but we could also use the pq: arrow to get to a qualifier, or take other routes, such as psv: or prov:wasDerivedFrom, to get other, more granular information.

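psv: leads to a full value node rather than a simple value, and its qualifier counterpart is pqv:. For dates, that node holds the precision as well as the time itself. A sketch for the start times of Cuba's heads of state; the precision is Wikibase's numeric code, for example 11 for a day and 9 for a year:

```sparql
SELECT ?value ?valueLabel ?start_time ?precision
WHERE
{
  VALUES ?item {wd:Q241}
  ?item p:P35 ?statement .                          # head of state (P35) statement
  ?statement ps:P35 ?value .                        # the head of state, as a simple value
  ?statement pqv:P580 ?start_node .                 # start time (P580) qualifier as a full value node
  ?start_node wikibase:timeValue ?start_time ;      # the date itself
              wikibase:timePrecision ?precision .   # its precision code
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```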
So. Enough for now. Wrap your head around all of that :)