Wikidata:Property proposal/property for implied values

value hierarchy property

Originally proposed at Wikidata:Property proposal/Generic

Description: property which specifies less precise items than the indicated value for which statements using the subject property would still be true
Data type: Property
Domain: Wikidata property (Q18616576)
Allowed values: transitive Wikidata property (Q18647515)
Example 1: instance of (P31) → subclass of (P279)
Example 2: headquarters location (P159) → located in the administrative territorial entity (P131)
Example 3: occupation (P106) → subclass of (P279)
Example 4: afflicts (P689) → anatomical location (P927)
Planned use: add to a bunch of properties and use in tools, see below

Note: we have considered various labels for this property: "property for implied values", "transitive over", "refinement hierarchy" and currently "value hierarchy property". All these variations are used in the discussion below. − Pintoch (talk) 21:40, 15 January 2019 (UTC)

Motivation

I would be interested in expressing that a given property (such as subclass of (P279)) is used to determine how values of another property (instance of (P31)) can be refined. This is a pattern that we use at various places in Wikidata:

Potentially this property could be multi-valued if there are multiple refinement hierarchies in place (but I can't think of an example right now). I am not very happy with the label so I hope others have better ideas.

The idea behind introducing such a property would be to make it easier for data import tools (such as OpenRefine) to respect these hierarchies natively (for instance, when adding headquarters location (P159) City of Brussels (Q239) to an item, any unsourced headquarters location (P159) Brussels-Capital Region (Q240) statement could be safely deleted, I think).
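To make the import use case concrete, here is a minimal sketch of how a tool might use such a property. Everything in it is an assumption for illustration: the VALUE_HIERARCHY mapping stands in for the proposed property, the PARENTS table is hand-written rather than fetched from Wikidata, and the function names are hypothetical (this is not OpenRefine code).

```python
# Hypothetical stand-in for the proposed property:
# headquarters location (P159) -> located in the administrative territorial entity (P131)
VALUE_HIERARCHY = {"P159": "P131"}

# Hand-written hierarchy links, assumed for this illustration only:
# City of Brussels (Q239) is located in Brussels-Capital Region (Q240)
PARENTS = {"P131": {"Q239": {"Q240"}}}


def ancestors(value, hierarchy_prop):
    """Return every value reachable from `value` by following hierarchy_prop upwards."""
    seen = set()
    stack = [value]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(hierarchy_prop, {}).get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_statements(prop, new_value, existing):
    """List existing values made redundant by adding `new_value` for `prop`.

    `existing` is a list of (value, has_reference) pairs. Only unsourced,
    less precise values are reported; sourced statements are left alone.
    """
    hierarchy_prop = VALUE_HIERARCHY.get(prop)
    if hierarchy_prop is None:
        return []  # no value hierarchy declared for this property
    implied = ancestors(new_value, hierarchy_prop)
    return [value for value, has_reference in existing
            if value in implied and not has_reference]
```

With the assumed data, redundant_statements("P159", "Q239", [("Q240", False)]) reports Q240 as a deletion candidate, while the same statement with a reference is kept.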

This could also potentially improve the constraint system: for instance, if we have an item-requires-statement constraint (Q21503247) which requires headquarters location (P159) Paris (Q90), then the constraint system could also accept headquarters location (P159) 7th arrondissement of Paris (Q259463) (because it is more precise). As far as I am aware this behaviour is currently only available for instance of (P31)/subclass of (P279) via subject type constraint (Q21503250). @Lucas Werkmeister (WMDE): I have no idea whether this is feasible (and if it is, whether you would prefer another syntax for this information)? − Pintoch (talk) 15:09, 8 January 2019 (UTC)
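A rough sketch of what such a relaxed check could look like, purely as an illustration and not a description of how the constraint system is implemented. The VALUE_HIERARCHY mapping (standing in for the proposed property), the hand-written PARENTS table and both function names are assumptions.

```python
# Hypothetical stand-in for the proposed property:
# headquarters location (P159) -> located in the administrative territorial entity (P131)
VALUE_HIERARCHY = {"P159": "P131"}

# Hand-written hierarchy links, assumed for this illustration only:
# 7th arrondissement of Paris (Q259463) is located in Paris (Q90)
PARENTS = {"P131": {"Q259463": ["Q90"]}}


def is_at_least_as_precise(value, required, hierarchy_prop):
    """True if `value` equals `required` or reaches it by following hierarchy_prop upwards."""
    seen = set()
    frontier = [value]
    while frontier:
        current = frontier.pop()
        if current == required:
            return True
        if current in seen:
            continue  # guard against cycles in messy data
        seen.add(current)
        frontier.extend(PARENTS.get(hierarchy_prop, {}).get(current, []))
    return False


def satisfies_requirement(prop, required, values):
    """Check a 'requires statement' rule, accepting more precise values when a hierarchy is declared."""
    hierarchy_prop = VALUE_HIERARCHY.get(prop)
    if hierarchy_prop is None:
        return required in values  # fall back to exact matching
    return any(is_at_least_as_precise(v, required, hierarchy_prop) for v in values)
```

Under these assumptions, an item whose only headquarters location is Q259463 satisfies a requirement for Q90, whereas one with Brussels-Capital Region (Q240) does not.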

@Jean-Frédéric, Olivier LPB: you have requested this feature at Help_talk:Property_constraints_portal/Item − Pintoch (talk) 15:14, 8 January 2019 (UTC)

Discussion

ChristianKl 18:23, 10 January 2019 (UTC)
  • @ChristianKl: I have asked around and it does not seem to have a particular name, it is just a particular case of an OWL construct: property chains. People in graph theory or other fields of math might have names for that particular relation, but I am not sure how to look for it (and it is likely to be as obscure as any other name we come up with, TBH). − Pintoch (talk)

John C. Calhoun (Q207191) position held (P39) Vice President of the United States (Q11699)
Vice President of the United States (Q11699) part of (P361) Federal Government of the United States (Q48525)
part of (P361) instance of (P31) transitive Wikidata property (Q18647515)
However, it would be wrong to say that John C. Calhoun (Q207191) position held (P39) Federal Government of the United States (Q48525).
So, position held (P39) is transitive over subclass of (P279) but not over part of (P361) (and both subclass of (P279) and part of (P361) are transitive).
So this notion of "X transitive over Y" is distinct from "Y transitive". Do you agree? Maybe the new label is causing confusion? − Pintoch (talk) 20:12, 10 January 2019 (UTC)
If you prefer symbolic statements over prose:
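  • Q is transitive when $\forall x, y, z:\ (x \mathrel{Q} y) \land (y \mathrel{Q} z) \Rightarrow (x \mathrel{Q} z)$
  • P is transitive over Q when $\forall x, y, z:\ (x \mathrel{P} y) \land (y \mathrel{Q} z) \Rightarrow (x \mathrel{P} z)$
Pintoch (talk) 20:58, 10 January 2019 (UTC)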
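For readers who prefer something executable, here is a small sketch that tests the "P is transitive over Q" condition (x P y and y Q z imply x P z) on a finite set of triples. The function name and the toy family data are assumptions made for illustration. A check like this can refute the condition on a sample, but a sample that passes proves nothing in general.

```python
def is_transitive_over(triples, p, q):
    """True if, within `triples`, every pair (x p y), (y q z) comes with (x p z)."""
    facts = set(triples)
    return all(
        (x, p, z) in facts
        for (x, p1, y) in facts if p1 == p
        for (y2, q1, z) in facts if q1 == q and y2 == y
    )


# The statements from the position held (P39) / part of (P361) example
CALHOUN = [
    ("Q207191", "P39", "Q11699"),   # John C. Calhoun, position held, Vice President
    ("Q11699", "P361", "Q48525"),   # Vice President, part of, Federal Government
]

# Toy family (hypothetical): X has children Y and Z, who are siblings of each other
FAMILY = [
    ("X", "child", "Y"),
    ("X", "child", "Z"),
    ("Y", "sibling", "Z"),
    ("Z", "sibling", "Y"),
]

print(is_transitive_over(CALHOUN, "P39", "P361"))      # False: Calhoun does not hold Q48525
print(is_transitive_over(FAMILY, "child", "sibling"))  # True on this sample
```

Passing the same property for both p and q gives the ordinary transitivity check on the supplied facts.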
Another counterexample found by sparqling around: sibling (P3373) is transitive. child (P40) is transitive over sibling (P3373) (if X has child Y and Y has sibling Z, then X has child Z), but mother (P25) is not transitive over sibling (P3373) (if X has mother Y and Y has sibling Z, then surely Z is not X's mother!). − Pintoch (talk) 23:47, 10 January 2019 (UTC)
@Pintoch: I don't think so. Half-siblings frequently are siblings. ChristianKl 07:16, 11 January 2019 (UTC)
@ChristianKl: yes that is true − this example is not something we would want to have anyway for the two applications mentioned above (because sibling (P3373) is symmetric, so it cannot be seen as a refinement hierarchy like the other examples). I just hope it helps MisterSynergy grasp the difference with transitivity a bit better (in particular, the fact that mother (P25) is not transitive over sibling (P3373) is still a counterexample to his claim, I think). − Pintoch (talk) 07:52, 11 January 2019 (UTC)
Yeah, I understand. However, this is the same problem that User:Yair_rand mentioned above and which you acknowledged, just in a much more obvious sense. Transitivity of a property Y (e.g. part of (P361)) used in value items of property X (e.g. position held (P39)) holds only as long as you stay within the range of X (what we call a value-type constraint (Q21510865) at Wikidata). It works for a couple of steps for occupations and then breaks down, but barely works for parthood relations, which are typically very messy at Wikidata. It works practically always for the proposed headquarters location (P159)/located in the administrative territorial entity (P131), as the range of the latter is completely contained within the range of the former; the problem does not really matter for the proposed instance of (P31)/subclass of (P279), as instance of (P31) does not have a defined range.
There is another, independent problem with parthood relations in particular, in contrast to other transitive properties (mereology (Q1194916) is the research field that studies parthood relations). For parthood relations transitivity is typically assumed within mereology, but there is research in the community about whether this is generally valid or not (see here). Consider for instance these (fictional) claims:
Although both claims would be acceptable for Wikidata and part of (P361) is transitive, you clearly would not infer that
⟨ Pintoch's big toe ⟩ part of (P361) ⟨ Wikidata admin corps ⟩.
The problem is that parthood relations are only transitive under certain circumstances, for example in a local meronomy (such as “Pintoch's body parts” or “Wikimedia project entities”). However, parthood relations are not generally transitive.
I still do not think that this proposed property should be created. The range problem raised by Yair_rand would have to be addressed with extra efforts based on existing value-type constraint (Q21510865) claims on properties anyway (so the proposed property does not add new value), and the parthood problem would rather be solved by removing the transitivity from part of (P361)/has part(s) (P527). (I still have to have a look at the sibling (P3373) problem, though.) --MisterSynergy (talk) 10:30, 11 January 2019 (UTC)
@MisterSynergy: you rightly point out that transitivity itself is often messy. I would argue that this is not just true for part of (P361): very often, long chains of subclass of (P279) also give nonsensical results when we follow them high up into very abstract concepts. That's inevitable: we are building a knowledge graph, not a math textbook.
Let's step back and look at the motivation of the proposal. What I mean with this proposal is that many properties tend to have one designated property for their refinement hierarchy. It seems to me that this pattern is pretty pervasive, but we tend to only acknowledge this for the instance of (P31)/subclass of (P279) pair. Why?
It would be massively useful if the constraint system could handle other such hierarchies. Typically when working with human (Q5), the fact that we discourage subclasses of human (Q5) makes the type system useless there. Have you ever wanted to enforce things like "every item with an ORCID iD (P496) should have occupation (P106) researcher (Q1650915), or any subclass of it"? Now, I could propose to add a parameter to the constraints definition so that we could provide subclass of (P279) there, but I think this would be misguided as this information is really determined by the property used for the required statement. So I think this should be stored in a generic way, outside the constraint system. Now I would very much welcome any other suggestion about how to represent this. − Pintoch (talk) 10:54, 11 January 2019 (UTC)
Also (sorry for the long reply!), it seems to me that there might still be a conceptual misunderstanding here: in the abstract, the fact that a property Q is transitive really does not mean that "x P y Q z" implies "x P z" for any property P. This is wrong both in mathematics and in ontology design. So there is nothing to "fix" about part of (P361) being transitive in this regard. It turns out that most of the transitive properties we have in Wikidata are containment relations, where this tends to hold often, but that does not have to be the case. For instance, imagine we had a property called "hates". The fact that "MisterSynergy hates cauliflower (Q23900272)" really does not imply that "MisterSynergy hates vegetable (Q11004)", even if the chain of subclass of (P279) is totally correct. − Pintoch (talk) 11:41, 11 January 2019 (UTC)
Now Neutral, as I do not want to stand in the way too much. I changed my mind after a second read of this proposal and its comments. Transitivity no longer seems to be the critical aspect of this property, and I really hope that this term does not appear in the potential property label. It is also worth mentioning that it may be useful for refinements, i.e. looking *down* along a hierarchy for potentially better/more specific values, which pretty much avoids the discussed problems that arise when going *up* a hierarchy towards more general values. --MisterSynergy (talk) 11:53, 13 January 2019 (UTC)
Yes, it could be interesting to have something for implications which are going down a hierarchy. I cannot think of an example though (beyond my "hates" example above, maybe). − Pintoch (talk) 15:21, 14 January 2019 (UTC)
I'm not sure if the above can of worms is what ChristianKl intended to open, but the question of how other ontologies deal with this again seems relevant. I remember some peculiarities with transitivity in SKOS - there are both the relations "broader" and "broaderTransitive", and "broaderTransitive" is declared as a superproperty of "broader" (and note this does NOT make "broader" transitive, despite the fact that "superproperty" like "superclass" is itself transitive). There's a discussion of some of this (in a more general case) here. It all kind of hurts the brain a bit... ArthurPSmith (talk) 14:06, 11 January 2019 (UTC)
@ArthurPSmith: I don't think that properties like this should be trivially created. We should generally follow existing standards and only invent our own if we have good reasons to reinvent the wheel. ChristianKl 10:46, 13 January 2019 (UTC)
@ChristianKl: I totally agree. That being said, it seems to me that there is no established RDF predicate for this in the usual namespaces. The reason for that is that this information is best expressed in OWL, not in RDF directly. Wikidata has not engaged much with OWL so far - the project has followed a different route by storing ontological information in the same data model as the data itself. For instance we have inverse property (P1696), and we use statements such as instance of (P31) transitive Wikidata property (Q18647515), which intrinsically don't do anything (Wikibase does not give them any special meaning, the query service does not infer any triples from them, and so on). Storing property constraints as statements on the property is another example. So this is why I proposed this property: it seems to be in line with the current customs of storing ontological information as statements on properties. That being said, it would obviously be much better if we had generic support for constructs like property chains, which would be much more expressive. But that is a much larger project, and we might want to think twice before construing these concepts in the Wikibase data model (maybe this is something we want to store elsewhere?) − Pintoch (talk) 11:03, 13 January 2019 (UTC)
Yes, I also expect that subclass of (P279) and located in the administrative territorial entity (P131) would be the most common values for this property. Concerning the label, does "domain" generally refer to the target value? Intuitively, for me it would refer to the subject item. (Maybe by a vague mathematical analogy with the domain of a function, although statements aren't functions.) So it would be more intuitive for me to have something else like "range", "value" or "target". − Pintoch (talk) 15:21, 14 January 2019 (UTC)
Oh! Yes, I was thinking "domain" in the generic sense of type of item. Maybe "value hierarchy property"? ArthurPSmith (talk) 17:10, 14 January 2019 (UTC)
I think that would be quite clear indeed! − Pintoch (talk) 17:21, 14 January 2019 (UTC)
I attempted to update the label and description in English in the proposal - ok now? ArthurPSmith (talk) 20:33, 15 January 2019 (UTC)
Yes, thanks! I am adding a note to explain the variations of labels in the discussion. − Pintoch (talk) 21:40, 15 January 2019 (UTC)
  • Comment: as far as constraint checks are concerned, any kind of transitivity hurts performance (“type” and “value type” are among the most expensive constraint checks by far) and caching (we can’t invalidate all cached results if an item buried somewhere in the chain is edited, that’s why you’ll see “this result may be outdated” on some type / value type check results). --Lucas Werkmeister (WMDE) (talk) 17:27, 14 January 2019 (UTC)
Thanks! Yes I can imagine this would be expensive to run. So it makes all the more sense to store this information outside the constraint system. I would still be interested in this to ease data imports. − Pintoch (talk) 20:07, 14 January 2019 (UTC)

Wrapping up

It seems to me that the remaining issues are:

Pintoch (talk) 14:10, 3 February 2019 (UTC)