Wikidata:Property proposal/property for implied values
value hierarchy property edit
Originally proposed at Wikidata:Property proposal/Generic
Description | property which specifies less precise items than the indicated value for which statements using the subject property would still be true |
---|---|
Data type | Property |
Domain | Wikidata property (Q18616576) |
Allowed values | transitive Wikidata property (Q18647515) |
Example 1 | instance of (P31) → subclass of (P279) |
Example 2 | headquarters location (P159) → located in the administrative territorial entity (P131) |
Example 3 | occupation (P106) → subclass of (P279) |
Example 4 | afflicts (P689) → anatomical location (P927) |
Planned use | add to a bunch of properties and use in tools, see below |
Note: we have considered various labels for this property: "property for implied values", "transitive over", "refinement hierarchy" and currently "value hierarchy property". All these variations are used in the discussion below. − Pintoch (talk) 21:40, 15 January 2019 (UTC)
Motivation edit
I would be interested in expressing that a given property (such as subclass of (P279)) is used to determine how values of another property (instance of (P31)) can be refined. This is a pattern that we use at various places in Wikidata:
- instance of (P31)business (Q4830453) is more precise than instance of (P31)organization (Q43229), because there is a chain of subclass of (P279) statements between business (Q4830453) and organization (Q43229). This would be expressed by instance of (P31)property for implied valuessubclass of (P279).
- headquarters location (P159)City of Brussels (Q239) is more precise than headquarters location (P159)Brussels-Capital Region (Q240), because there is a chain of located in the administrative territorial entity (P131) between the two values. This would be expressed by headquarters location (P159)property for implied valueslocated in the administrative territorial entity (P131).
Potentially this property could be multi-valued if there are multiple refinement hierarchies in place (but I can't think of an example right now). I am not very happy with the label so I hope others have better ideas.
The idea behind introducing such a property would be to make it easier for data import tools (such as OpenRefine) to respect these hierarchies natively (for instance, when adding headquarters location (P159)City of Brussels (Q239) to an item, any unsourced headquarters location (P159)Brussels-Capital Region (Q240) statement could be safely deleted, I think).
This could also potentially improve the constraint system: for instance, if we have an item-requires-statement constraint (Q21503247) which requires headquarters location (P159)Paris (Q90), then the constraint system could also accept headquarters location (P159)7th arrondissement of Paris (Q259463) (because it is more precise). As far as I am aware this behaviour is only available for instance of (P31)/subclass of (P279) via subject type constraint (Q21503250) currently. @Lucas Werkmeister (WMDE): I have no idea if it is feasible (and in which case you could prefer another syntax for this information)? − Pintoch (talk) 15:09, 8 January 2019 (UTC)
WikiProject Ontology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
- @Jean-Frédéric, Olivier LPB: you have requested this feature at Help_talk:Property_constraints_portal/Item − Pintoch (talk) 15:14, 8 January 2019 (UTC)
Discussion edit
- Support would a label something like "property over which this is transitive" make sense? But yes, this seems like a very useful abstraction to provide a structured mechanism for. ArthurPSmith (talk) 19:36, 8 January 2019 (UTC)
- Thanks, yes that could be clearer, especially if we can get rid of the "this". − Pintoch (talk) 06:59, 9 January 2019 (UTC)
- Given Yair rand's phrasing below (occupation (P106) is only transitive over subclass of (P279)), I have renamed to transitive over. − Pintoch (talk) 07:28, 10 January 2019 (UTC)
- Thanks, yes that could be clearer, especially if we can get rid of the "this". − Pintoch (talk) 06:59, 9 January 2019 (UTC)
- This can get complicated. occupation (P106) is only transitive over subclass of (P279) until outside the occupation class tree. --Yair rand (talk) 23:45, 8 January 2019 (UTC)
- Good point. But values outside the occupation class tree are already forbidden by a value-type constraint (Q21510865), so maybe we can say that the implication only holds if the broader value is acceptable with respect to other constraints? Or we could indicate the containing class with a qualifier? Anyway, for both applications mentioned (data import and constraint violations), this should not be an issue. − Pintoch (talk) 06:57, 9 January 2019 (UTC)
- Support David (talk) 12:53, 9 January 2019 (UTC)
- Comment New label ("transitive over") works for me! ArthurPSmith (talk) 16:28, 10 January 2019 (UTC)
- @ArthurPSmith: given MisterSynergy's reaction below, I have the feeling that this label is too close to "transitive" and might cause confusion for others. Maybe something like "refinement hierarchy" would be better - it would also rule out values like sibling (P3373) which are symmetric properties. − Pintoch (talk)
- Wait How do other ontologies handle this? Where is prior art? Especially, when it comes to naming this relationship?
ChristianKl ❪✉❫ 18:23, 10 January 2019 (UTC)
- @ChristianKl: yes indeed surely this has been formalized in other places. I will investigate. − Pintoch (talk) 20:12, 10 January 2019 (UTC)
- @ChristianKl: I have asked around and it does not seem to have a particular name, it is just a particular case of an OWL construct: property chains. People in graph theory or other fields of math might have names for that particular relation, but I am not sure how to look for it (and it is likely to be as obscure as any other name we come up with, TBH). − Pintoch (talk)
WikiProject Properties has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
- (Thanks for the ping.)
(see below). For all given examples, the value properties subclass of (P279), located in the administrative territorial entity (P131), and anatomical location (P927) have already an explicit relation instance of (P31): transitive Wikidata property (Q18647515). This transitivity is valid for any kind of use, not just the ones which should use the property proposed here. As only transitive Wikidata property (Q18647515) values should be allowed according to the proposal, we do not gain anything with this proposed property which is not already there. —MisterSynergy (talk) 19:22, 10 January 2019 (UTC){{Oppose}}
- @MisterSynergy: sorry, it looks like I did not pick the best examples. Please consider this case:
- John C. Calhoun (Q207191)position held (P39)Vice President of the United States (Q11699)
- Vice President of the United States (Q11699)part of (P361)Federal Government of the United States (Q48525)
- part of (P361)instance of (P31)transitive Wikidata property (Q18647515)
- However, it would be wrong to say that John C. Calhoun (Q207191)position held (P39)Federal Government of the United States (Q48525).
- So, position held (P39) is transitive over subclass of (P279) but not over part of (P361) (and both subclass of (P279) and part of (P361) are transitive).
- So this notion of "X transitive over Y" is distinct from "Y transitive". Do you agree? Maybe the new label is causing confusion? − Pintoch (talk) 20:12, 10 January 2019 (UTC)
- If you prefer symbolic statements over prose:
- Q is transitive when
- P is transitive over Q when
- − Pintoch (talk) 20:58, 10 January 2019 (UTC)
- If you prefer symbolic statements over prose:
- Another counterexample found by sparqling around: sibling (P3373) is transitive. child (P40) is transitive over sibling (P3373) (if X has child Y and Y has sibling Z, then X has child Z), but mother (P25) is not transitive over sibling (P3373) (if X has mother Y and Y has sibling Z, then surely Z is is not X's mother!). − Pintoch (talk) 23:47, 10 January 2019 (UTC)
- @Pintoch: I don't think so. Halfsiblings frequently are siblings. ChristianKl ❪✉❫ 07:16, 11 January 2019 (UTC)
- @ChristianKl: yes that is true − this example is not something we would want to have anyway for the two applications mentioned above (because sibling (P3373) is symmetric, so it cannot be seen as a refinement hierarchy like the other examples). I just hope it helps MisterSynergy grasp the difference with transitivity a bit better (in particular, the fact that mother (P25) is not transitive over sibling (P3373) is still a counterexample of his claim, I think). − Pintoch (talk) 07:52, 11 January 2019 (UTC)
- @Pintoch: I don't think so. Halfsiblings frequently are siblings. ChristianKl ❪✉❫ 07:16, 11 January 2019 (UTC)
- Another counterexample found by sparqling around: sibling (P3373) is transitive. child (P40) is transitive over sibling (P3373) (if X has child Y and Y has sibling Z, then X has child Z), but mother (P25) is not transitive over sibling (P3373) (if X has mother Y and Y has sibling Z, then surely Z is is not X's mother!). − Pintoch (talk) 23:47, 10 January 2019 (UTC)
- Yeah, I understand. However, this is the same problem that User:Yair_rand mentioned above and which you acknowledged, just in a much more obvious sense. Transitivity of a property Y (e.g. part of (P361)) used in value items of property X (e.g. position held (P39)) holds only as long as you stay within the range of X (value-type constraint (Q21510865) we say at Wikidata). It works for a couple of steps for occupations and then breaks down, but barely for parthood relations which are typically very messi at Wikidata. It works practically always for the proposed headquarters location (P159) → located in the administrative territorial entity (P131), as the range of the latter is completely contained within the range of the former; the problem does not really matter for the proposed instance of (P31) → subclass of (P279), as instance of (P31) does not have a defined range.
- There is another, independent problem with parthood relations in particular, in contrast to other transitive properties (mereology (Q1194916) is the research field that studies parthood relations). For parthood relations transitivity is typically assumed within mereology, but there is research in the community about whether this is generally valid or not (see here). Consider for instance these (fictional) claims:
- Although both claims would be acceptable for Wikidata and part of (P361) is transitive, you clearly would not infer that . The problem is that parthood relations are only transitive under certain circumstances, for example in a local meronomy (such as “Pintoch's body parts” or “Wikimedia project entites”). However, parthood relations are not generally transitive.
- I still do not think that this proposed property should be created. The range problem raised by Yair_rand would have to be addressed with extra efforts based on existing value-type constraint (Q21510865) claims on properties anyway (so the proposed property does not add new value), and the parthood problem would rather be solved by removing the transitivity from part of (P361)/has part(s) (P527). (I still have to have a look at the sibling (P3373) problem, though.) —MisterSynergy (talk) 10:30, 11 January 2019 (UTC)
- @MisterSynergy: you rightly point out that transitivity itself is often messy. I would argue that this is not just true for part of (P361): very often, long chains of subclass of (P279) also give nonsensical results when we follow them high up into very abstract concepts. That's inevitable: we are building a knowledge graph, not a math textbook.
- Let's step back and look at the motivation of the proposal. What I mean with this proposal is that many properties tend to have one designated property for their refinement hierarchy. It seems to me that this pattern is pretty pervasive, but we tend to only acknowledge this for the instance of (P31)/subclass of (P279) pair. Why?
- It would be massively useful if the constraint system could handle other such hierarchies. Typically when working with human (Q5), the fact that we discourage subclasses of human (Q5) makes the type system useless there. Have you ever wanted to enforce things like "every item with an ORCID iD (P496) should have occupation (P106) researcher (Q1650915), or any subclass of it"? Now, I could propose to add a parameter to the constraints definition so that we could provide subclass of (P279) there, but I think this would be misguided as this information is really determined by the property used for the required statement. So I think this should be stored in a generic way, outside the constraint system. Now I would very much welcome any other suggestion about how to represent this. − Pintoch (talk) 10:54, 11 January 2019 (UTC)
- Also (sorry for the long reply!), it seems to me that there might still be a conceptual misunderstanding here: in the abstract, the fact that a property Q is transitive really does not mean that "x P y Q z" implies "x P z" for any property P. This is wrong both in mathematics and in ontology design. So there is nothing to "fix" about part of (P361) being transitive in this regard. It turns out that most of the transitive properties we have in Wikidata are containment relations, where this tends to hold often,, but that does not have to be the case. For instance, imagine we had a property called "hates". The fact that "MisterSynergy hates cauliflower (Q23900272)" really does not imply that "MisterSynergy hates vegetable (Q11004)", even if the chain of subclass of (P279) is totally correct. − Pintoch (talk) 11:41, 11 January 2019 (UTC)
- Now Neutral, as I do not want to stand in the way too much. I changed my mind after a second read of this proposal and its comments. The transitivity does no longer seem to be the critical aspect of this property, and I really hope that this term does not appear in the potential property label. It is also worth to mention that it may be useful for refinements, i.e. looking *down* along a hierarchy for potentially better/more specific values, which pretty much avoids the discussed problems that arise when going *up* a hierarchy towards more general values. —MisterSynergy (talk) 11:53, 13 January 2019 (UTC)
- Yes, it could be interesting to have something for implications which are going down a hierarchy. I cannot think of an example though (beyond my "hates" example above, maybe). − Pintoch (talk) 15:21, 14 January 2019 (UTC)
- I'm not sure if the above can of worms is what ChristianKl intended to open, but the question of how other ontologies deal with this again seems relevant. I remember some peculiarities with transitivity in SKOS - there are both the relations "broader" and "broaderTransitive", and "broaderTransitive" is declared as a superproperty of "broader" (and note this does NOT make "broader" transitive, despite the fact that "superproperty" like "superclass" is itself transitive). There's a discussion of some of this (in a more general case) here. It all kind of hurts the brain a bit... ArthurPSmith (talk) 14:06, 11 January 2019 (UTC)
- @ArthurPSmith: I don't think that properties like this should be trivally created. We should generally follow existing standards and only invent our own if we have good reasons to invent the wheel anew. ChristianKl ❪✉❫ 10:46, 13 January 2019 (UTC)
- @ChristianKl: I totally agree. That being said, it seems to me that there is no established RDF predicate for this in the usual namespaces. The reason for that is that this information is best expressed in OWL, not in RDF directly. Wikidata has not engaged much with OWL so far - the project has followed a different route by storing ontological information in the same data model as the data itself. For instance we have inverse property (P1696), and we use statements such that instance of (P31)transitive Wikidata property (Q18647515), which intrinsically don't do anything (Wikibase does not give them any special meaning, the query service does not infer any triples from them, and so on). Storing property constraints as statements on the property is another example. So this is why I proposed this property: it seems to be in line with the current customs of storing ontological information as statements on properties. That being said, it would obviously be much better if we had generic support for constructs like property chains, which would be much more expressive. But that is a much larger project, and we might want to think twice before construing these concepts in the Wikibase data model (maybe this is something we want to store elsewhere?) − Pintoch (talk) 11:03, 13 January 2019 (UTC)
- Comment Ok, it sounds like we need to settle on a non-confusing label (and if we can find a label somebody else has used, all the better). I was looking at a sample of the item-valued properties we have; for many of them this whole discussion simply does not apply (there's no "refinement hierarchy" or transitivity that would be associated with person-valued properties like head of government (P6) or director (P57), and I can't think of one that would apply for unitary entity values like country (P17) either.) For the ones associated with a physical location, for example place of birth (P19), then we're clearly interested in the location-based hierarchy that Wikidata manages via located in the administrative territorial entity (P131). For some others, subclass of (P279) seems to apply (for example for language-valued properties like official language (P37)). For employer (P108) I am not sure if we have a consistent refinement hierarchy - parent organization (P749) perhaps, but also part of (P361), owned by (P127)? Fundamentally I think what we're looking for is the way in which the domain of the property value (class, place, language, organization, etc.) is hierarchically modeled in Wikidata. So would a label like "domain hierarchy property" make sense? ArthurPSmith (talk) 14:20, 14 January 2019 (UTC)
- Yes, I also expect that subclass of (P279) and located in the administrative territorial entity (P131) would be the most common values for this property. Concerning the label, does "domain" generally refer to the target value? Intuitively, for me it would refer to the subject item. (Maybe by a vague mathematical analogy with the domain of a function, although statements aren't functions) So it would be more intuitive for me to have something else like "range", "value" or "target". − Pintoch (talk) 15:21, 14 January 2019 (UTC)
- Oh! Yes, I was thinking "domain" in the generic sense of type of item. Maybe "value hierarchy property"? ArthurPSmith (talk) 17:10, 14 January 2019 (UTC)
- I think that would be quite clear indeed! − Pintoch (talk) 17:21, 14 January 2019 (UTC)
- I attempted to update the label and description in English in the proposal - ok now? ArthurPSmith (talk) 20:33, 15 January 2019 (UTC)
- Yes, thanks! I am adding a note to explain the variations of labels in the discussion. − Pintoch (talk) 21:40, 15 January 2019 (UTC)
- I attempted to update the label and description in English in the proposal - ok now? ArthurPSmith (talk) 20:33, 15 January 2019 (UTC)
- I think that would be quite clear indeed! − Pintoch (talk) 17:21, 14 January 2019 (UTC)
- Oh! Yes, I was thinking "domain" in the generic sense of type of item. Maybe "value hierarchy property"? ArthurPSmith (talk) 17:10, 14 January 2019 (UTC)
- Yes, I also expect that subclass of (P279) and located in the administrative territorial entity (P131) would be the most common values for this property. Concerning the label, does "domain" generally refer to the target value? Intuitively, for me it would refer to the subject item. (Maybe by a vague mathematical analogy with the domain of a function, although statements aren't functions) So it would be more intuitive for me to have something else like "range", "value" or "target". − Pintoch (talk) 15:21, 14 January 2019 (UTC)
- Comment as far as constraint checks are concerned: any kind of transitivity hurts performance (“type” and “value type” are among the most expensive constraint checks by far) and caching (we can’t invalidate all cached results if an item buried somewhere in the chain is edited, that’s why you’ll see “this result may be outdated” on some type / value type check results). --Lucas Werkmeister (WMDE) (talk) 17:27, 14 January 2019 (UTC)
- Thanks! Yes I can imagine this would be expensive to run. So it makes all the more sense to store this information outside the constraint system. I would still be interested in this to ease data imports. − Pintoch (talk) 20:07, 14 January 2019 (UTC)
Wrapping up edit
It seems to me that the remaining issues are:
- How to indicate the scope of the "transitivity" by restricting it to the descendants of a particular item. In my opinion this information is already stored in the property constraints so it is not worth duplicating. Jklamo, what do you think? Otherwise we could indicate the bounding class for transitivity with a qualifier, such as class (P2308) or including (P1012) (such as occupation (P106)value hierarchy propertysubclass of (P279)
class (P2308)occupation (Q12737077). Or with the target property as qualifier: occupation (P106)value hierarchy propertysubclass of (P279) subclass of (P279)occupation (Q12737077). - Finding previous work on the issue. ChristianKl, are you satisfied with my take on this above?
− Pintoch (talk) 14:10, 3 February 2019 (UTC)
- Support ChristianKl ❪✉❫ 14:47, 4 March 2019 (UTC)
- Thanks! I'm marking this as ready then, unless anyone has other concerns. − Pintoch (talk) 17:12, 4 March 2019 (UTC)
- If we're going to have a qualifier for constraining to a particular class, I think class (P2308) would be a better choice than subclass of (P279), as subclass of (P279) is a widely-used main-value-only property, while class (P2308) is already a qualifier used on properties pages. --Yair rand (talk) 04:55, 14 March 2019 (UTC)
- Makes sense! − Pintoch (talk) 15:39, 22 March 2019 (UTC)