Wikidata talk:Data model

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days. For the archive overview, see Archive/. The latest archive is located at Archive/2024.

Questions about the data model

Subproperties of P31 and P279

Why do we have subproperties of P31 and P279? Is it just because parent taxon (P171) was created before subclass of (P279)?

In the proposal discussion of subproperty of (P1647) Emw has very convincingly argued that creating subproperties of P31 and P279 is a "bad idea" and I would tend to agree since they just make consuming the data and reasoning about the data more cumbersome.

Is there any good reason why we shouldn't just migrate all usages of subproperties of P31 and P279 to P31 and P279 respectively and delete these subproperties?

--Push-f (talk) 08:48, 26 November 2022 (UTC)

For instance, in subproperties you can add specific constraints. --Horcrux (talk) 10:58, 29 November 2022 (UTC)
Right, but I don't think the ability to define property constraints justifies the existence of subproperties of fundamental Wikidata properties since, as I mentioned, they make Wikidata significantly harder to consume, query and reason about. I think we should be able to express the same schemas via entity schemas ... yes, entity schemas might not yet be as well integrated into Wikibase as property constraints, but I think in principle they are the better tool to express such domain-specific constraints for fundamental properties since they don't require subproperties.
--Push-f (talk) 13:06, 29 November 2022 (UTC)

Properties used for an "is a" relationship that aren't subproperties of P31

We have many properties to express an "is a" relationship that aren't subproperties of instance of (P31), which is even more problematic than #Subproperties of P31 and P279 because it makes finding instances and reasoning about data even harder.

--Push-f (talk) 08:06, 13 December 2022 (UTC)

Properties that can be both restrictive and non-restrictive when used as a qualifier

We currently have 4 properties that are an instance of both non-restrictive qualifier (Q61719274) and restrictive qualifier (Q61719275), namely:

This is very problematic because it means data consumers cannot easily determine if the qualifier is restrictive or non-restrictive since it depends on the value of the qualifier.

of (P642) is already in the process of being deprecated ... I think we should also work on disambiguating the other properties so that they are either clearly restrictive only or clearly non-restrictive only. I think the first step would be to work out how exactly these properties can be restrictive / non-restrictive by finding examples.

--Push-f (talk) 06:36, 14 December 2022 (UTC)

Stating which properties are likely to be inherited in Wikidata?

As described in the Inheritance section, some properties are almost never inherited, like described by source (P1343), image (P18) or any external identifier, while other properties are very likely to be inherited, like has part(s) (P527), has characteristic (P1552), has cause (P828) and uses (P2283) (or the inverses of these properties).

I think it would make sense to introduce a "Wikidata property that is likely to be inherited" data item and express that properties are likely to be inherited via instance of (P31) and this new data item, so for instance "has part(s) (P527) instance of (P31) Wikidata property that is likely to be inherited".

This would allow data consumers to infer, for a given data item, all the statements that it is likely to inherit from its superclasses. Note that with the introduction of my proposed "negates property" property, data consumers could also take into account whether an instance or an intermediary parent class negates an inherited statement via one of the negating properties.

What do you think about this idea?

--Push-f (talk) 12:39, 29 November 2022 (UTC)

On Wikidata, value hierarchy property (P6609) is what specifies whether or not a given property is supposed to be inherited along a specific relation. ChristianKl 18:28, 29 November 2022 (UTC)
If I understand it correctly, value hierarchy property (P6609) has the following semantics:
(P1 value hierarchy property (P6609) P2) ∧ (A P1 B) ∧ (B P2 C) ⇒ (A P1 C)
However, what I am speaking about is:
(A instance of (P31) B) ∧ (B Px C) ⇒ likely (A Px C), or
(A subclass of (P279) B) ∧ (B Px C) ⇒ likely (A Px C)
I don't think that what I'm talking about can be modeled via value hierarchy property (P6609) ... unless I am missing something.
--Push-f (talk) 19:25, 29 November 2022 (UTC)
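
To make the difference between the two rules above concrete, here is a minimal Python sketch. It assumes statements are plain (subject, property, value) tuples, and every identifier other than P31, P279 and P6609 (A, B, C, D, Px, Py) is a hypothetical placeholder. It follows the reading of P6609 given in this thread, which may not match the property's intended semantics.

```python
# Triples are (subject, property, value) tuples. All IDs below except
# P31, P279 and P6609 are hypothetical placeholders.

def apply_value_hierarchy(triples):
    """One step of: P1 P6609 P2, A P1 B, B P2 C  =>  A P1 C."""
    hierarchy = {(s, o) for s, p, o in triples if p == "P6609"}
    inferred = set()
    for p1, p2 in hierarchy:
        for a, pa, b in triples:
            if pa != p1:
                continue
            for b2, pb, c in triples:
                if pb == p2 and b2 == b:
                    inferred.add((a, p1, c))
    return inferred - set(triples)


def apply_likely_inheritance(triples, inheritable):
    """One step of: A P31/P279 B, B Px C  =>  likely A Px C, for Px in `inheritable`."""
    inferred = set()
    for a, p, b in triples:
        if p not in ("P31", "P279"):
            continue
        for b2, px, c in triples:
            if b2 == b and px in inheritable:
                inferred.add((a, px, c))
    return inferred - set(triples)


triples = {
    ("Px", "P6609", "Py"),
    ("A", "P279", "B"),
    ("B", "Px", "C"),
    ("C", "Py", "D"),
}

by_hierarchy = apply_value_hierarchy(triples)
by_inheritance = apply_likely_inheritance(triples, inheritable={"Px"})
```

In this sketch the first rule keeps the property of the first premise (B Px C and C Py D give B Px D), whereas the second rule copies the property of the second premise down to the subclass (A P279 B and B Px C give A Px C).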

Ideas for the page

Explain P6609

Transitivity is currently explained via "P instance of (P31) transitive Wikidata property (Q18647515)". It would be better to explain it via value hierarchy property (P6609), ideally after that property has been relabeled to "transitive over"; see Property talk:P6609#Naming ?.

--Push-f (talk) 07:46, 15 December 2022 (UTC)
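
As a rough illustration of how transitivity could fall out of value hierarchy property (P6609), the Python sketch below repeatedly applies the rule "P1 P6609 P2, A P1 B, B P2 C ⇒ A P1 C" until no new statements appear. It assumes that reading of P6609 is correct; the property Pt and the items A to D are hypothetical placeholders, not real Wikidata entities.

```python
def close_over_hierarchy(triples):
    """Apply  P1 P6609 P2, A P1 B, B P2 C => A P1 C  until nothing new appears."""
    facts = set(triples)
    # Pairs (P1, P2) declared via P6609; computed once from the input.
    hierarchy = {(s, o) for s, p, o in facts if p == "P6609"}
    changed = True
    while changed:
        changed = False
        for p1, p2 in hierarchy:
            new = {
                (a, p1, c)
                for a, pa, b in facts if pa == p1
                for b2, pb, c in facts if pb == p2 and b2 == b
            } - facts
            if new:
                facts |= new
                changed = True
    return facts


# Hypothetical property Pt that lists itself as its value hierarchy property.
facts = {
    ("Pt", "P6609", "Pt"),
    ("A", "Pt", "B"),
    ("B", "Pt", "C"),
    ("C", "Pt", "D"),
}

closure = close_over_hierarchy(facts)
```

When a property names itself as its value hierarchy property, the rule reduces to "A P B, B P C ⇒ A P C", so the closure contains A Pt C, B Pt D and A Pt D in addition to the original statements.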

Relevant pages

I think you should refer to/link to the following at least:

You might want to also communicate with User:TomT0m as they have written considerably on related matters - for example User:TomT0m/Classification. ArthurPSmith (talk) 19:23, 29 November 2022 (UTC)

@ArthurPSmith: Thanks for the suggestions!
--Push-f (talk) 10:46, 9 December 2022 (UTC)

How many items have at least P31 or P279 or a subproperty of them?

I think this would be an interesting statistic to add to the Fundamental properties section.

However, I am not quite sure how to write that COUNT query in SPARQL, because WDQS omits the a (rdf:type) triple for wikibase:Item.[1]

Note: the exact number is of course "too much information", but we could mention it as a percentage of the total number of items (109,584,184).

--Push-f (talk) 08:28, 26 November 2022 (UTC)
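
One possible way around the missing type triple is to not filter on wikibase:Item at all and simply count distinct subjects that use P31, P279 or one of their subproperties. The Python sketch below only builds such a query string and computes the percentage; it does not contact WDQS. It assumes the usual wd:, wdt: and wikibase: prefixes, that subproperties are linked via subproperty of (P1647), and the helper names are made up for illustration. The query would also count any non-item subjects that carry these statements, and on the full dataset it may well time out.

```python
TOTAL_ITEMS = 109_584_184  # total item count quoted in the comment above


def build_count_query(root_properties=("P31", "P279")):
    """Build a SPARQL COUNT query over the roots and their subproperties."""
    values = " ".join(f"wd:{p}" for p in root_properties)
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  VALUES ?root {{ {values} }}
  ?property wdt:P1647* ?root ;
            wikibase:directClaim ?direct .
  ?item ?direct [] .
}}
""".strip()


def percentage(count, total=TOTAL_ITEMS):
    """Express a count as a percentage of the total, rounded to two decimals."""
    return round(100 * count / total, 2)


query = build_count_query()
```

The resulting count could then be passed to percentage() to get the figure for the Fundamental properties section.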

Other

Where these discussions should take place

I think you should discuss these topics at Wikidata talk:WikiProject Ontology, where you will find a wider audience. --Horcrux (talk) 10:58, 29 November 2022 (UTC)

I am happy to announce every subsection I create here on Wikidata talk:WikiProject Ontology but would rather keep the #Questions about the data model here because Wikidata talk:WikiProject Ontology seems to be much less focused on the data model in general (most sections there are about specific instances of conflation or property misuse ... I'd like to establish this talk page as a discussion page specifically about the general data model of Wikidata, so that people can watch this page without getting item-specific quality issues into their watchlist).
--Push-f (talk) 13:06, 29 November 2022 (UTC)
There is a page specifically called Wikidata:WikiProject Ontology/Modelling and a corresponding talk page, with what looks to me like a similar approach to the subject. Some ideas may even have been discussed earlier. Neither of those pages has been edited for the past two years, however, and if you want to start a fresh discussion then a new section may be warranted. I wasn't aware of random discussion pages being created entirely outside the WikiProject ones, which are arranged by topic. SM5POR (talk) 08:56, 8 December 2022 (UTC)
I think discussions on talk pages should be relevant to the subject page. Wikidata:WikiProject Ontology/Modelling doesn't even mention "subproperty" a single time. Besides, that page isn't a description of the status quo but a proposal, and it says things like "all classes are an instance of class class (Q16889133)" ... which clearly isn't the case and never has been. --Push-f (talk) 04:24, 9 December 2022 (UTC)
Of course. But create another sub-page to Wikidata:WikiProject Ontology (or Wikidata:WikiProject Ontology/Modelling, as you prefer) and discuss your proposal on its corresponding talk page then. You aren't claiming that "Data Model" has nothing to do with Ontology or Modelling, or that this discussion relates equally to all WikiProjects and therefore cannot take place within any one of them, are you? Wikidata is ten years old and its pages have over time been generally arranged according to topic. Your topic is relevant and belongs in that structure, just as well as any other topic. The slash ("/") found in many page URLs isn't some impenetrable barrier to entry; it actually indicates "related subtopic".
It's not like you need to move your desk outside the office building (and sit in the rain on the parking lot) merely because the particular desk indoors you were initially referred to (among some 2,000 different desks in 200 offices in that building, most of them empty) happened to be cluttered with somebody else's paperwork, if you get my analogy. SM5POR (talk) 13:12, 9 December 2022 (UTC)

Wikidata's fundamental data model

This page appears to be trying to define something called the fundamental data model of Wikidata. I take this to be a data model in the sense of https://en.wikipedia.org/wiki/Data_model but that applies throughout Wikidata, as opposed to a data model for a particular domain. It is definitely the case that this data model has not been defined at all well in Wikidata. I feel that this is a major problem for Wikidata. I feel that there are a number of reasons for this lack, including a lack of consensus on just what Wikidata data structures mean. The Wikidata ontology project is an attempt to provide a basis for the fundamental data model of Wikidata but it does not have much traction in Wikidata, as far as I can tell.

A new attempt at providing a firm foundation for modelling in Wikidata is welcome, but any attempt needs to cover at least the ground already covered and needs to take into account what is already being done in Wikidata, including not only the ontological classes and properties but also modal qualifiers. This is not going to be easy.

There are several published papers on how to make sense of Wikidata. I point you to the work of Markus Krötzsch and his colleagues at Dresden, particularly a presentation that Markus made at WOP 2018: https://iccl.inf.tu-dresden.de/web/Misc3058/en Peter F. Patel-Schneider (talk) 17:24, 9 December 2022 (UTC)

Yes I agree that the lack of definition is problematic ... which is one of the reasons why I created this page.
Regarding qualifiers, I think if they're not an instance of restrictive qualifier (Q61719275) you should be able to safely disregard them and treat the statements as semantic triples.
Thanks for linking these slides, they look really interesting. The talk doesn't happen to be recorded somewhere, does it? I agree that it makes sense to build on published papers ... I haven't looked into what has been published yet, if you happen to know some papers, feel free to share them :)
--Push-f (talk) 21:51, 9 December 2022 (UTC)
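
The suggestion above about disregarding non-restrictive qualifiers could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the Statement type and to_triple helper are hypothetical, the set of restrictive qualifier properties is passed in by the caller (in practice it would be the properties that are an instance of restrictive qualifier (Q61719275)), and the qualifier IDs used in the example are placeholders rather than real properties.

```python
from typing import NamedTuple, Optional, Tuple


class Statement(NamedTuple):
    subject: str
    prop: str
    value: str
    qualifiers: dict  # qualifier property ID -> qualifier value


def to_triple(statement: Statement, restrictive: set) -> Optional[Tuple[str, str, str]]:
    """Return (subject, property, value) if no qualifier is restrictive, else None."""
    if any(q in restrictive for q in statement.qualifiers):
        return None  # the qualifier narrows the claim, so a bare triple would mislead
    return (statement.subject, statement.prop, statement.value)


# Placeholder qualifier IDs; which real properties are restrictive is not assumed here.
RESTRICTIVE = {"P_restrictive"}

plain = Statement("Q80", "P108", "Q42944", {"P_annotation": "some note"})
narrowed = Statement("Q80", "P108", "Q42944", {"P_restrictive": "some scope"})

plain_triple = to_triple(plain, RESTRICTIVE)
narrowed_triple = to_triple(narrowed, RESTRICTIVE)
```

A consumer following this approach would keep plain_triple as an ordinary semantic triple and would need separate handling for statements where to_triple returns None.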

Bad example of a statement in introduction

The very first example of a statement in the introduction is "Tim Berners-Lee (Q80) employer (P108) CERN (Q42944)". For a Wikidata beginner, this looks wrong, because in fact CERN (subject) is the employer of Tim Berners-Lee (object) and not vice versa.

It might be that this property has been used like that for historical reasons and it would be inappropriate to rename it to "employed by" for whatever reason, but in this case, there should be some better example of a statement in the introduction which can be understood as true by everybody without knowing such background. -- Juergen 185.205.126.194 22:56, 26 February 2024 (UTC)

Juergen - there is usually an implicit "is" or "has" if you want to turn a statement (subject) (predicate) (object) into a sentence - i.e. in this case it would be "Tim Berners-Lee has employer CERN". Many Wikidata properties work this way, with a label that is not a verb but a noun indicating an attribute (property) of the subject item. ArthurPSmith (talk) 11:38, 27 February 2024 (UTC)
OK, I see the correct interpretation of this property (and many others) is neither "(subject) is employer of (object)" nor "(subject) is employed by (object)" but "(subject) has employer (object)". Should we include this kind of interpretation rule in the introduction to improve understandability, together with an explanation of why the verb ("has" in this case) is regularly not part of the property name?
Or would it in fact be appropriate to rename the property from "employer" to "has employer", removing the ambiguity and avoiding the need for the above explanations? -- Juergen 5.147.163.199 04:30, 28 February 2024 (UTC)
I don't think renaming properties would be wise or helpful - there are labels in 300+ languages to consider also! But if you find a good spot for updating the documentation somewhere I can see that a statement along these lines could be helpful. Maybe something on Help:Statements? ArthurPSmith (talk) 20:15, 28 February 2024 (UTC)
If a property were renamed for the sole purpose of clarification, this would happen without a change in meaning, allowing us to start with the English label without changing the 300+ other languages.
But it seems a more successful approach would be to explain why the property naming system is the way it is: why the verb isn't part of the property name, and how to derive the appropriate verb from context where it is ambiguous instead of obvious, which seems to be the case in the example in question. If you explain it to me here, I would try to add an explanation to the introduction targeted at beginners who have common sense but no Wikidata knowledge. This would be the right place because it makes the introduction comprehensible without more context.
The alternative to this approach would be to find a better example of a statement for the introduction which doesn't have these issues. -- Juergen 5.147.163.199 23:03, 28 February 2024 (UTC)
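
The reading rule discussed in this thread (an implicit "has" or "is" between subject and property label) can be illustrated with a tiny Python sketch. The render function and its verb parameter are hypothetical, and the choice of verb per property is left to the caller since nothing in this discussion specifies how it would be stored.

```python
def render(subject_label, property_label, value_label, verb="has"):
    """Turn a (subject, property, value) statement into a readable sentence."""
    return f"{subject_label} {verb} {property_label} {value_label}"


sentence = render("Tim Berners-Lee", "employer", "CERN")
# sentence == "Tim Berners-Lee has employer CERN"
```

An explanation for beginners could show the statement and its rendered sentence side by side, so that the noun-style property label is read as an attribute of the subject.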