Wikidata talk:WikiProject Ontology/Archive 3

Data Quality Days: come and exchange on data quality processes on July 8-10

Hello everyone,

I'm happy to share with you the upcoming Wikidata event Data Quality Days 2022, taking place online on July 8th to 10th. Following up on previous and similar gatherings (Data Quality Days 2021, Data Reuse Days 2022), this event will focus on processes around data quality, and will provide a space to bring the Wikidata community and the Wikidata development team together. During 3 days of presentations, workshops and facilitated conversations, we will discuss how we are currently identifying and fixing incorrect data on Wikidata, how we could improve these processes to increase data quality, and what concrete measures we could put in place together, with policies, tools or documentation.

The event is open to everyone, no particular knowledge or experience needed, and will take place on the open source video conference platform Jitsi. On the event page, you will find some useful information, the list of sessions, and the list of participants where you can already sign up.

The program of the Data Quality Days 2022 is curated by the organizers (Léa Lacroix, Lydia Pintscher and Manuel Merz). Until June 19th, you can propose a presentation, a workshop or a discussion topic. These will be selected and grouped by the organizers, and the final schedule will be ready around June 27th.

If you have any questions, ideas or suggestions, or if you need support to propose a presentation, feel free to write on the talk page of the event or to reach out to me directly by email. I will also post updates on the talk page.

We're looking forward to discussing with you again about data quality! Cheers, Lea Lacroix (WMDE) (talk) 14:57, 2 June 2022 (UTC)

Avoiding real world object to be subclasses of mathematical ones

I think a reasonable ontological viewpoint in the Wikidata case is to make a distinction between mathematical objects and physical ones.

But sometimes we have statements that link, say, « park » to « multiset » through a chain of « subclass of » statements.

@Infovarius: wanted to, I guess, express the fact that physical locations are 2- or 3-dimensional by making them a subclass of mathematical spaces with this property.

I think a way out is not to use « subclass of » in such cases, but rather a relation like « can be modelled by » or « can be represented by » to link the real-world object to a mathematical object that can model it. Maybe we already created such a property, but I can’t remember. I’m posting this to gather input, if possible, for a property creation if necessary.

a link to the discussion that prompted this message. author  TomT0m / talk page 15:47, 3 June 2022 (UTC)

Yes, I wanted to. And I'm not against "mixing physical and mathematical objects". I believe that mathematics describes the real world too. But I won't insist. As for properties, we have depicted by (P1299)/depicts (P180). --Infovarius (talk) 20:59, 4 June 2022 (UTC)
I agree it's not necessarily a problem to model physical items as instances/subclasses of mathematical ones, but if they are to be handled differently, I think manifestation of (P1557) is the best existing property. depicted by (P1299)/depicts (P180) seem to connote depiction in the artistic/media sense. Swpb (talk) 13:48, 6 June 2022 (UTC)
@Infovarius: to describe and to be are different things. A rectangle can model the screen I’m looking at; it does not mean that my screen is a rectangle. A mathematical rectangle is a perfect object with no roughness of any kind, that lives in a Euclidean space; my screen is far from that … We know, for example, that physical space is not generally Euclidean, thanks to Einstein and general relativity … it’s an approximation.
@Swpb « rectangle » would be the artwork, and my screen, as a « manifestation of » rectangle, would be the performance? I don’t really think that fits. FRBR is for a specific domain … in arts, creators create artworks for people to read or perform. In science it’s the opposite: things exist, and models are there to try to understand them. Worse, take « gravity » for example. This is a force, hence a vector field in Newtonian physics, while it’s not at all a force in relativistic theory. Eventually it might turn out to be something different in some kind of new physics uniting the relativistic and quantum worlds. Then … what is gravity? It cannot be both a vector field and not a vector field. It remains true that it can be modeled as a vector field without too much trouble in most conditions we encounter. It’s also true that it is something else (bending of spacetime) in relativistic physics.
There are legitimate articles about how science models things, see mathematical model (Q486902); let’s take that path here, please, as it’s a meaningful and well-documented one.
We could also be mathematical platonists and claim that mathematical objects are real, but this does not entail that non-mathematical objects are themselves mathematical ones. author  TomT0m / talk page 14:32, 6 June 2022 (UTC)
I think you misread my comment. I was saying depicted by (P1299)/depicts (P180) are not good properties for this relation. Swpb (talk) 15:02, 6 June 2022 (UTC)
@Swpb I agree with that. I just don’t think using « manifestation of (P1557) » is a good idea either. I’d propose something around the idea of « is a scientific model of » / « has model ».
There is also a notion of a relation between pure math objects, as in mathematical model theory (Q467606), but it’s still something different: a theory is a set of axioms, like Euclid’s, and the « model of the theory » is a math universe in which the axioms are true, like the Euclidean plane. This is the opposite of the usual sense in science, where a mathematical model is a set of rules that the real world follows more or less … author  TomT0m / talk page 15:11, 6 June 2022 (UTC)

Ambassadors structure

Hello, I have some questions about the correct way of modeling "ambassadors" structure, could you please give me your opinion on Wikidata_talk:WikiProject_Heads_of_state_and_government#Current_consensus_about_ambassadors_structure. — Metamorforme42 (talk) 08:42, 12 June 2022 (UTC)

Telegram group?

Should I create a Telegram group for talking about ontology things? Let me know if you would be interested in joining. Lectrician1 (talk) 19:29, 5 March 2022 (UTC)

I prefer onwiki discussions as they are most inclusive. Furthermore, particularly Telegram is a highly problematic service that I will definitely not use. —MisterSynergy (talk) 19:35, 5 March 2022 (UTC)
@MisterSynergy Most ontology discussions are very back-and-forth, making the wiki really annoying to use. Is there any other service you would suggest? I'm on Discord as well. Lectrician1 (talk) 21:00, 5 March 2022 (UTC)
Well, the question is what you want to talk about.
Any discussion that advances some problem to a solution should be made and documented onwiki in order to maximize accessibility for participants and future interested users.
However, if you for instance want to teach other users about your position or about a complex problem, I think a verbal format with a smaller audience would be more suitable than anything that is done with writing, no matter which service is used. —MisterSynergy (talk) 21:28, 5 March 2022 (UTC)
Okay, I ended up creating the group. You can join here. This way we can brainstorm and get feedback on ontology things a lot faster. There have already been some lengthy ontology discussions in the main Telegram chat and Wikiproject Music one in the past, so having our own should help avoid populating those chats.
Most of the Wikidata community is already on Telegram, so I'd highly recommend that you join @MisterSynergy @Swpb @ChristianKl @TomT0m Lectrician1 (talk) 21:21, 30 March 2022 (UTC)

Please keep things on-wiki wherever possible. Off-wiki discussions aren't easy to find, and can easily exclude parts of the community. Thanks. Mike Peel (talk) 17:31, 16 July 2022 (UTC)

For ontology discussions it's important to have on-wiki documentation that allows people to follow why decisions were made in a certain way. I oppose having any conversation that claims to be about making decisions take place on Telegram. The new talk pages format already makes it easier to have faster discussions on wiki. ChristianKl 14:19, 19 July 2022 (UTC)

items that are both instances and subclasses of the same class

There are a lot of items that are both direct instances and direct subclasses of the same class. Right now, jam (Q1269) is both an instance and subclass of food (Q2095) and yellow (Q943) is both an instance and subclass of color (Q1075). Both of these are problems. Even more problematic is Frederik Warburg (Q109406616) being both an instance and subclass of human (Q5). There are lots of cases, including ones in food, materials, science, sport, geography, social analysis, literature, art, botany, publishing, geology, color (lots more besides yellow), computing, and genetics. A simple query for items that are direct instances and direct subclasses of the same class has just found over 1000 results. (Trying to retrieve 2000 results results in a timeout.) I expect that the number of items that are both (possibly indirect) subclasses and (possibly indirect) instances of the same class is very large.
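
The check described above can be illustrated on a handful of statements. This is a minimal Python sketch, assuming statements are available as (item, property, value) triples. The helper name `instance_and_subclass_of_same` and the control item `X1` are hypothetical; only the three offending items come from the paragraph above.

```python
from collections import defaultdict

P31 = "P31"    # instance of
P279 = "P279"  # subclass of


def instance_and_subclass_of_same(statements):
    """Return sorted (item, cls) pairs where item is both a direct
    instance and a direct subclass of cls."""
    instance_of = defaultdict(set)
    subclass_of = defaultdict(set)
    for item, prop, value in statements:
        if prop == P31:
            instance_of[item].add(value)
        elif prop == P279:
            subclass_of[item].add(value)
    return sorted(
        (item, cls)
        for item, classes in instance_of.items()
        for cls in classes & subclass_of.get(item, set())
    )


# Statements taken from the examples in the text, plus one hypothetical
# control item (X1) that is only an instance and should not be reported.
statements = [
    ("Q1269", P31, "Q2095"), ("Q1269", P279, "Q2095"),      # jam / food
    ("Q943", P31, "Q1075"), ("Q943", P279, "Q1075"),        # yellow / color
    ("Q109406616", P31, "Q5"), ("Q109406616", P279, "Q5"),  # a person / human
    ("X1", P31, "Q5"),                                      # hypothetical control
]

offenders = instance_and_subclass_of_same(statements)
print(offenders)
```

The sketch only shows the logic of the comparison; it says nothing about how such a check would perform on the full graph.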

Genetics is particularly troubling as it seems almost as if all genes are both instances and subclasses of gene.

Is there a place to discuss this at the data quality meeting this weekend? I wasn't going to attend but I might be able to listen to a discussion of this topic. Peter F. Patel-Schneider (talk) 15:37, 8 July 2022 (UTC)

@Peter F. Patel-Schneider: Not a new question - Wikidata:WikiProject Ontology/Problems/instance and subclass of same class and related pages have been up for many years here now. I've occasionally spent time cleaning up some of these, but other data cleaners are of course also welcome! Wikidata:Events/Data Quality Days 2022 doesn't have a session specifically on ontology, but there are several open-topic sessions and ontology was mentioned in the introduction earlier today. ArthurPSmith (talk) 18:04, 8 July 2022 (UTC)
@ArthurPSmith Interesting. How can the million-plus items in genetics be fixed? Also, I tried to run the query that generates the report and, as expected, it timed out. There are other related problems as well, such as liqueur (Q178780), which is a direct instance of alcoholic beverage (Q154) as well as an indirect subclass of it through spirit drink (Q17562878). This would not show up in the report you mention but is nonetheless bad modelling. Peter F. Patel-Schneider (talk) 18:34, 8 July 2022 (UTC)
@Peter F. Patel-Schneider: Hmm, it looks like that report hasn't updated in about a month, likely related to the timeout you are seeing now. On the genetics items - we would need to persuade the people who run those bots to change how they represent those items. Wostr has a new proposal to fix a similar issue with the class hierarchy for chemical entities. ArthurPSmith (talk) 19:52, 8 July 2022 (UTC)
@ArthurPSmith Agreed, except that at some point there needs to be pushback that the representation of the items violates the Wikidata ontology in such an egregious manner that if nothing is done the items will be removed. To do otherwise is to let chaos reign. I view the gene situation as having passed that point by a country mile. It is modelling problems like these that make me tell people that Wikidata has to be used with great care and if you can't afford the significant cost of that care then it is better to not use Wikidata. A similar situation exists with respect to colours. Peter F. Patel-Schneider (talk) 20:00, 8 July 2022 (UTC)
It takes actual work to think through a coherent system of instances and subclasses for a domain. I do some work on anatomical entities, and it takes a lot of nontrivial decisions. When it comes to genes, there are likely fewer corner cases, but it's a discussion to be had in https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology before running any bot. ChristianKl 12:01, 19 July 2022 (UTC)
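
The liqueur case raised earlier in this thread (an item that is a direct instance of a class it also reaches through the subclass tree) needs a transitive check rather than a direct comparison. This is a minimal Python sketch, assuming for simplicity that liqueur is a direct subclass of spirit drink and spirit drink a direct subclass of alcoholic beverage; the real path may be longer. The function names are hypothetical.

```python
def superclasses(item, subclass_of):
    """All classes reachable from item via one or more 'subclass of' links."""
    seen = set()
    stack = list(subclass_of.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    return seen


def instance_and_indirect_subclass(item, instance_of, subclass_of):
    """Classes that item is a direct instance of and also a (possibly
    indirect) subclass of."""
    return sorted(set(instance_of.get(item, ())) & superclasses(item, subclass_of))


# Assumed direct links, simplified from the example in the thread.
instance_of = {"Q178780": ["Q154"]}          # liqueur -> alcoholic beverage
subclass_of = {
    "Q178780": ["Q17562878"],                # liqueur -> spirit drink
    "Q17562878": ["Q154"],                   # spirit drink -> alcoholic beverage
}

redundant = instance_and_indirect_subclass("Q178780", instance_of, subclass_of)
print(redundant)
```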

Ambiguity of the « is an instance » terminology

In the previous section, but also in older discussions, we can often read the terms « is an instance » / « is a class ». But this is ambiguous, as a class can be an instance of a metaclass …

Take for example :

So depending on the context death (Q4) can be seen as both an instance of something, or as a class of event …

So when we say something is an instance, we should usually say in which context … Using this terminology may lead to confusion in our guidelines if we retain the ambiguity.

On the other hand, there are things like death of Phillip Walters (Q5247510) that cannot possibly have instances, because they are real-world objects or processes or whatever. I propose to name them « tokens », in reference to the type–token distinction (Q175928). Classes whose only instances are tokens would be named « token-class », and metaclasses whose instances are token-classes something else, like « token-token-class », « 2nd order class » or something like that. author  TomT0m / talk page 16:39, 29 March 2022 (UTC)

Well death (Q4) seems conflated honestly. It should probably be a subclass of (P279) of biological process (Q2996394).
Doing this is going to be a pretty big change and require overhauling a LOT of documentation, particularly on Help:Basic membership properties.
I'm much more in favor of banning metaclasses and just making "instance" equivalent to "token".
Just another random idea that will never pass: instead of having "instance of" and "subclass of" properties, we could have an "is a" property and then an "entity type" property whose values could be either "instance", "class", or some other metaclass. Lectrician1 (talk) 17:15, 29 March 2022 (UTC)
You want to ban metaclasses but retain the terminology ? That’s a weird way to ban them. author  TomT0m / talk page 18:00, 29 March 2022 (UTC)
It should maybe mentioned here that "metaclasses"—as difficult as this concept is—is not an invention of the Wikidata community. This is a necessity in various other ontologies as well in order to build a useful graph. No need to consider abandoning it here. —MisterSynergy (talk) 19:32, 29 March 2022 (UTC)
Well no, banning making classes instances of a metaclass (basically, classes could only have a subclass of (P279) statement) would be a solution for our current system's metaclass/instance confusion.
That alternative solution I mentioned was completely separate and something we could do to retain metaclasses - but it would require changing the entire class system of Wikidata which would not be fun. Lectrician1 (talk) 19:40, 29 March 2022 (UTC)
And would likely reinvent the wheel by trying to bury it. author  TomT0m / talk page 06:07, 30 March 2022 (UTC)
  • @TomT0m: So for some of the most commonly used classes in Wikidata, an instance of human (Q5) would clearly be a "token". But what about scholarly article (Q13442814)? Instances of star (Q523) I think would also be "tokens". Are instances of organization (Q43229)? gene (Q7187)? position (Q4164871)? I think for example elementary particle (Q43116) has been nicely organized so it is a class and all the specific particles like electron (Q2225) are subclasses, so their instances (none of which would ever have a Wikidata item) would be tokens. Although the fundamental physical concept that specific elementary particles are literally indistinguishable would make them kind of strange tokens. ArthurPSmith (talk) 17:12, 30 March 2022 (UTC)
    The notion of token does not actually depend on whether or not there is a Wikipedia item on it, and that’s why I think it’s interesting. It does not depend (much) on the model we choose.
    Organisations are concrete stuff, they have a birth and a death, they act on the real world. Even if they are more an aggregation/composition of other tokens than things you can touch.
    I agree we’re starting to hit a problem with abstract stuff like numbers. Works fit in this scheme, although it’s less obvious. There is concrete stuff that corresponds to works: when you read a book, when a play is played, when a movie is watched. It then makes sense to think of a work as a class of experiences. Different adaptations of a musical can easily be seen as a class of experiences. Or traditional stories, like « Snow White »: it’s been told many times in many ways with different endings or variations; all can be regrouped in a vaguer class corresponding to the general experience, whereas those that have been told following the text of a specific book are a subclass of them. author  TomT0m / talk page 18:19, 30 March 2022 (UTC)
  • I would say that we have 0-order items (Albert Einstein's brain (Q2464312)), 1-order items (brain (Q1073)), 2-order items (organ type (Q103812529)), 3-order items (direct anatomical metaclass (Q103997018)). We can call the 0-order items instances or tokens. We can call the 1-order items classes and the higher order items metaclasses.
For me the difference between a 0-order item and a 1-order item is that the 1-order item can have subclasses. brain (Q1073) has human brain (Q492038) down its subclass tree, but it would be possible to go even further in subclassing it. We could have "male human brain" as a subclass. 0-order items, however, can't be meaningfully subclassed and that's what makes them different. Among 1-order items some of them have instances but that's not central. ChristianKl 17:45, 30 March 2022 (UTC)
@ChristianKl Metaclasses can have subclasses. So this is not really a good definition. For example you could subclass « isotope » (2-order item) with « stable isotope » or « radioactive isotope ». And the « there will be no instance of atom in Wikidata so Atom is an instance not a class » argument does not work either, as we can subclass « Carbon » with an isotope like « C14 ». So « Carbon » is a 1-order item. But not only because it can have a subclass, as 2-order items can have subclasses too. That is not enough. author  TomT0m / talk page 19:08, 30 March 2022 (UTC)
@TomT0m: Whether or not there's a C14 atom that's notable enough to be its own instance in Wikidata doesn't change the fact that there are instances of the class of C14 atoms and thus it's part of the nature of C14 to be a class. ChristianKl 14:11, 19 July 2022 (UTC)
@ChristianKl I can’t make sense of the sentence « it’s part of the nature of C14 to be a class ». author  TomT0m / talk page 14:50, 19 July 2022 (UTC)
A class is something that has instances. C14 is an entity that has a lot of instances in the real world. Thus its nature is being a class. ChristianKl 15:14, 19 July 2022 (UTC)
Completely agree! Lectrician1 (talk) 18:00, 23 July 2022 (UTC)
It is very seductive to require that every class has an order. But there are two problems with this. First, what does one do with the class of all classes? This class does not have an order so you would have to exclude it from Wikidata. But there is also the problem that you have to determine the order of each and every class. What is the order of gene (Q7187)? Is it a first-order class or a second-order class? One way out of this difficulty is to just not state the order, but require that there be one. So gene (Q7187) is a first-order class if its instances don't have their own instances and a higher-order class otherwise. This gives some flexibility while still preserving some regularity, but does have the consequence that, for example, all genes would switch from tokens to first-order classes when a single gene gets an instance. And it also just reflects the situation in Wikidata, not any external reality. So maybe the best way forward is to allow classes without an order (like class itself or the class consisting of the classes mentioned in this response) but also allow an order to be specified, so that gene (Q7187) could be stated to be a second-order class (along with quite a few other classes like ship type (Q2235308)). Peter F. Patel-Schneider (talk) 15:53, 8 July 2022 (UTC)
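
The idea of an order that is derived rather than stated (an item's order follows from whether its instances have instances of their own) can be sketched in Python. The sketch assumes a hypothetical mapping from each class to its direct instances. It treats the 0-order to 3-order brain example from earlier in this thread as if each item were a direct instance of the next, which is an assumption, and adds a made-up self-instantiating item `CLASS` to show a class with no order.

```python
def derived_order(item, instances_of, _path=()):
    """0 for an item with no recorded instances, otherwise 1 + the highest
    order among its instances. Returns None when an instance-of cycle
    makes the order undefined."""
    if item in _path:
        return None  # the item is (indirectly) an instance of itself
    orders = []
    for inst in instances_of.get(item, ()):
        order = derived_order(inst, instances_of, _path + (item,))
        if order is None:
            return None
        orders.append(order)
    return 1 + max(orders) if orders else 0


# Hypothetical "class -> direct instances" data (see the assumptions above).
instances_of = {
    "Q1073": ["Q2464312"],        # brain <- Albert Einstein's brain
    "Q103812529": ["Q1073"],      # organ type <- brain
    "Q103997018": ["Q103812529"], # direct anatomical metaclass <- organ type
    "CLASS": ["CLASS"],           # made-up class that contains itself
}

orders = {
    item: derived_order(item, instances_of)
    for item in ["Q2464312", "Q1073", "Q103812529", "Q103997018", "CLASS"]
}
print(orders)
```

Because the result depends only on which instances happen to be recorded, adding one instance to a former token raises the order of everything above it, which is the instability described in the comment above.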
"First, what does one do with the class of all classes? This class does not have an order so you would have to exclude it from Wikidata." this sounds like a thought that comes from not thinking in terms of Wikidata's needs. I don't think we currently have an item for that and also don't know why we would want an item for that.
If individual genes are supposed to be subclass of (P279) of gene (Q7187) then it's a first-order class. If they are supposed to be instance of (P31) of gene (Q7187) it's a second-order class. The fact that we currently have not decided for either of those, means that instance of (P31) and subclass of (P279) get mixed and sometimes both are used. Solving the problem likely requires a separate item for the first-order and the second-order class (that relate via is metaclass for (P8225)). ChristianKl 12:10, 19 July 2022 (UTC)
@ChristianKl Wikidata has quite a number of variable-order classes, including class (Q16889133) itself, but also metaclass (Q19478619), variable-order class (Q23958852), class (Q23960977), entity (Q35120), concept (Q151885), and fixed-order class (Q23959932). Should they all be removed?
I agree that gene (Q7187) should be fixed. Peter F. Patel-Schneider (talk) 17:51, 1 August 2022 (UTC)
@Peter F. Patel-Schneider It’s a bit weird to require that Wikidata does not reflect external reality when it’s kind of the whole point. The point of modelling is to reflect external reality. author  TomT0m / talk page 15:49, 1 August 2022 (UTC)
@TomT0m What external reality is not being reflected here? Peter F. Patel-Schneider (talk) 15:55, 1 August 2022 (UTC)
As you said, human brain (Q492038) can be subclassed into "male human brain" and "female human brain" for instance, thus invalidating the order number of the class. In general, I think you can always make up further subclasses of a class simply by specifying more constraints; i.e. "male human brain" -> "infant male human brain" -> "japanese infant male human brain" -> "japanese infant male human brain existing in 1920" etc. This makes specifying order for classes pointless (aside from instances which, as you note, are 0-order). Silver hr (talk) 14:15, 3 August 2022 (UTC)
I find it easiest to think about classes and instances in terms of mathematical sets. Specifically, an element of a set is analogous to an instance of a class, and one set being a subset of another is analogous to one class being a subclass of another. The key thing to note is that a set can at the same time be a subset of one set and an element of another, which corresponds to a WD item being a class and an instance at the same time. The exemplar for this, I think, is biological taxonomy: all of us humans are members of the set/instances of the class Homo sapiens sapiens (Q3238275) and through the set/class tree members/instances of Biota (Q2382443). At the same time, the set/class Homo sapiens sapiens (Q3238275) when viewed not as a set/class, but as an abstract individual is a member of the set/instance of the class taxon (Q16521).
It is also worth noting that RDFS uses sets to define classes, with the difference that while set equality is defined purely in terms of membership, class equality isn't. ("Associated with each class is a set, called the class extension of the class, which is the set of the instances of the class. Two classes may have the same set of instances but be different classes.")[1] Silver hr (talk) 14:48, 3 August 2022 (UTC)
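
The set analogy can be made concrete with Python frozensets. This is only an illustration: the individuals `alice`, `bob` and `rex` are made up, the sets merely stand in for the items named above, and `RdfsClass` is a hypothetical type used to show a class that is more than its extension.

```python
from dataclasses import dataclass

# Made-up individuals standing in for real-world tokens.
alice, bob, rex = "alice", "bob", "rex"

homo_sapiens = frozenset({alice, bob})   # a set of individuals
biota = frozenset({alice, bob, rex})     # a larger set of individuals
taxon = frozenset({homo_sapiens})        # a set whose elements are sets

# Element-of corresponds to "instance of", subset-of to "subclass of".
assert alice in homo_sapiens             # alice is an instance of the class
assert homo_sapiens <= biota             # the class is a subclass of biota
assert homo_sapiens in taxon             # the same class is an instance of taxon
assert alice not in taxon                # membership is not transitive


@dataclass(frozen=True)
class RdfsClass:
    """A class in the RDFS sense: a name plus a class extension."""
    name: str
    extension: frozenset


# Two distinct classes may share the same extension.
a = RdfsClass("A", homo_sapiens)
b = RdfsClass("B", homo_sapiens)
same_extension = a.extension == b.extension
same_class = a == b
print(same_extension, same_class)
```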

Should billionaire be used as a value of P31

I've launched a new discussion in the project chat : Wikidata:Project_chat#Should_milliardaire_(Q1062083)_be_used_as_a_value_of_nature_de_l'élément_(P31)?. - - PAC2 (talk) 06:24, 28 July 2022 (UTC)

High level Geographic classes

We've got a very longstanding confusing mess with the high level geographic classes. The current setup seems to be:

There's also the (now mostly unused) item, that has at various times messed things up:

And geographic coordinate system (Q22664), which is a subclass (indirectly) of frame of reference (Q184876) which eventually ends up at abstract entity (Q7184903), which is probably fine.

There's also the messy hierarchy off of geographic entity (Q27096213) which should get cleaned up.

And there is probably more mess I haven't noticed yet. JesseW (talk) 16:37, 31 July 2022 (UTC)

Found another one, physico-geographical object (Q20719696) which is a subclass of the usual main class, geographical feature (Q618123), but also physical object (Q223557). This might be useful, although I'm not sure the distinction is actually workable, but it isn't widely used enough, yet, I think. JesseW (talk) 16:54, 31 July 2022 (UTC)

And another problematic subclass statement on land use (Q1165944). See Talk:Q1165944#Problematic_subclass_of_spatial_planning_(Q149013) for details. JesseW (talk) 17:02, 31 July 2022 (UTC)

I think that geographic coordinate system (Q22664) is actually an abstract thing, not some physical object so it looks fine to me Mateusz Konieczny (talk) 21:30, 31 July 2022 (UTC)
Yeah, I agree that geographic coordinate system (Q22664) is fine. JesseW (talk) 22:59, 31 July 2022 (UTC)
It seems that geographic entity (Q27096213) has changed definition a few times; it used to be between geographic location (Q2221906) and geographical feature (Q618123) in the class tree, which didn't make much sense so I made some changes and eventually Infovarius made it a direct subclass of entity (Q35120). Still not sure what to do with it; it seems it is now used for anything related to geography. ―Jochem van Hees (talk) 22:10, 1 August 2022 (UTC)
I'd be fine with just merging the whole geographic entity (Q27096213) hierarchy into the main hierarchy, but I want to wait to hear from @Infovarius: and anyone else invested in it first. JesseW (talk) 00:52, 2 August 2022 (UTC)

untitled (Q75320653) classified as an event (Q13464614 ceramics have own issues)

ceramic art (Q13464614)

plastic arts (Q1078913)
sculpture (Q17310537)
visual arts (Q36649)
art (Q735)
process (Q3249551)

Mateusz Konieczny (talk) 21:11, 1 August 2022 (UTC)

Lublin County (Q912777) classified as an object that exists outside physical reality - cebwiki attacks again

Wikimedia duplicated page (Q17362920) - it seems to be metadata about the Wikidata entry, not about the corresponding real object... And the easy solution of merging will fail, as cebwiki apparently generated the same entry multiple times

Wikimedia article page (Q15138389)
open content (Q1293664)
content (Q12488383)
abstract entity (Q7184903)
abstract entity (Q7048977)

Mateusz Konieczny (talk) 21:13, 1 August 2022 (UTC)

Maybe only the cebwiki entry should get this? Mateusz Konieczny (talk) 10:37, 2 August 2022 (UTC)

"Field" properties

Properties such as field of work (P101) and field of this occupation (P425) have been around for several years and have proven very useful to establish a relationship between people, organizations, occupations and their respective domains of activities. However there are other types of things that would benefit from being associated with a field, for example, awards. I submitted a proposal for a "field of this award" property. This property proposal received opposition because it is deemed to be too specific. Discussion participants proposed a more generic property instead, and one participant is suggesting the merger of field of work (P101) and field of this occupation (P425). A second proposal for a "field of this item" property was made and the discussion on a possible merger is happening there. The thoughts of participants in this WikiProject would be more than welcome. Fjjulien (talk) 15:47, 2 August 2022 (UTC)

We need your help with the Wikidata for Education project

Dear Wikimedians, we are in the next stage of the Wikidata for Education project, after a broader consultation with the Wikidata community and experts on ed tech, OER, global and national curriculum, education policy, and digitization. Over the two rounds of the consultation, we received input from 31 individuals representing various global perspectives and areas of expertise to aid in the full implementation of the Wikidata for Education project. We are reaching out to you today because we know you have experience, expertise and interests that are very relevant to a project the Wikimedia Foundation Education Team is collaborating on with UNESCO and local partners in Ghana, which will establish a precedent for adding curriculum data to Wikidata. You can learn all about this project here. Dnshitobu (talk) 12:43, 1 September 2022 (UTC)

IEEE Taxonomy and Thesaurus

Please comment about IEEE Thesaurus (Q113673155) at Talk:Q113673155 -- Vladimir Alexiev (talk) 13:02, 1 September 2022 (UTC)

All watercraft classified as abstract objects

USS Niagara (Q7872265) as starting point

museum ship (Q575727)

ship (Q11446)
watercraft (Q1229765)
float (Q50380212)
support (Q1058733)
physical interface (Q64830866)
interface (Q110558466)
means (Q12894677)
cause (Q2574811)
source (Q31464082)
abstract entity (Q7184903)
abstract entity (Q7048977)

See https://www.wikidata.org/wiki/User:Mateusz_Konieczny/failing_testcases for other similar detected failing cases (feel free to edit/copy that page if you want)

If such issues get fixed, or are revealed to be mistakes in how I process Wikidata, I would be happy to post more if anyone here wants more examples of broken transitive classification.

Mateusz Konieczny (talk) 12:09, 1 August 2022 (UTC)

Looks like the bug there is the introduction of "float" (and more specifically, "support", that is already marked as deprecated). I (or you, if you want) can remove those, which should fix this issue. Feel free to bring up more! JesseW (talk) 12:57, 1 August 2022 (UTC)
Fixed in https://www.wikidata.org/w/index.php?title=Q50380212&diff=1693653958&oldid=1642512254 ! Mateusz Konieczny (talk) 16:31, 1 August 2022 (UTC)
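
The kind of transitive check that surfaces a chain like the one listed above can be sketched in Python. The sketch assumes each line of that listing is a direct link to the next line (the first an instance-of link, the rest subclass-of links). `find_path` is a hypothetical helper, and dropping the outgoing link of float (Q50380212) is only a stand-in for whatever the linked edit actually changed.

```python
from collections import deque


def find_path(start, target, parents):
    """Breadth-first search over parent links; returns the shortest list of
    items leading from start to target, or None if target is unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in parents.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# The chain as listed in this section, top to bottom.
chain = [
    "Q7872265", "Q575727", "Q11446", "Q1229765", "Q50380212", "Q1058733",
    "Q64830866", "Q110558466", "Q12894677", "Q2574811", "Q31464082",
    "Q7184903", "Q7048977",
]
parents = {child: [parent] for child, parent in zip(chain, chain[1:])}

ABSTRACT_ENTITY = "Q7184903"
before = find_path("Q7872265", ABSTRACT_ENTITY, parents)

# Assumed stand-in for the fix: float (Q50380212) loses its outgoing link.
parents_fixed = {k: v for k, v in parents.items() if k != "Q50380212"}
after = find_path("Q7872265", ABSTRACT_ENTITY, parents_fixed)

print(before)
print(after)
```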

  Mateusz Konieczny (talk) 01:24, 5 October 2022 (UTC)

Marian Column in Kłodzko (Q3894014) classified as an event

Maria column (Q2713614)

Marian and Holy Trinity column (Q1549521)
wayside shrine (Q3395121)
small monument (Q3370053)
monument (Q4989906)
recognition (Q7302601)
thanking (Q83493482)
occurrence (Q1190554)

this affects monuments in general Mateusz Konieczny (talk) 16:32, 1 August 2022 (UTC)

Fixed by removing recognition (Q7302601) (and "cultural heritage", although that created fewer problems). JesseW (talk) 17:46, 1 August 2022 (UTC)

  Mateusz Konieczny (talk) 01:20, 5 October 2022 (UTC)

Sameness of concepts

I think it might be a good idea to establish best practices on how to deal with the sameness of concepts on Wikidata. I've investigated this recently and here's my understanding.

  1. When there is a concept represented by a WD item and an identifier in another ontology, there are the following properties:
    1. For classes, equivalent class (P1709), declared to be the equivalent of owl:equivalentClass.
    2. For properties, equivalent property (P1628), declared to be the equivalent of owl:equivalentProperty.
    3. For instances, I was unable to find a property.
      1. Is there a dedicated property for instances that I missed? Should there be one?
      2. Should we use equivalent class (P1709)?
      3. Should we use the superproperty of equivalent class (P1709), exact match (P2888), declared to be the equivalent of skos:exactMatch?
  2. When there is a concept represented by two WD items, there is the following:
    1. Merging, which causes one item to be redirected to another. This gets exported to RDF as owl:sameAs.
    2. permanent duplicated item (P2959), when merging isn't possible.
  3. When there are two concepts, represented by two WD items, for which some sources claim that they are the same concept, there is said to be the same as (P460).
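
The list above can be read as a small decision table. This is a minimal Python sketch with hypothetical names (`sameness_property`, and the `target` and `kind` labels). It returns `None` for instances in another ontology, since that case is left as an open question above.

```python
from typing import Optional

# Point 1: a WD item and an identifier in another ontology.
EXTERNAL_PROPERTY = {
    "class": "P1709",     # equivalent class
    "property": "P1628",  # equivalent property
}


def sameness_property(target: str, kind: str = "", mergeable: bool = True) -> Optional[str]:
    """Suggest how to record sameness, following the three points above.

    target: "external"  -> WD item vs. identifier in another ontology
            "duplicate" -> two WD items representing one concept
            "claimed"   -> two concepts that some sources say are the same
    """
    if target == "external":
        # No dedicated property for instances is identified in the list.
        return EXTERNAL_PROPERTY.get(kind)
    if target == "duplicate":
        # Merge when possible, otherwise permanent duplicated item.
        return "merge" if mergeable else "P2959"
    if target == "claimed":
        return "P460"  # said to be the same as
    raise ValueError(f"unknown target: {target!r}")


print(sameness_property("external", "class"))
print(sameness_property("duplicate", mergeable=False))
```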

Regarding references: typically, statements on Wikidata are about the concepts items represent, and I am of the opinion that such statements should have references. However, identifier statements, and statements made with the properties in points 1 and 2, are different: they are statements about the items themselves, and in my opinion, it makes no sense to require (or even allow) references for them. The reason being, unlike statements about concepts which are claimed by people (sources) outside Wikidata, it is the WD editors themselves that are making the claims about WD items (and every such edit is attributable through the revision history).

Note that the outlier here is said to be the same as (P460). While seemingly a property to indicate the sameness of concepts, it is crucially different from the other properties mentioned earlier: it relates concepts, not WD items. Its purpose is to record statements of concept equality published by people outside Wikidata.

Finally, when the WD editor is reasonably certain, but not completely sure, that two WD items represent the same concept, this could be indicated with a nature of statement (P5102) hypothesis (Q41719) qualifier on a permanent duplicated item (P2959) statement.

Does anyone have any suggestions? Have I left anything out? Silver hr (talk) 00:44, 5 October 2022 (UTC)
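The case distinction in the post above can be written down as a small lookup. This is only an illustrative sketch in Python: the function name and the scope/kind labels are made up for this example, and the "instance" case is left undecided because the post itself raises it as an open question.

```python
from typing import Optional


def sameness_property(scope: str, kind: Optional[str] = None) -> str:
    """Return the mechanism the post suggests for a given situation.

    scope: "external" (WD item vs. identifier in another ontology),
           "internal" (two WD items for one concept), or
           "claimed"  (two concepts that outside sources say are the same).
    kind:  for "external": "class", "property" or "instance";
           for "internal": "mergeable" or "unmergeable".
    These labels are hypothetical and exist only for this sketch.
    """
    if scope == "external":
        table = {
            "class": "P1709 (equivalent class)",
            "property": "P1628 (equivalent property)",
            # Open question in the post: no dedicated property was found.
            "instance": "undecided (candidates: P1709, P2888)",
        }
        return table[kind]
    if scope == "internal":
        table = {
            "mergeable": "merge (redirect, exported as owl:sameAs)",
            "unmergeable": "P2959 (permanent duplicated item)",
        }
        return table[kind]
    if scope == "claimed":
        return "P460 (said to be the same as)"
    raise ValueError(f"unknown scope: {scope!r}")


print(sameness_property("external", "class"))
print(sameness_property("claimed"))
```

Only the "claimed" branch describes a statement about concepts, which is why it is the only one where the post would expect references.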

Farragut Houses (Q22329573) - claimed to be nonphysical entities

en: public housing (residential properties usually owned by a government) [2]

en: rented accommodation [3]
en: renting (agreement where a payment is made for the temporary use of a good, service or property owned by another) [4]
en: contract (agreement having a lawful object entered into voluntarily by multiple parties (may be explicitly written or oral)) [5]
en: agreement (understanding between entities to follow a specific course of conduct) [6]
en: consensus (general agreement on a subject) [7]
en: relation (general relation between different objects or individuals) [8]
en: abstract object (object with no physical referents) [9]
en: non-physical entity (object that exists outside physical reality) [10]

Mateusz Konieczny (talk) 01:21, 5 October 2022 (UTC)

rental housing (Q4315279) shouldn't be a subclass of renting (Q157171). That is the means, not the parent. wd-Ryan (Talk/Edits) 00:18, 6 October 2022 (UTC)
Applied in https://www.wikidata.org/w/index.php?title=Q4315279&diff=1744601839&oldid=1714379587 Mateusz Konieczny (talk) 17:32, 6 October 2022 (UTC)

  Mateusz Konieczny (talk) 17:43, 6 October 2022 (UTC)

Tomb of Darius II (Q5952161) - tomb is an event, apparently

en: rock-cut tomb (tomb cut out in rock) [11]

en: rock-cut architecture (creation of structures, buildings, and sculptures by excavating solid rock) [12]
en: architecture (both the process and product of planning, designing and construction) [13]
en: design (creation of a plan or convention for the construction of an object or a system; process of creation; act of creativity and innovation) [14]
en: process (series of events which occur over an extended period of time) [15] banned as it is an event

Mateusz Konieczny (talk) 17:42, 6 October 2022 (UTC)

  Mateusz Konieczny (talk) 12:07, 9 October 2022 (UTC)

Grunwald Monument (Q11823211) - claimed to be an event (process, not the resulting work)

en: equestrian statue (statue of a rider mounted on a horse) [16]

en: equestrian portrait (art genre that shows the subject on horseback) [17]
en: animal art (artistic theme of reproducing animals in art) [18]
en: figurative art (art that depicts real object sources) [19]
en: visual arts (art form which creates works that are primarily visual in nature) [20]
en: art (the process of creating an expressive work intended to be appreciated for its beauty or emotional power; NOT the resulting work) [21]
en: process (series of events which occur over an extended period of time) [22]

Mateusz Konieczny (talk) 01:23, 5 October 2022 (UTC)

  Done Conflation of object and genre of object. عُثمان (talk) 18:18, 16 October 2022 (UTC)

market hall is skill

en: market hall (covered space traditionally used as a marketplace) [23]

en: marketplace (space in which a market operates) [24]
en: retail environment (type of environment) [25]
en: land use (characterization of land based on what can be built on it and what the land can be used for) [26]
en: spatial planning (technique for physical organisation of space) [27]
en: planning (process of determining the activities required to achieve a desired goal) [28]
en: process (series of events which occur over an extended period of time) [29] banned as it is an event !!!!!!!!!!!!!!!!!!!!!!!!!!
en: skill (learned ability to carry out a task) [30]

(rooted in Industry City (Q5001422)) Mateusz Konieczny (talk) 08:09, 7 October 2022 (UTC)

  Done Required detangling land use (process of using land) from types of place used in a particular way. عُثمان (talk) 18:11, 16 October 2022 (UTC)

light rail system (Q1268865) is data visualization (Q6504956)

en: light rail (typically an urban form of public transport using steel-tracked fixed guideways) [31]

en: rapid transit (high-capacity public transport generally used in urban areas) [32]
en: urban rail transit (term for various types of local rail systems) [33]
en: public transport network (network for public transport) [34]
en: transport network (physical spacial network for vehicle movement and transportation of goods over thoroughfares between multiple locations) [35]
en: geographic network (concept in geography) [36]
en: spatial network (graph in which the vertices or edges are spatial elements associated with geometric objects) [37]
en: graph (mathematical structure; representation of a set of objects where some pairs of the objects are connected by links) [38]
en: multigraph (graph which is permitted to have multiple edges) [39]
en: hypergraph (a graph in which generalized edges may connect more than two nodes) [40]
en: diagram (plan, drawing, sketch or outline to show how something works or the relationships between the parts of a whole) [41]
en: data visualization (creation and study of the visual representation of data) [42]

So Kraków Fast Tram (Q1814872) is also data visualization (Q6504956)

Mateusz Konieczny (talk) 08:11, 7 October 2022 (UTC)

Have tentatively fixed this, but not 100% sure if I did it right عُثمان (talk) 13:46, 15 October 2022 (UTC)

ceramic picture on the wall is non-physical entity

untitled (Q75320653)

en: ceramics (art objects such as figures, tiles, and tableware made from clay and other raw materials by the process of pottery) [43]

en: plastic arts (form of art form based on the creation and modification of physical objects) [44]
en: sculpture (manufacture of sculptures in arts and crafts) [45]
en: visual arts (art form which creates works that are primarily visual in nature) [46]
en: art (the process of creating an expressive work intended to be appreciated for its beauty or emotional power; NOT the resulting work) [47]
en: process (series of events which occur over an extended period of time) [48] banned as it is an event !!!!!!!!!!!!!!!!!!!!!!!!!!
en: arts (human expression and creativity, usually influenced by culture) [49]
en: humanities (academic disciplines that study human society and culture) [50]
en: knowledge (mental possession of information or skills, contributing to understanding) [51]
en: memory (information stored in the mind, including facts, knowledge, skills, and episodic memories) [52]
en: content (matter or entity that is contained) [53]
en: abstract object (object with no physical referents) [54]
en: non-physical entity (object that exists outside physical reality) [55] banned as it is an object that exists outside physical reality !!!!!!!!!!!!!!!!!!!!!!!!!!

(here two branches lead to two different suspect classifications)

Mateusz Konieczny (talk) 08:13, 7 October 2022 (UTC)

  Done It seems like there are many conflations of art genres, methods, media, and areas of work, with the types of works themselves. I fixed the mural item so that it is a type of painting (object) rather than method of painting (process), and moved the ceramics item to a statement about genre. عُثمان (talk) 17:48, 16 October 2022 (UTC)
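Reports like the one above boil down to walking subclass of (P279) upwards until a banned class is reached. Below is a minimal Python sketch of that idea, not the actual tool that produced these listings. The graph is hand-copied from the chain in this thread, and the assumption that the two branches split at "art" is a guess, since the listing does not preserve indentation.

```python
from collections import deque

# Toy subclass-of (P279) edges, copied by hand from the chain above.
# Assumption: the branch point is "art"; the original listing does not show it.
SUBCLASS_OF = {
    "ceramics": ["plastic arts"],
    "plastic arts": ["sculpture"],
    "sculpture": ["visual arts"],
    "visual arts": ["art"],
    "art": ["process", "arts"],
    "arts": ["humanities"],
    "humanities": ["knowledge"],
    "knowledge": ["memory"],
    "memory": ["content"],
    "content": ["abstract object"],
    "abstract object": ["non-physical entity"],
}

BANNED = {"process", "non-physical entity"}


def paths_to_banned(start, graph, banned):
    """Breadth-first search upwards; one shortest path per reachable banned class."""
    found = {}
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in banned:
            found[node] = path
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return found


for target, path in paths_to_banned("ceramics", SUBCLASS_OF, BANNED).items():
    print(target, "<-", " -> ".join(path))
```

Printing the whole path, rather than just a yes/no answer, is what makes it possible to spot the one wrong link in the chain.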

mountain chain (Q2624046) is not a geographical feature (Q618123)

Not sure if this is the place to report.

I've found an instance of mountain chain (Q2624046) which produces a warning in https://www.wikidata.org/wiki/Q97288049#P706 . At first glance Q2624046 seems well defined, so I don't understand what the problem is. —Ismael Olea (talk) 16:36, 16 October 2022 (UTC)

@Olea: This warning was due to Conjunto de Máquina de Vapor fija de extracción minera (Q97288049) itself not being an instance of a geographic feature rather than the mountain chain. I added instance of "mine" which should fix this. (I did remove a redundant statement from mountain chain, but that was not causing any issue.)
Granted, I am not sure why this item is instance of "steam engine." Without speaking Spanish, I am not sure what the sources are saying about this so I left that statement alone. Is this supposed to be an instance of a certain type of mine that uses steam equipment? -عُثمان (talk) 17:28, 16 October 2022 (UTC)
I can confirm it is the machine. The previous mine statement was an error from me, after a suboptimal data treatment. What is true is the machine is in an abandoned mine but the heritage item is the machine. —Ismael Olea (talk) 18:16, 16 October 2022 (UTC)
Oh, now I understand what the warning I reported is about. I didn't understand it was about Q97288049. Sorry. —Ismael Olea (talk) 18:18, 16 October 2022 (UTC)
I was confused myself at first! Good fix, stationary steam engine makes sense. عُثمان (talk) 01:56, 17 October 2022 (UTC)

Cruzeiro de Santa Cruz (Q63895140) is an object that exists outside physical reality

type https://www.wikidata.org/wiki/Q2309609 en: wayside cross (cross by a footpath, track or road) [56]

en: Christian cross (symbol of Christianity) [57]
en: cross (geometrical figure) [58]
en: geometric shape (geometric information which remains when location, scale, orientation and reflection are removed from the description of a geometric object) [59]
en: set (well-defined mathematical collection of distinct objects) [60]
en: formalization (automated representation of a system) [61]
en: representation (role, function or property of an abstract or real object, relation or changes) [62]
en: relation (general relation between different objects or individuals) [63]
en: abstract object (object with no physical referents) [64]
en: non-physical entity (object that exists outside physical reality) [65] banned as it is an object that exists outside physical reality !!!!!!

@عُثمان: - thanks so much for your fixes! I have not looked in detail, but once I rerun test cases it will check whether something else unwanted is present. Do you want to be pinged once I find more cases like this? Or do you prefer to just watchlist this page? (or do something else)?

Mateusz Konieczny (talk) 22:41, 16 October 2022 (UTC)

@Mateusz Konieczny Pings are helpful, sometimes I find it hard to keep track of updates otherwise. عُثمان (talk) 23:37, 16 October 2022 (UTC)
  Done Above issue addressed by reconfiguring around the existing cross (Q21550515) item for physical crosses rather than the symbolic or geometric concept of crosses.
In the interest of documenting on items why certain statements cause problems for ontology, I have created a few items to use as values for "reason for deprecated rank." It is not always obvious where there is a different item for a class of concepts and a class of objects, so maybe this will help show the distinction more clearly.
Feel free to adjust the labels on these. عُثمان (talk) 02:17, 17 October 2022 (UTC)

Green Line (Q3720557) is a scalar magnitude (Q28733284) and process (Q3249551)

type https://www.wikidata.org/wiki/Q15079663 en: rapid transit railway line (type of railway line) [66]

en: rapid transit train service (type of train service) [67]
en: passenger train service (passenger transport by a specific train following a specific route at regular times) [68]
en: train service (passenger or good transport by a specific train following a specific route at regular times) [69]
en: public transport (shared transportation service for use by the general public) [70]
en: transportation [71]
en: displacement (vector that is the shortest distance from the initial to the final position of a point P) [72]
en: length (measured dimension of an object in a physical space) [73]
en: scalar magnitude (inherently non-negative scalar measurement or quantity, with non-arbitrary zero point; a measure of the "size" of something) [74]
en: measure (function assigning numbers to some subsets of a set, which could be seen as a generalization of length, area, volume and integral) [75]
en: additive function [76]
en: additive object (additively-composing abstract object) [77]
en: abstract object (object with no physical referents) [78]
en: non-physical entity (object that exists outside physical reality) [79] banned as it is an object that exists outside physical reality !!!!!!!!!!!!!!!!!!!!!!!!!!
en: process (series of events which occur over an extended period of time) [80] banned as it is an event !!!


@عُثمان:

Mateusz Konieczny (talk) 12:49, 17 October 2022 (UTC)

United States of America (Q30) is a statement

type https://www.wikidata.org/wiki/Q99541706

en: historical unrecognized state (nonexistent state that lacked recognition during its period of existence) [81]

en: historical country (country, state or territory that once existed) [82]
en: historical administrative division (administrative division which existed in the past, that may or may not still exist. (Use subclass Q19953632 for divisions which no longer exist).) [83]
en: historical fact [84]
en: fact (statement in accordance with the real world) [85]
en: statement (meaningful declarative sentence that is either true or false, or that which a true or false declarative sentence asserts) [86]
en: proposition (non-linguistic meaning of a sentence) [87]
en: declarative sentence (declaration) [88]
en: sentence (textual unit consisting of one or more words that are grammatically linked, expressing a complete thought in non-functional linguistics) [89]
en: semantic unit (linguistic unit that carries meaning) [90]
en: constituent (word or a group of words that functions as a single unit within a hierarchical structure) [91]
en: emic unit (type of abstract object analyzed in linguistics) [92]
en: linguistic unit (unit of language) [93]
en: mental object (object whose space of existence is the mind; item that is thought of as being "in" the mind, and capable of being formed and manipulated by mental processes and faculties: thoughts, concepts, memories, emotions, percepts and intentions) [94]
en: abstract object (object with no physical referents) [95]
en: non-physical entity (object that exists outside physical reality) [96]

Mateusz Konieczny (talk) 12:51, 17 October 2022 (UTC)

tribe vs human

See https://www.wikidata.org/wiki/Wikidata:Project_chat#Loophole? and https://www.wikidata.org/wiki/Wikidata:Project_chat#Example_of_a_found_problem:_Tulalip_Tribes_of_Washington_(Q1516298)_-_tribe_is_human_according_to_Wikidata Mateusz Konieczny (talk) 22:02, 17 October 2022 (UTC)

Both genre and occupation

Only 27, but some items are both a genre (Q483394) and an occupation (one of occupation (Q12737077), profession (Q28640), position (Q4164871)). This doesn’t seem plausible (examples include Ganguro (Q250160), social media (Q202833), or furniture construction (Q1957814)), but currently I don’t have time to investigate this, thus I post it here instead. --2A02:8108:50BF:C694:6C06:D6A3:9A84:953F 10:05, 25 October 2022 (UTC)

Occupations/professions should only be things like "photographer" instead of "photography". I think the problem lies in economic activity (Q8187769), where occupation is the parent class. This shouldn't be the case because occupations are labels and not the actual activity. -wd-Ryan (Talk/Edits) 15:54, 25 October 2022 (UTC)
  Done -wd-Ryan (Talk/Edits) 14:19, 2 November 2022 (UTC)

Look at all the things MBA is!

Master of Business Administration (Q191701): biological process, educational institution, sourcing circumstance, unit of measurement, first principle, HTML document … (Sorry for spamming a nursing case again, definitely don’t have time to investigate this myself…) --2A02:8108:50BF:C694:457C:3035:F49E:E2ED 17:46, 30 October 2022 (UTC)

It's because professional certification (Q16023913) is a subclass of certification (Q374814). certification (Q374814) is the process of certifying something, not the end result. I changed it to certificate (Q196756). -wd-Ryan (Talk/Edits) 14:23, 2 November 2022 (UTC)
certificate (Q196756) is, in turn, identified as a subclass of linguistic unit (Q20817253). I think the erroneous link in that chain is fact (Q188572) subclass of (P279) statement (Q2684591). A fact is the truth to which a statement refers, not the statement itself - I am changing it accordingly. Swpb (talk)
This chain also reveals a subclass loop: academic degree (Q189533) -> academic title (Q3529618) -> professional certification (Q16023913) -> academic degree (Q189533). That last link is definitely wrong, but the others are suspect too. Swpb (talk) 19:25, 2 November 2022 (UTC)
Other bad statements revealed and removed: legal instrument (Q3150005) subclass of law (Q7748). Swpb (talk) 19:30, 2 November 2022 (UTC)
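The subclass loop mentioned above is the kind of thing a depth-first search can find mechanically. A minimal sketch in Python follows, using only the three items named in this thread as a toy graph. It is not how any existing Wikidata tool does it, and the recursive form would need rewriting for very deep hierarchies because of Python's recursion limit.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for parent in graph.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # parent is already on the current path: a loop is closed
                return stack[stack.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# subclass of (P279) edges as described in the comment above
LOOP = {
    "Q189533": ["Q3529618"],    # academic degree -> academic title
    "Q3529618": ["Q16023913"],  # academic title -> professional certification
    "Q16023913": ["Q189533"],   # professional certification -> academic degree
}

print(find_cycle(LOOP))
```

Dropping the last edge, the one called "definitely wrong" above, makes the function return None for this toy graph.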

Berna fountain (Q822122) is a technique

type https://www.wikidata.org/wiki/Q207174 en: personification (artistic or literary device in which an abstraction is represented by a person) [97]

en: stylistic device (technique used to give an auxiliary meaning, idea, or feeling to a literal message) [98]
en: rhetorical device (technique that an author or speaker uses with the goal of persuading) [99]
en: artistic technique (method by which art is produced) [100]
en: technique (sum of techniques, skills, methods, and processes used in the production of goods or services or in the accomplishment of objectives, such as scientific investigation) [101]
en: means (means by which an item performs a function) [102]
en: action (something an agent can do or perform) [103] banned as it is an event !!!!!!!!!!!!!!!!!!!!!!!!!!
en: process (series of events which occur over an extended period of time) [104] banned as it is an event !!!!!!!!!!!!!!!!!!!!!!!!!!

Mateusz Konieczny (talk) 06:22, 28 November 2022 (UTC)

I don't see this anymore... Lectrician1 (talk) 15:33, 28 November 2022 (UTC)
Looks like   Resolved in https://www.wikidata.org/w/index.php?title=Q822122&diff=1780319088&oldid=1566049204 Mateusz Konieczny (talk) 17:11, 28 November 2022 (UTC)


lie (Q87720384) (specific sculpture) is an event, according to Wikidata ontology

en: allegorical sculpture (type of sculpture) [105]

en: allegory (pictorial representation of a figure to represent an idea or institution) [106]
en: art (expressive work intended to be appreciated for its beauty or emotional power; or the process of creating such a work) [107]
en: process (series of events which occur over an extended period of time) [108] banned as it is an event !!!!!!!!!!!!!!!!!!!!!!!!!!

Mateusz Konieczny (talk) 20:48, 29 November 2022 (UTC)

Changed "art"->representation (Q11795009) (not an art by itself). --Infovarius (talk) 20:55, 29 November 2022 (UTC)

Sounds   Resolved then Mateusz Konieczny (talk) 21:12, 29 November 2022 (UTC)

non-physical entity problems

cemetery (Q39614) is now formalization and non-physical entity

San Joaquin Campo Santo (Q30593659)

type https://www.wikidata.org/wiki/Q39614 en: cemetery (place of burial) [109]

en: architectural ensemble (group of multiple related objects, such as buildings) [110]
en: group of structures or buildings (structures or buildings that do not form a building complex, but are treated as a group) [111]
en: group of geographic locations (set of several geographic entities spread over a geographic region) [112]
en: geographic region (2D or 3D defined space on something, mainly in terrestrial and astrophysics sciences) [113]
en: region in space (2D or 3D region in our universe) [114]
en: region (space greater than one dimension. Not a point. 2D, 3D, 4D, etc. space) [115]
en: locus (set of points whose location satisfies or is determined by one or more specified conditions) [116]
en: set (well-defined mathematical collection of distinct objects) [117]
en: formalization (automated representation of a system) [118]
en: representation (role, function or property of an abstract or real object, relation or changes) [119]
en: relation (general relation between different objects or individuals) [120]
en: abstract object (object with no physical referents) [121]
en: non-physical entity (object that exists outside physical reality) [122] banned as it is an object that exists outside physical reality !!!!!!!!!!!!!!!!!!!!!!!!!!

Mateusz Konieczny (talk) 19:20, 27 November 2022 (UTC)

Cavite–Laguna Expressway (Q5055176) and all roads also are all set (Q36161) and abstract entity (Q7048977) Mateusz Konieczny (talk) 19:22, 27 November 2022 (UTC)

every city such as Altona (Q1630) is now abstract entity (Q7048977) Mateusz Konieczny (talk) 19:24, 27 November 2022 (UTC)

Williams Loop (Q2581240) and all railway (Q22667) are now also set (Q36161) and abstract entity (Q7048977) @عُثمان: as you requested Mateusz Konieczny (talk) 19:26, 27 November 2022 (UTC)


Well that's because User:Swpb added locus (Q211548) to region of space (Q110910055). This is what happens when you mix mathematical entities with normal ones. I don't even know what a locus is. Lectrician1 (talk) 15:24, 28 November 2022 (UTC)
A locus is just a set of points. They could be points in a physical space or a mathematical space. As such, a region of physical space is a locus. The problem is that locus (Q211548), set (Q36161), etc. conflate the thing being defined with the definition of that thing. The points (and the space) can be physical or not; it's the definition of those points (or space) that is an inherently non-physical abstraction. And it's not exactly clear where that line should be drawn. Is a physical region a type of locus, which is in turn defined by an abstract set? Or is a physical region defined by an abstract locus, which is a type of set? And what property would express "is defined by"? (manifestation of (P1557)?) These are not questions with definite answers. There has to be a decision on whether locus (Q211548) on Wikidata refers to a mathematical definition (non-physical) or a thing so defined (possibly physical), and the statements then have to be made consistent with that. Swpb (talk) 16:10, 28 November 2022 (UTC)
@Swpb The « reality / model » relationship should be done by a property such as « is modeled by » (that we might have created at some point). Physical space is modelled by a Riemann space in the theory of general relativity. This avoids a strong commitment to a scientific ontology and makes clear what is in the domain of the theory and what is in the domain of what is described by the theory. author  TomT0m / talk page 10:58, 30 November 2022 (UTC)
I don't see any existing property like "is modeled by", which, yes, would be perfect for this - ping me if you decide to propose one! Anyway, I've removed the offending subclass of (P279) statement. Swpb (talk) 14:13, 30 November 2022 (UTC)

  Resolved - as far as I can see Mateusz Konieczny (talk) 18:18, 1 December 2022 (UTC)
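This thread shows how one subclass of (P279) statement can reclassify everything below it. A minimal Python sketch that compares what reaches a banned class before and after a single edge is added. The graph is a heavily simplified assumption: intermediate classes from the chain above are collapsed, and placing "road" and "city" directly under "geographic region" is for illustration only.

```python
def ancestors(node, graph):
    """All classes reachable from node by following subclass-of edges upwards."""
    seen = set()
    todo = [node]
    while todo:
        current = todo.pop()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def newly_affected(graph, child, parent, banned):
    """Classes that reach `banned` only once the edge child -> parent exists."""
    after = {k: list(v) for k, v in graph.items()}
    after.setdefault(child, []).append(parent)
    nodes = set(after) | {p for ps in after.values() for p in ps}
    return {
        n for n in nodes
        if banned in ancestors(n, after) and banned not in ancestors(n, graph)
    }


# Simplified toy hierarchy (an assumption, not the real Wikidata graph)
BEFORE = {
    "cemetery": ["geographic region"],
    "road": ["geographic region"],
    "city": ["geographic region"],
    "geographic region": ["region of space"],
    "locus": ["set"],
    "set": ["abstract object"],
    "abstract object": ["non-physical entity"],
}

print(sorted(newly_affected(BEFORE, "region of space", "locus", "non-physical entity")))
```

Running a check like this before saving a new subclass statement near the top of the hierarchy would show its blast radius in advance.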

Filipovský pramen (Q47037286) is an event, according to Wikidata ontology

en: river source (starting point of a river) [123]

en: beginning of a watercourse [124]
en: beginning (place where something begins) [125]
en: occurrence (occurrence of a fact or object in space-time; instantiation of a property in an object) [126] banned as it is an event !!!!!!!!!!!!!!!!!!!!!!!!!!

Mateusz Konieczny (talk) 17:59, 1 December 2022 (UTC)

beginning (Q529711) describes a point in space, so it shouldn't be a subclass of occurrence (Q1190554). Maybe someone thought it meant "beginning" as in start time (Q24575110). -wd-Ryan (Talk/Edits) 18:02, 1 December 2022 (UTC)
Edited that way so   Resolved Mateusz Konieczny (talk) 18:06, 1 December 2022 (UTC)