Wikidata talk:WikiProject Ontology/Archive for 2023

[Survey for Wikidata data Reusers] Help us Improve Wikidata by Sharing Your Experience on Ontology Issues and Data Reuse Impact

Hello,

We are conducting a survey to better understand the ontology issues within Wikidata and their impact on data reuse.

You may recall that in 2021 and 2022 we ran the Data Quality Days, which generated a lot of very useful discussions on the processes around increasing/maintaining data quality and utility on Wikidata. We identified various types of ontology issues, and we would like to get input on which of these are the most problematic for you in your use of Wikidata's data.

This survey has two sections and will take 20–25 minutes to complete. Optional open-ended questions may lengthen the completion time.

In the first section, we will present descriptions of the ontology issues we have found. We will ask you to evaluate the impact of these issues on your work, and you are also invited to share any other ontology issues you have detected in case we missed them.

The second section focuses on how you use data from Wikidata. Most of the questions in this section are optional, meaning you do not have to share details of your work unless you find them relevant to the issues you would like to share with us. They are helpful to us in understanding the context of your issues better, however.

This survey is anonymous, but there is an optional email field at the end for follow-up questions. Providing an email will make your responses not completely anonymous. A summary of the results of the survey will be published as a whole at Wikidata:Ontology issues prioritization and will not include any identifying information.

If you would like to participate, please use this link (LamaPoll): https://wikimedia.sslsurvey.de/ontology-issues/

We kindly request your participation by Friday, February 17th at 23:59 UTC.

If you have any questions, please do not hesitate to let us know by replying directly to this message or leaving a note at Wikidata talk:Ontology issues prioritization.

Many thanks in advance for your participation.

Cheers, -Mohammed Sadat (WMDE) (talk) 16:50, 20 January 2023 (UTC)

Meaning conflation in Q1183543 causes weird subclass relationships

appliance (Q1183543)'s descriptions and wikilinks are inconsistent: they describe different meanings, among them:

  • British English: an object to manipulate another object
  • English: an object with which something can be worked on, manufactured or effected
  • Galician: an object with internal mechanisms that produces a job
  • Italian: a generic type of equipment
  • Chinese: Complex equipment such as electric motors or electric drives
  • ...

This causes some weird subclass relationships, for example...

⟨ gate (Q53060)      ⟩ subclass of (P279)   ⟨ appliance (Q1183543)      ⟩

... which causes by inference...

A gate (Q53060) is definitely not a subclass of (P279) physical tool (Q39546). The description "an object with internal mechanisms that produces a job" is acceptable for a gate (Q53060) but the others, like "an object to manipulate another object", are not.

Is there something to be done about this? Danysan1 (talk) 15:16, 22 January 2023 (UTC)

Optional qualifier for hasPart?

Hi there, for Property:P527 "has part(s)", is there a preferred way to qualify that the part can be optional? I.e. it is not always true, but could be true, depending on the configuration. (I am thinking in the context of consumer products, let's take a laptop & a cellular internet card as an example) Photocyte (talk) 17:02, 17 March 2023 (UTC)

Use nature of statement (P5102). Lectrician1 (talk) 17:40, 17 March 2023 (UTC)

Property proposal for "stock keeping unit" Q399757

Hi there,

I'd like to propose a Wikidata Property for "stock keeping unit" (SKU) Q399757, but I imagine this concept is already captured in many ontologies out there. Is there an easy way to convert an existing ontology term for SKU into a Wikidata Property proposal? This property would also need a qualifier to link it to a company or business that issues the particular SKU, and ideally I'd like to have that harmonized with existing ontologies. Example: AA batteries from Amazon have the Amazon Standard Identification Number (https://en.wikipedia.org/wiki/Amazon_Standard_Identification_Number) B00NTCH52W. ASIN is already a Wikidata property (Property:P5749), but I think it would be more generic to record B00NTCH52W as the value of a generic SKU property with a qualifier "issued by" Amazon Q3884. Photocyte (talk) 17:11, 17 March 2023 (UTC)

In Wikidata we want to keep common queries fast by choosing efficient data structures. If I wanted to query for ASINs, which is likely common for some data users, a dedicated property will give faster queries than a general SKU property filtered by a particular qualifier. This is why a dedicated ASIN property is better.
If you have any other reason you need a general SKU property for things other than ASINs, let me know. Lectrician1 (talk) 17:39, 17 March 2023 (UTC)
Sure thing, here are three such examples I had made to explore the idea:
Q117162137
Q117162181
Q117162453
Going to have to respectfully disagree that there shouldn't be a generic SKU, that ASIN could also fall under (or not). There are literally millions of companies that issue SKUs. So, makes more sense IMO, to structure it as a qualifier. Photocyte (talk) 17:43, 17 March 2023 (UTC)
Hm. Those are good examples! I guess a general SKU property does look appealing then! However, another thing you need to consider is that dedicated properties can have property constraints that limit them to specific types of items and to certain value formats (regex constraints), so if we made a general SKU property, we couldn't use these. For this reason, I think it would be better if you proposed dedicated properties. Lectrician1 (talk) 17:59, 17 March 2023 (UTC)
Is there any way to harmonize the two approaches? I.e. a generic SKU property and company-specific SKU properties (Amazon, VWR, etc). IMO, the limit to specific types of items is just everything that descends from the very generic "goods" Q28877. As for regex constraints, these formats aren't even followed that closely by the companies themselves, so I don't think they are essential (i.e. you'll see both 123-51321-789 and 12351321789). Photocyte (talk) 18:04, 17 March 2023 (UTC)
Yes, we could do that. However, regex constraints, such as allowing only digits, are important because they prevent people from adding values that are random, vandalism, or slightly incorrect. For example, we would want to prevent someone using a URL to the external resource as the value instead of the SKU. Lectrician1 (talk) 18:11, 17 March 2023 (UTC)
Okay, will plan to propose separate properties, and leaving harmonizing with a generic SKU & qualifiers for a different time. But still could use more information on how to harmonize this with external ontologies, if possible. Photocyte (talk) 18:46, 17 March 2023 (UTC)
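
The trade-off in this thread can be made concrete with a small Python sketch. The digits-only pattern and the function name `check_format` are assumptions chosen for illustration (they are not the constraint of any existing Wikidata property); the first two sample values are the ones mentioned in the thread, and the URL is made up.

```python
import re

# Hypothetical format constraint for a vendor-specific catalog number:
# digits only, as suggested in the thread. Not taken from any real property.
DIGITS_ONLY = re.compile(r"\d+", re.ASCII)

def check_format(value, pattern=DIGITS_ONLY):
    """Return True if the whole value matches the format constraint."""
    return pattern.fullmatch(value) is not None

URL_VALUE = "https://example.com/item/12351321789"  # made-up URL used as a value

samples = [
    "12351321789",    # plain form from the thread
    "123-51321-789",  # hyphenated form from the thread
    URL_VALUE,
]

results = {value: check_format(value) for value in samples}
for value, ok in results.items():
    print(f"{value!r}: {'ok' if ok else 'violates constraint'}")
```

The URL is rejected, which is the protection Lectrician1 describes, but so is the hyphenated spelling that Photocyte notes vendors also use. A looser pattern or a normalization step would be needed to accept both.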

Silesian Opera (Q1598188) classified as a longitudinal wave and an event

musical theater (Q1370345)

stage and screen (Q61340552)
functional music (Q26897135)
music (Q115484611)
sound (Q11461)
longitudinal wave (Q626707)
progressive wave (Q4080565)
wave (Q37172)
oscillation (Q170475)
change (Q1150070)
occurrence (Q1190554)

more such cases on User:Mateusz Konieczny/failing testcases

Mateusz Konieczny (talk) 09:15, 23 April 2023 (UTC)

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Mateusz Konieczny (talk) 12:16, 28 June 2023 (UTC)
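
A minimal Python sketch of how a chain like the one listed in this section leads to the unexpected classification. Two things are assumptions: that the item is an instance of the first listed class, and that each listed class is a subclass of the next (the flattened listing does not show which entry is the parent of which). The function name `path_to_class` is hypothetical.

```python
# Superclass chain as listed in this section, read as a linear sequence of
# "subclass of" (P279) steps. The linear reading is an assumption.
CHAIN = [
    "musical theater (Q1370345)",
    "stage and screen (Q61340552)",
    "functional music (Q26897135)",
    "music (Q115484611)",
    "sound (Q11461)",
    "longitudinal wave (Q626707)",
    "progressive wave (Q4080565)",
    "wave (Q37172)",
    "oscillation (Q170475)",
    "change (Q1150070)",
    "occurrence (Q1190554)",
]
SUBCLASS_OF = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}

# Assumed starting point: the item is an instance of the first listed class.
ITEM = "Silesian Opera (Q1598188)"
INSTANCE_OF = {ITEM: ["musical theater (Q1370345)"]}

def path_to_class(item, target):
    """Return one instance-of/subclass-of path from item to target, or None."""
    stack = [[item, cls] for cls in INSTANCE_OF.get(item, [])]
    visited = set()
    while stack:
        path = stack.pop()
        current = path[-1]
        if current == target:
            return path
        if current in visited:
            continue
        visited.add(current)
        for parent in SUBCLASS_OF.get(current, []):
            stack.append(path + [parent])
    return None

for unexpected in ("longitudinal wave (Q626707)", "occurrence (Q1190554)"):
    print(" -> ".join(path_to_class(ITEM, unexpected)))
```

Printing the whole path, rather than just a yes/no answer, makes it easier to spot which single link in the chain is the questionable one.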

Santa Catalina Island (Q845229) is an event and intentional human activity (and legal transaction), according to Wikidata ontology

en: rancho of California (land grant by Spain and Mexico of the 18th and 19th centuries in California, USA) [1]

en: land grant (gift of real estate – land or its use privileges – made by a government or other authority) [2]
en: gift (in law, voluntary transfer of property from one person (the donor or grantor) to another (the donee or grantee)) [3]
en: occurrence (occurrence of a fact or object in space-time; instantiation of a property in an object) [4] this was unexpected here as it indicates an event !!!!!!
en: legal transaction (means for the creation of legal relations) [5]
en: intentional human activity (human activity driven by purposeful motives) [6] this was unexpected here as it indicates an intentional human activity !!!!!!!!!!

more such cases on User:Mateusz Konieczny/failing testcases

Mateusz Konieczny (talk) 12:19, 28 June 2023 (UTC)

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Mateusz Konieczny (talk) 11:28, 7 July 2023 (UTC)

Transitivity of part of (P361) and has part(s) (P527)

part of (P361) and has part(s) (P527) are instance of (P31) transitive Wikidata property (Q18647515).

This is problematic because of the current usage of has part(s) (P527) to express membership in a group. For example, using has part(s) (P527) to document the members of a musical group (Q215380).

Membership is not a transitive relationship. If musicians are member of (P463) a musical group and the musical group is part of (P361) the Rock and Roll Hall of Fame (Q179191), the members of the musical group are not part of the Rock and Roll Hall of Fame (Q179191) because member of (P463) is not transitive.

However, we have been using has part(s) (P527) to express the inverse of member of (P463). has part(s) (P527) currently describes itself as transitive, which is not correct when it is used to describe membership. Now, we could remove the 23188 uses of has part(s) (P527) to describe membership, however we would need to do some additional things as well:

  • We would need to create a new inverse property of member of (P463), "has members", so that Wikipedia templates could easily access the members of a group and Wikidata editors can easily find them. We would then need to switch over all the incorrect uses of has part(s) (P527) to use this.
  • Currently 10+ musical group templates use has part(s) (P527) to describe the members of musical groups. We would need to change all of these to use the new "has members" property.

So there is this option, or there is the option of removing the fact that part of (P361) and has part(s) (P527) are transitive, which now that I look at it, is not a good idea.

Which option should we pursue? What other thoughts do you have? Lectrician1 (talk) 17:27, 17 March 2023 (UTC)

I like the “has member” option. Not “has members.”AdamSeattle (talk) 18:39, 18 March 2023 (UTC)
Property proposal created: Wikidata:Property proposal/has member 2 Lectrician1 (talk) 20:30, 19 March 2023 (UTC)
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/March_2023_scaling_update suggests willingness to work to make it possible for templates to use inverse relationships. Once that is through, we could just remove this usage without needing to create an inverse property for the current use. ChristianKl 14:29, 20 March 2023 (UTC)
I see the problem in using instead. I would rather say "enlists" than "has part". And a band indeed has "member parts" imho. Infovarius (talk) 16:31, 20 April 2023 (UTC)
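
The problem raised at the top of this section can be sketched with a toy graph in Python. "Musician X" and "Band Y" are invented labels, and `transitive_closure` is a hypothetical helper; only the shape of the example (musician, band, Rock and Roll Hall of Fame) follows the discussion.

```python
# Toy statements using labels instead of real item IDs.
PART_OF = {    # part of (P361), declared transitive
    "Band Y": ["Rock and Roll Hall of Fame"],
}
MEMBER_OF = {  # member of (P463), not transitive
    "Musician X": ["Band Y"],
}

def transitive_closure(start, edges):
    """Everything reachable from start by repeatedly following edges."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for target in edges.get(node, []):
            if target not in reached:
                reached.add(target)
                frontier.append(target)
    return reached

# Membership kept separate: the part-of graph says nothing about the musician.
separate = transitive_closure("Musician X", PART_OF)

# Membership expressed through has part(s)/part of: the membership edges are
# merged into the transitive graph, so inference runs straight through them.
mixed = transitive_closure("Musician X", {**PART_OF, **MEMBER_OF})

print(sorted(separate))
print(sorted(mixed))
```

With the merged graph the musician is inferred to be part of the Rock and Roll Hall of Fame, which is the conclusion the original post argues is wrong.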

Please discuss my Property Proposal "Stock keeping unit"

See here for more info: Wikidata:Property proposal/stock keeping unit . Transcluded below (please feel free to remove the transclusion if this isn't how people typically do things):

stock keeping unit

   Not done

Motivation

We (including me Photocyte (talk) 20:16, 17 March 2023 (UTC)) are working on a Scientific Labware ontology + database (https://github.com/Bioprotocols/labware-databank/issues/3), and it would be useful to represent the company-specific stock keeping units & keep them in sync with Wikidata. Generally speaking, stock keeping units are used everywhere in global physical and electronic commerce. Would be worth representing in Wikidata.

An alternative approach is to make property proposals for vendor-specific SKUs, but that would be extremely laborious to implement for all companies on Wikidata. (some millions?)

But here are some large vendors, at least in the scientific space, where it might make sense to have independent Property proposals:

VWR catalog number (Q117162137)
Fisher-Scientific catalog number (Q117162181)
Neta Scientific catalog number (Q117162453)

Some past discussion happened here: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ontology?markasread=883442226&markasreadwiki=wikidatawiki#c-Lectrician1-20230317181100-Photocyte-20230317180400

I expect there are some "real" ontologies out there with an equivalent stock keeping unit term, so would love feedback on how to rigorously tie this property into those ontologies.

It might be useful to have a format constraint of: [A-Z\d]{200} (could represent sha512 hashes in base36, in the unlikely event any company is using that) Photocyte (talk) 15:42, 25 March 2023 (UTC)

On second thought, a purely alphanumeric format constraint may exclude non-western scripts which do not use arabic numerals. So, perhaps there should be no constraints. Photocyte (talk) 14:41, 27 March 2023 (UTC)
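
A quick check of the two points about the format constraint, sketched in Python. Python's `re` module is used only as an illustration and may not behave like the engine that evaluates Wikidata format constraints; the helper `to_base36`, the sample input, and the bounded `{1,200}` variant are assumptions.

```python
import hashlib
import re

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base36(number):
    """Encode a non-negative integer using digits 0-9 and letters A-Z."""
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

digest = hashlib.sha512(b"example SKU").digest()
encoded = to_base36(int.from_bytes(digest, "big"))

# A 512-bit value needs at most 100 base-36 characters.
print(len(encoded))

exact = re.compile(r"[A-Z\d]{200}", re.ASCII)      # pattern as proposed
bounded = re.compile(r"[A-Z\d]{1,200}", re.ASCII)  # hypothetical variant
print(exact.fullmatch(encoded) is not None)
print(bounded.fullmatch(encoded) is not None)

# Arabic-Indic digits: matched by \d in Python's default Unicode mode,
# rejected once the pattern is restricted to ASCII.
eastern = "\u0661\u0662\u0663"
print(re.fullmatch(r"[A-Z\d]+", eastern) is not None)
print(re.fullmatch(r"[A-Z\d]+", eastern, re.ASCII) is not None)
```

As written, `{200}` demands exactly 200 characters, so shorter identifiers would fail; a range such as `{1,200}` may be closer to the intent. Whether `\d` covers non-ASCII digits depends on the regex engine and its flags.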

Discussion

  WikiProject Ontology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

  Notified participants of WikiProject Companies

  WikiProject Properties has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

  Comment The datatype should not be external identifier but string - similar to ticker symbol (P249) this is an identifier only with respect to the issuer, not universal. ArthurPSmith (talk) 19:34, 28 March 2023 (UTC)

@ArthurPSmith: , If taking https://www.wikidata.org/wiki/Wikidata:Property_proposal/Amazon.com_ID and https://www.wikidata.org/wiki/Wikidata:Property_proposal/part_number as examples, both those properties use External Identifier, rather than string. So I guess before I change it to string on this particular property proposal, I'd like to clarify if it should also be changed on those other accepted & in progress property proposals? Also pinging the properties group. I'm starting to realize that there are many similar concepts: model code, part number, SKU, and I want to be sure they are kept appropriately harmonized with external ontologies & uses of these terms Photocyte (talk) 18:32, 29 March 2023 (UTC)

@Photocyte: The Amazon ID uniquely identifies a product, as it is a code specific to Amazon, so there's no problem calling that an external id. The part number proposal was not done, but if it had been it should not have been an external identifier as it does not uniquely identify items (the same part number could be used by different vendors to identify different things). ArthurPSmith (talk) 12:11, 30 March 2023 (UTC)
@ArthurPSmith :, would you like to give your opinion? Regards, ZI Jony (Talk) 06:57, 24 January 2024 (UTC)

Photocyte (talk) 15:53, 25 March 2023 (UTC)

What is the correct way to make statements about items as terms?

E.g., rally (Q116236932). The term "rally" is a sports term (Q77733378), but a rally itself is obviously not, and rally (Q116236932) instance of (P31) sports term (Q77733378) is wrong. Is there a standard/appropriate way to link rally (Q116236932) to sports term (Q77733378)? Or is there/should there be an item like "sports concept", for rally (Q116236932) to be an instance of? Swpb (talk) 14:22, 28 March 2023 (UTC)

@Swpb It should not be linked to "sport term". Why would it ? We already know it’s sport related, for example by using the "sport" property. author  TomT0m / talk page 10:00, 23 April 2023 (UTC)

User Script to warn about conflicting superclasses now working!

I previously wrote on Project Chat about an idea of mine to make a Gadget that warns editors when they change an item's basic superclasses (i.e. change an item from being a process to an object, or a spatial entity to a temporal entity). I've now written it (at least a first version), and got it to a point where I'd appreciate wider feedback. It checks each time a new item is selected when changing an instance-of or subclass-of value, and shows a bubble notification if a conflict is identified. (For now, for debugging, it also displays a notification if no problems are seen, but I plan to remove that.)

You can try it out just by adding the following to your Special:MyPage/common.js:

mw.loader.load('//www.wikidata.org/w/index.php?title=User:JesseW/conflicting_superclass_warnings.js&action=raw&ctype=text/javascript'); // [[User:JesseW/conflicting_superclass_warnings.js]]

Very glad for any feedback, particularly suggestions for additional conflicting superclasses! (above copied from my Project Chat announcement) JesseW (talk) 03:54, 2 April 2023 (UTC)

@JesseW Please add your script to Wikidata:Tools/Enhance_user_interface! Lectrician1 (talk) 21:30, 2 April 2023 (UTC)
Good idea, thanks! I've done so now. JesseW (talk) 02:35, 3 April 2023 (UTC)
@JesseW Good, it could be generalised beyond basic superclasses by using disjoint union of (P2738)   :) author  TomT0m / talk page 10:02, 23 April 2023 (UTC)
I've manually added some uses of disjoint union of (P2738); it would certainly be neat to implement logic to automatically extract such relationships and check them. But I think higher priority features including handling adding properties (not just editing them), and maybe being able to check RecentChanges/RelatedChanges/history pages for edits that introduced conflicts. In any case, please try it out and let me know how it goes! JesseW (talk) 13:05, 24 April 2023 (UTC)
Sure, I think I’ll add this to User:TomT0m/classification.js next. I recently finished the feature that shows loops in the class tree! And there are some, see programmer (Q5482740) (I am leaving this one unfixed for now so that it can serve as an example). author  TomT0m / talk page 15:52, 24 April 2023 (UTC)
Ooh, neat! JesseW (talk) 01:42, 26 April 2023 (UTC)
I've removed the loop on programmer (Q5482740) in this edit. JesseW (talk) 22:34, 26 April 2023 (UTC)
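
The kind of check described in this thread can be sketched as follows. This is not the user script's actual code (which is JavaScript and not reproduced here); it is a Python illustration on an invented class tree, and the names `ancestors`, `conflicts`, and `DISJOINT` are hypothetical.

```python
# Toy class tree; the labels are illustrative, not real Wikidata data.
SUBCLASS_OF = {
    "concert": ["performance"],
    "performance": ["process"],
    "musical instrument": ["object"],
    "process": ["entity"],
    "object": ["entity"],
}

# Pairs of top-level classes assumed to be mutually exclusive.
DISJOINT = [("process", "object")]

def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(SUBCLASS_OF.get(current, []))
    return found

def conflicts(old_value, new_value):
    """List disjoint pairs that separate the old value from the new one."""
    old, new = ancestors(old_value), ancestors(new_value)
    return [
        (a, b)
        for a, b in DISJOINT
        if (a in old and b in new) or (b in old and a in new)
    ]

print(conflicts("concert", "musical instrument"))  # change crosses process/object
print(conflicts("concert", "performance"))         # change stays within process
```

Keeping the disjoint pairs in a plain list mirrors the idea, raised in the replies, of feeding the check from disjoint union of (P2738) statements instead of a hard-coded set.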

First experiment ! @JesseW: A first promising query : this one shows that there is a problem with Annie (Q566893)      being both an instance of this statement. It’s efficient, although the involved class tree is a real mess : https://w.wiki/6dKD . Scary, a name is a mathematical object and a set …

select ?class ?st ?instance  (count(distinct ?classes) as ?count) {
  ?class p:P2738 ?st .
  ?st  pq:P642|pq:P11260 ?classes .
  
  bind (wd:Q566893 as ?instance)  .
  ?classes wdt:P279* ?class .
  
  ?instance wdt:P31/wdt:P279* ?classes .
  
  #?st pq:P642|pq:P11260 ?classesc .
}  group by ?class ?instance ?st 
  having (?count > 1)
  limit 1

author  TomT0m / talk page 14:24, 25 April 2023 (UTC)

Excellent! Uncovering messes like that is very much one of the uses I hoped to make of this tool. JesseW (talk) 01:40, 26 April 2023 (UTC)
That mess seems to be underneath proper noun (Q147276), as can be seen on Talk:Q147276. I'm going to take a stab at fixing it. JesseW (talk) 01:48, 26 April 2023 (UTC)
Actually, while proper noun (Q147276) was very messy, the error was actually here. JesseW (talk) 22:30, 26 April 2023 (UTC)
Yep, this is clear on the screenshot of the query below. The query is now integrated into User:TomT0m/classification/sandbox.js, the test version of the script; it still needs a few cosmetic changes, and making it work on subclasses will take a bit more work. I did not solve it immediately because it was my test case :) But there are others, so not a real issue.
A query also computes the most suspicious item: the uppermost item in the class tree that is a subclass of two conflicting classes. So now we could work on adding more disjoint union of (P2738) statements, and the tool will hopefully pick up the problems during navigation … author  TomT0m / talk page 08:38, 27 April 2023 (UTC)
 
Screenshot of the disjoint-class membership conflict view of the classification.js gadget on Wikidata
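
One possible reading of "most suspicious item" (the uppermost class that still falls under both conflicting classes) is sketched here in Python. The class tree is invented, the function names are hypothetical, and this is not the query used by the script.

```python
# Invented class tree in which "label" is wrongly placed under both branches.
SUBCLASS_OF = {
    "given name": ["name"],
    "name": ["label"],
    "label": ["sign", "activity"],  # the faulty class in this toy tree
    "sign": ["object"],
    "activity": ["process"],
    "object": ["entity"],
    "process": ["entity"],
}
DISJOINT_PAIR = ("object", "process")

def ancestors(cls):
    """Proper superclasses of cls (cls itself excluded)."""
    found, todo = set(), list(SUBCLASS_OF.get(cls, []))
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(SUBCLASS_OF.get(current, []))
    return found

def is_conflicting(cls):
    """True if cls falls under both members of the disjoint pair."""
    above = ancestors(cls) | {cls}
    return all(member in above for member in DISJOINT_PAIR)

def most_suspicious(start):
    """Conflicting classes at or above start with no conflicting superclass."""
    candidates = [c for c in ancestors(start) | {start} if is_conflicting(c)]
    return sorted(
        c for c in candidates
        if not any(is_conflicting(p) for p in ancestors(c))
    )

print(most_suspicious("given name"))
```

Every class below the faulty one inherits the conflict, so reporting only the uppermost conflicting class points at the place where a fix is most likely needed.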

New query today, soon to be integrated into my script. It uses colors to make it crystal clear where the issue is: one of the classes is currently both a process and an object.

#defaultView:Graph

prefix violated_top: <http://www.wikidata.org/entity/Q35120>
prefix violated_1:  <http://www.wikidata.org/entity/Q488383>
prefix violated_2: <http://www.wikidata.org/entity/Q3249551>

prefix instance: <http://www.wikidata.org/entity/Q566893>

select ?class ?rgb ?classLabel ?img 
      ?parent   
      ?edgeLabel
      ?size
with {
  select distinct ?class 
                  (sample(?rgb_) as ?rgb) 
                  (concat(?classLabel_, " : ",group_concat(distinct ?violatedLabel_;separator=", ")) as ?classLabel) 
                  (group_concat(distinct ?violatedLabel_;separator=", ") as ?edgeLabel) 
                  {
  
    instance: wdt:P31 ?baseclass.
  
    ?baseclass wdt:P279* ?class .
    ?class wdt:P279* ?violated .
    values (?violated ?rgb_) {
      (violated_1: "223311")
      (violated_2: "447711")
    }
  
    SERVICE wikibase:label { 
      ?class rdfs:label ?classLabel_ .
      ?violated rdfs:label ?violatedLabel_ .
      bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
    }
  
  } group by ?class ?classLabel_
} as %edges

{
  {#{select * { include %edges} }
  {select (?class as ?parent) { include %edges} }
  include %edges .
  ?class wdt:P279 ?parent .
  }
  union {
    bind (instance: as ?class)
    bind ( "Base" as ?classLabel)
    ?class wdt:P31 ?parent .
    bind ("FFFF33" as ?rgb)
    bind (12 as ?size)
    bind ("plop" as ?edgeLabel)
  } union {
      values (?class ?rgb ?classLabel) {
        (violated_1: "223311" "vio1")
        (violated_2: "447711" "vio2")
      }
      values ?parent {
        violated_top:
      }
      # values ?rgb { "111111" }
      #SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". ?class rdfs:label ?classLabel}
    
  } union {
    values (?class ?rgb ?size ?classLabel ){
      (violated_top: "FF0000" "10"^^xsd:integer "Top !!")
    }
    }
}

author  TomT0m / talk page 14:52, 26 April 2023 (UTC)

Postfenn (Q53953145) is an event, according to Wikidata ontology

en: carr (waterlogged wooden terrain between a swamp and forest) [7]

en: azonal vegetation [8]
en: vegetation (total of plant formations and plant communities) [9]
en: assembly (object consisting of multiple parts) [10]
en: merge (entity resulting from the act of combining several entities to form one) [11]
en: occurrence (occurrence of a fact or object in space-time; instantiation of a property in an object) [12] this was unexpected here as it indicates an event !!!!!!!!!!!!!!!!!!!!!!!!!!

more such cases on User:Mateusz Konieczny/failing testcases

Mateusz Konieczny (talk) 09:18, 23 April 2023 (UTC)

Results of the survey on ontology issues reusers are facing

Hi everyone,

A while ago we did a survey among reusers about the different types of ontology issues they are facing when building applications and more using data from Wikidata. The results are available now. More details at Wikidata talk:Ontology issues prioritization.

Cheers Lydia Pintscher (WMDE) (talk) 09:03, 21 June 2023 (UTC)

Thanks for doing this. No big surprises there - although I was interested to see the upper ontology mess wasn't a higher priority. What are next steps? ArthurPSmith (talk) 19:03, 21 June 2023 (UTC)
Thank you!
Next steps: discussing potential insights and solutions you all have at Wikidata talk:Ontology issues prioritization#Results of the survey on ontology issues reusers are facing and next week I'll also add the solutions we have so far. (We didn't want to do that right away to not bias the results.) Lydia Pintscher (WMDE) (talk) 18:38, 22 June 2023 (UTC)
What is this «upper ontology»? Some convention I missed? I really don't know about it. —Ismael Olea (talk) 09:16, 23 June 2023 (UTC)
I hope en:Upper ontology is helpful. But basically what's at the most general (upper) level when you go up all the instance of/subclass of relations. For Wikidata there is stuff like Q35120 there. Lydia Pintscher (WMDE) (talk) 15:52, 23 June 2023 (UTC)

Instances of terminology (Q8380731) and its subclasses

Tons of items are claimed to be instance of (P31) some type of terminology. This makes for bizarre inferences (for instance double dribble (Q1242920) is a language (Q34770), etc.). It seems to me that types of terminology (like basketball terminology (Q104437412)) don't have instances, they have parts. So:

  1. Is my reasoning correct, that terms should be part of (P361) a terminology, rather than instance of (P31) it?
  2. If #1, what is the best way to (a) correct all the existing cases and (b) prevent new ones?

Re (b), I'm thinking this would be a perfect use case for a "not value-type"/"none-of class" constraint type, which I've proposed here. Swpb (talk) 20:09, 28 June 2023 (UTC)

I agree on (a); on (b), I suggest adding in Property talk:P31 {{Autofix|pattern=Q8380731|replacement=Q8380731|move_to=P361}}. In the future, of course, I think that autofixes should be coded somehow in properties, instead of added in this way in property talk pages (I wrote about this in Wikidata:Events/Data Quality Days 2022/Modeling data: "storing standards only as constraints would make them queryable, while Autofix cannot be queried" etc.). --Epìdosis 21:46, 28 June 2023 (UTC)
Won't that just fix cases where the value is terminology (Q8380731) itself, and not any of its subclasses? Swpb (talk) 14:40, 29 June 2023 (UTC)
Exactly; but we don't have better tools as of now (and in Wikidata:Events/Data Quality Days 2022/Modeling data I raise this exact problem); the only choice, presently, is asking a bot (WD:RBOT). --Epìdosis 15:12, 29 June 2023 (UTC)
Ok, I've requested a better tool. Swpb (talk)
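
The limitation discussed in this thread (a rule keyed to one exact value misses values that are subclasses of it) can be illustrated with a toy data set in Python. The subclass link and the instance-of statements are assumed for illustration, echoing the items named in the discussion; "some glossary" and "unrelated item" are invented, and the function names are hypothetical.

```python
# Toy data; not a snapshot of real Wikidata statements.
ROOT = "terminology (Q8380731)"
SUBCLASS_OF = {
    "basketball terminology (Q104437412)": [ROOT],
}
INSTANCE_OF = {
    "double dribble (Q1242920)": ["basketball terminology (Q104437412)"],
    "some glossary": [ROOT],
    "unrelated item": ["unrelated class"],
}

def subclasses_of(root):
    """Return root plus every class that reaches it through P279 edges."""
    result = {root}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result

def exact_matches():
    """Items a value-equality rule would catch (value is exactly ROOT)."""
    return sorted(i for i, classes in INSTANCE_OF.items() if ROOT in classes)

def subclass_matches():
    """Items a rule that also follows subclasses would catch."""
    targets = subclasses_of(ROOT)
    return sorted(
        i for i, classes in INSTANCE_OF.items()
        if any(c in targets for c in classes)
    )

print(exact_matches())
print(subclass_matches())
```

Only the second function finds the double dribble statement, which is why a bot request or a constraint that understands subclasses is needed for the bulk of the existing cases.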

Medical treatment class hierarchy

I could use assistance from this project in identifying an appropriate parent class for medical treatment (Q179661). Please join the discussion at Wikidata talk:WikiProject Medicine § Medical treatment class hierarchy. Daask (talk) 19:04, 15 July 2023 (UTC)

unpingable project

The project page says to ping the project, but when I try it doesn't work because the project has more than 50 members. Either the project page should be changed or pings allowed again. Peter F. Patel-Schneider (talk) 13:21, 24 July 2023 (UTC)

plurality and description of metaclasses

There are at least two metaclasses in Wikidata --- metaclass (Q19361238) and class (Q23960977). Why is there duplication here (and in other places)? Why does the description of metaclass (Q19361238) appear to forbid mixed meta-classes (classes whose instances are either classes or non-classes) as instances, particularly as variable-order class (Q23958852) is a subclass of it? Peter F. Patel-Schneider (talk) 13:28, 24 July 2023 (UTC)

The first is seemingly an internal thing, while the second is a real-world thing (if that makes sense). Q19361238 doesn't forbid mixed metaclasses; it combines them as a separate subclass. --Infovarius (talk) 16:34, 29 July 2023 (UTC)
I don't understand the rationale for metaclass (Q19361238). If this class should be in Wikidata, then why not Wikidata person, Wikidata vessel, Wikidata motor car, etc? This kind of duplication appears to me to be a major source of confusion in the Wikidata upper ontology.
The English description of metaclass (Q19361238) is "class of classes, class whose instances are classes" so it appears that mixed metaclasses are not allowed as they can have instances that are not classes. This is another source of confusion in the Wikidata upper ontology.
Peter F. Patel-Schneider (talk) 16:44, 29 July 2023 (UTC)
I took a look at all the metaclasses that appear to be part of the Wikidata top-level ontology.
First there are the fixed-order and variable-order metaclasses - first-order class (Q104086571), second-order class (Q24017414), third-order class (Q24017465), fourth-order class (Q24027474), fifth-order class (Q24027515), fixed order class of higher order (Q24027526), fixed-order class (Q23959932), and variable-order class (Q23958852). These are well organized between them, but many of the instances of them other than first-order class are incorrect. I suggest checking and correcting these instance relationships.
There are then four different versions of metaclass - class (Q23960977), metaclass (Q19478619), metaclass (Q19478619), and metaclass (Q19361238) - and three different versions of class - class (Q5127848), first-order class (Q21522908), and class (Q21522864) - all connected in some way to the previous metaclasses. There is no way that anyone could determine which of these to use. I suggest having only one version of metaclass and one version of class, merging as necessary. Several of these seven classes have incorrect or missing superclasses or types. I suggest making the ontological relationships explicit between the remaining version of metaclass and the remaining version of class (i.e., metaclass is a subclass and instance of class and class is an instance of class). I also suggest checking and correcting the instances and subclasses of these metaclasses.
If it is necessary to have other objects called metaclass or class there should be usage instructions and property constraints to the effect that these other objects are not to have instances or subclasses.
There is also Q104093226. This metaclass appears to have no good use. I suggest replacing it with usage instructions where appropriate.
I think that this will greatly improve the top-level ontology of Wikidata. What should be done before implementing this change (or doing something else to improve the state of the top-level ontology of Wikidata)? Peter F. Patel-Schneider (talk) 01:46, 9 August 2023 (UTC)
@Peter F. Patel-Schneider: Thanks for summarizing the situation. One helpful step might be to check which if any of these have any interwiki links (i.e. wikipedia pages in some language) - that constrains any plans to merge items. A second step would be to get some statistics on how often each of these is used directly - how many items have one of these as the value of a property (most likely P31 or P279). ArthurPSmith (talk) 18:47, 10 August 2023 (UTC)
@ArthurPSmith I have a text table with statistics, comments, and relationships to other top-level classes. What is a reasonable way of turning a text table into something that can be published here? Peter F. Patel-Schneider (talk) 19:24, 10 August 2023 (UTC)
If you can turn it into wikitext (see https://www.mediawiki.org/wiki/Help:Tables for how to do tables) this is the sort of thing that would usually be posted to a personal page - say User:Peter F. Patel-Schneider/metaclass table. ArthurPSmith (talk) 19:29, 10 August 2023 (UTC)
Done. See User:Peter F. Patel-Schneider/metaclass table Peter F. Patel-Schneider (talk) 22:04, 10 August 2023 (UTC)
@Peter F. Patel-Schneider: Nice, thanks! A few notes (in no particular order) on this:
  1. You list abstract entity (Q7184903) and object (Q488383) twice at the bottom. I assume these are otherwise fine?
  2. Deleting items requires an admin; it's usually preferable to merge if possible. However, I see these might be sufficiently confusing that deletion is better here. Wikidata:Requests for deletions works best if you remove all statements that use the item first, assuming we really do want to delete it.
  3. class (Q21522864) has a surprisingly large number of edits, including a relatively recent one by User:TomT0m. We might want to engage them here first to better understand what possible purpose this is intended to have.
  4. Some of the others such as class (Q23960977) also use that disjoint union of (P2738) property and may be seen to have a purpose in expressing disjointness of two items; I don't personally see this as useful but there may be other strong opinions on this out there.
  5. The other merge suggestions seem fine. Should we consider merging class (Q16889133) with first-order class (Q104086571)? Given that it shouldn't include things that are instances of metaclass (Q19478619)?
  6. Do we need an item that includes all possible types of fixed-or-variable order class but is not entity (Q35120) (which covers everything) or abstract entity (Q7184903)?
ArthurPSmith (talk) 15:45, 11 August 2023 (UTC)
1/ That's a cut-and-paste error. I'll fix it.
2/ Yes, before any request for deletion I'll investigate incoming links and try to contact their submitter.
3/ I'm not sure why class (Q21522864) has so many edits. @TomT0m: Do you have any idea?
4/ When I looked at the disjoint union constraints it appeared to me that they were incorrect. I'll look again and contact the submitters.
5&6/ I see that there is an issue about the definition of class and metaclass. Generally class is considered to be all the classes, including for example metaclass. Thus it is a variable-order class and not a first-order class. I don't see anything in Wikidata that indicates that either class that is in the ontology is restricted to contain only classes of individuals. (There are some lines of thought where class includes just first-order classes, but the predominant situation in open representations like the semantic web and related areas is that class includes more than first-order classes.) Similarly metaclass contains more than second-order classes - indeed metaclass is stated to be an instance of variable-order class. Peter F. Patel-Schneider (talk) 19:39, 11 August 2023 (UTC)
The most obvious first step is to merge class (Q23960977) into metaclass (Q19478619). The former is unused outside of the top level and has no interwiki links. Peter F. Patel-Schneider (talk) 09:12, 14 August 2023 (UTC)
On class (Q16889133), yes I see your point; in that case I think it might be best to merge class (Q23960977) into class (Q16889133) as I think the intent with class (Q23960977) was to include "all the classes, including for example metaclass". Not that what you propose would be a problem given that it's not being used, so I'm ok either way here. ArthurPSmith (talk) 20:02, 14 August 2023 (UTC)
Looking at (meta)class and metaclass more closely, it might be a good idea to keep both, but be much more explicit as to what they are for. Metaclass is for pure metaclasses - those with no individuals as instances - (meta)class is for mixed metaclasses - those that can have both individuals and classes as instances. I'll see if I can come up with better descriptions and usage instructions for both of them. Peter F. Patel-Schneider (talk) 15:22, 22 August 2023 (UTC)

detecting objects that are incorrectly at several levels of the ontology

I just spent a while fixing up several hundred instances of watercraft that were actually subclasses of watercraft. The typical setup was a type of watercraft that was an instance of flying boat (Q1153376) and also a subclass of seaplane (Q115940). This is a level mismatch and an obvious error, but each of them had to be (quickly) checked to determine whether they were individuals or classes. (They all are classes.) Is there a way to detect this kind of mismatch so that the problem can be dealt with when the object is created instead of requiring later manual work? Peter F. Patel-Schneider (talk) 19:28, 8 August 2023 (UTC)

User:TomT0m/classification.js may be an answer: once the necessary disjoint union of (P2738) statements are added, it shows a warning. But it’s a user script and will not work for people who do not activate it. At some point I should add a check to see if an item is both an instance and a subclass of the same class, because the relevant disjointness statements may not be accessible. author TomT0m / talk page 16:06, 22 August 2023 (UTC)
Yeah, the problem with user scripts is that they only work for those who know about them. Ideally there should be constraints that can be used to warn about ontology violations when edits are made, and that show up whenever the item is viewed. There are several constraints that would be useful, including checking that a value is not an instance of a class (and possibly remedying the problem by changing the property involved), checking that disjointness constraints are not violated, and checking the subclass requirement for is metaclass of. I see lots of violations of the latter two that create messes in the ontology. Peter F. Patel-Schneider (talk) 21:32, 22 August 2023 (UTC)
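
As a rough illustration of the instance-and-subclass check discussed in this thread, here is a minimal Python sketch over a small in-memory set of statements. It is not how classification.js or any Wikidata constraint works; the function names, the item "ExampleFlyingBoatType" and "ExampleAircraft", and the subclass link from flying boat to seaplane are assumptions made up for the example.

```python
from collections import defaultdict

# Toy statements as (item, property, value) triples. The example items and the
# flying boat -> seaplane subclass link are assumptions for illustration only.
STATEMENTS = [
    ("ExampleFlyingBoatType", "P31", "Q1153376"),   # instance of flying boat
    ("ExampleFlyingBoatType", "P279", "Q115940"),   # subclass of seaplane
    ("ExampleAircraft", "P31", "Q1153376"),         # a plain individual
    ("Q1153376", "P279", "Q115940"),                # assumed class hierarchy
]


def ancestors(cls, subclass_of):
    """Return cls together with everything reachable through subclass of (P279)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_level_mismatches(statements):
    """List (item, class, superclass) where the item is an instance of `class`
    and a subclass of `superclass`, and the two classes lie on one subclass chain."""
    instance_of = defaultdict(set)
    subclass_of = defaultdict(set)
    for item, prop, value in statements:
        if prop == "P31":
            instance_of[item].add(value)
        elif prop == "P279":
            subclass_of[item].add(value)

    mismatches = []
    for item in sorted(instance_of):
        for cls in sorted(instance_of[item]):
            for parent in sorted(subclass_of.get(item, ())):
                related = (parent in ancestors(cls, subclass_of)
                           or cls in ancestors(parent, subclass_of))
                if related:
                    mismatches.append((item, cls, parent))
    return mismatches


if __name__ == "__main__":
    for item, cls, parent in find_level_mismatches(STATEMENTS):
        print(f"{item}: instance of {cls} and subclass of {parent}")
```

A check like this only flags candidates; as noted above, each one still has to be inspected to decide whether the item is an individual or a class.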

Proposed merges in the top level of the Wikidata ontology

Based on my analysis of the top-level classes in the Wikidata ontology (see https://www.wikidata.org/wiki/User:Peter_F._Patel-Schneider/metaclass_table) here are three merges that I think should be done to reduce the number of classes in the top level of the Wikidata ontology.

class (Q21522864) is unused, except within the top-level classes of the ontology. As class (Q16889133) includes metaclasses, "class or metaclass" is just the same as "class", and "of Wikidata ontology" doesn't have any impact, so the two classes have the same meaning. Merging class (Q21522864) into class (Q16889133) thus seems warranted to reduce the number of classes in the top level of the Wikidata ontology.

class (Q23960977) is unused, except within the top-level classes of the ontology. The description of class (Q23960977) is "collection of entities that can be classes or individuals" which is functionally the same as the description of class (Q16889133) "collection of items defined by common characteristics", so the two classes have the same effective meaning. Merging class (Q23960977) into class (Q16889133) thus seems warranted to reduce the number of classes in the top level of the Wikidata ontology.

first-order class (Q21522908) is very lightly used. Its description is "class whose instances are never other classes; class which is not a metaclass". The first part is equivalent to the description of first-order class (Q104086571) - "class whose instances are individuals and not classes". Merging first-order class (Q21522908) into first-order class (Q104086571) thus seems warranted to reduce the number of classes in the top level of the Wikidata ontology. Peter F. Patel-Schneider (talk) 16:51, 23 August 2023 (UTC)

Better support for specialized versions of class membership

One of the things that makes it hard for me to use Wikidata is that it is often not possible to extract the correct information because the organization of information varies throughout Wikidata. For example, plumber (Q252924) - described as "profession; tradesperson specialized in installing and maintaining systems used for potable water, sewage and drainage in plumbing systems" - is a class in Wikidata, as can be seen from its subclass of (P279) relationship to skilled worker (Q1391083). So one should expect to be able to find plumbers in Wikidata by retrieving the instances of plumber (Q252924). As of 26 August 2023 there is only one instance - Q111745781, which is not even a person. Instead, to retrieve plumbers one has to query for items that have an occupation (P106) link to plumber (Q252924) and are also instances of subclasses of person (Q215627). Then one gets 158 results, as of 26 August 2023, which seems more reasonable. But one has to know that this is the way to retrieve plumbers. There are connections from plumber (Q252924) to occupation (P106) through profession (Q28640) and occupation (Q12737077) and then using Wikidata property (P1687), but there is also a Wikidata property (P1687) connection to field of this occupation (P425), so there does not appear to be a way of discovering this special way of retrieving plumbers.

There are quite a few examples of this sort of connection to a class from its instances that do not use instance of (P31)/subclass of (P279)*. For example, instances of ship class (Q559026) - group of ships of a similar design - are classes, such as Olympic-class ocean liner (Q767166). But the connection from Titanic (Q25173) to Olympic-class ocean liner (Q767166) uses vessel class (P289), not instance of (P31).

This is not just a problem when querying information but also when creating information.

What can be done to better support this common way of modelling in Wikidata? Peter F. Patel-Schneider (talk) 14:21, 26 August 2023 (UTC)

I'm not sure this is a good solution, but subproperty of (P1647) could potentially be a route to making these more consistent, though the logic to support that would have to be handled somewhere. parent taxon (P171) for example is declared as a subproperty of subclass of (P279) so in principle the inconsistent modeling within the taxon domain can be made consistent that way. ArthurPSmith (talk) 20:29, 28 August 2023 (UTC)
I agree that using subproperty of (P1647) instance of (P31) is a way to go, and that has been done for vessel class (P289). But the tools used for Wikidata do not support this. For example, if I don't know about the status of vessel class (P289) then I can't build the query to find instances of a ship class. (If I do know, then I can use wdt:P289/wdt:P279* or maybe (wdt:P31|wdt:P289)/wdt:P279*, but I need to know that wdt:P289 is used for this purpose there.)
Ideally the Wikidata query service should hide this complication (and others) so that I could just query wdt:P31/wdt:P279* to find all instances of a class in Wikidata. Actually, the query service should do even more, so that I can use wdt:P279 to find all subclasses of a class including the indirect ones and wdt:P31 to find all instances of a class including the instances of subclasses. Peter F. Patel-Schneider (talk) 14:08, 2 September 2023 (UTC)
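
To make the workaround above concrete, here is a small Python sketch that builds the SPARQL text for "find all members of a class" when membership may be recorded with properties other than instance of (P31). The function name and its parameters are hypothetical; the only thing taken from the discussion is the property-path shape (wdt:P31|wdt:P289)/wdt:P279*. The sketch only produces the query string and does not contact the query service.

```python
def instance_query(class_qid, membership_props=("P31",)):
    """Build a SPARQL query for items linked to class_qid (or one of its
    subclasses) through any of the given membership properties.

    membership_props is a hypothetical parameter: the caller still has to
    know which properties (e.g. P289, P106) act like instance of (P31).
    """
    path = "|".join(f"wdt:{prop}" for prop in membership_props)
    if len(membership_props) > 1:
        # Alternatives must be parenthesised before being followed by "/".
        path = f"({path})"
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item {path}/wdt:P279* wd:{class_qid} .\n"
        "}"
    )


if __name__ == "__main__":
    # Members of Olympic-class ocean liner via instance of or vessel class.
    print(instance_query("Q767166", ("P31", "P289")))
    # Plumbers via instance of or occupation (no restriction to persons here).
    print(instance_query("Q252924", ("P31", "P106")))
```

The caller still needs to know the list of membership properties in advance, which is exactly the discoverability problem raised in this thread.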

Classes that do not fit into the correct order

See https://www.wikidata.org/wiki/Wikidata_talk:Ontology_issues_prioritization#Classes_that_do_not_fit_into_the_correct_order_(e.g.,_first-order_classes_that_have_classes_as_instances) for a program I wrote to identify (some) order problems in the Wikidata ontology. It found lots of problems. Peter F. Patel-Schneider (talk) 11:32, 28 August 2023 (UTC)

Westkappelse kreek (Q110585281) is an instance of quantum particle (Q28693603) and that is weird

By inference (or not), we have

This subclass chain explains why (at the time of posting).

Is there something to be done about this?


(reported through User:TomT0m/classification.js)

Swpb (talk) 17:43, 11 September 2023 (UTC)

This appears to be a case of semantic drift, where the different meanings of the label or description of a concept in Wikidata are used in different places. There is some discussion of this problem in the discussion of the results of the ontology issues survey. See https://www.wikidata.org/wiki/Wikidata_talk:Ontology_issues_prioritization
The issue here is which links to change. I suspect that the subclass of (P279) link from open water (Q2479431) to surface water (Q752112) should be removed. Peter F. Patel-Schneider (talk) 18:12, 11 September 2023 (UTC)
Interesting, why that link? It seems to the untrained eye to be as valid as any of the others. Swpb (talk) 19:54, 11 September 2023 (UTC)
Open water is a body of water, e.g., a lake, according to my reading of the linked Wikipedia page. Surface water appears to be just water that is on the surface, as far as I can tell. So a lake is a body of water, but it contains surface water, i.e., the instances of body of water are not instances of surface water.
This is all somewhat murky as the English Wikipedia page for surface water uses lake as an example of surface water. But elsewhere in the page surface water is not the body of water, but instead the flow of water. Peter F. Patel-Schneider (talk) 20:24, 11 September 2023 (UTC)
 
(Image: chain of superclasses of water)
Indeed not an easy case. Somewhere there is a drift from "class of water masses" to "class of water molecules"... --Infovarius (talk) 12:09, 12 September 2023 (UTC)
Perhaps more accurately from "class of geographical features that contain open water masses" to "class of water molecules". Peter F. Patel-Schneider (talk) 12:13, 12 September 2023 (UTC)
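
For anyone wanting to trace this kind of chain outside the gadget, here is a minimal Python sketch that finds a shortest subclass of (P279) chain between two classes in a local copy of the hierarchy. Only the open water (Q2479431) to surface water (Q752112) link comes from this thread; the placeholder node "HYPOTHETICAL_WATER" and the links through it are invented to make the example run and are not claims about the actual data.

```python
from collections import deque

# Toy subclass-of map. Only the first link is taken from the discussion;
# the others are hypothetical stand-ins for the real intermediate classes.
SUBCLASS_OF = {
    "Q2479431": ["Q752112"],             # open water -> surface water
    "Q752112": ["HYPOTHETICAL_WATER"],   # assumed link
    "HYPOTHETICAL_WATER": ["Q28693603"], # assumed link to quantum particle
}


def subclass_chain(start, target, subclass_of):
    """Breadth-first search for a shortest P279 chain from start to target.

    Returns the chain as a list of identifiers, or None if there is none.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for parent in subclass_of.get(node, ()):
            if parent not in visited:
                visited.add(parent)
                queue.append(path + [parent])
    return None


if __name__ == "__main__":
    chain = subclass_chain("Q2479431", "Q28693603", SUBCLASS_OF)
    print(" -> ".join(chain) if chain else "no chain found")
```

Seeing the whole chain at once makes it easier to argue about which single link introduces the drift.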

better support for common modelling practices in Wikidata

I've been updating the classes and some individuals related to watercraft (Q1229765).

Watercraft uses some somewhat sophisticated modelling, with several metaclasses, special-purpose properties, and intended constraints. A major problem is that there is little support for the modelling. For example, ship type (Q2235308) is metaclass for (P8225) ship (Q11446) but there is nothing that requires that instances of ship type are actually subclasses of ship. Watercraft is the disjoint union of ships, boats, and submarines but there is nothing that requires that the three classes are actually disjoint. As far as I can tell there are quite a few areas in Wikidata that use similar modelling techniques.

What can be done to better support the modelling of watercraft and other areas in Wikidata that use similar modelling techniques? For example, it would be very useful to have a generalization of the none-of constraint for classes, so that instance of (P31) links to instances of ship class (Q559026) can be converted to vessel class (P289) links. It would also be very useful to be able to find instances of ship classes even though the links use the vessel class property (which is a subproperty of instance of).

@Lydia Pintscher (WMDE) Is there a different place for these sorts of questions? Peter F. Patel-Schneider (talk) 20:56, 19 September 2023 (UTC)

Have you considered using ShEx? --Ismael Olea (talk) 10:19, 20 September 2023 (UTC)
What would ShEx provide to help here? I don't see how ShEx can do something like replace instance of links with vessel class links or add subclass links. I also don't see how ShEx can be used to search in Wikidata or enforce disjointness. Peter F. Patel-Schneider (talk) 12:19, 20 September 2023 (UTC)
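
To pin down what the two missing checks in this thread would have to verify, here is a minimal Python sketch over toy data: one check for the is metaclass for (P8225) requirement (instances of ship type (Q2235308) should be subclasses of ship (Q11446)) and one for a disjoint union (no item should end up an instance of more than one part). All item names starting with "toy", the plain labels "boat" and "submarine", and the function names are assumptions for illustration; this is not an existing Wikidata constraint.

```python
# Toy data; every "toy ..." item and the plain labels are hypothetical.
INSTANCE_OF = {
    "toy cargo ship type": {"Q2235308"},   # instance of ship type
    "toy rowing boat type": {"Q2235308"},  # instance of ship type (suspect)
    "toy vessel": {"toy cargo ship type", "toy midget submarine type"},
}
SUBCLASS_OF = {
    "toy cargo ship type": {"Q11446"},     # subclass of ship
    "toy rowing boat type": {"boat"},
    "toy midget submarine type": {"submarine"},
}


def superclasses(cls, subclass_of):
    """Return cls plus all classes reachable through subclass of (P279)."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def metaclass_violations(metaclass, base_class, instance_of, subclass_of):
    """Instances of metaclass that are not (indirect) subclasses of base_class."""
    return sorted(
        item
        for item, classes in instance_of.items()
        if metaclass in classes
        and base_class not in superclasses(item, subclass_of)
    )


def disjointness_violations(parts, instance_of, subclass_of):
    """Items that are (indirect) instances of more than one disjoint part."""
    violations = {}
    for item in sorted(instance_of):
        classes = set()
        for cls in instance_of[item]:
            classes |= superclasses(cls, subclass_of)
        hits = sorted(set(parts) & classes)
        if len(hits) > 1:
            violations[item] = hits
    return violations


if __name__ == "__main__":
    print(metaclass_violations("Q2235308", "Q11446", INSTANCE_OF, SUBCLASS_OF))
    print(disjointness_violations(["Q11446", "boat", "submarine"],
                                  INSTANCE_OF, SUBCLASS_OF))
```

Neither check repairs anything; converting instance of links to vessel class links, as asked for above, would need a separate step.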

Data Modelling Days, online gathering, November 30 - December 2, 2023

Hello all,

Following the past events dedicated to data quality and data reuse, the Wikidata team wanted to host a new gathering dedicated to data modelling.

The Data Modelling Days will take place online over three days and will host a variety of discussions, workshops and practical sessions on the topics of Wikidata ontologies, EntitySchemas, modelling issues and various other challenges.

The event is open to everyone, regardless of your experience with modelling data on Wikidata. We particularly encourage people who are working on specific topics to join the event and present their modelling challenges.

If you know people or groups who are already discussing modelling issues on Wikidata, or would have something interesting to contribute, please share this message with them!

You can find more information on the dedicated page, where you can sign up and let us know what you are interested in. You can already propose discussions and workshops on the talk page until November 19th.

If you cannot attend, don’t worry, most sessions will be recorded, notes will be taken and slides will be shared.

We are looking forward to seeing you and learning more about your modelling challenges during the Data Modelling Days! If you have any questions, feel free to reach out to me. Best, Lea Lacroix (WMDE) (talk) 14:24, 9 October 2023 (UTC)

What are activities?

E.g., is loading (Q2516616) an instance (directly or not) of activity (Q1914636)? Or is it a subclass of activity (Q1914636), whose instances are individual occurrences at a particular place and time? I think the latter treatment makes more sense, leaving instance of (P31) activity (Q1914636) for one-time occurrences like Operation Breakthrough (Q24019). Right now, there are huge numbers of "activity" items with the former treatment: Here's a very depth-limited sample. How important is consistency of treatment for such items? Swpb (talk) 18:31, 20 October 2023 (UTC)

Topic by country

Considering our ontology, there are many groups of items which fit into the scheme "ENTITY in COUNTRY X". As an example, the series "Human rights in ...". The properties assigned to these items vary significantly. For this case I have seen four main patterns:

  1. instance of (P31) human rights by country or territory (Q69358181)
  2. instance of (P31) human rights by country or territory (Q69358181) + facet of (P1269) human rights (Q8458)
  3. instance of (P31) human rights by country or territory (Q69358181) + subclass of (P279) human rights (Q8458)
  4. subclass of (P279) human rights (Q8458)

Since options 3 and 4 are clearly wrong and option 2 is redundant, I'm now spreading option 1. If you think that this is applicable also in similar cases, like "religion in ...", "agriculture in ...", could we write it down as a standard somewhere? Thanks, --Epìdosis 14:53, 29 October 2023 (UTC)

LGBT rights by country or territory (Q17898) seems well-modeled: instance of (P31) metaclass (Q19361238), subclass of (P279) aspect in a geographic region (Q74817647), is metaclass for (P8225) LGBT+ rights (Q17625913). I reproduced it in human rights by country or territory (Q69358181), transgender rights by country or territory (Q123237562), sex trafficking by country or territory (Q123237442). --Epìdosis 15:38, 29 October 2023 (UTC)

Coming up soon: Wikidata Data Modelling Days, online, November 30-December 2

 
Wikidata Data Modelling Days 2023

Hello all,

If you are regularly involved in adding, organizing or reusing data from Wikidata, you have certainly encountered some questions or issues related to data modelling: how to describe and structure information in a consistent way on Wikidata. This is a big topic for the community at large, and that's why we will address it together during a 3-day online event, the Data Modelling Days, that will take place next week, on November 30th, December 1st and 2nd.

During this online gathering, we will have lots of discussions on various topics that you can discover in the program: we will talk about Entity Schemas and how they can be useful to improve data quality and consistency on Wikidata, how to model heritage, gender, references or web fiction, the challenges encountered by people reusing Wikidata's data inside and outside the Wikimedia projects, how to model data on a fresh new Wikibase instance, and many other exciting topics.

Aside from attending sessions and joining the discussions, you can also join our Data Modelling Clinic sessions, where you can bring any topic you are working on, ask questions or ask the community for feedback or help. You will find these sessions on each day in the program.

The event is taking place online on the video conference platform Jitsi. It is free and no registration is needed (although you are invited to add your name to the participants list). Most sessions will be recorded in video and have collaborative notes, and we will publish a list of outcomes and next steps for each session.

We are hoping to see a lot of you at the event!

If you have any questions, feel free to ask on the talk page or directly by writing to me. Best, Lea Lacroix (WMDE) (talk) 16:01, 24 November 2023 (UTC)

new activities?

There was a lot of discussion at the Wikidata Modelling Days about improvements to the ontology. How can these potential efforts be made actual? Is there any support that WMDE can provide? Peter F. Patel-Schneider (talk) 18:24, 2 December 2023 (UTC)

Recurring problem with Q17334923

The classification gadget shows that physical location (Q17334923) is a subclass of both a physical entity type and an abstract property type. Looking at the history of the item it seems that it’s a recurring problem.

 
Screenshot of the classification gadget showing the conflict as of December 11, 2023 on the physical location (Q17334923) page

It seems from the description that it’s supposed to be about the property and not the physical place, clearly.

As it seems to be a recurring problem, how could we prevent this? Why do you think this keeps reappearing?

(the gadget uses disjoint union of (P2738) statements to find this kind of conflict, here the one in spatio-temporal entity (Q58415929))

author  TomT0m / talk page 14:19, 11 December 2023 (UTC)

This is not to say that your analysis is incorrect, but that disjointness on entity currently appears to be deprecated. Peter F. Patel-Schneider (talk) 14:34, 11 December 2023 (UTC)
physical location (Q17334923) is stated to be equivalent to https://schema.org/Place, which I believe is the general class for physical objects that have a fixed location. That is evidence for the physical entities reading of physical location (Q17334923). That may, of course, be an incorrect equivalence. Peter F. Patel-Schneider (talk) 15:09, 11 December 2023 (UTC)
@Peter F. Patel-Schneider From the description like "position of something in space" I’d say it was initially more intended to be like an equivalent for the location (P276) property in the item space.
It’s a relation between a physical object and the place where it is located. It’s consistent with the fact it’s a subclass of "property" anyway.
I’m under the impression that it’s indeed used as a class for "something that has a location is an instance of this class" but it’s pretty confusing and ends up having physical objects be instances of "property" transitively. author TomT0m / talk page 15:29, 11 December 2023 (UTC)
@TomT0m Agreed. Every time I look at these classes I end up confused. There is supposed to be some discussion at https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ontology#High_level_Geographic_classes but it appears that the discussion was moved somewhere else.
So it does look as if this is a property class - whatever these are supposed to be. Fixing the class up should involve moving the link to https://schema.org/Place to some other class, maybe place (Q98929991), and eliminating other information that suggests that this is a class of physical entities. Peter F. Patel-Schneider (talk) 16:11, 11 December 2023 (UTC)

comprehensive description of how Wikidata is supposed to work

There is a page attempting to describe how to interpret the data in Wikidata that was created in late 2022 but does not appear to have gathered much interest and does not appear to have been significantly updated since then. There are also Wikidata entities such as transitive Wikidata property (Q18647515), restrictive qualifier (Q61719275), disjoint union of (P2738), is metaclass for (P8225), Wikidata qualifier restricting a statement time-wise (Q115429021), class (Q16889133), and second-order class (Q24017414) that are intended either to have effects inside Wikidata or to categorize entities that are intended to have effects inside Wikidata.

Are there group-supported documents that comprehensively describe these aspects of Wikidata? If there are, I am interested in ensuring that they are up to date. If there aren't, I am interested in creating such documents. Is anyone else interested in participating in an effort along these lines? Peter F. Patel-Schneider (talk) 16:46, 30 December 2023 (UTC)
