Wikidata:Requests for comment/Typing : class ⇄ instance relationship in Wikidata
An editor has requested the community to provide input on "Typing : class ⇄ instance relationship in Wikidata" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- The discussion has been over for about a year, so I'm closing this RFC now without a clear result. If we need further clarification or a guideline on this topic, a new RFC should be created. -- Bene* talk 11:08, 11 April 2015 (UTC)
Comment This RfC, if it's successful, is supposed to be extended with other typing features. This is a start with a relatively small question, and should be extended with questions about subclass of (P279).
Introduction
There are many open discussions and misconceptions around Help:Basic membership properties. Their place on Wikidata has also not been discussed in this RfC space; I think we should do that one way or another, as it is a very important point.
Purposes
instance of (P31) and subclass of (P279) are documented and commonly used here. These properties, or equivalents of them, exist in a wide variety of systems, e.g.:
- Semantic web standards: rdf:type (instance of (P31)), rdfs:subClassOf (subclass of (P279))
- Thesauri: skos:broader. Has been refined to 3 varieties in ISO 25964: iso-thes:broaderGeneric (subclass of (P279)), iso-thes:broaderInstantial (instance of (P31)), iso-thes:broaderPartitive.
They are slowly gaining popularity on Wikidata. The main goal of the RfC is 1) to define a notation for describing a class of things, and 2) to clarify what bots and humans should expect if an <item> is an instance of a <class item>.
An example
We'll start with a small example of a well-defined class in Wikidata: books and editions.
The Book class
We (probably) all have (physical) books in our libraries. These books are prints of a creative work written by an author. The Book class item in Wikidata is book (Q571), and one of its instances is Carry Me Down (Q4795).
Classes and instances
Let's stop here. I'll propose that we use a textual notation of the Web Ontology Language -- commonly known as OWL. The notation is similar to OWL's Manchester syntax, which makes OWL easy for humans to read and write:
The book class
Class: Book
Individual Carry Me Down (Q4795):
    Types: Book
Explanations
Class
- keyword used to define a class. Anything that is displayed indented below it defines characteristics of this class. A class is a type of item that refers to a group of instances.
Individual
- keyword used to describe an individual instance, thing or object. It can be linked to the class using the instance of (P31) property or its subproperties.
Types
- keyword used indented below Individual to define a comma-separated list of classes which the item is an instance of. The item should be linked to each class using the instance of (P31) property.
Note In this notation, class and instance are not properties as in Wikidata; they are the basic concepts of the notation and of the classification, so they are expressed directly in the grammar rather than as properties.
Properties
Now we know that a particular literary work has an author. This can be expressed in Manchester syntax by saying that the author property applies to instances of the Book class, which is written:
Notation
Object property: author (P50)
    Domain: book (Q571)
Object property: title (P1476)
    Domain: book (Q571)
Explanations
Object Property
- keyword used to define characteristics of a property whose value is an item (Datatype property is used for properties whose value is a Wikidata type. See OWL documentation for more.)
Domain
- keyword used to define on which items this property should be used. These items should be instances of one of the classes listed after the keyword.
An instance in Manchester Syntax
We have defined a class, and we have said that the author property applies to instances of that class. We can then extend our previous model:
Individual Carry Me Down (Q4795) :
    Types: book (Q571)
    Facts: author (P50) M. J. Hyland (Q4731)
Facts
- keyword equivalent to Claims in Wikidata, followed by a list of property/value pairs.
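To make the Domain idea concrete, here is a minimal sketch of the kind of check a bot could run over the model above. It reads Domain as a constraint, as this RfC describes it ("these items should be instances of one of the classes listed"). The dictionaries and the function name `domain_violations` are assumptions made for the illustration, not an existing Wikidata API, and the second item carries a deliberately wrong claim so that the check has something to report.

```python
# Hypothetical in-memory model; the IDs are the ones used in the example above.
DOMAINS = {
    "P50": {"Q571"},    # author applies to instances of book
    "P1476": {"Q571"},  # title applies to instances of book
}

ITEMS = {
    "Q4795": {  # Carry Me Down
        "types": {"Q571"},
        "facts": [("P50", "Q4731")],  # author: M. J. Hyland
    },
    "Q4731": {  # M. J. Hyland, with no Types declared in this sketch
        "types": set(),
        "facts": [("P50", "Q4795")],  # deliberately wrong claim, for the demo
    },
}


def domain_violations(items, domains):
    """Return (item, property) pairs where the item is not an instance
    of any class listed in the property's Domain."""
    violations = []
    for item_id, item in items.items():
        for prop, _value in item["facts"]:
            allowed = domains.get(prop)
            # Properties without a declared Domain are not checked.
            if allowed is not None and not (item["types"] & allowed):
                violations.append((item_id, prop))
    return violations


print(domain_violations(ITEMS, DOMAINS))  # [('Q4731', 'P50')]
```

Note that this sketch only looks at directly declared Types; it does not follow subclass of (P279), which this RfC leaves for a later extension.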
Towards a class/instance relationship definition
These proposed textual notations use an implicit class/instance relationship definition. The main proposal is to link the properties expected on an item to the class of the item, based on an already existing language. This is also meant to introduce the notation to the Wikidata community and to get comments on whether or not it is a good start, which is why many things are left untreated here.
Browsing the Wikidata task force and project pages, we can see that there are already things that would fit, in pages such as Help:sources or Wikidata:Chemistry_task_force/Properties, but they are presented differently, and properties are not always linked to a class. A common language could help. It is also bot readable, so if we adopt it we could easily build bots that do generic consistency checks with respect to our notations (more to follow as we extend the notation) and that report claims which do not fit our current model. Then we could see what to do with these claims. It also seems pretty human readable to me.
One other advantage: discussions on property proposals are sometimes tedious; with this we could adopt a whole domain-specific model at once without voting on each and every property.
If there is only one thing to take from the RfC, it's the idea associated with the Domain keyword, which I have not actually seen made explicit in Wikidata very often.
Propositions of this RfC
- Adopt (in principle, there are other things to discuss) this partial definition of expected properties in a formalisation of class/instance relationships (even if you do not like the notation)
- using the proposed notation as a base to define, discuss and document classes, and to show example instances.
- This would have many advantages: unifying notations, easing the writing of consistency checks for bots, writing documentation pages on how we do things on Wikidata in a consistent and standardised way, building interfaces easily by proposing a model and building a JavaScript interface from it, being compatible with already existing tools ...
- later extend this system to adopt other aspects of the Manchester syntax.
A question currently opened in Wikidata (out of scope)
This point is worth mentioning but is supposed to be discussed in a subsequent RfC. Please just comment on the modeling and instance creation process. An ongoing discussion on Help talk:Basic membership properties concerns the nature of instances. Emw argues that in the Semantic Web world, an instance can only be a physical object, and classes are sets of physical objects. In our example this would imply that our Book is not a book, it is a class, and its instances would be the physical objects in libraries.
- But the text in a book has a single copyright so the text of a book can be an instance of (P31) a novel or a play, a textbook etc.
- Remember that the Wikipedia article is not about any individual physical copy of the book. It is about the text, the content, common to all the various physical books.
- There are a few articles which are about particular copies of a book, such as Giant Bible of Mainz (Q1885752), and these are instance of (P31) book. Filceolaire (talk) 15:51, 24 July 2013 (UTC)
- Yeah, it's a problem we have to address; we should not have this kind of ambiguity. TomT0m (talk) 16:19, 24 July 2013 (UTC)
- I don't think there is any ambiguity. A brief glance at any of the linked Wikipedia articles is enough to tell if a Wikidata item refers to an individual copy of a book or to the content of a book. I can conceive of a case where there is only one copy of a book, so the Wikipedia article is about both the physical book and its content, but even in that case I am sure it would be clear that the Wikidata item refers to both of these concepts. I doubt there would be ambiguity. Filceolaire (talk) 16:50, 24 July 2013 (UTC)
(comments moved to this relevant part)
- If there are thousands of copies of Carry Me Down (Q4795) on shelves all over the place doesn't that mean it is a subclass of book (rather than an individual book)? Should we rather say it is an individual novel? Filceolaire (talk) 22:12, 25 July 2013 (UTC)
- Carry Me Down (Q4795) is not an individual book. It is thousands of physical paper books. Those books have a printer, not an author. It is however an individual novel and the novel has an author. Filceolaire (talk) 21:53, 25 July 2013 (UTC)
- I took a real model for my example; this is how Help:sources currently says to model it in Wikidata. It's probably not perfect, but it's not what we should discuss for now :).
- Not perfect is an understatement. Altogether ambiguous would be better. An individual physical copy of a book on a shelf is an instance of the particular printing & distribution format, designated by a specific ISBN (different for paper, hardcover, etc.), which is in turn an instance of the particular edition (2nd, 3rd, etc.) designated by a separate LCCN and OCLC number, which in turn is a particular version (print, recording), with a standardized material-type designator, of a version (in a particular language) of a Work (with a standardized Uniform Title). These are all separate levels, and the degree to which they need to be specified varies. (For illustration, an example of a Work is Aristotle's Poetics. The version in Project Gutenberg is an English translation by Butcher, edition (as usual with them) unspecified, but probably 1895, in a transcription by An Anonymous Volunteer and David Widger. The actual text of the Gutenberg edition is yet another problem, because it is subject to continuous updating.) 108.27.99.125 21:04, 2 August 2013 (UTC)
- Take this example: The Tale of Peter Rabbit. The article discusses, or could potentially discuss, an instance of a creative work (specifically the original book written with the author's hand), subclasses of the original work (the derivative works in other languages, editions), and instances of the work (each printed copy created by the printer). Any printed instance of a book can become notable (e.g. one copy which was signed by Beatrix Potter, and sold for a lot of money). I think the root of the problem lies in basing a database around Wikipedia articles; in some cases it gets messy. For example, many school-related articles generally describe both a building and an organization. Should these not be at least 2 separate items? Danrok (talk) 15:52, 1 September 2013 (UTC)
Talks & votes
Discussions
- Would it be possible to rewrite the above using the existing Wikidata property names, e.g. instance of (P31) instead of "Type" etc.? Filceolaire (talk) 15:51, 24 July 2013 (UTC)
- Yep, we could, and have a translator do the work to transform into this formulation. I'll try to add definitions and mapping to Wikidata glossary, but not today. TomT0m (talk) 16:21, 24 July 2013 (UTC)
- Thank you TomT0m. I'm sorry if I seem negative above but I worry that this proposal is a bit divorced from the practicalities of how Wikidata works. Where there is a difference it may well be appropriate to change the Wikidata practice to better align with semantic web customs but these changes need to be spelled out.
- No problem, it's a work in progress we are building together (and it's not for nothing that I asked an "is it clear enough?" question /o\)
- I don't think we should modify Manchester syntax. There's an impedance between Manchester syntax and our in-house vocabulary, but the benefits of adhering to the standard Manchester syntax are greater than the costs of tailoring it to align with our in-house vocabulary. The benefit of using a standard is that it's the same across projects, and has tutorials, usage examples, solutions for common problems etc. available across the web. The syntax itself requires a degree of effort to learn, and adding a tiny bit more effort to map our vocabulary to the standard seems reasonable. Emw (talk) 01:43, 29 July 2013 (UTC)
- Personally I think the concept of 'class' will become important in stage 3 - queries - and it probably does make sense to codify what a class is now. However, you have not made the case for doing this in your writeup above. I hope I can help draft this. Filceolaire (talk) 22:02, 25 July 2013 (UTC)
- It's already important; it's an implicit part of what we are doing. We are defining classes informally all over the project, without always saying so explicitly, which tends to make retrieving the information possible for a human, but tedious. I propose to make this a little more explicit here. TomT0m (talk) 10:48, 26 July 2013 (UTC)
Comment (Please feel free to move this comment to any applicable section if another one fits better.) I feel obliged to comment on this, given that I have co-developed (1) the Wikidata datamodel, (2) the OWL ontology language, and (3) the main tool currently used to translate one into the other. ;-)
First, I must say that I do not fully understand the proposal. When you propose to "use" Manchester syntax (or something very similar), where do you expect this to be used? By users editing Wikidata in string-type property values? That seems to be a bit cumbersome and not very user-friendly. Frankly I don't think this has any chance of adoption (this seems to be the point made in some comments above already). Also from a technical perspective, it needs some work to do a proper syntax check on such free-text inputs, and of course one would want auto completion and similar features too; without all this working very well, any free-text input will lead to many errors/typos. So I am sceptical that this form of UI could work in Wikidata, even if we accept that Manchester Syntax is easy for ("normal") people to use.
Second, there is a deeper semantic question. I am happy with OWL-style "instance of". It is well understood and agrees with what most languages that support classification do (essentially it just means: "every class denotes a set of things; every entity denotes an element; an entity E is an instance of a class C, if the element that E denotes is contained in the set that C denotes"). I think most people would agree with this notion. Now OWL does not apply directly to Wikidata (since Wikidata encodes information in a form that is not the same as OWL; we can translate it but in more than one way), so one needs to be more clear what we mean here by applying OWL axioms on Wikidata. However, I am sure that this can be done, as long as the axioms we use are only the simple ones in this RfC.
Note that the original data model of Wikidata contains special statement types for instance of and subclass of (rather than leaving it to the community to create properties called like this). These would have been the Wikidata representation of what you suggest here, I think. To my regret, these features were dropped due to lack of resources for implementing them. The current solution where we use two properties to capture this information may work just as well (if stable), but we need to ask what it means for instance of/subclass of to have qualifiers. This is something that cannot happen when using OWL, and it is not immediate what this means (this would need another discussion to solve; I just point it out here). This is also a reason why the RDF export script does not export instance of/subclass of using special OWL constructs.
Finally, regarding the distinction of classes: I don't think that it is necessary to have a special way to mark things that are a class. In general, we can consider something to be a class if it is used like one (that is: as the value of a subclass/instance property). Even in OWL, we do not require classes to be distinguished from individuals. The semantic solution to this (avoiding Russell's paradox) is to say that every entity can be considered as a class that denotes a set (which may contain other elements) and also as an individual that denotes an element (that could be contained in some class); we do not say that this rather unspecified "element" is the set. So if I would say in OWL that "Class A is an instance of class A" this means "The element denoted by class A is contained in the set denoted by class A". In particular, I don't claim that class A denotes a set which contains itself. OWL 2 still requires classes to be declared to help parsing, but one can always tell from the context if an entity is used "as a class" or "as an individual", so this is not really essential. I think that a similar approach could work for Wikidata. Of course, it would still be useful to have services that display, say, a class hierarchy based on Wikidata's subclass of, but this does not require a special class declaration.
--Markus Krötzsch (talk) 18:20, 20 September 2013 (UTC)
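The set-based reading of "instance of" described in the comment above can be illustrated with a small sketch. It assumes a toy interpretation in which every entity name denotes both an element and a set; the names `ELEMENT`, `EXTENSION` and `is_instance` and the entities "A" and "B" are made up for the illustration and are not taken from OWL or Wikidata.

```python
# Hypothetical interpretation: every entity name denotes BOTH an element
# (its reading "as an individual") and a set (its reading "as a class").
ELEMENT = {"A": "elem_A", "B": "elem_B"}
EXTENSION = {"A": {"elem_A", "elem_B"}, "B": set()}


def is_instance(entity, cls):
    """True if the element denoted by `entity` lies in the set denoted by `cls`."""
    return ELEMENT[entity] in EXTENSION[cls]


print(is_instance("A", "A"))  # True: the element of A is in the set of A
print(is_instance("A", "B"))  # False: the set of B is empty
```

"Class A is an instance of class A" holds here without any set containing itself, because the element `elem_A` and the set `EXTENSION["A"]` are different things.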
- Thanks for your comment Markus. It's very helpful to have input from experts on things like this. Replying to a few of your points:
- "First, I must say that I do not fully understand the proposal. When you propose to "use" Manchester syntax (or something very similar), where do you expect this to be used?"
- Mostly in Talk pages and the 'Wikidata' namespace. The 'Propositions of this RFC' section outlines things in a bit more detail. I don't think anyone is proposing we scrap the current UI of the Q (item) namespace and require human end-users to navigate through Manchester syntax to read and write claims. The goal is to agree upon a format for discussing data models. In suggesting we strictly adhere to Manchester syntax and not come up with a home-brew Wikidata syntax, my goal is to reduce friction between Wikidata and people from the rest of the Semantic Web who might want to talk about our data models.
- "we need to ask what it means for instance of/subclass of to have qualifiers. This is something that cannot happen when using OWL..."
- I think instance of and subclass of having qualifiers means the same thing as other properties having qualifiers. 'instance of' and 'subclass of' are "properties" in Wikidata and, as stated in rdf:type and rdfs:subClassOf, in RDF/RDFS -- and thus OWL. So if it isn't possible to have qualifiers on instance of/subclass of in OWL, then how is it possible to have qualifiers on other properties?
- Reply: RDF, RDFS, and OWL do not have any notion of "qualifier". So if we say that it is "the same thing as other properties having qualifiers" we still have not explained what we mean by this. One can model qualifiers in RDF and OWL using reification (not necessarily using the RDF reification vocabulary), quads, or named graphs (there is discussion on this elsewhere). Our current approach uses a form of reification. This involves replacing the Wikidata property by several other properties of related URIs. However, there is only one "instance of", so it is not clear how this replacement would be compatible with this. --Markus Krötzsch (talk) 17:27, 8 October 2013 (UTC)
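A minimal sketch of what reifying a qualified statement can look like, to make the point above easier to follow. The suffixes `:statement`, `:value` and `:qualifier`, the node names and the placeholder identifiers ("ItemX", "ClassY", "Pq") are assumptions made for the illustration; they are not the URIs or the scheme actually used by the Wikidata RDF export.

```python
from itertools import count


def reify(subject, prop, value, qualifiers, ids):
    """Turn one qualified statement into plain triples via a statement node."""
    stmt = f"stmt{next(ids)}"
    triples = [
        (subject, f"{prop}:statement", stmt),  # item -> statement node
        (stmt, f"{prop}:value", value),        # statement node -> main value
    ]
    for q_prop, q_value in qualifiers:
        # Each qualifier hangs off the statement node, not off the item.
        triples.append((stmt, f"{q_prop}:qualifier", q_value))
    return triples


ids = count(1)
for triple in reify("ItemX", "P31", "ClassY", [("Pq", "2013")], ids):
    print(triple)
```

In this sketch the single property P31 is replaced by two derived properties, which is exactly where the difficulty arises: OWL has only one built-in "instance of" (rdf:type), with no derived variants to attach a statement node to.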
- "Finally, regarding the distinction of classes: I don't think that it is necessary to have a special way to mark things that are a class"
- I disagree. This seems to imply that Wikidata should delete 'subclass of' and rename 'instance of' to 'is a'. Without two different properties, how can it be determined whether an item at the bottom of a type hierarchy is an instance or a class?
- For example, consider hydrogen (Q556). Assume that item is not the value of any 'is a' claims, but it has an 'is a' chemical element (Q11344) claim. So hydrogen is an element. However, because hydrogen represents the type of many individual things (i.e., individual hydrogen atoms), it is a class, not an instance. By collapsing 'instance of' and 'subclass of' into 'is a', there's no way to specify that important fact about hydrogen (Q556). This use case applies to many items on Wikidata.
- No. An item can have two semantics. One view: hydrogen is a chemical element. It means that every atom is either hydrogen or not, i.e. it belongs to the set of this type of chemical element or not. In this view we must use instance of to classify a particular atom in the set of hydrogen atoms. Another view: hydrogen is a concept used by humans to classify atoms. In this view hydrogen is an instance of the concepts used by humans to classify atoms. But what is the set of hydrogen atoms to the set of all atoms? Certainly an instance of hydrogen is also an instance of atom, so hydrogen is also a subclass of atom ... The key point is that an instance of an instance of something has no (universal) meaning. TomT0m (talk) 09:46, 6 October 2013 (UTC)
- Reply: TomT0m is right. "Subclass of" and "instance of" describe different relationships, and we need to distinguish these. I was not suggesting to blur this, and indeed "is a" is notorious for causing this confusion (hence should be avoided). What I meant is that we do not need to use different entities for classes and individuals, since it is clear from the context what we mean. For a detailed discussion of this and related issues, I recommend Motik's 2007 paper On the Properties of Metamodeling in OWL. --Markus Krötzsch (talk) 17:27, 8 October 2013 (UTC)
- "Even in OWL, we do not require classes to be distinguished from individuals."
- To my understanding of the W3C recommendation's description of the three sublanguages of OWL, that's true in OWL Full but not OWL DL. This is actually the only example the recommendation gives for how OWL DL is different than OWL Full: it says that in OWL DL "while a class may be a subclass of many classes, a class cannot be an instance of another class". Since the recommendation implies that OWL Full is neither computationally complete nor decidable, it seems to me that metamodeling is not an option if we want Wikidata to be usable with semantic reasoning software.
- Of the freely licensed semantic reasoners listed in Semantic reasoner, do any support metamodeling?
- Reply: You are referring to an outdated version of the standard here. Meta-modelling capabilities have been improved in the current version of OWL, which is the recommendation since 2009. There is also a paragraph on this in the new features document. I think all major OWL reasoners support this, and many have even before this was a standard. --Markus Krötzsch (talk) 17:27, 8 October 2013 (UTC)
- Thanks for the link, I was looking for something like that :) TomT0m (talk) 18:46, 8 October 2013 (UTC)
- These are fundamental issues that have been brewing unresolved in Wikidata for at least the better part of a year. Greater clarity on them would be quite useful. Emw (talk) 22:25, 5 October 2013 (UTC)
- comments on syntax
- comments from Filceolaire moved here, with replies (the one on formulation was deleted):
- Can we change 'Individual' to 'Instance'? Can we change 'Type' to 'subclass'? This would align the vocabulary with what is used on Wikidata. I have expanded the definitions. Can you check these align with what you mean? Filceolaire (talk) 21:34, 25 July 2013 (UTC)
- Yes we can, I personally would not mind; I just kept the model close to Manchester Syntax for now.
- Can we change 'Object Property' to 'Item Property' so it aligns with the Wikidata vocabulary? Or else change our vocabulary so we refer to Wikidata pages as referring to 'objects' instead of 'items'? Filceolaire (talk) 21:53, 25 July 2013 (UTC)
- I don't think we should modify Manchester syntax. The benefit of using an effective standard like Manchester syntax is that the terminology of the syntax itself is the same across projects -- whether it be Wikidata or some other Semantic Web project. And for better or worse, I think it's very unlikely that the community will agree to change 'item' references to say 'object'. This implies some impedance between Manchester syntax and our vocabulary, but I think it would be better to educate contributors about Manchester syntax than to tailor the syntax itself to align with our in-house vocabulary. Emw (talk) 01:33, 29 July 2013 (UTC)
- I don't think this is a real problem; if this is just a keyword substitution it's trivial, and as Wikidata is an important project, other Semantic Web projects could even implement Wikidata syntax. There is one thing I don't know how to handle yet in examples: qualifiers. It seems impossible to adopt a syntax in Wikidata as is if it's not possible to easily take qualifiers into account, so we might have to make our own extension. TomT0m (talk) 16:38, 29 July 2013 (UTC)
- I think it would be a problem. I look at it this way: if contributors are interested in data modeling, then they will eventually need to get acquainted with the syntaxes used to express RDF and OWL. Wikidata developers have committed to exporting data in RDF/XML, which is a W3C standard. Such built-in support for OWL isn't likely in the near future, but nevertheless our data modeling efforts would be best served by being expressible as OWL. Basing our syntax strictly off Manchester syntax would allow us to export our OWL data models as RDF/XML files.
- Using a tailor-made Wikidata syntax different from Manchester syntax would increase friction between Wikidata and the Semantic Web on multiple levels. First, it would require an in-house program that converts between our home-brew Wikidata syntax and the standard formats. Second, it would make it more difficult for Wikidata contributors to find relevant data modeling examples, existing bodies of Q&A, and other resources from the wider Semantic Web, and make it more difficult for contributors from the wider Semantic Web (which uses Manchester syntax) to determine what our data models mean in a language the external world understands.
- The bigger problem, though, is statements like "other Semantic Web projects could even implement Wikidata syntax". Wikidata is an important project but there are plenty of other important Semantic Web projects, for example ChEBI, the NCI Thesaurus, UMBEL, SUMO, etc. They all communicate with standard OWL formats. Tailoring the Manchester syntax itself to our in-house vocabulary goes against the grain of the Semantic Web, and would make us less interoperable with the outside world.
- So I don't think trivial word substitutions like 'Individual' -> 'Instance', 'Type' -> 'Subclass', 'Object property' -> 'Instance property' would be a good idea. That said, I think qualifiers might be something that we need to consider extending Manchester syntax for -- but only if there's no way to support qualifiers with standard Manchester syntax. I'd be interested to know how qualifiers will be exported to RDF/XML -- that should indicate how to express qualifiers in Manchester syntax. Emw (talk) 03:18, 31 July 2013 (UTC)
- You're talking about Wikidata users versus Semantic Web professionals. I think it's best to have a model that maps better onto Wikidata concepts to help Wikidata users, and to let Semantic Web pros manage the complexity of the discrepancy between the two models. I think you did not realize that syntax is probably the least of their concerns, standardized or not. TomT0m (talk) 08:58, 31 July 2013 (UTC)
- @Emw to answer your question about how the Wikidata model is translated into Semantic Web standards, see the ontology of the RDF export. To express qualification, statements are reified (that is, there is an item for each statement) and triples are created using this reified statement to qualify or source it. I don't think it's a good idea to follow this path as it's quite different from the Wikidata model. TomT0m (talk) 13:02, 10 August 2013 (UTC)
- comments on the purposes and objectives
- Why do we need a formal language for defining classes when, till now, Wikidata has managed almost without a 'class' concept?
- Until now, Wikidata has done consistency checks of the form 'if there is this property, then there should also be this property', and with constraints on what an item value of a property should contain. This is ad hoc type checking and kind of weak; I don't know whether, as of now, we can say something like 'the value item should be of that type'. Another problem, in my mind, is that the information is distributed across constraints on the discussion pages of properties, and is tedious to check. A formal model could compress this information for the project and for a lot of domain-specific models, be consistently presented, and be a lot faster for a human to grasp. One other point: tools from the Semantic Web world already exist to do these things, and getting closer to them could open the door to writing consistency-checking bots far more advanced than what we have now. And last: I have presented only a small part of Manchester syntax. There are other good things and knowledge encoded in this syntax and the associated tools that we will not have to discover the hard way, by making our own mistakes, if we take it as a support. TomT0m (talk) 10:10, 26 July 2013 (UTC)
- One other thing: it has been asked for. When I talked about Help:modeling on the Chemistry project, where they were building a model of biological chemistry, and asked if someone could move it to Help:modeling once it was done, someone replied 'Good idea, is there some kind of formal language?'. I had thought before that I would rely on examples, as they would be easier to understand as a start. Emw has been an advocate of this from the start. There is also a lot of confusion in discussions; there are comments in project chat like 'This sounds complicated but interesting, where can I read about that?'. An unambiguous and consistent language for discussion should help, together with a detailed and good Help page. TomT0m (talk) 10:29, 26 July 2013 (UTC)
- I strongly agree that classes are a useful thing to have. The observation that we do not (strongly) use a certain feature (yet) is hardly an argument against it, given that Wikidata is still in its early days. On the other hand, whoever posted this is right that we need to provide clear motives for all extensions. For me, the main reason for a class hierarchy is to reduce redundancy in the data (and thus errors and maintenance effort). "Instance of" has clearly a very important role since the decision to remove "main type (GND)". "Subclass of" can be used to avoid having to state "instance of" for all applicable classes if most of them are just more general than the ones that we already have (e.g., if we know that something is a "solar planet", we do not need to state that it is a "planet"). However, while these basic features are useful (and well used already), it needs more motivation indeed why we need a special input method for this. --Markus Krötzsch (talk) 18:29, 20 September 2013 (UTC)[reply]
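The redundancy argument above (knowing something is a "solar planet" makes it unnecessary to state that it is a "planet") can be illustrated with a minimal Python sketch. This is not how Wikidata or any existing bot implements it; the dictionaries, the item labels and the function name `all_classes` are assumptions made up for illustration.

```python
# Hypothetical toy data: labels are illustrative, not actual Wikidata statements.
SUBCLASS_OF = {
    "solar planet": {"planet"},
    "planet": {"astronomical object"},
}
INSTANCE_OF = {
    "Earth": {"solar planet"},
}


def all_classes(item):
    """Return every class `item` belongs to, following 'subclass of' transitively."""
    seen = set()
    stack = list(INSTANCE_OF.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue  # already visited; also guards against cycles
        seen.add(cls)
        stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


print(sorted(all_classes("Earth")))
```

Only one "instance of" statement is stored for the item; the more general classes are derived from the "subclass of" chain instead of being stated by hand.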
Confusing
I find this distinction (or lack thereof) between instance of (P31) and subclass of (P279) really confusing. Let me point out a few examples:
- The Catcher in the Rye (Q183883) is instance of (P31) book (Q571). In this case it is the creative work of J. D. Salinger (Q79904) that is the instance, not the copy that I have on my shelf.
- P-15 Termit (Q177218) on the other hand is subclass of (P279) anti-ship missile (Q643532). In this case the creative work of MKB Raduga (Q2382556) is not an instance. Instead the specimen that struck and sank HMS Zealous (Q1032114) is an instance even though it is no more distinguishable from its peers than my copy of Catcher in the Rye.
I can see no reason whatsoever that one is an instance and the other one is a subclass. Both of them are produced in numerous indistinguishable copies. /ℇsquilo 09:15, 11 March 2014 (UTC)
FRBR and Metaobject Protocol
The library community has analyzed this for Creative Works 10-15 years ago in FRBR. They distinguish 4 levels: Work, Expression, Manifestation, Item (WEMI).
- Competing ontologies (eg BibFrame, schema.org "BibExtend" classes) collapse some of the levels
- Each of the levels has its specific properties, in namespaces rdaw, rdae, rdam, rdai (which can be looked up on prefix.cc) and rdau, which is an unattached namespace (i.e. no domains are declared)
- The props applicable to several levels (eg title, creator) are distinguished by name, eg uniformTitle vs title; and xxxOfManifestation
These are specific solutions in the specific Creative Works domain. We have many more cases in Wikipedia, eg
- see Wikidata:Properties_for_deletion#number_of_platforms_.28P1103.29 and the troubles of using a class as a value <Copenhagen Central Station (Q1495052) has part(s) (P527) railway platform (Q325358)>
- @Esquilo: gives a very good example with Books and Rockets. But I don't understand by which criteria Catcher in the Rye (the Work) is an instance of Book, while the Termit rocket (its design) is not an instance of Rocket!
So we'd need to make some very general yet clear decisions in Wikidata. I think we'd need to go the class/metaclass/metametaclass/... way (see en:The_Art_of_the_Metaobject_Protocol), because presumably, anything can be both a class and an individual. --Vladimir Alexiev (talk) 16:31, 19 December 2014 (UTC)
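The idea that one item can be both a class and an individual can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triple list is toy data, and "missile model" and "specimen #1" are hypothetical labels that do not come from the discussion or from Wikidata.

```python
# Toy statements as (subject, property, value) triples.
# "missile model" and "specimen #1" are hypothetical labels for illustration only.
STATEMENTS = [
    ("P-15 Termit", "subclass of", "anti-ship missile"),
    ("P-15 Termit", "instance of", "missile model"),
    ("specimen #1", "instance of", "P-15 Termit"),
]


def is_class(item):
    """An item acts as a class if it has a superclass, or is the value
    of an 'instance of' or 'subclass of' statement."""
    return any(
        (s == item and p == "subclass of") or v == item
        for s, p, v in STATEMENTS
    )


def is_individual(item):
    """An item acts as an individual if it is an instance of something."""
    return any(s == item and p == "instance of" for s, p, v in STATEMENTS)


for item in ("P-15 Termit", "specimen #1"):
    print(item, is_class(item), is_individual(item))
```

In this toy model the missile design plays both roles at once (a class for its specimens, an individual of a metaclass), while a single specimen is only an individual.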
Question PS: the class hierarchy is a horrible mess because unlike property creation, there's no curation of instance of (P31) and subclass of (P279) whatsoever. Would this be the page to discuss this, or a separate RFC? How do I start one? Vladimir Alexiev (talk) 16:31, 19 December 2014 (UTC)
votes
- Question 1 Is the previous text clear enough ?
- No. You use a whole new set of properties - Domain, Type, Facts - which seem to sort of relate to things in Wikidata. Could you rewrite using the terms in the Wikidata glossary and the existing Wikidata properties? Filceolaire (talk) 16:00, 24 July 2013 (UTC)
- The terms used in the examples are not new properties -- they're vocabulary from Manchester syntax, a human-friendly way to express formal OWL concepts. Manchester syntax is probably our best bet for communicating data models between us humans, since it's the most easily human readable and writable OWL format used by the Semantic Web community. This was discussed a few weeks ago on Project chat: the relevant thread is archived at Wikidata:Project_chat/Archive/2013/07#Turtle_vs._Manchester_syntax, and some background discussion is above that. Emw (talk) 01:06, 29 July 2013 (UTC)
- No, I did not understand it. Byrial (talk) 18:10, 24 July 2013 (UTC)
- Question 2 If yes, do you agree with the definitions ?
- What definitions? What is the difference between Class, Domain, Type and Instance of? These all seem (to me) to be the same thing. Filceolaire (talk) 16:00, 24 July 2013 (UTC)
- Question 3 Do you agree with the usage of the defined and used Manchester syntax notation variant to discuss things with the proposed definitions ? If not, why (too complicated, unclear, confusing, I would prefer something closer to what we use informally today as what is used in Help:Sources or tables, we already do that ...)
- The descriptions seem to be describing another project. The terms you use seem to be different from the terms used on the Wikidata project. Please rewrite or at least provide a translation into the corresponding Wikidata terms. Filceolaire (talk) 16:00, 24 July 2013 (UTC)