Wikidata:Requests for comment/Migrating away from GND main type
An editor has requested the community to provide input on "Migrating away from GND main type" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- I feel this RfC has gone on long enough to gather consensus over a majority of the move and it has gathered such consensus on the controversial part of the move. A few points are worth mentioning:
- All statements with term can simply be deleted and replaced with instance of where appropriate.
- Human beings are to be classified with instance of:Human (Q5). Fictional characters such as Han Solo or Mickey Mouse (used as examples below) should now be classified as fictional characters using the instance of property. Similar with Gods, they should now be classified as deity as opposed to the current use of person.
- No consensus has gathered over all other existing uses of GND and their transfer. Existing uses should be able to be migrated over to instance of using their current P107 values.
- John F. Lewis (talk) 21:30, 1 October 2013 (UTC)[reply]
- de: Es wurde entschieden die Eigenschaft P107 (P107) nicht mehr zu verwenden. Diese Diskussion soll dabei helfen einen Teil der Aussagen auf andere Attribute zu übertragen.
- en: It was decided by the Wikidata community that the usage of P107 (P107) should be abandoned. This discussion aims to transfer some of the statements to other properties.
- nl: De Wikidata-gemeenschap heeft besloten dat gebruik van P107 (P107) moet worden beëindigd. De onderstaande discussie richt zich op de werkwijze van de overdracht van de beweringen naar andere eigenschappen.
Contents
- 1 Transfer checklist
- 2 Order of transfer
- 3 Types
- 4 Possible solution
- 5 Is this RfC valid?
- 6 Visualization of subclass of (P279) relationship
- 7 High classification
- 8 What are the specifications of an ontology for wikidata ?
- 9 Available ontologies
- Inform all language project chats about this discussion Not done
- Discuss and propose which statements should be transferred Not done
- Inform all language project chats about the proposals Not done
- Vote on the proposed statements to be transferred Not done
- Bot transfers the approved statements. All other statements are deleted Not done
- Update the database constraints on pages in the «Property talk:» namespace for the constraint violation reports Not done
Checklist discussion
- I think a check list will be a good thing for such a large task. --Tobias1984 (talk) 14:28, 18 August 2013 (UTC)[reply]
- In the Occitan Wikipedia we have an automatic infobox that uses property P107 in 37063 articles. Please notify us about the proposals regularly.
Boulaur (talk) 16:18, 15 September 2013 (UTC)[reply]
- For items which use either subclass or instance of, completely remove GND main type.
- For items which use neither subclass nor instance of, but which do use a specialized type property, completely remove GND main type.
- Other items to be discussed below.
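The order above can be read as a small decision rule. The following is a minimal sketch only, assuming each item is represented by the set of property IDs it uses; the function name `gnd_action` and the shortened `SPECIALIZED` set are hypothetical and purely illustrative, not the code of any actual bot.

```python
# Hypothetical sketch of the transfer order; not actual bot code.
GENERIC_TYPE = {"P31", "P279"}                   # instance of, subclass of
SPECIALIZED = {"P106", "P171", "P289", "P105"}   # illustrative subset of the specialized type properties


def gnd_action(properties: set[str]) -> str:
    """Return what to do with an item's GND main type (P107) statement."""
    if "P107" not in properties:
        return "nothing"
    if properties & GENERIC_TYPE:
        return "remove"      # item already uses instance of / subclass of
    if properties & SPECIALIZED:
        return "remove"      # item uses a specialized type property
    return "discuss"         # left for the per-type discussion
```

An item that carries only P107 and unrelated properties falls through to "discuss", which corresponds to the third bullet.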
Order discussion
- We need to collect each of the specialized type properties. --Izno (talk) 17:27, 18 August 2013 (UTC)[reply]
- Think I got most of them. Others are obliged to add ones I missed. --Izno (talk) 01:05, 20 August 2013 (UTC)[reply]
- Question I don't understand, the list below is not a list of types, it's a list of properties (which most probably implies that an item with one of those properties has some kind of type, is this what you meant ?)... TomT0m (talk) 14:11, 22 August 2013 (UTC)[reply]
- Yes, that would be a correct interpretation. Amended. :) --Izno (talk) 16:03, 22 August 2013 (UTC)[reply]
List of specialized type properties
- occupation (P106) - by technicality
- parent taxon (P171)
- vessel class (P289)
- P173 (P173)
- P273 (P273)
- P75 (P75)
- P76 (P76)
- P77 (P77)
- P70 (P70)
- P71 (P71)
- P74 (P74)
- P89 (P89)
- taxon rank (P105)
- P132 (P132)
- P168 (P168)
- P202 (P202)
- P60 (P60)
Relevant RFC
Whether we should use many domain-specific type properties listed above or only a few generic type properties (instance of (P31) and subclass of (P279)) is under discussion at the Many or few type properties RFC. Input there is needed! Emw (talk) 02:34, 10 September 2013 (UTC)[reply]
value | use | problem | proposal | quantity | status |
---|---|---|---|---|---|
person (Q215627) | identify people (useful for various things) | conflates people, groups of people, and minimally fictional characters | use instance of (P31). If something is needed for groups of people, we can probably create "instance of group of people" or the proposed "instances of" property. We might also consider creating a new item "human person", because "person" carries with it the intent (at least in English) that a person may not necessarily be human. | 1,270,164 | Not done |
organization (Q43229) | might call for some pre-defined properties like "creation date" or "headed by" | these properties should also be used for administrative units that are geographical feature (Q618123) | | 82,813 | Not done |
event (Q1656682) | identify items that should use organizer (P664) | unintuitive English label (fixable) | could replicate the "use P664" constraint using instance + subclass chain. | 3,690 | Not done |
work (Q386724) | identify items that can be used as sources ? | does not differentiate works from editions, making it hard to use with the recommended source format | This could use instance of (P31) and subclass of (P279). | 303,564 | Not done |
term (Q1969448) | no real use, as it covers essentially anything that does not belong elsewhere | | Should probably just be deleted | 257,592 | Not done |
geographical feature (Q618123) | identify geolocalisable items ? | Many other things are localizable. It mixes administrative units and galaxies | | 1,701,341 | Not done |
Q11651459 | identify disambiguation items | Uses the obscurely labelled Q11651459, seems to stretch the original GND meaning | use instance of (P31) with Wikimedia disambiguation page (Q4167410) | 610,697 | Not done |
Some initial thoughts by User:Zolo from Help talk:High-level classification, with some amendments by User:Izno.
Person
Person discussion
- I think we should, in whatever things we do to replace the use of "person", distinguish between a "human person" and a "person". The former is a subclass of the two items human (Q5) and person (Q215627). --Izno (talk) 17:31, 18 August 2013 (UTC)[reply]
- So you're basically suggesting to use instance of (P31) => human (Q5) when we have P107 (P107) => person (Q215627)? (Because I don't feel I got it right) --Sannita - not just another it.wiki sysop 13:32, 19 August 2013 (UTC)[reply]
- First we need to find as many non-human persons and replace P107 (P107)=person (Q215627) on these pages with instance of (P31)=<whatever they are an instance of>. Once that is done we can send a bot round to convert the rest. Anyone got an idea how we can find the non-humans?
- Alternatively we can just delete all the P107 (P107) properties and start from scratch creating 'instance of (P31)' claims based on Categories. Filceolaire (talk) 20:33, 19 August 2013 (UTC)[reply]
- I would imagine the bot run for this would rely on categories ("Living people" and "Dead people"?) to remove the use of P107 and switch it to the new item I'm proposing. The rest we can have the bot spit out to a maintenance list of some sort, for those cases where there aren't already a P31, P279, or a specialist type claim already. I would imagine that list to be rather short, but maybe I'm dreaming. :) If it's a long list, we might find commonalities in the remaining items as well that we could group up also by category. --Izno (talk) 22:27, 19 August 2013 (UTC)[reply]
P31 value for things like Coco Chanel
- Sannita: No, my proposal would be to create a new item called "human person". We would use that as an instance of (P31) claim for all human people currently. The new item would be a subclass of (P279) person (Q215627) and human (Q5). --Izno (talk) 22:27, 19 August 2013 (UTC)[reply]
- I can't see a difference between "human" and "human person" except that 'human person' doesn't have any sitelinks. Personally I would put 'instance of' 'human' for human people and make 'human' a subclass of 'person' and part of family 'hominidae'. Filceolaire (talk) 22:52, 20 August 2013 (UTC)[reply]
The definition of a person, at least in the en.wikipedia article, is "a being, such as a human, that has certain capacities or attributes constituting personhood". It does not follow from this that a human is always a person. In fact, I would say that humans are not always people, see e.g. fetuses (depending on who I want to piss off :). It also makes the hierarchy clean for a high level concept, which I would expect that "person" is?
That aside, it's (definitely) also a subclass of "homo" (or has the parent taxon homo; however you want to phrase it). --Izno (talk) 01:26, 21 August 2013 (UTC)[reply]
- Are you sure it's not homo that would be a subclass of person ? TomT0m (talk) 16:46, 24 August 2013 (UTC)[reply]
- For the reasons that I gave already, no. (Or are you replying to Filceolaire?) --Izno (talk) 16:57, 24 August 2013 (UTC)[reply]
- I'm somewhat wary of classifying all humans as "human person". On a superficial level, "human person" is not the conventional name for that term -- natural person (Q154954) is. On a deeper level, since the definition of person (Q215627) is so ambiguous, I think it would be better to use a more precisely defined concept, e.g. simply human (Q5). Yes, there are ambiguities in precisely what defines "human" (see species problem), but I think ambiguous instances of human are far rarer than ambiguous instances of person. Emw (talk) 00:32, 5 September 2013 (UTC)[reply]
- Any reply to the above concerns? Whether we claim instance of (P31) Q14870023 or instance of (P31) human (Q5) seems independent of how we classify fictional entities. It seems 1 editor (Izno) prefers the former, while at least 3 editors (Filceolaire, Danrok and me) prefer the latter. Emw (talk) 01:34, 10 September 2013 (UTC)[reply]
De-indent: I obviously prefer 'human person', with a second preference to 'human', and absolutely not to 'natural person' or 'person' (alone) per the unresolved discussion on fiction below. On an aside, I'm not sure Danrok was commenting on my thought of human person or not. *shrug*
However, assume we go with 'human' as opposed to 'human person'. Is human a subclass of person then? Otherwise? That question went unanswered when I posed it. When so commonly we call other humans 'people', I would assume that it would make a lot of people (*cough*) happy to know that humans are people as well. *shrug* --Izno (talk) 23:56, 10 September 2013 (UTC)[reply]
- Human can only be a subclass of person if all humans are persons. Some context:
- Personhood is the status of being a person. Defining personhood is a controversial topic in philosophy and law, and is closely tied to legal and political concepts of citizenship, equality, and liberty. ... Personhood continues to be a topic of international debate, and has been questioned during the abolition of slavery and the fight for women's rights, in debates about abortion, fetal rights and reproductive rights...In most societies today, living adult humans are usually considered persons...The category may exclude some human entities in prenatal development, and those with extreme mental impairment.—From the English Wikipedia on person
- Claiming instance of (P31) Q14870023 implies that not all humans are persons. It implies there are valid claims instance of (P31) Q14896454. It's been said that perhaps fetuses are not persons, but I'm not aware of any Wikidata items about a particular fetus. But how about human fetus -- does subclass of (P279) Q14896454 apply? Is instance of (P31) Q14896454 valid for Terri Schiavo, who became notable while she was in a state of extreme mental impairment? Are particular disenfranchised humans instance of (P31) Q14896454? What are specific examples of such subjects that fulfill the condition instance of (P31) human (Q5) but not instance of (P31) person (Q215627)?
- Using instance of (P31) human (Q5) for P107 'person' claims about items like Terri Schiavo (Q14897290), Ronald Reagan (Q9960), Rabindranath Tagore (Q7241) and Coco Chanel (Q45661) would insulate us from such philosophical troubles. It avoids making a statement about whether humans are a subclass of person. Conveniently, it also makes it easy to say human (Q5) subclass of (P279) person (Q215627), which I and I suspect most others would support. Emw (talk) 03:19, 11 September 2013 (UTC)[reply]
- Typically archaeological remains from tens of thousands of years ago are considered human without being persons, so all fossils of early humans fit the bill. 130.195.179.107 03:12, 12 September 2013 (UTC)[reply]
- I'm not aware of any such archeological remains. Can you provide an example? How about humans merely thousands of years ago? Is Mungo Man a person? Otzi? Tutankhamun? What makes a 10,000+ year old human not a person? Human === Homo sapiens per Q5. I would be surprised if what makes a 10,000+ archeological specimen not an instance of Homo sapiens does not also make it not an instance of 'person'. In that case, human (Q5) subclass of (P279) person (Q215627) holds. Emw (talk) 03:53, 12 September 2013 (UTC)[reply]
- There's a list of fossils at w:List of human evolution fossils and the cats w:Category:Human remains (archaeological), w:Category:Specific fossil specimens and w:Category:Hominin fossils. Those with infoboxes appear to use taxobox or fossil variants rather than Infobox person. The underlying issue is that the definition of a species only applies to individuals alive at a particular moment in time and is not stable across speciation. 130.195.179.40 23:00, 12 September 2013 (UTC)[reply]
- Where in that content is any assertion similar to "this specimen was Homo sapiens, but not a person"? Please answer that. Conclusions drawn from the names of infoboxes seem prima facie inadequate here. The statement "the definition of a species only applies to individuals alive at a particular moment in time" strikes me as unhelpfully vague or wrong. What do you mean -- that the intrinsic definition of human varies for subjects alive 10,000+ years ago and those alive today? I'm also not quite sure what you mean by "(the definition of species) is not stable across speciation". Please clarify. I'm familiar with the concept of speciation.
- I feel like an important point is being missed here. As I mentioned before, the species problem means there is some ambiguity in definition of species, though this doesn't affect humans much. That said -- and this is the important part -- the definition of person (Q215627) is immensely more ambiguous than the definition of human (Q5). The definition of person (Q215627) is a matter of philosophy and law, and is a topic of significant controversy in those domains. The definition of human (Q5), on the other hand, is a matter of science and enjoys effective consensus. The definition of 'human' is not nebulous and politicized; the definition of 'person' very much is.
- As I've said, using instance of (P31) human (Q5) for items like Coco Chanel (Q45661) doesn't imply human (Q5) subclass of (P279) person (Q215627). The latter claim is independent of the former, though the latter claim is probably valid. On the other hand instance of (P31) Q14870023 implies there are valid subjects for the claim instance of (P31) Q14896454, which is obviously problematic. Why do we need to couple our classifications to the controversial, ill-defined definition of 'person'? Let's avoid the philosophical briar patch that leads to and use the uncontroversial (but still very useful) claim instance of (P31) human (Q5) as our initial P107 'person' mapping for Coco Chanel et al. Emw (talk) 00:37, 13 September 2013 (UTC)[reply]
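The independence of the two claims can be illustrated with a toy type graph. This is a sketch only: the dictionaries and the helper `is_a` are hypothetical names, and the data is hand-written rather than read from Wikidata. The subclass edge from human (Q5) to person (Q215627) is the optional, separate claim discussed above.

```python
# Toy model: P31 (instance of) and P279 (subclass of) as plain dictionaries.
INSTANCE_OF = {"Q45661": {"Q5"}}        # Coco Chanel -> human
SUBCLASS_OF: dict[str, set[str]] = {}   # no subclass claims yet


def is_a(item: str, cls: str) -> bool:
    """True if item is an instance of cls directly or via a subclass chain."""
    todo, seen = list(INSTANCE_OF.get(item, ())), set()
    while todo:
        current = todo.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(SUBCLASS_OF.get(current, ()))
    return False


print(is_a("Q45661", "Q215627"))   # False: nothing says human is a person
SUBCLASS_OF["Q5"] = {"Q215627"}    # add the claim: human subclass of person
print(is_a("Q45661", "Q215627"))   # True: now inferred through the chain
```

In this model the item-level claim never has to change; whether Coco Chanel counts as a person depends only on the single class-level statement.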
Vote: human, person, human person, or?
So we need to choose which item we use for people. It was sort of discussed above, but I think we should focus the discussion more clearly in a separate section.
- human (Q5) (alias: Homo sapiens)
- Sounds like the most factual and neutral item to me. The only potential reason I see against it is that some people (languages? cultures?) may find it weird or unpleasant to see humans classified by a taxonomic term. Maybe we should do an informal cross-language survey first, so that we do not have to fix everything afterwards (like we had for male/female in P:P21) -Zolo (talk) 07:14, 13 September 2013 (UTC)[reply]
- I would definitely check to see what the labels are for the other languages. --Izno (talk) 16:30, 13 September 2013 (UTC)[reply]
- Note that Q5 is labeled 'human', not 'Homo sapiens'. (The latter is an alias.) The item is certainly concerned with the subject as a taxonomic entity, but the label 'human' is not nearly as formal as 'Homo sapiens'. 'Human' -- just 'human' -- captures in common language that more neutral and uncontroversial (if not also plainly more useful) concept than any label containing the word 'person'. Emw (talk) 04:03, 17 September 2013 (UTC)[reply]
- Only in English. That's the concern he's voicing. --Izno (talk) 00:09, 18 September 2013 (UTC)[reply]
- I think it's plausible that opinions vary on the "weirdness" of classifying all items like Coco Chanel as instance of (P31) human (Q5) as much among English speakers as they do between English speakers and speakers of other languages. In fact, I would be surprised if most people -- whatever their language -- didn't consider it a bit weird to directly classify Coco Chanel as a human instead of a person. In languages that make the distinction, "Coco Chanel is a person" is probably a more colloquial way to express the concept captured by the more precise statement "Coco Chanel is a human".
- However, P31 is for specifying the class of an instance in a hierarchy of all human knowledge. The idea is that we can use P31 to infer properties about an instance by examining the properties of its class. Classes from folk taxonomies -- like the class indicated by colloquial usages of 'person' -- are useful only in an informal, very domain-specific scope. That makes vernacular classes like 'person' bad fits as values for a property that can be used for generic type inference. What are the properties of person (Q215627)? That item currently has mappings to other classification/categorization systems and says 'person' is a subclass of 'subject' -- none of that directly tells us much useful information about 'person'. Nailing more specific properties down simultaneously makes 'person' better defined and better suited as a P31 value on the one hand, while on the other hand making it more "weird" and philosophically vexing than tacitly understood.
- As I stress in the Coco Chanel discussion above, the properties that define person (Q215627) are very nebulous and philosophically thorny, while the properties that define human (Q5) enjoy much more consensus and much less thorniness. That holds in English, and likely in most other languages. I just examined non-English articles on human sitelinked for Arabic, Catalan, Chinese, Farsi, French, German, Hausa, Hebrew, Italian, Japanese, Portuguese, Russian, Somali, Spanish, Swahili and Vietnamese, and except Hausa and Somali, they all have an infobox that represents taxonomic data for Homo sapiens. (Somali seems to discuss "human" as a taxonomic entity, though I can't make out fine detail.) The subjects of English and non-English Wikipedia articles for person, however, seem much less ontologized. I saw no infoboxes, some 'person' articles were merely disambiguation pages, etc. I suspect the 'person' articles that do exist are similar to the English Wikipedia article, where the gist is that person is a philosophical and legal concept about which there is little agreement on a definition.
- If it's true that 'human' is so much more precisely and uncontroversially defined than 'person' in not only English but also other languages -- as I think there is increasing evidence to indicate -- then I think the arguments remain strong against directly tying 'person' -- or 'human person', or any other class that has a strong, direct tie to 'person' -- to our P31 values for all items like Coco Chanel. Emw (talk) 12:12, 18 September 2013 (UTC)[reply]
- Second choice. --Izno (talk) 16:30, 13 September 2013 (UTC)[reply]
- Support per Coco Chanel discussion. Emw (talk) 23:48, 13 September 2013 (UTC)[reply]
- Support This one makes the most sense I think. Ajraddatz (Talk) 18:22, 23 September 2013 (UTC)[reply]
- person (Q215627)
- Oppose Too vague and value-loaded to me. That sounds more like a philosophical concept. --Zolo (talk) 07:14, 13 September 2013 (UTC)[reply]
- Oppose, also, primarily for vagueness. --Izno (talk) 16:30, 13 September 2013 (UTC)[reply]
- Oppose per Coco Chanel discussion. Emw (talk) 23:48, 13 September 2013 (UTC)[reply]
- Oppose I've been using this one, but thinking on it Q5 human would probably make more sense. Ajraddatz (Talk) 18:22, 23 September 2013 (UTC)[reply]
- Q14870023
- First choice. --Izno (talk) 16:30, 13 September 2013 (UTC)[reply]
- Oppose per Coco Chanel discussion. Emw (talk) 23:48, 13 September 2013 (UTC)[reply]
Fictional entities
I would put at the top "being" instead of person. Then we can split into person, animal and vegetal...
- being
  - person
    - ...
    - ...
  - animal
  - vegetal
Then we have to classify according to fictional/non-fictional, but can we put Cerberus (Q83496) as fictional person or fictional animal ? Same question for a werewolf (Q9410) ? And god ? Can we classify it as fictional or not ? Snipre (talk) 14:01, 24 August 2013 (UTC)[reply]
- Animal and vegetal?
- As for fictional, we've had that discussion but it didn't go anywhere. The devs have said "no binary properties", give or take. And we don't want the class system to explode trying to classify every little thing.
- Werewolves: We can classify this as both things, if we have a "fictional animal" (we don't). We probably shouldn't have "fictional animal". They are still fictional persons (rather, fictional character).
- Gods can be described as fictional characters, sure. --Izno (talk) 16:39, 24 August 2013 (UTC)[reply]
- For fictional things, the current practice for fictional characters is to classify them as instances of fictional character. We can create several classes like this one, perhaps together with a property "real world equivalent" to link these class items to the real-world classes. – The preceding unsigned comment was added by TomT0m (talk • contribs).
- We only do so because that already exists as a Wikipedia item. We should really strongly avoid making more such items because it will cause our subclasses to explode unnecessarily. --Izno (talk) 16:45, 24 August 2013 (UTC)[reply]
- Well I do so because it seems to me to be the best way to indicate that a character is fictional. I can see no reason not to add items for classes such as 'fictional places', 'fictional books' etc. We shouldn't let our subclasses explode unnecessarily. We shouldn't stop them increasing where necessary. Filceolaire (talk) 14:51, 26 August 2013 (UTC)[reply]
It will not explode; it will grow by far less than double, as the concepts of fictional worlds worth modeling are far rarer than the concepts of the real world. We already have a lot of classes. Beyond fictional characters, fictional classes of characters, fictional spaceships and, more generally, fictional objects or fictional organizations, there won't be a lot of other classes. They can even be in a separate tree rooted in fictional universe, which would make a clear separation. TomT0m (talk) 15:15, 26 August 2013 (UTC)[reply]
- I am thinking that every living and non-living human should be instance of (P31) = human, the point being that they should all have this same one item assigned, for the sake of simplicity. Perhaps, fictional people should be included, so long as they are assigned with another instance of (P31) which shows that they're part of fiction/myth/legend (in some cases, we can't even know for sure). Danrok (talk) 01:16, 1 September 2013 (UTC)[reply]
- I agree that all living and non-living humans should have the claim instance of (P31) human (Q5). Having the claim instance of (P31) human (Q5) in an item about a fictional person strikes me as a bad idea. I would prefer using something like the GND's literary or legendary character class as an object for P31 claims about fictional characters. As I've said before, the problem with P107 isn't so much that it's based on the GND, it's that it attempts to restrict the world into a very small set of classes. Emw (talk) 03:38, 4 September 2013 (UTC)[reply]
- It seems to me more important and not less that fictional characters also have claims about their species (I use the term 'species' loosely), as that is the most likely place where there may be other things possible. I wouldn't want to lump Chewbacca in the same class ("fictional") as Admiral Ackbar as Han Solo. It also is a question when we get to anthropomorphic characters, like Winnie the Pooh (Q188574).... I do understand the hesitance, of course. --Izno (talk) 02:08, 5 September 2013 (UTC)[reply]
- I'm much more interested in determining how we should classify actual persons. Resolving how to classify fictional entities seems peripheral and independent of that. Are you suggesting we should claim Han Solo instance of (P31) human (Q5)? Do you think how we classify actual persons is dependent on how we classify fictional entities? Unless the answer to either of those is "yes", I'd like to hammer out how to classify real people, since that's probably over 95% of P107 "person" values. Emw (talk) 14:03, 7 September 2013 (UTC)[reply]
For fictional entities, I propose fictional character (Q95074) that can be used from Han Solo (Q51802) (where it is already used) to Mickey Mouse (Q11934); for various kinds of gods I propose deity (Q178885). --Paperoastro (talk) 18:14, 7 September 2013 (UTC)[reply]
- I support both proposals. Developing a solution for building type hierarchies for fictional entities should not block the migration away from GND main type. This migration is really the first step in an incremental process to better classify P107 "persons". GND main type "person" can be used to classify several different types of items. Below is a list of GND subclasses of "person", and some proposed mappings:
GND class | Example | Proposed initial instance of (P31) mapping |
---|---|---|
Gods | Thor (Q42952) | deity (Q178885) |
Literary or legendary character | Falstaff (Q1233109) | fictional character (Q95074) |
Pseudonym | Voltaire | None, assign pseudonym to item label or alias, and/or use a separate property for "pseudonym" |
Royal or member of a royal house | Louis XIV of France (Q7742) | human (Q5) |
Spirits | ? | spirit (Q193291) |
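The mapping in the table above could be written down as a small lookup, for instance as a starting point for a bot. This is only a sketch: the Q-numbers are the ones quoted in the table, while the dictionary name, the function name, and the fallback to human (Q5) when no GND subclass is known are assumptions made for illustration.

```python
from typing import Optional

# Item IDs are the ones quoted in the table; the dict itself is illustrative.
GND_PERSON_SUBCLASS_TO_P31 = {
    "Gods": "Q178885",                            # deity
    "Literary or legendary character": "Q95074",  # fictional character
    "Pseudonym": None,                            # no P31 value proposed
    "Royal or member of a royal house": "Q5",     # human
    "Spirits": "Q193291",                         # spirit
}

HUMAN = "Q5"


def proposed_p31(gnd_subclass: Optional[str]) -> Optional[str]:
    """Return the proposed initial P31 value for a P107 "person" item.

    With no known GND subclass, fall back to human (Q5), the proposed
    default for the large majority of "person" claims (an assumption here).
    Unknown subclasses and "Pseudonym" both yield None.
    """
    if gnd_subclass is None:
        return HUMAN
    return GND_PERSON_SUBCLASS_TO_P31.get(gnd_subclass)
```

Returning None for "Pseudonym" mirrors the table: no instance of (P31) value is proposed there, and the pseudonym would be handled through the label, an alias, or a separate property.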
- Does this seem reasonable? Of course, almost all P107 "person" claims will be for humans. For that 95%, for reasons described above (here), I think instance of (P31) human (Q5) is a good initial value. What do others think? I think we should try to get some finality on this P107 'person' discussion within a few days. Emw (talk) 20:55, 7 September 2013 (UTC)[reply]
- Support, of course ;-) --Paperoastro (talk) 21:17, 7 September 2013 (UTC)[reply]
As I said elsewhere, I would definitely say that we should claim Han Solo instance of (P31) human (Q5), and add a dedicated property for "is fictional". Like it or not, there are quite a few items about fictional characters, and that is something that we need to check for many queries. How do we make a list of aviators that excludes fictional aviators? Find items with occupation (P106): aircraft pilot (Q2095549) that are instances of Homo sapiens, or find items with occupation (P106): aircraft pilot (Q2095549) that are non-fictional? The latter seems much more natural to me. And it makes it much more straightforward to indicate that Han Solo is a fictional human while Mickey Mouse is a fictional mouse. --Zolo (talk) 06:37, 8 September 2013 (UTC)[reply]
- In the same discussions I said that IMHO "is fictional" is not sufficient: if we define (some) fictional characters as humans, then in a query for people born in the United Kingdom in 1980 I will also find Harry Potter! Once we define fictional characters as fictional character (Q95074), we will exclude them automatically from queries concerning "real people". We will define the "species" of a fictional character with a new property that will tell us that the fictional character Mickey Mouse is a mouse and Han Solo is a human.
- Thank you for your comment: yours and mine are two different ways to solve this problem. I hope that other people will write their suggestions here. --Paperoastro (talk) 13:10, 8 September 2013 (UTC)[reply]
- You would have to add the condition "is fictional != true" to the query. In this case that would indeed require an additional condition. In others, like my aviator example, that would just replace "instance of human". What I meant with my example is that, anyway, we almost always need to check that someone is not fictional. It can be done either by "instance of: homo sapiens" or "is fictional != true". I think the latter is clearer, but the former is indeed sometimes more concise, and if every fictional thing were a fictional human, I might support it. But there is such a thing as the Mickey Mouse problem. You suggest that there should be a separate "species" property. Yes, that may work, but I see two problems with that:
- if there is a separate "species" property, I find it strange to use P31 to say that people are instances of homo sapiens.
- more importantly, fictional things are not always people. What would you do with One Ring (Q19852)? Probably something like: "instance of fictional object + type of object: ring". That means that we need several rather arbitrary properties like "species", "type of object", "type of place" etc. For real objects, there seems to be a growing consensus for using "instance of" rather than these special properties. It means that the "type of object" property should be narrowed to "type of fictional object". Fictional items would thus have a different structure than normal items, and I really find that confusing.
- --Zolo (talk) 14:04, 8 September 2013 (UTC)[reply]
- If users query "people born in United Kingdom in 1980" and get back Harry Potter (Q3244512) among the results, then Wikidata is doomed. Requiring users to add a clause "WHERE 'is fictional' != true" is also unacceptable; it would be an egregious API and command-line UI flaw. Wikidata should be unabashedly biased to the natural world. While it's important to be able to account for fictional entities in our structured data, solutions that make structured data for non-fictional entities unwieldy or notably less easy to use should be considered non-starters. If this makes structured data for fictional entities cumbersome, then I think that's OK.
- There have been several discussions about how to handle fictional entities: see e.g. Fictional stuff. Lydia, a proxy for the Wikidata developers, has also commented on this:
- There will not be such a datatype. It is better to express these with things like "is instance of:fictional person" for example—Lydia Pintscher (WMDE), regarding the boolean datatype necessary for an "is fictional" property
- So the WMDE spokesperson and a large set of Wikidata contributors who have commented on how to deal with fictional entities have proposed using "instance of (P31) fictional x" to solve this issue. Will this create a large number of arbitrary "fictional x" items? Yes. But that seems like a much better solution than polluting non-fictional items with crufty qualifier claims like "is fictional = false" and requiring future query users to tack on boilerplate code to filter out imaginary people from real ones. Emw (talk) 16:07, 8 September 2013 (UTC)[reply]
- Yes, there have been various conversations beforehand, but no consensus, and that seems relevant here. I fully agree that having to use "is fictional" is a bit unwieldy, but using "instance of human" does not really make things much better. Say you want a list of Japanese Nobel Prize winners. If you use the "is fictional" solution, you need to add "is fictional != true", and if you use the "instance of human" solution, you need to add "instance of: human". I do not think that is much better. The only way to filter out imaginary people without any boilerplate is to create a "fictional" counterpart to all properties, so that fictional people have "fictional nationalities" and earn "fictional prizes". Is that what you are suggesting? --Zolo (talk) 18:33, 8 September 2013 (UTC)[reply]
- More precisely, I see two cases:
- when the query could apply to non-humans, as in "born in the United Kingdom in 1980". That query could very well apply to a horse, so if you want to restrict it to real humans, you need to specify that. In this case, that makes two parameters for "is fictional" instead of one for "instance of real human".
- when the queried properties normally apply to humans. If you look for people with a Nobel Prize, you will not come across a horse. In this case, you need the same number of parameters for the "is fictional" and the "real human" solutions. And there may be complications. It may happen that some loony Emperor names his horse Senator. There is no reason to exclude him from the list of senators, but the "instance of real human" approach makes it tricky. Of course, this is not a very common case, but I think it shows how "instance of real human" is more brittle than neatly distinguishing "humanness" from fictionality. If you add to that all the problems it will create for fictional items, I think it amply makes the case against the "is real human" solution. --Zolo (talk) 20:00, 8 September 2013 (UTC)[reply]
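The senator case can be made concrete with a toy comparison of the two filtering styles. This is a minimal sketch, not a real query language: the three items, their labels, and the field names `kind` and `fictional` are all hypothetical, and every item is assumed to already match "holds the position of senator".

```python
# Toy items; labels and field names are hypothetical.
ITEMS = [
    {"label": "real human senator", "kind": "human", "fictional": False},
    {"label": "emperor's horse", "kind": "horse", "fictional": False},
    {"label": "fictional human senator", "kind": "human", "fictional": True},
]


def p31_with_fictional_classes(item):
    """Encoding A: fictionality is folded into the P31 value."""
    return "fictional " + item["kind"] if item["fictional"] else item["kind"]


def senators_by_instance_of_human(items):
    """Query A: keep items whose P31 value is exactly "human"."""
    return [i["label"] for i in items if p31_with_fictional_classes(i) == "human"]


def senators_by_is_fictional(items):
    """Query B: keep items whose separate "is fictional" flag is not true."""
    return [i["label"] for i in items if not i["fictional"]]
```

In this toy setup, query A drops the real horse along with the fictional senator, while query B keeps the horse and drops only the fictional senator, which is the brittleness described above.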
We have two options for specifying the class of fictional instances:
- put a Boolean "is fictional" qualifier on all P31 claims, or
- prepend "fictional" on only P31 values for fictional items.
Option 2 is better. Both involve adding boilerplate code, but Option 2 entails drastically less boilerplate code. I assert that Wikidata should be biased to reality, thus Option 2 should be preferred, since Option 1 makes it more cumbersome to deal with subjects in the natural world.
Regarding your example, no one is concerned that the query "born in the United Kingdom in 1980" could return a horse. It will not flabbergast users that such a query could return an item about a horse if it satisfies the condition "born in the United Kingdom in 1980". The query under discussion is different. It's "person born in the United Kingdom in 1980". Having Harry Potter among the results of such a query will flabbergast and frustrate users and cause them to ignore Wikidata. Requiring users to specify the query in the form "non-fictional person born in the United Kingdom in 1980" is bad user interface design. By default, queries should return only non-fictional results.
This requires having more items -- "fictional person", "fictional prize", etc. That's fine because building type hierarchies for fictional universes is an edge case. Sure, Trekkies will be annoyed that they have to add boilerplate code to support querying knowledge about what class of starship the Enterprise is and what type of organism Admiral Ackbar is, but such is life. The demographer wanting an easy and accurate way to determine how many people were born in the United Kingdom in 1980 will thank us. So will the physician wanting an easy and accurate way to find out if their patient's version of a gene has been found in any other humans.
Furthermore, per the quote above, there will be no Boolean datatype for an "is fictional" property. If we want to go against WMDE advice and implement a Boolean datatype ourselves, we'll have to enforce value constraints via bot. That, of course, would be quite the kludge. Emw (talk) 22:50, 8 September 2013 (UTC)[reply]
- I am not sure I understand your point. Eliminating boilerplate would not require more items, it would require a full set of parallel properties. Unless we have a "fictional birth date" property, "born in 1980" will return fictional people as well as real people and real horses. And with "instance of real human", it would be rather difficult to get a list containing real people and real horses, but excluding fictional humans.
- There will be no boolean datatype, but I do not think the development team has any deep objection to it; they just have a tight schedule. It would be better to have one, but the item datatype would work all right. We can decide that it is set to False by default, and bots already do constraint checking on a massive scale anyway. --Zolo (talk) 06:23, 9 September 2013 (UTC)[reply]
- Absolutely not, we can reuse the property birth date, and with a property fictional equivalent of with a class as value, such as <human>, we can infer that it could have a birth date. TomT0m (talk) 14:05, 9 September 2013 (UTC)[reply]
- I do not understand what you mean. I am certainly not advocating creating a separate set of properties for fictional items; I am just saying that if you do not have them, you will need to explicitly filter out fictional items in the query. If I understand correctly, you propose to do it by checking that the item is "instance of human". I do not see why we should lump together information about fictionality and about humanness. As I explained above, that will make claims about fictional items messy, and even for queries about real items, it may make some queries complicated (if the list of Senators can include a horse, you cannot use "& instance of human"; you would need "& instance of non-fictional thing", which is not such a straightforward thing to do without an "is fictional" property). --Zolo (talk) 15:16, 9 September 2013 (UTC)[reply]
- I don't know which one is more straightforward than the other; this still needs a manpage ;). Anyway, with a fictional senator class (which would have a fictional equivalent of claim) that your fictional character is an instance of, you eliminate the need for any filtering in the query at minimal cost, and your real senator can still be a horse (preferably not a fictional one :) ) TomT0m (talk) 17:47, 9 September 2013 (UTC)[reply]
- For me the creation of fictional properties is a "complication of simple business" ;-). Once we define Harry Potter as "fictional character", we can use the usual properties for the real world: we do not need "fictional nationality" or "fictional date of birth", because the property P31 makes Harry Potter fictional, not the other properties! For me it is not so strange to use the normal "parent relations" to make the genealogical tree of Aragorn.
- IMHO, with this kind of classification, we can use more interesting properties: one for the creator(s) of fictional characters, one for the universe where they "live", one for the books or films with them... and they may be managed with the usual constraint mechanism. --Paperoastro (talk) 21:27, 9 September 2013 (UTC)[reply]
- @Paperoastro: I agree that properties like "fictional date of birth" do not seem necessary. But the concern is: how do we remove fictional things from queries with maximal accuracy, minimal cost and minimal effort? I do not think that "fictional character" is the right solution for that.
- @TomT0m: Creating fictional items would be equivalent to creating fictional properties, but only if it is done for all items used in all properties of fictional items. The query "magician born in 1980" filters out fictional magicians only if we use "occupation (P106): fictional magician". To be consistent, we should also have "birth place: fictional Paris", etc. That seems rather heavy-handed. --Zolo (talk) 07:39, 10 September 2013 (UTC)[reply]
- I don't think it's heavy-handed. Creating an item is really easy (and could become even easier with a better interface), much easier than creating a property, for example, and the fictional equivalent of property could make it really easy to find out whether your item already has fictional equivalent(s). If it's just a matter of convention, I really see no functional problem with this solution: it separates fictional from non-fictional stuff clearly without losing the real-world equivalent, and it's not much of a problem for the user, as he needs to learn the conventions anyway. TomT0m (talk) 10:54, 10 September 2013 (UTC)[reply]
- I think the solution is heavy-handed, but necessary. The most important criterion for a solution is that it ensures fictional entities are excluded by default in query results sets -- e.g., that no filtering needs to be done to exclude Harry Potter from the results of the simple query "persons born in the United Kingdom in 1980". The natural Universe (Q1) should be the default domain of discourse. If users want to include results from fictional universes, they should need to do something special, like prepend "fictional" to the value of the claim they're looking to satisfy in their query.
- The solution TomT0m and I support does this. However, this can only be applied to properties that have an item or string datatype. We need a solution that will work with non-string-like datatypes -- e.g. date, number. If a query "born in 1980" returns Harry Potter, I think this is still a major problem. Emw (talk) 12:07, 10 September 2013 (UTC)[reply]
- This would certainly require a significant amount of bot-assisted maintenance (for instance, we need to carefully insulate "fictional X" items from the main subclass tree), it would be unwieldy for people whose fictionality is disputed, and I do not see any good solution for non-item properties. On the other hand, it could indeed make basic queries simpler, so I am not radically opposed to it. But note that earlier proposals were about creating a few items like "fictional person" for P31 and using normal items for other statements. The current proposal would require much more work, and seems to call for a broader community decision. --Zolo (talk) 13:23, 10 September 2013 (UTC)[reply]
- I don't really think creating one fictional equivalent item when needed is much more work; it's up to the person who wants to model a fictional world and how far he wants to go. I guess for most people it reduces to using the fictional person item. TomT0m (talk) 15:23, 10 September 2013 (UTC)[reply]
- And for Harry Potter: it would need something like a root class "real world stuff" of which person would be a subclass. Then, to filter out Harry Potter, we would just have to add instance of <Real stuff> (with transitive properties), which is much better than specifying on every real-world thing that it is a real-world thing. I will just add one argument: take a city, let's say Paris, and take a uchronia (alternate history) that takes place in Paris. The mayor of Paris is the loser of the previous election... can you reuse the real Paris item to infer the mayor of the city in the uchronia? TomT0m (talk) 15:29, 10 September 2013 (UTC)[reply]
- I agree with TomT0m that having a hierarchy for fictional universes rooted at fictional entity (Q14897293) is probably the best solution for excluding fictional entities from query results by default. This hierarchy for fictional things would be isolated from the hierarchy for things in the natural universe rooted at entity (Q35120). If needed, perhaps each fictional universe could be its own isolated hierarchy. Two trees, or a forest. Emw (talk) 03:45, 11 September 2013 (UTC)[reply]
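The "two trees" idea can be pictured with a toy class graph in which a query for instances of a class follows subclass of (P279) links upward. This is only a sketch under stated assumptions: the Q-numbers are the ones mentioned on this page, but the subclass links, the item list, and the function names are made up for illustration, and intermediate classes are omitted.

```python
# Toy subclass-of (P279) links; only the Q-numbers come from the discussion.
SUBCLASS_OF = {
    "Q27267085": ["Q14897293"],  # fictional lion -> fictional entity
    "Q140": ["Q35120"],          # lion -> entity (intermediate classes omitted)
}

# (label, list of P31 values) pairs; a hypothetical miniature data set.
ITEMS = [
    ("Tsavo Man-Eaters", ["Q140"]),
    ("Scar", ["Q27267085"]),
]


def is_under(cls, root, subclass_of=SUBCLASS_OF):
    """True if cls equals root or reaches it through subclass-of links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue  # guard against cycles in the class graph
        seen.add(current)
        stack.extend(subclass_of.get(current, []))
    return False


def instances_of(items, root):
    """Labels of items with at least one P31 value at or below root."""
    return [
        label for label, p31_values in items
        if any(is_under(value, root) for value in p31_values)
    ]
```

Because the fictional tree never links into the tree rooted at entity (Q35120), a query for lion (Q140) or for entity returns only the real lions in this toy data, and fictional ones appear only when the query starts from the fictional root.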
- Looking at a lot of items today, I came to the conclusion that we could replace 'P107->person' with the 'occupation' property, which is effectively an 'instance of' property for humans. Or we could just replace all of these with 'instance of -> human' and add occupation gradually (imported from categories, probably). Filceolaire (talk) 21:19, 10 September 2013 (UTC)[reply]
- A carpenter is not necessarily human, as per the fictional discussion. Making multiple instance of claims regarding occupations could be done, and it was argued at the property creation discussion that we should do so, but I believe (IIRC) that Emw was shouted down on that point. :) --Izno (talk) 23:56, 10 September 2013 (UTC)[reply]
Table of options
- Blablabla. Stop sterile discussions and use practical examples, if you have a different proposition just fill one column in the table below. Or if you have an interesting case just add a new line. Snipre (talk) 02:27, 11 September 2013 (UTC)[reply]
- It is true that a synthetic table can be useful, but that is not enough. The point of the above discussion was that solution 1 does not work well at all, and that can only be seen by thinking on the problem on a more general level than a few examples. --Zolo (talk) 07:09, 11 September 2013 (UTC)[reply]
Case | Solution 1: business as usual + instance of fictional | Solution 2: use only fictional X statements for fictional items | Solution 3: "is fictional" property | Solution 4 | Solution 5 |
---|---|---|---|---|---|
Ra (Q1252904) | deity (Q178885) | deity (Q178885) | deity (Q178885) | Example | Example |
Han Solo (Q51802) | fictional character (Q95074); human (Q5); occupation: smuggler | fictional character (Q95074) [1]; occupation: fictional smuggler [2] | human (Q5); occupation: smuggler; is fictional: true | Example | Example |
Yoda (Q51730) | fictional character (Q95074); human (Q5); occupation: Jedi Master (Q14904358) | fictional character (Q95074); occupation: Jedi Master (Q14904358) | living entity; occupation: Jedi Master (Q14904358); is fictional: true | Example | Example |
Scar (Q2291314) | fictional character (Q95074); lion (Q140) | fictional lion (Q27267085) | lion (Q140); is fictional: true | Example | Example |
Albert Einstein (Q937) | human (Q5)[3] | human (Q5) | human (Q5) | Example | Example |
Tsavo Man-Eaters (Q2510263) | lion (Q140) | lion (Q140) | lion (Q140) | Example | Example |
Bo (Q1273495) | dog (Q144) | dog (Q144) | dog (Q144) | Example | Example |
list of real lions | lion (Q140) and no P31 a direct or indirect subclass fictional entity | lion (Q140) | lion (Q140) and not fictional == true | Example | Example |
list of real or fictional lions | lion (Q140) | lion (Q140) + fictional lion (Q27267085)[4] | lion (Q140) | Example | Example |
- ↑ fictional character (Q95074) would contain fictional equivalent of homo sapiens
- ↑ Q fictional smuggler would contain: "fictional equivalent of smuggler"
- ↑ And not instance of (P31):physicist (Q169470) because this is defined with occupation (P106): don't repeat the same information several times in the same item.
- ↑ that is fictional lion (Q27267085) but we should not assume that the queries not the Q number for fictional lion. (From Emw: what?)
- I'm afraid I don't agree that option 1 doesn't work well. Your table demonstrates that it works better than option 2 or option 3.
- If the response to your query includes fictional characters then (if you don't want fictional characters included) you know that you need to add a term to your query to exclude anything which is not a member of the class 'human'. Simple. All we have to do is make sure the items are suitably labelled so this additional term is easy to formulate. Filceolaire (talk) 12:59, 11 September 2013 (UTC)[reply]
- The good thing about option 2 is that fictional items are excluded by default. Given that option 1 does not do that, I do not see any reason to prefer it over option 3. You need to set up, maintain and document a list of all items that correspond to fictional classes, and that is surely more complicated than a simple "fictional: true". You also need to make just as many statements, the only difference being that information about fictionality is mixed with other data in P31. --Zolo (talk) 13:51, 11 September 2013 (UTC)[reply]
- Jedi master just does not have a real equivalent... why should we specify one? Purely fictional things should stay purely fictional. It just has to be a subclass of fictional occupation. TomT0m (talk) 14:58, 11 September 2013 (UTC)[reply]
- True, I have changed the table accordingly. --Zolo (talk) 15:21, 11 September 2013 (UTC)[reply]
Another question that comes to my mind: is fictional character only for fictional humans? In my mind it's a character of a fiction, not necessarily a human. So it would rather be that both Sherlock Holmes and Chewbacca are fictional characters, but one is a <fictional person> and the other is a <Wookiee> (note that as the Special:ItemByTitle/enwiki/Wookiee item already exists, it's not much of an overhead). TomT0m (talk) 07:21, 12 September 2013 (UTC)[reply]
- I agree. Fictional human should use the fictional equivalent of homo sapiens (or whatever will be used for non-fictional humans). --Zolo (talk) 08:10, 12 September 2013 (UTC)[reply]
Detecting non-human persons
P107 "person" values probably cover 95%+ humans, but they can also include deities and fictional characters. How can we detect where GND main type is used on a non-human person? Like most things on Wikidata, this would be much easier if queries were available, but they're not, and it's unclear when they will be. Categories on those pages seem like they'd be difficult to leverage here. I'm almost inclined to support crudely transferring P107 "person" values to human (Q5), and then manually removing P107 "person" claims from items about non-human deities, fictional characters, etc. Thoughts? Emw (talk) 21:18, 7 September 2013 (UTC)[reply]
- I don't think that's a good idea and would definitely support using categories at least to figure out which are people and which are not. Using only the 'people' categories should get us most of the way; for the remaining 5%, I would be inclined to generate a list so that we can investigate the best categories to automatically import otherwise. --Izno (talk) 22:15, 9 September 2013 (UTC)[reply]
- What proportion of P107 "person" items have Wikipedia categories that reliably classify them as humans? Does that set of categories ever exist on non-human persons? (On a tangential note: is there any issue with using instance of (P31) deity (Q178885) to replace P107 'person' claims about deities?) Emw (talk) 03:00, 10 September 2013 (UTC)[reply]
I would imagine en:Category:Living people and its related categories (dead people, etc.) reliably classify persons, but I haven't investigated for false positives, which are going to be an issue no matter what we do.
As for deity claims, I've personally gone further down the class tree where possible, as is good practice with P31/P279. --Izno (talk) 23:37, 10 September 2013 (UTC)[reply]
Maybe it is easier to detect humans. A human, for example, should have a property P27 (country of citizenship). Therefore, a bot could change P107 to P31:Q5 if there is a P27 statement. --Goldzahn (talk) 05:47, 18 September 2013 (UTC)[reply]
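The bot rule suggested here could look roughly like the sketch below. It works on a plain dictionary of property IDs to value lists rather than on any real bot framework; the function name, the `"person"` placeholder for the GND main type value, and the example citizenship value are assumptions for illustration only.

```python
import copy

PERSON = "person"  # placeholder for the GND main type "person" value
HUMAN = "Q5"       # human


def migrate_person_claim(claims):
    """Return a copy of claims with P107 "person" swapped for P31 human (Q5),
    but only when a country of citizenship (P27) statement is present."""
    result = copy.deepcopy(claims)
    if PERSON in result.get("P107", []) and result.get("P27"):
        result["P107"].remove(PERSON)
        if not result["P107"]:
            del result["P107"]  # no P107 values left on the item
        if HUMAN not in result.setdefault("P31", []):
            result["P31"].append(HUMAN)
    return result
```

Items without a P27 statement are returned unchanged, so they would still need the category-based or manual review discussed above.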
Organization
Organization discussion
- Are we going to need to do these by hand, besides those as commented in order? --Izno (talk) 03:53, 1 September 2013 (UTC)[reply]
- Classify using information from Categories and the 'instance of' property. Delete P107 when the 'instance of' property is added. Review any remaining P107 properties by hand. Filceolaire (talk) 21:37, 17 September 2013 (UTC)[reply]
Event
Event discussion
- Are we going to need to do these by hand, besides those as commented in order? --Izno (talk) 03:54, 1 September 2013 (UTC)[reply]
Creative work
Work discussion
- Are we going to need to do these by hand, besides those as commented in order? --Izno (talk) 03:53, 1 September 2013 (UTC)[reply]
Term
Where an item has P107 (P107) term (Q1969448), delete this statement.
These items will be classified using information from Categories and the 'instance of' or similar properties.
Term discussion
- de: P107 (P107) = term (Q1969448) wird 255907 mal verwendet. Kann ein Teil dieser Aussagen auf andere Attribute übertragen werden?
- en: P107 (P107) = term (Q1969448) is used in 255907 statements. Can some of these statements be transferred to other properties?
- In my opinion there is nothing to gain for the natural sciences from these statements, so all of them can be deleted. --Tobias1984 (talk) 13:37, 18 August 2013 (UTC)[reply]
- I agree with Tobias. There is nothing useful about "term". Where items actually do cover terms, and which cannot be explained in some other way, I would rather prefer those items to be blank until we figure out exactly what to do with them. --Izno (talk) 17:21, 18 August 2013 (UTC)[reply]
- I agree. Delete them all. Sort them into classes using info from Categories. Text above rewritten. Filceolaire (talk) 11:27, 19 August 2013 (UTC)[reply]
- I agree, not very useful claims, no need to migrate. Michiel1972 (talk) 08:54, 22 August 2013 (UTC)[reply]
- I need to distinguish proper noun phrases from common nouns. How can I do it without 'terms'? User:Sokirko 12:53, 22 August 2013
- I meant we should delete the statements P107 (P107) = term (Q1969448); not that we should delete the items. Filceolaire (talk) 11:24, 23 August 2013 (UTC)[reply]
- Most "terms" are subclasses (P279). There are also proper nouns which are subclasses. All instances of (P31) are usually not terms. --Izno (talk) 22:22, 15 September 2013 (UTC)[reply]
- I agree, delete this, but I suggest leaving it as the last task, in case it is useful for the migration process, i.e. if P107 (P107) = term (Q1969448), then this item is assumed not to be one of the other GND types. Danrok (talk) 01:24, 1 September 2013 (UTC)[reply]
- I agree with Danrok. Emw (talk) 00:36, 5 September 2013 (UTC)[reply]
- That... doesn't make sense. Why? --Izno (talk) 02:35, 5 September 2013 (UTC)[reply]
- GND's 'term' main type lets us divide the world into three bins: A) person, organization, event, creative work, or geographical feature, B) not A (term) and C) unknown (no P107 value). If we were to delete all P107 'term' claims right now, then we would lose bin B; we would lose information. Deleting 'term' claims last would let us know that the item should be classified as something other than a person, organization, event, creative work, or geographical feature -- which is useful. Emw (talk) 03:37, 5 September 2013 (UTC)[reply]
- How useful for our purpose though? One of the reasons I at least argued for its deletion is that it isn't…. As I noted in the other subdiscussions, we're probably going to need to do a fair chunk of this by hand anyway. Which will be painful, but it's work that needs to be done either way, and knowing whether something is not a proper noun (even so, there are proper nouns among terms!) doesn't seem particularly useful to me. --Izno (talk) 22:19, 9 September 2013 (UTC)[reply]
- It isn't particularly useful, agreed; but it also doesn't seem problematic whatsoever to delay a batch deletion of all P107 'term' claims until all other P107 claims have been replaced. I don't feel strongly about this though. If someone wants to mass delete 250,000+ P107 'term' claims sooner, I don't think doing so would be a major problem for the migration effort. Emw (talk) 01:47, 10 September 2013 (UTC)[reply]
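The three bins described in this thread can be sketched as a small classifier. The type labels are the GND main types named above; the function name and the use of plain strings (rather than item IDs) are assumptions made for illustration.

```python
# GND main types other than "term", as listed in the discussion.
GND_NAMED_TYPES = {
    "person",
    "organization",
    "event",
    "creative work",
    "geographical feature",
}


def gnd_bin(p107_value):
    """Classify an item by its P107 value into bin A, B or C.

    A: one of the named main types; B: "term" (known not to be A);
    C: no P107 value at all (unknown).
    """
    if p107_value is None:
        return "C"
    if p107_value == "term":
        return "B"
    if p107_value in GND_NAMED_TYPES:
        return "A"
    raise ValueError("unexpected P107 value: %r" % (p107_value,))
```

Deleting every "term" claim first would make bin B indistinguishable from bin C, which is the information loss the argument for deleting "term" last is about.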
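Emw's three-bin argument can be made concrete with a small sketch. The labels and function names below are hypothetical and only illustrate the reasoning; they are not an actual Wikidata API.

```python
from enum import Enum
from typing import Optional

# Hypothetical plain-text labels for the five non-term GND main types
# named in the comment above.
NON_TERM_TYPES = {
    "person", "organization", "event", "creative work", "geographical feature",
}


class Bin(Enum):
    A = "one of the five concrete main types"
    B = "term (known not to be in A)"
    C = "unknown (no P107 value)"


def classify(p107_value: Optional[str]) -> Bin:
    """Place an item in bin A, B or C based on its P107 value."""
    if p107_value is None:
        return Bin.C
    if p107_value == "term":
        return Bin.B
    if p107_value in NON_TERM_TYPES:
        return Bin.A
    raise ValueError(f"unexpected P107 value: {p107_value!r}")


def classify_after_term_deletion(p107_value: Optional[str]) -> Bin:
    """Same classification, but simulating that all 'term' claims were deleted first."""
    return classify(None if p107_value == "term" else p107_value)
```

Under these assumptions, an item that was in bin B ends up in bin C once 'term' claims are deleted, which is the information loss described above.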
- I agree with Danrok. Emw (talk) 00:36, 5 September 2013 (UTC)[reply]
- astronomical objects can be distinguished by P60 (P60). If an item has P107 (P107) geographical feature (Q618123) and P60 (P60) then delete P107 (P107) and do not replace it. (Note: there is some debate about whether P60 or a combination of instance of (P31) and subclass of (P279) should be used to specify the type of astronomical objects.)
- What is the top class (in object-class hierarchy) for this type of objects? Still geographical feature (Q618123) or maybe geographic location (Q2221906) ? Infovarius (talk) 17:07, 27 September 2013 (UTC)[reply]
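The proposed rule for astronomical objects (drop the P107 geographical feature claim when P60 is present, without replacing it) could be sketched as follows. The dictionary layout and the function name are assumptions made for illustration; a real bot would work against the Wikidata API instead.

```python
from typing import Dict, List

GEOGRAPHICAL_FEATURE = "Q618123"  # item ID quoted in the proposal above

# Hypothetical in-memory representation: property ID -> list of value IDs.
Claims = Dict[str, List[str]]


def drop_redundant_p107(claims: Claims) -> Claims:
    """Return a copy of the claims with the P107 'geographical feature'
    value removed when the item also carries a P60 claim."""
    result = {prop: list(values) for prop, values in claims.items()}
    is_geo_feature = GEOGRAPHICAL_FEATURE in result.get("P107", [])
    has_p60 = bool(result.get("P60"))
    if is_geo_feature and has_p60:
        remaining = [v for v in result["P107"] if v != GEOGRAPHICAL_FEATURE]
        if remaining:
            result["P107"] = remaining
        else:
            del result["P107"]  # nothing is added in its place
    return result
```

Items without P60 are returned unchanged, so they remain available for the other migration rules.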
Place discussion
- I am actively adding the GND type geographical feature for administrative divisions of places, so how do we replace this? --Napoleon.tan (talk) 01:06, 25 September 2013 (UTC)[reply]
- Replace it with 'P132 (P132)' Filceolaire (talk) 00:01, 26 September 2013 (UTC)[reply]
- And instance of (P31) or subclass of (P279) as appropriate. --Izno (talk) 00:09, 26 September 2013 (UTC)[reply]
- instance of (P31) is not appropriate because we have decided to use P132 (P132) instead. If at some future date we want to replace P132 (P132) with instance of (P31), or we want to have both, then we will have an RFC about it first.
- subclass of (P279) is not appropriate for individual administrative units. It should only be used for classes (types) of administrative unit and in fact most of these already have this property. Filceolaire (talk) 22:59, 26 September 2013 (UTC)[reply]
- True about 279. I am not inferring that it was for individual instances. (I know the distinction sir! :)
- No, instance of (P31) is all the more appropriate. Simply because we use P132 does not mean we should not use P31, only that there is a duplicate property (and it is trivial to see which is the duplicate). This is true of all the specialized type properties in use. Adding instance of/subclass of now eases our pain down the road, when and if in fact we remove P132 (I am skeptical that we will not at some point). And if we do not delete these properties (and we should!), then external users will still have a consistent pair of properties, without needing to know all of the specialized types. --Izno (talk) 00:04, 28 September 2013 (UTC)[reply]
- And trust me, deprecating a specialized type property is painful. I've been deprecating P288 (P288) slowly and alone (cleaning up the items I visit on the way, mind you), not least because they don't distinguish between classes and instances. Saying a ship is a class or that a ship class is instance of anything but ship class would be wrong. :). --Izno (talk) 00:09, 28 September 2013 (UTC)[reply]
Astronomical objects
- P60 (P60) should be used for astronomical objects instead of instance of (P31). This should not be changed until there is a consensus to delete P60 (P60) and replace it with P31. I've edited Paperoastro's text above. Filceolaire (talk) 11:22, 19 August 2013 (UTC)[reply]
- +1 with Filceolaire. P60 (P60) should be replaced by instance of (P31). Tpt (talk) 09:04, 5 September 2013 (UTC)[reply]
- This comment confuses me a bit. Filceolaire is saying "use P60 (P60) for astronomical objects instead of instance of (P31)"; Tpt is saying "P60 (P60) should be replaced by instance of (P31)". – The preceding unsigned comment was added by Emw (talk • contribs).
- Oppose P60 (P60) is a domain-specific "type of" property and thus redundant with instance of (P31) and subclass of (P279). This has been discussed in the moribund Many or few type properties RFC and a bit more in Paperoastro's sandbox. If we use P60 as suggested, I see no reason not to have thousands of additional domain-specific "type of" properties and effectively deprecate P31 and P279. No other Semantic Web project I'm aware of uses P31/P279-redundant properties like P60; they all use one or two of the two generic type properties recommended by the W3C. Emw (talk) 02:07, 10 September 2013 (UTC)[reply]
- "I see no reason not to" is probably an argument you don't want to make, Emw. Someone might take you at your word and go about proposing many such. :) --Izno (talk) 23:34, 10 September 2013 (UTC)[reply]
- I would be opposed to using P60, as I voiced elsewhere. But let's leave whether we should or should not use P60 for a separate discussion. While we're migrating, and where possible, we should add both a P31/P279 claim and a P60 claim. Duplication of data for this purpose—because there is a dispute on the path forward—seems natural to me. There is little doubt in my mind that P31/P279 have broad consensus of use. --Izno (talk) 23:34, 10 September 2013 (UTC)[reply]
Name (disambiguation)
Disambiguation discussion
- I think the table that Zolo drew up and as amended by me works for this, per the discussion on Wikidata:Project_chat/Archive/2013/08#Disambiguation_pages. --Izno (talk) 14:38, 18 August 2013 (UTC)[reply]
- I agree with the summary in the table, thus use instance of (P31) with Wikimedia disambiguation page (Q4167410) and remove the GND statement Michiel1972 (talk) 09:09, 22 August 2013 (UTC)[reply]
GND classes are a well-defined, notable classification system. Losing information about this classification is a bad idea. GND classes are not equal to any of our classes (items). This is an alternative classification system. So a possible solution is: create 6 items (GND-person, GND-place, GND-event, ...) and use them with instance of (P31). Items will contain multiple instance of (P31) claims, for example: "Moscow" is "GND-place" and "city". — Ivan A. Krestinin (talk) 06:07, 19 August 2013 (UTC)[reply]
- GND Main types are not very well defined. Our classes are much better defined - that is why the GND classes do not match ours. Having multiple 'instance of' properties for an item is confusing for bots trying to extract information and is only worth doing if it adds real information. GND main types either duplicate the information in the other 'instance of' statements - in which case they are not needed - or they do not duplicate the information in the other 'instance of' statements - in which case they are in every case wrong and misleading. If you want to link to GND use GND ID (P227) and link to the particular low level GND class related to that item. Filceolaire (talk) 11:37, 19 August 2013 (UTC)[reply]
- Our class system is conflicting, incomplete, and contains an undefined number of classes. An example: higher education institution (Q38723) <subclass of (P279)> school (Q3914) holds in some languages, but it is false at least for Russian. Multiple instance of (P31) claims already exist and are correct, for example: Hubble Space Telescope (Q2513). — Ivan A. Krestinin (talk) 12:46, 19 August 2013 (UTC)[reply]
- I think you missed the whole point of Wikidata. Concepts are not label-dependent; the definition of an item in Russian should identify exactly the concept. If the school concept in Russia is not the same as the equivalent word in Russian, you should refer to the definition, not the wording. In other words, if the concepts in the two languages do not match, two items are needed, and the meaning of your example claim should be the same in all languages. TomT0m (talk) 13:01, 19 August 2013 (UTC)[reply]
- If you analyze the interwiki sets you will find that it is very common for linked articles to describe only similar terms, not exactly the same one. Do we need to break all these links? — Ivan A. Krestinin (talk) 13:16, 19 August 2013 (UTC)[reply]
- The solution proposed by the community for the differences between the two projects (1 item = 1 concept is deep in the roots of Wikidata) is that the software will eventually handle redirects, which was not part of the Wikidata plan at the beginning. TomT0m (talk) 10:30, 20 August 2013 (UTC)[reply]
- I am one of the strongest proponents for links to redirects but I recognise that it is not going to happen soon and even when it does happen it is not going to solve all issues. In the mean time we need to be looking at ways to show the relationships between items using properties.
- Ivan: if higher education institution (Q38723) <subclass of (P279)> school (Q3914) is not true in Russian then the Russian version of either higher education institution (Q38723) or school (Q3914) doesn't match the concept behind the item and should be attached to a different item (maybe a new item). Similarly in every other place where a wikidata item has sitelinks to items which are not similar enough to be described by the same properties then those sitelinks need to be on different items. The heart of Wikidata is the items for concepts or objects which can be described by a set of properties. Items which cannot be described by properties - Wikipedia: pages, Category pages, compound pages (=pages that deal with more than one thing) etc - are peripheral to wikidata. Filceolaire (talk) 11:37, 21 August 2013 (UTC)[reply]
- Yes, I understand this idea. And I understand that this idea will create many conflicts, because the old interwiki system allows linking similar, but not exactly the same, terms. We have gone too far from the subject. The key thing: there are many classification systems in the world. I do not see any reason why Wikidata should contain only one of them, or why this one must be self-developed. — Ivan A. Krestinin (talk) 08:56, 24 August 2013 (UTC)[reply]
- There are two really simple cases from a practicality standpoint of why the GND type system needs to go:
- A claim is made using GND-whatever on an item which does not have a GND "page" already. You have just made a claim which is completely unverifiable. This is not okay given that this is a wiki. Period and end of story.
- A claim is made using GND-whatever on an item which does have a GND "page" already. That GND page is linked to ours via GND ID (P227). All you have done then is duplicated information. For our purposes and needs, that's certainly not okay either.
That's aside from all the other reasons why we don't need anything remotely like "GND" anywhere near our database. In short, and without rehashing all the argument in the RFC (because this is not the page to do that), the GND system is going away. If you would like to add constructively to the discussion above through some other means, please do so. Maybe you have a legitimate concern about the languages problem (in general, don't suggest a solution for an undefined problem), but that should be a discussion held completely and totally separately from the question of GND. --Izno (talk) 22:52, 19 August 2013 (UTC)[reply]
- It is a pity that you are going to delete a simple thing in order to replace it with a more complicated one. There is no way to create a knowledge base without inaccuracies; concepts will differ, and properties will sometimes be wrong. Wikidata creates a language that describes the world, and all languages have synonyms and levels of comprehension (from children to scientists). The claim "the UN is an organization" is simple; the claim "the UN is headed by X or created by X, that's why it is an organization" is a complicated one. If a claim is simple and can be understood by the majority of people, then it should be in Wikidata; all other claims should be deduced from simple claims.--Sokirko (talk) 11:39, 23 August 2013 (UTC)[reply]
- The claim would not disappear; the UN is still an instance of organization, and an instance of something more precise such as an international organisation or a federation of countries. When you say it is an organisation, you don't distinguish the UN from your barber's shop. How useful is that? TomT0m (talk) 11:51, 23 August 2013 (UTC)[reply]
This RfC starts with the assumption that consensus has been reached to delete the GND main type property, but the only discussions I found show no such consensus. Unless a link to the relevant discussion is provided, I propose that this RfC be closed as invalid. No one should have to make an extensive search to find where consensus was reached. Allen4names (talk) 16:33, 20 August 2013 (UTC)[reply]
- Perhaps the text has changed, but I can't see the part saying that a consensus was reached. At least now the intro is: It was decided.... And that's the point, it was decided and I hope we can progress with this decision. --Nightwish62 (talk) 10:40, 21 August 2013 (UTC)[reply]
- The intro asserts that the decision was made, not that there is consensus to delete. If evidence of a consensus to delete is regarded as unnecessary, we may as well simply give admins full authority to delete properties at will. Allen4names (talk) 06:43, 22 August 2013 (UTC)[reply]
- The previous discussion (Wikidata:Requests_for_comment/Primary_sorting_property) is not easy to interpret and I don't think we have a group or institution that has prerogative of interpretation (Deutungshoheit). --Tobias1984 (talk) 07:45, 22 August 2013 (UTC)[reply]
- One thing is visible without interpretation: there is no consensus for P107 deletion. — Ivan A. Krestinin (talk) 08:24, 24 August 2013 (UTC)[reply]
- I think the RFC cited above by Tobias makes it clear that GND is going away. 9 in favor of keeping it and 24 opposed to keeping it is a pretty clear message. --Izno (talk) 22:19, 22 August 2013 (UTC)[reply]
- These numbers say only one thing: there is no consensus, en:WP:NOT#DEM. — Ivan A. Krestinin (talk) 08:24, 24 August 2013 (UTC)[reply]
- No, I think you're just wrong. :) Even if we don't count the numbers, there was no persuasive argument put forward by "keep P107" persons. --Izno (talk) 16:17, 24 August 2013 (UTC)[reply]
- The arguments were ignored. Do you see these sections in this RFC: "Arguments analysis", "Draft decision", "Discussion of the draft", "Final decision"? Do you think consensus on a hard question can be reached without these stages? — Ivan A. Krestinin (talk) 20:45, 25 August 2013 (UTC)[reply]
- The RfC system has some problems, but Izno is right that there are no really good arguments for main type. I voted for main type because I think the two systems could exist in parallel. But other than that I can't think of a good reason to keep it. And how should we ever source main type? No publication starts with the sentence "the electron is considered a term in GND-main type classification". --Tobias1984 (talk) 20:56, 25 August 2013 (UTC)[reply]
- And as I argued in the RFC, where we can source the main type, all that happens is that we duplicate information by also including the ID of the item in the GND scheme (which is undeniably bad for us). But that aside, this is not the place to be arguing over points already mentioned there. --Izno (talk) 21:15, 25 August 2013 (UTC)[reply]
You cannot produce proof that the arguments made were ignored. You can produce proof that such analysis of the arguments was not posted onwiki.
As for the rest, this is Wikidata. I undoubtedly believe that any user, so long as he is uninvolved (you and I are not uninvolved in this case, as both stated opinions during the RFC) can come to the correct decision regarding the outcome of an RFC or other closure without going through all of those phases. Neither, as this is Wikidata, is there any policy or guideline basis for what you claim should be the method of closing an RFC. Thus, you seem to be proceeding from a false first principle, which makes your argument largely irrelevant. Until such time as there is policy or guideline, uninvolved users can close RFCs in any manner they see fit. --Izno (talk) 21:12, 25 August 2013 (UTC)[reply]
- Involved/uninvolved, administrator/anonymous: all this is unimportant, because we do not need a random decision by a random person. The important thing is a good analysis of the arguments and a decision based on this analysis. Please see ru:Википедия:Опросы#Проведение. — Ivan A. Krestinin (talk) 21:56, 25 August 2013 (UTC)[reply]
I do not mean to be rude, but did you just link me to a wiki which is not Wikidata? A wiki which obviously holds no sway here?
But no, whether a user is involved in the process very much matters. They are the only ones who can judge without a predisposed bias. What if I had closed the RFC? What would you be saying then? --Izno (talk) 01:57, 26 August 2013 (UTC)[reply]
- I use this link because Wikidata's rules system is currently in its creation stage; the project is too young. To reach consensus, it is not enough to call some uninvolved person and say "Please say what you think about this discussion. Your word will be law." The person must analyze the discussion (in written form), extract the arguments, explain why some of the arguments are invalid, and write a draft resolution (in ruwiki it is named "Предварительный итог", i.e. "preliminary summary"); this draft is usually discussed again, then modified, and after that the final document is completed. This procedure is long, but it allows real consensus to be reached. The P107 example shows that closing an RFC using administrative pressure produces edit wars and conflicts. — Ivan A. Krestinin (talk) 14:34, 27 August 2013 (UTC)[reply]
We are starting to create some hierarchies using instance of (P31) and subclass of (P279). It would be useful to have a gadget that allows us to show (and check!) the "tree of items" made with these two properties. There is the Template:Tree that shows the main items of the "sub-item" with a certain number of iterations, but the inverse would also be useful: the possibility to show all the subclasses of an item, from the "main item" down to its sub, sub-sub (and so on) items. --Paperoastro (talk) 15:22, 21 August 2013 (UTC)[reply]
- I created {{Subclasses tree}} based on this template indeed, but it needs to be the other way around :). I'll look at {{Tree}} later, but maybe it would require that we could make the query to find every item such that subclass of <parent> holds for a given parent. Maybe we cannot do that now (in which case a workaround would be to have a bot maintaining a reverse property of superclass automatically). TomT0m (talk) 14:19, 22 August 2013 (UTC)[reply]
- You have made a nice job! You are right: the <parent> items do not know which "children" they have and how many there are! Probably, as you suggested, we need queries! --Paperoastro (talk) 10:46, 25 August 2013 (UTC)[reply]
- Queries -- at least their initial version as planned -- will not enable us to determine the type hierarchy of a given item. For that, bug 50911 will need to be resolved. I encourage anyone who thinks it's important to be able to get an item's type hierarchy using a query (hint: it is) to sign up for an account on Bugzilla and vote for bug 50911 (in the 'Importance' section). Without this feature, Wikidata's ability to leverage P31 and P279 will be largely hamstrung. Emw (talk) 16:46, 31 August 2013 (UTC)[reply]
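The workaround discussed above (a reverse index from each parent to its subclasses, so that a tree can be walked from the "main item" downwards) can be sketched in a few lines. The class labels in the usage note are hypothetical examples and the functions are not part of any Wikidata template or API; this only illustrates the data structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_children_index(subclass_claims: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert (child, parent) pairs, i.e. 'child subclass of parent' claims,
    into a parent -> children mapping."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in subclass_claims:
        children[parent].add(child)
    return children


def descendants(root: str, children: Dict[str, Set[str]]) -> Set[str]:
    """Collect every direct and indirect subclass of root."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:  # a class reachable by two paths is visited once
                seen.add(child)
                stack.append(child)
    return seen
```

For instance, with the hypothetical claims `("river", "body of water")`, `("sea", "body of water")` and `("body of water", "physical object")`, `descendants("physical object", index)` collects all three subclasses. Without such an index (or an equivalent query), a parent item has no way to enumerate its children, which is the limitation noted above.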
Some type hierarchies are not trees
Please note that the hierarchies formed by instance of (P31) and subclass of (P279) are directed acyclic graphs (DAGs), not necessarily trees. This is because an item can have multiple instance of (P31) or subclass of (P279) values.
For example, consider Arnold Schwarzenegger (Q2685). Schwarzenegger is an instance of an actor and an instance of a politician. Although it's useful to be able to classify Schwarzenegger as having properties of both actors and politicians, a tree cannot capture this sort of relationship.
The same need arises with classes. For example, consider how Sequence Ontology (SO), a Semantic Web ontology widely used in biology, classifies the concept "rearrangement breakpoint". (A rearrangement breakpoint is an important idea in evolution, genetic diseases, oncology, etc.; the hierarchy is visualized here.) SO classifies a rearrangement breakpoint as a class of "biological region" and "structural alteration". Like Schwarzenegger as an instance of an actor and a politician, rearrangement breakpoint as a class of biological region and structural alteration cannot be represented by a tree. These classifications require DAGs.
Of course, implementing a way to depict DAG hierarchies for P31 and P279 is another question! Beyond simply requiring a way to represent an item having edges to multiple parent classes, we also need to explore how the diamond problem entailed by the multiple inheritance could affect our structured data, and how to resolve any problems that emerge there. I'm not as concerned with solving those issues here -- well-designed languages like Python support multiple inheritance and have solved the diamond problem -- my point is that we should be aware that P31 and P279 can be used in a way that cannot be represented by a tree data structure. Emw (talk) 15:20, 31 August 2013 (UTC)[reply]
- I see no diamond problem at first sight here, in OWL for example. A class is essentially a predicate; a subclass just adds constraints to the predicates on the properties that an instance already has, and maybe constraints on other properties. Multiple inheritance is then just the conjunction of the predicates of all parent classes, with no need to select values differently depending on the parent class. For example, imagine a "young men astronauts" class. We could have a Man class, which implies that the sex is male and is a subclass of Human being, and a class Young man, which adds a constraint on the date of birth (first problem here: it's not the current age we want but the age he had when he was an astronaut; this may be solved by a qualifier on the class statement stating when the predicate was true). The other parent class would be Astronaut, which has its own constraints: the instance is a being, which is already true through the other parent class, and went to space or has astronaut as an occupation. TomT0m (talk) 11:27, 1 September 2013 (UTC)[reply]
- I think that the use of "instance of" for describing Arnold Schwarzenegger as actor or politician is wrong because we already have properties to do that job (in that case occupation or office held). Don't do the classification job twice. Snipre (talk) 11:58, 7 September 2013 (UTC)[reply]
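As a small illustration of the remark that Python handles the diamond problem, the sketch below models the "rearrangement breakpoint" example as Python classes. The common root class `Entity` is an assumption added only to form the diamond; it is not taken from Sequence Ontology, and the class names are merely illustrative.

```python
class Entity:
    """Hypothetical shared root, added only to create a diamond."""

    def describe(self):
        return ["Entity"]


class BiologicalRegion(Entity):
    def describe(self):
        return ["BiologicalRegion"] + super().describe()


class StructuralAlteration(Entity):
    def describe(self):
        return ["StructuralAlteration"] + super().describe()


class RearrangementBreakpoint(BiologicalRegion, StructuralAlteration):
    """Has two parents, so the hierarchy is a DAG rather than a tree."""

    def describe(self):
        return ["RearrangementBreakpoint"] + super().describe()


# Python linearizes the DAG (C3 method resolution order), so each ancestor
# appears exactly once even though Entity is reachable through two paths.
mro_names = [cls.__name__ for cls in RearrangementBreakpoint.__mro__]
```

Here `mro_names` lists each class once, with `Entity` after both of its subclasses. This shows one way a language resolves multiple inheritance; whether a comparable ordering is needed for P31/P279 data is a separate question.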
We can't just delete GND main type or partially replace some of its uses. We have to provide another classification scheme, either with a new property or with the instance of and subclass of system. If we select the second possibility, we have to propose the basis of the classification scheme in order to create a vertical classification (meaning more than 2 or 3 levels). A horizontal scheme is useless from a classification or search point of view because you need to know too many classes. I just propose a first draft of a classification (feel free to extend or modify) Snipre (talk) 14:41, 24 August 2013 (UTC):[reply]
- wikidata
  - Being
    - Person
    - Animal
    - Vegetal
  - Matter
    - Substance
      - Mineral
      - Chemical compound
    - Physical object
      - Architectural structure
        - Airport
      - Astronomical object
      - Machine
        - Vehicle
      - Body of water
        - Sea
        - Ocean
        - River
      - Landmass
        - Mountain range
        - Mountain
        - Continent
    - Physical event
  - Science
    - Hard science
    - Human science
  - Cultural
    - Society
      - Event
      - Organization
      - Creative work
- Doing it this way makes no sense. Why don't we just collect how the things are sorted currently using subclass/instance of? -_- --Izno (talk) 16:18, 24 August 2013 (UTC)[reply]
- If you take the time to look a little, you will see that the main relations are taken from existing instance of/subclass relations. But we need to show these relations in order to help people in the classification process and to offer a place to discuss the classification choices. Right now everybody is doing the classification according to his own opinion, without the possibility of a global overview. Snipre (talk) 18:10, 5 September 2013 (UTC)[reply]
- I think Snipre's suggestion to start thinking about an upper ontology is a good idea. However, constructing our own upper ontology from scratch strikes me as a bad idea. I think we should be looking more toward adopting or importing an external upper ontology, for example SUMO, UMBEL, BFO, etc. While I think most would agree that the GND ontology is not a good upper ontology for Wikidata, I don't think there's much more we can take for granted than that. This deserves its own RFC. Since the classes of GND main type cover a fairly limited domain, I don't think we need to have an upper ontology hammered out before we can migrate away from P107. Emw (talk) 00:51, 5 September 2013 (UTC)[reply]
- I want to +1 that it would be foolish to attempt to invent, apriori, a new upper ontology. I would like to see a way for this to evolve from community action - perhaps by using inference behind the scenes to infer the classes of items based on the properties assigned to them as in OWL reasoning, but that seems like a stretch.. Barring that, and if you want to attempt to impose an upper ontology I think you(we?) not only need to decide which one to use (SUMO, UMBEL..), but this needs to be made a consistent part of the technical infrastructure. By that I mean, when a person goes to create a new item, they should be forced to classify the item somewhere into the upper ontology by the interface. Further, when new properties are created, they should have to define what of the domain and range of the property is in terms of classes in the upper ontology. When someone attempts to use that property to describe something, their choices (in a GUI editor for example) should be constrained to be in the appropriate upper class. (and this would require hierarchical reasoning from the API). If that kind of tooling does not appear, I worry that the effort expended in picking an upper ontology would be wasted as few people would end up using it appropriately. Genewiki123 (talk) 19:18, 5 September 2013 (UTC)[reply]
- Built-in constraints like you describe are very unlikely to be implemented, at least for quite a while. Denny makes the case for this in his essay Restricting the World. I wonder how feasible it would be to support multiple upper ontologies (and type hierarchies as a whole), perhaps by using a qualifier in P31 and P279 to indicate which third-party ontology the P31/P279 classification is based on. Emw (talk) 00:51, 6 September 2013 (UTC)[reply]
- OK, no hard constraints.. If you want to support multiple upper ontologies, why not just leave them outside of wikidata on the semantic web and then establish equivalency links between classes in them and the appropriate items in wikidata? This would be fully unconstrained but of course means that any sort of reasoning or automation would need to be able to import those ontologies. But.. since we don't have any reasoning happening anyway in wikidata, it seems not to be a big problem. If you want to drive the community towards more consistent semantics, I think you'd be better off going with an RFC to decide on which upper ontology to prefer, doing some engineering to bring it in, but then only enforcing its use socially - as an editorial guideline. If someone decides they want to import another vocabulary for some reason, I don't think they should be stopped, just discouraged if there is a lot of conflict. Genewiki123 (talk) 17:37, 6 September 2013 (UTC)[reply]
- @Genewiki123 Built-in constraints are not necessary: external bots can do the same job by cross-checking data and can then correct some errors or at least spot problems. I don't really think that building an ontology is a big problem if 1) we try to keep the number of different classes very low (<100), 2) we avoid too high a level of abstraction. As Izno said, just take the present use of the subclass/instance of properties and see what the common ontology of Wikidata contributors is: a Wikidata ontology aims to create subsets of the DB for faster queries and data analyses. For sharper classification, specific properties are used. Here we can play between the ontology and the properties to keep the system simple. Again, we are speaking about an upper ontology. Snipre (talk) 11:50, 7 September 2013 (UTC)[reply]
- Snipre, I think I agree with much of what you say other than keeping the number of classes in our ontology under, say, 100 classes. For an upper ontology I think a small number of classes makes sense, but if we're including the middle and lower regions of Wikidata's ontology, it seems clear to me that we would have many more classes. Emw (talk) 13:44, 7 September 2013 (UTC)[reply]
- Genewiki123, I think that's a reasonable way forward. I'd be interested in such an RFC after figuring out how to port GND main type to P31/279. Emw (talk) 13:44, 7 September 2013 (UTC)[reply]
- Keeping the number of classes under 100 ain't gonna happen. Each date of the year (e.g. April 1) is a class: all the April 1s of all the years. Just look at its Wikipedia page; it's a list. That's 366 pages just there. For each country we have three or four types (=classes) of administrative division. That's a few hundred more. Those are just the sectors I know a bit about. Keeping the Upper Ontology to 100 may be possible, especially if we define "upper" as meaning the top hundred or so properties.
- At the moment we are effectively building the ontology up from the bottom. Now we need
- some visualisation tools to show what we have built and what remains to be done
- a task force to put together the upper level ontology to tie the base classes together. Filceolaire (talk) 19:48, 8 September 2013 (UTC)[reply]
- Built-in constraints like you describe are very unlikely to be implemented, at least for quite a while. Denny makes the case for this in his essay Restricting the World. I wonder how feasible it would be to support multiple upper ontologies (and type hierarchies as a whole), perhaps by using a qualifier in P31 and P279 to indicate which third-party ontology the P31/P279 classification is based on. Emw (talk) 00:51, 6 September 2013 (UTC)[reply]
- I want to +1 that it would be foolish to attempt to invent, a priori, a new upper ontology. I would like to see a way for this to evolve from community action, perhaps by using inference behind the scenes to infer the classes of items based on the properties assigned to them, as in OWL reasoning, but that seems like a stretch. Barring that, and if you want to attempt to impose an upper ontology, I think you (we?) not only need to decide which one to use (SUMO, UMBEL..), but this needs to be made a consistent part of the technical infrastructure. By that I mean, when a person goes to create a new item, they should be forced by the interface to classify the item somewhere in the upper ontology. Further, when new properties are created, their creators should have to define the domain and range of the property in terms of classes in the upper ontology. When someone attempts to use that property to describe something, their choices (in a GUI editor, for example) should be constrained to the appropriate upper class (and this would require hierarchical reasoning from the API). If that kind of tooling does not appear, I worry that the effort expended in picking an upper ontology would be wasted, as few people would end up using it appropriately. Genewiki123 (talk) 19:18, 5 September 2013 (UTC)[reply]
- I think, as it is, wikidata might be a mismatch for these example ontology systems because it is based on wikipedia articles, and many articles describe two different but related things. For example, many museum articles might describe the building and the organization, but not all of them. Also, factor in all the language articles which may describe differing topics. It soon gets messy. Whereas this Suggested Upper Merged Ontology (SUMO) is much easier, English only, and just deals with terms of a few words each. Danrok (talk) 02:14, 5 September 2013 (UTC)[reply]
- As soon as we get serious about describing the museum organisation and the museum building, we need two separate Wikidata items, one for each of these. One Wikidata item cannot describe both. If there is one Wikipedia article covering both, then that article will (eventually) have two infoboxes: one for the organisation and one for the building.
- Wikidata may be based on wikipedia articles for now but in future it will be based on 'things that can be described using wikidata properties' and that will be the entire basis on which decisions are made as to which items are merged and which are split; which wikipedia articles are linked to that page and which go elsewhere. Filceolaire (talk) 19:48, 8 September 2013 (UTC)[reply]
- Divide the Wikidata items into instances (things which can be described by Wikidata properties like 'located at', 'invention date', 'instance of', etc.), classes (items which are groups of instances), and others (items related to Wikipedia pages which are not instances or classes, such as compound items (Bonnie and Clyde), Lists, Category pages, Disambiguation pages, etc.).
- Classify each instance as an 'instance of' one or more classes. Describe these instances using item properties and value properties.
- Classify other items as 'instances of' appropriate classes. Use properties to describe the relationship of these items to instances and classes.
- Classify the classes using the 'subclass of' property. Class A is a subclass of class B if every instance of class A is also an instance of class B.
- Use 'part of' to link instances to other larger instances. Don't use 'part of' for linking instances to classes.
- For each property describe the constraints on the items to be used with that property by listing the classes of items which can be used as the Domain or the Range for that property. Where needed define additional classes.
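The scheme above can be illustrated with a minimal sketch, which assumes a toy in-memory data model rather than anything Wikidata actually provides. The item names, class names, property domains and the helper functions (`superclasses`, `classes_of`, `domain_ok`) are all hypothetical. It shows 'subclass of' being followed transitively, so that an instance of class A also counts as an instance of every superclass of A, and a property's Domain being checked against those classes.

```python
from collections import deque

# Hypothetical toy data; none of these names are real Wikidata items.
SUBCLASS_OF = {
    "art museum": {"museum"},
    "museum": {"organisation"},
    "museum building": {"building"},
}
INSTANCE_OF = {
    "Example Museum (organisation)": {"art museum"},
    "Example Museum (building)": {"museum building"},
}
# Hypothetical Domain constraint: classes whose instances may use the property.
PROPERTY_DOMAIN = {
    "located at": {"building", "organisation"},
    "architect": {"building"},
}


def superclasses(cls):
    """Return cls plus every class reachable through 'subclass of'."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def classes_of(item):
    """Return all classes an item belongs to, directly or by inheritance."""
    result = set()
    for cls in INSTANCE_OF.get(item, ()):
        result |= superclasses(cls)
    return result


def domain_ok(item, prop):
    """Check whether the item falls inside the Domain listed for the property."""
    return bool(classes_of(item) & PROPERTY_DOMAIN[prop])
```

Under these assumptions, `domain_ok("Example Museum (building)", "architect")` is true while the same check on the organisation item is false, which is one way of seeing why the building and the organisation would need separate items.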
- discussion
Section above rewritten. Filceolaire (talk) 23:11, 8 September 2013 (UTC)[reply]
Let's discuss this in another RFC
- This seems largely independent of migrating away from GND main type. Could we put this into a separate RFC? I'm concerned that this RFC is losing focus. There is plenty to discuss about the task at hand. Emw (talk) 23:17, 8 September 2013 (UTC)[reply]
Suggested Upper Merged Ontology (SUMO)
Link: SUMO Description: Pro: Contra:
Upper Mapping and Binding Exchange Layer (UMBEL)
Link: UMBEL Description: Pro: Contra:
Basic Formal Ontology (BFO)
Link: UMBEL Description: Pro: Contra:
General Formal Ontology (GFO)
Link: GFO Description: Pro: Contra:
General Formal Ontology (Cyc)
Link: GFO Description: Pro: Contra:
Evaluating upper ontologies is beyond the scope of this RFC
Evaluating upper ontologies for Wikidata is well and good, but should be done independently of this RFC. We do not need to choose an upper ontology for Wikidata to migrate away from GND main type. This RFC is already fragmented enough; let's focus on the specific task at hand and have upper ontology discussions another day, or at least in a separate RFC. Emw (talk) 23:04, 8 September 2013 (UTC)[reply]
- Support Filceolaire (talk) 23:15, 8 September 2013 (UTC) This needs to be done, but not here.[reply]
- Support - well said Filceolaire. I don't even understand what's so complicated about it. Move all GND statements to "instance of", that's it. --Nightwish62 (talk) 10:52, 9 September 2013 (UTC)[reply]
- That leaves us no-better off than we were before and certainly doesn't fix the other problems that can be solved during migration. --Izno (talk) 22:21, 9 September 2013 (UTC)[reply]
- In Nightwish's defense, I think a naive P107-to-P31 port -- except for P107 'term' claims -- would actually improve things considerably. P107's single biggest flaw is that it's an enum. A simple port to P31 would fix that. Sure, gods and groups of people would still absurdly be called "person", but that's likely a relatively minuscule proportion of claims and not even theoretically fixable with P107. There are much better options, and I wouldn't support such a crude port, but I think it would nevertheless improve the current situation. Emw (talk) 01:25, 10 September 2013 (UTC)[reply]
- What would be a better option in your mind? --Izno (talk) 23:28, 10 September 2013 (UTC)[reply]
- This RFC! Emw (talk) 04:09, 17 September 2013 (UTC)[reply]
- Huh? I was keying off "There are much better options". What options might those be? :) (The answer to that question can't be "this RFC"; "this RFC" is where we must answer that question!). --Izno (talk) 00:21, 18 September 2013 (UTC)[reply]
- In that case it is better to do no migration: why do the job twice? If you just want to change the property without any improvement, just keep the current property. Snipre (talk) 14:35, 13 September 2013 (UTC)[reply]
- No thanks. P107 is going away. Please don't argue against that fact, whatever pains we may suffer otherwise, after it's been discussed to death already. --Izno (talk) 16:26, 13 September 2013 (UTC)[reply]
- Ok, so just delete everything, but don't create new claims based on instance of (P31) with the old values: don't replace P107 (P107): term with instance of (P31): term. Deleting P107 (P107) now without any replacement solution just means we will start from scratch on a new classification system. And just be aware that a lot of constraints use P107 (P107) as a check for claim values. I hope we will have a solution for this use of the property. Snipre (talk) 08:24, 16 September 2013 (UTC)[reply]
- You're not making any sense, and this is offtopic to the RFC anyway. P107 is going away. It's being replaced by P31/P279. If you would like to contribute to the discussion constructively, please do...
- As for constraints, that may be something to start in a new section: How do we go about dealing with the constraints? Without getting too much into that discussion, it's my feeling that most of the constraints can be dealt with quite handily without the need for P107. But please, start a new section for that because it is a valid concern about how best to migrate. --Izno (talk) 19:37, 16 September 2013 (UTC)[reply]
- @--Izno Please, what is the sense in changing P107 (P107): term to instance of (P31): term? Just reducing the number of properties by one? At the beginning, the problem with the GND was that this classification was too simple. Without any change this argument stays the same: using term as a classification value doesn't allow any improvement. So instead of saying to go ahead without any idea of which direction we will follow, it is better to stop and explain again the purpose of deleting P107 (P107). I am really interested to know your explanation, because until now you haven't given any argument for continuing this process. Snipre (talk) 21:42, 16 September 2013 (UTC)[reply]
Your input is welcome in other sections about how to transfer the data. I am not in favor of just having a mass import, as I made quite clear above. :)
We will not be having this discussion again. The RFC is very clear. --Izno (talk) 23:10, 16 September 2013 (UTC)[reply]
- Personally I think we should have a dozen different upper level ontologies, each with its top level item marked as 'subclass of:Ontology top level item'. The English Wikipedia category system, for instance, has two top level ontologies for articles (Q6741584 and Q4587687) and a third which includes non-article pages (Q1281). All of these are on top of the same set of lower level categories, and their existence or non-existence has little or no influence on how those lower level categories are organised. I believe the upper level ontologies used on Wikidata items will, similarly, have little influence on the classification of instances using 'instance of' and other properties. Filceolaire (talk) 19:20, 16 September 2013 (UTC)[reply]
- As this section's heading says, this is the wrong place (which you seem to recognize already? :). --Izno (talk) 19:37, 16 September 2013 (UTC)[reply]
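For reference, the naive P107-to-P31 port debated in this section (every P107 claim except 'term' moved over unchanged) could be sketched roughly as follows. This is only an illustration under stated assumptions: claims are represented as plain `(item, property, value)` tuples with label strings as values, and `port_claims` is a hypothetical helper, not an existing bot or API.

```python
def port_claims(claims):
    """Naively port P107 claims to P31, dropping P107 'term' claims.

    `claims` is a list of (item, property, value) tuples; this layout is
    an assumption made for the sketch, not Wikidata's real data model.
    """
    ported = []
    for item, prop, value in claims:
        if prop != "P107":
            # Claims on other properties are left untouched.
            ported.append((item, prop, value))
        elif value == "term":
            # 'term' claims are deleted rather than ported.
            continue
        else:
            # Every other P107 value is carried over to P31 as-is.
            ported.append((item, "P31", value))
    return ported
```

As noted in the thread, a port like this keeps misclassifications such as deities marked as "person"; it only removes the restriction to a fixed list of values.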