Wikidata talk:WikiProject Taxonomy/Archive/2013/10

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

The variety of specific properties -- kingdom etc.

Given the fact that the discussion above seems not to take the specific properties (P75, etc.) into consideration, and that there's even a working implementation of Taxobox using only parent taxon, taxon rank, and taxon name, would a deletion request submitting all of the specific properties have consensus from the task force? It would be nice to only deal with the 5 properties that fully define a taxon. --Izno (talk) 02:56, 17 September 2013 (UTC)

That is a very good question, but not so easy to answer. If there were a working implementation of Taxobox, then it would depend on having "parent taxon" in place universally:
  • For angiosperms at and above the level of family, parent taxon is almost universally present (in a high degree of redundancy, according to various sources), and the exceptions are oddballs. For gymnosperms a sufficient redundancy could be achieved pretty quickly. I don't know about other groups but I guess they are not ready.
  • It is different below the level of family. I am not even sure this could be quite done in practice, as infrafamilial classifications can be quite involved. Anyway, there is an enormous amount of work to be done here.
  • The rank of kingdom is used at the moment in the current implementation of Taxobox, so it cannot be deleted now. In the long run this use is impractical, as the question of under what Code of Nomenclature an organism belongs cannot be read from its taxonomic position (not for the small stuff). For example the Cyanobacteria taxonomically are prokaryotes but their nomenclature is governed by the Code for algae, fungi and plants. From a practical perspective there needs to be a new property "Code of Nomenclature" (with some half a dozen values) to deal with this. It would not need to be used in many items (once parent taxon is fully implemented).
- Brya (talk) 05:30, 17 September 2013 (UTC)
Thanks Izno for bringing this up. I'm not sure if we should directly delete them. Perhaps we should wait until this bug is fixed, as it still prevents a Wikipedia version of this taxobox. If all of us vote for this bug, its importance is increased for the developers. As one (?) Wikipedia chapter already uses the specific properties, we should perhaps have a working Wikipedia version of this taxobox before we force the switch. However, right now we could start with the following:
  • Outline on this project's page that these properties are deprecated. Also, this should be mentioned on the properties talk pages.
  • Perhaps we could start to remove all specific properties from taxa which have parent taxon already set. However, we would need to inform bot users to only add the deprecated properties if parent taxon (P171) is not set yet.
  • To cope with Brya's third point: Propose the respective property. Brya, would you? The taxobox can be easily changed then.
What do you think? On the one hand, if we wait too long, the specific properties are going to be the default only because of their broad usage. On the other hand, we use them to identify the taxa for adding P171. — Felix Reimann (talk) 11:46, 17 September 2013 (UTC)
Sounds reasonable. We should get rid of all these properties as fast as possible. But we somehow have to reach a consensus (see: Inheritance of taxon ranks). I think the new property should be used as qualifier for the claims instance of (P31) = taxon (Q16521) / monotypic taxon (Q310890). --Succu (talk) 12:46, 17 September 2013 (UTC)
I don't see how to proceed with the RfC as long as the developers have not fixed the bug. Perhaps it would be enough to get a concrete statement from them that indirect usage is definitely wanted. Otherwise we have no strong argument against "The only working approach is: Add all information to each and every taxon" as stated by Eran. The rest of the opposing comments do not seem to me to make much sense. Of course, everyone here could support the RfC question 1.  — Felix Reimann (talk) 15:06, 17 September 2013 (UTC)
I've created a bugzilla account and voted. ;) --Succu (talk) 15:18, 17 September 2013 (UTC)
By the way, I think the RfC was closed, but not marked as such. --Succu (talk) 15:30, 17 September 2013 (UTC)
I think we have a strong argument against a one-to-one mapping of properties in Wikipedia taxoboxes: We have only eight properties for ranks, but the ICN knows 24! So an adequate mapping is not possible. --Succu (talk) 15:38, 17 September 2013 (UTC)
Okay, let me sort things out here:
  • Brya says, "kingdom can't go away yet" and "parent taxon needs to be deployed more fully". Question the first: is 'kingdom' only being used on direct children or is it being used on all implementations for the current Wikidata-only-taxobox? I agree with the second, but I think that's not that hard to do?... An assessment on that (either assuming robotic aid or not) would be nice.
  • Felix says, "mark usage as deprecated", "remove it from where parent taxon is used", "remind the bot ops", and "(afraid of) default due to broad usage". I would agree with all four. I would go further and ask for the bot ops not to add anything at all if they cannot determine the parent taxon—from however they're importing the data—and also to make up a list from indeterminate taxa. I think that's much less painful that way. I agree very strongly with the assertion that they should be deleted prior to mass usage; as noted, he.wp is already using the properties. As that's my stance, I would also say we should do that before the blocking bug gets fixed: 1) we don't have a timeline for that bug and 2) yeah I'd be scared of 10 million bot edits just to remove properties. :).
  • Succu says "consensus not clear in RFC" — I agree, but I'm kind of ambivalent on the point. The people who seem to care would be the ones here, no, and everyone so far agrees that it would be crazy for us to keep the statements even though they are being used? On a side note, it seems also clear to me that there is a working implementation of taxobox if extremely limited in functionality for the clients anyway, as they could only show parent taxon in like a separate field while keeping their implementations similar to pre-Wikidata.
  • Felix's second comment: "get comment from devs" — I think it's pretty clear that the devs will work on implementation; Lydia (WMDE) was the one who filed the bug! and it looks like Danny has made a comment on possibilities of what needs to get done for that to work.
  • Succu's second comment, "have wrong number of properties" is another reason why it, well, doesn't really work with what we've got currently.
Did I mistake something there? --Izno (talk) 00:53, 28 September 2013 (UTC)
The current implementation of taxobox indeed uses "Kingdom", but I see no inherent reason why it needs to do so. It could also test if there is an item where rank=kingdom and read off the value given there.
        However, I would like to point out the level below family again. In some cases infrafamilial classifications can be very complicated, and it is quite possible that Wikipedias don't want to use any infrafamilial classifications. Thus, there is not only a lot of scope for error, but also a danger of an unnecessarily heavy computational load on the servers. - Brya (talk) 04:45, 26 September 2013 (UTC)
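The suggestion above (find the item whose rank is kingdom and read the value there, instead of storing a separate "kingdom" property) can be sketched roughly as follows. This is only an illustration, not the Taxobox code: the `TAXA` dictionary is a hypothetical in-memory stand-in for items carrying taxon rank and parent taxon (P171), the lineage is deliberately shortened, and the function name is made up.

```python
from typing import Optional

# Hypothetical toy data standing in for Wikidata items (an assumption for
# illustration): each taxon has a rank and a parent taxon. The chain is
# shortened; real lineages contain more levels.
TAXA = {
    "Bellis perennis": {"rank": "species", "parent": "Bellis"},
    "Bellis": {"rank": "genus", "parent": "Asteraceae"},
    "Asteraceae": {"rank": "family", "parent": "Asterales"},
    "Asterales": {"rank": "order", "parent": "Plantae"},
    "Plantae": {"rank": "kingdom", "parent": None},
}


def find_ancestor_with_rank(taxon: str, rank: str, taxa: dict = TAXA) -> Optional[str]:
    """Walk up the parent-taxon chain and return the first taxon with `rank`.

    Returns None if the chain ends, or hits an unknown item, before a taxon
    of that rank is found.
    """
    seen = set()  # guards against accidental cycles in the data
    current: Optional[str] = taxon
    while current is not None and current not in seen:
        seen.add(current)
        entry = taxa.get(current)
        if entry is None:
            return None  # parent taxon not (yet) present in the data
        if entry["rank"] == rank:
            return current
        current = entry["parent"]
    return None


if __name__ == "__main__":
    print(find_ancestor_with_rank("Bellis perennis", "kingdom"))
```

The lookup only works where the parent-taxon chain is complete, which is the dependency discussed in this thread.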
I'm skeptical. That aside, don't worry about performance. I'm kind of ambivalent to what the Wikipedias want to do. Most will have either a) already set up redirects for the infrafamilial classifications, or they have not, and they can make new articles, or they will come up with the modules necessary to filter out these classifications in a particular tree. Preemptively, is there something I'm not addressing with those sentences? --Izno (talk) 00:53, 28 September 2013 (UTC)
Yes, but "Nothing in this page is to say that editors should not be mindful of performance, only that it should not limit project development." And we are talking big numbers here. As to what Wikipedias want, yes, this is highly uncertain. In an ideal world Wikipedias would write interesting articles about interesting organisms and use a not-too-large taxobox in tasteful support. In reality, all too often, Wikipedias are starved for content and grasp at a taxobox to serve as the backbone for an article, any kind of article, as long as it is there. The bigger the taxobox, the less it is noticeable that there is little or no content (they hope). -- Brya (talk) 05:19, 28 September 2013 (UTC)
  • I think we should follow "do not worry about performance". First we should concentrate on deploying a clean model. If we later on really see some performance problems, there are (already now) ways to improve it. For example: Set the default to a reference which does not enumerate all infrafamily ranks. If you think this won't work, please give me a concrete example to have a look at. Also, the current JavaScript implementation is itself very inefficient, as it cannot use the fast and powerful parser functions which are available at Wikipedias, because Wikidata is not a client for itself. And even so there are no performance problems (I tested it with quite deep hierarchies).
  • The presentation in the taxobox of infrafamily ranks could be reduced in numerous ways. Some examples: a) Show only up to "3" (choose your number) infrafamily ranks. b) Show minor ranks only if the Wikipedia the taxobox is displayed at has an article for it. c) the last rank must be a principal rank. Or combinations from all of this.
  • The Module:Taxobox is not a blocker for us. Since a more powerful alternative for detecting which rules apply has been proposed (Wikidata:Property_proposal/Term#Code_of_nomenclature; vote here! :-) ), I will remove P75 from the taxobox today. Thus, we can move on.  — Felix Reimann (talk) 09:08, 28 September 2013 (UTC)
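Options a) and b) from the list above could look roughly like this. It is a sketch under assumptions, not the Taxobox module: the set of principal ranks, the function name `reduce_lineage`, the `has_article` callback and the example lineage are all illustrative choices, and "infrafamily ranks" is read here as the minor ranks below family.

```python
from typing import Callable, List, Tuple

# Assumed set of principal ranks; a real implementation would take these from
# the applicable Code of Nomenclature.
PRINCIPAL_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus", "species"}


def reduce_lineage(
    lineage: List[Tuple[str, str]],
    max_minor: int = 3,
    has_article: Callable[[str], bool] = lambda name: True,
) -> List[Tuple[str, str]]:
    """Reduce a lineage given as (name, rank) pairs, highest rank first.

    Principal ranks are always kept. Minor ranks below family are kept only
    while fewer than `max_minor` have been shown (option a) and only if
    `has_article(name)` is true (option b). Minor ranks above family are
    left untouched.
    """
    reduced = []
    below_family = False
    minor_shown = 0
    for name, rank in lineage:
        if rank in PRINCIPAL_RANKS:
            reduced.append((name, rank))
            if rank == "family":
                below_family = True
        elif not below_family:
            reduced.append((name, rank))
        elif minor_shown < max_minor and has_article(name):
            reduced.append((name, rank))
            minor_shown += 1
    return reduced


# Illustrative lineage only; not taken from Wikidata.
EXAMPLE = [
    ("Apocynaceae", "family"),
    ("Asclepiadoideae", "subfamily"),
    ("Asclepiadeae", "tribe"),
    ("Asclepiadinae", "subtribe"),
    ("Asclepias", "genus"),
]

if __name__ == "__main__":
    for name, rank in reduce_lineage(EXAMPLE, max_minor=1):
        print(rank, name)
```

Option c) (the last rank must be a principal rank) is not covered here; it would be a separate check on the final entry.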
As regards the last, "kingdom" is not quite superfluous, as it is used for the selection of the colour of the taxobox. The names of fungi are governed by the ICNafp but the taxobox is given a different colour than that for plants, the names of which are also governed by the ICNafp. I don't know about protists, but something may go wrong there as well.
        As to infrafamilial classifications, this is somewhat academic as we have neither a working taxobox, nor infrafamilial classifications in place. - Brya (talk) 10:35, 28 September 2013 (UTC)
We have at least one infrafamilial classification in place, based on Das große Kakteen-Lexikon (Q13520496). :) --Succu (talk) 13:28, 28 September 2013 (UTC)
Oh yes, there is not a complete absence of infrafamilial classifications (I am putting in something myself now). The Cactaceae classification is fairly respectable with two ranks between family and genus, but some of the big families have really complicated internal classifications. - Brya (talk) 17:54, 28 September 2013 (UTC)
Apocynaceae (Q173756) would be an interesting example. Ranks down to subtribus. --Succu (talk) 18:08, 28 September 2013 (UTC)
Oh yes, especially given the fact that the family is bigger now than it used to be. But there are plenty of complicated families! - Brya (talk) 04:10, 29 September 2013 (UTC)
It would be great if you could model one specific taxon for which you think the current approach is problematic as an example.  — Felix Reimann (talk) 18:55, 29 September 2013 (UTC)
Yes, but don't expect anything soon. One of the main issues is that it will take a great deal of work for any complex infrafamilial classification to be put in, let alone several. It is unlikely to happen at all: who is going to do it? - Brya (talk) 11:22, 30 September 2013 (UTC)

  Info: Property code of nomenclature (P944) was created. --Succu (talk) 20:38, 3 October 2013 (UTC)

...and is now supported by the Taxobox. P75 is not used any longer. Whenever I edit a taxon of a primary rank (class, family, order), I add P944.  — Felix Reimann (talk) 15:18, 4 October 2013 (UTC)
Thanks for the fast implementation. --Succu (talk) 16:20, 4 October 2013 (UTC)
It is good to see that this has been created. However, it seems overdone to add it to families and orders. If 'parent taxon' is implemented it needs to be only added to the highest ranking items. - Brya (talk) 17:47, 4 October 2013 (UTC)

Deletion request

These were submitted for deletion by GZWDer, as a btw. Please see WD:PFD#taxonomic rank properties. --Izno (talk) 23:32, 29 October 2013 (UTC)

Taxon

I see that sometimes "instance of taxon" is added, but I see no real reason for this (if something has a taxonomic rank it is a taxon anyway). It looks like unnecessary clutter to me. Couldn't this be deprecated or eliminated? - Brya (talk) 04:51, 26 September 2013 (UTC)

instance of (P31) is part of the main categorization system of Wikidata. Every item should have this as it replaces the GND main Type which in turn will be deleted. Bots must add it if possible when they change an item. I currently set either taxon or monotypic taxon (Q310890). If the parent taxon (P171) of a taxon is instance of monotypic, the taxobox shows also the authority of the parent. See Anilius scytale (Q834371) and Help:Basic_membership_properties.  — Felix Reimann (talk) 03:08, 27 September 2013 (UTC)
As far as I can see, all the objections that exist against the GND main Type Term apply equally to "taxon". It does not look like a real improvement to me. - Brya (talk) 04:38, 27 September 2013 (UTC)
I do not know if it is the perfect approach either. However, it is definitely an improvement, as it is far more flexible: the world is not squeezed into 4 distinct classes. And it is also part of the W3C standard for ontologies, which means that specialists have already thought about it.  — Felix Reimann (talk) 05:57, 27 September 2013 (UTC)
And GND main Type Term is not part of the W3C standard for ontologies? - Brya (talk) 11:25, 27 September 2013 (UTC)
No. It is the classification system of the German National Library. I'm pretty sure that it was never meant as a means to model the world but just to categorize what they have in their database. As it was adopted early after the start of Wikidata it became the quasi-default by accident.  — Felix Reimann (talk) 14:34, 27 September 2013 (UTC)
Well, it is an improvement, it just is not much of one ... Adding taxon or monotypic taxon also raises questions, like whether it is useful to add both (this will depend on the implementation of the not yet existing taxobox software), but sometimes this will have to be done anyway, as when a taxon is monotypic according to one source, but includes more than one taxon according to another source. - Brya (talk) 16:58, 27 September 2013 (UTC)
Something I've been wondering: Isn't the proper relation to be making instance of the taxonomic rank? Example: Homo sapiens (human) instance of species? Species is then a subclass of a taxon, no? We could actually deprecate taxonomic rank if we wanted to (not that we want to). That then makes "parent taxon" really the "subclass of" relation, I think, no? I need some pretty drawings here... --Izno (talk) 01:09, 28 September 2013 (UTC)
A species is an instance of taxon anyway, no matter how Wikidata defines things. This is why it really is unnecessary to have "instance of taxon" in each item: it is redundant. I would pretty much hate "Homo sapiens instance of species" as it suggests that species are a given; it is rather the opposite: taxonomists make groups and assign them a rank (usually), but the next taxonomist to come along may disagree and assign a different rank. Ranks are like the weather: here today, changed tomorrow (or not). - Brya (talk) 05:00, 28 September 2013 (UTC)
Which is rather irrelevant: we have qualifiers (and soon, claim-ranks) for a reason! --Izno (talk) 15:01, 28 September 2013 (UTC)
No, I don't think so. The rank is only a property of the taxonomic group: A taxon is a group of species which are presumably related. The taxon may have a rank, may have a scientific name (or more, if debated), may have an author if already formally described. If you think of several family-ranked taxa: Depending on how they fit best in the hierarchy the author creates, they are sometimes at family rank, sometimes at subfamily rank. Nonetheless, they remain the same taxon with all its behaviors and characters.
@Brya: Please consider that Wikidata also contains such things: Douglas Adams (Q42), Faust (Q29478), and New York City (Q60). It is really helpful if you have one property with which you can distinguish organisms from these things.  — Felix Reimann (talk) 07:22, 15 October 2013 (UTC)
As it now is, there already is a property with which you can distinguish organisms from these things, or actually several. If "taxon rank" is used or "taxon name", etc, then you already know it is an organism. - Brya (talk) 10:45, 15 October 2013 (UTC)
instance of (P31) has a specific meaning. By saying something is a taxon, in OWL we can express that this implies that the taxon has a rank, maybe an inventor ... exactly as, if we know an individual is a human being or a horse, we can infer some of its characteristics. It's no different, so it's useful to express that it's an instance of taxon to be consistent. TomT0m (talk) 16:06, 15 October 2013 (UTC)
(edit conflict) Only for human users. But for bots this would mean that they need to know all existing properties (~900) to be able to say "this item does not belong to the domain I'm interested in". With P31 it is easy to deduce that an item with P31 pointing to something different than taxon (and its subclasses) does not belong to this domain. Of course, if a human sees an item which has only the property official language (P37), they can conclude that the item is not a species only by understanding the label. For a machine, this is much harder.  — Felix Reimann (talk) 16:13, 15 October 2013 (UTC)
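The domain check described above (an item belongs to the taxon domain if its P31 value is taxon or one of its subclasses) might be sketched like this. The item and subclass dictionaries are hypothetical in-memory stand-ins rather than a real Wikidata client, and the single subclass edge (monotypic taxon as a subclass of taxon) is an assumption made for the example.

```python
from typing import Dict, List, Set

TAXON = "Q16521"  # taxon

# Assumed subclass-of (P279) edges, reduced to what the example needs:
# monotypic taxon (Q310890) -> taxon (Q16521).
SUBCLASS_OF: Dict[str, Set[str]] = {"Q310890": {"Q16521"}}

# Hypothetical, heavily simplified items: property id -> list of value ids.
ITEMS: Dict[str, Dict[str, List[str]]] = {
    "Q834371": {"P31": ["Q16521"]},  # Anilius scytale
    "Q42": {"P31": ["Q5"]},          # Douglas Adams
    "Q999999999": {},                # made-up item without any P31 statement
}


def is_subclass_of(cls: str, target: str, subclass_of: Dict[str, Set[str]]) -> bool:
    """True if `cls` equals `target` or reaches it via subclass-of edges."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return False


def in_taxon_domain(item: Dict[str, List[str]],
                    subclass_of: Dict[str, Set[str]] = SUBCLASS_OF) -> bool:
    """A bot only needs to look at P31, not at all ~900 properties."""
    return any(is_subclass_of(c, TAXON, subclass_of) for c in item.get("P31", []))


if __name__ == "__main__":
    for qid, item in ITEMS.items():
        print(qid, in_taxon_domain(item))
```

An item without any P31 statement is simply reported as outside the domain here; a bot might instead want to flag it for review.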
I'm still puzzled: human (Q5) should be instance of (P31) species (Q7432) subclass of (P279) taxon (Q16521), no? That should make the duplication of data that is taxon rank (P105) apparent. I'm not trying to get P105 deleted here, mind you, just also trying to point out that P31 "taxon" is duplicating data already present in the form of "taxon rank" <anything>.

I personally would rather see P31 used, and at least duplicated, but that's just me. :^) --Izno (talk) 19:50, 19 October 2013 (UTC)

Your edit was puzzled? Care to explain? --Succu (talk) 20:22, 19 October 2013 (UTC)
Why do you think that is relevant to my question? --Izno (talk) 21:03, 19 October 2013 (UTC)

Interwiki puzzles

Please think about interwiki conflicts: Ursidae and Pangolins. Infovarius (talk) 19:57, 4 October 2013 (UTC)

Shutdown

I see that sites like GRIN and USDA Plants are now not available because of the US Government Shutdown. Weird! - Brya (talk) 07:31, 5 October 2013 (UTC)

The same for ITIS. Liné1 (talk) 08:14, 5 October 2013 (UTC)

Integration with phylogenetic trees

Is anyone thinking about storing phylogenetic trees in Wikidata, and mapping nodes of these trees onto any (monophyletic) taxon nodes set up by the taxonomy task force? HYanWong (talk) 20:44, 22 October 2013 (UTC)

It's a good idea. I think it's doable with a subclass of or upper taxon statement, with a qualifier and sources to identify the sources of the tree. So we do not have to map nodes if they are identical: we just have to put several statements on the same item. TomT0m (talk) 21:46, 22 October 2013 (UTC)
This is not new: there are some trees present here. The point is that to include a tree it is not only necessary to have a source, but also to have all the nodes named. There are a myriad trees out there, but often, only some of the nodes have been named. - Brya (talk) 04:23, 23 October 2013 (UTC)
Brya is right, we do this already. However the number of taxa is huge therefore we need your help. Also, some early tooling exists to support the tasks of inserting huge trees. For my bot, the syntax is described here. Succu may introduce his bot himself. The idea is to model all named clades and ignore intermediate unnamed clades.  — Felix Reimann (talk) 05:38, 23 October 2013 (UTC)
I was thinking of a tool that took a (e.g. Newick-formatted) phylogenetic tree, presumably from a reputable scientific paper, with a reasonable number of named intermediate nodes, and in Wikidata, saved the tree structure as nodes with parent-child relationships. The named nodes could then be mapped to known taxa, if appropriate, and the tree as a whole would form another taxonomy. I'm keen to have full phylogenetic relationships stored somehow on Wikidata, not simply certain named taxonomic levels. I can provide e.g. a reputable mammalian supertree with many named nodes. HYanWong (talk) 08:36, 23 October 2013 (UTC)
I am not sure what will happen if you enter a tree leaving out the unnamed nodes, but it feels wrong to me, and the more unnamed nodes it has, the more wrong it feels. Presumably, this is not a huge problem, as the more reputable the tree is, the fewer unnamed nodes it will have. Anyway, "full phylogenetic relationships" is an illusion, as there are more trees out there than I would care to think about. - Brya (talk) 18:07, 23 October 2013 (UTC)
I think there are plenty of reputable trees (read, with high consensus for most nodes) that still have the majority of nodes unnamed. After all, it's unlikely that (for example) sub-family or sub-genera clades, however well established, will have nodes named after them. But such groupings (if correct) still provide valuable information for biologists looking for relationships between species. Hence my desire to make such data available to MediaWiki sites. But yes, there would have to be some way to avoid creating a huge number of poorly established tree nodes. HYanWong (talk) 22:02, 23 October 2013 (UTC)
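A tool of the kind described above could start from something like the following sketch. It is not one of the existing bots: the parser handles only a simplified subset of Newick (no quoted labels or comments, branch lengths are skipped), and the function names and the example tree are made up for illustration. The first function keeps every node, named or not, as parent-child edges; the second links each named node to its nearest named ancestor, which is the "ignore intermediate unnamed clades" variant.

```python
import itertools
from typing import Dict, List, Optional, Tuple


def newick_to_edges(newick: str) -> Tuple[List[Tuple[int, int]], Dict[int, Optional[str]]]:
    """Parse simplified Newick into (child_id, parent_id) edges and id -> label."""
    text = newick.strip().rstrip(";")
    pos = 0
    counter = itertools.count()
    edges: List[Tuple[int, int]] = []
    names: Dict[int, Optional[str]] = {}

    def parse_node(parent_id: Optional[int]) -> None:
        nonlocal pos
        node_id = next(counter)
        if parent_id is not None:
            edges.append((node_id, parent_id))
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "(" and parse the comma-separated children
            while True:
                parse_node(node_id)
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                ch = text[pos]
                pos += 1
                if ch == ")":
                    break
                if ch != ",":
                    raise ValueError(f"unexpected character {ch!r}")
        start = pos  # optional label directly after the node
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        names[node_id] = text[start:pos].strip() or None
        if pos < len(text) and text[pos] == ":":  # skip an optional branch length
            while pos < len(text) and text[pos] not in ",()":
                pos += 1

    parse_node(None)
    return edges, names


def named_parent_edges(edges, names) -> List[Tuple[str, str]]:
    """Link every named node to its nearest named ancestor, skipping unnamed ones."""
    parent = dict(edges)
    result = []
    for node, label in names.items():
        if label is None:
            continue
        ancestor = parent.get(node)
        while ancestor is not None and names[ancestor] is None:
            ancestor = parent.get(ancestor)
        if ancestor is not None:
            result.append((label, names[ancestor]))
    return sorted(result)


if __name__ == "__main__":
    edges, names = newick_to_edges("((Homo,Pan)Hominini,Gorilla)Homininae;")
    print(named_parent_edges(edges, names))
```

Mapping the resulting labels to existing items, and deciding whether a given tree is notable enough to import, are left out entirely.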
I hope that "high consensus" means that these nodes are found in many papers, with independent corroboration? - Brya (talk) 04:39, 24 October 2013 (UTC)
Well, there are many meanings to the word, including high support values, multiple agreement between gene trees, multiple agreement between trees incorporated into a single "super tree", or general agreement in the literature. In the spirit of Wikipedia, I wonder if there is an argument to be able to import any tree into Wikidata, but then have some means of describing whether the node has consistent support in the literature. After all, we might want to reflect disagreement between nodes too, if it is an important feature of our current understanding. HYanWong (talk) 15:41, 26 October 2013 (UTC)
Indicating support for anything can be done by adding sources to it. But surely, it is not a good idea to import any tree into Wikidata; there are far too many to even consider that. A tree must be notable enough to pass the threshold for inclusion. - Brya (talk) 06:00, 28 October 2013 (UTC)