Wikidata talk:WikiProject Taxonomy/Archive/2013/07

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Constraints

I propose to add constraints to some main taxonomic properties in order to track inconsistencies. As parent taxon (P171), taxon name (P225), and taxon rank (P105) are necessary for any taxon (taxon (Q16521)), I propose the following set of constraints for these and all rank properties, and maybe also for IUCN conservation status (P141), taxon range map image (P181), endemic to (P183), taxonomic type (P427), botanist author abbreviation (P428):

{{Constraint:Item|property=P31|item=q16521}}
{{Constraint:Item|property=P171}}
{{Constraint:Item|property=P225}}
{{Constraint:Item|property=P105}}

I started adding them, but Succu asked for confirmation first. Please express your opinions. Infovarius (talk) 18:54, 1 June 2013 (UTC)

I also thought about adding these constraints. In principle, I would support this. However, there are still some open questions we should probably discuss beforehand, to be able to fix the violations consistently:
  1. There are many items which are not about taxa, like banana (the fruit) or dolphins (several different families), where bots constantly add taxon properties. My proposal is to mark them with taxon rank (P105)=<No value> to show that this item is definitely not a taxon. How should we handle non-monophyletic taxa?
  2. IUCN conservation status (P141): only holds if given by IUCN. Thus, this is not a mandatory but an optional property.
  3. taxon range map image (P181): only holds if we have one. Especially for many higher-order ranks this is probably not meaningful. Thus, this is again not a mandatory property.
  4. endemic to (P183): I think this property is to be discussed as it is used in two different senses: 1) s.l.: "This species lives only in France and Germany, thus it is endemic to Europe". Or, to exaggerate this meaning: "the species has a worldwide distribution, thus, it is endemic to the earth" 2) s.s.: A species can only be endemic in a really encapsulated, distant habitat like an island. E.g., "Varanus komodoensis is endemic to the Sunda islands", but the adder is not endemic to any place. Depending on the meaning, P183 is either optional or not.
  5. botanist author abbreviation (P428): shouldn't this be a property of the author, linked by taxon author (P405)? See for example Q192056.
FelixReimann (talk) 09:00, 3 June 2013 (UTC)
1) Good idea.
2, 3) These are optional. Constraints are wrong.
4) Should be narrowed to s.s., thus no constraint.
5) Correct.
--Succu (talk) 15:28, 3 June 2013 (UTC)
I suppose you missed the point. I did not propose constraints on P171, P225, P105 themselves, but "P171, P225, P105" constraints on other properties.
1) It's possible. But I see a better variant: polyphyly (Q217743).
2-3) Ok, these properties are optional for a taxon. But if they are used, they must be in an item for a taxon => constraints for them.
5) I mixed it up with taxon author (P405); that property should have constraints similar to those in #2-3.
The main idea: because many properties are defined on taxa, they should have constraints saying so. Infovarius (talk) 09:53, 7 June 2013 (UTC)
You're right, I missed your point: Yes, the dependent properties IUCN conservation status (P141), taxon range map image (P181), endemic to (P183), taxonomic type (P427), botanist author abbreviation (P428) may only be set if the required ones parent taxon (P171), taxon name (P225), taxon rank (P105), taxon author (P405) are also set. Thus, I support constraints for IUCN conservation status (P141), taxon range map image (P181), endemic to (P183), taxonomic type (P427), botanist author abbreviation (P428). FelixReimann (talk) 15:51, 7 June 2013 (UTC)
I don't think you missed the point. I reverted completely different proposals. Constraints on IUCN conservation status (P141), taxon range map image (P181), endemic to (P183), taxonomic type (P427), botanist author abbreviation (P428) depend only on taxon name (P225). Properties parent taxon (P171) and taxon rank (P105) are not mandatory. Rarely taxon author (P405) should be provided to make things clear. --Succu (talk) 21:55, 8 June 2013 (UTC)
This is also ok for me. Otherwise, we would only see the same constraint violations for several different properties. I'm fine with constraining only taxon name (P225). FelixReimann (talk) 13:51, 12 June 2013 (UTC)
Succu, are we talking about taxa or what? Why are parent taxon (P171) and taxon rank (P105) not mandatory? Imagine a case where someone mistakenly applies IUCN conservation status (P141) to, say, a kind of fairy, Winks. A taxon name (P225) is easily made up using Pig Latin. But the impossibility of supplying parent taxon (P171) and taxon rank (P105) would make things clearer. Infovarius (talk) 03:55, 13 June 2013 (UTC)
Then, the constraint violation will pop up at Wikidata:Database_reports/Constraint_violations/P225#Item_P105. We will just have fewer constraint violation pages where the same violation is reported several times. For the "optional" properties, requiring P225 is sufficient as P225 itself requires P105...  — Felix Reimann (talk) 12:34, 14 June 2013 (UTC)
Another point: P89 constraint violations are created by different users, for example [1] [2] [3] linking a species item with taxon rank (P105) to a disambiguation item. Do we want this? Are these created by some scripts, like useful.js or similar?  — Felix Reimann (talk) 12:41, 14 June 2013 (UTC)
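
The chained rule discussed above (the optional properties require P225, and P225 in turn requires P105) can be illustrated with a small sketch. This is not how the constraint reports are actually generated; the dictionary layout, the function name `find_violations` and the sample item IDs are assumptions made only for illustration.

```python
# Hypothetical sketch: items are plain dicts mapping property IDs to value lists.
# The real constraint reports are produced by bots, not by this code.

OPTIONAL_TAXON_PROPERTIES = {"P141", "P181", "P183", "P427", "P428"}

# "If the key is present, all properties in the value set must be present too."
REQUIRES = {prop: {"P225"} for prop in OPTIONAL_TAXON_PROPERTIES}
REQUIRES["P225"] = {"P105"}  # P225 itself requires P105, as noted in the thread


def find_violations(items):
    """Return sorted (item, property, missing_property) triples."""
    violations = []
    for item_id, claims in items.items():
        for prop, required in REQUIRES.items():
            if prop not in claims:
                continue
            for needed in required:
                if needed not in claims:
                    violations.append((item_id, prop, needed))
    return sorted(violations)


# Made-up sample data; the item IDs and values are placeholders.
sample_items = {
    "item_ok": {"P225": ["Varanus komodoensis"], "P105": ["species"], "P141": ["some status"]},
    "item_no_name": {"P141": ["some status"]},
    "item_no_rank": {"P225": ["Inksway"], "P141": ["some status"]},
}

for violation in find_violations(sample_items):
    print(violation)
```

With this layout each problem is reported once: an item with P141 but no P225 shows up under P141, and a made-up name without a rank shows up under P225, which is the "fewer report pages" effect described above.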

It is not obvious to me now where I can find a list of taxa which do not have a parent taxon (P171) claim. --Infovarius (talk) 18:49, 7 July 2013 (UTC)

To do what? --Succu (talk) 19:00, 7 July 2013 (UTC)
Of course to add the absent claim (which is necessary I suppose). --Infovarius (talk) 18:32, 9 July 2013 (UTC)

An early version of a workflow to add taxonomic data

Assumptions:

  • Taxonomy is subject to change
  • For many parts of the tree of life, there is not one single authority defining the current state of the art
  • Different concepts exist in parallel (also in different wikipedia chapters), we should be able to reflect them.
  • If we want to have references from different scientific articles for each and every taxon (for which wikipedia articles exist + x), semi-automation is required.

I've been thinking and testing a possible workflow for this task which I want to present here. If you want, you can use it already. If more people use it, I will add an appropriate user interface (currently, it is only usable from the command line) and share it with you.

  1. Choose one scientific article you want to add. As an example, I selected doi:10.1186/1471-2148-13-93, PDF, a very recent publication regarding Squamata (Q122422) which covers 1149 taxa from genus level up to the order Squamata. (The paper is quite similar to what is currently represented in the reptile database, the main taxonomic reference of de-wiki and others.)
  2. Create an item for the article to be referenced: A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes (Q13416674)
  3. As I see no chance to automatically and reliably extract taxa from any article, now comes the manual work: Create the hierarchy as given by the article. This can be written in wiki syntax, as this is easiest for human input:
* Squamata
** Dibamidae 
** Episquamata
*** Lacertoidea
**** Amphisbaenidae 
***** Amphisbaena 
***** Ancylocranium 
***** Baikia 
***** ...
*** Toxicofera
**** Serpentes
  4. Mapping of these taxon names to Wikidata items.
  5. For each name, matching items are automatically searched in a multi-step approach using, among other criteria, taxon name (P225) or item labels. As different taxa with the same scientific name exist (like Q1862566 and Q1482244) or other ambiguity is possible, each mapping candidate is presented to the user by opening the respective item in the browser. Each mapping must be accepted manually, for example:
Is Anolis ( Dactyloidae, Iguania ) the same as Q311348 (y/n)?
Also taxon ranks can be added now easily.
 
Squamata taxonomy according to Pyron et al. 2013 with mapping to wikidata items if existent (=green). Red items show other already existing values for Property:P171
  6. The gathered information is written back to wikipedia syntax, see User:FelixReimann/Pyron2013 and the tree of taxons is created for visual verification, see figure. Green vertices are items found already in Wikidata, red ones are those, where the Wikidata item has a different value for parent taxon (P171). You can click on the green vertices, they link to the corresponding item.
  7. Now we have a verified hierarchy. Thus, a bot can add for all existing items taxon rank (P105), taxon name (P225), parent taxon (P171) and add references for all of them linking the item of the scientific article.

For additional articles in this field, the existing name->item mapping can be reused.
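
To make the hierarchy step more concrete, here is a minimal sketch of how the asterisk-indented wiki syntax shown above could be turned into (taxon, parent) pairs. It is not the tool described in this post; the function name `parse_hierarchy` and the handling of the "..." placeholder are assumptions for illustration only.

```python
def parse_hierarchy(text):
    """Parse '*'-indented lines into (name, parent_name) pairs.

    The number of leading asterisks gives the depth; the parent of a line is
    the most recent line that is exactly one level shallower.
    """
    stack = []   # stack[d - 1] holds the latest name seen at depth d
    pairs = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("*"):
            continue  # ignore anything that is not a list line
        depth = len(line) - len(line.lstrip("*"))
        name = line[depth:].strip()
        if not name or name == "...":
            continue  # skip empty entries and the elision placeholder
        if depth > len(stack) + 1:
            raise ValueError(f"line skips a level: {raw_line!r}")
        del stack[depth - 1:]          # drop siblings and deeper levels
        parent = stack[-1] if stack else None
        pairs.append((name, parent))
        stack.append(name)
    return pairs


example = """\
* Squamata
** Dibamidae
** Episquamata
*** Lacertoidea
**** Amphisbaenidae
***** Amphisbaena
***** ...
*** Toxicofera
**** Serpentes
"""

for name, parent in parse_hierarchy(example):
    print(name, "<-", parent)
```

Each resulting pair corresponds to one candidate parent taxon (P171) statement, which would still have to be mapped to items and confirmed manually as described in the steps above.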

With such a referenced taxonomy, a wikipedia chapter could decide to use for their taxobox, e.g., "for any taxon below squamata the claims which are based on scientific article A" while another wikipedia chapter could decide to "use every claim of article A for squamata but for all snakes, use reference B if existent". But this is still to be discussed in the future. If you like it, start with a simple definition page.  — Felix Reimann (talk) 20:36, 25 June 2013 (UTC)

 
Odonata taxonomy according to the World Odonata List with mapping to Wikidata items if existent (=green). Red items show already existing items for Property:P171
Nearly finished is the taxonomy of dragonflies and damselflies (Odonata (Q25375)) according to the World Odonata List (WOL) from order down to species level. Of 6638 taxa, only these 105 have no Wikidata item yet. The rest (98%) now have taxon name (P225), taxon rank (P105), and parent taxon (P171). This means that each taxon below Odonata which does not have a WOL-sourced property set is very likely to be from the list of missing ones or should be synonymized with another item, i.e., is outdated.  — Felix Reimann (talk) 14:51, 11 July 2013 (UTC)

Property proposals

  Info: I proposed the following properties: BHL Page Id, Replaced_synonym and Ex_taxon_author(s). --Succu (talk) 10:35, 9 July 2013 (UTC)

synonyms

Hey,
is this a case to use Basionym? I never did it before and want to ask first ;) . Thank you, Conny (talk) 19:51, 10 July 2013 (UTC).

Not sure. Basionym is only defined by ICN (botany). ICZN (zoology) does not know this term. Up to now, synonyms are only added as additional taxon names. See for example Ross's Goose (Q244320). However, this is not ideal as even if (partially) sourced like Pinheyschna subpupillata (Q5409834), it is not easy to derive the valid name.  — Felix Reimann (talk) 20:21, 10 July 2013 (UTC)
In my opinion basionym (P566) fits only ICN (see property definition). --Succu (talk) 20:34, 10 July 2013 (UTC)
Well, it depends. Indeed, the term "basionym" is restricted to the ICNafp, but if you look at the mechanism (in a generalized sense), it is the same mechanism in the ICZN and in the ICNafp, so it could be used for names of all organisms. Perhaps it would be better to give it a different name (non-Code specific), so that questions like the above are avoided. A supporting explanation may also be a good idea.
        In the given example the "basionym" could be used if the taxon name were Pristiapogon taeniopterus (Bennett, 1836), then Apogon taeniopterus Bennett, 1836 could be included as the "basionym". Given that the entry is now labelled Apogon taeniopterus, it would now be inappropriate. - Brya (talk) 04:29, 11 July 2013 (UTC)
I would like to keep this discussion alive, as I think that we do not have a common rule for handling synonyms yet. As a basis for this discussion, I created a proposal at Kaleidoscope Jewel (Q2530145) for which the World Odonata List (Q13561342) defines
  • The valid name Africocypha lacuselephantum
  • The original description as Libellago lacuselephantum, also a so called synonym in zoology
  • 2 additional synonyms which are also not valid.
The following questions arise:
  1. How should we handle these different synonyms?
  2. Could a taxobox generate the correct format from these? (In the example: as there is a "basionym" given, use parentheses and add the author name and year of the "basionym" to get: Africocypha lacuselephantum (Karsch, 1899).)
  3. While I think we have to distinguish between normal synonyms and the name of the first description, is "basionym" a valid term as it is typically not used in zoology? If not, what should we use then?
  4. Which concept works if different recent works do not agree with each other?
  5. Do we want more than one item per taxon (e.g., one for every synonym, 4 in the above example)? How should we handle interwiki links, when some wikipedias are up to date and others still use the synonym?
By using "instance of (P31) basionym (Q810198)", the property basionym (P566) could remain botany-specific. Alternative proposals are very welcome!  — Felix Reimann (talk) 11:59, 17 July 2013 (UTC)
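
As a rough illustration of question 2, the following sketch shows how a taxobox could derive the zoological author citation once the original name is known. It assumes the usual zoological convention that author and year go in parentheses when the species is now placed in a genus other than the original one; the function name `zoological_citation` is hypothetical, and comparing only the first word of each name is a simplification.

```python
def zoological_citation(accepted_name, original_name, author, year):
    """Format 'Name Author, year', parenthesised if the genus has changed.

    Simplification: the genus is taken to be the first word of each name.
    """
    accepted_genus = accepted_name.split()[0]
    original_genus = original_name.split()[0]
    authority = f"{author}, {year}"
    if accepted_genus != original_genus:
        authority = f"({authority})"
    return f"{accepted_name} {authority}"


# Names taken from the examples in this thread.
print(zoological_citation("Africocypha lacuselephantum",
                          "Libellago lacuselephantum", "Karsch", 1899))
print(zoological_citation("Apogon taeniopterus",
                          "Apogon taeniopterus", "Bennett", 1836))
```

The point of the sketch is only that the parentheses can be computed rather than stored, provided the original name (under whatever label: "basionym", "protonym") is recorded and distinguishable from ordinary synonyms.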
Indeed, "basionym" is a term used exclusively in the ICNafp. Sometimes bio-informaticians use "protonym" as a term across all Codes.
        The matter of synonyms is a very complicated one and I don't expect that Wikidata is set up to really handle this. It is not only the issue of more names for one taxon, but especially of different circumscriptions (more than one taxon for one particular name), not to mention taxa that are split (one species that is divided over three species). - Brya (talk) 16:36, 18 July 2013 (UTC)
So what do you propose? Should we surrender? ;) I fear we have to model synonyms as not all wikipedia articles are based on the latest available literature (and also different opinions in literature).  — Felix Reimann (talk) 11:31, 19 July 2013 (UTC)
If by surrender you mean that we should resign ourselves that we cannot handle it completely, then yes. Of course some synonyms are easy (the homotypic synonyms), and some heterotypic synonyms can be handled by making sure to add the sources. That still leaves something not covered. But a very respectable database like Tropicos clearly has given up on the idea of completeness, also. You cannot do more than the best you can do, and it is better to aim to do what you can do well rather than try to do what you cannot do and fail at that. - Brya (talk) 12:03, 20 July 2013 (UTC)
I created protonym (Q14192851). This can be used instead of Basionym for animals. Could you be so kind and give specific counterexamples where the model fails or propose another one? I think we should find a common starting point and describe it on this project page.  — Felix Reimann (talk) 11:57, 23 July 2013 (UTC)
I think we should keep taxon name (P225) unique, because it's the fundamental property which identifies a taxon. Maybe a new property Synonym of could solve the problem. The claim has to be sourced of course, indicating the authority who thinks this. I know some cacti with several dozen synonyms. --Succu (talk) 14:05, 23 July 2013 (UTC)
The disadvantage of "one item, one scientific name" is that interwiki links are spread over several items and not all wikipedia articles describing the same taxon could be linked to each other whenever they are based on different scientific names (in many cases: are not referring to the same source). For example Q680440 with en:Sciades parkeri and es:Aspistor parkeri: Same species, different name. If we do not accept more than one scientific name, they will not be linked - and if you say this is only a technical problem: Also all non-taxonomic features are of course identical like color, size, distribution, diet, ... Thus, the question is, do we want to create items for scientific names or for taxa (which have, depending on the literature, more than one scientific name). Of course, for taxonomic data, it is easier to have one scientific name per item. With the example from above, I think we could do both, but we have to decide.  — Felix Reimann (talk) 15:05, 23 July 2013 (UTC)
Regarding the cacti with a lot of synonyms: I do not plan to add all synonyms given in literature yet. However, the question is how to cope with already existing synonyms from Wikipedia articles.  — Felix Reimann (talk) 15:09, 23 July 2013 (UTC)
I have no objection to using "protonym" as a property, either besides "basionym" or as an all-inclusive term (so including "basionym"). For prokaryotes the term is "basonym".
        The "one item, one scientific name" sounds nice in theory, but is unrealistic. Taxa are dynamic, and nomenclature is dynamic also. The approach of the One Correct Taxonomy is not allowed in Wikipedia: Wikipedia prescribes that all taxonomic viewpoints that have support in the literature should be included, and this also means that not-current taxa should have their own article (if prominent enough in the literature). - Brya (talk) 16:40, 23 July 2013 (UTC)

Author citations

Repeated from here:

Clearly, something needs to be done about author citations. Most likely the best thing would be a text field, as in the "description", as there are many permutations (see here). However if an itemized data structure is desired the following will go a long way:


basionym ex-author1
basionym ex-author2
basionym ex-author3
etc
basionym author1
basionym author2
basionym author3
etc
basionym year
valid ex-author1
valid ex-author2
valid ex-author3
etc
valid author1
valid author2
valid author3
etc
valid year

Andinobates Twomey, Brown, Amézquita & Mejía-Vargas, 2011
[ animal ]
valid author1 = Twomey
valid author2 = Brown
valid author3 = Amézquita
valid author4 = Mejía-Vargas
valid year = 2011

Andinobates daleswansoni (Rueda-Almonacid, Rada, Sánchez-Pacheco, Velásquez-Álvarez, and Quevedo-Gil, 2006)
[ animal ]
basionym author1 = Rueda-Almonacid
basionym author2 = Rada
basionym author3 = Sánchez-Pacheco
basionym author4 = Velásquez-Álvarez
basionym author5 = Quevedo-Gil
basionym year = 2006
[ Note no "valid author" as the zoological Code omits the author(s) of the combination ]

Kniphofia uvaria (L.) Oken (1841)
[ plant ]
basionym author1 = L.
valid author1 = Oken
valid year = (1841)

Lithocarpus polystachyus (Wall. ex A. DC.) Rehder (1919)
[plant]
basionym ex-author1 = Wall.
basionym author1 = A.DC.
valid author1 = Rehder
valid year = (1919)

Gentianales Juss. ex Bercht. & J.Presl (1820)
[plant]
valid ex-author1 = Juss.
valid author1 = Bercht.
valid author2 = J.Presl
valid year = (1820)

Bacillus subtilis (Ehrenberg 1835) Cohn 1872
[ prokaryote ]
basionym author1 = Ehrenberg
basionym year = 1835
valid author1 = Cohn
valid year = 1872

But most likely, very many users will think this is too hard to understand, and will get tangled in it. Safest is a normal text field. - Brya (talk) 16:53, 9 July 2013 (UTC)

Summary

  Comment: With taxon author (P405) you can add an infinite number of authors and qualify taxon name (P225) with year of publication of scientific name for taxon (P574). You can do this with basionym (P566) too. --Succu (talk) 19:26, 9 July 2013 (UTC)
Yes, the following input fields are required
[basionym ex-author] [basionym author] [basionym year] [valid ex-author] [valid author] [valid year],
with the four 'author' fields allowing an indefinite number of authors (the 'year' field need take only one input value, one year). The present structure is something like
[taxon author] [year of description]
where 'taxon author' can be 'basionym author' or 'valid author' and 'year of description' can be 'basionym year' or 'valid year'. The input of
[basionym ex-author] [basionym author] [basionym year] [valid ex-author] [valid author] [valid year],
should then be rendered
[([name [, ... & name] ex] name [, ... & name] year)] [[name [, ... & name] ex] name [, ... & name] year]
This means that "name year" is the minimum form. Complicated, but doable? - Brya (talk) 05:54, 18 July 2013 (UTC)
See also my comment in #synonyms. To be assignable, the taxon authors are best set as qualifiers for the scientific name.  — Felix Reimann (talk) 06:44, 18 July 2013 (UTC)
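
The rendering rule given above, [([name [, ... & name] ex] name [, ... & name] year)] [[name [, ... & name] ex] name [, ... & name] year], can be sketched as follows. This is only an illustration under assumptions: the function and parameter names are made up, the year is treated as optional because some of the examples above omit the basionym year, and Code-specific punctuation (such as the comma before the year in zoology or the parentheses around the year in the plant examples) is deliberately left out.

```python
def join_names(names):
    """Join names as 'A', 'A & B' or 'A, B & C'."""
    names = list(names)
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " & " + names[-1]


def render_part(ex_authors, authors, year):
    """Render '[names ex] names [year]' for one half of the citation."""
    pieces = []
    if ex_authors:
        pieces.append(join_names(ex_authors) + " ex")
    if authors:
        pieces.append(join_names(authors))
    if year is not None:
        pieces.append(str(year))
    return " ".join(pieces)


def render_citation(basionym_ex=(), basionym=(), basionym_year=None,
                    valid_ex=(), valid=(), valid_year=None):
    """Render '(basionym part) valid part'; empty parts are omitted."""
    basionym_part = render_part(basionym_ex, basionym, basionym_year)
    valid_part = render_part(valid_ex, valid, valid_year)
    parts = []
    if basionym_part:
        parts.append("(" + basionym_part + ")")
    if valid_part:
        parts.append(valid_part)
    return " ".join(parts)


# Inputs taken from the examples listed earlier in this section.
print(render_citation(basionym=["Ehrenberg"], basionym_year=1835,
                      valid=["Cohn"], valid_year=1872))
print(render_citation(valid_ex=["Juss."], valid=["Bercht.", "J.Presl"],
                      valid_year=1820))
print(render_citation(basionym_ex=["Wall."], basionym=["A.DC."],
                      valid=["Rehder"], valid_year=1919))
```

So the six input fields are enough to drive the rendering mechanically; what remains Code-specific is mostly punctuation, which could be handled by a per-Code formatting switch.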