User:FelixReimann/RfC

This is a draft version. Once it is ready, I'll move this RfC to Wikidata talk:Taxonomy task force.

Request for Comments: An outline of a workflow to add taxonomic data

Assumptions:

  • Taxonomy is subject to change
  • For many parts of the tree of life, there is not one single authority defining the current state of the art
  • Different concepts exist in parallel (also in different Wikipedia chapters), and we should be able to reflect them.
  • If we want to have references from different scientific articles for each and every taxon (for which Wikipedia articles exist + x), semi-automation is required.

I've been thinking about and testing a possible workflow for this task, which I want to present here. If you like it, I will add an appropriate user interface (currently, it is only usable from the command line) and share it with you.

  1. Choose one scientific article you want to add. As an example, I selected doi:10.1186/1471-2148-13-93, PDF, a very recent publication on Squamata (Q122422) which covers 1149 taxa from genus level up to the order Squamata. (The paper is quite similar to what is currently represented in the Reptile Database, the main taxonomic reference of de-wiki and others.)
  2. Create an item for the article to be referenced: A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes (Q13416674)
  3. As I see no way to extract taxa from an arbitrary article automatically and reliably, now comes the manual work: create the hierarchy as given by the article. This can be written in wiki syntax, as this is easiest for human input:
* Squamata
** Dibamidae 
** Episquamata
*** Lacertoidea
**** Amphisbaenidae 
***** Amphisbaena 
***** Ancylocranium 
***** Baikia 
***** ...
*** Toxicofera
**** Serpentes
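The wiki-syntax list above is easy to read back into parent/child pairs. The following is only a minimal sketch of that idea, not the tool described in this RfC: the function name `parse_hierarchy` is made up for illustration, and it assumes that the number of leading asterisks gives the depth and that "..." lines are elisions to be skipped.

```python
def parse_hierarchy(text):
    """Return (name, parent) pairs from a '*'-indented wiki list.

    Hypothetical helper: depth is the number of leading asterisks,
    and the parent is the most recent entry one level higher.
    """
    pairs = []
    stack = []  # stack[d - 1] holds the latest name seen at depth d
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("*"):
            continue  # ignore anything that is not a list entry
        depth = len(line) - len(line.lstrip("*"))
        name = line[depth:].strip()
        if not name or name == "...":
            continue  # elided entries carry no taxon name
        if depth > len(stack) + 1:
            raise ValueError(f"'{name}' skips a level in the hierarchy")
        del stack[depth - 1:]  # drop siblings and deeper entries
        parent = stack[-1] if stack else None
        pairs.append((name, parent))
        stack.append(name)
    return pairs


SAMPLE = """\
* Squamata
** Dibamidae
** Episquamata
*** Lacertoidea
**** Amphisbaenidae
***** Amphisbaena
***** ...
*** Toxicofera
**** Serpentes
"""

for name, parent in parse_hierarchy(SAMPLE):
    print(name, "<-", parent)
```

Raising an error when a level is skipped is a design choice of this sketch: a typo in the manual input is then caught before any mapping starts.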
  1. Mapping of these taxon names to Wikidata items.
  2. For each name, matching items are searched automatically in a multi-step approach using, among other criteria, taxon name (P225) or item labels. As different taxa with the same scientific name exist (like Q1862566 and Q1482244) and other ambiguities are possible, each mapping candidate is presented to the user by opening the respective item in the browser. Each mapping must be accepted manually:
Is Anolis ( Dactyloidae, Iguania ) the same as Q311348 (y/n)?
Taxon ranks can also easily be added at this point.
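The manual confirmation step could look roughly like the sketch below. It is an illustration only: `confirm_mapping` and its `ask` parameter are hypothetical names, the candidate search itself is left out (the candidates are simply passed in), and opening the item in the browser is omitted. The prompt text follows the example above.

```python
def confirm_mapping(name, context, candidates, ask=input):
    """Ask the user about each candidate item until one is accepted.

    name:       taxon name from the article
    context:    higher taxa shown to disambiguate the name
    candidates: item IDs found by some earlier search (not shown here)
    ask:        function that shows a prompt and returns the answer;
                defaults to input(), replaceable for testing
    Returns the accepted item ID, or None if every candidate was rejected.
    """
    for qid in candidates:
        prompt = f"Is {name} ( {', '.join(context)} ) the same as {qid} (y/n)? "
        if ask(prompt).strip().lower() == "y":
            return qid
    return None
```

Passing `ask` as a parameter keeps the function usable from the command line while still allowing scripted answers, for example `confirm_mapping("Anolis", ["Dactyloidae", "Iguania"], ["Q311348"], ask=lambda prompt: "y")`.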
 
Squamata taxonomy according to Pyron et al. 2013, with mappings to Wikidata items where they exist (green). Red items show other, already existing values for d:Property:P171.
  1. The gathered information is written back to wiki syntax (see User:FelixReimann/Pyron2013), and the tree of taxa is drawn for visual verification (see figure). Green vertices are items already found in Wikidata; red ones are those where the Wikidata item has a different value for parent taxon (P171). You can click on the green vertices; they link to the corresponding item.
  2. Now we have a verified hierarchy. A bot can therefore add taxon rank (P105), taxon name (P225), and parent taxon (P171) to all existing items where these claims are not already present, and add references to all of them linking to the item of the scientific article.
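To make the last step more concrete, here is a rough sketch of how the planned bot edits could be computed before anything is written. It works on plain dictionaries and does not talk to Wikidata; `plan_edits` and the data layout are assumptions made for this illustration, and ranks are given as plain labels although P105 expects rank items.

```python
ARTICLE_ITEM = "Q13416674"  # item of the article used as reference


def plan_edits(taxa, mapping, existing):
    """Work out which claims and references a bot would add.

    taxa:     dicts with "name", "rank" and "parent" (from the hierarchy)
    mapping:  accepted name -> item ID mapping
    existing: item ID -> {property: set of current values}
    Returns (edits, conflicts). A conflict is an item whose current
    parent taxon (P171) differs from the article (the red vertices).
    """
    edits, conflicts = [], []
    for taxon in taxa:
        qid = mapping.get(taxon["name"])
        if qid is None:
            continue  # only existing, mapped items are handled here
        wanted = {
            "P225": taxon["name"],
            "P105": taxon.get("rank"),
            "P171": mapping.get(taxon.get("parent")),
        }
        for prop, value in wanted.items():
            if value is None:
                continue
            current = existing.get(qid, {}).get(prop, set())
            # Claim already there: only the reference is added.
            action = "add_reference" if value in current else "add_claim"
            edits.append((qid, prop, value, action, ARTICLE_ITEM))
            if prop == "P171" and current and value not in current:
                conflicts.append((qid, sorted(current), value))
    return edits, conflicts
```

Keeping the planning separate from the actual writing means the list of edits can be reviewed, or compared with the figure, before the bot runs.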

For additional articles in this field, the existing name->item mapping can be reused.

With such a referenced taxonomy, a Wikipedia chapter could decide to use for its taxobox, e.g., "for any taxon below Squamata, the claims which are based on scientific article A", while another Wikipedia chapter could decide to "use every claim of article A for Squamata, but for all snakes, use reference B". How exactly this would work still has to be decided. Do you think the workflow for adding taxonomic claims could be helpful? Please comment!  — Felix Reimann (talk) 12:11, 19 June 2013 (UTC)