Wikidata talk:WikiProject Taxonomy/Archive/2018/12

Author information for fungi

Hello. I notice that, at least in some families, quite a high proportion of plant items in Wikidata have the author information, such as taxon author (P405), basionym (P566), etc. This information should follow the rules of the taxonomy tutorial and can be used to generate the author strings which conventionally follow taxon names. Very few fungi have this information and I am thinking of starting to add it based on Index Fungorum. If you have an opinion, please could you tell me whether you think this is a good idea? Strobilomyces (talk) 14:55, 22 November 2018 (UTC)

Yes, that is a good idea. Maybe this could be done by bot, but maybe not: IF is full of homonyms, and a bot could probably not select the right name. - Brya (talk) 18:32, 22 November 2018 (UTC)
It is true that the homonyms are a problem; there are extra fields which can eliminate some wrong records, but doubtless some manual intervention may be necessary. I would start with small volumes using QuickStatements; I am not sure how far that can get me, but it should allow bigger jobs to be planned. Strobilomyces (talk) 19:21, 22 November 2018 (UTC)
According to MycoBank we are missing more than 25,000 basionyms/replacement names. I can use my bot to create most of the items (without authorship) and link them together, but we are missing around 1,000 genera for the species too. --Succu (talk) 19:22, 24 November 2018 (UTC)
I was intending to take this into account in my batch system, but it is true that my system will not work through QuickStatements for large numbers of items. I am not sure where the boundary lies between on one hand preparing batches of data and executing it through QuickStatements, and on the other hand a bot. Anyway it would be good if you could create the basionym items and link them up. As I understand it, the referencing item should link to the new basionym item through basionym (P566) with a reference and the basionym item should link back to the referencing item(s) through subject has role (P2868) = basionym (Q810198) with of (P642) = referencing item number, with the same reference. I use Index Fungorum rather than Mycobank, but they should be the same. I suppose the protonyms (replaced synonym (for nom. nov.) (P694)) are similar; I have not come across one or taken them into account yet. Strobilomyces (talk) 16:58, 25 November 2018 (UTC)
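As a rough illustration of the two-way linking described above, the following sketch builds the pair of QuickStatements (V1, tab-separated) lines for one taxon and its basionym. The function name and the QIDs in the usage note are hypothetical placeholders, and the use of stated in (P248) as the reference property (written `S248` in QuickStatements) is an assumption; the discussion only says "with a reference".

```python
def basionym_link_commands(taxon_qid, basionym_qid, source_qid):
    """Build two QuickStatements V1 lines linking a taxon item and its basionym item.

    taxon_qid, basionym_qid and source_qid are placeholders supplied by the caller.
    S248 (reference: stated in) is an assumed choice of reference property.
    """
    # Referencing item -> basionym item, via basionym (P566), with a reference.
    forward = "\t".join([taxon_qid, "P566", basionym_qid, "S248", source_qid])
    # Basionym item -> referencing item, via subject has role (P2868) = basionym (Q810198),
    # qualified with of (P642) = referencing item, carrying the same reference.
    backward = "\t".join(
        [basionym_qid, "P2868", "Q810198", "P642", taxon_qid, "S248", source_qid]
    )
    return [forward, backward]
```

For example, `basionym_link_commands("Q100", "Q200", "Q300")` would yield one line for the P566 statement and one for the P2868 statement with its P642 qualifier, ready to be pasted into a small QuickStatements batch.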
By the way, if species are marked as illegitimate in IF, I am thinking of skipping them as that rule helps with disambiguation. Also I am thinking of ignoring items which do not have a "current name" defined (i.e. they are not in the "Species Fungorum" section, meaning they are probably not real species with a name in use today). If you would also filter out those cases, I think it would reduce the number of anomalies. I did not quite understand your comment about the genera; do you mean that there are 1000 genera which need to be created only for basionyms (and protonyms)? If so I think they should be clearly marked somehow as not real genera. But it would be quite easy to generate them, wouldn't it? It might be useful if you could make available a list of them or some examples.
Is there any possibility that in the future I could prepare a batch of changes and you could run them using your bot? Strobilomyces (talk) 17:29, 25 November 2018 (UTC)
Sorry for the delay; there are a lot of changes, both out there and here, that I am trying to cope with at the moment. I'm using MycoBank because it's queryable and returns a detailed set of information, but I have to dig in a little more before creating items. An example of using my bot is User:Achim Raschka/Thorington. You'll find more (undocumented) examples at my user page. --Succu (talk) 19:42, 27 November 2018 (UTC)
OK, that would be great if you could do some of that work. I don't know MycoBank well, but it should be consistent with Index Fungorum, I believe. Index Fungorum also has a queryable API and that is what I am using up to now. I am only just starting to understand the problems. As I think you pointed out above, it will be necessary to create dummy items for the parents of the basionyms, and sometimes for their parents too, just to have a consistent hierarchy. There are many unused obsolete fungus names and it worries me that I don't know any way to mark those items as obsolete (except in the English description, but that is hardly part of the database). Also there are author abbreviations which are not in IPNI, nor the author information of Index Fungorum, and I don't know how to find out about them. I am starting off with only small quantities of data, but I will look at your examples and try to understand bots better. Strobilomyces (talk) 23:02, 27 November 2018 (UTC)
You are right. I forgot that IF provides a webservice my bot is using to match taxon name (P225) to the ids. --Succu (talk) 21:27, 28 November 2018 (UTC)
Indeed MycoBank should be consistent with Index Fungorum, although in at least a few cases there are differences.
        Maybe there should be a way to mark that a name is in Index Fungorum, but is not matched to a name in Species Fungorum.
        All authors of fungal names should be in Index Fungorum, but it is true that in the past abbreviations for personal names were used that later were discontinued. As far as I know there are dozens of these rather than hundreds. - Brya (talk) 05:30, 28 November 2018 (UTC)
I am referring to new author names (2005). Their abbreviations in IF format are M.C.C. de Arruda, G.F. Sepúlveda, R.N.G. Mill., M.A. Ferreira & M.S. Felipe (fuller names: Maricília C.C. de Arruda, German F. Sepulveda Ch., Robert N.G. Miller, Marisa A.S.V. Ferreira & Maria Sueli S. Felipe) and they defined Crinipellis brasiliensis in this paper. They give their department addresses and de Arruda gives an E-mail address, but I suppose that information should not be in WD. If they should be in a database somewhere, please could you say how to find them? What is the minimum set of WD fields needed to create an author item? It would be very good if someone could create the item for one of them as an example, or point out a similar example. I found this case from a very small sample, so there must be lots of them.
I absolutely think there should be a way to mark a name as having no current name in Species Fungorum. Ideally I think there should be a special "current name" property distinct from subject has role (P2868)=basionym (Q810198)/of (P642) which could be set to a value meaning "not defined" in this case, but I suppose that would be difficult to arrange. Meanwhile, could we make an item for role "obsolete taxon name" and use subject has role (P2868)="obsolete taxon name" item to mark this? It could also be used to mark basionyms and parents of basionyms, to make it obvious that they are not separate taxa. - Strobilomyces (talk) 10:51, 28 November 2018 (UTC)
If these authors have an entry in IF, there is no problem in principle: items can be created for each of them. It does mean a considerable amount of work. I guess we should have a property for authors in IF (analogous to IPNI), but in the meantime a URL can be used.
        Using "obsolete taxon name" is not the way to go: that would mean a Single Point of View, which is very much to be avoided. Also many of these names in IF are not names of taxa, and never were. The fact that some names in IF are not linked to Species Fungorum could be recorded in some way, but does not mean all that much: it represents the absence of evidence, rather than real information. - Brya (talk) 17:48, 28 November 2018 (UTC)
I wrote to the IPNI mailing list about it and now the five authors have been created in IPNI, but with different abbreviations from Index Fungorum in three cases (M.C.C.deArruda->M.C.C.Arruda, M.A.Ferreira->M.A.S.V.Ferreira and M.S.Felipe->M.S.S.Felipe). I think that in time the IF ones will have to change in line with IPNI.
@Brya: Very many of the names in Index Fungorum are completely obsolete and only show the history (which may be needed for nomenclatural reasons). It is true that an old name may be resurrected as a new one, but that is a change which needs to be reflected when it happens; it is not a particular problem. I think that such information should not be uploaded if possible and my worry is that it will reduce the quality of Wikidata because of introducing a lot of spurious and misleading items. For instance, many fungi were originally put into the genus Agaricus, and their basionym is in that genus, but now Agaricus has a much more restricted meaning, and those names are misleading. There are also many names which are no longer used because their definitions are unclear according to modern criteria or may contain mistakes. I don't understand your statement "many of these names in IF are not names of taxa, and never were" - please could you give an example? Surely all the names in IF were intended by their authors to be the names of taxa?
I completely disagree with your opinion that the presence or absence of a link to the current name in Species Fungorum is not real information; all the names have been reviewed and if possible the current name has been assigned. That is not to say that it is 100% correct, but this information is curated and it is extremely useful. If an old IF name has no current name it is almost certain that it is not used in modern times, it does not appear in modern mushroom books, and no corresponding photos or pages can be found on the web. It would be much better if such names were not brought into the Wikimedia projects and if they have to be (for instance in WD if they are basionyms or parents of basionyms), they should be clearly distinguished. If the status is disputed, it should be possible to indicate that. "obsolete taxon name" may be a bad choice of wording, but would it be possible to choose a better phrase? It would be a great improvement to the data quality if something like this could be added, so for many purposes a lot of records could be ignored. - Strobilomyces (talk) 21:41, 29 November 2018 (UTC)
If there is now a disparity between IPNI and IF, it would be helpful to inform IF of this.
        IF is a nomenclatural database (like IPNI and Tropicos), which means that it has many entries of names that are not relevant to communication about taxa, roughly stated: nomenclatural detritus. Indeed, the quality of the Wikidata would be helped by trying to keep these out. It is not necessarily true that "all the names have been reviewed": these databases have lots of names and only limited personnel, which has to set priorities. Very many names are just uninteresting, and don't really merit the time and effort that would be necessary to clarify them. That is why I stated that lack of information "represents the absence of evidence, rather than real information".
        "Surely all the names in IF were intended by their authors to be the names of taxa": sure, but that is not sufficient. A name has to meet all kinds of nomenclatural standards for it to be available as the name of a taxon. Names that never were a name of a taxon are: names not validly published (Q18575734), illegitimate names (Q1093954, including later homonyms, Q17276484), and combinations under illegitimate names (Q17487588).
        The fact that a name is the basionym of another name does not mean it is obsolete: lots of current names are basionyms of other names. - Brya (talk) 04:43, 30 November 2018 (UTC)
@Brya: I have indeed informed Paul Kirk of Index Fungorum about these author names.
I am not sure if I understand a current name being a basionym of another name - I suppose that it is where a new name was proposed based on what is now the current name, but it failed to become accepted. Anyway, I was assuming the context of basionym items being created for the author information; if the basionym item existed, that existing item would be used.
I think that your "nomenclatural detritus" certainly should be kept out of WD or clearly marked as not real. Please can you say how to indicate in WD that a taxon belongs to one of the detritus name types like designation (Q18575734)? I think that that topic should be added to the tutorial. But these nomenclatural cases are rare and only a tiny part of the mycological detritus which I think we should be trying to exclude. I would like to recapitulate the various categories of entry in IF/SF.
  1. Names marked as current names in Species Fungorum.
  2. Names which are not current names in SF, but which are linked to a current name as a synonym. This category includes most illegitimate usages (nomenclatural detritus), since in those cases it is usually possible to know what the equivalent correct name is.
  3. Names which are in IF but not linked to any current name in SF. They may fall into the following cases.
    a. A species name which is rejected by modern mycologists because it is unclear or does not specify criteria which are now considered important, or there may be some other problem. It may not be possible to investigate the type specimen. Also it may be impossible to use the name because its nomenclatural status is uncertain (the nomenclatural status can depend on synonymy decisions). In the opinion of some mycologists it may be a synonym of a newer species or covered by several newer species, but that is not clear.
    b. A species name which has not been used for many years because no fungus is ever identified as such because the author made a mistake in the description.
    c. A species name which has not been used for many years because it has become extinct, or so rare that it has never been identified since.
    d. A species name which has not come into use and which no mycologist has found the time or inclination to clarify. Note that all species are of interest to some mycologists, including those acting for IF, and so such a species almost certainly belongs to case 3a or 3b or conceivably 3c.
    e. A genus name or higher-level taxon which is not in the IF classification scheme and does not fit well enough to be assigned a current name as synonym.
A name in 3a or 3b can be called a "Nomen dubium"; this depends on the opinion of a particular mycologist. Type 3c is a tiny minority and we have no way to distinguish such cases; we can only wait for them to be reclassified in IF if they are found to be "real". Type 3d names can also be treated as 3a because if they were important or easy, they would already have been examined. Type 3e applies to parent taxa of basionyms created only for that purpose.
Case 3a includes the ones which are "uninteresting, and don't really merit the time and effort that would be necessary to clarify them", or ones where mycologists have tried to clarify them and found it impossible - we do not need to distinguish those two cases. Such names are not to be found in modern books, web sites, photos etc. and it is unlikely that someone will want to use them. Perhaps they could be described as "unclear" or "deprecated" or "detritus". I am not saying that they can never become accepted names, but in that case before changing the status in WD we would have to wait until some mycologist redefines them and they come into Species Fungorum.
The current name (or absence of) information is the raison d'être of Species Fungorum and I think you underestimate the quality of it. In my experience this information is of high quality, and if there is no Species Fungorum link, I normally cannot find the species in any modern source. Unfortunately I think on the other hand that there are also mycological detritus names amongst category 1 above. If a name is absent from Species Fungorum, in my view it would be very helpful to be able to show this so that for many purposes the items could be ignored. It would be useful to have a term for the mycological detritus in cases 3a-3e; it is difficult but I suggest "not an established taxon", "not a standard taxon", or "not validated". You said that the fact that some names in IF are not linked to Species Fungorum could be recorded in some way; please can you say what would be the appropriate way to do this? Strobilomyces (talk) 16:42, 30 November 2018 (UTC)
As to contacting IF, that is very good.
        Indicating that something is a designation (Q18575734) can be done by just "instance of: Q18575734" (like this). That is not the problem: the problem is finding it and establishing that it is indeed not validly published.
        As to keeping "nomenclatural detritus" out of WD, there are several problems, like 1) sometimes there is a structural need for it to be present, 2) sometimes a Wikipedia has a page on it (pretending it is a real species), especially svwiki, cebwiki, warwiki and viwiki (with svwiki fighting against corrections) and 3) bot operators who enthusiastically import a database they found somewhere.
        It is not a good idea to link to "nomen dubium" at enwiki: the entry there is very confused.
        I never said that Species Fungorum is not carefully curated: this will be pretty good. That cannot be said of Index Fungorum.
        I did not say "that the fact that some names in IF are not linked to Species Fungorum could be recorded in some way". What I said was that this would be a good idea. I am not sure what would be the best way. One option would be to have a property "Species Fungorum": this would be very straightforward. Every item with a value for Species Fungorum would be a current name (according to Species Fungorum). Admittedly, this would look a little odd since then there would be three properties with the same external identifier. There may be other ways. - Brya (talk) 18:27, 30 November 2018 (UTC)
Yes, I see that we cannot always keep detritus out of WD and that is why I would like to have a way of indicating that particular items are not "real".
As you indicate, if the name has a current name in Species Fungorum there should in any case be a link of type subject has role (P2868)=basionym (Q810198)/of (P642), or subject has role (P2868)=synonym (Q1040689)/of (P642), or both, showing what it is, so all I want is a status indicator to show that a given name is or is not current. I would propose that we could create a special item meaning "current taxon name", one meaning "synonym taxon name" and one meaning "outdated or unclear taxon name". Then for fungus name items we would set instance of (P31) to the "current taxon name" item where the given name is actually a current name in Species Fungorum (case 1 in the list above), set it to the "synonym taxon name" item for names which are linked to a different current name in Species Fungorum (case 2 in the list above), and set it to the "outdated or unclear taxon name" item for names which have no current name in Species Fungorum (cases 3a-3e in the list above). The statement would be given a reference, to Index Fungorum in this case, and if there are alternative taxonomic viewpoints to be expressed, they can be added in a similar manner with a reference to the appropriate source. The subject has role (P2868) records would also be qualified by the corresponding references so that there would be a complete set of data for each source. Thus this proposal is not imposing one taxonomic point of view, but can accommodate multiple possible classifications.
For instance if someone wanted to estimate the number of species in a family, they would choose a reference and if it was "Index Fungorum" they could count only names with instance of (P31) = "current taxon name". I don't think that such a query is possible at present because there is too much detritus. Do you think that this proposal would be a good idea? - Strobilomyces (talk) 20:47, 1 December 2018 (UTC)
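To make the kind of query meant here more concrete, the sketch below assembles a SPARQL query string that counts species under a family whose instance of (P31) statement points to a status item and is referenced to a chosen source. It is only an illustration of the proposal: the "current taxon name" item does not exist, so its QID is a caller-supplied placeholder, and the use of stated in (P248) for the reference and taxon rank (P105) = species (Q7432) to restrict to species are assumptions not made in the discussion.

```python
def count_current_species_query(family_qid, status_qid, source_qid):
    """Return a SPARQL query counting species in a family with a given status item.

    family_qid: the family item.
    status_qid: placeholder for the proposed (hypothetical) "current taxon name" item.
    source_qid: the item for the source used as reference, e.g. Index Fungorum.
    """
    return f"""SELECT (COUNT(DISTINCT ?taxon) AS ?count) WHERE {{
  ?taxon wdt:P171+ wd:{family_qid} ;
         wdt:P105 wd:Q7432 ;
         p:P31 ?statement .
  ?statement ps:P31 wd:{status_qid} ;
             prov:wasDerivedFrom/pr:P248 wd:{source_qid} .
}}"""
```

Going through the statement node (`p:P31` / `ps:P31`) rather than the direct `wdt:P31` form is what allows the count to be limited to statements backed by one particular source, so that several taxonomic viewpoints could coexist on the same item.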
No, subject has role (P2868)=basionym (Q810198)/of (P642) expresses a nomenclatural relationship: it is always true. In contrast, subject has role (P2868)=synonym (Q1040689)/of (P642) would express a taxonomic relationship, and would be only true if viewed from one particular taxonomic perspective.
        What you suggest comes down to adding "is a current taxon name according to Species Fungorum". There is an accepted way of doing that, namely having a property "Species Fungorum". It does not require a new structure, only a new property. - Brya (talk) 04:54, 2 December 2018 (UTC)

  Info: I created several thousand basionym items, restricting creation to cases that meet all of the following conditions:

  1. the name is marked as "legitimate" at MycoBank
  2. we already had an item for parent taxon (P171)
  3. IF and MycoBank refer to the same basionym name
  4. IF and MycoBank have the same id for this basionym name

I hope that helps a little bit. I'll try to dig into the authorship problem next. As a first step I created around 500 new author items from IPNI. --Succu (talk) 19:41, 10 December 2018 (UTC)
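The four conditions listed above could be expressed as a simple filter along the following lines. This is only a sketch of the logic, not the bot's actual code: the record type and its field names are hypothetical, and it assumes the data from IF and MycoBank have already been fetched and matched to existing items.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasionymCandidate:
    """Hypothetical record combining what IF and MycoBank report for one name."""
    mycobank_status: str            # e.g. "legitimate"
    parent_item: Optional[str]      # QID of an existing parent taxon item, if any
    if_basionym_name: str           # basionym name according to Index Fungorum
    mycobank_basionym_name: str     # basionym name according to MycoBank
    if_basionym_id: str             # basionym record id in Index Fungorum
    mycobank_basionym_id: str       # basionym record id in MycoBank


def passes_creation_filter(c: BasionymCandidate) -> bool:
    """Return True only if all four conditions hold."""
    return (
        c.mycobank_status == "legitimate"                       # condition 1
        and c.parent_item is not None                           # condition 2
        and c.if_basionym_name == c.mycobank_basionym_name      # condition 3
        and c.if_basionym_id == c.mycobank_basionym_id          # condition 4
    )
```

Requiring agreement between the two databases on both name and id is a conservative choice: it skips some valid basionyms, but it avoids picking the wrong record among homonyms.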

That sounds really good. Yes, that is a help for populating the author information and the one or two examples which I saw look good. Do you think it would be a good idea to add Mycobank or Index Fungorum as a reference when creating the basionym (P566) or subject has role (P2868) statements? Strobilomyces (talk) 20:37, 10 December 2018 (UTC)
No. I prefer a literature reference related to the nomenclatural act. That's why my bot is using the edit comment. --Succu (talk) 21:59, 11 December 2018 (UTC)

Mis-spelling in taxon name

Crinipellis brunneoaurantiaca (Q49601461) has taxon name (P225) = "Crinipellis brunneoaurantica", but it should be "Crinipellis brunneoaurantiaca". I am not allowed to change that property. What is the procedure for correcting this, please?

Is it necessary to create a new item for "Crinipellis brunneoaurantiaca" and request for Q49601461 to be deleted? Doubtless there will be many cases like this, so I thought I should ask. Strobilomyces (talk) 17:50, 25 November 2018 (UTC)

Apparently, the way to do it is to delete the statement, and then re-create it. - Brya (talk) 18:18, 25 November 2018 (UTC)
Ah. I didn't think of that. And I see that you did it. Thanks, Strobilomyces (talk) 21:34, 25 November 2018 (UTC)
Simply click on "publish" a second time. --Succu (talk) 06:44, 26 November 2018 (UTC)
Do we need to add this to the edit notice? --- Jura 06:46, 26 November 2018 (UTC)
@Matěj: is it possible to change the color of the edit notice from red (=error) to yellow (=alert)? --Succu (talk) 21:34, 1 December 2018 (UTC)
No, this is how the interface behaves after rejecting an edit. Matěj Suchánek (talk) 10:36, 2 December 2018 (UTC)

Order of taxon authors

In this edit, the order of the taxon authors has changed when the bot bypassed a redirect. Now the name of the taxon is Encephalartos kanga Q.Luke & Pócs instead of Encephalartos kanga Pócs & Q.Luke. Maybe we should use series ordinal (P1545) like it is done for authors (P50) of scholarly articles (Q13442814)? Korg (talk) 23:58, 1 November 2018 (UTC)

Maybe. The problem is that series ordinal (P1545) can't be used as a qualifier to a qualifier. For series ordinal (P1545) to be usefully applied here, "taxon author" would need to be moved from a qualifier to a statement. In principle, this is possible but it would mean a big change in how things are done: it would be a lot of work. - Brya (talk) 03:29, 2 November 2018 (UTC)
@Ivan: Is there any possibility to avoid this? --Succu (talk) 06:15, 2 November 2018 (UTC)
You say "Now the name of the taxon is Encephalartos kanga Q.Luke & Pócs...", but Wikidata has never stated the name as "Encephalartos kanga Pócs & Q.Luke"; and it would be wrong to infer any such thing from the order of the names given as qualifiers to taxon name (P225). If it is desired to record the name in the latter form, as structured data, then there should be a specific property for that; or perhaps use the form shown in this edit. That name should then also be given as an alias. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:56, 2 November 2018 (UTC)
The form "stated as" requires the publication to be accessed. For this name the original paper is behind a paywall, making verification difficult. - Brya (talk) 04:29, 3 November 2018 (UTC)
@Pigsonthewing: I was thinking of the name that could be retrieved and displayed, for example in the taxobox. There should be a way to have the form "Encephalartos kanga Pócs & Q.Luke", with the authors in the correct order (and linked). Korg (talk) 21:43, 6 November 2018 (UTC)
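If the authors did carry a series ordinal (P1545), a consumer such as a taxobox could rebuild the author string independently of the order in which the values are stored. The sketch below assumes each author arrives as an (abbreviation, ordinal) pair with the ordinal as a string, as series ordinals are stored; the function name and input shape are hypothetical, and as noted above the current data model does not allow this qualifier on taxon author.

```python
def format_taxon_name(name, authors):
    """Join a taxon name with its authors, ordered by series ordinal.

    authors: list of (abbreviation, series_ordinal) pairs in arbitrary order;
    the ordinal is a string such as "1" and is compared numerically.
    """
    ordered = [abbr for abbr, _ in sorted(authors, key=lambda pair: int(pair[1]))]
    if not ordered:
        return name
    if len(ordered) == 1:
        author_string = ordered[0]
    else:
        # Conventional form: "A, B & C" (ampersand before the last author).
        author_string = ", ".join(ordered[:-1]) + " & " + ordered[-1]
    return f"{name} {author_string}"
```

With the pairs `("Q.Luke", "2")` and `("Pócs", "1")` supplied in that (wrong) storage order, the function still produces "Encephalartos kanga Pócs & Q.Luke".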

Some recent examples of the problem: [1], [2], [3]. Korg (talk) 14:15, 25 December 2018 (UTC)

Here you caused it yourself. The merge should retain the lower QID. --Succu (talk) 16:29, 25 December 2018 (UTC)
In principle I agree, but I also think there could be exceptions. In this example, the item with the higher QID had more information and many incoming links, so I think that merging to it was the best option. Korg (talk) 18:36, 25 December 2018 (UTC)
No. The oldest concept ID (=QID) has to be stable (because of e.g. external usages). All the information is moved to the lower QID with the merge.
Again @Ivan: Is there any possibility to keep the order of qualifiers in cases like this (fixing a redirect)? --Succu (talk) 21:51, 25 December 2018 (UTC)
Some points:
  • The oldest QID is not always the one with external usages: it may well be just about empty.
  • Sometimes there are good reasons to merge into a higher QID, like when the higher QID has correct labels while the lower QID has wrong labels.
  • Merging itself does not determine where the item ends up. It is quite possible to first merge into the higher QID with the better information, then remove the redirect, and merge again into the empty lower QID.
- Brya (talk) 03:41, 31 December 2018 (UTC)

How do I describe population of a species?

Hi all

How do I create a statement that there are 415,000 African elephants? It seems population (P1082) is only meant for humans...

Thanks

--John Cummings (talk) 09:29, 14 December 2018 (UTC)

What are the best modelled items for your areas of interest?

Hi all

Over the past few months, others and I have been thinking about the best way to help people model subjects consistently on Wikidata and provide new contributors with a simple way to understand how to model content on different subjects. Our first solution is to provide some best-practice examples of items for different subjects, which we are calling Model items. E.g. the item for William Shakespeare (Q692) is a good example to follow for creating items about playwright (Q214917). These model items are linked to from the item for the subject to make them easier to find, and we have tried to make the instructions simple to understand.

We would like subject matter experts to contribute their best examples of well modelled items. We are asking all the WikiProjects to share with us the kinds of subjects you most commonly add information about and the best examples you have of this kind of item. We would like to have at least 5 model items for each subject to show the diversity of the subject, e.g. just having William Shakespeare (Q692) as a model item for playwright (Q214917), while helpful, may not provide a good example for people trying to model modern poets from Asia.

You can add model items yourself by using the instructions at Wikidata:Model items. It may be helpful to have a discussion here to collate information first.

Thanks

John Cummings (talk) 15:46, 17 December 2018 (UTC)
