Wikidata talk:WikiProject Taxonomy/Archive/2019/07

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Another option

In practice the problems with author citation are caused by "basionym" / "original combination" relationships. A way to prevent these problems that suggests itself would be to create new properties with datatype "string". If there are properties "basionym string" / "original combination string", this would allow the basionym / original combination to be added as a string, with "taxon author" and "taxon date" as qualifiers. This would mean two properties instead of one, which is not elegant, but Wikidata already has this in several places. Then there is no need to create new items for very obscure names or "dead" names. - Brya (talk) 06:23, 30 June 2019 (UTC)
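As a rough illustration of the proposal above, the sketch below models a string-valued statement that carries its author and date as qualifiers. The property name "basionym string" is only the one proposed in this comment and does not exist; the taxon name, author and date are invented placeholders, and the output format is just one possible rendering.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StringStatement:
    """A statement whose value is a plain string rather than a link to an item."""
    prop: str                       # e.g. the proposed "basionym string"
    value: str                      # the name itself, stored as text
    qualifiers: Dict[str, str] = field(default_factory=dict)


def author_citation(stmt: StringStatement) -> str:
    """Join the string value with whichever of the two qualifiers are present."""
    author = stmt.qualifiers.get("taxon author", "")
    date = stmt.qualifiers.get("taxon date", "")
    tail = ", ".join(part for part in (author, date) if part)
    return f"{stmt.value} {tail}".strip()


# Invented example: no separate item is needed for the obscure name.
basionym = StringStatement(
    prop="basionym string",
    value="Examplea prima",
    qualifiers={"taxon author": "A.Author", "taxon date": "1901"},
)
print(author_citation(basionym))
```

The point of the sketch is only that the name, its author and its date travel together on one statement, so nothing has to be looked up on another item.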

OK, well this proposal is a halfway house between the full solution of the tutorial and the property taxon author citation (P6507) solution of putting the whole author citation in a string. It seems to me unlikely to get approval because it will satisfy neither the purists who want everything modelled nor the pragmatists who supported the P6507 proposal. But I am not particularly against it. Strobilomyces (talk) 21:22, 30 June 2019 (UTC)
@Succu Hello. Please could you give your opinion on this proposal? Strobilomyces (talk) 17:34, 3 July 2019 (UTC)
@Strobilomyces: I do not support taxon author citation (P6507) and P5326 (P5326) because I think they are superfluous. The same is true of Brya's proposal. --Succu (talk) 17:42, 3 July 2019 (UTC)
Thanks. Strobilomyces (talk) 19:50, 3 July 2019 (UTC)

P5326

@Succu: may I ask why you think P5326 (P5326) is superfluous? I try to add it as soon as I can, as I just did for Eudorella hispida (Q6499272), including a link to the right page in the publication. I find it useful information. Christian Ferrer (talk) 18:29, 6 July 2019 (UTC)
To tie in with the subject below, @Circeus:, note that I just created species:Template:Sars, 1871 for Nya arter af Cumacea samlade under K. Svenska Korvetten Josephines Expedition i Atlantiska Oceanen år 1869 af F. A. Smitt och A. Ljungman (Q65088109), and I added it to the species concerned both on Wikispecies and in Wikidata. Christian Ferrer (talk) 18:33, 6 July 2019 (UTC)
@Christian Ferrer: Both belong to different Help:Namespace (Q4994250) and should - as far as I know - not be linked that way. --Succu (talk) 19:12, 6 July 2019 (UTC)
I do not like the direct connection of Wikispecies reference templates to items about the work because it generates a weird distinction between articles and other types of work: we generally link books and multivolume works via a page with bibliographical information about the work, but we do not typically have that for articles. Circeus (talk) 19:26, 6 July 2019 (UTC)
Yes, I know; I will not blame you if you take the sitelink off. Succu, you did not answer. Christian Ferrer (talk) 20:40, 6 July 2019 (UTC)
@Christian Ferrer:
  1. I prefer the usage of stated in (P248). Together with reference has role (P6184) it is more flexible (Example Nomenclatural acts published in Taxon). Examples of the use of BHL page ID (P687) within the references section can be found in this query about Genera Plantarum (Q40975586). BTW: There is also object stated in reference as (P5997), which I discovered recently. Our TaxoBox makes use of this format when providing the references.
  2. Looks like the usage of P5326 (P5326) is unclear. Sometimes it is used for derived names.
  3. The usage of BHL page ID (P687) as qualifier of P5326 (P5326) is more or less enforced. But I think this is bad practice because it is not a "refinement" of the statement.
--Succu (talk) 18:05, 8 July 2019 (UTC)
If I understand C:Template_talk:Wikidata_Infobox#erstbeschrieben_in_(P5326) right, the Commons approach relies on a simple item/statement relationship to get things working. Hopefully the planned new mw:Wikidata Bridge supports more complex scenarios. But for this we need a cleaned-up data model. --Succu (talk) 19:46, 8 July 2019 (UTC)
@Succu: do you think it is better to use stated in (P248) to list all the potential references (including the original publication)? Christian Ferrer (talk) 07:13, 13 July 2019 (UTC)
In my opinion yes. I started adding references that way in the summer of 2013 for cacti described (=accepted) in Das große Kakteen-Lexikon (Q13520496). For references published as a scholarly article (Q13442814) you can easily add page(s) (P304) to the reference (Example: Cacti referenced by a paper). --Succu (talk) 11:36, 13 July 2019 (UTC)
OK, thank you, I will try to integrate this. BTW, is there a specific value we can set for reference has role (P6184) in the case where this is not the first description, but an additional reference? Christian Ferrer (talk) 12:29, 13 July 2019 (UTC)
reference has role (P6184) is not mandatory. So in such cases it's OK not to have it. --Succu (talk) 16:35, 13 July 2019 (UTC)
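The reference layout discussed in this exchange can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names are made up for the example, "Q_FIRST_DESCRIPTION" is a placeholder and not a real item ID, and the page value is invented. The property IDs and the two work items are the ones mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reference:
    stated_in: str                     # stated in (P248): ID of the item for the work
    pages: Optional[str] = None        # page(s) (P304)
    bhl_page_id: Optional[str] = None  # BHL page ID (P687)
    role: Optional[str] = None         # reference has role (P6184); not mandatory


def reference_snaks(ref: Reference) -> Dict[str, str]:
    """Return a property -> value mapping, omitting parts that were not supplied."""
    candidates = {
        "P248": ref.stated_in,
        "P304": ref.pages,
        "P687": ref.bhl_page_id,
        "P6184": ref.role,
    }
    return {prop: value for prop, value in candidates.items() if value is not None}


# A reference to an original publication, with a role.
# "Q_FIRST_DESCRIPTION" is a placeholder, not a real item ID; the page is invented.
original = Reference("Q65088109", pages="1", role="Q_FIRST_DESCRIPTION")
# An additional reference: the role is simply left out.
additional = Reference("Q13520496")

print(reference_snaks(original))
print(reference_snaks(additional))
```

Leaving the role optional mirrors the answer above: a reference without P6184 is still a complete reference.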
Hopefully I understood the question right, but I also reference at the journal level when I don't have the time to create an article item. Circeus (talk) 16:02, 13 July 2019 (UTC)
This is not a problem. But I think if we have an item, then it should be used. A bot can check this. Currently I'm working on replacing DOI (P356)-only references (< 100 cases) with the article item. --Succu (talk) 16:40, 13 July 2019 (UTC)
Oh, I 100% agree! We disagree on other things (P5326 (P5326) vs. reference has role (P6184)), but certainly not on this. I always try to find an article item first, if I can! (especially if it's from a journal where a mass import is likely to have happened) Circeus (talk) 01:27, 14 July 2019 (UTC)
How would you annotate e.g. a type designation, a sanctioned name (Q7415672) or an emendation (Q1335348)? --Succu (talk) 21:16, 15 July 2019 (UTC)
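A check of the kind a bot might run for DOI-only references could look roughly like the following. This is a hypothetical sketch, not the bot mentioned above: the function name, the DOI values and the item IDs are all invented, and the DOI lookup table is assumed to use upper-case keys.

```python
from typing import Dict, List, Tuple

Reference = Dict[str, str]  # property ID -> value


def doi_only_replacements(
    references: List[Reference], doi_to_item: Dict[str, str]
) -> List[Tuple[int, Reference]]:
    """Find references that consist solely of a DOI (P356) and have a known article item.

    Returns (index, new_reference) pairs; nothing is modified in place.
    """
    replacements = []
    for index, ref in enumerate(references):
        if set(ref) != {"P356"}:
            continue  # the reference has other parts, so leave it alone
        item = doi_to_item.get(ref["P356"].upper())
        if item is not None:
            replacements.append((index, {"P248": item}))
    return replacements


# Made-up DOIs and item IDs, for illustration only.
known_articles = {"10.0000/EXAMPLE.1": "Q_ARTICLE_1"}
refs = [
    {"P356": "10.0000/example.1"},        # DOI only, article item known
    {"P356": "10.0000/example.2"},        # DOI only, no article item yet
    {"P248": "Q_JOURNAL", "P304": "12"},  # already item-based
]
print(doi_only_replacements(refs, known_articles))
```

Only the first reference is proposed for replacement; the second stays as it is until an article item exists.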
title (P1476) was not present. --Succu (talk) 17:52, 15 July 2019 (UTC)
Great, thank you! Christian Ferrer (talk) 17:56, 15 July 2019 (UTC)

Wikispecies Publications templates

Is there currently a way to link the publication templates of Wikispecies to the publications items here? Example:

species:Template:Brandt, 1833a → Conspectus monographiæ crustaceorum oniscodorum Latreillii (Q64902470)

Would it be appropriate? Christian Ferrer (talk) 14:13, 30 June 2019 (UTC)

The proposal for that exact property could use some more attention. I've got one user who hates me personally for external reasons and one who seems to not understand the proposal at all... Circeus (talk) 16:48, 30 June 2019 (UTC)
Well, your proposal currently doesn't have much chance of passing, though I clearly understand it. Firstly, because you ask for the item datatype and none of your examples shows what you are asking for. Secondly, I think you should have pinged the members of WikiProject Taxonomy too (or instead). If you want a property to store items about templates, then it is better to have the items concerned first, so that you can show precise examples. But I guess this is partly why there is the second opposition: your proposal implies the further creation of items. A kind of property that works a bit like Commons Institution page (P1612) (see the relevant documentation on the talk page) may be better suited. I opened this discussion to find a possible solution that could gain consensus, because a property proposal made without a clear idea of what other users think about the topic is very likely to bring misunderstanding, frustration and disagreement... Christian Ferrer (talk) 18:09, 30 June 2019 (UTC)
The problem with a Commons Institution page (P1612)-type property is that it does not link items and fails to provide the actual item link between template and item. It will break if the template is renamed, and may have issues with works that are spread across multiple templates (although admittedly there may not be any clear solution for that), and I am not sure it can look back efficiently from the template to the item (part of the interest is the potential to automatically generate a working reference from the Wikidata item). Circeus (talk) 20:53, 3 July 2019 (UTC)

How to proceed with author citation info?

And if "taxon name" is used for things other than the name of a taxon, how will the user be prevented from assuming that this other thing is a taxon? - Brya (talk) 05:08, 28 June 2019 (UTC)
@Brya: All these names were intended to be taxon names, so the property name "taxon name" is appropriate. In fact where these names are obsolete or illegitimate, the group of organisms which they represent is often perfectly clear. One taxon/organism has various taxon names which may or may not be valid, and you are using the term "taxon name" in your own special way, which is wrong. Valid taxon names can also be obsolete and just as misleading as invalid ones.
Anyway this property, or another similar one, is absolutely essential to allow the items in question to be used by software. If the items are illegitimate or irrelevant and should be hidden, that needs to be shown by adding a property or assigning appropriate values to existing properties, not by deleting the claim which defines the name. For instance such obsolete items could be distinguished by setting instance of (P31) to a value like "invalid name" instead of "taxon". The P225 claim is the real identifier of the item and its deletion ruins the item for any software use; the whole item might just as well have been deleted. But Wikidata has to have such items, for instance they are needed for any use of property replaced synonym (for nom. nov.) (P694).
The example of Quercus multinervis makes it clear why it is not acceptable to rely on the language label to define the name referred to by an item. Wikidata is intended for automatic processing and any software needs a clear, rigorous method to find what the name in question is. The property taxon name (P225) is there to meet this requirement whereas deriving the name from a language label would be against the Wikidata philosophy and would require the agreement of complicated new rules. In fact in the case of Quercus multinervis (W.C.Cheng & T.Hong) Govaerts (1998) non Lesq. (1859) (Q6371819), there is a constraint on the basionym claim giving message "An entity with basionym should also have a statement 'taxon name'". This constraint correctly indicates that the item needs a P225 claim.
The idea of deleting the identifiers of items to prevent them being found in searches is fundamentally flawed, and anyway such a course of action should not be followed without first proposing and agreeing a rigorous alternative. For me this situation puts the Wikidata taxonomy project in great doubt. In making these changes you have not only made my author citation exercise impossible, but you have prevented any other software use of the Wikidata author citation data - anyone else trying to load or use the data automatically will run into the same problem. Strobilomyces (talk) 12:03, 28 June 2019 (UTC)
I tend to fully agree with Strobilomyces, both on treating valid and invalid names in the same way and on not removing any statements without consensus (something that is 1/ potentially frustrating for those who previously did the work and 2/ quite unconstructive). Christian Ferrer (talk) 19:15, 28 June 2019 (UTC)
The link to my documentation was wrong above, it is here. Strobilomyces (talk) 20:37, 28 June 2019 (UTC)
The argument "All these names were intended to be taxon names" makes no sense. We have been over that before. There is an indefinite (but huge) number of things that when published were "intended to be names of taxa" but that are discounted. For an example take Nicotiana minor, published by Bauhin in his Pinax. This was definitely intended to be the name of a taxon, it was published in an important book, by a leading botanist, for a species which is accepted today. Still, nobody ever mentions it, and for good reason.
        In order to manage this huge number of things "intended to be names of taxa", there have long been international agreements on what can and what cannot be the name of a taxon (in this case the ICNafp). The "intended to be the name of a taxon" as such is not a requirement for establishing a name; there have been plenty of good, accepted names that were established by accident, unintentionally.
        It is unconstructive to just drag in stuff from a database somewhere and present it as if these are 'real taxa', leaving the user with disinformation, believing in fake species. This is directly counter to the vision of the WMF and the purpose of Wikidata. It is unconstructive not to take part in a discussion of creating and adopting properties that can handle the reality that is out there.
        The basic question above is unanswered: if "taxon name" is used for things other than the name of a taxon, how will the user be prevented from assuming that this other thing is a taxon? - Brya (talk) 04:30, 29 June 2019 (UTC)
@Brya: I had to drag in these names in order to add the author citation and replaced synonym (for nom. nov.) (P694) information following the rules in the tutorial. The exercise was constructive because it highlighted problems in the current data model rules. It is true that this had the unfortunate effect of creating name items for obsolete names and it would be good to have a clear way of distinguishing them. Your use of language is wrong when you say that the obsolete or illegitimate names are not taxon names. For me, an invalid taxon name is a taxon name. A taxon means a group of organisms and you seem to be confusing "names" with the taxa themselves. If a name becomes obsolete or illegitimate it should not be used for most purposes, but it still means the same taxon (and the link to the current name which replaces it for that taxon is often perfectly clear). If indeed the label "taxon name" for this property is wrong, it can be replaced by another label such as "name"; that would not change the logical data structure. But to delete the claim which identifies the item, in this case P225, makes the item completely useless for software and breaks the system.
One taxon typically has several taxon names and thus several items; that does not make all but one of the items fake species. The data structure which was agreed implies many taxon names for one taxon and when we start to use software on these data it is essential to follow the specified data model (and its disadvantages come to the fore). Having agreed that there will be an item for each name, users now have to understand that several items can represent the same taxon/group of organisms. I think the question is not "how will the user be prevented from assuming that this other thing is a taxon?", but rather "how will the user be able to see easily which names for a taxon are obsolete or illegitimate?" As I mentioned above, one correct way of doing that would be to set instance of (P31) to a different value than "taxon" for the items of obsolete or invalid names. Or a new property could be used. But it is not acceptable to delete the P225 property, as that is the property which is used to identify what name the item refers to - the item is useless without it.
In my opinion the currently agreed data structure is poor. It should either have a new type of item for the taxon/organism or one name item should be chosen as special to represent the taxon/organism and hold properties belonging at that level (which apply whatever the name). But having chosen this data model we either have to keep to it rigorously or propose and agree an alternative with a detailed specification.
In order to move forward, I propose the following.
  1. We should agree a new instance of (P31) value meaning "obsolete or invalid synonym". After approval, software searching for names would have to search on P31 = "taxon" or (say) "obsolete or invalid synonym", but people wanting modern names would just search on P31 = "taxon". There would be an activity to identify obsolete or invalid synonym items and to update the P31 value accordingly. Some property constraints which generate messages would need to be updated. I think this would solve the problem of distinguishing the obsolete or invalid name items and making them less conspicuous, which I believe is the principal objective.
  2. It is essential that every name item ("taxon" or "obsolete or invalid synonym") should have a property which defines the name in question. My preferred solution to this is simply to use taxon name (P225). I think your argument that the phrase "taxon name" is not applicable here in the obsolete or invalid cases is wrong, but if it is right my preferred solution would be to change the property label. The data structure of the "obsolete or invalid synonym" item is the same as that of the "taxon" item and it would be a pity to use two different properties when one would do. I think you should propose a solution here and get it agreed, and then the items should be updated accordingly. You should have done this before deleting the P225 claims.
  3. Unfortunately if these two proposals were implemented, there would still often be several valid item names corresponding to the same taxon/organism. The property taxon synonym (P1420) and the new property "is a synonym of taxon name" can be used to represent this, but I think it would be a big improvement to also have a property "claimed current name" (with reference) to show the current name, which is very often not contentious. Strobilomyces (talk) 16:32, 29 June 2019 (UTC)
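The search behaviour described in point 1 can be sketched as a simple filter. This is a minimal illustration only: the P31 value "obsolete or invalid synonym" is the one proposed here and does not exist, the items and names are invented, and real items can of course carry several P31 values.

```python
from typing import Dict, List

TAXON = "taxon"
OBSOLETE = "obsolete or invalid synonym"  # proposed value, does not exist yet

# Invented name items: each has one P31 value and one P225 value.
items: List[Dict[str, str]] = [
    {"id": "Q_A", "P31": TAXON, "P225": "Examplea alba"},
    {"id": "Q_B", "P31": OBSOLETE, "P225": "Examplea candida"},
    {"id": "Q_C", "P31": TAXON, "P225": "Examplea nigra"},
]


def find_names(name_items: List[Dict[str, str]], include_obsolete: bool) -> List[str]:
    """Return P225 values, optionally including obsolete or invalid synonyms."""
    wanted = {TAXON, OBSOLETE} if include_obsolete else {TAXON}
    return [item["P225"] for item in name_items if item["P31"] in wanted]


print(find_names(items, include_obsolete=False))  # modern names only
print(find_names(items, include_obsolete=True))   # every name item
```

Every item keeps its P225 value in both cases; only the P31 filter decides whether the obsolete name is shown.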
@Strobilomyces: Do you know nomenclatural status (P1135)? --Succu (talk) 20:46, 29 June 2019 (UTC)
No, that could be used instead of P31 or as a qualifier to indicate the type of "dead" name. Strobilomyces (talk) 15:55, 17 July 2019 (UTC)
"The exercise was constructive because it highlighted problems in the current data model rules." I don't see it that way. It was already clear that this problem exists, and it is only one among many problems: Wikidata is drowning in problems. What we rather need is a way forward to solutions, not highlighting problems.
        Using "instance of: ..." might help a little. However, Wikidata is a database, open to any users, and it is not possible to prescribe to users that they have to start by querying P31. A user may well elect to start by querying P225, and never notice what is in P31.
        I see you have not responded to my example, but just repeated your argument.
        I also don't see why we could not afford separate properties. Say that there are some two million named species. Say that there are some ten million names that are eligible for use as correct names of taxa, depending on what taxonomy is adopted. Say that there are a further two million names that are not eligible for use as correct names, no matter what taxonomy is adopted. Say that there are some two million further pieces of dubious junk. Surely for that many items, Wikidata can afford to have separate properties. There is a separate property for taxon common name (P1843) although these are also "taxon names", by the reasoning presented by Strobilomyces. - Brya (talk) 05:22, 30 June 2019 (UTC)
I agree that what we need is a way forward to solutions, but I think a concrete exercise with software helps with this. To find a good solution we need to identify all the problems, especially logical data structure problems, and a solution should only be acceptable if it allows the software (perhaps with changes) to operate. The solution needs to provide a property which would allow the name of a name item to be clearly identifiable, even if it is an invalid or very obscure name.
Regarding your example, I think (but am not sure) that you are referring to Nicotiana minor. The best solution for this name would be that (as currently) it should remain absent from WD. According to GBIF, it is a synonym of N. rustica, so if someone decides that it has to be in WD it could be added like that. I am not sure exactly what is the question; the person adding it would have to have a reference and they would add it following the reference. Having decided to add it, it would absolutely need a property specifying the name Nicotiana minor, and according to the current rules that would be P225.
The user may want names or actual taxa/organisms. But the only way we have to allow querying on a taxon/organism is through property taxon synonym (P1420) and the new property "is a synonym of taxon name", and that information is mostly absent for the moment - certainly in the case of fungi. The user wanting to query on taxa/organisms has to understand that one taxon typically corresponds to several name items and take that into account. Querying on P225 does not help much with this anyway, since one taxon typically corresponds to several valid name items (even if the invalid ones were eliminated). Wikidata is a database and I believe that the users are supposed to be fairly knowledgeable about data modelling, as illustrated by the very complicated data structure agreed for the author citation information.
In your last paragraph you state the main problem very well; I agree to assume that there are two million named species (these are the taxa/organisms) and that there are ten million valid names, two million ineligible names, and two million junk names. These are all scientific names and the comments to property taxon name (P225) make it clear that the latter refers to scientific names, so P1843 is an independent property which may be added to "taxon" items. Anyway we do not need to consider it here and when I referred to taxon names, I only wanted to consider scientific names (perhaps I should have added the word "scientific" more often in my statements above).
Wikidata can afford to have several new properties here if that would solve the problem. However I have another issue which I think may be close to the core of the contention between us.
I think you may be considering the following way of distinguishing the various types of scientific names, which I call method A.
  • If the item is for a valid name, use property P225 "taxon name" and set the property value to the name in question.
  • If the item is for an ineligible name, use new property Pxxx and set the property value to the ineligible name in question, and
  • If the item is for a junk name, use new property Pyyy and set the property value to the junk name in question.
On the other hand, what I think is the normal way to model this situation is as follows (method B).
  • One property (such as perhaps P225) should be used to represent the name of the name item, whether the item is for a valid name, an ineligible name, or a junk name.
  • Another property should be set to different values to distinguish whether this is a valid name, an ineligible name, or a junk name. For instance P31 could be set to "taxon", "ineligible name" or "junk name" respectively in those cases.
The relation to the taxon/organism can be expressed through other properties such as taxon synonym (P1420) with method A or method B equally.
But method A is no way to represent information in a database! I think people accustomed to data modelling would not consider doing it that way. One problem is that it would be necessary to deal with the possibility that an item would get multiple properties out of P225, Pxxx and Pyyy. I think that whilst you are rightly concerned about hiding irrelevant information, there are more fundamental issues which have to be respected and since valid names, ineligible names and junk names are very similar from a data modelling point of view (though they are very different in the importance of the items), one should use the same properties for them if possible. Database people are used to having properties which make records invalid or even "deleted".
Please can you find a proposal to support which corresponds to method B? I think that would advance us a lot in finding a solution.

Strobilomyces (talk) 21:13, 30 June 2019 (UTC)
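The difference between method A and method B as described in the comment above can be sketched as two small reader functions. Pxxx and Pyyy are the placeholder property names used in the comment, the status values "ineligible name" and "junk name" are likewise only suggestions from it, and the example name is invented.

```python
from typing import Dict, Tuple

# Method A: which property holds the name depends on the kind of name.
METHOD_A_PROPERTIES = {"P225": "valid", "Pxxx": "ineligible", "Pyyy": "junk"}


def read_method_a(item: Dict[str, str]) -> Tuple[str, str]:
    """Return (name, kind); fails if zero or several name properties are present."""
    present = [prop for prop in METHOD_A_PROPERTIES if prop in item]
    if len(present) != 1:
        raise ValueError(f"expected exactly one name property, found {present}")
    prop = present[0]
    return item[prop], METHOD_A_PROPERTIES[prop]


# Method B: one property for the name, another for its status.
METHOD_B_STATUS = {"taxon": "valid", "ineligible name": "ineligible", "junk name": "junk"}


def read_method_b(item: Dict[str, str]) -> Tuple[str, str]:
    """Return (name, kind) from a single name property plus a status property."""
    return item["P225"], METHOD_B_STATUS[item["P31"]]


# The same invented name under both models.
print(read_method_a({"Pxxx": "Examplea dubia"}))
print(read_method_b({"P225": "Examplea dubia", "P31": "ineligible name"}))
```

Under method A the reader has to handle the case where an item carries more than one of the three properties; under method B that case cannot arise for the name itself, which is the concern raised in the comment.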

GBIF mentions a Nicotiana minor that was put into print by Garsault, so this is a red herring (this does not exist either, from a nomenclatural perspective). There does exist a Nicotiana minor Sessé & Moc., but this is much later; and this is the only scientific name with this spelling (although it only is a synonym).
        You adopt a curious circumscription of "scientific name", namely "everything that catches the eye". There are very precise definitions out there, by international agreement, as to what are formally established names ("scientific names"). Users who enter data are supposed to be fairly knowledgeable about the field they are entering data of. - Brya (talk) 10:57, 2 July 2019 (UTC)
My definition of a scientific name is a name which was intended to fit into the binomial system of Linnaeus. Whether the scientific name is established or not is a separate question. An invalid scientific name is a scientific name. Similarly a taxon name is a name which was intended to represent a taxon (or type of organism), and one should not use wording in a way which would exclude an invalid or obsolete taxon name from being a taxon name. This is not curious at all, I think it is normal English, and it especially fits with normal database practice where one identifies a general entity like "taxon name" and then gives it a status property which might be "accepted", "invalid", "obsolete", etc. In order to explain the Nicotiana minor situation you have to mention Garsault's definition; you shouldn't say that it doesn't exist, but rather that it is invalid. The first priority is that the Wikidata data structure should be general enough that Garsault's name definition could be put into Wikidata if that were wanted for some reason. The second priority is that non-notable dross such as Garsault's name definition should be kept out of Wikidata if possible. Strobilomyces (talk) 07:29, 3 July 2019 (UTC)
This is the same as saying that a square circle is a circle, because "this is not curious at all, I think it is normal English, and it especially fits with normal database practice where one identifies a general entity like "circle" and then gives it a status property which might be "square", "triangular", "polygonal", etc."
        That line of reasoning is a good example of Original Research (and a violation of VER). The first priority for Wikidata is that the data structure should allow correct data to be entered. It should avoid forcing the creation of database artefacts. - Brya (talk) 10:58, 3 July 2019 (UTC)
I don't agree with your analogy about a square circle, but you could make it similar by replacing the word "circle" by "geometric figure". A name is still a name if it is unavailable and we need to include unavailable names if we are to add author citation information according to the rules of the tutorial.
We are allowed to do original research to establish the data model here. I don't understand your comment about VER, this is not main space. It is certain that the data structure should allow correct data to be entered, and it should allow all the data to be entered which is needed.
You have blocked my attempt to add fungus author citation information according to the rules of the tutorial and indeed you have blocked any possibility of adding that information in a general way (since anyone else would run into the same problem). The taxonomy project is not meeting its responsibility of allowing the author citation information to be inserted and extracted in the general case. You have not proposed any alternative data structure which could provide a solution, except the proposal under "Another option", which is logically possible but seems strange to me. Is there any way we could move forward from here to allow software to insert and extract the author citation information? Strobilomyces (talk) 16:42, 3 July 2019 (UTC)
It is a very good analogy, the main difference is that everybody learns about circles at school while to learn about names requires one to actively study the topic. Your argument is exactly the same as that a square circle is still a circle if it is square. - Brya (talk) 04:25, 4 July 2019 (UTC)

I also suggest a look at the No Original Research policy which is about not inventing stuff to add as content. And NOR and VER are important here as Wikidata is intended to serve "as central storage for the structured data of its Wikimedia sister projects including Wikipedia, ...". - Brya (talk) 04:46, 4 July 2019 (UTC)

GBIF

  Info: Global Biodiversity Information Facility (Q1531570) is using taxon identifiers from Wikidata (e.g. Calibrachoa (Q141632) → Calibrachoa). --Succu (talk) 14:30, 13 July 2019 (UTC)

Yes, and it does not require a P225, or an "instance of: taxon", as in Ilex reticulata C.J.Tseng (1984), non Heer (1868). - Brya (talk) 15:37, 13 July 2019 (UTC)
The mapping is based on GBIF taxon ID (P846) not taxon name (P225). Ilex neoreticulata (Q42890446) (=Ilex neoreticulata) had no P846 until now. My bot uses P225 (and other things) to provide P846. --Succu (talk) 16:30, 13 July 2019 (UTC)
So it is a matter of adjusting your bot for those cases where P225 does not apply. - Brya (talk) 16:52, 13 July 2019 (UTC)
No, it isn't. BTW GBIF calls the field „scientificName“ for all cases. --Succu (talk) 17:36, 13 July 2019 (UTC)
I don't see GBIF use „scientificName“ anywhere? - Brya (talk) 04:19, 14 July 2019 (UTC)
As part of the GBIF-API. BTW: The mapping is updated nearly in real time. Fascinating. --Succu (talk) 17:37, 14 July 2019 (UTC)
OK. Well, 1) it is a label behind the screens, for internal use. 2) GBIF deals with a more limited set of names, excluding clade names, so only names within the scope of a Code ("scientific names"). 3) GBIF is a Single Point of View database, so it does not need to make such a distinction. - Brya (talk) 04:40, 15 July 2019 (UTC)
We (=Wikidata) have more than 200 labels „behind the screens, for internal use“. We (=Wikidata) are not a taxonomic database (aka „Single Point of View database“). We (=Wikidata) are a knowledge base (Q593744). We should be able to model the "facts" leading to the "theory" of a flat Earth (Q660936) and why this Q1349367 was rejected. --Succu (talk) 19:37, 17 July 2019 (UTC)
Wikidata is a user-editable database, while GBIF can be edited only by a very select group of personnel. Wikidata labels are out there for everybody to see. Wikidata has a lot of taxonomic information, not from a Single Point of View, but from a Neutral Point of View. Wikidata should be able to hold many kinds of data, with different structures for different kinds of data. - Brya (talk) 04:16, 18 July 2019 (UTC)
„Wikidata should be able to hold many kinds of data, with different structures for different kinds of data“ - Hm. Yes. Exactly! Similar to what GBIF is doing. --Succu (talk) 21:20, 18 July 2019 (UTC)
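The point made earlier in this thread, that the mapping is keyed on GBIF taxon ID (P846) and not on taxon name (P225), can be illustrated with a toy index. This is only a sketch of the idea and says nothing about how GBIF or any bot actually implements it; the item IDs, names and GBIF IDs are invented.

```python
from typing import Dict, List, Optional

Item = Dict[str, Optional[str]]


def gbif_mapping(items: List[Item]) -> Dict[str, str]:
    """Map GBIF taxon ID (P846) -> item ID, ignoring P225 and P31 entirely."""
    return {item["P846"]: item["id"] for item in items if item.get("P846")}


# Invented items for illustration only.
items: List[Item] = [
    {"id": "Q_WITH_NAME", "P225": "Examplea alba", "P846": "1000001"},
    {"id": "Q_NO_NAME", "P225": None, "P846": "1000002"},         # no taxon name claim
    {"id": "Q_NO_GBIF", "P225": "Examplea nigra", "P846": None},  # no GBIF ID yet
]
print(gbif_mapping(items))
```

An item without P225 still appears in the mapping as long as it has P846, while an item with P225 but no P846 does not; supplying the missing P846 is the step for which the bot described above uses P225.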

Item for types

Hello, I have started to think about how to create suitable items for types.

Example: here is a holotype in the collections of the Muséum national d'histoire naturelle (Q838691).
The reference number is "MNHN-IE-2013-10358". I think that the minimum for an item should be:

Any thoughts?

Christian Ferrer (talk) 21:22, 11 May 2019 (UTC)

Hey! Do you know Rausch 572 (Q19359611)? An early approach to model this. --Succu (talk) 21:29, 11 May 2019 (UTC)
Thank you, great, I added two properties above. But I think it is likely more suitable to have a property "type" and a property "taxon for this type". No? Christian Ferrer (talk) 21:39, 11 May 2019 (UTC)
"Type" was proposed here and basically ignored. I'm not convinced we ought to create items for type specimens in the first place when we can't implement anything like a proper taxon-name distinction in the data structure. Circeus (talk) 23:51, 11 May 2019 (UTC)
Yes, the two proposed properties would have merited more discussion (starting with a split into two separated discussions). - Brya (talk) 06:07, 12 May 2019 (UTC)
Agreed, better to discuss a bit to see if we can work all in the same direction, and if we can manage to get some consensus. Christian Ferrer (talk) 06:27, 12 May 2019 (UTC)
Just to be clear: what I mean is that ontologically, it makes no sense for (e.g.) any item that is an instance of (P31)taxon (Q16521) to have subject has role (P2868)protonym (Q14192851) or to be the target of original combination (P1403) or basionym (P566). lion (Q140) and Felis leo (Q15294488) are not different taxa! They literally cannot be separate taxa by the very definition of the property linking them. But there we are, stuck where pretty much every single item about a taxon is conflated with its name and vice-versa. Circeus (talk) 13:35, 12 May 2019 (UTC)
Yes. Maybe we can adopt subject has role (P2868)holotype (Q1061403) of Ophiactis tyleri (Q2272068). See this change at Rausch 572 (Q19359611). --Succu (talk) 18:13, 12 May 2019 (UTC)
Succu, yes indeed, it is an improvement. But the disadvantage is that it is a little tedious for the uninitiated, as well as from the ontological point of view. Imagine we get a lot of data (because there is indeed a very big potential) and someone searches for all type specimens of a given kind within a given taxon (for example, all lectotype specimens of the order Lepidoptera, etc.): that will be much easier to handle with two dedicated properties that require a statement constraint with each other. And such properties will have the advantage that we can have a kind of control over how future users build such items. The minimum required for taxon items is, though not perfect, rather solid, because there are statement constraints. In my opinion we should follow that and go in the direction of two dedicated properties, meaning "type (biology) of this specimen" and "taxon to which this type specimen refers". Christian Ferrer (talk) 19:19, 12 May 2019 (UTC)
I think we do need more modeled examples. We do not have ones with respect to International Code of Nomenclature of Prokaryotes (Q743780) and International Code of Virus Classification and Nomenclature (Q14920640). --Succu (talk) 21:14, 12 May 2019 (UTC)
Ok, I just read the interesting parts of the International Code of Nomenclature of Prokaryotes; the concept is exactly the same, there are types (holotype, neotype), though they talk more about strains than specimens, for example neotype strain. At first view it seems possible to handle that with a property "type of this specimen (or strain)" whose constraints accept everything that is a "subclass" of type (biology) and of strain (Q855769). As for the other constraint I was talking about, "requires statement constraint with each other", and therefore the second property that I said was needed ("taxon to which this type specimen refers"), it does not seem like a problem because bacteria taxa... are indeed taxa... Otherwise, if needed, the scope of the second property could be extended with no more trouble than for the first. But we can call it "taxon to which this type specimen (or strain) refers".
For the viruses, they don't seem to have this concept at all in their nomenclature (serotypes and genotypes are entirely different concepts); the only thing that they talk about is the need for a "type species" for the creation of a new genus. At first sight it does not interfere. Christian Ferrer (talk) 22:26, 12 May 2019 (UTC)
I will try to create an example for a bacterium. Christian Ferrer (talk) 04:53, 13 May 2019 (UTC)
Our modeling of bacterial strains needs some overhaul too. :( --Succu (talk) 20:27, 13 May 2019 (UTC)
Yes, Escherichia coli CFT073 (Q21365228) seems a quite good approach, IMO Christian Ferrer (talk) 10:52, 14 May 2019 (UTC)
A general point: types are not properties of taxa, but of names. Taxa do not have types, names have types. - Brya (talk) 04:36, 13 May 2019 (UTC)
To the extent that we use only one item for the taxa and for the name...and since that this item is instance of taxon, it does not really matter, we don't have the choice. See discussion above... maybe we should start by that. Christian Ferrer (talk) 04:53, 13 May 2019 (UTC)
It may, or may not, make a difference in practice, but it is wise to keep the distinction in mind. - Brya (talk) 05:42, 13 May 2019 (UTC)
That is exactly why we should create such taxon/scientific name items, to keep/highlight that distinction. My comment above was a bit sarcastic. Christian Ferrer (talk) 10:43, 13 May 2019 (UTC)
A good database worthy of the name is obliged to consider all scientific names at the same level, at one time or another. Whether we do it one way or another is not a problem for me. But if we create separate items for the names, then we have to do it for all the names, otherwise the data is corrupted and arbitrary. If we don't want to separate the names from the taxon items, very well, WoRMS does it and that is also my favorite option, but in that case we must accept creating taxon items for the names that are not correct, otherwise the data is no less corrupted and arbitrary. The consequence is that the properties proper to the names, including the relevant type specimens, become somewhat related to the taxon item that contains the name. It does not shock me more than that, but okay, let's keep that in mind. Christian Ferrer (talk) 18:20, 13 May 2019 (UTC)

@Brya, Succu: after having looked at some bacteria-related items such as Bradyrhizobium sp. ORS278 (Q21384106), and as a strain is not strictly speaking a taxon rank, I think we could have a property "strain taxon lineage" whose accepted values would be taxon items. We could also have a string property "strain name". What do you say? Christian Ferrer (talk) 21:10, 26 May 2019 (UTC)

Maybe. Names of strains are not formally regulated, and I don't know if there are well-established customs. - Brya (talk) 04:57, 27 May 2019 (UTC)
At the moment I am trying to incorporate culture collection (Q64062850) from Culture Collections Information Worldwide (Q64072881). Another source is Culture collections of prokaryotes (bacteria). Strains are (have to be) deposited there and have something like an inventory number (P217). Not sure about the best solution. --Succu (talk) 20:16, 27 May 2019 (UTC)
I created ATCC 15468 (Q64144321) = CCUG 25939 (Q64144615) the type strain of Aeromonas punctata (Q3506457) (= Bacillus punctatus (Q64143980) [basionym], Pseudomonas punctata (Q64144078)) and Aeromonas caviae (Q16825697) Sic!
This strain is known as NRRL B-968, DSM 30190, LMG 3775, Popoff 545, JCM 1060, CIP 76.16 and A 309 too. --Succu (talk) 21:25, 28 May 2019 (UTC)
It looks as if ATCC 15468 (Q64144321) and CCUG 25939 (Q64144615) should be the same item, as it seems to be the same strain (therefore the same concept), but with several "names". Hence the potential utility of a property "strain name" allowing multiple values. And to bounce with the rest of the discussion, I see that you created "instance of" "strain" and "type"; maybe we should have "instance of" "type strain", the only question being how (or whether we want) to highlight the difference between a type strain and a neotype strain, for example Oscillochloris trichoides neotype strain DG-6; a neotype is supposed to replace a type. Christian Ferrer (talk) 05:12, 29 May 2019 (UTC)
A neotype is one of (about) three kinds of types. - Brya (talk) 11:10, 29 May 2019 (UTC)
@Brya: what are those three kinds of types? I thought I heard only about holotype AKA "type strain" and about neotype. Christian Ferrer (talk) 18:10, 30 May 2019 (UTC)
Under all Codes, a type can be a holotype, a lectotype, or a neotype (depending on the Code, there can be minor additions). - Brya (talk) 05:09, 31 May 2019 (UTC)
Lectotypes were rejected from this Code in 1982. --Succu (talk) 20:43, 31 May 2019 (UTC)
It has paratypes within a type series, but no mechanism to designate one of those a formal, single-specimen type, right? Just checking I'm remembering things correctly. Circeus (talk) 22:54, 31 May 2019 (UTC)
I stand corrected, the Prokaryote Code has removed the term "lectotype", so that there are only two kinds of types: holotype and neotype. There still are some provisions that do something similar to lectotypification, but not under that term. The concept of "strain" is much less restrictive when it comes to material that is allowed as a type than under the botanical and zoological Codes, so that the mechanisms for selecting a lectotype are (mostly) superfluous.
        For all the talk about creating a single Code for all organisms, it proves that the existing Codes are diverging rather than converging. - Brya (talk) 03:50, 1 June 2019 (UTC)

strain

Looks like the term reference strain (Q64159332) is commonly used. --Succu (talk) 18:15, 30 May 2019 (UTC)
Yes indeed. I missed that. Christian Ferrer (talk) 18:46, 30 May 2019 (UTC)
We now have type strain (Q64149240). For a possible usage of neotype strain (Q64159621) please see Oscillochloris trichoides (Q62869887). --Succu (talk) 20:50, 29 May 2019 (UTC)
@Succu: IMO taxonomic type (P427) is not suited here; indeed I think we have to differentiate two clear concepts, in summary "type taxon" (→ e.g. the type genus of a family) and "type specimen (or strain)" (→ e.g. a holotype of a species, or a neotype strain of a species). taxonomic type (P427) seems suited to (seems to have been made for) the first case. Christian Ferrer (talk) 21:57, 30 May 2019 (UTC)
I would rename taxonomic type (P427) as "type taxon" with a constraint allowing only taxon items (and not including the types as currently), and I would create a property "type specimen/strain" that accepts the types as values (but not the taxon items). Indeed, to say that a taxon is the "type taxon" of a parent taxon is not the same thing as saying that this specimen is the holotype of this taxon. I don't see a single reason to use the same property for both concepts; furthermore IMO the Wikidata ontology allows us to create a property that accepts all the needed items (botany, zoology, bacteria) as values, as long as these values are subclasses of type (Q3707858) or even something else if needed, but not subclasses of taxon → keep that for taxonomic type (P427). In addition I would create a property "taxon to which this element refers" to allow us to properly structure the link from a strain or a specimen to a taxon item. Christian Ferrer (talk) 16:51, 31 May 2019 (UTC)
For me the generalization of type specimen (=Q51255340) and type taxon (=Qxxx) as nomenclatural type (=P427) works very well. --Succu (talk) 20:07, 31 May 2019 (UTC)
Is there any Code defining the term type taxon? --Succu (talk) 20:33, 31 May 2019 (UTC)
The exact details are very complicated across Codes. The zoological and Prokaryote Code know "type species" and "type genus", and the zoological Code does define them (but the definitions don't make things easier). The botanical Code does not know "type species" and "type genus", but uses a mechanism that is for all practical purposes indistinguishable. It is perfectly OK to say that the type of the name Fagus is Fagus sylvatica, but this is just shorthand for saying that the type of the name Fagus is the type of Fagus sylvatica. So strictly speaking, a label "type taxon" would be inaccurate.
        The term type specimen comes with lots of problems, as well. Firstly, under the zoological Code, a type specimen is not necessarily a type. Secondly, when the type is not a taxon, it does not follow that it then is a type specimen.
        But having two properties, one with data type item, and one with data type string may be a good idea. - Brya (talk) 04:25, 1 June 2019 (UTC)
NHM 2011.2080 (Q54854611)
instance of (P31) type specimen (Q51255340)
New property "taxonomic type of the specimen/strain" → holotype (Q1061403)
New property "taxon to which this element refers" → Siamcyclops cavernicolus (Q41167961)

And same principle for the bacteria strains :

instance of (P31) strain (Q855769)
New property "taxonomic type of the specimen/strain" → neotype strain (Q64159621) or type strain (Q64149240) or reference strain (Q64159332) or no values if needed
New property "taxon to which this element refers" → Qxxx

Christian Ferrer (talk) 05:39, 1 June 2019 (UTC)
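
The two item shapes proposed above can be written out as plain data, which makes the intended "requires statement constraint with each other" easier to picture. This is only a sketch under stated assumptions: neither of the two new properties exists, so `P_TYPE_KIND` and `P_REFERS_TO_TAXON` are hypothetical placeholders, the dictionary layout is invented for illustration, and the check is a simplification of how Wikidata constraints behave.

```python
# Hypothetical placeholders: neither property exists on Wikidata.
P_TYPE_KIND = "taxonomic type of the specimen/strain"
P_REFERS_TO_TAXON = "taxon to which this element refers"

# The specimen example from the proposal, as a plain mapping of statements.
specimen = {
    "id": "Q54854611",                 # NHM 2011.2080
    "P31": ["Q51255340"],              # instance of: type specimen
    P_TYPE_KIND: ["Q1061403"],         # holotype
    P_REFERS_TO_TAXON: ["Q41167961"],  # Siamcyclops cavernicolus
}


def missing_partner(item):
    """Return the proposed property that is missing, or None.

    Models a mutual "item requires statement" constraint: if one of the
    two proposed properties has a value, the other must have one too.
    """
    has_kind = bool(item.get(P_TYPE_KIND))
    has_taxon = bool(item.get(P_REFERS_TO_TAXON))
    if has_kind and not has_taxon:
        return P_REFERS_TO_TAXON
    if has_taxon and not has_kind:
        return P_TYPE_KIND
    return None
```

Because each property holds a list, the same specimen could carry several values, which matters if one specimen serves as the type of more than one name.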

The "found in taxon" of NHM 2011.2080 (Q54854611) is a matter of taxonomic judgement, and should be referenced. One taxonomist may refer this to one taxon, while another refers this to another taxon. Also, in principle there is no upper limit to the number of names that a particular specimen can be the type of, so "holotype of ..." is not a handy label (potentially leading to labels like "holotype of ... and holotype of ..., and lectotype of ..."). Specimens are usually referred to by a collection number, in a particular institution. - Brya (talk) 07:31, 1 June 2019 (UTC)
In any case "found in taxon" is suited neither to our current discussion nor to the example linked, as it is another concept. This is why I am asking for a new property "taxon to which this element refers". As for the name, I agree that it is (arbitrary here) not adequate; it is the specimen (NHM 2011.2080) quoted in https://doi.org/10.5852/ejt.2018.431 . Christian Ferrer (talk) 08:36, 1 June 2019 (UTC)
Still, types do not refer to taxa, but belong to names. - Brya (talk) 13:24, 1 June 2019 (UTC)
One can not say that I have not tried. I think I'm done. Christian Ferrer (talk) 14:15, 1 June 2019 (UTC)
In fact the property constraints of inventory number (P217) are not well done (or I missed a specific goal): currently the constraints say "use inventory number (P217) only if you also use collection (P195) (item requires statement constraint)" and "use collection (P195) as a qualifier each time you use inventory number (P217) (mandatory qualifier constraint)", which is clearly redundant. We should choose to use inventory number (P217) as a qualifier or not, but not both IMO. What is best? Christian Ferrer (talk) 08:41, 27 July 2019 (UTC)
I think Wikidata:WikiProject sum of all paintings makes heavy use of this kind of modeling. Not sure why. --Succu (talk) 18:29, 27 July 2019 (UTC)
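
To see how the two constraints quoted above overlap, here is a minimal sketch that checks both on a simplified item. The item layout (a mapping from property ID to a list of statements, each with a `qualifiers` mapping) is an assumption made for this example and is not the real Wikibase JSON format; the function name is hypothetical as well.

```python
def constraint_violations(item):
    """Check the two constraints described above on a simplified item.

    `item` maps a property ID to a list of statements; each statement is
    a dict with a "value" and a "qualifiers" mapping (assumed layout).
    """
    violations = []
    inventory_statements = item.get("P217", [])
    # Item requires statement: P217 on the item implies a P195 statement.
    if inventory_statements and not item.get("P195"):
        violations.append("item requires statement: P195")
    # Mandatory qualifier: every P217 statement needs a P195 qualifier.
    for statement in inventory_statements:
        if not statement.get("qualifiers", {}).get("P195"):
            violations.append("mandatory qualifier: P195")
    return violations


# The holotype mentioned earlier in this thread, stored both ways at once.
example_item = {
    "P217": [{"value": "MNHN-IE-2013-10358",
              "qualifiers": {"P195": ["Q838691"]}}],
    "P195": [{"value": "Q838691", "qualifiers": {}}],
}
```

An item passes only when the collection is recorded twice, once as a statement and once as a qualifier, which is the duplication being questioned here.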
@Succu: I got an answer. Christian Ferrer (talk) 18:51, 27 July 2019 (UTC)

Report on Index Fungorum Wikidata Fungus Author Loader exercise

From December 2018 to May 2019 I started to carry out a small project to load author citation information of fungi from the Index Fungorum database. My original proposal to do this work is here on the Taxonomy Project discussion page. After adding author citation information for over 4800 fungi I stopped because user:Brya deleted a claim that I had created, invalidating the data structure assumptions which I needed. My unsuccessful discussion with Brya on how to continue is here. In case anyone is interested or able to benefit from my experience, I have now documented the work I did, starting here. My main conclusion is that the data structure rules for holding the author citation information are too complicated to be workable in a project like Wikidata, which does not have a central authority to enforce rigorous rules. I would be interested in any comments, and especially in any advice as to how this sort of operation could be carried out successfully. Strobilomyces (talk) 16:12, 25 June 2019 (UTC)

We should drop the assumption that „"taxon name" provides a value that can be used in a Wikipedia taxobox as the correct name of a taxon“. This POV is mainly enforced by Brya. In fact WD items tagged with taxon name (P225) and instance of (P31) are about names of all kinds and not only about the "correct" name applicable to a taxon.
The removal of P225 and related properties after Opinion 2430 (Q64006730) (see backlinks) was published this year omits more information than we can retrieve now. --Succu (talk) 20:18, 26 June 2019 (UTC)
Another example is Quercus multinervis (W.C.Cheng & T.Hong) Govaerts (1998) non Lesq. (1859) (Q6371819) (well known, P225 etc. removed) vs. Quercus multinervis Lesq. (1859) (Q61913992) (badly known, with P225). --Succu (talk) 19:07, 27 June 2019 (UTC)
The removal of P225 and related properties after Opinion 2430 (Q64006730) (see backlinks) was published this year omits a lot of disinformation that will now fortunately no longer confuse the user. And that never should have been included in the first place. - Brya (talk) 05:08, 28 June 2019 (UTC)
E.g. the name Anuraea longistyla (Q18609406) was published 1859 by Ludwig Karl Schmarda (Q114564) in Neue wirbellose Thiere beobachtet und gesammelt auf einer Reise um die Erde 1853 bis 1857 (Q64864016) at page 62. I doubt this information would „confuse the user“. Omitting it certainly will. --Succu (talk) 19:35, 28 June 2019 (UTC)
No, by definition no zoological name "Anuraea longistyla" was established in 1859 by Ludwig Karl Schmarda (Q114564) in Neue wirbellose Thiere beobachtet und gesammelt auf einer Reise um die Erde 1853 bis 1857 (Q64864016) at page 62. Schmarda published a Neue Turbellarien, Rotatorien und Anneliden beobachtet und gesammelt auf einer Reise um die Erde 1853 bis 1857 in which he mentioned an "Anuraea longistyla" but this only caused confusion, and finally appropriate measures were taken by the relevant international body. As a result, it is now clear that there is no such zoological name, and the reader should not be led to believe that there is. Database quality depends first and foremost on eliminating error. - Brya (talk) 04:50, 29 June 2019 (UTC)
I don't agree that there is no such zoological name. Rather it is a bad name and no useful taxon is associated with it. But the name still has to be remembered for nomenclatural purposes, and it is clearly a name and zoological. You are mis-using the words when you say "there is no such zoological name" instead of "this name is invalid or useless". Strobilomyces (talk) 16:45, 29 June 2019 (UTC)
Until the publication of Opinion 2430 (Q64006730) these names were available name (Q4827436). Now all names within the phylum Rotifera published before 1 January 2000 that are not placed on the lists List of Available Names in Zoology, Candidate Part Phylum Rotifera, genus-group names established before 1 January 2000 (Q64875973) and List of Available Names in Zoology, Candidate Part Phylum Rotifera, species-group names established before 1 January 2000 (Q64876016) are unavailable name (Q7882332). I don't think it's confusing to use nomenclatural status (P1135) with unavailable name (Q7882332) to annotate (qualify) taxon name (P225). All is fine then. If you like you can use reference has role (P6184) to distinguish between Neue wirbellose Thiere beobachtet und gesammelt auf einer Reise um die Erde 1853 bis 1857 (Q64864016) and Opinion 2430 (Q64006730). The result is an informative and queryable item. --Succu (talk) 18:22, 29 June 2019 (UTC)
@Strobilomyces. I am not sure it is relevant that you "don't agree that there is no such zoological name." By international agreement, sanctioned by the UN, etc, there is no such zoological name. There was a very extended (close to twenty years) and very careful procedure, before it was agreed to delete it. It is no longer there. Of course it is very regrettable that you don't agree, but it is like saying "I don't agree that the meter is a useful measure". There are many who believe the metric system is not useful but it does exist, nevertheless.
The international agreement does not say that the old names no longer exist, but rather that they are unavailable. It is not possible to delete a name (itself), only to delete it from a particular list. It is a matter for each database to decide whether the old names are still wanted in the given database, but anyway they exist and will still have to be recorded somewhere. The metric and imperial systems of units both exist and you are like someone who dislikes the imperial system and tries to say it does not exist in order to prevent it from being used. Strobilomyces (talk) 09:09, 3 July 2019 (UTC)
@Succu. Before the publication of Opinion 2430 (Q64006730) these names were species inquirenda (Q3766304), and a problem. Now, the situation has been cleared up, and these names stopped existing, from a nomenclatural point of view. Possibly, it may seem attractive to set "nomenclatural status: no value", as they have no nomenclatural status. But they are not zoological names.
Brya (talk) 04:58, 30 June 2019 (UTC)
The lists accepted with this Opinion contain species marked as species inquirenda (Q3766304) as available name (Q4827436). So this does not explain your removal. After your intervention we know about Gomphogaster areolatus (Q18610370) only that this item is an instance of species inquirenda (Q3766304)… --Succu (talk) 18:42, 30 June 2019 (UTC)
Yes, the lists do not eliminate all problems, but only some of them. They leave some species inquirenda for later taxonomists to puzzle over. And yes, all we know about Gomphogaster areolatus (Q18610370) is that this item is an instance of species inquirenda (Q3766304), and that it should not have been included in Wikidata in the first place. - Brya (talk) 04:15, 1 July 2019 (UTC)

I've made some improvements at Gomphogaster areolatus (Q18610370). … --Succu (talk) 21:29, 1 July 2019 (UTC)

Any further comment, Brya? --Succu (talk) 21:32, 20 July 2019 (UTC)

Recurring disagreements

I find it unfortunate that there is no environment (is there one?) where we can test different configurations, a place where we can create at will properties and ontologies.

No one is perfect and we can all have good ideas, and the best solution could be an aggregate of several ideas. Such a complex topic as biology deserves IMO a bit more than recurring disagreements or opinions on property proposals that are too decided and formed too quickly.

It would be great to have a project where we can build, test and improve different models.

Christian Ferrer (talk) 18:33, 29 June 2019 (UTC)

There is https://test.wikidata.org. But I don't think this will help. --Succu (talk) 18:51, 29 June 2019 (UTC)
Yes, I searched before writing this and I had already noticed it, but indeed it is not intended for such a purpose; it is for developers. Christian Ferrer (talk) 19:04, 29 June 2019 (UTC)
As far as I can see there is no SPARQL support, so testing „different models“ will not work. --Succu (talk) 19:23, 29 June 2019 (UTC)
Yes, I started to try to use this for development but I gave up because there is no SPARQL support. Perhaps we should request a SPARQL endpoint for the test Wikidata. Another difficulty I had was that all the P numbers for properties and all the Q numbers for items are different in the test database. Well, perhaps you can say that I was not programming in a general enough way, but it makes everything very inconvenient. Strobilomyces (talk) 21:04, 30 June 2019 (UTC)
Yes, I agree. Testing would be very helpful. - Brya (talk) 04:59, 30 June 2019 (UTC)

homonyms

I think we should formulate some questions (aka use case (Q613417)) first. Questions that our data model should be able to answer. A first one:
How to get a list of homonym (Q902085) that includes author(s), year(s) and place of the publication?
--Succu (talk) 19:33, 1 July 2019 (UTC)
To get this list, there are two possible solutions: each homonym has to be an instance of homonym (Q902085), or maybe better, a property "homonym of" (with reverse constraint) has to be created. And of course, as taxon author is a qualifier of taxon name and only possible on a taxon item, each homonym has to be an instance of taxon with a taxon name and an author. Or (so for the twentieth time), to be able to make this kind of query, the scientific names must be taken out of the taxon items, so that you can consider all names (accepted or not) at the same level. One or the other. Christian Ferrer (talk) 04:50, 2 July 2019 (UTC)
To be clearer and more concise, each of your homonym items has to be "instance of" taxon, + a property "taxon name" + qualifiers "taxon author" and "year" + a property "publication in..." and then a new property "homonym of" (with a reverse constraint), whether names are accepted or not. And it would work perfectly, and that's the only really viable solution, and the same principle can be applied to all the other properties that we need for taxonomy. Christian Ferrer (talk) 11:19, 2 July 2019 (UTC)
Actually, a list of "instance of: homonym" is very simple to produce. There are no such cases. - Brya (talk) 04:08, 3 July 2019 (UTC)
@Succu: I don't think I understand your question. In the present data model, homonym taxon items will all have taxon name (P225) set to the same value. You can make a SPARQL query using "HAVING(COUNT(?xxx)>1)" to find these sets of items, and use the qualifiers taxon author (P405), year of publication of scientific name for taxon (P574), etc. to find the authors, etc. I don't think homonyms should be identified with a property; that would be redundancy. But what sort of answer are you asking for? Strobilomyces (talk) 08:07, 3 July 2019 (UTC)
Plant names can legally be homonyms of animal names, so we have to support that case. Within plants/fungi etc., only one of the homonyms should be "available" for current use; the unavailable names may be legitimate or not (e.g. Agaricus politus Bolton is legitimate but not available). But I suppose there can be competing authorities which disagree as to which name is available. Anyway we have to have an item for each name and to include illegitimate and unavailable names (for instance for basionyms and replaced synonyms). Each name item needs to have a property which defines the name and according to the present data model, this is P225. Strobilomyces (talk) 08:37, 3 July 2019 (UTC)
At the moment only hemihomonym (Q36033662) are allowed to have a taxon name (P225) per Brya's POV. Usually taxon name (P225) etc. are removed from an item that represents a later homonym (Q17276484) or earlier homonym (Q21651662). later homonym (Q17276484) is used 970 times. Only twelve items provide P225. --Succu (talk) 18:01, 3 July 2019 (UTC)
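
The grouping idea in this comment can be sketched as follows. The SPARQL string is only an illustration of the GROUP BY / HAVING pattern: it has not been run against the query service and would need a LIMIT or a narrower pattern in practice. The Python function does the same grouping locally on rows that are assumed to have been fetched already; the function name and the row layout are hypothetical.

```python
from collections import defaultdict

# SPARQL counterpart of the grouping below; an untested sketch.
HOMONYM_QUERY = """
SELECT ?name (COUNT(?item) AS ?n) WHERE {
  ?item wdt:P225 ?name .
}
GROUP BY ?name
HAVING (COUNT(?item) > 1)
"""


def homonym_groups(rows):
    """Group (item_id, taxon_name, author, year) rows by taxon name.

    Returns only the names shared by more than one item, mirroring
    HAVING(COUNT(?item) > 1).
    """
    by_name = defaultdict(list)
    for item_id, name, author, year in rows:
        by_name[name].append((item_id, author, year))
    return {name: found for name, found in by_name.items() if len(found) > 1}
```

Grouping works only for items that actually carry a taxon name: an item without P225 contributes no row and silently drops out of the result.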
Could you say where the rule (that only hemihomonym (Q36033662) has a taxon name (P225)) is documented? If a taxon name item has no P225, I think it is useless for software. I don't see how you can work with that. Strobilomyces (talk) 20:00, 3 July 2019 (UTC)
Nowhere. And I agree: P225 should not be removed. Having (at least) year of publication of scientific name for taxon (P574) for items about later homonym (Q17276484) or earlier homonym (Q21651662) makes them more useful. --Succu (talk) 20:29, 3 July 2019 (UTC)
For me the worst problem is that for software it is absolutely necessary to have a way of knowing the (bad) name which the item is about, but I think that is missing. For instance in the case of Macrones Duméril (1856) non Newman (1841) (Q1092022), there is no way of knowing without more rules that the item is about the name "Macrones". If items for this group of fish were reloaded automatically, a duplicate item would be made for this sense of "Macrones" because the software would not recognise that this item already exists. I think that the complete scheme for dealing with these names (which I call the data model and which in this case is very complicated) should be written down and commented on before implementing it with the real data. I think it is strange that P460 and P1889 are qualifiers not separate properties, but that is at a lower level of importance. Strobilomyces (talk) 10:53, 5 July 2019 (UTC)
Yes, we need a new property for that, as has been recognized for a long time. - Brya (talk) 05:18, 6 July 2019 (UTC)
To model what?! „recognized“ by whom? --Succu (talk) 21:26, 6 July 2019 (UTC)
To represent something that exists in reality, and that is not rare. And recognized by me, here, publicly, and commented on by many. - Brya (talk) 03:51, 7 July 2019 (UTC)
Do you know any data-/knowledgebase working your way? I don't. --Succu (talk) 20:27, 8 July 2019 (UTC)
There is no data-/knowledgebase working like Wikidata. But taking account of the differences in database structure, Tropicos in principle does this. - Brya (talk) 03:54, 9 July 2019 (UTC)
Tropicos Web Services is a lame duck. --Succu (talk) 21:16, 10 July 2019 (UTC)
I have no idea how well Tropicos Web Services works. No doubt it is easier to service a Single-Point-of-View database. But Tropicos is a valuable/respected database. - Brya (talk) 04:42, 11 July 2019 (UTC)
Tropicos annotates names (=P225) with symbols !! = nom. cons., ! = Legitimate, ** = Invalid, *** = nom. rej., * = Illegitimate and keeps the rest like authors and publication. --Succu (talk) 20:01, 19 July 2019 (UTC)
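
As an illustration of annotating a name with a status marker rather than deleting the name, the symbol list given in this comment can be held in a small lookup table. This is a sketch, not a description of the Tropicos API: it assumes the symbol is written as a prefix of the name string, and `split_status` is a hypothetical helper. Longer symbols are tried first so that "***" is not mistaken for "*".

```python
# Symbols and meanings exactly as listed in the comment above.
STATUS_SYMBOLS = {
    "!!": "nom. cons.",
    "!": "Legitimate",
    "**": "Invalid",
    "***": "nom. rej.",
    "*": "Illegitimate",
}


def split_status(annotated_name):
    """Split an annotated name into (status, bare name).

    Assumes the symbol is a prefix of the string; returns a status of
    None when no known symbol is present.
    """
    # Try the longest symbols first so "***" wins over "**" and "*".
    for symbol in sorted(STATUS_SYMBOLS, key=len, reverse=True):
        if annotated_name.startswith(symbol):
            bare = annotated_name[len(symbol):].strip()
            return STATUS_SYMBOLS[symbol], bare
    return None, annotated_name.strip()
```

The name itself survives in every case; only the status attached to it changes.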
We need a property to enable the software to find the actual name, but it seems to me that the logical property to use is P225. In the section below I discuss the example of Cortinarius callochrous (1821) (Q49601374), for which the taxobox generated by Module:Taxobox will fail if the P225 claim of its "unavailable combination" basionym is deleted. A different property should be found or created to let dead names be bypassed. Strobilomyces (talk) 17:57, 13 July 2019 (UTC)
The most obvious solution would seem to accept at least one new property. But there may be any number of solutions that can be made to work. - Brya (talk) 06:40, 14 July 2019 (UTC)
What should be expressed with your new fictitious property? Are there any constraints you have in mind? --Succu (talk)
@Brya? --Succu (talk) 21:39, 20 July 2019 (UTC)

forward

Both later homonym (Q17276484) and earlier homonym (Q21651662) represent a breakdown in the naming of taxa. That is, they mean that something went wrong, leaving a broken down entity that does not function, in the naming of a taxon. In general there is no reason to refer to it, and mostly it would be better if Wikidata did not have an item for it. In some cases a later homonym (Q17276484) or earlier homonym (Q21651662) may be notable, but that does not alter the fact that the most important thing about it is that it does not function in naming a taxon. - Brya (talk) 04:39, 4 July 2019 (UTC)
But the breakdown in the naming of taxa shouldn't and needn't imply a breakdown in the working of Wikidata, but that is what has happened. You agree that sometimes Wikidata has to have items for the bad names and for me that means that there needs to be a data structure which covers all the cases and for which the specification is publicly available. Otherwise it is impossible to write software which loads or interprets the information, but that is what Wikidata is for. It would help a lot to give a list of what values of instance of (P31) are possible for all the (maybe invalid) taxon name items, and for each possibility how the (possibly invalid) name of the item is found in the properties, and what minimal set of properties is needed to make a valid item of that P31 value.
Surely if we separate off items which are "later homonyms" and "earlier homonyms" using property instance of (P31), we should also separate off items introduced because they are "replaced synonyms" or "basionyms" in the same way? Anyway, I think the urgent requirement is to write down the whole scheme. I thought that the tutorial was fulfilling that role, but you are making changes which are not consistent with my interpretation of the tutorial. I suggest that we could make a new sub-page of Project Taxonomy and write a new specification starting from the list of P31 values that I proposed just above. Then we could propose and consider different versions. Strobilomyces (talk) 10:53, 5 July 2019 (UTC)
"But the breakdown in the naming of taxa shouldn't and needn't imply a breakdown in the working of Wikidata, " no, it need not. But there is a need for at least one new property if there is to be an appropriate structure for "a broken down entity that does not function, in the naming of a taxon."
        "It would help a lot to give a list of what values of instance of (P31) are possible for all the (maybe invalid) taxon name items," this depends on what is meant by this. Taken at face value, this is simple: any item for a (possibly) correct name of a taxon must hold an "instance of:taxon". However, presumably this question does not mean what it says (it contains the word "invalid" which by itself is guaranteed to cause confusion). For names that are not formally established (and which are not scientific names) "designation" (for algae, fungi, and plants) and "unavailable name" (in zoology) are used. For dead, ineligible names that are formally established names (and which are scientific names), but that can never be used as the correct name of a taxon, there are more. These include "later homonym", "earlier homonym", "nomen illegitimum", "unavailable combination", "unranked name", "nomen utique rejiciendum", "superfluous name". An "isonym" or an "orthographical variant" don't exist as names either, but are, more or less, imperfect reflections of an existing name.
        And no, we should not "also separate off items introduced because they are "replaced synonyms" or "basionyms"" because that does not mean that they cannot be perfectly correct names of taxa. Being "replaced synonyms" or "basionyms" of some other name may be completely irrelevant and non-notable. - Brya (talk) 05:18, 6 July 2019 (UTC)
Your list of types of dead name is useful, but you imply that it may not be complete. Please could you give a full list of all the types of dead names which could require items for them? By the way, higher on this page Succu gave a similar list which was: "later homonym", "nomen illegitimum", "unavailable for use", "nomen utique rejiciendum", "preoccupied name" (= "earlier homonym"??).
Can a name which was not formally established require its own item in Wikidata? If so, please could you give an example? Anyway I think such a name does not need to be considered for the author citation information.
I think "unavailable combination" only applies to a species name or below, but a higher rank name could be unavailable. Wouldn't it be better to call this "unavailable name"?
I believe that an unranked name can be a basionym and a valid name (example: Agaricus multiformis δ claricolor). In fact I think almost any of the types of name might also be unranked. Surely it would be better to drop this as a possible value for P31 and model this by allowing property taxon rank (P105) to have a value meaning "unranked"?
The category "superfluous name" worries me; it seems to me that at least in the legitimate superfluous case this may depend on a question of synonymy and so may depend not only on relatively clear nomenclatural decisions but also on more contentious taxonomical ones. A superfluous name may be illegitimate or not. It would be a great simplification to the data model if we could make these possible values of P31 mutually exclusive; otherwise we need to define what combinations are possible and define the necessary properties for all combinations.
A "nomen utique rejiciendum" seems to me to be also an "earlier homonym"; please can you explain the relation between these types?
The orthographical variants are a problem, but I don't think they should have separate items or P31 values. I think they either need a new property or a qualifier of taxon synonym (P1420) to include them properly.
Property instance of (P31) is very important for all users of Wikidata and it is a big disadvantage to make the definition of it complicated for everyone to understand. It would be a big benefit to be able to find all the name items related to the Taxonomy Project with a simple query. An alternative would be a system where all these name items have one out of only two P31 values: either the "taxon" item (as at present), or an item meaning "invalid or unavailable taxon name". In the latter case the reason why the taxon name is bad, such as "illegitimate name", "unavailable combination" etc. could then be given through a qualifier of the P31 property or through a separate property. These items are very similar from a data processing point of view and I think that that would be a great improvement. Would you consider changing to such a system? Strobilomyces (talk) 13:34, 6 July 2019 (UTC)
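A minimal sketch of the two-value scheme proposed in the comment above, assuming every name item carries a taxon name (P225), exactly one of two P31 values, and a reason attached only to ineligible names. The class, the constant strings, and the field names are hypothetical; the discussion does not fix any item or property for "ineligible scientific name" or for the reason.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholders: no Wikidata item is agreed for these in the discussion.
TAXON = "taxon"
INELIGIBLE_NAME = "ineligible scientific name"


@dataclass
class NameItem:
    taxon_name: str               # value of taxon name (P225)
    instance_of: str              # TAXON or INELIGIBLE_NAME
    reason: Optional[str] = None  # e.g. "later homonym"; only for ineligible names


def is_well_formed(item: NameItem) -> bool:
    """Check the proposed rule: one of two P31 values, and a reason
    if and only if the name is ineligible."""
    if item.instance_of == TAXON:
        return item.reason is None
    if item.instance_of == INELIGIBLE_NAME:
        return item.reason is not None
    return False


def usable_names(items):
    """Software that wants only potentially correct names filters on one value."""
    return [i.taxon_name for i in items if i.instance_of == TAXON]


items = [
    NameItem("Cortinarius callochrous", TAXON),
    NameItem("Agaricus callochrous", INELIGIBLE_NAME, reason="unavailable name"),
]
print(usable_names(items))
```

Under this assumption software would not need to track a growing list of P31 values: a new kind of dead name only adds a new reason value.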
Yes, lots of these don't merit items of their own. But mostly they were not created de novo, but they have been forced on Wikidata because a Wikipedia has a page on it. Remember that if any user makes an erroneous page on any project, it will end up here. Nothing we can do about that. There are a lot of projects that have erroneous pages, for fictitious taxa. Often errors peculiar to that particular project, and often caused by one of just a handful of users.
        The list may well be incomplete; Wikidata grows in its own way. An "unavailable combination" is a name below the rank of genus, and as far as I know unranked names are always combinations, and unavailable for use. An unavailable combination may serve as a basionym, and so may an unranked name, but they never can be correct names. As to "superfluous name", in theory you have a point, but in practice the term "superfluous name" is used only for illegitimate names. For legitimate names a phrase like "superfluous when published but correct for ..." will be used. A "nomen utique rejiciendum" can be just about anything, but not an "earlier homonym"; an earlier homonym comes into existence by conservation of a later name with the same spelling.
        The point of using one (or two) collective term(s) like "ineligible scientific name" and adding qualifiers bears thinking on. It is a variation on the question if subproperties are viable. - Brya (talk) 16:31, 6 July 2019 (UTC)
I can understand that new types of name might show up in the future, but I think we need a list of all which are known at present. In fact if these name types are P31 values, any software operating on these items will need to be kept up to date with the list. This is a strong argument for using P31 = "ineligible scientific name" for such items, since probably the software could treat all these name items in the same way (provided that they all have the same identifying property and other relevant information to the software is defined in a uniform way).
If the relevant superfluous names are illegitimate, please could we drop the categorization "superfluous name" and just use "nomen illegitimum" (where it may be illegitimate due to being superfluous or for some other reason)?
In order for software to handle these items (for instance in order to load author citation information automatically), we need a specification which lists these types of name item and defines how to find the property which identifies the name and the other relevant properties. It would be best if the same data structure were used in all cases. Can you say how we could advance with a proposal like that? Strobilomyces (talk) 12:19, 7 July 2019 (UTC)
Actually, I see no indication that P31 is of any importance to software. Clearly, the taxobox module here ignores P31, and looks only for P225: if P225 is present, the item will be presented as an accepted taxon. I see no reason to assume other software may not operate on the same principle. The key to avoid providing disinformation is to make sure P225 is not present.
        Changing "superfluous name" to "nomen illegitimum" will lose information, and once it is lost the user will not be able to retrieve it. - Brya (talk) 16:10, 7 July 2019 (UTC)
What kind of "information" would we lose here? I have an idea, but some examples from your side may be more helpful.--Succu (talk) 21:05, 10 July 2019 (UTC)
It would tell the reader only that a name is illegitimate, but would lose the information that this is because of Art. 52. To find out why the name is illegitimate, of several possible reasons, the reader would then have to go to other sources, if he knows how to use them. - Brya (talk) 05:01, 11 July 2019 (UTC)
At the moment superfluous name (Q29995613) has four usages. At the original proposal of nomenclatural status (P1135) you commented: „nomen superfluum (nom. superfl.): useless distinction? = nom. illeg.“ Looks like you changed your mind. :) --Succu (talk) 18:18, 11 July 2019 (UTC)
@Brya: I am not sure whether the taxobox module which you are referring to is w:en:Module:Taxobox. Please could you give an example of a page which actually uses the taxobox module which you refer to, whether it is that one or another one? Strobilomyces (talk) 09:38, 11 July 2019 (UTC)
The Wikidata module is here. IIRC it has been copied to some Wikipedias as well. By installing it as a gadget it will produce a taxobox on every item that uses P225. - Brya (talk) 11:01, 11 July 2019 (UTC)
(PS: use "edit" to access source code) - Brya (talk) 16:48, 11 July 2019 (UTC)
@Brya: Please could you tell me where I can find instructions as to how to install it as a gadget? There does not seem to be a publicly available gadget for this; do I have to write it myself? I think that a taxobox is just as applicable for an invalid taxon name item as for a valid one. Strobilomyces (talk) 08:59, 12 July 2019 (UTC)
They are linked to, on that page [4]. - Brya (talk) 16:24, 12 July 2019 (UTC)
@Brya Thank you, now I understand the gadget.
You say "Actually, I see no indication that P31 is of any importance to software", but it depends on the specification whether P31 is important and we do not have a clear or consistent specification. The software should follow the specification.
In fact Module:Taxobox is a good example of the reason why it is necessary to have a P225 claim also for ineligible names. There is no reason for it to take into account P31 - provided that P225 is present, it will also generate a taxobox for an ineligible scientific name, which is a logical and useful function which I for one would find useful. But anyway, deleting the P225 makes it completely impossible to write software to generate a taxobox for an ineligible scientific name. A reasonable design would make that possible, even if it is not wanted at present. After deleting the P225 claim the item is completely useless for any software, since the software will have no way of finding what name is being referred to.
And Module:Taxobox derives the author citation and so it suffers from the same problem as I have with my fungus author loader when the P225 claim is missing. For instance, Cortinarius callochrous (1821) (Q49601374) is an example of a taxon of which the basionym, Agaricus callochrous (1801) (Q63079874), is an unavailable name. The author citation in the taxobox of Cortinarius callochrous works fine at present because Agaricus callochrous has the P225 claim, but if that claim were deleted it would stop working (it would not generate the part of the citation in parentheses).
The practice of deleting the P225 property, which contains the primary key needed by software, is totally inappropriate from a data modelling point of view and it has broken Module:Taxobox as well as my author citation loading exercise. If we need to make ineligible scientific name items less prominent, we need to find a way of doing it that avoids this problem. Strobilomyces (talk) 17:38, 13 July 2019 (UTC)
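The dependency described in this comment can be sketched as follows. This is not the code of Module:Taxobox; it is a simplified illustration that assumes the author is stored with the P225 claim and that the basionym is linked through P566. The author strings are placeholders, not the real authors of these names.

```python
def author_citation(item_id, items):
    """Build 'Name (basionym author) combining author' from a dict of items.

    Authors are read from the P225 claim, so an item without P225
    contributes nothing to the citation.
    """
    item = items[item_id]
    p225 = item.get("P225")
    if p225 is None:
        return None  # no name to cite at all
    parts = [p225["name"]]
    # Follow the basionym link (P566) and read the author from its P225 claim.
    basionym_p225 = items.get(item.get("P566"), {}).get("P225")
    if basionym_p225 is not None:
        parts.append("(" + basionym_p225["author"] + ")")
    parts.append(p225["author"])
    return " ".join(parts)


# "AuthorA" and "AuthorB" are placeholders for illustration only.
items = {
    "Q49601374": {
        "P225": {"name": "Cortinarius callochrous", "author": "AuthorB"},
        "P566": "Q63079874",
    },
    "Q63079874": {
        "P225": {"name": "Agaricus callochrous", "author": "AuthorA"},
    },
}

print(author_citation("Q49601374", items))

# Same data after the P225 claim is removed from the basionym item.
stripped = dict(items, Q63079874={})
print(author_citation("Q49601374", stripped))
```

In the second call the parenthetical part of the citation is silently dropped, which is the failure mode described for the author loader.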
Well, anybody writing software may use whatever catches his eye, no matter what specification is drawn up by somebody. And for that matter, anybody can draw up his own specification. As to "it will also generate a taxobox for an ineligible scientific name, which is a logical [...] function": yes, that is exactly the problem. It will spread disinformation; uninformed users will find this disinformation "logical and useful" and will spread it, and very great harm will be done. There is nothing as damaging as logic being applied to the wrong material. Garbage in, garbage out.
        There is a huge difference between regarding names from a nomenclatural perspective and regarding taxa from a taxonomic perspective. From a nomenclatural perspective, names are formal entities and there are lots of kinds: many have little practical use and should get as little exposure as possible outside nomenclature. There are long-standing international agreements (ICZN, Q13011, ICNafp, Q693148, ICNP, Q743780, etc) as to what kind of name can be used under what circumstance, starting with what it takes to formally establish a name in the first place. There is no call to start making up stuff, violating Wikipedia's No Original Research policy.
        Taxa are groups of organisms, defined from a scientific, taxonomic viewpoint (named by applying the relevant Code of nomenclature). Taxoboxes present a classification of a group of organisms, from a particular scientific, taxonomic viewpoint.
        Databases out there tend to be 1) taxonomic, from a Single-Point-of-View, presenting correct names (from a particular taxonomic point of view) and synonyms. That is fine. Or, 2) nomenclatural, presenting names, no matter if these can be used for taxa. In itself, this is fine as well. The two should not be confused.
        The dominant feature of Wikidata is that it "acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, ...". That means that the most important thing for taxonomic-related data in Wikidata is that it must be able to generate taxoboxes for Wikipedia pages (and secondarily, also lists of taxon identifiers from other databases). Conversely this means that any Wikipedia page on a fake species (and there are many of these) must get an error report of some kind. Certainly, it should not be able to generate a taxobox for a fake species, giving credence to a myth. - Brya (talk) 05:20, 14 July 2019 (UTC)
@Brya "Well, anybody writing software may use whatever catches his eye, no matter what specification is drawn up by somebody. And for that matter, anybody can draw up his own specification." This is completely wrong; it is essential to have a rigorous specification in order to write software. If software developers act as you suggest, they will make conflicting assumptions and the whole system will suffer more and more errors. Having agreed that the names for one taxon/organism should have separate name items, it is necessary to use that system consistently; the users need to understand that not every name represents a real taxon. There is no disinformation there. Logic has to be applied based on the specification; the software will expose any errors in the logic.
We need to have a rigorous nomenclatural database which can contain any of the names, and then the taxonomic information should be built on top of that using the synonym properties etc. That is how Index Fungorum/Species Fungorum works and I think it is the only way that a nomenclatural + taxonomic database like Wikidata can work given the decision that every name needs a separate item.
Taxoboxes are applicable to names rather than taxa/organisms. A taxobox for an invalid name gives the hierarchy and author citation information for that invalid name, which is perfectly logical and appropriate. It does not say that that name represents an independent taxon - having agreed the structure of an item per name it is clear that the current name may be different. With the present data structure the correspondence needs to be represented through properties like P1420 or P694 and their inverses. The invalid name typically does clearly represent a known modern taxon/organism with its current name item; that item can be used to represent the taxon/organism. When you focus on invalid names you are only addressing a fraction of the problem anyway, since there are many obsolete/dead names which are actually valid.
At the nomenclatural level there are no fake species here, only invalid names or valid non-current names (the latter can also give the impression of extra species). The WP projects, which presumably want one page per taxon/organism, have the problem of deciding which names are synonyms and what is the current name (to use as page title), but Wikidata cannot solve that for them and eliminating invalid names is only a tiny part of that problem. Assuming that Wikidata contains the information as to which names are invalid, it must be possible to change the taxobox module to indicate that - that would be much better than having it fail, since the taxobox module is perfectly applicable to all name items, whether dead or not.
The invalid name items do not give misinformation, and when you delete the P225 claims from them it makes them useless for all software which may need to use them. A solution to make them less prominent can be found, but it needs a rigorous specification to be defined beforehand for all the software. You have broken the taxonomy part of Wikidata, as illustrated also by the section on GBIF below. Strobilomyces (talk) 12:35, 17 July 2019 (UTC)
+1; "That is how Index Fungorum/Species Fungorum works": no, this is how almost all (every?) taxon databases work... and Brya's answer, that is to say "we are not a single point of view database", is unsatisfactory for several reasons. 1/ It is easy to consider all names to be taxon items and to indicate their status within a specific source (where sources give contradictory information you put several values, and maybe we should have a property "name status" which allows us to state the status of the name). 2/ If we are the only database to do that (to consider different classes of names), we tend to become precisely this "single point of view database", as nobody else is really doing that. 3/ If we consider that a taxon item is the record of a concept (false, unacceptable or valid), then there are no issues at all.... 4/ I don't see how removing "instance of" "taxon" is helpful in any way in making us a good "multiple point of view database". And 5/ (sorry to be a little direct) this looks much more like Brya's single point of view. Christian Ferrer (talk) 17:49, 17 July 2019 (UTC)
+1 @Brya: In 2013 there were a lot of slogans like „dominant feature of Wikidata is“ to get WD started. It worked and our project is imported far beyond Wikimedia. I think the GBIF integration of WD is of similar importance to that for Virtual International Authority File (Q54919). The restriction of taxon name (P225) the way you like it makes the interoperability between databases and papers nearly impossible. --Succu (talk) 19:59, 17 July 2019 (UTC)
The main page states that the dominant feature of Wikidata is that it "acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, ...". And in fact many Wikipedia's are indeed importing data from Wikidata and are debating importing more data. So, I see no reason to ignore this. As far as I know all Wikipedia's are using taxoboxes, and clearly these are understood to provide taxonomic information, the classification of the taxon treated on that Wikipedia page. The statement "Taxoboxes are applicable to names rather than taxa/organisms." is completely divorced from the reality that exists in Wikipedia.
        Of course it is possible to build a strictly nomenclatural database, but then it must be kept in mind that this requires that it is separated from the taxonomic use. The first step to building a strictly nomenclatural database is to stop using P225 and to create a new separate property. The moment that a field for scientific name in a nomenclatural database is also used for the correct name of a taxon is the moment that it stops being a strictly nomenclatural database. Personally, I find it undesirable to build a strictly nomenclatural database within Wikidata, as 1) there are already strictly nomenclatural databases out there, like Index Fungorum and just copying them is a copyright violation, and 2) strictly nomenclatural databases are full of junk, that is best forgotten, and that clearly is not notable. - Brya (talk) 04:47, 18 July 2019 (UTC)
I have not suggested that Wikidata should only be nomenclatural; it should hold both nomenclatural and taxonomic information. For the nomenclatural information it should have an item for each name - that is agreed - and the taxonomic information should be added using the same items with extra properties such as P1420. The nomenclatural and taxonomic parts should be integrated together; that is the data structure we have (as described in the Wikidata:WikiProject_Taxonomy/Tutorial) and I think that is the only sensible possibility. The Wikipedia users naturally want one page per taxon/organism, but since there is one item per name in Wikidata, and there may be several names, valid or invalid, the Wikipedia user has to select one of the names for the taxobox and the title of the page. That is the responsibility of the Wikipedia user, who should know which names are current, and Wikidata cannot help much with this process. The taxoboxes do not currently provide much taxonomic information; mostly they show name-based information. They could be enhanced to show if the name is legitimate, or (if the information were in Wikidata) whether the given name is current according to particular taxonomic viewpoints. Eliminating illegitimate names would not help much; it would still be necessary to have the taxobox work for alternative valid names for the same taxon/organism and it is clear that taxoboxes are applicable to names rather than taxa/organisms. This is in line with the reality of Wikipedia; the Wikipedian has to choose one of the names.
So your last paragraph is not relevant; we need a combined nomenclatural and taxonomic database. Information should only be copied from Index Fungorum if there is a reason, but we need to provide the data structure which could accommodate any of the names if needed. A name property is needed for all the name items and much the best solution is the current one, to use P225. It is not the most important priority to prevent the users from being able to select invalid names; they are responsible anyway for making a selection and Wikidata cannot do much to eliminate wrong names. A higher priority is that the taxonomy/nomenclature part of Wikidata should have a consistent data structure, which is violated with the deletion of P225 claims. Regarding the taxobox, the best way to help the user avoid bad names would be to enhance the taxobox module to indicate the nomenclatural status, for instance using nomenclatural status (P1135) or P31. It would be necessary to agree the exact data structure and update the data and the change would apply to other software. Strobilomyces (talk) 10:34, 18 July 2019 (UTC)
@Christian Ferrer: Wikidata is unique among databases in that it is intended to feed into other Wikimedia projects. Other databases have other purposes. Pragmatically speaking the problem is that users go to an existing database and want to copy its content into Wikidata one-on-one, without taking into account the differences. As I pointed out, most databases out there tend to be 1) from a Single-Point-of-View, presenting correct names (from a particular taxonomic point of view) and synonyms. Or, 2) nomenclatural, presenting names, no matter if these can be used for taxa. In itself there is no objection to using such databases as sources, as long as proper care is taken to adjust the content to proper formats. When copying stuff from WoRMS the format Genus (Subgenus) specific name must be adjusted to Genus specific name. So also with more fundamental aspects.
        Wikipedia's subscribe to NPoV, Neutral Point of View, and anyway different Wikipedia's subscribe to different taxonomic points of view. Therefore, either Wikidata must accommodate NPoV, or abandon its link to Wikipedia's. And it is not true that there are no other databases that do something similar, Tropicos does aim to do the same, and has done so for a long time. - Brya (talk) 05:26, 18 July 2019 (UTC)
"...is intended to feed into other Wikimedia projects" we are not here to filter infos, otherwise we are at the opposite of NPV, this have to be done with querries.
To create an item "taxon" for an invalid name, and then to find a way to retrieve the information that "this name is invalid within this source", is absolutely not against a neutral point of view. And as pointed out by Strobilomyces, Wikipedias can easily avoid such invalid names. A quick example here, made by me who is not a specialist: c:Goniasteridae, where the simple clause "filter not exists {?item wdt:P31 wd:Q1040689.}" excludes all items that are instances of synonym. This can easily be extended with other selected values if needed.
It's an understatement to say that there is no consensus to remove taxon name (P225). At this point it would be legitimate for us to reinstate those statements, and then to ask for further help (administrators' help) if you continue to work alone toward an opposite goal. Christian Ferrer (talk) 11:28, 18 July 2019 (UTC)
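The filter quoted in this comment can be placed in a complete query along the following lines. This is only a sketch: it builds the SPARQL text without sending it anywhere, the function name is hypothetical, the parent identifier "Q0000" is a placeholder to be replaced by the item of the family of interest, and selecting children through parent taxon (P171) is an assumption rather than the query actually used on the Commons page.

```python
def build_query(parent_qid, excluded_classes=("Q1040689",)):
    """Return SPARQL text listing child name items of a parent taxon,
    skipping items that are an instance of any excluded class."""
    filters = "\n".join(
        f"  FILTER NOT EXISTS {{ ?item wdt:P31 wd:{qid} . }}"
        for qid in excluded_classes
    )
    return (
        "SELECT ?item ?name WHERE {\n"
        f"  ?item wdt:P171 wd:{parent_qid} ;\n"
        "        wdt:P225 ?name .\n"
        f"{filters}\n"
        "}"
    )


# "Q0000" is a placeholder, not a real item.
query = build_query("Q0000")
print(query)
```

Further classes, for example later homonym (Q17276484), can be passed in `excluded_classes` to add one filter line each.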
@Brya: „Wikidata is unique among databases in that it is intended to feed into other Wikimedia projects“. But not limited to it. And somehow downgraded from the ontology approach to a link container (my POV). --Succu (talk) 20:44, 18 July 2019 (UTC)
@Strobilomyces: it is really odd to see you write: "The nomenclatural and taxonomic parts should be integrated together" while every time this comes up you fight this tooth and nail, indicating that you want to promote nomenclatural entities to become taxa, and even things that do not exist from a nomenclatural perspective to become taxa as well. Any attempt to start a discussion on how to integrate nomenclatural and taxonomic parts, without confusing the two, is quickly aborted.
        Of course it is true that any user who creates a page in a Wikipedia, or makes an edit in a Wikipedia, is responsible for the content of this page or edit. But that does not mean that a user who makes an edit in Wikidata is not responsible for his edit, or for the errors this causes in a Wikipedia or for the errors it supports in a Wikipedia. In practice, Wikipedia's are known for having many errors, and there are at least hundreds, but more likely thousands, or tens of thousands of fake species, which only exist digitally. It is very wrong for Wikidata to support these fake species with fake data. - Brya (talk) 03:36, 19 July 2019 (UTC)
@Brya: I certainly think that the nomenclatural and taxonomic parts should be integrated together, and I said above how I think this should work. I do not want to promote nomenclatural entities to become taxa, but I think it is agreed that one taxon/organism can have several names which each require their own items; no-one should think that all the name items represent separate taxa. Using P31 = "taxon" doesn't affect this - we can't expect the labels to be as literal as I think you would like, and I suggest that the label should mean "taxon or name intended for one". In my opinion it would be useful if you could propose a complete specification of a data structure which would solve all issues for you. Presumably it is impractical to move to a model where there is one item for a taxon/organism and separate items for all its names, though that would be logical; therefore we have to accept multiple name items for one taxon/organism. The names for a given taxon/organism need to be linked together through P1420, P566, P694, etc. Use of P1420 (and its inverses) indicates the current name (according to a given taxonomy view which should be referenced) and I suppose conflicting taxonomies can be represented with multiple P1420 claims. Dead names can be indicated using nomenclatural status (P1135) and the taxobox module could be upgraded to show this conspicuously. I think it would be useful to have a new property "claimed as current name by" with the value being the organisation which supports the given taxonomy. So in that way the nomenclatural and taxonomic parts need not be confused.
Wikipedians should be working from a reference which gives their taxon names and so they should not normally be dead. I think the best help which can be hoped for is that Wikidata will show the defunct status of such names. I see that above you gave an example of a fake name from the Swedish Wikipedia, but that is a result of their automatic generation of millions of WP pages which leads to low quality and that problem does not apply to manual pages. They aren't fake data in WD, they are data which need to be understood correctly. The situation is complicated in reality and WD should not have a data model simpler than the reality; Wikidata cannot be the driver for the Wikipedians. The users need to have a certain level of understanding and I think that you exaggerate the difficulty which that causes. Strobilomyces (talk) 17:01, 19 July 2019 (UTC)
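The idea of representing conflicting taxonomies by several referenced taxon synonym (P1420) claims can be sketched as below. The names, the sources, and the tuple layout are invented placeholders; the sketch assumes each claim records the accepted name, the synonym, and the reference that supports that view.

```python
# Each tuple stands for one taxon synonym (P1420) claim:
# (item holding the claim = accepted name, claim value = synonym, reference).
# All values are invented placeholders.
claims = [
    ("Name A", "Name B", "Source 1"),
    ("Name B", "Name A", "Source 2"),
    ("Name A", "Name C", "Source 1"),
]


def current_names(name, claims):
    """Return {source: accepted name} for every source treating `name` as a synonym."""
    result = {}
    for accepted, synonym, source in claims:
        if synonym == name:
            result[source] = accepted
    return result


print(current_names("Name B", claims))
print(current_names("Name A", claims))
```

Here the two sources disagree about whether "Name A" or "Name B" is current, and both views are kept side by side rather than one being chosen.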
@Christian Ferrer: I think that "To create an item "circle" for a square, and then to find a way to retrieve that info that is "this square is a circle within this source" is absolutely [...] against" VER and NOR. - Brya (talk) 04:11, 19 July 2019 (UTC)
@Brya: I think you are overstretching metaphors like squaring the circle (Q193394) a little bit. --Succu (talk) 20:22, 19 July 2019 (UTC)
Well, I think it is understating things. - Brya (talk) 05:54, 20 July 2019 (UTC)

interleaving

@Strobilomyces: we agree on lots of things in principle, like
  • "Wikipedians should be working from a reference which gives their taxon names": yes, and their references should provide lots more information, as well. The reality is that very many of the pages on taxa (very likely an absolute majority) are created by bots. Svwiki (also cebwiki, warwiki) reports on every such page that it is a bot product, but other wiki's don't do this for their bot pages.
I hope it is OK that I interleave my comments with your points. There is no significant contention between us here, but I think that svwiki, cebwiki & warwiki are very special cases and massive automatic generation is not used much elsewhere. These projects are reaping the natural consequences of their policy of generating the pages automatically without consistent human review. Strobilomyces (talk) 15:22, 20 July 2019 (UTC)
In general, interleaving is bad and reduces readability, but I guess we could make an exception. It does not really matter if svwiki, cebwiki & warwiki were an exception: Wikidata still has to deal with them. But they are not an exception: nlwiki and viwiki do this as well, and maybe others. In addition, ptwiki and zhwiki have lots of entries that are incomprehensible. And there is nothing to prevent a user from manually creating a page on an ineligible name: there have been cases where this was done deliberately (and they still are there). - Brya (talk) 04:37, 21 July 2019 (UTC)
  • "it is agreed that one taxon/organism can have several names which each require their own items", yes, there may be different correct names for one taxon, depending on different taxonomic viewpoints, different frames of reference.
OK. Strobilomyces (talk) 15:22, 20 July 2019 (UTC)
  • "no-one should think that all the name items represent separate taxa.", yes, that applies to the name items. However, all the items with "instance of: taxon" or having a P225 statement should be (potential) taxa. That is, they should be correct names from a particular taxonomic viewpoint.
I agree that incorrect names could be distinguished by having "instance of:" equal to another value (not "taxon"), but all the items should have a property which defines the name which they refer to, and the simplest and best solution is to require the P225 claim in all cases. Much software (for instance for loading or downloading WD data, generating citation strings etc.) needs to refer to all name items including dead ones and it is a big disadvantage in terms of complication and clarity to use a different property if the item refers to a dead name. I do not think it is too much of a problem that users have to take another property into account in order to eliminate dead names. This is the main issue between us. Strobilomyces (talk) 15:22, 20 July 2019 (UTC)
Depends, "simplest" when it comes to generating author citation, maybe. But "simplest" when giving information on taxa and for creating error reports on ineligible names in Wikipedia's, no: it makes it hellishly complicated. - Brya (talk) 04:37, 21 July 2019 (UTC)
  • "The names for a given taxon/organism need to be linked together through P1420". If this means all the names that can be listed in synonymy, and for which there is an item in Wikidata, then yes. And this should now be possible. And very likely, we should be able to also list synonyms with datatype "string", as well. If you mean something different by "names for a given taxon/organism", then no.
I think items for dead or valid names should be created in WD only if there is a reason - if the name is notable or needed as a basionym etc. What I meant corresponds to what you say and I think we agree on this. Strobilomyces (talk) 15:22, 20 July 2019 (UTC)
  • "I think it would be useful to have a new property "claimed as current name by" " Yes, although this would be most useful if backed by taxonomic monographs, floras, etc., not by mere databases.
OK. Strobilomyces (talk) 15:22, 20 July 2019 (UTC)
  • "WD should not have a data model simpler than the reality", yes, that is what I have been saying all along. It is not what you propose. - Brya (talk) 05:54, 20 July 2019 (UTC)
I disagree with you in your last point too (that your view is simpler than my proposal). I will be away from WD for a few days after today. Strobilomyces (talk) 15:22, 20 July 2019 (UTC)
The issue is "simpler than the reality": your proposal is simpler than the nomenclatural/taxonomic reality. It eliminates a very real distinction. The issue is not who comes up with the simplest (most oversimplified) model. - Brya (talk) 04:37, 21 July 2019 (UTC)

nomen rejiciendum / nomen utique rejiciendum

Probably all of these cases have to be modeled per qualifier nomenclatural status (P1135) and should cite the according decision. --Succu (talk) 20:22, 8 July 2019 (UTC)

In fact I'm more interested in how we can model the relationship between a nomen conservandum (Q941227) and a nomen rejiciendum (Q17276482). --Succu (talk) 20:35, 8 July 2019 (UTC)

Yes, it would be helpful if an item for a nomen rejiciendum (Q17276482) had a clear way to indicate the nomen conservandum (Q941227) it is rejected against. - Brya (talk) 04:01, 9 July 2019 (UTC)
Any idea how this could be done? --Succu (talk) 20:58, 11 July 2019 (UTC)

One of the options would be to make nomen rejiciendum a subclass of "synonym", and then to use "instance of: nomen rejiciendum" with a qualifier "of :[nomen conservandum]". - Brya (talk) 16:31, 12 July 2019 (UTC)

For a nomen rejiciendum isn't the better phrase conserved against? --Succu (talk) 19:56, 13 July 2019 (UTC)
The phrase conserved against would be for use in an item for a nomen conservandum. I am assuming that in general it would not be worth having items for all nomina rejicienda (a lot of these are very obscure), so I am looking for a way to mark items for the nomina rejicienda that we do have.
        In essence, a nomen rejiciendum is a synonym of the nomen conservandum, or can be if the taxonomic situation calls for it. So, this is one way to model it. But a new property "rejected against" is also possible. The exact name will be more complex: "scientific name rejected against a conserved name"? - Brya (talk) 06:53, 14 July 2019 (UTC)

Using subject has role (P2868) is another option (see Aechmea (Q131754) / Hoiriri (Q65574171)). --Succu (talk) 20:37, 15 July 2019 (UTC)

But a very inelegant one, Aechmea is a prominent name (well worth protecting) and it is quite awkward to describe it as having a role for a name that nobody has heard of and that is best forgotten. This is a reversal of the real situation, in as far as Hoiriri is worth mentioning, it is as a name that threatened Aechmea. It looks to me that Wikidata would be better off without Hoiriri (Q65574171). - Brya (talk) 04:24, 16 July 2019 (UTC)
Within the decision(?) from 1906 (name it actor) both names had/played a role (=P2868). The role of Hoiriri (Q65574171) is that of a rejected name (thrown away). The role of Aechmea (Q131754) is that of a conserved name (more worthy). More (incomplete) examples are Alpinia (Q150572) and Collema (Q150980). --Succu (talk) 19:21, 17 July 2019 (UTC)
This "role" is not a separate, independent property: it exists in relationship to something else. Aechmea is a prominent name (well worth protecting) and it has been conserved. There might be any number of rejected names that a conserved name is protected from, and this number may be altered by adding or deleting rejected names, without any consequences for the conserved name. - Brya (talk) 04:24, 18 July 2019 (UTC)
Indeed: The process of conserving Malus domestica (Q18674606) and our modeling here is a good example. --Succu (talk) 19:51, 19 July 2019 (UTC)
Reverted twice with the comment „It is a conserved for a well-known taxon, not a conserved name of rejected names“ by you. What makes it a „well-known taxon“? --Succu (talk) 21:21, 20 July 2019 (UTC)
Apples are sold in what must be any vegetable shop and supermarket in the temperate regions of the world. That makes Malus domestica a well-known taxon. Very likely just about anybody over the age of eight years in Europe, the US, etc will recognize an apple.
        The relationship "subject has role: conserved name of [rejected name]" is a very wrong representation of the real relationship in this case. It is worse than using "subject has role: employer of [any employee]" in say the US Ministry of Defense, listing everybody who is employed there. The US Ministry of Defense at least is the employer of many people, although (in a description of the US Ministry of Defense) it is irrelevant exactly who. To a conserved name it makes no immediate difference what the rejected names are, and there need not be any rejected names at all. I doubt that "subject has role: " is appropriate for rejected name at all. - Brya (talk) 04:19, 21 July 2019 (UTC)
I think we do not talk about apple (Q89) (fruit). And clearly not about United States Department of Defense (Q11209). It's about trying to model the relationship of names. --Succu (talk) 21:25, 21 July 2019 (UTC)
Any modelling for a nomen conservandum that leaves out the taxon involved is really far out. There would never be any nomen conservandum, unless there is a well-known taxon involved. It is the very reason of its existence.
        And ignoring analogies is not a sign of strength either. - Brya (talk) 04:21, 22 July 2019 (UTC)
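The option Brya describes in this thread (marking the rejected name with "instance of: nomen rejiciendum" plus a qualifier pointing at the conserved name) can be sketched roughly as follows. This is only an illustration under assumptions: the `Statement` and `Item` classes and both helper functions are hypothetical, the qualifier is written as the plain string "of", and nothing here reflects how the live items are actually modelled. The item IDs are the ones quoted in the thread.

```python
from dataclasses import dataclass, field

# Item IDs are taken from the discussion above; the "of" qualifier layout is
# one suggestion from the thread, not an established Wikidata convention.
NOMEN_REJICIENDUM = "Q17276482"
AECHMEA = "Q131754"
HOIRIRI = "Q65574171"


@dataclass
class Statement:
    prop: str                      # e.g. "instance of"
    value: str                     # target item ID
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Item:
    qid: str
    label: str
    statements: list = field(default_factory=list)


def rejected_in_favour_of(item):
    """Return the conserved names this item is recorded as rejected against."""
    return [
        s.qualifiers["of"]
        for s in item.statements
        if s.prop == "instance of"
        and s.value == NOMEN_REJICIENDUM
        and "of" in s.qualifiers
    ]


def rejected_names_for(conserved_qid, items):
    """Derive the rejected names of a conserved name by scanning other items."""
    return [i.qid for i in items if conserved_qid in rejected_in_favour_of(i)]


# The conserved name carries no statement about the rejected name.
aechmea = Item(AECHMEA, "Aechmea")
hoiriri = Item(
    HOIRIRI,
    "Hoiriri",
    [Statement("instance of", NOMEN_REJICIENDUM, {"of": AECHMEA})],
)

print(rejected_in_favour_of(hoiriri))
print(rejected_names_for(AECHMEA, [aechmea, hoiriri]))
```

With the link stored only on the rejected name, rejected names can be added or removed without touching the item for the conserved name, which is the property Brya argues for; the "subject has role" option stores the link on the conserved name instead.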

nomen dubium

Should nomen dubium (Q922448) really be in there? As far as I'm aware, a nomen dubium is really either just an informal nomen rejiciendum or an untypified name (less commonly a name that should really have its type conserved). In any case, this is not an actual status under the ICBN (and quite possibly, not under any other code either). Circeus (talk) 06:18, 9 July 2019 (UTC)
A nomen rejiciendum is a formal status. But indeed, in the ICNafp a nomen dubium is not mentioned. I did not see it mentioned here, either. - Brya (talk) 16:39, 9 July 2019 (UTC)
Sorry, I wasn't clear, I meant it is currently one of the exclusively allowed target items for nomenclatural status (P1135). Circeus (talk) 20:54, 9 July 2019 (UTC)
Yes, the list of the exclusively allowed target items for nomenclatural status (P1135) looks rather haphazard. - Brya (talk) 04:40, 10 July 2019 (UTC)

But speaking pragmatically, nomen dubium is not too bad as a nomenclatural status. It is likely that there will be some publications that will label some names like that, so it does not really hurt to record that. - Brya (talk) 11:00, 10 July 2019 (UTC)

Please note Property_talk:P1135#Missing values. --Succu (talk) 21:09, 10 July 2019 (UTC)
Yes, given that a nomen dubium effectively does not relate to a taxon, it will work better to treat it as an ineligible name. - Brya (talk) 04:47, 11 July 2019 (UTC)

central question

It seems everybody goes to great lengths to avoid answering the central question: if Wikidata's "taxon name" is used for things other than the correct name of a taxon, what will prevent a user (at a Wikipedia, or elsewhere) from assuming that this other thing is a taxon? What error report will this user get when consulting a page on a completely fictitious species like [5]? - Brya (talk) 05:34, 18 July 2019 (UTC)

If they track them via Category:Wikipedia categories tracking data using Wikidata (Q21981953) something useless. For Albertia Regel & Schmalh. (1877) non Schimp. (1837) (Q19838818) you decided in 2015 - without giving any references - that this version should represent the knowledge about this genus. Early this year we got Albertia (Q61480486). I enhanced and corrected both items. A simple query can now provide the information that the first is a later homonym (Q17276484) of the latter. Of course this is a problem regarding principle of priority (Q2110868). Probably the first name is better known than the fossil? name. It's not up to us to judge cases like this. You did this for the genus and „on a completely fictitious species like“ Albertia margaritifera (Q19724746) (IPNI). --Succu (talk) 20:14, 18 July 2019 (UTC)
I see you took action so that the taxobox for a page on a completely fictitious species like [6] will not produce an error report. Camouflaging errors practically ensures that they will never be corrected. As far as I can tell neither of the names Albertia is anything like well known, and the later homonym is not even well known as a synonym. And indeed we should not judge cases like that: it was established a long time ago that the later Albertia is a later homonym and that there is no species Albertia margaritifera (and can never be). It is not up to us to ignore this well-documented reality and to create a fake species Albertia margaritifera.
        And as for my "decid[ing] in 2015 - without giving any references - that this version should represent the knowledge about this genus.", I have indicated time and time again that we should have a better structure to handle such cases, but none of my attempts to discuss this were taken up. - Brya (talk) 03:56, 19 July 2019 (UTC)
Brya, could please tell me where this error report was produced before my additions? --Succu (talk) 18:20, 19 July 2019 (UTC)
There would have been an error report if there would have been an attempt to use Wikidata in the taxobox; now there will not be. - Brya (talk) 05:11, 20 July 2019 (UTC)
So there wasn't any. But it is up to the Wikipedias (one of our consumers) to create a notification like this: "Hey you are using a nomen rejiciendum (Q17276482) for correct/valid name of a taxon. Is our article about correct/valid name of a taxon?" </sarcasm> --Succu (talk) 20:15, 20 July 2019 (UTC)
It is up to the Wikipedias not to create bot-made articles too foolishly and too hastily.... and even if we can keep this fact somewhere in our mind, that should not be our major concern IMO. It is up to them to avoid the item "instance of" "later homonym", and it is up to us to tag those items with "instance of" "later homonym" (or other relevant values) in addition to "instance of" "taxon". I see that Succu has reinstated the initial properties in Albertia Regel & Schmalh. (1877) non Schimp. (1837) (Q19838818), properties removed by Brya. @Brya I agree with that and I have put this item on my watchlist, you removed statements without consensus, and there is still no consensus (quite the opposite) right now. Do not remove those statements again, or I'm afraid that we will run into problems. If you don't manage to make us understand your point of view, then live with that and do not impose it on us. Christian Ferrer (talk) 21:13, 20 July 2019 (UTC)
"But it is up to the Wikipedias (one of our consumers) to create a notification like this: "Hey you are using a nomen rejiciendum (Q17276482) for correct/valid name of a taxon. Is our article about correct/valid name of a taxon?"". No, that is a wildly unlikely expectation, nothing like that is going to happen unless Wikipedias stop being publicly editable projects. If a Wikipedia has an entry on a fake species and Wikidata supports this with fake data then that Wikipedia's users will be fine with that. - Brya (talk) 04:53, 21 July 2019 (UTC)
Sorry, Eurygaster confidens (Q16981898) is a real hoax species. --Succu (talk) 21:38, 21 July 2019 (UTC)
So? - Brya (talk) 04:22, 22 July 2019 (UTC)
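The consumer-side check debated in this thread (a Wikipedia template noticing that an item is tagged as a later homonym or a rejected name) might look something like the sketch below. It is a rough illustration only: the dictionary layout, the function name `taxobox_warnings`, and the choice of which statuses count as ineligible are all assumptions, and the sample statement values are made up rather than copied from the live items.

```python
# Item IDs below come from the discussion; everything else is illustrative.
LATER_HOMONYM = "Q17276484"
NOMEN_REJICIENDUM = "Q17276482"

# Assumption: which statuses a consumer treats as "not a taxon" is its own choice.
INELIGIBLE_STATUSES = {LATER_HOMONYM, NOMEN_REJICIENDUM}


def taxobox_warnings(item):
    """Return warning strings for an item given as a plain dict.

    Expected keys (all hypothetical): "label", "instance_of",
    "nomenclatural_status"; the last two are lists of item IDs.
    """
    tagged = set(item.get("instance_of", [])) | set(
        item.get("nomenclatural_status", [])
    )
    return [
        f"{item['label']}: tagged {qid}, so it may not be the correct name of a taxon"
        for qid in sorted(tagged & INELIGIBLE_STATUSES)
    ]


# Made-up statement values, not a copy of the live items.
homonym_item = {
    "label": "Albertia Regel & Schmalh. (1877)",
    "instance_of": ["taxon", LATER_HOMONYM],
}
plain_item = {"label": "Eudorella hispida", "instance_of": ["taxon"]}

print(taxobox_warnings(homonym_item))
print(taxobox_warnings(plain_item))
```

Such a check only helps if the consuming project actually runs it, which is exactly the point in dispute above: an item that also carries "instance of: taxon" passes any consumer that does not look at the additional tag.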

correct name (Q3342920)

correct name (Q3342920) needs some improvements. --Succu (talk) 21:45, 13 July 2019 (UTC)

Added some stuff. High-level abstract stuff like that is hard to have a lot of statements on, though. Circeus (talk) 01:35, 14 July 2019 (UTC)
It should be a subclass of scientific name, not an instance of. If it’s an instance of something, it should be an instance of « scientific name type (or class)». Likewise for all types of scientific names.
I think it’s pretty easy to see if you take an example of correct name (Q3342920). You’ll see that it’s both an instance of correct name (Q3342920) and an instance of scientific name (Q10753560) … so the relationship between correct name (Q3342920) and scientific name (Q10753560) should be subclass of (P279) (by definition of « subclass of », if any instance of B is also an instance of A, then B is a subclass of A) author  TomT0m / talk page 07:37, 19 July 2019 (UTC)
@TomT0m: Sukhaya Rechka (Q1753560) is a river (Q4022). --Succu (talk) 20:06, 19 July 2019 (UTC)
Whoops, I meant scientific name (Q10753560), a « 0 » seems to have mysteriously disappeared /o\ author  TomT0m / talk page 08:38, 20 July 2019 (UTC)
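The rule TomT0m appeals to (if every instance of B is also an instance of A, then B is a subclass of A) can be written down as a plain set-inclusion test. The sketch below is only meant to make that reading concrete: the function name and the sample names are hypothetical, and on Wikidata itself subclass of (P279) is a statement that editors assert, not something computed from instances.

```python
def is_subclass_by_extension(instances_of_b, instances_of_a):
    """True when every instance of B is also an instance of A."""
    return set(instances_of_b) <= set(instances_of_a)


# Hypothetical extensions: every correct name is a scientific name,
# but not every scientific name is a correct name.
scientific_names = {"name 1", "name 2", "name 3"}
correct_names = {"name 1", "name 2"}

print(is_subclass_by_extension(correct_names, scientific_names))
print(is_subclass_by_extension(scientific_names, correct_names))
```

Note that the test is trivially true for a class with no instances, so it can only support a subclass claim, not establish one on its own.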

  Info correct name (Q3342920) and valid name (Q3342956) now use the same set of properties. --Succu (talk) 19:10, 27 July 2019 (UTC)

The related pair validly published name (Q17134993) / available name (Q4827436) is modeled differently. --Succu (talk) 19:18, 27 July 2019 (UTC)