Wikidata:Property proposal/ICZN code properties

ICZN code properties

Originally proposed at Wikidata:Property proposal/Natural science

   Not done
Data type: String
Domain: taxon
Example: tbd
Source: ICZN compliant database, scientific articles
Planned use: tbd
See also: scientific name
Motivation

There is a recurring debate on the naming of taxa on Wikipedia. Some very active people on the taxonomy project, namely Brya and Succu, are strongly committed to making data about taxa follow the ICZN code, which is fine. The problem is that they tend to reject anything else found in the literature as a mistake that should not appear in Wikidata. The opinion of the proposer is that it's not our job to qualify scientists' work as a mistake if it does not follow the ICZN rules, and that it's perfectly fine to import such data as long as it is published. The compromise this proposal aims for is to make clear which data are correct according to the ICZN, by making clear which properties are supposed to strictly follow the code, and to let "scientific name" be more relaxed and reflect the actual data, even if someone could consider it a mistake. It's a clarification, and a way for everyone to work the way they want with the code they want, while keeping the possibility of cross-referencing different data in Wikidata, in line with Wikidata being a database that stores tons of identifiers in all kinds of fields.

This proposal is a starting point and I'll let the experts discuss the best way to model the genus and so on according to ICZN, for example using a qualifier in a statement like "ICZN genus" of a property "ICZN name", or using an item for genera. I hope we'll end up with a result that satisfies everyone.

author  TomT0m / talk page 10:49, 28 January 2017 (UTC)

Discussion

Comment - As I pointed out on the other page, Wikidata and Wikispecies are not publications for the purposes of nomenclature as defined by the code. Therefore they cannot make nomenclatural decisions on the availability of names. To do so requires a valid nomenclatural act. All Wikidata and Wikispecies can do is follow current usage unless a valid nomenclatural act can be cited. If that current usage is confusing or even in error, a note of this can be made, but current usage must be followed in the end. I agree that for taxonomic names Wikimedia should follow the various nomenclatural codes; however, they must follow them completely, which includes realizing what you can and cannot do on a web publication that does not meet the requirements of the code for making nomenclatural decisions. Cheers Scott Thomson (Faendalimas) talk 11:22, 28 January 2017 (UTC)

@Faendalimas: Sorry, I don't get your point. How is this related to this proposal? author  TomT0m / talk page 11:35, 28 January 2017 (UTC)
I am making a comment on the motivations for the proposal. You mention the wish by some to follow the code explicitly and to the exclusion of other information that may or may not agree with the code. My point is that due to its publication model Wikimedia cannot make this distinction and must include current usage for a taxon, even if it is technically wrong. We cannot reject names that are wrong if they are in current usage. We cannot change the nomenclature of any organism to bring its nomenclature in line with the code. Cheers Scott Thomson (Faendalimas) talk 12:04, 28 January 2017 (UTC)
@Faendalimas: It seems we agree on this. My point with this proposal is to use this property only when the name is (very probably) in line with ICZN rules, to avoid ambiguities, not to prevent the use of other names, which is indeed impossible. The idea is to retain the other names in use in the existing taxon name properties. Does it seem to be a good idea? author  TomT0m / talk page 12:25, 28 January 2017 (UTC)
A point in general: this issue does not come up exclusively with the zoological Code, the ICZN. It happens also for the Code for algae, fungi and plants, the ICNafp and for the prokaryote Code, the ICNP. These have differing definitions for homonyms and the like, but, if we don't look too closely, these are compatible.
        The problem in general is that under these three Codes there are a lot of names (formal entities) that may not be used for a taxon, no matter what taxonomic position is adopted. Wikidata has items on some of these 'inoperable' names: 1) some of these names serve a functional purpose, 2) some of these names come from careless imports, and 3) some of these names are to some degree in use.
        This is separate from the issue that, depending on taxonomic rank, circumscription and position, there may be many names which can be applied to a taxon. - Brya (talk) 12:56, 28 January 2017 (UTC)
@TomT0m: I am fine with it. We just need to have a clear understanding of its usage, and in particular of what we can and cannot do with respect to the code. As @Brya points out, there are multiple codes, and they have different definitions and are independent of each other. We must at all costs refrain from being the source of a confusing nomenclature, most people who use these names have no understanding of how they are formed and governed. They do not need to. They will go to whatever source they can to get a name, including Wikimedia projects. If it is not the one in usage it can cause major issues and confusion. Cheers Scott Thomson (Faendalimas) talk 13:27, 28 January 2017 (UTC)
@Faendalimas, Brya: Then I guess we should add specific properties for the different nomenclature codes, and somehow make them "subproperties" of scientific name. People who just don't know can then use "scientific name", and experts would refine and validate (or not) by creating a (hopefully sourced) statement with the specific property. In SPARQL it's then possible to retrieve statements and their values for scientific names with a query as simple as
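# List every name property (P225 or any subproperty of it) together with its values.
SELECT ?nameprop ?name WHERE {
  ?nameprop wdt:P1647* wd:P225 .                    # P225 itself, or reachable from it via "subproperty of" (P1647)
  ?nameprop wikibase:directClaim ?namepropdirect .  # map the property entity to its direct-claim predicate
  ?item ?namepropdirect ?name .                     # any item that has a value for that property
} LIMIT 100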

, even excluding statements that use P225 itself rather than a property associated with a code, by replacing the "*" in the query with a "+". author  TomT0m / talk page 14:37, 28 January 2017 (UTC)
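The difference between the two path operators can be illustrated outside SPARQL. What follows is only a minimal sketch in Python, assuming a toy "subproperty of" table in which each property has at most one parent; `P_ICZN`, `P_ICN` and `P_OTHER` are hypothetical placeholders, not existing Wikidata properties.

```python
# Toy "subproperty of" (P1647) edges: child -> parent.
# P_ICZN, P_ICN and P_OTHER are hypothetical placeholders, not real properties.
SUBPROPERTY_OF = {
    "P_ICZN": "P225",
    "P_ICN": "P225",
    "P_OTHER": None,  # unrelated property with no parent in this toy table
}


def reaches(prop, target, allow_zero_steps):
    """Return True if `prop` reaches `target` by following subproperty edges.

    allow_zero_steps=True mimics the SPARQL path `P1647*` (zero or more steps);
    allow_zero_steps=False mimics `P1647+` (one or more steps).
    """
    if allow_zero_steps and prop == target:
        return True
    seen = set()
    current = SUBPROPERTY_OF.get(prop)
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)  # guard against accidental cycles
        current = SUBPROPERTY_OF.get(current)
    return False


ALL_PROPS = ["P225", "P_ICZN", "P_ICN", "P_OTHER"]
star = [p for p in ALL_PROPS if reaches(p, "P225", allow_zero_steps=True)]
plus = [p for p in ALL_PROPS if reaches(p, "P225", allow_zero_steps=False)]
print(star)  # ['P225', 'P_ICZN', 'P_ICN']
print(plus)  # ['P_ICZN', 'P_ICN']
```

With "*" the generic property P225 is part of the result; with "+" only the code-specific subproperties remain.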

@TomT0m: the information on the different Codes has long since been included, and is called up by every use of the Taxobox module. I just wanted to point out that it is not a matter of the ICZN only.
@Scott Thomson: in general terms I can agree with "We must at all costs refrain from being the source of a confusing nomenclature, most people who use these names have no understanding of how they are formed and governed." and "If it is not the one in usage it can cause major issues and confusion." However, for many taxa there are several names in use (in varying degrees). The one thing Wikidata must not be is a Checklist, a source for "right names". - Brya (talk) 16:09, 28 January 2017 (UTC)
@Brya: absolutely, we must never look like a checklist. We do not have that intent and must always appear as reviewers of the data, not creators of it. Yes, there are taxa with variable names; this is unfortunate, and these days it is often as much ego-based as due to genuine mistakes. The codes are not perfect; all we can do is report what we find as best as possible. Following the code as a guideline, while realizing we are in no position to act on it, is our best path. Cheers Scott Thomson (Faendalimas) talk 00:24, 29 January 2017 (UTC)

TomT0m, so you want to downgrade taxon name (P225) to a name string („A literal string of characters representing a taxon name. These may include authorships, rank indicators and other qualifiers, and concept qualifiers [...]“) as Richard L. Pyle (Q21340682) put it? --Succu (talk) 19:41, 28 January 2017 (UTC)

I want nothing. I don't really care if this is a string or something else coded with qualifiers and items, actually. author  TomT0m / talk page
To be blunt TomT0m, then shut up. --Succu (talk) 20:43, 28 January 2017 (UTC)
I would, if there were not so many things that felt wrong in the area of Wikidata, including the way you answer. Don't excuse yourself for being blunt, it's your usual behavior. author  TomT0m / talk page 09:55, 29 January 2017 (UTC)
I have said it before, so I will repeat it here: I do feel we need a new property, for scientific names that cannot be used as the correct name of a taxon. But there does not seem to be much support for it. - Brya (talk) 17:42, 29 January 2017 (UTC)
I am not disagreeing with anyone here. But I am extending the issue a bit. What people often do not get is that names are really mononomial, not binomial. We present them binomially and they can only be presented that way, i.e. Genus species. But that binomial is made up, in this case, of a genus and a species. These two names have their own references, their own synonymies. When I move a species from one genus to another I am not changing its name per se, I am changing its combination. However, the names are still their own entities. So the question to ask of the database is: are you going to list valid and invalid names, available and unavailable names? The valid names in the currently accepted combination form the correct binomen for a species. So how do you display these different names? How do we search for them? How does anyone know at a glance, with no knowledge of the Code, what type of name it is? @Brya says that he sees no reason for a new property; fine, I can support that. But does the current system clearly indicate the differences between the names? I would suggest highlighting the valid combination over others in some way. Is the current method effective and clear, without issues of confusion? If it is no problem then I agree with Brya. If it does have such issues, maybe some alterations are needed. So the question that needs to be answered is: can we list scientific names in a way that is clear on the type of name it is under the code? Cheers Scott Thomson (Faendalimas) talk 21:19, 30 January 2017 (UTC)
Under the ICNafp (algae, fungi, and plants) and ICNP (prokaryotes) it is crystal clear that the combination is the name (Lycopersicon lycopersicum is a scientific name and Solanum lycopersicum is a scientific name). The zoological Code is a little ambiguous (some internal contradictions?), but most provisions also state that the binomen is the scientific name, for example Article 5. Anyway, unless we are going to have separate properties for names, divided per Code, we have to treat all combinations as scientific names.
        What I said on new properties, is that I do feel we need a new property, for scientific names that cannot be used as the correct name of a taxon. Under every nomenclatural Code there is a hefty percentage of names that are part of the nomenclatural universe, but that when it comes to names for taxa are deadwood. Lots of people go into databases, copy these 'dead' names and promote them to names of taxa: a "miraculous duplication of taxa". We should mark them clearly. - Brya (talk) 05:27, 31 January 2017 (UTC)
@Faendalimas: Interesting point about "mononomial" names. What I hear is that we could totally reconstruct the binomial name from a mononomial one by following the ranking statements up to the genus. So you would support a mononomial name for genus items and mononomial names for species ones? Maybe that would indeed simplify things. author  TomT0m / talk page 07:25, 31 January 2017 (UTC)
Apologies @Brya, I read that the wrong way, but my argument is the same: if it is not clear, we need a new property; if it is, we do not. @TomT0m, sort of. You could in theory set it up with mononomials down to subgenus, but for species you need the binomen. The reason is that you can only figure out where a species name belongs by its inclusion with its genus. Species names on their own do not have to be unique, i.e. names such as marginata can apply to dozens of species, all in different genera. You could do this by listing the species in its original combination, then linking it to its current genus and forming the current binomen from that. Your last caveat would be the principle of coordination, where the original spelling may not be the same as the current spelling due to gender conflict between names. I think that could get a little complicated. So what you need is to be able to identify which names are valid and to be used, and which are not, with the option I guess of saying why. @Brya, I was not stating anything different about scientific names; I was referring to what a binomen is. All these names, whether they can be used or not, are scientific names. I was referring largely to binomens, which are the species-group scientific names. I use the ICZN code as an example because I work with it all the time; however, the arguments can be applied to any code, just the terminology changes. Cheers Scott Thomson (Faendalimas) talk 11:06, 31 January 2017 (UTC)
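The idea of storing a species in its original combination, linking it to its current genus and forming the current binomen from that can be sketched as follows. This is only an illustration in Python under stated assumptions: the `Species` record and both helper functions are hypothetical, the genus names Aus, Bus and Cus are placeholders, and gender agreement of the epithet is deliberately ignored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical record; not an existing Wikidata structure."""
    epithet: str         # species name on its own, e.g. "marginata"
    original_genus: str  # genus of the original combination
    current_genus: str   # genus the species is currently placed in


def binomen(sp, original=False):
    """Form the binomen; gender agreement of the epithet is NOT handled."""
    genus = sp.original_genus if original else sp.current_genus
    return f"{genus} {sp.epithet}"


def duplicate_binomina(species):
    """Return current binomina that are computed for more than one record."""
    counts = Counter(binomen(sp) for sp in species)
    return sorted(name for name, n in counts.items() if n > 1)


# Placeholder genera (Aus, Bus, Cus); the epithet alone is not unique.
records = [
    Species("marginata", "Aus", "Aus"),
    Species("marginata", "Bus", "Cus"),  # moved from Bus to Cus
    Species("marginata", "Cus", "Aus"),  # moved into Aus: clashes with the first
]

print(binomen(records[1], original=True))  # Bus marginata
print(binomen(records[1]))                 # Cus marginata
print(duplicate_binomina(records))         # ['Aus marginata']
```

The duplicate check is the kind of test a uniqueness constraint on computed binomina would have to run; it is a single pass over the records.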
@TomT0m: theoretically this is possible (there are databases that work this way, although these are much more limited in scope), but the second part of a binomen/binomial is not unique and not invariant.
@Scott Thomson: unlike Commons and Wikispecies, we do not aim to provide "right" names. We do store references and links to databases. By checking links to databases, it is possible to generate a list of taxa that are recognized by, say, NCBI. Once we have included lots of references, it should become possible, long term, to check these and to be more accurate: we can then create lists based on the literature.
        In the meantime we have names:
  1. that are in use by somebody
  2. that are not in use (although they could be used, if taxonomic insight changes), but that are here for some structural reason
  3. that may never be used, no matter what taxonomic viewpoint is adopted. A lot of these were imported by "enthusiastic" users (but a few of these serve some structural purpose).
Brya (talk) 11:58, 31 January 2017 (UTC)
@Brya: Uniqueness is not a problem at all, we can easily build a constraint that checks whether the computed binomial is unique. It would be more intensive in computing time, but nothing unsolvable. As for invariance, it's no problem in Wikidata thanks to claim ranking. author  TomT0m / talk page 14:35, 31 January 2017 (UTC)
Well, extra computer time seems like a bad thing to me. FWIW, binomials need not be unique, but a binomial need not be computed, as it is not possible to store the two parts apart. I do not see how claim ranking can help with the invariance. - Brya (talk) 17:48, 31 January 2017 (UTC)
@Brya: Extra computer time is not a big problem compared to maintenance problems. Machines are made to automate things, and sparing human volunteer work is what they are made for. To enter into the equation, the computing time has to be extra large, which I don't think is the case here; it will scale almost linearly with the number of taxa.
  • Apart from that, I'm confused. Need not or cannot?
  • "I do not see how claim ranking can help with the invariance" => we can store several values and put the most recent one with the preferred rank. We can put the invalid one with the deprecated rank. It's a common problem in Wikidata, nothing specific to taxonomy.
  • It is pointless to compute a result, if one already has that result
  • Yes, but if all these values are stored, there is no longer any conceivable advantage over the current model.
Brya (talk) 18:02, 1 February 2017 (UTC)
  • It's more robust to change of classifications. If a species change its genus, you just have to change the statement and deprecate the old one, no need to change the label in every language.
  • It's in line with usual Wikidata modelling. No need to create something specific when the generic is enough, it avoids learning the specifics everywhere.
  • "If a species change its genus, you just have to change the statement and deprecate the old one, no need to change the label in every language." Nope, as it is not invariant. "It's more robust to change of classifications." Any method, if done properly will be robust. If not done properly, it will not be robust.
  • The generic is not enough.
Brya (talk) 19:00, 1 February 2017 (UTC)
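The claim-ranking mechanism referred to in this exchange can be sketched briefly. Wikidata statements carry one of three ranks (preferred, normal, deprecated); the snippet below is a minimal Python illustration of picking the "best" values, in which the `Statement` class, the `best_values` helper and the genus names Bus and Cus are hypothetical placeholders rather than any real API or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for a ranked Wikidata statement."""
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def best_values(statements):
    """Return preferred values if any exist, otherwise the normal ones.

    Deprecated statements stay stored but are never returned.
    """
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s.value for s in statements if s.rank == "normal"]


# A species moved from placeholder genus Bus to placeholder genus Cus:
parent_genus = [
    Statement("Bus", rank="deprecated"),
    Statement("Cus", rank="preferred"),
]
print(best_values(parent_genus))  # ['Cus']
```

Both placements remain in the data, which is the point made on each side of the exchange: nothing is lost, but nothing is saved in storage either.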

TomT0m, I think it would be helpful to withdraw this proposal and to copy the discussion to another location. --Succu (talk) 18:52, 31 January 2017 (UTC)

I would second @Succu's comment here. I initially supported the concept, but the ensuing comments have brought up complexities. That said, I would like it to move forward. Whether in the end it is deemed necessary or unnecessary is not the point; I think some resolution on this issue is needed, after a discussion that includes how to deal with the many spelling, recombination and other similar issues that scientific names have. Cheers Scott Thomson (Faendalimas) talk 19:26, 31 January 2017 (UTC)
@Succu: It's a good place to discuss, and the discussion doesn't seem to be over. author  TomT0m / talk page 14:05, 1 February 2017 (UTC)
This is not a RfC, TomT0m, and I can't see a real property proposal. --Succu (talk) 21:56, 1 February 2017 (UTC)
OK, if this helps to close this "proposal":   Oppose --Succu (talk) 19:52, 3 February 2017 (UTC)