Wikidata talk:WikiProject Chemistry/Archive/2017


Outdated CAS RN

How should we best deal with outdated CAS registry numbers? Example: The CAS RN 99387-89-0 of Triflumizole (Q2519096) is outdated, i.e. SciFinder redirects it to 68694-11-1. --Leyo 21:46, 13 October 2016 (UTC)

Using ranks is one solution, but this creates constraint violations. The other option is to delete the old numbers. My preference is to delete the old CAS numbers, but this will be a problem for future data imports if the original data are not corrected. Snipre (talk) 11:21, 14 October 2016 (UTC)
A solution that causes constraint violations is not a good one. If there is really no other option than deletion, we should do this. --Leyo 12:25, 14 October 2016 (UTC)
No, I think the constraint system needs to take ranks into account. --Izno (talk) 19:49, 14 October 2016 (UTC)
How could this be implemented? --Leyo 08:46, 15 October 2016 (UTC)
@Snipre: The problem of outdated identifiers applies to many other identifiers too. At this moment I am not aware of public data sets with outdated identifiers, nor of databases that actually publish them. For the last few weeks I have been thinking of doing exactly that: extracting this info from the databases and publishing the data on Figshare (Q17013516). We could then have a script (bot?) regularly check Wikidata for outdated values. --Egon Willighagen (talk) 09:51, 1 January 2017 (UTC)
@Leyo, Izno: I started a discussion on Ivan_A._Krestinin's talk page. He is the one who takes care of the constraint checks. Snipre (talk) 09:21, 24 October 2016 (UTC)
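The periodic check Egon suggests could start from something like the Python sketch below. It assumes a table of outdated-to-current CAS RNs exists (here a hypothetical dict holding only the Triflumizole example from this thread) and also validates the standard CAS check digit. Both 99387-89-0 and 68694-11-1 pass the checksum, so format validation alone cannot catch outdated numbers; a mapping is needed.

```python
import re

# Hypothetical lookup of outdated -> current CAS RNs. A real table would come
# from a published data set; here it holds only the Triflumizole example.
OUTDATED_CAS = {"99387-89-0": "68694-11-1"}

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """Check the CAS RN format and check digit.

    The check digit is the sum of the other digits, each multiplied by its
    position counted from the right (1, 2, 3, ...), modulo 10.
    """
    match = CAS_PATTERN.match(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def review_cas(values: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (item, value, problem) rows for a {item: CAS RN} mapping."""
    report = []
    for item, cas in sorted(values.items()):
        if not cas_checksum_ok(cas):
            report.append((item, cas, "invalid format or check digit"))
        elif cas in OUTDATED_CAS:
            report.append((item, cas, f"outdated, replaced by {OUTDATED_CAS[cas]}"))
    return report


print(review_cas({"Q2519096": "99387-89-0"}))
# [('Q2519096', '99387-89-0', 'outdated, replaced by 68694-11-1')]
```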

EPA CompTox Dashboard identifiers

Hi all, @ChemConnector:, I have completed a first round of entering DSSTox substance ID (P3117) identifiers using QuickStatements (Q20084080) commands created with Bioclipse (Q1769726) (script: https://gist.github.com/egonw/1102a17fc319d0ac9950a97c3164d305) from CC0 (Q6938433) data on Figshare (Q17013516). I did this in batches of increasing size, and no major issues were observed. I did see a number of items without English labels (which I added), and there is a manageable number of violations, some of them due to problems in the dashboard rather than in Wikidata. Some I already fixed, e.g. by splitting chiral from achiral entities. There was also an item with two totally different InChIKeys (also fixed). In total, a bit over 36 thousand IDs were added (query). Given that there are about 150 thousand chemical compounds with an InChIKey in Wikidata (~25% of which now have a DSSTox ID) and about 700 thousand in the dashboard (~5% of which are now linked), there is a need to rerun this script regularly (which I will probably do manually; but feel free to run it yourself :). --Egon Willighagen (talk) 10:16, 1 January 2017 (UTC)
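The original workflow used a Bioclipse script (linked above). As a rough illustration of the same idea, here is a Python sketch that joins Wikidata and dashboard records on InChIKey and emits QuickStatements V1 lines (tab-separated, string values in double quotes). The input dicts and DSSTox IDs are hypothetical placeholders; `coverage_percent` only reproduces the percentages in the post.

```python
def quickstatements_for_dsstox(
    wikidata_inchikeys: dict[str, str],
    dashboard_ids: dict[str, str],
    already_linked: set[str],
) -> list[str]:
    """Build QuickStatements (V1, tab-separated) lines adding DSSTox substance ID (P3117).

    wikidata_inchikeys: {InChIKey: QID} for Wikidata items with an InChIKey
    dashboard_ids: {InChIKey: DSSTox substance ID} from the dashboard download
    already_linked: QIDs that already have a P3117 statement
    """
    lines = []
    for inchikey, qid in sorted(wikidata_inchikeys.items()):
        dtxsid = dashboard_ids.get(inchikey)
        if dtxsid is not None and qid not in already_linked:
            # String values are wrapped in double quotes in QuickStatements.
            lines.append(f'{qid}\tP3117\t"{dtxsid}"')
    return lines


def coverage_percent(matched: int, total: int) -> float:
    """Share of `total` records that received an ID, in percent."""
    return 100.0 * matched / total


# Rounded figures from the post above.
print(round(coverage_percent(36_000, 150_000)))  # 24
print(round(coverage_percent(36_000, 700_000)))  # 5
```

Matching on the exact InChIKey is deliberately strict: a chiral and an achiral entry have different keys, which is the kind of mix-up mentioned above.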

Content validation

Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
  Notified participants of WikiProject Chemistry. Following up on some of the things I have done in the past year or so, I have picked up something I started about a year ago: validating structural information, though primarily based on consistency. For example, I ran an experiment last night to check whether all canonical and isomeric SMILES give the same chemical formula. That is not the case, so I need some guidance on where best to put such output. At this moment, you can find it in this Gist: https://gist.github.com/egonw/dc09c36fe5cb963ceaae8fdad532c665 Yes, it's very consistent at that level! The script (in Bioclipse) also parses all SMILES, which I also did about a year ago, and 99.999% parse fine, but some do not. That report is currently in this Gist: https://gist.github.com/egonw/b8ca6377d2b85f825474a2866520f189 Should I put such reports on my user page or somewhere under this WikiProject? Further tests I plan to do include comparing the mass info, InChI, and InChIKey. Feedback, feature requests, etc. welcome! --Egon Willighagen (talk) 10:26, 1 January 2017 (UTC)
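The comparison step of this check can be sketched in Python, assuming each item's formula has already been computed from its canonical and from its isomeric SMILES by a cheminformatics toolkit (the original used Bioclipse). `parse_formula` and `inconsistent_items` are illustrative helpers, and the parser only handles flat formulas.

```python
import re
from collections import Counter

ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as 'C6H12O6' into element counts.

    Brackets, charges and hydrates are not supported; anything the pattern
    cannot consume raises ValueError.
    """
    counts: Counter = Counter()
    position = 0
    for match in ELEMENT_COUNT.finditer(formula):
        if match.start() != position:
            raise ValueError(f"cannot parse formula {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        position = match.end()
    if position != len(formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts


def inconsistent_items(rows):
    """rows: (qid, formula from canonical SMILES, formula from isomeric SMILES).

    Returns the QIDs whose two formulas differ, ignoring element order.
    """
    return [qid for qid, canonical, isomeric in rows
            if parse_formula(canonical) != parse_formula(isomeric)]


# Placeholder QIDs, not real items.
print(inconsistent_items([("Q_A", "C2H6O", "C2H6O"), ("Q_B", "C5H10", "C5H12")]))  # ['Q_B']
```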

NCBI hackathon Jan 9-11 with track on further integrating Wikidata and PubChem

At National Center for Biotechnology Information (Q24813517), a hackathon will take place Monday to Wednesday next week, and one of the projects being tackled there is to look at further integration between Wikidata and PubChem (Q278487). Suggestions welcome. --Daniel Mietchen (talk) 20:05, 7 January 2017 (UTC)

@Daniel Mietchen: Some problems I had with PubChem:
  • Some special duplicates like CID 6432049 and CID 9877645: both represent the same molecule, but once with an ionic bond and once with a covalent bond. This just leads to confusion. It should be possible to define scientific criteria to describe the correct bond of a molecule and to have a unique representation.
  • Then the section "Depositor-Supplied Synonyms" is often a mess where people add anything. First, it should be possible to constrain the addition of some identifiers like CAS, EINECS, ChEBI or ChEMBL to only one value per CID. Often people don't check stereoisomerism and mix data about different molecules. Something similar to our constraint reports for identifiers with unique-value and single-value constraints should be implemented in order to spot wrong data additions or definition problems. Snipre (talk) 23:29, 7 January 2017 (UTC)
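Snipre's suggestion mirrors Wikidata's single-value and unique-value constraint reports. A minimal Python sketch of that kind of report over (CID, identifier) pairs is below; "CAS-A" and "CAS-B" are placeholder values, not real CAS numbers for those CIDs.

```python
from collections import defaultdict


def constraint_report(pairs):
    """Check (record, identifier) pairs in the spirit of Wikidata's
    single-value and unique-value constraints.

    Returns two dicts:
      single_value: records carrying more than one value of the identifier
      unique_value: identifier values attached to more than one record
    """
    by_record = defaultdict(set)
    by_identifier = defaultdict(set)
    for record, identifier in pairs:
        by_record[record].add(identifier)
        by_identifier[identifier].add(record)
    single_value = {r: sorted(ids) for r, ids in by_record.items() if len(ids) > 1}
    unique_value = {i: sorted(recs) for i, recs in by_identifier.items() if len(recs) > 1}
    return single_value, unique_value


single, unique = constraint_report([
    ("CID 6432049", "CAS-A"),  # placeholder identifier values
    ("CID 9877645", "CAS-A"),
    ("CID 9877645", "CAS-B"),
])
print(single)  # {'CID 9877645': ['CAS-A', 'CAS-B']}
print(unique)  # {'CAS-A': ['CID 6432049', 'CID 9877645']}
```

A unique-value hit like the one above is exactly how an ionic/covalent duplicate pair would surface.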
@Daniel Mietchen: A last remark. I think PubChem should differentiate real compounds from mixtures of compounds. For example, you can have entries in PubChem about mixtures of stereoisomers, but you have no way to determine whether an entry is about a compound fully defined from the stereoisomerism point of view or about a mixture of partially defined or undefined compounds. In Wikidata we have a way to make this distinction by using instance of chemical compound or subclass of chemical compound. It would be a good improvement to be able to identify this situation clearly in PubChem and even to filter data sets according to this criterion. ChemSpider provides this kind of information by indicating whether the compound is fully defined or not from the stereoisomerism point of view. Snipre (talk) 01:05, 9 January 2017 (UTC)
Thanks, Snipre — will bring this up. --Daniel Mietchen (talk) 02:36, 9 January 2017 (UTC)
@Snipre, Daniel Mietchen: Snipre, I agree with most of your comments, especially the part about names/labels. Here, PubChem RDF does quite a nice job by mapping to the cheminformatics ontology. Regarding defined stereocenters, this info is usually also available in PubChem; you just need to scroll down to the table of compound properties. This data is also available through the APIs. Sebotic (talk) 19:09, 10 January 2017 (UTC)
@Daniel Mietchen: Anything that's food- or cosmetics-related in the NCBI DBs will be of interest for Open Food Facts (http://world.openfoodfacts.org) and Open Beauty Facts (http://world.openbeautyfacts.org). --Teolemon (talk) 07:35, 9 January 2017 (UTC)
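Following Sebotic's point that stereo information is exposed through the PubChem APIs, the classification Snipre asks for could be approximated as below. This Python sketch assumes the PubChem compound property names DefinedAtomStereoCount, UndefinedAtomStereoCount, DefinedBondStereoCount and UndefinedBondStereoCount; the mapping to instance of vs. subclass of chemical compound is a hypothetical rule, and stereo types not captured by these counts (e.g. axial chirality) are ignored.

```python
from dataclasses import dataclass


@dataclass
class StereoCounts:
    defined_atom: int
    undefined_atom: int
    defined_bond: int
    undefined_bond: int

    @classmethod
    def from_pubchem(cls, props: dict) -> "StereoCounts":
        """Build from a PubChem compound property record (missing keys count as 0)."""
        return cls(
            props.get("DefinedAtomStereoCount", 0),
            props.get("UndefinedAtomStereoCount", 0),
            props.get("DefinedBondStereoCount", 0),
            props.get("UndefinedBondStereoCount", 0),
        )


def stereo_status(c: StereoCounts) -> str:
    if c.undefined_atom == 0 and c.undefined_bond == 0:
        return "fully defined"  # includes structures with no stereo elements at all
    if c.defined_atom + c.defined_bond > 0:
        return "partially defined"
    return "undefined"


def suggested_modelling(c: StereoCounts) -> str:
    """Hypothetical rule: only fully defined structures become instances."""
    if stereo_status(c) == "fully defined":
        return "instance of (P31) chemical compound"
    return "subclass of (P279) chemical compound"


mixed = StereoCounts.from_pubchem({"DefinedAtomStereoCount": 1, "UndefinedAtomStereoCount": 1})
print(stereo_status(mixed), "->", suggested_modelling(mixed))
```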

Mapping to and from the English Wikipedia

Over at enwp's WikiProject Chemicals, there is an ongoing discussion about how to map and reconcile WP and WD info. Bonnie and Clyde issues keep popping up, particularly in relation to stereoisomers, mixtures and salts (e.g. search for "cis-(+)-vernolic acid" or "cis-(-)-vernolic acid"). --Daniel Mietchen (talk) 10:09, 3 February 2017 (UTC)

Comparison of Wikidata and Wikipedia content

I just wanted to post that here as well: I did a comparison of Wikidata chemical compound items and their corresponding English Wikipedia chemboxes and drugboxes. Please find the updated results here and engage in curation. Sebotic (talk) 01:44, 7 February 2017 (UTC)
  Notified participants of WikiProject Chemistry

I started to correct category C. Snipre (talk) 10:00, 7 February 2017 (UTC)

Maintenance categories

I am working inside en:Template:Chembox and en:Template:Drugbox (17k transclusions). -DePiep (talk) 01:55, 7 February 2017 (UTC)

Elements and periods

According to this query only magnesium (Q660) appears to be using part of (P361) period 3 (Q211331) while other elements use subclass of (P279) period 3 (Q211331). It looks like part of (P361) should be used according to Wikidata:WikiProject Chemistry/Tools? --Ricordisamoa 16:12, 12 January 2017 (UTC)

These elements are subclass "period 3 element" and part of "period 3". So the English labels would indicate that the correct relation is "subclass of". --Izno (talk) 17:01, 12 January 2017 (UTC)
While French labels indicate that the correct relation is "part of". --Infovarius (talk) 12:48, 13 January 2017 (UTC)
Hence why I qualified with the language. There seems to be about a 50-50 split in the item in question. :D --Izno (talk) 14:40, 13 January 2017 (UTC)
The problem is the definition of the item Q211331:
* If Q211331 is an instance of period, then the correct label is Period 3 and magnesium (Q660) is part of Q211331
* If Q211331 is a subclass of element, then the correct label is Period 3 element and magnesium (Q660) is an instance of Q211331.
So the question is: what is the concept of Q211331? Currently we have two concepts mixed in one item, and this is the origin of the problem.
If we choose Period 3 element, then we have a problem because we don't have an item for the concept Period 3 of the periodic table. I find it really strange to focus on the Period 3 element concept when we don't know what Period 3 is.
By definition, a Period 3 element is an element of the 3rd period. So what is the 3rd period of the periodic table? Having Period 3 element implies having two other concepts, element and Period 3. Do we really need three concepts when only two are sufficient? This reminds me of an old categorization problem in the French WP. If I have a category for biologist and another for American citizen, do I need a third category for American biologist? And what if I have a man who is a biologist, American, and died in 1970...? Snipre (talk) 16:36, 13 January 2017 (UTC)

toollabs:ptable is broken again after this edit. Please tell me the query I should use in the periodic table. --Ricordisamoa 18:19, 29 January 2017 (UTC)

@Ricordisamoa: An item can't be at the same time an instance of period 1 element and a subclass of period 1 element. I just deleted one of the two statements, the second one in the list, because the first one is often the more correct representation of the concept.
My problem with the concept of subclass of chemical element is that I don't know what an instance of chemical element would be. An isolated atom of hydrogen is not an instance of chemical element. Again, if I take the IUPAC definition of chemical element, I have two definitions, but both describe a species or a chemical substance, not a group of several species or a group of several chemical substances. Snipre (talk) 20:51, 29 January 2017 (UTC)
@ArthurPSmith: I ping you here because of your reverts of my previous deletions in hydrogen (Q556): I always try to consider people as smart, but when they act like a stupid boy, it is difficult to keep quiet. You reverted my deletion saying "used by wikidata periodic table - every other element has these subclass statements", but did you take even a small look at the item itself? Can you explain how a chemical element can be at the same time an instance of group 1 element and a subclass of group 1 element?
If some relations are nonsense, there is no reason to keep them even if someone outside WD is using them. So please do the work correctly, and if you assume that hydrogen (Q556) is a subclass of group 1 element, then delete the statement saying that it is an instance. Just to be clear, the stupid action is not reverting my deletion, it is creating nonsense when a logical situation is established. One logic can be replaced by another logic, but not by nonsense. Snipre (talk) 20:58, 10 February 2017 (UTC)
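The inconsistency Snipre points out (the same class used with both instance of (P31) and subclass of (P279) on one item) is easy to detect mechanically. A Python sketch over an in-memory copy of the claims; the data structure and the `Q_GROUP1_ELEMENT` placeholder are assumptions, not real item data.

```python
def conflicting_statements(items: dict[str, dict[str, set[str]]]) -> dict[str, list[str]]:
    """Return {item: classes} for items that are both instance of (P31)
    and subclass of (P279) the same class."""
    report = {}
    for qid, claims in items.items():
        overlap = claims.get("P31", set()) & claims.get("P279", set())
        if overlap:
            report[qid] = sorted(overlap)
    return report


# "Q_GROUP1_ELEMENT" is a placeholder, not the real QID of "group 1 element".
items = {
    "Q556": {"P31": {"Q11344", "Q_GROUP1_ELEMENT"}, "P279": {"Q_GROUP1_ELEMENT"}},
}
print(conflicting_statements(items))  # {'Q556': ['Q_GROUP1_ELEMENT']}
```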
@Ricordisamoa, Snipre: are you planning to "fix" every other element this way? I'm sure the app can be adjusted, but let's make these changes in a methodical and consistent way, not haphazardly, one at a time. Yes, the instance and subclass statements are seemingly inconsistent here. I've argued that myself. However, in general the "subclass" vs "instance" status of the elements is complicated by the fact that we think of them in several different ways (and different languages apparently have slightly different connotations for the term "element" according to my previous discussions on this with TomT0m). Hydrogen as an element to me represents all the different forms of hydrogen - the different isotopes, the molecule and the atom, the element as a portion of the chemical formulas of other molecules, etc. It is in a very real sense a "class" of entities - and in a way even "hydrogen atom" is a class of all the possible instances of a hydrogen atom. So the questions are a little subtle. I'm all for good arguments, but let's discuss before making random changes like this. ArthurPSmith (talk) 21:12, 10 February 2017 (UTC)
Snipre, I didn't realize you had previously been discussing this issue with Ricordisamoa here until after my above comment. However, I stand by the fact that at the least all the elements should be consistent on this. I think you have a good argument that the relationship should be "instance of" - so if we are all agreed let's make changes in the following order:
  • copy all subclass of (P279) "group X element" and "period Y element" statements to instance of (P31)
  • update wikidata periodic table (and anything else that may be dependent on this?) to use the "instance of" relationships
  • then remove all the subclass of (P279) statements.
I'm willing to help, or perhaps we can get a bot to do this? ArthurPSmith (talk) 21:20, 10 February 2017 (UTC)
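ArthurPSmith's three-step order can be sketched in Python on an in-memory copy of the claims (a real run would go through a bot framework against the live site). Step 2, updating the periodic table tool, happens between the two calls; `remove_p279` only deletes P279 values already mirrored in P31, so running it too early changes nothing. The item structure and the target IDs are hypothetical.

```python
import copy

Items = dict[str, dict[str, set[str]]]


def copy_p279_to_p31(items: Items, targets: set[str]) -> Items:
    """Step 1: for each P279 value in `targets`, add the same value under P31."""
    result = copy.deepcopy(items)
    for claims in result.values():
        moved = claims.get("P279", set()) & targets
        if moved:
            claims.setdefault("P31", set()).update(moved)
    return result


def remove_p279(items: Items, targets: set[str]) -> Items:
    """Step 3: drop P279 values in `targets`, but only where P31 already holds them."""
    result = copy.deepcopy(items)
    for claims in result.values():
        if "P279" in claims:
            claims["P279"] -= targets & claims.get("P31", set())
    return result


# Placeholder targets standing in for "group 1 element" and "period 1 element".
targets = {"G1_ELEMENT", "P1_ELEMENT"}
items = {"Q556": {"P31": {"Q11344"}, "P279": {"G1_ELEMENT", "P1_ELEMENT"}}}
migrated = remove_p279(copy_p279_to_p31(items, targets), targets)
print(migrated)
```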
I have no strong opinion on which property to use, but I'm sure there's no point in editing one piece of data at a time. Structuring Wikidata items in a predictable way is required in order to allow their effective reuse by outside consumers. Of course the periodic table is not a vital tool, so if you agree on a model and aren't afraid of breaking other uses, I shall be able to bot the things and update the tool within a few minutes. But please make sure to update all the relevant WikiProject pages. --Ricordisamoa 06:21, 11 February 2017 (UTC)
@Ricordisamoa, ArthurPSmith: I don't want to spend more time discussing instance/subclass, because this is useless until we have a discussion at the level of the community. Until then we can use what we want, but we have to be coherent: subclass or instance, not both. @Ricordisamoa: If you just come and take the data you want without being involved in the structural work of WD, you will have to change your code again and again. Ontology building should be based on rules, and WD doesn't even think about what kind of rules we have to choose. If you want to build robust code, you have to ask WD for a binding policy. I am not a specialist, but I haven't found anyone able to provide a clear description of the distinction and of the consequences of a choice.
Even if you decide to keep subclass of group 1 element and delete the instance of group 1 element, you will have a problem later when someone will try to solve the problem between subclass of group 1 element and instance of chemical element. And this difference is present in all elements I think. Snipre (talk) 21:27, 12 February 2017 (UTC)
@Snipre: I'm not sure what more of a community discussion you want or expect other than the one we are having right now. The other option which I think you suggested above was adjusting the meaning of "group 1 element" to be just "group 1" (as it is in some languages?) and using a part of (P361) statement. That would be a fine solution too - I know with human (Q5) the community came to a consensus that that should be the only instance of (P31) statement on an item, and any other aspects should be covered by separate properties, not instance of (P31). Do you have a strong preference? Should we ping this wiki project to get more discussion on this? ArthurPSmith (talk) 16:56, 13 February 2017 (UTC)
  • Oops only just saw this discussion, while I made some edits in this...
This problem stems from the enwiki article misnaming group 1 element, which is about group 1 (I tried a rename there years ago). Now the en:article has to open with a construct like "A Group 1 element is an element that belongs to group 1", how awkward and circular. Tellingly, at en:wiki there is no separate en:article for the class of "[Periodic table] group 1" -edited:- added #1. Quite simply, "group 1" is a class that elements belong to. Sure, then that element is a 'group 1 member', but that does not make 'group 1 element' a class (it is a reverse listing). (By analogy, the "FC Barcelona team" is not the same as "FC Barcelona team members".) So: an element is part of a group. -DePiep (talk) 18:59, 23 February 2017 (UTC)
-edited- for clarity. -DePiep (talk) 02:42, 25 February 2017 (UTC)

A proposition

I'm currently trying to use Wikidata element items, and I'd like to help on this; I'm a French high-school chemistry teacher. I'd really like to have feedback on the way to organize this data. Personally, I see this organization:

  • helium (Q560) is an instance of chemical element (Q11344), is part of group 18 (Q19609) and part of period 1 (Q191936).
  • group 18 is an instance of group (Q83306) and also an instance of main group (Q428830)
  • period 1 is an instance of period (Q101843)

Also, the properties series_ordinal, follows and followed by would be added as qualifiers to the chemical element statement only. Benjaminabel (talk) 15:32, 4 March 2017 (UTC)
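To make the proposal concrete, here is a Python sketch encoding it as triples, using only the QIDs given above plus instance of (P31) and part of (P361). With this model, finding an element's period or group is a lookup over part of, filtered by what the whole is an instance of; the series_ordinal/follows/followed by qualifiers are not modelled.

```python
# Benjaminabel's proposed model as (subject, property, object) triples.
P31, P361 = "P31", "P361"  # instance of, part of

TRIPLES = {
    ("Q560", P31, "Q11344"),      # helium: instance of chemical element
    ("Q560", P361, "Q19609"),     # helium: part of group 18
    ("Q560", P361, "Q191936"),    # helium: part of period 1
    ("Q19609", P31, "Q83306"),    # group 18: instance of group
    ("Q19609", P31, "Q428830"),   # group 18: instance of main group
    ("Q191936", P31, "Q101843"),  # period 1: instance of period
}


def objects(subject, prop, triples=TRIPLES):
    """All objects of `subject` for property `prop`."""
    return {o for s, p, o in triples if s == subject and p == prop}


def wholes_of_kind(element, kind, triples=TRIPLES):
    """Things `element` is part of (P361) that are instances (P31) of `kind`."""
    return {whole for whole in objects(element, P361, triples)
            if kind in objects(whole, P31, triples)}


print(wholes_of_kind("Q560", "Q101843"))  # {'Q191936'}: helium's period
print(wholes_of_kind("Q560", "Q83306"))   # {'Q19609'}: helium's group
```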

Sorry for my ignorance about Wikidata; I was thinking of instance in the OOP sense. If I understand the notion of instance in Wikidata correctly, an instance is something that really exists, and a concept is a class. So a chemical element, and even hydrogen or helium, are classes and not instances. Is there any place (maybe a GitHub repo) where we could define some kind of rules to classify this data? Benjaminabel (talk) 08:29, 6 March 2017 (UTC)

@Benjaminabel: The group of all atoms of helium defined as a chemical element exists. The problem with your definition is the following: if helium as a chemical element is a class, can you provide an example of an instance of that class? An atom or an isotope of helium is not an instance of chemical element. So since helium as a chemical element seems to be the last level of classification, it should be an instance.
For your question: no, WD doesn't provide a clear definition of instance/class. There is a proposal, but it was never accepted as a general rule, as very few people can handle that kind of definition and its consequences for the general classification in WD. Snipre (talk) 10:47, 6 March 2017 (UTC)
@Snipre: Thanks for your answer. I think the key difference between a chemical element and atoms or isotopes is scale. Isotopes and molecules exist at the microscopic scale, while a chemical element exists (and could be viewed as an instance) at the macroscopic scale. In France there is a distinction between chemical entity (entité chimique) at the microscopic scale and chemical species (espèce chimique) at the macroscopic scale. Most chemical experiments are done from a macroscopic point of view, and the heterogeneity of our substances at the microscopic scale is hidden behind the notion of chemical element. Could we provide this kind of distinction? For example, at the microscopic scale hydrogen atom is a class with the isotopes protium, deuterium ... as instances, while at the macroscopic scale it is an instance that could be part of molecules like water. But how do we treat the same item differently at different scales? Benjaminabel (talk) 22:01, 6 March 2017 (UTC)
@Benjaminabel, Snipre: it sounds like the real solution here is to have two distinct items, one for the "microscopic" (atom/molecule) and one for the "macroscopic" (substance - may be gas, liquid, solid, etc) - but we also need a good relation between them which I think requires a new property. ArthurPSmith (talk) 16:42, 7 March 2017 (UTC)
Pardon my ignorance, but how do we currently distinguish between (1) an element, (2) a subclass of atom defined by having the atomic number of that element, and (3) a substance made up of possibly-many such atoms from that subclass (just atoms of that element)? An instance of this substance is a *particular* physical object or lump of the substance, right? DavRosen (talk) 18:35, 7 March 2017 (UTC)
@DavRosen: I would think your (1) and (2) are the same? That's what "element" means to me at least. And yes, an instance of the substance would be a particular lump of the substance. For example if one wanted to talk about the supposed metallic hydrogen sample that recently disappeared, that would be an instance of the substance hydrogen (or perhaps a subclass, "metallic hydrogen", which may or may not really exist). ArthurPSmith (talk) 19:58, 7 March 2017 (UTC)
In fact, looking at hydrogen as a substance, we do have metallic hydrogen (Q428895) and dihydrogen (Q3027893), while for hydrogen as an atom or ion we have hydrogen atom (Q6643508), protium (Q15406064), hydron (Q506710), proton (Q2294) and the generic hydrogen (Q556) (not to mention deuterium (Q102296) and tritium (Q54389) which belong to the class isotope of hydrogen (Q466603)). I'm not sure this arrangement is entirely logical, but if there's anything missing I think there's a definite lack of general class for "hydrogen as a substance" that the first two could be subclasses of. ArthurPSmith (talk) 20:07, 7 March 2017 (UTC)
@ArthurPSmith: One thing that seems strange to me is that "hydrogen atom" (for example) does not inherit (and does not have) any of "hydrogen"'s properties like atomic number, electronegativity, antiparticle, oxidation states, etc. The only property connecting these two items (classes) is "manifestation of", which does not necessarily imply very much at all about the hydrogen atom, unless you know more specifically what is meant by "manifestation of". Shouldn't "hydrogen atom" be a subclass of "hydrogen"? If I have an individual hydrogen atom (instance of hydrogen atom), could we not also consider it to be an instance of "hydrogen"? If so then "hydrogen atom" should be a subclass of "hydrogen", and it would inherit all of those interesting properties, right? (apologies for lack of links -- I find it difficult to compose wikitext source with all those opaque { { Q: } } ) DavRosen (talk) 23:34, 7 March 2017 (UTC)
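DavRosen's argument relies on properties of a class carrying over to its subclasses. Wikidata itself does not propagate statements this way; the Python sketch below only shows how a consumer could do it, assuming the proposed statement hydrogen atom (Q6643508) subclass of hydrogen (Q556) and one illustrative property value.

```python
# Hypothetical data: the proposed statement "hydrogen atom (Q6643508) subclass of
# hydrogen (Q556)" and one illustrative class-level property value on Q556.
SUBCLASS_OF = {"Q6643508": {"Q556"}}
CLASS_PROPERTIES = {"Q556": {"atomic number": 1}}


def superclasses(cls, subclass_of):
    """All classes reachable from `cls` by following subclass-of links (cycle-safe)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def effective_properties(cls, subclass_of, class_properties):
    """Own property values plus those of all superclasses (own values win).

    If two superclasses disagree, which value is kept is arbitrary; a real
    consumer would need an explicit rule for that.
    """
    inherited = {}
    for ancestor in superclasses(cls, subclass_of):
        for key, value in class_properties.get(ancestor, {}).items():
            inherited.setdefault(key, value)
    return {**inherited, **class_properties.get(cls, {})}


print(effective_properties("Q6643508", SUBCLASS_OF, CLASS_PROPERTIES))  # {'atomic number': 1}
```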
@ArthurPSmith, DavRosen, Benjaminabel: Please be careful with the concepts behind the different items: for example, in the case of hydrogen atom (Q6643508), the correct description should really be "mathematical model of hydrogen atom" (see the WP articles to understand the concept of this item). This is a typical problem in WD: people use an item for a different concept than the initial one.
So before any modifications of the instance/sublass properties and of the relation between items, we have to define more clearly the concept of each item using a clearer description. That's the first point.
The problem we have is that WD was not built based on a structured classification, but mainly by creation of multiple concepts and now we are trying to link them together in a logical way. But this doesn't mean we have to respect the initial concepts and we can delete some of them.
For example, do we need items like isotope of hydrogen (Q466603)? For me this item is redundant because with the instance/subclass structure I can avoid it.
Classification 1
protium (Q15406064) is subclass of isotope of hydrogen (Q466603)
deuterium (Q102296) is subclass of isotope of hydrogen (Q466603)
tritium (Q54389) is subclass of isotope of hydrogen (Q466603)
isotope of hydrogen (Q466603) is subclass of hydrogen (Q556) and of isotope (Q25276)
Classification 2
protium (Q15406064) is subclass of hydrogen (Q556) and of isotope (Q25276)
deuterium (Q102296) is subclass of hydrogen (Q556) and of isotope (Q25276)
tritium (Q54389) is subclass of hydrogen (Q556) and of isotope (Q25276)
Wikidata collected items for interwiki purposes from different WPs with different structures, but the WD classification doesn't have to follow this unstructured classification; we should think from scratch, or at least feel free to delete or neglect some items in our classification. If we choose classification 2, for example, we can set isotope of hydrogen (Q466603) as an instance of interwiki and avoid using it in our classification.
Last point: do you know any existing ontologies about chemistry or chemicals? Perhaps before starting the huge task of creating something new, we can use an existing ontology or find inspiration in something existing. Snipre (talk) 11:03, 8 March 2017 (UTC)
The only example I found is this ontology, and perhaps we can find something in that paper, but I don't have access. Snipre (talk) 12:31, 8 March 2017 (UTC)
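Under transitive subclass reasoning the two classifications answer the question "what is a subclass of both hydrogen and isotope?" almost identically; the only difference is that classification 1 also returns isotope of hydrogen (Q466603) itself. A Python sketch using only the QIDs from the lists above:

```python
def ancestors(item, edges):
    """Transitive closure of subclass-of links starting from `item`."""
    seen, stack = set(), [item]
    while stack:
        for parent in edges.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subclasses_of_all(classes, edges):
    """Items that are (transitively) subclasses of every class in `classes`."""
    return {item for item in edges if classes <= ancestors(item, edges)}


HYDROGEN, ISOTOPE, ISOTOPE_OF_H = "Q556", "Q25276", "Q466603"
PROTIUM, DEUTERIUM, TRITIUM = "Q15406064", "Q102296", "Q54389"

CLASSIFICATION_1 = {
    PROTIUM: {ISOTOPE_OF_H}, DEUTERIUM: {ISOTOPE_OF_H}, TRITIUM: {ISOTOPE_OF_H},
    ISOTOPE_OF_H: {HYDROGEN, ISOTOPE},
}
CLASSIFICATION_2 = {
    PROTIUM: {HYDROGEN, ISOTOPE}, DEUTERIUM: {HYDROGEN, ISOTOPE}, TRITIUM: {HYDROGEN, ISOTOPE},
}

print(sorted(subclasses_of_all({HYDROGEN, ISOTOPE}, CLASSIFICATION_1)))
print(sorted(subclasses_of_all({HYDROGEN, ISOTOPE}, CLASSIFICATION_2)))
```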
@Snipre: If hydrogen atom (Q6643508) really means "mathematical model of hydrogen atom", is it a class (in which case what are its instances?) or an individual? What exactly is an actual individual hydrogen atom (a particular one that I'm "pointing to" right now in front of me) an instance of? Can it still be an instance of hydrogen atom (Q6643508)? If not, then are some of these superclasses also mathematical models having no concrete instances in the physical universe?
  • Atom -- smallest indivisible unit of a chemical substance
  • Molecular entity -- any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity
  • Massive quantum particle -- quantum-mechanical particle (elementary or composite) having real positive rest mass
  • Quantum particle -- quantum mechanical particle in nuclear, atomic, and particle physics; often subatomic; composed of elementary particle(s)
  • Quantum -- the minimum amount of any physical entity involved in an interaction
  • Particle -- small localized object in physical sciences
  • Physical object -- singular aggregation of substance(s) such as matter or radiation, with overall properties such as mass, position or momentum
In any case, what's wrong with hydrogen atom (Q6643508) being a subclass of hydrogen (Q556), so that it will indeed have the properties of the chemical element called hydrogen? Why do we need a parallel set of chemical element items side by side with an atom item (or mathematical models thereof) corresponding to each of them, with almost no clear relationship between the two sets of items being represented?
DavRosen (talk) 13:53, 8 March 2017 (UTC)
@Snipre: I agree with you, we need to define the WD item labels clearly and define their links with an ontology. We could create a new page for this, in which we first list the classes that belong to chemistry. Currently if we query for items studied by chemistry we get only 5 items, of which 1 should be merged/deleted: chemical system (Q28843570), chemical system (Q28843570), chemical compound (Q11173), molecule (Q11369). We could try to extend it with chemical element and any other classes necessary to build a minimal ontology that we could extend later. Benjaminabel (talk) 20:13, 8 March 2017 (UTC)

Adding P143-sourced data

A week ago I noticed that user:Ghuron is adding chemical data from ru.wiki (using imported from Wikimedia project (P143)). I told him that in my opinion this should be discontinued, as P143-sourced statements are practically unsourced and cannot be reused by many wikis. This short discussion is here. Also, someone will have to clean up all this data in the future (and I think this will be done by deleting most of it). Ghuron stated that he's not aware of any consensus about adding unsourced data, so I'm raising this issue here.
  Notified participants of WikiProject Chemistry. Wostr (talk) 18:36, 23 February 2017 (UTC)

Could you please provide/link a few examples? --Leyo 22:20, 23 February 2017 (UTC)
@Leyo: [1], [2], the rest is here: [3], [4], and [5]. The first example with urea is not only unsourced but also wrong: urea doesn't have a boiling point under normal conditions; it decomposes when heated above its melting point. Wostr (talk) 09:33, 24 February 2017 (UTC)
Let me clarify a few points:
  1. Personally I'm not interested in adding chemical data myself. I would estimate my personal input here at ~100 statements. But I do want to give the ru-wiki community a tool that would enable exporting any data from infoboxes to WD (including chemical data)
  2. If an infobox value contains a link to a source and/or qualifiers such as Property:P2076 and Property:P2077, the tool will try to export them as well (see [6], [7])
  3. I do not understand how the issue with maintenance categories in de-wiki makes it impossible to distinguish between unsourced and sourced data, so that local infoboxes use only the latter
  4. I do not see how this discussion is relevant here. The question is not about how to source statements or whether sourced statements are better than unsourced ones. The question is whether WD will accept unsourced statements. Looking into [8] I can see that people are still using PetScan, QuickStatements and Harvest Templates to create unsourced statements as we speak. If there is a global consensus that this is not tolerated, we certainly have a problem here.
  5. Despite my own opinion that mass exports from large WP projects are generally healthy for Wikidata at this point, I would definitely respect project-level consensus. You are the ones working on this data on a day-to-day basis, so your opinion matters more than anyone else's. If you believe that a concrete list of properties should not be created unsourced, I can "blacklist" them at ru-wiki. --Ghuron (talk) 09:00, 24 February 2017 (UTC)
  • I would just note that this RFC on referencing, Wikidata:Requests for comment/Verifiability and living persons, seemed to conclude that we are not yet ready at Wikidata to enforce strict policies requiring sources even for BLP data, so I think the same applies to chemistry data. Sources are nice, and please don't remove well-sourced data. But if Wikidata is missing information that can be added in a reasonably reliable way from some of the language wikis, I recommend we allow it; hopefully mistakes like the ones pointed out above will be rare and can be fixed along the way. ArthurPSmith (talk) 16:27, 24 February 2017 (UTC)
    • @ArthurPSmith: I have about 5 years of experience verifying this "reasonably reliable" data in pl.wiki. Since 2012 I have added sources to ~3000 single values in chemboxes (about 60% of all unsourced chem data in pl.wiki infoboxes). This data cannot be trusted, especially when it comes from wikis where sources are not mandatory (that was the case in pl.wiki, where sources were mandatory only for controversial information [and chem data was not considered 'controversial', so it was freely copied from, unfortunately, en.wiki], and the policies changed only a few years ago; as far as I understand, this is also the case of ru.wiki). Such unsourced data are sometimes correct, but they can be original research or jokes as well. While they are kept in the local wiki, they are that wiki's problem. Importing them to WD makes them a problem not only for WD users but also for any other wiki that would like to reuse this data. For now this data is not reused by other wikis, but that will change. What's more, almost every day there are complaints in pl.wiki about the poor quality of WD data (and that's always a problem with P143-sourced data; at this moment we are reusing only simple data like pictures in biographies, Commons categories etc.). For now we have some sourced chem data in WD (e.g. from the CDC database) and I think there will be more in the future from different sources. Allowing unsourced data for compounds is a step backward in my opinion and, as I stated in Ghuron's discussion, no information is better than unsourced information. Wostr (talk) 21:27, 24 February 2017 (UTC)
      • Let's consider the following scenario: the top 10 major WP projects actively populate chemical data into infoboxes and export "missing" data to WD. Yes, there will be a lot more mistakes in unsourced statements. But those mistakes will be observable not only by the 20 members of this project, but by hundreds and thousands of people in local Wikipedias. Those mistakes will be noticed, and since the majority of WP editors do not want to be engaged in WD edit wars, the easiest way to win would be to correct the mistake in WD AND PROVIDE A SOURCE there. And this is not a purely hypothetical scenario; I can see this happening with birth/death dates/places for people. Two years ago there were a lot of complaints in ru-wiki that bio data in WD were unreliable; now in 2 out of 3 discrepancies WD wins over ru-wiki. --Ghuron (talk) 07:18, 25 February 2017 (UTC)
We can't accept unsourced data (incl. imported from: Russian Wikipedia etc.) for physicochemical property data such as melting point, vapor pressure etc. It's way better not to have any value than incorrect data.
Importing identifiers or easily verifiable data such as molecular formulae is less sensitive. --Leyo 08:44, 27 February 2017 (UTC)
melting point (P2101), boiling point (P2102), sublimation temperature (P2113), decomposition point (P2107), flash point (P2128), standard enthalpy of formation (P3078), enthalpy of vaporization (P2116), thermal conductivity (P2068), vapor pressure (P2119), autoignition temperature (P2199), lower flammable limit (P2202)? --Ghuron (talk) 13:32, 27 February 2017 (UTC)
Yes, and probably a few more. In principle any property that needs to be determined experimentally, i.e. that cannot be deduced from the structure alone (molecular mass, molecular formula, SMILES, InChI, InChIKey etc.). --Leyo 08:31, 28 February 2017 (UTC)
I think I understand your idea, but I'm not sure I'm qualified to come up with complete list of such properties. And without that list I cannot "blacklist" things in code. Can you help me, please? --Ghuron (talk) 09:18, 28 February 2017 (UTC)
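A minimal sketch of the blacklist Ghuron asks for. It contains only the property IDs listed in this thread (Leyo notes there are "probably a few more"), and the `may_import` helper and its `has_real_source` flag are hypothetical: they assume a bot that knows, for each candidate value, whether it carries a real reference rather than only imported from Wikimedia project (P143).

```python
# Properties named in this thread as experimentally determined.
# Not exhaustive: per the discussion there are "probably a few more".
EXPERIMENTAL_PROPERTIES = frozenset({
    "P2101",  # melting point
    "P2102",  # boiling point
    "P2113",  # sublimation temperature
    "P2107",  # decomposition point
    "P2128",  # flash point
    "P3078",  # standard enthalpy of formation
    "P2116",  # enthalpy of vaporization
    "P2068",  # thermal conductivity
    "P2119",  # vapor pressure
    "P2199",  # autoignition temperature
    "P2202",  # lower flammable limit
})


def may_import(property_id: str, has_real_source: bool) -> bool:
    """Decide whether a candidate value may be exported to Wikidata.

    Hypothetical helper: experimental properties need a real reference
    (not just "imported from Wikimedia project"), while structure-derived
    properties such as a chemical formula are let through.
    """
    if property_id in EXPERIMENTAL_PROPERTIES:
        return has_real_source
    return True


print(may_import("P2101", has_real_source=False))  # False: melting point, unsourced
print(may_import("P274", has_real_source=False))   # True: chemical formula
```

Keeping the list as a plain set makes it easy to extend once the project agrees on the remaining properties.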

Same problem with @Mikey641:

User:Mikey641 and its bot Mikey641bot is importing NFPA data from WP:en. We really need a discussion before any large import in this project. Snipre (talk) 23:56, 28 February 2017 (UTC)

Same as above. I thought that every bot task has to be accepted separately (and I don't see anything here), but apparently I was wrong. Wostr (talk) 22:36, 1 March 2017 (UTC)
@Wostr: you are not wrong. However, it looks like Mikey641 has stopped the bot work in this case. If you think further action is required it could be brought up with the administrators. ArthurPSmith (talk) 16:42, 2 March 2017 (UTC)
@ArthurPSmith: Okay, thanks. But I don't think that administrative actions are needed here, because the import has been stopped. Wostr (talk) 21:16, 2 March 2017 (UTC)
You are aware that Widar is a tool that can be used even if you don't have a bot flag. The difference is that it doesn't overload the recent changes and it's faster.--Mikey641 (talk) 08:04, 8 March 2017 (UTC)
@Mikey641: Yes, it should not be open to every new user.--Kopiersperre (talk) 10:25, 8 March 2017 (UTC)

SMILES for radicals?

I ran into the situation where radicals in Wikidata end up having identical canonical SMILES (P233) or isomeric SMILES (P2017). This is because SMILES does not handle radicals well, though if crafted carefully, it could be derived. The CXSMILES extension of SMILES, however, does a better job. But is CXSMILES acceptable for canonical SMILES (P233) and isomeric SMILES (P2017)? Or should we disallow canonical SMILES (P233) and isomeric SMILES (P2017) for entities of type radical (Q185056)?
Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
  Notified participants of WikiProject Chemistry --Egon Willighagen (talk) 07:15, 8 April 2017 (UTC)

In my opinion the radical "formula" shouldn't be defined as canonical SMILES. But I can't say whether we can use isomeric SMILES. Perhaps it is a good idea first to see what other databases are doing, in order to have a common way to treat this feature. Snipre (talk) 19:01, 9 April 2017 (UTC)

Standard atomic weight of the elements

Sigh. The en:Standard atomic weight is defined and published by the CIAAW (an IUPAC commission) for 84 chemical elements. All Wikidata has to do is copy these values from their website. Unfortunately, Wikidata takes its values from secondary (tertiary?) sources such as PubChem. Example of bad: silver (Q1090). Meanwhile, I tried to get Wikidata:Property proposal/standard atomic weight accepted, to no avail. Also disturbing is that Wikidata cannot accept an interval for a value???, and that a reader (or infobox) cannot pre-state exactly which mass they want. -DePiep (talk) 18:50, 9 April 2017 (UTC)

@DePiep: You can be more constructive by providing the data source instead of just giving the name of an unknown commission: we usually use books or articles, and in your case it would help a lot of people if you could provide a reference so they can understand what you are speaking about.
Currently we have in WD one of the reports of the CIAAW (see Atomic weights of the elements 2009 (IUPAC Technical Report) (Q13422885) and the online access here) and even if this is not the last one I propose to use this one as reference for the discussion.
So from the document we can see that some standard atomic weights are given as intervals and some are not. As we can use only one datatype to represent this concept, we can't ask for a new "interval" datatype. But the current quantity datatype is able to handle an interval as uncertainty.
For example, if we take hydrogen, we can see from table 5 in the 2009 report that the interval is [1.0078; 1.0082]. But from table 6 we can define an average value of 1.008. So in WD we can write value = 1.008, lower value = 1.0078 and upper value = 1.0082. This is one way to fit interval data into our quantity datatype.
The other possibility is to forget the interval data and just use the average value provided by the report: as an engineer, I never use an interval to calculate the molar quantity of a compound, but always an average value. Even the CIAAW recognized that reality and provided a set of average values to replace the intervals.
WD will never be able to model the complexity of the real world: this is not possible, and it is not its goal to become THE unique database of the world, but only to propose a good value for most general concepts; when somebody wants a more accurate value, they can use more specialized databases or references. Snipre (talk) 07:48, 10 April 2017 (UTC)
And one last thing: data display or data selection in WP is not the responsibility of WD. You can always create a template in WP which selects data according to user preferences. But this is something to develop. Snipre (talk) 07:51, 10 April 2017 (UTC)
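The option described here (an average value plus lower and upper bounds) can be sketched as follows. This is an illustration only: the `Quantity` class is hypothetical, and `to_wikibase_json` is meant to mirror, as an assumption, how Wikibase serializes quantity values (signed decimal strings for `amount`, `lowerBound` and `upperBound`, with unit "1" for a dimensionless value). `Decimal` keeps 1.008 from being turned into a binary float.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Quantity:
    """Hypothetical quantity with an average value and an interval."""
    amount: Decimal
    lower: Decimal
    upper: Decimal

    def __post_init__(self) -> None:
        # The average value must lie inside the published interval.
        if not (self.lower <= self.amount <= self.upper):
            raise ValueError("amount must lie within [lower, upper]")

    def to_wikibase_json(self) -> dict:
        """Serialize roughly as a Wikibase quantity (assumed layout)."""
        def signed(d: Decimal) -> str:
            return f"+{d}" if d >= 0 else str(d)

        return {
            "amount": signed(self.amount),
            "lowerBound": signed(self.lower),
            "upperBound": signed(self.upper),
            "unit": "1",  # dimensionless
        }


# Hydrogen: average 1.008 (table 6), interval [1.0078; 1.0082] (table 5).
hydrogen = Quantity(Decimal("1.008"), Decimal("1.0078"), Decimal("1.0082"))
print(hydrogen.to_wikibase_json())
```

The check in `__post_init__` reflects the constraint implied above: the average value only makes sense if it falls inside the interval it summarizes.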

Is there an interest in a large load of chemical structures?

We have recently released our data from the CompTox Chemistry Dashboard as a public CC-BY dataset. I have loaded the dataset to Figshare at https://doi.org/10.6084/m9.figshare.4836413.v1. This file includes a large number of CAS numbers, preferred and trivial names and the appropriate DTXSIDs that will link to the chemistry dashboard. It may be a good dataset to expand the chemical wikidata. I cannot guarantee that all chemical structure-name-CASRN mappings are perfect but those of you working with the challenges of data quality in chemistry databases will already know that! --Antony Williams 01:40, 11 April 2017 (UTC)

@ChemConnector: Thank you very much for your contribution. One thing: it would be nice in the future if you could associate to each entry a structural identifier like InChI or SMILES. Names and CAS numbers are not absolute identifiers. Snipre (talk) 08:07, 11 April 2017 (UTC)
@Snipre: Those can be found in columns H-J. If we agree we want all of them, or a subset of them, I can create QuickStatements to add them to Wikidata. --Egon Willighagen (talk) 10:32, 11 April 2017 (UTC)
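A sketch of how such QuickStatements could be generated from spreadsheet rows, using the V1 syntax (tab-separated item, property and value; string values in double quotes; CREATE and LAST for new items). This is not the Bioclipse script used earlier; the row layout is hypothetical, mapping columns H-J onto SMILES, InChIKey and DTXSID is an assumption, and references as well as the licensing question raised later in this thread are not handled.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical row layout: (existing QID or None, label, SMILES, InChIKey, DTXSID)
Row = Tuple[Optional[str], str, str, str, str]


def quote(value: str) -> str:
    """QuickStatements V1 expects string values in double quotes."""
    return '"' + value + '"'


def to_quickstatements(rows: Iterable[Row]) -> List[str]:
    """Turn rows into tab-separated QuickStatements V1 commands."""
    commands: List[str] = []
    for qid, label, smiles, inchikey, dtxsid in rows:
        if qid:
            # Item already exists: only add the DSSTox substance ID (P3117).
            commands.append("\t".join([qid, "P3117", quote(dtxsid)]))
            continue
        # No matching item: create one and fill it via LAST.
        commands.append("CREATE")
        commands.append("\t".join(["LAST", "Len", quote(label)]))
        for prop, value in (("P233", smiles), ("P235", inchikey), ("P3117", dtxsid)):
            commands.append("\t".join(["LAST", prop, quote(value)]))
    return commands


rows = [
    ("Q1", "example A", "C", "KEY-A", "DTXSID-A"),   # placeholder values
    (None, "example B", "CO", "KEY-B", "DTXSID-B"),
]
print("\n".join(to_quickstatements(rows)))
```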
@ChemConnector: Sorry I didn't have a deep look at the file.
@Egon Willighagen: Perhaps we can first perform a pre-check of the data against the current data in WD before doing the import. We should import data only if the CAS number and the InChI match the proposed data; for the rest, we should create sets of data presenting contradictions or missing data, so they can be compared and analyzed more deeply before import. We have to perform a preprocessing analysis before any import; WD is not the trash bin of the web collecting everything without minimal curation. So I propose first to define the import rules for this new set of data:
Case | CAS number in WD | InChIKey in WD | CAS number in data set | InChIKey in data set | CAS number match | InChIKey match | Action
Case 1 | Yes | Yes | Yes | Yes | Yes | Yes | Data import in WD
Case 2 | Yes | Yes | Yes | Yes | Yes | No | ?
Case 3 | Yes | Yes | Yes | Yes | No | Yes | ?
Case 4 | Yes | No | Yes | Yes | Yes | - | ?
Case 5 | No | Yes | Yes | Yes | - | Yes | ?
Case 6 | Yes | Yes | Yes | No | Yes | - | ?
Case 7 | Yes | Yes | No | Yes | - | Yes | ?
Other cases | .. | .. | .. | .. | .. | .. | Data have to be checked before import
Since CAS numbers are less reliable than InChIKeys, I propose importing without checks only the cases where at least the InChIKey matches. Cases where the CAS numbers match but the InChIKeys do not, whether because data are missing or because the InChIKeys differ, have to be analyzed more deeply before any import.
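The decision rule in the table and the paragraph above can be written down as a small function, which also makes the "?" cases explicit. A sketch, with the function name and return labels chosen for illustration: only an identical InChIKey on both sides leads to an unchecked import, and a differing CAS number in that situation (case 3) is flagged so it can go on a conflict subpage.

```python
from typing import Optional


def classify(wd_cas: Optional[str], wd_inchikey: Optional[str],
             ds_cas: Optional[str], ds_inchikey: Optional[str]) -> str:
    """Sort one compound into 'import', 'import-cas-conflict' or 'review'.

    InChIKey is trusted over CAS: only a record whose InChIKey is present
    on both sides and identical is imported without checks (cases 1, 3,
    5 and 7). CAS-only matches (cases 2, 4 and 6) go to review.
    """
    if wd_inchikey and ds_inchikey and wd_inchikey == ds_inchikey:
        if wd_cas and ds_cas and wd_cas != ds_cas:
            return "import-cas-conflict"  # case 3: also list the CAS pair
        return "import"
    return "review"  # InChIKey missing on one side, or different


print(classify("CAS-A", "KEY-1", "CAS-A", "KEY-1"))  # import (case 1)
print(classify("CAS-A", "KEY-1", "CAS-A", "KEY-2"))  # review (case 2)
```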
We can put all conflicting data in some subpages for further analysis like in this example. Snipre (talk) 12:31, 11 April 2017 (UTC)
I also did a scan based on the InChIKey and added CompTox IDs for exact matches. I don't trust the CAS numbers for that. I reported it in this blog post: http://chem-bla-ics.blogspot.nl/2017/01/epa-comptox-dashboard-ids-in-wikidata.html I read the proposal from ChemConnector as adding chemicals to Wikidata for which there is no InChIKey in Wikidata. I will read your comments and table asap! --Egon Willighagen (talk) 15:57, 11 April 2017 (UTC)
@Egon Willighagen: So you already did most of the work. My proposal is now to go to the next step: to work on the cases where one identifier is missing or in conflict while the second one matches. The idea is to improve WD or the data set by finding missing data which are available in other databases, or by correcting data when one database proposes a wrong identifier. Snipre (talk) 19:47, 11 April 2017 (UTC)
@Snipre: Well, 'most' of the work, is about 36 thousand links back to the Dashboard, out of 700 thousand. For matching based on the CAS registry number, I would recommend Magnus' Mix&Match, as I really like to see manual curation of that (CAS numbers can be wrong on both sides). I can make a query to look for CAS number mismatches, for which the InChI matches. What do you recommend on how to put this in a subpage? I don't have experience with that. --Egon Willighagen (talk) 16:15, 13 April 2017 (UTC)
@Egon Willighagen, ChemConnector: A last point: CompTox Chemistry Dashboard data is released under a CC-BY licence, while WD uses CC0. This will be a problem later when someone uses CompTox Chemistry Dashboard data from WD without mentioning the original source, as required by CC-BY. Snipre (talk) 21:00, 11 April 2017 (UTC)
Yes, I agree with that observation. @ChemConnector:, the previous ID<>InChIKey mappings were available as CCZero, but I cannot automate the use of CC-BY data for inclusion in Wikidata because CC-BY is too restrictive. (With the previous DTXSIDs I gave the attribution anyway, as that is a clear expectation of Wikidata.) I can get a lot done if only the SMILES and DTXSIDs are CCZero. --Egon Willighagen (talk) 13:48, 13 April 2017 (UTC)

Rename alkali metals (Q19557)

Currently alkali metals (Q19557) is defined as group 1, but most of the content of this item relates to alkali metals. Group 1 and the alkali metals differ because of hydrogen: hydrogen is part of group 1 but is not an alkali metal. So a new item is necessary. One option is to create a new item for group 1 and rename the current item to alkali metal; the other is to move all data related to alkali metals to a new item. What is the best choice? @Aleks-ger: Snipre (talk) 21:18, 11 April 2017 (UTC)

In the item above, the name in the few remaining languages should be corrected to alkali metals etc. --Leyo 07:22, 12 April 2017 (UTC)
@Leyo: Did you create a new item for the group 1 concept including alkali metals and hydrogen ? Snipre (talk) 10:03, 12 April 2017 (UTC)
New item for group 1 :Q29366681 Snipre (talk) 06:01, 13 April 2017 (UTC)
@Ricordisamoa: we need to update the periodic table app for this change! ArthurPSmith (talk) 12:47, 13 April 2017 (UTC)
@ArthurPSmith: Either subclass of (P279) Q29366681 is added to other elements in alkali metals (Q19557) as well, or the way the app works will have to be tweaked. --Ricordisamoa 09:07, 17 April 2017 (UTC)
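To see why either change is needed, assume (as a simplification, not a description of the app's actual code) that the app finds an element's group by following subclass of (P279) statements upwards. A breadth-first walk like the one below only reaches group 1 from lithium once some path of subclass statements leads there; the graph is a hypothetical fragment using labels instead of Q-IDs.

```python
from collections import deque
from typing import Dict, List


def reaches(graph: Dict[str, List[str]], start: str, target: str) -> bool:
    """True if `target` is reachable from `start` by following
    'subclass of' edges any number of times (cycle-safe BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


# Hypothetical fragment of the hierarchy right after the split.
subclass_of = {
    "lithium": ["alkali metal"],
    "hydrogen": ["group 1"],
}
print(reaches(subclass_of, "lithium", "group 1"))  # False: no path yet

subclass_of["lithium"].append("group 1")          # the first option above
print(reaches(subclass_of, "lithium", "group 1"))  # True
```

Because the walk is transitive, the same result would follow from any path that leads to group 1, which is why adding the statement to each element works without changing the traversal itself.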
Ah, I didn't notice that problem. And if you look we have a problem with Ag and In now too, I think I've fixed In, no idea what the problem is with Ag. I've added the new group 1 classes to the others so it should work now. ArthurPSmith (talk) 13:43, 17 April 2017 (UTC)
See gerrit:348696 --Ricordisamoa 09:17, 20 April 2017 (UTC)
thanks for the poke! Your changes look fine. I still don't understand what's up with silver though - everything looks fine on the wikidata end but somehow the group and period are not getting through to the app??? ArthurPSmith (talk) 16:18, 20 April 2017 (UTC)
Thank you for the notice. Deployed with restart, now even silver is looking fine. --Ricordisamoa 11:04, 21 April 2017 (UTC)
I made an edit on Ag yesterday - one of the subclass statements had "preferred" rank and I restored it to "normal", that seems to have resolved this? Anyway something to watch out for... ArthurPSmith (talk) 14:37, 21 April 2017 (UTC)
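The rank effect described here matches Wikidata's "best rank" (truthy) semantics: if any statement of a property has preferred rank, only the preferred ones are returned; otherwise the normal ones are; deprecated ones never are. Assuming the app reads best-rank values, a single preferred subclass of (P279) statement would hide the normal-rank group and period statements. The statements in the example are hypothetical.

```python
from typing import List, Tuple


def best_rank(statements: List[Tuple[str, str]]) -> List[str]:
    """Return the values a best-rank ("truthy") reader sees.

    `statements` holds (value, rank) pairs for one property, with rank
    one of "preferred", "normal" or "deprecated".
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


# Hypothetical subclass-of statements on the silver item.
silver = [("group 11", "normal"), ("period 5", "normal"), ("transition metal", "preferred")]
print(best_rank(silver))  # ['transition metal']: group and period are hidden

# Restoring the preferred statement to normal rank makes all three visible.
silver[2] = ("transition metal", "normal")
print(best_rank(silver))  # ['group 11', 'period 5', 'transition metal']
```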

Help needed sorting out food additives

Hi, on something as paramount as food additives (on which I've already done some work in the past year), we still can't output a full and reliable list with special cases like E905c, E304ii… properly handled. I haven't been able to find a massively multilingual file with translations of additives (Arabic, Japanese, Chinese…). The only glimmer of hope is for European languages (I've done the import in Open Food Facts). Otherwise, the planet seems to be devoid of any reliable translation for such fundamental items. Am I missing something? Does anyone have a file or something to sort this out? Am I condemned to slowly fix this, or is there a way to massively overhaul the situation? --Teolemon (talk) 16:19, 21 April 2017 (UTC)

https://en.wiki.openfoodfacts.org/Global_additives_taxonomy/Europe
Ok, we crafted this file for European languages. Straight from the EU translation memory. https://openfoodfacts.slack.com/files/teolemon/F02T2ULBW/32012r0231.txt --Teolemon (talk) 15:09, 23 April 2017 (UTC)
Names are unreliable. That's why we use identifiers like E numbers. The only option is to contact the Chemistry WikiProjects on the different WPs and see whether they can provide you with a list of E numbers with the corresponding names in the local language. Snipre (talk) 21:56, 23 April 2017 (UTC)
That's what must be done. I guess there's no way to mass-contact all Chemistry WikiProjects? There is simply no multilingual list of additives on the planet. Food and cosmetic makers hide additives by using synonyms instead of E/INS numbers. --Teolemon (talk) 06:39, 25 April 2017 (UTC)
Teolemon: E numbers are officially mandatory only in the EU and Switzerland, so you can have official translations only for the languages used in Europe. E numbers are not used, or not mandatory, in other countries, so translations there are not official. Have a look at the Codex Alimentarius and especially at its publications: there is one document, Food Labelling - Complete Texts, which is available in Chinese, Russian and Arabic, and perhaps it contains something about E numbers. I can't download the document, so I can't confirm what is in it. Snipre (talk) 09:44, 25 April 2017 (UTC)

Calcium sulphate

It seems like the item for calcium sulfate (Q407258) was incorrectly edited by a bot back in October, as a result of which many of the statements were changed so that they now apply to calcium sulfate dihydrate (instead of calcium sulfate). Could somebody please clean up the item and possibly create a new one for calcium sulfate dihydrate? Thanks, Einstein2 (talk) 21:14, 3 June 2017 (UTC)

Wow, that's quite an edit. Have you tried contacting ProteinBoxBot about the problem? ArthurPSmith (talk) 15:26, 5 June 2017 (UTC)
There are not many options: we have to go back to the last version before the bot edit and create a new item for calcium sulfate dihydrate. All edits since the ProteinBoxBot edit are suspicious (meaning they probably relate to calcium sulfate dihydrate), and unless someone takes responsibility for checking all the statements, I propose deleting everything since October 2016. Snipre (talk) 16:10, 5 June 2017 (UTC)
Ok, I restored as of October before ProteinBoxBot, and created calcium sulfate dihydrate (Q30135771) using what that bot added. I've re-added some of the subsequent edits that seemed to belong on Q407258 but somebody else should maybe review... ArthurPSmith (talk) 19:39, 5 June 2017 (UTC)
Thank you. I tried to recover all the statements and descriptions added since October. – Einstein2 (talk) 21:35, 5 June 2017 (UTC)
This happened because the item had the wrong PubChem CID, the wrong InChIKey and whatnot, so this is one of the items with an identifier mess imported from Wikipedia. My bot makes a decision: if 2/3 of the identifiers match a certain compound, then the WD item is about this compound. In rare cases where the identifier mess is really big, this can lead to a situation like the one described above. There are then two options: either correct all the wrong identifiers or revert to an older version. Sebotic (talk) 18:03, 6 June 2017 (UTC)
I think the PubChem CID is right, but they seem to have the same ID for both compounds: from the link it says "CaSO4· nH2O (n = 0 or 2)". I couldn't tell from the InChI itself, but the InChIKey looked correct. Can you double-check the IDs for these two compounds? ArthurPSmith (talk) 18:25, 6 June 2017 (UTC)
They are correct now because I just corrected them; before, they were about the dihydrate. Here's an analysis I did a while ago of compounds which need manual inspection, if you'd like to do more work. Sebotic (talk) 18:29, 6 June 2017 (UTC)
Sebotic: "My bot makes a decision, if 2/3 of identifiers match a certain compound, then the WD item is about this compound". That is the problem, a programming problem: if there is a contradiction between data, bots should not make a decision; instead, a report should be produced and a contributor has to check.
When we have more than 700 CAS numbers used several times in different items, 50 items with several ChEBI IDs, 165 items with several UNIIs and 181 items with several InChIKeys, no algorithm can decide what is correct: someone has to check and clean. Programming can only help us by identifying the suspicious data. Snipre (talk) 22:26, 6 June 2017 (UTC)
External databases are full of errors and WD imported a bunch of errors from WP, so there is only one solution: manual curation, especially when no reference database can be considered fully reliable. Snipre (talk) 22:26, 6 June 2017 (UTC)
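The two strategies in this exchange can be contrasted in a few lines. `majority_rule` is a sketch of the 2/3 rule as Sebotic describes it, not the bot's actual code; `decide_or_report` is the stricter variant asked for here, which only accepts unanimous identifiers and otherwise hands the item to a contributor. The input maps each identifier type to the compound it resolves to, and the example values are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def majority_rule(resolved: Dict[str, str]) -> Optional[str]:
    """2/3 rule: return the compound that at least two thirds of the
    identifiers resolve to, or None if there is no such compound."""
    if not resolved:
        return None
    compound, count = Counter(resolved.values()).most_common(1)[0]
    # Integer comparison avoids float rounding: count / total >= 2/3.
    return compound if 3 * count >= 2 * len(resolved) else None


def decide_or_report(resolved: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Stricter variant: accept only unanimous identifiers, otherwise
    return (None, "report") so that a contributor checks the item."""
    compounds = set(resolved.values())
    if len(compounds) == 1:
        return compounds.pop(), "ok"
    return None, "report"


# Hypothetical identifier mess: two of three identifiers point at the dihydrate.
resolved = {"PubChem CID": "dihydrate", "InChIKey": "dihydrate", "CAS RN": "anhydrous"}
print(majority_rule(resolved))     # dihydrate
print(decide_or_report(resolved))  # (None, 'report')
```

With a messy item, the majority rule confidently picks the wrong compound, while the stricter variant stops and reports.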
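Counts like these (CAS numbers shared between items, items with several ChEBI IDs, UNIIs or InChIKeys) are the kind of report code can produce to flag suspicious data for manual curation. A minimal sketch, assuming the (item, value) pairs for one identifier property have already been fetched, for example from a query; the item IDs and values below are hypothetical. A hit is only a flag: some duplicates may be legitimate, and a person still has to decide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_duplicates(
        statements: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Flag identifier problems for ONE property (e.g. CAS RN).

    statements: (item, identifier value) pairs.
    Returns (shared, multiple):
      shared   - values used on more than one item
      multiple - items carrying more than one value
    """
    by_value: Dict[str, Set[str]] = defaultdict(set)
    by_item: Dict[str, Set[str]] = defaultdict(set)
    for item, value in statements:
        by_value[value].add(item)
        by_item[item].add(value)
    shared = {v: items for v, items in by_value.items() if len(items) > 1}
    multiple = {i: vals for i, vals in by_item.items() if len(vals) > 1}
    return shared, multiple


pairs = [("Q1", "A"), ("Q2", "A"), ("Q3", "B"), ("Q3", "C")]
shared, multiple = find_duplicates(pairs)
print(shared)    # value A is on Q1 and Q2
print(multiple)  # Q3 carries B and C
```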

Following a specific chemical ontology?

@Snipre: suggested this ontology as a starting point for organizing our own here, and I agree it's a good place to start, particularly as it seems to be based largely on (enwiki) wikipedia article sources in the first place. Some details there don't look right to me (for example I don't understand the purpose of "Element" vs. "ChemicalElement") but the general structure and relations seem reasonable. In particular "ChemicalElement" is clearly referring to the macroscopic domain, while "Atom" refers to the microscopic. @Benjaminabel, DavRosen, DePiep, Ricordisamoa: your thoughts?

Might be a good starting point, but some things about it still seem odd. For example:
  • ^ChemicalElement has ^Atom as component.
  • ^ChemicalSubstance has ^Isotope as component.
  • ^Atom has ^Isotope as component.
  • ^Nuclide is a synonym to ^Atom.
I'm not sure where a class like "hydrogen atom" could fit in. If hydrogen is a subclass of ChemicalElement, but it is composed of atoms (which incidentally are synonymous with nuclides?), then it seems that those atoms can't easily be identified as hydrogen atoms because they don't acquire the properties of hydrogen until they get composed together to form hydrogen as a ChemicalElement.
Also, I'm not so sure about the heavy usage of specially-defined properties like "is isotope of"/"has isotope" property.
DavRosen (talk) 21:51, 8 March 2017 (UTC)
I do think the ontology mentioned will be useful, but I suggest that we be sure to identify what the concrete classes look like (i.e. classes whose instances are individual objects in the real world like a physical object or lump of matter named Mylump that I happen to be holding in my hand), and that these can be consistently linked to the abstract classes such as pure concept or model subclasses, or metaclasses of concrete classes (i.e. classes whose instances are themselves each a concrete class). Or that certain ones among the existing classes that we might have considered to be abstract might be able to (also?) serve as concrete classes whose instances could ultimately be molecules, atoms, etc., and/or objects/lumps of matter made up of such molecules, atoms, etc.
Almost any concrete class in chemistry will ultimately (transitively/recursively) be a subclass of (i.e. a particular collection of) ordinary matter (and often also a subclass of physical body, right?). And I'm thinking that any fundamental (microscopic/bound) object will also be an instance of their (indirect) subclass molecular entity, and also presumably of some of its subclasses such as atom, electronegative atom, and possibly even hydrogen atom, etc.
I'm *not* saying that we *necessarily* need to have a concrete subclass of atom corresponding to an atom of each element, and certainly not an atom subclass for each ionization state of each isotope, etc., of each element :-) but we need to know in principle what any given concrete class would look like if there were ever a good reason to create it.
Does that make sense? If we focused *solely* on the abstract/conceptual classes of chemistry (or added the concrete classes as a disorganized afterthought) then we wouldn't be clearly representing the fact that any particular lump of matter (like Mylump) is in fact composed of instances of some very specific classes that are studied by chemistry.
DavRosen (talk) 14:29, 9 March 2017 (UTC)
@DavRosen: I think that makes sense, but can you flesh it out a bit more? It sounds like you're avoiding metaclasses (like "element") to start with? ArthurPSmith (talk) 16:55, 9 March 2017 (UTC)
@ArthurPSmith: I'm not sure if we can completely get away from existing metaclasses at this point, but that's okay so long as concrete classes exist (or at least some exist and it's clear how we *could* create any others as wanted or needed) and their relationship to the metaclasses is appropriate. In the meantime I'm trying to understand what we already have. I see that nuclide and chemical element are each a subclass of one another! In one direction since 2014 and the other since 2016. Can anyone comment on which of these relationships (if either) might be correct? And chemical element is said to be a metaclass, but a metaclass of exactly what class? Twice it was specified as a metaclass of nuclide (once by User:TomT0m and once by me) but both were undone (mine by me because I'm no longer sure). Also, molecular entity has been a concrete class since 2015 (first of matter, then of physical object, and recently I changed that to a more specific subclass of those), but is this correct? If I find two separate physical molecules (or atoms etc.) that have identical characteristics, do these represent two instances of molecular entity, or just one since they are not "constitutionally or isotopically distinct" from one another? In this latter case, molecular entity would probably be a metaclass of a concrete class that might not yet exist, right? DavRosen (talk) 19:53, 9 March 2017 (UTC)
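Mutual subclass of statements like the ones between nuclide and chemical element form a cycle, which a depth-first search over the subclass graph can detect. A sketch, assuming the subclass of (P279) statements are available as a mapping from each class to its parents; the labels stand in for Q-IDs.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the 'subclass of' graph as a list of nodes
    (first node repeated at the end), or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for parent in graph.get(node, []):
            state = colour.get(parent, WHITE)
            if state == GREY:  # back edge: parent is on the current path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


subclass_of = {"nuclide": ["chemical element"], "chemical element": ["nuclide"]}
print(find_cycle(subclass_of))  # ['nuclide', 'chemical element', 'nuclide']
```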
@DavRosen: I don't think subclass of is the right relationship in either direction for nuclide or chemical element. part of (P361) and has part (P527) maybe. I was looking around for other chemical ontologies - OpenCyc has one, but it is somewhat limited and focused I believe only at the macroscopic "substance" level. But it might be an interesting example to look at. For example start with ElementStuff and there's a fairly logical subclass ("type" in Cyc terminology) hierarchy; "isotope" is a subclass, but I think in the macroscopic sense of talking about stuff made entirely of one isotope of the element. A nuclide on the other hand, to me at least, is the nuclear equivalent of the atom, and definitely not a type of substance. ArthurPSmith (talk) 20:34, 9 March 2017 (UTC)
Okay, but is nuclide a metaclass of the concrete class atomic nucleus (or are they redundant)? More generally, couldn't an entire atom (or atomic nucleus since it's an ion of an atom anyway -- or is "atom" meant to be limited to neutral atom?) be classified by nuclide, so nuclide could be a metaclass of atom, even though the nuclide (like the element/atomic number) depends only on its nucleus? DavRosen (talk) 21:04, 9 March 2017 (UTC)
well it sounds like we're getting into definitions that may be ambiguous. To me "nuclide" = "atomic nucleus" (in the sense of the properties of a nucleus with a specific neutron and proton count, not a generic nucleus), not including the electrons or other components, and referring to a single one not a macroscopic collection. However, the definition on nuclide (Q108149) (at least in English) seems more like what I would call an "isotope" - a macroscopic collection of atoms with a specific nuclide in the nucleus. But maybe these terms aren't universally understood that way, I'm not sure. We may have to define things more precisely than is customary in these areas to have a workable ontology. ArthurPSmith (talk) 21:18, 9 March 2017 (UTC)
Perhaps nuclide could be a metaclass both for classes of nuclei and also for classes of atoms having those nuclei. More importantly, would you say that the class atom includes only neutral atoms, or also their ions, or is there no clear answer? If it's ambiguous we could create a subclass of atom for the unambiguously-narrow sense (neutral) and a superclass of atom for the unambiguously-broad sense (including both neutral atoms and ions), and one of them could eventually be merged with atom later if there's ever a consensus on which one atom should represent. I think wikipedia-linked classes are often ambiguous because wikipedia articles cover multiple or vague concepts. DavRosen (talk) 21:49, 9 March 2017 (UTC)
I quote frwiki « Un nucléide est un type d'atome » (« a nuclide is a type of atom »).   is a nuclide, so we can replace « a nuclide » with  . This gives   is a type of atom. This means ⟨   ⟩ subclass of (P279) ⟨ atom ⟩. But this also means ⟨   ⟩ instance of (P31) ⟨ nuclide ⟩, as it's an example of a nuclide. Hence nuclide and atom are not synonymous, as that would imply that ⟨   ⟩ is both an instance and a subclass of nuclide, which is not consistent. Nuclide is a metaclass, as are element and isotope. author  TomT0m / talk page 16:28, 24 May 2017 (UTC)
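This consistency argument can be checked mechanically: an item declared both an instance of (P31) and a subclass of (P279) the same class is a red flag. A sketch over hypothetical statement sets, where "X" is a placeholder item.

```python
from typing import Dict, List, Set, Tuple


def instance_and_subclass_clashes(
        instance_of: Dict[str, Set[str]],
        subclass_of: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return sorted (item, class) pairs where the item is declared both
    an instance of and a subclass of the same class."""
    clashes = []
    for item, classes in instance_of.items():
        for cls in classes & subclass_of.get(item, set()):
            clashes.append((item, cls))
    return sorted(clashes)


# Metaclass reading: X is an instance of nuclide and a subclass of atom.
print(instance_and_subclass_clashes({"X": {"nuclide"}}, {"X": {"atom"}}))  # []

# If nuclide were a synonym of atom, X would also be a subclass of nuclide.
print(instance_and_subclass_clashes({"X": {"nuclide"}},
                                    {"X": {"atom", "nuclide"}}))  # [('X', 'nuclide')]
```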
Currently "has part carbon" is used both for substances (cast iron (Q483269)) and molecule types (ethanol (Q153)). I think this cannot be correct, but I don't know which statements should be changed, if any. If I got everything right, it depends on what definition of chemical element is used on Wikidata. From the description of chemical element (Q11344) ("each instance has the same specified number of protons in its atomic nucleus, and is itself also a class") it seems that a chemical element is a type of atom with a particular atomic number, and this is also what en:Chemical element says. On the other hand, in the proposed ontology from ontology4.us it is defined as an aggregate of atoms with a particular atomic number. If we assume an analogous definition of isotope, then in the first case we have 1. (a particular   atom) instance of   subclass of  , 2. cast iron (a material type) has parts of the class  , 3. ethanol (a molecule type) has parts of the class   (quantity = 2). I'm not sure about point 2., but to me it looks like the relation (<galaxy> has parts of the class <star>) listed on Help:Basic membership properties. In this case, the meaning of   is "carbon atom", so "x instance of  " = "x is a carbon atom". On the other hand, if we assume the second definition, we have 1. (a particular   atom) part of (a particular lump of  ) instance of   subclass of  , 2. cast iron has part  . I don't know how to express 3. in this case - there seems to be no easy way to go down from the substance level to the atomic level. So which definition is the right one: is carbon an atom class or a substance class? Kubaello (talk) 22:00, 21 June 2017 (UTC)
@Kubaello: Do you want to create an item for one particular atom of C? If not, your first classification "(a particular   atom) instance of   subclass of  " is not necessary.
You can always go deeper in the classification by adding new properties. For example, I can say:
  • (a particular   atom in the excited 1s²2s¹2p³ state) instance of (a particular   atom) subclass of ( ) subclass of ( ).
But I can also classify it like this:
  • (a particular   atom in the excited 1s²2s¹2p³ state) subclass of (  atom in the excited 1s²2s¹2p³ state) subclass of (a particular   atom) subclass of ( ) subclass of ( ).
And if I add a new characteristic like the spin of the electrons, I can classify (a particular   atom in the excited 1s²2s¹2p³ state) as a class instead of an instance. There is no objective definition of instance/class: the users of the classification define what is an instance and what is a class. So instead of trying to define what is an instance and what is a class, we have to define what we want to classify: do we want to work with elements and isotopes only? Do we want to create items for atoms in a specific electronic state? Once you define the most specific element of the classification, you define the instance level. If you don't do that, the discussion about instance/class is useless: just use classes and never instances, as someone can always characterize an existing concept more precisely in order to create a new one, making the previous instance classification obsolete. Snipre (talk) 15:18, 22 June 2017 (UTC)
@Snipre, Kubaello: I don't think that's what Kubaello was mainly asking about. The issue at hand is whether the wikidata item for carbon is defining the class of all individual atoms of carbon, or rather is defining the class of all substances made purely of carbon. This returns to the initial question here - do we need to be more specific in our chemical ontology to distinguish those two types of meanings for the term "chemical element"? Or do we allow both with the understanding that an atom is a sort of special case of "substance purely made of the element". To respond to Kubaello's actual question, I think with that understanding (which is effectively what we are doing now) your (2) is fine either way, but (3) brings the question - what does the qualifier "quantity = 2" signify? In principle it means each molecule of ethanol has two atoms of carbon, but we don't actually have a wikidata item representing carbon atoms separate from collections of carbon atoms. Perhaps what we really need is a different property - stoichiometry perhaps? To express these relationships better, rather than duplicating every element with atom vs substance distinctions? ArthurPSmith (talk) 17:41, 22 June 2017 (UTC)
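If has part (P527) with a quantity qualifier is read as atoms per molecule (the stoichiometric reading discussed here, which is an assumption rather than established practice), the qualifiers are enough to rebuild a molecular formula in the Hill system: carbon first, then hydrogen, then the other symbols alphabetically, or all symbols alphabetically when there is no carbon. The mapping from element items to symbols is assumed to be done beforehand.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Build a Hill-system formula from element symbol -> atom count."""
    def term(symbol: str) -> str:
        n = counts[symbol]
        return symbol if n == 1 else f"{symbol}{n}"

    if "C" in counts:
        # Carbon first, then hydrogen, then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(s for s in counts if s not in ("C", "H"))
    else:
        # Without carbon every symbol, hydrogen included, is alphabetical.
        order = sorted(counts)
    return "".join(term(s) for s in order)


# ethanol: has part carbon (quantity 2), hydrogen (6), oxygen (1)
print(hill_formula({"C": 2, "H": 6, "O": 1}))  # C2H6O
```

Read this way, the quantity qualifier carries exactly the information a chemical formula needs, which is one argument for giving it a clearly defined meaning.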
@Snipre, ArthurPSmith: I do not really want to create items for particular atoms and I don't think they would be useful. However, I think it is important to have clear definitions of concepts such as chemical element. If carbon is used to mean both a class of atoms and a class of substances, querying the database becomes more difficult. For example, if we ask for subclasses of carbon we only get isotopes, but no phases/allotropes. This happens because diamond and graphite are subclasses of allotrope of carbon and not of carbon itself. Querying for subclasses of iron returns isotopes (iron-56, iron-60) and phases/materials (ferrite, sheet iron). Moreover, both queries return "isotope of X" (where X = carbon/iron), which is wrong, as these are metaclasses: their instances are not instances, but subclasses of the respective elements. Also, diamond and graphite shouldn't be subclasses, but instances of allotrope of carbon. For example, Great Mogul Diamond is a diamond, but Great Mogul Diamond isn't an allotrope of carbon. Maybe none of those examples has severe consequences, but all in all the database seems inconsistent. It also makes editing more difficult: I feel uneasy making some edits, especially concerning the instance and subclass relations, if I don't know what some items are actually meant to represent. I think that consistency is very important and having good foundations would help in many ways, for example by preventing problems such as nuclide and chemical element each being a subclass of the other. Regarding using only classes: I think this is correct in the case of atoms, as probably no one will need items describing particular atoms. This is not true in general, and I disagree that there is no objective definition of an instance. For example, the item "Abraham Lincoln" could never be a class: it describes a single human being by definition. I support the idea of having separate items representing atom types and elementary substances. 
There would be more items, but they would have simple and precise meanings. I think it would be much less confusing than having two meanings assigned to single item. Most properties would probably be stored on atom type items and substance items would only need to reference those items, so there wouldn't be much duplication. That being said, I think there are some properties which would make sense for substance items, like most stable allotrope (at given temperature). Kubaello (talk) 21:41, 22 June 2017 (UTC)
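Kubaello's querying problem can be illustrated with a small sketch of a transitive "subclass of" walk, the Python analogue of the SPARQL path `?x wdt:P279+ root`. The edges below are toy data assumed for illustration only, mirroring the situation described above rather than exported from Wikidata; the function name `transitive_subclasses` is hypothetical.

```python
from collections import deque

# Toy "subclass of" (P279) edges, child -> parents. Illustrative only:
# they mirror the situation described above, not a live Wikidata export.
SUBCLASS_OF: dict[str, list[str]] = {
    "carbon-12": ["carbon"],
    "carbon-13": ["carbon"],
    "isotope of carbon": ["carbon"],
    "diamond": ["allotrope of carbon"],
    "graphite": ["allotrope of carbon"],
}


def transitive_subclasses(root: str, edges: dict[str, list[str]]) -> set[str]:
    """Classes reaching `root` through one or more P279 hops,
    analogous to the SPARQL path `?x wdt:P279+ root`."""
    # Invert child -> parents into parent -> children for a downward walk.
    children: dict[str, list[str]] = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found: set[str] = set()
    queue = deque([root])
    while queue:  # breadth-first search; `found` also stops loops on cycles
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


print(sorted(transitive_subclasses("carbon", SUBCLASS_OF)))
# ['carbon-12', 'carbon-13', 'isotope of carbon']
```

Because "allotrope of carbon" is not itself linked to "carbon" in this toy graph, diamond and graphite never show up under carbon, which is exactly the gap Kubaello describes; the metaclass "isotope of carbon" also leaks into the result.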
@Kubaello: I agree with you about the general problem of definitions in WD, but this problem mainly comes from the origin of WD: WP articles, where concepts are mixed and where some articles were not created on the basis of any logical reasoning.
Then, about your statement that the item "Abraham Lincoln" could never be a class: I think this represents a very limited view of classification. If you want, you can create items "Abraham Lincoln as teenager", "Abraham Lincoln as president",... and link all these parts of this man's life under a class representing not only a physical body. By stating that "Abraham Lincoln" could never be a class, we completely miss the time parameter in the concept: a baby is a child, a teenager is a human, an adult is a human,... So for each period of the life of "Abraham Lincoln" you can create an item. What is important is not the classification, but the definitions and rules you want to apply in your ontology. I can only point to your own comment mentioning the difficulty of finding the right definition for chemical element or atom while at the same time saying that "Abraham Lincoln" could never be a class: if you can state this last point in such a definitive tone, why can't you apply the same reasoning to chemical element or atom? This is not a criticism of what you said; I just want to analyze with you the reasoning behind one of your statements: for you, a human is mainly a timeless physical body. You didn't integrate time as a characteristic of a human; this is a choice, not an objective way to define a person. Just take a biography of Abraham Lincoln: it is not one block of sentences; there are different chapters, often based on chronology or activities. This is another way to consider Abraham Lincoln. Snipre (talk) 22:31, 22 June 2017 (UTC)
@ArthurPSmith: I understood the problem, but the chosen example is still a bad one. And as is often the case, the problem is not one of classification but one of definition. We are mixing two different things here: classification is often easy when the definition is clear. You yourself wrote "The issue at hand is whether the wikidata item for carbon is defining the class of all individual atoms of carbon, or rather is defining the class of all substances made purely of carbon". No mention of instance. So it is best to avoid talking about instance/class and to focus on definitions only. Snipre (talk) 21:50, 22 June 2017 (UTC)
@Snipre: I agree with everything you said about Abraham Lincoln - having separate items for each period of life is another valid way to describe a person, which I failed to see. It seems the distinction between instances and classes is not as simple and not as useful as I thought. Kubaello (talk) 12:01, 23 June 2017 (UTC)
@Snipre, ArthurPSmith: Concerning the definition of chemical element, the IUPAC Gold Book also lists both meanings. Do you think it is good as it is now, assigning both meanings to a single item and having joint items for atom type/elementary substance and molecule type/pure substance? What about ions/salts? For example, chloride ion (Q108200) has the description "chemical compound", and at least some linked Wikipedia pages are mainly about chlorides. Kubaello (talk) 12:01, 23 June 2017 (UTC)
@Kubaello: No, both IUPAC definitions can't be in one item. WD requires the definitions to be separated into two different items. But we can take the IUPAC definition for WD, for example as in the case of chemical substance. And chloride ion (Q108200) has to be defined as an anion, not as a chemical compound. The problem comes from bot imports that did not analyze what was imported from some databases and put everything under chemical compound, because the bot operators never checked whether all database entries were really only about compounds. Snipre (talk) 20:54, 27 June 2017 (UTC)
@Snipre: This sounds reasonable, but there is a problem with wikipedia articles also mixing information about ions and salts. If we split mixed items, what should happen to wikipedia links? Do you think we should keep the mixed item for the purpose of keeping links in one place and create two new items: one for the ion and one for salts? Kubaello (talk) 21:11, 27 June 2017 (UTC)
@Kubaello: We can't force WP to choose one definition or to avoid mixing different data in one item. It is up to each WP to decide which WD item is most relevant for its article. I think we shouldn't lose time defining the main concept of an article: we have enough work in WD giving each item a clear definition and linking the items with correct relations. Snipre (talk) 21:23, 27 June 2017 (UTC)
@Snipre: I agree we shouldn't try to define main concepts of WP articles. I'll focus on splitting WD items. Kubaello (talk) 21:37, 27 June 2017 (UTC)

Tautomers? The creatinine example.

Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
  Notified participants of WikiProject Chemistry
I was just checking duplicate DSSTox substance ID (P3117) identifiers and ran into several IDs with more than one entry (using a demo SPARQL query for IMDb IDs), e.g. these two have the same ID: creatinine tautomer (Q28529684) and creatinine (Q426660). The latter is linked to a Wikipedia page, showing that the page is not about a specific chemical graph, but about two tautomeric structures. The Wikidata item creatinine (Q426660) tries to capture that and has two SMILES (interestingly, listing a *third* tautomer). And these two tautomers (in Wikidata) happen to have different Standard International Chemical Identifier (Q203250)s. Consequently, the entry has two PubChem (Q278487) CIDs, and should have two ChemSpider and two ChEBI identifiers, etc.

My suggestion is to adopt a policy (habit, custom, ...) of having chemical compound entries in Wikidata that represent a tautomeric mixture (maybe even add that as type), with specific chemical graph structures (i.e. fixed bond positions) as subclasses. For example, remove the InChIs from creatinine (Q426660) and make creatinine tautomer (Q28529684) and the two other tautomeric structures children of that entry. That way we do not have to fiddle with loosening the constraints. What do you think? --Egon Willighagen (talk) 11:37, 20 May 2017 (UTC)
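The duplicate check Egon describes boils down to grouping statements by identifier value and keeping values attached to more than one item, the same idea as a SPARQL `GROUP BY ... HAVING (COUNT(?item) > 1)`. Below is a minimal sketch in Python, assuming the (item, identifier) pairs have already been fetched; the DSSTox ID strings and the `Q_EXAMPLE` item are hypothetical placeholders, not real data.

```python
from collections import defaultdict

# Hypothetical (item, DSSTox ID) pairs; the ID strings are placeholders,
# not real DSSTox substance IDs.
STATEMENTS = [
    ("Q28529684", "DTXSID_PLACEHOLDER_A"),  # creatinine tautomer
    ("Q426660", "DTXSID_PLACEHOLDER_A"),    # creatinine
    ("Q_EXAMPLE", "DTXSID_PLACEHOLDER_B"),  # hypothetical unrelated item
]


def duplicate_ids(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each identifier used on more than one item to its sorted items."""
    items_by_id: dict[str, set[str]] = defaultdict(set)
    for item, ident in pairs:
        items_by_id[ident].add(item)
    return {
        ident: sorted(items)
        for ident, items in items_by_id.items()
        if len(items) > 1  # same idea as HAVING (COUNT(?item) > 1)
    }


print(duplicate_ids(STATEMENTS))
# {'DTXSID_PLACEHOLDER_A': ['Q28529684', 'Q426660']}
```

Under Egon's proposal, a hit like this would be resolved by moving the identifier to the specific tautomer item rather than by relaxing the single-value constraint.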

Isn't this a Bonnie & Clyde issue? (recently discussed at enwiki). In short: each of the three elements in Bonnie & Clyde gets a different QID, with relationship properties and its applicable identifiers like DSSTox substance ID (P3117). I got the impression that for chemicals this approach had already been introduced. -DePiep (talk) 12:01, 20 May 2017 (UTC)
Ah, interesting! Yes, that sounds very similar indeed. I will read up on that link and on the project looking into these kinds of situations, and will get back to this after that. Thanks for the pointer. --Egon Willighagen (talk) 12:05, 20 May 2017 (UTC)
I think that we need (and I have handled it that way anyway) a 'tautomer of' property (as in ChEBI), and should represent each tautomer as a separate WD item. Because if you have several InChIs, InChIKeys, ChemSpider and PubChem IDs on one item, you no longer know which ones describe which tautomer, and modeling that so a computer will understand it is harder than just introducing a new property and having an item for each tautomer. Sebotic (talk) 17:28, 20 May 2017 (UTC)
@Sebotic: Why would we need a new property for that? Just use instance of (P31) = "tautomer". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 20 May 2017 (UTC)
Also, such issues should throw a constraint violation, reminding us to resolve them. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:48, 20 May 2017 (UTC)
@Egon Willighagen: I am not convinced of the value of creating items for tautomers: here we reach a definition problem. What chemical entities do we want to have in WD? Do we want to have all possible chemicals, as in the ZINC database? Do we want to be the merged database of PubChem, ChEBI and ChemSpider?
In my opinion, only chemicals which have been isolated should have an item: having an InChI or a SMILES is not sufficient. When I see that and that, I don't think PubChem and ChemSpider are good sources for defining what a real chemical is. Snipre (talk) 07:47, 22 May 2017 (UTC)
@Snipre: not disagreeing with your point, but I do note that it is the English Wikipedia that refers to the various tautomers, and in doing so, I do think we would like to capture that accurately. I am not sure Wikidata should overrule what Wikipedia thinks is notable. Or? --Egon Willighagen (talk) 17:11, 13 June 2017 (UTC)
@Egon Willighagen: For each set of tautomers, we can give the most stable one the statement instance of (P31): chemical compound (Q11173) and the other one instance of (P31): tautomer (Q334640) with the qualifier of (P642). This solution makes it possible to distinguish the two versions. Snipre (talk) 21:29, 27 June 2017 (UTC)

Soliciting suggestions of new data sources

Dear all, we on the Gene Wiki / ProteinBoxBot team are doing some planning and prioritization of future biomedical data sets to load, and we'd like to solicit suggestions from the broader Wikidata community. Historically, the scope of our bot loading effort has revolved around genes, proteins, drugs, diseases, and microbes. And more recently we've also helped related groups load data on genetic variants and pathways. We would welcome suggestions of either other related entity types that should be systematically loaded, or data sources that describe relationships between these entity types. Obviously, availability of a high-quality, CC0-licensed data source is essential. Please let us know if you have any suggestions. (Cross posting to WD:MB, WD:MED, and Wikidata:WikiProject_Chemistry.) Best, Andrew Su (talk) 20:04, 23 June 2017 (UTC)

There's a growing conversation about molecules and their adverse effects on human health. I'm talking about food additives, cosmetic molecules, pesticides… There are maximum daily doses prescribed by institutions like the EFSA (European Food Safety Authority). There is also a trove of related studies, commercial or not, about each of those substances, and potential relationships with diseases (lung cancer, breast cancer…). Is that something that could fall within the scope of ProteinBoxBot? --Teolemon (talk) 11:58, 24 June 2017 (UTC)
@Teolemon: In general EFSA data sounds very interesting, but unfortunately this description of the data access rules does not sound promising in terms of loading to Wikidata. Best, Andrew Su (talk) 17:18, 26 June 2017 (UTC)
https://twitter.com/Jan3R1chard50n is the PoC. They do open data. --Teolemon (talk) 12:22, 27 June 2017 (UTC)
Thanks for the lead. I will follow up! Best, Andrew Su (talk) 16:50, 27 June 2017 (UTC)
@Teolemon: Unfortunately I don't think we will be able to load data from EFSA anytime in the short term... https://twitter.com/Jan3R1chard50n/status/884485494939295747 Best, Andrew Su (talk) 20:02, 10 July 2017 (UTC)

Problem of concept definition for some ions

There is a recurring mistake in items about ions: people mix the concept of the ion with that of the family of compounds containing the ion. I don't know if the second concept is really necessary, so I would like feedback from other users before doing a merge.

Snipre (talk) 11:29, 29 June 2017 (UTC) @Infovarius: Snipre (talk) 14:21, 30 June 2017 (UTC)

I've noticed your changes to some items about ions/compound classes and I'm not sure adding ion/anion is correct. It is true that the two concepts are usually mixed in one item and this should be fixed, but most Wikipedia articles are about compound classes (and these articles are linked to the WD items), not about anions (information about the anion is in most cases included in the article about the compound class). So we cannot link de:Sulfate to a WD item about an anion (the article opens by stating that sulfates are salts or esters of sulfuric acid, so it's not about an anion), we cannot add commons:Category:Sulfates to such an item, and we cannot even add instance of (P31)/subclass of (P279) = sulfate (anion) to e.g. sodium sulfate (Q211737). The item about the anion could be used only with has part (P527). There should be separate items for anions and for compound classes (and the latter should be linked to Wikipedia articles in most cases). Wostr (talk) 11:14, 30 June 2017 (UTC)
@Wostr: Thanks for your comment. I never cared about the WP articles because they are not reliable for defining the concept of an item: currently most items linked to WP articles about compound families have data concerning only the ion. So we have to clean the items, and I can do that job. I will let the Wikipedians decide which item best corresponds to their article. The only critical thing to decide now is: do we want to keep some items for compound families, or can we merge the items mentioned? Snipre (talk) 11:30, 30 June 2017 (UTC)
I will go one step further because I already had some feedback from other contributors:
if we agree to have two different classes of items, one for the ions and one for the compound families, should all items for compound families have subclass of (P279): chemical compound (Q11173) or subclass of (P279): salt (Q12370)? Snipre (talk) 14:34, 30 June 2017 (UTC)
I thought we decided above just a few days ago (discussion with Kubaello) to split element and substance items - so it seems even more clear we should do the same in this case, for ions and compounds based on those ions. Definitely don't merge the ones that are already separated. Since salt (Q12370) is a subclass of chemical compound (Q11173) I would go with the most specific superclass that's accurate. ArthurPSmith (talk) 14:38, 30 June 2017 (UTC)
Is salt (Q12370) correct in this case? For example, sulfate can be a salt or an ester. Both are derivatives of the same acid, but if we decide to have items like sulfate (salt) and sulfate (anion), we should also create sulfate (ester)... I'm not sure that splitting sulfate into 3 different items is the best option. Maybe sulfate (not anion) should be described as a derivative of xxx acid, without classifying it as a salt/ester, and only by chemical compound (Q11173)? Wostr (talk) 21:35, 30 June 2017 (UTC)
@Wostr: The different items already exist for the sulfate category. Sulfate ion and sulfate group are not the same: just think about all the functional groups composing organic molecules. Mixing salts and organic molecules just creates a mess, as inorganic and organic chemistry follow different rules, especially for nomenclature. Snipre (talk) 01:35, 1 July 2017 (UTC)


@Snipre: Yes, but sulfate as a class of compounds is defined in the first place as a salt or ester, and that's true for many similar classes (excluding those that don't exist as esters, for which only a salt form is possible). I think the correct hierarchy is:
  • sulfuric acid derivative
    • sulfate (salt or ester)
      • sulfate (salt)
        • containing sulfate anion
      • sulfate (ester)
        • containing sulfate group
    • hydrogen sulfate (salt or ester)
      • hydrogen sulfate (salt)
        • containing hydrogen sulfate anion
      • hydrogen sulfate (ester)
        • containing hydrogen sulfate group
(of course there could be a division into organic [ester or salt]/inorganic [salt] in the first place but IMHO concept of organic/inorganic compounds is not a good choice in classification of compounds in terms of their structure etc.)
That's why I'm asking which sulfate is the one we plan to create here: sulfate anion vs sulfate salt, or maybe sulfate anion vs sulfate salt or ester? In the latter case salt (Q12370) would be incorrect. Wostr (talk) 08:36, 1 July 2017 (UTC) PS Because maybe sulfate salt and sulfate ester could be eliminated, leaving only sulfate salt or ester. Wostr (talk) 08:38, 1 July 2017 (UTC)
@Wostr: "sulfate as a class of compounds is defined in the first place as salt or ester". Who says that? This is classification, so we are free to follow existing classifications or not. For example, the ChEBI classification doesn't follow yours (see sulfuric acid derivative, where the class is split into sulfuric ester, sulfates and organic sulfate). The best approach is to compare existing classifications, identify the rules defined for each of them, and choose the one which corresponds to our needs. Snipre (talk) 17:58, 4 July 2017 (UTC)
@Snipre: I know the ChEBI classification, and you should check how these three classes are linked to each other and their definitions:
  • sulfuric acid derivatives : organic sulfate , sulfates , sulfuric ester
    • sulfates : sulfate salt , organic sulfate (?)
    • sulfuric ester and organic sulfate are not linked in any way but both classes are esters of sulfuric acid.
I don't see any reason why anyone would follow this classification; it's illogical and IMHO incorrect. Wostr (talk) 21:59, 4 July 2017 (UTC) PS And organic sulfate is ambiguous: is a sulfate salt with an organic cation not an organic sulfate? That's why a classification based on the organic/inorganic concept is not a good option. Wostr (talk) 22:12, 4 July 2017 (UTC)
@Wostr: I didn't say we have to follow the ChEBI classification; I just mentioned that there are other types of classification. I want to start a discussion about what the elements of a good classification are. In the example you gave about sulfuric acid derivative, I don't understand the need for the intermediate levels sulfate (salt or ester) and hydrogen sulfate (salt or ester). Why can't we have a flatter classification without unnecessary levels? We can have
  • sulfuric acid derivative
    • sulfate (salt)
    • sulfate (ester)
    • hydrogen sulfate (salt)
    • hydrogen sulfate (ester)
The only way to be able to choose is to define criteria first and then to start the classification. Snipre (talk) 08:15, 20 July 2017 (UTC)
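Going from Wostr's nested tree to Snipre's flatter one amounts to deleting the "(salt or ester)" levels and re-attaching their children to the nearest surviving ancestor. A minimal sketch in Python, assuming the hierarchy is stored as child -> parent "subclass of" links; `collapse` is a hypothetical helper, and the "containing ... anion/group" notes are left out because they describe composition rather than classes.

```python
# Wostr's nested proposal as child -> parent "subclass of" links.
# The "containing ... anion/group" notes describe composition, not classes,
# so they are left out.
NESTED: dict[str, str] = {
    "sulfate (salt or ester)": "sulfuric acid derivative",
    "sulfate (salt)": "sulfate (salt or ester)",
    "sulfate (ester)": "sulfate (salt or ester)",
    "hydrogen sulfate (salt or ester)": "sulfuric acid derivative",
    "hydrogen sulfate (salt)": "hydrogen sulfate (salt or ester)",
    "hydrogen sulfate (ester)": "hydrogen sulfate (salt or ester)",
}


def collapse(tree: dict[str, str], removed: set[str]) -> dict[str, str]:
    """Drop the `removed` levels and re-attach their children to the nearest
    surviving ancestor. Assumes no removed class is a root (has no parent)."""
    def surviving_parent(parent: str) -> str:
        while parent in removed:
            parent = tree[parent]
        return parent

    return {
        child: surviving_parent(parent)
        for child, parent in tree.items()
        if child not in removed
    }


flat = collapse(NESTED, {"sulfate (salt or ester)", "hydrogen sulfate (salt or ester)"})
for child, parent in sorted(flat.items()):
    print(f"{child} -> {parent}")
```

The result has the four classes directly under sulfuric acid derivative, as in Snipre's list; which shape is right still depends on the criteria agreed beforehand.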
Yes, the salt or ester level may be redundant: my proposal was based on the pl.wiki category tree, and the intermediate level is useful there (but it may not be here). Wostr (talk) 09:30, 20 July 2017 (UTC)

bromoaniline (Q4973722)

Q27120791 has repeatedly been merged into bromoaniline (Q4973722) by a user. However, I do strongly believe that this is against Wikidata principles. de:Bromaniline is an article on a group of chemical substances, whereas en:Bromoaniline is a disambiguation page. Any other views? --Leyo 13:15, 17 July 2017 (UTC)

Hmm, the enwiki page is only disambiguating the different articles about specific chemicals in that group, so they really are about the same thing. I think the best solution here might be to change the enwiki article so it's not a disambiguation page but more about the group of chemicals (with links to the specific articles retained of course), perhaps some of the text from the dewiki article can be translated and imported? Yes in general disambiguation pages and regular article pages should not be mixed up like this. ArthurPSmith (talk) 14:45, 17 July 2017 (UTC)
I had a similar problem with Metasilicate and after a small discussion on WP:en, I deleted all references to disambiguation page to create a real article about the Metasilicate family. Snipre (talk) 16:15, 17 July 2017 (UTC)
I wouldn't call en:Metasilicate an article, but rather an unsourced stub. ;-) But anyway, this is surely a possibility if someone is going to convert en:Bromoaniline into an article. If not, we need another solution. --Leyo 21:14, 17 July 2017 (UTC)
@Leyo: Don't mix page type and evaluation: an unsourced stub is an article at stub level. But it is an article, as opposed to a disambiguation page or a category. What is important is not the content, because the content is WP's responsibility, but how we classify it in WD. Do you make a distinction in WD between an A-class article and a stub article? Snipre (talk) 10:18, 18 July 2017 (UTC)
Yes, sure, en:Metasilicate is an article in Wikidata terms.
The case of Q27120791, however, remains open. --Leyo 10:51, 18 July 2017 (UTC)
My proposition:
* Delete the mention of disambiguation page in WP:en (transform it into an article)
* Delete all descriptions in Q27120791
* Delete instance of disambiguation page
The definition in the WP:en article is not correct: Bromoaniline can't refer to 2-bromoaniline or 4-bromoaniline. This is against the chemical nomenclature. Bromoaniline can refer to only 1 thing: the family of Bromoaniline compounds.
It is better to keep a single system where general terms are used for compound families only, and to avoid disambiguation pages for chemicals. There is a chemical nomenclature which is able to identify each chemical, so if a name is not sufficient to distinguish two chemicals, it is wrong. Simplified chemical naming is just a bad habit which is supported by no authority in chemistry. Snipre (talk) 12:09, 18 July 2017 (UTC)
By family of bromoaniline compounds you mean three bromoaniline isomers or bromoanilines and any derivative of them (having bromoaniline in the structure)? Wostr (talk) 20:15, 19 July 2017 (UTC)
I prefer to keep bromoaniline for the group of the three isomers and to use another name like bromoaniline derivatives for the rest. I have a rigid idea of the naming for chemicals but I am facing so many cases where names don't help to identify compounds that I prefer to use strong principles. That's my opinion. Snipre (talk) 21:11, 19 July 2017 (UTC)
I agree. I asked because in some wikis we have different approaches and names like bromoanilines etc. refer to group of isomers or derivatives (and sometimes to both: article describes group of isomers and category collects derivatives). Wostr (talk) 09:22, 20 July 2017 (UTC)
Done. By the way, en:Chloroaniline is not a disambiguation page on WP:en, but a list. There is a huge structural problem in WP:en that makes it hard to get a clear picture of the classification scheme. Snipre (talk) 08:03, 20 July 2017 (UTC)

crystal violet (Q420705)

After several bot changes, this item (cation) does not correspond anymore to its sitelinks (chloride). I would prefer to keep the sitelinks in this items, i.e. to correct identifiers etc. --Leyo 15:39, 12 June 2017 (UTC)

Hi - I looked over changes for the past more than 1 year and I don't see an edit that changed the chemical formula, can you point to one that was a problem? Or you could chat with User:Egon Willighagen who has edited this item recently and is not a bot, get his advice... ArthurPSmith (talk) 18:09, 12 June 2017 (UTC)
The first edits that started causing this issue were done more than a year ago (e.g. Special:Diff/240694269). --Leyo 21:47, 12 June 2017 (UTC)
ah, that's almost 2 years ago. It might be better to create a new item for the chloride and move the sitelinks you think belong there. ArthurPSmith (talk) 15:37, 13 June 2017 (UTC)
Yes, this likely needs to be resolved... splitting the cation from the salt seems wise. I have to check what "gentian violet" formally refers to, but EN-Wikipedia suggests the salt indeed. I also second the observation that the change to a cation was made almost two years ago. My more recent edits are based on InChIKey matches, and points to the cation. One question I have here, what is more important, the sitelinks or the data here? Not sure how to check when the sitelinks are made, and what to do when local Wikipedia versions contradict (that happens). So, I am not sure at this moment if the proper action is to make a new entry for the ion or the chloride salt... ---Egon Willighagen (talk) 17:08, 13 June 2017 (UTC)
@Egon Willighagen, ArthurPSmith: I corrected it so the whole item reflects the chloride salt only. The cation version should be represented as a separate item, as it has its own UNII, ChEBI, ChEMBL, etc. The items should be linked via part of (P361) and has part (P527). This issue did not go unnoticed, as you can see from the fact that it is included in my issue list. Please curate! Sebotic (talk) 18:11, 13 June 2017 (UTC)
@Egon Willighagen, ArthurPSmith, Leyo: Better to move sitelinks than to change WD data: WP articles are not reliable, as data about several compounds are often mixed in the same article. In my opinion, the best way is to use the InChI as the key parameter defining the real compound or entity represented by the item. Snipre (talk) 14:36, 22 June 2017 (UTC)
@Sebotic: Can you update the Group C list? I wanted to continue curating that list, but each time I opened an item I saw that a value had been added since the date the list was issued. Thanks Snipre (talk) 14:36, 22 June 2017 (UTC)
@Snipre: I updated lists C and D, rest will follow. Sebotic (talk) 00:17, 2 August 2017 (UTC)

Iron phosphate (Q1211305) and contains administrative territorial entity (P150)

I found that Iron phosphate (Q1211305) has contains administrative territorial entity (P150) with iron(II) phosphate (Q2618789) and iron(III) phosphate (Q1311179) (these two phosphates have located in the administrative territorial entity (P131)). At first, I wanted to delete located in the administrative territorial entity (P131) and contains administrative territorial entity (P150), but maybe we have some properties that would be better in that case? Wostr (talk) 17:12, 6 August 2017 (UTC)

perhaps has part (P527) and part of (P361) respectively? ArthurPSmith (talk) 19:53, 7 August 2017 (UTC)
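If the family item carries has part (P527) and each member carries the inverse part of (P361), the two directions can drift apart during manual edits. A minimal consistency check in Python, assuming the statements have been loaded into plain dicts; the data below is a hypothetical target state for the iron phosphate case (one inverse link deliberately left out), and `unmatched_links` is a hypothetical helper.

```python
# Hypothetical target statements for the iron phosphate case:
# has part (P527) on the family item, part of (P361) on the members.
HAS_PART: dict[str, list[str]] = {"Q1211305": ["Q2618789", "Q1311179"]}
PART_OF: dict[str, list[str]] = {"Q2618789": ["Q1211305"]}  # Q1311179 left unlinked


def unmatched_links(
    has_part: dict[str, list[str]], part_of: dict[str, list[str]]
) -> tuple[set[tuple[str, str]], set[tuple[str, str]]]:
    """Return (whole, part) pairs that are stated in only one direction."""
    forward = {(whole, part) for whole, parts in has_part.items() for part in parts}
    backward = {(whole, part) for part, wholes in part_of.items() for whole in wholes}
    # forward - backward: P527 without matching P361; backward - forward: the reverse.
    return forward - backward, backward - forward


missing_p361, missing_p527 = unmatched_links(HAS_PART, PART_OF)
print(missing_p361)  # {('Q1211305', 'Q1311179')}
print(missing_p527)  # set()
```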

Monolingual text IUPAC names?

Four years ago the IUPAC name was proposed as a property and put on hold to await the establishment of multilingual text (see Wikidata:Property_proposal/Pending). This does not seem likely to happen soon. I asked today on the #wikidata IRC channel, where it was suggested that reproposing the property may be a good idea. But before I do so, I'd like to hear the ideas of the people currently involved and of others who commented last time, like @Andrew_Su:, @Tobias1984:, @Emw:, @Graeme Bartlett:, and @Beetstra:. --Egon Willighagen (talk) 17:56, 2 September 2017 (UTC)
Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
  Notified participants of WikiProject Chemistry

Before doing anything we need the creation of the multilingual datatype. I don't think we need to reopen the proposal: we need to ask the creation of the good datatype. Snipre (talk) 18:15, 2 September 2017 (UTC)
Can you explain why this needs to be multilingual, and not monolingual? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:43, 2 September 2017 (UTC)
Snipre (talk • contribs • logs), I sort of have the same question as Pigsonthewing (talk • contribs • logs)... the entry in Wikidata:Property_proposal/Pending does not give a lot of context, and browsing Phabricator, it seems that multilingual is for single 'texts' that have more than one language... but IUPAC names are always in just one language: there is an English IUPAC name, a Dutch IUPAC name, etc. If I'm not mistaken, you participated in the original discussion... do you know where that discussion was held 4 years ago, or can you tell me more about the context back then? --Egon Willighagen (talk) 21:59, 2 September 2017 (UTC)
@Pigsonthewing, Egon Willighagen: The difference is in the definition of the datatype: a monolingual datatype means you store one value with one language per statement. If you have several languages, then you create several statements. A monolingual datatype means you have some values in small set of languages. In the case of multilingual datatype, you have a value for all the languages the system can support. From contributor point of view, this doesn't help a lot, but for machine or programming point of view, this is different: for a monolingual datatype, I am not sure to find a value for a defined language so I have to start first to check if a value exists for the language I want and then if the value exists, I can extract if. With a multinlingual datatype I don't need to first check if a value exists for one specific language: the value exists anyway but the value can be empty.
From database point of view, the multilingual datatype save space as it already include the structure to save a value for all languages. Perhaps I should use a better picture: the monolingual datatype is like a flat in one bedroom: you can have one guest. The multilingual datatype is like a flat with several bedrooms. What is the difference ? The flat with several bedrooms shares a unique batchroom and an unique kitchen. I want to have several guests with flat composed of only one bedroom I need several flats with all having a bathroom and a kitchen.
How this impact WD ? Just try to open an item with several hundred of statements (like Germany (Q183)): this is a nightmare for data amount because for each value you have, you have to deal with a corresponding statement structure to download. Another picture: the monolingual datatype is like a bookshelf designed for only one book. If you want several books, you need several bookshelves. The multilingual datatype is like a bookshelf designed for several books so you save space and time to look for a book.
And if my last explanation is not sufficient, look at how the data are stored on the servers in JSON; it is something like:
blablabla(value = XXX, language = YYY)blablabla
So storing 5 monolingual datatype statements gives:
blablabla(value = XXX1, language = YYY1)blablabla
blablabla(value = XXX2, language = YYY2)blablabla
blablabla(value = XXX3, language = YYY3)blablabla
blablabla(value = XXX4, language = YYY4)blablabla
blablabla(value = XXX5, language = YYY5)blablabla
With a multilingual datatype, this would be:
blablabla(value = XXX1, language = YYY1; value = XXX2, language = YYY2; value = XXX3, language = YYY3; value = XXX4, language = YYY4; value = XXX5, language = YYY5;)blablabla
Just do the same exercise with 200 or 300 languages. Do you see the difference?
So the multilingual datatype is different from the monolingual datatype because:
* the multilingual datatype assumes the existence of one value (which can be empty) for every language of the system. This helps data extraction because you don't need to check first whether a value exists in a specific language.
* in terms of memory and access time, the multilingual datatype is optimized for storing a set of values, unlike the monolingual datatype.
These two features mainly concern the system rather than contributors, but Wikidata is not only used by humans, and it is good to design systems that machines can use easily too. Snipre (talk) 03:51, 3 September 2017 (UTC)
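To make the storage and lookup argument concrete, here is a minimal Python sketch. The dictionary shapes, the `overhead()` stand-in for per-statement metadata, the five-language list and all function names are hypothetical simplifications for illustration only; they are not the real Wikibase JSON format, and a multilingual datatype was only a proposal at the time.

```python
import json

# Hypothetical, simplified shapes: real Wikibase JSON stores much more per
# statement (ids, ranks, qualifiers, references), and no multilingual
# datatype existed in the data model at the time of this discussion.
SUPPORTED_LANGS = ("en", "nl", "de", "fr", "pl")

def overhead():
    """Stand-in for the per-statement structure repeated for every statement."""
    return {"id": "STATEMENT-ID", "rank": "normal", "references": []}

def as_monolingual(names):
    """One statement per (language, text) pair."""
    return [dict(overhead(), value={"text": text, "language": lang})
            for lang, text in names.items()]

def as_multilingual(names):
    """One statement with a slot for every supported language ("" if unknown)."""
    return dict(overhead(),
                value={lang: names.get(lang, "") for lang in SUPPORTED_LANGS})

def lookup_monolingual(statements, lang):
    # Must scan the statements and check whether the language exists at all.
    for st in statements:
        if st["value"]["language"] == lang:
            return st["value"]["text"]
    return None

def lookup_multilingual(statement, lang):
    # The slot always exists for a supported language; it may be empty.
    return statement["value"][lang]

def size(obj):
    """Length of the JSON serialization, as a rough proxy for storage."""
    return len(json.dumps(obj))

if __name__ == "__main__":
    full = {lang: "name-" + lang for lang in SUPPORTED_LANGS}
    sparse = {"en": "ethanoic acid"}
    print("all languages:", size(as_monolingual(full)), size(as_multilingual(full)))
    print("one language: ", size(as_monolingual(sparse)), size(as_multilingual(sparse)))
```

With names in all five languages, the monolingual list serializes larger because the statement overhead is repeated; with a single name, the multilingual statement is larger because of its empty slots, which is the space objection raised later in this thread. The per-language slot also holds only one English value, so "ethanoic acid" and "acetic acid" cannot both fit, which is the IUPAC drawback Snipre mentions.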
Perhaps I should add some information about the drawbacks of the multilingual datatype: it assumes one and only one value per language. For IUPAC names this is not always the case; since we can have several names, we will have to create different IUPAC name properties, like the "preferred IUPAC name". Snipre (talk) 04:12, 3 September 2017 (UTC)
So, no IUPAC names in the foreseeable future? --Egon Willighagen (talk) 07:18, 3 September 2017 (UTC)
A two-step process:
* ask the development team whether they plan to create the multilingual datatype in the near future (Done, see here)
* if not, then we can go back to the first discussion: do we want to use the monolingual datatype for IUPAC names, or do we prefer to wait for a hypothetical multilingual datatype first?
This is my opinion, but I do not speak for the majority. Snipre (talk) 22:54, 3 September 2017 (UTC)
Among the existing properties I can see an advantage of the multilingual datatype in only one case: media legend (P2096). There we are only interested in having one line of text per language. The Dutch, French or Danish text do not have to say the same thing, but you still only need at most one value per language. But on second thought, I am not so sure any longer. If there were a "new caption property" with the multilingual datatype, it would still only be useful if it had a translation into the language I am interested in. In an infobox on Wikipedia et al., I am not interested in a fallback to English or any other language. The only thing I am interested in is finding the caption that has "language=sv". One single claim instead of 17 saves space, but that is only true until somebody adds a second claim with the same property. Then a multilingual datatype would probably take more space, since I guess a multilingual datatype needs more metadata. Another drawback could be that the source for the en-value in the multilingual datatype may not be the same as the source for the de-value. -- Innocent bystander (talk) 05:50, 4 September 2017 (UTC)
Could we just use official name (P1448) for this (sourced to IUPAC)? ArthurPSmith (talk) 14:47, 4 September 2017 (UTC)
ArthurPSmith: As explained there are several IUPAC names, for example take en:Acetic acid which has a Systematic IUPAC name (Ethanoic acid) and a Preferred IUPAC name (Acetic acid). Then more complex molecules have different names depending on the nomenclature used. So we will have several properties for IUPAC names or other systematic names, all having a specific translation in the different languages. Snipre (talk) 18:47, 4 September 2017 (UTC)
Retained names, preselected names etc., and that is just organic IUPAC nomenclature. For inorganic nomenclature there are at least 4 different methods (e.g. there may be several names created using just one method; see table IX in IUPAC Red Book 2005), and there will be PINs for inorganic compounds in the future as well [9]. I think the whole system of properties (multilingual, as I don't see how it would be possible to maintain all of these names using monolingual properties; there is already World Health Organisation International Nonproprietary Name (P2275), and having several such properties with dozens of names...) should be discussed in great detail before any attempts to add IUPAC systematic names to WD (and to ensure that these properties will be filled with correct IUPAC systematic names, as the Internet and sources (and Wikipedia too) are full of non-systematic, semi-systematic, almost-systematic, old-systematic, CAS-systematic, ACS-systematic etc. names). There is also the problem that not every language has its own translation of IUPAC nomenclature (and even where there is one, it is sometimes a translation of old recommendations), so we have to decide whether to accept names in languages that don't have their own translation or reject these names. And how should we deal with names that were systematic under old recommendations? I don't think it is reasonable to propose any property for IUPAC systematic names without months of discussion. Wostr (talk) 23:42, 4 September 2017 (UTC)

heavy water (Q155890)

w:Heavy water: "form of water that contains a larger than normal amount of the hydrogen isotope deuterium (heavy hydrogen), rather than the common hydrogen-1 isotope (protium) that makes up most of the hydrogen in normal water". So, "heavy water (chemical substance)" is "water (chemical substance)" and is not "water (chemical compound)". --Fractaler (talk) 09:08, 8 December 2017 (UTC)

And is this item about heavy water (a mixture of D2O, DHO and H2O) or heavy water (D2O)? instance of (P31) is OK; the problem is that Wikipedia articles do not correspond 1:1 to WD items, and maybe there should be another WD item about heavy water as a mixture? Wostr (talk) 17:43, 10 December 2017 (UTC)
Wikipedia is the previous stage in the evolution of the representation of the world model. It is already an atavism, a rudiment used only by humans (only human-readable medium (Q372222)), not by machines (machine-readable data (Q6723621)). Of course, there must be unambiguous terms in Wikidata, not homonym (Q160843). --Fractaler (talk) 09:08, 11 December 2017 (UTC)
@Fractaler: just create an item about 'heavy water (mixture of D2O, DHO and H2O)' and then we'll have to decide where the wikilinks should be placed. heavy water (Q155890) is IMHO correct: all the properties pertain to D2O and not to heavy water as defined in the Wikipedia article. I don't know what you're trying to show by posting your opinion about Wikipedia/Wikidata. Wostr (talk) 13:04, 11 December 2017 (UTC)
I'm afraid we will not have time to decide (my items are not considered for discussion, and the administrator who deletes them without discussion gives no feedback). My opinion about "Wikipedia -> Wikidata" is just meant to help users think: which project has the greater prospects, and is it worth worrying about what will happen there with Wikipedia articles (we can link to things other than Wikipedia articles)? --Fractaler (talk) 13:31, 11 December 2017 (UTC)
The problem is that we don't spend enough time discussing ontology or which kind of classification we want to use. If everyone comes and creates their own items corresponding to their particular view of a subject, we will be lost. We need more discussions like this one, and from them we need to create general rules for later. Snipre (talk) 15:00, 11 December 2017 (UTC)
How do you decide whether a user creates items corresponding to their particular view of a subject or not? Is there any ontology or classification that Wikidata wants to implement? Or do we have only Wikidata:Notability and nothing else? Not even a main goal? Then what's the point of discussing? --Fractaler (talk) 06:57, 12 December 2017 (UTC)
I don't have a problem with creating a "heavy water" item in wikidata about mixtures with higher than normal D levels, in fact I was looking for one the other day and surprised it didn't exist. The enwiki link though actually is mainly about D2O (though it also has some info on DHO and water with heavy oxygen isotopes). ArthurPSmith (talk) 15:53, 11 December 2017 (UTC)

Should we keep canonical SMILES (P233)?

There are currently two properties for Simplified molecular input line entry specification (Q466769): canonical SMILES (P233) and isomeric SMILES (P2017). I checked their proposals and discussion pages, and am not sure if we should keep both. A 'canonical SMILES' suggests it has been made unique, but does not require stereochemistry to be defined (an 'absolute SMILES' would). The 'isomeric SMILES' has all the stereochemistry, but does not have to be unique. Now, uniqueness depends on the algorithm used, and I could not find any pointer to which algorithms would be used, so we have no way to validate that canonical SMILES (P233) is in fact correct. Interestingly, it is the isomeric SMILES rather than the canonical SMILES that should be unique for all compounds. Why do we have canonical SMILES (P233), and, second, should we keep it? Or should we move to a situation where canonical SMILES (P233) is used only for compound classes (where stereochemistry may be undefined; things that are subclass of (P279)) and only isomeric SMILES (P2017) is used for compound instances (where all stereochemistry is defined; things that are instance of (P31))? --Egon Willighagen (talk) 14:28, 10 December 2017 (UTC)

At the very beginning, only a property called SMILES was created and it was assumed to be unique, but when isomeric compounds were integrated into WD, the distinction between isomeric and canonical became necessary. Even if uniqueness is not possible with canonical SMILES, I don't see a problem with keeping that property. The only problem is when two correct but different SMILES are present in an item: then just use the reference to extract the SMILES you want. Using only SMILES from the same reference is similar to using SMILES based on the same algorithm. And as neither SMILES property was converted into an identifier property, uniqueness is not a problem: only the definition was wrong at the beginning of the creation process, not the current use. Snipre (talk) 20:39, 10 December 2017 (UTC)
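A minimal Python sketch of the "pick SMILES by reference" idea. "CCO" and "OCC" are both valid SMILES for ethanol; "source A" and "source B" are placeholder reference names, not real databases, and "one reference means one canonicalization algorithm" is the working assumption from the comment above, not something the code can check.

```python
from collections import defaultdict

# Hypothetical statements on one item: (SMILES value, reference supporting it).
# "CCO" and "OCC" are two different but equally valid SMILES for ethanol.
STATEMENTS = [
    ("CCO", "source A"),
    ("OCC", "source B"),
]

def smiles_by_reference(statements):
    """Group SMILES values by the reference that supports them."""
    grouped = defaultdict(list)
    for smiles, reference in statements:
        grouped[reference].append(smiles)
    return dict(grouped)

def smiles_from(statements, reference):
    """Keep only values from one reference (assumed: one reference = one algorithm)."""
    return [smiles for smiles, ref in statements if ref == reference]

if __name__ == "__main__":
    print(smiles_by_reference(STATEMENTS))
    print(smiles_from(STATEMENTS, "source A"))
```

Comparing strings across references says nothing about whether they describe different molecules; within a single reference, string comparison is a more reasonable proxy.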
Would you say it is acceptable for a chemical compound without any form of stereochemistry to have an isomeric SMILES (P2017)? --Egon Willighagen (talk) 05:20, 11 December 2017 (UTC)
Why not? What prevents you from generating an isomeric SMILES for a non-isomeric compound? As we currently have no way to indicate in WD that a compound is a stereoisomer, we can use the fact that if the canonical SMILES is the same as the isomeric SMILES, then the compound is not a stereoisomer. It is the same problem when a compound has no identifier in a database: do you prefer to add the statement with no value, or, if no statement about the identifier is present in the item, can you conclude that the identifier doesn't exist in the corresponding database?
This is just a question of format and what kind of data is always required.
Currently I prefer to have an isomeric SMILES for a non-isomeric compound than nothing: it is easier to determine that the compound is not a stereoisomer. Snipre (talk) 14:53, 11 December 2017 (UTC)
There is a way to indicate that a compound is a stereoisomer: use the stereoisomer of (P3364) property and link to the WD items that are the stereoisomers of the compound in question. Sebotic (talk) 18:21, 12 December 2017 (UTC)
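Snipre's comparison heuristic, and the simpler check of whether an isomeric SMILES contains any stereo symbols at all, can be sketched in Python. This is plain string inspection under stated assumptions, not a cheminformatics toolkit: it does not parse or canonicalize SMILES, the example strings are hand-written, and the function names are hypothetical.

```python
# Stereo symbols in SMILES: "@" / "@@" for tetrahedral chirality,
# "/" and "\" for double-bond geometry.
STEREO_SYMBOLS = ("@", "/", "\\")

def has_stereo_symbols(smiles):
    """True if the SMILES string specifies any stereochemistry."""
    return any(symbol in smiles for symbol in STEREO_SYMBOLS)

def differs_from_canonical(canonical, isomeric):
    """Snipre's heuristic: identical strings suggest 'not a stereoisomer'.

    Only meaningful if both strings come from the same toolkit; isomeric
    SMILES can also differ from canonical SMILES because of isotope labels
    such as [2H], which are not stereochemistry.
    """
    return canonical != isomeric

if __name__ == "__main__":
    # Hand-written examples for illustration (not copied from any database).
    pairs = [("CCO", "CCO"),
             ("CC(C(=O)O)N", "C[C@@H](C(=O)O)N"),
             ("CC=CC", "C/C=C/C")]
    for canonical, isomeric in pairs:
        print(isomeric, differs_from_canonical(canonical, isomeric),
              has_stereo_symbols(isomeric))
```

The symbol check works on a single string, so it does not depend on two sources agreeing on atom order; neither check replaces an explicit link such as stereoisomer of (P3364).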

CAS Registry Number (P231) and ChemIDplus

Why does CAS Registry Number (P231) link to ChemIDplus? I don't think there is any official relation between CAS and this NLM database. Wostr (talk) 19:59, 11 December 2017 (UTC)

@Wostr: Because the CAS Registry database is a restricted access database. So it is not possible to link to that database. ChemIDplus is one of the largest chemical databases using CAS number as identifier. Snipre (talk) 13:49, 12 December 2017 (UTC)
@Snipre: I know that CAS is not freely available, but why do we have to link to anything in that case? Why don't we link to e.g. Magnus Cas tool or something? It's not that I have a negative opinion about CID; in fact I use it almost every day. But I was surprised when I saw a link to CID from the CAS number, especially since these two databases are unrelated to each other. Wostr (talk) 18:19, 12 December 2017 (UTC)
While I agree that linking to ChemIDplus is not optimal, I also agree with User:Snipre that it's currently the best solution. Linking to PubChem or ChemSpider currently would not work technically, although these databases have more CAS numbers. Sebotic (talk) 18:28, 12 December 2017 (UTC)

Difference 4

What law or rule explains this difference of 4: s2->p6->d10->f14->g18->h22->i26->? --Fractaler (talk) 13:51, 14 December 2017 (UTC)
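The pattern follows from the standard quantum-number rule for subshell capacity, sketched here: a subshell with azimuthal quantum number ℓ has 2ℓ + 1 orbitals (one per magnetic quantum number m_ℓ = -ℓ, ..., ℓ), and each orbital holds two electrons of opposite spin. Increasing ℓ by one adds two orbitals, i.e. four electrons. (The g, h and i subshells are not occupied in the ground state of any known element, but the counting rule is the same.)

```latex
\begin{aligned}
N(\ell) &= 2(2\ell + 1) \\
N(\ell + 1) - N(\ell) &= 2\bigl(2(\ell + 1) + 1\bigr) - 2(2\ell + 1) = 4 \\
\ell = 0, 1, 2, 3, 4, 5, 6 \ (\mathrm{s, p, d, f, g, h, i}) &\;\Rightarrow\; N = 2, 6, 10, 14, 18, 22, 26
\end{aligned}
```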

Next

Binary compound (binary compound (Q66392)?) -> Ternary phase or ternary compound (Q7702932)? --Fractaler (talk) 18:03, 17 December 2017 (UTC)

A specific chemical element (Q11344) is a class?

I don't think I understand how a chemical element (Q11344) is a class (Q5127848) (actually, a first-order metaclass (Q24017414), to the same effect).

This would imply that carbon (Q623) is an abstract "class" of objects, and not an object by itself. Only specific manifestations like graphite (Q5309) (an allotropy (Q81915)) and carbon-13 (Q1770822) (an isotope (Q25276)) qualify as carbon objects. However, instead of declaring these two the same with respect to the chemical element by definition (same atomic number), they are declared to be different carbons by a non-defining property (allotrope difference, mass number). That would be like making a single car type a class because its members (cars) have different colors. -DePiep (talk) 10:59, 5 December 2017 (UTC)

chemical element: "all atoms with the same number of protons in the atomic nucleus". So, "all atoms" is a gravitational object (participates in gravitational interaction), but not a non-gravitational object (does not participate in gravitational interaction). --Fractaler (talk) 11:04, 5 December 2017 (UTC)
The Gold Book link quote starts with: "A species of atoms; all ... " and has a #2. BTW doesn't "all atoms" (mistakenly) imply that only the whole universe's collection of carbon atoms is carbon?
So when pointing to a lump of such black stuff, one can not say: "this is carbon", but "these are carbons"? -DePiep (talk) 11:51, 5 December 2017 (UTC)
"A species of atoms = all atoms with the same number of protons in the atomic nucleus". --Fractaler (talk) 11:54, 5 December 2017 (UTC)
I add: when my pencil drops, I do think the carbon in it interacts by gravitation, and I know it is a mix of isotopes. -DePiep (talk) 11:57, 5 December 2017 (UTC)
@DePiep: The IUPAC definition mixes 2 concepts, and this leads to the current definition in WD based on subclass: is a chemical element the sum of all atoms with a particular proton number, or a substance? According to the first definition a chemical element is a subclass; according to the second it is an instance. The second problem is how we link an isotope to the corresponding chemical element item. If I say that deuterium is an instance of hydrogen, then hydrogen has to be a subclass. We are missing a general concept including all relations between chemical elements and isotopes. Snipre (talk) 13:43, 5 December 2017 (UTC)
The IUPAC definition is not mixing 2 concepts. The IUPAC definition 1: "all atoms ...". The IUPAC definition 2: "A pure chemical substance composed of atoms". There is no such IUPAC definition as "all atoms and a pure chemical substance composed of atoms". So, "chemical element" (by IUPAC) is homonym (Q160843) Fractaler (talk) 14:09, 5 December 2017 (UTC)
We have hydrogen (Q556) (all hydrogenous atoms with the same number of protons in the atomic nucleus) and hydrogen atom (Q6643508) (one of all hydrogenous atoms with the same number of protons in the atomic nucleus). The same with carbon/carbon atoms. --Fractaler (talk) 12:03, 5 December 2017 (UTC)
@DePiep: none of the things you list qualify as "carbon objects". Hope Diamond (Q640037) is a "carbon object", but even then it's not an "instance of" "carbon", it's an instance of a particular kind of collection of carbon atoms (with some impurities). I think the only thing that could truly be an instance of carbon is a particular identified atom within such a physically identified object - and even then I'm not sure quantum mechanics allows you to identify specific atoms that way! However, if we think of classes as sets or groups, it makes a lot of sense for carbon (Q623) to subclass atom (Q9121), as carbon describes a subset of the things that are included in the set of all atoms. Similarly carbon-13 (Q1770822) is clearly a subclass of carbon (Q623). Other relationships probably require some more thought and discussion. I would view "hydrogen" and the "hydrogen atom" as essentially equivalent, but we have two separate items due to enwiki splitting off some of the properties of hydrogen to the hydrogen atom topic. I don't think there's any need for wikidata to duplicate every element with an "xxx atom" item. ArthurPSmith (talk) 15:46, 5 December 2017 (UTC)
Oh - and yes, car (Q1420) is a class, with subclasses specific models. The models are instances of automobile model (Q3231690) (a metaclass similar to chemical element (Q11344)). The only thing that should be an instance of (any subclass of) car (Q1420) is a specific automobile like Mr. T's custom 1982 GMC minivan (Q41074175). However, of course wikidata is not yet particularly consistent in that area either. But in principle this should be clear. ArthurPSmith (talk) 16:04, 5 December 2017 (UTC)
@ArthurPSmith:: In the automobile-tree example there is a level missing: the explicit type minivan (Q223189). Then, when going parallel to chemical elements, the "type" carbon is a well-defined real life object (the physical substance), and so should not need to be diffused over non-defining and non-altering subtypes (isotopes). -DePiep (talk) 19:07, 5 December 2017 (UTC)
@DePiep: It's not missing at all. Mr. T's custom 1982 GMC minivan (Q41074175) instance of (P31) minivan (Q223189) subclass of (P279) car (Q1420); those are the current relationships in wikidata and there is no ambiguity there. I have no idea what you are referring to regarding "diffused over non-defining and non-altering subtypes" - the existence of a subclass does not "diffuse" the superclass, it just applies a filter in a sense. ArthurPSmith (talk) 22:16, 5 December 2017 (UTC)
@ArthurPSmith: It was missing to make the parallel with elements complete: minivan (Q223189) as a subclass like carbon (Q623) is a subclass. This is the essence of my question. By "diffused" I mean: allowing different objects into a class or, conversely, creating a subclass to accommodate various objects. The various objects here (atoms, that is isotopes, of carbon) are a "diffusion" of carbon (while the diffusing criteria are not relevant: my point is that all copper isotopes are copper). -DePiep (talk) 07:44, 6 December 2017 (UTC)
I'm really not sure what you're getting at. "minivan" is like "carbon-13" in our analogy, if "automobile" is like "carbon". "A subclass of B" means that everything which is an instance of A is also an instance of B. So all A's are also B's. All carbon-13's (and -12's and -14's etc) are carbon. Can you perhaps describe a concrete problem that you are concerned about here? ArthurPSmith (talk) 16:33, 6 December 2017 (UTC)
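The definition of subclass in the comment above ("A subclass of B" means every instance of A is also an instance of B) can be sketched as a small inference over instance-of and subclass-of edges; it is the same reasoning that the query path wdt:P31/wdt:P279* applies on Wikidata. The graph below is a hand-written toy using labels instead of QIDs, and "a particular carbon-13 atom" is a hypothetical individual, not a Wikidata item.

```python
# Hand-written toy graph (labels instead of QIDs); not a Wikidata dump.
INSTANCE_OF = {
    "Mr. T's custom 1982 GMC minivan": {"minivan"},
    "a particular carbon-13 atom": {"carbon-13"},  # hypothetical individual
}
SUBCLASS_OF = {
    "minivan": {"car"},
    "carbon-13": {"carbon"},
    "carbon": {"atom"},
}

def superclasses(cls):
    """All classes reachable from cls via subclass-of, including cls itself
    (the reflexive-transitive closure, like P279* in SPARQL)."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_instance(item, cls):
    """item is an instance of cls if one of its direct classes has cls
    among its superclasses (the P31/P279* pattern)."""
    return any(cls in superclasses(direct) for direct in INSTANCE_OF.get(item, ()))

if __name__ == "__main__":
    print(is_instance("Mr. T's custom 1982 GMC minivan", "car"))
    print(is_instance("a particular carbon-13 atom", "atom"))
```

In this reading carbon-13 never needs to be declared an instance of "atom": its instances inherit that class through the subclass chain.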
(ec) Maybe substance (Q27166344) (or base material (Q214609)) is the word we need here (although, strangely, its WD definition seems tailored to exclude elements?). Carbon is a substance, and as such neither abstract nor a class. Also, the substance is the same as the element. What WD does is introduce differences that are not part of the definition, and then require a class abstraction to bring them together again. Yet the isotopes are not relevant (not defining) for being an element. (Like defining a car to be a class of the red, blue, grey, black, ... cars because they do appear in these colors.)
Another check may be: WD does treat carbon dioxide (Q1997) as a concrete object (not abstracted into a "class" of CO2 molecules, despite no doubt having different isotopes). With this, I see no reason to treat elements differently.
re: I'm not suggesting "carbon object" (I understand that to be: an object made out of carbon). I say that the object is carbon.
re: No, we don't need the item "carbon atom" to say "carbon" (I did not mention it), nor do we need "class: carbon" to mean "substance carbon, or element carbon". -DePiep (talk) 16:20, 5 December 2017 (UTC)

about isotope (Q25276)

Also, somewhere in the orthography, I am missing the idea that "all atoms that ..." is equal to "isotopes of". So at least, carbon (Q623) should be defined: "all isotopes that have ...". -DePiep (talk) 19:07, 5 December 2017 (UTC)
It looks like you missed out on our discussion on all this earlier this year. See User:ArthurPSmith/Draft:Elements, Nuclides, Chemicals Ontology for my attempt to try to rationalize the relationships, and the Discussion for feedback from some of the users who have commented here. @TomT0m: was the one perhaps most clearly arguing for the "element = class of atoms" definition. The way "element" is used in English is a little fuzzy, and there is clearly some conflict with how the same concept is used in other languages, so we have been attempting to be as precise as possible. ArthurPSmith (talk) 22:21, 5 December 2017 (UTC)
Also - as to CO2 not being a class in wikidata - it can be in principle, we just haven't had a need for it. heavy water (Q155890) is a subclass of water (Q283); every chemical compound can be similarly partitioned (just as with the elements) by isotopic composition or by other physical attributes of the molecules that may vary from one molecule to another, so those would be subclasses. The subclasses don't "define" the superclass, other than to the degree that they more precisely constrain what the concepts involved mean where there may otherwise be some ambiguity. ArthurPSmith (talk) 20:34, 6 December 2017 (UTC)
Chemical element (1): "all atoms with the same number of protons in the atomic nucleus". Isotopes of a given element have the same number of protons in each atom. So, chemical element: all isotopes with a certain number of protons in the atomic nucleus. For example, for "six protons", label=carbon (Q623) (isotopes) = ...+carbon-12 (Q1058364)+carbon-13 (Q1770822)+carbon-14 (Q840660).... "all isotopes" is not an abstract object; "all cars of model X" is "cars", and "cars" is not an abstract object; etc. --Fractaler (talk) 13:31, 6 December 2017 (UTC)
Whatever, restart. Since an "element X" is a class (OK for the sake of argument), why create "isotopes of element X" as a subclass? Why? EVERY atom that is an element X atom is an isotope of X. Why the subclass? -DePiep (talk) 00:02, 20 December 2017 (UTC)
Good point. The current structure based on "Isotopes of X" is a Wikipedia structure and doesn't fit our instance/subclass classification.
So we should have
Snipre (talk) 15:46, 20 December 2017 (UTC)
@DePiep: I agree with Snipre, the "isotopes of X" metaclasses are unnecessary as far as defining the ontology here goes. But note that they are defined as subclasses of isotope, not as subclasses of atom, so they are at a different meta-level; you refer to them "as a subclass" but they are a different kind of class than the individual isotopes or elements. And note for example isotope of caesium (Q24638) has 17 sitelinks so we can't just get rid of it (in other words these metaclasses are a significant organizational grouping for the language Wikipedias). ArthurPSmith (talk) 16:17, 20 December 2017 (UTC)
@ArthurPSmith: Couldn't we exclude all "isotopes of X" items from the WD classification by defining them as instance of Wikimedia article page (Q15138389)? Here we reach the conflict between WD as a base for interwiki links and WD as an ontology. As Wikipedia articles don't follow any rules or classification scheme, it is impossible to create a correct classification if we have to include all WP articles in the WD ontology. My proposal is then to create a class of items for all WP articles that don't fit the WD ontology, allowing interwiki links but not using these items in the classification/ontology of WD. Snipre (talk) 16:34, 20 December 2017 (UTC)
Typical example : WikiProject Chemistry (Q8487234). Nobody will use that item in any classification. Snipre (talk) 16:50, 20 December 2017 (UTC)
@Snipre: We could do that, but I don't know that it would do much good and might be harmful. It is useful to have some sort of link from a particular isotope to the wikipedia article that describes it, but what that link should be I'm not sure. I don't think the current structure actually does any harm though - C13 is an instance of "isotopes of carbon" which is a subclass of "isotope", so by the regular transitivity that means C13 is also an instance of "isotope". So your ontology is represented (though indirectly) in the current state of things. Also we have some code that depends on the current structure (the wikidata chart of the nuclides app) so that would need to be modified significantly if this change was made. ArthurPSmith (talk) 16:52, 20 December 2017 (UTC)
(Side note only: thank you for engaging. I clearly have problems talking about and understanding the Wikidata "class" orthography. That is what my original post here was about. Also, I like the mean tough "questioning" that solves most of it). - DePiep (talk) 23:01, 20 December 2017 (UTC)
@ArthurPSmith: Everything shows that the classification based on instance/subclass is quite difficult, so complicating the structure by adding unnecessary levels will just increase misuses of these 2 relations. We have to keep things simple; just remember that WD should be used by machines, meaning that data extraction should be as simple as possible.
And being afraid of making modifications because someone is using the data goes against wiki principles: if we have good reasons for a modification, then we just find the appropriate way to announce it. Wiki-based projects are by definition changing all the time.
In our case I completely understand that we have to be sure of our data model before making any modifications, so even if we find an agreement about the case of "isotope of X", I think we should continue the discussion about chemical ontology in order to propose a global view, and then, once we all agree, we can start the modifications. Snipre (talk) 09:11, 21 December 2017 (UTC)

Examples of "is-taxonomy (Q7211)"

1) caesium-112 (Q2342719) (112 caesium atom) is isotope of caesium (Q24638) (isotope of caesium, one of the cesium atoms, "cesium isotopic atom"). isotope of caesium (Q24638) is an atomic component of "isotopic group of atoms" or caesium (Q1108) (all isotopes of element with the atomic number of 55, i.e caesium-112 (Q2342719)+caesium-113 (Q2114606)+...+¹⁵¹Cs; consist of isotope of caesium (Q24638)). --Fractaler (talk) 13:21, 21 December 2017 (UTC)

Density instead of specific gravity from the CDC database

I think a mistake was made (in 2015). I saw this edit and checked the source – CDC has only specific gravity (Specific gravity at 68 °F (unless a different temperature is noted) referenced to water at 39.2 °F (4 °C)). And yes, I know that in nearly every situation the assumption that sp.grav. = density holds. But IMHO we should not make such assumptions in WD, and sources that have specific gravity only should not be used to support density values. And the problem is not limited to one item (e.g. [10]). @Emily Temple-Wood (NIOSH): – I'm pinging you, because maybe it was discussed before? Wostr (talk) 19:05, 25 December 2017 (UTC)
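The point can be made explicit with a small conversion: a specific gravity referenced to water at 4 °C only becomes a density after multiplying by the density of that reference water. The reference value below (about 0.99997 g/cm3, the commonly tabulated density of water near 4 °C) and the function names are assumptions for illustration; the resulting density applies at the sample temperature (68 °F, i.e. 20 °C, in the CDC data), which would have to be stated as a qualifier.

```python
# Assumed reference value: density of pure water near 4 degrees C, in g/cm3,
# as found in common tables. Check a primary source before relying on it.
RHO_WATER_4C = 0.99997

def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

def density_from_specific_gravity(sg, rho_reference=RHO_WATER_4C):
    """Density (g/cm3) of the sample at its measurement temperature, given a
    specific gravity referenced to water of density rho_reference."""
    return sg * rho_reference

if __name__ == "__main__":
    print(fahrenheit_to_celsius(68), fahrenheit_to_celsius(39.2))  # sample / reference temps
    # A specific gravity of 1.05 (20 C / 4 C) corresponds to ~1.04997 g/cm3 at 20 C.
    print(density_from_specific_gravity(1.05))
```

The numerical difference is tiny, which is why "sp.grav. = density" usually works, but the conversion is still an explicit step with its own assumed reference value rather than something the source states.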

Fats and fatty acids

I wonder how fatty acids should be structured with instance of (P31) and subclass of (P279). I started to make edits like this one, where I replaced chemical compound (Q11173) with long chain fatty acid (Q24067397) because it was a more precise description of what it was an instance of, and because long chain fatty acid (Q24067397) is a subclass of chemical compound (Q11173), I thought the latter was not needed. Then user:Wostr made these reverts, where chemical compound (Q11173) is added back and long chain fatty acid (Q24067397) is set as a subclass, not an instance.

I wonder now how this should be done. I think that chain length should be an instance, because I would call it a property, and the talk page of property P279 reads "If [item A] has this property with value [item B], [item B] is required to have property subclass of (P279)".

The next thing I added was subclass of (P279) saturated fat (Q970537). And then, as user:Wostr points out on my talk page, chemically it is not right to use a fat type on a fatty acid, but nutritionally it's the same thing. Here I have no idea how it should be done. I hope you can clarify this. (English is not my native language, so please excuse any problems with language.) Kolurpen (talk) 22:02, 21 October 2017 (UTC)

I see two questions. First, I understood that specific chemical compounds (like myristic acid (Q422658)) should be instances, and compound classes are, well, classes. Therefore, I would guess it should be instance of (P31) long chain fatty acid (Q24067397), like you suggested. Regarding subclass of (P279) saturated fat (Q970537), I agree with user:Wostr: we should not mix biological roles/functions with chemical information. The concept "source of fat" is not the same as "is fat". On the other hand, if you wish to add "has part" "fat" to LCFA, that would make sense to me. --Egon Willighagen (talk) 09:06, 22 October 2017 (UTC)
There is IMHO a problem with instance of (P31) long chain fatty acid (Q24067397): as I understand it, in such a case instance of (P31) chemical compound (Q11173) should be deleted (because it has fatty acid (Q61476), then carboxylic acid (Q134856) and so on up to chemical compound (Q11173)), but (1) from the WikiProject main page: "Add for each pure chemical substance (i.e. not mixtures or solutions) the property instance of (P31) with the value chemical compound (Q11173) (this property has to be added to each chemical in order to be able to retrieve all chemicals on wikidata through a simple query. Additional precisions will be performed using other properties)" and (2) without instance of (P31) chemical compound (Q11173) in every chemical compound there is no simple query to obtain all chemical compounds in WD (and e.g. in such a situation I would have to withdraw any WD data import in pl.wiki chemistry-related infoboxes, as there would be no way to control this data [now it's possible through Listeria]). Wostr (talk) 13:30, 22 October 2017 (UTC)
This concern on the project page seems a bit out of date, as it's still a simple SPARQL query to do something like "wdt:P31/wdt:P279*" to get all instances of a class including all its subclasses. However, we have a very general issue for chemical compounds which we've discussed somewhat on these pages before - and it's even worse for things like genes and proteins: is the class "chemical compound" a metaclass whose instances are abstract idealizations of particular molecules, and thus are themselves classes whose instances are the physical instantiation of such molecules in the real world? Or is each physical manifestation of a particular ideal molecule also an instance of "chemical compound" (or a relevant subclass like fatty acids etc)? I think we can be consistent about this in wikidata, but up to now we have not been. ArthurPSmith (talk) 19:49, 23 October 2017 (UTC)
Is there any technical issue with having both instance of (P31) chemical compound (Q11173) and instance of (P31) long chain fatty acid (Q24067397)? If not, I think it is the best way to do it. Kolurpen (talk) 18:52, 25 October 2017 (UTC)
@Egon Willighagen, Kolurpen, ArthurPSmith, Wostr: The principle defined on the WikiProject page was written at the very beginning of the project, and the idea behind it was to be sure to retrieve all chemicals, especially if the different subclasses were not correctly linked to the chemical compound item. @ArthurPSmith: the SPARQL query works only if the subclass item is linked to the chemical compound item through subclass relations. Are we sure of that?
The second idea was to use the simpler classification in order to give us time to define how we want to classify chemical compounds:
* do we want to put everything in instance of? e.g. myristic acid (Q422658): instance of acid (Q11158), carboxylic acid (Q134856), organic compound (Q174211), long chain fatty acid (Q24067397) ...
* do we want to use additional properties to describe structural parameters?
The main objective is to develop a complete classification before starting to modify anything else to a large extent. In the meantime, it is better, as proposed by Kolurpen, to always add chemical compound (Q11173) beside any other "instance of" until we are sure that the classification tree is correctly organized.
Snipre (talk) 14:30, 5 December 2017 (UTC)
To answer your first listed question, I think it should be enough to be instance of (P31) the most specific class, as long as that class is subclass of (P279) chemical compound (Q11173). To answer your 'are we sure' question, I did some queries:
* number of instance of (P31) chemical compound (Q11173): 156987
* number of instance of (P31)/subclass of (P279)* chemical compound (Q11173): 735206 (mmm... I guess I need to look into that...)
--Egon Willighagen (talk) 14:12, 31 December 2017 (UTC)
OK, that second number is so much larger because it includes RNAs, like Sphinx (Q7576805). --Egon Willighagen (talk) 14:25, 31 December 2017 (UTC)

GHS labelling – incorrect and unusable model

The Globally Harmonised System has had some properties for some time now (P728 (P728), P940 (P940), GHS signal word (P1033); pictograms will probably be added one day), but from the beginning this model has been, IMHO, incorrect, so the data will not be useful for anyone. As these properties pertain only to labelling, not to classification, I'll limit my comment to GHS labelling.

Problem 1 – area of application

Although GHS is theoretically international (United Nations) and should be identical in every jurisdiction, this is not entirely true in practice. I am familiar mainly with the EU GHS introduced by the CLP Regulation, and unfortunately I do not know much about the OSHA standard in the United States, nor about the standards in Canada, Australia & Oceania, Asia and South America. But some differences between the EU GHS and the US GHS can easily be seen in Safety Data Sheets of the same product from the same producer:

  • Sigma-Aldrich SDS for ethanol [11], Polish version (CLP) from 2015: pictograms 02, 07; Danger; H225, H319; P210, P280, P305+P351+P338, P337+P313, P403+P235.
  • Sigma-Aldrich SDS for ethanol [12], US version (OSHA) from 2015: pictograms 02, 07; Danger; H225, H319; P210, P233, P240, P241, P242, P243, P264, P280, P303+P361+P353, P305+P351+P338, P337+P313, P370+P378, P403+P235, P501.

In EU GHS, P-phrases are usually limited to 5, but that's not true for OSHA. What's more, the EU has EUH-phrases, which are absent from the regulations of other countries (e.g. EUH014 is placed in section 2.2 of the SDS along with the H- and P-phrases, but in a US SDS it may appear as "Reacts violently with water" in section 2.3, Hazards not otherwise classified (HNOC) or not covered by GHS).
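A quick set comparison of the two ethanol SDS quoted above makes Problem 1 concrete. The phrase sets below are transcribed from the two bullet points (nothing is fetched); the comparison shows that all five EU P-phrases also appear in the US SDS, which adds nine more, so a single P-phrase property without a jurisdiction qualifier cannot hold both labels.

```python
# P-phrases transcribed from the two 2015 Sigma-Aldrich ethanol SDS quoted above.
EU_CLP_2015 = {"P210", "P280", "P305+P351+P338", "P337+P313", "P403+P235"}
US_OSHA_2015 = {
    "P210", "P233", "P240", "P241", "P242", "P243", "P264", "P280",
    "P303+P361+P353", "P305+P351+P338", "P337+P313", "P370+P378",
    "P403+P235", "P501",
}


def compare(eu, us):
    """Return (phrases in both SDS, phrases only in the US SDS, phrases only in the EU SDS)."""
    return sorted(eu & us), sorted(us - eu), sorted(eu - us)


shared, us_only, eu_only = compare(EU_CLP_2015, US_OSHA_2015)
print(len(shared), len(us_only), eu_only)  # 5 9 []
print(us_only)
```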

Problem 2 – labelling depends on the form of chemical

Fortunately, this problem is limited in WD, as we have (or rather, will have in the future) separate WD items for every stereoisomer, salt (e.g. hydrochloride, sulfate etc. for drugs), hydrate etc.; the difficulty moves to the Wikipedias, which will have to design infoboxes that import WD data from different items. But apart from that, this problem still exists in WD: some chemicals have different labelling depending on their physical form (e.g. compressed/liquefied gas vs. non-pressurised, stabilised vs. non-stabilised powder, powder vs. crystals/granules etc.).

There is also a problem with substances that exist only as aqueous solutions or are sold only in that form. This is the case for some inorganic acids, whose labelling depends on the concentration of the acid. Labelling without information about the form of the chemical is, in such cases, incomplete and therefore incorrect.

Problem 3 – the date is important

The first reason is that the law is not permanent. The CLP Regulation has been amended several times so far (mainly through ATPs, Adaptations to Technical Progress), which changed the 'harmonised classification' (more on this later), phrases etc. What's more, some chemicals cannot be fully assessed at a given time: for example, toxicological data may not be fully investigated yet, or carcinogenic/mutagenic/teratogenic properties may be unknown until later studies confirm them. The point is that labelling from 2012 may not be up to date (correct) in 2022.

Problem 4 – some phrases have to be specified

An example is H373: May cause damage to organs (state all organs affected, if known) through prolonged or repeated exposure (state route of exposure if it is conclusively proven that no other routes of exposure cause the hazard). The organs and the route of exposure are sometimes specified in the SDS, but these two are not the only things that sometimes have to be specified in certain phrases.

Problem 5 – 'official' and 'unofficial' labelling (EU)

In the EU the labelling comes from different sources: (1) 'harmonised labelling' is the labelling from Annex VI to the CLP Regulation, (2) 'notified labelling' is the labelling of substances not included in CLP and it comes from the manufacturers. The first can be described as 'official' and is available for a number of chemicals, but in certain situations manufacturers can modify this labelling. The second is a result of many tests that the manufacturer carries out to assess the hazards posed by the substance.

The problem with 'harmonised labelling' is that no P-phrases are assigned (this is always done by the manufacturer). In pl.wiki (and I think in de.wiki too) we deal with that by finding an SDS with the 'harmonised labelling' and adding the P-phrases from this SDS. However, sometimes no SDS uses the 'harmonised labelling', only an adjusted version of it (including additional hazards).

Sometimes there is also 'harmonised labelling' for a specific group of compounds; for example: arsenic compounds, with the exception of those specified elsewhere in this Annex.

Problem 6 – GHS labelling stored separately

There is only one 'harmonised labelling' for a given substance (except for the situations described in Problem 2), but there may be dozens or even hundreds of labellings from manufacturers (in the EU alone). Each set of GHS label elements (pictograms, signal word, H-, EUH-, P-phrases) has to be given together, because taking GHS label elements from various sources can make the whole labelling incorrect. In the present situation, where all H-phrases are in one property, all P-phrases in another etc., I really don't know how it will be possible to distinguish the phrases from source A from those from source B (especially as some phrases can be in both sources and some in only one).
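Problem 6 can be shown with two hypothetical manufacturer labels for one substance (the sources "A" and "B" and their phrase sets are invented for illustration, not taken from any SDS). Stored the current way, as one flat property per element type, the pooled values can no longer be split back into the original labels, and even the signal word becomes contradictory; stored as one record per source, each label stays complete.

```python
from collections import defaultdict

# Hypothetical label elements from two manufacturers for the same substance.
LABELS = [
    {"source": "A", "signal_word": "Danger", "h": {"H225", "H319"}, "p": {"P210", "P280"}},
    {"source": "B", "signal_word": "Warning", "h": {"H226"}, "p": {"P210", "P233"}},
]


def flatten(labels):
    """Current model: one property per element type, values pooled from all sources."""
    flat = defaultdict(set)
    for label in labels:
        flat["signal_word"].add(label["signal_word"])
        flat["h"] |= label["h"]
        flat["p"] |= label["p"]
    return dict(flat)


def grouped(labels):
    """Grouped model: each source keeps its own complete set of label elements."""
    return {label["source"]: label for label in labels}


flat = flatten(LABELS)
print(sorted(flat["signal_word"]))        # ['Danger', 'Warning'], contradictory
print(sorted(flat["p"]))                  # ['P210', 'P233', 'P280'], provenance lost
print(sorted(grouped(LABELS)["B"]["p"]))  # ['P210', 'P233']
```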

Questions and proposed solutions for some of the presented problems
Problem 1: every piece of GHS information has to be added with a qualifier indicating the jurisdiction, OR we have to limit the GHS properties to only one jurisdiction and have a different set of properties for every jurisdiction.
Problem 2: we have to add a qualifier for concentration/form and discuss e.g. how to add labelling to a diluted acid etc.
Problem 3: I don't know whether this should be solved by a qualifier or by the access date of the source.
Problem 4: this could be specified by using a qualifier in the present situation, but it's not possible in the proposed model (below).
Problem 5: should we distinguish 'harmonised labelling' from 'notified labelling' (labelling from sources other than ECHA)? Should we take P-phrases from another source to complete the 'harmonised labelling' (and how do we indicate that the P-phrases come from source A and the rest of the labelling from source B)? Should we indicate that the labelling is not for the given substance but for a group of substances (e.g. arsenic compounds)?
Problem 6: all GHS labelling elements should be qualifiers; all GHS pictograms and phrases have to be separate WD items; there should be a discussion on how to create these items, especially for GHS phrases (should the phrase code be the WD label and the phrase the WD description, or should we have a property for GHS phrases (multilingual?) to be used in the WD items for phrases).
Proposed model

It's similar to the way in which we add critical/triple points. Problem 4 cannot be fixed in this model (and, as a matter of fact, I didn't know how to implement it in pl.wiki either).

Problem 5 is also not included in this model; I think the distinction between 'harmonised' and 'notified' labelling can be made in two ways: (1) a qualifier with a 'harmonised labelling' Q item, a 'manufacturer labelling' Q item etc., or (2) a qualifier like issued by/resulting from, with possible values like the CLP Regulation or the manufacturer's name (Sigma-Aldrich, Alfa Aesar, Thermo Fisher Scientific etc.). The problem of indicating that P-phrases were added from a different source can be solved in option (2) by adding both the CLP Regulation and the manufacturer's name (and, of course, two sources).
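The proposed model can be pictured as one statement per complete label, with qualifiers carrying the context. The sketch assumes hypothetical qualifier names (jurisdiction, issued by, form) matching the suggested solutions for Problems 1, 2 and 5 (option 2); the manufacturers and phrase values are placeholders, not real labelling data. The point it illustrates is that a consumer such as an infobox picks one whole statement and never mixes elements across statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabellingStatement:
    # Hypothetical qualifiers; these are not existing Wikidata properties.
    jurisdiction: str      # Problem 1
    issued_by: tuple       # Problem 5, option (2): regulation and/or manufacturer
    form: str              # Problem 2
    signal_word: str
    h_phrases: frozenset
    p_phrases: frozenset


# Placeholder statements for one substance.
STATEMENTS = [
    LabellingStatement("EU", ("CLP Regulation", "Manufacturer X"), "liquid", "Danger",
                       frozenset({"H225"}), frozenset({"P210"})),
    LabellingStatement("US", ("Manufacturer Y",), "liquid", "Danger",
                       frozenset({"H225"}), frozenset({"P210", "P233"})),
]


def pick(statements, jurisdiction, prefer_issuer="CLP Regulation"):
    """Return one whole statement for a jurisdiction, preferring one citing prefer_issuer."""
    candidates = [s for s in statements if s.jurisdiction == jurisdiction]
    # False sorts before True, so statements citing the preferred issuer come first.
    candidates.sort(key=lambda s: prefer_issuer not in s.issued_by)
    return candidates[0] if candidates else None


best = pick(STATEMENTS, "EU")
print(best.issued_by, sorted(best.p_phrases))  # ('CLP Regulation', 'Manufacturer X') ['P210']
```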

There is also a problem with the solution to Problem 2 ;) concentration in CLP is usually given as a threshold like ≥ 55%, and I think it is not possible to express something like that in a qualifier. Am I right?
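Independently of what Wikidata qualifiers can store, the lookup an infobox would need is simple once the qualifier is a range rather than a single value. The sketch assumes a hypothetical pair of bounds (lower bound inclusive, upper bound exclusive or open-ended) and invented labelling entries; only the 55% threshold comes from the text above. It shows that one item with range-qualified statements could answer both "which label applies at 60%?" and "what is the range text, e.g. ≥ 55%?".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeQualifiedLabel:
    label: str                           # hypothetical placeholder for a full label
    min_percent: float                   # inclusive lower bound, e.g. 55 for "≥ 55%"
    max_percent: Optional[float] = None  # exclusive upper bound; None means open-ended

    def covers(self, percent: float) -> bool:
        above = percent >= self.min_percent
        below = self.max_percent is None or percent < self.max_percent
        return above and below

    def range_text(self) -> str:
        if self.max_percent is None:
            return f"≥ {self.min_percent:g}%"
        return f"{self.min_percent:g}% ≤ C < {self.max_percent:g}%"


# Hypothetical entries; the 20% threshold is invented for illustration.
ENTRIES = [
    RangeQualifiedLabel("label X", 55),
    RangeQualifiedLabel("label Y", 20, 55),
]


def label_for(percent):
    """Return the first entry whose concentration range covers percent, else None."""
    return next((e for e in ENTRIES if e.covers(percent)), None)


hit = label_for(60)
print(hit.label, hit.range_text())  # label X ≥ 55%
print(label_for(10))                # None
```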

What's more, I think that NFPA 704 and ADR transport labelling should also be modelled in a similar way to GHS. I'm not an expert in this field and I'm not familiar with the US NFPA standard or with international transport labelling, but AFAIK problems 2 and 6 can occur in both labelling systems; problems 1 and 2 may also apply to NFPA 704. This can be done using the same property dangerous substance labelling (P...).

There were at least two discussions in this project related to GHS – (1), (2) – both in 2016. Wostr (talk) 00:59, 16 December 2017 (UTC)

I agree that the current system is not the best one but most of the defined problems are not real if we respect 2 rules:
And if we want to add data about a defined solution, then we have to follow the current rule defining that each concept requires a dedicated item. No date in the qualifier section is necessary if we follow the guidelines of help:sources which requires for each source statement a date or at least a document where a publication date is defined. And as for all physical data, it is possible to add more qualifiers to describe which state/form is concerned by the statement.
But I support the creation of a new property and the grouping of GHS, NFPA 704 and perhaps ADR data in one statement, to allow simplified data extraction without a complex query based on filtering by source. I prefer something like "safety classification" instead of "dangerous substance labelling". GHS is more than labelling. Snipre (talk) 22:36, 16 December 2017 (UTC)
@Snipre:, I did not write that HCl(aq) labelling should be included in the HCl(g) item. But do I understand correctly that you want to have different items for various concentrations of the same acid? How is that better than one item with a dedicated qualifier? Also, I don't think that the filtering you described would be easy to implement in a Wikipedia infobox (if it is possible at all). Safety classification would be wrong from my point of view; of course GHS is more than labelling, but we (in WD and the Wikipedias) are adding labelling only. It is, however, possible to add new properties about classes, and then it would be okay; so I think Safety labelling, or Safety classification and labelling (if ADR is included, as we have UN classes, not pictograms). Wostr (talk) 23:38, 16 December 2017 (UTC)
@Wostr: We decided to use two different items for a pure compound and for the corresponding solution. For what reason? Because the characteristics of the two substances were too different. The same reasoning can be applied to concentrated and diluted solutions. Just think about the nightmare of extracting the thermal conductivity of a solution at different temperatures if all thermal conductivities of all solutions at all temperatures are available in the same item. The same goes for the melting point or boiling point, where you can have 10, 20 or 40 values depending on the concentration.
Again if you accept to differentiate HCl(g) HCl(aq), what is the problem to distinguish HCl 10%(aq) and HCl 33%(aq) ?
Then, as for the name of the new property: GHS is more than labelling, because we include not only the pictogram but also the H- and P-phrases. Just take a detergent and look for HXXX: you will never find it. Labelling data is only a part of the GHS data. Moreover, all these systems are more than just pictograms: they aim to identify hazards and provide a classification based on measurements. The same goes for ADR, where the packing group has nothing to do with labelling: there is more than just a simple label on the tanks. Labelling is not the main target but only the visible part: the main idea is identification and classification. Snipre (talk) 00:06, 17 December 2017 (UTC)
  • @Snipre:, okay, I think we just have different concepts of what labelling and classification are. Labelling in our case is pictograms + phrases (H, EUH, P) and the signal word; classification is (1) the process of assessing the hazards posed by the substance and (2) the GHS classes and H-phrases that come from this assessment. Labelling (pictograms, phrases etc.) is the result of assigning a specific class or classes.
And I think you're wrong about the detergent: you will find all the elements on the label: pictograms, signal word, H-phrases, EUH-phrases (if applicable) and P-phrases, but not their codes Hxxx/Pxxx, only their full text (you could find R/S codes on DSD/DPD labels, but in GHS the codes are not present on the label, as they have no meaning for the consumer, just like E numbers on food have no meaning for nearly everyone).
And, importantly, the H-phrases included in the GHS classification can differ from the H-phrases included in the labelling. For example, the classification gives H400 and H410, but only H410 will be part of the label (this is Article 27 of the CLP Regulation, about avoiding unnecessary duplication of hazard statements).
I think it is crucial to distinguish GHS classification (pictogram + H-phrases) from GHS labelling (pictogram, signal word, H-, EUH-, P-phrases). Such distinction is clearly visible in ECHA CLI database (there are some maintenance problems right now, so I cannot link to an example, but here's the picture). Also, GESTIS database gives only labelling and in SDS [13] there is also clear distinction.
So, you're right that GHS is more than labelling, but both in WD and Wikipedias we have only labelling data. Adding GHS classification is possible, but we would need another H-phrases property for this (just like I said, it's not 1:1 relationship when it comes to H-phrases).
  • About the acids: I'm just afraid that it will be too difficult to add GHS data to different items (as CLP gives labelling not for specific concentrations but for a range, e.g. ≥55%); of course I can create items for 55%, 60% and so on, but that leads to my second concern: it will not be possible to use that data in Wikipedia (where I need the information "≥55%", not the labelling for each concentration).
And a word about "Just thing about the nighmare to extract the thermal conductivity of a solution at different temperatures if all thermal conductivities of all solutions at all temperatures are available in the same item. Same for melting point or boiling point where you can have 10, 20 or 40 values depending on the concentration": I think it's not so far from the situation of a pure substance, where a property has temperature and pressure qualifiers (or maybe even three or more qualifiers); concentration is just another qualifier. And from one item I can get both data for various temperatures at constant concentration and data for various concentrations at constant temperature. Of course it is possible that I am somewhat limited in this matter and simply do not see or understand something. Also, I'm neither for nor against either of these options (one item vs. separate items for every concentration); I do not know much about the whole ontology in WD, I just have some concerns resulting from a more practical approach. Wostr (talk) 01:17, 17 December 2017 (UTC)
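The relation discussed here (and Snipre's point that the label follows from the classification by rules) can be sketched as a function from classification H-phrases to label H-phrases. Only the single redundancy mentioned in this thread is encoded (H410 makes H400 unnecessary on the label, per Article 27 of CLP); the rule table is a hypothetical, deliberately incomplete stand-in for the actual CLP provisions.

```python
# Hypothetical, incomplete rule table: phrase -> phrases it makes redundant on the label.
REDUNDANT_IF_PRESENT = {
    "H410": {"H400"},
}


def label_h_phrases(classification):
    """Derive label H-phrases from classification H-phrases by dropping redundant ones."""
    dropped = set()
    for phrase in classification:
        dropped |= REDUNDANT_IF_PRESENT.get(phrase, set())
    return sorted(set(classification) - dropped)


print(label_h_phrases(["H400", "H410"]))  # ['H410']
print(label_h_phrases(["H400"]))          # ['H400']
```

Because the mapping is many-to-one, the classification cannot be recovered from the label alone; whether that justifies a separate H-phrase property for classification is exactly the open question in this exchange.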
@Wostr: OK, let's go step by step: we can start with the creation of a new property "safety classification and labelling", but I prefer to avoid any distinction between classification data and labelling data: labelling data is a reduced set of classification data. You always need to classify first, and then you can define the label. And to define the labelling data from the classification, you use rules, so creating two different properties will just lead to unnecessary work and data duplication.
For the problem of solutions, we can still discuss it, and even if you are right that we already have to filter to extract the data we want, I fear that adding all data about solutions to one item will just create a huge item which will be impossible to open and to use (just try to scroll down the item Germany (Q183)). By splitting the data into different items now, we will avoid potential problems later as the amount of data increases. Just think about WP articles which were split after several years due to the accumulation of information in one article. Snipre (talk) 22:40, 28 December 2017 (UTC)
I can't see what's in Germany (Q183), because my old laptop just freezes every time I try to scroll down... I suppose it's the problem of too much data in this item (so maybe you're right about splitting the data about solutions into many items). I'll try to add a 'safety classification and labelling' proposal to Wikidata:Property proposal/Natural science in the next few days. Wostr (talk) 23:48, 28 December 2017 (UTC)
@Wostr: So you understand my concern when adding all solutions data in only one item, and by the way I created the proposal for 'safety classification and labelling'. Feel free to modify the proposal if you have some comment about it. Snipre (talk) 00:23, 29 December 2017 (UTC)

Hydrates

As part of checking pl.wiki infoboxes data against WD, I'll have to add some items about hydrates. But I didn't find any info about how to link items about anhydrous form and hydrate form (also, how to link different hydrates to each other). There are some items about hydrates, but none of them uses any reasonable relation: there is only instance of (P31) = hydrate (Q462174) and different from (P1889) = anhydrous form (also, IMHO description of different from (P1889) makes this property inappropriate in the case of hydrates). I didn't find anything in the WikiProject archive discussions. In ChEBI they use has part (P527)/part of (P361) and instance of (P31) = hydrate (Q462174). Would such relation be okay? Wostr (talk) 19:22, 29 December 2017 (UTC)

For hydrates P527/P361 makes sense to me. For other compounds that might have more complex relations there may be a way to make such relations work indirectly; we've also recently had the properties polymer of (P4600) and monomer of (P4599) that express rather specific chemical relations, so it may be sensible to propose specific properties in other cases also. ArthurPSmith (talk) 03:29, 30 December 2017 (UTC)
I added a property proposal for 'hydrate of'/'anhydrous form of'. I think that having specific properties for hydrates may be advantageous. Wostr (talk) 21:35, 31 December 2017 (UTC)
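If the proposed 'hydrate of'/'anhydrous form of' pair is accepted, a bot or constraint check could verify that both directions agree. The sketch works on in-memory triples, not the Wikidata API; the property names follow the proposal but are hypothetical (no PIDs), and the example items are chosen only for illustration.

```python
# Hypothetical statements as (subject, property, value); property names follow the proposal.
STATEMENTS = [
    ("copper(II) sulfate pentahydrate", "hydrate of", "copper(II) sulfate"),
    ("copper(II) sulfate", "anhydrous form of", "copper(II) sulfate pentahydrate"),
    ("sodium carbonate decahydrate", "hydrate of", "sodium carbonate"),  # inverse missing
]

INVERSE = {"hydrate of": "anhydrous form of", "anhydrous form of": "hydrate of"}


def missing_inverses(statements):
    """Return the inverse statements needed to make every hydrate/anhydrous pair symmetric."""
    present = set(statements)
    missing = []
    for subject, prop, value in statements:
        inverse = (value, INVERSE[prop], subject)
        if inverse not in present:
            missing.append(inverse)
    return missing


print(missing_inverses(STATEMENTS))
# [('sodium carbonate', 'anhydrous form of', 'sodium carbonate decahydrate')]
```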