Wikidata talk:WikiProject Chemistry/Archive/2013

Organization question

Is it a problem if we keep English as the working language for this task force? Snipre (talk) 11:27, 14 February 2013 (UTC)

I think it might be a problem if we don't. Thanks again for starting this. Looks very good so far.--Saehrimnir (talk) 15:50, 14 February 2013 (UTC)

Base for one Data Record

Hi all, to be honest, up until now I was not involved with WikiData at all and I am not planning to change this very much. I just want to bring up one very basic but important question: On what should a data record in Wikidata be based?

To give you an example: de:Isoleucin - fr:Isoleucine - en:Isoleucine has one article in all language versions (and therefore just one WikiData interwiki data set). This article covers in all languages at least the two enantiomeric compounds: L-Isoleucine and D-Isoleucine. Both of them have different CAS numbers, PubChem entries and physical properties like specific rotation and melting points. In deWP the stereoisomeric compounds L-allo-Isoleucine and D-allo-Isoleucine are also subjects of the same article; in enWP, in addition, the two Isoleucines are part of the data table.

So what I want to point out is: To be very precise with the physical properties of stereoisomers, it will not be sufficient to have one data set per lemma; it must be one data set per stereoisomer (and, furthermore, if different isotopes are involved, one per isotope).--Mabschaaf (talk) 16:14, 14 February 2013 (UTC)

You are right about stereoisomers: data will have to be entered under the correct compound. For isotopes, I don't think that for large molecules it is possible to detect a large difference in physical properties. Snipre (talk) 19:48, 14 February 2013 (UTC)
Isotope-labelled compounds are rare in WP, but we should keep them in mind.
Far more important are compounds which are usually present as salts, like de:Ephedrin/fr:Ephedrine/en:Ephedrine, where some data relate to the hydrochloride, some to the sulphate and some to the hemihydrate. In deWP we try to handle this by adding a proper description to each value in the box; enWP does not display any further details in the box; frWP accounts for this only in the CAS/EINECS entries. In my opinion WikiData should have a data set for each of these different compounds. In other words: There should be a distinct data set for each isomer and each salt of each isomer. So one data set is clearly connected to a full substance name including stereochemical descriptors and counter ions/salts.--Mabschaaf (talk) 10:13, 15 February 2013 (UTC)

Policy

Proposition about general policy for data about chemicals and elements:

  • Data have to refer to the exact chemical/element defined by the item description.
    • For chemicals, a distinction has to be made between stereoisomers and mixtures of stereoisomers, i.e. data for a specific stereoisomer can't be added as a statement on the item describing a mixture of isomers
    • The same rule applies to salts, which have to be separated from the neutral form of the compound
    • If no item exists for the specific compound, please refer to the general policy of Wikidata to create the item
  • Data have to be referenced using the available structure. Referencing includes adding the conditions under which the data were measured, again using the available structure (qualifiers). The Chemistry task force defines the mandatory references that justify keeping a statement.
Please comment on this proposal. Thanks Snipre (talk) 20:18, 20 February 2013 (UTC)
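
A minimal sketch of how the proposed policy could be checked, for illustration only. The class names, field names and the add_statement helper are hypothetical and not part of Wikidata or any existing tool, and the statement values are placeholders rather than real data.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical stand-in for one statement about a substance."""
    prop: str
    value: str
    measured_on: str  # label of the exact substance the value was measured on
    reference: str = ""  # source of the value (mandatory under the proposal)
    conditions: dict = field(default_factory=dict)  # qualifiers, e.g. pressure


@dataclass
class SubstanceItem:
    """Hypothetical stand-in for an item describing one exact substance."""
    label: str
    statements: list = field(default_factory=list)


def add_statement(item, statement):
    """Accept a statement only if it follows the two proposed rules."""
    if statement.measured_on != item.label:
        # Rule 1: data must refer to the exact chemical the item describes.
        raise ValueError(
            f"value measured on {statement.measured_on!r} "
            f"cannot be added to item {item.label!r}"
        )
    if not statement.reference:
        # Rule 2: data must be referenced.
        raise ValueError("statement has no reference")
    item.statements.append(statement)
```

With this sketch, a value measured on L-isoleucine is accepted on an item labelled "L-isoleucine" but rejected on an item labelled "isoleucine" that describes the mixture of isomers.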

Classification

A classification has to be organised to describe the chemicals. The first division can be organic/inorganic. Then the question is how we can classify the compounds: by functional group? Does anyone know a classification for chemical compounds?

  • Organic chemical
    • Hydrocarbon
      • Alkane
      • Alkene
      • Alkyne
    • Carbonyl
      • Ketone
      • Aldehyde
      • ...
    • ...
  • Inorganic chemical
    • ...
  • ...

Snipre (talk) 21:16, 25 February 2013 (UTC)

Organic/Inorganic is really not very useful for classification. In deWP substances are classified by functional groups (as you proposed above) with chemical elements on the top level (Hydrocarbons are part of Hydrogen containing compounds and carbon containing compounds). Just take a look at de:Kategorie:Chemische Verbindung nach Element (should be easy to understand even for non-German speakers).--Mabschaaf (talk) 19:01, 1 March 2013 (UTC)
I am not clear whether this discussion is about the "description field" or a property "compound class". In the first case "chemical compound" should be sufficient; in the latter I agree that we should classify by functional group.--Saehrimnir (talk) 19:47, 1 March 2013 (UTC)

Is it possible to come up with a classification that is close to the enwp category tree?

en:Category:Chemical compounds

...

Mange01 (talk) 22:23, 1 March 2013 (UTC)

Have you read what I wrote? A discrimination between organic and inorganic is just historical, not systematic. Methane is inorganic (by definition!), Ethane organic? Seems not very logical.
My proposal would be: Come back to the roots! No complicated decision whether a compound is organic/inorganic or aromatic/aliphatic. It should be very easy, even for not highly sophisticated chemists. Start with the chemical formula: C containing compound, H containing compound, Na containing compound, etc. Maybe we could discuss using this according to the order of the Hill notation (top level: does the compound contain carbon atoms, second level: does it contain hydrogen, etc.). This would make the decision of how to classify pretty straightforward. --Mabschaaf (talk) 10:21, 12 March 2013 (UTC)
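
The element-based proposal above can be sketched in a few lines, as an illustration only. The function names are hypothetical, the parser assumes a flat formula without brackets or charges, and the ordering follows the usual Hill convention (carbon first, hydrogen second, remaining elements alphabetically; purely alphabetical if there is no carbon).

```python
import re


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C2H7N' (no brackets, no charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def hill_order(counts):
    """Order element symbols by the Hill convention."""
    elements = sorted(counts)
    if "C" not in counts:
        return elements  # no carbon: purely alphabetical
    head = ["C"] + (["H"] if "H" in counts else [])
    rest = [e for e in elements if e not in ("C", "H")]
    return head + rest


def classification_path(formula):
    """Return the element-based classes of a compound, top level first."""
    return [f"{e} containing compound" for e in hill_order(parse_formula(formula))]


print(classification_path("C2H7N"))
print(classification_path("NaCl"))
```

For C2H7N the top level is "C containing compound" and the second level "H containing compound"; a formula without carbon, such as NaCl, is simply sorted alphabetically.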
So before entering the details we have to focus on the basics: I propose to fix the property "instance of" with value "chemical compound" for all pure chemicals and property "instance of" with value "chemical substance" for mixtures of pure chemicals. Snipre (talk) 22:49, 21 March 2013 (UTC)
It seems that, unlike in German, en:Chemical substance also applies only to pure chemicals; at least IUPAC has defined it so. So it would be better to have chemical compound and chemical mixture.--Saehrimnir (talk) 16:36, 23 March 2013 (UTC)

For the classification according to functional groups we need two things: a property and a list of functional groups. For the property we can use "instance of" (Property:P31) again, use another existing property, or create a new property specific to chemical classification (like chemical family or chemical class). For the functional groups we need to define a list in order to give contributors an easy way to classify compounds themselves. Snipre (talk) 02:09, 29 March 2013 (UTC)

summary

See Wikidata:Chemistry_task_force/Tools#Classification_trees Snipre (talk) 02:18, 29 March 2013 (UTC)

Sounds good.--Saehrimnir (talk) 15:04, 29 March 2013 (UTC)

Infoboxes

Can the Chembox infobox parameters be used as property names? What parameters should be prioritized? See en:Template:Chembox. -- Mange01 (talk) 22:24, 1 March 2013 (UTC)

Look at Wikidata:Chemistry task force/Properties: we already compare the en, de, and fr chemboxes in order to extract the main properties. Snipre (talk) 07:52, 2 March 2013 (UTC)

How do we write chemical formulas in the Wikidata database?

Please give your opinion there. 178.237.94.235 11:53, 16 March 2013 (UTC)

Classifying chemicals with 'instance of' (P31) is incorrect

A few weeks ago, a bot added instance of (P31) claims to items about chemicals. This is problematic, since those items are not about instances. As explained in Help:Basic membership properties, P31 only applies to subjects that represent single, concrete things. For example, a particular molecule of ethylamine in a container on a lab bench would be an instance. Of course, Wikidata is not concerned with any one particular instance of ethylamine; it is interested in the class of thing called ethylamine.

This could be corrected by replacing those 'instance of' claims with subclass of (P279) claims. This notion is supported not only by the straightforward logic above, but also by the fact that ChEBI, the largest database of small chemical compounds that uses Semantic Web properties, uses 'subclass of' and not 'instance of' to classify compounds like ethylamine. (If you're interested and your computer can handle opening a ~137 MB file in a browser, then you can see for yourself in http://www.berkeleybop.org/ontologies/obo-all/chebi/chebi.owl.)

This should be fixable by a routine bot request. What are others' thoughts? Emw (talk) 02:09, 10 April 2013 (UTC)

Instance of definitely is the wrong property here. We might say that the decay events of some radioactive elements are instances of radioactivity, but I agree subclass of is the better property in general.--Jasper Deng (talk) 02:13, 10 April 2013 (UTC)
You assume that a molecule of ethylamine is different from the general concept of ethylamine. That is not right, because the properties of an amount of ethylamine are not different from those of a molecule of ethylamine. Instead of proposing a modification, it would be better to propose definitions of subclass and instance that are applicable to all cases, especially to countable and uncountable elements. For me a subclass has to have the properties of a class, so a class can contain classes and instances. The question is now whether a class which can contain only one type of instance is still a class. For me, making a distinction between one molecule of ethylamine and the concept of ethylamine is just brain mess. And the comparison with the ChEBI ontology is not correct because, from what I know, they have no concept below subclass. If you find a property similar to "instance of" in the ChEBI ontology I will agree with you; if not, Wikidata and ChEBI are not the same and the comparison cannot always hold. Snipre (talk) 11:39, 10 April 2013 (UTC)
@Emw The present definition of instance of is good for countable elements, but for uncountable elements you are creating an arbitrary distinction between one unique element and several identical elements. And if you want to push the details to the end: if you look at the property definitions, the item ethylamine is defined by specific properties, so there is no difference between one or several molecules. Look at the chemical formula and you will find C2H7N, which is the atomic composition of ONE molecule of ethylamine, and not C2nH7nNn, which is the atomic composition of n molecules of ethylamine. Snipre (talk) 12:17, 10 April 2013 (UTC)
The ChEBI ontology doesn't concern instances, it concerns only classes; thus one would not expect ChEBI to classify chemicals as instances. The sources for the 'instance of' (P31) and 'subclass of' (P279) properties are rdf:type and rdfs:subClassOf, which are both W3C recommendations for the Semantic Web. If you look in the (huge) chebi.owl file, you'll see that ChEBI describes all chemicals with rdfs:subClassOf, which corresponds to 'subclass of' (P279). Given that both ChEBI and P31/P279 represent structured data using the same W3C recommendation, I think ChEBI's decision to use 'subclass of' instead of 'instance of' is relevant to Wikidata.
More importantly, though, the argument for using 'subclass of' instead of 'instance of' to classify chemicals is a straightforward appeal to the meaning of those two ontological terms. The distinction between them is explained in Help:Basic membership properties, most closely for this case of chemicals by the example for 'quark'. It's the type-token distinction, which is the basis for differentiating classes (types) and instances (tokens).
I don't see how I'm making an arbitrary distinction between countable items and uncountable items -- Wikipedia has already done that. The Wikipedia article on ethylamine is clearly not about an individual molecule of ethylamine -- that is, the article is not about a single ethylamine molecule with a unique location in space and time. If it were, then the article would be about ethylamine as an instance. But the article is clearly about ethylamine as a class.
You entertain the question of whether a class that contains only instances that are identical except for their location in space and time is still a class. The answer is "yes". That's because an instance is fundamentally a thing with a unique location in space and time. While all instances of ethylamine might be exact copies of each other, they all occupy a different space and time. These molecules are each instances of a kind of thing (i.e. a class) called 'ethylamine'. This distinction can admittedly be a bit of a brain mess for certain subjects like chemicals. However, once the idea of an instance as a "spatiotemporal particular" is clear, cases like this become much easier to think about. Emw (talk) 04:03, 11 April 2013 (UTC)
An item is an instance of a class if it cannot be subdivided further without breaking its relation to the class. For example: USS Vincennes is an instance of Ticonderoga-class cruiser, but it is not an instance of Ship class, although Ticonderoga-class cruiser is an instance, not a subclass, of Ship class. Delta class submarine, however, is a subclass of Ship class since it subdivides into four different classes. The same principle applies to chemical substances: Ethanol is an instance, not a subclass, of alcohol. It is a subclass of molecule, but since each ethanol molecule is indistinguishable from another, subdividing them is quite pointless. /Esquilo (talk) 08:29, 16 April 2013 (UTC)
  • Just a small addition to what was said above: it is possible to create an item for each molecule of ethanol, but then the properties of a molecule item will be the same as for the substance item. Time and place properties are not relevant because even if you label a molecule with a name and put it back into a large amount of other molecules, you can't find it again. The substance item is the lowest subdivision you can make in chemistry in terms of identification. As the property "instance of" is the lowest classification level, we have to match it with the lowest chemical subdivision. Snipre (talk) 08:47, 16 April 2013 (UTC)
  • Esquilo, have you read Help:Basic_membership_properties? Whether 'instance of' or 'subclass of' is applicable for a given Wikidata item is determined by whether that item is an instance or a class. An instance is a token and a class is a type; please see type-token distinction if the distinction between 'instance' and 'class' is unclear. If you're still not convinced that classifying ethanol and other chemical compounds with 'instance of' is incorrect, please see my more detailed reply at the Help:Basic_membership_properties talk page. Emw (talk) 01:40, 17 April 2013 (UTC)
Actually I have not (finding guidelines on Wikidata is more difficult than on other Wikimedia projects), but the example of USS Nimitz and Nimitz-class aircraft carrier matches my description exactly. It is simply applied inheritance and polymorphism of the same kind that is used in object-oriented programming. The sentence from the talk page "homo sapiens is an instance of species, individual homo sapiens are not" is an even better example. /Esquilo (talk) 08:39, 17 April 2013 (UTC)
+1 for the programming concept. The classification relies on properties, not on conceptual distinctions: as position and time are not properties of an element, we cannot use them to make a distinction, even if it is possible in principle. A classification relies only on the properties you have in your classification, even if other classifications can do things differently. If you now create an item for an individual molecule of ethanol, and we don't specify its position at a certain time by adding new properties, there will be no difference from the property set of the item ethanol, so how do you differentiate a molecule from the concept? In terms of Wikidata classification you can't, so the conceptual distinction between a molecule and its concept is wrong (again, according to the classification used in Wikidata right now). Snipre (talk) 10:28, 17 April 2013 (UTC)
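
Since object-oriented programming came up in this thread, here is one way to picture the type-token reading in code. It is only an analogy and does not settle the disagreement above: the class names, the location and time attributes and their values are hypothetical, and Python's issubclass/isinstance only loosely resemble 'subclass of' (P279) and 'instance of' (P31).

```python
class ChemicalCompound:
    """Stand-in for the class 'chemical compound'."""


class Ethylamine(ChemicalCompound):
    """Stand-in for the class 'ethylamine', a kind of chemical compound."""

    formula = "C2H7N"  # class-level property, shared by every molecule

    def __init__(self, location, time):
        # The only data that tells one molecule (token) from another.
        self.location = location
        self.time = time


# Class-to-class relation, loosely analogous to 'subclass of' (P279).
print(issubclass(Ethylamine, ChemicalCompound))

# Token-to-class relation, loosely analogous to 'instance of' (P31).
molecule_a = Ethylamine(location="flask 1", time="t0")
molecule_b = Ethylamine(location="flask 2", time="t0")
print(isinstance(molecule_a, Ethylamine))

# Both tokens share every class-level property; only location and time differ.
print(molecule_a.formula == molecule_b.formula)
```

The sketch shows both sides of the argument at once: the two molecules are distinct objects of the same class, yet nothing except location and time distinguishes them.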

Chemical formula

Hi. Alunite (Q338106) has an end-member formula on rruff.info/ima/: KAl3(SO4)2(OH)6. I'm comfortable with this; the formula is similar to what I learned in school. De.wikipedia uses a different notation: KAl₃[(OH)₆|(SO₄)₂]. Is this ok? --Chris.urs-o (talk) 13:49, 15 May 2013 (UTC)

Normally there is a rule for writing chemical formulas, but right now I can't say what it is for inorganic compounds. 141.6.11.15 12:10, 16 May 2013 (UTC)
Are things moving? Is there a controversy? Or a new consensus building up? --Chris.urs-o (talk) 14:13, 16 May 2013 (UTC)
Square brackets for the "anion complex" of minerals are nice to have, but not really essential. The formatting is so limited in the current "chemical formula string" anyway that we might as well leave them out. Once more advanced math-typesetting datatypes become available we can reintroduce this concept. (It is also not always straightforward what should go into the anion complex: http://wwwchem.uwimona.edu.jm/courses/inorgnom.html). --Tobias1984 (talk) 14:25, 16 May 2013 (UTC)
Thanks, so de.wikipedia is right according to IUPAC rules. 141.6.11.15 15:49, 17 May 2013 (UTC)
They follow mineralienatlas.de, but I think that rruff.info/ima/ is sometimes more up to date. --Chris.urs-o (talk) 15:25, 18 May 2013 (UTC)
I think we need to add a qualifier to the chemical formula to give the method used to write the formula.
Right now we have
  • Hill formula for organic compounds
  • complex rules for complexes
  • inorganic rules for salts and inorganic acids
If you know other rules please add them. Snipre (talk) 16:17, 18 May 2013 (UTC)
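
A minimal sketch of what such a qualifier could look like as a data structure, for illustration only. The names Notation, FormulaStatement and formulas_for are hypothetical and do not correspond to any existing Wikidata property or datatype; the three notation values simply mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Notation(Enum):
    """Writing rules listed in this thread; more can be added."""
    HILL = "Hill formula (organic compounds)"
    COMPLEX = "rules for complexes"
    INORGANIC = "inorganic rules (salts and inorganic acids)"


@dataclass(frozen=True)
class FormulaStatement:
    """A formula string together with the rule used to write it."""
    item: str
    formula: str
    notation: Notation


def formulas_for(statements, item, notation):
    """Return the formulas of one item written in one given notation."""
    return [
        s.formula
        for s in statements
        if s.item == item and s.notation is notation
    ]


statements = [FormulaStatement("ethylamine", "C2H7N", Notation.HILL)]
print(formulas_for(statements, "ethylamine", Notation.HILL))
```

Keeping the notation next to the string would let an item carry several formulas, one per writing rule, without them being treated as conflicting values.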

Just thinking...

...that you may be interested in this. --Ricordisamoa 05:41, 30 May 2013 (UTC)

Classification ... again

I am trying to match Wikidata items for chemicals (around 4500) with their PubChem ID in order to extract different data from the PubChem database. But I have some problems defining some chemical entities. To list the chemicals present in Wikidata I use instance of (P31) = chemical compound (Q11173) or a subclass of chemical compound (Q11173). By doing that I found some radicals and some mixtures of chemicals (isomer mixtures or substance mixtures) defined as instance of (P31) = chemical compound (Q11173). So I propose to reserve the use of instance of (P31) = chemical compound (Q11173) for a unique molecule (no mixture of different compounds) and for a unique isomer (no mixture of different isomers). Radicals and ions are not considered chemical compound (Q11173).

Chemical entity | Example | instance of (P31) | subclass of (P279) | Properties
Isomer mixture | butanol (Q663902) | - | chemical compound (Q11173) | ?
Simple isomer | butan-1-ol (Q16391) | chemical compound (Q11173); butanol (Q663902) | - | ?
Simple isotope | dideuterium (Q6419441) | ? | ? | ?
Radical | methyl (Q4407) | radical (Q185056) | - | ?
Anion | carbonate (Q181699) | anion (Q107968) | - | ?
Cation | ammonium cation (Q190901) | cation (Q326277) | - | ?
Allotrope | diamond (Q5283) | chemical compound (Q11173); carbon | - | ?
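
The listing step described above (instance of (P31) = chemical compound (Q11173) or one of its subclasses) can be sketched on a small in-memory graph. This is an illustration only: the function names are hypothetical, nothing here queries Wikidata, and the toy data mirrors a few rows of the proposed table rather than the live state of the items.

```python
from collections import deque

CHEMICAL_COMPOUND = "Q11173"


def subclasses_of(root, subclass_of):
    """Return root plus every class that reaches it through P279 links."""
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def list_chemicals(instance_of, subclass_of, root=CHEMICAL_COMPOUND):
    """List items whose P31 value is the root class or one of its subclasses."""
    classes = subclasses_of(root, subclass_of)
    return sorted(
        item for item, values in instance_of.items() if classes & set(values)
    )


# Toy data following the proposal in the table (item -> values).
instance_of = {
    "Q16391": ["Q11173", "Q663902"],  # butan-1-ol
    "Q4407": ["Q185056"],             # methyl -> radical
    "Q181699": ["Q107968"],           # carbonate -> anion
}
subclass_of = {
    "Q663902": ["Q11173"],            # butanol (isomer mixture)
}

print(list_chemicals(instance_of, subclass_of))
```

Under the proposed assignments the radical and the anion no longer appear in the list, and butanol appears only as a class, which is the effect the proposal aims for.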

To solve the problem of allotropes and isotopes, we need to create an intermediate item between element/chemical compound and unique isotope/allotrope:

Can I1 and I2 be the same item ? Snipre (talk) 11:39, 28 September 2013 (UTC)

Source definition

Please look at this proposal for sourcing the use of ATC code (P267). Snipre (talk) 18:56, 13 October 2013 (UTC)
