Wikidata talk:WikiProject Chemistry

Old discussions are archived in Archive 2013, Archive 2014, Archive 2015, Archive 2016, Archive 2017, Archive 2018, Archive 2019.

A lot of duplicate data

For several weeks now, a lot of duplicate data has been generated. I don't want to blame anyone; I just want to remind everyone that a check is necessary after a merge or an addition of data.

See constraint violation reports for

Most of those problems are corrected after some days, but please have a look. Snipre (talk) 14:30, 11 November 2019 (UTC)

Mmmm... a lot of new chemical entries with very minimal information and indeed many duplicate CAS registry numbers. Not so happy about this either. It has been brought up, but it's not clear what the status of resolving the problems is. --Egon Willighagen (talk) 15:02, 22 November 2019 (UTC)
The current situation is that we have a lot of duplicates and we have to merge them manually. The format of the CAS numbers in these new items has been corrected, so some items can be merged quickly. However, because some chemical compounds may have more than one CAS number, there may be items that are in fact duplicates but won't show up on any constraint violation list, and it will be hard to find those duplicates. Wostr (talk) 16:00, 22 November 2019 (UTC)
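As a side note to the format corrections: whether a CAS Registry Number is well-formed, including its check digit, can be verified mechanically before a merge or import. The sketch below is a minimal Python illustration, assuming the usual CAS check-digit rule (the digits before the check digit are weighted 1, 2, 3, ... starting from the right, summed, and taken modulo 10). It only catches malformed numbers; it cannot tell whether two valid numbers describe the same compound, which is the harder case described above.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost digit
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check

for number in ["7732-18-5", "4996-48-9", "1334-82-3", "7732-18-6", "773218-5"]:
    print(number, is_valid_cas(number))
```

The first three numbers (water and the two furoate CAS numbers discussed further down this page) pass; the fourth has a wrong check digit and the fifth is missing a hyphen.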
Note the conflict reports are somewhat behind. Also, I went through all InChIKey duplicates and had to leave those pairs that were tautomers (I marked them), because InChIKeys for tautomers apparently can be (are?) identical. The actual numbers from fresh queries (distinct = distinct-values violations, single = single-value violations) are:
  • InChI: distinct 18 (report 5+1), single 26 (report 33)
  • CAS: distinct 400 (report 536), single 87 (report 91+8)
  • InChIKey: distinct 27 (report 28+2), single 26 (report 32)
With this query I count 13 tautomer pairs that have identical InChIKeys, so I'll go through the others again:
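SELECT DISTINCT ?item1 ?item1Label ?item2 ?item2Label ?value
{
  ?item1 wdt:P235 ?value .    # item 1 has InChIKey ?value
  ?item2 wdt:P235 ?value .    # item 2 has the same InChIKey
  ?item1 wdt:P6185 ?item2 .   # item 1 is "tautomer of" item 2
  # skip self-pairs and keep only the ordering with the smaller IRI
  FILTER( ?item1 != ?item2 && STR( ?item1 ) < STR( ?item2 ) ) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}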
--SCIdude (talk) 17:08, 23 November 2019 (UTC)
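The "distinct" and "single" counts above correspond to the distinct-values and single-value constraints. A minimal Python sketch of how such counts could be reproduced offline from a list of (item, value) statements, and how pairs already linked by tautomer of (P6185) could be set aside, as was done by hand above. The item IDs and values in the example are hypothetical placeholders, not real Wikidata data.

```python
from collections import defaultdict

def constraint_violations(statements):
    """Split (item, value) pairs into distinct-values and single-value violations."""
    items_by_value = defaultdict(set)
    values_by_item = defaultdict(set)
    for item, value in statements:
        items_by_value[value].add(item)
        values_by_item[item].add(value)
    # distinct-values constraint: the same value used on more than one item
    distinct = {v: sorted(i) for v, i in items_by_value.items() if len(i) > 1}
    # single-value constraint: one item carrying more than one value
    single = {i: sorted(v) for i, v in values_by_item.items() if len(v) > 1}
    return distinct, single

def drop_tautomer_pairs(distinct, tautomer_pairs):
    """Remove duplicate values whose items are exactly one known tautomer pair."""
    known = {frozenset(pair) for pair in tautomer_pairs}
    return {v: items for v, items in distinct.items()
            if frozenset(items) not in known}

# Hypothetical data: Q1/Q2 share a key and are tautomers, Q3/Q4 share one and are not
statements = [("Q1", "key-a"), ("Q2", "key-a"), ("Q3", "key-b"),
              ("Q4", "key-b"), ("Q5", "key-c"), ("Q5", "key-d")]
distinct, single = constraint_violations(statements)
print(drop_tautomer_pairs(distinct, [("Q2", "Q1")]))  # {'key-b': ['Q3', 'Q4']}
print(single)                                         # {'Q5': ['key-c', 'key-d']}
```

Using frozenset makes the tautomer check independent of the direction in which the pair was stated.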
Standard InChI/InChIKey is identical for tautomers, but the InChI software can produce non-standard versions of InChI/InChIKey. However, I don't know of any software that can easily generate non-standard InChIs. If we had one, we could change single value constraint (Q19474404) in InChI (P234) to single-best-value constraint (Q52060874), or better, update it with separator (P4155), so that we could have both InChIs in one item with a qualifier that describes whether it's a standard or non-standard InChI. Wostr (talk) 18:21, 23 November 2019 (UTC)
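Whichever constraint is chosen, telling the two flavours apart needs no chemistry toolkit: a standard InChI starts with InChI=1S/ while a non-standard one starts with InChI=1/, and in an InChIKey the character right after the first eight letters of the second block is S for standard and N for non-standard. A minimal Python sketch that reads this flag (for example, to decide which qualifier a value should get); it checks the layout only and does not validate the hash itself.

```python
import re

# InChIKey layout: 14 letters, hyphen, 8 letters + flag (S/N) + version letter, hyphen, 1 letter
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}([SN])[A-Z]-[A-Z]$")

def inchikey_kind(key: str) -> str:
    """Return 'standard' or 'non-standard' from the InChIKey flag character."""
    match = INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    return "standard" if match.group(1) == "S" else "non-standard"

def inchi_kind(inchi: str) -> str:
    """Standard InChIs start with 'InChI=1S/', non-standard ones with 'InChI=1/'."""
    if inchi.startswith("InChI=1S/"):
        return "standard"
    if inchi.startswith("InChI=1/"):
        return "non-standard"
    raise ValueError(f"not a version 1 InChI: {inchi!r}")

# Water: standard InChI and standard InChIKey
print(inchi_kind("InChI=1S/H2O/h1H2"))               # standard
print(inchikey_kind("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # standard
```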
The Chemistry Development Kit (Q2383032) can do this. I can make a script for this. --Egon Willighagen (talk) 20:28, 23 November 2019 (UTC)
How could it work, i.e. how could we generate non-standard InChIs/InChIKeys with it? (I'm not very good at technical things; is it software that anyone can run?) Wostr (talk) 15:49, 30 November 2019 (UTC)

Tautomer/zwitterion

@Wostr, Egon Willighagen, SCIdude: By creating dedicated items for different tautomers or zwitterionic forms, and adding all identifiers to all tautomer/zwitterion forms, we are generating constraint violations for most identifiers. How can we handle that problem?

Some solutions:

  1. Put all constraint violations related to tautomers and zwitterions in the exception list.
  2. Among the different forms, and according to a defined set of criteria, choose one form which will become the chemical compound; the other forms will be defined as instances of tautomer/zwitterion. All identifiers that do not specify a particular form will be linked to the chemical compound item, together with all general properties.

In my opinion the second one is the best, because it avoids working with two items at the same time: most of the time we only have data for the undefined tautomer/zwitterion form. Snipre (talk) 15:01, 30 November 2019 (UTC)

We are not the only ones having separate entries, ChEBI has too, so with solution 2 you need to decide which ChEBI id to link or get constraint violations with two ids. You could also remove some constraints as a different solution. --SCIdude (talk) 15:20, 30 November 2019 (UTC)
While I am not arguing against the concerns, I have mixed feelings about not allowing tautomers and zwitterions. Tautomers in particular have different physchem properties, and even zwitterions can be linked to experimental data (e.g. crystal structures). I also do not currently have a good suggestion. One issue is that tautomers are ill-defined, particularly in the context of the InChI(Key), where the algorithm has its limitations. --Egon Willighagen (talk) 15:32, 30 November 2019 (UTC)
@Egon Willighagen: Nobody was proposing to ban the creation of items for tautomers or zwitterions: the discussion is about finding a good way to integrate those particular cases in WD. Snipre (talk) 16:29, 30 November 2019 (UTC)
  • Which IDs cause constraint violations? I know that InChI/InChIKey does, but that problem requires finding a way to generate non-standard InChI/InChIKey. The standard InChIs/InChIKeys should be present in both items – neither InChI nor InChIKey is 100% unique for chemicals. For zwitterions: instance of (P31) zwitterion (Q245115)/ subclass of (P279) zwitterion (Q245115) should always be present, and for zwitterions you can tell whether an ID refers to the neutral or the zwitterionic form by the SMILES or systematic name, for example. For carbohydrates (chain/ring structure): the same; InChIs are different, SMILES are different, even systematic names are different. For compounds with mobile H: usually the same.
    The real problem with tautomers is the InChI/InChIKey, but that's not only our problem, it's the problem of the standard configuration of InChI software and it's a known issue that is solvable by generating non-standard InChI/InChIKey. Then we only have to decide what to do with two InChI values in one item (deprecate StdInChI, prefer NonStdInChI etc.). Wostr (talk) 15:45, 30 November 2019 (UTC)

@Wostr, Egon Willighagen, SCIdude: I will try another approach:

Zwitterion case:

Always consider the neutral form as the chemical compound form. A second item for the zwitterionic form can be created with the following properties:

Neutral form:
  • instance of: chemical compound
  • all IDs and properties for the neutral form, for mixtures of the neutral and zwitterionic forms, or for the undefined form (Std InChI and InChIKey)
Zwitterion:
  • instance of: zwitterion
  • IDs and properties only for the zwitterionic form (non-standard InChI and InChIKey)

Tautomer case:

The form that is most stable, or present in excess, under standard conditions is defined as form A. The other form is defined as form B.

Form A:
  • instance of: chemical compound
  • all IDs and properties for form A, for mixtures of forms A and B, or for the undefined form (Std InChI and InChIKey)
Form B:
  • instance of: tautomer
  • IDs and properties only for form B (non-standard InChI and InChIKey)

Snipre (talk) 16:42, 30 November 2019 (UTC)

No objection from me. Implementation of the zwitterion case can be automated if the compounds are in ChEBI (ChEBI explicitly names zwitterions). Additionally, a metaclass "class or group of zwitterions" may be needed, ChEBI has a hierarchy for them. --SCIdude (talk) 17:05, 30 November 2019 (UTC)
I can't agree with everything above. StdInChI is valid for both forms (neutral and zwitterionic), and we should find a way to model this properly; non-standard InChI is an addition that may help distinguish the forms, but it is not a substitute. instance of (P31) tautomer (Q334640) on only one tautomer is also wrong: both are tautomers of each other in the same way. Also, as tautomer of (P6185) is present, I don't think we need to explicitly classify compounds as tautomers (similarly, we don't classify compounds as stereoisomers); both should be classified according to their structure etc. I can agree with the part 'all IDs and properties for form A, for mixtures of A and B forms or undefined form', with an exception for cases (if there are any) where an ID clearly distinguishes form A/form B/mixture of A and B/undefined form. Also, there may be situations where we should keep an ID with a deprecated rank in one item and have it in a second item with a normal rank. 'Additionally, a metaclass "class or group of zwitterions" may be needed': this is not needed. The zwitterionic form has a charge of 0, so I don't think we need to classify zwitterions differently from other chemical compounds (only instance of (P31) zwitterion (Q245115)/ subclass of (P279) zwitterion (Q245115)). Wostr (talk) 20:29, 30 November 2019 (UTC)
@Wostr: The problem of the StdInChI applies to most identifiers, so why do we have to treat StdInChI in a particular way? We have to find a solution for all identifiers.
Then can both tautomers be chemical compounds, or will tautomer be a subclass of chemical compound? This is more critical in terms of ontology.
In any case, we can't treat both tautomers in the same way, or we will have to create a third item for the undefined tautomer. Snipre (talk) 11:46, 1 December 2019 (UTC)
tautomers in the same way – in regards to classification; classifying only one tautomer as tautomer is not correct, classifying both seems redundant to me (these items already have tautomer of (P6185)). I asked, which IDs are causing problems similar to InChI/InChIKey? Because I think most of the problems can be solved only by checking the data in the source: we have DTXSID50274234 in pyridine-3,4-diol (Q74411505) and 3-hydroxypyridin-4(1H)-one (Q27891533), but the source clearly states the IUPAC name, has structure shown, has SMILES. If we have a real problem in which the source has e.g. IUPAC names for both tautomers, SMILES for both etc., we can either move the IDs to the prevalent form, or (IMHO better option) deprecate the IDs in the less common form with proper reason for deprecation (P2241). Wostr (talk) 15:01, 1 December 2019 (UTC)
@Wostr: This is perhaps not correct in an ideal classification, but we need a pragmatic solution. So please provide a complete answer to my question: how do you plan to link the tautomers to higher classes? Do you plan to define both tautomers as instances of chemical compound or of some subclass of chemical compound? This is not correct, because the two tautomers are not different chemical compounds.
And following your proposal for IDs, the IDs for the same chemical will be split between two items, thus reducing the ability to connect external databases through a unique WD item, especially when external databases do not define different IDs for tautomers. Snipre (talk) 14:43, 13 December 2019 (UTC)
@Snipre: I thought I had answered this, but apparently it was not saved. I don't think we need any special solution for tautomers with regard to their classification; any tautomer should be classified according to its structure and/or other qualities. E.g. carbohydrates have 'group of isomers' items, which can be linked to carbohydrates (there could also be links to specific classes of heterocyclic compounds for the closed-ring forms and to aldehydes/ketones for the open-chain forms etc.). In Wikipedias there was always a problem with categories for compounds having different tautomeric forms: which category should be assigned? Here we can assign different classes to different tautomeric forms. 'This is not correct because both tautomers are not different chemical compounds': this is not so obvious; tautomers are defined simply as 'isomers' with one specific feature, namely that they are 'readily interconvertible'. 'this means we will have for the same chemical a splitting of the IDs between 2 items, this reducing the capacity of connections of external databases through an unique WD item': we already have this in items for which an external, reliable source incorrectly gave an ID that is correct for another chemical compound (such a statement is deprecated in WD, but the ID still exists in two items). This is not something that should occur frequently, but it is unavoidable. We just have to limit this to cases where it is necessary and mark such statements clearly (qualifier, rank). Wostr (talk) 17:42, 5 January 2020 (UTC)
@Wostr:
  • This is not correct because both tautomers are not different chemical compounds — this is not so obvious, tautomers are defined simply as 'isomers' with one specific feature that are 'readily interconvertible'.
This is not correct if you consider the fact that chemical compound is a subclass of chemical substance and if you consider the definition of chemical substance: "Matter of constant composition best characterized by the entities (molecules, formula units, atoms) it is composed of. Physical properties such as density, refractive index, electric conductivity, melting point etc. characterize the chemical substance."
Using the inheritance property of the subclass relation, a chemical compound should have defined physical properties. Isolated tautomers don't exist, but in most cases the equilibrium between tautomers favors one form. Based on that reasoning, I continue to say that one form is a chemical compound, the most thermodynamically stable one, because the measured properties mainly result from that form, and the second form should only be defined as a tautomer, because it is a kind of hypothetical chemical compound (it exists, but cannot be isolated). As a simple rule, for keto-enol tautomers, we should define the keto tautomer as the chemical compound and the enol as a tautomer only, as the keto form is the most stable.
  • this means we will have for the same chemical a splitting of the IDs between 2 items, this reducing the capacity of connections of external databases through an unique WD item – we already have this in items for which an external, reliable source incorrectly gave an ID which is correct for other chemical compound (such statement is deprecated in WD, but still an ID exists in two items)
This approach just propagates errors and inconsistencies. Wikidata is not only a simple compilation of data; it should generate an ontology and be able to provide logic for machines. This means not only observing and marking errors, but also trying to correct them by alerting the databases and bringing the problem to their attention. Snipre (talk) 04:26, 5 March 2020 (UTC)
Ad 1: By your own argumentation, your proposal that one tautomer should be an instance of chemical compound and the other(s) instance(s) of tautomer is not correct, because the chemical compound, being a hybrid (in fact a mixture) of tautomers, should be an instance of chemical compound and every tautomer an instance of tautomer, as you never have a 100% pure substance composed of only one tautomer. Applying your proposal to simple annular tautomers may seem simple, but in which phase/conditions do you want to determine which tautomer is prevalent? It seems not so simple for carbohydrates: for open chain-ring tautomers, which one is prevalent and why?
Ad 2: As I said, having the same IDs in more than one item is unavoidable (not necessarily for tautomers, but in general), so if there is a need in a particular item describing tautomer to add ID that is added somewhere else, the only thing we should care about is to properly describe the situation using qualifiers and ranks. Wostr (talk) 13:37, 5 March 2020 (UTC)
@Wostr: Concerning your Ad1: if we are not able to isolate one form then there is no reason to create 2 items, one for each form and none should be defined as instance of chemical compound. I was not the one creating items without having a correct data model to propose, so I think we should merge all tautomers: we eliminate the problem of the constraint violations and we keep a coherent data model. Snipre (talk) 19:21, 27 May 2020 (UTC)
Merging items about tautomers is not an option IMHO. That would be a nuclear option. Wostr (talk) 23:08, 27 May 2020 (UTC)
@Wostr: Your argumentation is very impressive:
What do we lose by merging? Nothing, because we can always store the data in one item. If each tautomer can't be isolated, then they can't be considered chemical compounds (just have a look at the definition of chemical substance, which is the superclass of chemical compound: the notion of defined physical properties is mentioned): if we can't isolate the tautomer and perform some physical measurement, then it is not a chemical substance and not a chemical compound. This means we will have 3 items: one for each tautomer, defined as an instance of hypothetical chemical compound (Q50308749) or tautomer (Q334640) but not of chemical compound (Q11173), and a third one which is a kind of chemical with undetermined structure but which meets the criteria of the chemical substance definition (defined chemical composition and measured physical properties). But tautomer items could not use properties like InChIKey (not specific to a tautomer) or other physical properties. This will just add mess to the current situation during data import from other databases. Snipre (talk) 10:46, 2 June 2020 (UTC)
My argumentation reflects my desire to discuss this further: I really see no point, as I don't like to write and read the same argumentation over and over. We lack too many things to proceed with this topic: a proper classification of compounds that can serve as a basis for including tautomers, and participants in this project and this discussion, because without more participants I don't think there will be any sort of consensus here. Ad rem: some tautomers can be isolated, some cannot. We can't isolate some chemical compounds, but we still have items about them; also, we can't isolate some ions, but we still have items about them. InChI and InChIKey can be properly assigned to every tautomer, but it won't be StdInChI/StdInChIKey. As I wrote above, I don't think that assigning the same StdInChI to more than one item while also having Non-StdInChI in these items is a problem. Quite the opposite: in ChemSpider, for example, you have such situations. Wostr (talk) 14:58, 2 June 2020 (UTC)

Non-standard InChI

ChemSpider does have non-standard InChIs/InChIKeys (I don't know, however, with which options), but there are no entries for tautomers (at least not for the few I checked). Wostr (talk) 22:40, 10 December 2019 (UTC)

GZWDer added all (most?) of the US EPA CompTox dashboard

Hi all, GZWDer (talk • contribs • logs) copied in more or less the full CompTox Chemistry Dashboard (Q26998510), which brings in some 800 thousand new DSSTox substance ID (P3117) statements. Along with these, the number of CAS registry numbers has grown to more than 800 thousand. Let's see how that goes with Chemical Abstracts. Currently, molecular formula, mass, and SMILES information is missing, but I can write a script tomorrow to generate QuickStatements that add the missing info (using PubChem to convert the InChIKey to SMILES). Please don't do this manually. --Egon Willighagen (talk) 08:53, 30 January 2020 (UTC)

This is insane... [4]: +533 685 bytes.... Wostr (talk) 23:14, 30 January 2020 (UTC)
@GZWDer: how do you propose to resolve this? --SCIdude (talk) 09:43, 31 January 2020 (UTC)
Hi all, after a long quarantine (well, ongoing), but without having had a holiday, I started adding missing SMILES. I'm currently doing the easy ones: InChIKeys that have the SMILES of a single molecule (no salts) with full stereochemistry defined. The workflow is as follows: get a SMILES from PubChem using the InChIKey, use the CDK to recalculate the InChIKey, and proceed only if they match. This (chiral) SMILES is then taken to a second step in which the SMILES is searched for in Wikidata by a match on the InChIKey (again, with the same CDK) and with the PubChem CID (so, some redundant work, but it allows me to work with already proven code; see https://github.com/egonw/ons-wikidata/tree/master/Wikidata). This creates QuickStatements. This way, I've "resolved" some 20 thousand of the 800 thousand issues (at the time of writing). This is going to take some time, and there is room for improvement. One thing I started working on, which will improve performance, is outputting v2 QuickStatements. I'm finishing a last round with v1 QuickStatements, but the next one should be with v2. --Egon Willighagen (talk) 08:10, 26 July 2020 (UTC)
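The round-trip check in this workflow (accept a SMILES only if recalculating its InChIKey gives back the key already on the item) can be sketched in a few lines of Python. This is not the code in the ons-wikidata repository, which uses the CDK: here the InChIKey calculation is passed in as a function, compute_inchikey, a hypothetical stand-in for whatever toolkit is used, and isomeric SMILES (P2017) is assumed to be the target property. The output follows the QuickStatements v1 layout: one tab-separated statement per line, with string values in double quotes.

```python
from typing import Callable, Iterable, List, Tuple

def smiles_quickstatements(
    candidates: Iterable[Tuple[str, str, str]],
    compute_inchikey: Callable[[str], str],
) -> List[str]:
    """Build QuickStatements v1 lines for SMILES that pass the InChIKey round trip.

    Each candidate is (qid, inchikey_on_item, smiles_from_pubchem).
    """
    lines = []
    for qid, inchikey, smiles in candidates:
        # Only accept the SMILES if it regenerates the item's own InChIKey
        if compute_inchikey(smiles) != inchikey:
            continue
        lines.append(f'{qid}\tP2017\t"{smiles}"')
    return lines

# Hypothetical stand-in for a real toolkit: a lookup table instead of a calculation
known_keys = {"O": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"}
candidates = [
    ("Q283", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "O"),    # water: round trip matches
    ("Q283", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "CCO"),  # mismatch: skipped
]
for line in smiles_quickstatements(candidates, lambda s: known_keys.get(s, "")):
    print(line)
```

Keeping the key calculation as a parameter means the same filter works whichever toolkit produces the InChIKey, as long as it is the same one used for the comparison.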
Okay, playing with v2 did not help. The code is updated for it, but it has a number of limitations: 1. it doesn't do sparse data well (v2 is tabular, so you get a lot of empty cells); 2. it still does not group edits for a single item (I think this was already known, but now I've seen it with my own hands). So, I reverted back to v1 QuickStatements. By now, I've added missing info for another 100 thousand items, and the number of Wikidata items with InChIKey but no SMILES is now below 700 thousand. --Egon Willighagen (talk) 05:38, 13 August 2020 (UTC)

(Topic continued at 604 duplicate InChIKeys)

New property proposals

I have proposed some new identifier properties. Comments are welcome.--GZWDer (talk) 04:41, 28 March 2020 (UTC)

Difference between CAS numbers

Hi, we have two similar items, pentyl 2-furoate (Q27269583) and furancarboxylic acid pentyl ester (Q72479642). The only difference is the CAS numbers: 4996-48-9 and 1334-82-3. Reaxys has two entries too, but no clear explanation of the difference. Does anyone have an idea why there are two CAS numbers? Thanks Snipre (talk) 11:22, 12 June 2020 (UTC)

The first CAS specifies synonyms with the acid on 2-position, the second does not. --SCIdude (talk) 14:55, 12 June 2020 (UTC)
I have checked them in SciFinder. 4996-48-9 is Pentyl 2-furoate or 2-Furancarboxylic acid, pentyl ester. 1334-82-3 is Amyl furoate or Furancarboxylic acid, pentyl ester. The former is one of the isomers of the latter. --Leiem (talk) 13:33, 17 June 2020 (UTC)
@Leiem: Thank you for your answer. So if I understand, 1334-82-3 is for mixtures of pentyl 2-furoate and pentyl 3-furoate. Snipre (talk) 19:15, 14 July 2020 (UTC)
Yes. --Leiem (talk) 11:54, 15 July 2020 (UTC)
  Done Snipre (talk) 11:35, 27 July 2020 (UTC)

Introduction round

Dear all, User:Wiljes and I would like to introduce ourselves, our backgrounds, and our interests in how to contribute to Wikidata in the context of WikiProject Chemistry. Would you all be fine with doing that on a separate page (e.g. Wikidata:WikiProject_Chemistry/Who we are, linked from the main page, i.e. Wikidata:WikiProject_Chemistry)? Would you also be interested in that? Thanks for your opinion! --Robert Giessmann (talk) 11:20, 17 June 2020 (UTC)

Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
  Notified participants of WikiProject Chemistry

@Rgiessmann: Well it's probably best to introduce your background etc. on your user page here - I see User:Wiljes has done that nicely. You could also link to your Scholia page there, if you have one, which would provide links to your publications etc. As to "interests on how to contribute (in this context)" I think just another post right here would be fine, no? I don't think we need another page to check! ArthurPSmith (talk) 18:59, 17 June 2020 (UTC)
@Rgiessmann: and you can also add your ORCID to your profile page, as I have done on my user page. --Egon Willighagen (talk) 10:03, 26 July 2020 (UTC)

Q5173335

The Wikipedia article seems to be about a group of compounds instead of a specific one. --GZWDer (talk) 02:59, 20 June 2020 (UTC)

Note that some WP articles are about Kortistatin A and some about the group. --SCIdude (talk) 07:32, 20 June 2020 (UTC) Resolved.

604 duplicate InChIKeys

(continued from GZWDer added all (most?) of the US EPA CompTox dashboard)

Just a note that we are at 604. Wasn't it below 200 half a year ago? --SCIdude (talk) 07:12, 11 July 2020 (UTC)

@SCIdude: There was a huge data import some months ago from the DSSTOX database, in which a lot of InChIKey duplicates exist. The reason is that DSSTOX created many entries from ChemIDplus, where several entries exist for the same InChIKey but with different CAS numbers. So DSSTOX prefers to ensure a unique entry per CAS number, even if this generates InChIKey duplicates. The origin is a poor definition of chemicals in ChemIDplus, where racemates or some stereoisomers were not correctly identified.
I have no contact for the DSSTOX database, and my emails never got any feedback concerning how to clean up the DSSTOX database. Snipre (talk) 12:26, 12 July 2020 (UTC)
Thanks for the background. Still, it is easy to check whether an InChIKey already exists, so the person doing the import had no idea what s/he was doing and should be stopped from running bots, in general. --SCIdude (talk) 08:05, 13 July 2020 (UTC)
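A check of this kind is cheap to do before an import. The Python sketch below assumes the existing InChIKey and CAS values have already been fetched into two dictionaries (for example from a SPARQL query); it matches each incoming record on InChIKey first and CAS second, and registers every value it plans to add, so that duplicates within the same batch are caught as well. The record layout and the placeholder targets such as new-1 are hypothetical. Records matched on InChIKey but carrying a different CAS number would still deserve a manual look, as the cases further down this page show.

```python
def plan_import(records, known_inchikeys, known_cas):
    """Return (action, target, record) triples for a batch of records to import.

    records: dicts with optional 'inchikey' and 'cas' values.
    known_inchikeys / known_cas: value -> item ID maps; updated in place so that
    duplicates *within* the batch are detected too.
    """
    plan = []
    new_items = 0
    for record in records:
        key, cas = record.get("inchikey"), record.get("cas")
        # Match on InChIKey first, then fall back to CAS
        target = known_inchikeys.get(key) if key else None
        if target is None and cas:
            target = known_cas.get(cas)
        if target is None:
            new_items += 1
            target = f"new-{new_items}"  # placeholder for an item still to be created
            action = "create"
        else:
            action = "add-to"
        # Remember the values so later records in the batch find this item
        if key:
            known_inchikeys.setdefault(key, target)
        if cas:
            known_cas.setdefault(cas, target)
        plan.append((action, target, record))
    return plan

# Hypothetical data: KEY-A already exists on Q1; KEY-B arrives twice with two CAS numbers
existing_keys = {"KEY-A": "Q1"}
existing_cas = {}
batch = [{"inchikey": "KEY-A", "cas": "CAS-1"},
         {"inchikey": "KEY-B", "cas": "CAS-2"},
         {"inchikey": "KEY-B", "cas": "CAS-3"}]
for action, target, record in plan_import(batch, existing_keys, existing_cas):
    print(action, target, record["cas"])
```

In this example the second KEY-B record is attached to the item planned for the first one instead of producing a second item with the same InChIKey.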
@SCIdude:@Snipre: I am the project lead for the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard), which is the community-facing website for the DSSTox database. I have requested a dump file of the latest release to look at the duplicates. There are various reasons this can happen, including InChIs not having advanced stereochemistry: while the V3000 molfile may have stereochemistry different from another stereovariant, the InChIs generated from them become equivalent. So chemicals can have different names, CASRNs and V3000 mols but the same InChIKey. --Antony Williams 23:22, 26 July 2020 (UTC)
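When triaging such duplicates it can help to look at the blocks of the InChIKey: the first 14 characters hash the molecular skeleton (formula and connectivity), the first eight characters of the second block hash the remaining layers such as stereochemistry and isotopes, and UHFFFAOY appears there when there are no such layers. Two items sharing the first block but not the second differ only in those layers; identical keys may still hide stereo features that InChI does not capture, as described above. A minimal Python sketch, using L-alanine and alanine with unspecified stereochemistry as the example; the function names are illustrative only.

```python
def split_inchikey(key: str):
    """Split an InChIKey into skeleton block, remaining-layers hash and protonation."""
    skeleton, second, protonation = key.split("-")
    return skeleton, second[:8], protonation

def compare_inchikeys(key1: str, key2: str) -> str:
    """Roughly classify how two InChIKeys relate, based on their blocks."""
    skel1, layers1, _ = split_inchikey(key1)
    skel2, layers2, _ = split_inchikey(key2)
    if key1 == key2:
        return "identical keys"
    if skel1 != skel2:
        return "different skeleton"
    if layers1 != layers2:
        return "same skeleton, different stereo/isotope layers"
    return "same skeleton and layers, different flag, version or protonation"

def lacks_extra_layers(key: str) -> bool:
    """True if the key has no stereo/isotope layers (second block starts with UHFFFAOY)."""
    return split_inchikey(key)[1] == "UHFFFAOY"

l_alanine = "QNAYBMKLOCPYGJ-REOHCLBHSA-N"
alanine_unspecified = "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"
print(compare_inchikeys(l_alanine, alanine_unspecified))
print(lacks_extra_layers(alanine_unspecified), lacks_extra_layers(l_alanine))
```

The same UHFFFAOY test could also help find items whose InChIKey carries no stereo layer, which is relevant to the unspecified-stereochemistry question raised later on this page.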
I have now cleared about 100 of these duplicates, and in my opinion the duplicate keys come from the associated CAS. The person importing did only check for CAS uniqueness and so even created DSSTOX duplicate statements. --SCIdude (talk) 07:05, 27 July 2020 (UTC)
@ChemConnector: Thank you for your answer. Duplicates and curation are a problem we can handle, but only if corrections are made in the database that generates the data. It would be good if we could report the duplicates we found, and the result of our analysis, in a simplified way, in order to provide good input to the database administrator. Do you see a problem if we mention the cases on your talk page? I can use the dashboard you pointed to, but I miss feedback saying the problem is being resolved. I suppose you have plenty of other things to do, so perhaps working in batches of cases instead of sending one mail for each case could help. Let us know. Snipre (talk) 11:50, 27 July 2020 (UTC)
@Snipre: The ideal way for us is that someone registers the comment(s) directly against a particular chemical record. The video at https://www.youtube.com/watch?v=9A9sWRbJrYA, starting at 45:05, explains the process for submitting comments; when they are resolved, the submitter gets a response and the comment becomes public. See: https://comptox.epa.gov/dashboard/comments/public_index. This keeps track of the comments publicly, registered against the actual record, and makes the curation public. Would this work? --Antony Williams 12:22, 27 July 2020 (UTC)
@ChemConnector: Thanks, I will test that. Regards, Snipre (talk) 19:38, 27 July 2020 (UTC)
@Snipre: Please see your first comment response on here: https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID20975867#comments

Difference between CAS numbers (bis)

@Snipre: I answered the 7 cases you highlighted. Do you agree with my solutions? --SCIdude (talk) 14:03, 17 July 2020 (UTC)

@SCIdude: Thank you for the information, but I don't have time to check your answers now. Give me one week. Regards, Snipre (talk) 20:20, 20 July 2020 (UTC)
Thanks for looking and doing the work. I think we mostly agree on how to resolve things, for which there are sometimes different ways. --SCIdude (talk) 04:24, 29 July 2020 (UTC)

CAS 28519-04-2 vs. CAS 7134-06-7

We have two items (2-hydroxy-5-methylbenzenesulfonic acid (Q27285095) and 2-hydroxy-5-methylbenzenesulfonic acid (Q72461715)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 19:31, 14 July 2020 (UTC)

Not a different compound. Presumably a different source in CAS, without checking for a duplicate substance. --SCIdude (talk) 14:12, 15 July 2020 (UTC)
  Done Merge. Snipre (talk) 19:43, 28 July 2020 (UTC)

CAS 40102-60-1 vs. CAS 1439-07-2

We have two items (S-trans-stilbene oxide (Q27121652) and trans-stilbene oxide (Q72508941)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 19:31, 14 July 2020 (UTC)

Not a different compound. Presumably a different source in CAS, without checking for a duplicate substance. P.S.: trans-stilbene oxide would be a pair of enantiomers (unspecified InChIKey). Interestingly, the 2D structure in PubChem is wrong: it shows cis-. One can debate whether to change the second item to the pair. In any case, the CAS structure does not match the name/synonyms. --SCIdude (talk) 14:21, 15 July 2020 (UTC)
Snipre (talk) 20:56, 28 July 2020 (UTC)
  Done Delete InChIKey Snipre (talk) 21:12, 28 July 2020 (UTC)

CAS 64047-16-1 vs. CAS 6588-17-6

We have two items (sodium 5-heptyl-5-methyl-2-oxo-2,5-dihydro-1,3-oxazol-4-olate (Q82968869) and sodium 5-heptyl-5-methyl-2-oxo-2,5-dihydro-1,3-oxazol-4-olate (Q82543970)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 19:31, 14 July 2020 (UTC)

The second CAS was obsoleted. Easy merge. --SCIdude (talk) 14:25, 15 July 2020 (UTC)

CAS 13455-34-0 vs. CAS 60459-08-7

We have two items (cobalt sulfate monohydrate (Q27263112) and cobalt sulfate x hydrate (Q72509228)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 19:31, 14 July 2020 (UTC)

The second CAS was obsoleted. Easy merge. --SCIdude (talk) 14:25, 15 July 2020 (UTC)
  Done Snipre (talk) 19:27, 28 July 2020 (UTC)

CAS 103-26-4 vs. CAS 1754-62-7

We have two items (methyl cinnamate (Q204178) and (E)-Cinnamic acid methyl ester (Q72460898)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 20:28, 14 July 2020 (UTC)

The first does not specify cis-/trans- anywhere, also not in synonyms. So again, two possibilities: a pair of cis,trans isomers with a wrong structure (then the CAS would have to be moved to a different item), or a duplicate. I tend to merge these cases because I think they (CAS) are too stupid to define a pair of cis,trans isomers. --SCIdude (talk) 14:31, 15 July 2020 (UTC)
  Done Snipre (talk) 21:52, 29 July 2020 (UTC)

CAS 1701-77-5 vs. CAS 7021-09-2

We have two items (alpha-methoxyphenylacetic acid (Q72517465) and alpha-methoxyphenylacetic acid (Q27283784)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 20:32, 14 July 2020 (UTC)

The first CAS redirects to the second, i.e.: The first CAS was obsoleted. Easy merge. --SCIdude (talk) 14:25, 15 July 2020 (UTC)
@SCIdude: Be careful: Dortmund data bank considers 1701-77-5 as the racemate. See here. Snipre (talk) 19:20, 3 August 2020 (UTC)

CAS 36393-56-3 vs. CAS 37577-07-4

We have two items ((±)-norpseudoephedrin (Q59628358) and (–)-norpseudoephedrine (Q6456100)) with the same InChIKey but with different CAS numbers. Any idea about the difference ? Snipre (talk) 20:39, 14 July 2020 (UTC)

Both CAS have different keys. The first CAS key does not match the key of the item. So, the CAS needs to be moved to a different item, or the item key needs a different item. Actually, there is Q423797 where the CAS should be moved to. --SCIdude (talk) 14:47, 15 July 2020 (UTC)
@SCIdude: Sorry, I don't understand why you propose to move 36393-56-3 to Q423797, as the CAS number for that item is 492-39-7.
We have the following structure:
So I propose to delete the InChIKey of Q59628358 and to define it as mixture of stereoisomers of norpseudoephedrin. Snipre (talk) 20:20, 28 July 2020 (UTC)
After the CAS is moved, we still have two items with the same InChIKey. The only further information in (±)-norpseudoephedrin (Q59628358) is the German label (±)-Norpseudoephedrin and the ECHA ID (which refers to CAS 36393-56-3, which we just moved, so the ECHA ID can be moved there as well). It looks like there is no item for norpseudoephedrine (the pair of isomers), so this item could serve. If we do this, the InChI and InChIKey should be removed. --SCIdude (talk) 15:08, 15 July 2020 (UTC)
  Done Move InChIKey. Snipre (talk) 20:42, 28 July 2020 (UTC)
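Several of these cases turn on which part of an InChIKey differs. A standard InChIKey has three hyphen-separated blocks: a 14-letter hash of the connectivity layer; a 10-letter block (an 8-letter hash of the remaining layers such as stereochemistry and isotopes, followed by a standard/non-standard flag and a version letter); and a single protonation letter. The sketch below is an illustration, not an existing tool: it compares two keys block by block, so keys that share the first block but differ afterwards describe the same skeleton with different stereo (or isotope/protonation) information, as when the keys behind the two norpseudoephedrine CAS numbers were compared.

```python
import re

# Standard InChIKey: 14 letters (connectivity hash), hyphen, 10 letters
# (8-letter hash of the other layers + standard flag + version), hyphen,
# 1 letter (protonation).
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{10})-([A-Z])")


def compare_inchikeys(key1: str, key2: str) -> str:
    """Describe how two InChIKeys relate, judging only by their blocks."""
    m1, m2 = INCHIKEY_RE.fullmatch(key1), INCHIKEY_RE.fullmatch(key2)
    if not (m1 and m2):
        raise ValueError("not a well-formed InChIKey")
    if key1 == key2:
        return "identical"
    if m1.group(1) != m2.group(1):
        return "different connectivity"
    # Same skeleton: the difference lies in the stereo/isotope hash,
    # the flag/version letters or the protonation letter.
    return "same connectivity, other layers differ"
```

A structure with no stereo or isotope layers gets the fixed second block UHFFFAOYSA, so an item whose key ends in UHFFFAOYSA-N carries no stereo information in its InChI.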

CAS and unspecified stereochemistry

When CAS doesn't define some stereo or cis/trans center, I have come to the conclusion that they always mean the racemic mixture. One reason is that they are a product-oriented database, and they also have no ontological hierarchy for their items, unlike ChEBI. Do you agree? If you agree, then there are 250 such wrongly placed CAS statements:

SELECT ?item ?itemLabel 
WHERE 
{
  VALUES ?class { wd:Q55662548 wd:Q55662547 wd:Q15711994 }
  ?item wdt:P31 ?class.
  ?item wdt:P231 [].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

--SCIdude (talk) 08:13, 3 August 2020 (UTC)

Jun Namkung (Q55662547)? You're probably right, but it has to be done with caution. Many databases do not differentiate 'cpd with unspecified stereochemistry' and 'cpd with unknown stereochemistry' (we treat these two as one) from 'racemic mixture', so in many situations InChI/InChIKey and/or other IDs are incorrectly linked in these DBs to racemic mixture entries. I would advise not to simply delete CAS numbers from our items, but to deprecate them with a proper reason for deprecation (P2241). And in the future more attention should be given to automatic imports into items about racemic mixtures. Wostr (talk) 09:04, 3 August 2020 (UTC) Edit: we still have some items that mix the two concepts, and we still have thousands of items without proper classification as a group of isomers. Wostr (talk) 09:06, 3 August 2020 (UTC)
Usually I only delete CAS numbers when there is no (longer a) CAS page, or if the link redirects to a CAS we have. What is the point of keeping these? You would also not import a deprecated CAS, would you? --SCIdude (talk) 09:24, 3 August 2020 (UTC)
The point of keeping deprecated IDs is simple: such IDs won't be imported again in the future as correct ones. This, of course, applies when at least one of the databases has the CAS number linked to the wrong entry, or when a database does not differentiate between concepts as we do. If a CAS number was correct in the past and is now deprecated, that is also a valid reason for keeping it in WD (that's why there is a deprecated rank at all). I wouldn't import deprecated statements to WD, but some deprecated IDs should be kept in WD to ensure an appropriate linkage between databases, to provide an adequate explanation of why the ID is in a certain item and not in another, and to prevent automatic import of incorrect data in the future. BTW, which 'CAS page' do you mean? Wostr (talk) 14:13, 3 August 2020 (UTC)
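The argument for deprecating rather than deleting can be made concrete: an import job that compares candidate IDs against every statement on the item, whatever its rank, will never re-add an ID that was deprecated on purpose. A minimal sketch, assuming a simplified statement record; the `Statement` class and `values_to_import` are hypothetical and not part of any Wikidata library, and `reason` stands in for the reason for deprecation (P2241).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Statement:
    prop: str                     # property ID, e.g. "P231" (CAS Registry Number)
    value: str
    rank: str = "normal"          # "preferred", "normal" or "deprecated"
    reason: Optional[str] = None  # reason for deprecation (P2241), if deprecated


def values_to_import(existing: Iterable[Statement], prop: str,
                     candidates: Iterable[str]) -> List[str]:
    """Return candidate values absent from the item in *any* rank.

    Deprecated statements count as present, so a deliberately deprecated
    ID blocks its own re-import; a deleted one would not.
    """
    known = {s.value for s in existing if s.prop == prop}
    result = []
    for value in candidates:
        if value not in known:
            known.add(value)  # also skip repeats within the candidate list
            result.append(value)
    return result
```

With a deprecated "CAS-OLD" already on the item, a re-run that offers "CAS-OLD" and "CAS-NEW" only returns "CAS-NEW"; had "CAS-OLD" been deleted instead, it would come back.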

Beware, GZWDer flooding DSSTOX compound ids

Following the creation of a property that User GZWDer requested, he felt obliged to play with his bot and add this ID to chemicals. How he identifies them is anyone's guess, so keep a lookout for what is to be expected. --SCIdude (talk) 05:44, 6 August 2020 (UTC)

@SCIdude: It was discussed in the proposal that his data source provides a link to the existing DSSTox substance ID (P3117), this seems perfectly reasonable. Adding identifiers should be fine if they are from a curated source - and in this case it doesn't involve adding any new items. ArthurPSmith (talk) 19:44, 6 August 2020 (UTC)
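The point made here is that the new identifiers are attached through an ID the items already carry, so no items are created. A minimal sketch of such a join, assuming the external source is available as a substance-ID to compound-ID mapping; the function name and the ID strings in the test are placeholders, and this is not GZWDer's actual bot code.

```python
from typing import Dict, Mapping


def dsstox_compound_matches(items: Mapping[str, str],
                            substance_to_compound: Mapping[str, str]) -> Dict[str, str]:
    """Match existing items to DSSTox compound IDs via DSSTox substance ID (P3117).

    items: {qid: substance ID} for items that already have P3117.
    substance_to_compound: {substance ID: compound ID} from the external source.
    Returns {qid: compound ID}. Substance IDs unknown to the source are skipped,
    and no new items are ever produced.
    """
    return {
        qid: substance_to_compound[sid]
        for qid, sid in items.items()
        if sid in substance_to_compound
    }
```

The quality of the result then rests entirely on the existing P3117 statements, which is why a wrong substance ID on an item would propagate to the compound ID as well.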

Upcoming: ChEBI completions

Towards a more complete ChEBI coverage the following steps can be identified:

  1. substances: add references to InChIKeys identical with those in ChEBI: 85,715 references to add
  2. resolve conflicts if keys are not identical; add from ChEBI if missing: 844 items with conflict, 71 items without InChIKey (e.g. peptides)
  3. for any (substance P31 class) add reference if directly supported by ChEBI ("is_a")
  4. for any (class P279 class) add reference if directly supported by ChEBI ("is_a")
  5. ChEBI import completion, full class hierarchy
  6. ChEBI import completion, all substances (pretty good already)
  7. ChEBI check all substances are in their classes

I'm ready to do 1) now, which I guess will not be controversial but, please, fire away with any arguments that come to mind! --SCIdude (talk) 17:39, 7 September 2020 (UTC)

@SCIdude: No, we should take a stricter approach before performing mass data imports.
Step 1: for all WD items having InChI, InChIKey and ChEBI ID properties, check if the corresponding entry in the ChEBI database has the same InChI and InChIKey.
  • If yes, check if all 3 properties have a reference to ChEBI, and add or update the reference to the ChEBI database (using retrieved date, ChEBI ID property, stated in = ChEBI (Q902623), title properties)
  • If no (InChI or InChIKey or both differ from the values in the ChEBI database), put the WD items in a list for further manual check.
Step 2: for all WD items having InChI and ChEBI ID properties (but no InChIKey), check if the corresponding entry in the ChEBI database has the same InChI.
  • If yes, check if both properties have a reference to ChEBI, add the InChIKey from the ChEBI database, and add or update the reference to the ChEBI database (using retrieved date, ChEBI ID property, stated in = ChEBI (Q902623), title properties)
  • If no (InChI differs from the value in the ChEBI database), put the WD items in a list for further manual check.
Step 3: for all WD items having InChIKey and ChEBI ID properties (but no InChI), check if the corresponding entry in the ChEBI database has the same InChIKey.
  • If yes, check if both properties have a reference to ChEBI, add the InChI from the ChEBI database, and add or update the reference to the ChEBI database (using retrieved date, ChEBI ID property, stated in = ChEBI (Q902623), title properties)
  • If no (InChIKey differs from the value in the ChEBI database), put the WD items in a list for further manual check.
Then, no import of the ChEBI ontology using instance of (P31) or subclass of (P279): if we do the same with 2 or 3 other databases, it will be a mess to understand the definition of an item based on instance of (P31) and subclass of (P279). From my point of view, the ChEBI ontology should stay in the ChEBI database, first for copyright reasons (an ontology is definitely an original work), then because WD should be able to define its own ontology. The purpose of WD is not to integrate all information on the internet but rather to link different information sources. Finally, the ChEBI ontology can change over time, leading to synchronization problems. Snipre (talk) 19:01, 8 September 2020 (UTC)
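The three steps amount to one decision per item. A minimal sketch, assuming each Wikidata item (already known to have a ChEBI ID) and its ChEBI entry have been reduced to dicts with optional "inchi" and "inchikey" values; the function name, the dict layout and the action labels are illustrative, and strings are compared exactly, so both sides are assumed to use the same 'InChI=' prefix convention.

```python
from typing import Dict, Optional, Tuple


def chebi_action(wd: Dict[str, Optional[str]],
                 chebi: Dict[str, Optional[str]]) -> Tuple[str, Dict[str, str]]:
    """Return (action, values to add from ChEBI) for one item.

    action is "reference" (add/update the ChEBI reference),
    "manual check", or "skip" (neither InChI nor InChIKey on the item).
    """
    wd_inchi, wd_key = wd.get("inchi"), wd.get("inchikey")
    ch_inchi, ch_key = chebi.get("inchi"), chebi.get("inchikey")

    if wd_inchi and wd_key:                      # step 1: both on the item
        if wd_inchi == ch_inchi and wd_key == ch_key:
            return "reference", {}
        return "manual check", {}
    if wd_inchi:                                 # step 2: no InChIKey on the item
        if wd_inchi == ch_inchi:
            return "reference", ({"inchikey": ch_key} if ch_key else {})
        return "manual check", {}
    if wd_key:                                   # step 3: no InChI on the item
        if wd_key == ch_key:
            return "reference", ({"inchi": ch_inchi} if ch_inchi else {})
        return "manual check", {}
    return "skip", {}
```

Keeping the mismatches as an explicit "manual check" outcome, rather than overwriting them, is what separates this from a plain import.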
Query of all WD items about chemicals having InChI, InChIKey and ChEBI ID properties: here
Query of all WD items about chemicals having InChI and ChEBI ID properties but no InChIKey property: here
Query of all WD items about chemicals having InChIKey and ChEBI ID properties but no InChI property: here
Snipre (talk) 19:11, 8 September 2020 (UTC)
@SCIdude: Please follow the recommendations for references: Help:Sources#Databases. If everyone adds their own structure for reference data, there will be no way to extract the reference data with a common tool, for example for display in WP. Snipre (talk) 19:52, 8 September 2020 (UTC)
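For the reference itself, the step list above names four parts: stated in = ChEBI (Q902623), the ChEBI ID, a title and a retrieval date. A minimal sketch of that shape as a plain dict keyed by property ID (P248 stated in, P683 ChEBI ID, P1476 title, P813 retrieved). This simplifies the real Wikibase snak JSON considerably, and whether Help:Sources#Databases asks for exactly these four properties is assumed here from the comment, not checked.

```python
from datetime import date
from typing import Dict


def chebi_reference(chebi_id: str, title: str, retrieved: date) -> Dict[str, str]:
    """Build a simplified ChEBI reference: one value per reference property."""
    return {
        "P248": "Q902623",              # stated in: ChEBI
        "P683": chebi_id,               # ChEBI ID, digits only, e.g. "15377"
        "P1476": title,                 # title (monolingual text on Wikidata)
        "P813": retrieved.isoformat(),  # retrieved, as an ISO 8601 date
    }
```

Generating every reference from one function is the code-level version of the request: a single shared structure that any downstream tool can parse the same way.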
  • The copyright issue is unresolved. We are not importing a whole database, and the point where a "substantial part" begins needs to be specified legally. The same goes for common class terms that have been used in the literature for decades. What if I just import the hierarchy but, under each subclass statement, put a reference to an article, which makes it common knowledge?
  • I appreciate the steps given. Here is the list of items where the ChEBI InChIKey differs from the InChIKey given. I'm going through these right now, and I am also submitting issues to ChEBI on GitHub, as you can see, based on information from articles. --SCIdude (talk) 07:29, 9 September 2020 (UTC)

InChI strings in Wikidata missing 'InChI=' prefix

There are almost 1 million (999,176) chemical compounds with an InChI string identifier in Wikidata. However, none of them has the prefix 'InChI=' (capitalization important), even though it is part of the specification (refs. 1, 2).

Can the entries please be updated to include the 'InChI=' prefix?

References

  1. https://en.wikipedia.org/wiki/International_Chemical_Identifier
  2. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-015-0068-4
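The requested change is a string normalization. A minimal sketch of what such a bot module might apply to each P234 value (the function name is hypothetical; this is not Scidudebot's code): add the prefix only when it is missing, and repair a wrongly capitalized prefix, since the request stresses that capitalization matters.

```python
PREFIX = "InChI="


def with_inchi_prefix(value: str) -> str:
    """Return the InChI string with exactly one correctly capitalized 'InChI=' prefix."""
    value = value.strip()
    if value.startswith(PREFIX):
        return value
    if value[:len(PREFIX)].lower() == PREFIX.lower():
        # Prefix present but with the wrong capitalization, e.g. 'inchi='.
        return PREFIX + value[len(PREFIX):]
    return PREFIX + value
```

Because the function is idempotent, re-running it over items that were already updated changes nothing, which keeps an interrupted bot run safe to restart.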
I can do it. I'll make it a module of the maintenance bot, i.e. Scidudebot. After some thinking, there may be a way to solve any timing problem. --SCIdude (talk) 14:27, 17 September 2020 (UTC)
Bot is running now. At about 20 edits/minute max (actually less at the moment), it will take more than a month for all items with an InChI. The only problem is that the updated links will not work until the P234 (InChI) property entry is changed. Ideally we want this change when half of the items are done, in order to minimize complaints about non-working links. --SCIdude (talk) 14:44, 18 September 2020 (UTC)
Thanks for setting up the bot. Just out of interest why does the process take so long? I am monitoring the updating using this query. --Stuchalk (talk) 19:07, 23 September 2020 (UTC)
A miscalculation. It already finished a week ago. --SCIdude (talk) 13:27, 11 October 2020 (UTC)
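The original estimate can be reproduced from the numbers in this thread: 999,176 InChI values at about 20 edits per minute. A minimal sketch of that arithmetic, assuming one edit per item; since the run actually finished much sooner ("A miscalculation"), at least one of these inputs did not hold in practice.

```python
def runtime_days(n_items: int, edits_per_minute: float, edits_per_item: int = 1) -> float:
    """Days needed to make n_items * edits_per_item edits at a constant rate."""
    minutes = n_items * edits_per_item / edits_per_minute
    return minutes / (60 * 24)


# 999,176 / 20 = 49,958.8 minutes
print(f"{runtime_days(999_176, 20):.1f} days")  # prints 34.7 days
```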

PubChem 2D structures

Can you confirm that the 2D structure in https://pubchem.ncbi.nlm.nih.gov/compound/198165 does not correspond to the 3D configuration encoded in the InChI? In particular, if you place the benzene ring with the methyl group to the left, the N-heterocycle should be behind the ring system, contrary to what the 2D drawing suggests. I might have seen more such cases already. --SCIdude (talk) 14:24, 21 September 2020 (UTC)

Redrawing the PubChem structure in ChemDraw gives the same stereochemistry and the same InChI as in PubChem. Saving this structure in .mol and opening in InChI 1.05 software gives the same results (InChI from PubChem and from InChI 1.05 is the same). However, 2D structure in PubChem is unintuitive and does not seem to be the best option to visualise this stereochemistry. Wostr (talk) 21:32, 23 September 2020 (UTC)
Thanks for the confirmation. --SCIdude (talk) 05:09, 24 September 2020 (UTC)

Edits from University of Cambridge

I have noticed many chemistry-related edits from IP addresses which belong to the University of Cambridge; 131.111.225.4 and 131.111.114.157 are a couple of examples. Most of the edits involve creating new items for various polyketides. Presumably, this is some type of ongoing class project. There are also quite a few new polyketide items created by new accounts: they create the account, start one new item, then never edit again. These are probably also students involved in the same class project. The reason I'm bringing this up is that many (maybe most?) of the new item creations are poorly formed. Q59295080 is a recent example. In particular, many conflate data for chemicals with data for the scientific publications in which they are mentioned. They could definitely use some training and/or guidance. Any suggestions on how to handle this? Edgar181 (talk) 14:09, 4 December 2018 (UTC)

  • I've noticed some items like this one and corrected it (niuhinone A (Q58118804), smenopyrone (Q57391881), (5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)), but I did not think that this may be some sort of a class project — but you are probably right and it may be connected to [5], [6] (cf. the last page). Honestly, I'm not a fan of any class projects involving Wikimedia, but we could try to contact professor Goodman and offer his students a help page (subpage of this wikiproject) with editing info related only to this field (i.e. how to properly add statements, which properties should be used and that scientific article and chemical compound should be separated). I can also create better SVG structures for these new items. Wostr (talk) 14:40, 4 December 2018 (UTC)
I think you have correctly identified the class project that is involved. Maybe we can ask them, at the very least, to provide Wikidata with a list of items that they have already created and to update it with new ones as they are created so that they may be reviewed. Edgar181 (talk) 15:22, 4 December 2018 (UTC)
I sent an email, I will see if I get an answer. Snipre (talk) 20:45, 4 December 2018 (UTC)
If anyone wants to have a look, it appears that all of the last several thousand edits from the IP range 131.111.0.0/16 (search results) are related to this polyketide classwork. Edgar181 (talk) 15:42, 5 December 2018 (UTC)
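Checking whether a given editor IP falls inside the 131.111.0.0/16 range mentioned here only needs Python's standard `ipaddress` module. A minimal sketch using the two example addresses from the first comment; the constant and function names are illustrative.

```python
import ipaddress

# The /16 range named above; every address 131.111.x.y falls inside it.
RANGE = ipaddress.ip_network("131.111.0.0/16")


def in_range(ip: str, network=RANGE) -> bool:
    """True if the address string lies inside the given network."""
    return ipaddress.ip_address(ip) in network


for ip in ("131.111.225.4", "131.111.114.157", "131.112.0.1"):
    print(ip, in_range(ip))
```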
I'll be happy to help reformat these items if you wish, later in the month when I have more time. I think these data are a valuable addition to Wikidata, as they represent manually curated, real information direct from the literature; as such they are probably the only independent source of open data on these compounds on the Web. I'll work with Dr. Goodman as needed. Walkerma (talk) 11:24, 6 December 2018 (UTC)
I'd be very happy to meet any of the people involved. This could be a good way of adding data to Wikidata. Petermr (talk) 13:05, 3 January 2019 (UTC)
  • Copied from Archive/2019 as this is not done yet. Wostr (talk) 18:28, 25 September 2020 (UTC)

List of items

This is a list of chemistry-related items edited from this /16 IP subnet (edit: and from many other accounts/IPs), excluding items about scientific papers but including redirects, because target items may need some clean-up. I'll try to check and correct these items.

Item Checked? Notes To do
polyrhacitide B (Q43035170)   Checked Wostr (talk) 21:19, 18 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective Total Synthesis of Polyrhacitides A and B (Q59872751) CAS number not verified
motrilin (Q43184772)   Checked Wostr (talk) 21:43, 18 December 2018 (UTC) ids added/corrected, scientific paper data moved to Molvizarin and motrilin: Two novel cytotoxic bis-tetrahydro-furanic γ-lactone acetogenins from Annona cherimolia (Q59874494) CAS number not verified
pentamycin (Q43224626)   Checked Wostr (talk) 17:18, 6 October 2019 (UTC) merged with pentamycin (Q7165030), ids corrected, new image added
lankanolide (Q43228554)   Checked Wostr (talk) 19:36, 6 October 2019 (UTC) ids added/corrected, new image added, scientific paper data moved to The first stereoselective total synthesis of lankanolide. Part 2 (Q69903707) CAS number not verified
(3R,4S,5R,6S)-6-(4-methoxyphenyl)-2,4-dimethylhept-1-ene-3,5-diol (Q43231506)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added
ethyl (4R,5S,6S,7R,8S,E)-5,7-dihydroxy-2,4,6,8-tetramethyldec-2-enoate (Q43235849)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added, new image added CAS number not verified
arugosin G (Q43294163)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) data added
(−)-dictyostatin (Q43297542)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added, new image added
aflatoxin B1 (Q43305230) (redirect)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC)
NMI-1182 (Q43376765)   Checked Wostr (talk) 22:15, 10 October 2019 (UTC) ids added, new image added
bikaverin (Q43389039) (redirect)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC)
4-[2-(2-amino-2-oxoethyl)-2,7-dihydroxy-4-oxochroman-5-yl]-3-hydroxybut-2-enoic acid (Q43394722)   Checked Wostr (talk) 22:15, 10 October 2019 (UTC) ids added, new image added CAS number not verified
9,10-deoxytridachione (Q43396443)   Checked Edgar181 (talk) 14:17, 6 December 2018 (UTC) Publication data moved to Q59459697. PubChem ID added CAS number not verified
myriaporone 3 (Q43397060) (redirect)   Checked Wostr (talk) 22:15, 10 October 2019 (UTC) myriaporone 3 (Q27134979) corrected
thailandamide B (Q43399095)   Checked Wostr (talk) 22:01, 25 September 2020 (UTC) ids added, new image added CAS number not verified
furaquinocin B (Q43479949)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
9-(N-methyl-L-isoleucine)-cyclosporin a (Q43549418)   Checked Egon Willighagen (talk) 09:14, 25 November 2019 (UTC) Already got merged on May 28.
Conformational significance of EH21A1-A4, phenolic derivatives of geldanamycin, for Hsp90 inhibitory activity. (Q43550570)   Checked Egon Willighagen (talk) 09:14, 25 November 2019 (UTC) Already got merged on Aug 28.
indanomycin (Q43638081)   Checked Wostr (talk) 22:01, 25 September 2020 (UTC) ids added, new image added
dipentaerythritol hexapropionate (Q43653509)   Checked Wostr (talk) 21:03, 27 September 2020 (UTC) new image added CAS number not verified
D-sorbitol hexapropionate (Q43653869)   Checked Wostr (talk) 21:03, 27 September 2020 (UTC) scientific paper data moved to Acetates, Propionates and Butyrates of Simple Saccharides (Q99673340), new image added CAS number not verified
cellulose acetate propionate (Q43654570)   Checked Wostr (talk) 21:03, 27 September 2020 (UTC)
furaquinocin A (Q43636537)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
palmerolide A (Q43770969)   Checked Wostr (talk) 21:03, 27 September 2020 (UTC) CAS number not verified
monensin (Q43772550) (redirect)   Checked Wostr (talk) 22:40, 27 September 2020 (UTC)
indanomycin (Q43775351) (redirect)   Checked Wostr (talk) 22:40, 27 September 2020 (UTC)
murayaquinone (Q43871312)   Checked Edgar181 (talk) 15:04, 6 December 2018 (UTC) Publication data moved to The biosynthesis of murayaquinone, a rearranged polyketide (Q59420925) CAS number not verified
muricatetrocin B (Q43879334)   Checked Wostr (talk) 22:40, 27 September 2020 (UTC) new image added, new data added
nudifloric acid (Q43879862)   Checked Wostr (talk) 22:40, 27 September 2020 (UTC)
parviflorin (Q43959386)   Checked Wostr (talk) 12:04, 28 September 2020 (UTC) scientific paper data moved to Parvifloracin and parviflorin: cytotoxic bistetrahydrofuran acetogenins with 35 carbons from Asimina parviflora (Annonaceae) (Q99689329), ids added, new image added CAS number needs to be verified
(±)-atrovenetinone (Q44073650)   Checked Wostr (talk) 12:04, 28 September 2020 (UTC) ids added, (R)-atrovenetinone (Q45608963) and (S)-atrovenetinone (Q45498657) corrected
amphotericin B (Q44083544) (redirect)   Checked Wostr (talk) 12:04, 28 September 2020 (UTC)
(2,6-dimethylphenyl) (2R,3R,4S,5R,6R)-6-[(1S,3S,4R,5S)-1,4-dimethyl-2,8-dioxabicyclo[3.2.1]octan-3-yl]-3,5-dihydroxy-2,4-dimethylheptanoate (Q44099768)   Checked Wostr (talk) 18:24, 10 October 2020 (UTC) new image added, ids added CAS number not verified
avermectin B1a (Q44107971)   Checked earlier by Egon Willighagen, Wostr (talk) 18:24, 10 October 2020 (UTC)
cryptosporiopsin A (Q44165697)   Checked Wostr (talk) 18:24, 10 October 2020 (UTC) ids added, scientific paper data moved to First total synthesis of cryptosporiopsin A (Q100268431) CAS number not verified
tupichinol A (Q44167222)   Checked Wostr (talk) 20:59, 20 January 2019 (UTC) ids added, scientific paper data removed (New flavans, spirostanol sapogenins, and a pregnane genin from Tupistra chinensis and their cytotoxicity (Q44331518) exists) no image
linfuranone A (Q63568786)   Checked Edgar181 (talk) 19:04, 7 May 2019 (UTC) Scientific paper in Q44170686 group of stereoisomers (absolute conf. not known), CAS number not verified, no new image
dihydrocitrinin (Q44171449)   Checked Wostr (talk) 18:24, 10 October 2020 (UTC) scientific paper data moved to The Synthesis of Dihydrocitrinin and Citrinin (Q100268578), data split to 6,8-dihydroxy-3,4,5-trimethyl-3,4-dihydro-1H-isochromene-7-carboxylic acid (Q100268598) (group of stereoisomers) no new image, CAS number not verified (group of isomers/compound)
Tarchonanthuslactone (Q44178369)
Stegobinone (Q44178535)
muamvatin (Q44180992)
siphonarienone (Q44184464)   Checked Walkerma (talk) 05:10, 16 January 2019 (UTC) Added IDs, new image
(+)-macrosphelide B (Q44186030)   Checked Wostr (talk) 00:03, 16 August 2019 (UTC) ids added; article data moved to Concise Syntheses of (+)-Macrosphelides A and B (Q66467255)
amphotericin B (Q44083544) (redirect)
Phoslactomycin A (Q44188829)
Antibiotic SS-228 Y (Q44195855)
annonacin (Q44195910) (redirect)
Zincophorin (Q44205464)   Checked Edgar181 (talk) 17:35, 7 December 2018 (UTC) minor changes made
mumbaistatin (Q44207859)
furaquinocin I (Q44212329)   Checked Edgar181 (talk) 13:38, 6 December 2018 (UTC) publication data moved to ChemInform Abstract: Total Synthesis of the Furaquinocins (Q59461544); image added (Wostr (talk) 20:35, 17 June 2019 (UTC)) verify CAS number
6'-Hydroxypestalotiopsone C (Q43305590)
8-O-methyl-(3S)-torosachrysone (Q43307090)   Checked Wostr (talk) 18:21, 20 June 2019 (UTC) 8-O-methyl-(3S)-torosachrysone (Q44279596) merged with this item; image added, ids added CAS number not verified
Tedanolide (Q43343316)
rifamycin (Q43347312) (redirect)
Siphonarienal (Q44224371)   Checked Edgar181 (talk) 13:29, 6 December 2018 (UTC) Publication data moved to Q59420946
(-)-spiculoic acid A (Q44224407)
Deoxyherquienone (Q44270099)
reblastatin (Q44271895)
asperlactone (Q44275049)
Myriaporone 4 (Q44277987)
Scytophycin B (Q44278556)
8-O-methyl-(3S)-torosachrysone (Q44279596)   Checked Edgar181 (talk) Publication data moved to Austrocolorins A1 and B1: atropisomeric 10,10′-linked dihydroanthracenones from an Australian Dermocybe sp. (Q59420967); merged with 8-O-methyl-(3S)-torosachrysone (Q43307090) (Wostr (talk) 18:21, 20 June 2019 (UTC))
discodermolide (Q2920456)
Spiculoic Acid B (Q44281618)
Deoxyherqueinone (Q44175462)   Checked Edgar181 (talk) 13:41, 6 December 2018 (UTC) No major problems found. Images from Commons added.
alchivemycin A (Q44284361)   Checked Edgar181 (talk) 15:06, 6 December 2018 (UTC) Publication data moved to Alchivemycin A, a bioactive polycyclic polyketide with an unprecedented skeleton from Streptomyces sp. (Q59420815) CAS number not sourced
(3S)-3,6,8-trihydroxy-3-methyl-2,4-dihydrobenzo[a]anthracene-1,7,12-trione (Q44285843)   Checked Edgar181 (talk) 13:03, 7 December 2018 (UTC) Chemical name added. Appears to be the unknown and unnatural enantiomer of rabelomycin.
tautomycetin (Q44007750)
(-)-Macrolactin A (Q44287045)
Selective Synthesis of the para-Quinone Region of Geldanamycin (Q44287100)
Myriaporone 1 (Q44287752)
Chlorotonil A (Q44288044)
(−)-dolabriferol (Q44293768)   Checked Wostr (talk) 17:52, 11 December 2018 (UTC) ids added/changed, new image added; (−)-dolabriferol (Q59163350) has been merged into this item earlier by Edgar181 CAS number not verified, Reaxys ID not verified
carbonolide B (Q44295414)
(+)-amomol B (Q44302452)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded
Terrestric acid (Q44307000)
polypropionate (Q44320653)   Checked Wostr (talk) 20:59, 20 January 2019 (UTC) P31/P279 added, definition added
dilithium (Q1189242)
Lycogalinoside B (Q57281678)
Onchidionol (Q57395987)
decarestrictine O (Q57398017)   Checked Wostr (talk) 14:19, 9 December 2018 (UTC) scientific paper data moved to Stereoselective total synthesis of decarestrictine O (Q59582131), ids added/corrected, new image added
Aspiketolactonol (Q57402533)
YC-20 (Q57415434)   Checked Wostr (talk) 21:32, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Antibacterial activity of YC-20, a new oxazolidinone (Q59505238), new image uploaded (with the old one nominated for deletion)
(-)-BABX (Q57417167)
decarestrictine J (Q57418243)   Checked Wostr (talk) 00:32, 6 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of decarestrictine-J via Ring Closing Metathesis (RCM) (Q59484567), new image uploaded CAS numbers (2) not verified
(2Z,5R)-2-hexene-1,5-diol (Q57449957)   Checked Wostr (talk) 13:49, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Concise total synthesis of botryolide B (Q59491952), property prediction based on structure (Q59491903) created to indicate that physical properties are not experimental but structure-derived, Commons file marked for renaming, new image uploaded
auripyrone B (Q57451341)   Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, scientific paper info moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017), new image uploaded
mycoleptone A (Q57451895)   Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected CAS number not verified
concanamycin F (Q57499711)   Checked Wostr (talk) 13:16, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to The First Total Synthesis of Concanamycin F (Concanolide A) (Q59491670), new image uploaded
reveromycin B (Q57499770)   Checked Wostr (talk) 12:54, 6 December 2018 (UTC) ids added/changed, scientific paper data moved to Enantioselective Total Synthesis of Reveromycin B (Q59491449), new image uploaded
decarestrictine J (Q57499875)   Checked Wostr (talk) 00:32, 6 December 2018 (UTC) merged with decarestrictine J (Q57418243)
theonezolide A (Q57502071)   Checked Wostr (talk) 00:41, 9 December 2018 (UTC) ids added/changed, new image uploaded, P31/P279 changed, scientific paper data moved to Theonezolide A: A Novel Polyketide Natural Product from the Okinawan Marine Sponge Theonella sp. (Q59564916)
(5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)   Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, new image uploaded
(+)-baconipyrone A (Q58688643)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded
(−)-baconipyrone C (Q43217268)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded, scientific paper data moved to Total synthesis of (−)-baconipyrone C (Q65963722)
Lagriamide (Q57540827)   Checked Egon Willighagen (talk) 16:03, 22 November 2019 (UTC) SMILES, InChI, InChIKey added
Difficidin (Q58371294)
Basiliskamide B (Q57751679)
Basiliskamide A (Q59247254)
Siphonarin B (Q58371414)
methyl 2,2-bis(3-acetyl-2,6-dihydroxy-5-methylbenzyl)acetate (Q57902075)
Caloundrin B (Q57590129)
Dalesconol A (Q57545860)
reveromycin A (Q58216964)   Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
reveromycin D (Q43578515)   Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
Mycoepoxydiene (Q58217607)
4-hydroxy-5-methylcoumarin (Q59293564)
Trichoharzin (Q58211897)
(-)-rasfonin (Q59247007)
Spirastrellolide F methyl ester (Q59313278)
Lasiodiplodin (Q59287150)
dothideomynone A (Q57981745)   Checked Edgar181 (talk) 16:46, 10 December 2018 (UTC) Publication data moved to Q45149416
Trichbenzoisochromen A (Q57545344)
spongistatin 1 (Q59263700)
peloruside B (Q59242781)
pironetin (Q59220488)
oxoapratoxin A (Q59241846)
Isolasalocid A (Q58839832)
Mollipilin A (Q58837425)
(11β)-11-hydroxycurvularin (Q58361196)
Bionectriol C (Q58211689)
fusarimine (Q57981114)
(+)-macrosphelide B (Q57897760)   Checked Wostr (talk) 00:04, 16 August 2019 (UTC) merged with (+)-macrosphelide B (Q44186030)
methyl xylariate (Q57899491)
Purpurogenic acid (Q57748943)
Caldorin (Q57697944)
Hyaluromycin (Q57420731)
(11β)-11-methoxycurvularin (Q44297259)
archazolid A (Q44002843)
(1R-cis) - Sistodiolynne (Q44081665)
(+)-crocacin C (Q43869524)
Hirsutellone B (Q43267746)
Aloesaponarin II (Q59297186)
1,4-Dihydroxy-2-(hydroxymethyl)-9,10-anthraquinone (Q59263607)
4-epi-onchidione (Q59287996)
Mutactin (Q59115055)
2,​4-​Pentanedione, 1,​1'-​(1,​3-​dioxolan-​2-​ylidene)​bis- (9CI) (Q43146370)
poly(hydroxypropionate) (Q43042914)
luteosporin (Q58213147)   Checked Wostr (talk) 17:15, 11 December 2018 (UTC) scientific paper data moved to Genotoxicity of a Variety of Mycotoxins in the Hepatocyte Primary Culture/DNA Repair Test Using Rat and Mouse Hepatocytes (Q59633242), ids added/changed, new image added
niuhinone A (Q58118804)   Checked Wostr (talk) 01:08, 9 December 2018 (UTC) partially corrected in November (incl. new image); ids added
stevastelin A (Q59315862)   Checked Wostr (talk) 14:41, 10 December 2018 (UTC) ids added/changed, new image added, scientific paper data moved to Stevastelins, a novel group of immunosuppressants, inhibit dual-specificity protein phosphatases (Q59610748) CAS number not verified
pironetin (Q59315591)   Checked Wostr (talk) 01:35, 9 December 2018 (UTC) merged with pironetin (Q59220488)
smenopyrone (Q57391881)   Checked Wostr (talk) 01:31, 9 December 2018 (UTC) corrected in November (new image, ids added, scientific paper data moved to Isolation of Smenopyrone, a Bis-γ-Pyrone Polypropionate from the Caribbean Sponge Smenospongia aurea (Q58046717)); ChemSpider id added
(+)-roxaticin (Q43259451)   Checked Wostr (talk) 13:53, 10 December 2018 (UTC) ids added/corrected, new image added CAS number not verified
dolabriferol C (Q57394391)   Checked Wostr (talk) 13:28, 10 December 2018 (UTC) minor changes, ids added, new image added
dolabriferol B (Q57421096)   Checked Wostr (talk) 17:52, 11 December 2018 (UTC) ids added/changed, new image added
auripyrone A (Q57652685)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October, scientific paper data moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017)
Zincophorin methyl ester (Q44283203)
Reveromycin C (Q57903549)
furaquinocin D (Q44258402)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
furaquinocin E (Q44107981)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
rutamycin B (Q57618038)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) merged with rutamycin B (Q27264198) in October
2-[(E,5R,6R,7R,8R)-5,7-dihydroxy-8-{6-[(2R,3S)-3-hydroxypentan-2-yl]-3,5-dimethyl-4-oxopyran-2-yl}-4,6-dimethylnon-3-en-2-yl]-6-ethyl-3,5-dimethylpyran-4-one (Q57622079)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October and remodelled as group of stereoisomers (Q59199015)
2-[(E,2S,5S,6S,7S,8S)-5,7-dihydroxy-8-{6-[(2R,3R)-3-hydroxypentan-2-yl]-3,5-dimethyl-4-oxopyran-2-yl}-4,6-dimethylnon-3-en-2-yl]-6-ethyl-3,5-dimethylpyran-4-one (Q57515147)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October
Muricatetrocin A (Q57903401)
Cercosporin (Q43635077) (redirect)
geodiamolide C (Q44283410)   Checked Wostr (talk) 11:11, 19 June 2019 (UTC) Scientific paper data moved to Geodiamolides C to F, new cytotoxic cyclodepsipeptides from the marine sponge Pseudaxinyssa sp. (Q64711760); ids added, image added verify CAS number
granaticin (Q43772940)   Checked Edgar181 (talk) 18:03, 10 January 2019 (UTC) Merged into Q27106795
pteroenone (Q43563062)
untenolide A (Q44283932)   Checked Wostr (talk) 20:59, 20 January 2019 (UTC) ids added, image added CAS number not verified
massarilactone H (Q43872317)
sistodiolynne (Q43562351)
Virginiamycin M1 (Q58231308)
Xestodecalactone C (Q59158596)
Penicillolide (Q44188757)
calyculin C (Q58234458)   Checked Edgar181 (talk) 20:30, 24 February 2019 (UTC) Publication data moved to Q61861448
Molvizarin (Q43143335)
(2R,3E)-5-Chloro-N-[(2E,4R)-2,4-dimethyl-5-oxo-5-(1-pyrrolidinyl)-2-penten-1-yl]-2,4-dimethyl-N-(phenylmethyl)-3-pentenamide (Q59191782)
2-carboxyanthraquinone (Q59196332)
2-Anthraceneaceticacid, 3-acetyl-9,10-dihydro-4,5-dihydroxy-9,10-dioxo- (Q58003453)
13-hydroxypalitantin (Q44182627)
Isoannonacin (Q57617619)
Amphidinin B (Q59593833)
anthracimycin (Q14405541) (changes to existing item)
anthracimycin (Q59315034) (redirect)   Checked Wostr (talk) 19:02, 11 December 2018 (UTC) merged to anthracimycin (Q14405541) by the author
hamigeran A (Q59315549)   Checked Edgar181 (talk) 18:07, 12 December 2018 (UTC) Additional identifiers added. Publication data at Q46864433.
Citromycin (Q15410872) (changes to existing item)
Exiguapyrone (Q44299518)
penicyclone C (Q57584186)
Siphonarienedione (Q58209983)
Scabrolide A (Q59159910)
8-hydroxygeranyl acetate (Q57984205)
Siphonarienolone (Q58840595)
6E,8E-3-hydroxy-4,6,8,10,12-pentamethylpentadeca-6,8-dien-5-one (Q58015313)
geodiamolide A (Q58191896)   Checked Wostr (talk) 11:11, 19 June 2019 (UTC) Scientific paper data moved to Stereostructures of geodiamolides A and B, novel cyclodepsipeptides from the marine sponge Geodia sp (Q64711770); ids added, image added
(E)-siphonarienfuranone (Q59295886)
Micromelone A (Q59116673)
Botcinic Acid (Q57398604)
(+)-membrenone A (Q57585250)   Checked Wostr (talk) 16:12, 17 June 2019 (UTC) scientific article data moved to Membrenones: New polypropionates from the skin of the mediterranean mollusc Pleurobranchus membranaceus (Q64689324); ids added, image added
denticulatin B (Q44176507)
(+)-macrosphelide A (Q57829724)   Checked Wostr (talk) 00:03, 16 August 2019 (UTC) ids added; article data moved to Concise Syntheses of (+)-Macrosphelides A and B (Q66467255)
pectinatone (Q44299496)
(+)-membrenone C (Q58625985)   Checked Wostr (talk) 16:29, 17 June 2019 (UTC) scientific article data moved to Total synthesis of natural (+)-membrenone C and its 7-epimer (Q64691276); ids added, image added; CAS number not verified
Exiguaone (Q58688649)
Dihydrosiphonarin B (Q59278719)
Vallartanone B (Q59310911)
(+/-)-4-O-methyl-7-deoxyaklavinone (Q58851111)
Pellasoren A (Q58241762)
Khafrefungin (Q58049114)
Structure of onchidione, a bis-γ-pyrone polypropionate from a marine pulmonate mollusk (Q57394773)
(+)-polyrhacitide A (Q58635409)   Checked Wostr (talk) 21:19, 18 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of (+)-polyrhacitide A (Q59873415); CAS number not verified
Norpectinatone (Q59295080)
(−)-membrenone B (Q58688761)   Checked Wostr (talk) 16:29, 17 June 2019 (UTC) ids added, image added
okilactomycin (Q61422890)
doxycycline (Q63212296)
Amphoteronolide B (Q63212988)
(−)-amomol A (Q57584266)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded, scientific paper data moved to Forming Spirocyclohexadienone-Oxocarbenium Cation Species in the Biomimetic Synthesis of Amomols (Q65963596)
aspyrone (Q43858253)

List of editors

Accounts
IPs
  1. 131.111.0.0/16
  2. 2A00:23C5:5A0A:BA00:DD82:618D:FC4C:EC0
  3. 2001:630:212:DE0:117D:A5AF:2C8B:F0AB
  4. 86.1.157.78
  5. 85.255.232.122
  6. 85.255.234.220
  7. 94.119.64.27
  8. 128.232.229.115
  9. 128.232.244.112
  10. 146.198.196.246
  11. 192.76.8.94
  12. 193.60.93.97
  13. 193.60.94.9

Validation of CAS numbers; collaboration with Wikipedia?

Hi all, for the past few months we have been talking to a source of trusted CAS number information, and we can likely use it to confirm many CAS numbers, similar to commonchemistry.org (Q18907859). Together with this source, we're exploring how to get this data into Wikipedia and Wikidata, and we have been talking about using ChemBox to pull the information from Wikidata (which I think it already does for various other fields). On the Wikidata side, I want a clear data model: we don't just want to give the CAS number, but also this new source as reference, when it was added/verified, etc. Importantly, I am also thinking about indicating on what basis the statement was made. For example, was it based on InChI(Key) matching? The model should ideally say this, so that we can detect items where the InChIKey changed after the match was done. We're likely talking about a few hundred thousand CAS registry numbers, so I'd like to work out these details early. We may use the bots used for proteins/genes.
Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
  Notified participants of WikiProject Chemistry --Egon Willighagen (talk) 07:21, 11 October 2020 (UTC)

Now cross-posted as Validation of CAS numbers; collaboration with Wikidata?. --Egon Willighagen (talk) 07:51, 11 October 2020 (UTC)
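A minimal sketch of the provenance Egon describes, assuming one flat record per CAS statement. The field names, the "InChIKey match" basis string and the source placeholder are hypothetical, not an agreed Wikidata model; on Wikidata these fields would presumably become references and qualifiers. The function shows why recording the matched InChIKey is useful: it lets a later run flag items whose key changed after the match.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class CasStatement:
    """One CAS RN claim with its provenance (hypothetical model, not an agreed schema)."""
    qid: str               # Wikidata item the CAS number is attached to
    cas: str               # CAS Registry Number as asserted
    source: str            # placeholder for the trusted CAS source used as reference
    verified_on: date      # when the value was added or verified
    basis: str             # how the match was made, e.g. "InChIKey match"
    matched_inchikey: str  # the InChIKey the match relied on ("" if not key-based)


def stale_matches(statements: List[CasStatement],
                  current_keys: Dict[str, str]) -> List[CasStatement]:
    """Return key-based statements whose item InChIKey differs from the matched key.

    current_keys maps QID -> the item's current InChIKey. An item that no longer
    has a key is reported as stale too, since the match can no longer be confirmed.
    """
    stale = []
    for st in statements:
        if st.basis != "InChIKey match":
            continue  # matches made on other grounds cannot go stale this way
        if current_keys.get(st.qid) != st.matched_inchikey:
            stale.append(st)
    return stale


ethanol = CasStatement("Q153", "64-17-5", "trusted CAS source", date(2020, 10, 11),
                       "InChIKey match", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N")
print(stale_matches([ethanol], {"Q153": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"}))  # []
```

The ethanol values (Q153, 64-17-5, its standard InChIKey) are only an illustration; the "trusted CAS source" string stands in for the unnamed source.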
So, which part of a CAS entry is definitive? The InChI key, the name, the 2D structure, any of the links or synonyms? I ask because, usually, there are multiple mismatches between any of these properties, and this is why I stopped relying on their entries. --SCIdude (talk) 08:11, 11 October 2020 (UTC)
Agreed. I hope this will become public soon. --Egon Willighagen (talk) 09:49, 11 October 2020 (UTC)

After several years trying to clean some chemical data, I have some kind of action list:

  • Create a policy to ensure a correct definition of what should be included in your data set
    • how to handle tautomers (one entry for both forms, or one entry for each, and in the latter case how to manage the data from databases which do not distinguish between the tautomers, ...)
    • how to handle partially defined stereoisomers
    • how complexes (ligand bond) and salts (ionic bond) should be defined
  • Use a structural identifier as primary key for identification like InChI or InChIKey
  • Generate a list of your database identifiers with the structural identifier (for example Wikidata Q number /InChIKey)
  • Wikidata can't be a source, so you can't upload data without a reference. This is the key factor that allows external persons to trust data from WD: they don't have to trust Wikidata, they can trust the reference related to each value in WD.
  • From the list defined above, start to fill a table with other identifiers by matching the structural identifier. For example, if you want to link your identifier with the identifier of the PubChem database, you have to find which entries in your list and in PubChem have the same structural identifier. Problems appear if your policy concerning tautomers or the way of describing complexes/salts is not the same, or if the other database is not strict with the rule one structural identifier = only one database identifier.
  • Once your table is finished, with the list of identifiers and their related references, you can import the data into Wikidata.
  • Finally, periodically, using your table as master data, you check for changes of identifier values in WD, and if you find a difference, you investigate the origin of the change.
  • Less frequently than the previous point, you check your table against the external databases to see if any changes occurred in their data sets.
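The periodic check in the last two steps can be sketched as a diff between the curated master table and the values currently found in Wikidata. This assumes both sides are already loaded as dictionaries keyed by (QID, property), e.g. P231 for CAS Registry Number, and simplifies to one value per key; how the live values are fetched is left out.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (QID, property ID), e.g. ("Q153", "P231")


def diff_against_master(master: Dict[Key, str],
                        live: Dict[Key, str]) -> Dict[str, List]:
    """Compare the curated master table with values currently in Wikidata.

    Simplification: one value per (item, property); real items can carry several.
    """
    report: Dict[str, List] = {"changed": [], "missing": [], "unexpected": []}
    for key, value in master.items():
        if key not in live:
            report["missing"].append(key)                      # removed from Wikidata
        elif live[key] != value:
            report["changed"].append((key, value, live[key]))  # edited since import
    for key in live:
        if key not in master:
            report["unexpected"].append(key)                   # added outside the table
    return report
```

Each entry in the report is a starting point for investigating the origin of the change, not an automatic revert.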

The reason to use an intermediate table is to have the possibility to perform different checks before the importation of data to WD: to ensure that each external identifier is unique (if that is not the case, the data has to be curated in the external database), ...
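A minimal sketch of building such an intermediate table, assuming InChIKey as the structural identifier and the external database given as (external ID, InChIKey) pairs. A key that maps to several external IDs, or that is shared by several Wikidata items (duplicates or tautomers), is kept out of the mapping and listed for curation, following the "one structural identifier = only one database identifier" rule above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_mapping(local: Dict[str, str],
                  external: Iterable[Tuple[str, str]]):
    """Match local QIDs to external IDs through a shared InChIKey.

    local: QID -> InChIKey; external: (external ID, InChIKey) pairs.
    Returns (mapping, conflicts); conflicting keys are not imported.
    """
    ext_by_key: Dict[str, Set[str]] = defaultdict(set)
    for ext_id, key in external:
        ext_by_key[key].add(ext_id)
    local_by_key: Dict[str, Set[str]] = defaultdict(set)
    for qid, key in local.items():
        local_by_key[key].add(qid)

    mapping: Dict[str, str] = {}
    conflicts: List[Tuple[str, str, List[str]]] = []
    for qid, key in local.items():
        ext_ids = ext_by_key.get(key)
        if not ext_ids:
            continue  # no counterpart in the external database
        if len(ext_ids) > 1 or len(local_by_key[key]) > 1:
            conflicts.append((qid, key, sorted(ext_ids)))  # needs curation first
            continue
        mapping[qid] = next(iter(ext_ids))
    return mapping, conflicts
```

Only the `mapping` part, together with its reference, would go on to an import; the `conflicts` list is the curation work queue.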

I would not start with a mass import of CAS values before: 1) we curate the WD data set: as long as we have constraint violations for our InChIKey/InChI values, we have duplicates or wrongly defined Q numbers; 2) as the CAS registry database does not provide an InChIKey/InChI value for each CAS number, we need to rely on other databases to create that relation, so we first need to curate those other databases to ensure the uniqueness of their values. Only after these two steps can we start to consider CAS numbers. Snipre (talk) 21:34, 13 October 2020 (UTC)

@Egon Willighagen: To answer your question, I would propose to use your new data set to curate an established and well-known database, for example PubChem, and then to import the curated CAS numbers from that database into WD. Why? Because WD can't be a source. We need to rely on external documents or databases; we need references for the values imported into WD. WD should be the connection between references and other authorities, not become the reference. Snipre (talk) 21:40, 13 October 2020 (UTC)
Regarding Snipre's list above about stereoisomers/tautomers/etc., I'd also say that it has long been a problem in WD with no proper solution. Also, we still have no clue how to classify chemical entities. Without solutions to these problems, no real work can be done here. Wostr (talk) 14:33, 14 October 2020 (UTC)

ChEBI and mapping type

The new constraint on ChEBI ID (P683) to always have a qualifier mapping relation type (P4390) is an interesting idea. When is the mapping exact? The ChEBI InChI key has to match the item key, of course. Since I'm soon done with checking all differences, it might be an idea to add exact mapping for all items with a ChEBI ID that have a single key, with the latest ChEBI release as reference, because these are the ones that were confirmed to be matching. Opinions? --SCIdude (talk) 17:44, 19 October 2020 (UTC)

As ChEBI could be our best chance for classification of chemical species, I thought that it would be good to know if there is 1:1 relation between ChEBI and WD — that is not always true, because we have sometimes IDs for zwitterion linked to regular item etc. I put this constraint with suggestion constraint (Q62026391). I usually use SKOS to indicate that the ID was manually checked and there is certainty that there is 100% equivalency between WD entry and ChEBI entry. Wostr (talk) 18:29, 19 October 2020 (UTC)
First it would be necessary to add the constraint to the examples mentioned in ChEBI ID (P683), to understand what kind of value to add for this new constraint. Then, if the reference information is added based on help:sources, there is no need for an additional constraint. Snipre (talk) 13:45, 23 October 2020 (UTC)
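SCIdude's proposed rule, combined with Wostr's 1:1 concern, can be sketched as a small decision function. This assumes the item's InChIKeys, the ChEBI entry's InChIKey and the number of items carrying the same ChEBI ID are already known; the string "exact match" stands for the corresponding mapping relation type (P4390) value, whose item ID is not given here. Everything that is not clearly exact is left for manual review rather than guessed.

```python
from typing import List, Optional


def proposed_mapping(item_keys: List[str], chebi_key: Optional[str],
                     items_with_same_chebi_id: int = 1) -> Optional[str]:
    """Suggest a mapping relation type (P4390) value for a ChEBI ID (P683) claim.

    Returns "exact match" only when the item has a single InChIKey, it equals the
    ChEBI entry's key, and no other item carries the same ChEBI ID (the 1:1 case).
    Anything else returns None and is left for manual checking.
    """
    if chebi_key is None or len(item_keys) != 1 or items_with_same_chebi_id != 1:
        return None
    return "exact match" if item_keys[0] == chebi_key else None
```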

Bot to populate missing GHS data from pubchem LCSS

I've noticed that a lot of chemicals are missing the GHS data (this has been a little annoying because I've written some custom software to generate labels for chemical bottles, based on data from here). I'd like to write a bot to take the GHS data (and possibly other things too?) from the pubchem laboratory chemical safety summary (LCSS) dataset, and put it into wikidata.

Unfortunately the pubchem pug_rest API doesn't seem to expose the GHS data in particular, so it would have to come from the less-structured pug_view API (or more accurately, the published dumps of LCSS pug_view data). I've already written a series of XSL transforms that take that data and turn it into something a bit more usable.

Anyway, I hope this idea is agreeable, and I am looking for some input on how to go about this without stepping on anyone's toes.

ChemHobby (talk) 06:28, 30 November 2020 (UTC)

Concerns here are mostly about duplicate items or claims so please check the existing data and property constraints first before writing. --SCIdude (talk) 07:27, 30 November 2020 (UTC)

I think, at first, I would have it only add data to items that already exist. ChemHobby (talk) 16:48, 30 November 2020 (UTC)

Yes but you shouldn't add duplicate claims to existing items, as well. Just a heads up. --SCIdude (talk) 09:14, 1 December 2020 (UTC)
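The duplicate-claim concern can be sketched as a filter that runs before any write. This assumes claims are reduced to (QID, property, value) triples and ignores qualifiers and references, which a real bot would also have to compare; the property and item IDs in any test data are placeholders. It also drops repeats within the same batch.

```python
from typing import Iterable, List, Set, Tuple

Claim = Tuple[str, str, str]  # (QID, property, value); qualifiers/references ignored


def new_claims_only(candidates: Iterable[Claim], existing: Set[Claim]) -> List[Claim]:
    """Keep only candidate claims not already on the items, in input order."""
    seen = set(existing)  # copy, so the caller's set is not modified
    kept: List[Claim] = []
    for claim in candidates:
        if claim in seen:
            continue      # already on the item, or repeated within this batch
        seen.add(claim)
        kept.append(claim)
    return kept
```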
No, no, no. PubChem GHS data is usually labelled with the source 'Regulation (EC) No 1272/2008', but this data is not valid EU GHS! It's more similar to US GHS than to EU GHS. There is also the ECHA database (CLI), from which it is also not possible to import correct data to Wikidata. I do not know of any database from which one can import valid GHS data to WD. Wostr (talk) 12:32, 2 December 2020 (UTC) BTW, which of the 11 data sets available for ethanol would you like to import? Of the 6 EU GHS data sets, none is valid EU GHS data. There are also 3 JP GHS data sets, but I can't tell right now whether they are compatible with JP GHS regulations. Wostr (talk) 12:35, 2 December 2020 (UTC)
I don't understand. Can you elaborate on why the data is not valid? Surely at least the data labelled 'Regulation (EC) No 1272/2008' can go against Q2005334? ChemHobby (talk) 04:41, 3 December 2020 (UTC)
It cannot. We have GHS labelling in WD, not GHS classification. P-phrases in PubChem are automatically added in number exceeding the limit of P-phrases for EU GHS. Sometimes there are H phrases that should be omitted in labelling. Sometimes the information of additives or impurities is lacking. What's more, I don't think that data from ECHA can be legally imported to Wikidata. Wostr (talk) 15:19, 3 December 2020 (UTC)
Hmm.... What about starting with importing the signal word and pictograms, and leaving H and P phrases as unknown value for now? Then maybe later the bot can populate H/P statements by applying the appropriate rules for labelling. Or, we could take the data from the table 3.1 here which specifically includes both the classification and labelling H codes, as well as signal word/pictograms. Again P statements could be left as unknown value. ChemHobby (talk) 15:47, 3 December 2020 (UTC)
It is not possible to apply any rules for P-phrases. Current constraints for safety classification and labelling (P4952) do not permit partial labelling – quite correctly. EU GHS labelling can be added manually using proper sources, or semi-automatically by making a spreadsheet with data from such sources and adding this data using QS. I know of no other possibility right now for EU GHS. Wostr (talk) 13:34, 4 December 2020 (UTC)
Alright... In that case, what is a 'proper source' to use for this? ChemHobby (talk) 19:10, 4 December 2020 (UTC)
There are databases like GESTIS, and safety data sheets of trusted companies. It depends on the jurisdiction; I think there are more sources available for e.g. OSHA GHS, because there are different rules for GHS in the US. Wostr (talk) 04:10, 5 December 2020 (UTC)
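Before a spreadsheet row from such a source is turned into QS input, a simple pre-check can catch partial labelling and an excessive number of P-phrases. This is a sketch under stated assumptions: it treats signal word, pictograms, H and P statements as all required (a simplification, since some hazard classes carry no pictogram or signal word), and it uses six P statements as the limit, which is our reading of the CLP default (Article 28(3), which permits exceptions). Flagged rows are meant for manual review, not automatic rejection; the field layout is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Labelling:
    """One labelling row from a spreadsheet (hypothetical layout)."""
    signal_word: str = ""  # "Danger" or "Warning"; "" means not given
    pictograms: List[str] = field(default_factory=list)    # e.g. ["GHS02"]
    h_statements: List[str] = field(default_factory=list)  # e.g. ["H225"]
    p_statements: List[str] = field(default_factory=list)  # e.g. ["P210"]


def labelling_problems(lab: Labelling, max_p: int = 6) -> List[str]:
    """List reasons a row should not be imported as-is (empty list = no flags)."""
    problems = []
    if not lab.signal_word:
        problems.append("missing signal word")
    if not lab.pictograms:
        problems.append("missing pictograms")
    if not lab.h_statements:
        problems.append("missing H statements")
    if not lab.p_statements:
        problems.append("missing P statements")
    if len(lab.p_statements) > max_p:
        problems.append(f"{len(lab.p_statements)} P statements exceed limit of {max_p}")
    return problems
```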

Q3268366/Q56702552

Some sitelinks need moving. Check the labels too. --GZWDer (talk) 18:02, 15 January 2021 (UTC)

Done. --SCIdude (talk) 18:14, 15 January 2021 (UTC)

Untangling CAS IDs

--GZWDer (talk) 15:48, 1 February 2021 (UTC)
