Wikidata talk:WikiProject Chemistry

Active discussions
Old discussions are archived in Archive 2013, Archive 2014, Archive 2015, Archive 2016, Archive 2017, Archive 2018, Archive 2019.

Edits from University of Cambridge

I have noticed many chemistry-related edits from IP addresses which belong to the University of Cambridge. (talk · contribs · logs) and (talk · contribs · logs) are a couple of examples. Most of the edits involve creating new items for various polyketides. Presumably, this is some type of ongoing class project. There are also quite a few new items for polyketides created from new accounts: they create the account, start one new item, then never edit again. These are probably also students involved in the same class project. The reason I'm bringing this up is that many (maybe most?) of the new items are poorly formed. Q59295080 is a recent example. In particular, many conflate data for chemicals with data for the scientific publications in which they are mentioned. They could definitely use some training and/or guidance. Any suggestions on how to handle this? Edgar181 (talk) 14:09, 4 December 2018 (UTC)

  • I've noticed some items like this one and corrected it (niuhinone A (Q58118804), smenopyrone (Q57391881), (5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)), but I did not think that this may be some sort of a class project — but you are probably right and it may be connected to [1], [2] (cf. the last page). Honestly, I'm not a fan of any class projects involving Wikimedia, but we could try to contact professor Goodman and offer his students a help page (subpage of this wikiproject) with editing info related only to this field (i.e. how to properly add statements, which properties should be used and that scientific article and chemical compound should be separated). I can also create better SVG structures for these new items. Wostr (talk) 14:40, 4 December 2018 (UTC)
I think you have correctly identified the class project that is involved. Maybe we can ask them, at the very least, to provide Wikidata with a list of items that they have already created and to update it with new ones as they are created so that they may be reviewed. Edgar181 (talk) 15:22, 4 December 2018 (UTC)
I sent an email, I will see if I get an answer. Snipre (talk) 20:45, 4 December 2018 (UTC)
If anyone wants to have a look, it appears that all of the last several thousand edits from the IP range (search results) are related to this polyketide classwork. Edgar181 (talk) 15:42, 5 December 2018 (UTC)
I'll be happy to help in reformatting these items if you wish, later in the month when I have more time. I think these data are a valuable addition to Wikidata, as they represent manually curated, real information direct from the literature; as such they are probably the only independent source of open data on these compounds on the Web. I'll work with Dr. Goodman as needed. Walkerma (talk) 11:24, 6 December 2018 (UTC)
I'd be very happy to meet any of the people involved. This could be a good way of adding data to Wikidata. Petermr (talk) 13:05, 3 January 2019 (UTC)

List of items

This is a list of chemistry-related articles edited from this IP /16 subnet (edit: and from many other accounts/IPs), excluding items about scientific papers, but including redirects, because target items may need some clean-up. I'll try to check and correct these items.

Item Checked? Notes To do
polyrhacitide B (Q43035170)   Checked Wostr (talk) 21:19, 18 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective Total Synthesis of Polyrhacitides A and B (Q59872751) CAS number not verified
motrilin (Q43184772)   Checked Wostr (talk) 21:43, 18 December 2018 (UTC) ids added/corrected, scientific paper data moved to Molvizarin and motrilin: Two novel cytotoxic bis-tetrahydro-furanic γ-lactone acetogenins from Annona cherimolia (Q59874494) CAS number not verified
Q43224626   Checked Wostr (talk) 17:18, 6 October 2019 (UTC) merged with pentamycin (Q7165030), ids corrected, new image added
lankanolide (Q43228554)   Checked Wostr (talk) 19:36, 6 October 2019 (UTC) ids added/corrected, new image added, scientific paper data moved to The first stereoselective total synthesis of lankanolide. Part 2 (Q69903707) CAS number not verified
(3R,4S,5R,6S)-6-(4-methoxyphenyl)-2,4-dimethylhept-1-ene-3,5-diol (Q43231506)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added
ethyl (4R,5S,6S,7R,8S,E)-5,7-dihydroxy-2,4,6,8-tetramethyldec-2-enoate (Q43235849)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added, new image added CAS number not verified
arugosin G (Q43294163)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) data added
(−)-dictyostatin (Q43297542)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC) ids added, new image added
Q43305230 (redirect)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC)
NMI-1182 (Q43376765)   Checked Wostr (talk) 22:15, 10 October 2019 (UTC) ids added, new image added
Q43389039 (redirect)   Checked Wostr (talk) 20:36, 6 October 2019 (UTC)
4-[2-(2-amino-2-oxoethyl)-2,7-dihydroxy-4-oxochroman-5-yl]-3-hydroxybut-2-enoic acid (Q43394722)   Checked Wostr (talk) 22:15, 10 October 2019 (UTC) ids added, new image added CAS number not verified
Photodeoxytridachione (Q43396443)   Checked Edgar181 (talk) 14:17, 6 December 2018 (UTC) Publication data moved to Q59459697. PubChem ID added.
Q43397060 (redirect)   Checked Wostr (talk) 22:15, 10 October 2019 (UTC) myriaporone 3 (Q27134979) corrected
thailandamide B (Q43399095)
furaquinocin B (Q43479949)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
Q43549418   Checked Egon Willighagen (talk) 09:14, 25 November 2019 (UTC) Already got merged on May 28.
Q43550570   Checked Egon Willighagen (talk) 09:14, 25 November 2019 (UTC) Already got merged on Aug 28.
Indanomycin (Q43638081)
dipentaerythritol hexapropionate (Q43653509)
D-sorbitol hexapropionate (Q43653869)
cellulose acetate propionate (Q43654570)
furaquinocin A (Q43636537)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
palmerolide A (Q43770969)
Q43772550 (redirect)
Q43775351 (redirect)
Murayaquinone (Q43871312)   Checked Edgar181 (talk) 15:04, 6 December 2018 (UTC) Publication data moved to Q59420925
Muricatetrocin B (Q43879334)
nudifloric acid (Q43879862)
Parviflorin (Q43959386)
atrovenetinone (Q44073650)
Q44083544 (redirect)
(2R,3R,4S,5R,6R)-2,6-Dimethylphenyl-6-((1S,3S,4R,5S)-1,4-dimethyl-2,8-dioxa-bicyclo[3.2.1]octan-3-yl)-3,5-dihydroxy-2,4-dimethylheptanoate (Q44099768)
Avermectin B1a (Q44107971)
Cryptosporiopsin A (Q44165697)
tupichinol A (Q44167222)   Checked Wostr (talk) 20:59, 20 January 2019 (UTC) ids added, scientific paper data removed (New flavans, spirostanol sapogenins, and a pregnane genin from Tupistra chinensis and their cytotoxicity (Q44331518) exists) no image
Linfuranone A, a new polyketide from plant-derived Microbispora sp. GMKU 363 (Q44170686)   Checked Edgar181 (talk) 19:04, 7 May 2019 (UTC) Chemical data split to Q63568786
Dihydrocitrinin (Q44171449)
Tarchonanthuslactone (Q44178369)
Stegobinone (Q44178535)
muamvatin (Q44180992)
siphonarienone (Q44184464)   Checked Walkerma (talk) 05:10, 16 January 2019 (UTC) Added IDs, new image
(+)-macrosphelide B (Q44186030)   Checked Wostr (talk) 00:03, 16 August 2019 (UTC) ids added; article data moved to Concise Syntheses of (+)-Macrosphelides A and B (Q66467255)
Q44083544 (redirect)
Phoslactomycin A (Q44188829)
Antibiotic SS-228 Y (Q44195855)
Q44195910 (redirect)
Zincophorin (Q44205464)   Checked Edgar181 (talk) 17:35, 7 December 2018 (UTC) minor changes made
Mumbaistatin (1) (Q44207859)
furaquinocin I (Q44212329)   Checked Edgar181 (talk) 13:38, 6 December 2018 (UTC) publication data moved to ChemInform Abstract: Total Synthesis of the Furaquinocins (Q59461544); image added (Wostr (talk) 20:35, 17 June 2019 (UTC)) verify CAS number
6'-Hydroxypestalotiopsone C (Q43305590)
8-O-methyl-(3S)-torosachrysone (Q43307090)   Checked Wostr (talk) 18:21, 20 June 2019 (UTC) Q44279596 merged with this item; image added, ids added CAS number not verified
Tedanolide (Q43343316)
Q43347312 (redirect)
Siphonarienal (Q44224371)   Checked Edgar181 (talk) 13:29, 6 December 2018 (UTC) Publication data moved to Q59420946
(-)-spiculoic acid A (Q44224407)
Deoxyherquienone (Q44270099)
reblastatin (Q44271895)
asperlactone (Q44275049)
Myriaporone 4 (Q44277987)
Scytophycin B (Q44278556)
Q44279596   Checked Edgar181 (talk) Publication data moved to Austrocolorins A1 and B1: atropisomeric 10,10′-linked dihydroanthracenones from an Australian Dermocybe sp. (Q59420967); merged with 8-O-methyl-(3S)-torosachrysone (Q43307090) (Wostr (talk) 18:21, 20 June 2019 (UTC))
discodermolide (Q2920456)
Spiculoic Acid B (Q44281618)
Deoxyherqueinone (Q44175462)   Checked Edgar181 (talk) 13:41, 6 December 2018 (UTC) No major problems found. Images from Commons added.
alchivemycin A (Q44284361)   Checked Edgar181 (talk) 15:06, 6 December 2018 (UTC) Publication data moved to Q59420815 CAS number not sourced
(3S)-3,6,8-trihydroxy-3-methyl-2,4-dihydrobenzo[a]anthracene-1,7,12-trione (Q44285843)   Checked Edgar181 (talk) 13:03, 7 December 2018 (UTC) Chemical name added. Appears to be the unknown and unnatural enantiomer of rabelomycin.
tautomycetin (Q44007750)
(-)-Macrolactin A (Q44287045)
Selective Synthesis of the para-Quinone Region of Geldanamycin (Q44287100)
Myriaporone 1 (Q44287752)
Chlorotonil A (Q44288044)
(−)-dolabriferol (Q44293768)   Checked Wostr (talk) 17:52, 11 December 2018 (UTC) ids added/changed, new image added; Q59163350 has been merged into this item earlier by Edgar181 CAS number not verified, Reaxys ID not verified
carbonolide B (Q44295414)
(+)-amomol B (Q44302452)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded
Terrestric acid (Q44307000)
polypropionate (Q44320653)   Checked Wostr (talk) 20:59, 20 January 2019 (UTC) P31/P279 added, definition added
dilithium (Q1189242)
Lycogalinoside B (Q57281678)
Onchidionol (Q57395987)
decarestrictine O (Q57398017)   Checked Wostr (talk) 14:19, 9 December 2018 (UTC) scientific paper data moved to Stereoselective total synthesis of decarestrictine O (Q59582131), ids added/corrected, new image added
Aspiketolactonol (Q57402533)
YC-20 (Q57415434)   Checked Wostr (talk) 21:32, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Antibacterial activity of YC-20, a new oxazolidinone (Q59505238), new image uploaded (with the old one nominated for deletion)
(-)-BABX (Q57417167)
decarestrictine J (Q57418243)   Checked Wostr (talk) 00:32, 6 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of decarestrictine-J via Ring Closing Metathesis (RCM) (Q59484567), new image uploaded CAS numbers (2) not verified
(2Z,5R)-2-hexene-1,5-diol (Q57449957)   Checked Wostr (talk) 13:49, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to Concise total synthesis of botryolide B (Q59491952), property prediction based on structure (Q59491903) created to indicate that physical properties are not experimental but structure-derived, Commons file marked for renaming, new image uploaded
auripyrone B (Q57451341)   Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, scientific paper info moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017), new image uploaded
mycoleptone A (Q57451895)   Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected CAS number not verified
concanamycin F (Q57499711)   Checked Wostr (talk) 13:16, 6 December 2018 (UTC) ids added/corrected, scientific paper data moved to The First Total Synthesis of Concanamycin F (Concanolide A) (Q59491670), new image uploaded
reveromycin B (Q57499770)   Checked Wostr (talk) 12:54, 6 December 2018 (UTC) ids added/changed, scientific paper data moved to Enantioselective Total Synthesis of Reveromycin B (Q59491449), new image uploaded
Q57499875   Checked Wostr (talk) 00:32, 6 December 2018 (UTC) merged with decarestrictine J (Q57418243)
theonezolide A (Q57502071)   Checked Wostr (talk) 00:41, 9 December 2018 (UTC) ids added/changed, new image uploaded, P31/P279 changed, scientific paper data moved to Theonezolide A: A Novel Polyketide Natural Product from the Okinawan Marine Sponge Theonella sp. (Q59564916)
(5R,7R,9R)-7,9-dihydroxy-5-decanolide (Q57513843)   Checked Wostr (talk) 21:19, 5 December 2018 (UTC) ids added/corrected, new image uploaded
(+)-baconipyrone A (Q58688643)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded
(−)-baconipyrone C (Q43217268)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded, scientific paper data moved to Total synthesis of (−)-baconipyrone C (Q65963722)
Lagriamide (Q57540827)   Checked Egon Willighagen (talk) 16:03, 22 November 2019 (UTC) SMILES, InChI, InChIKey added
Difficidin (Q58371294)
Basiliskamide B (Q57751679)
Basiliskamide A (Q59247254)
Siphonarin B (Q58371414)
methyl 2,2-bis(3-acetyl-2,6-dihydroxy-5-methylbenzyl)acetate (Q57902075)
Caloundrin B (Q57590129)
Dalesconol A (Q57545860)
reveromycin A (Q58216964)   Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
reveromycin D (Q43578515)   Checked Wostr (talk) 15:41, 9 December 2018 (UTC) ids added/corrected, new image added
Mycoepoxydiene (Q58217607)
4-hydroxy-5-methylcoumarin (Q59293564)
Trichoharzin (Q58211897)
(-)-rasfonin (Q59247007)
Spirastrellolide F methyl ester (Q59313278)
Lasiodiplodin (Q59287150)
dothideomynone A (Q57981745)   Checked Edgar181 (talk) 16:46, 10 December 2018 (UTC) Publication data moved to Q45149416
Trichbenzoisochromen A (Q57545344)
spongistatin 1 (Q59263700)
peloruside B (Q59242781)
pironetin (Q59220488)
oxoapratoxin A (Q59241846)
Isolasalocid A (Q58839832)
Mollipilin A (Q58837425)
(11β)-11-hydroxycurvularin (Q58361196)
Bionectriol C (Q58211689)
fusarimine (Q57981114)
Q57897760   Checked Wostr (talk) 00:04, 16 August 2019 (UTC) merged with (+)-macrosphelide B (Q44186030)
methyl xylariate (Q57899491)
Purpurogenic acid (Q57748943)
Caldorin (Q57697944)
Hyaluromycin (Q57420731)
(11β)-11-methoxycurvularin (Q44297259)
archazolid A (Q44002843)
(1R-cis) - Sistodiolynne (Q44081665)
(+)-crocacin C (Q43869524)
Hirsutellone B (Q43267746)
Aloesaponarin II (Q59297186)
1,4-Dihydroxy-2-(hydroxymethyl)-9,10-anthraquinone (Q59263607)
4-epi-onchidione (Q59287996)
Mutactin (Q59115055)
2,​4-​Pentanedione, 1,​1'-​(1,​3-​dioxolan-​2-​ylidene)​bis- (9CI) (Q43146370)
poly(hydroxypropionate) (Q43042914)
luteosporin (Q58213147)   Checked Wostr (talk) 17:15, 11 December 2018 (UTC) scientific paper data moved to Genotoxicity of a Variety of Mycotoxins in the Hepatocyte Primary Culture/DNA Repair Test Using Rat and Mouse Hepatocytes (Q59633242), ids added/changed, new image added
niuhinone A (Q58118804)   Checked Wostr (talk) 01:08, 9 December 2018 (UTC) partially corrected in November (incl. new image); ids added
stevastelin A (Q59315862)   Checked Wostr (talk) 14:41, 10 December 2018 (UTC) ids added/changed, new image added, scientific paper data moved to Stevastelins, a novel group of immunosuppressants, inhibit dual-specificity protein phosphatases (Q59610748) CAS number not verified
Q59315591   Checked Wostr (talk) 01:35, 9 December 2018 (UTC) merged with pironetin (Q59220488)
smenopyrone (Q57391881)   Checked Wostr (talk) 01:31, 9 December 2018 (UTC) corrected in November (new image, ids added, scientific paper data moved to Isolation of Smenopyrone, a Bis-γ-Pyrone Polypropionate from the Caribbean Sponge Smenospongia aurea (Q58046717)); ChemSpider id added
(+)-roxaticin (Q43259451)   Checked Wostr (talk) 13:53, 10 December 2018 (UTC) ids added/corrected, new image added CAS number not verified
dolabriferol C (Q57394391)   Checked Wostr (talk) 13:28, 10 December 2018 (UTC) minor changes, ids added, new image added
dolabriferol B (Q57421096)   Checked Wostr (talk) 17:52, 11 December 2018 (UTC) ids added/changed, new image added
auripyrone A (Q57652685)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October, scientific paper data moved to Total Synthesis of Auripyrones A and B and Determination of the Absolute Configuration of Auripyrone B (Q57821017)
Zincophorin methyl ester (Q44283203)
Reveromycin C (Q57903549)
furaquinocin D (Q44258402)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
furaquinocin E (Q44107981)   Checked Wostr (talk) 17:29, 17 June 2019 (UTC) ids added, image added
Q57618038   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) merged with rutamycin B (Q27264198) in October
2-[(E,5R,6R,7R,8R)-5,7-dihydroxy-8-{6-[(2R,3S)-3-hydroxypentan-2-yl]-3,5-dimethyl-4-oxopyran-2-yl}-4,6-dimethylnon-3-en-2-yl]-6-ethyl-3,5-dimethylpyran-4-one (Q57622079)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October and remodelled as group of stereoisomers (Q59199015)
2-[(E,2S,5S,6S,7S,8S)-5,7-dihydroxy-8-{6-[(2R,3R)-3-hydroxypentan-2-yl]-3,5-dimethyl-4-oxopyran-2-yl}-4,6-dimethylnon-3-en-2-yl]-6-ethyl-3,5-dimethylpyran-4-one (Q57515147)   Checked Wostr (talk) 18:22, 11 December 2018 (UTC) corrected earlier in October
Muricatetrocin A (Q57903401)
Q43635077 (redirect)
geodiamolide C (Q44283410)   Checked Wostr (talk) 11:11, 19 June 2019 (UTC) Scientific paper data moved to Geodiamolides C to F, new cytotoxic cyclodepsipeptides from the marine sponge Pseudaxinyssa sp. (Q64711760); ids added, image added verify CAS number
Q43772940   Checked Edgar181 (talk) 18:03, 10 January 2019 (UTC) Merged into Q27106795
pteroenone (Q43563062)
untenolide A (Q44283932)   Checked Wostr (talk) 20:59, 20 January 2019 (UTC) ids added, image added CAS number not verified
massarilactone H (Q43872317)
sistodiolynne (Q43562351)
Virginiamycin M1 (Q58231308)
Xestodecalactone C (Q59158596)
Penicillolide (Q44188757)
calyculin C (Q58234458)   Checked Edgar181 (talk) 20:30, 24 February 2019 (UTC) Publication data moved to Q61861448
Molvizarin (Q43143335)
(2R,3E)-5-Chloro-N-[(2E,4R)-2,4-dimethyl-5-oxo-5-(1-pyrrolidinyl)-2-penten-1-yl]-2,4-dimethyl-N-(phenylmethyl)-3-pentenamide (Q59191782)
2-carboxyanthraquinone (Q59196332)
2-Anthraceneaceticacid, 3-acetyl-9,10-dihydro-4,5-dihydroxy-9,10-dioxo- (Q58003453)
13-hydroxypalitantin (Q44182627)
Isoannonacin (Q57617619)
Amphidinin B (Q59593833)
anthracimycin (Q14405541) (changes to existing item)
Q59315034 (redirect)   Checked Wostr (talk) 19:02, 11 December 2018 (UTC) merged to anthracimycin (Q14405541) by the author
hamigeran A (Q59315549)   Checked Edgar181 (talk) 18:07, 12 December 2018 (UTC) Additional identifiers added. Publication data at Q46864433.
Citromycin (Q15410872) (changes to existing item)
Exiguapyrone (Q44299518)
penicyclone C (Q57584186)
Siphonarienedione (Q58209983)
Scabrolide A (Q59159910)
8-hydroxygeranyl acetate (Q57984205)
Siphonarienolone (Q58840595)
6E,8E-3-hydroxy-4,6,8,10,12-pentamethylpentadeca-6,8-dien-5-one (Q58015313)
geodiamolide A (Q58191896)   Checked Wostr (talk) 11:11, 19 June 2019 (UTC) Scientific paper data moved to Stereostructures of geodiamolides A and B, novel cyclodepsipeptides from the marine sponge Geodia sp (Q64711770); ids added, image added
(E)-siphonarienfuranone (Q59295886)
Micromelone A (Q59116673)
Botcinic Acid (Q57398604)
(+)-membrenone A (Q57585250)   Checked Wostr (talk) 16:12, 17 June 2019 (UTC) scientific article data moved to Membrenones: New polypropionates from the skin of the mediterranean mollusc Pleurobranchus membranaceus (Q64689324); ids added, image added
denticulatin B (Q44176507)
(+)-macrosphelide A (Q57829724)   Checked Wostr (talk) 00:03, 16 August 2019 (UTC) ids added; article data moved to Concise Syntheses of (+)-Macrosphelides A and B (Q66467255)
pectinatone (Q44299496)
(+)-membrenone C (Q58625985)   Checked Wostr (talk) 16:29, 17 June 2019 (UTC) scientific article data moved to Total synthesis of natural (+)-membrenone C and its 7-epimer (Q64691276); ids added, image added CAS number not verified
Exiguaone (Q58688649)
Dihydrosiphonarin B (Q59278719)
Vallartanone B (Q59310911)
(+/-)-4-O-methyl-7-deoxyaklavinone (Q58851111)
Pellasoren A (Q58241762)
Khafrefungin (Q58049114)
Onchidione (Q57394773)
(+)-polyrhacitide A (Q58635409)   Checked Wostr (talk) 21:19, 18 December 2018 (UTC) ids added, scientific paper data moved to Stereoselective total synthesis of (+)-polyrhacitide A (Q59873415) CAS number not verified
Norpectinatone (Q59295080)
(−)-membrenone B (Q58688761)   Checked Wostr (talk) 16:29, 17 June 2019 (UTC) ids added, image added
okilactomycin (Q61422890)
Amphoteronolide B (Q63212988)
(−​)​-​amomol A (Q57584266)   Checked Wostr (talk) 12:11, 28 July 2019 (UTC) ids added/changed, new image uploaded, scientific paper data moved to Forming Spirocyclohexadienone-Oxocarbenium Cation Species in the Biomimetic Synthesis of Amomols (Q65963596)

List of editors

  2. 2A00:23C5:5A0A:BA00:DD82:618D:FC4C:EC0
  3. 2001:630:212:DE0:117D:A5AF:2C8B:F0AB

Need help

By comparing different databases about (+)-epibatidine (Q423783), I found discrepancies for the name of the 2 stereoisomers:

InChIKey | PubChem | CHEMBL | Drugbank | ChemIDplus | Guide to Pharmacology Ligand ID | Reaxys | ChemSpider
NLPRAJRHRHZCQQ-IVZWLZJFSA-N | (-)-Epibatidine | (-)-EPIBATIDINE | (+)-epibatidine | Epibatidine | - | epibatidine (6633501) | (+)-Epibatidine
NLPRAJRHRHZCQQ-UTLUCORTSA-N | (+)-Epibatidine | (+)-EPIBATIDINE | - | - | (+)-epibatidine | (-)-epibatidine (5811732) | (-)-Epibatidine

For some databases NLPRAJRHRHZCQQ-IVZWLZJFSA-N is (+)-Epibatidine, for others it is (-)-Epibatidine. Who is right? Snipre (talk) 19:11, 29 August 2019 (UTC)

A small effort

If you have a little time, please help to solve the following constraint violations. If we can solve all these violations, we will be able to extract data from databases using InChIKey as matching criterion. Thanks Snipre (talk) 19:15, 29 August 2019 (UTC)

Case 1: potassium hydride

What is the correct representation of potassium hydride: as a salt composed of potassium as the cation and hydrogen as the anion, or as a molecule with a covalent bond?

InChIKey | PubChem | CHEMBL | ChEBI | ChemIDplus | ChemSpider | Reaxys | ...
OCFVSFVLVRNXFJ-UHFFFAOYSA-N | 82127 | - | - | - | 74121 | ? (?) | ...
NTTOTNSKUYCDAV-UHFFFAOYSA-N | - | - | 32589 | 7693-26-7 | 16787786 | ? (?) | ...

Snipre (talk) 19:37, 29 August 2019 (UTC)

It has ionic character, so the salt representation is better. However, I think that both IDs should be kept in an item, but one deprecated with reason for deprecation (P2241) incorrect structure of molecular entity (Q52679949). Wostr (talk) 22:23, 10 October 2019 (UTC)
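The suggestion above (keep both IDs, but deprecate one with a reason) can be sketched as a statement in the Wikibase JSON layout. This is only an illustration: the use of P235 as the InChIKey property is an assumption not stated in this thread, the helper name is hypothetical, and the key passed in below is a placeholder that does not claim which of the two structures is the incorrect one.

```python
import json

INCHIKEY_PROPERTY = "P235"         # assumption: Wikidata's InChIKey property
REASON_FOR_DEPRECATION = "P2241"   # named in the comment above
INCORRECT_STRUCTURE = "Q52679949"  # named in the comment above


def deprecated_inchikey_statement(inchikey: str) -> dict:
    """Return a deprecated InChIKey statement with a reason-for-deprecation qualifier."""
    return {
        "type": "statement",
        "rank": "deprecated",  # the value stays on the item but is marked as not to be used
        "mainsnak": {
            "snaktype": "value",
            "property": INCHIKEY_PROPERTY,
            "datavalue": {"value": inchikey, "type": "string"},
        },
        "qualifiers": {
            REASON_FOR_DEPRECATION: [
                {
                    "snaktype": "value",
                    "property": REASON_FOR_DEPRECATION,
                    "datavalue": {
                        "value": {"entity-type": "item", "id": INCORRECT_STRUCTURE},
                        "type": "wikibase-entityid",
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder key taken from the table above; which key to deprecate is the editors' call.
    print(json.dumps(deprecated_inchikey_statement("OCFVSFVLVRNXFJ-UHFFFAOYSA-N"), indent=2))
```

The other InChIKey would stay at normal (or preferred) rank, so queries that only read best-rank values see a single key per item.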

PubChem deposit

Jasper Deng
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
Devon Fyson
Samuel Clark
Tris T7
  Notified participants of WikiProject Chemistry Hi all, I want to let everyone know that I have initiated uploading the chemicals from Wikidata to PubChem. This will create a further route to crosslink the databases (Wikidata and Wikipedia already link to PubChem, Wikipedia is actively being deposited in PubChem). Now, Wikipedia != Wikidata and uploading Wikidata separately actually has additional advantages, such as further validation reports. I already fixed a number of SMILES errors found by PubChem and not by the Chemistry Development Kit. It also reports duplicates, and a lot more. I will upload the report somewhere as soon as I have it. I have created a script to create an input CSV file ( More later. --Egon Willighagen (talk) 16:18, 22 September 2019 (UTC)

Update: the first deposit is committed and now up for review with PubChem curators. I got two reports, but neither contain the external identifier, so I need to combine these with the input first before they are useful. More later. --Egon Willighagen (talk) 17:22, 22 September 2019 (UTC)
Update: and here are the reports (created with --Egon Willighagen (talk) 18:41, 22 September 2019 (UTC)
I am having trouble following. I think you are saying that currently Wikidata items and PubChem items map to each other on the wiki side, but not on the PubChem side, and you are sharing information on the PubChem side so that people can start there and navigate to wiki. If this is correct, then that seems great.
Currently you are treating Wikidata and Wikipedia as different entities because even though Wikidata and Wikipedia link to each other, their content is different enough to justify two links. Also, the PubChem community is unlikely to know how to readily move from one to the other, so that is another reason for two links. You shared your mapping software in GitHub. You have a log of error reports published in a table on wiki.
This all seems useful, so great. Blue Rasberry (talk) 15:26, 23 September 2019 (UTC)
@Egon Willighagen: If you have good contact with PubChem, could you ask them to generate a subset of their data containing PubChem CID, InChI, InChIKey and SMILES under CC0? Main argument: if all databases do the same, WD can become the way for databases to access chemical IDs in other databases.
Currently only DrugBank played the game. Snipre (talk) 11:52, 27 September 2019 (UTC)
Yes, will ask Evan soon. We'll both be at the Beilstein Open Science meeting. In the past the answer was: PubChem is public domain and cannot have a CC0 license/waiver (which claims ownership). The other problem is to determine which parts of PubChem are public domain, and which are owned by the data provider :( --Egon Willighagen (talk) 17:55, 27 September 2019 (UTC)
@Snipre: I have spoken with Evan and they are working on rolling out license info annotation of all sources they are incorporating. This will allow us to distinguish pure PubChem data (public domain) from the external data and, for the latter, to see under what license it is provided. Now, as Evan indicated, the external chemistry sources (that submit data) are not very good at tracking the license, and often they include data that actually came from a third party, so PubChem's work on the license provenance is a slow and hard process. --Egon Willighagen (talk) 07:34, 18 October 2019 (UTC)
@Egon Willighagen: Thank you. But after a check all data generated by PubChem are under the public domain. So InChI, InChIKey, SMILES and PubChem CID are free, this is the most important thing for me. Snipre (talk) 13:23, 4 November 2019 (UTC)

New consistency tests

  Notified participants of WikiProject Chemistry I have put two additional consistency tests online on our research group Jenkins server: The two new tests compare the canonical and isomeric SMILES with the InChIKey provided by Wikidata item. For the canonical SMILES it compares the first InChIKey block (see these fails). For the isomeric SMILES it compares the full InChIKey (see these fails). --Egon Willighagen (talk) 07:57, 27 October 2019 (UTC)
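A minimal sketch of the comparison rule described above, not the actual code running on the Jenkins server. It assumes the InChIKey for each SMILES has already been generated by a cheminformatics toolkit; the function names are hypothetical. The two epibatidine keys quoted earlier on this page serve as the example.

```python
def first_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the part before the first hyphen)."""
    return inchikey.split("-")[0]


def is_consistent(stored_key: str, key_from_smiles: str, isomeric: bool) -> bool:
    """Compare the InChIKey stored on an item with one derived from a SMILES.

    Canonical SMILES carry no stereochemistry, so only the first block is compared.
    Isomeric SMILES must reproduce the full InChIKey.
    """
    if isomeric:
        return stored_key == key_from_smiles
    return first_block(stored_key) == first_block(key_from_smiles)


if __name__ == "__main__":
    stored = "NLPRAJRHRHZCQQ-IVZWLZJFSA-N"
    derived = "NLPRAJRHRHZCQQ-UTLUCORTSA-N"  # the other stereoisomer listed on this page
    print(is_consistent(stored, derived, isomeric=False))  # True: same first block
    print(is_consistent(stored, derived, isomeric=True))   # False: full keys differ
```

A check like this only flags the mismatch; it cannot tell whether the SMILES, the InChIKey, or both are wrong.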

@Egon Willighagen: I'm not sure your strategy is the right one: I would prefer to fix the InChIKey and extract the corresponding InChI and SMILES from an authoritative source. Then, using the InChIKey, we can extract different IDs from open databases and check whether the existing IDs in WD are correct.
I am currently extracting PubChem data using InChIKey (a slow process: today 40k out of 160k items are on my computer) and I plan to replace all InChI and SMILES using that extraction. This is a solution which can hurt some people, but I don't want to start checking unreferenced statements or data imported from WP. Snipre (talk) 16:12, 29 October 2019 (UTC)
It's just recording the inconsistencies. It doesn't say what the proper fix is. I think your work will solve at least some of the problems (and I hope it doesn't introduce others :). Mind you, the inconsistency may reflect not just that data, but very likely also other identifiers. But certainly also physchem properties and sitelinks should be checked. It's hard to automate this, which is exactly why the tests do not fix anything and only flag. --Egon Willighagen (talk) 08:37, 1 November 2019 (UTC)
@Egon Willighagen: Got it. In that case, it would be more interesting to track those numbers in order to see how they evolve over time. Snipre (talk) 13:26, 4 November 2019 (UTC)
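For readers who want to try the InChIKey-based extraction mentioned in this thread, here is a hedged sketch of a single lookup against PubChem's PUG REST service. It is not the script used above. The URL pattern and the response shape (`IdentifierList` / `CID`) are stated as assumptions from the public PUG REST documentation, and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey: str) -> str:
    """Build the URL that asks PubChem for the CIDs matching one InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def fetch_cids(inchikey: str) -> list:
    """Fetch the matching CIDs (performs a network request).

    Assumed response shape: {"IdentifierList": {"CID": [...]}}.
    """
    with urlopen(cid_lookup_url(inchikey)) as response:
        payload = json.load(response)
    return payload.get("IdentifierList", {}).get("CID", [])


if __name__ == "__main__":
    # Only prints the URL; call fetch_cids() to actually query PubChem.
    print(cid_lookup_url("NLPRAJRHRHZCQQ-IVZWLZJFSA-N"))
```

One request per key is slow for 160k items, which matches the experience reported above; any bulk run should respect PubChem's request-rate limits.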


Hi there, in Wikipedia articles on proteins, the Wikidata description and the first statement ("instance of") are about genes (in German: Gen, in French: gène), for example here. There are almost no articles on genes in any Wikipedia, as the gene codes for a protein, which causes the phenotype and does the work in a cell. Articles on Wikipedia are almost always about the protein. Besides this inconsistency, we sporadically receive complaints that the Wikidata description in the Wikipedia article is wrong. Is it possible to have the item descriptions and the "instance of" statements changed from gene to protein (in German: Protein, in French: protéine) by bot? All the best, --Ghilt (talk) 11:13, 29 October 2019 (UTC)

  1. In WD there should be at least two different items: one about a gene and one about a protein — HMGCR (Q14864139) + 3-hydroxy-3-methylglutaryl-CoA reductase (Q415607) (see encodes (P688) statements in gene items). You can move the sitelinks from one item to another, but descriptions and statements in WD are correct and should not be changed to match Wikipedia articles.
  2. About descriptions from WD in Wikipedia: someone thoughtlessly used the description from WD as a description of a Wikipedia article in mobile version of Wikipedia. Descriptions in WD are not meant to be descriptions of Wikipedia articles nor short definitions. Description in WD is a short phrase designed to disambiguate items with the same or similar labels.
Wostr (talk) 12:05, 29 October 2019 (UTC)
@Wostr: thanks for the ping, it helps to accelerate the discussion, as I am usually here only sporadically. Ad 1) no problem with separate items for gene and protein. But the inconsistency still exists that the Wikipedia articles on proteins are connected via the gene items, not the protein items. Ad 2) the WD description is also shown in the desktop version of the articles, e.g. de:HMG-CoA-Reduktase, not just the mobile version. --Ghilt (talk) 12:19, 29 October 2019 (UTC)
@Ghilt: I don't see 'Gen der Spezies Homo sapiens' in de:HMG-CoA-Reduktase anywhere. Could you point at the exact place where it is shown? I don't know if there are any guidelines in Wikidata:WikiProject Molecular biology regarding sitelinks from gene/protein item to Wikipedia, but IMHO if the Wikipedia articles describe a protein then sitelinks should be moved from gene item to protein item (importScript( 'User:Matěj_Suchánek/moveClaim.js' ); may be used for this). Probably each item should be considered individually. Wostr (talk) 12:49, 29 October 2019 (UTC)
WD is responsible for defining items clearly so that they can be used correctly. But the interwikis are not WD's main responsibility: WD doesn't know what is written in the WP articles, so it is WP's task to check that the links between WD and WP are correct. Snipre (talk) 16:02, 29 October 2019 (UTC)
Hmm, I can see 'Wikidata: HMGCR (Q14864139), Gen der Spezies Homo sapiens, alternative Bezeichnungen: keine' and the complaining IP on the talk page was mobile. What can I do to get it corrected? And how can I help (I don't have a bot)? --Ghilt (talk) 22:06, 29 October 2019 (UTC)
@Ghilt:, if the sitelinks in HMGCR (Q14864139) refer to the protein, then you can move them from HMGCR (Q14864139) to 3-hydroxy-3-methylglutaryl-CoA reductase (Q415607) or to another item. Generally, if a Wikipedia article does not correspond to the WD item, look for an item that better matches the article and move the sitelinks to that item. If it's a problem with many articles describing genes/proteins, write here, maybe someone from there will help with a bot. Wostr (talk) 00:01, 30 October 2019 (UTC)
@SCIdude: Perhaps you would be interested in this discussion. Snipre (talk) 13:19, 4 November 2019 (UTC)

Thanks. Yes, sitelinks are often on genes, even if the articles are about proteins (WP writers want it all in one). The solution is to have separate concepts in WD and a WD item that collects them all and gets the sitelinks, example: insulin (Q70598743). I made the proposal at enwiki but no one cares, it's a WD issue, anyway. Maybe we can agree to it? It would also resolve the original poster's problem when implemented, and the implementation could be automatized. --SCIdude (talk) 13:59, 4 November 2019 (UTC)

I've checked 20 protein articles on de.wp from different protein families and 19 had the wikidata description "gene". So, i guess around 95 % of the 2372 protein articles in de.wp have the wrong description. Should i contact the Wikidata:WikiProject Molecular biology concerning the correction? --Ghilt (talk) 10:37, 6 November 2019 (UTC)
@Ghilt, SCIdude, Wostr: I think this problem should be addressed to Wikidata:WikiProject Molecular biology and/or Wikidata:WikiProject Gene Wiki. Snipre (talk) 13:22, 6 November 2019 (UTC)
ok, a thread was opened here: Wikidata_talk:WikiProject_Molecular_biology#Correction_of_Wikidata_descriptions_of_Wikipedia_protein_articles, --Ghilt (talk) 14:08, 6 November 2019 (UTC)
@Snipre: I agree regarding the genes but chemistry has the same problem with conflations, so what's your opinion on nitrite (Q72158415) or acetylleucine (Q72282660) or 7-methylguanosine (Q72286919) or tyrocidine (Q72370012)? --SCIdude (talk) 14:12, 6 November 2019 (UTC)
I always add sitelinks to the item that's the closest to the concept described in Wikipedia articles. The problem with items like above is that Wikipedia articles are not directly connected to any 'true' item, so the import of data from Wikidata to Wikipedia is not easy. Usually, data provided by Wikidata in Wikipedia infoboxes was correct for such articles. Wostr (talk) 19:26, 6 November 2019 (UTC)
But AFAIK infobox template writers already deal with articles having gene or protein WD items (by following the encodes/encoded by claims), so theoretically they should be able to handle it. Agree that this should be made as easy as possible as a first step. --SCIdude (talk) 07:47, 7 November 2019 (UTC)
@SCIdude, Wostr: I have the same opinion as Wostr: the items you created are a kind of original research based on WP articles. But as WP itself states that it is not a source, we can't use those articles as references. WD has a lower granularity than WP and WD does not depend on WP: so if WP contributors want to merge several topics in one article, WD has no obligation to create an item in order to match WP reality. Using Lua or any good wikicode, it is possible to extract data from different WD items and display them in one WP article, so there is no need to create the items mentioned. Please delete them. Snipre (talk) 14:03, 11 November 2019 (UTC)
Even if I agreed, where to put the sitelinks? As to own research, I can put references on the claims, no problem. --SCIdude (talk) 14:36, 11 November 2019 (UTC)
@SCIdude: If you want a fast answer, please ping. I am curious to know what kind of reference you can add to justify a concept mixing an ion, a group of salts, a group of organic chemicals and a class of compounds. I hope you don't plan to use the English article, because you will forget the main rule of WP itself: WP is not a source.
I return the question to you: how do you plan to handle the differences in the coverage of the Wikipedia articles? For example, if I take the French article Nitrite, it covers the ion and the salts concept, but not the ester compounds. To be coherent you should create a new "amalgam" item just for the French article. If you plan to refer only to en:WP, this would be a cultural bias, and this is not admissible.
Then, to respond to your question, just select the most appropriate item, and if you can't or don't want to choose, you can let the different WPs choose the best solution according to their understanding of their articles by putting a message in the relevant Wikiproject Chemistry.
This is not the goal of WD to handle the mess of the different WPs. Snipre (talk) 14:30, 30 November 2019 (UTC)

import of physiological items

Over the last weeks I fixed some ChEBI issues and added classes. I'm now ready to start importing >2.5k ChEBI substances/ions (and later 300 more classes) that will then later (with all others) be linked from protein molecular function and processes items. On creation, the substance/ion items will get: instance-of classes, aliases, InChi key, and possibly Beilstein and Reaxys ids, if available, from ChEBI. Just so that you are not completely surprised. --SCIdude (talk) 09:17, 9 November 2019 (UTC)

Are you sure that the items you're going to import won't be duplicates of existing items? Last month we had at least a few hundred items created with incorrectly checked CAS numbers and many of them were duplicates. Wostr (talk) 16:52, 9 November 2019 (UTC)
Last week I already added ChEBI ids to all items that had an InChi key but no ChEBI id. All of the new compound items have missing ChEBI id with InChi keys. So there can't be duplicates (assuming every compound has a key). As to the classes I do them manually and start with a search because any hit reduces my workload. --SCIdude (talk) 18:08, 9 November 2019 (UTC)
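
The CAS-number problems mentioned above can be caught before an import with a check-digit test. The following is only a minimal sketch in Python, not something used in this thread: the function name is_valid_cas is made up for illustration, and it only checks the format and the check digit (it cannot tell whether a well-formed number belongs to the right compound).

```python
import re

# Hypothetical helper, not part of any Wikidata tool.
# A CAS Registry Number has 2-7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^([0-9]{2,7})-([0-9]{2})-([0-9])$")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string is well-formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ["7732-18-5", "7732-18-4", "7732185"]:
        print(candidate, is_valid_cas(candidate))
```

A check like this only filters out malformed or mistyped numbers; duplicates still have to be found by comparing the values against existing items.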

A lot of duplicate data

For several weeks now, a lot of duplicated data has been generated. I don't want to blame anyone, I just want to remind everyone that a check is necessary after a merge or an addition of data.

See constraint violation reports for

Most of those problems are corrected after some days, but please have a look. Snipre (talk) 14:30, 11 November 2019 (UTC)

Mmmm... a lot of new chemical entries with very minimal information and indeed many duplicate CAS registry numbers. Not so happy about this either. It has been brought up, but it's not clear what the situation of resolving the problems is. --Egon Willighagen (talk) 15:02, 22 November 2019 (UTC)
The current situation is that we have a lot of duplicates and we have to merge them manually. The format of the CAS numbers in these new items has been corrected, so some items can be quickly merged, but because some chemical compounds may have more than one CAS number, there may be items that are in fact duplicates, but won't show up on any constraint violations list, and it will be problematic to find those duplicates. Wostr (talk) 16:00, 22 November 2019 (UTC)
Note the conflict reports are somewhat behind. Also I went through all InChi key duplicates and had to leave those pairs that were tautomers (I marked them), because InChi keys for tautomers apparently can be (are?) identical. The actual numbers from fresh queries are:
  • InChi: distinct 18 (report 5+1), single 26 (report 33)
  • CAS: distinct 400 (report 536), single 87 (report 91+8)
  • InChi key: distinct 27 (report 28+2), single 26 (report 32)
With this query I count 13 tautomer pairs that have identical InChi keys, so I'll go through the others again:
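SELECT DISTINCT ?item1 ?item1Label ?item2 ?item2Label ?value
WHERE {
	# two items sharing the same InChIKey (P235) ...
	?item1 wdt:P235 ?value .
	?item2 wdt:P235 ?value .
	# ... that are linked by tautomer of (P6185)
	?item1 wdt:P6185 ?item2 .
	# report each pair only once
	FILTER( ?item1 != ?item2 && STR( ?item1 ) < STR( ?item2 ) ) .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}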
--SCIdude (talk) 17:08, 23 November 2019 (UTC)
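
The duplicate check described in this comment (items sharing an identifier value, with marked tautomer pairs left out) can also be sketched offline. This is only an illustration in Python under stated assumptions: the function name duplicate_pairs, the input layout and the placeholder key strings are made up here, and the real numbers above came from queries and constraint reports, not from this code.

```python
from collections import defaultdict


def duplicate_pairs(values_by_item, tautomer_pairs=()):
    """Return (value, item1, item2) for every pair of items sharing a value.

    values_by_item: dict mapping an item ID to an iterable of identifier values.
    tautomer_pairs: pairs of item IDs that are known tautomers and are skipped.
    """
    items_by_value = defaultdict(set)
    for item, values in values_by_item.items():
        for value in values:
            items_by_value[value].add(item)

    # Store exempt pairs as frozensets so the order inside a pair does not matter.
    exempt = {frozenset(pair) for pair in tautomer_pairs}

    conflicts = []
    for value in sorted(items_by_value):
        ordered = sorted(items_by_value[value])
        for index, first in enumerate(ordered):
            for second in ordered[index + 1:]:
                if frozenset((first, second)) not in exempt:
                    conflicts.append((value, first, second))
    return conflicts


if __name__ == "__main__":
    # The key strings and the Q-IDs ending in "x" are placeholders, not real data.
    data = {
        "Q74411505": ["KEY-A"],
        "Q27891533": ["KEY-A"],
        "Q1x": ["KEY-B"],
        "Q2x": ["KEY-B"],
    }
    print(duplicate_pairs(data, tautomer_pairs=[("Q27891533", "Q74411505")]))
```

Pairs listed as tautomers are skipped, so only the remaining pairs need a manual look.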
Standard InChI/InChIKey is identical for tautomers, but InChI software can produce non-standard versions of InChI/InChIKeys. However, I don't know of any software that can easily generate non-standard InChI. If we have one, we could change single value constraint (Q19474404) in InChI (P234) to single best value constraint (Q52060874), or better, update it with separator (P4155), so that we could have both InChIs in one item with a qualifier that describes whether it's a standard or non-standard InChI. Wostr (talk) 18:21, 23 November 2019 (UTC)
The Chemistry Development Kit (Q2383032) can do this. I can make a script for this. --Egon Willighagen (talk) 20:28, 23 November 2019 (UTC)
How could it work, i.e. how could we generate non-standard InChI/InChIKeys with it? (I'm not very good at technical things; is it software that anyone can run?) Wostr (talk) 15:49, 30 November 2019 (UTC)


@Wostr, Egon Willighagen, SCIdude: By creating dedicated items for different tautomers or zwitterionic forms, and adding all identifiers to all tautomer/zwitterion forms, we are generating constraint violations for most identifiers. How can we handle that problem?

Some solutions:

  1. Put all constraint violations related to tautomers and zwitterion in the exception list
  2. Between the different forms and according to a defined set of criteria, choose one form which will become the chemical compound; the other forms will be defined as instance of tautomer/zwitterion. All undefined identifiers will be linked to the chemical compound item, with all general properties.

The second one is the best in my opinion because we avoid working with 2 items at the same time: most of the time we only have data for an undefined tautomer/zwitterion form. Snipre (talk) 15:01, 30 November 2019 (UTC)

We are not the only ones having separate entries, ChEBI has too, so with solution 2 you need to decide which ChEBI id to link or get constraint violations with two ids. You could also remove some constraints as a different solution. --SCIdude (talk) 15:20, 30 November 2019 (UTC)
While I am not arguing against the concerns, I have mixed feelings about not allowing tautomers and zwitterions. Tautomers in particular have different physchem properties, and even zwitterions can be linked to experimental data (e.g. crystal structures). I also do not currently have a good suggestion. One issue is that tautomers are ill defined, particularly in the context of the InChI(Key), where the algorithm has its limitations. --Egon Willighagen (talk) 15:32, 30 November 2019 (UTC)
@Egon Willighagen: Nobody was proposing to ban the creation of item for tautomer or zwitterion: the discussion is to find a good way to integrate those particular cases in WD. Snipre (talk) 16:29, 30 November 2019 (UTC)
  • Which ids cause constraint violations? I know that InChI/InChIKey does, but that problem requires finding a way to generate non-standard InChI/InChIKey. The standard InChIs/InChIkeys should be present in both items – neither InChI nor InChIKey is 100% unique for chemicals. For zwitterions: instance of (P31) zwitterion (Q245115)/ subclass of (P279) zwitterion (Q245115) should be always present and for zwitterions you can tell if an ids refers to the neutral/zwitterionic form by SMILES/systematic name for example. For carbohydrates (chain/ring structure): the same, InChIs are different, SMILES are different, even systematic names are different. For compounds with mobile-H: usually the same.
    The real problem with tautomers is the InChI/InChIKey, but that's not only our problem, it's the problem of the standard configuration of InChI software and it's a known issue that is solvable by generating non-standard InChI/InChIKey. Then we only have to decide what to do with two InChI values in one item (deprecate StdInChI, prefer NonStdInChI etc.). Wostr (talk) 15:45, 30 November 2019 (UTC)

@Wostr, Egon Willighagen, SCIdude: I will try another approach:

Zwitterion case:

Always consider the neutral form as the chemical compound form. A second item for the zwitterionic form can be created with the following properties

Neutral form | Zwitterion
instance of: chemical compound | instance of: zwitterion
All IDs and properties for the neutral form, for mixtures of neutral form and zwitterion form or undefined form (Std InChI and InChIKey) | IDs and properties only for the zwitterion form (non-standard InChI and InChIKey)

Tautomer case:

The most stable form, or the form which is present in excess in standard conditions, is defined as Form A. The other form is defined as Form B.

Form A | Form B
instance of: chemical compound | instance of: tautomer
All IDs and properties for form A, for mixtures of A and B forms or undefined form (Std InChI and InChIKey) | IDs and properties only for form B (non-standard InChI and InChIKey)

Snipre (talk) 16:42, 30 November 2019 (UTC)

No objection from me. Implementation of the zwitterion case can be automated if the compounds are in ChEBI (ChEBI explicitly names zwitterions). Additionally, a metaclass "class or group of zwitterions" may be needed, ChEBI has a hierarchy for them. --SCIdude (talk) 17:05, 30 November 2019 (UTC)
I can't agree to everything above. StdInChI is valid for both forms (neutral and zwitterionic) and we should find a way to model this properly, Non-standard InChI is an addition that may help in distinguishing the forms, but is not a substitute. instance of (P31) tautomer (Q334640) for only one tautomer is also wrong; both are tautomers in the same way of each other; also, as tautomer of (P6185) is present, I don't think we need to explicitly classify compounds as tautomers (similarly, we don't classify compounds as stereoisomers); both should be classified according to its structure etc. I can agree to that part 'all IDs and properties for form A, for mixtures of A and B forms or undefined form' with an exception for cases (if there would be any such cases) when ID clearly distinguish form A/form B/mixture of A and B/undefined form. Also, there may be situations when we should keep an ID with a deprecated rank in one item and have it in a second item with a normal rank. 'Additionally, a metaclass "class or group of zwitterions" may be needed' is not needed — zwitterionic form has a charge of 0, so I don't think we need to classify them in a different way as chemical compounds (only instance of (P31) zwitterion (Q245115)/ subclass of (P279) zwitterion (Q245115)). Wostr (talk) 20:29, 30 November 2019 (UTC)
@Wostr: The problem with the StdInChI applies to most identifiers, so why do we have to treat StdInChI in a particular way? We have to find a solution for all identifiers.
Then, can both tautomers be chemical compounds, or will tautomer be a subclass of chemical compound? This is more critical in terms of ontology.
In any case, we can't treat both tautomers in the same way, or we will have to create a third item for the undefined tautomer. Snipre (talk) 11:46, 1 December 2019 (UTC)
tautomers in the same way – in regards to classification; classifying only one tautomer as tautomer is not correct, classifying both seems redundant to me (these items already have tautomer of (P6185)). I asked, which IDs are causing problems similar to InChI/InChIKey? Because I think most of the problems can be solved only by checking the data in the source: we have DTXSID50274234 in pyridine-3,4-diol (Q74411505) and 3-hydroxypyridin-4(1H)-one (Q27891533), but the source clearly states the IUPAC name, has structure shown, has SMILES. If we have a real problem in which the source has e.g. IUPAC names for both tautomers, SMILES for both etc., we can either move the IDs to the prevalent form, or (IMHO better option) deprecate the IDs in the less common form with proper reason for deprecation (P2241). Wostr (talk) 15:01, 1 December 2019 (UTC)
@Wostr: This is perhaps not correct in an ideal classification, but we need a pragmatic solution. So please provide a complete answer to my question: how do you plan to link the tautomers to higher classes? Do you plan to define both tautomers as instances of chemical compound or of any subclass of chemical compound? This is not correct because both tautomers are not different chemical compounds.
And following your proposition for IDs, this means we will have for the same chemical a splitting of the IDs between 2 items, this reducing the capacity of connections of external databases through an unique WD item, especially when external databases are not defining different ID for tautomers. Snipre (talk) 14:43, 13 December 2019 (UTC)
@Snipre: I thought I had answered this, but apparently it was not saved. I don't think we need any special solution for the classification of tautomers; any tautomer should be classified according to its structure and/or other qualities. E.g. carbohydrates have 'group of isomers' items, which can be linked to carbohydrates (there could also be links to specific classes of heterocyclic compounds for closed ring forms and to aldehydes/ketones for open chain forms etc.). In Wikipedias there was always a problem with categories for compounds having different tautomeric forms: which category should be assigned? Here we can assign different classes to different tautomeric forms. 'This is not correct because both tautomers are not different chemical compounds': this is not so obvious, tautomers are defined simply as 'isomers' with one specific feature, namely that they are 'readily interconvertible'. 'this means we will have for the same chemical a splitting of the IDs between 2 items, this reducing the capacity of connections of external databases through an unique WD item': we already have this in items for which an external, reliable source incorrectly gave an ID which is correct for another chemical compound (such a statement is deprecated in WD, but the ID still exists in two items). This is not something that should occur frequently, but it is unavoidable. We just have to limit this to cases where it is necessary and mark such statements clearly (qualifier, rank). Wostr (talk) 17:42, 5 January 2020 (UTC)
  • This is not correct because both tautomers are not different chemical compounds — this is not so obvious, tautomers are defined simply as 'isomers' with one specific feature that are 'readily interconvertible'.
This is not correct if you consider the fact that chemical compound is a subclass of chemical substance and if you consider the definition of chemical substance: "Matter of constant composition best characterized by the entities (molecules, formula units, atoms) it is composed of. Physical properties such as density, refractive index, electric conductivity, melting point etc. characterize the chemical substance."
Using the inheritance property of the subclass relation, chemical compound should have defined physical properties. Isolated tautomers don't exist, but in most cases the equilibrium between tautomers favors one form. Based on that reasoning I continue to say that one form, the most thermodynamically stable one, is a chemical compound, because the measured properties mainly result from that form, and the second form should only be defined as tautomer because it is a kind of hypothetical chemical compound (exists, but not isolable). As a simple rule, for keto-enol tautomers, we should define keto tautomers as chemical compound and enol as tautomers only, as keto are the most stable form.
  • this means we will have for the same chemical a splitting of the IDs between 2 items, this reducing the capacity of connections of external databases through an unique WD item – we already have this in items for which an external, reliable source incorrectly gave an ID which is correct for other chemical compound (such statement is deprecated in WD, but still an ID exists in two items)
This way of doing things is just a propagation of errors and incoherences. Wikidata is not only a simple compilation of data, but should generate an ontology and should be able to provide a logic for machines. This implies not only observing and marking errors but trying to correct them by alerting the databases and bringing the problem to their attention. Snipre (talk) 04:26, 5 March 2020 (UTC)
Ad 1: Using your argumentation your proposal that one tautomer should be an instance of chemical compound and the other(s) should be instance(s) of tautomer is not correct, because the chemical compound being a hybrid (in fact a mixture) of tautomers should be an instance of chemical compound and every tautomer an instance of tautomer – as you never have a 100% pure substance composed of only one tautomer. Using your proposal for simple annular tautomers may seem simple, but in which phase/conditions you want to measure which one tautomer is prevalent? It seems not so simple for carbohydrates: open chain-ring tautomers – which one is prevalent and why?
Ad 2: As I said, having the same IDs in more than one item is unavoidable (not necessarily for tautomers, but in general), so if there is a need in a particular item describing tautomer to add ID that is added somewhere else, the only thing we should care about is to properly describe the situation using qualifiers and ranks. Wostr (talk) 13:37, 5 March 2020 (UTC)

Non-standard InChI

ChemSpider does have non-standard InChIs/InChIKeys (I don't know, however, with what options), but there are no entries for tautomers (at least not for the few I checked). Wostr (talk) 22:40, 10 December 2019 (UTC)


Hi all, a quick heads-up. PubChem (Q278487) has released a subset of notable compounds. The Tier0 set contains 360 chemical compounds ("compiled from 8 categories: AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse"). I have started adding compounds from this set to Wikidata (with permission from Emma and Evan; tho I requested making it CC0), limited currently to neutral compounds where all stereocenters have defined parity. I am using my createWDitemsFromSMILES.groovy script. From the Tier0 data, I use the "compound name", SMILES, and PubChem CID. From the SMILES, the script calculates the InChIKey. The latter and the PubChem CID are used to detect if the compound is already in Wikidata. If not, the script creates QuickStatements (but, as said, only for neutral compounds with full stereo defined). I started yesterday and am doing this in batches, to keep an eye on what happens. There is a significant number of CREATEs that fail, which is likely due to the name or the SMILES being too long (I have yet to verify this). --Egon Willighagen (talk) 15:13, 22 November 2019 (UTC)
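
The workflow described above (skip compounds already known by InChIKey or PubChem CID, otherwise emit QuickStatements) might look roughly like the sketch below. It is written in Python and is not the createWDitemsFromSMILES.groovy script: the function names, the record layout and the choice of statements (label, InChIKey P235, PubChem CID P662, isomeric SMILES P2017) are assumptions for illustration, and the InChIKey is taken as already computed, whereas the real script derives it from the SMILES with a chemistry toolkit.

```python
def _quote(text):
    """Wrap a string value in double quotes, as QuickStatements expects for strings."""
    return '"' + text + '"'


def quickstatements_for_new(records, known_inchikeys, known_cids):
    """Build tab-separated QuickStatements (V1 syntax) for records not yet known.

    records: iterable of dicts with the keys name, smiles, cid, inchikey
             (hypothetical layout, chosen for this sketch).
    known_inchikeys / known_cids: sets of values already present in Wikidata.
    """
    lines = []
    for record in records:
        # Skip anything that already matches an existing item by either identifier.
        if record["inchikey"] in known_inchikeys or record["cid"] in known_cids:
            continue
        lines.append("CREATE")
        lines.append("\t".join(["LAST", "Len", _quote(record["name"])]))
        lines.append("\t".join(["LAST", "P235", _quote(record["inchikey"])]))
        lines.append("\t".join(["LAST", "P662", _quote(record["cid"])]))
        lines.append("\t".join(["LAST", "P2017", _quote(record["smiles"])]))
    return lines


if __name__ == "__main__":
    # Placeholder records, not real Tier0 data.
    demo = [
        {"name": "known compound", "smiles": "C", "cid": "1", "inchikey": "KNOWN-KEY"},
        {"name": "new compound", "smiles": "CC", "cid": "2", "inchikey": "NEW-KEY"},
    ]
    print("\n".join(quickstatements_for_new(demo, {"KNOWN-KEY"}, set())))
```

Because the existence check runs again on every call, rerunning the same input after a partly failed batch only emits commands for the compounds that are still missing.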

In the last 4 days the QS error rate was relatively high. I was warned by GWDZ a few days ago that this happens because I had batches running. But even today I had up to 4% error rate in my batches. Just FYI. --SCIdude (talk) 16:35, 23 November 2019 (UTC)
Please point me to the problems you found. I will check then if it is my (and if so, fix stuff). --Egon Willighagen (talk) 20:29, 23 November 2019 (UTC)
The discussion is here. My affected batches are molbio related. --SCIdude (talk) 07:22, 24 November 2019 (UTC)
Ah, interesting. I thought it was the length of some IUPAC names :) Thanks for the heads-up. The errors are not a big issue for me. I'll just rerun the job, and it will pick up the failed entries automatically (i.e. notice the compounds are not there). --Egon Willighagen (talk) 08:18, 24 November 2019 (UTC)
The items of the collection were not chosen expertly. They e.g. picked (2S,3S,4S,5R,6R)-6-[(2R,3R,4R,5R,6R)-3-acetamido-2,5-dihydroxy-6-sulfooxyoxan-4-yl]oxy-3,4,5-trihydroxyoxane-2-carboxylic acid (Q76001893) for its name (chondroitin sulfate), an obscure entry that poses for Q75014826. I'll rename to the IUPAC. --SCIdude (talk) 15:55, 24 November 2019 (UTC)
That is not entirely correct. It was done automatically, but based on the amount of external data for that PubChem entry. So, it selects on *least* obscure, according to external database. We can ask Evan for details. --Egon Willighagen (talk) 20:49, 12 December 2019 (UTC)

GHS labelling

We now have a problem like here. user:Wikisaurus added a lot of statements that are (1) incomplete (which makes them incorrect, because the lack of H-phrases or P-phrases gives the impression that there are no such phrases for the specific substance), and (2) sourced from [6], which is not even a consolidated version and is in any case an improper source for labelling for obvious reasons (lack of P-phrases, and the fact that harmonised classification and labelling is not always the prevalent one). Wostr (talk) 23:05, 1 December 2019 (UTC)

This problem has been quickly solved by Wikisaurus. Wostr (talk) 20:17, 4 December 2019 (UTC)

Images of chemical substances

Hi! How should images for chemical substances be specified? I believe there are 4 main types of images, see above. Many 3D schemes (both ball-and-stick and space-filling) were imported from Dutch Wikipedia by @Multichill:, and they were added to image (P18), but maybe it is better to have them in chemical structure (P117), as they are really just other representations of the same thing as 2D schemes? Probably with different qualifiers to distinguish them from 2D schemes? Wikisaurus (talk) 18:41, 4 December 2019 (UTC)

The only valid type of image in chemical structure (P117) is a chemical structure (drawn according to the IUPAC recommendations), not 3D representations of structures, which are IMHO quite useless (and some of them are not correct) and mainly act as decorations in Wikipedia articles. The rest should be put in image (P18); usually I add media legend (P2096) and depicts (P180) (like in Q418425#P18) to the images of samples of compounds (without it, retrieving the proper image of a sample of a chemical compound for use in e.g. a Wikipedia infobox would be impossible). You could add depicts (P180) ball-and-stick model (Q905563) as a qualifier (or depicts (P180) molecular model (Q2196961), or some new item describing the vdW model or something) to 3D models, but honestly, it seems to me like a waste of time (because, as I wrote before, these models are usually just pure decorations). Wostr (talk) 20:16, 4 December 2019 (UTC)
PS If someone would like to do it and have enough time, it is possible to propose to create another property for molecular model (Q2196961) images only, but I'm not sure it's worth it. However, it would surely be easier to retrieve proper images for the Wikipedias and other projects. Wostr (talk) 20:20, 4 December 2019 (UTC)
Thanks for the idea, it sounds good to create a separate property and use depicts (P180) with molecular model (Q2196961) or space-filling model (Q900806) on it. And, well, I do not think it is a waste of time, someone should someday sort out samples and models anyway. Wikisaurus (talk) 21:26, 4 December 2019 (UTC)
Wikidata:Property proposal/Natural science. The biggest problem would be to move all 3d models to the new property, but maybe most of them could be moved using bot/QS, because many have 'vdW', 'ball-and-stick', 'model', '3D', 'spacefill', 'model', 'sticks' etc. in filenames. Wostr (talk) 21:44, 4 December 2019 (UTC)
Wikidata:Property proposal/molecular model. Wostr (talk) 22:22, 4 December 2019 (UTC)
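
The filename-based sorting suggested above could be prototyped like this. It is only a sketch in Python: the function name looks_like_molecular_model is made up, the keyword list is taken from the comment above, and plain substring matching will produce false positives, so the result is a list of candidates for review rather than something to feed directly into a bot or QS.

```python
# Keywords from the discussion, lower-cased; hyphens and underscores in file
# names are normalised to spaces before matching.
MODEL_KEYWORDS = ("vdw", "ball and stick", "model", "3d", "spacefill", "sticks")


def looks_like_molecular_model(filename):
    """Return True if the file name contains one of the 3D-model keywords."""
    normalised = filename.lower().replace("_", " ").replace("-", " ")
    return any(keyword in normalised for keyword in MODEL_KEYWORDS)


if __name__ == "__main__":
    # Made-up file names for illustration only.
    for name in ["Benzene-3D-balls.png", "Aspirin-skeletal.svg", "Caffeine_spacefill.png"]:
        print(name, looks_like_molecular_model(name))
```

Files that match none of the keywords would still need a manual check, since not every model image is named this way.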

Property for substructures in items describing classes of chemical compounds

I was looking for a way to properly describe structural class of chemical compounds (Q47154513), i.e. something language independent and independent of using external-ids for definitions. Right now, for some classes we have the definition put into the description or we have the definition in the GoldBook ID (or another database like ChEBI); for the rest, we don't have any definition or we have only Wikipedia sitelinks... sometimes with different definitions. However, it's not possible to maintain a huge classification tree this way (right now, we have almost 2.5k compound classes and at least a few thousand items that should have instance of (P31) structural class of chemical compounds (Q47154513), but are only classified either as a subclass of a chemical compound, or of a chemical substance).

I'm thinking about proposing a new property for SMARTS line notation; it's an extension of SMILES (and it's not very popular, not like SMILES) intended for describing molecular patterns, not specific structures of chemical compounds. As InChI Trust stated on its website, substructure searching (...) is beyond the mission of the InChI project, so we can't hope that it will be possible to use an official method like InChI for this. SMARTS is a bit harder than SMILES and not every chemical software supports it, but there is the free SMARTSviewer (there is a way to have a format URL for this: , but it doesn't work as intended – it downloads the correct png file, but without the .png).

By using SMARTS one can describe and distinguish:

  • primary amine: [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]
  • secondary amine: [NX3H1!$(NC=[!#6])!$(NC#[!#6])]([#6])[#6]
  • tertiary amine: [NX3H0!$(NC=[!#6])!$(NC#[!#6])]([#6])([#6])[#6]
  • primary aromatic amine: [NX3H2!$(NC=[!#6])!$(NC#[!#6])]c
  • ketone: [#6][CX3](=O)[#6]
  • aldehyde: [CX3H1](=O)[#6]

and so on... These examples can be used e.g. in PubChem to search for structures having this as a substructure. Also, this notation may help in the classification of the compounds we have in WD in the future.

What do you think about introducing this to items about structural class of chemical compounds (Q47154513)? Or maybe there is a better method? Wostr (talk) 19:46, 6 December 2019 (UTC)

SMARTS is certainly something to add, but you could also put some of the classes in a hierarchy by fully importing ChEBI. That is on my list but not soon. --SCIdude (talk) 07:54, 7 December 2019 (UTC)
  Support --Egon Willighagen (talk) 19:38, 12 December 2019 (UTC)

Citation needed for chemical properties

Is there any objection to adding property constraint (P2302) citation needed constraint (Q54554025) to physical, chemical and biological/toxicological/safety properties (→Wikidata:WikiProject Chemistry/Properties)? Or at least property constraint (P2302) citation needed constraint (Q54554025) / constraint status (P2316) suggestion constraint (Q62026391)? It is now impossible to check new additions of such values that may be incorrect (e.g. as a result of vandalism); it would, however, not affect the regular additions resulting from Wikipedia infobox imports. What's more, in the distant future of WD every such value should have a source, so why not start asking for sources right now? Wostr (talk) 20:10, 6 December 2019 (UTC)

I like that, I think. I was thinking of something around #1lib1ref for this too. I guess these would make them easy to find too? --Egon Willighagen (talk) 17:24, 13 December 2019 (UTC)

>14,700 new chemical entities without stereo/group status

You may or may not have noticed the activity of User:Zcp3000 who created >14,700 natural product stubs with InChi key and Pubchem CID as instances of chemical entity, this since Nov-14. Which seems not wrong of course but could be improved. Is there a way to get the number of undefined stereo centers from PubChem or else? I remember seeing this on an entry but cannot find which database. If we had a reliable source for this the assignment of these entities to compound/group could be automated. --SCIdude (talk) 07:49, 7 December 2019 (UTC)

That's why every mass import should be consulted first under pain of reverting all the contributions... Unfortunately, it's not possible in WD. What the heck is (3S)-3-Methyl-7,9-dimethoxy-3,4-dihydro-1H-naphtho[2,3-c]pyran-10-ol (Q77138777)? This information about defined/undefined stereocenters is in ChemSpider for sure (see e.g. [7]). However, I don't know if or how it can be automated. Wostr (talk) 15:06, 7 December 2019 (UTC)
I will make a script that reports the number of undefined stereocenters (and bonds) for all compounds in Wikidata. --Egon Willighagen (talk) 20:47, 12 December 2019 (UTC)
The script found about 33 thousand compounds with missing stereochemistry. At this moment there will likely be false positives, such as entries that are already marked as one of the class types. Please report observations here when you run into them. It may currently also report racemic mixtures as missing stereo (which they should), etc. I'll fix that this weekend. Have fun! (PS you can find all tests on this page: --Egon Willighagen (talk) 21:44, 12 December 2019 (UTC)
@Egon Willighagen: are you sure this works? I get 6 missing centers for Q25100985, but I think the double bonds prevent any ambiguity. --SCIdude (talk) 06:46, 13 December 2019 (UTC)
Yes, I'm sure. This is one of the corner cases (and a false positive): is the ring small enough that only one combination of double bond stereochemistry is possible? (Though I tend to agree with you on this one.) There will be examples like this where domain expertise has to be involved. But please let me know if you find additional ones so that we can discuss those too. --Egon Willighagen (talk) 13:50, 13 December 2019 (UTC)
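On the question of whether PubChem exposes the number of undefined stereocenters: as far as I know, the PUG REST interface offers the compound properties UndefinedAtomStereoCount and UndefinedBondStereoCount. The following is only a minimal sketch of how they might be requested and read; it is not the script mentioned above. The helper names (stereo_url, undefined_stereo, needs_review) are made up for illustration, and no network call is made here.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
STEREO_PROPS = ("UndefinedAtomStereoCount", "UndefinedBondStereoCount")


def stereo_url(inchikey: str) -> str:
    """Build a PUG REST URL asking for the undefined stereo counts of one InChIKey."""
    return (f"{PUG_REST}/compound/inchikey/{quote(inchikey)}"
            f"/property/{','.join(STEREO_PROPS)}/JSON")


def undefined_stereo(payload: str) -> dict:
    """Map each CID in a property response to (undefined atoms, undefined bonds)."""
    rows = json.loads(payload)["PropertyTable"]["Properties"]
    return {row["CID"]: (row.get(STEREO_PROPS[0], 0), row.get(STEREO_PROPS[1], 0))
            for row in rows}


def needs_review(payload: str) -> list:
    """Return the CIDs with at least one undefined stereocenter or stereo bond."""
    return sorted(cid for cid, counts in undefined_stereo(payload).items() if any(counts))
```

The response text would come from something like urllib.request.urlopen(stereo_url(key)).read(). Note that a count of zero does not settle corner cases such as the small-ring double bonds discussed above; those still need a chemist's eye.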
User:Zcp3000 here. I have been working to add the recently released NPAtlas data to Wikidata. My plan was to use the InChIKeys as UIDs to create stubs for each entry. The ultimate goal is to link each of these compounds within Wikidata to its producing organism. I have been using a script to add items line by line, and in some cases the items are unnamed; this is the source of the duplicate entries under (3S)-3-Methyl-7,9-dimethoxy-3,4-dihydro-1H-naphtho[2,3-c]pyran-10-ol (Q77138777), which clearly needs to be fixed. I am new to Wikidata and am open to suggestions on improving the quality of this data.
@Zcp3000: One problem hinted at above is that the entity from PubChem may have either complete or incomplete stereochemistry. Please check this project's info pages on which classes to instantiate from in either case. --SCIdude (talk) 17:10, 7 December 2019 (UTC)
Please also check for duplicates/multiples first. For example, your edit of Veraguamide J (Q27135798) created a constraint conflict; you need to fix or avoid this. --SCIdude (talk) 17:15, 7 December 2019 (UTC)
@SCIdude: Thanks. There are ~25k entries. I've stopped uploading and will address the issues we've encountered. Duplicates/names I should be able to figure out. Stereochemistry I will look into and see where to address it. Is this the best place for documenting these issues or asking for comments/suggestions? --Zcp3000
@Zcp3000: This item now has 460 InChIKeys: (3S)-3-Methyl-7,9-dimethoxy-3,4-dihydro-1H-naphtho[2,3-c]pyran-10-ol (Q77138777). Yes, this is the chemistry talk page and there are always people active. --SCIdude (talk) 17:25, 7 December 2019 (UTC)
  • We have e.g. Antifungal macrolide (Q75069044). What is the source of the en:label? Why is it not e.g. the systematic name from PubChem? Why are the label and description capitalised? Why does it have instance of (P31) chemical entity (Q43460564)? And why have only PubChem CID and InChIKey been added? How is such an entry helpful to anyone in any way? Wostr (talk) 00:35, 9 December 2019 (UTC) PS Wikidata and Wikimedia projects are really not a sandbox, at least not anymore (or I just naively hope that's true). Wostr (talk) 00:36, 9 December 2019 (UTC)
  • @Wostr: This is part of the effort to include data from the Natural Product Atlas. The utility of having these entries lies, ultimately, in linking these compounds to their producers. I was going to create stub entries and continually add features/claims to them. Now I realize I may have gone about this the wrong way, so on the recommendation of @SCIdude I have suggested the creation of a Natural Product Atlas property, which would allow us to link pre-existing WD entities to NPAtlas. Re: capitalization and label source: this was taken from the NP Atlas download dataset; we can use the PubChem version instead. Re: PubChem CID and InChIKey: my approach was to create a minimal linkable entity which could then have fields filled in by e.g. a bot. Re: utility: it ultimately lies in the relationship of these compounds to their producing organisms. None of that data is currently in Wikidata, but you need to start somewhere. Zcp3000
    • Okay, now we should wait for the creation of this new external ID. Then the new property should be populated and we can see what is left and how it should be added to WD. Matching of chemical compounds should never rely on chemical names (even systematic names can be generated in many different ways); the best option is to match using more than one identifier (or at least Standard InChI/InChIKey). Wostr (talk) 20:58, 10 December 2019 (UTC)
    • @Wostr: great. I agree about the order of operations. As an aside, I would be interested in learning about your Wikidata workflow. Are you editing by hand? Using scripts? Relying on bots? I am still learning and am sure there are better and worse ways to go about editing. It seems you made quick work of the 10,10a-dihydroxy-7-methoxy-2,2-dimethyl-5-(2-methyl-1-propen-1-yl)-1,10,10a,14,14a,15b-hexahydro-12H-3,4-dioxa-5a,11a,15a-triazacycloocta[1,2,3-lm]indeno[5,6-b]fluorene-11,15(2H,13H)-dione (Q11954479) entity, including quite a few properties. I wonder if you use a template etc.
      • No, I edit mostly manually, with the help of some scripts (there is dataDrainer for cleaning up incorrect labels/descriptions/aliases from an item; moveClaim for moving statements between items). I used QuickStatements a few times, but that requires preparing a lot of data beforehand and I usually don't have that much time. However, most operations in WD are done using bots, or at least QuickStatements or some other semi-automatic tools. Wostr (talk) 13:55, 11 December 2019 (UTC)
      • If you ask me, it depends. Half QuickStatements, half manual work (but using many of the tools available). If the number of affected items is more than a few hundred, only QS is practical, and if the task is complicated, many QS steps may be needed. --SCIdude (talk) 16:45, 11 December 2019 (UTC)
      • Thank you both. After poking around a bit to try to find the plugins you mention, I found them on your respective common.js pages, and I'll try some of these out and see how they work.
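For anyone new to the QuickStatements route mentioned above, here is a rough sketch of how a batch might be prepared. It assumes the tab-separated V1 command syntax, with InChIKey as P235 and PubChem CID as P662, passed as quoted strings. The function name identifier_commands is made up, and the sample row (the Wikidata sandbox item with ethanol's identifiers) is placeholder data only, not something to submit.

```python
import re

# Standard InChIKey shape: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def identifier_commands(rows, inchikey_prop="P235", cid_prop="P662"):
    """Yield QuickStatements V1 lines (tab-separated) for (qid, inchikey, cid) rows."""
    for qid, inchikey, cid in rows:
        if not INCHIKEY_RE.match(inchikey):
            raise ValueError(f"malformed InChIKey for {qid}: {inchikey!r}")
        # External identifiers are passed as quoted strings in the V1 syntax.
        yield "\t".join((qid, inchikey_prop, f'"{inchikey}"'))
        yield "\t".join((qid, cid_prop, f'"{cid}"'))


if __name__ == "__main__":
    # Placeholder row: sandbox item plus ethanol's identifiers, for illustration.
    rows = [("Q4115189", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", 702)]
    print("\n".join(identifier_commands(rows)))
```

The output can be pasted into the QuickStatements batch window. Rejecting malformed keys before anything is submitted is a cheap guard, since a bad batch of a few hundred items is tedious to undo.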

@Zcp3000: You added some InChIKeys to existing items that already have an InChIKey. Without a reference I can't check which is correct, so I will revert your edits. See Wikidata:Database_reports/Constraint_violations/P235&oldid=1075319026. Snipre (talk) 04:26, 19 December 2019 (UTC)

    • @Snipre: thanks for your maintenance of Wikidata chemistry, and my apologies again for any incorrect WD statements I've created. As per the discussion above and on the NPAtlas property discussion page, I now know there's a more rigorous way to go about adding these properties (requiring an InChIKey + PubChem CID match on an entity; ignoring the label & description). The script that created these properties is here: In this script, an InChIKey may have been added to an entity if the label matches but the InChIKey doesn't; this can be due to more than one chemical entity existing for the same named compound.
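The more rigorous rule described above (match on InChIKey and PubChem CID together, never on the label) could look roughly like the sketch below. This is an illustration under assumptions, not the script referred to above: the Item type, the classify function and the three outcome names are all hypothetical, and the identifiers in any usage are placeholders.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Item(NamedTuple):
    """Minimal view of an existing item (hypothetical structure)."""
    qid: str
    inchikey: Optional[str]
    cid: Optional[str]
    label: str  # carried along but deliberately never used for matching


def classify(inchikey: str, cid: str, items: Iterable[Item]) -> Tuple[str, List[str]]:
    """Return ('match', [qid]), ('review', [qids]) or ('new', [])."""
    partial = []
    for item in items:
        same_key = item.inchikey == inchikey
        same_cid = item.cid == cid
        if same_key and same_cid:
            return ("match", [item.qid])   # both identifiers agree: safe to link
        if same_key or same_cid:
            partial.append(item.qid)       # only one agrees: needs a human
    if partial:
        return ("review", partial)
    return ("new", [])                     # nothing shares either identifier
```

Only a "match" result would be edited automatically. A "review" result is the case that produced the constraint violations: two items sharing a name or one identifier while being different chemical entities.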

racemate / pair of enantiomers

The concepts of (1) racemic mixture and (2) pair (group) of enantiomers are clearly different and so would need different items, right? But this will create duplicate InChIs / InChIKeys as well. Have people found a solution? --SCIdude (talk) 15:12, 10 December 2019 (UTC)

A racemate is a mixture of both enantiomers, so the InChI wouldn't be the same. InChI for mixtures is under development ([8]). There are in fact at least 4 different StdInChIs: for the racemate, for each of the enantiomers, and for the compound with undefined stereochemistry. Applying the InChI of the 'compound with undefined stereochemistry' to the racemate is not correct. Wostr (talk) 17:23, 10 December 2019 (UTC) Example with InChIs (and SMILES, btw) that is possible with the current state of the InChI software: [9]. Wostr (talk) 20:54, 10 December 2019 (UTC)
So there is no problem. Thanks. --SCIdude (talk) 06:26, 11 December 2019 (UTC)

Manuscript: Wikidata as a FAIR knowledge graph for the life sciences

Jasper Deng
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
Devon Fyson
Samuel Clark
Tris T7
  Notified participants of WikiProject Chemistry

Dear all: You may have seen that we recently published a preprint entitled "Wikidata as a FAIR knowledge graph for the life sciences". This manuscript was primarily spearheaded by the Gene Wiki team, which has been active in data modeling and data ingestion for a variety of biomedical resources.

Our goal was to write a manuscript that educated the general biological community about Wikidata and to drive more growth and participation. To do this, we selected and described a series of scientific vignettes -- identifier translation, integrative biomedical SPARQL queries, crowdsourced curation, Wikidata-backed application development, and phenotype-based disease diagnosis. Those vignettes were based on our own areas of interest as well as our guess at what would appeal to our target audience.

Of course, there are many possible vignettes that could fit under the broad title we chose. As a matter of practicality, we could not include them all while still creating a final product of reasonable length and focus.

However, upon further reflection and discussion with colleagues, we realized that while the selection of vignettes needed to be somewhat limited, the manuscript should reflect a more complete and inclusive representation of the people behind the larger movement, including those that worked on aspects that weren't directly highlighted as vignettes. Therefore, we'd like to invite anyone to add their name to the author list or acknowledgements by adding their name to Wikidata:WikiProject Molecular biology/FAIR_knowledge_graph. Note that due to journal policies, all authors must still meet the ICMJE standards, but interpreted according to the broadly-defined title of the manuscript. (That broader scope might also be summarized by the class-level diagram shown at right, which is included as Figure 1 in the manuscript.)

Finally, this message is being cross-posted to many places. We will monitor replies at Wikidata_talk:WikiProject_Molecular_biology, or please {{Ping}} me to notify me of replies or discussion elsewhere. Best, Andrew Su (talk) 22:53, 18 December 2019 (UTC)

  • Hi all (notified participants of WikiProject Chemistry): did you reply to Andrew yet?

Dagstuhl 2020

Hi all, next week I'm attending a Dagstuhl meeting around metabolomics. Several people from chemistry databases will be there, like Evan E. Bolton (Q28194918) (PubChem (Q278487)) and David S Wishart (Q27887604) (Human Metabolome Database (Q5937262)). Do we have open questions we want to ask them? Please add them here as comments.
Saehrimnir
Jasper Deng
Egon Willighagen
Denise Slenter
Daniel Mietchen
Andy Mabbett
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
Devon Fyson
Samuel Clark
Tris T7
  Notified participants of WikiProject Chemistry --Egon Willighagen (talk) 13:48, 25 January 2020 (UTC)

GZWDer added all (most?) of the US EPA CompTox dashboard

Hi all, GZWDer (talk · contribs · logs) copied in more or less the full CompTox Chemistry Dashboard (Q26998510), which brings in some 800 thousand new DSSTox substance ID (P3117)s. Along the way, it also brings the number of CAS registry numbers to >800 thousand. Let's see how that goes with Chemical Abstracts. Currently, molecular formula, mass, SMILES and other info are missing, but I can write a script tomorrow to generate QuickStatements to add the missing info (using PubChem to convert the InChIKey to SMILES). Please don't do this manually. --Egon Willighagen (talk) 08:53, 30 January 2020 (UTC)

This is insane... [10]: +533 685 bytes.... Wostr (talk) 23:14, 30 January 2020 (UTC)
@GZWDer: how do you propose to resolve this? --SCIdude (talk) 09:43, 31 January 2020 (UTC)

New property proposals

I have proposed some new identifier properties. Comments are welcome.--GZWDer (talk) 04:41, 28 March 2020 (UTC)
