Wikidata:Requests for permissions/Bot/SoCalChemBot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 16:32, 11 October 2016 (UTC)
SoCalChemBot
Operator: Sebotic
Task/s: Imports and maintains chemical compound items from primary resources, and adds their interactions with protein targets, environmental targets, etc.
Code: https://bitbucket.org/sulab/wikidatabots/
Function details:
- Imports chemical compound information from a set of open resources, e.g. PubChem, ChEMBL, ChemSpider
- Will add as many external identifiers as possible. acetic acid (Q47512) is a good example of what is currently possible.
- Will center the data imports around InChI keys, a machine-readable, unique identifier for chemical compounds (this is also in accordance with the Wikidata Project Chemistry)
- Will clean up and/or report problematic items with inconsistent InChI keys. In this context, inconsistent means that an item holds external identifiers that belong to a different InChI key
- Will perform consistency checks for chemical structure data, including InChI, canonical SMILES and isomeric SMILES.
- Will add chemical classifications according to the ChEBI ontology
- Will add chemical properties, e.g. exact mass and density (currently, there are ~65 Wikidata chemical properties and external identifiers that the bot will operate on)
- Will add selected, relevant bioassay information from PubChem
- Will add drug to drug-target and chemical compound to in-vivo target information to the chemical compound and relevant targets (e.g. proteins of different species/taxons)
--Sebotic (talk) 18:38, 3 October 2016 (UTC)
Update: I performed the required test edits. Sebotic (talk) 04:31, 4 October 2016 (UTC)
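The InChI key consistency check described in the function details can be illustrated with a minimal Python sketch. It is not taken from the bot's code: the function names, the shape of the input (a mapping from source name to the InChI key that source reports), and the sample data are all assumptions made for illustration.

```python
import re

# Shape of a standard InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_valid_inchikey(key):
    """Return True if the string has the shape of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(key) is not None


def find_inconsistent_sources(item_inchikey, source_inchikeys):
    """Return, sorted, the sources whose InChIKey differs from the item's.

    source_inchikeys is a hypothetical mapping from a source name to the
    InChIKey that source reports for the identifier stored on the item.
    """
    if not is_valid_inchikey(item_inchikey):
        raise ValueError(f"malformed InChIKey: {item_inchikey!r}")
    return sorted(
        source
        for source, key in source_inchikeys.items()
        if key != item_inchikey
    )


# Illustrative data only: the third entry deliberately carries a different key.
item_key = "QTBSBXVTEAMEQO-UHFFFAOYSA-N"
reported = {
    "PubChem": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
    "ChEMBL": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
    "ChemSpider": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}
print(find_inconsistent_sources(item_key, reported))  # ['ChemSpider']
```

A real check would first have to resolve each external identifier against its database; here that lookup is assumed to have happened already.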
- Support I support this request. Julialturner (talk) 04:44, 4 October 2016 (UTC)
- Support Gstupp (talk) 05:13, 4 October 2016 (UTC)
- Support. I see a lot of potential for valuable contributions from this bot. Connecting with evidence from scholarly literature will help to improve data quality. Linking out to external identifiers will improve the visibility of Wikidata and make the knowledge base that much more useful. YULdigitalpreservation (talk) 12:23, 4 October 2016 (UTC)
- Support I fully support this bot. It is well thought out and will maintain the structure and integrity of the chemical compound space in WD. Putmantime (talk) 17:02, 4 October 2016 (UTC)
- Support I support this bot. It will aggregate relevant information from well-established and widely used resources and will perform important consistency checks to support data quality and data integration for the chemical space in WD. In my opinion, this bot will help position WD as a useful and reliable hub for the chemical space for the community. NuriaQueralt (talk)
- Support I welcome this bot proposal. The chemical space on Wikidata would certainly benefit from this bot. --Andrawaag (talk) 19:51, 4 October 2016 (UTC)
- @Sebotic: What exactly does "Will add chemical classifications according to the ChEBI ontology" mean? A substructure classification of chemicals with P31? --Kopiersperre (talk) 08:39, 5 October 2016 (UTC)
- @Kopiersperre: The plan is to use the ChEBI ontology to get that done, at least for all chemical compounds with a ChEBI ID, but we can discuss if/how to do it for other compounds not in ChEBI. I have done that successfully for the Gene Ontology, so it is also doable with ChEBI. But I am open to ideas. Sebotic (talk) 08:54, 5 October 2016 (UTC)
- @Sebotic: I've tried to formulate an example, please correct me if I'm wrong. So for codeine (Q174723) the bot will import:
- codeine (CHEBI:16714) is a morphinane alkaloid (CHEBI:25418)
- codeine (CHEBI:16714) is an organic heteropentacyclic compound (CHEBI:38164)
I hoped that CHEBI could replace the substructure classification currently done by the category system, but I think it's not enough. In German Wikipedia Codein is in
- Category:Opioids (Q8887197) (incl. in morphinane alkaloid)
- Category:Alkaloids (Q8235519) (incl. in morphinane alkaloid)
- Category:Cycloalkenols (Q8923182)
- Category:Cyclohexenes (Q8923217)
- Category:Phenol ethers (Q8965786)
- Q8923013
- Category:Piperidines (Q7334131)
--Kopiersperre (talk) 09:16, 5 October 2016 (UTC)
@Kopiersperre: As far as I can see, these Wikipedia categorizations do not live on Wikidata right now, but we could add them. In principle, they could also live together with 'subclass of' categorizations, but I think we should try to have only one classification system. From what I can see of the categories for codeine, some of them are somewhat redundant: e.g. all opioids are also piperidines (I guess) and also phenol ethers, so a classification as opioid should be sufficient, provided the opioid class itself carries the correct classifications, which can then be inferred for codeine. This works pretty well using SPARQL with its star (*) operator. Therefore, I think that ChEBI can still be used, and other classifications could be added in addition. We just need to make it clear in the references how they were made. The more challenging question is how to create those systematically and integrate them with ChEBI; that would be a hard task. I suggest that I start out with ChEBI, and we could also explore how the Wikipedia categories could be streamlined to fit in (in the worst case, they just stay on as category properties). Sebotic (talk) 17:28, 5 October 2016 (UTC)
- It is also important to add that only a hierarchical, ontology-like classification is feasible anyway: putting all chemical compound classes/categories onto a given compound is already difficult for a relatively small molecule such as codeine, and the larger a molecule gets, the harder this becomes. So simple substructure classification should happen high up in the tree, with the families further down based on the categorizations higher up. Sebotic (talk) 17:44, 5 October 2016 (UTC)
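The inference argument in this thread (classify a compound once, low in the hierarchy, and derive the broader classes by following 'subclass of' links) can be sketched in plain Python. This is only an illustration of what a transitive path such as `wdt:P279*` does in a SPARQL query, not the bot's implementation; the function name and the toy hierarchy are assumptions, and the parent links are simplified from the codeine example above.

```python
def inferred_classes(entity, parents):
    """Return every class reachable from entity by following parent links.

    parents maps a name to its direct parent classes; names without an
    entry are treated as roots.
    """
    seen = set()
    stack = list(parents.get(entity, ()))
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue  # already visited; also guards against cycles
        seen.add(cls)
        stack.extend(parents.get(cls, ()))
    return seen


# Toy hierarchy (hypothetical, simplified): only direct links are stored.
parents = {
    "codeine": ["morphinane alkaloid", "organic heteropentacyclic compound"],
    "morphinane alkaloid": ["alkaloid"],
}
print(sorted(inferred_classes("codeine", parents)))
# ['alkaloid', 'morphinane alkaloid', 'organic heteropentacyclic compound']
```

Only the direct statement "codeine is a morphinane alkaloid" is stored; "alkaloid" is derived, which is why redundant higher-level statements on the compound itself are not needed.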