Wikidata:WikidataCon 2019/Submissions/Cheminformatics to improve Wikidata on chemical compounds

Title

Cheminformatics to improve Wikidata on chemical compounds

Abstract

Chemistry has long been an important domain-specific corner of the Wiki community, with active wiki projects on Wikipedia and Wikidata. The two are not tightly linked, though information from Wikidata increasingly shows up on Wikipedia. Moreover, we have been using Wikidata as an interoperability resource in our research into human metabolism and metabolic diseases. This requires the information about chemicals and metabolites in Wikidata to be accurate. We have been using cheminformatics to support our manual work of adding missing information and compounds and curating existing knowledge. This presentation shows how the Chemistry Development Kit, Bioclipse, and QuickStatements have been used over the past two years for these purposes (https://chem-bla-ics.blogspot.com/search?q=wikidata). We will demonstrate this infrastructure of Open Source tools and how the SMILES and InChI information can be used to: link out to external databases (e.g. the EPA CompTox Dashboard, MassBank, LIPID MAPS, etc.); add physicochemical properties; add missing InChIs and chemical formulas using the SMILES; add new compounds based on a SMILES; and detect incorrect or inconsistent information in Wikidata items on chemical compounds.
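As a rough illustration of the QuickStatements step, the sketch below turns identifiers for one compound into tab-separated QuickStatements (version 1) commands. It is written in Python rather than as a Bioclipse script and is not the submitter's own code. It assumes the InChI, InChIKey, and formula have already been computed from the SMILES by a cheminformatics toolkit such as the Chemistry Development Kit, which is not shown. The function names and the dictionary keys are hypothetical; the property and item identifiers (P31, Q11173, P233, P234, P235, P274) are the Wikidata ones for chemical compounds.

```python
# Wikidata identifiers used for chemical compounds.
P_INSTANCE_OF = "P31"
Q_CHEMICAL_COMPOUND = "Q11173"
# Hypothetical input keys mapped to the Wikidata property they fill.
PROPERTIES = (
    ("smiles", "P233"),    # canonical SMILES
    ("inchi", "P234"),     # InChI
    ("inchikey", "P235"),  # InChIKey
    ("formula", "P274"),   # chemical formula
)


def quote(value):
    """Wrap a string value in double quotes for QuickStatements."""
    if '"' in value or "\t" in value:
        # Assumption: refuse such values rather than guess an escaping rule.
        raise ValueError("value contains a quote or tab: %r" % value)
    return '"' + value + '"'


def quickstatements(compound, qid=None):
    """Build QuickStatements commands for one compound.

    With qid=None a new item is created; otherwise the statements
    are added to the existing item with that QID.
    """
    subject = qid if qid else "LAST"
    lines = []
    if qid is None:
        lines.append("CREATE")
        lines.append("\t".join([subject, "Len", quote(compound["label"])]))
        lines.append("\t".join([subject, P_INSTANCE_OF, Q_CHEMICAL_COMPOUND]))
    for key, prop in PROPERTIES:
        if key in compound:
            lines.append("\t".join([subject, prop, quote(compound[key])]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Ethanol is used purely as an illustration; it already exists in Wikidata.
    print(quickstatements({"label": "ethanol", "smiles": "CCO"}))
```

Passing an existing QID instead of creating a new item covers the case of adding a missing identifier to a compound that is already in Wikidata. The output is meant to be reviewed before it is pasted into QuickStatements.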

Participants will learn

  • How to use Bioclipse to ...
    • create QuickStatements to add missing compounds to Wikidata
    • validate data in Wikidata
  • How SPARQL can be used to aggregate information from Wikidata, and what the limitations of that approach are
  • How Scholia can support curating compound class information
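To make the SPARQL point concrete, here is a minimal Python sketch of the kind of query that could support curation: it lists items that have a canonical SMILES (P233) but no InChI (P234). The query itself, the helper names, and the User-Agent string are assumptions for illustration, not material from the session. The endpoint is the public Wikidata Query Service, which applies a query timeout, so the sketch uses a LIMIT; that timeout is one practical limitation of aggregating over all compounds.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def missing_inchi_query(limit=100):
    """SPARQL for items with a canonical SMILES but no InChI statement."""
    return (
        "SELECT ?compound ?smiles WHERE {\n"
        "  ?compound wdt:P233 ?smiles .\n"
        "  FILTER NOT EXISTS { ?compound wdt:P234 ?inchi }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(query):
    """Build the GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WDQS_ENDPOINT + "?" + params


def parse_bindings(results):
    """Turn SPARQL JSON results into (QID, SMILES) pairs."""
    rows = []
    for binding in results["results"]["bindings"]:
        uri = binding["compound"]["value"]
        qid = uri.rsplit("/", 1)[-1]  # last path segment of the entity URI
        rows.append((qid, binding["smiles"]["value"]))
    return rows


def fetch(query):
    """Run the query against the live endpoint (needs network access)."""
    request = urllib.request.Request(
        request_url(query),
        # Hypothetical User-Agent; replace with one identifying your tool.
        headers={"User-Agent": "wikidata-chem-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    for qid, smiles in parse_bindings(fetch(missing_inchi_query(10))):
        print(qid, smiles)
```

The resulting (QID, SMILES) pairs are the natural input for a toolkit that computes the missing InChI, after which the statements can be prepared for QuickStatements.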

Submitter