Wikidata:WikiProject Chemistry/Natural products

Scope and related projects

Scope

This page assists in the curation, retrieval and dissemination of information about natural products (Q901227), a subclass of chemical substance (Q79529) found in a taxon (Q16521), itself a subclass of living organism class (Q21871294), as well as related information, especially bibliographic references (Q10358455).

Related WikiProjects

History

The project was initiated by Adriano Rutz and User:GrndStt from the University of Geneva (Q503473) and later joined by Jonathan Bisson. The initial objective was to build an open database compiling natural products, their chemical structures, their producing organisms and an associated bibliographic reference documenting each such link. To this end, we compiled taxonomic, chemical and bibliographic data from existing resources and standardized them. Wikidata, with its Wikidata:WikiProject Chemistry, Wikidata:WikiProject Taxonomy and Wikidata:WikiProject Source MetaData, fits the purpose of this database: being available to all and linked to other resources.

Participants

Humans

The participants listed below can be notified using the following template in discussions:

{{Ping project|Chemistry}}

Bots

The bot (written in Kotlin) takes our file, processes it and adds the entries to the Test Wikidata instance. Some example entries:
[1]: example of a compound (linked to a species and with a reference)
[2]: example of a species

As there is no SPARQL endpoint for this test instance of Wikidata, we cannot easily check whether an entity already exists; the bot nevertheless supports SPARQL queries to resolve entities and avoid creating duplicates.
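On an instance that does have an endpoint, such a duplicate check could look like the minimal sketch below. It assumes entities are matched on an external identifier, for example InChIKey (P235) for compounds, NCBI taxonomy ID (P685) for taxa or DOI (P356) for references. The function names are hypothetical; this is not the bot's Kotlin code.

```python
import re
from typing import List


def sparql_string(value: str) -> str:
    """Quote `value` as a SPARQL double-quoted string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_lookup_query(prop: str, value: str) -> str:
    """Query for items that already carry `value` for property `prop`."""
    if not re.fullmatch(r"P\d+", prop):
        raise ValueError(f"not a property ID: {prop!r}")
    literal = sparql_string(value)
    # The wdt: prefix is predefined on the Wikidata Query Service;
    # other endpoints may need an explicit PREFIX declaration.
    return f"SELECT ?item WHERE {{\n  ?item wdt:{prop} {literal} .\n}}"


def choose_action(matches: List[str]) -> str:
    """Decide what to do with the item IDs returned by the lookup query."""
    if not matches:
        return "create"
    if len(matches) == 1:
        return "reuse " + matches[0]
    return "review duplicates"
```

For instance, `build_lookup_query("P685", "136217")` asks for the taxon with NCBI taxonomy ID 136217; an empty result means a new item is needed, and more than one result points to existing duplicates.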

It works reasonably well and is fairly fast despite the API rate limits.
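The page does not say how the bot paces its edits. One common approach, sketched here purely as an assumption, is to enforce a minimum interval between API write calls. The `Throttle` class is hypothetical, and the clock and sleep functions are injectable so the pacing logic can be checked without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay used."""
        now = self.clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self.sleep(delay)
        self._last = now + delay
        return delay
```

A bot loop would call `throttle.wait()` before each edit request; the interval itself would have to follow the target wiki's own limits.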

Structure of the initial data

Table (a single example row, shown one column per line):
organismLowestTaxon: Curcuma longa
organismDbTaxo: NCBI
organismTaxonId: 136217
inchikeySanitized: VFLDPWHFBUODDF-FCXRPNKRSA-N
inchiSanitized: InChI=1S/C21H20O6/c1-26-20-11-14(5-9-18(20)24)3-7-16(22)13-17(23)8-4-15-6-10-19(25)21(12-15)27-2/h3-12,24-25H,13H2,1-2H3/b7-3+,8-4+
smilesSanitized: COc1cc(/C=C/C(=O)CC(=O)/C=C/c2ccc(O)c(OC)c2)ccc1O
referenceCleanedTitle: Characterization of powdered turmeric by liquid chromatography-mass spectrometry and gas chromatography-mass spectrometry
referenceCleanedDoi: 10.1016/0021-9673(96)00103-3
referenceCleanedPmid: NA
referenceCleanedPmcid: NA
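A minimal sketch of reading rows in this column layout and grouping each row's fields into the three kinds of entries the project deals with: a compound, a taxon and a reference. It assumes the file is tab-separated with a header row and that NA marks a missing value, as in the example row; the function and key names are illustrative only.

```python
import csv
import io
from typing import Dict, Iterator, Optional

COLUMNS = (
    "organismLowestTaxon", "organismDbTaxo", "organismTaxonId",
    "inchikeySanitized", "inchiSanitized", "smilesSanitized",
    "referenceCleanedTitle", "referenceCleanedDoi",
    "referenceCleanedPmid", "referenceCleanedPmcid",
)

Record = Dict[str, Dict[str, Optional[str]]]


def clean(value: Optional[str]) -> Optional[str]:
    """Map missing cells, empty cells and the NA placeholder to None."""
    if value is None:
        return None
    value = value.strip()
    return None if value in ("", "NA") else value


def split_rows(text: str) -> Iterator[Record]:
    """Yield one {compound, taxon, reference} record per data row."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        r = {key: clean(row.get(key)) for key in COLUMNS}
        yield {
            "compound": {
                "inchikey": r["inchikeySanitized"],
                "inchi": r["inchiSanitized"],
                "smiles": r["smilesSanitized"],
            },
            "taxon": {
                "name": r["organismLowestTaxon"],
                "database": r["organismDbTaxo"],
                "id": r["organismTaxonId"],
            },
            "reference": {
                "title": r["referenceCleanedTitle"],
                "doi": r["referenceCleanedDoi"],
                "pmid": r["referenceCleanedPmid"],
                "pmcid": r["referenceCleanedPmcid"],
            },
        }
```

Each of the three sub-records can then be resolved or created independently, so a compound reported in many organisms only needs one item.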


Most of these columns are now redundant: we only need to extract three aspects from them (the compound, the organism and the reference) and create their own entries if they do not exist yet. This is what we are currently working on, so that we can get:

The necessary property for linking to taxon (Q16521) items already exists; we use it this way:

We currently have around 500,000 entries that are likely clean enough to be added.


User:SCIdude created a test entry as a demonstration: [3]. It shows that the property also requires the reverse statement to be made.
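The page does not name the property concerned. Assuming, for illustration only, that it is natural product of taxon (P1582), whose inverse is this taxon is source of (P1672), every link between a compound and a taxon turns into two statements, one on each item. A minimal sketch with hypothetical names:

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str  # item ID carrying the statement
    prop: str     # property ID
    value: str    # item ID the statement points to


# Assumed property pair; check the property's inverse constraint before use.
FORWARD = "P1582"  # natural product of taxon (compound -> taxon)
INVERSE = "P1672"  # this taxon is source of (taxon -> compound)


def with_inverse(compound: str, taxon: str) -> List[Statement]:
    """Return the forward statement together with its reverse statement."""
    return [
        Statement(compound, FORWARD, taxon),
        Statement(taxon, INVERSE, compound),
    ]
```

Generating both statements from one source row keeps the pair consistent, so a later correction only has to touch a single input line.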

Query examples

What is already there?

This query shows about 50 natural product of taxon (P1582) statements and more than 1,200 found in taxon (P703) statements, the latter mostly on human metabolites (we exclude crude drugs, oils, etc. by restricting the item classes). More than 100 statements of either kind have no reference.

SELECT ?item ?itemLabel ?taxonLabel ?artLabel WHERE {
  VALUES ?classes {
    wd:Q11173 # chemical compound
    wd:Q59199015 # group of stereoisomers
    wd:Q79529 # chemical substance
    wd:Q17339814 # group of chemical substances
    wd:Q47154513 # structural class of chemical compounds
  }
  ?item wdt:P31 ?classes. # instance of
  {
    ?item p:P1582 ?stmt. # natural product of taxon
    ?stmt ps:P1582 ?taxon. # natural product of taxon
    OPTIONAL {
      ?stmt prov:wasDerivedFrom ?ref. 
      ?ref pr:P248 ?art. # stated in
    }
  }
  UNION
  {
    ?item p:P703 ?stmt. # found in taxon
    ?stmt ps:P703 ?taxon. # found in taxon
    OPTIONAL {
      ?stmt prov:wasDerivedFrom ?ref.
      ?ref pr:P248 ?art. # stated in
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

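Results can be downloaded from the query service in the SPARQL 1.1 JSON results format. The sketch below assumes that format and the variable names of the query above, and counts compound-taxon pairs for which no row carries a reference (`artLabel` unbound). Because the query returns neither statement IDs nor taxon IDs, pairs keyed on the taxon label only approximate statements. The function name is hypothetical.

```python
from typing import Set, Tuple


def pairs_without_reference(results: dict) -> Set[Tuple[str, str]]:
    """Return (item URI, taxon label) pairs that never have an artLabel."""
    seen: Set[Tuple[str, str]] = set()
    referenced: Set[Tuple[str, str]] = set()
    for row in results["results"]["bindings"]:
        taxon = row.get("taxonLabel", {}).get("value", "")
        key = (row["item"]["value"], taxon)
        seen.add(key)
        if "artLabel" in row:
            referenced.add(key)
    return seen - referenced
```

`len(pairs_without_reference(data))` then gives a rough count of unreferenced links.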

What is already there? (detailed version)

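A detailed version of the previous query, restricted to chemical compounds with found in taxon (P703) statements. It is a first step toward moving data from Wikidata (WD) to Lotus.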

SELECT ?item ?itemLabel ?inchi ?inchikey ?isomeric_smiles ?cas ?chebi ?chembl ?pubchem ?taxon ?taxonLabel ?gbif_id ?ncbi_id ?genusLabel ?familyLabel ?orderLabel ?classLabel ?divisionLabel ?phylumLabel WHERE {
  VALUES ?classes {
    wd:Q11173 # chemical compound
  }
  ?item wdt:P31 ?classes ;  # instance of
        wdt:P234 ?inchi ;   # InChI
        wdt:P235 ?inchikey . # InChIKey
  OPTIONAL { ?item wdt:P231 ?cas . }     # CAS Registry Number
  OPTIONAL { ?item wdt:P683 ?chebi . }   # ChEBI ID
  OPTIONAL { ?item wdt:P592 ?chembl . }  # ChEMBL ID
  OPTIONAL { ?item wdt:P662 ?pubchem . } # PubChem CID
  ?item p:P2017 ?stmt_smiles .           # isomeric SMILES
  ?stmt_smiles ps:P2017 ?isomeric_smiles .
  ?item p:P703 ?stmt .  # found in taxon
  ?stmt ps:P703 ?taxon .
  OPTIONAL {
    ?stmt prov:wasDerivedFrom ?ref .
    ?ref pr:P248 ?art . # stated in
  }
  # Ancestor of the taxon at each rank (P171 = parent taxon, P105 = taxon rank)
  OPTIONAL { ?taxon wdt:P171* ?genus . ?genus wdt:P105 wd:Q34740 . }          # genus
  OPTIONAL { ?taxon wdt:P171* ?family . ?family wdt:P105 wd:Q35409 . }        # family
  OPTIONAL { ?taxon wdt:P171* ?order . ?order wdt:P105 wd:Q36602 . }          # order
  OPTIONAL { ?taxon wdt:P171* ?class . ?class wdt:P105 wd:Q37517 . }          # class
  OPTIONAL { ?taxon wdt:P171* ?division . ?division wdt:P105 wd:Q334460 . }   # division
  OPTIONAL { ?taxon wdt:P171* ?phylum . ?phylum wdt:P105 wd:Q38348 . }        # phylum
  OPTIONAL { ?taxon wdt:P846 ?gbif_id . } # GBIF taxon ID
  OPTIONAL { ?taxon wdt:P685 ?ncbi_id . } # NCBI taxonomy ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  #SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

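The `wdt:P171*` paths in this query walk up the parent taxon (P171) chain, starting with the taxon itself, and keep the ancestor whose taxon rank (P105) matches each level. The same lookup can be sketched offline over an in-memory parent map, for example to check exported results. The function name and the small lineage in the usage note are illustrative, not Wikidata data; Wikidata taxa can have several parents, whereas this sketch follows one parent per taxon.

```python
from typing import Dict, Optional, Set


def ancestor_with_rank(
    taxon: str,
    parent: Dict[str, str],
    rank: Dict[str, str],
    wanted: str,
) -> Optional[str]:
    """Follow parent links from `taxon` (inclusive) to the `wanted` rank."""
    seen: Set[str] = set()
    node: Optional[str] = taxon
    while node is not None and node not in seen:  # guard against cycles
        if rank.get(node) == wanted:
            return node
        seen.add(node)
        node = parent.get(node)
    return None
```

With `parent = {"Curcuma longa": "Curcuma", "Curcuma": "Zingiberaceae", "Zingiberaceae": "Zingiberales"}` and matching ranks, asking for the family of Curcuma longa returns Zingiberaceae, while asking for a phylum returns `None`, just as the corresponding OPTIONAL block would leave `?phylum` unbound.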

Which organisms contain glucobrassicin?

This query retrieves the organisms indicated as sources of glucobrassicin (Q906037), looking the compound up by its InChIKey (DNDNWOWHUWNBCK-JZYAIQKZSA-N).

SELECT ?item ?itemLabel ?taxonLabel ?artLabel WHERE {
  VALUES ?classes {
    wd:Q11173 # chemical compound
    wd:Q59199015 # group of stereoisomers
    wd:Q79529 # chemical substance
    wd:Q17339814 # group of chemical substances
    wd:Q47154513 # structural class of chemical compounds
  }
  ?item wdt:P31 ?classes. # instance of
  ?item wdt:P235 'DNDNWOWHUWNBCK-JZYAIQKZSA-N'. # InChIKey
  {
    ?item p:P1582 ?stmt. # natural product of taxon
    ?stmt ps:P1582 ?taxon. # natural product of taxon
    OPTIONAL {
      ?stmt prov:wasDerivedFrom ?ref. 
      ?ref pr:P248 ?art. # stated in
    }
  }
  UNION
  {
    ?item p:P703 ?stmt. # found in taxon
    ?stmt ps:P703 ?taxon. # found in taxon
    OPTIONAL {
      ?stmt prov:wasDerivedFrom ?ref.
      ?ref pr:P248 ?art. # stated in
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

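To run the same lookup for another compound, the InChIKey can be validated before it is inserted into a query. This sketch assumes the standard InChIKey layout (14 letters, a hyphen, 10 letters, a hyphen, 1 letter); the function name is hypothetical, and the generated query is a simplified variant of the one above that drops the class filter and the references.

```python
import re

INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def organisms_query(inchikey: str) -> str:
    """Build a query for the taxa linked to the compound with this InChIKey."""
    key = inchikey.strip().upper()
    if not INCHIKEY.fullmatch(key):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    # The key is now known to contain only letters and hyphens,
    # so it can be placed inside the string literal without escaping.
    return f"""SELECT ?item ?itemLabel ?taxonLabel WHERE {{
  ?item wdt:P235 "{key}" .
  {{ ?item wdt:P1582 ?taxon . }} UNION {{ ?item wdt:P703 ?taxon . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}"""
```

Rejecting malformed keys up front avoids silently empty results caused by a typo or a stray space in the identifier.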

Which chemical compounds are found in Arabidopsis thaliana (Q158695)?

This query returns all chemical compounds found in the taxon whose taxon name (P225) is Arabidopsis thaliana.

SELECT ?item ?itemLabel ?taxonLabel ?artLabel WHERE {
  VALUES ?classes {
    wd:Q11173 # chemical compound
    wd:Q59199015 # group of stereoisomers
    wd:Q79529 # chemical substance
    wd:Q17339814 # group of chemical substances
    wd:Q47154513 # structural class of chemical compounds
  }
  ?item wdt:P31 ?classes. # instance of
  ?taxon wdt:P225 'Arabidopsis thaliana'. # taxon name
  {
    ?item p:P1582 ?stmt. # natural product of taxon
    ?stmt ps:P1582 ?taxon. # natural product of taxon
    OPTIONAL {
      ?stmt prov:wasDerivedFrom ?ref. 
      ?ref pr:P248 ?art. # stated in
    }
  }
  UNION
  {
    ?item p:P703 ?stmt. # found in taxon
    ?stmt ps:P703 ?taxon. # found in taxon
    OPTIONAL {
      ?stmt prov:wasDerivedFrom ?ref.
      ?ref pr:P248 ?art. # stated in
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

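Because of the UNION, and because each reference produces its own row, the same compound can appear on several rows of the result. A sketch of collapsing results downloaded in the SPARQL 1.1 JSON results format (variable names as in the query above) into one set of reference labels per compound; the function name is hypothetical, and compounds are keyed by item URI because labels need not be unique.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


def references_by_compound(results: dict) -> Dict[str, Set[str]]:
    """Map each compound's item URI to the reference labels cited for it."""
    grouped: DefaultDict[str, Set[str]] = defaultdict(set)
    for row in results["results"]["bindings"]:
        refs = grouped[row["item"]["value"]]  # empty set on first sight
        if "artLabel" in row:
            refs.add(row["artLabel"]["value"])
    return dict(grouped)
```

Compounds mapped to an empty set are those whose statements carry no stated in (P248) reference.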