User:ProteinBoxBot/Maintenance Queries
Introduction
Generally speaking, all the queries below should return zero results. This list is used to help identify inconsistencies in data modeling that have been introduced either by human editors or bots.
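Because every query is expected to return zero rows, the list can be run as an automated check. The following is a minimal sketch, assuming the queries are sent to the public Wikidata Query Service endpoint and that results come back in the standard SPARQL JSON results format; the helper names (build_url, run_query, count_rows, failing_checks) and the User-Agent string are hypothetical and not part of this page.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed target of these checks).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WDQS_ENDPOINT + "?" + params


def run_query(query: str) -> dict:
    """Send the query over HTTP and decode the JSON response (needs network)."""
    request = urllib.request.Request(
        build_url(query),
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "maintenance-query-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_rows(result: dict) -> int:
    """Count result rows in a SPARQL JSON results document."""
    return len(result["results"]["bindings"])


def failing_checks(results_by_name: dict) -> list:
    """Return the sorted names of checks whose query returned at least one row."""
    return sorted(
        name for name, result in results_by_name.items() if count_rows(result) > 0
    )
```

A check passes when count_rows(run_query(query)) is 0; any name returned by failing_checks points at a query whose results need manual review.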
Maintenance / Quality Control (QC) Queries
Get Entrez Gene IDs mapped to multiple items
This can be found more generically as a constraint on each property's talk page; see the entry "Distinct values: this property likely contains a value that is different from all other items."
The following query uses these:
- Properties: Entrez Gene ID (P351)
#Unique value constraint report for P351: report listing each item
SELECT DISTINCT ?item1 ?item1Label ?item2 ?item2Label ?value { ?item1 wdt:P351 ?value . ?item2 wdt:P351 ?value . FILTER(?item1 != ?item2 && str(?item1)<str(?item2)) . SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en" } . }
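The FILTER in the query above does two jobs: ?item1 != ?item2 drops self-pairs, and str(?item1) < str(?item2) keeps only one ordering of each pair so that (A, B) is not reported again as (B, A). The same logic can be sketched offline in Python, assuming the statements are available as (item, value) tuples; the function name duplicate_pairs and the sample identifiers in the usage note are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def duplicate_pairs(statements):
    """Return (item1, item2, value) once per pair of distinct items sharing a value."""
    items_by_value = defaultdict(set)
    for item, value in statements:
        items_by_value[value].add(item)

    pairs = []
    for value, items in items_by_value.items():
        # sorted() plus combinations() guarantees item1 < item2, mirroring the
        # str(?item1) < str(?item2) filter, so each pair appears exactly once.
        for item1, item2 in combinations(sorted(items), 2):
            pairs.append((item1, item2, value))
    return sorted(pairs)
```

For example, duplicate_pairs([("Q1", "100"), ("Q2", "100"), ("Q3", "200")]) reports the single pair ("Q1", "Q2", "100"), and an empty list means the unique-value constraint holds.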
Get items with multiple Entrez Gene IDs
This can be found more generically as a constraint on each property's talk page; see the entry "Single value: this property generally contains a single value."
The following query uses these:
- Properties: Entrez Gene ID (P351)
SELECT DISTINCT ?item ?itemLabel ?count { { SELECT ?item (COUNT(?val) AS ?count) { ?item wdt:P351 ?val . } GROUP BY ?item } . FILTER(?count > 1) . SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en" } . } ORDER BY DESC(?count) LIMIT 1000
Retrieve all proteins which also carry an encodes property
The following query uses these:
- Items: protein (Q8054)
- Properties: instance of (P31), UniProt protein ID (P352), encodes (P688)
select ?p ?pLabel ?uniprot ?d ?dLabel where { ?p wdt:P31 wd:Q8054 . ?p wdt:P352 ?uniprot . ?p wdt:P688 ?d . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }
Retrieve all items that are both a gene and protein
The following query uses these:
- Items: protein (Q8054), gene (Q7187)
- Properties: instance of (P31)
select ?p ?pLabel where { ?p wdt:P31 wd:Q8054 . ?p wdt:P31 wd:Q7187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }
Retrieve all genes with a UniProt ID
The following query uses these:
- Items: gene (Q7187)
- Properties: instance of (P31), UniProt protein ID (P352)
select ?p ?pLabel ?uni where { ?p wdt:P31 wd:Q7187 . ?p wdt:P352 ?uni . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }
Report all items used as protein GO annotations that do not have a Gene Ontology ID
The following query uses these:
- Properties: molecular function (P680), cell component (P681), biological process (P682), Gene Ontology ID (P686)
select ?protein ?pot_go where { ?protein wdt:P680|wdt:P681|wdt:P682 ?pot_go . FILTER NOT EXISTS {?pot_go wdt:P686 ?no_go} . }
Detect human genes whose English Wikipedia sitelink points to a Gene Wiki infobox template instead of the actual Wikipedia article
The following query uses these:
- Items: Homo sapiens (Q15978631)
- Properties: Entrez Gene ID (P351), found in taxon (P703)
SELECT ?cid ?entrez_id ?label ?article WHERE { ?cid wdt:P351 ?entrez_id . ?cid wdt:P703 wd:Q15978631 . OPTIONAL { ?cid rdfs:label ?label filter (lang(?label) = "en") .} ?article schema:about ?cid . ?article schema:inLanguage "en" . FILTER REGEX(STR(?article), "Template", "i") }
Detect genes that encode proteins which are not marked as encoded by that gene
The following query uses these:
- Properties: encodes (P688), Entrez Gene ID (P351), UniProt protein ID (P352), found in taxon (P703), encoded by (P702)
SELECT ?gene ?protein WHERE { ?gene wdt:P688 ?protein . ?gene wdt:P351 ?entrez . ?protein wdt:P352 ?uni . ?gene wdt:P703 ?gt . ?protein wdt:P703 ?pt . FILTER (?gt = ?pt) FILTER NOT EXISTS { ?protein wdt:P702 ?gene . } }
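The query above is a reciprocity check: every gene-to-protein "encodes" statement should be matched by a protein-to-gene "encoded by" statement when both items sit in the same taxon. A minimal offline sketch of that logic follows, assuming the statements have already been extracted into plain Python sets and a dict. The function name missing_reciprocals is hypothetical, each item is assumed to have at most one taxon, and the Entrez and UniProt ID requirements of the query are omitted for brevity.

```python
def missing_reciprocals(encodes, encoded_by, taxon):
    """Return sorted (gene, protein) pairs lacking the protein -> gene statement.

    encodes:    set of (gene, protein) pairs, one per P688 statement
    encoded_by: set of (protein, gene) pairs, one per P702 statement
    taxon:      dict mapping item -> taxon (assumes a single P703 value per item)
    """
    missing = []
    for gene, protein in encodes:
        # Like the query, only consider pairs where both items have a taxon
        # and the two taxa agree.
        same_taxon = gene in taxon and taxon.get(gene) == taxon.get(protein)
        if same_taxon and (protein, gene) not in encoded_by:
            missing.append((gene, protein))
    return sorted(missing)
```

An empty return value corresponds to the query returning zero results.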
Detect genes that encode proteins found in a different taxon
The following query uses these:
- Properties: encodes (P688), Entrez Gene ID (P351), UniProt protein ID (P352), found in taxon (P703)
SELECT ?gene ?gtLabel ?protein ?ptLabel WHERE { ?gene wdt:P688 ?protein . ?gene wdt:P351 ?entrez . ?protein wdt:P352 ?uni . ?gene wdt:P703 ?gt . ?protein wdt:P703 ?pt . FILTER (?gt != ?pt) ?gt rdfs:label ?gtLabel . FILTER(LANG(?gtLabel) = "en") ?pt rdfs:label ?ptLabel . FILTER(LANG(?ptLabel) = "en") }