User:ProteinBoxBot/Maintenance Queries

Introduction

Generally speaking, all the queries below should return zero results. This list is used to help identify inconsistencies in data modeling that have been introduced either by human editors or bots.

Maintenance / Quality Control (QC) Queries

Get Entrez gene ids mapped to multiple items

This can be more generically found as a constraint on each property's talk page. See here, under "Distinct values: this property likely contains a value that is different from all other items."

The following query uses these:

  • Properties: Entrez Gene ID (P351)
    # Unique value constraint report for P351: report listing each item
    SELECT DISTINCT ?item1 ?item1Label ?item2 ?item2Label ?value {
      ?item1 wdt:P351 ?value .
      ?item2 wdt:P351 ?value .
      # The string comparison excludes self-pairs and reports each pair only once
      FILTER(STR(?item1) < STR(?item2)) .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
    }
    

Get Items with multiple Entrez gene ids

This can be more generically found as a constraint on each property's talk page. See here, under "Single value: this property generally contains a single value."

The following query uses these:

  • Properties: Entrez Gene ID (P351)
    SELECT DISTINCT ?item ?itemLabel ?count {
      {
        # Count the Entrez Gene ID values attached to each item
        SELECT ?item (COUNT(?val) AS ?count) {
          ?item wdt:P351 ?val .
        } GROUP BY ?item
      } .
      FILTER(?count > 1) .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
    } ORDER BY DESC(?count) LIMIT 1000
    

Retrieve all proteins which also carry an encodes property

The following query uses these:
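
A minimal sketch of such a check, assuming that "encodes" is property P688 and that Q8054 is the protein class used in the gene/protein query on this page (P688 is not named anywhere in this list, so treat it as an assumption). Proteins should be the object of "encodes", not the subject, so any row returned points at a modeling error:

    # Assumption: P688 = "encodes", Q8054 = protein
    SELECT ?protein ?proteinLabel ?target ?targetLabel WHERE {
      ?protein wdt:P31 wd:Q8054 .
      ?protein wdt:P688 ?target .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }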

Retrieve all items that are both a gene and protein

The following query uses these:

  • Properties: instance of (P31)     
    select ?p ?pLabel where {
      ?p wdt:P31 wd:Q8054 .
      ?p wdt:P31 wd:Q7187 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
    

Retrieve all genes with a uniprot ID

The following query uses these:
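
A minimal sketch of such a check, assuming that the UniProt protein ID is property P352 and that Q7187 is the gene class (both are assumptions, not stated in this list). A UniProt ID belongs on the protein item, so a gene carrying one is suspect:

    # Assumption: P352 = UniProt protein ID, Q7187 = gene
    SELECT ?gene ?geneLabel ?uniprot WHERE {
      ?gene wdt:P31 wd:Q7187 .
      ?gene wdt:P352 ?uniprot .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }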

Report all items used as protein GO annotations which do not have Gene Ontology ID

The following query uses these:
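
A minimal sketch of such a report, assuming that GO annotations on proteins are stored with molecular function (P680), cell component (P681) and biological process (P682), and that the Gene Ontology ID is P686. None of these identifiers appears in this list, so all four are assumptions:

    # Assumption: P680/P681/P682 hold GO annotations, P686 = Gene Ontology ID
    SELECT DISTINCT ?goterm ?gotermLabel WHERE {
      ?protein wdt:P31 wd:Q8054 .
      ?protein wdt:P680|wdt:P681|wdt:P682 ?goterm .
      # Keep only annotation targets that lack a Gene Ontology ID
      FILTER NOT EXISTS { ?goterm wdt:P686 ?goid }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }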

Detects human genes which have interwiki links to Gene Wiki infobox templates instead of the actual Wikipedia article

The following query uses these:

  • Properties: Entrez Gene ID (P351), found in taxon (P703)
    SELECT ?cid ?entrez_id ?label ?article WHERE {
      ?cid wdt:P351 ?entrez_id .
      ?cid wdt:P703 wd:Q15978631 .
      OPTIONAL { ?cid rdfs:label ?label . FILTER(LANG(?label) = "en") }
      ?article schema:about ?cid .
      ?article schema:inLanguage "en" .
      FILTER REGEX(STR(?article), "Template", "i")
    }
    

Detects genes which encode proteins that are not marked as encoded by that gene

The following query uses these:
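
A minimal sketch of such a check, assuming that "encodes" is P688 and its inverse "encoded by" is P702 (neither identifier is stated in this list). It reports gene-to-protein links that lack the matching reverse statement:

    # Assumption: P688 = "encodes", P702 = "encoded by"
    SELECT ?gene ?geneLabel ?protein ?proteinLabel WHERE {
      ?gene wdt:P688 ?protein .
      # The protein should point back to the same gene
      FILTER NOT EXISTS { ?protein wdt:P702 ?gene }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }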

Detects genes which encode proteins that are found in different taxa

The following query uses these:
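
A minimal sketch of such a check, assuming that "encodes" is P688 (an assumption) and using found in taxon (P703) as in the human gene query on this page. It reports gene and protein pairs whose taxon values differ:

    # Assumption: P688 = "encodes"; P703 = found in taxon
    SELECT ?gene ?geneTaxon ?protein ?proteinTaxon WHERE {
      ?gene wdt:P688 ?protein .
      ?gene wdt:P703 ?geneTaxon .
      ?protein wdt:P703 ?proteinTaxon .
      FILTER(?geneTaxon != ?proteinTaxon)
    }

If an item legitimately carries several taxon values, this comparison will also flag pairs that share one of them; replacing the last filter with FILTER NOT EXISTS { ?protein wdt:P703 ?geneTaxon } would be a stricter variant.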

Maintenance / Quality Control (QC) Query Results

User:ProteinBoxBot/Maintenance_Query_Results