Open main menu

Wikidata talk:WikiFactMine/Core SPARQL

Diseases by medical specialtyEdit

To guide the development of the life sciences SPARQL:

#Diseases by medical specialty
#defaultView:BubbleChart
SELECT ?spec ?specLabel ?count
WHERE 
{
  {
    SELECT ?spec (COUNT(?item) AS ?count)    
WHERE {
        ?item wdt:P31 wd:Q12136 .
        ?item wdt:P1995 ?spec  .
        }
  GROUP BY ?spec 
  }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 100

Try it!

Charles Matthews (talk) 08:46, 11 July 2017 (UTC)

On the other side of the discussion:

#Diseases not associated with the top 20 medical specialties
SELECT ?item ?itemLabel ?label ?lang
  WHERE {
  ?item wdt:P31 wd:Q12136 .
  MINUS {?item wdt:P1995 wd:Q1071953 .}
  MINUS {?item wdt:P1995 wd:Q788926 .}
  MINUS {?item wdt:P1995 wd:Q162606 .}
  MINUS {?item wdt:P1995 wd:Q162555 .}
  MINUS {?item wdt:P1995 wd:Q83042 .}
  MINUS {?item wdt:P1995 wd:Q7867 .}
  MINUS {?item wdt:P1995 wd:Q171171 .}
  MINUS {?item wdt:P1995 wd:Q120569 .}
  MINUS {?item wdt:P1995 wd:Q327657 .}
  MINUS { ?item wdt:P1995 wd:Q161437 .}
  MINUS {?item wdt:P1995 wd:Q103824 .}
  MINUS {?item wdt:P1995 wd:Q105650 .}
  MINUS {?item wdt:P1995 wd:Q10379 .}
  MINUS {?item wdt:P1995 wd:Q203337 .}
  MINUS {?item wdt:P1995 wd:Q2861470 .}
  MINUS {?item wdt:P1995 wd:Q101929 .}
  MINUS {?item wdt:P1995 wd:Q123028 .}
  MINUS {?item wdt:P1995 wd:Q5284418 .}
  MINUS {?item wdt:P1995 wd:Q189553 .}
  MINUS {?item wdt:P1995 wd:Q177635 .}
 
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10000

Try it!

Initially 7447 hits. Charles Matthews (talk) 15:57, 11 July 2017 (UTC)

Today 6098 hits. Charles Matthews (talk) 14:31, 25 July 2017 (UTC)

Variant to show specialities outside the chosen 20:

#Diseases with "medical specialty" not associated with the top 20 medical specialties
SELECT ?item ?itemLabel ?specLabel
  WHERE {
  ?item wdt:P31 wd:Q12136 .
  ?item wdt:P1995 ?spec .
  MINUS {?item wdt:P1995 wd:Q1071953 .}
  MINUS {?item wdt:P1995 wd:Q788926 .}
  MINUS {?item wdt:P1995 wd:Q162606 .}
  MINUS {?item wdt:P1995 wd:Q162555 .}
  MINUS {?item wdt:P1995 wd:Q83042 .}
  MINUS {?item wdt:P1995 wd:Q7867 .}
  MINUS {?item wdt:P1995 wd:Q171171 .}
  MINUS {?item wdt:P1995 wd:Q120569 .}
  MINUS {?item wdt:P1995 wd:Q327657 .}
  MINUS { ?item wdt:P1995 wd:Q161437 .}
  MINUS {?item wdt:P1995 wd:Q103824 .}
  MINUS {?item wdt:P1995 wd:Q105650 .}
  MINUS {?item wdt:P1995 wd:Q10379 .}
  MINUS {?item wdt:P1995 wd:Q203337 .}
  MINUS {?item wdt:P1995 wd:Q2861470 .}
  MINUS {?item wdt:P1995 wd:Q101929 .}
  MINUS {?item wdt:P1995 wd:Q123028 .}
  MINUS {?item wdt:P1995 wd:Q5284418 .}
  MINUS {?item wdt:P1995 wd:Q189553 .}
  MINUS {?item wdt:P1995 wd:Q177635 .}

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10000

Try it!

After some cleanup, 66 hits. Charles Matthews (talk) 14:28, 25 July 2017 (UTC)

ICD-9 (P493) without health specialty (P1995)Edit

Interesting that over 3K items are marked with ICD-9 (P493) but not with health specialty (P1995):

SELECT ?item WHERE {
  ?item wdt:P493 [] .
  OPTIONAL { ?item wdt:P1995 ?dummy0 }
  FILTER(!bound(?dummy0))
}

Try it!

Today this has 3166 hits.

From http://www.icd9data.com/2015/Volume1/default.htm, tabulated against best matches:

ICD-9 range ICD-9 definition health specialty (P1995) best match
001-139 Infectious And Parasitic Diseases infectiology (Q788926) rather than infectious disease (Q18123741)
140-239 Neoplasms oncology (Q162555)
240-279 Endocrine, Nutritional And Metabolic Diseases, And Immunity Disorders endocrinology (Q162606), immunology (Q101929)
280-289 Diseases Of The Blood And Blood-Forming Organs hematology (Q103824)
290-319 Mental Disorders psychiatry (Q7867)
320-389 Diseases Of The Nervous System And Sense Organs neurology (Q83042)
390-459 Diseases Of The Circulatory System cardiology (Q10379)
460-519 Diseases Of The Respiratory System pulmonology (Q203337)
520-579 Diseases Of The Digestive System gastroenterology (Q120569)
580-629 Diseases Of The Genitourinary System urology (Q105650)
630-679 Complications Of Pregnancy, Childbirth, And The Puerperium obstetrics (Q5284418)
680-709 Diseases Of The Skin And Subcutaneous Tissue dermatology (Q171171)
710-739 Diseases Of The Musculoskeletal System And Connective Tissue rheumatology (Q327657)
740-759 Congenital Anomalies medical genetics (Q1071953)
760-779 Certain Conditions Originating In The Perinatal Period
780-799 Symptoms, Signs, And Ill-Defined Conditions
800-999 Injury And Poisoning emergency medicine (Q2861470)

Charles Matthews (talk) 09:06, 13 July 2017 (UTC)

The suggestion is to use, for example,

#ICD9 handling
SELECT DISTINCT ?item ?itemLabel 
#?icd ?icd2 ?icd3 removed so that SELECT DISTINCT works
WHERE {
 ?item wdt:P31 wd:Q12136 .
  ?item wdt:P493 ?icd  .
  OPTIONAL { ?item wdt:P1995 ?dummy0 }
  FILTER(!bound(?dummy0))
  BIND(SUBSTR(?icd,1,3) AS ?icd1) #drop decimal point and anything after
  BIND(CONCAT("1",?icd1) AS ?icd2) #add initial digit 1 to preserve initial zeroes
  BIND(xsd:integer(?icd2) AS ?icd3) # casting function to turn string to integer
  FILTER(1000 < ?icd3 && ?icd3 <1140)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
      }

Try it!

to address the first row of the table. Charles Matthews (talk) 16:01, 13 July 2017 (UTC)

Comment: the binding can be done in one line, such as BIND(xsd:integer(CONCAT("1",SUBSTR(?icd,1,3))) AS ?icd1) . Charles Matthews (talk) 17:00, 24 July 2017 (UTC)

#ICD9 handling. This prototype does a case analysis with three cases. Output in two columns adapted to QuickStatement input.
SELECT DISTINCT ?item ?itemLabel ?ms1 ?ms2
WHERE {
 ?item wdt:P31 wd:Q12136 .
  ?item wdt:P493 ?icd  .
  OPTIONAL { ?item wdt:P1995 ?dummy0 }
  FILTER(!bound(?dummy0))
  
  BIND(xsd:integer(CONCAT("1",SUBSTR(?icd,1,3))) AS ?icd1)
  #One-line processing of ICD-9 code into integer. Currently fails in a few alphanumeric cases.
  BIND((IF (1000 < ?icd1 && ?icd1 <1140, "Q788926",1/0)) AS ?spec1)
  #Dividing by zero, deliberate introduction of error in the "else" part of IF. This causes a desired fallback in COALESCE.
  BIND((IF (1139 < ?icd1 && ?icd1 <1240, "Q162555",1/0)) AS ?spec2)
  BIND((IF (1279 < ?icd1 && ?icd1 <1290, "Q103824",1/0)) AS ?spec3)
  
  BIND(COALESCE(?spec1,?spec2,?spec3,2) AS ?ms2)
  VALUES ?ms1 { "P1995" }
  VALUES ?ms2 { "Q788926" "Q162555" "Q103824" }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
      }

Try it!

This is a prototype for a QuickStatements updater with large case analysis. Baseline hits 226. Charles Matthews (talk) 09:05, 25 July 2017 (UTC)

MeSH descriptor ID (P486) without health specialty (P1995)Edit

This bubble chart shows the main prefixes to be D00, D01, C53, D02, C56 in the MeSH descriptor ID (P486)-marked diseases not yet given a health specialty (P1995) value.

#Diseases without "medical speciality", prefix of MeSH ID
#defaultView:BubbleChart

SELECT ?code ?codeLabel ?count
WHERE 
{
  {
    SELECT ?code (COUNT(?item) AS ?count)    
WHERE {
        ?item wdt:P31 wd:Q12136 .
        ?item wdt:P486 ?meshid  .
        OPTIONAL { ?item wdt:P1995 ?dummy0 }
         FILTER(!bound(?dummy0))
        BIND(SUBSTR(?meshid,1,3) AS ?code)
        }
  GROUP BY ?code 
  }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 100

Try it!

Charles Matthews (talk) 16:38, 13 July 2017 (UTC)

MeSH code (P672) without health specialty (P1995)Edit

Working conclusion, though, is that MeSH code (P672) is going to be of more use. Those properties really should be added in parallel. Though certainly they haven't been, so far. Wouldn't be that hard a bot job to add MeSH code (P672) where MeSH descriptor ID (P486) exists, as a preliminary to getting into this area. Charles Matthews (talk) 17:45, 13 July 2017 (UTC)

ICD-10 (P494) sanity checkEdit

This query

#ICD10 sanity check for "neurology" medical specialty
SELECT ?item ?itemLabel ?icd
  WHERE {
  ?item wdt:P1995 wd:Q83042 .
  ?item wdt:P494 ?icd  .
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
  }

Try it!

shows a wide range of ICD-10 (P494) identifiers (not just prefix H) for items marked as health specialty (P1995) neurology (Q83042). Explanations are not so hard to find: strokes can affect the brain, for example. Dexbot added some health specialty (P1995) statements two years ago, don't know the rationale. Charles Matthews (talk) 09:11, 26 July 2017 (UTC)

Gene Ontology IDs applied to genesEdit

Here is a sample query:

SELECT DISTINCT ?gene ?geneLabel 
WHERE
{
    ?gene wdt:P31 wd:Q7187 .
    ?gene wdt:P688 ?protein .
    ?protein wdt:P682 ?process .
    ?process wdt:P686 ?go .
    BIND(str(?go) AS ?gostring)
    FILTER (?gostring = "GO:0006625")
    SERVICE wikibase:label {bd:serviceParam wikibase:language "en" .}
}

Try it!

Charles Matthews (talk) 21:52, 10 August 2017 (UTC)

MeSH Code tree handling for "health speciality"Edit

#MeSH Code tree handling for "health speciality" P672
#C23 (clinical finding) suppressed
#Set out in columns for QuickStatements, though itemLabel is not for that.
SELECT DISTINCT ?item ?itemLabel ?hs1 ?hs2
WHERE {
 ?item wdt:P31 wd:Q12136 .
  ?item wdt:P672 ?meshc  .
  MINUS { ?item wdt:P1995 [ ]}
  BIND(IF (STRSTARTS(?meshc, 'C01.252'), "Q727028", 1/0) AS ?spec01a)  #bacterial infectious disease
  BIND(IF (STRSTARTS(?meshc, 'C01.703'), "Q464067", 1/0) AS ?spec01b)  #fungal infectious disease
  BIND(IF (STRSTARTS(?meshc, 'C02'), "Q1928978", 1/0) AS ?spec02)  #viral infectious disease
  BIND(IF (STRSTARTS(?meshc, 'C03'), "Q1601794", 1/0) AS ?spec03) #parasitic infectious diseases
  BIND(IF (STRSTARTS(?meshc, 'C04'), "Q162555", 1/0) AS ?spec04) #oncology
  BIND(IF (STRSTARTS(?meshc, 'C05'), "Q4116663", 1/0) AS ?spec05) #musculoskeletal disorder
  BIND(IF (STRSTARTS(?meshc, 'C06'), "Q120569", 1/0) AS ?spec06) #gastroenterology
  BIND(IF (STRSTARTS(?meshc, 'C07'), "Q55082545", 1/0) AS ?spec07) #stomatognathic disease
  BIND(IF (STRSTARTS(?meshc, 'C08'), "Q203337", 1/0) AS ?spec08) #pulmonology
  BIND(IF (STRSTARTS(?meshc, 'C09'), "Q189553", 1/0) AS ?spec09) #otolaryngology
  BIND(IF (STRSTARTS(?meshc, 'C10'), "Q83042", 1/0) AS ?spec10) #neurology
  BIND(IF (STRSTARTS(?meshc, 'C11'), "Q161437", 1/0) AS ?spec11) #ophthalmology 
  BIND(IF (STRSTARTS(?meshc, 'C12.294'), "Q6742925", 1/0) AS ?spec12a) #male reproductive system disease
  BIND(IF (STRSTARTS(?meshc, 'C12.777'), "Q7900883", 1/0) AS ?spec12b) #urinary system disease
  BIND(IF (STRSTARTS(?meshc, 'C13'), "Q80015", 1/0) AS ?spec13) #obstetrics gynecology
  BIND(IF (STRSTARTS(?meshc, 'C14'), "Q389735", 1/0) AS ?spec14) #cardiovascular disease
  BIND(IF (STRSTARTS(?meshc, 'C15.378'), "Q103824", 1/0) AS ?spec15a) #hematology
  BIND(IF (STRSTARTS(?meshc, 'C15.604'), "Q6708237", 1/0) AS ?spec15b) #lymphatic system disease
  BIND(IF (STRSTARTS(?meshc, 'C16.320'), "Q727096", 1/0) AS ?spec16a) #genetic disease
  BIND(IF (STRSTARTS(?meshc, 'C16.614'), "Q200779", 1/0) AS ?spec16b) #birth defect
  BIND(IF (STRSTARTS(?meshc, 'C17.3'), "Q1779300", 1/0) AS ?spec17a) #connective tissue disease
  BIND(IF (STRSTARTS(?meshc, 'C17.8'), "Q171171", 1/0) AS ?spec17b) #dermatology
  BIND(IF (STRSTARTS(?meshc, 'C18.452'), "Q2351083", 1/0) AS ?spec18a) #disease of metabolism 
  BIND(IF (STRSTARTS(?meshc, 'C18.654'), "Q1361144", 1/0) AS ?spec18b) #nutrition disorder
  BIND(IF (STRSTARTS(?meshc, 'C19'), "Q162606", 1/0) AS ?spec19) #endocrinology
  BIND(IF (STRSTARTS(?meshc, 'C20'), "Q101929", 1/0) AS ?spec20) #immunology
  BIND(IF (STRSTARTS(?meshc, 'C21'), "Q3751709", 1/0) AS ?spec21) #environmental disease
  BIND(IF (STRSTARTS(?meshc, 'C22'), "Q170201", 1/0) AS ?spec22) #veterinary medicine

  BIND(COALESCE(?spec01a,?spec01b,?spec02,?spec03,?spec04,?spec05,?spec06,?spec07,?spec08,?spec09,?spec10,?spec11,?spec12a,?spec12b,?spec13,?spec14,?spec15a,?spec15b,?spec16a,?spec16b,?spec17a,?spec17b,?spec18a,?spec18b,?spec19,?spec20,?spec21,?spec22,000000) AS ?hs2)

  VALUES ?hs1 { "P1995" }
  VALUES ?hs2 { "Q727028" "Q464067" "Q1928978" "Q1601794" "Q162555" "Q4116663" "Q120569" "Q55082545" "Q203337" "Q189553" "Q83042" "Q161437" "Q6742925" "Q7900883" "Q80015" "Q389735"  "Q103824" "Q6708237" "Q727096" "Q200779" "Q1779300" "Q171171" "Q2351083" "Q1361144" "Q162606" "Q101929" "Q3751709" "Q170201" }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } 
}

Try it!

Charles Matthews (talk) 12:34, 8 August 2018 (UTC)

Cleanup steps:

  1. Replaced cardiovascular disease (Q389735) by cardiology (Q10379)
  2. Replaced urinary system disease (Q7900883) by urology (Q105650)
  3. Replaced bacterial infectious disease (Q727028) by infectiology (Q788926)
  4. Replaced fungal infectious disease (Q464067) by infectiology (Q788926)
  5. Replaced viral infectious disease (Q1928978) by infectiology (Q788926)
  6. Replaced parasitic infectious diseases (Q1601794) by infectiology (Q788926)
  7. Replaced musculoskeletal disorder (Q4116663) by rheumatology (Q327657)
  8. Replaced stomatognathic disease (Q55082545) by oral and maxillofacial surgery (Q504033)
  9. Replaced environmental disease (Q3751709) by environmental medicine (Q512285)
  10. Replaced nutrition disorder (Q1361144) by nutrition (Q2138622)
  11. Replaced male reproductive system disease (Q6742925) by urology (Q105650)
  12. Replaced lymphatic system disease (Q6708237) by angiology (Q539690)
  13. Replaced birth defect (Q727096) by medical genetics (Q1071953)
  14. Replaced genetic disease (Q200779) by pediatrics (Q123028)
  15. Replaced connective tissue disease (Q1779300) by rheumatology (Q327657)
  16. Replaced disease of metabolism (Q2351083) by endocrinology (Q162606)

Charles Matthews (talk) 08:06, 10 August 2018 (UTC)

Return to the project page "WikiFactMine/Core SPARQL".