Wikidata talk:WikiFactMine/Core SPARQL
Diseases by medical specialty edit
To guide the development of the life sciences SPARQL:
#Diseases by medical specialty
#defaultView:BubbleChart
SELECT ?spec ?specLabel ?count
WHERE
{
{
SELECT ?spec (COUNT(?item) AS ?count)
WHERE {
?item wdt:P31 wd:Q12136 .
?item wdt:P1995 ?spec .
}
GROUP BY ?spec
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 100
Charles Matthews (talk) 08:46, 11 July 2017 (UTC)
On the other side of the discussion:
#Diseases not associated with the top 20 medical specialties
SELECT ?item ?itemLabel ?label ?lang
WHERE {
?item wdt:P31 wd:Q12136 .
MINUS {?item wdt:P1995 wd:Q1071953 .}
MINUS {?item wdt:P1995 wd:Q788926 .}
MINUS {?item wdt:P1995 wd:Q162606 .}
MINUS {?item wdt:P1995 wd:Q162555 .}
MINUS {?item wdt:P1995 wd:Q83042 .}
MINUS {?item wdt:P1995 wd:Q7867 .}
MINUS {?item wdt:P1995 wd:Q171171 .}
MINUS {?item wdt:P1995 wd:Q120569 .}
MINUS {?item wdt:P1995 wd:Q327657 .}
MINUS { ?item wdt:P1995 wd:Q161437 .}
MINUS {?item wdt:P1995 wd:Q103824 .}
MINUS {?item wdt:P1995 wd:Q105650 .}
MINUS {?item wdt:P1995 wd:Q10379 .}
MINUS {?item wdt:P1995 wd:Q203337 .}
MINUS {?item wdt:P1995 wd:Q2861470 .}
MINUS {?item wdt:P1995 wd:Q101929 .}
MINUS {?item wdt:P1995 wd:Q123028 .}
MINUS {?item wdt:P1995 wd:Q5284418 .}
MINUS {?item wdt:P1995 wd:Q189553 .}
MINUS {?item wdt:P1995 wd:Q177635 .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10000
Initially 7447 hits. Charles Matthews (talk) 15:57, 11 July 2017 (UTC)
Today 6098 hits. Charles Matthews (talk) 14:31, 25 July 2017 (UTC)
Variant to show specialities outside the chosen 20:
#Diseases with "medical specialty" not associated with the top 20 medical specialties
SELECT ?item ?itemLabel ?specLabel
WHERE {
?item wdt:P31 wd:Q12136 .
?item wdt:P1995 ?spec .
MINUS {?item wdt:P1995 wd:Q1071953 .}
MINUS {?item wdt:P1995 wd:Q788926 .}
MINUS {?item wdt:P1995 wd:Q162606 .}
MINUS {?item wdt:P1995 wd:Q162555 .}
MINUS {?item wdt:P1995 wd:Q83042 .}
MINUS {?item wdt:P1995 wd:Q7867 .}
MINUS {?item wdt:P1995 wd:Q171171 .}
MINUS {?item wdt:P1995 wd:Q120569 .}
MINUS {?item wdt:P1995 wd:Q327657 .}
MINUS { ?item wdt:P1995 wd:Q161437 .}
MINUS {?item wdt:P1995 wd:Q103824 .}
MINUS {?item wdt:P1995 wd:Q105650 .}
MINUS {?item wdt:P1995 wd:Q10379 .}
MINUS {?item wdt:P1995 wd:Q203337 .}
MINUS {?item wdt:P1995 wd:Q2861470 .}
MINUS {?item wdt:P1995 wd:Q101929 .}
MINUS {?item wdt:P1995 wd:Q123028 .}
MINUS {?item wdt:P1995 wd:Q5284418 .}
MINUS {?item wdt:P1995 wd:Q189553 .}
MINUS {?item wdt:P1995 wd:Q177635 .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10000
After some cleanup, 66 hits. Charles Matthews (talk) 14:28, 25 July 2017 (UTC)
Interesting that over 3K items are marked with ICD-9 ID (P493) but not with health specialty (P1995):
SELECT ?item WHERE {
?item wdt:P493 [] .
OPTIONAL { ?item wdt:P1995 ?dummy0 }
FILTER(!bound(?dummy0))
}
Today this has 3166 hits.
From http://www.icd9data.com/2015/Volume1/default.htm, tabulated against best matches:
ICD-9 range | ICD-9 definition | health specialty (P1995) best match |
---|---|---|
001-139 | Infectious And Parasitic Diseases | infectious diseases (Q788926) rather than infectious disease (Q18123741) |
140-239 | Neoplasms | oncology (Q162555) |
240-279 | Endocrine, Nutritional And Metabolic Diseases, And Immunity Disorders | endocrinology (Q162606), immunology (Q101929) |
280-289 | Diseases Of The Blood And Blood-Forming Organs | hematology (Q103824) |
290-319 | Mental Disorders | psychiatry (Q7867) |
320-389 | Diseases Of The Nervous System And Sense Organs | neurology (Q83042) |
390-459 | Diseases Of The Circulatory System | cardiology (Q10379) |
460-519 | Diseases Of The Respiratory System | pulmonology (Q203337) |
520-579 | Diseases Of The Digestive System | gastroenterology (Q120569) |
580-629 | Diseases Of The Genitourinary System | urology (Q105650) |
630-679 | Complications Of Pregnancy, Childbirth, And The Puerperium | obstetrics (Q5284418) |
680-709 | Diseases Of The Skin And Subcutaneous Tissue | dermatology (Q171171) |
710-739 | Diseases Of The Musculoskeletal System And Connective Tissue | rheumatology (Q327657) |
740-759 | Congenital Anomalies | medical genetics (Q1071953) |
760-779 | Certain Conditions Originating In The Perinatal Period | |
780-799 | Symptoms, Signs, And Ill-Defined Conditions | |
800-999 | Injury And Poisoning | emergency medicine (Q2861470) |
Charles Matthews (talk) 09:06, 13 July 2017 (UTC)
The suggestion is to use, for example,
#ICD9 handling
SELECT DISTINCT ?item ?itemLabel
#?icd ?icd2 ?icd3 removed so that SELECT DISTINCT works
WHERE {
?item wdt:P31 wd:Q12136 .
?item wdt:P493 ?icd .
OPTIONAL { ?item wdt:P1995 ?dummy0 }
FILTER(!bound(?dummy0))
BIND(SUBSTR(?icd,1,3) AS ?icd1) #drop decimal point and anything after
BIND(CONCAT("1",?icd1) AS ?icd2) #add initial digit 1 to preserve initial zeroes
BIND(xsd:integer(?icd2) AS ?icd3) # casting function to turn string to integer
FILTER(1000 < ?icd3 && ?icd3 <1140)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
to address the first row of the table. Charles Matthews (talk) 16:01, 13 July 2017 (UTC)
Comment: the binding can be done in one line, such as BIND(xsd:integer(CONCAT("1",SUBSTR(?icd,1,3))) AS ?icd1) . Charles Matthews (talk) 17:00, 24 July 2017 (UTC)
#ICD9 handling. This prototype does a case analysis with three cases. Output in two columns adapted to QuickStatement input.
SELECT DISTINCT ?item ?itemLabel ?ms1 ?ms2
WHERE {
?item wdt:P31 wd:Q12136 .
?item wdt:P493 ?icd .
OPTIONAL { ?item wdt:P1995 ?dummy0 }
FILTER(!bound(?dummy0))
BIND(xsd:integer(CONCAT("1",SUBSTR(?icd,1,3))) AS ?icd1)
#One-line processing of ICD-9 code into integer. Currently fails in a few alphanumeric cases.
BIND((IF (1000 < ?icd1 && ?icd1 <1140, "Q788926",1/0)) AS ?spec1)
#Dividing by zero, deliberate introduction of error in the "else" part of IF. This causes a desired fallback in COALESCE.
BIND((IF (1139 < ?icd1 && ?icd1 <1240, "Q162555",1/0)) AS ?spec2)
BIND((IF (1279 < ?icd1 && ?icd1 <1290, "Q103824",1/0)) AS ?spec3)
BIND(COALESCE(?spec1,?spec2,?spec3,2) AS ?ms2)
VALUES ?ms1 { "P1995" }
VALUES ?ms2 { "Q788926" "Q162555" "Q103824" }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
This is a prototype for a QuickStatements updater with large case analysis. Baseline hits 226. Charles Matthews (talk) 09:05, 25 July 2017 (UTC)
This bubble chart shows the main prefixes to be D00, D01, C53, D02, C56 in the MeSH descriptor ID (P486)-marked diseases not yet given a health specialty (P1995) value.
#Diseases without "medical speciality", prefix of MeSH ID
#defaultView:BubbleChart
SELECT ?code ?codeLabel ?count
WHERE
{
{
SELECT ?code (COUNT(?item) AS ?count)
WHERE {
?item wdt:P31 wd:Q12136 .
?item wdt:P486 ?meshid .
OPTIONAL { ?item wdt:P1995 ?dummy0 }
FILTER(!bound(?dummy0))
BIND(SUBSTR(?meshid,1,3) AS ?code)
}
GROUP BY ?code
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 100
Working conclusion, though, is that MeSH tree code (P672) is going to be of more use. Those properties really should be added in parallel. Though certainly they haven't been, so far. Wouldn't be that hard a bot job to add MeSH tree code (P672) where MeSH descriptor ID (P486) exists, as a preliminary to getting into this area. Charles Matthews (talk) 17:45, 13 July 2017 (UTC)
ICD-10 (P494) sanity check edit
This query
#ICD10 sanity check for "neurology" medical specialty
SELECT ?item ?itemLabel ?icd
WHERE {
?item wdt:P1995 wd:Q83042 .
?item wdt:P494 ?icd .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
shows a wide range of ICD-10 (P494) identifiers (not just prefix H) for items marked as health specialty (P1995) neurology (Q83042). Explanations are not so hard to find: strokes can affect the brain, for example. Dexbot added some health specialty (P1995) statements two years ago, don't know the rationale. Charles Matthews (talk) 09:11, 26 July 2017 (UTC)
Gene Ontology IDs applied to genes edit
Here is a sample query:
SELECT DISTINCT ?gene ?geneLabel
WHERE
{
?gene wdt:P31 wd:Q7187 .
?gene wdt:P688 ?protein .
?protein wdt:P682 ?process .
?process wdt:P686 ?go .
BIND(str(?go) AS ?gostring)
FILTER (?gostring = "GO:0006625")
SERVICE wikibase:label {bd:serviceParam wikibase:language "en" .}
}
MeSH Code tree handling for "health speciality" edit
#MeSH Code tree handling for "health speciality" P672
#C23 (clinical finding) suppressed
#Set out in columns for QuickStatements, though itemLabel is not for that.
SELECT DISTINCT ?item ?itemLabel ?hs1 ?hs2
WHERE {
?item wdt:P31 wd:Q12136 .
?item wdt:P672 ?meshc .
MINUS { ?item wdt:P1995 [ ]}
BIND(IF (STRSTARTS(?meshc, 'C01.252'), "Q727028", 1/0) AS ?spec01a) #bacterial infectious disease
BIND(IF (STRSTARTS(?meshc, 'C01.703'), "Q464067", 1/0) AS ?spec01b) #fungal infectious disease
BIND(IF (STRSTARTS(?meshc, 'C02'), "Q1928978", 1/0) AS ?spec02) #viral infectious disease
BIND(IF (STRSTARTS(?meshc, 'C03'), "Q1601794", 1/0) AS ?spec03) #parasitic infectious diseases
BIND(IF (STRSTARTS(?meshc, 'C04'), "Q162555", 1/0) AS ?spec04) #oncology
BIND(IF (STRSTARTS(?meshc, 'C05'), "Q4116663", 1/0) AS ?spec05) #musculoskeletal disorder
BIND(IF (STRSTARTS(?meshc, 'C06'), "Q120569", 1/0) AS ?spec06) #gastroenterology
BIND(IF (STRSTARTS(?meshc, 'C07'), "Q55082545", 1/0) AS ?spec07) #stomatognathic disease
BIND(IF (STRSTARTS(?meshc, 'C08'), "Q203337", 1/0) AS ?spec08) #pulmonology
BIND(IF (STRSTARTS(?meshc, 'C09'), "Q189553", 1/0) AS ?spec09) #otolaryngology
BIND(IF (STRSTARTS(?meshc, 'C10'), "Q83042", 1/0) AS ?spec10) #neurology
BIND(IF (STRSTARTS(?meshc, 'C11'), "Q161437", 1/0) AS ?spec11) #ophthalmology
BIND(IF (STRSTARTS(?meshc, 'C12.294'), "Q6742925", 1/0) AS ?spec12a) #male reproductive system disease
BIND(IF (STRSTARTS(?meshc, 'C12.777'), "Q7900883", 1/0) AS ?spec12b) #urinary system disease
BIND(IF (STRSTARTS(?meshc, 'C13'), "Q80015", 1/0) AS ?spec13) #obstetrics gynecology
BIND(IF (STRSTARTS(?meshc, 'C14'), "Q389735", 1/0) AS ?spec14) #cardiovascular disease
BIND(IF (STRSTARTS(?meshc, 'C15.378'), "Q103824", 1/0) AS ?spec15a) #hematology
BIND(IF (STRSTARTS(?meshc, 'C15.604'), "Q6708237", 1/0) AS ?spec15b) #lymphatic system disease
BIND(IF (STRSTARTS(?meshc, 'C16.320'), "Q727096", 1/0) AS ?spec16a) #genetic disease
BIND(IF (STRSTARTS(?meshc, 'C16.614'), "Q200779", 1/0) AS ?spec16b) #birth defect
BIND(IF (STRSTARTS(?meshc, 'C17.3'), "Q1779300", 1/0) AS ?spec17a) #connective tissue disease
BIND(IF (STRSTARTS(?meshc, 'C17.8'), "Q171171", 1/0) AS ?spec17b) #dermatology
BIND(IF (STRSTARTS(?meshc, 'C18.452'), "Q2351083", 1/0) AS ?spec18a) #disease of metabolism
BIND(IF (STRSTARTS(?meshc, 'C18.654'), "Q1361144", 1/0) AS ?spec18b) #nutrition disorder
BIND(IF (STRSTARTS(?meshc, 'C19'), "Q162606", 1/0) AS ?spec19) #endocrinology
BIND(IF (STRSTARTS(?meshc, 'C20'), "Q101929", 1/0) AS ?spec20) #immunology
BIND(IF (STRSTARTS(?meshc, 'C21'), "Q3751709", 1/0) AS ?spec21) #environmental disease
BIND(IF (STRSTARTS(?meshc, 'C22'), "Q170201", 1/0) AS ?spec22) #veterinary medicine
BIND(COALESCE(?spec01a,?spec01b,?spec02,?spec03,?spec04,?spec05,?spec06,?spec07,?spec08,?spec09,?spec10,?spec11,?spec12a,?spec12b,?spec13,?spec14,?spec15a,?spec15b,?spec16a,?spec16b,?spec17a,?spec17b,?spec18a,?spec18b,?spec19,?spec20,?spec21,?spec22,000000) AS ?hs2)
VALUES ?hs1 { "P1995" }
VALUES ?hs2 { "Q727028" "Q464067" "Q1928978" "Q1601794" "Q162555" "Q4116663" "Q120569" "Q55082545" "Q203337" "Q189553" "Q83042" "Q161437" "Q6742925" "Q7900883" "Q80015" "Q389735" "Q103824" "Q6708237" "Q727096" "Q200779" "Q1779300" "Q171171" "Q2351083" "Q1361144" "Q162606" "Q101929" "Q3751709" "Q170201" }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Charles Matthews (talk) 12:34, 8 August 2018 (UTC)
Cleanup steps:
- Replaced cardiovascular disease (Q389735) by cardiology (Q10379)
- Replaced urinary system disease (Q7900883) by urology (Q105650)
- Replaced bacterial infectious disease (Q727028) by infectious diseases (Q788926)
- Replaced fungal infectious disease (Q464067) by infectious diseases (Q788926)
- Replaced viral infectious disease (Q1928978) by infectious diseases (Q788926)
- Replaced parasitic infectious diseases (Q1601794) by infectious diseases (Q788926)
- Replaced musculoskeletal disorder (Q4116663) by rheumatology (Q327657)
- Replaced stomatognathic disease (Q55082545) by oral and maxillofacial surgery (Q504033)
- Replaced environmental disease (Q3751709) by environmental medicine (Q512285)
- Replaced nutrition disorder (Q1361144) by nutrition (Q2138622)
- Replaced male reproductive system disease (Q6742925) by urology (Q105650)
- Replaced lymphatic system disease (Q6708237) by angiology (Q539690)
- Replaced congenital disorder (Q727096) by medical genetics (Q1071953)
- Replaced genetic disease (Q200779) by pediatrics (Q123028)
- Replaced connective tissue disease (Q1779300) by rheumatology (Q327657)
- Replaced metabolic disease (Q2351083) by endocrinology (Q162606)