Wikidata:ScienceSource project/MeSH and cleanup dashboard
In the last quarter of 2019, a major part of the work on ScienceSource was matching the main Medical Subject Headings (MeSH) to Wikidata items. These headings are identified by the D-numbers of the MeSH system. Each D-number should be matched to exactly one item here, so that the "reverse lookup" from D-number to Wikidata Q-number works.
Example: starting from D012343, we can get to transfer RNA (Q201448). There are good tools for this, e.g. the dedicated Resolver tool at https://tools.wmflabs.org/wikidata-todo/resolver.php?, where you enter 486 (for P486) and D012343.
We want this basic operation to work in all cases, with no ambiguities arising. This is a requirement for the NCBI2wikidata tool to operate correctly. Of course the D-numbers must also be on the correct items, which raises another set of issues. Many matches have been inaccurate: the label may look right while the item is actually, for example, about an article rather than the required topic, a mistake that has been quite common.
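The 1-to-1 requirement can be checked on any exported list of (D-number, Q-number) pairs before it is relied on for reverse lookup. The following is a minimal Python sketch, not part of the project's tooling: the function name `build_reverse_lookup` and the placeholder pairs are hypothetical, and only D012343 and Q201448 come from the example above.

```python
from collections import defaultdict


def build_reverse_lookup(pairs):
    """Map each MeSH D-number to its Wikidata Q-number.

    Returns (lookup, ambiguous): lookup holds D-numbers found on exactly one
    item, ambiguous holds D-numbers found on several items.
    """
    items_by_mesh = defaultdict(set)
    for mesh_id, qid in pairs:
        items_by_mesh[mesh_id].add(qid)

    lookup = {}
    ambiguous = {}
    for mesh_id, qids in items_by_mesh.items():
        if len(qids) == 1:
            lookup[mesh_id] = next(iter(qids))
        else:
            ambiguous[mesh_id] = sorted(qids)
    return lookup, ambiguous


# The first pair is the example above; the other two are made-up placeholders.
sample_pairs = [
    ("D012343", "Q201448"),
    ("D_EXAMPLE", "Q_FIRST"),
    ("D_EXAMPLE", "Q_SECOND"),
]
lookup, ambiguous = build_reverse_lookup(sample_pairs)
```

Any D-number that ends up in `ambiguous` is one that the reverse lookup cannot resolve to a single item.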
Table
These stats relate to the issues of:
- Getting a complete set of MeSH descriptor ID (P486) D-number statements into Wikidata. All the mix'n'match catalogs at https://tools.wmflabs.org/mix-n-match/#/group/medical are now complete. Since those catalogs were created towards the end of 2017, there are now MeSH updates to take into account. The current total from mix'n'match is around 28.5K, but the actual current total might be nearer 30K.
- Removing the database constraint violations caused by D-number duplications.[1]
- Creating the MeSH tree code (P672) statements that go with MeSH descriptor ID (P486)[2]
- Building up the tagging of items for review articles with copyright license (P275) statements
- Creating at least one main subject (P921) statement for every such item
See the footnotes for further details.
MeSH has a major application in searching PubMed (Q180686), which is how it enters ScienceSource, at the very start of the software pipeline. To automate batch searches, the term itself is being entered as a subject named as (P1810) qualifier on the MeSH descriptor ID (P486) statements. This allows lists of MeSH terms to be gathered with SPARQL queries.
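As an illustration of how a gathered list of terms might feed batch searches, the sketch below turns MeSH term labels into PubMed search strings using the `[MeSH Terms]` field tag. It is an assumption about how such a step could look, not the ScienceSource pipeline itself; the function name `pubmed_mesh_queries` and the sample terms are hypothetical.

```python
def pubmed_mesh_queries(terms):
    """Build one PubMed search string per MeSH term label."""
    queries = []
    for term in terms:
        cleaned = term.strip()
        if not cleaned:
            continue  # skip empty labels
        # Quote the label so multi-word headings are searched as a phrase.
        queries.append(f'"{cleaned}"[MeSH Terms]')
    return queries


# Sample labels for illustration only.
queries = pubmed_mesh_queries(["RNA, Transfer", "  Zika Virus ", ""])
```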
Date DD/MM/YYYY | MeSH D-number total[3] | MeSH descriptor ID (P486) unique value constraint violations[4][5][6] | MeSH D-number MeSH descriptor ID (P486) lacking MeSH tree code (P672)[7] | review article (Q7318358) with copyright license (P275)[8] | Review articles with license lacking main subject (P921)[9] |
---|---|---|---|---|---|
18/08/2019 | 19779 | 2420 | 11324 | 57360 | 4720 |
20/09/2019 | 19820 | 2322 | 11204 | 57547 | 4686 |
27/09/2019 | 20046 | 2195 | 11143 | 60297 | 3582 |
3/10/2019 | 21110 | 2058 | 12074 | 62062 | 3215 |
22/10/2019 | 23512 | 1979 | 14212 | 64494 | 2542 |
4/11/2019 | 24712 | 1820 | 15326 | 66292 | 2052 |
8/12/2019 | 28205 | 1729 | 18320 | 68043 | 1751 |
15/12/2019 | 28635 | 1717 | 18842 | 68277 | 1663 |
24/12/2019 | 28706 | 1688 | 18335 | 55230 | 1561 |
11/07/2020 | 28701 | 1616 | 18052 | 60762 | 1634 |
1/08/2020 | 28689 | 1354 | 17643 | 62439 | 1459 |
23/08/2020 | 28699 | 1189 (463 C-numbers, 726 D-numbers) | 17111 | 63877 | 1288 |
13/10/2020 | 28716 | 1052 (462 C-numbers, 590 D-numbers) | 16373 | 69676 | 805 |
13/11/2020 | 28697 | 456 (446 C-numbers, 10 D-numbers) | 16003 | 72656 | 597 |
18/01/2021 | 28726 | 454 (440 C-numbers, 14 D-numbers) | 15674 | 75915 | 465 |
1/01/2022 | 28883 | 470 (436 C-numbers, 34 D-numbers) | 13227 | 87496 | 248 |
10/05/2022 | 29125 | 584 (442 C-numbers, 144 D-numbers) | 12218 | 87523 | 222 |
17/11/2022 | 30302 | 577 (442 C-numbers, 135 D-numbers) | 11219 | 94418 | 1782 |
2/07/2023 | 30535 | 560 (443 C-numbers, 117 D-numbers) | 11041 | 133275 | 18595 |
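The later rows of the table split the constraint violation count into C-numbers and D-numbers. Given a list of the duplicated P486 values, that split can be computed from the initial letter of each value. This is a small Python sketch under that assumption; the function name and the identifiers in the sample list are made up.

```python
from collections import Counter


def violations_by_prefix(duplicated_ids):
    """Count duplicated P486 values by their initial letter (C or D)."""
    return Counter(mesh_id[0] for mesh_id in duplicated_ids if mesh_id)


# Hypothetical identifiers, for illustration only.
counts = violations_by_prefix(["C000001", "D000002", "D000003"])
total = sum(counts.values())
```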
Reviews with license
Date DD/MM/YYYY | CC0[10] | public domain sign[11] | open access[12] | CC[13] | CC-BY[14] | CC-BY-SA[15] | CC-BY-SA 2.0[16] | CC-BY 2.5[17] | CC-BY-SA 2.5[18] | CC-BY 3.0[19] | CC-BY-SA 3.0[20] | CC-BY-SA 4.0[21] | CC-BY-SA 4.0 Int[22] | CC-BY-NC[23] | CC-BY-NC 2.5[24] | CC-BY-NC-SA 2.5[25] | CC-BY-NC-ND[26] | CC-BY-NC-ND 3.0[27] | CC-BY-NC 4.0[28] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2/07/2023 | 222 | 661 | 1 | 2 | 22718 | 0 | 8308 | 2149 | 1 | 14230 | 0 | 208 | 51008 | 6620 | 248 | 30 | 2849 | 1867 | 6704 |
MeSH updates
The mix'n'match catalogs were compiled in 2017. Supplementary work then added the MeSH D-numbers introduced from 2018 to 2022. Subsequent additions can be retrieved with the following type of query at https://id.nlm.nih.gov/mesh/query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
SELECT DISTINCT ?s
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
  ?s meshv:dateEstablished "2023-01-01"^^xsd:date
}
ORDER BY ASC(?s)
Version to search by initial letter of tree code:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
PREFIX mesh2024: <http://id.nlm.nih.gov/mesh/2024/>
PREFIX mesh2023: <http://id.nlm.nih.gov/mesh/2023/>
PREFIX mesh2022: <http://id.nlm.nih.gov/mesh/2022/>
SELECT DISTINCT ?d ?name ?tn
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
  ?d meshv:treeNumber ?tn;
     meshv:dateEstablished "2024-01-01"^^xsd:date.
  ?d rdfs:label ?name .
  FILTER (regex (?tn, "Z"))
}
ORDER BY ASC(?d)
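Since the queries above differ only in the date, the query text can be generated per year. The sketch below is one way to do that in Python; it assumes that annual additions are dated 1 January, as in the two queries above, and it keeps only the prefixes the first query actually uses. The names `MESH_NEW_DESCRIPTORS_TEMPLATE` and `new_descriptor_query` are hypothetical.

```python
# Doubled braces are literal braces in str.format templates.
MESH_NEW_DESCRIPTORS_TEMPLATE = """\
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
SELECT DISTINCT ?s
FROM <http://id.nlm.nih.gov/mesh>
WHERE {{ ?s meshv:dateEstablished "{year}-01-01"^^xsd:date }}
ORDER BY ASC(?s)
"""


def new_descriptor_query(year):
    """Return the dateEstablished query for descriptors introduced in `year`."""
    year = int(year)
    if not 1900 <= year <= 9999:
        raise ValueError(f"unexpected year: {year!r}")
    return MESH_NEW_DESCRIPTORS_TEMPLATE.format(year=year)


# Build the query text for two update years.
queries_by_year = {year: new_descriptor_query(year) for year in (2023, 2024)}
```

The resulting strings can be pasted into the query form or sent to the endpoint by whatever client is in use.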
MeSH tree comparison
#Descriptors with a tree number in the D03.132 branch, from the MeSH data
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
PREFIX mesh2024: <http://id.nlm.nih.gov/mesh/2024/>
PREFIX mesh2023: <http://id.nlm.nih.gov/mesh/2023/>
PREFIX mesh2022: <http://id.nlm.nih.gov/mesh/2022/>
SELECT ?d ?name ?tn
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
  ?d meshv:treeNumber ?tn.
  ?d rdfs:label ?name .
  FILTER (regex (?tn, "D03.132"))
}
#Wikidata items with a MeSH tree code (P672) in the D03.132 branch
SELECT ?item ?mesh ?itemLabel ?string
WHERE
{?item wdt:P672 ?string;
wdt:P486 ?mesh.
FILTER (STRSTARTS(?string, "D03.132"))
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ASC(?string)
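Once the results of the MeSH query and the Wikidata query for the same branch have been exported, the comparison itself is a set difference. The following Python sketch assumes each result has been reduced to (D-number, tree code) pairs; the function name `compare_tree_codes` and the sample rows are made up for illustration.

```python
def compare_tree_codes(mesh_rows, wikidata_rows):
    """Compare (D-number, tree code) pairs from MeSH with those in Wikidata.

    Returns (missing, obsolete): pairs present in MeSH but not in Wikidata,
    and pairs present in Wikidata but not in MeSH.
    """
    mesh_pairs = set(mesh_rows)
    wikidata_pairs = set(wikidata_rows)
    missing = sorted(mesh_pairs - wikidata_pairs)
    obsolete = sorted(wikidata_pairs - mesh_pairs)
    return missing, obsolete


# Made-up rows; real rows would come from the two queries above.
mesh_rows = [("D_ONE", "D03.132.100"), ("D_TWO", "D03.132.200")]
wikidata_rows = [("D_ONE", "D03.132.100"), ("D_TWO", "D03.132.150")]
missing, obsolete = compare_tree_codes(mesh_rows, wikidata_rows)
```

Pairs in `missing` are candidates for new MeSH tree code (P672) statements, and pairs in `obsolete` are candidates for review or deletion.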
Large items queries
#Megabyte range scholarly article items, review articles with CC license
SELECT ?item ?statementcount ?itemLabel
WHERE {?item wdt:P31 wd:Q13442814;
wdt:P31 wd:Q7318358;
wdt:P275 [ ];
wikibase:statements ?statementcount.
FILTER (?statementcount > 600 )
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
For referencing buildup:
#Locating P31 statements referenced to PubMed, multiple retrievals
SELECT ?statement (COUNT(?statement) AS ?count)
WHERE {?reference pr:P248 wd:Q180686;
pr:P813 ?date.
?statement prov:wasDerivedFrom ?reference.
?statement ps:P31 wd:Q7318358.
}
GROUP BY ?statement
HAVING (COUNT(?statement) > 10)
#Locating P921 statements referenced to PubMed, specific value
SELECT ?statement
WHERE {?reference pr:P248 wd:Q180686.
?statement prov:wasDerivedFrom ?reference.
?statement ps:P921 wd:Q2335423.
}
#Locate all statements, items and retrieval dates for a property given reference type
SELECT DISTINCT ?date ?statement ?item
WHERE {?reference pr:P248 wd:Q180686;
pr:P813 ?date.
?statement prov:wasDerivedFrom ?reference.
?statement ps:P921 wd:Q8084905.
?item p:P921 ?statement}
ORDER BY ASC(?date)
Lone MeSH statements
#Lone D-number MeSH statements
SELECT DISTINCT ?item ?subject ?itemLabel
WHERE {?item wdt:P486 ?subject.
?item wikibase:statements ?statementcount.
FILTER ( ?statementcount = 1 )
FILTER (STRSTARTS(?subject, "D"))
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Notes
- ↑ As far as is reasonable. Some edge cases are mentioned on Wikidata talk:Database reports/Constraint violations/P486.
- ↑ There are often multiple statements, which can be incomplete here. There are also obsolete codes to delete, and the codes are updated.
- ↑ #All P486 identifiers starting with D
  SELECT ?item ?mesh WHERE {?item wdt:P486 ?mesh. FILTER(strstarts(?mesh, 'D')) }
- ↑ Wikidata:Database reports/Constraint violations/P486
- ↑ # Quick query for items with most values of the property P486 after User:Infovarius, 2019-07-15
  SELECT ?item ?itemLabel ?cnt {
    { SELECT ?item (COUNT(?value) AS ?cnt) { ?item wdt:P486 ?value FILTER(STRSTARTS(?value, 'D')) } GROUP BY ?item ORDER BY DESC(?cnt) LIMIT 100 }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  } ORDER BY DESC(?cnt)
- ↑ #Unique value checker, for P486
  SELECT DISTINCT ?item1 ?item1Label ?item2 ?item2Label ?value {
    ?item1 wdt:P486 ?value .
    ?item2 wdt:P486 ?value .
    FILTER( ?item1 != ?item2 && STR( ?item1 ) < STR( ?item2 ) ) .
    FILTER( STRSTARTS(?value, 'D') )
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
  } LIMIT 1000
- ↑ #Topics with MeSH descriptor ID (D-number) lacking MeSH tree code (P672)
  SELECT DISTINCT ?item ?itemLabel WHERE {
    ?item wdt:P486 ?meshid.
    FILTER(STRSTARTS(?meshid,"D"))
    MINUS {?item wdt:P672 [ ]}
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
- ↑ #Article items, reviews with license
  SELECT ?item ?itemLabel WHERE {?item wdt:P31 wd:Q7318358; wdt:P275 [ ]. }
- ↑ #Article items, reviews with license, lacking main subject
  #NB the date ordering, while convenient, introduces duplications on the basis of e-publication being earlier than hard copy. The table uses the number calculated without ?date.
  SELECT ?item ?itemLabel ?date WHERE {
    ?item wdt:P31 wd:Q7318358; wdt:P577 ?date; wdt:P275 [ ].
    MINUS{?item wdt:P921 [ ]}
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  } ORDER BY ASC(?date)
- ↑ #Article items, reviews with CC0 license
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q6938433. }
- ↑ #Article items, reviews with PDM license
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q7257361. }
- ↑ #Article items, reviews with "open access" license
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q232932. }
- ↑ #Article items, reviews with CC license
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q284742. }
- ↑ #Article items, reviews with CC-BY license
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q6905323. }
- ↑ #Article items, reviews with the license of table column CC-BY-SA
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q6905942. }
- ↑ #Article items, reviews with the license of table column CC-BY-SA 2.0
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q19125117. }
- ↑ #Article items, reviews with the license of table column CC-BY 2.5
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q18810333. }
- ↑ #Article items, reviews with the license of table column CC-BY-SA 2.5
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q19113751. }
- ↑ #Article items, reviews with CC-BY 3.0 license
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q14947546. }
- ↑ #Article items, reviews with the license of table column CC-BY-SA 3.0
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q14946043. }
- ↑ #Article items, reviews with the license of table column CC-BY-SA 4.0
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q18199165. }
- ↑ #Article items, reviews with the license of table column CC-BY-SA 4.0 Int
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q20007257. }
- ↑ #Article items, reviews with the license of table column CC-BY-NC
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q6936496. }
- ↑ #Article items, reviews with the license of table column CC-BY-NC 2.5
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q19113746. }
- ↑ #Article items, reviews with the license of table column CC-BY-NC-SA 2.5
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q19068212. }
- ↑ #Article items, reviews with the license of table column CC-BY-NC-ND
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q6937225. }
- ↑ #Article items, reviews with the license of table column CC-BY-NC-ND 3.0
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q19125045. }
- ↑ #Article items, reviews with the license of table column CC-BY-NC 4.0
  SELECT ?item WHERE {?item wdt:P31 wd:Q13442814; wdt:P31 wd:Q7318358; wdt:P275 wd:Q34179348. }