Wikidata talk:Mismatch Finder/Collaboration/Purdue Summer of Data 2024
Data type
It's hard to give a suggestion without knowing the kinds of methods being considered. For example, is this going to use structured data or unstructured data? Are you going to attempt entity linkage?
For example, have you considered looking for mismatches between Wikidata and the sitelinked enwiki article? There are substantial errors there, as I have found in the past (see User:BrokenSegue/PsychiqConflicts, where most of the errors were caused by bad sitelinks or by cases where the topic of the enwiki article had shifted). My personal bias is that the most valuable thing to do is to structure unstructured information in general (so the source doesn't really matter that much) and then compare that to the structured data on Wikidata. See https://summerofcode.withgoogle.com/programs/2023/projects/cKuagkf8 for an example of such a project moving forward.
Other thoughts:
- The ORCID database tends not to be very rich so there are a lot of cases where all you will have is like a name and a few papers.
- How would you link https://data.gov/ data to our items?
- If you are willing to relax the "open datasets only" requirement, I would be interested in trying to resolve mismatches against Google Knowledge Graph ID (P2671). My impression is that Google's knowledge graph is the most advanced, so "taking" from them would be good.
- What about looking for discrepancies between Wikidata and DBpedia if you aren't interested in unstructured data?
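One way to frame the structured-data comparisons floated above (DBpedia, Google Knowledge Graph, and so on) is as a join on a shared identifier followed by a value comparison. Below is a minimal Python sketch, assuming both sides have already been reduced to plain dictionaries keyed by that identifier. The function name, the record layout, and the sample values are all hypothetical and are not part of Mismatch Finder.

```python
def find_mismatches(wikidata_values, external_values):
    """Return (key, wikidata_value, external_value) tuples for keys present
    in both mappings whose values differ after light normalisation."""

    def normalise(value):
        # Hypothetical normalisation: trim whitespace and ignore case.
        return str(value).strip().casefold()

    mismatches = []
    # Only keys found on both sides can be compared; the rest are unlinked.
    for key in sorted(wikidata_values.keys() & external_values.keys()):
        ours, theirs = wikidata_values[key], external_values[key]
        if normalise(ours) != normalise(theirs):
            mismatches.append((key, ours, theirs))
    return mismatches


# Made-up sample data, for illustration only.
wikidata = {"id-1": "1950-01-01", "id-2": "Paris", "id-3": "42"}
external = {"id-1": "1950-01-01", "id-2": "Lyon", "id-4": "7"}
print(find_mismatches(wikidata, external))
```

Records that appear on only one side (`id-3` and `id-4` here) are skipped rather than reported, since without a shared identifier there is nothing to compare; that is the entity-linkage problem raised above.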
FDALabel database
I have done an import from this database, but I doubt I did it mismatch-free, and there may be more info there worth importing. The text of each warning is generally concise and consists only of the most important warnings, so it may be worth storing here and adding to articles via Wikidata.
I have found/made a complete list of drugs with FDA-mandated black box warnings using the FDALabel Database (complete as of a few weeks ago; I'm not sure whether it is updated whenever it is retrieved): https://nctr-crs.fda.gov/fdalabel/ui/spl-summaries/criteria/343802 (query: https://nctr-crs.fda.gov/fdalabel/ui/#/search/spl-summaries/criteria/343802). Per the main page documentation, the database allows filtering on the "Presence of... specific sections of the prescribing information (e.g., BOXED WARNING)". The query produces a result of more than 16k labels with boxed warnings. I think I imported it mostly adequately, but I doubt I did it mismatch-free. I know it's at least partly wrong: the source is missing for paracetamol, which I can't seem to fix by rerunning the initial upload to wikibase correctly. The initial upload added the claim legal status (medicine) (P3493): boxed warning (Q879952) without the source reference, but I have since added the reference to at least most of the claims. Also, I think I have missed some other things; see "To-Do" in Match Details below. Ideally I should have matched by UNIIs, but I couldn't figure out how. I matched on:
Match Details. |
---|
Q113145171 type of chemical entity (658) DONE - in 2 parts (either way, needs cites - paracetamol lacks) (but does have applies to jurisdiction / United States of America )
Q59199015 group of stereoisomers (51) DONE Q12140 medication (87) DONE Q169336 mixture (45) doing DONE? Q79529 chemical substance (40) DONE?
Q35456 essential med Q119892838 type of mixture of chem Q28885102 pharm prod Q467717 racemate Q8054 protein (biomolecule) Q422248 mab Q679692 biopharmaceutical Q213901 gene therapy Q2432100 vet drug
Q13442814 article (NO) (230) Q30612 clinical trial (NO) (74) Q7318358 review article (NO) (30) Q16521 taxon (NO?) (5)
Q112826905 Q11344 Q1259977 Q425158 Q425402 Q55640599 Q79460 Q815382 Q84467700 Q912807 |
RudolfoMD (talk) 22:36, 1 December 2023 (UTC)
- Anybody? You can see the warning at, for example, Deferiprone: "WARNING[1]" appears in the infobox.
- I just created Wikidata:Project_chat#Black_box_warnings_project,_parts_1_&_2 to measure interest.
References
- ↑ "FDA-sourced list of all drugs with black box warnings (Use Download Full Results and View Query links.)". nctr-crs.fda.gov. w:FDA. Retrieved 22 Oct 2023.