Wikidata talk:Mismatch Finder/Collaboration/Purdue Summer of Data 2024

Latest comment: 4 months ago by RudolfoMD in topic FDALabel database

Data type

It's hard to give a suggestion without knowing the kinds of methods being considered. For example, is this going to use structured data or unstructured data? Are you going to attempt entity linkage?

For example, have you considered looking for mismatches between Wikidata and the sitelinked enwiki article? There are substantial errors there, as I have found in the past (see User:BrokenSegue/PsychiqConflicts, where most of the errors were caused by bad sitelinks or by cases where the topic of the enwiki article had shifted). My personal bias is that the most valuable thing to do is to structure unstructured information in general (so the source doesn't really matter that much) and then compare that to the structured data on Wikidata. See https://summerofcode.withgoogle.com/programs/2023/projects/cKuagkf8 for an example of such a project moving forward.

Other thoughts:

  • The ORCID database tends not to be very rich, so there are a lot of cases where all you will have is a name and a few papers.
  • How would you link https://data.gov/ data to our items?
  • If you are willing to look beyond "open datasets", I would be interested in trying to resolve mismatches against Google Knowledge Graph ID (P2671). It's my impression that Google's knowledge graph is the most advanced, so "taking" from them would be good.
  • What about looking for discrepancies between Wikidata and DBpedia if you aren't interested in unstructured data?

BrokenSegue (talk) 17:16, 1 December 2023 (UTC)
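To make the structured-data case above concrete, here is a minimal sketch of the comparison step, assuming the external records have already been linked to Wikidata items (the entity-linkage question raised above is not addressed). The names `Mismatch` and `find_mismatches` and the dictionary layout are hypothetical, chosen only for illustration; this is not the Mismatch Finder's own format or code.

```python
from typing import Dict, List, NamedTuple

# Hypothetical layout: item ID -> {property ID -> value as a string}.
Values = Dict[str, Dict[str, str]]


class Mismatch(NamedTuple):
    item_id: str
    property_id: str
    wikidata_value: str
    external_value: str


def find_mismatches(wikidata: Values, external: Values) -> List[Mismatch]:
    """Report properties where both sides have a value and the values differ."""
    mismatches = []
    for item_id, ext_props in external.items():
        wd_props = wikidata.get(item_id)
        if wd_props is None:
            # No linked item on the Wikidata side; linkage is a separate problem.
            continue
        for prop, ext_value in ext_props.items():
            wd_value = wd_props.get(prop)
            # A value missing on Wikidata is a gap, not a mismatch, so skip it.
            if wd_value is not None and wd_value != ext_value:
                mismatches.append(Mismatch(item_id, prop, wd_value, ext_value))
    return mismatches
```

Exact string comparison is the simplest possible choice here. Real data would likely need normalisation (date precision, units, label variants) before two values can fairly be called a mismatch.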

FDALabel database

I have done an import from this database, but I doubt I did it mismatch-free, and there may be more info there worth importing. The text of each warning is generally concise and consists only of the most important warnings, so it may be worth storing here and adding to articles via Wikidata.

I have found/made a complete list (as of a few weeks ago; not sure if it's updated whenever retrieved) of drugs with FDA-mandated black box warnings using the FDALabel Database: https://nctr-crs.fda.gov/fdalabel/ui/spl-summaries/criteria/343802 (query: https://nctr-crs.fda.gov/fdalabel/ui/#/search/spl-summaries/criteria/343802). Per the main page documentation, it allows searching on "Presence of... specific sections of the prescribing information (e.g., BOXED WARNING)". It produces a result of >16k labels with boxed warnings. I think I imported it mostly adequately, but I doubt I did it mismatch-free. I know it's at least partly wrong: the source is missing for paracetamol, which I can't seem to fix by rerunning the initial upload to wikibase correctly. The initial upload added the claim legal status (medicine) (P3493): boxed warning (Q879952) without the source reference, but I've since added the reference to at least most of the claims. Also, I think I have missed some other things; see "To-Do" in Match Details below. Ideally I should have matched by UNIIs, but I couldn't figure it out. I matched on:

RudolfoMD (talk) 22:36, 1 December 2023 (UTC)
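One way to find the claims that still lack a source, such as the paracetamol one mentioned above, is to scan each item's JSON for legal status (medicine) (P3493): boxed warning (Q879952) statements with no references. The sketch below assumes the entity is a dictionary in the standard Wikidata entity JSON layout (as returned by `wbgetentities`). The function name `unreferenced_claims` is hypothetical, and fetching the entities is left out.

```python
from typing import Any, Dict, List


def unreferenced_claims(
    entity: Dict[str, Any],
    property_id: str = "P3493",   # legal status (medicine)
    value_id: str = "Q879952",    # boxed warning
) -> List[str]:
    """Return statement IDs for matching claims that carry no reference."""
    missing = []
    for statement in entity.get("claims", {}).get(property_id, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            # "somevalue" / "novalue" snaks have no datavalue to compare.
            continue
        value = snak.get("datavalue", {}).get("value", {})
        is_target = isinstance(value, dict) and value.get("id") == value_id
        # A missing or empty "references" list both count as unreferenced.
        if is_target and not statement.get("references"):
            missing.append(statement.get("id", ""))
    return missing
```

This only checks whether any reference is present, not whether it points at the FDALabel source, so a claim sourced from elsewhere would pass unnoticed.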

Anybody? You can see the warning at, for example, Deferiprone: "WARNING[1]" appears in the infobox.
I just created Wikidata:Project_chat#Black_box_warnings_project,_parts_1_&_2 to measure interest.
- RudolfoMD (talk) 01:18, 11 December 2023 (UTC)

References

  1. "FDA-sourced list of all drugs with black box warnings (Use Download Full Results and View Query links.)". nctr-crs.fda.gov. w:FDA. Retrieved 22 Oct 2023.