Wikidata:ScienceSource project/Project overview

ScienceSource uses a long software pipeline, with a crucial human review step, to extract medical facts from the mass of biomedical literature catalogued here on Wikidata. The seven tools and dozen steps in the process are detailed on the tools page; this page gives a general overview of what goes on. There are also short videos.

Scientific literature, as held in large repositories such as PubMed Central, is not a trackless waste: it can be searched. The metadata (catalogue content) transferred here can then be used in Wikidata queries, to find the most suitable sources.
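As a rough illustration of such a query, the sketch below builds a SPARQL string that lists scholarly articles carrying a PubMed Central identifier. It is not the project's actual query: the function name build_source_query is hypothetical, and selecting on "instance of scholarly article" (P31, Q13442814), PMCID (P932) and title (P1476) is an assumption about what a starting point might look like. Real selection criteria for "most suitable" sources would be stricter.

```python
def build_source_query(limit: int = 10) -> str:
    """Return a SPARQL query string for the Wikidata Query Service.

    Hypothetical helper: it only builds the query text and does not
    contact any server.
    """
    return f"""
SELECT ?article ?title ?pmcid WHERE {{
  ?article wdt:P31 wd:Q13442814 ;   # instance of: scholarly article
           wdt:P932 ?pmcid ;        # has a PMCID, so it is in PubMed Central
           wdt:P1476 ?title .       # title
}}
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    print(build_source_query(5))
```

The resulting text can be pasted into the Wikidata Query Service and refined with further conditions.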

To extract relational information, such as can be stored in Wikidata, one needs to think in terms of subject, object, and the relation that holds between them. The project concentrates on the case where the relation is medical condition treated (P2175). That means a drug and a "medical condition" (much the same as a "disease" in the broad sense of bodily abnormality) have to be found in a text. The approach is fundamentally simple, but systematic: look for places in texts where drugs and diseases, from given lists, are found close together. Human understanding of natural language can then be brought to bear, to see if the hoped-for relation is actually what the wording expresses.
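The "found close together" idea can be sketched in a few lines. This is a minimal illustration, not the project's tooling: the function name find_cooccurrences, the character-based window of 100, and the sample term lists are all assumptions made for the example. It returns candidate snippets only; deciding whether a snippet really expresses "medical condition treated" is left to the human reviewer.

```python
import re
from typing import List, NamedTuple


class CoOccurrence(NamedTuple):
    drug: str
    disease: str
    snippet: str  # span of text running from one term to the other


def find_cooccurrences(text: str, drugs: List[str], diseases: List[str],
                       window: int = 100) -> List[CoOccurrence]:
    """Find drug/disease pairs whose mentions lie within `window` characters.

    Hypothetical helper: matching is case-insensitive on whole words,
    and distance is measured as the gap between the two mentions.
    """
    def mentions(term: str):
        pattern = r"\b" + re.escape(term) + r"\b"
        return list(re.finditer(pattern, text, flags=re.IGNORECASE))

    hits = []
    for drug in drugs:
        for d in mentions(drug):
            for disease in diseases:
                for m in mentions(disease):
                    # Gap between the mentions, whichever comes first.
                    gap = max(m.start() - d.end(), d.start() - m.end())
                    if gap <= window:
                        start = min(d.start(), m.start())
                        end = max(d.end(), m.end())
                        hits.append(CoOccurrence(drug, disease, text[start:end]))
    return hits


if __name__ == "__main__":
    sample = "Metformin is widely used in the treatment of type 2 diabetes."
    for hit in find_cooccurrences(sample, ["metformin"], ["type 2 diabetes"]):
        print(hit.drug, "|", hit.disease, "|", hit.snippet)
```

A character window is the crudest possible measure of closeness; a sentence-based or token-based window would be an equally reasonable choice.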

By restricting the search to the kind of articles that medical people find credible, the drug-disease relations found are well enough sourced to become Wikimedia content.