Wikidata:WikidataCon 2017/Submissions/Wikidata for biomedical knowledge integration and curation

This is an Open submission for WikidataCon 2017 that has not yet been reviewed by the members of the Program Committee.

Submission no. 76

Title of the submission: Wikidata for biomedical knowledge integration and curation

Author(s) of the submission: Gregory Stupp (presenting), Sebastian Burgstaller, Tim Putman, Andra Waagmeester (attending), Julia Turner, Elvira Mitraka, Matthew Jacobson, Núria Queralt-Rosinach, Paul Pavlidis, Lynn Schriml, Benjamin Good, Andrew Su

E-mail address: gstupp@scripps.edu

Country of origin: USA

Affiliation, if any (organisation, company etc.): Scripps Research (Q793867), Micelio (Q28381786), University of Maryland, Baltimore (Q4119470), University of British Columbia (Q391028)

Type of session: Talk

Length of session: 30-45 min

Ideal number of attendees: 50
EtherPad for documentation: https://etherpad.wikimedia.org/p/WikidataCon-76

Abstract

The sum total of biomedical knowledge is accumulating at an explosive rate. More than 1.2 million new articles are now published every year, an average of one new article every 26 seconds. Unfortunately, much of that knowledge is not easily accessible. In most cases, biomedical knowledge is locked away in free-text research articles, which are very difficult to query or use for computation. Where that knowledge has been deposited in structured databases, the fragmented landscape of those databases is still a barrier to knowledge integration.
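As a quick check of the rate quoted above, dividing the number of seconds in a non-leap year by 1.2 million articles gives:

```latex
\frac{365 \times 24 \times 3600\ \text{s}}{1.2 \times 10^{6}\ \text{articles}}
  = \frac{31{,}536{,}000\ \text{s}}{1{,}200{,}000\ \text{articles}}
  \approx 26.3\ \text{s per article}
```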

Here, we describe the use of Wikidata as an open, community-maintained biomedical knowledge base. We have seeded Wikidata with data on key biomedical entities, including genes, proteins, diseases, drugs, genetic variants, and microbes. To ensure source databases are properly credited, we have implemented a standardized model for referencing and attribution. These data, combined with other data sets imported by the broader Wikidata community, enable powerful integrative queries that span multiple domain areas via the Wikidata SPARQL endpoint.
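The abstract does not spell out the referencing model. A common Wikidata pattern for attributing a statement to a source database combines "stated in" (P248), the record's identifier in that database (for a gene from NCBI Gene, Entrez Gene ID, P351), and "retrieved" (P813). The sketch below builds such a reference in Wikibase JSON form. It assumes that pattern rather than reproducing the project's exact model, and the helper names (`item_snak`, `external_id_snak`, `time_snak`, `build_reference`) are hypothetical.

```python
from datetime import date

# Wikidata item for the proleptic Gregorian calendar, used as the calendar model.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def item_snak(prop: str, qid: str) -> dict:
    """Value snak pointing at another Wikidata item (e.g. the source database)."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not an item ID: {qid!r}")
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
            "type": "wikibase-entityid",
        },
        "datatype": "wikibase-item",
    }


def external_id_snak(prop: str, value: str) -> dict:
    """Value snak holding the record's identifier in the source database."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": "string"},
        "datatype": "external-id",
    }


def time_snak(prop: str, day: date) -> dict:
    """Value snak holding a calendar date with day precision (precision 11)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {
                "time": f"+{day.isoformat()}T00:00:00Z",
                "timezone": 0,
                "before": 0,
                "after": 0,
                "precision": 11,
                "calendarmodel": GREGORIAN,
            },
            "type": "time",
        },
        "datatype": "time",
    }


def build_reference(stated_in_qid: str, id_prop: str, id_value: str, retrieved: date) -> dict:
    """Assumed reference pattern: stated in (P248) + source record ID + retrieved (P813)."""
    snaks = [
        item_snak("P248", stated_in_qid),
        external_id_snak(id_prop, id_value),
        time_snak("P813", retrieved),
    ]
    return {
        "snaks": {s["property"]: [s] for s in snaks},
        "snaks-order": [s["property"] for s in snaks],
    }
```

For a gene statement, one would call `build_reference(source_qid, "P351", "672", date.today())`, where `source_qid` is the Wikidata item for the source database; look that item up rather than assuming its ID.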
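As an illustration of an integrative query, the sketch below asks the public Wikidata Query Service for diseases genetically associated with a human gene and for drugs recorded as treating those diseases. The property and item IDs (P351 Entrez Gene ID, P703 found in taxon, Q15978631 Homo sapiens, P2293 genetic association, P2175 medical condition treated) are assumptions worth checking on wikidata.org. Because the direction in which genetic-association statements are stored may vary, the query follows P2293 both ways. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_disease_drug_query(entrez_id: str, limit: int = 50) -> str:
    """SPARQL linking a human gene (by Entrez Gene ID) to associated diseases
    and, where recorded, to drugs that treat those diseases."""
    # Entrez Gene IDs are numeric; rejecting anything else also prevents
    # injecting arbitrary text into the query string.
    if not entrez_id.isdigit():
        raise ValueError("Entrez Gene IDs are numeric strings")
    return f"""
SELECT ?gene ?geneLabel ?disease ?diseaseLabel ?drug ?drugLabel WHERE {{
  ?gene wdt:P351 "{entrez_id}" ;
        wdt:P703 wd:Q15978631 .
  ?gene wdt:P2293|^wdt:P2293 ?disease .
  OPTIONAL {{ ?drug wdt:P2175 ?disease . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
"""


def build_request(query: str) -> urllib.request.Request:
    """GET request for WDQS asking for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    # WDQS asks clients to identify themselves with a descriptive User-Agent.
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": "biomedical-query-sketch/0.1 (illustrative example)",
            "Accept": "application/sparql-results+json",
        },
    )


def run_query(query: str) -> list[dict]:
    """Execute the query and flatten each result row to {variable: value}."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        data = json.load(resp)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Usage (requires network access; 672 is the Entrez Gene ID of human BRCA1):
# rows = run_query(build_gene_disease_drug_query("672"))
```

Keeping query construction separate from execution makes the SPARQL easy to inspect or paste into the query.wikidata.org web interface before running it programmatically.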

The emphasis of this abstract is on Wikidata as a resource for biomedical Open Data that can serve as a foundation for other bioinformatics applications and analyses. In addition, the code developed for this project is available as Open Source software. This suite of code includes modules for populating Wikidata, for automatically synchronizing with source databases, and for creating domain-specific applications to engage specific user communities.
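The synchronization modules are only named here. One simple way to think about keeping Wikidata in step with a source database is as a per-item set difference: values present in the source but missing from Wikidata are added, and values previously imported from that source but no longer present there are removed. The sketch below illustrates that idea in plain Python; `plan_sync` and `SyncPlan` are hypothetical names, not the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Per-item values to add to, or remove from, Wikidata."""
    to_add: dict[str, set[str]] = field(default_factory=dict)
    to_remove: dict[str, set[str]] = field(default_factory=dict)


def plan_sync(source: dict[str, set[str]], wikidata: dict[str, set[str]]) -> SyncPlan:
    """Compare source records with the Wikidata values that cite that source.

    `source` maps an item key to the values the source database currently holds;
    `wikidata` maps the same keys to values already on Wikidata that were
    imported from (and reference) this source.
    """
    plan = SyncPlan()
    for key in source.keys() | wikidata.keys():
        want = source.get(key, set())
        have = wikidata.get(key, set())
        if want - have:
            plan.to_add[key] = want - have
        if have - want:
            plan.to_remove[key] = have - want
    return plan
```

Restricting the `wikidata` argument to statements that cite the same source is a deliberate choice in this sketch: it keeps a sync run from deleting values that other editors added from elsewhere.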

What will attendees take away from this session?

An understanding of the coverage of biomedical entities in Wikidata and of the data model used to represent them
How scientists and non-scientists alike can use, and expand, the information about biomedical entities in Wikidata

Slides or further information
Slides: https://docs.google.com/presentation/d/1wUYnNEJF-d37TGQnMXQfmWK7o7VAqjoS-KCRiKd99Mc/edit?usp=sharing

Project Website: https://www.wikidata.org/wiki/Wikidata:WikiProject_Gene_Wiki

Source Code: https://github.com/SuLab/GeneWikiCentral (and repos linked therein)

License: MIT

Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest.