Wikidata:WikidataCon 2017/Submissions/Wikidata for biomedical knowledge integration and curation

 This is an Open submission for WikidataCon 2017 that has not yet been reviewed by the members of the Program Committee.

Submission no. 76
Title of the submission
Wikidata for biomedical knowledge integration and curation

Author(s) of the submission
Gregory Stupp (presenting), Sebastian Burgstaller, Tim Putman, Andra Waagmeester (attending), Julia Turner, Elvira Mitraka, Matthew Jacobson, Núria Queralt-Rosinach, Paul Pavlidis, Lynn Schriml, Benjamin Good, Andrew Su
E-mail address
gstupp@scripps.edu
Country of origin
USA
Affiliation, if any (organisation, company etc.)
Scripps Research (Q793867), Micelio (Q28381786), University of Maryland, Baltimore (Q4119470), University of British Columbia (Q391028)

Type of session
Talk
Length of session
30-45 min
Ideal number of attendees
50
EtherPad for documentation
https://etherpad.wikimedia.org/p/WikidataCon-76

Abstract

Biomedical knowledge is accumulating at an explosive rate: more than 1.2 million new articles are now published every year, an average of one new article every 26 seconds. Much of that knowledge, however, is not easily accessible. In most cases, it is locked away in free-text research articles, which are very difficult to query or compute over. In some cases, it has been deposited in structured databases, but even then the fragmented landscape of such databases is a barrier to knowledge integration.

Here, we describe the use of Wikidata as an open, community-maintained biomedical knowledge base. We have seeded Wikidata with data on key biomedical entities, including genes, proteins, diseases, drugs, genetic variants, and microbes. To ensure source databases are properly credited, we have implemented a standardized model for referencing and attribution. These data, combined with other data sets imported by the broader Wikidata community, enable powerful integrative queries that span multiple domain areas via the Wikidata SPARQL endpoint.
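As an illustration of the kind of integrative query described above, the following minimal Python sketch builds a SPARQL query that starts from a gene, follows its disease associations, and then finds drugs used to treat those diseases. It is not taken from the project's code. The property IDs (P351 for Entrez Gene ID, P2293 for genetic association, P2175 for medical condition treated), the helper names `build_query`, `run_query`, and `flatten_bindings`, and the User-Agent string are assumptions chosen for illustration and should be checked against the current Wikidata data model before use.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(entrez_gene_id: str, limit: int = 20) -> str:
    """Build a gene -> disease -> drug query (property IDs are assumptions)."""
    if not entrez_gene_id.isdigit():
        raise ValueError("Entrez Gene IDs are expected to be numeric strings")
    return f"""
SELECT ?disease ?diseaseLabel ?drug ?drugLabel WHERE {{
  ?gene wdt:P351 "{entrez_gene_id}" .   # gene identified by Entrez Gene ID
  ?gene wdt:P2293 ?disease .            # genetic association with a disease
  ?drug wdt:P2175 ?disease .            # drug used to treat that disease
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
"""


def run_query(query: str) -> dict:
    """Send the query to the endpoint and return the parsed JSON result."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical User-Agent; replace with your own contact details.
            "User-Agent": "biomedical-query-sketch/0.1 (example only)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def flatten_bindings(result: dict) -> list:
    """Reduce SPARQL JSON results to a list of {variable: value} rows."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in result["results"]["bindings"]
    ]
```

Calling `flatten_bindings(run_query(build_query("1017")))` would return one row per disease and drug pair for the gene with that Entrez Gene ID, provided the corresponding statements exist in Wikidata. Because each hop in the query pattern can draw on data imported from a different source database, a single query can span several domain areas.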

The emphasis of this abstract is on Wikidata as a resource for biomedical Open Data that can serve as a foundation for other bioinformatics applications and analyses. The code developed to execute this project is also available as Open Source software. This suite of code includes modules for populating Wikidata, for automatically synchronizing with source databases, and for creating domain-specific applications to engage specific user communities.

What will attendees take away from this session?
  1. An understanding of the coverage of biomedical entities in Wikidata and of the structure in which they are represented
  2. The ability to use and expand information about biomedical entities in Wikidata, whether as a scientist or a non-scientist
Slides or further information
Slides
https://docs.google.com/presentation/d/1wUYnNEJF-d37TGQnMXQfmWK7o7VAqjoS-KCRiKd99Mc/edit?usp=sharing

Project Website: https://www.wikidata.org/wiki/Wikidata:WikiProject_Gene_Wiki

Source Code: https://github.com/SuLab/GeneWikiCentral (and repos linked therein)

License: MIT

Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest.

  1. -- YULdigitalpreservation (talk) 18:21, 25 July 2017 (UTC)
  2. ArthurPSmith (talk) 13:28, 26 July 2017 (UTC)
  3. SammyWiki 17:40, 27 July 2017 (UTC)
  4. Daniel Mietchen (talk) 08:27, 31 July 2017 (UTC)
  5. Shani Evenstein (talk) 20:01, 27 October 2017 (UTC)