User:ProteinBoxBot/2020 sarscov2

Overall summary

This sprint aims to cover the gene (Q7187) and protein (Q8054) items of SARS-CoV-2 and related coronaviruses in Wikidata. We do this by drafting a set of entity schemas that describe the semantic landscape and drive bot development. The deliverables are this linked-data landscape and two bots that regularly update genes, proteins and pathways on Wikidata.

Status

Sprint ended. Preparing a manuscript for peer review.

Participants

Gameplan

  • Create an EntitySchema for viruses (done)
  • Create a draft bot to populate viral reference genomes for coronaviruses (done)
  • Run the bot on a single strain (done)
  • Adapt the bot to handle other strains (done)

Entity Schema

We have developed a schema for Virus Gene (EntitySchema).

Bot development

We developed a first bot specifically for SARS-CoV-2 (Q82069695). This bot used mygene.info to import gene annotations into Wikidata. The next step is to adapt the bot to work with other strains.
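As a rough illustration of the first step, the sketch below builds a mygene.info query URL for all genes of one taxon and flattens the returned hits into simple records that a bot could turn into Wikidata statements. It is not the bot's actual code: the function names, the chosen fields, and the use of `q=__all__` to match every gene of the species are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public mygene.info v3 query endpoint.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(taxon_id, fields=("symbol", "name", "entrezgene"), size=100):
    """Return a query URL for the genes of one NCBI taxon (hypothetical helper)."""
    params = {
        "q": "__all__",             # assumption: match every gene record
        "species": str(taxon_id),   # NCBI taxonomy ID, e.g. 2697049
        "fields": ",".join(fields),
        "size": size,
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def hits_to_records(response):
    """Flatten a parsed JSON response into records keyed by Entrez Gene ID."""
    records = []
    for hit in response.get("hits", []):
        if "entrezgene" not in hit:
            continue  # skip hits that cannot be tied to an NCBI gene
        records.append(
            {
                "entrez_id": str(hit["entrezgene"]),
                "symbol": hit.get("symbol"),
                "name": hit.get("name"),
            }
        )
    return records
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the parsed JSON to `hits_to_records` would give one record per gene. Adapting this to another strain would then only require a different taxon ID.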

Example virus

Virus                     Virus ID (NCBI Taxonomy)    Wikidata item
SARS-CoV-2 (Q82069695)    2697049                     Q82069695
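The mapping above, from a virus ID to a Wikidata item, can be looked up through the Wikidata Query Service. The following is a minimal sketch, assuming the virus ID is stored on the item as an NCBI taxonomy ID (P685); the function names are hypothetical and chosen only for this example.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def taxon_item_query(ncbi_taxon_id):
    """Return a SPARQL query that finds the item with the given NCBI taxonomy ID."""
    taxon_id = str(ncbi_taxon_id)
    if not taxon_id.isdigit():
        raise ValueError("NCBI taxonomy IDs are numeric")
    # P685 is the Wikidata property "NCBI taxonomy ID"; values are strings.
    return f'SELECT ?item WHERE {{ ?item wdt:P685 "{taxon_id}" . }}'


def taxon_item_query_url(ncbi_taxon_id):
    """Return a GET URL that runs the query and asks for JSON results."""
    params = {"query": taxon_item_query(ncbi_taxon_id), "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"
```

Calling `taxon_item_query(2697049)` yields the query for the SARS-CoV-2 row of the table. The same function can be reused for each additional strain the bot is adapted to.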

Results

Bots

  • Bot to align genes and proteins from mygene.info, NCBI E-utilities and UniProt on Wikidata
  • Bot to align COVID-19 pathways from WikiPathways on Wikidata

EntitySchemas

Publications

Downstream use