User:ProteinBoxBot/2020 sarscov2
Overall summary
This sprint aims to cover the genes (Q7187) and proteins (Q8054) of SARS-CoV-2 and related coronaviruses in Wikidata. This is achieved by drafting a set of entity schemas that describe the semantic landscape and drive the bot development. The deliverables are this linked-data landscape and two bots that regularly update genes, proteins and pathways on Wikidata.
Status
Sprint ended. Preparing a manuscript for peer review.
Participants
Gameplan
- Create an EntitySchema for Virus (Done)
- Create a draft bot to populate viral reference genomes for coronaviruses (Done)
- Run the bot on a single strain (Done)
- Adapt the bot to handle other strains (Done)
Entity Schema
We have developed a schema for Virus Gene (EntitySchema).
Bot development
We developed a first bot specifically for SARS-CoV-2 (Q82069695). This bot used mygene.info to get gene annotation into Wikidata. The next step is to adapt the bot to work with other strains.
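The retrieval step can be illustrated with a minimal sketch. It assumes the public mygene.info v3 query endpoint with its `q`, `species`, `fields` and `size` parameters. The function names (`build_gene_query_url`, `extract_genes`, `fetch_genes`) are hypothetical and are not taken from the bot's source code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public mygene.info v3 query endpoint (assumed here, not quoted from the bot).
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(taxid, fields=("entrezgene", "symbol", "name"), size=100):
    """Build a query URL that asks for all genes annotated to one taxon."""
    params = {
        "q": "*",                    # match every gene record
        "species": str(taxid),       # restrict to the given NCBI taxon ID
        "fields": ",".join(fields),  # only return the fields we need
        "size": size,                # maximum number of hits per request
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def extract_genes(response):
    """Keep only hits that carry an Entrez Gene ID, as plain dictionaries."""
    genes = []
    for hit in response.get("hits", []):
        if "entrezgene" not in hit:
            continue  # skip records without an Entrez Gene identifier
        genes.append({
            "entrezgene": str(hit["entrezgene"]),
            "symbol": hit.get("symbol"),
            "name": hit.get("name"),
        })
    return genes


def fetch_genes(taxid):
    """Query mygene.info over the network and return the simplified gene list."""
    with urlopen(build_gene_query_url(taxid), timeout=30) as response:
        return extract_genes(json.load(response))
```

Calling `fetch_genes(2697049)` would request the gene records for the SARS-CoV-2 taxon. Writing the results to Wikidata is a separate step that this sketch does not cover.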
Example virus
Virus | Virus ID (NCBI Taxonomy) | Wikidata item | Mapping |
---|---|---|---|
SARS-CoV-2 (Q82069695) | 2697049 | Q82069695 | |
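The mapping between a virus ID and its Wikidata item can be checked with a SPARQL lookup. The sketch below assumes that the virus ID is stored on Wikidata as an NCBI taxonomy ID (P685) and that the public Wikidata Query Service endpoint is used. The function names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def taxon_query(ncbi_taxid):
    """Return a SPARQL query that finds items with the given NCBI taxonomy ID."""
    taxid = int(ncbi_taxid)  # rejects anything that is not a plain number
    return 'SELECT ?item WHERE { ?item wdt:P685 "%d" . }' % taxid


def qids_from_results(results):
    """Extract QIDs from a SPARQL JSON result document."""
    qids = []
    for binding in results["results"]["bindings"]:
        uri = binding["item"]["value"]
        if uri.startswith(ENTITY_PREFIX):
            qids.append(uri[len(ENTITY_PREFIX):])
    return qids


def lookup_taxon(ncbi_taxid):
    """Run the lookup against the live query service (needs network access)."""
    url = SPARQL_ENDPOINT + "?" + urlencode(
        {"query": taxon_query(ncbi_taxid), "format": "json"}
    )
    request = Request(url, headers={
        "User-Agent": "sarscov2-sprint-example/0.1",  # hypothetical agent name
        "Accept": "application/sparql-results+json",
    })
    with urlopen(request, timeout=30) as response:
        return qids_from_results(json.load(response))
```

For the row above, `lookup_taxon(2697049)` should include Q82069695 if the item carries that taxonomy ID.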
Results
Bots
- Bot to align genes and proteins from mygene.info, NCBI E-utilities and UniProt on Wikidata
  - Weekly updates
- Bot to align COVID-19 pathways from WikiPathways on Wikidata
  - Updates twice a week
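As an illustration of what aligning a gene or protein could involve, the sketch below maps source identifiers to Wikidata property and value pairs. It is an assumption about the shape of the data and not the bots' actual data model. The property choices (P31 instance of, P351 Entrez Gene ID, P352 UniProt protein ID, P703 found in taxon, P702 encoded by) are standard Wikidata properties, and the function names are hypothetical.

```python
GENE_CLASS = "Q7187"     # gene, as referenced in the overall summary
PROTEIN_CLASS = "Q8054"  # protein, as referenced in the overall summary


def gene_statements(entrez_id, taxon_qid):
    """Return (property, value) pairs describing a gene item."""
    return [
        ("P31", GENE_CLASS),      # instance of: gene
        ("P351", str(entrez_id)), # Entrez Gene ID
        ("P703", taxon_qid),      # found in taxon
    ]


def protein_statements(uniprot_id, taxon_qid, gene_qid=None):
    """Return (property, value) pairs describing a protein item."""
    statements = [
        ("P31", PROTEIN_CLASS),   # instance of: protein
        ("P352", uniprot_id),     # UniProt protein ID
        ("P703", taxon_qid),      # found in taxon
    ]
    if gene_qid is not None:
        statements.append(("P702", gene_qid))  # encoded by: the gene item
    return statements
```

A real bot would also attach references (source database, retrieval date) to each statement and check for existing items before writing. Both steps are omitted here.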
EntitySchemas
Publications
Downstream use
- BridgeDb identifier mapping database
- WikiPathways.org website linking to Wikidata, Scholia, and other databases using ID mappings