This user account is a bot with a bot flag. It is operated by Andrawaag, Sebotic and Gstupp.
The objective of this bot is to provide Wikidata with up-to-date, high-quality information about genes, diseases, and drugs from authoritative sources. These concepts will form the backbone on which many biomedical applications of Wikidata will be built. Specifically, it will make it possible to answer important biomedical questions using the Wikidata query service. We are working to establish a common set of standards for representing the evidence and provenance of this kind of information in Wikidata, and we will apply these standards to all of the work described below. For more information on the Gene Wiki project as a whole, please see WikiProject Gene Wiki.
To better divide the many tasks we are undertaking, our team also runs these bot accounts:
|mygene.info||NCBI Entrez, Ensembl, UniProt|
|InterPro||ontology, protein annotations|
|Guide to Pharmacology|
Bot tasks and state
Bots use a Python module called WikidataIntegrator for reading from and writing to Wikidata. The open-source bot code is divided into a collection of tasks. The initial tasks establish sets of entities corresponding to the three main classes (genes, diseases, drugs) and create a stable cycle of updates. The next level of tasks establishes relationships between these entities. All bot edits are based on content from trusted, manually curated scientific resources. For additional information about each bot task, follow the links in the status table below.
|Bot task||Discussion started||Coding and testing||Production ready||Is approved||Has been run|
|Gene and protein items||x||x||x||x||x|
|Microbial gene and protein items||x||x||x||x||x|
|GO Protein Annotations||x|
The results of scheduled bot runs are automatically added to User:ProteinBoxBot/Bot_Status. This table is automatically updated by Jenkins after each bot run. Reports of each run are generated and linked under the "Log Report" column.
Much of the work done by this bot involves the import, synchronization, and maintenance of information brought in from other sources. Where those sources are not entirely in the public domain, specific agreements need to be reached about which content can be brought into Wikidata and hence released as CC0. We will track these agreements on the legal subpage.
Task permission requests
- (Open) To track resource license information for Wikidata, a tracking table like the one Daniel Himmelstein built in one of his recent projects could be useful.
- (Open) Getting disease content from Wikidata into the disease infobox on Wikipedia.
- (Closed) Handling interwiki links for genes (on Wikipedia). https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Molecular_and_Cellular_Biology#Preparing_for_WikiData_in_gene.2Fprotein_infoboxes
- (Closed) Handling interwiki links (Wikidata talk). (Started June 29, 2015)
- (Closed) Gaining re-approval of the bot after it was blocked in response to errors that created duplicate items. (Started June 2, 2015)
- (Closed) SubclassOf disease.
- (Completed) Representing genes, proteins, functions, and orthologues.
- 2017 CIViC sprint
- 2016 TOGO Picture Gallery sprint (SWAT4LS)
- 2016 WikiPathways sprint (SWAT4LS)
- 2016 ShEx sprint (SWAT4LS)
- 2016 CIViC sprint
- 2015 Gene Ontology sprint
- 2015 Gene Disease relations sprint
Bot development cycle
- Manually model one or two example entries.
- Develop the bot on 10 entries.
- Do a test run on 100 entries.
- Wait for possible constraint violations to surface.
- Perform a full run.
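The staged rollout above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: `staged_batches`, `run_in_stages`, `write_entry`, and `review_ok` are hypothetical names, and the sketch assumes that the 10-entry, 100-entry, and full stages cover consecutive, non-overlapping slices of the input. The manual modeling step is not represented, and the review callback stands in for waiting on constraint violations.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Stage sizes from the cycle above: 10 entries, then 100, then everything left.
# None marks the final, unbounded stage (assumption: stages do not overlap).
STAGE_SIZES: Sequence[Optional[int]] = (10, 100, None)


def staged_batches(
    entries: Iterable[T], sizes: Sequence[Optional[int]] = STAGE_SIZES
) -> Iterator[List[T]]:
    """Yield consecutive batches of entries, one batch per stage."""
    it = iter(entries)
    for size in sizes:
        batch = list(it) if size is None else list(islice(it, size))
        if not batch:
            return  # input exhausted before reaching this stage
        yield batch


def run_in_stages(
    entries: Iterable[T],
    write_entry: Callable[[T], None],
    review_ok: Callable[[List[T]], bool],
) -> int:
    """Write each stage, then stop unless the review of that stage passes.

    Returns the number of entries written.
    """
    written = 0
    for batch in staged_batches(entries):
        for entry in batch:
            write_entry(entry)
        written += len(batch)
        # Hypothetical hook: e.g. a human confirms that no constraint
        # violations surfaced before the next, larger stage begins.
        if not review_ok(batch):
            break
    return written
```

For example, `run_in_stages(range(250), print, lambda batch: True)` would write 10, then 100, then the remaining 140 entries, while a `review_ok` that returns `False` halts the run after the first 10.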
- General bot information, including a list of all approved bots.
- View the flags on a user's page. Bots like this one should have a (bot) flag.
- bot's page on Wikipedia
- Wikipedia bot's 'phase 3'
- Finding things on Wikidata. For example, to check if a property exists.
- Merging InterPro items. Help merge InterPro items with their Wikipedia pages.
- SPARQL Examples
- Maintenance Queries
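As an illustration of the "check if a property exists" lookup mentioned in the list above, the following minimal sketch builds a SPARQL `ASK` query and a request URL for the Wikidata query service. It is one possible approach and not taken from the linked pages: the helper names `property_exists_query` and `query_url` are hypothetical, and the sketch only constructs the request without sending it.

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata query service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def property_exists_query(property_id: str) -> str:
    """Return an ASK query that is true if the given property entity exists."""
    # Reject anything that is not shaped like a property ID (P followed by digits).
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    # The wd: and wikibase: prefixes are predefined by the query service.
    return f"ASK {{ wd:{property_id} a wikibase:Property }}"


def query_url(sparql: str) -> str:
    """Build a GET URL that asks the endpoint for a JSON response."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # P351 (Entrez Gene ID) is used purely as an example property.
    print(query_url(property_exists_query("P351")))
```

The printed URL can be opened in a browser or fetched with any HTTP client; building the string separately keeps the sketch free of network access.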