MicrobeBot
This user account is a bot with a bot flag. The bot is operated by Putmantime, Andrew Su and Djow2019.
Introduction
The objective of MicrobeBot is to add information about genes and proteins of microbial origin to Wikidata and to keep that information up to date. A discussion has been initiated on the Project Molecular Biology Talk Page.
Sister Bots
Bot tasks and state
MicrobeBot will provide Wikidata with up-to-date, high-quality information about microbial taxa, genes, gene products and other annotations from authoritative sources. These concepts will form the backbone upon which many biomedical applications of Wikidata will be based. The open source bot code is divided into a collection of tasks. The first tasks establish taxonomic links between species and strains of bacteria, creating those items if they do not yet exist. The next level of tasks creates items for genes and gene products and links them to the strain they were sequenced from. Finally, the genes and gene products are linked to each other. All bot edits are based on content from trusted, manually curated scientific resources. For additional information about each bot task, follow the links in the status table below.
Bot task | Discussion started | Coding and testing | Production ready | Is approved | Is running | Update frequency | Last full cycle |
---|---|---|---|---|---|---|---|
Microbial gene and protein items | x | x | x | x | x | ||
... |
Current Scope
The set of entities maintained by this bot is determined by their presence in the expert-curated NCBI Entrez Gene database.
At present, the bot is limited to genes and proteins from bacteria. It will later be expanded to include microbial genes of non-bacterial origin.
Items maintained by this bot
- Bacterial strains, genes, and gene products. They can all be listed with a query for items found in a bacterial taxon that have some value for Entrez Gene ID.
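The query described above could be sketched roughly as follows. This is only an illustration, not the bot's actual query: P703 and P351 come from the property tables on this page, while the use of P171 (parent taxon) to walk up the taxonomy and of Q10876 as the item for Bacteria are assumptions that should be checked against Wikidata before use. The helper name `build_bacterial_gene_query` is hypothetical.

```python
# Assumed identifiers (verify on Wikidata before relying on them):
#   P703 = found in taxon, P351 = Entrez Gene ID (both listed on this page)
#   P171 = parent taxon, Q10876 = Bacteria (assumptions for this sketch)
BACTERIA_QID = "Q10876"


def build_bacterial_gene_query(limit=100):
    """Return a SPARQL query string selecting items that are found in a
    taxon descending from Bacteria and that carry an Entrez Gene ID."""
    return (
        "SELECT ?item ?entrez WHERE {\n"
        "  ?item wdt:P703 ?taxon ;\n"
        "        wdt:P351 ?entrez .\n"
        # Follow zero or more parent-taxon links up to Bacteria.
        f"  ?taxon wdt:P171* wd:{BACTERIA_QID} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_bacterial_gene_query(limit=10)
print(query)
```

The resulting string is meant to be pasted into the Wikidata Query Service, which predefines the `wd:` and `wdt:` prefixes. The transitive `P171*` path can be slow on large taxa, so a small `LIMIT` is a reasonable starting point.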
Gene properties planned for this bot
Property | Description | Datatype | Expected value (if not listed, see property definition) |
---|---|---|---|
P279 | subclass of | Item | Should always include gene (Q7187) |
P351 | Entrez Gene ID | String | Should exist for EVERY item processed by this bot. Property will include concurrent Entrez IDs for each strain of bacterial species |
P644 | Genomic start | String | Should exist for EVERY item processed by this bot. Property will include concurrent Genomic starts for each strain of bacterial species |
P645 | Genomic end | String | Should exist for EVERY item processed by this bot. Property will include concurrent Genomic ends for each strain of bacterial species |
P703 | found in taxon | Item | Will include the bacterial strain item that the gene was sequenced from |
P688 | encodes | Item |
The 'encodes' property links gene items to items specifically about the protein, RNA, or other 'product' of the gene. A single gene corresponds to a particular region of a genome that is related to some set of functions. These functions are carried out by the gene's products, and different products may perform vastly different functions. Hence we separate functional information from the gene item itself and attach it to the product items wherever possible. (See Proposal for bringing microbial genome, gene, and protein items to Wikidata.)
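As a rough illustration of the gene table above, the following sketch maps a gene record to a dictionary of property IDs and values and refuses records that lack the properties marked as required for every item. The function name `build_gene_claims`, the record field names, and the example values are hypothetical and are not taken from the bot's code.

```python
GENE_QID = "Q7187"  # gene, as listed for P279 in the table above

# Record fields (hypothetical names) that must be present for every gene.
REQUIRED_FIELDS = ("entrez_id", "genomic_start", "genomic_end", "strain_qid")


def build_gene_claims(record):
    """Map a gene record to {property ID: [values]} per the gene table."""
    missing = [k for k in REQUIRED_FIELDS if record.get(k) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    claims = {
        "P279": [GENE_QID],                      # subclass of gene
        "P351": [str(record["entrez_id"])],      # Entrez Gene ID (string)
        "P644": [str(record["genomic_start"])],  # genomic start (string)
        "P645": [str(record["genomic_end"])],    # genomic end (string)
        "P703": [record["strain_qid"]],          # found in taxon (strain item)
    }
    # 'encodes' is only added when product items are already known.
    products = record.get("product_qids") or []
    if products:
        claims["P688"] = list(products)
    return claims


# Placeholder record; all values are made up for illustration.
example = {
    "entrez_id": "EXAMPLE-ID",
    "genomic_start": 100,
    "genomic_end": 950,
    "strain_qid": "STRAIN-ITEM-PLACEHOLDER",
}
claims = build_gene_claims(example)
print(claims)
```

Keeping functional annotations out of this mapping mirrors the separation described above: the gene item carries identifiers and coordinates, and everything about function belongs on the product items reached through P688.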
Protein properties planned for this bot
Property | Description | Datatype | Expected value (if not listed, see property definition) |
---|---|---|---|
P279 | subclass of | Item | One of: Protein (Q8054), RNA (Q11053), non-coding RNA (Q427087), .. |
P702 | encoded by | Item | Should exist for EVERY item processed by this bot |
P352 | UniProt ID | String | Should exist for EVERY item processed by this bot |
P638 | PDB ID | String | |
P637 | RefSeq Protein ID | String | |
P705 | Ensembl Protein ID | String | |
P681 | Cell Component | Item | |
P682 | Biological Process | Item | |
P680 | Molecular Function | Item |
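Because every product item is expected to carry 'encoded by' (P702) and gene items point the other way with 'encodes' (P688), one simple consistency check is to look for links that exist in only one direction. The sketch below assumes the two sets of links have already been collected into plain dictionaries; the function name and the item identifiers are hypothetical placeholders.

```python
def missing_reciprocal_links(genes, proteins):
    """Return sorted (gene, product) pairs linked in only one direction.

    genes:    {gene_id: [product ids reached via P688 'encodes']}
    proteins: {product_id: [gene ids reached via P702 'encoded by']}
    """
    forward = {(g, p) for g, products in genes.items() for p in products}
    backward = {(g, p) for p, gene_ids in proteins.items() for g in gene_ids}
    # Symmetric difference keeps pairs present in exactly one of the sets.
    return sorted(forward ^ backward)


# Placeholder identifiers for illustration only.
genes = {"gene-A": ["protein-A"], "gene-B": ["protein-B"]}
proteins = {"protein-A": ["gene-A"], "protein-B": []}
unmatched = missing_reciprocal_links(genes, proteins)
print(unmatched)
```

Here `gene-B` encodes `protein-B`, but `protein-B` has no 'encoded by' statement, so that pair is reported.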
Data sources
The bot will retrieve its content from the following trusted sources:
- mygene.info is a web service that integrates information about all genes and provides a convenient API for it [1]. It provides data from Entrez Gene and other authoritative sources.
- Data will also be loaded from
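For mygene.info, a request URL could be assembled roughly as shown below. This is a sketch under stated assumptions rather than the bot's actual retrieval code: it assumes the v3 `query` endpoint with its `q`, `species`, `fields` and `size` parameters, the `_exists_:entrezgene` query string and the field names should be checked against the mygene.info documentation, and the taxonomy ID 511145 is only an example value. The helper name `build_mygene_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the mygene.info documentation before use.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_mygene_url(taxid, fields=("entrezgene", "symbol", "uniprot"), size=10):
    """Build a query URL for genes of one taxon that have an Entrez Gene ID."""
    params = {
        "q": "_exists_:entrezgene",  # assumed query syntax
        "species": str(taxid),       # NCBI taxonomy ID of the strain
        "fields": ",".join(fields),  # restrict the returned fields
        "size": size,                # number of hits per page
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


url = build_mygene_url(511145)
print(url)
```

Only the URL is built here, so the snippet runs without network access; fetching it with any HTTP client would return JSON, which a bot would then map onto the properties listed in the tables above.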