User:ProteinBoxBot/Microbial gene and protein items
Introduction
The ProteinBoxBot maintains information about genes, diseases and drugs in Wikidata. Each of these three domains is maintained by its own sub-process of the main bot.
The objective of the Microbial gene and protein sub-process is to add information about genes and proteins of microbial origin to Wikidata and to keep it up to date. A discussion has been initiated on the Project Molecular Biology talk page.
Current Scope
The set of entities maintained by this bot is determined by their presence in the expert-curated NCBI Entrez Gene database.
At present, the bot is limited to genes and proteins from bacteria. It will later be expanded to include microbial genes of non-bacterial origin.
Items maintained by this bot
- Bacterial genes. They can all be listed with a query for items that have found in taxon (P703) set to bacteria and some value for Entrez Gene ID (P351).
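The query described above could be sketched as follows. This is only an illustration: the page does not say which query language is used, so SPARQL with the usual `wd:`/`wdt:` prefixes is an assumption, and `build_bacterial_gene_query` is a hypothetical helper name. Matching P703 directly against Q10876 follows the property table on this page; items whose taxon is a specific species or strain would need a broader pattern.

```python
def build_bacterial_gene_query(taxon_qid="Q10876", limit=None):
    """Build a SPARQL query string (assumed query language) for items that
    have found in taxon (P703) equal to taxon_qid and any Entrez Gene ID (P351)."""
    lines = [
        "SELECT ?gene ?entrez WHERE {",
        f"  ?gene wdt:P703 wd:{taxon_qid} .",  # found in taxon: bacteria
        "  ?gene wdt:P351 ?entrez .",          # some value for Entrez Gene ID
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_bacterial_gene_query(limit=10))
```

The function only builds the query text; sending it to a query service is left out on purpose, since the endpoint is not named on this page.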
Gene properties planned for this bot
Property | Description | Datatype | Expected value (if not listed, see property definition) |
---|---|---|---|
P279 | subclass of | Item | Should always include gene (Q7187) |
P351 | Entrez Gene ID | String | Should exist for EVERY item processed by this bot. Property will include concurrent Entrez IDs for each strain of bacterial species |
P644 | Genomic start | String | Should exist for EVERY item processed by this bot. Property will include concurrent Genomic starts for each strain of bacterial species |
P645 | Genomic end | String | Should exist for EVERY item processed by this bot. Property will include concurrent Genomic ends for each strain of bacterial species |
P703 | found in taxon | Item | Currently should only include bacteria (Q10876) |
P353 | Gene symbol | String | |
P688 | encodes | Item | |
The 'encodes' property links gene items to items specifically about the protein, RNA, or other 'product' of the gene. A single gene corresponds to a particular region of a genome that is related to some set of functions. These functions are carried out by the gene's products. Different products may perform vastly different functions. Hence we separate functional information from the gene item itself, and attach this information to the product items wherever possible. (See discussion.)
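The expectations in the gene property table can be expressed as a small check. The sketch below is not the bot's actual code: it assumes a gene item's claims are available as a plain dictionary from property ID to a list of values, and `check_gene_item` is a hypothetical name. The sample values are made-up placeholders, not real identifiers.

```python
# Properties that the table says should exist for EVERY gene item.
ALWAYS_PRESENT = ("P351", "P644", "P645")
GENE_QID = "Q7187"       # gene
BACTERIA_QID = "Q10876"  # bacteria


def check_gene_item(claims):
    """Return a list of problems found in claims (dict: property ID -> list of values)."""
    problems = []
    for pid in ALWAYS_PRESENT:
        if not claims.get(pid):
            problems.append(f"missing {pid}")
    # subclass of (P279) should always include gene (Q7187).
    if GENE_QID not in claims.get("P279", []):
        problems.append(f"P279 does not include {GENE_QID}")
    # found in taxon (P703) should currently only include bacteria (Q10876).
    unexpected = [t for t in claims.get("P703", []) if t != BACTERIA_QID]
    if unexpected:
        problems.append("unexpected P703 value(s): " + ", ".join(unexpected))
    return problems


if __name__ == "__main__":
    example = {  # placeholder values for illustration only
        "P279": ["Q7187"],
        "P351": ["0000001"],
        "P644": ["100"],
        "P703": ["Q10876"],
    }
    print(check_gene_item(example))
```

Gene symbol (P353) and encodes (P688) are not checked, because the table does not state an expected value for them.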
Protein properties planned for this bot
Property | Description | Datatype | Expected value (if not listed, see property definition) |
---|---|---|---|
P279 | subclass of | Item | One of: Protein (Q8054), RNA (Q11053), non-coding RNA (Q427087), .. |
P702 | encoded by | Item | Should exist for EVERY item processed by this bot |
P352 | UniProt ID | String | Should exist for EVERY item processed by this bot |
P638 | PDB ID | String | |
P637 | RefSeq Protein ID | String | |
P705 | Ensembl Protein ID | String | |
P681 | Cell Component | Item | |
P682 | Biological Process | Item | |
P680 | Molecular Function | Item | |
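Since gene items point to their products with encodes (P688) and product items point back with encoded by (P702), one useful consistency check is whether the two directions mirror each other. The page does not state that the bot enforces this, so the sketch below is an assumption about a check one might run; `find_unreciprocated_links` and the item labels are hypothetical.

```python
def find_unreciprocated_links(encodes, encoded_by):
    """Return sorted (gene, product) pairs that are linked in only one direction.

    encodes:    dict gene ID -> list of product IDs (P688 on the gene item)
    encoded_by: dict product ID -> list of gene IDs (P702 on the product item)
    """
    forward = {(gene, product)
               for gene, products in encodes.items()
               for product in products}
    backward = {(gene, product)
                for product, genes in encoded_by.items()
                for gene in genes}
    # Symmetric difference: pairs present in exactly one of the two directions.
    return sorted(forward ^ backward)


if __name__ == "__main__":
    encodes = {"gene-A": ["protein-A", "protein-B"]}  # placeholder labels
    encoded_by = {"protein-A": ["gene-A"]}
    print(find_unreciprocated_links(encodes, encoded_by))
```

In this example the pair (gene-A, protein-B) is reported, because protein-B carries no encoded by statement pointing back to gene-A.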
Data sources
The bot will retrieve its content from the following trusted sources:
- mygene.info is a web service that integrates information about all genes and provides a convenient API for it [1]. It provides data from Entrez Gene and other authoritative sources.
- Data will also be loaded from
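As an illustration of how a record might be requested from mygene.info, the sketch below builds the URL for a single gene annotation. It assumes the public v3 endpoint layout (`/v3/gene/<id>` with a `fields` parameter) and the field names shown; both should be checked against the current mygene.info documentation. `gene_annotation_url` is a hypothetical helper, and the gene ID is an arbitrary placeholder, not a specific bacterial gene.

```python
from urllib.parse import urlencode

MYGENE_BASE = "https://mygene.info/v3"  # assumed API base URL


def gene_annotation_url(entrez_id,
                        fields=("symbol", "taxid", "genomic_pos", "uniprot")):
    """Build the URL for one gene annotation, restricted to the given fields."""
    # safe="," keeps the comma-separated field list readable in the URL.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{MYGENE_BASE}/gene/{entrez_id}?{query}"


if __name__ == "__main__":
    # "12345" is a placeholder ID used only to show the URL shape.
    print(gene_annotation_url("12345"))
```

The function does not perform the HTTP request itself, so it can be used with whatever client the bot already relies on.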