Hello, I'm trying to improve the molbio part of Wikidata through manual and batch editing. Although I am a software developer (main language C++), I have prepared many books for Project Gutenberg (Q22673), contributed to German Wikipedia (Q48183) from 2006 to 2012 (as User:Ayacop), and biocurated extensively for UniProt-GOA (Q28018111) and Reactome (Q2134522).
- User:SCIdude/Protein bugs
- MEROPS import?
- IUPHAR IDs without Wikidata, anyone?
- IUPHAR family IDs, anyone?
- BindingDB ids?
- dbSNP import?
- missing OMIM phenotypes, e.g. 1?
- OMIM phenotypic series, see their FAQ
- OMA orthology group ids/groups, see Property_talk:P684#not_efficient_spacewise
- next MONDO sync?
In a manual attempt to create and curate Wikidata items for the cleavage products (fragments) of proteins, I have worked on the items around preproinsulin (Q7240673), angiotensinogen (Q267200), preproghrelin (Q66216544), proglucagon (Q66310097), proopiomelanocortin (Q418896), cerebellin 1 precursor (Q21115606), natriuretic peptide B precursor (Q422288), preproendothelin-1 (Q66361339), apelin (Q2386988), protachykinin-1 (Q21123080), Secretogranin II (Q21105303), thymosin beta 4 (Q7799643), prepro-VIP (Q66499176), neurosecretory protein VGF (Q21122290), augurin precursor (Q66535298), chromogranin A (Q3698322), and CAP-18 propeptide (Q411181).
What I'm doing is roughly this:
- if gene and protein are in one item, duplicate it to get separate items (moving sitelinks to the protein first)
- remove wrong statements from either item (e.g. no PDB/protein IDs/GOA function/localization annotations on genes); make sure the gene has at most GO process annotations
- create/check all relevant fragment objects, move statements to the respective item: EnsemblP should be on prepro/pro
- separate out aliases to the respective objects
- add "has part" with all fragments to prepro object
- complete "encodes/encoded by" everywhere
- add "exact match" qualifier to the fragment's UniProt ID, e.g. https://www.uniprot.org/uniprot/Q9UBU3#PRO_0000019202
- add Reactome, ChEBI, ChEMBL, IUPHAR IDs to the fragment if they exist (Reactome labels like GENE(1-100) also go to the fragment aliases)
- add "part of" Reactome process or reaction if missing
- (maybe) move GOA function annotations to the respective fragment if applicable
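
Two of the steps above ("has part" on the prepro object, and the "exact match" qualifier on the fragment's UniProt ID) lend themselves to batch editing. The following is only a minimal sketch of how such a batch could be prepared as tab-separated QuickStatements (V1) lines. The function names are made up for illustration, the fragment QID `Q00000001` is a placeholder and not a real item, and the property IDs (P527 "has part", P352 "UniProt protein ID", P2888 "exact match") should be double-checked before running anything.

```python
# Sketch only: build QuickStatements V1 lines (tab-separated) for one precursor.
# Property IDs are assumptions to verify on Wikidata before use.
HAS_PART = "P527"      # "has part"
UNIPROT_ID = "P352"    # "UniProt protein ID"
EXACT_MATCH = "P2888"  # "exact match", used here as a qualifier


def fragment_url(accession, chain_id):
    """URL of a UniProt chain/peptide feature, in the form used in the list above."""
    return f"https://www.uniprot.org/uniprot/{accession}#{chain_id}"


def quickstatements(precursor_qid, accession, fragments):
    """fragments maps a fragment QID to its UniProt feature ID (PRO_...)."""
    lines = []
    for frag_qid, chain_id in fragments.items():
        # prepro/pro item "has part" fragment
        lines.append("\t".join([precursor_qid, HAS_PART, frag_qid]))
        # fragment gets the UniProt accession, qualified with the exact-match URL
        lines.append("\t".join([
            frag_qid,
            UNIPROT_ID,
            f'"{accession}"',
            EXACT_MATCH,
            f'"{fragment_url(accession, chain_id)}"',
        ]))
    return lines


if __name__ == "__main__":
    # Q66216544 is preproghrelin; Q00000001 is a placeholder fragment QID.
    for line in quickstatements("Q66216544", "Q9UBU3",
                                {"Q00000001": "PRO_0000019202"}):
        print(line)
```

The output can be pasted into QuickStatements after review; doing the sitelink moves, alias separation and statement removal by hand first keeps the batch small and easy to check.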