User:ProteinBoxBot/Protein family bot

Objective

This bot function should add and update Wikidata items of the following types: protein family (Q417841), protein domain (Q898273), active site (Q423026), binding site (Q616005), supersecondary structure (Q7644128), post-translational protein modification (Q898362), and structural motif (Q3273544). It should also create links between proteins and these items.

Introduction

This bot is part of a family of bots that capture and maintain genes, diseases, and drugs in Wikidata. It builds on the ongoing work of incorporating all genes and proteins into Wikidata. Adding protein family information would enable several new use cases, and would allow linking classes of proteins together across species and querying proteins by function.

Properties

On items

Property | Datatype | Explanation
subclass of (P279) | item | hierarchy
instance of (P31) | item | type of item (protein family, etc.)
InterPro ID (P2926) | external-id |

On proteins

Property | Datatype | Explanation
subclass of (P279) | item | member of protein family
has part(s) (P527) | item | contains a ...
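
The two property tables above can be read as a small data model. The following is a minimal sketch of that model in plain Python, not the bot's actual implementation: the function names, the (property, value) tuple representation, and the placeholder identifiers in the usage note are all assumptions made for illustration. Only the property and class identifiers (P31, P279, P527, P2926 and the Q-numbers) come from this page.

```python
# Class items named in the Objective section (label -> Wikidata QID).
ITEM_CLASSES = {
    "protein family": "Q417841",
    "protein domain": "Q898273",
    "active site": "Q423026",
    "binding site": "Q616005",
    "supersecondary structure": "Q7644128",
    "post-translational protein modification": "Q898362",
    "structural motif": "Q3273544",
}


def interpro_item_statements(item_class, interpro_id, parent_qid=None):
    """Return (property, value) pairs for an InterPro-derived item.

    Hypothetical helper: it mirrors the "On items" table only.
    """
    statements = [
        ("P31", ITEM_CLASSES[item_class]),  # instance of: type of item
        ("P2926", interpro_id),             # InterPro ID (external-id)
    ]
    if parent_qid is not None:
        # subclass of: position of the item in the hierarchy
        statements.append(("P279", parent_qid))
    return statements


def protein_statements(family_qids=(), part_qids=()):
    """Return (property, value) pairs for a protein item.

    Hypothetical helper: it mirrors the "On proteins" table only.
    """
    # subclass of: the protein is a member of each protein family
    statements = [("P279", qid) for qid in family_qids]
    # has part(s): the protein contains each domain, site, motif, etc.
    statements += [("P527", qid) for qid in part_qids]
    return statements
```

For example, `interpro_item_statements("protein domain", "IPR_PLACEHOLDER")` yields an instance-of statement pointing at Q898273 plus the InterPro ID statement, where `IPR_PLACEHOLDER` stands in for a real InterPro accession. Keeping the statements as plain tuples leaves the choice of Wikidata client library open.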

Data sources

InterPro

Output

Counts of proteins, grouped by taxon, that are a subclass of a protein family (link)

Counts of InterPro items by type (link)
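
The two counts listed in this section could be expressed as SPARQL queries along the following lines. This is a hedged sketch rather than the queries behind the original links: the function names are hypothetical, and the use of found in taxon (P703) to attach a taxon to a protein is an assumption not stated on this page. The `wd:` and `wdt:` prefixes are assumed to be predefined by the query endpoint, as they are on the Wikidata Query Service.

```python
def proteins_per_taxon_query(family_class="Q417841", taxon_property="P703"):
    """Build a SPARQL query counting proteins per taxon.

    A protein is counted if it is a subclass of (P279) an item that is
    an instance of (P31) protein family. taxon_property is an assumption.
    """
    return f"""
SELECT ?taxon (COUNT(DISTINCT ?protein) AS ?count) WHERE {{
  ?protein wdt:P279 ?family ;
           wdt:{taxon_property} ?taxon .
  ?family wdt:P31 wd:{family_class} .
}}
GROUP BY ?taxon
ORDER BY DESC(?count)
""".strip()


def interpro_items_per_type_query(interpro_property="P2926"):
    """Build a SPARQL query counting items with an InterPro ID, by type."""
    return f"""
SELECT ?type (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:{interpro_property} ?interproId ;
        wdt:P31 ?type .
}}
GROUP BY ?type
ORDER BY DESC(?count)
""".strip()


if __name__ == "__main__":
    print(proteins_per_taxon_query())
    print()
    print(interpro_items_per_type_query())
```

The functions only build query strings, so they can be pasted into a query editor or sent with any HTTP or SPARQL client.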