User:ProteinBoxBot/Drug items

Introduction

The ProteinBoxBot maintains information about genes, diseases and drugs in Wikidata. Each of these three domains is maintained by its own sub-process of the main bot.

The objective of the Drug sub-bot is to add and maintain Wikidata items for all drugs relevant to human health.

Intended Scope

FDA-approved drugs and drug combinations.

Items maintained by this bot

The Wikidata query CLAIM[31:12140] AND CLAIM[715] retrieves all items currently maintained by the bot: those that are both instances of a pharmaceutical drug (CLAIM[31:12140]) and have a DrugBank ID (CLAIM[715]).

Bot test edits

To query all bot test edit items, use CLAIM[31:12140] AND CLAIM[715].
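The CLAIM[...] expressions above use the WikidataQuery (WDQ) syntax. As a rough sketch, the same selection can also be expressed as a SPARQL query using the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines. The helper name `claims_to_sparql` is hypothetical and covers only the two claim forms used on this page; it is not part of the bot.

```python
def claims_to_sparql(claims):
    """Translate (property_id, item_id or None) pairs into a SPARQL query.

    (31, 12140) mirrors CLAIM[31:12140]; (715, None) mirrors CLAIM[715].
    All pairs are combined with AND semantics.
    """
    patterns = []
    for index, (prop, target) in enumerate(claims):
        if target is None:
            # Any value for the property: bind it to a throwaway variable.
            patterns.append(f"?item wdt:P{prop} ?value{index} .")
        else:
            # A specific item as the value of the property.
            patterns.append(f"?item wdt:P{prop} wd:Q{target} .")
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?item WHERE {{\n  {body}\n}}"


query = claims_to_sparql([(31, 12140), (715, None)])
print(query)
```

The printed text can be pasted into a SPARQL endpoint for Wikidata; the function itself performs no network access.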

Prototype items

Data sources

The data sources for this effort are the open databases National Drug File, DrugBank, PubChem and ChEMBL. DrugBank is used to determine the list of compounds currently approved by the FDA; it also provides a set of basic identifiers. To acquire more data, the RDF API of PubChem is used to obtain the PubChem ID and MeSH ID. The National Drug File REST API is used to get the FDA UNII, and ChEMBL is queried directly for the ChEMBL ID.
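The aggregation described above can be pictured as merging per-source identifier lookups into one record keyed by DrugBank ID. The following is a minimal sketch under stated assumptions: the `DrugRecord` class, the `merge_sources` helper, the field names and all sample values are placeholders for illustration, not the bot's actual code or real identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class DrugRecord:
    """One aggregated drug, keyed by its DrugBank ID (hypothetical structure)."""
    drugbank_id: str
    identifiers: dict = field(default_factory=dict)


def merge_sources(drugbank_id, *source_results):
    """Combine identifier dicts from several sources into one record.

    Earlier sources take precedence; missing (None) values are skipped.
    """
    record = DrugRecord(drugbank_id)
    for result in source_results:
        for key, value in result.items():
            if value is not None:
                record.identifiers.setdefault(key, value)
    return record


# Placeholder lookups standing in for DrugBank, PubChem, NDF and ChEMBL results.
drugbank = {"cas": "PLACEHOLDER-CAS"}
pubchem = {"pubchem_cid": "PLACEHOLDER-CID", "mesh_id": "PLACEHOLDER-MESH"}
ndf = {"unii": "PLACEHOLDER-UNII"}
chembl = {"chembl_id": None}  # simulates a source that returned nothing

record = merge_sources("PLACEHOLDER-DB", drugbank, pubchem, ndf, chembl)
print(record.identifiers)
```

Keeping the merge separate from the lookups means a source that fails or returns nothing simply contributes no fields.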

Output

The bot will be able to add new drugs appearing in its source databases to Wikidata. It will also modify existing Wikidata drug items if new information becomes available in the mentioned data sources. The data added for a Wikidata drug item will initially comprise, in addition to labels and aliases, the properties listed below.

Properties edited by the bot

Property | Description | Datatype
Property:P31 | instance of | item
Property:P636 | route of administration | item
Property:P267 | ATC code | string
Property:P231 | CAS registry number | string
Property:P486 | MeSH ID | string
Property:P672 | MeSH Code | string
Property:P662 | PubChem ID | External ID
Property:P661 | ChemSpider ID | External ID
Property:P652 | UNII | External ID
Property:P665 | KEGG ID | External ID
Property:P683 | ChEBI ID | External ID
Property:P274 | chemical formula | External ID
Property:P715 | Drugbank ID | External ID
Property:P592 | ChEMBL ID | External ID
Property:P233 | SMILES | string
Property:P234 | InChI | string
Property:P235 | InChIKey | string
Property:P2275 | World Health Organisation International Nonproprietary Name | Monolingual text
Property:P657 | RTECS Number | string
Property:P2115 | NDF-RT ID | External ID
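Some of the string-valued identifiers listed above have a fixed format that can be checked before a value is written. As an illustration only (the page does not say whether the bot performs such checks), the sketch below validates the check digit of a CAS registry number (P231). The helper name `is_valid_cas` is hypothetical.

```python
import re


def is_valid_cas(cas):
    """Check the format and check digit of a CAS registry number.

    The check digit is the weighted sum of the other digits modulo 10,
    with weights 1, 2, 3, ... counted from the rightmost digit.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


print(is_valid_cas("7732-18-5"))  # CAS number of water
print(is_valid_cas("7732-18-4"))  # same number with a wrong check digit
```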

Implementation

The bot is split into two parts: the drug data aggregator, and the drug bot itself, which uses the aggregated data to write to Wikidata. The bot code is open source and available for inspection.
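One way to picture the hand-off between the two parts is a planning step that compares aggregated data with the claims already on an item and keeps only what is missing. This is a sketch under stated assumptions, not the bot's implementation: `plan_edits` is a hypothetical function, the values are placeholders, and the actual write to Wikidata is left out.

```python
def plan_edits(existing_claims, aggregated):
    """Return the (property, value) pairs that the item does not have yet.

    existing_claims maps property IDs to lists of current values;
    aggregated maps property IDs to the value found by the aggregator.
    """
    edits = []
    for prop, value in sorted(aggregated.items()):
        if value not in existing_claims.get(prop, []):
            edits.append((prop, value))
    return edits


# Placeholder data: the item already carries a DrugBank ID (P715),
# and the aggregator has additionally found a ChEMBL ID (P592).
existing = {"P715": ["PLACEHOLDER-DB"]}
aggregated = {"P715": "PLACEHOLDER-DB", "P592": "PLACEHOLDER-CHEMBL"}

print(plan_edits(existing, aggregated))
# prints [('P592', 'PLACEHOLDER-CHEMBL')]
```

For a drug with no Wikidata item yet, passing an empty dict as `existing_claims` yields every aggregated value, which corresponds to creating a new item.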

Bot approval

Bot approval discussion August 2015: Wikidata:Requests_for_permissions/Bot/ProteinBoxBot_4