Property talk:P665
Documentation
KEGG ID
identifier from databases dealing with genomes, enzymatic pathways, and biological chemicals
Description | ID in Kyoto Encyclopedia of Genes and Genomes (Q909442), a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The PATHWAY database records networks of molecular interactions in cells, and organism-specific variants of them. In July 2011, KEGG switched to a subscription model, and access via FTP is no longer free.
---|---
Applicable "stated in" value | Kyoto Encyclopedia of Genes and Genomes (Q909442)
Data type | External identifier
Template parameter | Template:Infobox drug (Q6033882) KEGG, Template:Chembox (Q52426) KEGG
Domain | According to this template: genomes, enzymatic pathways, and biological chemicals. According to statements in the property (when possible, data should only be stored as statements): type of chemical entity (Q113145171), crude drug (Q735160), medication (Q12140), chemical element (Q11344), structural class of chemical entities (Q47154513), group of chemical entities (Q55640599), biological process (Q2996394), disease (Q12136) or mixture (Q169336)
Allowed values | [A-Z]\d+
Example | According to this template: DL-ascorbic acid (Q193598) = "D00018", ketamine (Q243547) = "D08098", phenothiazine (Q410846) = "D02601". Leave "D" away? According to statements in the property (when possible, data should only be stored as statements): DL-ascorbic acid (Q193598) → D00018, sulfur dioxide (Q5282) → D05961
Source | https://www.kegg.jp/
Formatter URL | https://www.kegg.jp/entry/$1
Robot and gadget jobs | Gather data from infoboxes and website.
Tracking: usage | Category:Pages using Wikidata property P665 (Q26250085)
Proposal discussion | Proposal discussion
Format “([CDEGHRK]|hsa|map|rn|RC)\d{5}”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). Known exceptions: Folate biosynthesis (Q60602831). List of violations of this constraint: Database reports/Constraint violations/P665#Format, SPARQL
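The format constraint and the formatter URL can be tried out locally. The following is a minimal Python sketch, not the code Wikidata itself runs: it assumes the pattern must match the whole value, and the helper names `is_valid_kegg_id` and `entry_url` are made up for illustration.

```python
import re

# Pattern copied from the format constraint on this property (PCRE syntax).
KEGG_ID_PATTERN = re.compile(r"([CDEGHRK]|hsa|map|rn|RC)\d{5}")

# Formatter URL from the property documentation; "$1" is the value placeholder.
FORMATTER_URL = "https://www.kegg.jp/entry/$1"


def is_valid_kegg_id(value: str) -> bool:
    """Return True if the whole value matches the format constraint."""
    return KEGG_ID_PATTERN.fullmatch(value) is not None


def entry_url(value: str) -> str:
    """Build the entry URL for a valid identifier (hypothetical helper)."""
    if not is_valid_kegg_id(value):
        raise ValueError(f"not a valid KEGG ID: {value!r}")
    return FORMATTER_URL.replace("$1", value)


if __name__ == "__main__":
    # A trailing space or a missing prefix letter makes the value invalid.
    for candidate in ["D00018", "D08098 ", "00018"]:
        print(repr(candidate), is_valid_kegg_id(candidate))
    print(entry_url("D00018"))
```

Under the whole-value assumption, a value with stray whitespace or without its prefix letter is rejected, so the prefix has to be stored as part of the identifier.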
Distinct values: this property likely contains a value that is different from all other items. (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). List of violations of this constraint: Database reports/Constraint violations/P665#Unique value, SPARQL (every item), SPARQL (by value)
Type “type of chemical entity (Q113145171), crude drug (Q735160), medication (Q12140), chemical element (Q11344), structural class of chemical entities (Q47154513), group of chemical entities (Q55640599), biological process (Q2996394), disease (Q12136), mixture (Q169336)”: item must contain property “instance of (P31), subclass of (P279)” with classes “type of chemical entity (Q113145171), crude drug (Q735160), medication (Q12140), chemical element (Q11344), structural class of chemical entities (Q47154513), group of chemical entities (Q55640599), biological process (Q2996394), disease (Q12136), mixture (Q169336)” or their subclasses (defined using subclass of (P279)). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). List of violations of this constraint: Database reports/Constraint violations/P665#Type Q113145171, Q735160, Q12140, Q11344, Q47154513, Q55640599, Q2996394, Q12136, Q169336, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). List of violations of this constraint: Database reports/Constraint violations/P665#Entity types
Scope is as main value (Q54828448), as reference (Q54828450): the property must be used by specified way only (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). List of violations of this constraint: Database reports/Constraint violations/P665#Scope, SPARQL
This property is being used by:
Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Split identifier into KEGG Drug Identifier, KEGG Compound Identifier, etc?
Hi all, since there is now a restriction to have only one KEGG identifier, this causes problems for compounds which actually do have multiple identifiers, like a KEGG Drug, a KEGG Compound and a KEGG Glycan identifier. I propose to split up this identifier into the various identifiers. I will write these up this weekend. Any objections? Or things you want me to take into account when writing this up? --Egon Willighagen (talk) 10:23, 15 December 2018 (UTC)
Notified participants of WikiProject Chemistry
- @Egon Willighagen: It looks like the majority of the constraint violations refer to the "distinct values constraint" (same value on multiple items) and not a "unique values constraint" (multiple statements on a single item), which is the problem that I think your solution would solve. Unless I'm misreading this? If I've got that right, then I would either support removing that constraint, or simply letting those rare (proportionally speaking) constraint violations exist. Best, Andrew Su (talk) 01:04, 20 December 2018 (UTC)
- @Andrew Su: Yes, removing the "unique values constraint" would solve part of the problem. I do not know the full scale of the issue, but likely indeed less than a few tens of thousands. But by making it more specific we can validate things more accurately. By just removing the "unique values constraint" we cannot check with the constraints system that it only has one "Cxxxxx" identifier (but there are other ways of doing that, of course). --Egon Willighagen (talk) 07:34, 20 December 2018 (UTC)
- @Egon Willighagen: just FYI, I don't believe that the "unique values constraint" is currently applied to this property. So it sounds like you are proposing creating more specific identifier properties and adding that constraint to each. That sounds fine to me, though I also think that the status quo is sufficient for the Gene Wiki team's use cases... Best, Andrew Su (talk) 17:48, 20 December 2018 (UTC)
- @Egon Willighagen: The values that should be flagged are those from the same category (drug, compound etc). They start with the same character and this can be checked by a complex constraint query. No need for multiple properties. --SCIdude (talk) 06:46, 20 August 2020 (UTC)
- True. I could actually run a script that tags everything with a KEGG identifier starting with a D as drug... --Egon Willighagen (talk) 06:43, 21 August 2020 (UTC)
- @Jura1: I think many KEGG Drug entries also have KEGG Compound identifiers. The exact number is not easy to determine from KEGG itself, but it is likely a few thousand, with close to 1000 already in Wikidata. --Egon Willighagen (talk) 07:26, 20 December 2018 (UTC)
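The two checks discussed in this thread can be sketched offline in Python. This is only an illustration of the idea, not the complex constraint query itself: the item IDs and KEGG values in `SAMPLE` are placeholders rather than real Wikidata content, and the function names are made up. The first check flags items carrying more than one identifier from the same category (same prefix); the second flags a value that is shared by several items.

```python
from collections import defaultdict

# Placeholder data: item ID -> KEGG ID statements on that item.
SAMPLE = {
    "Q_A": ["D00001", "C00001"],  # one drug ID and one compound ID: acceptable
    "Q_B": ["C00002", "C00003"],  # two compound IDs on one item: flagged
    "Q_C": ["D00001"],            # repeats the drug ID of Q_A: shared value
}


def category(kegg_id: str) -> str:
    """Return the non-digit prefix, e.g. 'D' for 'D00018'."""
    return kegg_id.rstrip("0123456789")


def same_category_duplicates(statements):
    """Items with more than one identifier sharing the same prefix."""
    flagged = {}
    for item, ids in statements.items():
        by_prefix = defaultdict(list)
        for kegg_id in ids:
            by_prefix[category(kegg_id)].append(kegg_id)
        clashes = {p: v for p, v in by_prefix.items() if len(v) > 1}
        if clashes:
            flagged[item] = clashes
    return flagged


def shared_values(statements):
    """Identifiers that appear on more than one item."""
    items_by_id = defaultdict(set)
    for item, ids in statements.items():
        for kegg_id in ids:
            items_by_id[kegg_id].add(item)
    return {k: sorted(v) for k, v in items_by_id.items() if len(v) > 1}


if __name__ == "__main__":
    print(same_category_duplicates(SAMPLE))
    print(shared_values(SAMPLE))
```

Grouping by prefix keeps a single property usable for drug, compound and glycan identifiers while still catching two "Cxxxxx" values on one item.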