Property talk:P231
Documentation
identifier for a chemical substance or compound per Chemical Abstracts Service's Registry database
List of violations of this constraint: Database reports/Constraint violations/P231#Unique value, SPARQL (every item), SPARQL (by value)
List of violations of this constraint: Database reports/Constraint violations/P231#Single value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P231#single best value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P231#Entity types
Check digit to validate the CAS number (CAS RN), Official documentation (Help)
Violations query:
SELECT ?item WHERE {
  ?item wdt:P231 ?cas .
  BIND(REGEX(str(?cas), '^[1-9][0-9]{1,6}-[0-9]{2}-[0-9]$') AS ?correct_pattern)
  BIND(replace(str(?cas), "-","") AS ?c)
  BIND(strlen(?c) AS ?strlen)
  BIND(xsd:integer(substr(?c,?strlen,1)) AS ?val)
  BIND(xsd:integer(substr(?c,?strlen-1,1))
     + xsd:integer(substr(?c,?strlen-2,1)) * 2
     + xsd:integer(substr(?c,?strlen-3,1)) * 3
     + xsd:integer(substr(?c,?strlen-4,1)) * 4
     + IF(?strlen>5,xsd:integer(substr(?c,?strlen-5,1)),0) * 5
     + IF(?strlen>6,xsd:integer(substr(?c,?strlen-6,1)),0) * 6
     + IF(?strlen>7,xsd:integer(substr(?c,?strlen-7,1)),0) * 7
     + IF(?strlen>8,xsd:integer(substr(?c,?strlen-8,1)),0) * 8
     + IF(?strlen>9,xsd:integer(substr(?c,?strlen-9,1)),0) * 9
     AS ?sum0)
  BIND(?sum0-(xsd:integer(?sum0/10)*10) AS ?sum)
  BIND(?sum=?val AS ?correct_checksum)
  FILTER(!?correct_pattern || (?correct_pattern && !?correct_checksum))
}
List of violations of this constraint: Database reports/Complex constraint violations/P231#Invalid CAS number
Check digit to validate the CAS number (CAS RN), Official documentation (Help)
Violations query:
SELECT ?item {
  ?item wdt:P231 ?cas .
  # normalize string => ?c
  BIND(replace(str(?cas), "-","") AS ?c)
  BIND(strlen(?c) AS ?strlen)
  BIND(xsd:integer(substr(?c,?strlen-1,1))
     + xsd:integer(substr(?c,?strlen-2,1)) * 2
     + xsd:integer(substr(?c,?strlen-3,1)) * 3
     + xsd:integer(substr(?c,?strlen-4,1)) * 4
     + IF(?strlen>5,xsd:integer(substr(?c,?strlen-5,1)),0) * 5
     + IF(?strlen>6,xsd:integer(substr(?c,?strlen-6,1)),0) * 6
     + IF(?strlen>7,xsd:integer(substr(?c,?strlen-7,1)),0) * 7
     + IF(?strlen>8,xsd:integer(substr(?c,?strlen-8,1)),0) * 8
     + IF(?strlen>9,xsd:integer(substr(?c,?strlen-9,1)),0) * 9
     AS ?sum0)
  FILTER( (?sum0-(xsd:integer(?sum0/10)*10)) != xsd:integer(substr(?c,?strlen,1)) )
}
LIMIT 1
List of violations of this constraint: Database reports/Complex constraint violations/P231#Invalid CAS number. This variant uses LIMIT 1 and omits the regex check, as a quick fix for the timeout issue.
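The check the two queries perform can also be run offline on a single value. The following is a minimal Python sketch, not part of the constraint definition: it reuses the format regex from the first query and the same weighted sum (rightmost digit before the check digit has weight 1, the next has weight 2, and so on, taken modulo 10). The function names `cas_check_digit` and `is_valid_cas` are made up for this illustration.

```python
import re

# Same format as the regex in the first violations query:
# 2 to 7 digits (no leading zero), 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"[1-9][0-9]{1,6}-[0-9]{2}-[0-9]")


def cas_check_digit(body_digits: str) -> int:
    """Return the check digit for the digits that precede it.

    The rightmost body digit is multiplied by 1, the next by 2, and so on;
    the check digit is the sum modulo 10.
    """
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(body_digits), start=1))
    return total % 10


def is_valid_cas(cas: str) -> bool:
    """True if the string has the CAS RN format and a matching check digit."""
    if not CAS_PATTERN.fullmatch(cas):
        return False
    digits = cas.replace("-", "")
    return cas_check_digit(digits[:-1]) == int(digits[-1])


if __name__ == "__main__":
    # 50-00-0 (formaldehyde): 0*1 + 0*2 + 0*3 + 5*4 = 20, and 20 mod 10 = 0.
    for value in ["50-00-0", "7732-18-5", "50-00-1", "0050-00-0"]:
        print(value, is_valid_cas(value))
```

Unlike the LIMIT 1 variant of the query, this sketch keeps the format check, so a value with a leading zero or the wrong grouping is rejected before any arithmetic is done.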
This property is being used by:
Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Instructions
- Only one value per chemical compound.
- Don't mix different numbers representing different compounds in the same item:
- For drugs: only the CAS number of the active substance in the organic form. Salt forms have to be described in other items.
- For hydrates: only the CAS number for the hydrate defined in the label of the item. Other hydrates have to be described in other items.
Single values
Notified participants of WikiProject Chemistry
- There are many deprecated values for a single chemical compound. SciFinder lists them, for example, but only one is not deprecated. Moreover, some references incorrectly match substances to a CAS number that does not correspond exactly to the WD item, very often with stereoisomers. Must these deprecated values appear on WD? Can the "single value" constraint ignore deprecated values? --Almondega (talk) 11:32, 30 October 2015 (UTC)
- You can add items with outdated CAS numbers to the exception list of the property. Snipre (talk) 14:39, 21 January 2017 (UTC)
CAS RNs are not unique
Unlike what this page says, CAS registry numbers are not unique. Solutions of some compounds have the same CAS RN as the compound itself. An example is formaldehyde/formalin.
- @Egon Willighagen: Any source for this? Solutions are often given with a CAS number (e.g. in MSDSs), but the CAS number refers only to the cpd in that solution. Even mixtures of isomers, hydrates etc. have their own CAS number, different from the parent cpd number. ∼Wostr (talk) 18:35, 24 August 2016 (UTC)
- @Wostr: No, not public... it went via Twitter DMs, but "custom care" wrote me: "Unless a solution contains more than one active substance, CAS does not assign separate registry numbers for the active substance and its aqueous solutions in order to prevent generation of multiple registry numbers for the same active substance. Formalin is an aqueous solution of formaldehyde.". But you can easily check that this is the case in SciFinder (Q3648541). I would suggest asking them on Twitter (https://twitter.com/CASChemistry), but I know that Twitter is not considered a reliable source. Egon Willighagen (talk) 18:52, 24 August 2016 (UTC)
- OK, I understand now. But items about aqueous solutions are (and I think will be) rather uncommon, and in most cases a non-unique CAS number is an error. Is adding exceptions not sufficient? Will we be able to track those errors without this constraint? ∼Wostr (talk) 19:53, 24 August 2016 (UTC)
- @Wostr: I hope we can indeed properly track this! That is, I think Wikidata already provides everything needed to model this accurately. See below: restricting the uniqueness to chemical compounds, and not chemical substances, would already address the formalin/formaldehyde example. I do agree these examples are (relatively) rare, but to me, the power of Wikidata is that it can describe things semantically and in detail, and we should do so if we want the regular chemist to take it seriously, or at least to such an extent that they want to help fix the true violations. There is nothing as annoying as going through such a list and finding false violations all the time. Egon Willighagen (talk) 04:20, 25 August 2016 (UTC)
- @Egon Willighagen: Even if ACS doesn't distinguish between a pure compound and its solutions, nothing prevents us from restricting the use of CAS numbers further in WD:
- in order to keep a homogeneous treatment when comparing water with other solvents
- to be able to use the uniqueness of the CAS number as a powerful way to detect wrong identifications
- to respect the logic of the SciFinder tool, which provides only the properties of the pure substance with a CAS number and not the properties of all possible solutions.
- Can you explain what the benefit is of considering aqueous solutions as the pure substance? Snipre (talk) 20:52, 24 August 2016 (UTC)
- @Snipre: Ah, those are interesting points! I have been looking at chemical compound (Q11173) versus chemical substance (Q79529), and my impression is that this distinction is not well used (and, no, I don't consider an aqueous solution to be the pure substance). But if your argument is that the CAS number is unique for chemical compound (Q11173), then I can certainly live with that. It's not unique for chemical substance (Q79529). This was not clear to me from the documentation for the property. It currently reads "Distinct values: this property likely contains a value that is different from all other items," where I assume "items" refers to Wikidata items, so including substances. How about rewriting this documentation to "Distinct values: this property likely contains a value that is different from all other chemical substance items,"? And maybe even mentioning the fact that CAS numbers for compounds and their solutions are identical? Then at least the "constraint violations" tests can take this into account? Egon Willighagen (talk) 04:16, 25 August 2016 (UTC)
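The effect of scoping the distinct-value check, as discussed in this thread, can be illustrated with a small Python sketch. Everything here is hypothetical: the sample list, the class assigned to each item, and the function name `duplicate_cas` are assumptions made for the illustration, and the real constraint reports are generated differently.

```python
from collections import defaultdict

# Hypothetical sample data: (item label, item class, CAS number).
# Classing formalin as "chemical substance" is an assumption for this example.
ITEMS = [
    ("formaldehyde", "chemical compound", "50-00-0"),
    ("formalin", "chemical substance", "50-00-0"),
    ("water", "chemical compound", "7732-18-5"),
]


def duplicate_cas(items, only_class=None):
    """Group item labels by CAS number and keep numbers used more than once.

    If only_class is given, items of any other class are left out of the
    comparison, which mimics a uniqueness check scoped to one class.
    """
    by_cas = defaultdict(list)
    for label, item_class, cas in items:
        if only_class is None or item_class == only_class:
            by_cas[cas].append(label)
    return {cas: labels for cas, labels in by_cas.items() if len(labels) > 1}


if __name__ == "__main__":
    # Check across all items: the shared number is reported.
    print(duplicate_cas(ITEMS))
    # Check scoped to compounds only: nothing is reported.
    print(duplicate_cas(ITEMS, only_class="chemical compound"))
```

With the check scoped to compounds, the formaldehyde/formalin pair no longer shows up as a violation, while two compound items sharing one number still would.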
Chemical element is a chemical compound?
Is chemical element (Q11344) part of chemical compound (Q11173)? Chemical compound says: "... consisting of two or more different chemical elements", so that would exclude chemical elements. OTOH, the IUPAC Red Book simply treats elements as compounds (for example, when prescribing chemical formula format).
Then, if elements are defined as not being compounds, should the Type constraint be extended with Q11344? -DePiep (talk) 13:57, 6 February 2017 (UTC)
- @DePiep: Not a compound, you can add Q11344 to the list of exceptions. Snipre (talk) 15:01, 6 February 2017 (UTC)
- Please do so for me. I'm not familiar yet with these terms, don't know where to begin. And could you give a few words on why Q11344 should not be in the formal constraint list? It's for ~125 elements only, but it would make a correct check if I'm right. -DePiep (talk) 15:14, 6 February 2017 (UTC)
- @DePiep: Done Snipre (talk) 21:37, 6 February 2017 (UTC)
How to list alternative CAS numbers?
The single value constraint causes violation messages for compound entries with more than one CAS number, e.g. for aldehydo-D-glucose 6-phosphate (Q407962). I just checked this one in SciFinder, and both are valid CAS numbers for the same compound. It's just that one is an "alternative" CAS number. I have now given the primary CAS number the higher rank, but I don't think that removes the violation report. How do we want to solve this? --Egon Willighagen (talk) 11:56, 9 July 2017 (UTC)
- Maybe this has changed but 299-31-0 is now the pyranose tautomer which no longer fits our item. If so, they were not alternative, their team is just slow with cleanup. --SCIdude (talk) 16:09, 24 November 2019 (UTC)
- This has both a single value constraint (with several exceptions listed) and a single best value constraint. I looked at two items in the exceptions and one only had one value, the other had two but one was deprecated (although it would probably be better to change that to normal rank and the other to preferred). Is there still a reason to keep the single value constraint? Peter James (talk) 18:08, 21 February 2020 (UTC)
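To make the rank discussion in this thread concrete, here is a minimal Python sketch of how "best" values are selected, assuming the usual Wikidata rule that preferred-rank statements win if any exist, otherwise normal-rank ones are used, and deprecated ones never count. The function name `best_values` and the placeholder values are hypothetical, and how each constraint check treats ranks is not taken from the constraint definitions.

```python
def best_values(statements):
    """Return the values with the best rank.

    statements is a list of (value, rank) pairs, where rank is one of
    "preferred", "normal" or "deprecated". Preferred values are returned
    if there are any; otherwise normal values. Deprecated values are
    never returned.
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


if __name__ == "__main__":
    # Placeholder values, not real CAS numbers.
    # Primary number raised to preferred, alternative left at normal rank.
    print(best_values([("primary", "preferred"), ("alternative", "normal")]))
    # One normal value and one deprecated value.
    print(best_values([("current", "normal"), ("outdated", "deprecated")]))
    # Two normal values: more than one best value remains.
    print(best_values([("first", "normal"), ("second", "normal")]))
```

Under that assumption, both ranking styles mentioned above leave exactly one best value, and only the case with two values of equal rank leaves more than one.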