Wikidata:Property proposal/amino acid (start, end) position

amino acid position, amino acid start position, amino acid end position

Originally proposed at Wikidata:Property proposal/Natural science

   Not done
Description: 3 related properties:
  • amino acid position: position on the amino acid chain of a protein
  • amino acid start position: start position of a protein site/span/domain on a protein's amino acid chain
  • amino acid end position: end position of a protein site/span/domain on a protein's amino acid chain
Represents: amino acid position (Q66424100)
Data type: Quantity
Domain: property; superclass is Wikidata property related to biology (Q22988603)
Allowed values: integers > 0
Allowed units: none
Example 1: Phenylalanine hydroxylase (Q420604)
has part (P527) → protein phosphorylation (Q7251493)
→ "amino acid position" → 16
Example 2: Phenylalanine hydroxylase (Q420604)
has part (P527) → ACT domain (Q24745293)
→ "amino acid start position" → 36
→ "amino acid end position" → 114
Example 3: Phenylalanine hydroxylase (Q420604)
gene substitution association with (P1916) → phenylketonuria (Q194041)
→ "amino acid position" → 39
Planned use: manually specify how specific peptides are part of their proprotein. Bots could then also import such data, or data about the position of protein domains, binding positions of posttranslational modifications, or disease mutations.
Robot and gadget jobs:
  1. Either the subject or the object of the statement to which any of these 3 properties is added should be an instance of protein (Q8054) or of peptide (Q172847).
  2. IF "amino acid start position" exists on a statement THEN "amino acid end position" should also exist, and vice versa.
See also: genomic start (P644), genomic end (P645)
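The two checks listed under "Robot and gadget jobs", together with the "integers > 0" value restriction, could be sketched as follows. This is a minimal illustration, not an actual Wikidata constraint or bot: the `Statement` class and the `constraint_violations` function are hypothetical, statements are modelled as plain Python sets and dicts rather than through any Wikidata API, and qualifiers are keyed by their proposed labels because the properties were never assigned P-numbers.

```python
from dataclasses import dataclass, field

# Item IDs named in the proposal's constraint 1.
PROTEIN = "Q8054"
PEPTIDE = "Q172847"

POSITION_QUALIFIERS = ("amino acid position",
                       "amino acid start position",
                       "amino acid end position")


@dataclass
class Statement:
    """Hypothetical, simplified statement (not a Wikidata/pywikibot type)."""
    subject_classes: set   # "instance of" values of the subject item
    object_classes: set    # "instance of" values of the object item
    qualifiers: dict = field(default_factory=dict)  # proposed label -> value


def constraint_violations(st: Statement) -> list:
    """Return a list of messages describing violated proposal constraints."""
    problems = []
    used = [q for q in POSITION_QUALIFIERS if q in st.qualifiers]
    if not used:
        return problems  # the constraints only apply when a position is given

    # Constraint 1: the subject or the object must be a protein or a peptide.
    allowed = {PROTEIN, PEPTIDE}
    if not (allowed & st.subject_classes or allowed & st.object_classes):
        problems.append("neither subject nor object is a protein or peptide")

    # Constraint 2: start and end positions must occur together.
    has_start = "amino acid start position" in st.qualifiers
    has_end = "amino acid end position" in st.qualifiers
    if has_start != has_end:
        problems.append("start and end positions must both be present")

    # Allowed values: integers > 0 (bool is excluded because it subclasses int).
    for q in used:
        v = st.qualifiers[q]
        if not isinstance(v, int) or isinstance(v, bool) or v <= 0:
            problems.append(f"{q} must be an integer > 0, got {v!r}")
    return problems


# Example 2 (ACT domain), assuming the subject is recorded as a protein.
act_domain = Statement({PROTEIN}, set(),
                       {"amino acid start position": 36,
                        "amino acid end position": 114})
print(constraint_violations(act_domain))  # []
```

Keeping every check in one function returns all violations at once, which suits a report-style gadget better than stopping at the first failure.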


The lack of this property prevents me from fully adding knowledge to protein and peptide items. It must have been an issue for the bots that import from UniProt as well, but I could not find any previous discussion. This is an essential addition to the properties of statements about biological macromolecules that consist of amino acids. --SCIdude (talk) 09:52, 13 August 2019 (UTC)

Please note that I felt a single-value property was necessary (instead of using identical start and end values) because, from disease variants alone, I expect it to be used much more frequently than the start/end version. --SCIdude (talk) 15:21, 13 August 2019 (UTC)
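The relationship between a single position and a start/end pair can be made concrete with a small sketch. It assumes 1-based, inclusive positions, which fits the "integers > 0" restriction but is not stated explicitly in the proposal. The `residues` helper is hypothetical, and the sequence is an invented toy string, not the real phenylalanine hydroxylase sequence. A single "amino acid position" p selects the same residue as a span whose start and end are both p.

```python
from typing import Optional


def residues(sequence: str, start: int, end: Optional[int] = None) -> str:
    """Return residues start..end of an amino acid sequence.

    Positions are assumed to be 1-based and inclusive. A single
    position p (end omitted) is treated as the span p..p.
    """
    if end is None:
        end = start  # single position: identical start and end
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(
            f"invalid span {start}..{end} for length {len(sequence)}")
    # Convert 1-based inclusive coordinates to Python's 0-based half-open slice.
    return sequence[start - 1:end]


toy = "MSTAVLE"  # invented toy sequence, for illustration only
print(residues(toy, 2, 4))  # STA
print(residues(toy, 3))     # T
```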


  •   Support David (talk) 05:34, 14 August 2019 (UTC)
  •   Support I've been trying to figure out how to add specific PTMs that are associated with diseases, and this would work well. The only question I have is whether or not it should be restricted to amino acid sequences, since there are similar issues with nucleic acid sequences, e.g., specific nucleic acid deletions resulting in dysfunctional proteins, or site-specific methylation. I'm not sure whether it would be better as one general property for aa and na sequences, or as two distinct properties. Gtsulab (talk) 20:19, 13 August 2019 (UTC)
  • @Gtsulab: there are genomic start (P644) and genomic end (P645) for nucleic acids (but no single-value version). A concept spanning both amino acids and nucleic acids exists in reality only as the abstract, mathematical notion of a sequence; I would not object to a property "(start,end) position in sequence" if it existed. --SCIdude (talk) 08:07, 14 August 2019 (UTC)
  • @SCIdude: Yes, exactly! I could see expanding the constraints and names of genomic start (P644) and genomic end (P645) to be more inclusive, so they would be more like the "(start,end) position in sequence". In any case, I think a property for a single position in a sequence would be very valuable, whether it could be applied to both genes and proteins or just to proteins. Gtsulab (talk) 18:52, 14 August 2019 (UTC)
  •   Support The idea in general seems quite useful. I liked the discussion around making a more inclusive concept, and I agree with SCIdude that it gets stretched. In the end, for this, it is not so much the order itself that matters as having a good pointer. That being said, the modelling of pointwise indications is promising, but a bit hazy. "has part" "protein phosphorylation" is not accurate (a biological process is not part of a protein). The qualifier for "gene substitution association with" "phenylketonuria" would have to be something like "position in a sequence inherent to an item (e.g. a specific gene or protein) for which a change has this effect". I guess the local optimum would be to change the constraints of genomic start (P644) and genomic end (P645) to allow inserting the domain info, and to keep the discussion on pointwise representations going. Anyway, good work. TiagoLubiana (talk) 18:48, 24 August 2019 (UTC)
@TiagoLubiana: Thanks. Please also comment on the successor proposal: Wikidata:Property proposal/position in sequence --SCIdude (talk) 06:17, 25 August 2019 (UTC)