Wikidata:Property proposal/position in sequence
position in biological sequence
Originally proposed at Wikidata:Property proposal/Natural science
Description | index or position of a nucleotide in a genomic sequence, or of an amino acid in a protein's amino acid sequence; used as qualifier |
---|---|
Represents | nucleotide (Q28745), amino acid position (Q66424100) |
Data type | Quantity |
Domain | property |
Allowed values | integer > 0 |
Example 1 | phenylalanine hydroxylase (Q420604): |
Example 2 | phenylalanine hydroxylase (Q420604): |
Example 3 | PAH (Q14851781): |
Planned use | exactly specifying polymorphisms (mutations), hereditary diseases, PTMs |
See also | genomic start (P644), genomic end (P645) (these should be renamed/redefined to include amino acids/proteins); note also series ordinal (P1545), which is abstractly similar but applies to series rather than fixed sequences |
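
The "Allowed values" row above (integer > 0) can be illustrated with a minimal sketch. The function name `is_valid_position` is hypothetical and is not part of Wikidata or any of its tools; the sketch only assumes that a candidate value arrives as a string. It accepts positive integers and rejects free-form ordinals of the kind that series ordinal (P1545) permits.

```python
from decimal import Decimal, InvalidOperation


def is_valid_position(raw: str) -> bool:
    """Return True if raw denotes an integer greater than zero.

    Hypothetical helper for illustration only; it mirrors the proposal's
    "integer > 0" rule and is not an actual Wikidata constraint check.
    """
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        # Not a number at all, e.g. "15X" or "2,4".
        return False
    # Reject NaN/Infinity first, then non-integers, then values below 1.
    return (
        value.is_finite()
        and value == value.to_integral_value()
        and value > 0
    )
```

Under this sketch, `is_valid_position("16")` holds for the residue position mentioned in the discussion below, while values such as `"0"`, `"1.5"`, or `"15X"` are rejected.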
Motivation
See Wikidata:Property_proposal/amino_acid_(start,_end)_position. In particular, commenters wished for unification of nucleotide and amino acid sequences; the redefinition of genomic start (P644) and genomic end (P645) should therefore happen simultaneously. SCIdude (talk) 08:27, 24 August 2019 (UTC)
Discussion
WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
- Support David (talk) 08:08, 25 August 2019 (UTC)
- Comment We use series ordinal (P1545) to indicate ordinal position in a lot of other cases - for example author lists on an article. Is that not sufficient here? If not I think we'd want this label to be clearer on the distinction. ArthurPSmith (talk) 17:51, 26 August 2019 (UTC)
- Support I believe that series ordinal gets stretched too much. A protein is an entity composed of a sequence of amino acids in an orderly fashion. But I would not say that a protein is merely a series of amino acids. It is an "emergent property" of this series. Think about the O-phosphorylated residue (Q66735569). It is in position 16 of the biological series of amino acids that is inherent to phenylalanine hydroxylase (Q420604). But it is not in position 16 of the protein itself. A different entity, "series of amino acids that make up Q420604", could (1) be a part of phenylalanine hydroxylase and (2) have an amino acid described by series ordinal (P1545). But again, this would be too convoluted. This property could be named position in biological sequence, as both genes and proteins are defined by their specific biological sequence, but are more than that. TiagoLubiana (talk) 19:48, 26 August 2019 (UTC)
- I think the label should make it clearer that this property is limited to these types of sequences. --Yair rand (talk) 23:53, 26 August 2019 (UTC)
- I have changed the label in the proposal, as I agree with the suggestions given. --SCIdude (talk) 05:56, 27 August 2019 (UTC)
- Oppose I don't see how series ordinal (P1545) implies that the whole isn't more than the individual parts. ChristianKl ❪✉❫ 11:44, 28 August 2019 (UTC)
- @ChristianKl series ordinal (P1545) has other problems (see its talk page): it is not identical to a sequence index, since it allows arbitrary ordinals like 2,4 or 15X. Semantically a series is not a sequence, and AI applications will have problems mapping series ordinal (P1545) to a sequence index. I would, however, agree to use an abstract "index/position in sequence" instead of this proposal. --SCIdude (talk) 07:01, 29 August 2019 (UTC)
- Support More than addresses my uncertainty with the initial proposal. Gtsulab (talk) 19:23, 9 September 2019 (UTC)
- Support. YULdigitalpreservation (talk) 09:56, 19 September 2019 (UTC)
- If the argument is that this is something qualitatively different than a sequence index, I don't see why biology is a special case. Why wouldn't it be useful for other sequences correspondingly? ChristianKl ❪✉❫ 10:06, 19 September 2019 (UTC)
- @ChristianKl As I said, I'm in favor of a generic sequence index property. Do you think such a proposal would pass quickly? Then it would make this one obsolete. --SCIdude (talk) 14:06, 19 September 2019 (UTC)
- @SCIdude: When it comes to passing a proposal quickly, it's about making clear why one choice of modelling the domain is better than the other choices. As long as it's not clear which choice is best, the proposal should stay open. ChristianKl ❪✉❫ 14:33, 19 September 2019 (UTC)
- @SCIdude, ديفيد عادل وهبة خليل 2, ArthurPSmith, TiagoLubiana, Yair rand, ChristianKl: @YULdigitalpreservation, Gtsulab: position in biological sequence (P8275) has been created. Pamputt (talk) 15:52, 2 June 2020 (UTC)