Wikidata:Property proposal/part of molecular family
part of molecular family
Originally proposed at Wikidata:Property proposal/Natural science
Description | A molecular family that this gene or protein is considered a part of. |
---|---|
Data type | Item |
Domain | protein (Q8054) |
Example 1 | epidermal growth factor receptor (Q424401) → Growth factor receptor cysteine-rich domain superfamily (Q24722567) |
Example 2 | epidermal growth factor receptor (Q424401) → Tyrosine protein kinase, EGF/ERB/XmrK receptor (Q24719584) |
Example 3 | RIC8 guanine nucleotide exchange factor B (Q21121228) → Synembryn domain, protein family (Q83162599) |
Example 4 | reelin (Q13561329) → Reeler domain, protein family (Q83137881) |
Planned use | To be used in the place of part of (P361) in the connection of biomolecular entities to their families. Eventually to be changed by bot or semi-automatically via Quickstatements. |
Motivation
Currently part of (P361) is used to link protein items to protein families. As part of (P361) is rather ambiguous and used in many different contexts, a dedicated property for connecting proteins to protein families would facilitate querying and reduce confusion. -- TiagoLubiana (talk) 20:00, 1 July 2022 (UTC)
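To illustrate the querying point above, here is a minimal sketch (not part of the original proposal) that builds the two SPARQL queries being compared. With part of (P361), the target has to be filtered by type, because P361 also points to complexes and pathways; with a dedicated property, no such filter would be needed. The function name `family_query`, the placeholder property ID `PXXXX` (the proposed property has no ID), and the use of Q417841 as the class for protein families are assumptions for illustration and should be checked before use.

```python
from typing import Optional


def family_query(prop: str, filter_class: Optional[str] = None, limit: int = 10) -> str:
    """Build a SPARQL query listing (protein, family) pairs linked by `prop`.

    `filter_class` restricts the target to instances of a given class; this is
    needed when `prop` is a general-purpose property such as P361.
    """
    lines = [
        "SELECT ?protein ?family WHERE {",
        f"  ?protein wdt:{prop} ?family .",
    ]
    if filter_class is not None:
        # Extra triple pattern: keep only targets typed as the given class.
        lines.append(f"  ?family wdt:P31 wd:{filter_class} .")
    lines += ["}", f"LIMIT {limit}"]
    return "\n".join(lines)


# Current modelling: P361 plus a type filter (Q417841 is assumed to be "protein family").
print(family_query("P361", "Q417841"))

# Proposed modelling: hypothetical dedicated property, no filter required.
print(family_query("PXXXX"))
```

The strings can be pasted into the Wikidata Query Service once the placeholder is replaced by a real property ID.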
Discussion
WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. TiagoLubiana (talk) 20:02, 1 July 2022 (UTC)
- Why not use subclass of (P279) cf. Help:Basic membership properties? 2A01:CB14:D52:1200:DDCC:316A:70E6:E1A5 15:04, 4 July 2022 (UTC)
- While I agree that subclass of (P279) could work in theory, in practice it ends up being confusing for people to use, as they have different mental models of what protein items are. A dedicated property ensures semantic correctness and may be more agreeable with different users. TiagoLubiana (talk) 17:19, 4 July 2022 (UTC)
- Neutral Not sure if this is needed. The relation is clearly membership in a set, and one of the synonyms of part of (P361) is "element of" which would indicate part of (P361) was created to model membership. --SCIdude (talk) 17:57, 6 July 2022 (UTC)
- @SCIdude: Actually, I disagree that the relationship is clearly membership in a set. As the IP commented above, it is actually closer to subclass of (P279), as far as I see it. By having a dedicated property, we avoid lengthy discussions to sort out the semantic details. In practice, part of (P361) is also used for protein complexes and biological pathways, which leaves it overloaded. TiagoLubiana (talk) 15:20, 11 July 2022 (UTC)
- In any case, you should indicate in your proposal that the new property is a sub-property of either subclass of (P279) or part of (P361). --SCIdude (talk) 15:39, 11 July 2022 (UTC)
- Oppose Thinking about what's "more agreeable" instead of thinking about how to define the underlying ontology of what an item represents is a bad idea. Making things agreeable to people with different mental models means that different mental models are going to be used and the data becomes a mess. subclass of (P279) is the right property here. ChristianKl ❪✉❫ 15:13, 19 July 2022 (UTC)
- I agree with you in principle; Wikidata, though, is a lot about using different mental models by design (see https://blog.wikimedia.de/2013/02/22/restricting-the-world/ as a reference). TiagoLubiana (talk) 15:10, 18 August 2022 (UTC)
- We have learned a bit about how to model ontology in Wikidata since 2013. That old approach brought us cyclical subclass trees and often quite strange implications in subclass trees. ChristianKl ❪✉❫ 15:20, 18 August 2022 (UTC)
@SCIdude: and @ChristianKl: What about setting this as a subproperty of subclass of (P279)? Would you be okay with that? Cheers, TiagoLubiana (talk) 15:10, 18 August 2022 (UTC)
- No, I think subproperties of subclass of (P279) are generally a bad idea. I also haven't seen you bring forward any advantage of having the property, only the fact that it encourages people to have different ideas about what our items are about, which I see as a disadvantage. ChristianKl ❪✉❫ 15:26, 18 August 2022 (UTC)
- Comment Reading the discussion, I cannot decide why "subclass of" is not enough, but I also do not understand the opposition. For small chemicals "subclass of" is used too, and it seems to work fine. The proposal is not very specific about what it solves and reads as a "nice to have", but with the current information I cannot decide whether a new property solves more problems than it creates. Questions I have (I'll let you think about these before I post more): 1. Why was "part of" originally used when "subclass of" was an option too? What were the reasons against "subclass of" then? 2. Are there examples where properties of the "class" do not apply to the subclasses? 3. You mention "mental models of what protein items are". What models are those, and which would cause problems? --Egon Willighagen (talk) 11:50, 19 August 2022 (UTC)
- It harms our general ontology when we create new properties while the existing ones already express the relationship. If properties are not used in a consistent fashion, it's harder for new users to learn how the properties are supposed to be used within Wikidata. ChristianKl ❪✉❫ 12:40, 19 August 2022 (UTC)
Thanks everyone for the comments and feedback. Answering @Egon Willighagen:
- 1) I am not sure why; a lot of that modeling was done during the Gene Wiki project (maybe @Andrawaag: will have some insights there). I'd guess most biologists would default to "membership in a set" when talking about protein families, even though that could be more precise. I think this is the first large discussion about that on Wikidata (I might be wrong, though).
- 2) I'd say there are some cases of evolutionary loss of function, but those would still keep a protein classified in a given family. This is no different from, e.g., the class "humans", which lost full-body hair relative to the superclass "mammals".
- 3) The mental models I refer to are basically confusions w.r.t. partonomy/subclassing and whether items are instances of "protein" or subclasses of "protein", or even both (e.g. see https://www.w3.org/2007/OWL/wiki/Punning)
Answering @ChristianKl: the goal of the property was to avoid using "has part"/"part of" for protein families, in practice to avoid conflict with the ways protein domains and pathways are modeled on Wikidata (e.g. see Proteolytic degradation of ubiquitinated-Cdc25A (Q36804547) and FAM20, C-terminal domain, protein family (Q83140120)). If we reach a consensus that subclass of (P279) should be used instead, I'd be happy with that. It will then be a matter of updating/adapting the ProteinBoxBot code and the items there. TiagoLubiana (talk) 21:05, 22 August 2022 (UTC)
- The main way to find consensus about a data model would be to propose a data model on the relevant WikiProject. Adding extra properties to solve problems that would be better addressed by discussing the desired data model increases complexity without creating agreement on how data is uniformly modeled.
- Ideally, we would have a model item (P5869) for biological process (Q2996394), biological pathway (Q4915012), protein (Q8054), protein domain (Q898273), and gene (Q7187). While we are at it, a first-order class for protein (Q8054), a decision on whether gene (Q7187) is first order or second order, and the creation of the missing value would be great as well. ChristianKl ❪✉❫ 11:57, 23 August 2022 (UTC)
- @ChristianKl: I agree. I did try raising the discussion on WikiProject Molecular Biology, though (Wikidata talk:WikiProject Molecular biology#Update the modelling of protein families?), and got no response there for a month and a half before proposing the property. How would you suggest proceeding? TiagoLubiana (talk) 21:27, 2 September 2022 (UTC)
- @TiagoLubiana: I see the problem as being about the lack of a data model. To be solved by having a new page that proposes a data model. Ideally, a clear proposal that people can say yes/no to. Afterward, we might ping a bunch of people to get a discussion. ChristianKl ❪✉❫ 22:23, 2 September 2022 (UTC)