Wikidata:Property proposal/part of molecular family

part of molecular family

Originally proposed at Wikidata:Property proposal/Natural science

   Not done
Description: A molecular family that this gene or protein is considered a part of.
Data type: Item
Domain: protein (Q8054)
Example 1: epidermal growth factor receptor (Q424401) → Growth factor receptor cysteine-rich domain superfamily (Q24722567)
Example 2: epidermal growth factor receptor (Q424401) → Tyrosine protein kinase, EGF/ERB/XmrK receptor (Q24719584)
Example 3: RIC8 guanine nucleotide exchange factor B (Q21121228) → Synembryn domain, protein family (Q83162599)
Example 4: reelin (Q13561329) → Reeler domain, protein family (Q83137881)
Planned use: To be used in the place of part of (P361) in the connection of biomolecular entities to their families. Eventually to be changed by bot or semi-automatically via QuickStatements.
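
As a rough illustration of the semi-automatic route mentioned under planned use, the following minimal sketch generates QuickStatements (V1, tab-separated) commands that add a replacement statement and remove the matching part of (P361) statement. The function name `migration_commands` is hypothetical, and the target property is a parameter because the proposed property never received an ID; subclass of (P279) appears only as the alternative raised in the discussion, not as an agreed choice.

```python
from typing import Iterable, List, Tuple

OLD_PROPERTY = "P361"  # part of (P361), the property currently used per the proposal


def migration_commands(
    pairs: Iterable[Tuple[str, str]],
    new_property: str,
    old_property: str = OLD_PROPERTY,
) -> List[str]:
    """Build QuickStatements V1 lines for (protein QID, family QID) pairs.

    For each pair, one line adds the statement with the new property and
    one line (prefixed with "-") removes the old statement.
    """
    lines: List[str] = []
    for protein_qid, family_qid in pairs:
        lines.append(f"{protein_qid}\t{new_property}\t{family_qid}")
        lines.append(f"-{protein_qid}\t{old_property}\t{family_qid}")
    return lines


# Pairs taken from Examples 1 and 4 of the proposal.
EXAMPLE_PAIRS = [
    ("Q424401", "Q24722567"),
    ("Q13561329", "Q83137881"),
]

if __name__ == "__main__":
    # P279 is used here only as an illustrative target property.
    print("\n".join(migration_commands(EXAMPLE_PAIRS, "P279")))
```

The output would still need to be reviewed by hand before being pasted into QuickStatements, since the sketch does not check whether the target statement already exists or whether the P361 statement carries references or qualifiers.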

Motivation

Currently part of (P361) is used to link protein items to protein families. As part of (P361) is rather ambiguous and used in many different contexts, a dedicated property for connecting proteins to protein families would facilitate querying and reduce confusion. -- TiagoLubiana (talk) 20:00, 1 July 2022 (UTC)
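
To make the querying point concrete, here is a minimal sketch that builds a SPARQL query string for the Wikidata Query Service listing proteins and whatever they are linked to through a given property. The helper name `family_links_query` is hypothetical. Matching proteins through either instance of (P31) or subclass of (P279) is an assumption about how the items are modelled, and the optional `family_class` filter expects a QID for a "protein family" class that the reader has to supply.

```python
from typing import Optional


def family_links_query(
    link_property: str = "P361",   # part of (P361), as used today per the proposal
    domain_class: str = "Q8054",   # protein (Q8054), the proposal's domain
    family_class: Optional[str] = None,  # QID of a family class; caller-supplied
    limit: int = 100,
) -> str:
    """Return a SPARQL query string; nothing is sent over the network."""
    # Optional restriction of the link target to instances of a family class.
    family_filter = (
        f"  ?family wdt:P31 wd:{family_class} .\n" if family_class else ""
    )
    return (
        "SELECT ?protein ?family WHERE {\n"
        f"  ?protein wdt:P31|wdt:P279 wd:{domain_class} ;\n"
        f"           wdt:{link_property} ?family .\n"
        f"{family_filter}"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(family_links_query())
```

Without the `family_class` filter, the query returns every part of (P361) target of a protein item, whether or not that target is a family, which is the ambiguity described above. A dedicated property would make the extra filter unnecessary.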

Discussion

  WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. TiagoLubiana (talk) 20:02, 1 July 2022 (UTC)

@SCIdude: and @ChristianKl: What about setting this as a subproperty of subclass of (P279)? Would you be okay with that? Cheers, TiagoLubiana (talk) 15:10, 18 August 2022 (UTC)

No, I think subproperties of subclass of are generally a bad idea. I also haven't seen you bring forward any advantage of having the property. I see only that it encourages people to have different ideas about what our items are about, which I consider a disadvantage. ChristianKl 15:26, 18 August 2022 (UTC)
  •   Comment Reading the discussion, I cannot decide why "subclass of" is not enough, but I also do not understand the opposition. For small chemicals 'subclass of' is used too, and it seems to work fine. The proposal is not very specific about what it solves and reads as a "nice to have", and with the current information I cannot decide whether a new property solves more problems than it creates. Questions I have (I will let you think about these before I post them): 1. Why was "part of" originally used when "subclass of" was also an option? What were the reasons against "subclass of" then? 2. Are there examples where properties of the "class" do not apply to the subclasses? 3. You mention "mental models of what protein items"... what models are those, and which would cause problems? --Egon Willighagen (talk) 11:50, 19 August 2022 (UTC)
    It harms our general ontology when we create new properties where the existing ones already express the relationship. If properties are not used in a consistent fashion, it's harder for new users to learn how the properties are supposed to be used within Wikidata. ChristianKl 12:40, 19 August 2022 (UTC)

Thanks everyone for the comments and feedback. Answering @Egon Willighagen:

  • 1) I am not sure why; a lot of that modeling was done during the Gene Wiki project (maybe @Andrawaag: will have some insights there). I'd guess most biologists would default to "membership in a set" when talking about protein families, even though that could be more precise. I think this is the first large discussion about that on Wikidata (I might be wrong, though).
  • 2) I'd say there are some cases of evolutionary loss of function, but the protein would still be classified in the given family. That is no different from, e.g., the class "humans", which lost full-body hair relative to its superclass "mammals".
  • 3) The mental models I refer to are basically confusions w.r.t. partonomy/subclassing and whether items are instances of "protein" or subclasses of "protein", or even both (e.g. see https://www.w3.org/2007/OWL/wiki/Punning)

Answering @ChristianKl: the goal of the property was to avoid using "has part"/"part of" for protein families, in practice to avoid conflict with the ways protein domains and pathways are modeled on Wikidata (e.g. see Proteolytic degradation of ubiquitinated-Cdc25A (Q36804547) and FAM20, C-terminal domain, protein family (Q83140120)). If we reach a consensus that subclass of (P279) should be used instead, I'd be happy with that. It will then be a matter of updating/adapting the ProteinBoxBot code and the items there. TiagoLubiana (talk) 21:05, 22 August 2022 (UTC)

The main way to find consensus about a data model would be to propose a data model on the relevant WikiProject. Adding extra properties to solve problems that would best be addressed by discussing the desired data model increases complexity without creating agreement about how data is uniformly modeled.
Ideally, we would have model item (P5869) for biological process (Q2996394), biological pathway (Q4915012), protein (Q8054), protein domain (Q898273), and gene (Q7187). While we are at it, a first-order class for protein (Q8054), a decision on whether gene (Q7187) is first order or second order, and creation of the missing value would be great as well. ChristianKl 11:57, 23 August 2022 (UTC)
@ChristianKl: I agree. I did try raising the discussion on WikiProject Molecular Biology (Wikidata talk:WikiProject Molecular biology#Update the modelling of protein families?), but got no response there for a month and a half before proposing the property. How would you suggest proceeding? TiagoLubiana (talk) 21:27, 2 September 2022 (UTC)
@TiagoLubiana: I see the problem as being the lack of a data model, to be solved by a new page that proposes one. Ideally, it would be a clear proposal that people can say yes/no to. Afterward, we might ping a bunch of people to get a discussion. ChristianKl 22:23, 2 September 2022 (UTC)