Wikidata:Property proposal/etymology

derived from

Originally proposed at Wikidata:Property proposal/Lexemes

Done: derived from lexeme (P5191) (Talk and documentation)

Description	Lexemes this one has developed from
Represents	etymon (Q992080)
Data type	Lexeme
Domain	lexeme
Allowed values	older lexemes
Example	en:etymology -> ancient greek:ἔτυμον, ancient greek:-λογία
Source	etymological dictionaries and research
Planned use	https://twitter.com/mewo2/status/959120452575809536/photo/1
Number of IDs in source	potentially all lexemes
Expected completeness	always incomplete (Q21873886)
Robot and gadget jobs	etymology connections should not be circular
See also	based on (P144)
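
The "Robot and gadget jobs" constraint above (etymology connections should not be circular) could be checked by a bot along the following lines. This is only a sketch under stated assumptions: the `derived_from` mapping and the function name `find_cycle` are hypothetical, and the lexeme IDs in the usage note are made up for illustration. It is not an existing Wikidata tool.

```python
from typing import Dict, List, Optional

# Assumption: the "derived from" statements have already been exported into
# a plain mapping of lexeme ID -> list of lexeme IDs it is derived from.


def find_cycle(derived_from: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of lexeme IDs, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for parent in derived_from.get(node, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:
                # The parent is already on the current path: circular chain.
                return path[path.index(parent):] + [parent]
            if parent_state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for lexeme in derived_from:
        if state.get(lexeme, WHITE) == WHITE:
            cycle = visit(lexeme)
            if cycle:
                return cycle
    return None
```

With the made-up IDs `{"L1": ["L2"], "L2": ["L3"], "L3": ["L1"]}` the function reports the chain `L1, L2, L3, L1`. The recursion is fine for short etymological chains; a very deep graph would need an iterative variant.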

Motivation

One of the original reasons for a dictionary. --Denny (talk) 17:35, 17 April 2018 (UTC)

Discussion

Comment I think we should name this property "etymon" instead of "etymology". Just a word is not really an etymology (we should think of other properties to describe the etymology of a word). Tubezlob (🙋) 20:14, 18 April 2018 (UTC)
Good point, agreed. --Denny (talk) 16:00, 20 April 2018 (UTC)
Support. YULdigitalpreservation (talk) 12:44, 20 April 2018 (UTC)
Support with Tubezlob's condition. NMaia (talk) 22:38, 21 April 2018 (UTC)
Support for "etymon" John Samuel 19:45, 22 April 2018 (UTC)

Comment I took the liberty of changing the proposal according to the votes so far. NMaia (talk) 11:54, 23 April 2018 (UTC)

Support. Tubezlob (🙋) 12:46, 23 April 2018 (UTC)
Support --Barcelona (talk) 08:27, 24 April 2018 (UTC)
Support with alias "derived from" -- JakobVoss (talk) 12:05, 24 April 2018 (UTC)

Discussion

@Denny, Tubezlob, YULdigitalpreservation, NMaia, Jsamwrites, Barcelona: is this the right approach? « etymon » (or « root ») is the oldest word from which a word derives. I would prefer a "derived from" property where we would store only the closest word. For example, "etymology"@en -> "ethimologie"@frm (and on this latter lexeme, "ethimologie"@frm -> "etymologia"@la and so on). It would be more granular and richer (and could be used to build etymology graphs). My question is: should we have both, or should we only have the more precise one? (Sure, a bit of redundancy could always be useful, but I prefer to avoid it if not really necessary.) Cdlt, VIGNERON (talk) 10:21, 24 April 2018 (UTC)

@VIGNERON: OK, your proposal is much better. I had not understood that the etymon was the oldest word. I think the best option is to have only the more precise one; the etymon will then be very easy to determine. Tubezlob (🙋) 11:37, 24 April 2018 (UTC)

We could introduce a more specific etymon property later; please start with a property that is not too complicated. -- JakobVoss (talk) 12:09, 24 April 2018 (UTC)

@JakobVoss: true. But complications often come from a lack of clarity. So, could you please clarify how you see it? VIGNERON (talk) 12:40, 24 April 2018 (UTC)

@VIGNERON: I'd start with one generic property that links words to their roots, precursors, or earlier forms of any kind. More specific properties may be added later in a top-down approach based on larger sets of actual entries instead of a limited number of examples. -- JakobVoss (talk) 12:47, 24 April 2018 (UTC)

Agreed. I was thinking of the "closest word", not the "deepest root". I don't think we need a different proposal, I think we can just adapt this one accordingly. --Denny (talk) 16:02, 24 April 2018 (UTC)
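
To illustrate why storing only the "closest word" loses nothing, here is a minimal sketch of how the "deepest root" could be recovered by following the chain. The mapping, the `label@language` keys and the function name `deepest_roots` are assumptions made for illustration; the chain itself is the one given in the example above.

```python
from typing import Dict, List


def deepest_roots(lexeme: str, derived_from: Dict[str, List[str]]) -> List[str]:
    """Follow "derived from" links and return the lexemes with no parent."""
    roots: List[str] = []
    seen = set()  # guards against accidental circular chains
    stack = [lexeme]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents and current != lexeme:
            roots.append(current)
        stack.extend(parents)
    return sorted(roots)


# Hypothetical keys; only the closest word is stored on each lexeme.
derived_from = {
    "etymology@en": ["ethimologie@frm"],
    "ethimologie@frm": ["etymologia@la"],
}
roots = deepest_roots("etymology@en", derived_from)
```

Here `roots` contains only `"etymologia@la"`, because that is where the recorded chain stops; adding an earlier lexeme to the data would move the root further back without touching the statement on "etymology"@en.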

Another question: should it really be only on the lexeme level? I wonder if it should be at the form level too (or instead?), to cover words with suppletion (Q324982) (see the article; tl;dr: suppletion is quite common, exists in almost all languages at various levels, and mostly affects the most common words, which are likely to be the first that someone would want to enter in Wikidata). For instance, the infinitive "to go"@en comes from "gon"@enm but the past tense "went"@en comes from "wendan"@ang (common verbs like "to be", "to have", "to go" are highly suppletive). Maybe it's possible to use it only at the lexeme level with some qualifiers, but I'm not sure how (applies to part (P518) and a link to the corresponding forms? A bit crude, but it could work for a start). Cdlt, VIGNERON (talk) 12:45, 24 April 2018 (UTC)

@VIGNERON: Interesting. For this type of case (suppletion for verbs), the best option seems to be qualifiers. But as said here, we have never really discussed the data model for conjugation (that's a lot of forms…). Tubezlob (🙋) 15:52, 24 April 2018 (UTC)

I would keep it on the Lexeme level, and use qualifiers in those cases. My assumption is that, even though they occur in high-frequency words, they are rather rare in absolute terms. --Denny (talk) 16:02, 24 April 2018 (UTC)

@Tubezlob: it's not only for verbs. The adjectives "good" and "bad" are often suppletive for comparative and superlative too ("good"@en -> "best"@en ; "bon"@fr -> "meilleur"@fr).

@Denny: exactly. Even in Breton, where it is "quite common" (MacAulay, 1992, p. 448), it's still rather rare in absolute terms. Still, we need a way to model these irregularities; the qualifier applies to part (P518) should do the job, see Lexeme:L554.

Cdlt, VIGNERON (talk) 07:01, 25 April 2018 (UTC)
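
The qualifier approach discussed above could be sketched as follows. This is an illustration only: the `DerivedFrom` class, the `etymons_for_form` helper and the form identifiers are hypothetical, and the fallback rule (a statement without the qualifier covers every form not named elsewhere) is an assumption, not something the discussion settled. Only the "go"/"went" example comes from the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DerivedFrom:
    """One derived from (P5191) statement on a lexeme."""
    target: str  # the lexeme this one is derived from
    # Forms listed in an applies to part (P518) qualifier; empty = whole lexeme.
    applies_to_part: List[str] = field(default_factory=list)


def etymons_for_form(statements: List[DerivedFrom], form_id: str) -> List[str]:
    """Return the etymons relevant to one form of the lexeme."""
    specific = [s.target for s in statements if form_id in s.applies_to_part]
    if specific:
        return specific
    # Assumed fallback: unqualified statements cover the remaining forms.
    return [s.target for s in statements if not s.applies_to_part]


# Statements on the lexeme "to go"@en, with hypothetical form identifiers.
go_statements = [
    DerivedFrom(target="gon@enm"),
    DerivedFrom(target="wendan@ang", applies_to_part=["went"]),
]
past = etymons_for_form(go_statements, "went")
present = etymons_for_form(go_statements, "goes")
```

Under these assumptions `past` resolves to "wendan"@ang and `present` to "gon"@enm, so the statement stays on the lexeme while the suppletive form is still modelled.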

@Denny, Tubezlob, YULdigitalpreservation, NMaia, Jsamwrites, Barcelona: Lexemes should be available tomorrow; could we move on and settle on a name for this property? "derived from"? (At least, as previously said, not "etymon", which is too specific.) And maybe "previous" rather than "older" in the description? Cdlt, VIGNERON (talk) 12:37, 22 May 2018 (UTC)

Both sound good to me! Changed it in the template above. (Also, remember, we can change labels and descriptions later, too.) --Denny (talk) 17:50, 22 May 2018 (UTC)

@Denny, Tubezlob, YULdigitalpreservation, NMaia, Jsamwrites: @Barcelona, Lea Lacroix (WMDE), JakobVoss, VIGNERON: Done - enjoy! ArthurPSmith (talk) 13:36, 23 May 2018 (UTC)