Wikidata:Property proposal/root

root

Originally proposed at Wikidata:Property proposal/Lexemes

Done: root (P5920) (Talk and documentation)

Description	Part of a word that does not have a prefix or a suffix.
Represents	root (Q111029)
Data type	Lexeme
Example	Lexeme:L2233 (كتاب) → Lexeme:L2308 ك ت ب‎ (k-t-b)
Expected completeness	always incomplete (Q21873886)
See also	Chinese character radical (Q849778), proposed; derived from lexeme (P5191)

Motivation

Needed for Arabic and other languages. For more examples see wiktionary:Category:Arabic roots. Kolja21 (talk) 02:10, 4 June 2018 (UTC)

Discussion

Question @Denny, JakobVoss: I'm not sure how to separate this property from derived from lexeme (P5191). Is there an inverse property for derived from lexeme (P5191)? --Kolja21 (talk) 12:27, 4 June 2018 (UTC)

IMHO we should differentiate between root (Q111029) and etymon (Q992080). Both are aspects of etymology (Q35245), the history of a word, and both are used in the sense of "derived from".
Example: Lexeme:L20 (twenty, English) → etymon: Lexeme:L435 (twentig, Old English)

We also have combines lexemes (P5238) and word stem (P5187). Without good examples and documentation explaining the usage of these properties, I would not want to create another one, especially without input from native speakers and linguists. -- JakobVoss (talk) 06:59, 5 June 2018 (UTC)

Thank you for reminding me of word stem (P5187). Root (Wortwurzel) is a subclass of word stem (P5187) (Wortstamm). (Quote from deWP: "A stem can itself be composite, that is, already the product of a word-formation rule, or it can be an elementary, indivisible unit, that is, a root.") I put the proposal on hold. --Kolja21 (talk) 20:40, 5 June 2018 (UTC)

Question So with word stem (P5187) and derived from lexeme (P5191) already existing, what is the additional value of a "root" property? In many languages (including German), forms are derived from some form of "root" or "stem", which often is not a lexeme in its own right (e.g. the German verb "trinken" has the root "trink-", which is not a lexeme). There may even be a need for further markup in the stem, e.g. to mark the stem vowel "tr(i)nk-". So the stem property seems more useful for deriving morphology. Referring to a root lexeme seems useful mostly for etymological purposes, which can be covered with derived from lexeme (P5191), possibly with an appropriate qualifier. -- Duesentrieb (talk) 13:44, 6 June 2018 (UTC)

The value of root (Q111029) is that we can use the correct grammatical term. Linguistics is complicated enough. If we start playing around with technical terms ("root" is almost a synonym of X and 90% a subclass of Y), we create problems that can hardly be solved later. --Kolja21 (talk) 19:31, 6 June 2018 (UTC)

On the other hand, having multiple properties that are very similar makes queries hard and causes confusion.

But more importantly, it seems like introducing this property encourages the creation of roots as separate Lexemes. Perhaps it would be better to first discuss whether this is wanted before creating the property. I can see the value, but I also see problems with that approach. For one thing, linguistically, roots are not necessarily lexemes, are they? -- Duesentrieb (talk) 14:28, 7 June 2018 (UTC)

@Duesentrieb: Sorry, writing in German, since this topic is getting too complicated for me in English. Your comment raises a new question: should further namespaces be created alongside lexemes (L), forms (F) and senses (S)? I haven't read anything about that so far, so I assume that the term "lexeme" is interpreted as broadly as possible, i.e. it covers all entries from Wiktionary. But even if we narrow the term down to the definition given in Wikipedia, "lexemes provide an ordering of the vocabulary of a language", you only need to pick up a classic dictionary such as the Dictionary of Modern Written Arabic (Q4117629), founded by Hans Wehr. Wehr sorts Arabic words by their roots. The root (Q111029) is therefore a lexeme par excellence. We don't create confusion by adopting terms that are clearly defined in linguistics as properties. Quite the opposite: the result of an automated query is worthless if the underlying data has been entered carelessly. --Kolja21 (talk) 00:55, 8 June 2018 (UTC)

You say that it's useful and even essential to support the linguistic concept of "root". I agree, but I'm not fully convinced that it's useful to have separate properties for "stem" and "root". Is there any situation where both would be used, but with different values? If not, we could just make "root" an alias for "stem". I'm even more sceptical about treating roots as lexemes. The fact that a dictionary uses them for indexing doesn't, to my mind, support this approach. Couldn't we just treat them as simple text values instead of making them references to entities? Roots do not have forms or senses; is there a strong need for making statements about roots? If not, I'd suggest not treating them as lexemes but using text values instead, as we do with word stem (P5187). -- Duesentrieb (talk) 12:15, 15 June 2018 (UTC)

A simple text value is useless. We need the information given in wiktionary:Category:Arabic terms by root. This is fundamental to the Arabic language. Roots do not have senses? Of course they do! That's the fun of it, and that's why kids learn them in school: ج ل ب ب, related to being dressed. --Kolja21 (talk) 13:15, 15 June 2018 (UTC)

What do you mean by the information given in the category? If you want to find all Lexemes that have a given root, this can easily be done with a query, and whether the root is given as a text value or a Lexeme reference makes no difference at all.

What do you mean by ج ل ب ب, related to being dressed? Is "being dressed" actually a sense of the word?

I'm not totally opposed to having roots as Lexemes for Arabic; I'm trying to understand why this is important for Arabic, while it seems redundant or misleading for other languages. It makes no sense to me to have the German root "geh" as a Lexeme, even though it's the root of "gehen" (of which "geh" is also the imperative Form), "ausgehen", "Gang", etc. I wouldn't know what senses to attach to that root, either. We could perhaps use a statement to link it to a concept it evokes (walking (Q6537379)), but that's not a Sense. But perhaps it's worth having... is that what you are aiming at? -- Duesentrieb (talk) 12:08, 18 June 2018 (UTC)
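The point that finding all lexemes with a given root is equally easy whether the root is stored as text or as a lexeme reference can be sketched in Python. This is a minimal illustration over a hypothetical in-memory list, not a query against Wikidata; apart from L2233 and L2308 from the example above, the identifiers (such as `L_HYP_1`) and the field names `root_text` and `root_ref` are placeholders invented for the sketch.

```python
# Hypothetical toy data: each lexeme stores its root both ways,
# as a plain text value (root_text) and as a reference to a root lexeme (root_ref).
# L2233 -> L2308 (k-t-b) comes from the proposal's example; all other IDs are made up.
LEXEMES = [
    {"id": "L2233", "lemma": "kitāb", "root_text": "k-t-b", "root_ref": "L2308"},
    {"id": "L_HYP_1", "lemma": "kātib", "root_text": "k-t-b", "root_ref": "L2308"},
    {"id": "L_HYP_2", "lemma": "jilbāb", "root_text": "j-l-b-b", "root_ref": "L_HYP_ROOT"},
]


def lexemes_by_root_text(root_text):
    """Return the IDs of all lexemes whose root is stored as this text value."""
    return [lex["id"] for lex in LEXEMES if lex["root_text"] == root_text]


def lexemes_by_root_ref(root_ref):
    """Return the IDs of all lexemes whose root points to this root lexeme."""
    return [lex["id"] for lex in LEXEMES if lex["root_ref"] == root_ref]


# Both lookups are a single filter and agree on this data.
print(lexemes_by_root_text("k-t-b"))  # ['L2233', 'L_HYP_1']
print(lexemes_by_root_ref("L2308"))   # ['L2233', 'L_HYP_1']
```

In this toy case the two lookups are equally simple and return the same words; they only diverge when two distinct roots share the same spelling, since equal text values cannot be told apart.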

"geh" is not a Lexeme, and Arabic does not have the same grammar as German. You could call "being dressed" the sense of ج ل ب ب, but it's not a word, it's a root. If you know the sense of the root, you will have a deeper understanding of the senses of the words connected to this root. If it's a verb, you need the root to know how to conjugate it, etc. None of this applies to "geh". For example, there is no dual in German, but the dual still exists. For the same reason, we need the property "root" even if we can't use it for German. --Kolja21 (talk) 12:48, 18 June 2018 (UTC)

Sure, I'm not saying just because something isn't useful for German grammar it shouldn't exist here, or can't be useful for other languages. Some properties clearly make sense for some language family, and not for others.

My point is actually to the contrary: In German you do have to know the root of a verb in order to conjugate it. And knowing the concept associated with the root does give you a deeper understanding of all the words derived from it. Yet nobody has apparently felt the need to manage roots as separate Lexemes for German.

Being able to represent the root of a word is definitely important. I'm trying to find out a) why this has to be separate from the stem and b) why the root should be represented as a lexeme instead of just text. And again, I'm not totally opposed to your proposal, I'm just unconvinced. You are suggesting adding considerable complexity by managing roots as Lexemes. I'm trying to understand the expected benefit of this more "expensive" approach, so we can decide whether it is worth the extra effort. -- Duesentrieb (talk) 10:30, 20 June 2018 (UTC)

Taking a step back: I support this property if we want to represent roots as lexemes. If we don't, and "root" is just plain text, I think the stem property is probably sufficient. But the big question is whether we want to have roots as lexemes, at least for some languages. I see some value in it, but I'm not sure it's worth the added complexity. Perhaps this question is worth a broader discussion. -- Duesentrieb (talk) 09:04, 23 June 2018 (UTC)

As I explained to you (in German) above, an Arabic root is a lexeme, and we already have Arabic roots as lexemes. So you are taking two steps back. --Kolja21 (talk) 13:46, 23 June 2018 (UTC)

Support --SR5 (talk) 08:09, 13 August 2018 (UTC)
Comment I removed "on hold". If you don't want to go through with the proposal, please withdraw it. "On hold" generally means approved for creation, but awaiting the datatype. --- Jura 09:51, 15 September 2018 (UTC)
Support I need this for Maltese words. word stem (P5187) is not sufficient because it doesn't allow us to distinguish roots with the same letters but different etymologies. For example, a root containing ħ can be from an Arabic root containing ح or from one containing خ. I wouldn't object to limiting this to Semitic languages for now, we can always expand the scope later if we need to. If we want a clearer label, we could call it "Semitic root", like the English Wikipedia page. - Nikki (talk) 13:52, 15 September 2018 (UTC)
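Nikki's argument about identically spelled roots can be illustrated with a minimal Python sketch. It assumes two hypothetical Maltese root lexemes that are spelled the same (written here as the placeholder `ħ-x-y`, not a real root) but go back to different Arabic roots, one with ح and one with خ; the root and word identifiers are likewise invented for illustration and are not Wikidata entities.

```python
from collections import defaultdict

# Two hypothetical root lexemes: same Maltese spelling, different Arabic origin.
ROOTS = {
    "L_ROOT_A": {"spelling": "ħ-x-y", "arabic_origin": "ح"},
    "L_ROOT_B": {"spelling": "ħ-x-y", "arabic_origin": "خ"},
}

# Hypothetical words, each linked to one of the two roots.
WORDS = [
    ("L_WORD_1", "L_ROOT_A"),
    ("L_WORD_2", "L_ROOT_A"),
    ("L_WORD_3", "L_ROOT_B"),
]


def group_by_text(words):
    """Group words by the root's spelling, as a plain text value would."""
    groups = defaultdict(list)
    for word_id, root_id in words:
        groups[ROOTS[root_id]["spelling"]].append(word_id)
    return dict(groups)


def group_by_reference(words):
    """Group words by the root lexeme they point to."""
    groups = defaultdict(list)
    for word_id, root_id in words:
        groups[root_id].append(word_id)
    return dict(groups)


print(group_by_text(WORDS))       # {'ħ-x-y': ['L_WORD_1', 'L_WORD_2', 'L_WORD_3']}
print(group_by_reference(WORDS))  # {'L_ROOT_A': ['L_WORD_1', 'L_WORD_2'], 'L_ROOT_B': ['L_WORD_3']}
```

Grouping by the text value merges all three words into one group, while grouping by the referenced root keeps the two etymologies apart; the Arabic origin can then be recorded once on each root lexeme.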
Support Also useful for Russian words. And there is a difference between stem and root: e.g. in the word "некрасивый" the stem is "некрасив" while the root is "крас". --Infovarius (talk) 10:16, 23 September 2018 (UTC)

@Infovarius, Nikki, Jura1, Kolja21, SR5: @Duesentrieb, JakobVoss: Done: root (P5920) − Pintoch (talk) 07:09, 26 September 2018 (UTC)