Wikidata talk:Lexicographical data/Archive/2016/01

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Basic structure

(en, "A person of Slavic origins.")

  |
  |__ (en, Slav, (noun, -s), ...)
            |_______________ Slav, Slavs
  |
  |__ (sv, slav, (noun, common, -er), ...)
            |_______________ slav, slaven, slaver, slaverna

(en, "A person who is considered as the property of another.")

  |
  |__ (en, slave, (noun, -s), ...)
            |_______________ slave, slaves
  |
  |__ (sv, slav, (noun, common, -ar), ...)
            |_______________ slav, slaven, slavar, slavarna

(sv, "Ofri person")

  |
  |__ (en, slave, (noun, -s), ...)
            |_______________ slave, slaves
  |
  |__ (sv, slav, (noun, common, -ar), ...)
            |_______________ slav, slaven, slavar, slavarna

(sv, "Person som kan associeras till en slavisk folkgrupp")

  |
  |__ (en, Slav, (noun, -s), ...)
            |_______________ Slav, Slavs
  |
  |__ (sv, slav, (noun, common, -er), ...)
            |_______________ slav, slaven, slaver, slaverna

Bonds

  • (sv, "Person som kan associeras till en slavisk folkgrupp", "A person of Slavic origins.", en)
  • (sv, (slav, -ar), (slave, -s), en)
  • (sv, (slav, -er), (Slav, -s), en)
  • (sv, slavar, slaves, en)

The data structure could of course be compressed. There is no need to store the inflection n times, and there could be fancy ways of doing the paradigms etc. But the general idea is that we do away with the idea of a "sense". And note that I wouldn't make the bond (sv, "Ofri person", "A person who is considered as the property of another.", en), as I don't think these two definitions are equivalent, although I would make the bond (sv, (slav, -ar), (slave, -s), en). - Francis Tyers (talk) 10:21, 7 August 2013 (UTC)
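For readers who prefer code, here is a minimal Python sketch of the definition-centred structure and the bonds listed above. It is only an illustration: the names `Definition`, `Lexeme`, `entries` and `bonds` are hypothetical and not part of any proposal or of Wikidata's data model. The data is the slav/slave example from the top of this section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Definition:
    lang: str
    text: str


@dataclass(frozen=True)
class Lexeme:
    lang: str
    lemma: str
    pos: str
    paradigm: str            # plural marker as written above: "-s", "-er", "-ar"
    forms: Tuple[str, ...]   # stored once per lexeme, not once per definition


en_Slav = Lexeme("en", "Slav", "noun", "-s", ("Slav", "Slavs"))
en_slave = Lexeme("en", "slave", "noun", "-s", ("slave", "slaves"))
sv_slav_er = Lexeme("sv", "slav", "noun", "-er",
                    ("slav", "slaven", "slaver", "slaverna"))
sv_slav_ar = Lexeme("sv", "slav", "noun", "-ar",
                    ("slav", "slaven", "slavar", "slavarna"))

en_slavic = Definition("en", "A person of Slavic origins.")
en_property = Definition(
    "en", "A person who is considered as the property of another.")
sv_unfree = Definition("sv", "Ofri person")
sv_slavic = Definition(
    "sv", "Person som kan associeras till en slavisk folkgrupp")

# Each definition points at the lexemes it can be applied to.
entries = {
    en_slavic: (en_Slav, sv_slav_er),
    en_property: (en_slave, sv_slav_ar),
    sv_unfree: (en_slave, sv_slav_ar),
    sv_slavic: (en_Slav, sv_slav_er),
}

# Bonds are plain pairs; they may link definitions, lexemes or single forms.
bonds = {
    (sv_slavic, en_slavic),
    (sv_slav_ar, en_slave),
    (sv_slav_er, en_Slav),
    (("sv", "slavar"), ("en", "slaves")),
}
```

Note that `sv_unfree` and `en_property` attach to the same lexemes in `entries` without being bonded to each other, which mirrors the point made in the comment above.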

I also thought about the idea of having a difference between "strong bonds" (validated by a contributor) and "weak bonds" (inferred by the system) e.g. given (sv, (slav, -ar), (slave, -s), en) you could infer (sv, slavar, slaves, en), but this may not be as accurate.
Furthermore, it should be possible to have bonds between definitions in the same language, e.g. (en, "A person that can be associated with a Slavic ethnic group.", "A person of Slavic origins.", en) - Francis Tyers (talk) 11:05, 7 August 2013 (UTC)
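A rough Python sketch of how "weak bonds" between forms might be inferred from a "strong bond" between lexemes. It assumes each form is tagged with its grammatical number and pairs forms whose number matches; that tagging, and the function name `infer_weak_bonds`, are assumptions made for this illustration only.

```python
from typing import List, Set, Tuple

TaggedForm = Tuple[str, str]  # (form, number), number being "sg" or "pl"

# Forms of the two lexemes in the strong bond (sv, (slav, -ar), (slave, -s), en).
# The number tags are an assumption of this sketch.
SV_SLAV_AR: List[TaggedForm] = [
    ("slav", "sg"), ("slaven", "sg"), ("slavar", "pl"), ("slavarna", "pl"),
]
EN_SLAVE: List[TaggedForm] = [("slave", "sg"), ("slaves", "pl")]


def infer_weak_bonds(left: List[TaggedForm],
                     right: List[TaggedForm]) -> Set[Tuple[str, str]]:
    """Pair every left form with every right form of the same number."""
    return {(lf, rf) for lf, ln in left for rf, rn in right if ln == rn}


weak_bonds = infer_weak_bonds(SV_SLAV_AR, EN_SLAVE)
```

With this data the inferred set contains ("slavar", "slaves"), but also ("slaven", "slave"), which drops the Swedish definiteness. That is one way in which inferred bonds can be less accurate than validated ones.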
There are many advantages of making the lexeme the fundamental entity, and not the word sense. For one, word senses are much more "fluid" and less agreed on than lexemes, and since adding and merging word senses would be a much heavier operation if they were the fundamental entity than when they are just a sub-entity of the lexeme, this does matter for maintenance of the data. Second, a word sense can, in my opinion, only apply to a single word -- no two lexemes (whether from the same or from different languages) can share one word sense -- but this is where you disagree with me (check out en:word sense).
I don't really get what you mean here, that two lexemes cannot share a sense, or that a single sense cannot be applied to more than one word. In the above example, the Swedish "sense/definition" "Ofri person" could be applied to "serf", "indentured servant", "prisoner". Or in English/Croatian "Place where aeroplanes take off and land from" could be applied to "aerodrom" "aerodrome" "zračna luka" "airport" (and also Norwegian: "flyplass" "lufthavn"). - Francis Tyers (talk) 13:17, 7 August 2013 (UTC)
The word senses of "aerodrom" and "zračna luka" are different because they are different words. So it cannot be the same word sense. But they can both mean the same thing. It is OK to model it differently, but in this proposal we center it around the lexeme. If you want to make a proposal that centers around a meaning or definition, great. You probably want to take a look at OmegaWiki: that is the way they are doing it, by using a "defined meaning". You might be interested in exploring it there. --Denny (talk) 15:17, 7 August 2013 (UTC)
From what I understand of it, an OmegaWiki-style model is what Francis is opposed to. -- Jimregan (talk) 15:59, 7 August 2013 (UTC)
Now I am confused. OmegaWiki centers around defined meanings. Isn't that exactly what Francis suggests? --Denny (talk) 04:57, 8 August 2013 (UTC)
Yes, I'm definitely 100% against the OmegaWiki model. - Francis Tyers (talk) 10:45, 8 August 2013 (UTC)
What I am proposing is completely orthogonal to DefinedMeaning. What I term "definition" you can call "gloss" if you want. - Francis Tyers (talk) 10:56, 8 August 2013 (UTC)
As you already say, word forms should be shared among different senses automagically. This requires a quite complicated setup, which already indicates that there might be something not completely ideal in the way that it is conceptualized. Making the lexeme fundamental, with senses and forms as its dependents, avoids both problems.
I do understand your idea of making the "meaning" of the word the foundational entity. That's what I thought first too, but having read up a bit on the research in the area, I have come to the understanding that taking the lexeme leads to a much cleaner design and an overall system that is easier to understand. This is also why I avoid "meaning" and speak of "word sense". Note that if you really want a "hard meaning", for many words the sense can just refer to the appropriate Wikidata item. The Wikidata items, based on Wikipedia articles, basically provide us with a ground set of "meanings" to which we can attach the senses. I hope this is convincing, I am afraid I am not doing a great job explaining it. --Denny (talk) 12:24, 7 August 2013 (UTC)
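For comparison, a minimal Python sketch of the lexeme-centred alternative described in this comment: the lexeme owns its forms and its senses, and a sense may optionally refer to a Wikidata item. The class and field names are hypothetical and the `item` field is deliberately left unset, since no item IDs are given in this discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    gloss: str
    item: Optional[str] = None  # optional Wikidata item ID; unset in this sketch


@dataclass
class Lexeme:
    lang: str
    lemma: str
    forms: List[str]                                   # shared by all senses
    senses: List[Sense] = field(default_factory=list)  # owned by this lexeme


sv_slav_ar = Lexeme(
    "sv", "slav",
    ["slav", "slaven", "slavar", "slavarna"],
    [Sense("Ofri person")],
)
en_slave = Lexeme(
    "en", "slave",
    ["slave", "slaves"],
    [Sense("A person who is considered as the property of another.")],
)

# Each sense belongs to exactly one lexeme; nothing is shared across lexemes.
all_senses = [s for lex in (sv_slav_ar, en_slave) for s in lex.senses]
```

In this layout the forms are stored once per lexeme regardless of how many senses it has, and no sense object is reachable from more than one lexeme.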
I'm not talking about having the "meaning" as central, I'm talking about having the definition as central. I am not interested in a "hard meaning" as I don't believe that "hard meanings" (see Wittgenstein) or "word senses" (see Kilgarriff, 1997) really exist. - Francis Tyers (talk) 13:17, 7 August 2013 (UTC)
What is the practical difference between a "meaning" and a "definition"? Why would Wittgenstein not consider your "definition" as a meaning, or Kilgarriff not consider your "definition" as a sense? I think Wittgenstein's point was that you cannot capture a meaning anyway. Also, Kilgarriff's point is that senses cannot be well split up, that the borders are fuzzy - that's what I mentioned above. But it seems to me that both arguments would support not using "meaning"/"definition"/"sense" as the central structure. "Lexeme"/"word" still seems much more stable to me. --15:17, 7 August 2013 (UTC)
The practical difference is that a meaning is of a word, whereas you may have many definitions that express a single meaning. Kilgarriff would not consider the definitions as senses, they are more like his "word in context", where the context happens to be a definition. - Francis Tyers (talk) 10:45, 8 August 2013 (UTC)
Oh, and we also can simply agree to disagree if you prefer. We might get a bit off-track here. Cheers, --15:17, 7 August 2013 (UTC)
That's fine by me too. I don't want to interfere, I just wanted to clarify some points. I don't have time to do the leg work ... and the person who does have the time should make the decisions. - Francis Tyers (talk) 10:45, 8 August 2013 (UTC)
It is necessary to be careful to distinguish between "word senses" and "meanings"; as Francis says, interlingual "hard" meanings are quite impractical and possibly don't even really exist. What Kilgarriff means when he says "I don't believe in word senses" is that word senses constitute the meaning of a word for a given task, e.g., in translation it can be assumed that each possible translation is a word sense. In the context of Wiktionary it is clear that the task is to give definitions of words, and as such each definition (gloss) is a word sense. The other issue that seems to be clashing here is the grouping of senses into larger meanings, something like "synsets" in WordNet. This could be useful, but as it is not something currently done in Wiktionary (to the best of my knowledge), is it something that should be introduced by Wikidata? Johnmccrae (talk) 14:59, 7 August 2013 (UTC)
Wikisaurus does this in en.wiktionary, but I haven't seen anything similar in any of the other wiktionaries. -- Jimregan (talk) 15:58, 7 August 2013 (UTC)
A gloss is not a definition, and I explicitly opted for a gloss here. Definitions are much harder, and can, in particular, live without their lexeme. A gloss without the lexeme though usually is insufficient. --Denny (talk) 04:57, 8 August 2013 (UTC)
Also, my question remains unanswered: what is the practical difference between a definition and a "hard meaning"? --Denny (talk) 04:57, 8 August 2013 (UTC)

I know this is an old conversation, but I just wanted to agree and add some weight to why a gloss or definition cannot be the primary organizing entity: it would mean that, once written, a definition could practically never be updated.

Take for example the current first definition of feather in English:

  1. A branching, hair-like structure that grows on the wings of birds that allows their wings to create lift.

Say that I notice that this definition is overly narrow, as it excludes down feathers and penguin feathers. What happens if I change it to:

  1. A branching, hair-like structure that grows on the wings or body of birds typically allowing their wings to create lift

Right now, making that edit is easy enough. If I wanted to do that with the first structure listed in this conversation, then I'd need to edit every gloss line in something that looks vaguely like this (machine translation used to create the example):

(sense) S2012

    (gloss) (en) A branching, hair-like structure that grows on the wings of birds that allows their wings to create lift.
    (gloss) (ja) 彼らの翼は揚力を作成することができ、その鳥の翼の上に成長分岐、毛状の構造。
    (gloss) (yi) א בראַנטשינג, האָר-ווי ביניען אַז וואקסט אויף די פֿליגלען פֿון בירדס אַז אַלאַוז זייערע פֿליגלען צו שאַפֿן הייבן.
    (gloss) (th) แยกโครงสร้างผมเหมือนที่เติบโตบนปีกของนกที่ช่วยให้ปีกของพวกเขาในการสร้างลิฟท์
    (gloss) (fr) Une ramification, la structure ressemblant à des cheveux qui pousse sur les ailes des oiseaux qui permet à leurs ailes pour créer l'ascenseur.
    (gloss) (el) Μια διακλάδωση, τα μαλλιά-όπως δομή που αναπτύσσεται πάνω στα φτερά των πουλιών που επιτρέπει τα φτερά τους για να δημιουργήσουν ανελκυστήρα.
    etc, etc...

Not only is it impossible to update a gloss without desynchronising the sense for every other language which defines "feather", it's also impossible to know if another language has linked this sense for a word which truly only means "flight feathers" (I believe Latin penna has this narrower definition); so the change to just the English definition would make Latin lemma entries incorrect. It's a much too brittle structure and makes it impossible for mortals to edit. Wikipedia doesn't have too many problems linking between concepts which are very-similar-but-slightly-different-in-scope, but it would be chaos for a dictionary to do this. We require much more precise relationships between glosses for a Wikidata Wiktionary, with "exactly equivalent" being just one of those relationships. So I agree with Francis Tyers' thoughts that "this is going about it the wrong way". --Pengo (talk) 23:48, 4 January 2016 (UTC)
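One way to picture "more precise relationships between glosses" is a small set of typed links, sketched below in Python. Only "exactly equivalent" comes from the comment above; the other two relation names, the `links` list and both helper functions are assumptions made for this illustration. The penna/feather link simply encodes the belief stated above and is not verified.

```python
from enum import Enum
from typing import List, Tuple

Word = Tuple[str, str]  # (language code, lemma)


class Relation(Enum):
    EXACT = "exactly equivalent"
    NARROWER = "narrower than"   # assumed relation name
    BROADER = "broader than"     # assumed relation name


def inverse(rel: Relation) -> Relation:
    """Relation seen from the other end of the link."""
    if rel is Relation.NARROWER:
        return Relation.BROADER
    if rel is Relation.BROADER:
        return Relation.NARROWER
    return Relation.EXACT


# (source word, relation, target word); unverified example from the comment.
links: List[Tuple[Word, Relation, Word]] = [
    (("la", "penna"), Relation.NARROWER, ("en", "feather")),
]


def needs_review(word: Word) -> List[Tuple[Word, Relation, Word]]:
    """Links touching `word`, i.e. those to re-check after its gloss changes."""
    return [link for link in links if word in (link[0], link[2])]


to_check = needs_review(("en", "feather"))
```

Under this sketch, widening the English gloss of "feather" would flag the Latin link for review instead of silently making the Latin entry wrong.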
