Wikidata talk:Lexicographical data/Archive/2021/05

Latest comment: 3 years ago by VIGNERON in topic Field size for glosses
This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Flagging trivial senses

Wiktionary defines sailing in three different ways:

  1. as a form of the verb sail,
  2. as a noun and an adjective with a trivial sense that mirrors the meaning of the verb sail, and
  3. as a noun in its own right with several non-trivial senses.

Dictionaries have different inclusion criteria and different formatting for the three definition types. The most common are 1+2+3 (Wiktionaries, at least enwikt), 1+3 or 2+3 (in languages with strong redundancy or even identity between 1 and 2, e.g. Slovak and Czech, @Lexicolover: comments?), and just 3 (paper and concise online dictionaries). I would like to allow downstream dictionaries to differentiate definition types 1-3, so that they can filter and/or format them according to local policy. Wikidata clearly differentiates forms (type 1) from lexeme senses, but is there a way to flag trivial senses (type 2) with a statement to differentiate them from non-trivial senses (type 3)?

There's derived from lexeme (P5191) and lexeme classes like verbal noun (Q1350145) (@Fnielsen: comments?), but those do not indicate which sense is the trivial one and they seem to be inappropriate on senses. Omitting all trivial senses from Wikidata would lose information, because trivial senses can have useful statements. There were some mentions of clarifying meaning beyond the crude item for this sense (P5137), but I have not found any examples of that. — Robert Važan (talk) 13:08, 24 April 2021 (UTC)
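As an illustration of the downstream filtering described above, here is a minimal Python sketch. It assumes each definition already carries a type 1-3 label; no such flag exists in Wikidata today, so `DefinitionType`, `Definition`, `POLICIES`, and `filter_definitions` are hypothetical names, and the example texts for "sailing" are placeholders rather than actual Wiktionary or Wikidata content.

```python
from dataclasses import dataclass
from enum import Enum


class DefinitionType(Enum):
    FORM = 1        # type 1: form of another lexeme
    TRIVIAL = 2     # type 2: sense that mirrors the base lexeme
    NONTRIVIAL = 3  # type 3: sense in its own right


@dataclass(frozen=True)
class Definition:
    lemma: str
    text: str
    kind: DefinitionType


# Hypothetical local policies, named after the combinations listed above.
POLICIES = {
    "1+2+3": {DefinitionType.FORM, DefinitionType.TRIVIAL, DefinitionType.NONTRIVIAL},
    "1+3": {DefinitionType.FORM, DefinitionType.NONTRIVIAL},
    "2+3": {DefinitionType.TRIVIAL, DefinitionType.NONTRIVIAL},
    "3": {DefinitionType.NONTRIVIAL},
}


def filter_definitions(definitions, policy):
    """Keep only the definitions whose type the given policy includes."""
    allowed = POLICIES[policy]
    return [d for d in definitions if d.kind in allowed]


# Placeholder data: the texts are illustrative, not real dictionary entries.
sailing = [
    Definition("sailing", "form of the verb sail", DefinitionType.FORM),
    Definition("sailing", "act of the verb sail", DefinitionType.TRIVIAL),
    Definition("sailing", "a non-trivial noun sense", DefinitionType.NONTRIVIAL),
]

concise = filter_definitions(sailing, "3")
```

A concise dictionary would call `filter_definitions(sailing, "3")`, while a Wiktionary-style consumer would use `"1+2+3"`. The sketch only works if type 2 can be told apart from type 3 in the data, which is exactly the open question in this thread.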

Just to note this seems like a good idea and a good thing to raise here. I'd been generally avoiding adding trivial senses like this, but I can see how it might be useful. ArthurPSmith (talk) 13:48, 24 April 2021 (UTC)
Hmmm... In Danish, it is easier as the verbal noun (Q1350145) is different from the present participle (Q10345583) form. For instance, the verb sejle (L302330) (sail) is derived to the verbal noun (Q1350145) sejling (L481495), while sejle (L302330) also has the present participle (Q10345583) form "sejlende" L302330-F8. There might be a semantic difference between the present participle (Q10345583) form and the derived adjective (Q34698), e.g., stråle (L253058) can mean "shine", while "strålende" can mean "shining" (in its rare verb form) and "glittering" (a glittering performance) as an adjective lexeme. Sometimes I would create the adjective lexeme, particularly if there is an antonym with u- corresponding to un- in English. For English and Danish, there is no unsailing or usejling. But, e.g., for excite there is exciting and unexciting. So I would say that "unexciting" is derived from the adjective lexeme "exciting", and "exciting" is derived from "excite". Here there seems to be no "exciting" noun, perhaps because of "excitation" and "excitement", - I suppose? In some Danish dictionaries, it seems that in the rare cases where there is a semantic difference the adjective lexeme is recorded as an individual lexeme, - otherwise not. But I suppose that shouldn't stop us from doing it. — Finn Årup Nielsen (fnielsen) (talk) 23:54, 28 April 2021 (UTC)
I now see that the Danish verbal noun (Q1350145) is more like deverbal noun (Q1135151). — Finn Årup Nielsen (fnielsen) (talk) 00:11, 29 April 2021 (UTC)
I think "exciting" is a bit of a different case, it has a distinctive meaning as an adjective that's different from what the verb would imply (reverses subject and object), so that's not one of the "trivial" senses. That's presumably why we have "excitation" etc. for what would otherwise be the trivial meaning of "exciting". ArthurPSmith (talk) 12:52, 29 April 2021 (UTC)

I checked existing senses on English lexemes and some do include the trivial sense (jumping (L322886), drumming (L319827), scratching (L327289), coating (L30646), crawling (L318830), ...) without flagging it in any way, while some other English lexemes currently exclude the trivial sense even though Wiktionary includes it (building (L3870), winning (L52428)). A policy is needed to avoid repeated additions/deletions of trivial senses. I think the policy discussion can happen once there's a property proposal for trivial senses. I am not going to propose it myself, at least not right now, because I am not yet sure what value the property should take, if any. In the meantime, I am going to informally flag such senses with a formulaic gloss. — Robert Važan (talk) 13:37, 1 May 2021 (UTC)

Now that I think about it, adding these "trivial" senses may be good so that exceptions may be noted by their absence, though perhaps there's a better way to do that too? ArthurPSmith (talk) 17:47, 3 May 2021 (UTC)

In the Czech language there would be a huge number of lexemes with what could be called trivial senses (verbal nouns, verbal adjectives, many adjectives derived from nouns, adverbs derived from adjectives). These are trivial in the sense that they inherit their senses from the words they are derived from (with the senses adapted to the function of the specific part of speech). Printed dictionaries usually deal with these lexemes with some kind of short note (for example "kočičí (L245443): adjective to kočka (L1778)"). This works for single-language dictionaries but, for many reasons, not for a multilingual database, so I think we should add standard senses. But at this point it becomes somewhat non-trivial:

  1. The word kočka (L1778) has several meanings and each of them might or might not apply to derived adjective. Keeping the distinction is important for synonymy/antonymy and translations.
  2. The word kočičí (L245443) actually does not have one sense but several senses: 1) being part of cat (cat fur), 2) made for cats (cat food), 3) created/produced by cat (cat voice), 4) similar to cat (cat monster), 5) made out of cat (cat mount) etc. Some languages might have more than one lexeme to express these senses.

At this point we get M×N senses instead of a trivial "derived from"; it is not trivial to source such senses, and we can't expect any consistency if this is done manually. I have no idea what a good solution would be, but I believe that our current data model and the properties in use somewhat limit what we can do. --Lexicolover (talk) 12:36, 4 May 2021 (UTC)
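To make the M×N growth concrete, the following Python sketch enumerates every combination of base sense and derivation relation. The relation labels are taken from the kočičí example above; the base-sense labels are placeholders (the actual senses of kočka are not listed in this discussion), and `candidate_senses` is a hypothetical helper, not an existing tool.

```python
from itertools import product

# Placeholder labels for M senses of the base noun (assumed, not real data).
base_senses = ["kočka sense 1", "kočka sense 2"]

# N derivation relations, as listed for kočičí in the comment above.
relations = ["part of", "made for", "produced by", "similar to", "made out of"]


def candidate_senses(base_senses, relations):
    """Return one candidate adjective sense per (base sense, relation) pair."""
    return [
        {"base": base, "relation": relation}
        for base, relation in product(base_senses, relations)
    ]


candidates = candidate_senses(base_senses, relations)
total = len(candidates)  # M * N combinations
```

With two base senses and five relations this already yields ten candidates, each of which would still need a source and a manual check of whether it is actually attested.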

@Lexicolover: This is quite enlightening. I suspected there can be multiple trivial senses, but I could not come up with an example. Blanket "trivial sense" statement will not work for all cases unless it links to a sufficiently descriptive item and/or includes qualifiers. In the end, it would essentially express sense definition as Wikidata statement(s) instead of plain text. Expressing definitions as statements means opening Pandora's box of specialized properties, but I think it is worth it for the simplest cases. — Robert Važan (talk) 14:01, 5 May 2021 (UTC)

Glosses vs. definitions

There has been some inconclusive discussion about whether sense glosses should be as short as Wiktionary glosses or as long and detailed as Wiktionary definitions, but I think this discussion misses the point that both glosses and definitions are needed. But let's first clarify what glosses and definitions are, as used in Wiktionary, and what they are good for:

  • Glosses are unique within lexeme. Gloss might be (and ideally should be) just one word. For example, tea in Wiktionary has glosses plant, leaves, beverage, beverages similar to tea, and a light meal. Glosses are useful as links (e.g. tea (beverage)) and as short in-page references to senses.
  • Definitions are globally unique (except for synonyms), but they are not just identifiers. They also try to map out scope of the sense and delineate boundaries between senses. Clear definition is essential for adding correct translations, synonyms, and other sense-related data.

Without consensus on gloss use, sense glosses end up containing a mixture of glosses and definitions. Such inconsistency makes the data useless as either glosses or definitions. I can think of a few solutions:

  1. Define new property for definitions. Datatype would be unformatted multilingual text. Translations would be probably just native (for downstream use) and English (for internal use).
  2. For every language, define a list of external dictionaries that will be used to look up definitions. Definitions are looked up manually when needed.
  3. Define new properties that link to specific definitions in external dictionaries.

Solutions (2) and (3) are brittle, because external dictionaries change. (2) might be ambiguous. (3) might face technical challenges. I am therefore in favor of (1). Comments? — Robert Važan (talk) 07:07, 21 May 2021 (UTC)

@Robert Važan: "unformatted multilingual text" is not a datatype here. Do you mean monolingual text? In any case, feel free to propose this, but in my view glosses here are working reasonably well, except that those cut-and-pasted from Wikidata item descriptions can generally be trimmed. The primary "definition" of a sense should ideally be embodied in the linked item (and we should try to link as many lexeme sense to items as we can) where the "definition" can consist of a full Wikipedia page in whatever languages have provided one for that item. ArthurPSmith (talk) 13:19, 21 May 2021 (UTC)
@ArthurPSmith: There's MultilingualTextValue. Proposing the property is a technical detail. Right now the question is whether this is a good idea and what are the alternatives. I am aware of item for this sense (P5137), but that often (usually?) fails to discern between senses of a single lexeme, so it's even less specific than a short gloss. Trimmed item descriptions are effectively definitions. If glosses are used to store definitions, then the data model does not provide any place to store true short glosses. IMO, such short glosses are more valuable downstream than definitions. Definitions are mostly useful internally as a sort of agreement among editors about what each sense represents. — Robert Važan (talk) 14:55, 21 May 2021 (UTC)
@Robert Važan: The datatype was proposed, but as far as I know never implemented. I guess I'm having a hard time seeing what it is you are arguing for, do you have some more examples? For tea (L5355) the first gloss is pretty short as "type of beverage", though I guess it could be shortened to just "beverage". The second is a copy-paste which could be cut to "mealtime" or something like that. Neither one is a substantive "definition" though. ArthurPSmith (talk) 16:53, 21 May 2021 (UTC)
@ArthurPSmith: For example, I am just editing center (L251395) and center (L299084) (to be merged if in sense is implemented). I would like to keep short glosses "player" and "position". There's no suitable item to link to, only specialized ones (e.g. center (Q222052)). So I can (1) create an item for this purpose, (2) expand the glosses, or (3) keep the ambiguous glosses. Another example is bežec (L473650). There's no corresponding item and there probably wouldn't ever be any, because the metaphorical meaning is way too specific. Gloss "machine part" is too broad. Here I have only two options: (1) expand the gloss or (2) accept ambiguity. — Robert Važan (talk) 17:15, 21 May 2021 (UTC)
@ArthurPSmith: PS: bežec approximately translates as slider as in zipper slider, but it can be any other moving part of any machine or device. I doubt there will be a corresponding item. — Robert Važan (talk) 17:27, 21 May 2021 (UTC)

I would float one more idea. To discourage introduction of redundant unstructured content, it is possible to have property "sense clarification" (instead of "sense definition"), which would contain only information that is not represented otherwise. The property would be absent if there is nothing to add. This would avoid the overhead of adding definitions to every sense as well as the associated copyright issues. — Robert Važan (talk) 09:21, 22 May 2021 (UTC)
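The split between a short gloss and an optional clarification could be modelled roughly as follows. This is only a sketch of the idea floated above: `Sense`, its `clarification` field, and `duplicate_glosses` are hypothetical and do not correspond to any existing Wikidata property or API. The tea glosses come from the Wiktionary example earlier in this thread, and the bežec clarification paraphrases the description given above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sense:
    gloss: str                           # short label, unique within the lexeme
    clarification: Optional[str] = None  # only what is not represented otherwise


def duplicate_glosses(senses):
    """Return glosses (case-insensitive) that occur more than once in a lexeme."""
    counts = Counter(sense.gloss.casefold() for sense in senses)
    return sorted(gloss for gloss, n in counts.items() if n > 1)


tea = [
    Sense("plant"),
    Sense("leaves"),
    Sense("beverage"),
    Sense("beverages similar to tea"),
    Sense("a light meal"),
]

# The gloss stays short; the clarification carries the extra scope information.
bezec = Sense(
    "machine part",
    clarification="moving part of a machine or device, e.g. a zipper slider",
)

clashes = duplicate_glosses(tea)
```

The uniqueness check reflects the requirement that glosses be unique within a lexeme. Leaving `clarification` as `None` by default mirrors the suggestion that the property be absent when there is nothing to add.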

Esperanto verbs: imperative? volitive? something else?

Lepticed7 edited Wikidata:Wikidata Lexeme Forms/Esperanto (diff) to change the last form (dormu) from imperative (Q22716) to volitive (Q10716592), but before updating the tool accordingly, I’d like to ensure that this is correct. (For instance, wikt:Template:eo-conj still calls that form “imperative”.) Any opinions from other Esperanto speakers? Pinging the ones I know: Robin van der Vliet, Jens Ohlig, Nikki. Lucas Werkmeister (talk) 20:17, 28 March 2021 (UTC)

Hi, I want to mention this discussion I had when I modified the label on the English Wiktionary. What is clear is that "imperative" is not good, because the form is not only imperative but also subjunctive. Some English grammars of Esperanto available on the Internet describe this mood as "jussive". But this is not the case for French grammars of Esperanto, for example. Finally, I think that it is relevant to use "volitive", because that is how Esperanto grammars name this mood. If Wikidata Lexemes want to be international, I think the best way to describe a language is to use the materials available in this language, and not in other languages. Hence, we should use Esperanto vocabulary to describe Esperanto. Lepticed7 (talk) 21:19, 28 March 2021 (UTC)
As the person who created the template: en:Volitive modality says "The volitive in Esperanto is really a generic deontic mood" and en:Deontic modality says something similar. en:Jussive mood says "The jussive mood, called the volitive in Esperanto, is used for wishing and requesting, and serves as the imperative.". Based on those pages, it could be volitive (Q10716592), deontic (Q5260031), jussive (Q462367) or imperative (Q22716). English and Esperanto sources seem to use either "imperative"/"imperativo" or "volative"/"volativo" while German sources only seem to use "Imperativ". The only one of the items I listed which has a page on the Esperanto Wikipedia is imperative (Q22716), so I went with that. - Nikki (talk) 21:24, 28 March 2021 (UTC)
Not exactly the same, but grammatical features for verbs often puzzle me a bit. What is the best way to know which item to use? It would be very useful to have some guidelines (and ideally a schema ;) ).
Some cases I've seen recently:
  • for French, past imperfect (Q12547192) seems to me a mix-up of several concepts, see the talk page Talk:Q12547192
  • for Breton, grammars vary wildly in the names of tenses; for instance, for the same tense, I found "conditional 2", "past conditional", "conditional irreal", "conditional irréalis", "irreal", and some comparisons/links with "subjunctive imperfect" or "pluperfect" etc. (and most grammars more or less mix tense, aspect, and mood :/ )
Does anyone have good tips or resources on how to deal with this? For instance, have linguists written about that?
Cheers, VIGNERON (talk) 17:32, 31 March 2021 (UTC)
As an Esperanto speaker and someone with an interest in linguistics (totally non-professional, I should add), I'm almost certain "imperative" and "jussive" are too narrow to cover the various uses of -u in Esperanto. Imperative is usually the first use people learn (e.g. "venu ĉi tien" for "come here"), but it's also very commonly used in expressions like "mi volas, ke vi helpu min" (I want you to help me) or "mi iru dormi" (I should go to sleep). Going by the descriptions and examples at https://glossary.sil.org/, "deontic" might actually be accurate. Among its three subcategories Esperanto -u is not used for commissive modality, but it does have uses that seem to fall under directive (e.g. deliberative, imperative, obligative) and volitive.
As for linguists that wrote about it:
  • In Plena Manlibro de Esperanta Gramatiko (currently the most thorough and authoritative Esperanto grammar handbook, aimed at non-linguists) written by linguist Bertilo Wennergren, the term "vola modo" (a literal translation of "volitive mood") is used, and its function is described like this: "The -U form indicates that the action or state is not real, but desired, wanted, ordered or aimed for." (source) Wennergren also used the term "volitivo" (volitive) in a summary of PMEG written for Lernu.net to be translated to other languages. (source with example sentences and translations in English).
  • The influential earlier work Plena Analiza Gramatiko (an Esperanto grammar aimed more at linguists than PMEG), by linguists Kalocsay and Waringhien, uses the term "volativo" for the -u ending, and writes on page 133 "the volitive in Esperanto corresponds in European languages simultaneously to the imperative and the subjunctive". (source) (If it's at all useful, I could translate how this book describes how the -u ending is used. It's on pages 158-159, counted as 152-153 in the pdf I linked.)
  • Plena Ilustrita Vortaro de Esperanto (the most authoritative Esperanto dictionary, written by a team that included linguists) defines the -u ending as "verbal ending indicating a desire or want", and gives "u-modo" (i.e. the mood of the -u ending) as a synonym of "volitivo". (source)
So regardless of what is technically more accurate, it seems pretty clear to me that "volitive" is commonly used among Esperanto-speaking linguists.
PS: it's worth noting wikt:Template:eo-conj and wikt:Template:eo-form of currently use "imperative", but the more common wikt:Template:eo-head uses "volitive", so Wiktionary is not consistent at the moment. Whatever is the final verdict here, it would be a good idea to use the same term in all of these (and any other Esperanto templates that use it). Rajzin (talk) 01:06, 1 April 2021 (UTC)
@Lucas Werkmeister: This discussion showed that volitive is the word to use in Esperanto. Is it possible to use it in Wikidata Lexeme Forms? Thanks, Lepticed7 (talk) 17:56, 22 May 2021 (UTC)
@Lepticed7 thanks for reminding me, I’ve deployed the change now. @Nikki, @VIGNERON, @Rajzin: just FYI :) Lucas Werkmeister (talk) 18:27, 24 May 2021 (UTC)
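For reference, the form under discussion can be derived mechanically, since Esperanto verbs replace the infinitive ending -i with -u. The Python sketch below pairs the derived form with the item chosen in this thread; `volitive_form` is a hypothetical helper and not part of Wikidata Lexeme Forms.

```python
# Item ID agreed on in the discussion above for the -u form.
VOLITIVE = "Q10716592"


def volitive_form(infinitive: str) -> dict:
    """Build the -u form of an Esperanto verb from its -i infinitive."""
    if len(infinitive) < 2 or not infinitive.endswith("i"):
        raise ValueError(f"not an Esperanto infinitive: {infinitive!r}")
    return {
        "representation": infinitive[:-1] + "u",
        "grammatical_features": [VOLITIVE],
    }


dormu = volitive_form("dormi")
```

The same helper covers the other verbs quoted in this thread, for example `volitive_form("veni")` and `volitive_form("helpi")`.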

Creating lexemes for radicals?

Hi,

I’ve just created abel/ (L494263) to use it in the etymology of abelisto (L311374). But I just found out about the existence of the property word stem (P5187), used in agrokulturo (L311458). I don’t know what I am supposed to do. Can we create lexemes for radicals, or should I use the property? And if I use the property, how can I add the information to the etymology?

Cheers, Lepticed7 (talk) 18:08, 22 May 2021 (UTC)

Field size for glosses

Hello,

if I edit the sense of a lexeme and enter a gloss for it, the field is small and I can't see the whole text I entered. Is it possible to make the field bigger, or is this something that depends on the browser settings or my screen size? I am using a computer, Firefox as the browser, and the desktop version for editing. --Hogü-456 (talk) 21:07, 27 May 2021 (UTC)

@Hogü-456: Gloss edit field can fit several words, which should be plenty for most glosses, but it's not enough for definitions. There were several discussions about glosses/definitions in the past (see archives) and I just started another one earlier this month (see above). — Robert Važan (talk) 11:29, 28 May 2021 (UTC)
With my browser I can see 21 characters. For a gloss I need 5 to 7 words, so I think it would be good if it were possible to see 50 characters at once. Maybe what I want to write is a definition and not a gloss.--Hogü-456 (talk) 14:27, 28 May 2021 (UTC)
+1 too small. In some languages 21 characters are enough for only 2-3 words. --Infovarius (talk) 20:18, 30 May 2021 (UTC)
It's maybe an unpopular opinion but I think that most glosses should be very short and follow (more or less) the guidelines of Help:Description, especially this part « In most cases, the proper length is between two and twelve words. » Right now, I feel like most glosses are way too long (see https://w.wiki/3QvU ; some might even cause copyright problems). Cheers, VIGNERON (talk) 13:45, 31 May 2021 (UTC)
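The numbers mentioned in this thread can be turned into a quick check. The sketch below assumes a visible field width of 21 characters (as reported above for one browser) and the two-to-twelve-word range quoted from Help:Description; `gloss_report` and its parameter names are hypothetical, and the actual field width depends on the interface.

```python
def gloss_report(gloss, visible_chars=21, min_words=2, max_words=12):
    """Summarize a gloss against the word-count guideline and the field width."""
    word_count = len(gloss.split())
    return {
        "words": word_count,
        "within_word_guideline": min_words <= word_count <= max_words,
        "fits_visible_field": len(gloss) <= visible_chars,
    }


short = gloss_report("type of beverage")
wide = gloss_report("type of beverage", visible_chars=50)
```

Passing `visible_chars=50` models the wider field requested above, so the same gloss can be compared against both widths.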