Wikidata:Property proposal/Verbalization by lexeme

Verbalization by lexeme

Originally proposed at Wikidata:Property proposal/Lexemes

Not done

Description	A property to indicate which lexeme should be used when verbalizing a given item in a Natural Language Generation system
Data type	one lexeme sense per language code (invalid datatype: not in Module:i18n/datatype)
Domain	item
Example 1	woman (Q467) → en: woman (L3338-S1), he: אשה/אִשָּׁה (L63925-S1)
Example 2	man (Q8441) → en: man (L3337-S1), he: איש/אִישׁ (L63920-S1)
Example 3	human (Q5) → en: human (L3080-S1), he: אדם/אָדָם (L63680-S1)
Planned use	Linking each item to the lexeme sense best suited for verbalizing it in a Natural Language Generation system. Typically, this will be the lexeme sense corresponding to the item's main label (in every language).
See also	phab:T320263

Motivation

For Abstract Wikipedia, there is a need to link items to the canonical lexeme that Natural Language Generation systems should use to verbalize them. While there are currently links from lexemes to items, the reverse look-up is not accessible through all APIs. Moreover, the lexeme-to-item link is a many-to-one relation. I propose singling out one representative lexeme for verbalization purposes. See also phab:T320263. AGutman-WMF (talk) 09:48, 17 November 2022 (UTC)

Update: In response to some questions: at this stage I envisage this applying only to the main label of an item (per language). Insofar as we want a single canonical verbalization of the item per language, I think the aliases can safely be ignored. In the future, we may want to link these to lexemes too, but it is less pressing. Once a link to a lexeme has been established, related lexemes (for instance, synonyms or spelling variants) can be explored using statements in the lexeme namespace. AGutman-WMF (talk) 20:20, 17 November 2022 (UTC)
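The following is a minimal Python sketch of the reverse look-up discussed on this page. It rests on several assumptions:

- Senses are held in a hypothetical in-memory list, not fetched through any Wikidata API.
- The example senses are assumed to link to their items through item for this sense (P5137).
- The item's main label is supplied by the caller.

Among the senses in one language that point at the item, the sketch keeps the one whose lemma equals the label. It returns None when there is no single match. The names Sense, SENSES, senses_for_item and pick_verbalization are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical toy data model (not a Wikidata API): each sense records its
# parent lexeme's language, that lexeme's lemma spellings, and the item it
# is assumed to point at via "item for this sense" (P5137).
@dataclass(frozen=True)
class Sense:
    sense_id: str
    language: str
    lemmas: tuple[str, ...]
    item_for_sense: str


# Senses from the proposal's examples; the P5137 links are assumed.
SENSES = [
    Sense("L3338-S1", "en", ("woman",), "Q467"),
    Sense("L3337-S1", "en", ("man",), "Q8441"),
    Sense("L3080-S1", "en", ("human",), "Q5"),
    Sense("L63925-S1", "he", ("אשה", "אִשָּׁה"), "Q467"),
    Sense("L63920-S1", "he", ("איש", "אִישׁ"), "Q8441"),
    Sense("L63680-S1", "he", ("אדם", "אָדָם"), "Q5"),
]


def senses_for_item(senses: list[Sense], item_id: str, language: str) -> list[Sense]:
    """Reverse look-up: every sense in `language` whose P5137 value is `item_id`."""
    return [s for s in senses if s.item_for_sense == item_id and s.language == language]


def pick_verbalization(
    senses: list[Sense], item_id: str, language: str, label: str
) -> Optional[str]:
    """Return the ID of the single sense whose lemma equals the label, else None."""
    matches = [s for s in senses_for_item(senses, item_id, language) if label in s.lemmas]
    return matches[0].sense_id if len(matches) == 1 else None


print(pick_verbalization(SENSES, "Q467", "en", "woman"))  # L3338-S1
print(pick_verbalization(SENSES, "Q5", "he", "אדם"))      # L63680-S1
print(pick_verbalization(SENSES, "Q467", "en", "lady"))   # None
```

The None case is where the two positions on this page diverge. The proposed property would record the choice explicitly as a statement. Deriving it from labels and lemmas needs no extra statements to keep in sync, but it cannot resolve ties or mismatches on its own.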

Discussion

Oppose Who will maintain this? Who will update the statements when the labels change? How is this different from picking the sense which links to the item using item for this sense (P5137) where the label matches the lemma? Inverse properties have always caused problems with data getting out of sync and generally only exist because Wikimedia developers have still not implemented support for fetching the data people want to fetch. If the WMF needs the feature we've been requesting for years, perhaps they should implement it instead of proposing workarounds that create more maintenance burden for the community. - Nikki (talk) 10:47, 17 November 2022 (UTC)
In fact, I already have a simple user script that links labels to lexemes: User:Nikki/LinkLabelsToLexemes.js. - Nikki (talk) 10:51, 17 November 2022 (UTC)
Oppose It seems unlikely anyone will be able to agree on a single word for a given item in a language. This also seems likely to lead to misunderstandings - for example, in many languages, it is rude to address a woman you know as "woman." How do we know the context this word is being used in is appropriate? For some concepts, the word used would depend on the age of the speaker and recipient (and if your recipient is unknown, assume they are your elder to be safe), or the word is supposed to be different depending on the recipient's religion and so on. The labels are not meant to be the sum of the item's content; they are necessarily a compromise to make items easier to locate and navigate. Often the main label is arbitrary or incidental. -عُثمان (talk) 05:50, 18 November 2022 (UTC)

Not done no support for creation of this property --DannyS712 (talk) 01:00, 3 December 2022 (UTC)