Wikidata talk:Lexicographical data/Archive/2020/08

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Lexeme Forms documentation improvement for variants

Variants currently seem to be in a very confusing state, with several overlapping practices. I would like this discussion to help improve the documentation, since either the standards are still evolving or the standards and best practices are simply not well documented. Let's discuss that here and improve it.

This page section needs a bit more information to help with the following problem:

How should we deal with phrases or idioms that include hyphens? (I am not asking about "compound words", since a compound word expresses a meaning different from that of the individual words it is made from, e.g. moonlight (moon, light) or sunflower (sun, flower). I am simply asking about variants where the sense or meaning is the same and only the Form or its Representation differs.)

We already know that in many languages any phrase could potentially be hyphenated or not. For example:

eye to eye


This may or may not affect pattern matching and tokenization efforts with Abstract Wikipedia later on; I don't know, but maybe @Denny: could chime in on that a bit. So I think it's worth mentioning here. I do know that in some languages, certain glyphs (like the hyphen -) are important enough that they sometimes change the meaning of a phrase. In other languages, optional glyphs are sometimes unimportant and do not change the meaning of a phrase. For English, my hunch is that these 2 Lexemes should be merged, with 2 Forms created instead: one for the hyphenated form and one for the regular form. But I'm not 100% sure, based on my current reading of some documentation and other Talk pages floating around.

Once someone can help me update the wiki docs with this kind of example and best practices for variants, I think it will be much easier to understand how to handle more kinds of spelling variations of a particular sense's form that occur within a single language.

I also see alternative spelling variants being entered like so: ax | axe

I would also like to see the documentation updated regarding how that actually works and the meaning of spelling variant en-x- ?

It seems there are also side discussions and questions around 'variants' handling in the following:

Thanks in advance for any advice! Thadguidry (talk) 17:36, 15 August 2020 (UTC)


  • On a meta level across languages, I don't think you can get around considering all three approaches (i.e. as separate form, on the same form, as separate lexeme).
That said, from your comment it's not entirely clear if you are primarily interested in phrases (which have additional problems) or just the color/colour type of thing.
Phrases can have the same meaning without the same words being present in every use of the phrase. As a linguist once summarized it, they have key elements that are always present. --- Jura 07:15, 16 August 2020 (UTC)
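
For readers comparing the three approaches mentioned in this thread, here is a minimal sketch of how each one might look as plain data. It is not the Wikibase data model or any real API: the `Lexeme` and `Form` classes, the `hyphen_key` helper, and the placeholder variant codes `variant-a` / `variant-b` are all hypothetical, and "eye-to-eye" is assumed to be the hyphenated counterpart of the "eye to eye" example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Form:
    # Representations keyed by a spelling-variant code (hypothetical keys).
    representations: Dict[str, str]


@dataclass
class Lexeme:
    lemma: str
    forms: List[Form] = field(default_factory=list)


def all_representations(lexeme: Lexeme) -> Set[str]:
    """Collect every written representation across all forms of a lexeme."""
    return {rep for form in lexeme.forms for rep in form.representations.values()}


def hyphen_key(text: str) -> str:
    """Illustrative matching key: treat hyphens as spaces, collapse whitespace."""
    return " ".join(text.replace("-", " ").split()).casefold()


# Approach 1: one lexeme, each variant as a separate form.
separate_forms = Lexeme(
    "eye to eye",
    [Form({"en": "eye to eye"}), Form({"en": "eye-to-eye"})],
)

# Approach 2: one form carrying several representations under variant codes.
same_form = Lexeme("ax", [Form({"variant-a": "ax", "variant-b": "axe"})])

# Approach 3: each variant as its own lexeme.
separate_lexemes = [
    Lexeme("eye to eye", [Form({"en": "eye to eye"})]),
    Lexeme("eye-to-eye", [Form({"en": "eye-to-eye"})]),
]
```

Under this sketch, a tokenizer that compared `hyphen_key` values would treat the hyphenated and unhyphenated phrase as the same string whichever approach is used for storage. That shortcut would only be safe for languages where the hyphen does not change the meaning.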

Wikidata:Property proposal/WordNet 3.1 Synset Id

Hello, because you deal with lexicographical data on Wikidata, could you give your opinion about this property proposal, especially on whether it should be applied to items or to lexemes? Pamputt (talk) 06:45, 25 August 2020 (UTC)

Looking for a co-speaker to present Lexemes to the Italian-speaking community

Hello all,

I have been asked to present Wikidata Lexemes and how they can be used on Wiktionary at the Italian WikiCon (taking place online on October 24-25; the presentation would be in English). I can of course give an overview of Lexemes, the features, etc., but I think it would be much more interesting if there were also someone from the community who is editing Lexemes and trying to connect Lexemes and Wiktionary content.

Would anyone be interested in working with me on giving this presentation? Thanks in advance :) Lea Lacroix (WMDE) (talk) 09:58, 31 August 2020 (UTC)

@Lea Lacroix (WMDE): I'd be down to do it, as I've recently given similar introductions to speakers of Marathi and Sanskrit. (Not sure how connecting Wiktionary to lexemes has been going in the absence of work on phab:T212843 and its subtasks.) Mahir256 (talk) 12:26, 31 August 2020 (UTC)