Wikidata:Lexicographical data/Documentation/Lexeme languages

The language to which a lexeme belongs is a reference to a Wikidata item for a language.

For most languages, this is a straightforward determination: English (Q1860), Thai (Q9217), Manchu (Q33638), and Gun (Q3111668) are just four of the many possibilities, since each has a supported language code (en, th, mnc, and guw, respectively).
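As a rough illustration of the paragraph above, the following Python sketch reads the language item from a lexeme record and looks up the corresponding language code. It assumes the lexeme is available as a dictionary whose "language" field holds an item ID, which is how lexeme JSON is understood to be shaped; the sample record, its ID "L0", and the names `LANGUAGE_CODES` and `lexeme_language` are hypothetical and chosen only for this example.

```python
# Language items and codes named in the text above.
LANGUAGE_CODES = {
    "Q1860": "en",      # English
    "Q9217": "th",      # Thai
    "Q33638": "mnc",    # Manchu
    "Q3111668": "guw",  # Gun
}


def lexeme_language(lexeme: dict) -> str:
    """Return the item ID stored in the lexeme's "language" field.

    Assumption: the record carries the language as a plain item ID string.
    """
    return lexeme["language"]


# Hypothetical, heavily trimmed lexeme record, for illustration only.
sample_lexeme = {
    "id": "L0",
    "type": "lexeme",
    "language": "Q1860",
    "lemmas": {"en": {"language": "en", "value": "example"}},
}

language_item = lexeme_language(sample_lexeme)
# .get() returns None for any item outside the four listed above.
language_code = LANGUAGE_CODES.get(language_item)
```

The table covers only the four languages mentioned here; any real tool would need a fuller source for the item-to-code mapping.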

For some languages, however, lexemes are now required to use particular language items. This page lists some of those choices; more information about them may be found on the documentation pages for those languages.

Enlarged scopes of existing language items

  • Turkish (Q256) encompasses the language spoken in Turkey both before and after the introduction of Latin script in 1928; items referring to 'Ottoman Turkish' should not be used as the language of a lexeme (although they may be used in variety of lexeme, form or sense (P7481) statements on lexemes and senses).
  • Punjabi (Q58635) encompasses the language spoken in Punjab on both sides of the Radcliffe line, rather than merely east of the line as is implied by the composition of pa.wikipedia.org; items referring to a 'Western Punjabi' should not be used in lexemes at all. Some language varieties that form a continuum with Punjabi are modeled separately, such as Saraiki (Q33902) and Hindko (Q382273), following lines drawn in the references cited on these lexemes. Pothohari(-Pahari) or Mirpuri is a variety of Punjabi and not an exceptionally divergent one at that; accordingly, separate treatment is not warranted.
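The two rules above could be checked mechanically along the following lines. This is only a sketch: the function name `preferred_language` is hypothetical, and the keys of the example table are placeholders rather than real item IDs, since this page does not give the IDs of the 'Ottoman Turkish' or 'Western Punjabi' items. Anyone adapting it would need to supply the actual IDs.

```python
from typing import Optional

# Preferred language items, as given in the list above.
TURKISH = "Q256"
PUNJABI = "Q58635"


def preferred_language(language_item: str, discouraged: dict) -> Optional[str]:
    """Return the preferred item if language_item is discouraged, else None.

    `discouraged` maps a discouraged language item ID to the item that
    should be used as the lexeme language instead.
    """
    return discouraged.get(language_item)


# Placeholder keys, NOT real item IDs; replace them before any real use.
example_discouraged = {
    "OTTOMAN_TURKISH_ITEM_PLACEHOLDER": TURKISH,
    "WESTERN_PUNJABI_ITEM_PLACEHOLDER": PUNJABI,
}

suggestion = preferred_language(
    "OTTOMAN_TURKISH_ITEM_PLACEHOLDER", example_discouraged
)
no_suggestion = preferred_language(TURKISH, example_discouraged)
```

Separately modeled varieties such as Saraiki (Q33902) and Hindko (Q382273) are deliberately absent from the table, so the check leaves them alone.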

Reduced scopes of existing language items

Uses of less-expected language items

Unresolved problem areas