Wikidata:Lexicographical data/Documentation/Languages/cs

Czech
language, modern language
Subclass of: Czech–Slovak languages, West Slavic
Native label: čeština, český jazyk
Country: Czech Republic
Indigenous to: Czech Republic
Has tense: present tense, preterite, future tense, pluperfect
Has grammatical mood: indicative, conditional, imperative
Has grammatical gender: masculine, feminine, neuter, masculine inanimate, masculine animate
Writing system: Czech alphabet
Language regulatory body: Institute of the Czech Language
Ethnologue language status: 1 National
Studied in: Czech studies
Related category: Category:Czech pronunciation
Entry in abbreviations table: чеш.
Wikimedia language code: cs

Czech (Q9056) is one of the West Slavic languages (Q145852). It is spoken mainly by people in the Czech Republic (Q213).

This page documents how to enter Czech lexemes in Wikidata.

General information

Lexical category

The following lexical categories are used for the Czech language:

Lemma

The lemma is the form of the word used as the title of the lexeme page. It is the same form that dictionaries use as the headword. For lexemes that have multiple forms, the lemma has the following grammatical features:

noun (Q1084)
nominative case (Q131105), singular (Q110786); in case of plurale tantum (Q138246) lexemes it is nominative case (Q131105) plural (Q146786)
adjective (Q34698)
positive (Q3482678), nominative case (Q131105), singular (Q110786), masculine animate (Q54020116)
pronoun (Q36224)
same as adjectives
numeral (Q63116)
same as adjectives
verb (Q24905)
infinitive (Q179230); use only infinitives ending in -t. Older infinitives ending in -i should be added in the Forms section with the statement instance of (P31): former form (Q56247521)
adverb (Q380057)
positive (Q3482678)
preposition (Q4833830)
non-vocalic form (Q55082712) (e.g. k (L20562), bez (L23250)); the vocalic form (Q55082724) should be added in the Forms section
interjection (Q83034)
the shortest and simplest form (e.g. vr, not vrrr); verified forms can be added in the Forms section
proper noun (Q147276)
generally same as nouns
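
The lemma conventions above can be summarised as a lookup table. The following Python sketch is illustrative only: the names `LEMMA_FEATURES` and `lemma_features` are hypothetical and not part of any Wikidata tool. It simply encodes the item IDs listed above; interjections are left out because their rule concerns the shape of the word rather than a set of grammatical features.

```python
# Hypothetical helper, not part of any Wikidata tool: the grammatical
# features (Wikidata item IDs) expected on the lemma of a Czech lexeme.
NOMINATIVE, SINGULAR, PLURAL = "Q131105", "Q110786", "Q146786"
POSITIVE, MASC_ANIMATE = "Q3482678", "Q54020116"
INFINITIVE, NON_VOCALIC = "Q179230", "Q55082712"

NOUN, PROPER_NOUN = "Q1084", "Q147276"
ADJECTIVE_LIKE = [POSITIVE, NOMINATIVE, SINGULAR, MASC_ANIMATE]

LEMMA_FEATURES = {
    NOUN: [NOMINATIVE, SINGULAR],
    PROPER_NOUN: [NOMINATIVE, SINGULAR],  # "generally same as nouns"
    "Q34698": ADJECTIVE_LIKE,   # adjective
    "Q36224": ADJECTIVE_LIKE,   # pronoun (same as adjectives)
    "Q63116": ADJECTIVE_LIKE,   # numeral (same as adjectives)
    "Q24905": [INFINITIVE],     # verb
    "Q380057": [POSITIVE],      # adverb
    "Q4833830": [NON_VOCALIC],  # preposition
}


def lemma_features(category: str, plurale_tantum: bool = False) -> list[str]:
    """Return the expected lemma features for a lexical category item ID."""
    if category not in LEMMA_FEATURES:
        # e.g. interjections: the rule is "shortest form", not a feature set
        raise KeyError(f"no feature convention recorded for {category}")
    if plurale_tantum and category in (NOUN, PROPER_NOUN):
        # plurale tantum lexemes use the nominative plural instead
        return [NOMINATIVE, PLURAL]
    # return a copy so callers cannot modify the shared table
    return list(LEMMA_FEATURES[category])
```

Keeping pronouns and numerals pointed at the same list as adjectives mirrors the "same as adjectives" entries, so a change to the adjective convention would carry over automatically.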

Spelling variants

NOTE: this issue is subject to further discussion.

If a lexeme has multiple spelling variants, the most recent orthography should be used (currently based on Rules of Czech Orthography (Q12046868) and Q12045315). Spelling variants that are no longer considered valid can be added in the Forms section, marked with a language code that identifies the last orthography reform under which the variant was valid. Here are some examples of language codes that might be used:

spřežkový pravopis (digraphic orthography)
cs-x-Q191494
bratrský pravopis (Brethren orthography)
cs-x-Q1019741
Pravidla českého pravopisu z roku 1957 (Rules of Czech Orthography of 1957)
cs-x-Q5311
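
A minimal Python sketch of how such a code is put together, assuming the pattern shown above: `cs-x-` followed by the Wikidata item ID of the orthography. The function `variant_language_code` is hypothetical; Wikibase performs its own validation of language codes, which this sketch does not reproduce.

```python
import re

# Wikidata item IDs: "Q" followed by a positive integer without leading zeros.
ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def variant_language_code(orthography_qid: str) -> str:
    """Build the cs-x-<item ID> code that marks a spelling variant with
    the last orthography under which it was valid (hypothetical helper)."""
    if not ITEM_ID.fullmatch(orthography_qid):
        raise ValueError(f"not a Wikidata item ID: {orthography_qid!r}")
    return f"cs-x-{orthography_qid}"


# The three examples listed on this page.
EXAMPLES = {
    "spřežkový pravopis": variant_language_code("Q191494"),
    "bratrský pravopis": variant_language_code("Q1019741"),
    "Pravidla českého pravopisu z roku 1957": variant_language_code("Q5311"),
}
```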

Spelling variants that are no longer valid must always be verified.

If the current orthography allows doublets, both should be used as lemmas (in the page title). Because of how the Wikidata software handles language codes, only one doublet can be marked with the cs code; the others must use a cs-x- code. The cs code should be assigned to the basic, neutral variant.

Doublets according to Q12045315 can be divided into several categories:

  1. original s is always read as /z/: variants with z are considered neutral
    • analýza cs
    • analysa cs-x-Q12045315
  2. -ns-/-nz-, -rs-/-rz-, -ls-/-lz- groups: both variants are possible and both are neutral
    • diskurz cs
    • diskurs cs-x-Q12045315
  3. -ismus/-izmus suffix: -ismus is considered basic and neutral
    • symbolismus cs
    • symbolizmus cs-x-Q2065
  4. original s can be read both as /s/ and /z/: variants with s are considered neutral
    • diskuse cs
    • diskuze cs-x-Q12045315
  5. length of vowels: the 1993 Rules of Czech Orthography (Q12046868) allow only short vowels, while the 1957 Rules of Czech Orthography (Q12046868) allowed only long vowels
    • balon cs
    • balón cs-x-Q12045315
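
The doublet rule can be sketched in Python as follows. The dictionary shape `{"language": ..., "value": ...}` is modelled on how lemmas appear in Wikibase lexeme JSON, but the helper `doublet_lemmas` is a hypothetical illustration, not an official tool; it only enforces that the neutral variant gets `cs` and every other doublet gets a `cs-x-` code.

```python
def doublet_lemmas(neutral: str, others: dict[str, str]) -> dict[str, dict[str, str]]:
    """Build a lemma map for a lexeme whose current orthography allows doublets.

    `neutral` is the basic, neutral variant and gets the plain "cs" code;
    `others` maps each remaining variant's cs-x- code to its spelling.
    """
    lemmas = {"cs": {"language": "cs", "value": neutral}}
    for code, value in others.items():
        # Only one variant may carry "cs"; every other one needs a cs-x- code.
        if not code.startswith("cs-x-"):
            raise ValueError(f"doublet {value!r} needs a cs-x- code, got {code!r}")
        if value == neutral:
            raise ValueError(f"{value!r} duplicates the neutral variant")
        lemmas[code] = {"language": code, "value": value}
    return lemmas


diskuse = doublet_lemmas("diskuse", {"cs-x-Q12045315": "diskuze"})
```

Here `diskuse` ends up under `cs` and `diskuze` under `cs-x-Q12045315`, matching category 4 above.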

Statements

Several properties can be used with lexemes. They add information about the lexeme as a whole. Information that depends on a particular form or sense should not be added here, but to that form or sense.

Gender

The grammatical gender of a noun can be added here with the grammatical gender (P5185) property. The applicable values for Czech are: masculine (Q499327) (masculine animate (Q54020116) / masculine inanimate (Q52943434)), feminine (Q1775415) and neuter (Q1775461).
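
As a minimal sketch, the allowed values can be checked before a statement is added. The `CZECH_GENDERS` table and `gender_claim` function are hypothetical, and the returned (property, value) pair is a simplification; real Wikibase statements carry more structure (snak type, rank, references).

```python
# Values this page lists as applicable to grammatical gender (P5185) for Czech.
CZECH_GENDERS = {
    "Q499327": "masculine",
    "Q54020116": "masculine animate",
    "Q52943434": "masculine inanimate",
    "Q1775415": "feminine",
    "Q1775461": "neuter",
}


def gender_claim(gender_qid: str) -> tuple[str, str]:
    """Return a simplified (property, value) pair for a noun's gender.

    Hypothetical shorthand only, not the full Wikibase statement JSON.
    """
    if gender_qid not in CZECH_GENDERS:
        raise ValueError(f"{gender_qid} is not a listed Czech gender")
    return ("P5185", gender_qid)
```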

Types of lexical category

Most lexical categories are divided into several types, which can be added with the instance of (P31) property. The applicable values are (the list is not complete):

noun (Q1084)
plurale tantum (Q138246), mass noun (Q489168), collective noun (Q504952)
adjective (Q34698)
possessive (Q2105891)
pronoun (Q36224)
personal pronoun (Q468801), possessive pronoun (Q1502460), demonstrative pronoun (Q34793275), interrogative word (Q2304610), relative pronoun (Q1050744), negative pronoun (zájmeno záporné), indefinite pronoun (Q956030)
numeral (Q63116)
cardinal numeral (Q1329258), ordinal numeral (Q923933)
verb (Q24905)
adverb (Q380057)
preposition (Q4833830)
conjunction (Q36484)
grammatical particle (Q184943)
interjection (Q83034)
proper noun (Q147276)
toponym (Q7884789), given name (Q202444), family name (Q101352)

NOTE: These three types of proper noun (Q147276) are a common part of linguistic dictionaries. Do not add other types of proper nouns, such as names of people, brands, company names, political parties, etc.
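
The type lists and the restriction on proper nouns could be checked roughly like this. This Python sketch is hypothetical (`LISTED_TYPES` and `check_instance_of` are invented names). Because the general list is incomplete, unlisted types only return `False`, while for proper nouns anything beyond the three listed types is rejected, as the note above asks.

```python
# P31 values listed on this page per lexical category (item IDs).
# The negative pronoun type has no item ID on the page, so it is omitted.
LISTED_TYPES = {
    "Q1084": {"Q138246", "Q489168", "Q504952"},  # noun
    "Q34698": {"Q2105891"},  # adjective
    "Q36224": {  # pronoun
        "Q468801", "Q1502460", "Q34793275",
        "Q2304610", "Q1050744", "Q956030",
    },
    "Q63116": {"Q1329258", "Q923933"},  # numeral
    "Q147276": {"Q7884789", "Q202444", "Q101352"},  # proper noun
}
PROPER_NOUN = "Q147276"


def check_instance_of(category: str, type_qid: str) -> bool:
    """Return True if type_qid is listed for category.

    Unlisted types return False, except for proper nouns, where only
    the three listed types are allowed and anything else raises.
    """
    listed = type_qid in LISTED_TYPES.get(category, set())
    if not listed and category == PROPER_NOUN:
        raise ValueError(f"{type_qid} is not an allowed proper-noun type")
    return listed
```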

NOTE: It is not yet clear how to handle the male, female and family forms of a surname (e.g. Novák/Nováková/Novákovi), so adding them is discouraged for now.


Forms

Senses

<TBD> Senses are not yet released.