Wikidata:Lexicographical data/Documentation/Languages/sk

Slovak
language, modern language
Subclass of: Czech–Slovak languages
Native label: slovenčina, slovenský jazyk
Country: Hungary, Serbia, Slovakia, Romania, Ukraine
Indigenous to: Borsod-Abaúj-Zemplén County, Zala County, Bratislava Region, Slovakia
Has tense: present tense, preterite, future tense, pluperfect
Has grammatical mood: indicative, conditional, imperative
Writing system: Latin script
Uses capitalization for: toponym
Language regulatory body: Ľudovít Štúr Institute of Linguistics
UNESCO language status: 1 safe
Ethnologue language status: 1 National
Studied in: Slovak studies
Related category: Category:Slovak pronunciation
Entry in abbreviations table: словац.
Wikimedia language code: sk

Slovak (Q9058) is the national language of Slovakia (Q214).

This page describes in detail how the Slovak lexicon maps to Wikidata lexemes. It serves both as a guide for Wikidata editors and as documentation for data consumers.

Sources

Standard Slovak (spisovná slovenčina = literary Slovak) is codified in four documents published by the Ľudovít Štúr Institute of Linguistics (Q2368451), the language regulator (Q2093358) for the Slovak language, and Matica slovenská (Q763567):

  1. Pravidlá slovenského pravopisu (4th edition) (Q107406453) (PDF, dictionary)
  2. Krátky slovník slovenského jazyka (5th edition) (Q107406568) (2003 edition available online: introduction, dictionary)
  3. Pravidlá slovenskej výslovnosti (2nd edition) (Q107409937)
  4. Morfológia slovenského jazyka (Q107406442) (online)

Note that Wikidata also carries lexemes for non-standard Slovak, including regional dialects. JÚĽŠ SAV itself publishes a number of dictionaries that go beyond the scope of standard Slovak.

Information in English about the Slovak language is available from slovake.eu, coauthored by the Ľudovít Štúr Institute of Linguistics (Q2368451), and from Wikipedia's Slovak language entry.

Notability

There is a proposal for common lexeme notability rules; this section covers only what is specific to Slovak. It is usually obvious what is and is not a Slovak word or phrase. Some words and phrases are, however, better represented by items than by lexemes. The rules below clarify ambiguous cases.

Included

Excluded

Lexeme granularity

Lexeme granularity is based on the distinction between inflection and derivation on the morphological level and between homonymy and polysemy on the semantic level. Derivation generates lexemes while inflection generates forms. Homonymy generates lexemes while polysemy generates senses. The literature is clear about which morphological phenomena constitute inflection or derivation, but the definition of a homonym varies between authors and applications. Since most definitions of homonym take into account relative similarity of meaning, there is always a gray zone of ambiguous cases.

Borderline derivation methods that generate lexemes (obvious cases are not listed):

Semantic phenomena that generate lexemes:

Morphological and semantic phenomena that generate forms or senses:

Handling the gray zone between homonymy and polysemy:

  • Sense-dependent noun gender (e.g. kura (L481695) vs. kura (L404168)), including fine-grained masculine gender (e.g. pamätník (L402333) vs. pamätník (L404436)), is a strong indicator that these senses are actually homonyms, but similarity of meaning can weigh against it (e.g. netvor (L410253)).
  • Homonyms are assumed to exist if they are listed in an authoritative source (usually one of the JÚĽŠ dictionaries).
  • When unsure whether a case is homonymy or polysemy, keep the suspected homonyms as senses of one lexeme.

Temporary lexemes that exist as workarounds for technical limitations:

Lexical category

The Slovak language has 10 basic lexical categories:

  1. noun (Q1084) (includes proper noun (Q147276))
  2. adjective (Q34698)
  3. pronoun (Q36224)
  4. numeral (Q63116)
  5. verb (Q24905)
  6. adverb (Q380057)
  7. preposition (Q4833830)
  8. conjunction (Q36484)
  9. grammatical particle (Q184943)
  10. interjection (Q83034)

These 10 categories (plus proper noun (Q147276)) should be used to classify all regular Slovak lexemes. More fine-grained categorization can be added via instance of (P31) statements. The advantage of instance of (P31) is that it is neither exclusive nor exhaustive, allowing categorization along multiple axes as well as partial categorization.
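For data consumers, the ten categories can be kept in a small lookup table. The following is a minimal sketch in Python rather than an official client: the QIDs are the ones listed above, while the names SLOVAK_CATEGORIES, is_regular_category, and category_query are hypothetical. The query text relies on the standard Wikidata Query Service prefixes (dct:language, wikibase:lexicalCategory, wikibase:lemma) and is only built as a string here, not executed.

```python
# QIDs copied from the category list above; the names are illustrative only.
SLOVAK = "Q9058"
PROPER_NOUN = "Q147276"
SLOVAK_CATEGORIES = {
    "noun": "Q1084",
    "adjective": "Q34698",
    "pronoun": "Q36224",
    "numeral": "Q63116",
    "verb": "Q24905",
    "adverb": "Q380057",
    "preposition": "Q4833830",
    "conjunction": "Q36484",
    "grammatical particle": "Q184943",
    "interjection": "Q83034",
}


def is_regular_category(qid: str) -> bool:
    """True for the 10 basic categories and for proper noun."""
    return qid == PROPER_NOUN or qid in SLOVAK_CATEGORIES.values()


def category_query(qid: str, limit: int = 10) -> str:
    """Build a SPARQL query listing Slovak lexemes of one lexical category."""
    if not is_regular_category(qid):
        raise ValueError(f"{qid} is not a regular Slovak lexical category")
    return f"""SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:{SLOVAK} ;
          wikibase:lexicalCategory wd:{qid} ;
          wikibase:lemma ?lemma .
}} LIMIT {limit}"""


print(category_query(SLOVAK_CATEGORIES["verb"]))
```

The resulting string can be pasted into the query service. Finer distinctions would need an additional instance of (P31) pattern, which this sketch leaves out.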

Pronoun category

Slovak pronouns include many more words than the definition of pronoun (Q36224) would suggest, because the scope of the pronoun category varies by language. The following comparison of several Slovak and English words illustrates the difference:

Type of Slovak pronoun | English example | Slovak example | Lexical category in Wikidata (English) | Lexical category in Wikidata (Slovak)
substantive pronoun (substantívne zámeno) | nothing (L4317) | nič (L245513) | pronoun (Q36224) | pronoun (Q36224)
adjective pronoun (adjektívne zámeno) | such (L248802) | taký (L245463) | determiner (Q576271) | pronoun (Q36224)
adverbial pronoun (príslovkové zámeno) | everywhere (L8978) | všade (L249269) | adverb (Q380057) | pronoun (Q36224)
numeral pronoun (číslovkové zámeno) | much (L4212) | veľa (L245539) | determiner (Q576271) | pronoun (Q36224)

Slovak pronouns approach pro-form (Q2006180) in scope. There are, however, several reasons to avoid pro-form and its subclasses as lexical categories:

  1. Slovak zámeno literally translates as pronoun, which suggests that it was historically the same concept and merely evolved to fit the language.
  2. Slovak historically used a narrower definition of pronoun that was later expanded (per Morfológia slovenského jazyka, p. 233).
  3. People readily translate zámeno as pronoun, whereas the term pro-form is largely unfamiliar.
  4. The term pronoun is used in the pronoun section of slovake.eu for all Slovak pronouns, including adverbial and numeral ones. The slovake.eu website is coauthored by the Slovak language regulator, the Ľudovít Štúr Institute of Linguistics (Q2368451).
  5. Several Slavic languages have similarly broad pronoun definitions, and none of them uses pro-form (Q2006180) for classification on Wikidata.

Even though pronoun (Q36224) is used as the lexical category, it may still be useful to differentiate pronouns by role (substantive, adjective, adverbial, numeral) using instance of (P31) statements.

Non-word categories

In addition to the parts of speech listed above, Slovak lexemes can belong to one of the special lexical categories:

Lemma

For inflected lexemes other than verbs, the lemma is identical to the form with the following grammatical features (if available):

For verbs, the lemma is the infinitive (Q179230) form.

For lexical categories that are not inflected, the lemma is the simplest, shortest form (e.g. zas (L250046)).

Statements

All applicable properties defined in Wikidata can be used in Slovak lexemes, forms, and senses. This section merely provides an overview of the most commonly used properties and classes.

Related lexemes

Classes

Lexical categories can be further subdivided using instance of (P31) statements.

noun (Q1084) (includes proper noun (Q147276))

adjective (Q34698)

Noun gender

Slovak nouns have one of the following grammatical genders assigned via grammatical gender (P5185):

Sources usually mention only the coarse-grained gender (masculine, feminine, neuter) and treat the animate/inanimate and personal/impersonal distinctions as additional traits of masculine nouns that influence the inflection of adjectives as well as the inflection of the noun itself. The end result is nevertheless the same as accepting fine-grained gender as defined above. The two views are semantically equivalent in Wikidata. For example, masculine animate non-personal (Q52943193) is equivalent to the combination of masculine (Q499327), animate (Q51927507), and impersonal (Q67372837) via subclass of (P279) relationships. Using masculine animate non-personal (Q52943193) instead of its three superclasses is therefore just a matter of convenience and brevity.
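The equivalence of the two views can be illustrated with a small sketch. It assumes a hard-coded table that contains only the one combination spelled out above; a real consumer would follow subclass of (P279) statements in Wikidata instead. The names COMPONENTS and coarse_features are hypothetical.

```python
# Fine-grained gender mapped to the coarse features it combines.
# Only the example from the text is included; other entries would be
# derived from subclass of (P279) statements in practice.
COMPONENTS = {
    "Q52943193": {          # masculine animate non-personal
        "Q499327",          # masculine
        "Q51927507",        # animate
        "Q67372837",        # impersonal
    },
}


def coarse_features(gender_qids: set) -> set:
    """Replace each fine-grained gender with its coarse components."""
    result = set()
    for qid in gender_qids:
        result |= COMPONENTS.get(qid, {qid})
    return result


fine = {"Q52943193"}
coarse = {"Q499327", "Q51927507", "Q67372837"}
print(coarse_features(fine) == coarse_features(coarse))
```

Normalizing both notations to the same set lets a consumer compare lexemes regardless of which convention an editor used.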

Special cases:

  • Some nouns have ambiguous gender that is shared by all senses (e.g. džínsy (L460043)). In that case, there are multiple grammatical gender (P5185) statements.
  • Some masculine nouns have fine-grained masculine gender specified on sense level (e.g. baran (L465326)).
  • Grammatical gender of lexemes denoting persons is usually identical to natural gender, but there are exceptions (e.g. gorila (L525324)). Exceptions also exist for fine-grained masculine gender (e.g. ježko (L469905)).

Verb aspect

Slovak verbs on Wikidata are classified via the grammatical aspect (P7486) property as having one of two aspects:

Most Slovak verbs form perfective-imperfective pairs (e.g. písať (L245830) - napísať (L245926)). Some sources use more fine-grained classification.

Exceptions:

Forms

Noun forms

Slovak nouns have 12 forms, one for every combination of grammatical features:

Exceptions:

Adjective forms

Defining one form for every combination of 6 cases, 2 numbers, 5 genders, and 3 comparison degrees would yield 180 forms per adjective. That would be hard to edit and hard to display in Wiktionaries. Instead, duplication is minimized by carefully choosing sensible combinations of grammatical features.

This adds up to 81 forms per adjective, largely free of duplication. Although the item not masculine personal (Q54152717) is not a gender, it is a commonly used gender group that is allowed as a grammatical feature. The 4 singular cases could similarly be unified for masculine and neuter genders to reduce the form count to 69, but such a masculine/neuter gender group is not commonly used anywhere. Creating it as a new concept just for Wikidata would make the data harder to use.
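The form counts quoted in this section can be checked with a few lines of Python. This is only an arithmetic sketch: the five gender labels are an assumption based on the fine-grained genders discussed under noun gender, and the step from 81 to 69 is one reading of the text (one form saved per unified singular case and comparison degree).

```python
from itertools import product

CASES = ["nominative", "genitive", "dative",
         "accusative", "locative", "instrumental"]
NUMBERS = ["singular", "plural"]
# Assumed labels for the 5 genders; the text only gives the count.
GENDERS = ["masculine personal", "masculine animate non-personal",
           "masculine inanimate", "feminine", "neuter"]
DEGREES = ["positive", "comparative", "superlative"]

# Naive grid: one form per combination of all four features.
full_grid = list(product(CASES, NUMBERS, GENDERS, DEGREES))
print(len(full_grid))  # 180

# Form count stated in the text for the chosen combinations.
CHOSEN_FORMS = 81

# Unifying masculine and neuter in 4 singular cases would save
# one form per case and degree (an interpretation of the text).
UNIFIED_SINGULAR_CASES = 4
further_reduced = CHOSEN_FORMS - UNIFIED_SINGULAR_CASES * len(DEGREES)
print(further_reduced)  # 69
```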

Exceptions:

Adverb forms

The only grammatical feature of Slovak adverbs is comparison degree:

Exceptions:

  • Many adverbs are not comparable (e.g. podvečer (L402432)).
  • A few rare adverbs do not have a positive degree (e.g. prv (L252020)).
  • Some adverbs have a comparative but no superlative (e.g. potichu (L402264)). This has nothing to do with attestation. Some superlatives are just inherently meaningless.
  • The adverb naj (L402152) does not have a positive degree.

Senses

Sense granularity

What constitutes a separate sense is a difficult theoretical question. This document builds on two core principles:

  1. Truly different meanings (with different etymology) are isolated in homograph lexemes. Senses in a single homograph are all related.
  2. A sense is unique if it has (or should have) a unique set of statements, especially item for this sense (P5137), translation (P5972), and synonym (P5973).
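The second principle can be approximated mechanically. The sketch below flags senses of one lexeme whose statements for the three named properties are identical, which makes them candidates for review. It is a heuristic only, since the principle also covers statements a sense should have but does not have yet. The input layout, sense IDs, and placeholder values are hypothetical.

```python
# Properties named in principle 2.
KEY_PROPERTIES = ("P5137", "P5972", "P5973")  # item, translation, synonym


def statement_signature(sense: dict) -> frozenset:
    """Collect (property, value) pairs for the key properties."""
    return frozenset(
        (prop, value)
        for prop in KEY_PROPERTIES
        for value in sense.get(prop, [])
    )


def duplicate_senses(senses: dict) -> list:
    """Return pairs of sense IDs with identical, non-empty signatures."""
    seen = {}
    pairs = []
    for sense_id, sense in senses.items():
        signature = statement_signature(sense)
        if not signature:
            continue  # no statements yet, nothing to compare
        if signature in seen:
            pairs.append((seen[signature], sense_id))
        else:
            seen[signature] = sense_id
    return pairs


# Hypothetical senses with placeholder item identifiers.
example = {
    "S1": {"P5137": ["ITEM_A"]},
    "S2": {"P5137": ["ITEM_A"]},
    "S3": {"P5137": ["ITEM_B"]},
}
print(duplicate_senses(example))  # [('S1', 'S2')]
```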

While this seems straightforward, the complexity and fluidity of Wikidata complicate matters considerably.

The following rules are observed in senses of Slovak lexemes:

  • In Slovak, masculine gender doubles as the default gender. Nouns denoting persons (e.g. volič (L249216)) have a secondary sense denoting a male person unless there is no corresponding feminine noun (e.g. predok (L252106)).

Glosses

There is no Wikidata-wide consensus on the content of sense glosses. Glosses are intended for sense disambiguation, following the same rules as item descriptions (see Help:Description). Most Slovak glosses follow some basic rules:

  • A gloss is not a definition. See below for information about definitions.
  • A gloss has only two hard requirements:
    1. The gloss is unique within the lexeme. If the lexeme has homographs, glosses should be unique across all homographs.
    2. The gloss is informative enough to allow skilled speakers of Slovak to pair glosses with definitions in external dictionaries.
  • Of the glosses that meet the above two requirements, the best gloss is usually the shortest one. The ideal gloss length is a single word.
  • Glosses in other languages translate the Slovak gloss, not the sense itself. For example, if a sense for bažant (L484448) has the gloss vták, add the English gloss bird, not pheasant. To add translations, use the translation (P5972) property or rely on indirect translations via item for this sense (P5137).
  • While glosses in other languages have some use in downstream dictionaries, adding translation (P5972) statements is more important.

Hint: A good way to choose gloss is to reference the class of the corresponding item, e.g. animal for any lexeme naming some animal. If that is not precise enough, combine several such classes, e.g. male animal.
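The uniqueness requirement and the preference for short glosses lend themselves to a simple check. The sketch below is an illustration under assumptions: it takes glosses of one lexeme and its homographs as a plain dictionary, and all identifiers are hypothetical. The second requirement, informativeness, needs human judgment and is not checked.

```python
def gloss_problems(homographs: dict) -> list:
    """Report glosses repeated within a lexeme or across its homographs.

    homographs maps lexeme ID -> {sense ID -> gloss}.
    """
    seen = {}
    problems = []
    for senses in homographs.values():
        for sense_id, gloss in senses.items():
            key = gloss.strip().casefold()
            if key in seen:
                problems.append(
                    f"{sense_id} repeats gloss {gloss!r} used by {seen[key]}"
                )
            else:
                seen[key] = sense_id
    return problems


def shortest_gloss(candidates: list) -> str:
    """Pick the shortest of several glosses that already meet the requirements."""
    return min(candidates, key=len)


# Hypothetical lexeme with two senses, using the glosses from the hint.
ok = {"LA": {"LA-S1": "animal", "LA-S2": "male animal"}}
print(gloss_problems(ok))                          # []
print(shortest_gloss(["male animal", "animal"]))   # animal
```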

Definitions

Wikidata does not store textual definitions. A sense is primarily defined via item for this sense (P5137) while other statements add nuance. Glosses should not be abused to store definitions. Nor should they be abused to add clarifications on top of structured data. Glosses should, however, be clear enough to identify the definition in one of the external sources listed below.

Translations

Please do add English translations even for senses that have item for this sense (P5137). The combination of the two will enable accurate triangulation of translations. Translations to other languages are useful only if they can outperform the accuracy of automated triangulation.

Translations can be used to semi-automatically inherit information from corresponding senses in other languages: item for this sense (P5137), foreign language glosses, context labels (language style (P6191), field of usage (P9488), etc.), and translations to other languages. Translations, especially English translations, are therefore more valuable than other statements. Doing them first saves time. The addition of symmetric translations can also be mostly automated.
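Triangulation via item for this sense (P5137) can be sketched as follows. This is a simplified illustration, not the method of any existing tool: it pairs senses in two languages whenever they point to the same item and ignores direct translation (P5972) statements, which would be used to refine the result. The function name, tuple layout, and identifiers are hypothetical.

```python
from collections import defaultdict


def triangulate(senses, source_lang: str, target_lang: str) -> dict:
    """Map each source-language sense to target senses sharing its item.

    senses is an iterable of (sense_id, language_code, item_id) tuples,
    one tuple per item for this sense (P5137) statement.
    """
    by_item = defaultdict(lambda: defaultdict(list))
    for sense_id, lang, item in senses:
        by_item[item][lang].append(sense_id)

    result = {}
    for langs in by_item.values():
        for source in langs.get(source_lang, []):
            result.setdefault(source, []).extend(langs.get(target_lang, []))
    return result


# Placeholder identifiers; real data would carry lexeme sense IDs and QIDs.
data = [
    ("sk-S1", "sk", "ITEM_A"),
    ("en-S1", "en", "ITEM_A"),
    ("en-S2", "en", "ITEM_B"),
]
print(triangulate(data, "sk", "en"))  # {'sk-S1': ['en-S1']}
```

A source sense with no counterpart maps to an empty list, which makes missing translations easy to spot.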

Queries

Statistics

Available data

Missing data

Invalid data

Resources

Language information

Corpora

Frequency lists

Dictionaries

Tools

Contact

Please ping Robert Važan when discussing Slovak lexemes on Wikidata.