Wikidata:Lexicographical data/Documentation/Lexical categories

The lexical category to which a lexeme belongs is a reference to a Wikidata item for a particular group of words with specific syntactic behavior in a language. This usually corresponds with the "part of speech" of the lexeme: nouns, verbs, adjectives, adverbs, and so on.

Lexical categories and P31

A lexeme's lexical category should be somewhat general, as a broader reflection of how that lexeme behaves syntactically within a language. More fine-grained distinctions, such as plurale tantum (Q138246), weak verb (Q60655), absolute adjective (Q332375), and cardinal numeral (Q1329258)—as specializations of noun (Q1084), verb (Q24905), adjective (Q34698), and numeral (Q63116) respectively—are better added as instance of (P31) values on the lexeme.
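As a rough illustration of this split, the sketch below models a lexeme as a plain Python dictionary shaped like the Wikibase entity JSON. The lexeme itself is hand-written for illustration: the ID `L0` is a placeholder, the choice of "scissors" as a plurale tantum is an assumption, and the helper `p31_values` is a hypothetical name, not part of any Wikidata tooling.

```python
# Hypothetical, hand-written lexeme record; "L0" is a placeholder, not a real lexeme ID.
lexeme = {
    "id": "L0",
    "lemmas": {"en": {"language": "en", "value": "scissors"}},
    "language": "Q1860",         # English
    "lexicalCategory": "Q1084",  # noun: the general lexical category
    "claims": {
        "P31": [  # instance of: carries the finer-grained distinction
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"id": "Q138246"},  # plurale tantum
                    },
                }
            }
        ]
    },
}


def p31_values(lexeme):
    """Return the item IDs of all 'instance of' (P31) statements on a lexeme."""
    ids = []
    for statement in lexeme.get("claims", {}).get("P31", []):
        snak = statement["mainsnak"]
        # Skip "no value" / "unknown value" snaks, which have no datavalue.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids
```

Here the lexical category stays at the general level (noun), while the specialization (plurale tantum) is read from the P31 statement.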

Similarly, the lexical category of a lexeme does not need to be repeated as an instance of (P31) statement. (One exception: when a source supports the lexeme's membership in a particular lexical category, a P31 statement repeating the category may be added so that the reference can be attached to it.)
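One possible way to check for this is sketched below: a function that flags P31 statements repeating the lexical category without any reference. It assumes lexemes are available as dictionaries shaped like the Wikibase entity JSON. The function name `redundant_p31`, the helper `p31_statement`, the sample records, and the placeholder source item `Q0` are all hypothetical.

```python
def redundant_p31(lexeme):
    """Return P31 statements that repeat the lexical category and carry no reference."""
    category = lexeme["lexicalCategory"]
    redundant = []
    for statement in lexeme.get("claims", {}).get("P31", []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # "no value" / "unknown value" snaks cannot repeat the category
        repeats_category = snak["datavalue"]["value"]["id"] == category
        if repeats_category and not statement.get("references"):
            redundant.append(statement)
    return redundant


def p31_statement(item_id, references=None):
    """Build a minimal P31 statement for the sample data (illustrative helper)."""
    statement = {
        "mainsnak": {
            "snaktype": "value",
            "property": "P31",
            "datavalue": {"type": "wikibase-entityid", "value": {"id": item_id}},
        }
    }
    if references:
        statement["references"] = references
    return statement


# Hypothetical verb lexeme: one unreferenced repeat of the category, one specialization.
unreferenced = {
    "lexicalCategory": "Q24905",  # verb
    "claims": {"P31": [p31_statement("Q24905"), p31_statement("Q60655")]},
}

# Same repeat, but with a reference attached ("Q0" is a placeholder source item).
source = [{"snaks": {"P248": [{"snaktype": "value", "property": "P248",
                               "datavalue": {"type": "wikibase-entityid",
                                             "value": {"id": "Q0"}}}]}}]
referenced = {
    "lexicalCategory": "Q24905",
    "claims": {"P31": [p31_statement("Q24905", references=source)]},
}
```

With these samples, `redundant_p31(unreferenced)` flags only the statement that repeats verb (Q24905), and `redundant_p31(referenced)` flags nothing, since the repeat there exists to hold a reference.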

Comparison table

Languages do not all share the same set of lexical categories, but some categories are frequent enough across languages that a comparison can be made. The following table gives example lexemes in each language for some of the most common lexical categories among Wikidata lexemes.

Sample lexemes by language and lexical category
verb noun pronoun adjective adverb preposition postposition conjunction interjection numeral determiner grammatical particle
Arabic ذَهَبَ (L7882) كِتَاب (L2233) أنا (L7883) جَمِيل (L7884) عادَة (L7885) فِي (L2452) لَكِنَّ (L7886) يَعْنِي (L7887) واحِد (L7891) هذا (L7892)
English go (L3006) book (L536) I (L487) beautiful (L3360) usually (L4114) in (L2987) ago (L3240) but (L1387) oh (L4327) one (L327) this (L2994)
German wissen (L2058) Zukunft (L80) ich (L7877) ausgezeichnet (L530) querbeet (L7059) in (L6748) aber (L7879) ach (L7889) eins (L7880) dieser (L7881)
Korean 먹다 (L17) 사람 (L130) (L246) 괴롭다 (L100) 함께 (L168) 가만 (L86) / (L83) 고전적/古典的 (L49)
Spanish ir (L7385) libro (L317) yo (L55951) hermoso (L55952) normalmente (L55953) en (L11741) N/A pero (L55954) oh (L692468) uno (L44969) esto (L55955)
French aller (L750) livre (L6873) je (L9094) beau (L7026) toujours (L9105) dans (L9148) mais (L9261) merci (L11618) un (L9167) ce (L9203)
New Persian رفتن/рафтан/raftan (L2921) کتاب/китоб (L226813) من/ман (L2377) زیبا/зебо (L238420) معمولاً/маъмулан (L749792) در/дар (L230487) اما/аммо (L678620) آخ/ох (L749794) یک/як (L303349) این/ин (L742781)
Russian быть (L2111) вода (L189) я (L2027) хороший (L10951) хорошо (L10948) в/въ (L2109) N/A и (L2108) всё (L2115) три (L32930) N/A не (L2110)
Swedish göra (L38963) boll (L32310) han (L35645) listig (L39404) ofta (L35726) (L35650) - och (L35648) hej (L246342) fem (L46944) den (L47066) ju (L53540)
Punjabi ਸਕਣ/سکݨ (L689075) ਡੱਡੂ/ڈڈّو (L678986) ਉਹ/اوہ (L686605) ਕਾਲਾ/کالا (L684186) ਨਹੀਂ/نہیں (L686542) - ਵਿਚ/وِچ (L679728) ਕਿਉਂਕਿ/کیوں کہ (L686369) ਆਹੋ/آہو (L689404) - ਇਕ/اِک (L686328) ਤਾਂ/تاں (L686341)
Italian amare (L5137) casco (L580895) io (L21271) bizzarro (L1199728) amichevolmente (L1155269) con (L7405) N/A o (L2779) ciao (L313550) otto (L5161)