Wikidata:Lexicographical data/Documentation/Languages/zu

language, modern language
Subclass of	Nguni, Zunda
Native label	isiZulu
Country	Lesotho, Mozambique, South Africa
Indigenous to	Gauteng, Mpumalanga, KwaZulu-Natal
Has grammatical case	vocative case, locative case
Has tense	present tense, past tense, future tense
Writing system	Latin script
Language regulatory body	Pan South African Language Board
Ethnologue language status	1 National
Wikimedia language code	zu

This page hosts information about the representation of isiZulu (the Zulu (Q10179) language) lexemes on Wikidata. Note that this page is subject to change as more information from various sources is accumulated and as experience is gained on what is the most convenient, yet still computationally usable, way of representing isiZulu lexical information.

There is also general documentation on adding and updating lexicographic data in Wikidata, including a diagram of the lexeme data model that shows the key components of an entry and what they mean.

Besides filling this page with content, you may also like to browse the isiZulu Wikipedia, read the Wikipedia entry on the Zulu language, or browse the Wikidata entries that are already linked to isiZulu using the 'What links here' feature.

Lexical category

This list of categories is to be refined at a future date. See further below for guidelines to add more lexemes of the various categories to Wikidata.

Categories for words

noun (Q1084) - List of words in this category

verb (Q24905) - List of words in this category

adjective (Q34698) - List of words in this category
- true adjective (Q65453883) - List of words in this category

adverb (Q380057) - List of words in this category

interjection (Q83034) - List of words in this category

conjunction (Q36484) - List of words in this category

pronoun (Q36224) - List of words in this category

Categories for word parts

prefix (Q134830) - List of words in this category

suffix (Q102047) - List of words in this category

clitic (Q213458) (also called concords) - List of words in this category
- proclitic (Q108819306) - List of words in this category, such as ngi (L731133) (subject concord) and angi (L732280) (negative subject concord)
- enclitic (Q6548647) - List of words in this category
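
A list of lexemes per category can be retrieved with a SPARQL query. The following is a minimal Python sketch that only builds the query text; the function name `lexeme_list_query` is illustrative, and the query assumes it is run on the Wikidata Query Service, where the `wd:`, `dct:` and `wikibase:` prefixes are predefined.

```python
import re

ZULU = "Q10179"  # Zulu, the language item named on this page

def lexeme_list_query(category_qid, language_qid=ZULU):
    """Build a SPARQL query listing lexemes of one lexical category."""
    for qid in (category_qid, language_qid):
        # Guard against pasting labels or malformed identifiers into the query.
        if not re.fullmatch(r"Q[1-9][0-9]*", qid):
            raise ValueError(f"not a Q-item identifier: {qid!r}")
    return (
        "SELECT ?lexeme ?lemma WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        f"          wikibase:lexicalCategory wd:{category_qid} ;\n"
        "          wikibase:lemma ?lemma .\n"
        "}\n"
        "ORDER BY ?lemma"
    )

print(lexeme_list_query("Q1084"))  # query text for nouns
```

Passing another category from the lists above, such as Q24905 for verbs, gives the corresponding query.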

Nouns

As with all Niger-Congo B (NCB) languages, isiZulu has a system of noun classes where each noun belongs to a noun class. There are different such classification systems. The one used most often is the one based on Meinhof, with extensions up to Canonici (i.e., also having classes 1a, 2a, 3a, and 9a). Alternative ones that one may find are Doke and Grout. To disambiguate, the annotation of a noun with the noun class (on the form, not on the lexeme) should be selected appropriately, such as "noun class 1 (Meinhof)" or "noun class 1 (Doke)". At the time of writing, only Meinhof's classes up to noun class 17 have been added to Wikidata.

The general default is to use the noun in the singular for the lexeme, except when it exists only in the plural form (e.g., amanzi). Then add the same string again under "Form" and select the appropriate noun class as its grammatical feature. Since it is clear from the noun class whether the form is singular or plural, the number need not be added. Then add the plural form (if applicable) and the noun class it belongs to. The singular/plural pairings of the noun classes for the updated Meinhof list of noun classes are: 1/2, 1a/2a, 3a/2a, 3/4, 5/6, 7/8, 9/10, 9a/6, 11/10, 14, 15, 17. This approach may not meet the full list of requirements for noun class information across NCB languages^[1] in the best way, but it seems to work for now.
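
The pairings can be captured in a small lookup table, which also shows how grammatical number follows from the noun class. This is a minimal Python sketch; the names `PLURAL_OF`, `UNPAIRED` and `grammatical_number` are illustrative and not part of Wikidata or any library.

```python
# Singular -> plural pairings for the updated Meinhof noun classes,
# copied from the list in the text. All names here are illustrative.
PLURAL_OF = {
    "1": "2", "1a": "2a", "3a": "2a", "3": "4", "5": "6",
    "7": "8", "9": "10", "9a": "6", "11": "10",
}
# Classes listed without a singular/plural partner.
UNPAIRED = {"14", "15", "17"}

def grammatical_number(noun_class):
    """Return 'singular', 'plural', or None for an unpaired class."""
    if noun_class in PLURAL_OF:
        return "singular"
    if noun_class in PLURAL_OF.values():
        return "plural"
    if noun_class in UNPAIRED:
        return None
    raise ValueError(f"unknown noun class: {noun_class}")
```

Note that the mapping is not one-to-one: classes 2a, 6 and 10 each serve as the plural of two singular classes, so the singular class cannot be recovered from the plural class alone.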

Especially for new nouns, it may not yet be settled to which noun class they belong (if not the default 5 or 6), and older documentation will not have 9a but may still list 16. The system allows annotating a form with more than one noun class, or omitting the noun class altogether. The downside of either choice is that it will cause a bug in any computational use of that noun. Therefore, if the data is in disagreement, please add sources and a note of clarification or motivation, in the hope that agreement is reached before the noun is needed.
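
A simple check can flag noun forms that would trip up computational use. The sketch below assumes a simplified in-memory representation of forms (a list of dictionaries with hypothetical keys `representation` and `noun_classes`); it is not the Wikidata JSON format, and the two placeholder entries are invented purely to exercise the check.

```python
def forms_needing_review(forms):
    """Return representations of noun forms without exactly one noun class."""
    return [
        form["representation"]
        for form in forms
        if len(form["noun_classes"]) != 1
    ]

# Simplified, hypothetical data: only the first two entries are real nouns.
forms = [
    {"representation": "umuntu", "noun_classes": ["1"]},
    {"representation": "abantu", "noun_classes": ["2"]},
    {"representation": "placeholder-a", "noun_classes": []},
    {"representation": "placeholder-b", "noun_classes": ["5", "9a"]},
]

print(forms_needing_review(forms))
```

This check applies to nouns only; for concords, several noun class annotations on one form are expected.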

There are different types of nouns. Since most nouns are count nouns, one may adopt the default of stating just noun and annotate only other types of nouns explicitly; notably, mass nouns (things that can be counted only in quantities, like water, gold, wood) and collective nouns (collectives of things, such as an electorate, a herd). For examples, see amanzi (L8426) for a mass noun entry, umuntu (L37485) for a count noun entry, and umphakathi (L700292) for a collective noun entry.

It is recommended to add a sense in some other language of choice to help other people understand what the lexeme is about. This is added under "Sense". One can also link it to a Q-item in Wikidata with the property item for this sense (P5137).

Optionally, one may add the stem of the noun as a "Statement", using the property word stem (P5187). The preceding dash that is customary elsewhere should be omitted.

The noun prefixes are productive and added separately; see the section on prefixes below for details.

Word parts: affixes, clitics, and concords

Word parts are worth adding to Wikidata to the extent that they are useful for generating whole words, i.e., are productive in some way. For instance, noun prefixes are also used for numbers (e.g., the 'ama' in engama-25); clitics/concords, such as the subject concord, are used for verb conjugation (e.g., u- to complete the stem -dla); and there are affixes for wh-questions (e.g., -phi). The CARP extensions in verbs, while strictly speaking affixes to the verb root, are not 'interesting' in that regard and would thus not be added separately to Wikidata.

Affixes

The noun prefixes have been added in the same way as described for the concords below; see umu (L689510).

Other productive affixes, such as the wh-questions and locatives, are yet to be added at the time of writing.

Concords

The clitics are better known as concords. There are numerous concords, such as relative, possessive, and adjectival concord. Since they are productive and needed for natural language generation to get Abstract Wikipedia working, they have to be added to Wikidata. Since the lexeme is supposed to be a lexeme and not a name of a concord, a slight workaround is used, as follows, and illustrated in engi (L688517) for the relative concord:

1) pick the first entry from the list, which will be first person singular or noun class 1, and use that string for lexeme name;
2) add all concords for each noun class as a form each;
3) annotate each form with the applicable noun class as 'grammatical feature'.

There may indeed be multiple noun class annotations for a single form; that's fine. It may also be the case that a string is empty; this is still important to know, so add the empty-set symbol (such as ø) as a form and annotate it with the noun class.
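
The resulting structure, one lexeme whose forms each carry noun class annotations, can be sketched as follows. This is a simplified in-memory model with hypothetical names (`SUBJECT_CONCORD_FORMS`, `concord_for`), not the Wikidata JSON format; the three sample forms are a small illustrative subset of the subject concords and should be checked against the lexeme before being relied upon.

```python
EMPTY = "ø"  # stands in for a concord that is an empty string

# Illustrative subset only; the lexeme is named after the first entry (ngi).
SUBJECT_CONCORD_FORMS = [
    {"representation": "ngi",
     "features": {"first person", "singular"}},
    {"representation": "u",
     "features": {"noun class 1 (Meinhof)", "noun class 3 (Meinhof)"}},
    {"representation": "ba",
     "features": {"noun class 2 (Meinhof)"}},
]

def concord_for(forms, *features):
    """Return the concord of the first form annotated with all `features`."""
    wanted = set(features)
    for form in forms:
        if wanted <= form["features"]:
            representation = form["representation"]
            # An ø form means "the concord is known to be empty".
            return "" if representation == EMPTY else representation
    raise KeyError(features)

print(concord_for(SUBJECT_CONCORD_FORMS, "noun class 1 (Meinhof)") + "dla")
```

Storing ø explicitly lets a generator distinguish a concord that is known to be empty from one that has not been recorded yet, which raises an error here.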

The lexical category of clitic (Q213458) may be refined to proclitic (Q108819306) or enclitic (Q6548647).

Verbs

The verbs have been added by stem so far (not root nor infinitive) and without the dash; e.g., shaya (L677334) and shayela (L677332). Thus for 'eat' one would add 'dla', not -dla, -dl-, or ukudla. Also here it helps to add a sense to it, especially if there are multiple senses for the verb.

It is possible to specialise the type of verb, such as transitive verb (Q1774805) and intransitive verb (Q1166153). This is a statement on the lexeme level (in analogy with the type of noun).

Regarding forms, since isiZulu is highly inflectional, it is not deemed feasible to add all the forms of a verb; instead, they will be computed on the fly when needed. One may repeat the stem as a form. It is also possible to add particular forms, such as the imperative (then add the grammatical feature 'imperative') or the stem under negation (final vowel = i).

Optionally, one may add the verb stem as a "Statement", using the property word stem (P5187) (the preceding dash should be omitted). Note that the verb root, the verb rad, and the verb stem are different things, but currently there are not enough properties in Wikidata to distinguish them. They relate as follows:

verb stem = verb rad + final vowel
verb rad = verb root + CARP extension (i.e., any of is, el, an, or w)
verb root = the basic verb without the CARP extension

Examples:

bonana (to see each other) is a verb stem, where bonan is the verb rad, a is the final vowel, bon is the verb root, and an is the reciprocative in the extension.
bonisana (to show each other) is a verb stem, where bonisan is the verb rad, a is the final vowel, bon is the verb root, and the extension is made up of both is (causative) and an (reciprocative).
bona (to see) is a verb stem, where bon is the verb rad and a is the final vowel, and since the extension is empty, bon is also the verb root.
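
The three definitions and examples above amount to simple string composition. The following is a minimal Python sketch with illustrative function names; it concatenates the parts as written and makes no attempt to model phonological changes that may occur when extensions are attached.

```python
CARP_EXTENSIONS = ("is", "el", "an", "w")

def verb_rad(root, extensions=()):
    """verb rad = verb root + CARP extension(s), possibly none."""
    for extension in extensions:
        if extension not in CARP_EXTENSIONS:
            raise ValueError(f"not a CARP extension: {extension!r}")
    return root + "".join(extensions)

def verb_stem(root, extensions=(), final_vowel="a"):
    """verb stem = verb rad + final vowel."""
    return verb_rad(root, extensions) + final_vowel

print(verb_stem("bon"))                # bona
print(verb_stem("bon", ["an"]))        # bonana
print(verb_stem("bon", ["is", "an"]))  # bonisana
```

Passing `final_vowel="i"` gives the stem under negation mentioned for forms.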

Since we already record by verb stem, word stem (P5187) either can be ignored or repurposed for the root if the extension is not empty.

If you want to add how a verb is composed with the combines lexemes (P5238) property, then use the final vowel lexeme forms a (L740041) and, if applicable, any of the CARP extensions: is (L732948), el (L732943), an (L732945), w (L732952).
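
As a rough sketch, the lexeme identifiers for such a combines lexemes (P5238) statement could be collected as below. The function name and the ordering (extensions first, final vowel last) are assumptions for illustration; the lexeme for the verb root is not included, since no root lexemes are listed here.

```python
FINAL_VOWEL_A = "L740041"  # final vowel a
CARP_LEXEMES = {
    "is": "L732948",
    "el": "L732943",
    "an": "L732945",
    "w": "L732952",
}

def combines_lexemes_values(extensions=()):
    """Return lexeme IDs for the given CARP extensions plus the final vowel."""
    return [CARP_LEXEMES[extension] for extension in extensions] + [FINAL_VOWEL_A]

print(combines_lexemes_values(["is", "an"]))  # e.g. for bonisana
```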

Adjectives

In short: record the stem (without the dash at the start), since the full word is formed depending on the noun class of the noun it modifies. Example: see de (L705896).

More precisely, one may consider there to be two types of adjectives: 1) a small (closed) set of true adjectives (true adjective (Q65453883)) and 2) 'adjectives' that are relatives. The latter used to default to relativity (Q983751) as the preferred string in the editing interface, but a new item has since been introduced for these 'adjectives' that are relatives: relative adjective (Q115388551). If you are not sure, and the 'adjective' has not already been added as a true adjective in Wikidata (the 14 listed on Wikipedia had been added as of 29-11-2022), select relative adjective (Q115388551) as the lexical category.

A key reason to choose explicitly either true adjective (Q65453883) or relative adjective (Q115388551) when adding a new lexeme is that, in natural language generation, the former takes the adjectival concord and the latter takes the relative concord.
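
In a generator, this distinction becomes a simple dispatch on the lexical category. The sketch below uses hypothetical names and returns only a label for the kind of concord to look up; it does not produce the concord string itself.

```python
TRUE_ADJECTIVE = "Q65453883"
RELATIVE_ADJECTIVE = "Q115388551"

def concord_type(lexical_category_qid):
    """Return which kind of concord an adjective lexeme takes."""
    if lexical_category_qid == TRUE_ADJECTIVE:
        return "adjectival concord"
    if lexical_category_qid == RELATIVE_ADJECTIVE:
        return "relative concord"
    # The generic adjective category gives no basis for choosing.
    raise ValueError(f"cannot choose a concord for {lexical_category_qid}")
```

A lexeme recorded only under the generic adjective (Q34698) category raises an error here, which is the situation that choosing one of the two specific categories avoids.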

Adverbs

Record the word. Example: see ngamandla (L705899). What else may need to be recorded is yet to be considered in more detail.

Statements

Statements are at the lexeme level (see, e.g., unyaka (L686326)). It is not immediately obvious from the interface which ones are permitted. Here is a selection:

instance of (P31) something, such as root (Q111029), derivation (Q728001), or compound (Q245423) etc.
word stem (P5187)
derived from lexeme (P5191) with object form (P5548)
root (P5920) (if instance of (P31) is derivation (Q728001))
combines lexemes (P5238) (if instance of (P31) is compound (Q245423))
homograph lexeme (P5402) with language of work or name (P407)
usage example (P5831) with subject form (P5830) and subject sense (P6072)

Forms

The forms are currently used mainly for annotating the lexeme with, at least, the noun class and as a way to add the lists of prefixes and concords. Among the options available, at least the following ones may be of use:

Grammatical features, such as:
- noun class (Q1598075), and more precisely: noun class 1 (Meinhof) (Q113331807), noun class 2 (Meinhof) (Q113380991), noun class 1a (Meinhof) (Q113195674), noun class 2a (Meinhof) (Q113195677), noun class 3a (Meinhof) (Q113195763), noun class 3 (Meinhof) (Q113194639), noun class 4 (Meinhof) (Q113194715), noun class 5 (Meinhof) (Q113383619), noun class 6 (Meinhof) (Q113383630), noun class 7 (Meinhof) (Q113383641), noun class 8 (Meinhof) (Q113383650), noun class 9 (Meinhof) (Q113383660), noun class 10 (Meinhof) (Q113383679), noun class 9a (Meinhof) (Q113383667), noun class 11 (Meinhof) (Q113383687), noun class 14 (Meinhof) (Q113383695), noun class 15 (Meinhof) (Q113383696), noun class 17 (Meinhof) (Q113383703).
- grammatical person (Q690940) (where applicable): first person (Q21714344), second person (Q51929049), third person (Q51929074)
- Grammatical number (optional since implied by the noun class): singular (Q110786) / plural (Q146786)
- animacy (Q1250335) (for nouns, optional): the options are animate (Q51927507) and inanimate (Q51927539)
pronunciation audio (P443)
IPA transcription (P898)

See also the suggestions on recording forms in the lexical category sections above, notably for verbs, nouns, and adjectives.

Senses

At least one sense should be added for each lexeme and, ideally, linked to at least one Q item.

Sometimes this is non-trivial or there are no 1:1 mappings to items or other lexemes in Wikidata. One then can add a description in the sense field (see, e.g., ihawozi (L688559)) and leave it at that, or add more senses if sources do not agree. It may also happen that the informal sense description is not exactly the same but sufficiently synonymous with the label of the item (e.g., umfazi (L677330) may be translated as married woman or wife).

The main properties that can be used on a sense are:

Resources

Note: these lists are incomplete and will be extended over time.

Lexicographic data

isiZulu Wiktionary
Free online Zulu-English dictionary isiZulu.net
Selection of isiZulu concords in Wiktionary and more concords
Dent, G.R. and C.L.S. Nyembezi. Scholar's Zulu Dictionary. Pietermaritzburg: Shuter & Shooter. Revised version of 2010
De Schryver, G.-M., et al. Oxford Bilingual School Dictionary: Zulu and English / Isichazamazwi Sesikole: isiZulu - isiNgisi. New (2010) or revised (2015).

Grammar resources

Copulative in isiZulu in Wiktionary
More on nouns in Wiktionary
Resource grammar library for Grammatical Framework RGL for zu

Organisations

Other

M.H. Mpungose (7 September 2012). "Analysis of the Word-Initial Segment with Reference to Lemmatising Zulu Nasal Nouns". Lexikos. 8 (1). doi:10.5788/8-1-945. ISSN 1684-4904. Wikidata Q115513730.
Shamila Naidoo (December 2005), Intrusive stop formation in Zulu: An application of feature geometry theory (PDF), Stellenbosch, hdl:10019.1/1262, Wikidata Q115512634
Shamila Naidoo (January 2002). "The palatalisation process in isiZulu revisited". South African Journal of African Languages. 22 (1): 59–69. doi:10.1080/02572117.2002.10587498. ISSN 0257-2117. Wikidata Q115513685.
Lionel Posthumus (December 2016), A systemized explanation for vowel phoneme change in the inadmissible phonological structure /VV/ in Zulu, Wikidata Q115539250
Peter E. Raper (2012). "Bushman (San) influence on Zulu place names". Acta Academica. hdl:11660/2898. ISSN 0587-2405. Wikidata Q115512771.
Andrew Van der Spuy (26 November 2014). "Bilabial Palatalisation in Zulu: A morphologically conditioned phenomenon". Spil plus. 44 (0): 71. doi:10.5842/44-0-645. ISSN 1726-541X. Wikidata Q115513686.

References

[1] Keet, C. Maria; Khumalo, Langa; Mahlaza, Zola (April 25, 2022). Considerations for a model for NCB noun classes in Wikidata (PDF). WikiWorkshop 2022. online.