Wikidata:Lexicographical data/Documentation/Languages/id

language, standard language, modern language, literary language
Subclass of: Malay
Native label: Bahasa Indonesia
Short name: Indonesian
Named after: Indonesia
Country: Indonesia
Indigenous to: Indonesia
Coordinate location: 6°10′30″S 106°49′39″E
Linguistic typology: subject–verb–object, agglutinative language, zero-marking language, noun-adjective, synthetic language
Writing system: Latin script
Language regulatory body: Agency for Language Development and Cultivation
Ethnologue language status: 1 National
Studied in: indonesiology
Described at URL
Related category: Category:Indonesian pronunciation
Wikimedia language code: id

Information about the Indonesian lexeme model for Wikidata. You're welcome to edit and contribute to this page. See also the other language models for comparison.



Lexical categories


Non-word categories



  • A lemma is the form of a word that is used as a dictionary headword. Indonesian dictionaries usually use the root word as the lemma.
  • Following the definition of id:Leksem from Gunawan, et al. (1948), a lexeme is the smallest unit in a language that represents a concept or a symbol. It does not have to be a root word. Many root words in Indonesian are in fact bound morphemes that have neither meaning nor lexical class until they receive affixation (derivation).
  • A Wikidata lexeme in Indonesian, therefore, is not necessarily a root word; it can also be a derived word.
  • All of these are therefore valid separate lexemes: cinta (L15072) (noun), cinta (L238377) (adjective), mencinta (L31535) (verb), ajar (L31557), pelajar (L6592), pengajar (L6593)
  • Indonesian lemmas should start with a lowercase letter, just like in Wiktionary (lemmas are case-sensitive).

Spelling variants

subject of further discussion

The dictionary forms are used mainly in formal writing and formal speech, and in encyclopedias such as Indonesian Wikipedia. In daily usage, most people use informal spellings, informal pronunciations, and slang. This covers informal speech and informal writing (social media, personal writing, etc.), and the varieties differ from region to region, influenced by many of the 700+ local languages of Indonesia. These variants seldom enter dictionaries (where they are usually marked as 'regional'/'local language' words). Some have been compiled into "kamus gaul" (slang dictionaries), even though many of the words are no longer considered slang and are already very common. Other factors to consider include the high number of bilingual and trilingual speakers, code-switching/code-mixing, and bahasa 'gado-gado'.

Where the orthography requires special letters beyond the 26 letters of the Indonesian alphabet, both orthographies (with and without the special letters) should be used in the page title.

The following lexemes have at least two spelling variants. The first (proper) variant should be as defined in KBBI, and the second variant is marked as id-x-Q4200642.

Regional languages / variants

subject of further discussion

A better way is needed to handle lexemes in the 700+ languages of Indonesia, e.g. lexemes (usually nouns) in multiple languages that have the same sense.

  • Indonesian languages (Brandstetter, Blagden (tr.), 1916)
    • Indonesian (ind)
    • Malayic languages
      • (Historical "Standard Malay"): Riau Malay/Court Malay
      • Indonesian Malay
        • (non-creole Malay): Jambi Malay, Minang, Banjar, etc.
      • Malaysian Malay
        • Malay (mly) (i.e. Standard Malaysian) (zsm), Kedah Malay, Sabah Malay, etc.
      • Singaporean Malay
      • Bruneian Malay
      • (Malayic language outside of the above): Thai Malay (Pattani Malay), Sri Lankan Malay, etc.
      • (Trade and creole Malay): Baba Malay, Betawi, Ambonese Malay, Manado Malay, Papuan Malay, etc.
    • Indonesian non-Malayic languages (excluding foreign languages, Hokkien, other Sinitic, Arabic, Indic, etc.)
      • Languages of Java: Javanese, Sundanese, Madurese, etc.
      • Languages of Kalimantan
      • Languages of Sulawesi
      • Languages of Sumatra (non-Malayic): Aceh, Batak, Nias, Mentawai, etc.
      • Languages of Maluku
      • Languages of Lesser Sunda Islands: Balinese, Lombok, Bima, Tetun, etc.
      • Languages of Indonesian Papua (this region alone has around 270 languages)
    • (Indonesian languages outside of the above): Philippines, Madagascar, Formosa



Dialects: Indonesian dictionaries incorporate many words from local languages, and these are marked as such in the lemmas, although there are almost no etymological dictionaries of Indonesian.


  • language registers (ragam bahasa): archaic, colloquial, honorific, vulgar, classical, etc.
  • fields of study (bidang ilmu): (many kinds)

Homography, homophony, and homonymy


If two similar lexemes share a lemma, a pronunciation, or both, they are to be kept as separate lexemes and connected to each other via the following statements:



Concept or symbol represented by the lexeme, glosses (in Indonesian language), definitions (meaning): e.g. kucing (L498558)





to be completed

root (Q111029)
Always a root word that has no other lexical category (precategorial (Q107000399)). It has no other forms and no sense/gloss. E.g.: bunuh (L31532). Lemmas with multiple (ambiguous) roots: berikan (L574420): ber+ikan, beri+kan; berilah (L573868): ber+ilah, beri+lah
noun (Q1084)
Most of the time the base noun form can denote either singular or plural (Q106644026). It can be a root word or an affixed word (-an, pe- (an), per- (an), etc.).
singular (Q110786) (tunggal); plural (Q146786) (jamak/bentuk terulang); interrogative (Q12021746) (interogatif/penanya); affirmation and negation (Q3745428) (affirmatif/penegas/penekanan); first-person possessive (Q71470598) (posesif orang pertama); second-person possessive (Q71470837) (posesif orang kedua); third-person possessive (Q71470909) (posesif orang ketiga); and their combinations.
TODO: false reduplication (singular)
adjective (Q34698)
positive (Q3482678) (root), equative case (Q3177653) (se-), superlative (Q1817208) (ter-), excessive (Q1385613) (ke-an)
TODO: reduplication, infix (-em-), foreign suffixes, compound (synonym/antonym) adjective forms, denominal & deverbal adjectives (pe-, meng-, ber-, ter-)
verb (Q24905)
It could be a root word, or an affixed word. If it's the latter, the base verb form should be: active (Q1317831) (me-/member-/memper-, ber-), or passive (Q1194697) intransitive verb (Q1166153) (ter-)
For active voice (me-), add these forms: passive (Q1194697); transitive case (Q17140008); first-person possessive (Q71470598); second-person possessive (Q71470837); third-person possessive (Q71470909); affirmation and negation (Q3745428); interrogative (Q12021746); and their combinations.
(In Indonesian terms: for the me-/memper-/member- (-i, -kan) forms, add the passive forms di-/ku-/kau-(per) (-i, -kan), the transitive forms (per-/ber-) -i/-kan (without me-), the possessives (-ku, -mu, -nya), the affirmative/emphatic (-lah), and the interrogative (-kah).)
For passive intransitive voice (ter-) (unintentional or completed action): TODO
TODO: reduplication verb forms, compound verb forms
Adverb, numerals, etc.
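
As an illustration of the noun forms listed above, here is a minimal Python sketch that generates candidate forms for a noun lemma together with the grammatical-feature items named on this page. The function name `noun_forms` and the idea of generating forms mechanically are assumptions made for illustration only. The sketch ignores false reduplication (still a TODO above), deeper suffix combinations, and any irregular spelling, so its output should be reviewed before use.

```python
# Grammatical-feature item IDs, copied from the noun section of this page.
SINGULAR = "Q110786"
PLURAL = "Q146786"
INTERROGATIVE = "Q12021746"
AFFIRMATIVE = "Q3745428"
POSSESSIVE_1 = "Q71470598"
POSSESSIVE_2 = "Q71470837"
POSSESSIVE_3 = "Q71470909"

# Suffixes that attach directly to the base form, with their feature item.
SUFFIXES = {
    "ku": POSSESSIVE_1,
    "mu": POSSESSIVE_2,
    "nya": POSSESSIVE_3,
    "kah": INTERROGATIVE,
    "lah": AFFIRMATIVE,
}


def noun_forms(lemma):
    """Return (representation, [feature QIDs]) pairs for a noun lemma.

    Hypothetical helper: covers the singular base, the reduplicated
    plural, and each of them with exactly one suffix.
    """
    bases = [
        (lemma, [SINGULAR]),
        (f"{lemma}-{lemma}", [PLURAL]),  # plural by full reduplication
    ]
    forms = list(bases)
    for representation, features in bases:
        for suffix, feature in SUFFIXES.items():
            forms.append((representation + suffix, features + [feature]))
    return forms


if __name__ == "__main__":
    for representation, features in noun_forms("kucing"):
        print(representation, features)
```

For kucing (L498558) this yields forms such as kucing-kucing (plural) and kucingku (singular, first-person possessive).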



Statements placed on Forms:

Old spellings


Some other variants include archaic/classical words (usually of Sanskrit, Arabic, Dutch, Chinese, etc. origin), variants from before the spelling reforms (there were several reforms), variants in pronouncing the letter 'e' (schwa or non-schwa), and variants considered incorrect by the Great Dictionary of the Indonesian Language (Q4200623) (KBBI). Other things to consider: pronunciations, affixation variants (mempelajari/memelajari), preposition (di) versus prefix (di-) variants, f/p/v variants, swarabakti (-er-/-r-) variants, mem- + [p] and men- + [t] variants, etc.

If the lexeme has multiple spelling variants, the most recent orthography should be used (currently based on the Great Dictionary of the Indonesian Language (Q4200623)). Spelling variants that are no longer considered valid can be added in the Forms section, marked with a language code specifying the last orthography reform under which the variant was valid. Here are some examples of language codes that might be used:

cuci (L498556) - verb, kucing (L498558) - noun
cuci, kucing
id-x-Q65205295 Q65205295 (Ejaan Baru, 1967-1972) Ejaan LBK [Lembaga Bahasa dan Kesusastraan], then called "Ejaan Baru"
id-x-Q5378777 Enhanced Indonesian Spelling System (Q5378777) (EYD, 1972-2015) in 1972 Ejaan Baru was codified into EYD [Ejaan Bahasa Indonesia Yang Disempurnakan] with minor revisions in 1987, 2009, and 2015
id-x-Q25470128 Q25470128 (EBI, 2015-now) in 2015 EYD was renamed EBI [Ejaan Bahasa Indonesia]
Note: there is no need to enter the three codes above; forms in these spellings should all simply use id.
tjutji, kutjing
id-x-Q7314707 Republican Spelling System (Q7314707) (Ejaan Soewandi/Ejaan Republik)
tjoetji, koetjing
id-x-Q7330819 Van Ophuijsen Spelling System (Q7330819) (Ejaan van Ophuijsen) - Dutch spelling

Spelling variants that are no longer valid must always be verified.
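
To make the relationship between the spelling systems above concrete, the following Python sketch maps an old-spelling form to its modern spelling and builds the private-use language code for a form. The helper names `modernize` and `variant_code` are hypothetical. The letter correspondences (oe → u, tj → c, dj → j, j → y, nj → ny, sj → sy, ch → kh) are the well-known changes between the Van Ophuijsen, Republican, and post-1972 spellings. The mapping is a heuristic that ignores proper names and exceptions, so it is only useful for suggesting candidates that still need to be verified.

```python
import re

# Old-spelling letter groups and their modern equivalents.
OLD_TO_MODERN = {
    "tj": "c",
    "dj": "j",
    "nj": "ny",
    "sj": "sy",
    "ch": "kh",
    "oe": "u",
    "j": "y",
}

# Try two-letter groups before the single letter "j", and replace in one
# pass so that the "j" produced from "dj" is not converted again to "y".
_PATTERN = re.compile("|".join(sorted(OLD_TO_MODERN, key=len, reverse=True)))


def modernize(word):
    """Suggest the modern spelling of a lowercase old-spelling word."""
    return _PATTERN.sub(lambda match: OLD_TO_MODERN[match.group(0)], word)


def variant_code(spelling_system_qid):
    """Build the language code for a form in a given spelling system."""
    return f"id-x-{spelling_system_qid}"


if __name__ == "__main__":
    print(modernize("tjoetji"), variant_code("Q7330819"))
    print(modernize("kutjing"), variant_code("Q7314707"))
```

Running the conversion in reverse (modern to old) is not unambiguous in general, which is one reason old-spelling forms are best taken from attested sources.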

Regional variants code:



  1. Add importScript( 'User:Bennylin/jsonLexeme.js' ); to your personal common.js (for example
  2. Two new buttons will appear in the left sidebar: Buat Leksem (Create Lexeme) and Sunting Leksem (Edit Lexeme).
  3. Click Buat Leksem, enter the JSON code, then click Buat (Create). 😎
  4. (The JSON code can be obtained by calling the Lexeme-id module and supplying the appropriate parameters. See the documentation at
  5. Sunting Leksem works like Buat Leksem, but requires the ID of the lexeme to be edited. If you are already on a lexeme page when you click "Sunting Leksem", the lexeme ID is filled in automatically.
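
The exact JSON expected by the script is not described on this page. As a rough sketch only, and assuming it follows the general Wikibase lexeme entity layout (a language item, a lexical category item, and lemmas keyed by language code), the JSON for a new lexeme could be assembled in Python as follows. The function name `build_lexeme_json` is hypothetical, and the output of the Lexeme-id module may differ.

```python
import json

INDONESIAN = "Q9240"  # Indonesian language item, as used in the queries on this page


def build_lexeme_json(lemma, lexical_category_qid, language_code="id"):
    """Return a JSON string for a new lexeme (assumed Wikibase layout)."""
    entity = {
        "type": "lexeme",
        "language": INDONESIAN,
        "lexicalCategory": lexical_category_qid,
        "lemmas": {
            language_code: {"language": language_code, "value": lemma},
        },
    }
    # ensure_ascii=False keeps any non-ASCII letters readable in the output.
    return json.dumps(entity, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Q1084 is the noun item referenced on this page.
    print(build_lexeme_json("kucing", "Q1084"))
```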

List of lexemes


Lexemes, forms and senses

Statistic Number Query link Date
Lexemes 20,150 [1] 12:59, 30 June 2024 (UTC)
Forms 412,524 [2] 12:59, 30 June 2024 (UTC)
Senses 615 [3] 12:59, 30 June 2024 (UTC)

By lexical categories and affixation


Verbs (verba): ~15000 lexemes

(Q24905 verba (verb) ← query)
12976 results 08:22, 29 June 2024 (UTC)

Nouns (nomina): ~43000 lexemes

(Q1084 nomina (noun) ← query)
7067 results 08:22, 29 June 2024 (UTC)

Adjectives (adjektiva): ~5000 lexemes

(Q34698 adjektiva (adjective) ← query)
184 results 08:22, 29 June 2024 (UTC)

Adverbs (adverbia): ~300 lexemes

(Q380057 adverbia (adverb) ← query)
3 results 08:22, 29 June 2024 (UTC)

Particles (partikula): ~200 lexemes

(Q184943 partikula (particle) ← query)
3 results 08:22, 29 June 2024 (UTC)

Numerals (numeralia): ~200 lexemes

(Q63116 numeralia (numeral) ← query)
1 result 08:22, 29 June 2024 (UTC)

Pronouns (pronomina): ~50 lexemes

(Q36224 pronomina (pronoun) ← query)
4 results 08:22, 29 June 2024 (UTC)

Others: interrogatives 45, prepositions 45, conjunctions 24, interjections 7, articles 6

(Q2304610 kata tanya (interrogative) ← query): 0 results 08:22, 29 June 2024 (UTC)
(Q4833830 preposisi (preposition) ← query): 5 results 08:22, 29 June 2024 (UTC)
(Q36484 konjungsi (conjunction) ← query): 5 results 08:22, 29 June 2024 (UTC)
(Q83034 interjeksi (interjection) ← query): 2 results 08:22, 29 June 2024 (UTC)
(Q103184 artikula (article) ← query): 1 result 08:22, 29 June 2024 (UTC)
(Q1867204 kata tugas (function word) ← query): 0 results 08:22, 29 June 2024 (UTC)
(Q3916780 kata bantu bilangan (measure word) ← query): 0 results 08:22, 29 June 2024 (UTC)
(Q63153 kata penggolong (classifier) ← query): 5 results 08:22, 29 June 2024 (UTC)
(Q35102 peribahasa (proverb) ← query): 4 results 08:22, 29 June 2024 (UTC)
    • wd:L498540 masuk angin
    • wd:L1119282 pucuk dicinta, ulam tiba
    • wd:L1119283 patah tongkat berjeremang
    • wd:L1120358 tong kosong nyaring bunyinya
(Q9788 huruf (letter) ← query): 1 result 08:22, 29 June 2024 (UTC)
    • wd:L498537 a
(Q102786 singkatan (abbreviation) ← query): 1 result 08:22, 29 June 2024 (UTC)
    • wd:L498542 AD
(Q101244 akronim (acronym) ← query): 1 result 08:22, 29 June 2024 (UTC)
    • wd:L498543 angkot


Longest lemmas

Longest words with affixation (without reduplication/phrasal affixation)
See also



Indonesian nouns (noun (Q1084))

SELECT ?l ?lemma WHERE {
  ?l a ontolex:LexicalEntry ; dct:language wd:Q9240 ; wikibase:lexicalCategory wd:Q1084 ; wikibase:lemma ?lemma .
}

Indonesian verbs (verb (Q24905))

SELECT ?l ?lemma WHERE {
  ?l a ontolex:LexicalEntry ; dct:language wd:Q9240 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
}

Indonesian adjectives (adjective (Q34698))

SELECT ?l ?lemma WHERE {
  ?l a ontolex:LexicalEntry ; dct:language wd:Q9240 ; wikibase:lexicalCategory wd:Q34698 ; wikibase:lemma ?lemma .
}

Get all existing Indonesian lexemes


The following query uses these:

  • Items: Indonesian (Q9240)     
    SELECT ?lexeme ?lemma ?category ?categoryLabel WHERE {
      # Lexemes whose language is Indonesian (Q9240)
      ?lexeme dct:language wd:Q9240;
              wikibase:lemma ?lemma;
              wikibase:lexicalCategory ?category.
      # Keep only lemmas tagged with the plain "id" language code
      FILTER(LANG(?lemma) = "id")
      # Category labels fall back to Indonesian
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],id". }
    }
    ORDER BY ?categoryLabel ?lemma
    LIMIT 100
Ordered by newest to oldest creation time

The following query uses these:

  • Items: Indonesian (Q9240)     
    SELECT ?lno ?lexeme ?lemma ?category ?categoryLabel WHERE {
      ?lexeme dct:language wd:Q9240;
              wikibase:lemma ?lemma;
              wikibase:lexicalCategory ?category .
      FILTER(LANG(?lemma) = "id")
      # Numeric part of the lexeme ID: the digits start at character 33 of the entity URI
      BIND(xsd:integer(substr(str(?lexeme), 33)) as ?lno)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],id". }
    }
    # A higher lexeme number means a more recently created lexeme
    ORDER BY DESC(?lno)
    LIMIT 100
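
The offset 33 in `substr(str(?lexeme), 33)` works because the entity URI prefix `http://www.wikidata.org/entity/L` is 32 characters long and SPARQL's `substr` counts from 1. The same extraction can be sketched in Python, which counts from 0, to sort a downloaded result set locally. The function name `lexeme_number` is hypothetical.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/L"  # 32 characters


def lexeme_number(uri):
    """Return the numeric part of a lexeme entity URI as an integer."""
    if not uri.startswith(ENTITY_PREFIX):
        raise ValueError(f"not a lexeme URI: {uri}")
    return int(uri[len(ENTITY_PREFIX):])


if __name__ == "__main__":
    uris = [
        "http://www.wikidata.org/entity/L31532",
        "http://www.wikidata.org/entity/L498558",
        "http://www.wikidata.org/entity/L15072",
    ]
    # Highest number first, i.e. most recently created first.
    for uri in sorted(uris, key=lexeme_number, reverse=True):
        print(lexeme_number(uri), uri)
```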
Uses date last modified

The following query uses these:

  • Items: Indonesian (Q9240)     
    SELECT ?lexeme ?lemma ?modified
    WHERE {
       ?lexeme dct:language wd:Q9240; wikibase:lemma ?lemma; schema:dateModified ?modified.
    }
    ORDER BY DESC(?modified)
    LIMIT 100

Get the count of lexemes in Indonesian belonging to different lexical categories


The following query uses these:

  • Items: Indonesian (Q9240)     
    SELECT ?category ?categoryLabel (count(?category) as ?count) WHERE {
      ?lexeme dct:language wd:Q9240;
              wikibase:lexicalCategory ?category;
              wikibase:lemma [].
      # Category labels fall back to Indonesian
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],id". }
    }
    GROUP BY ?category ?categoryLabel
    ORDER BY ?count




Main page: WD:Lexicographical data/Documentation/Resources § Indonesia

See also
