Wikidata:WikiProject Toki Pona


The goal of this WikiProject is to build a complete representation of all lexemes in the constructed language toki pona (Q36846), using the lexeme data structure in as exemplary a way as possible. This would probably be the first (and last) language represented in Wikidata in its entirety! :)

Toki Pona is interesting because its words can often be used in several lexical categories, have many different senses, and never have multiple forms.

Looking at the data

Example lexemes: toki (L220792), pona (L220753)

You can look at the current state of the data using these sites and queries:

Simple (incomplete) noun dictionaries, built from item for this sense (P5137) links:
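A dictionary query of this kind could be sketched as follows. This is only an illustration, not one of the project's own queries: the function names are hypothetical, the query assumes the standard prefixes predefined by the Wikidata Query Service, and it restricts results to lexemes whose lexical category is noun (Q1084). The second helper flattens a SPARQL JSON result into plain rows and is shown with made-up sample data rather than a live request.

```python
TOKI_PONA = "Q36846"  # toki pona, as linked on this page
NOUN = "Q1084"        # noun, used here as the lexical category filter


def build_noun_dictionary_query(language_qid=TOKI_PONA, label_lang="en"):
    """Return a SPARQL query listing noun lexemes and the items their senses link to."""
    return f"""
SELECT ?lexeme ?lemma ?item ?itemLabel WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lexicalCategory wd:{NOUN} ;
          wikibase:lemma ?lemma ;
          ontolex:sense ?sense .
  ?sense wdt:P5137 ?item .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{label_lang}" . }}
}}
ORDER BY ?lemma
""".strip()


def rows_from_bindings(result):
    """Flatten a SPARQL JSON result into (lemma, item label) pairs."""
    rows = []
    for binding in result["results"]["bindings"]:
        lemma = binding["lemma"]["value"]
        # itemLabel may be missing if no label exists in the requested language
        label = binding.get("itemLabel", {}).get("value", "")
        rows.append((lemma, label))
    return rows


if __name__ == "__main__":
    print(build_noun_dictionary_query())
```

The query text can be pasted into the Wikidata Query Service; `rows_from_bindings` expects the JSON result format that the service returns.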




  • A lot of Toki Pona words are used in multiple lexical categories, and thus, are represented as separate lexemes. Because of that, all properties of their forms have to be replicated across these lexemes, leading to a lot of redundancy. When one form is edited, the others are not kept in sync. Can we solve that problem somehow?


  • Can we enable the language shorthand "tok" in Wikidata? Currently, we're using "mis-x-Q36846", like it's done for ama/𒂼 (L1). Question asked here: Wikidata_talk:Lexicographical_data#Process_for_adding_a_new_language_code? (Not done)
    • Seems to be feasible – I opened a ticket. blinry (talk) 18:14, 30 October 2019 (UTC)
      • The Language Committee already answered, but declined the request. :( blinry (talk) 09:51, 31 October 2019 (UTC)
  • Should non-noun senses have an item for this sense (P5137)? Asked here: Wikidata_talk:Lexicographical_data#Best_practices
    • There's currently no consensus on this. blinry (talk) 18:14, 30 October 2019 (UTC)
  • How could we add sitelen pona (or other scripts) to the lexemes?
  • Is preverb (Q1552433) the correct lexical category for lexemes like kama (L220663)? The official Toki Pona book lists it as "pre-verb", but maybe we should create a custom category for these words?
  • Is adjective (Q34698) the correct lexical category for lexemes like pona (L220753)? All adjectives in Toki Pona can be used as adverbs, essentially. Should grammatical modifier (Q732699) be used instead?
  • Should we split transitive and intransitive meanings of verbs?
    • It seems to be common practice to use the broadest possible category. blinry (talk) 18:14, 30 October 2019 (UTC)
    • Intransitive verbs are nearly indistinguishable from modifiers (the book calls them both adjectives).
  • Because parts of speech for content words are flexible in toki pona, should we drop the standard lexical categories used for English and other languages (noun, adjective, verb, etc.) and replace them with a more general lexical category like content word (Q789016)? That would also ease the problem of having multiple entries, with different lexical categories, for one word.
  • From what I have read so far, Toki Pona words exactly match their corresponding IPA representation. Is there any risk in adding IPA transcription (P898) to all lexemes in bulk? -- Pedropaulovc (talk) 05:06, 2 November 2019 (UTC)
    • I see no problems with that! Wasn't aware that words exactly denote their IPA, that's pretty cool! But this article confirms that this is in fact the "norm pronunciation". Thanks! :) blinry (talk) 11:11, 3 November 2019 (UTC)
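The bulk IPA idea from the last question could be sketched roughly like this. It is a minimal illustration under stated assumptions: the function name is hypothetical, the letter set is the standard 14-letter Toki Pona alphabet, and the transcription is taken to be the lowercase lemma itself, without stress marks or allophonic detail. Words containing any other character are rejected so they can be reviewed by hand instead of being transcribed automatically.

```python
# The 14 letters of the Toki Pona alphabet; each is assumed to stand for
# the IPA symbol of the same shape.
TOKI_PONA_LETTERS = frozenset("aeioujklmnpstw")


def lemma_to_ipa(lemma):
    """Return a broad IPA transcription for a Toki Pona lemma.

    Assumption: spelling and IPA coincide, so the result is simply the
    lowercase lemma. Raises ValueError for anything outside the alphabet.
    """
    word = lemma.strip().lower()
    if not word or not set(word) <= TOKI_PONA_LETTERS:
        raise ValueError(f"not a plain Toki Pona word: {lemma!r}")
    return word


if __name__ == "__main__":
    for lemma in ["toki", "pona", "kama"]:
        print(lemma, "->", lemma_to_ipa(lemma))
```

Raising an error rather than guessing keeps a bulk run from writing a wrong IPA transcription (P898) value for multi-word lemmas or misspelled entries.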

Potential applications

  • Create Anki decks from the data (later maybe even for sitelen pona?).
  • Render pretty dictionaries in many languages.
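As a rough sketch of the Anki idea, lexeme data could be written out as tab-separated text, a format Anki can import as front/back cards. The function name and the sample glosses below are illustrative assumptions, not data taken from Wikidata.

```python
import csv
import io


def to_anki_tsv(cards):
    """Serialize (front, back) pairs as tab-separated lines, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample cards; real glosses would come from the lexeme senses.
    sample_cards = [("toki", "language"), ("pona", "good")]
    print(to_anki_tsv(sample_cards), end="")
```

Using the `csv` module instead of joining strings by hand means that a gloss containing a tab or a quote character is still escaped consistently.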

Lexical categories

These are the lexical categories currently used for Toki Pona lexemes:


These are the most important properties for lexemes: