Wikidata:Property proposal/hyphenation


Originally proposed at Wikidata:Property proposal/Lexemes

   Done: hyphenation (P5279) (Talk and documentation)
Data type: String
Domain: Forms on Lexemes
  • example@en → ex‧am‧ple
  • hyphenate@en → hy‧phen‧ate
  • orthography@en → or‧thog‧ra‧phy


See Wikidata talk:Lexicographical data/Archive/2017/08#Syllabification?. Authorities sometimes differ on hyphenation, e.g., dictionary is hyphenated as dic‧tion‧ar‧y according to the WordReference Random House Unabridged Dictionary of American English but as dic‧tio‧nary according to Merriam-Webster. Both variants can be stated with a reference.

Which separator to use? There is U+2027 (HYPHENATION POINT), but the English Wiktionary uses U+00B7 (MIDDLE DOT), the Wikidata model example Leiter (noun, German) uses a hyphen, and some dictionaries use a vertical line. In the Russian Wiktionary, a green hyphen is used at break points and a red middle dot at syllable boundaries where a word cannot be hyphenated, e.g., о·гры́-зок or веч-но-дви-га-те-ле-стро·е́-ни·е. In German, there are also break points that do not correspond to syllable boundaries (e.g., Lin‧oleum). In any case, I think this property should not cover phonological syllabification (for which the IPA transcription can be used, e.g., French /ɛɡ.zɑ̃pl/).
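To illustrate the separator question, here is a minimal Python sketch that maps two of the alternative separators mentioned above to U+2027. It assumes U+2027 as the stored form, which is what the examples in this proposal use. The function names are hypothetical, and the plain hyphen is deliberately not mapped, because it can be part of the spelling itself.

```python
HYPHENATION_POINT = "\u2027"  # ‧ (the separator used in the examples of this proposal)

# Separators from the discussion that can be mapped without ambiguity.
# The plain hyphen is left out on purpose: in words such as
# "three-dimensional" it belongs to the spelling.
ALTERNATIVE_SEPARATORS = {
    "\u00b7": HYPHENATION_POINT,  # MIDDLE DOT, as in the English Wiktionary
    "|": HYPHENATION_POINT,       # vertical line, as in some dictionaries
}


def normalize_separators(value: str) -> str:
    """Replace the known alternative separators with U+2027."""
    return value.translate(str.maketrans(ALTERNATIVE_SEPARATORS))


def spelling(value: str) -> str:
    """Recover the plain spelling by dropping all hyphenation points."""
    return value.replace(HYPHENATION_POINT, "")


print(normalize_separators("ex\u00b7am\u00b7ple"))
print(spelling("or\u2027thog\u2027ra\u2027phy"))
```

Stripping the separator should give back the spelling of the Form, which is a cheap consistency check for any value entered.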

How to handle words spelled with a hyphen like three-dimensional? One can hyphenate between three- and dimensional but no additional hyphen is inserted. After all, we do not hyphenate like this:


But like this:


So what to write? Still three-‧di‧men‧sion‧al or rather three‧di‧men‧sion‧al (compare with the traditional German orthography: it was written Schiffahrt and Zucker but hyphenated Schiff‧fahrt and Zuk‧ker; today it is Schifffahrt and Zu‧cker) or simply three-di‧men‧sion‧al (which would mean that there are two different separators)?
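As a sketch of what the third option (three-di‧men‧sion‧al, i.e. two different separators) would mean for a consumer of the data, the following Python function lists every way the word can be split across two lines. It assumes that a hyphen in the value is part of the spelling and that U+2027 marks a soft break; the function name is hypothetical.

```python
HYPHENATION_POINT = "\u2027"  # ‧


def line_breaks(value: str) -> list[tuple[str, str]]:
    """Return (end of first line, start of second line) for each break point.

    Assumes the "three-di‧men‧sion‧al" convention: a hyphen in the value
    belongs to the spelling, U+2027 marks a break where a hyphen is inserted.
    """
    results = []
    for i, ch in enumerate(value):
        if ch not in (HYPHENATION_POINT, "-"):
            continue
        head = value[:i].replace(HYPHENATION_POINT, "")
        tail = value[i + 1:].replace(HYPHENATION_POINT, "")
        # At U+2027 the trailing hyphen is inserted; at a spelled hyphen it is
        # the existing character, so no additional hyphen appears.
        results.append((head + "-", tail))
    return results


for first, second in line_breaks("three-di\u2027men\u2027sion\u2027al"):
    print(first, "/", second)
```

Under the first option (three-‧di‧men‧sion‧al) the same function would yield a doubled hyphen at that position, so a consumer would need a special case for a hyphen followed directly by a hyphenation point.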

On another note, the German word Interesse, for instance, may be hyphenated as Inter‧esse based on morphology but also as Inte‧resse based on pronunciation. In the German Wiktionary, all break points are indicated in a single occurrence of the spelling, so we have “In·te·r·es·se”. But this may be confusing because r by itself would barely make sense. In Wikidata, we could give two variants instead:

  • In‧te‧res‧se
  • In‧ter‧es‧se

This is rather theoretical because a single word rarely extends over three lines. As a more complex example:

  • Ge‧ri‧a‧trie
  • Ge‧ri‧at‧rie
  • Ger‧ia‧trie
  • Ger‧iat‧rie

(In the German Wiktionary, all break points are economically subsumed into “Ge·r·i·a·t·rie”.)
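The relation between the separate variants and the single "economical" spelling can be sketched in Python: the economical form is the union of the break positions of all variants. This is only an illustration with hypothetical function names, not a description of how any Wiktionary generates its data.

```python
HYPHENATION_POINT = "\u2027"  # ‧


def break_positions(value: str) -> set[int]:
    """Return the offsets in the plain spelling that a break point precedes."""
    positions = set()
    offset = 0
    for ch in value:
        if ch == HYPHENATION_POINT:
            positions.add(offset)
        else:
            offset += 1
    return positions


def merge_variants(variants: list[str], separator: str = HYPHENATION_POINT) -> str:
    """Combine several hyphenation variants of one spelling into one string."""
    spellings = {v.replace(HYPHENATION_POINT, "") for v in variants}
    if len(spellings) != 1:
        raise ValueError("all variants must share exactly one spelling")
    word = spellings.pop()
    positions = set().union(*(break_positions(v) for v in variants))
    return "".join(
        (separator if i in positions else "") + ch for i, ch in enumerate(word)
    )


variants = [
    "Ge\u2027ri\u2027a\u2027trie",
    "Ge\u2027ri\u2027at\u2027rie",
    "Ger\u2027ia\u2027trie",
    "Ger\u2027iat\u2027rie",
]
print(merge_variants(variants, separator="\u00b7"))  # Ge·r·i·a·t·rie
```

The merge loses information: the combined string no longer shows which break points belong together, which is why the separate variants are clearer.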

IvanP (talk) 17:30, 1 June 2018 (UTC)


  •   Support In the Czech Wiktionary we use a hyphen (-) and the economical variant, e.g. po-lo-os-t-rov for peninsula (Q34763). JAn Dudík (talk) 05:14, 1 June 2018 (UTC)
  •   Support This sounds useful to have. Is it a property of forms, or of the lexeme as a whole? Whatever you decide, you should show the standard form we want in two or three specific examples. ArthurPSmith (talk) 19:29, 1 June 2018 (UTC)
    • On Forms, so that a hyphenation can be given for forms other than the basic form as well. IvanP (talk) 21:44, 1 June 2018 (UTC)
  •   Comment From the samples, it looks like you want datatype string (or monolingual string), not "Form" (as values). Is that correct? Domain would be the "Forms"-parts of lexemes.
    --- Jura 18:05, 3 June 2018 (UTC)
    • Yes, changed. -- IvanP (talk) 20:22, 3 June 2018 (UTC)
    • Ok. I added the domain above.
      --- Jura 04:08, 4 June 2018 (UTC)
  •   Support Duesentrieb (talk) 13:49, 6 June 2018 (UTC)

@ArthurPSmith, Duesentrieb, Jura1, JAn Dudík, IvanP:   Done: hyphenation (P5279). − Pintoch (talk) 08:34, 9 June 2018 (UTC)