
Wikibase lexeme

Javanese (language) register

Under discussion

Description	links a Javanese lexeme sense to its counterparts in other registers (social variants), mainly the ngoko (Q12500634) register (plain Javanese), the krama (Q12492493) register (high/polite Javanese), and the madya (Q13091955) register (middle Javanese)
Data type	Lexeme
Domain	lexeme senses, in particular forms with spelling alternatives
Example 1	kowé/kowe/ꦏꦺꦴꦮꦺ (L2328) in the "ngoko" register and sampéyan/sampeyan/ꦱꦩ꧀ꦥꦺꦪꦤ꧀ (L1322036) in the "krama" register both mean "you" but belong to different social registers: the former is considered casual, the latter more formal and polite. For reference, see the online Javanese dictionary at https://www.sastra.org/leksikon (tick the "kata utuh" (whole word) checkbox when searching to exclude partial matches). For more information on ngoko/krama, see the introduction to this Javanese-English dictionary: https://www.sastra.org/bahasa-dan-budaya/kamus-dan-leksikon/1703-javanese-english-dictionary-horne-1974-1968, especially sections 4.1. Organization of the Entries and 5. SOCIAL STYLES. See also: en.wp, jv.wikt (https://jv.wiktionary.org/wiki/Wikisastra:Tabel_krama-ngoko)
Example 2	(update 18 August) gunung/ꦒꦸꦤꦸꦁ (L680638) (ngoko), redi/rêdi/ꦉꦢꦶ (L45622) (krama)
Example 3	(update 18 August) endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183) (ngoko), sirah/ꦱꦶꦫꦃ (L999025) (krama), mastaka/ꦩꦱ꧀ꦠꦏ (L413863) (krama inggil)

Motivation

I'm planning to add more Javanese lexemes, but many words exist in different registers, and using synonym (P5973) is not correct: although such words have the same meaning, they have different usage, and there are also many synonyms within a single register (for example, "you" has 4 or more synonyms in "ngoko" and 3 or more different words in "krama"). A dedicated property would make it possible to search and query the relationships between registers. As you can see from the links provided above, the relationship between these registers is not one-to-one. While the "ngoko" form is considered the default, not all "ngoko" words have a "krama" equivalent (only about 1000 without affixation, many more with affixation), and even fewer have a "madya" or other-register ("krama inggil", etc.) equivalent. Some "krama" words correspond to several "ngoko" words, because they are not true synonyms but rather substitution words for a different social context. Therefore this property should support multiple relationships. For example:

"you"

ngoko: kowe, (synonym: ko'ên, kohên, kowên)
madya: samang, andika, (synonym: dika)
krama: sampeyan, (synonym: bênampeyan, bênangpeyan)
krama inggil: panjênêngan, (synonym: nandalêm, paduka)

"to say, to tell"

ngoko: kandha, (synonym: clathu, ngomong, kêcap, wara, gotèk, cluluk, wuwus, etc.)
krama: criyos, sanjang, (synonym: sajang, wicantên, etc.)
krama andhap: matur
krama inggil: andika, ngêndika, (synonym: unandika)
kawi: angling

(things related to hand / "tangan")

ngoko: tangan, krama inggil: asta. The noun is simple, but the verbs get complicated:
krama inggil: ngasta (ng- + asta) serves as a substitute for ngoko: 1 nyambut gawe (to work), 2 nggawa (to bring, take, carry), 3 nandang (to do), 4 nyekel (to hold, grasp, to handle), 5 mulang (to teach)

Bennylin (talk) 18:23, 9 August 2024 (UTC)
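
As an illustration only (not part of the proposal), the many-to-many shape of the examples above can be sketched as a small lookup table in Python. The pair list and the function name `build_index` are assumptions made for this sketch; the words come from the lists above, with synonyms omitted.

```python
from collections import defaultdict

# Illustrative pairs only: each link joins a (register, word) to its counterpart
# in another register. Words are taken from the examples above; the layout of
# this table is an assumption of the sketch, not the proposed data model.
LINKS = [
    (("ngoko", "tangan"), ("krama inggil", "asta")),
    (("ngoko", "nyambut gawe"), ("krama inggil", "ngasta")),
    (("ngoko", "nggawa"), ("krama inggil", "ngasta")),
    (("ngoko", "nandang"), ("krama inggil", "ngasta")),
    (("ngoko", "nyekel"), ("krama inggil", "ngasta")),
    (("ngoko", "mulang"), ("krama inggil", "ngasta")),
    (("ngoko", "kowe"), ("madya", "samang")),
    (("ngoko", "kowe"), ("madya", "andika")),
    (("ngoko", "kowe"), ("krama", "sampeyan")),
]


def build_index(links):
    """Map each word to {register: [counterpart words]}, in both directions."""
    index = defaultdict(lambda: defaultdict(list))
    for a, b in links:
        for (_, src), (dst_register, dst) in ((a, b), (b, a)):
            index[src][dst_register].append(dst)
    return index


index = build_index(LINKS)
print(index["ngasta"]["ngoko"])  # five ngoko verbs share one krama inggil form
print(index["kowe"]["madya"])    # one ngoko word, two madya counterparts
```

Keying the table by bare word is a simplification: "andika" appears above both as madya "you" and as krama inggil "to say", which is one reason the proposal links senses rather than spellings.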

Update 18 August

Just to make it clearer, on behalf of Javanese speakers, we would like to request 5 new properties:

ngoko variations (see ngoko (Q12500634))
madya variations (see madya (Q13091955))
krama variations (see krama (Q12492493))
krama inggil variations (see krama inggil word (Q16893583))
krama andhap variations (see krama andhap word (Q66724909))

The first and foremost reason is that most Javanese dictionaries (monolingual, bilingual jv-id, jv-en, jv-nl) separate Javanese lexemes into mainly these 5 registers and link to their counterparts seamlessly. Secondly, the currently available property (synonym (P5973)) doesn't fit our need for specific linking from one lexeme to another. Besides, synonymy in Javanese is called dasanama (lit. ten names), which is distinct from register (Jv: unggah-ungguh). In the future, I believe these 5 new properties would make it much easier to "transform" words, phrases, and sentences from one register to another (e.g. via WikiFunctions or other tools).

I've added two new examples to the form above:

mountain: gunung/ꦒꦸꦤꦸꦁ (L680638) (ngoko), redi/rêdi/ꦉꦢꦶ (L45622) (krama)
- L680638-S1, instead of having property "synonym: L45622-S1", should instead have property "krama variations: L45622-S1"
- Likewise L45622-S1, instead of having property "synonym: L680638-S1", should instead have property "ngoko variations: L680638-S1"
- both lexemes could have the following synonyms: ancala, indra, endra, ardi, ardya, arga, asalingga, awukir, aldaka, hyang parwata, imandri, himawan, himawat, nala, cala, dri, tambana, wanawasa, wukir, wukira, parsa of parswa, parasu, parswa = paraswa, praswa, parwaka, par(of pwar)wata, prawata, parja, pradesa, pra(of prê)bata, par(of pêr)bata, par(of pêr)bwata, par(of pêr)byata, padaka, jambangan, mahahimawan, mahendra, mèru, malaya, gana, gunungan, giri, gori, girindra, girinata, gorata, giriwara, gêgêr, basulingga, byata, ngasrama. These all mean "mountain" in Javanese
head: endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183) (ngoko), sirah/ꦱꦶꦫꦃ (L999025) (krama), mastaka/ꦩꦱ꧀ꦠꦏ (L413863) (krama inggil)
- endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S1) and S3 (ngoko), should have "krama variations: L999025-S1" only, while
- endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2) (ngoko), should have "krama variations: L999025-S1", and "krama inggil variations: L413863-S1", while
- sirah/ꦱꦶꦫꦃ (L999025-S1) (krama), should have "ngoko variations: L413183-S1, S2, S3", and "krama inggil variations: L413863-S1", and
- sirah/ꦱꦶꦫꦃ (L999025-S2) (ngoko and krama), has no other variations
- mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1) (krama inggil) should have "ngoko variations: L413183-S2", and "krama variations: L999025-S1"
- mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S2) and S3 (ngoko and krama), has no other variations
- all three lexemes could have the following synonyms: utamăngga, hulu, cêngêl, rajawèni, katumangga, katumăngga, kapala, kumba, têndhas, swa, sidhira, pasuhunan, murda, mukyana. All of them mean "head"
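
For readers who want to check the "head" example mechanically, here is a minimal Python sketch. The sense IDs and links are exactly those listed above; the property names are the proposed labels (no property IDs exist yet), and the helper `missing_inverses` is a hypothetical name introduced for this sketch.

```python
# Proposed property labels from this proposal; no P-numbers are assumed.
NGOKO = "ngoko variations"
KRAMA = "krama variations"
KRAMA_INGGIL = "krama inggil variations"

# Sense-level statements as described in the "head" example above.
STATEMENTS = {
    "L413183-S1": {KRAMA: ["L999025-S1"]},
    "L413183-S2": {KRAMA: ["L999025-S1"], KRAMA_INGGIL: ["L413863-S1"]},
    "L413183-S3": {KRAMA: ["L999025-S1"]},
    "L999025-S1": {
        NGOKO: ["L413183-S1", "L413183-S2", "L413183-S3"],
        KRAMA_INGGIL: ["L413863-S1"],
    },
    "L999025-S2": {},  # no other variations
    "L413863-S1": {NGOKO: ["L413183-S2"], KRAMA: ["L999025-S1"]},
    "L413863-S2": {},  # no other variations
    "L413863-S3": {},  # no other variations
}


def missing_inverses(statements):
    """Return (source, target) links whose target does not link back to source."""
    missing = []
    for source, props in statements.items():
        for targets in props.values():
            for target in targets:
                back = statements.get(target, {})
                if not any(source in values for values in back.values()):
                    missing.append((source, target))
    return missing


print(missing_inverses(STATEMENTS))  # an empty list means every link is reciprocated
```

A check like this could be used to spot one-sided links after data entry, assuming the properties are meant to be used in both directions as in the example.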

Discussion

Support Thersetya2021 (talk) 14:39, 13 August 2024 (UTC)
Support Empat Tilda (talk) 01:20, 14 August 2024 (UTC)
Support Alfiyah Rizzy Afdiquni (talk) 04:41, 14 August 2024 (UTC)
Comment What's wrong with using something like language style (P6191) or variety of lexeme, form or sense (P7481) for this purpose? (Korean suffixes currently mark the register in which they are used with the former of these properties.) Mahir256 (talk) 16:53, 14 August 2024 (UTC)
I don't think you get what I mean, so I am going to give another example later. Meanwhile could you give the link for said Korean suffixes, and preferably lexemes? Bennylin (talk) 12:03, 18 August 2024 (UTC)
@Bennylin: There are a number of registers used in Korean, such as hasoseo-che (Q115744995), hapsyo-che (Q115744896), haeyo-che (Q115744904), and hae-che (Q115744915), where each is named for the verb meaning 'to do' in that language with the appropriate suffix used for indicative sentences in that language. The interrogative suffixes 나이까 (L749506), ᆸ니까 (L749614), and ᆯ까 (L1346003), to give examples of specific lexemes, have the same meaning(s) but differ only in the register used. More generally, though, it is not clear from this proposal why register differences between vocabulary items (especially register differences within a single language) should be treated differently from other stylistic differences between words in other languages with the same meaning (and indeed, the property 'language style', usable with a lot of language styles broadly construed, has at least five aliases containing the word 'register') when an application (such as Ninai/Udiron and its deployment as Elemwala) can filter for senses in a language with particular language styles without requiring specialized links for them. Mahir256 (talk) 21:48, 20 August 2024 (UTC)
Please give a simple query for each of these questions:
What is the krama (Q12492493) for endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2)?

What is the krama inggil word (Q16893583) for sirah/ꦱꦶꦫꦃ (L999025-S1)?

What are the ngoko (Q12500634) and krama (Q12492493) for mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1)?

Bennylin (talk) 10:35, 22 August 2024 (UTC)
@Bennylin: Here are queries for endhas in the krama register, sirah in the krama inggil register, and mastaka belonging to both the ngoko and krama registers. Mahir256 (talk) 15:24, 22 August 2024 (UTC)

They're incorrect

The krama variant for endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2) is only one: sirah/ꦱꦶꦫꦃ (L999025-S1). The rest of them, while they have the krama register, are not the krama _for_ endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2).
The ngoko variant for mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1) is only one: endhas/êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀ (L413183-S2). The rest of them, while they have the ngoko register, are not the ngoko _for_ mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1).

So, you see, many synonyms of endhas/sirah/mastaka (head) have the register ngoko, krama, or both, but none of them is paired as _the_ register variant within the triplet endhas/sirah/mastaka. Therefore we need dedicated properties to store these values. Most relations are one-to-one, while a few are one-to-two or two-to-one, but never one-to-many. Bennylin (talk) 11:04, 23 August 2024 (UTC)
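
The difference between the two approaches discussed in this thread can be sketched in Python as follows. This is an illustration under stated assumptions: the three L-IDs come from the discussion, while the `X1-S1` and `X2-S1` entries are placeholders standing in for other "head" synonyms, and their register tags are invented for the sketch. The function names are hypothetical.

```python
# Senses tagged with a register, in the manner of language style (P6191).
# "X1-S1" and "X2-S1" are placeholders (assumed data), not real sense IDs.
SENSES = {
    "L413183-S2": {"gloss": "head", "styles": {"ngoko"}},
    "L999025-S1": {"gloss": "head", "styles": {"krama"}},
    "L413863-S1": {"gloss": "head", "styles": {"krama inggil"}},
    "X1-S1": {"gloss": "head", "styles": {"krama"}},
    "X2-S1": {"gloss": "head", "styles": {"ngoko", "krama"}},
}

# A dedicated link, as in the proposed "krama variations" property.
KRAMA_LINK = {"L413183-S2": ["L999025-S1"]}


def by_style(sense_id, register):
    """Every other sense with the same gloss that is tagged with `register`."""
    gloss = SENSES[sense_id]["gloss"]
    return sorted(
        other
        for other, data in SENSES.items()
        if other != sense_id and data["gloss"] == gloss and register in data["styles"]
    )


def by_link(sense_id):
    """Only the senses explicitly paired with `sense_id`."""
    return KRAMA_LINK.get(sense_id, [])


print(by_style("L413183-S2", "krama"))  # also returns the placeholder synonyms
print(by_link("L413183-S2"))            # returns only the paired sense
```

Filtering by style answers "which words for 'head' are krama?", whereas the explicit link answers "which word is the krama counterpart of this one?", which is the distinction drawn in the comment above.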

@Mahir256, would you like to give your opinion? Regards, ZI Jony ^(Talk) 18:31, 16 September 2024 (UTC)

Wikibase form

Wikibase sense

Other

has kanji reading

Under discussion

Description	phonetic reading or pronunciation of the kanji
Data type	String
Domain	instances of sinogram (Q17300291)
Example 1	四 (Q3594955)→よん Qualifiers subject lexeme (P6254)→四/よん (L625228) sinogram reading pattern (P5244)→kun'yomi (Q1147749)
Example 2	四 (Q3594955)→シ Qualifiers subject lexeme (P6254)→四/し (L641752) sinogram reading pattern (P5244)→on'yomi (Q718498)
Example 3	海 (Q3594998)→うみ Qualifiers subject lexeme (P6254)→海/うみ (L5120) sinogram reading pattern (P5244)→kun'yomi (Q1147749)
Example 4	海 (Q3594998)→カイ Qualifiers subject lexeme (P6254)→no value sinogram reading pattern (P5244)→on'yomi (Q718498)
See also	sinogram reading pattern (P5244)
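
As a rough sketch of how the four examples above fit together, the statements can be modelled in Python as records with two qualifiers. The class and field names are assumptions made for this illustration; the item, lexeme, and reading values are copied from the examples, and `None` stands for the "no value" qualifier in Example 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KanjiReading:
    """One 'has kanji reading' statement (field names are hypothetical)."""
    kanji_item: str                 # the sinogram item
    reading: str                    # the string value of the statement
    pattern: str                    # qualifier: sinogram reading pattern (P5244)
    subject_lexeme: Optional[str]   # qualifier: subject lexeme (P6254), None = no value


READINGS = [
    KanjiReading("Q3594955", "よん", "Q1147749", "L625228"),  # 四, kun'yomi
    KanjiReading("Q3594955", "シ", "Q718498", "L641752"),     # 四, on'yomi
    KanjiReading("Q3594998", "うみ", "Q1147749", "L5120"),    # 海, kun'yomi
    KanjiReading("Q3594998", "カイ", "Q718498", None),        # 海, on'yomi, no lexeme
]


def readings_for(readings, kanji_item, pattern):
    """Readings of one kanji that follow a given reading pattern."""
    return [r.reading for r in readings
            if r.kanji_item == kanji_item and r.pattern == pattern]


def readings_without_lexeme(readings):
    """Readings that exist only on the sinogram item, with no lexeme behind them."""
    return [r.reading for r in readings if r.subject_lexeme is None]


print(readings_for(READINGS, "Q3594955", "Q718498"))
print(readings_without_lexeme(READINGS))
```

The second helper picks out the case the proposal is mainly about: readings such as カイ that have no lexeme of their own.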

Motivation

In Japanese, Chinese characters can be read with different vocalisations. With lexemes we currently only cover those readings that make up actual words. See the examples 四/よん (L625228) and 四/し (L641752), where forms that use the kanji have a sinogram reading pattern (P5244) statement.

Sometimes, however, readings don't make up real words but are merely affixes that can be used in compounds. We currently clutter these readings under a lexeme that happens to have the same kanji representation. But those lexemes usually have a different etymology and external IDs that don't apply to the reading. These readings also sometimes don't share the same senses.

I want to split all these lexemes so that every lexeme represents only a single reading. Those readings that do not constitute words would be deleted in the process, but I'd like to preserve them, and I think the sinogram entity is the right place for that. –Shisma (talk) 10:00, 27 August 2024 (UTC)

I'm merely interested in Japanese but am not a speaker of it. If I said something horribly wrong here, please correct me. –Shisma (talk) 10:18, 27 August 2024 (UTC)

Should we transliterate on'yomi readings to katakana? – Shisma (talk) 11:45, 27 August 2024 (UTC)

Indeed, in kanji dictionaries published in Japan, on'yomi (Q718498) readings are usually written in katakana (Q82946). --Okkn (talk) 01:37, 28 August 2024 (UTC)

Updated. –Shisma (talk) 14:14, 28 August 2024 (UTC)
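
For anyone converting existing hiragana on'yomi values to katakana, a minimal Python sketch follows. It relies on the fact that in Unicode the hiragana letters U+3041 to U+3096 sit exactly 0x60 below their katakana counterparts; the function name is an assumption of this sketch, and characters outside that range are left unchanged.

```python
def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana code points (U+3041..U+3096) by 0x60 into the katakana block."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


print(hiragana_to_katakana("し"))    # the on'yomi of 四 from Example 2
print(hiragana_to_katakana("かい"))  # the on'yomi of 海 from Example 4
```

Kun'yomi values would be left in hiragana, so a conversion like this would only be applied to statements qualified with on'yomi (Q718498).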

Discussion

@Duesentrieb, Afaz, Was a bee, Deryck Chan, NMaia, Okkn: pinging everybody involved with the proposal of sinogram reading pattern (P5244) –Shisma (talk) 10:16, 27 August 2024 (UTC)

Notified participants of WikiProject Japan

Support Some users and I had previously tried to do something similar using name in kana (P1814) (Query), but I think this proposed property is much better. Since the proposed property is limited at this time to use for Japanese kanji, I am only concerned about the confusion that might arise from a generic name "has reading". --Okkn (talk) 01:28, 28 August 2024 (UTC)
I assumed it could be used for other languages that use sinograms, like say Korean and Vietnamese (?). I just didn't mention it because I know nothing about it. I agree that the name is too vague. Let's update it to sinogram has reading? – Shisma (talk) 07:22, 28 August 2024 (UTC)
- I think it would make sense to limit it to Japanese for now, until there's been some discussion about whether/how other languages should use this. We already have Vietnamese reading (P5625) for Vietnamese and Hangul pronunciation (P5537) for Korean. For Chinese, we don't have the language code cmn for Mandarin yet and we would need to decide whether we should be using properties like Hanyu Pinyin transliteration (P1721) and Jyutping transliteration (P9311) instead. - Nikki (talk) 14:09, 3 September 2024 (UTC)
  I wasn't aware of these. I suggest the label should then be changed to Japanese reading and the field can be of the string type rather than multilingual – Shisma (talk) 14:34, 3 September 2024 (UTC)
  The sinogram used in Japan is called kanji (kanji (Q82772)) in the Japanese language. How about the label "has kanji reading"? --Okkn (talk) 14:44, 3 September 2024 (UTC)
  
  Incidentally, sinogram reading pattern (P5244) was also initially intended to apply only to Japanese kanji, so the original label was "reading pattern of kanji". https://www.wikidata.org/w/index.php?title=Property:P5244&oldid=690306551 --Okkn (talk) 14:58, 3 September 2024 (UTC)
  I'm also fine with reading pattern of kanji 😅 – Shisma (talk) 15:30, 3 September 2024 (UTC)
I've added a link to this proposal on Wikidata talk:WikiProject CJKV character since this seems relevant to that wikiproject too.
I don't think we should use subject lexeme (P6254) as a qualifier. We already have Han character in this lexeme (P5425) which links in the other direction (which is used on compounds too, but it is easy to determine whether a lemma only contains one character) and we try to avoid modelling things in ways that require linking in both directions, because it creates redundant data that's difficult to maintain.
It would make sense to allow it as a qualifier of Han character in this lexeme (P5425) on lexemes too, to replace transliteration or transcription (P2440) (e.g. on 姉妹/しまい (L406337)).
- Nikki (talk) 14:09, 3 September 2024 (UTC)
Since one sinogram item can have multiple "has_reading" property values, I wonder whether it would be difficult to identify the right one from the opposite direction unless the lexeme corresponding to the value is explicitly indicated in some way. Also, the information in sinogram reading pattern (P5244) as a qualifier is redundant with the information on the corresponding lexeme, but if the qualifier is not used, Wikidata cannot hold this information unless the lexeme exists (not all sinogram readings are worthy of a lexeme), so the method proposed by Shisma seems to be better after all. --Okkn (talk) 14:38, 3 September 2024 (UTC)

Also, there are cases where the same word can be written with different kanji (like 綺麗/きれい/キレイ (L1234276)): it is not a 1:1 relationship. The subject lexeme (P6254) qualifier only makes sense if the reading by itself is a lexeme. – Shisma (talk) 15:20, 3 September 2024 (UTC)

I updated the type and description in accordance with this discussion. –Shisma (talk) 09:17, 11 September 2024 (UTC)
@Nikki and @Okkn, would you like to give your opinions? Regards, ZI Jony ^(Talk) 18:55, 16 September 2024 (UTC)
I agree with the proposal as is. --Okkn (talk) 00:17, 17 September 2024 (UTC)

Wikidata:Property proposal/Lexemes
