Wikidata:Property proposal/character in this lexeme

character in this lexeme

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: character(s) this lexeme consists of
Represents: cuneiform sign (Q23017336)
Data type: Item
Domain: lexeme, form
Example 1: ga/𒂷 (L726974) → 𒂷 (Q87555355)
Example 2: dumu/𒌉 (L643788) → 𒌉 (Q87556519)
Example 3: dingir/𒀭 (L724542) → 𒀭 (Q87555087)
Planned use: Linking lexemes to character representations
See also: Han character in this lexeme (P5425), which links Han Chinese characters in Japanese and Chinese lexemes to Unicode; for the previous discussion of the property "cuneiform character in this lexeme", see https://www.wikidata.org/wiki/Wikidata:Property_proposal/Cuneiform_character_in_this_lexeme

Motivation

Wikidata currently lacks a property for linking lexeme representations to the items (QIDs) for the characters of a given script. The examples above link cuneiform lexemes to the character items that represent their Unicode code points, but the property could be used to link any lexeme to the relevant characters of the script it is written in. The property Han character in this lexeme (P5425) already allows Han Chinese characters in Chinese, Japanese, and Vietnamese lexemes to be linked to their respective items in Wikidata. This proposal would either generalize Han character in this lexeme (P5425) into "character in this lexeme" or make the new property a superproperty of Han character in this lexeme (P5425).

See also the discussion at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Cuneiform_character_in_this_lexeme, which led to this property proposal being created instead.

  Support seems fine to me, either as a superproperty or replacement. ArthurPSmith (talk) 20:06, 4 May 2023 (UTC)
  Support this is an excellent idea, since the more general use of 'character' will allow a number of languages with the same issue to progress in morpho-graphemic annotation, including Sumerian, Akkadian, Hittite, Hurrian, Ugaritic, Elamite, and Old Persian, to name just a few. Admndrsn (talk) 9:11, 8 May 2023 (EST)
  Comment Would this also mean that it could be used to say rød (L2310) character in this lexeme Ø (Q28827) and D (Q9884)? Finn Årup Nielsen (fnielsen) (talk) 18:45, 8 May 2023 (UTC)
Yes, you could also use it for the Latin alphabet, including in the example you proposed, although I see people using this more for languages like Chinese or those written in cuneiform, where the individual characters often express their own meaning.
But why not? You could query all lexemes with Ø (Q28827) if that is interesting to do. Situxx (talk) 13:35, 9 May 2023 (UTC)
You can already do that: see this query for example. - Nikki (talk) 20:03, 24 June 2023 (UTC)
@Fnielsen: would you like to give your opinion? Regards, ZI Jony (Talk) 06:14, 24 January 2024 (UTC)
  Support will help and advance digital cuneiform studies Enki75 (talk) 11 May 2023
  Strong oppose Having a generic property like this is a really bad idea. Linking characters in lexemes to the corresponding items can easily be done automatically, so there should be a really good reason to add links manually instead. For Han character in this lexeme (P5425), that is because items for Han characters have useful lexicographical data on them which would otherwise end up duplicated as lexemes. My opposition to the previous proposal was because items for Cuneiform characters do not have useful lexicographical data on them, and that is still the case, looking at the items in the examples.
If this is added, people will surely start mass-adding it for every lexeme eventually. We are already having problems with the query service because of the amount of data, and adding millions more statements linking every character in a lexeme would only cause more problems for us. - Nikki (talk) 20:03, 24 June 2023 (UTC)
Here's a simple script I just made to list the characters in a lexeme automatically: User:Nikki/LexemeLinkCharacters.js. - Nikki (talk) 21:27, 24 June 2023 (UTC)
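The kind of automatic listing mentioned above can be illustrated in a few lines. This is not the linked script (which is JavaScript and is not reproduced here), only a minimal Python sketch under stated assumptions: the function name `characters_in` is hypothetical, and the sketch reports Unicode code points only, without looking up the matching Wikidata items.

```python
def characters_in(text: str) -> list[tuple[str, str]]:
    """Return each distinct non-whitespace character of `text`, in order of
    first appearance, paired with its Unicode code point in U+XXXX notation.

    Hypothetical helper for illustration; mapping a code point to a
    Wikidata item would need a separate lookup that is not shown here.
    """
    seen: set[str] = set()
    result: list[tuple[str, str]] = []
    for ch in text:
        # Skip whitespace and characters that were already reported.
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        result.append((ch, f"U+{ord(ch):04X}"))
    return result
```

Calling `characters_in` on a lemma yields one pair per distinct character, so a multi-sign cuneiform lemma produces one entry per sign.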
Hi!
Thanks for your comment.
In the previous proposal, you named the following properties as examples of the lexicographical data you find missing from the cuneiform examples:
I quote: "(e.g. stroke count (P5205), grade of kanji (P5277), radical (P5280), ideographic description sequences (P5753))"
All of this information can be added; we are just lacking the properties for it as well. Hence you only see the information that can be added right now with the properties we have. The current state is as follows:
- stroke count: currently proposed here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gottstein_code
- radicals: currently represented using "has parts" relations, at least for Unicode signs that allow this (for an example, see https://www.wikidata.org/wiki/Q87555001)
- depicts relations describe what the character depicts (which is often different from the sense of the Lexemes using the character)
- dictionary references to signlists
You can find an example in this web application, which also illustrates the main use case I have in mind: https://situx.github.io/paleordia/c/?q=Q87554995&qLabel=%F0%92%80%80
The script you posted is certainly useful for linking from a lexeme to its characters, but my use case is actually the opposite.
I would, for example, like to know which lexemes contain a given cuneiform sign. Unless I have missed a better solution, the SPARQL query to achieve this would need to take a set of languages written in the cuneiform script and check the lemmas (and maybe also the forms) of all of these languages with regex matching.
That query is also on the homepage; it runs only over Sumerian but is already quite slow (https://situx.github.io/paleordia/c/?q=Q87554995&qLabel=%F0%92%80%80).
If we had a property like the one proposed here, I think it would be easier to query for the lexemes that contain a given cuneiform sign, or any other sign in other languages for that matter.
Finally, there is the issue of paleographic sign variants:
There might be certain lexemes which are only written with sign variants of a specific shape.
You can look at the sign AN here: https://situx.github.io/paleordia/c/?q=Q87555087&qLabel=%F0%92%80%AD
It looks different depending on the time period.
We currently cannot express that either, but we have the information and will gradually add it to Wikidata as a prototype of a digital paleography. Situxx (talk) 11:38, 26 June 2023 (UTC)
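The lookup described in the comment above (finding the lexemes whose lemma contains a given sign by scanning the lemmas of selected languages) could be sketched roughly as follows. This is an illustration, not the query used by the linked web application: the function name `build_sign_query` is hypothetical, the language QIDs must be supplied by the caller, and a `CONTAINS` substring test stands in for regex matching. It assumes the `dct:language` and `wikibase:lemma` predicates used for lexemes in the Wikidata Query Service, and it checks lemmas only, not forms.

```python
import re

# A QID is "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_sign_query(sign: str, language_qids: list[str]) -> str:
    """Build a SPARQL query string (hypothetical helper) that selects the
    lexemes of the given languages whose lemma contains `sign`."""
    if not sign:
        raise ValueError("sign must not be empty")
    if not language_qids:
        raise ValueError("at least one language QID is required")
    for qid in language_qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a valid QID: {qid!r}")

    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = sign.replace("\\", "\\\\").replace('"', '\\"')
    values = " ".join(f"wd:{qid}" for qid in language_qids)
    return (
        "SELECT ?lexeme ?lemma WHERE {\n"
        f"  VALUES ?language {{ {values} }}\n"
        "  ?lexeme dct:language ?language ;\n"
        "          wikibase:lemma ?lemma .\n"
        f'  FILTER(CONTAINS(STR(?lemma), "{escaped}"))\n'
        "}"
    )
```

Because the filter has to inspect every lemma of every listed language, the cost grows with the number of lexemes, which is consistent with the slowness reported above. A dedicated property would replace the string scan with a direct statement match.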
@Nikki: has your opinion changed based on the response? Regards, ZI Jony (Talk) 06:14, 24 January 2024 (UTC)
  Oppose the current (far too general) property name, but would be Neutral, now that phonetic value (P12436) is being used, if it were restricted to cuneiform symbols. Mahir256 (talk) 14:15, 6 March 2024 (UTC)