Wikidata:Property proposal/GlyphWiki ID

GlyphWiki ID

Originally proposed at Wikidata:Property proposal/Generic

   Done: GlyphWiki ID (P5467) (Talk and documentation)
Description: identifier of GlyphWiki, in which glyphs of Han characters are managed
Represents: GlyphWiki (Q28684738)
Data type: External identifier
Domain: instance of sinogram (Q53764738)
Example 1: (Q4025820) → u4e00-j
Example 2: (Q54872914) → ufa47 (writing system (P282): traditional Chinese characters (Q178528), kyūjitai (Q1147857))
Example 3: (Q54872914) → u6f22-j (writing system (P282): shinjitai (Q1055887))
Example 4: (Q54879255) → ufa38 (writing system (P282): traditional Chinese characters (Q178528), kyūjitai (Q1147857))
Example 5: (Q54879255) → u5668 (writing system (P282): shinjitai (Q1055887))
Source: http://en.glyphwiki.org/wiki/GlyphWiki:MainPage
Number of IDs in source: more than 310,000
Expected completeness: always incomplete (Q21873886)
Formatter URL: http://en.glyphwiki.org/wiki/$1
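
As a rough illustration of how the formatter URL and the example IDs fit together, the sketch below substitutes an ID for `$1` and decodes IDs of the form `u<hex code point>[-<suffix>]`. The function names and the regular expression are hypothetical and are inferred only from the examples above, not from GlyphWiki's own naming rules; IDs in other forms are simply reported as unparsed.

```python
import re

# Formatter URL taken from the proposal; "$1" is replaced by the ID.
FORMATTER_URL = "http://en.glyphwiki.org/wiki/$1"

# Assumption: this pattern covers only IDs shaped like the examples
# ("u4e00-j", "ufa47"). It is not GlyphWiki's official grammar.
_UCS_ID = re.compile(r"^u([0-9a-f]{4,6})(?:-([a-z]+))?$")


def glyphwiki_url(glyph_id: str) -> str:
    """Build the page URL for a GlyphWiki ID via the formatter URL."""
    return FORMATTER_URL.replace("$1", glyph_id)


def parse_ucs_id(glyph_id: str):
    """Return (character, suffix) for a code-point-based ID, else None.

    The suffix is None when the ID has no regional part.
    """
    match = _UCS_ID.match(glyph_id)
    if match is None:
        return None
    code_point = int(match.group(1), 16)
    return chr(code_point), match.group(2)


if __name__ == "__main__":
    for example in ["u4e00-j", "ufa47", "u6f22-j", "ufa38", "u5668"]:
        print(example, glyphwiki_url(example), parse_ucs_id(example))
```

Decoding the ID this way only recovers the Unicode code point and the suffix; which glyph shape the ID actually denotes still has to be checked on the linked GlyphWiki page.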

Motivation

We recently created stroke count (P5205) and radical (P5280), and we are now trying to represent Han characters as items. However, the glyph of a Han character sometimes differs by writing system, country, or font. GlyphWiki is a glyph (Q36975) wiki that everyone can use freely (although no public domain license is explicitly stated). By linking to this site, other people will be able to check which glyph a stroke count (P5205) or radical (P5280) claim describes. --Okkn (talk) 07:18, 6 July 2018 (UTC)

Discussion

@KevinUp: Do you have any thoughts? --Okkn (talk) 16:38, 6 July 2018 (UTC)

  Support At the moment we are unable to represent the glyph shape of Han characters in data form, hence a property is needed for it. Although alternatives such as a system using Ideographic Description Characters exist, where Han characters can be broken down and represented by their constituent characters, this system is prone to errors and not all characters can be represented by it. By linking to GlyphWiki, users can check the composition of each character and visually identify the difference between glyphs used in different regions. KevinUp (talk) 18:03, 6 July 2018 (UTC)

@KevinUp, ديفيد عادل وهبة خليل 2, Okkn:   Done: GlyphWiki ID (P5467). − Pintoch (talk) 07:48, 17 July 2018 (UTC)

@Pintoch: Thanks for creating this new property. Now we can start using it. KevinUp (talk)
@Okkn: It would be good if we could mark identical glyphs, e.g. for (Q3595028): Group 1: "u96e8-j", "u96e8-k", "u96e8-g" and Group 2: "u96e8-t", "u96e8-h", "u96e8-v". But I'm not sure whether that is possible. KevinUp (talk) 11:21, 24 July 2018 (UTC)
@KevinUp: u96e8-k and u96e8-g are aliases of "u96e8-j". Likewise, u96e8-t, u96e8-h and u96e8-v are aliases of "koseki-478690". I think we should store only "u96e8-j" and "koseki-478690". That's what I meant in the project: https://www.wikidata.org/w/index.php?title=Wikidata:WikiProject_CJKV_character&diff=713401420&oldid=713399030 . Both "u96e8-j" and "koseki-478690" can each have three applies to jurisdiction (P1001) qualifiers to specify where those glyphs are used. This solution may achieve what you want to do. --Okkn (talk) 19:19, 24 July 2018 (UTC)
@Okkn: I think it would be better to use regional codes for the property value (even if the glyphs are the same), as GlyphWiki is still a work in progress. The regional suffixes are much more stable than "koseki-478690", which might change if GlyphWiki users find a typing mistake in the registration number. On the other hand, I think using applies to jurisdiction (P1001) to specify multiple regions for the same glyph is slightly risky, as people might assume that the glyphs are the same without referring to the actual glyph form in the Unicode charts. For example, "u64e7-t" (used in Taiwan (Q865)) is slightly different from "u64e7-h" (not yet available), used in Hong Kong (Q8646). KevinUp (talk) 10:23, 26 July 2018 (UTC)
@KevinUp: The purpose of this property is to specify glyph forms. IDs with regional suffixes are not stable because, just as you say, they can change completely when the alias mapping is replaced. Also, storing IDs with regional codes instead of the actual glyph ID is no more useful than storing simple IDs (without regional codes, like "u96e8"), because each suffix obviously corresponds to a region. We are not a database for recording which regional IDs GlyphWiki has for each Unicode code point. If the glyph in Taiwan (Q865) and the one in Hong Kong (Q8646) are different, they should be linked to different GlyphWiki IDs, I think. --Okkn (talk) 11:00, 26 July 2018 (UTC)
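
The alias-based approach discussed in this thread could be sketched roughly as follows. The alias table is only a snapshot of the u96e8 relationships quoted above and, as noted in the discussion, may change on GlyphWiki; the names `ALIASES` and `group_by_glyph` are hypothetical. The sketch resolves each regional ID to the glyph it points at and collects the regional suffixes that share that glyph, which is the information the applies to jurisdiction (P1001) qualifiers would carry.

```python
# Assumption: alias relationships as quoted in the discussion for u96e8.
# This is a hand-copied snapshot, not data fetched from GlyphWiki.
ALIASES = {
    "u96e8-k": "u96e8-j",
    "u96e8-g": "u96e8-j",
    "u96e8-t": "koseki-478690",
    "u96e8-h": "koseki-478690",
    "u96e8-v": "koseki-478690",
}


def group_by_glyph(regional_ids):
    """Map each actual glyph ID to the regional suffixes that resolve to it."""
    groups = {}
    for glyph_id in regional_ids:
        # An ID that is not an alias is treated as its own glyph.
        canonical = ALIASES.get(glyph_id, glyph_id)
        # The suffix is the part after the last hyphen, e.g. "j".
        suffix = glyph_id.rpartition("-")[2]
        groups.setdefault(canonical, []).append(suffix)
    return groups


if __name__ == "__main__":
    ids = ["u96e8-j", "u96e8-k", "u96e8-g", "u96e8-t", "u96e8-h", "u96e8-v"]
    for glyph, suffixes in group_by_glyph(ids).items():
        print(glyph, suffixes)
```

With this input the six regional IDs collapse into two stored values, each carrying three regions. Storing the regional IDs directly instead would skip the alias lookup but leave the grouping implicit.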