Property talk:P487


Documentation

Unicode character
Unicode character representing the item
Description: Unicode character of the item
Represents: Unicode character (Q29654788)
Associated item: Unicode Consortium (Q1572774)
Data type: String
Template parameter: tbd
Domain: All items that a Unicode character represents (note: this should be moved to the property statements)
Allowed values:
  According to this template: Unicode symbol
  According to statements in the property: .{1,2}.{0,6}
  When possible, data should only be stored as statements
Example:
  ſ (Q484140) → ſ
  da capo (Q1138573) → 𝄊
  Latin cross (Q200674)
  Vesta (Q178710)
  basketball (Q5372) → 🏀
  flag of Friesland (Q1004161) → 🏴󠁮󠁬󠁦󠁲󠁿
Source: Unicode specification (note: this information should be moved to a property statement; use property source website for the property (P1896))
Formatter URL: https://util.unicode.org/UnicodeJsps/character.jsp?a=$1
Tracking: usage: Category:Pages using Wikidata property P487 (Q26250051)
See also: Unicode code point (P4213), Unicode range (P5949), Unicode block (P5522)
Lists
Proposal discussion
Current uses:
  Total: 164,090
  Main statement: 164,000 (>99.9% of uses)
  Qualifier: 90 (<0.1% of uses)
Format “.{0,6}: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P487#Format, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P487#Entity types
Scope is as main value (Q54828448): the property must be used in the specified way only (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P487#Scope, SPARQL
 
This property is being used by:

Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)

  Pattern ^([Uu]\+)?([A-F0-9]{4,5})$ will be automatically replaced with \2 and moved to the Unicode code point (P4213) property.
Testing: TODO list
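
For anyone wondering which values the replacement rule above would catch, here is a rough Python sketch. It only illustrates what the quoted pattern accepts; the function name `autofix_value` and the returned property labels are made up for the example and say nothing about how the bot is actually implemented.

```python
import re

# Pattern quoted from the replacement rule above.
AUTOFIX = re.compile(r"^([Uu]\+)?([A-F0-9]{4,5})$")


def autofix_value(value: str) -> tuple[str, str]:
    """Return (property, value) after applying the rule to one P487 value."""
    match = AUTOFIX.match(value)
    if match is None:
        # Not written as a code point: the value stays on P487 unchanged.
        return ("P487", value)
    # \2 keeps only the hex digits, dropping an optional "U+" prefix.
    return ("P4213", match.group(2))


print(autofix_value("U+1F3C0"))     # code-point notation: moved
print(autofix_value("\U0001F3C0"))  # the character itself: kept
print(autofix_value("u+017f"))      # lowercase hex digits: not matched
```

Note that the pattern only accepts uppercase hex digits, and only four or five of them, so values such as `u+017f` or the six-digit `10FFFF` would be left alone.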

Constraints

As Wikidata:Database reports/Constraint violations/P487 shows, both constraints do not make sense. Infovarius (talk) 08:45, 15 June 2013 (UTC)

Format violations

I don't see how all these items violate the format "." - could it be that the regex is interpreted incorrectly and the bot doesn't allow arbitrary Unicode characters for '.'? --DSGalaktos (talk) 18:50, 25 June 2013 (UTC)

Apparently, this got fixed. --DSGalaktos (talk) 21:34, 5 October 2013 (UTC)
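
One way a checker could misread "." is by matching against encoded bytes instead of characters. This is purely an assumption, since the thread does not say what the bot actually did. A minimal Python sketch of the difference, with hypothetical helper names:

```python
import re


def matches_as_text(value: str) -> bool:
    """'.' applied to a str matches exactly one code point."""
    return re.fullmatch(r".", value) is not None


def matches_as_bytes(value: str) -> bool:
    """'.' applied to UTF-8 bytes matches exactly one byte."""
    return re.fullmatch(rb".", value.encode("utf-8")) is not None


long_s = "\u017f"  # the long s from the property examples, 2 bytes in UTF-8
print(matches_as_text(long_s), matches_as_bytes(long_s))
print(matches_as_text("A"), matches_as_bytes("A"))
```

Under that assumption, every non-ASCII value would be reported as a violation while plain ASCII characters would pass.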

Dutch translation

The Dutch translation of this property is wrong, both in spelling and in wording. I think it should be 'Unicode-symbool'. Can I change this? Bever (talk) 01:14, 25 July 2013 (UTC)

External identifier

Isn't this an identifier for the item within the Unicode character table? --- Jura 10:46, 14 October 2015 (UTC)

More format violations

Where is the format actually defined? I added some links to Unicode characters which got flagged as format violations. They actually use 8 bytes to encode, which is unusual but valid UTF-8. – Jberkel (talk) 23:01, 15 August 2016 (UTC)
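
For what it's worth, UTF-8 uses at most 4 bytes per code point, so a value that takes 8 bytes is presumably a sequence of two or more code points rather than a single one. A quick way to check, sketched in Python; the helper name is made up, and the regional-indicator pair is just an example, not necessarily one of the flagged values:

```python
def code_points_and_bytes(value: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a value."""
    return (len(value), len(value.encode("utf-8")))


basketball = "\U0001F3C0"                # single code point
indicator_pair = "\U0001F1F3\U0001F1F1"  # two regional indicator symbols

print(code_points_and_bytes(basketball))      # (1, 4)
print(code_points_and_bytes(indicator_pair))  # (2, 8)
```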

Distinct values constraint bug

It seems that something is broken in the distinct values constraint check. For example, the articles for these similar Cyrillic characters (Q5809477, Q4914472, Q5299849, Q6901498) variably tag each other as violating the distinct values constraint, even though they visibly have distinct values. I suspect something is up with Unicode support in the software, but have no idea what would cause this bug. Mathmitch7 (talk) 18:27, 3 August 2018 (UTC)

I would imagine that this is caused by Unicode equivalence, which means that it's this property which needs to be implemented in a different way rather than something inherently wrong with the distinct values constraint. If the values are simply entered as a character, any comparison will find equivalent characters - even if they have different codepoints. This is an inherent aspect of the Wikimedia software, as far as I know, as 99% of the time it's a desirable feature. If you put ꙮ, Ꙫ, Ꙭ or Ꙩ into Ctrl+F, you'll notice that your browser considers them equivalent, too. My inclination is that it would help if we could enter the relevant HTML entity directly into the page source, which here would be &#xa66e;, &#xa66a;, &#xa66c; and &#xa668; (respectively). Even though they'll display as the actual characters, any code accessing the page source to do a comparison should hopefully be comparing the entity strings (and therefore should be perceiving them as different). Not sure how this could be implemented on Wikidata, however. Theknightwho (talk) 22:34, 15 November 2021 (UTC)
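
As a rough way to explore this, the Python sketch below builds the entity strings mentioned above and checks pairs of values for canonical equivalence by comparing their NFC forms. The helper names are hypothetical, and this does not reproduce whatever comparison the constraint checker or the database actually performs (a collation-based comparison could behave differently from normalization); it only prints what the standard `unicodedata` module reports.

```python
import html
import unicodedata

# The four characters named above, written as escapes.
LETTERS = ["\ua66e", "\ua66a", "\ua66c", "\ua668"]


def to_entity(char: str) -> str:
    """Hexadecimal HTML entity for a single character, e.g. '&#xa66e;'."""
    return f"&#x{ord(char):x};"


def canonically_equivalent(a: str, b: str) -> bool:
    """True if both strings have the same NFC normal form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


entities = [to_entity(c) for c in LETTERS]
print(entities)

# Sanity check on a well-known pair: precomposed e-acute vs. e + combining acute.
print(canonically_equivalent("\u00e9", "e\u0301"))  # True

# Report every pair of the four letters without assuming the outcome.
for i, a in enumerate(LETTERS):
    for b in LETTERS[i + 1:]:
        print(to_entity(a), to_entity(b), canonically_equivalent(a, b))
```

The entity strings are distinct by construction, because each one is derived from a different code point, so a comparison on entities would always tell the four values apart.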

Multi-character strings

I am about to remove [1] – a string of six code points. Incnis Mrsi (talk) 07:15, 13 July 2019 (UTC)

[2] wrong. Control characters (those having General_Category = Cx) are counted separately. These are modifiers (General_Category = Mx), which are appended to the preceding character. Incnis Mrsi (talk) 18:25, 14 July 2019 (UTC)
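
To see which code points in a value fall under Cx and which under Mx, the General_Category of each one can be listed with Python's standard `unicodedata` module. A minimal sketch; the helper name is an assumption for the example, and the sample strings are not the values discussed above:

```python
import unicodedata


def general_categories(value: str) -> list[tuple[str, str]]:
    """List (code point, General_Category) for every code point in value."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in value]


# A base letter followed by a combining mark (category Mn, i.e. Mx).
print(general_categories("e\u0301"))  # [('U+0065', 'Ll'), ('U+0301', 'Mn')]

# ZERO WIDTH JOINER is a format character (category Cf, i.e. Cx).
print(general_categories("\u200d"))   # [('U+200D', 'Cf')]
```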

Formatter URL

Broke when the Consortium updated their sites; now checking if there's a replacement link. Arlo Barnes (talk) 03:43, 3 June 2020 (UTC)
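
For testing a candidate link, the formatter URL currently listed in the documentation box can be filled in by hand. The Python sketch below substitutes a value for `$1`; percent-encoding the value is an assumption about how the placeholder gets filled, and the function name is hypothetical.

```python
from urllib.parse import quote

# Formatter URL as listed in the documentation box on this page.
FORMATTER_URL = "https://util.unicode.org/UnicodeJsps/character.jsp?a=$1"


def character_url(value: str) -> str:
    """Substitute a percent-encoded value for the $1 placeholder."""
    return FORMATTER_URL.replace("$1", quote(value, safe=""))


print(character_url("\u017f"))
# https://util.unicode.org/UnicodeJsps/character.jsp?a=%C5%BF
```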

Limit to special Unicode-character items only?

User:Theknightwho has redefined the constraints so that the property is now obviously expected to be used only on specialized items representing Unicode characters (e.g. (Q87526993) or (Q87505717)). While I can see some logic in that, and even agree it might be useful,

  1. It contradicts the original proposal and even the examples shown in the property definition.
  2. There are thousands of other uses of the property currently (e.g. Mars (Q111) Unicode character (P487) ♂︎) which are correct according to the original definition.
  3. We have no good property to replace these uses currently, IIANM. (I.e. being able to state “there is a Unicode character representing this concept” (“Mars can be represented by ♂︎”) is useful. If this usage would be outlawed, how are we supposed to represent that?)

So, I am reverting the newly added restrictions, at least pending further discussion. --Mormegil (talk) 15:30, 24 November 2021 (UTC)

Mormegil So I agree this does need discussion, and I probably did jump the gun. My perspective is:
  1. This property should only be used on Unicode characters.
  2. We should use properties such as depicted by (P1299), notation (P913) or icon (P2910), depending on which is the most applicable, with a link to the relevant item (in this case (Q87526785)). This therefore also covers symbols which aren't yet encoded in the standard.
As an aside, the items for Unicode characters are currently a bit misleading, as they're not supposed to be specific to Unicode. For example, (Q87526785) is not "the male sign as encoded in Unicode". It's "the male sign", and any other information associated with that character should also be added, whether or not it's specific to Unicode (e.g. background info, other encodings etc.). The descriptions seem to be a quirk of the fact that whoever did the batch upload just gave all of them the description "Unicode character". There is a long road ahead of item merges, but that's not a conversation for here. Theknightwho (talk) 15:58, 9 December 2021 (UTC)
OK, the redefinition seems quite logical to me (even though, process-wise… it’s interesting the usage would go so far from the original definition… never mind). icon (P2910) has the datatype “Commons media file”, i.e. it is used for image files, so that is obviously out. depicted by (P1299) is used for creative works depicting the subject, e.g. Daniel (Q171724) depicted by (P1299) Book of Daniel (Q80115) (its values are restricted to work (Q386724), artwork series (Q15709879), performing artist (Q713200), artistic theme (Q1406161), caricature (Q482919)), so I’d say that does not fit very well. On the other hand, notation (P913) seems to be quite apt, even though the current description (“mathematical notation or another symbol”) seems to be a bit mathematically inclined (also, it’s an instance of Wikidata property related to mathematics (Q22988631)); but still, just a small tweak of the description and it would be a good fit, I’d say (probably discuss there first?).
So, the best way forward would be 1. agree on P913 to be used for the current usage and tweak its definition, 2. migrate all current non-canonical usage to P913, 3. change the constraints here?
--Mormegil (talk) 16:38, 13 December 2021 (UTC)
So I mostly agree, but I actually think that depicted by (P1299) does apply in a few instances. For example, the character exclamation mark (Q166764) (an exclamation mark) is depicted by the characters (Q87527388), (Q87527394), ! (Q87544533), (Q87544151) and (Q87544008), as they are all specific types of exclamation mark. This is conceptually different from notation (P913), which I agree needs to be broadened to mean "symbolically represented by" (but that's a really awkward way of putting it). Theknightwho (talk) 17:25, 13 December 2021 (UTC)