Property talk:P5279

Documentation

hyphenation
positions where a word can be hyphenated
Scope is as main value (Q54828448): the property must be used by specified way only (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P5279#Scope, SPARQL
Allowed entity types are Wikibase form (Q54285143): the property may only be used on a certain entity type (Help)
List of violations of this constraint: Database reports/Constraint violations/P5279#Entity types, hourly updated report
Format “[^‧](.*[^‧])?”: value must be formatted using this pattern (PCRE syntax). (Help)
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P5279#Format, SPARQL
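The format constraint above can be tried out locally before a value is saved. This is a minimal sketch, assuming that Python's `re` module is an acceptable stand-in for the PCRE engine (the pattern uses only syntax the two share) and that the pattern must match the whole value; `is_valid_hyphenation` is a hypothetical helper name, not part of any Wikidata tooling.

```python
import re

# Pattern copied from the format constraint; "‧" is U+2027 HYPHENATION POINT.
HYPHENATION_FORMAT = re.compile(r"[^‧](.*[^‧])?")


def is_valid_hyphenation(value: str) -> bool:
    """Return True if the whole value matches the constraint pattern.

    Assumption: the constraint applies to the full string, so
    re.fullmatch is used rather than re.search.
    """
    return HYPHENATION_FORMAT.fullmatch(value) is not None


# A leading or trailing hyphenation point is rejected, as is an empty value.
for candidate in ["shoul‧der", "ask", "‧ask", "ask‧", ""]:
    print(repr(candidate), is_valid_hyphenation(candidate))
```

Note that the pattern only guards the first and last character, so a value with two adjacent hyphenation points in the middle would still pass.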

Hyphenation character

Did we ever reach a consensus regarding which character to use as an indicator for hyphenation before this property was created? — Finn Årup Nielsen (fnielsen) (talk) 16:38, 11 June 2018 (UTC)

If you want to change it later, Template:Autofix should work.
--- Jura 19:08, 12 June 2018 (UTC)
Actually, probably not yet, as I don't think Lexemes are supported.
--- Jura 18:34, 13 June 2018 (UTC)

I have used "‧"; see Lexeme:L2955. It is a pain to type in. The vertical bar, "|", would be easier. That character is used, e.g., in Retskrivningsordbogen (Q3398246), see https://dsn.dk/?retskriv=penge. — Finn Årup Nielsen (fnielsen) (talk) 13:09, 14 June 2018 (UTC)

I have used "•"; see Lexeme:L2977, but I agree the vertical bar would be easier. KaMan (talk) 11:40, 15 June 2018 (UTC)
@Fnielsen, KaMan: IMHO, ‧ U+2027 HYPHENATION POINT is the best Unicode character to use. The problem of data entry could be solved with tooling, I think – perhaps a user script that replaces | with ‧ when typing values for hyphenation (P5279)? --Lucas Werkmeister (talk) 13:18, 19 June 2018 (UTC)
perhaps a user script like this one? :) --Lucas Werkmeister (talk) 13:42, 19 June 2018 (UTC)
@Lucas Werkmeister: generally this script works fine, but sometimes it fails to replace the last character in a sequence. A second try helps. KaMan (talk) 10:01, 21 June 2018 (UTC)
@KaMan: hm, I’m not sure why that would happen, to be honest (and it seems to be working for me)… are there any errors in the browser console? --Lucas Werkmeister (talk) 12:35, 21 June 2018 (UTC)
@Lucas Werkmeister: No errors in the console. You can look at my try at shoulder (L3541). Chrome 67.0.3396.87. KaMan (talk) 12:48, 21 June 2018 (UTC)
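The replacement idea discussed in this thread can also be applied offline, for example when cleaning up values before a batch import. The following is a minimal sketch and not the user script mentioned above; `normalize_hyphenation_marks` is a hypothetical name, and treating "•" as a substitute marker is an assumption based on the bullet character reported earlier in the thread.

```python
HYPHENATION_POINT = "\u2027"  # ‧ HYPHENATION POINT

# Easier-to-type stand-ins mentioned in the thread: vertical bar and bullet.
# Assumption: neither character occurs as a genuine part of the word itself.
_SUBSTITUTES = str.maketrans({"|": HYPHENATION_POINT, "\u2022": HYPHENATION_POINT})


def normalize_hyphenation_marks(typed: str) -> str:
    """Replace every stand-in marker with U+2027 in a single pass."""
    return typed.translate(_SUBSTITUTES)


print(normalize_hyphenation_marks("shoul|der"))
```

Because `str.translate` rewrites the whole string at once, a marker in the last position is handled the same way as any other.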

if a word can't be hyphenated…

…because it has only one syllable, like ask (L53). Should we set hyphenation (P5279) to no value or to "ask" (no hyphenation marks)? --Shisma (talk) 16:39, 15 September 2018 (UTC)

@Shisma: Good question. I am perhaps leaning towards the use of novalue. It might be easier to validate and query. For instance, if you want to know whether the form can be hyphenated, it is a question of whether the triple "?form wdt:P5279 ?hyphenation" is there or not. That seems easier than querying for the hyphenation literal and then examining whether it contains a "‧" character. In terms of validation with ShEx, it seems that we can do it with "a [ wdno:P5279 ] | ps:P5279 /.+‧.+/ ;", as I am currently doing at [1]. — Finn Årup Nielsen (fnielsen) (talk) 14:45, 7 June 2019 (UTC)

@Shisma, @Fnielsen: if there is no P5279, it usually means no one has added it yet. We need to distinguish between that and a non-hyphenatable word. I am leaning towards adding the word as is. Also, I don't think there is that much value in searching for non-hyphenatable words (if anything, people could search for words that have only one syllable). --Yurik (talk) 21:43, 11 August 2019 (UTC)

@Shisma, @Fnielsen, @Yurik: does this mean that it is easier to search for words with only one syllable if we choose Yurik's suggestion (e.g. check the value for the hyphenation point; if there is none, the word has one syllable), whereas if the value is null we don't know? It is less typing work for us to input no value, but from what I understand this cannot be done via QuickStatements (if it ever starts supporting lexemes, it would be nice to be able to add hyphenation through it as well). --So9q (talk) 22:47, 25 November 2019 (UTC)
Any kind of "find me words with just a single syllable" query (or any other specific count) is always going to be a slow linear search, because you have to go through all the matching lexemes and do regex matching on them. If we want to optimize for that use case (which I doubt is very useful or common), we would store the number of syllables as a separate property/facet. As for typing: it should not be the priority when it sacrifices clarity/usability of the data. QuickStatements should be fixed/improved, and I'm sure we can work on it soon enough to get all the needed lexeme support into it. There is some rumored basic support for lexemes already added; hopefully there will be more soon enough. --Yurik (talk) 23:10, 25 November 2019 (UTC)
I like the idea of being able to easily list lexeme forms according to number of syllables. I also doubt its usefulness, but hey, I can't think of all the wonderful ways of using our data, so let's go ahead and add a property for indicating it. Will you do it? We could surely create some kind of game helping people to fill in this data. Kids love to clap words, so we could make a word clap game that enables you to randomize the number of claps (maybe that is more fun/challenging to play than a game that does not know the number of claps needed for the form). We could even make a game that listens to the user's microphone for claps and stores the number as "number of syllables". --So9q (talk) 12:10, 26 November 2019 (UTC)
We could also add a property of how long the word is. And how many vowels it has. And a flag if the word has more than one consonant one after another. But we shouldn't, unless there is a very direct and practical reason to add that data. Adding easily computable values to Wikidata makes it into a giant unmanageable pile of junk rather than a useful human-curated data source. Let's limit it to what is actually non-trivial data, and let code do the rest, but not store it together with something humans will be looking at. --Yurik (talk) 19:53, 11 December 2019 (UTC)
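The "let code do the rest" point can be illustrated with a short sketch of how a data consumer might derive a segment count from a stored value. It assumes the convention of storing an unbreakable word as is, with `None` standing for a form that has no P5279 statement yet; `segment_count` and the sample values are hypothetical and purely illustrative. The count equals a syllable count only where hyphenation points fall on syllable boundaries, which the "hyphenation vs syllables" thread calls into question.

```python
from typing import Optional

HYPHENATION_POINT = "\u2027"  # ‧ HYPHENATION POINT


def segment_count(hyphenation: Optional[str]) -> Optional[int]:
    """Number of segments separated by hyphenation points.

    None means "no statement yet", which is kept distinct from a
    value without any hyphenation point (one segment).
    """
    if hyphenation is None:
        return None
    return hyphenation.count(HYPHENATION_POINT) + 1


# Illustrative sample data, not taken from Wikidata.
forms = {"ask": "ask", "shoulder": "shoul‧der", "not yet entered": None}

unbreakable = [form for form, value in forms.items() if segment_count(value) == 1]
print(unbreakable)  # ['ask']
```

This remains the linear scan described above: every value has to be inspected, so it is a sketch of the trade-off rather than a way around it.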

hyphenation vs syllables

My understanding is that most of the time hyphenation can happen at the syllable boundaries, except that you can't hyphenate so that a single character is left on one line (at least those are the Russian rules, and English might be similar?). So should this property store just the breaking boundaries, or should it store syllables and assume the no-single-letter rule is applied by the data consumer?
P.S. What about words with dashes in them: should there be a separation symbol before and after the dash? --Yurik (talk) 20:56, 6 August 2019 (UTC)

I suppose there might be different rules depending on the language. In Danish, there are even two main rules of hyphenation. Hyphenation in connection with a dash would always (I think) be after the dash in Danish. In principle you can have one letter on one line in Danish, e.g., "æ-ble" [2]. In Danish, the syllable boundary does not necessarily coincide with the hyphenation point (depending on what you call a syllable), e.g., "æb-le" or "æ-ble" is OK [3]. — Finn Årup Nielsen (fnielsen) (talk) 20:14, 11 August 2019 (UTC)
@Fnielsen: there might be different rules per language, but can we assume that no language needs to store both the hyphenation and the syllables as part of the word info? Or are they too different and both need to be stored? --Yurik (talk) 21:45, 11 August 2019 (UTC)
Probably different. I suppose English has strange (to a Russian reader) rules of hyphenation, and they don't correspond to syllabification either. By the way, in Russian there can be several possible hyphenations of one word (ду-блет and дуб-лет), while syllabification is closer to unique. --Infovarius (talk) 09:39, 13 August 2019 (UTC)
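One of the options raised in this thread, storing syllable boundaries and leaving the no-single-letter rule to the data consumer, could look roughly like the sketch below. It is an illustration of that option only, not a settled convention for P5279; `drop_short_breaks`, the `min_chars` parameter, and the sample values are hypothetical. Since single-letter lines are reported above as permitted in Danish, the threshold would have to be chosen per language.

```python
HYPHENATION_POINT = "\u2027"  # ‧ HYPHENATION POINT


def drop_short_breaks(stored: str, min_chars: int = 2) -> str:
    """Remove break points that would leave fewer than min_chars
    characters before or after the break.

    Characters are counted as Unicode code points, which is an
    approximation for scripts that use combining marks.
    """
    segments = stored.split(HYPHENATION_POINT)
    total = sum(len(segment) for segment in segments)
    result = segments[0]
    seen = len(segments[0])  # characters before the current break
    for segment in segments[1:]:
        keep = seen >= min_chars and total - seen >= min_chars
        result += (HYPHENATION_POINT if keep else "") + segment
        seen += len(segment)
    return result


# Illustrative values: the first break leaves a single letter, the second does not.
print(drop_short_breaks("æ‧ble"))
print(drop_short_breaks("ду‧блет"))
```

With `min_chars=1` every stored break is kept, which corresponds to the case where one letter on a line is allowed.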