Wikidata:Property proposal/plural forms

plural forms

Originally proposed at Wikidata:Property proposal/Natural science

Not done

Description	stores the string used by GNU Gettext (Q937302) and compatible tools to describe simply how many plural forms a language has and what ranges of numbers each covers
Represents	grammatical number (Q104083)
Data type	String
Domain	item, language (Q34770)/dialect (Q33384)/language variety (Q3329375)
Allowed values	`nplurals=[number here]; plural=[string with particular format described at https://gnu.org/software/gettext/manual/html_node/Plural-forms.html#FOOT5]`
Example 1	Arabic (Q13955) →`nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);`^[1]
Example 2	Montenegrin (Q8821) → `nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10`
Example 3	English (Q1860) → `nplurals=2; plural=(n != 1);`
Example 4	Indonesian (Q9240) → `nplurals=1; plural=0;`
Source	https://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html?id=l10n/pluralforms https://help.launchpad.net/Translations/PluralForms / https://translations.launchpad.net/+languages https://developer.mozilla.org/en/docs/Localization_and_Plurals https://unicode.org/cldr/data/charts/supplemental/language_plural_rules.html / https://unicode.org/cldr/trac/browser/trunk/common/supplemental/plurals.xml
Planned use	add languages listed in the source tables
Number of IDs in source	~150
Expected completeness	always incomplete (Q21873886)
Wikidata project	WikiProject Linguistics (Q10857957)
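
As a rough illustration of how a localisation tool might consume such a value, the Python sketch below splits a string of the shape given under Allowed values into its two parts and pairs Examples 1, 3 and 4 with hand-translated selector functions. The names `split_plural_forms`, `HEADER` and `EXAMPLES`, and the regular expression itself, are assumptions made for this illustration; they are not part of gettext or of the proposal. The selectors are manual translations of the C-style expressions, not the output of a general parser.

```python
import re

# Assumed pattern for the "nplurals=N; plural=EXPR;" shape (illustrative only).
HEADER = re.compile(r"nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*(.+?)\s*;?\s*$")


def split_plural_forms(value):
    """Return (nplurals, expression text) for a Plural-Forms style string."""
    match = HEADER.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised Plural-Forms value: {value!r}")
    return int(match.group(1)), match.group(2)


def arabic(n):
    # Hand translation of the nested ternary in Example 1.
    if n == 0:
        return 0
    if n == 1:
        return 1
    if n == 2:
        return 2
    if 3 <= n % 100 <= 10:
        return 3
    if n % 100 >= 11:
        return 4
    return 5


def english(n):
    # Example 3: index 0 for exactly one, index 1 otherwise.
    return int(n != 1)


def indonesian(n):
    # Example 4: a single form for every count.
    return 0


# Example values copied from the proposal, paired with their selectors.
EXAMPLES = {
    "Arabic": (
        "nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : "
        "n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);",
        arabic,
    ),
    "English": ("nplurals=2; plural=(n != 1);", english),
    "Indonesian": ("nplurals=1; plural=0;", indonesian),
}

for name, (value, selector) in EXAMPLES.items():
    nplurals, expression = split_plural_forms(value)
    used = {selector(n) for n in range(1000)}
    # Every index the selector returns must be below the declared nplurals.
    assert max(used) < nplurals
    print(name, nplurals, sorted(used))
```

The loop at the end is a simple sanity check: the form index chosen for any count must be smaller than the declared `nplurals`, which is one way a consumer could catch a mistyped string value.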

Motivation

These strings are important for software localisation, but resources online are scattered; this seems to be a good fit for Wikidata's mission of being a central repository of such information. Arlo Barnes (talk) 21:35, 10 June 2022 (UTC)

Discussion

user:Arlo Barnes user:Daniel Mietchen Finn Årup Nielsen (fnielsen) user:Infovarius user:Lore.mazza81 user:Middle river exports user:Nikki user:Popcorndude user:SM5POR user:SynConlanger
Notified participants of WikiProject Linguistics Arlo Barnes (talk) 21:35, 10 June 2022 (UTC)

Comment @Arlo Barnes: I like the idea a lot, but the proposal needs to be totally reworked (so technically Oppose as it is). First, the string datatype feels very bad: it's obscure and prone to mistakes. Also, it is too dependent on one system (GNU) where we should have a more general and neutral solution (for instance, this solution seems to ignore decimal numbers: "1.5" is followed by a singular in French but by a plural in English; it also ignores gender and other grammatical agreement). Finally, it seems to be something more for Wikifunctions than for Wikidata. That said, on the Wikidata side, I see that we don't have a property "has grammatical number", like we do for has grammatical gender (P5109) (and so many others; or am I missing something?). PS: as a Breton speaker this whole system feels very funny :D as we don't agree on number after a number (but we do agree on gender after a number, and low numbers also cause mutation). Cheers, VIGNERON (talk) 16:58, 11 June 2022 (UTC)

Comment I’m curious about embedding a mini language inside the statement’s value. Have you considered modeling it as multiple values that are items of the class grammatical number (Q104083)? – Minh Nguyễn ^💬 17:02, 11 June 2022 (UTC)

The solutions presented in yall's comments (using Wikifunctions once it becomes ready, using a multiplex of statements) are certainly more elegant, but I still think having the unparsed string stored has utility, because it means someone can look up the language and copy and paste the whole slug into their localisation software (or better, the software can look it up by itself). Perhaps as a qualifier to a more semantically-specified statement? Arlo Barnes (talk) 17:17, 11 June 2022 (UTC)

There are multiple common formats for this information. For example, CLDR uses the XML-based LDML format. Would it be possible to model plural rules with enough structure that either gettext or LDML could be generated from it, using an ordinary SPARQL query, no need for Wikifunctions? I would like to be able to state, for example, that Vietnamese (Q9199) has one form according to some sources but two forms according to others. But if a format only used by some sources only tells part of the story, are we responsible for translating the format used by other sources into gettext format? At a glance, I'm not sure that the LDML format can be converted losslessly into gettext format in every case, though maybe it won't matter for any of the 150 initial occurrences proposed above. CLDR is also considering additional attributes to be applied at a higher level than the condition. Minh Nguyễn ^💬 23:36, 11 June 2022 (UTC)

I like your line of thinking here where a SPARQL query could yield a variety of formats, but surely translating from the gettext string to a series of statements is equivalent effort to converting from other formats into gettext where possible -- the human entering the data still has to be able to read two formats, the source and whatever we're storing it as. The advantage of using an existing format is that those can sometimes be the same and so modulo a 'stated in' reference it can just be entered verbatim. I'm ambivalent as to which system might be of best advantage in such a situation, although if there are incompatibilities then the most expressive one would be preferable of course. If nothing suits, then I guess a Wikidata-internal system might well do to try to maximize flexibility. This would be equivalent to informally specifying a new format in RDF, if I'm not mistaken. Arlo Barnes (talk) 01:10, 12 June 2022 (UTC)

Comments from a Wikimedian on the WD:Discord:

Wifey: [for the property name] Something verbose but precise would work I think. Like "GNU Gettext formatter string for plural forms"... The more clear it is from the label alone what you are supposed to put in it, the better (since realistically people are going to see it in autocomplete before they see the documentation). I would also suggest changing "eventually complete (Q21873974)" to "always incomplete (Q21873886)". There's no complete list of every language to date, so unless there's a very finite set of them which can have this property it's unlikely to have a complete set (further complicating this is the possibility of dialectal variations in plural form).

Just dropping this here: betawiki:plural#Plural syntax in MediaWiki Arlo Barnes (talk) 18:54, 27 February 2023 (UTC)
Oppose there's no good reason to store this as strings. The information should be stored in items that are easily understandable. ChristianKl ❪✉❫ 12:51, 27 March 2023 (UTC)

References

↑ http://wiki.arabeyes.org/Plural_Forms

Not done, no consensus of proposed property at this time based on the above discussion. Regards, ZI Jony ^(Talk) 10:47, 2 June 2023 (UTC)
