
Wikidata:Property proposal/Lexemes

See also

This page is for the proposal of new properties.

Before proposing a property

  1. Check if the property already exists by looking at Wikidata:List of properties (research on manual list) and Special:ListProperties.
  2. Check if the property was previously proposed or is on the pending list.
  3. Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
  4. Select the right datatype for the property.
  5. Start writing the documentation based on the preload form below and add it in the appropriate section.

Creating the property

  1. Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
  2. Creation can be done 1 week after the proposal, by a property creator or an administrator.
  3. See steps when creating properties.

  On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2019/09.

Lexeme

Etymology

Grammar

Form

variety of form

   Under discussion
Description: (qualifier) optional qualifier to link a form to a dialect, or to a graphic or phonetic variety
Data type: Form
Domain: lexeme / form
Allowed values: all Q-items which represent a dialect
Example 1: ostal (L41768) (L41768-F2) belongs to Languedocien (Q942602) variety
Example 2: ostal (L41768) (L41768-F1) belongs to Provençal (Q241243) variety
Example 3: color/colour (L1347) (en-uk) belongs to British English (Q7979) variety
Example 4: color/colour (L1347) (en) belongs to American English (Q7976) variety

Motivation

In Occitan, as in many other languages, some words exist in one dialect and not in another. We need a property to indicate whether a form (or one of its particular senses) exists in one or several varieties of our language. This can be useful, for instance, if we want to use Wikidata to build a tool that automatically recognises the dialect of a text; if we want to translate a text (a word that exists in two dialects can have a different sense in each); if we want to produce a text that is dialectally coherent (for instance, one verb can have different conjugations depending on the dialect); if we want to analyse the meaning of a text; and, more generally, to reflect the reality of our language. There are many languages with strong dialectal variation (Arabic, for instance) or languages written with several writing systems (Japanese, for instance) that will probably, sooner or later, encounter the same problem. We have to deal with the variety in our languages, and having a property to do so will help us very much in building tools based on Wikidata. This is particularly true for languages like Occitan, for which Wikidata is the only database of its kind. If we cannot handle variety when using Wikidata, we won't be able to build our NLP tools.  – The preceding unsigned comment was added by Aitalvivem (talk • contribs) at 12:48, July 18, 2019 (UTC).


Discussion

  •   Comment I think this makes sense, and is analogous to pronunciation variety (P5237) used for the spoken sound. The examples given here though include color/colour (L1347) which is currently being handled with single forms with two "languages" ("en" and "en-GB"). The approach proposed here may make more sense if the list of "languages" is to be limited and not include the varieties needed... I'm not really sure though? ArthurPSmith (talk) 18:21, 18 July 2019 (UTC)
  •   Comment What about using different "languages" to handle such cases? The border between a dialect and a language is often not clear. So I guess some users may add words in Occitan (with its code) and others in Occitan varieties (with their own codes). To avoid mixing, we should allow words to be added in a dialect, treating this dialect as a different language. Items will make the links between languages and dialects. Pamputt (talk) 05:25, 20 July 2019 (UTC)
@Pamputt: I think it would be complicated to use different languages, because it would force us to create a new language every time one is needed. And, for example, I am not sure that anyone would read a Wikidata page in the Occitan variety "vivaro alpenc", which has just a few speakers left. So I think it would be useless to create a language for it. But we still have to specify the dialect if we add a Lexeme that exists only in this variety. So using a claim seems the best way. Such a property could also be used for sub-dialects (and we have plenty of them in Occitan). In Occitan we don't really have a normalized language. One dialect (Lengadocian) is seen as "standard" (not by everyone; Occitan linguists and speakers argue about this all the time), but in terms of number of speakers there are other dialects as widely used as Lengadocian. A last problem I see concerns the automatic treatment of Lexemes. I am working on this bot; our data comes from different dictionaries, and we have no way to link Lexemes between dictionaries. For example, ostal (L41768) exists in two varieties (Lengadocian and Gascon), but we have no way to link the Lexeme that came from the Lengadocian dictionary with the one coming from the Gascon dictionary. The way I found is to create the Lexeme and then add a claim every time I find this Lexeme in another dialect. But I am still not sure this is the right way to do it. Aitalvivem (talk) 08:55, 30 July 2019 (UTC)
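
The approach described in the last comment (create the Lexeme once, then add a variety claim each time the same Lexeme turns up in another dialect's dictionary) can be sketched roughly as follows. This is only an illustration under stated assumptions: the input tuples, the `merge_entries` helper, and matching on lemma plus lexical category are hypothetical, not the bot's actual code, and no Wikidata API is called.

```python
from collections import defaultdict


def merge_entries(entries):
    """Group dictionary entries by (lemma, category) and collect their dialects.

    `entries` is an iterable of (lemma, category, dialect_qid) tuples.
    Matching on lemma and category alone is an assumption; homographs
    would need a stricter key.
    """
    merged = defaultdict(list)
    for lemma, category, dialect_qid in entries:
        varieties = merged[(lemma, category)]
        if dialect_qid not in varieties:  # keep one claim per dialect
            varieties.append(dialect_qid)
    return dict(merged)


# Hypothetical input rows; the Q-ids are the two dialect items
# used in the proposal's examples (Q942602, Q241243).
entries = [
    ("ostal", "noun", "Q942602"),
    ("ostal", "noun", "Q241243"),
    ("ostal", "noun", "Q942602"),  # duplicate from a second dictionary
]
result = merge_entries(entries)
```

Each key of `result` would correspond to one Lexeme to create, and each Q-id in its list to one variety claim to add.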

Sense

Siddhaṃ alphabet

Siddham script name

   Under discussion
Description: The name of this item in Siddham script
Represents: Siddhaṃ alphabet (Q250379)
Data type: Monolingual text
Allowed values: any
Example 1: Avalokiteśvara (Q193849) → 𑖀𑖪𑖩𑖺𑖎𑖰𑖝𑖸𑖫𑖿𑖪𑖨
Example 2: Vairocana (Q239847) → 𑖦𑖮𑖯𑖪𑖹𑖨𑖺𑖓𑖡
Example 3: Manjusri (Q471696) → 𑖦𑖗𑖿𑖕𑖲𑖫𑖿𑖨𑖱

Motivation

Siddham script names have a great role in Buddhism, such as the names of Buddhas and Bodhisattvas. I hope the proposal can be completed. 我爱大日如来 (talk) 16:25, 17 August 2019 (UTC)

Discussion

Popcorndude Nikki SynConlanger Infovarius Finn Årup Nielsen (fnielsen) (talk) Daniel Mietchen (talk) Lore.mazza81   Notified participants of WikiProject Linguistics Visite fortuitement prolongée (talk) 20:26, 20 August 2019 (UTC)

  • @Mahir256, 我爱大日如来: As it might take months/years till this is created, shall we create this property as an intermediary solution and delete it once the language code is available? It should be fairly straightforward to convert the text. BTW, if so, the datatype should probably be string, not "monolingual text". On a side note: @我爱大日如来 are you interested in both sa and pi or just sa? --- Jura 17:15, 25 August 2019 (UTC)
    • I am interested in Sanskrit and Pali.--我爱大日如来 (talk) 09:32, 28 August 2019 (UTC)
    • I would not mind if the Siddham script names were added as aliases in Sanskrit or Pali in the interim, since both those languages have been written in various Indic scripts. I'm not a big fan of having this be a stopgap property unless there is precedent for having done this for other scripts (where the property doesn't otherwise serve as an aid like a romanization or other transcription). Mahir256 (talk) 19:22, 25 August 2019 (UTC)
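
If the property were created with the string datatype as a stopgap, the later conversion mentioned above could look roughly like this. It is a minimal sketch, assuming the usual Wikibase JSON shapes for string and monolingual-text datavalues; the `string_to_monolingual` helper is hypothetical, and `"und"` is only a placeholder, since the language code for Siddham-script text is not yet available.

```python
def string_to_monolingual(datavalue, language):
    """Convert a string datavalue into a monolingual-text datavalue.

    Assumes the Wikibase JSON layout:
      string:          {"type": "string", "value": "<text>"}
      monolingualtext: {"type": "monolingualtext",
                        "value": {"text": "<text>", "language": "<code>"}}
    """
    if datavalue.get("type") != "string":
        raise ValueError("expected a string datavalue")
    return {
        "type": "monolingualtext",
        "value": {"text": datavalue["value"], "language": language},
    }


# Placeholder language code; the real one is not yet assigned.
PLACEHOLDER_LANGUAGE = "und"

# Value taken from Example 3 of the proposal (Manjusri, Q471696).
old_value = {"type": "string", "value": "𑖦𑖗𑖿𑖕𑖲𑖫𑖿𑖨𑖱"}
new_value = string_to_monolingual(old_value, PLACEHOLDER_LANGUAGE)
```

The text itself is carried over unchanged; only the wrapper and the language tag differ, which is why the conversion would be mechanical.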

Pronunciation