Wikidata:Property proposal/Lexemes


This page is for the proposal of new properties.

Before proposing a property

  1. Check whether the property already exists.
  2. Check whether the property has already been proposed.
  3. Check whether the property can share a label and definition with an existing Wikipedia infobox parameter, or be matched to an infobox, so that data can be transferred automatically in either direction.
  4. Select the right datatype for the property.
  5. Read Wikidata:Creating a property proposal for guidelines you should follow when proposing your property.
  6. Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.

Creating the property

  1. Once consensus is reached, set status=ready on the template to attract the attention of a property creator.
  2. A property creator or an administrator can create the property one week after the proposal was created.
  3. See property creation policy.

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2023/10.

Wikibase lexeme

character in this lexeme

   Under discussion
Description: character(s) this lexeme consists of
Represents: cuneiform sign (Q23017336)
Data type: Item
Domain: lexeme, form
Example 1: ga/𒂷 (L726974) → 𒂷 (Q87555355)
Example 2: dumu/𒌉 (L643788) → 𒌉 (Q87556519)
Example 3: dingir/𒀭 (L724542) → 𒀭 (Q87555087)
Planned use: Linking lexemes to character representations
See also: Han character in this lexeme (P5425), which links Han Chinese characters in Japanese and Chinese lexemes to Unicode; https://www.wikidata.org/wiki/Wikidata:Property_proposal/Cuneiform_character_in_this_lexeme for the previous discussion of the property "cuneiform character in this lexeme"

Motivation

Wikidata currently lacks a property that links lexeme representations to the QIDs of characters in a given script. The examples above link cuneiform lexemes to the character QIDs that represent their Unicode code points, but the property could link any lexeme to the relevant parts of the script it uses. The existing property Han character in this lexeme (P5425) already allows Han Chinese characters in Chinese, Japanese, and Vietnamese lexemes to be linked to their respective representations in Wikidata. This proposal would either generalize Han character in this lexeme (P5425) as "character in this lexeme" or make the new property a superproperty of Han character in this lexeme (P5425).

See also the discussion of https://www.wikidata.org/wiki/Wikidata:Property_proposal/Cuneiform_character_in_this_lexeme which led to the creation of this property proposal instead.

 Support seems fine to me, either as a superproperty or replacement. ArthurPSmith (talk) 20:06, 4 May 2023 (UTC)
 Support this is an excellent idea, since the more general use of 'character' will allow for a number of languages with the same issue to progress in morpho-graphemic annotation, including: Sumerian, Akkadian, Hittite, Hurrian, Ugaritic, Elamite, Old Persian, just to name a few. Admndrsn (talk) 9:11, 8 May 2023 (EST)
 Comment Would this also mean that it could be used to say rød (L2310) character in this lexeme Ø (Q28827) and D (Q9884)? Finn Årup Nielsen (fnielsen) (talk) 18:45, 8 May 2023 (UTC)
Yes, you could also use it for the Latin alphabet, as in the example you proposed, even though I see people using this more for languages like Chinese or scripts like cuneiform, where individual characters often carry their own meaning.
But why not? You could query all lexemes with Ø (Q28827) if that is interesting to do. Situxx (talk) 13:35, 9 May 2023 (UTC)
You already can do that: See this query for example. - Nikki (talk) 20:03, 24 June 2023 (UTC)
 Support will help and advance digital cuneiform studies Enki75 (talk) 11 May 2023
 Strong oppose Having a generic property like this is a really bad idea. Linking characters in lexemes to the corresponding items can easily be done automatically, so there should be a really good reason to add links manually instead. For Han character in this lexeme (P5425), that is because items for Han characters have useful lexicographical data on them which would otherwise end up duplicated as lexemes. My opposition to the previous proposal was because items for Cuneiform characters do not have useful lexicographical data on them, and that is still the case, looking at the items in the examples.
If this is added, people will surely start mass-adding it for every lexeme eventually. We are already having problems with the query service because of the amount of data, and adding millions more statements linking every character in a lexeme would only cause more problems for us. - Nikki (talk) 20:03, 24 June 2023 (UTC)
Here's a simple script I just made to list the characters in a lexeme automatically: User:Nikki/LexemeLinkCharacters.js. - Nikki (talk) 21:27, 24 June 2023 (UTC)
Hi!
Thanks for your comment.
In the previous proposal, you wrote about the following properties as examples which would constitute the lexicographical data you are missing in the cuneiform examples:
I quote: "(e.g. stroke count (P5205), grade of kanji (P5277), radical (P5280), ideographic description sequences (P5753))"
All of this information can be added; we are just lacking properties for it as well. Hence you only see the information that can be added right now with the properties we have, which is the following:
- stroke count: Is currently proposed here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gottstein_code
- radicals are currently represented using "has parts" relations (at least for Unicode signs which allow so) (see here for an example https://www.wikidata.org/wiki/Q87555001)
- depicts relations describe what the character depicts (which is often different from the sense of the Lexemes using the character)
- dictionary references to signlists
You can find an example in this web application which would also illustrate the main use case I have in mind: https://situx.github.io/paleordia/c/?q=Q87554995&qLabel=%F0%92%80%80
The script you posted is certainly useful to link from a Lexeme to its characters, but my use case is actually the opposite.
I would for example like to know which Lexemes contain a given cuneiform sign. Unless I have missed a better solution, the SPARQL query to achieve this would need a set of languages written in the cuneiform script and would have to check the lemmas (maybe also forms) of all of these languages with regex matching.
Such a query is also on the homepage; it runs only over Sumerian but is already quite slow (https://situx.github.io/paleordia/c/?q=Q87554995&qLabel=%F0%92%80%80).
If we had a property like the one proposed here, I think it would be easier to query for the lexemes which contain a cuneiform sign, or any other sign in other languages for that matter.
Finally, there is the issue of paleographic sign variants:
There might be certain Lexemes which are only written with sign variants of a specific shape.
You can look at the sign AN here https://situx.github.io/paleordia/c/?q=Q87555087&qLabel=%F0%92%80%AD
which looks different depending on the time period.
We currently cannot express that either, but we have the information and will gradually add it to Wikidata as a prototype of a digital paleography. Situxx (talk) 11:38, 26 June 2023 (UTC)
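The two directions discussed in this thread (listing the characters of a lexeme, and finding the lexemes that contain a given sign) can be sketched in a few lines of Python. This is only an illustration, not Nikki's script or the Paleordia query: the function names are hypothetical, the lemma spellings are the three examples from this proposal, and no Wikidata API is called.

```python
from collections import defaultdict


def characters_in(lemma: str) -> list[str]:
    """Return the distinct non-space characters of a lemma, in order of first appearance."""
    seen: list[str] = []
    for ch in lemma:
        if not ch.isspace() and ch not in seen:
            seen.append(ch)
    return seen


def build_sign_index(lemmas: dict[str, str]) -> dict[str, set[str]]:
    """Map each character to the set of lexeme IDs whose lemma contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for lexeme_id, lemma in lemmas.items():
        for ch in characters_in(lemma):
            index[ch].add(lexeme_id)
    return dict(index)


# Lemma spellings copied from the examples in this proposal.
LEMMAS = {"L726974": "𒂷", "L643788": "𒌉", "L724542": "𒀭"}
INDEX = build_sign_index(LEMMAS)

print(characters_in("rød"))   # lexeme -> characters
print(sorted(INDEX["𒀭"]))    # character -> lexemes
```

The first function is the direction that can be computed on the fly from a single lemma; the second needs every lemma in advance, which is the lookup a stored statement (or a regex scan over all lemmas) would have to provide.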

use with article

   Under discussion
Description: the term (usually a proper noun) is usually used with the specified article, or novalue if no article should be used (by default, any article may be used where appropriate)
Data type: Lexeme
Domain: lexeme, or sense or form if it applies only to some sense(s) or form(s)
Example 1: United States (L43377) → the (L2768)
Example 2: Scotland (L254526) → novalue
Example 3: Louvre (L749827) → le (L2770)
Example 4: Mississippi (L447503-S2) → the (L2768)
Example 5: Palouse (L749829-S2) → the (L2768)
Example 6: Carolinas (L254532-F2) → the (L2768)

GZWDer (talk) 20:44, 20 December 2022 (UTC)

Discussion

  •  Support. I added a French example. UWashPrincipalCataloger (talk) 21:51, 20 December 2022 (UTC)
  •  Support -wd-Ryan (Talk/Edits) 22:01, 20 December 2022 (UTC)
  •  Oppose We can already use requires grammatical feature (P5713) definite article (Q2865743) to say a definite article is required. - Nikki (talk) 13:41, 21 December 2022 (UTC)
    • But Nikki, is there still a need to specify which definite article is used? UWashPrincipalCataloger (talk) 00:04, 24 December 2022 (UTC)
      @AdamSeattle: Do you have any examples where the article can't be inferred from other information (e.g. language, grammatical gender where applicable)? - Nikki (talk) 10:59, 20 May 2023 (UTC)
      • @Nikki: I'm trying to think of an example in English where it wouldn't be "the". Perhaps something like Swede (L34485), where you can have "a Swede", or "the Swedes", or Swedes without an article (Swedes are often fluent in English). AdamSeattle (talk) 22:44, 20 May 2023 (UTC)
        How is that different from any normal English noun though? e.g. for cat (L7) you can have "a cat", "the cats" and "cats". - Nikki (talk) 21:45, 24 June 2023 (UTC)
  • Question Are there any examples from languages where there is more than one definite article available? -عُثمان (talk) 17:40, 7 March 2023 (UTC)
  •  Comment leaning towards  Oppose for the moment. @GZWDer: I don't really understand the proposal right now. For the examples in English, it's a boolean (L2768 and novalue); the need for a value-centric property is not clear, and another property can be used. For the French example, it's a bit strange (there was no gender on Louvre (L749827), and le (L2770) covers both masculine and feminine); again I don't see the need for this property. At the very least, more examples and explanations are needed. Cheers, VIGNERON (talk) 09:45, 29 April 2023 (UTC)

Dicionário Aberto ID

   Under discussion
Description: identifier for entries on Dicionário Aberto
Data type: External identifier
Domain: Portuguese lexemes
Allowed values: [a-záàãâéêíóõôúç0-9\s'-.]+
Example 1: atividade/actividade (L500628) → atividade
Example 2: vender (L52324) → vender
Example 3: para (L618867) → para
Planned use: add to Portuguese lexemes or their forms
Formatter URL: https://dicionario-aberto.net/search/$1
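One detail of the allowed-values pattern above may be worth checking before creation: inside the character class, `'-.` is read as a range from `'` (U+0027) to `.` (U+002E), so the pattern also accepts `(`, `)`, `*`, `+` and `,`. The Python sketch below demonstrates this with the proposal's own pattern. The `TIGHTENED` variant is a hypothetical alternative, not part of the proposal, which moves the hyphen to the end of the class so that it is taken literally.

```python
import re

# Pattern exactly as written in the proposal.
PROPOSED = re.compile(r"[a-záàãâéêíóõôúç0-9\s'-.]+")
# Hypothetical variant: the hyphen goes last, so "'", "." and "-" are plain literals.
TIGHTENED = re.compile(r"[a-záàãâéêíóõôúç0-9\s'.-]+")


def is_allowed(pattern: re.Pattern, value: str) -> bool:
    """True if the whole value matches the allowed-values pattern."""
    return pattern.fullmatch(value) is not None


# The three example values from the proposal, plus two probes.
for value in ["atividade", "vender", "para", "guarda-chuva", "a+b"]:
    print(value, is_allowed(PROPOSED, value), is_allowed(TIGHTENED, value))
```

Both patterns accept the three example values and hyphenated entries; only the proposed pattern accepts the probe `a+b`.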

Motivation

Enaldodiscussão 23:32, 11 January 2023 (UTC)

Discussion

Online Etymology Dictionary ID

   Under discussion
Description: ID of an entry in Online Etymology Dictionary (with URI fragment)
Represents: URI fragment
Data type: External identifier
Domain: lexeme in English, Proto-Indo-European
Allowed values: .*#etymonline_v_\d+
Example 1: bow (L14698) → bow#etymonline_v_15679
Example 2: bow (L184508) → bow#etymonline_v_15680
Example 3: *ḱwṓ (L184995) → *kwon-#etymonline_v_52685
Formatter URL: https://www.etymonline.com/word/$1

GZWDer (talk) 17:58, 13 January 2023 (UTC)
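To make concrete how the allowed-values pattern and the formatter URL in this proposal fit together, here is a small Python sketch. The helper name `expand` is hypothetical, and the substitution is a plain replacement of `$1` with no percent-encoding, which is an assumption about how the value is meant to be inserted.

```python
import re

# Allowed values and formatter URL as given in the proposal.
ALLOWED = re.compile(r".*#etymonline_v_\d+")
FORMATTER = "https://www.etymonline.com/word/$1"


def expand(value: str) -> str:
    """Validate a value against the allowed pattern, then substitute it for $1."""
    if ALLOWED.fullmatch(value) is None:
        raise ValueError(f"value does not match the allowed pattern: {value!r}")
    return FORMATTER.replace("$1", value)


for value in ["bow#etymonline_v_15679", "bow#etymonline_v_15680", "*kwon-#etymonline_v_52685"]:
    print(expand(value))
```

A value without the fragment, such as a bare `bow`, is rejected by the pattern, which is what keeps the two "bow" entries apart.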

Discussion

Dictionary of the Russian Language (Ozhegov) ID

Dictionary of the Russian Language (Ozhegov) ID (vedu.ru)

   Under discussion
Description: ID of a word in Dictionary of the Russian Language (Ozhegov) provided by www.vedu.ru
Represents: Dictionary of the Russian Language (Q4423784)
Data type: External identifier
Domain: lexeme in Russian
Example 1: поиск (L147604) → 24340
Example 2: судно (L717298) → 34272
Example 3: судно (L184116) → 34271
Formatter URL: https://www.vedu.ru/expdic/$1/
See also: Wikidata:Property proposal/Great Encyclopedic Dictionary ID

Dictionary of the Russian Language (Ozhegov) ID (slovarozhegova.ru)

   Under discussion
Description: ID of a word in Dictionary of the Russian Language (Ozhegov) provided by slovarozhegova.ru
Represents: Dictionary of the Russian Language (Q4423784)
Data type: External identifier
Domain: lexeme in Russian
Example 1: поиск (L147604) → 22117
Example 2: судно (L717298) → 30962
Example 3: судно (L184116) → 30961
Formatter URL: https://slovarozhegova.ru/word.php?wordid=$1

Note: The two websites appear to provide the same content, but their IDs differ. GZWDer (talk) 12:02, 14 January 2023 (UTC)

Discussion

  • I don't speak Russian. I couldn't find any ownership information on the first site. For the second site, it appears to be based on an OCR copy of a dictionary written by someone who died in 1964. https://ru.wikipedia.org/?oldid=10326782 . Someone who knows Russian law should have a look to see whether this has fallen into the public domain or not. I couldn't find ownership information for the second site either. Infrastruktur (talk) 23:16, 12 March 2023 (UTC)

Aragonario ID (6th version)

   Under discussion
Description: identifier for an Aragonese or Spanish lexeme in the Aragonese-Spanish online dictionary (version since January 2023)
Data type: External identifier
Domain: Aragonese and Spanish lexemes
Allowed values: [1-9][0-9]{6}
Example 1: augua (L8226) → 1074499
Example 2: sangonera (L307650) → 1108110
Example 3: abanderato (L647971) → 1070015
Source: https://aragonario.aragon.es/
Planned use: add to existing Aragonese and Spanish lexemes
Number of IDs in source: between 75,137 (45,112 + 30,025) and 82,145 (1114581 - 1032437)
Expected completeness: eventually complete (Q21873974)
Formatter URL: https://aragonario.aragon.es/words/$1/
See also: Aragonario ID (5th version) (P11071)
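The two bounds given for the number of IDs can be reproduced with a short Python calculation. One assumption is made here and is not stated in the proposal: that 1032437 and 1114581 are the lowest and highest IDs observed. Under that reading, the upper bound of 82,145 is the inclusive count of integers in the range (the plain difference is 82,144). The helper `is_plausible` is hypothetical.

```python
import re

# Allowed values as given in the proposal: seven digits, no leading zero.
ALLOWED = re.compile(r"[1-9][0-9]{6}")

# Lower bound: the two entry counts added together in the proposal.
lower_bound = 45_112 + 30_025

# Upper bound: ASSUMING these are the lowest and highest observed IDs,
# every integer from one to the other (inclusive) could be an ID.
LOWEST_ID, HIGHEST_ID = 1_032_437, 1_114_581
upper_bound = HIGHEST_ID - LOWEST_ID + 1


def is_plausible(value: str) -> bool:
    """True if the value has the allowed shape and lies in the observed range."""
    return ALLOWED.fullmatch(value) is not None and LOWEST_ID <= int(value) <= HIGHEST_ID


print(lower_bound, upper_bound)
print([is_plausible(v) for v in ["1074499", "1108110", "1070015"]])
```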

Motivation

It appears that this month, a new version of the Aragonario was launched with several thousand more entries compared to the original version, leading to the invalidation of all IDs from the previous version. This proposal covers IDs from the new version, in line with there being separate properties for new and former schemes.

(Those former IDs are not all lost, however, as the proposal for the property covering the previous version has a link to a spreadsheet with a complete list of all of those IDs I compiled a few months ago; they should continue to be added for posterity. I have now begun compiling a list of the newer IDs, and it is hoped that reconciling information between the two versions, which I intend to do myself, will be made easier as a result.) Mahir256 (talk) 17:57, 20 January 2023 (UTC)

Discussion

  • @Aradgl, Uesca: and @Nikki, عُثمان, Bovlb: from the previous proposal. Mahir256 (talk) 17:57, 20 January 2023 (UTC)
    I'm afraid, as expected, the Aragonario has changed its routes and the Aragonario ID no longer works.
    It is not useful to use a web ID that can change. If someone wants to inquire about that lexeme, they can do so by searching for the lexeme itself in the Aragonario using the lexeme or another source of information (paper dictionaries, for example).
    I do not have any kind of control over the Aragonario, nor can we demand anything of them.
    @Mahir256 Uesca (talk) 18:15, 20 January 2023 (UTC)
    @Uesca: There is still merit to retaining the old identifiers; many of them are still accessible through the Internet Archive, and its function of serving as an identifier has not really diminished (see, in addition to the 'former scheme' properties, ones like ISOCAT ID (P2263), Google+ ID (P2847), and other properties for discontinued websites). As for the issue of changes in IDs, these too can be reflected in the data; if their ability to change made them not useful, then properties for social media accounts--whose IDs can frequently be changed by their users--would also not be useful. Mahir256 (talk) 18:23, 20 January 2023 (UTC)
    Do they have any policy about identifiers? Do we have any contacts on their team that can advise us? I'd love to be able to map this stuff, but it's not very satisfactory to support a property for an identifier that can be invalidated on a whim. Bovlb (talk) 18:57, 20 January 2023 (UTC)
    @Bovlb: I sent an email to the address posted on the 'Contacto' page of that site asking about the stability of their identifiers. Mahir256 (talk) 20:13, 20 January 2023 (UTC)
    @Uesca, Bovlb: After resending the message once, I eventually got a reply. Mahir256 (talk) 17:33, 3 February 2023 (UTC)
    Hmm. Thanks for following up.
    When they say "right now we are creating permanent and stable links", does that mean that the links they currently create are permanent and stable, or does it mean that they're currently designing yet another version of identifiers, this time to be permanent and stable? Bovlb (talk) 18:38, 3 February 2023 (UTC)
    @Bovlb: This is what they had to say about that. Mahir256 (talk) 14:10, 8 February 2023 (UTC)
    @Mahir256 Hmm. From that response, it doesn't sound like we should proceed with this property at this time. Bovlb (talk) 15:47, 8 February 2023 (UTC)
    @Bovlb: I would agree, but it appears @Uesca: has begun adding these new IDs (accidentally?) using the existing property intended for the old IDs (e.g. fuyita (L1016834) has an ID which would not have worked prior to January 2023); if they are to be shifted, it would need to be to this proposed new property (lest they be removed completely or shifted to described at URL (P973)). Mahir256 (talk) 18:41, 12 February 2023 (UTC)
  •  Support Per above, at the very least these can be archived. --عُثمان (talk) 20:21, 20 January 2023 (UTC)

Dicionário inFormal ID

   Under discussion
Description: identifier for an entry on Dicionário inFormal
Represents: Dicionário inFormal (Q116273055)
Data type: External identifier
Domain: dictionary entry (Q1580166)
Example 1: menino (L669443) → menino
Example 2: taxista (L447872) → taxista
Example 3: chato (L671156) → chato
Example 4: merda (L448068) → merda
Planned use: Portuguese lexemes or forms
Expected completeness: always incomplete (Q21873886)
Implied notability: Wikidata property for an identifier that does not imply notability (Q62589320)
Formatter URL: https://www.dicionarioinformal.com.br/$1/
See also: Infopédia entry (P11485)
Wikidata project: WikiProject Brazil (Q11134020), WikiProject Portugal (Q11142608)

Motivation

Crowdsourced dictionary with relevant lexical information that sometimes isn't available in traditional dictionaries, such as slang and gender-neutral language. –Guttitto (talk) 04:13, 21 January 2023 (UTC)

Discussion

 Oppose. Unfortunately, it is full of redundant, meaningless or extremely offensive entries. I don't see how this can be useful for the lexemes. Enaldodiscussão 22:50, 25 January 2023 (UTC)

See how many times it is linked on ptwiki or even enwikt; it's not heavily used, but I think it is notable enough to deserve an external identifier. I see it as a similar situation to Urban Dictionary: both are crowdsourced, and of course not every entry is good. Although the property proposal for Urban Dictionary wasn't done, I think it should be reproposed. The usefulness is precisely in those entries which aren't available in traditional dictionaries; a term being offensive doesn't mean that it isn't a lexical unit, otherwise we wouldn't have merda (L448068) as a lexeme. –guttitto(talk · contribs) 03:22, 26 January 2023 (UTC)

 Comment I'm not sure it's really much of an ID; it just returns definitions for whatever word you put in the web address. For example, I can try to look up the definition of "Wikidata" (which comes back empty). So I'm not sure about the need for this property when the lemma itself is the ID. --Yirba (talk) 13:14, 4 February 2023 (UTC)

That makes sense, but we do have many other external IDs for dictionaries that are similar, e.g. Collins Online English Dictionary entry (P11230), Dicionário Priberam ID (P11526), The Britannica Dictionary entry (P11263) and Merriam-Webster online dictionary entry (P11130). –guttitto(talk · contribs) 16:32, 4 February 2023 (UTC)

gf-wordnet-lexeme

Motivation

GF WordNet is a variant of WordNet which brings together lexicons for several languages. It is also compatible with the grammar libraries in GF which makes it possible to use the lexicon for parsing and natural language generation.

Unlike the traditional WordNet, a synset is a set of abstract word identifiers. Each identifier is then mapped to a word in a concrete language. The abstract words make it possible to preserve the translation equivalents, which is lost if only synsets are linked across languages.

Wikidata already links entities to WordNet. In fact half of the links are exported from GF WordNet. Linking to synsets, however, is not always enough to identify lexical items unambiguously. For example household_N is best linked to Q259059 but if the entire synset is linked then it also contains family_1_N, home_8_N and house_10_N.

Since we are now experimenting with using GF for natural language generation, it is useful to preserve more explicit links. A prototype for the data already exists.  – The preceding unsigned comment was added by Kr.angelov (talk • contribs) at 10:44, 8 February 2023 (UTC).

SignPuddle page id

   Under discussion
Description: the page id of this lexeme in the SignPuddle sign language database
Data type: External identifier
Allowed values: \d+&sid=\d+
Example 1: Lexeme:L1014011 → 53&sid=1295
Example 2: Lexeme:L1013911 → 4&sid=2643
Example 3: Lexeme:L1014152 → 53&sid=1645
Source:
Formatter URL: https://www.signbank.org/signpuddle2.0/canvas.php?ui=1&sgn=$1

Motivation

SignPuddle is an open platform (CC BY-SA) where users can add signs in various sign languages. Each entry has a Puddle Page id. It holds each sign in FSW and SWU as well as a Sutton SignWriting representation.

The sgn parameter stands for the collection (e.g. a language); the sid parameter is the id of a particular sign within that collection.

You can find these ids as follows:

  1. Go to signbank.org/signpuddle.
  2. Choose a language (in the blue area).
  3. Choose Dictionary or an equivalent link in the selected language.
  4. Navigate to Search by Word or an equivalent link in the selected language.
  5. Use the search.
  6. At the bottom of each result there is a link like Puddle Page 123456.

Clicking that link should open a page like signbank.org/signpuddle2.0/canvas.php?ui=1&sgn=4&sid=2643.

Alternatively you can search Database xml dumps. Pick a collection like sgn53.spml for German Sign Language (Q33282). The first entry in the file would be 53&sid=1.


Shisma (talk) 20:54, 11 February 2023 (UTC)
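As a rough illustration of how a value of the proposed shape splits into its two parts and expands through the formatter URL, consider the Python sketch below. The names `PuddlePage`, `parse_value` and `page_url` are hypothetical, and the sample values `53&sid=1295` and `4&sid=2643` are the ones mentioned on this page.

```python
import re
from typing import NamedTuple

# Allowed values and formatter URL as given in the proposal.
VALUE = re.compile(r"(\d+)&sid=(\d+)")
FORMATTER = "https://www.signbank.org/signpuddle2.0/canvas.php?ui=1&sgn=$1"


class PuddlePage(NamedTuple):
    collection: int  # the sgn parameter (e.g. 53 for German Sign Language)
    sign_id: int     # the sid parameter, an id within that collection


def parse_value(value: str) -> PuddlePage:
    """Split a value such as '53&sid=1295' into collection and sign id."""
    match = VALUE.fullmatch(value)
    if match is None:
        raise ValueError(f"not of the form <sgn>&sid=<sid>: {value!r}")
    return PuddlePage(int(match.group(1)), int(match.group(2)))


def page_url(value: str) -> str:
    """Validate the value, then substitute it for $1 in the formatter URL."""
    parse_value(value)
    return FORMATTER.replace("$1", value)


print(parse_value("53&sid=1295"))
print(page_url("4&sid=2643"))
```

The parse step makes visible that the value bundles two things, a collection and an id that is only meaningful inside that collection.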

Discussion

  •  Oppose as currently proposed because this is not an ID, it's a URL fragment. It doesn't correspond to the ID attributes in the downloadable files, nor to the URLs used by SignPuddle 3 (e.g. the URL for "sgn=53&sid=1295" can also be https://signpuddle.com/client/#!/dictionary/gsg-DE-dictionary-public/entry/1295 for the UI or https://signpuddle.com/server/dictionary/gsg-DE-dictionary-public/search/id/1295 for the API). Each dictionary is a separate ID space, so if we want these as identifiers, they should be separate properties. If we're going to store something that only applies to one URL, then it might as well stay as a URL. Also, since SignPuddle can have numerous entries for a single sign (e.g. 1394, 2067, 2948, 3533 and 5298 for ASL), if we are going to link to it, the links should probably be on forms not lexemes. - Nikki (talk) 19:11, 24 February 2023 (UTC)
    And if you're thinking of importing the data into Wikidata: Please don't. The SignPuddle data is uncurated and unverified. Anyone can add new data and there's no way (for normal users at least) to edit, remove or even flag bad or duplicate entries. The only way you can distinguish the good data from the bad data is by being able to read SignWriting and knowing the sign it's supposed to represent. - Nikki (talk) 19:26, 24 February 2023 (UTC)
  •  Oppose per above -عُثمان (talk) 17:53, 7 March 2023 (UTC)

part of other combined lexeme

Motivation

This property can be used to specify that components of a compound lexeme A appear due to another compound lexeme B actually being used within A. It is particularly important whenever the components coming from B are disconnected, reordered (as in the first two examples), or inflected (as in the last two examples) within A. Mahir256 (talk) 16:04, 10 April 2023 (UTC)

Discussion

  • Comment: The use case in the first two examples is clear, but I am having trouble seeing what advantages using a property like this for contiguous compounds offers. If the reason is to avoid adding forms on the compound lexeme, it doesn't negate the reasons to add forms for any other reason than in combines statements. For example, in order to put statements for "subject form" on usage examples on a given compound, it makes sense to have that form present. Simple compounds are also likely to form parts of other compounds, and it seems like it would be more confusing to be expected to represent a nest of compounds on each derived lexeme. For example, if we have a compound consisting of an adjective and a three-member verb construction like مار لے جاوݨ, then any additional lexeme employing that compound would have links from جاوݨ to مار لیݨ, لے جاوݨ, and مار لے جاوݨ. It seems a lot simpler just to link to the single compound, unless querying statements on the constituent lexemes has problems I am not aware of. -عُثمان (talk) 22:09, 11 April 2023 (UTC)

Beta Code

   Under discussion
Description: representation of Ancient Greek as ASCII characters
Represents: Beta Code (Q752325)
Data type: String
Domain: form
Allowed values: [ "-}]+
Example 1: Ἠέλιος/*)he/lios (L1095729) → *)he/lios
Example 2: εἲρειν (L1020946) → ei)\rein
Example 3: γλύφω (L7961) → glu/fw
See also: ALA-LC romanization (P8991)

Motivation

Beta Code is a way of representing Ancient Greek (Q35497) letters in ASCII, used by some research institutions, e.g. Perseus. You can try it in this web converter. -- Kristbaum (talk) 14:43, 25 April 2023 (UTC)
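For readers unfamiliar with the encoding, the following Python sketch shows roughly how the three example values map back to Greek. It is a deliberately minimal decoder written for illustration, not the web converter mentioned above: it covers only the basic letters, the common diacritics, the `*` capital marker and word-final sigma, and the function name `beta_to_greek` is hypothetical.

```python
import re
import unicodedata

# Basic Beta Code letters mapped to lowercase Greek.
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}
# Beta Code diacritics mapped to Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_greek(text: str) -> str:
    """Decode a small subset of Beta Code into precomposed (NFC) Greek."""
    out = []
    capital = False  # set by "*", applies to the next letter
    pending = ""     # marks written between "*" and the capital letter
    for ch in text.lower():
        if ch == "*":
            capital = True
        elif ch in MARKS:
            if capital:
                pending += MARKS[ch]
            else:
                out.append(MARKS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            if capital:
                letter = letter.upper() + pending
                capital, pending = False, ""
            out.append(letter)
        else:
            out.append(ch)  # pass through anything this sketch does not know
    greek = unicodedata.normalize("NFC", "".join(out))
    # A lowercase sigma not followed by another word character is final sigma.
    return re.sub(r"σ(?!\w)", "ς", greek)


for code in ["*)he/lios", r"ei)\rein", "glu/fw"]:
    print(code, "->", beta_to_greek(code))
```

The design leans on Unicode normalization: each letter is emitted followed by combining marks, and NFC composes them into the precomposed polytonic characters.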

Discussion

  • Maybe it would be useful for forms too, but in most dictionaries it's only done once for the lemma. Kristbaum (talk) 14:51, 25 April 2023 (UTC)
  •  Comment @Kristbaum: do we really need a property for that? I think it can simply be a form, see what I tried on L:L1095729#F1. Cheers, VIGNERON (talk) 11:24, 1 May 2023 (UTC)
    • @VIGNERON: Cool idea, but wouldn't that conflict with the idea of a spelling variant? It's not a different spelling just a different representation. Is there maybe another example of a similar format to model this after? (If not I'm fine with your suggestion) --Kristbaum (talk) 12:14, 1 May 2023 (UTC)
  •  Support Ionenlaser (talk) 13:01, 6 May 2023 (UTC)
  • Comment: I would consider a property like this to be more suitable than adding a variant representation on the form. On the example of ਲੌਟਣ/لَوٹݨ (L1096159) above, I would use ISO 15919 transliteration (P5825) if I wanted to add a Romanized transcription, as this information is for specialized purposes and not part of the general written use of the language, and because it is useful to add qualifiers to statements about transcriptions. I would actually suggest that this be used on forms rather than the lexeme itself: print dictionaries are not necessarily concerned with recording every form of a word and make necessary compromises for space. As we do not have these limitations on Wikidata, I think it makes sense to attach these transcriptions to forms (and this would be in line with how existing properties are used). --عُثمان (talk) 18:07, 8 May 2023 (UTC)

lexical unit

   Under discussion
Represents: lexical unit (Q115862390)
Data type: Item
Domain: lexeme, Q111352
Example 1: lugal/𒈗 (L643713) → lexical unit → Mesopotamian city (Q117000106)
Example 2: MISSING
Example 3: MISSING
Source: https://framenet.icsi.berkeley.edu/fndrupal/WhatIsFrameNet, https://metaphor.icsi.berkeley.edu/pub/en/index.php/Category:Frame

Motivation

See discussion of the proposed property 'frame element of' for more on the frame semantics properties proposed here.

Discussion

  • It's unclear whether this should be about a lexeme or sense. ChristianKl❫ 11:41, 20 March 2023 (UTC)

Kamus Besar Bahasa Indonesia Daring entry

   Under discussion
Description: identifier for an entry in the online version of Kamus Besar Bahasa Indonesia
Represents: Great Dictionary of the Indonesian Language (Q4200623)
Data type: External identifier
Domain: lexeme
Allowed values: [a-z0-9\.,'\-_\(\); ]+
Example 1: cagar budaya (L739124) → cagar budaya and cagar_budaya
Example 2: Yth. (L1119265) → Yth.
Example 3: Al-Qur'an (L1119263) → Al-Qur'an
Example 4: S-1 (L1119266) → S-1
Example 5: umbi-umbian (L700147) → umbi-umbian
Example 6: patah tongkat berjeremang (L1119283) → patah tongkat berjeremang (patah sayap bertongkat paruh; patah tongkat bertelekan)
Example 7: pucuk dicinta, ulam tiba (L1119282) → pucuk dicinta, ulam tiba (hendak ulam pucuk menjulai)
Number of IDs in source: 119,345
Expected completeness: always incomplete (Q21873886)
Formatter URL: https://kbbi.kemdikbud.go.id/entri/$1

Motivation

KBBI is the dictionary most widely used by Indonesians. This will help folks at WikiProject Indonesia describe their sources and give others a useful link to authoritative information about Indonesian lexemes when they are creating or editing them. This property would also complement existing properties such as Oxford English Dictionary numeric ID (P5275), Collins Online English Dictionary entry (P11230), Lëtzebuerger Online Dictionnaire ID (P9397), and Cambridge Dictionary entry (British English) (P11422). Labdajiwa (talk) 06:00, 24 May 2023 (UTC)

Discussion

  •  Support Seems ok to me. ArthurPSmith (talk) 20:49, 1 June 2023 (UTC)
  • Comment: Given that this would potentially be as useful for the lexemes currently modeled as Malay (Q9237), I would like to see it clarified how either a) these languages should be merged or b) these languages will be maintained separately, that is, should identifiers from this dictionary be placed on equivalent/identical Malay and Indonesian lexemes, and is there a plan to ensure that information sourced from this dictionary is used to update lexemes in both varieties where applicable? -عُثمان (talk) 17:15, 6 June 2023 (UTC)
    Well, there was a discussion about this years ago, and folks over there don't seem to agree to merge Indonesian with Malay. I'd say this identifier will be used mainly for Indonesian. If this dictionary can be used in Malay lexemes, well, there are plenty of entries in this dictionary marked as Malay (e.g. sama ada). Perhaps this dictionary can also be used in the lexemes of regional languages in Indonesia, many of which don't have a reliable online dictionary of their own. And FYI, as of June 2023, Indonesian has 19,864 lexemes, while Malay has 2,729 lexemes. Labdajiwa (talk) 14:31, 7 June 2023 (UTC)
    @Labdajiwa: Are you sure the 'Mal' annotation doesn't simply mean 'used primarily in Malaysia' (i.e. Q15065, to be distinguished from the more general Q9237 to which at least a plurality of the entries in this dictionary would apply), in much the same way that there are Malay words used, say, mainly in Brunei or mainly in Singapore? You should clarify your 'perhaps' statement: can this or can this not be used for such regional languages? And lexeme count is not relevant when lexemes lack meanings; all Malay lexemes have at least one, while this cannot be said for the Indonesian ones. Mahir256 (talk) 04:39, 31 July 2023 (UTC)
    This Malay-Indonesian debate seems to stem from confusion. AFAIK, linguistically, in English, the official and standardized forms of the language in each country are called "Malaysian Malay" and "Indonesian". Both languages are descended from (or "grouped in" could probably be another correct term) "Malay". Malay, translated to Indonesian, is Melayu. Melayu is commonly used in Indonesia to refer to "a language that is used in Malaysia". Malaysian Malay could be translated to Indonesian as Melayu Malaysia, but nobody uses that term in Indonesia. The merge proposal sounded like a proposal to merge Malaysian Malay and Indonesian, which I think is not possible because "A language is a dialect with an army and navy". Hddty (talk) 02:22, 12 August 2023 (UTC)

‎etymologiebank.nl ID Edit

   Under discussion
Descriptionidentifier for an entry in the online Dutch etymology dictionary hosted by Instituut voor de Nederlandse Taal
Data typeExternal identifier
DomainDutch lexemes
Allowed values[a-z]+[1-9]?
Example 1spreken (L1130640)spreken
Example 2melk (L630689)melk1
Example 3bier (L495801)bier1
Sourcehttps://etymologiebank.nl/
Planned useadjust P973 values pointing to etymologiebank.nl to instead use this property
Formatter URLhttps://www.etymologiebank.nl/trefwoord/$1
See alsoOudnederlands Woordenboek GTB ID (P5937), Vroegmiddelnederlands Woordenboek GTB ID (P5938), Middelnederlandsch Woordenboek GTB ID (P5939), Wurdboek fan de Fryske taal GTB ID (P9158)

Motivation

This property will provide some authority control for Dutch lexemes. Mahir256 (talk) 15:58, 25 June 2023 (UTC)
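As a rough illustration of how the "Allowed values" pattern and the formatter URL from the proposal table fit together, here is a minimal Python sketch. The function name `entry_url` is hypothetical; only the pattern and the URL template come from the proposal.

```python
import re

# Pattern and formatter URL copied from the proposal table.
ALLOWED = re.compile(r"[a-z]+[1-9]?")
FORMATTER_URL = "https://www.etymologiebank.nl/trefwoord/$1"


def entry_url(identifier: str) -> str:
    """Return the entry URL, or raise ValueError for a malformed identifier.

    `entry_url` is a hypothetical helper name used only for illustration.
    """
    if ALLOWED.fullmatch(identifier) is None:
        raise ValueError(f"not a valid identifier: {identifier!r}")
    return FORMATTER_URL.replace("$1", identifier)


if __name__ == "__main__":
    for example in ("spreken", "melk1", "bier1"):
        print(entry_url(example))
```

The three identifiers in the usage loop are the ones listed as Example 1 to 3 in the proposal.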

Discussion

Vazhaju Word ID

   Under discussion
Descriptionidentifier for an entry in the Vazhaju Tajik dictionary
Data typeExternal identifier
Example 1لیفت/лифт (L1149428) Kdo2KJn
Example 2دیرین‌اقلیم‌شناسی/дериниқлимшиносӣ (L1004914) XKdnZetm
Example 3آب زدن/об задан (L584728) atm1ydo
Formatter URLhttps://vazhaju.tj/word/_/$1

Motivation

Vazhaju is an aggregate online dictionary of Tajik Persian which includes Perso-Arabic script lemmas. It would be useful for New Persian (Q56356571) lexemes. The entries are each drawn from a variety of sources; some from Dehkhoda Dictionary (Q1182988), some from Explanatory Dictionary of the Tajik Language (Q25592497), some from Q25582847, and a number with definitions from glossaries produced by the Vazhaju editors themselves. Example 1 is an entry for a recent loanword which does not appear in many Persian dictionaries. Example 2 is from a Tajik “translation” produced by Vazhaju of Dictionary of approved words of the Persian Language and Literature Academy (Q115664843), a glossary of technical neologisms current in Iran from which we have many lexemes. Example 3 is an entry for a compound verb drawing from a source other than Dehkhoda, which is already linked to many Persian lexemes. This site is also more stable than Dehkhoda, which has a tendency to go offline for days to weeks at a time (it is currently unavailable). -عُثمان (talk) 15:03, 16 August 2023 (UTC)

The formatter URL is https://vazhaju.tj/word/_/$1 (I don’t know why this is not in the property proposal template anymore.) -عُثمان (talk) 15:06, 16 August 2023 (UTC)

Discussion

Sayed Ganj Balochi Glossary ID

   Under discussion
Descriptionentry in the Qamosona online reproduction of Sayad Zahoor Shah Hashmi’s Balochi dictionary
Data typeExternal identifier
Example 1وٹ (L1122042) aea45e5cb1
Example 2انگوری (L1149859) aeaa5b5a
Example 3واب (L1150090) aea55861b0
Formatter URLhttps://qamosona.com/G3/index.php/term/,6f57b19b61545fad9b9ea5$1.xhtml

Motivation

Sayed Ganj is one of the most important monolingual dictionaries of Balochi, and it would be useful to link this online reproduction by Qamosona to Balochi lexemes via a property. In the entry URLs, the part of the string spanning from term/ to ...ea5 is specific to this dictionary; entry URLs for other dictionaries on Qamosona begin with a different unique string and can have separate properties. Note that this dictionary uses the English word “synonym” to mean a diacriticized form of the headword rather than a synonym, in case that is not clear. --عُثمان (talk) 14:37, 28 August 2023 (UTC)
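The split between the dictionary-specific part of the URL and the entry identifier can be sketched as follows. This is only an illustration in Python: the constant and function names are hypothetical, and the prefix string is simply read off the formatter URL in the proposal table.

```python
# Everything up to and including the dictionary-specific string,
# taken from the formatter URL in the proposal.
BASE = "https://qamosona.com/G3/index.php/term/,"
DICTIONARY_PREFIX = "6f57b19b61545fad9b9ea5"  # specific to this dictionary
SUFFIX = ".xhtml"


def entry_url(entry_id: str) -> str:
    """Build the entry URL from the identifier stored in the property."""
    return f"{BASE}{DICTIONARY_PREFIX}{entry_id}{SUFFIX}"


def entry_id_from_url(url: str) -> str:
    """Recover the identifier from an entry URL of this dictionary."""
    head = BASE + DICTIONARY_PREFIX
    if not (url.startswith(head) and url.endswith(SUFFIX)):
        raise ValueError("URL does not belong to this dictionary")
    return url[len(head):-len(SUFFIX)]


if __name__ == "__main__":
    url = entry_url("aea45e5cb1")  # Example 1 in the proposal
    print(url)
    print(entry_id_from_url(url))
```

A separate property for another Qamosona dictionary would, under this reading, differ only in the value of `DICTIONARY_PREFIX`.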

Discussion

Adoption variant

Constraints

Motivation

I have observed this relation between loanwords only in Japanese, but I'm sure it exists in other languages as well.

Example: cappuccino (L618738) has been adopted into Japanese with slight variations:

All the words I have found share exactly the same meaning, but I suppose they don't have to 🤷

Discussion

If you have non-Japanese examples, feel free to add them –Shisma (talk) 13:28, 22 September 2023 (UTC)


Wikibase form

rhythmic weight

   withdrawing
Descriptionsyllabification annotation for Urdu-based orthographies
Representsvazn (Q115484594)
Data typeString
Domainform
Allowed values(۰|۱|۲|\s)+
Example 1ਵੱਜਿਆ/وجّیا (L741125-F2) ISO 15919: (vaj.jiā); IPA: /ʋəd͡ʒˈjɑ/; Vazn: ۲۱ (21)
Example 2ਆਇ/آۓ (L677817-F6) ISO 15919: (ā.i); IPA: /ɑ.i/; Vazn: ۰۰ (00)
Example 3ISO 15919: (maṁg.vau.ṇa.gī.ā̃); IPA: /mɐŋɡˈʋɔ.ɳᵊˈɡɪ.ɑ̃/; Vazn: ۲۱۱۱۱ (21111)
Example 4ݙاڈھے (L740480-F3) ISO 15919: (ḏ̣ā.ḍhe); IPA: /ᶑɑ.ɖʰɛ/; Vazn: ۱۱ (11)
Sourcean explanation may be observed by clicking on the “i” next to Vazn on Rekhta Dictionary entries
Planned useadd to lexeme forms in Punjabi, Hindustani, and other languages used in Pakistan written with an Urdu-based orthography

Motivation

We have the existing hyphenation property for indicating the syllabification of lexeme forms, but this is poorly suited to languages which are typically written with vowels omitted and/or a cursive script which is not legible when hyphenated. The term "vazn" is used in the English version of the Rekhta online Hindustani dictionary, hence the name used here, but this property could also be labeled as "rhythmic weight" in English if that is preferable. Either way, I do intend the scope for this property to be for languages used in Pakistan commonly written with an Urdu-based orthography - without context for the ways in which syllable timing occurs in other languages which use cursive scripts, or for the syllabification conventions used elsewhere, it is hard to say how applicable this might be. If anyone does think this could be applied more broadly however, I am interested to hear.

To explain how it works, the idea is that each syllable is given a number 1 (۱) if it contains one consonant or cluster, or 2 (۲) if it contains two. Typically, short vowels between consonants are not indicated, so we can use these numbers to infer where syllable boundaries occur. The rules have to be bent a little bit for Punjabi as vowel-only syllables are very common compared to Urdu - 0 (۰) will represent no consonant syllables, and stressed semi-vowels will be treated as consonants. A transliterated example for the benefit of those not able to read those given in the proposal, taking a Punjabi word and pronunciation rules. Singular oblique form لفظ /ləf.ɐz/ : 21. Plural oblique form لفظاں /ləf.zɑ̃/ : 22. For the purposes of this format, the letter ں nun gunna is treated as a consonant. Its actual sound is conditioned by context and can be a nasal consonant or nasalised vowel, the latter of which still makes sense to treat as consonant-like for syllabification. عُثمان (talk) 17:26, 27 November 2022 (UTC)Reply[reply]
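For readers who want to experiment with the format, the following Python sketch validates a value against the "Allowed values" pattern from the proposal and converts it to Latin digits. It assumes that the digits in the pattern are the Extended Arabic-Indic digits U+06F0 to U+06F2; the function names are hypothetical.

```python
import re

# Assumption: the proposal's digits are U+06F0 (0), U+06F1 (1), U+06F2 (2).
VAZN_DIGITS = "\u06f0\u06f1\u06f2"
# Equivalent to the proposal's pattern: one or more of the digits or whitespace.
VAZN_PATTERN = re.compile("[" + VAZN_DIGITS + "\\s]+")
TO_LATIN = str.maketrans(VAZN_DIGITS, "012")


def vazn_to_latin(value: str) -> str:
    """Validate a vazn string and return it with Latin digits."""
    if VAZN_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid vazn value: {value!r}")
    return value.translate(TO_LATIN)


def syllable_count(value: str) -> int:
    """Each digit annotates one syllable, so count the digits."""
    return sum(1 for ch in vazn_to_latin(value) if ch in "012")


if __name__ == "__main__":
    example = "\u06f2\u06f1"  # Example 1 in the proposal, given there as 21
    print(vazn_to_latin(example), syllable_count(example))
```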

Discussion

 Comment I don't know anything about this topic, but I think "rhythmic weight" or some other English phrase would be a better label. "Vazn" can be used as an alias. -- The Erinaceous One 🦔 07:41, 30 November 2022 (UTC)

@The-erinaceous-one Fair enough; I have updated the title. I was thinking about it and there are ambiguities with وزن and other senses of the term; I will probably make this more specific in Punjabi as well. عُثمان (talk) 00:19, 17 December 2022 (UTC)
  •  Comment Withdrawn in part due to lack of interest, and because I think when I proposed this my understanding of syllabification in these languages was oversimplified. -عُثمان (talk) 17:18, 6 June 2023 (UTC)

usage context of form or sense

Currently, when more than one type of context label property (Q116547761) is used, they are usually considered to be in an "and" relation. When more than one value is used in the same property, the relationship is usually also "and", except for variety of lexeme, form or sense (P7481), location of sense usage (P6084) and field of usage (P9488), which can only be "or". This property will allow users to specify more complex context labels, such as a term that is obsolete in some places but common in another place. GZWDer (talk) 05:37, 1 February 2023 (UTC)

Discussion

form decomposition

   Under discussion
Descriptionform decomposition
Data typeItem
Domainform
Example 1nin9-zu/𒎐𒍪 (L643660-F2)nin9/𒎐 (L643660-F1), nin9-zu/𒎐𒍪 (L643660-F2)zu/𒍪 (L1116255-F1) (see also elaboration of Example 1)
Example 2lugal/𒈗 (L643713-F10) → lugal[king][-ak][-ø] N.GEN.ABS (see elaboration of Example 2)
Example 3in-pa3/𒅔𒅆𒊒 (L741253-F2) → i-n-pad[name][-ø] FIN.3-SG-H-A.V.3-SG-P (see elaboration of Example 3)
Planned useLinking forms to their compositions (Lexemes) and to attach the grammatical role of these compositions
See alsocombines lexemes (P5238) which allows to decompose a Lexeme into other lexemes which are parts of it

Motivation

Sumerian, as an agglutinative language, derives its grammatical features from compositions of mainly suffixes which are attached to a Lexeme.

In Wikidata, we can already model Lexemes of the individual suffixes and we can create QIDs for the grammatical features that we need to describe a Lexeme Form.

What we miss is a way to decompose a lexeme form to represent how the suffixes represent the grammatical features which are assigned to the form.

One might argue that this is a trivial matter, as only suffixes are added and they can be described sufficiently to represent a grammatical feature.

However, in Sumerian, the interpretation of a word is usually broken down into a description of the chain of suffixes, or even vowels in suffixes, as exemplified here:

http://oracc.museum.upenn.edu/etcsri/parsing/index.html


This interpretation of a Sumerian form can become quite complex and is worth modeling in Wikidata, in my opinion.

To do that, we would need a property that allows for representing the decomposition of a form, similarly to "combines lexemes". Then, we would be able to list the individual suffixes or parts of suffixes in a list e.g. with "series ordinal" to explain the decomposition of the lexeme form completely in RDF.

Usage for other languages

There can be many other potential application cases for this property in other languages such as:

  • Turkish, Japanese as agglutinative languages (even though maybe with a clearer representation of Suffixes), e.g. all forms of 因る/よる (L11476)
  • Arguably Indo-European languages, e.g. German gehst (L1026-F4) "gehst" could be separated into "geh" - STEM and "st" "second person singular present, indicative, active"
  • Akkadian Cuneiform will need similar patterns for verbs, but also includes verbal roots, maybe Arabic is then also applicable

Elaboration on examples

This section elaborates the aforementioned three examples for Sumerian.

Example 1: ninzu (nin9-zu/𒎐𒍪 (L643660-F2))
  • Form: nin9-zu / 𒎐𒍪
  • Grammatical interpretation: nin9=HEAD.zu=2-SG-POSS

This noun has a second person singular possessive case which is marked with the suffix zu/𒍪 (L1116255).


We would like to express that the suffix is marked with zu/𒍪 (L1116255) and that nin/𒊩𒌆 (L643660) is the HEAD and carries the meaning of the noun.

Representation in Wikidata
Example 2: lugal (lugal/𒈗 (L643713-F10))

Taken from: https://github.com/cdli-gh/CDLI-CoNLL-to-CoNLLU-Converter/blob/master/resources/P100065.conll

  • Genitive absolutive form of lugal (king)
  • r.1.4 lugal lugal[king][-ak][-ø] N.GEN.ABS


This example shows that the three forms (lugal/𒈗 (L643713-F7), lugal/𒈗 (L643713-F9), lugal/𒈗 (L643713-F10)) are written in the same way: "lugal". Therefore, additional elaboration on why these forms are written in this way is needed.


The genitive absolutive case of lugal/𒈗 (L643713), lugal/𒈗 (L643713-F10), consists of three components:

  1. the STEM (lugal)
  2. the particle (-ak)
  3. the non-written marker for the absolutive case (it is always left empty)

In lugal/𒈗 (L643713-F10), the (-ak) is also not written, hence it is indistinguishable from the forms lugal/𒈗 (L643713-F7), lugal/𒈗 (L643713-F9) without additional context.

Hence, we would like to break down the grammatical composition with reference to the written and non-written parts of the form.
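To make the intended breakdown concrete, here is a minimal Python sketch that splits the annotation string from the CoNLL line above into a stem, a gloss and an ordered list of suffixes. It is an illustration only: the reading of the bracket notation (first bracket without a leading hyphen is the gloss, hyphenated brackets are suffixes) is an assumption, and `decompose` and the dictionary keys are hypothetical names, not part of the proposal.

```python
import re


def decompose(annotation: str) -> dict:
    """Split e.g. 'lugal[king][-ak][-ø]' into a gloss and numbered segments."""
    match = re.fullmatch(r"([^\[\]]+)((?:\[[^\[\]]*\])+)", annotation)
    if match is None:
        raise ValueError(f"unexpected annotation format: {annotation!r}")
    stem, bracketed = match.groups()
    items = re.findall(r"\[([^\[\]]*)\]", bracketed)
    # Assumption: bracketed items starting with '-' are suffixes,
    # and the first remaining item is the English gloss.
    gloss = next((item for item in items if not item.startswith("-")), None)
    suffixes = [item for item in items if item.startswith("-")]
    segments = [stem] + suffixes
    return {
        "gloss": gloss,
        "parts": [
            {"series_ordinal": number, "segment": segment}
            for number, segment in enumerate(segments, start=1)
        ],
    }


if __name__ == "__main__":
    print(decompose("lugal[king][-ak][-ø]"))
```

Each entry in `parts` corresponds to one statement of the proposed property with a "series ordinal" qualifier; a flag for written versus non-written segments could be attached in the same way, but is not modelled in this sketch.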

Representation in Wikidata
Example 3: pad3 (in-pa3/𒅔𒅆𒊒 (L741253-F2))

Taken from: https://github.com/cdli-gh/CDLI-CoNLL-to-CoNLLU-Converter/blob/master/resources/P100065.conll

  • r.3.3 in-pa3 i-n-pad[name][-ø] FIN.3-SG-H-A.V.3-SG-P

The example for inpad shows a representation of the in-pa3/𒅔𒅆𒊒 (L741253-F2) with the sense "to name" in Sumerian.

The verb describes its directly associated subject and its associated direct object with different grammatical parameters.

  • Subject: The subject is described as "third person singular finite human agent", which manifests itself in the prefix "in"
  • Direct Object: The direct object is described as "third person singular" and manifests itself in the non-written suffix (L1117775-F1) .
Representation in Wikidata

 – The preceding unsigned comment was added by Situxx (talk • contribs) at 21:45, April 28, 2023‎ (UTC).

Discussion

  •  Support this property is a useful and essential addition for abstracting complex linguistic issues. KaCeBe (talk) 11:53, 17 May 2023 (UTC)
  •  Comment This seems useful but could be clarified a bit. Wouldn't object has role (P3831) be more appropriate as a qualifier given that the object is the form being linked to, and the subject is the form carrying the statement? Maybe even a different property (or a new one) would make more sense here. I will try to find some examples from other languages to see if that helps clarify anything --عُثمان (talk) 17:32, 6 June 2023 (UTC)
  • @Situxx: OK, having thought about it a bit more I have some more specific comments: I am not sure if it is necessary to use subject stated as to qualify the statements. An issue with that property is that it does not allow specifying a language code for the string, and for languages where this information could be presented in multiple ways it is unclear how to use. One approach I have been going with for "zero morphemes" which are common in a Punjabi is to use non-printing Unicode characters as representations of individual forms. ("Left to Right Mark" and "Arabic Letter Mark" for LTR and RTL representations respectively.) This allows attaching additional data to the zero representation, and indicating that a form is an empty string without using a qualifier. See ‎/؜ (L718607) for example where this verbal suffix is most often unrealized, but has different forms historically and in some dialects. Rather than using subject named as, I think it would make sense to separate forms like this and select the combining form which has the correct representative string(s).
Then, for example, it could be stated that ਉੱਠ/اُٹھّ (L1044310-F13) employs the suffix ‎/؜ (L718607-F1), while ਉੱਠੀ/اُٹھّی (L1044310-F14) employs ‍ੀ/‍ی (L718607-F3). It would not be clear in the second case how to represent both ਈ and ی using subject named as whereas using the linked form in both cases we can get a representation of the combining form for each language/script code -عُثمان (talk) 20:25, 7 June 2023 (UTC)Reply[reply]
Thank you very much for your remarks. I think using zero morphemes as forms for the suffixes we have in Sumerian that can be omitted is a great idea. I will adapt that and update my proposal accordingly. As for subject has role vs. object has role I think you are right. It should be object has role as the role of the suffix is described and not the grammatical feature of the subject (the form) which is already described in the grammatical feature description. I will adapt that as well and give you a heads up once I am done. Situxx (talk) 13:47, 9 June 2023 (UTC)

Wikibase sense

FrameNet Frame ID

   Under discussion
Descriptionidentifier of a concept in FrameNet
RepresentsFrameNet frame identifier (Q113847330)
Data typeExternal identifier
DomainWikibase item (Q29934200)
Allowed values[A-Z][a-zA-Z_]*
Example 1addiction (Q12029)Addiction
Example 2parturition (Q34581)Giving_birth
Example 3laying eggs (Q65129133)Giving_birth
Example 4noxa (Q50379880)Toxic_substance
Sourcehttps://framenet.icsi.berkeley.edu/fndrupal/frameIndex
Number of IDs in source1214
Expected completenessalways incomplete (Q21873886)
Formatter URLhttps://framenet2.icsi.berkeley.edu/fnReports/data/frame/$1.xml
Applicable "stated in"-valueFrameNet (Q1322093)
Single-value constraintyes
Distinct-values constraintno
Wikidata projectlexicographical data in Wikidata (Q51955175)

Motivation

FrameNet is an English-language reference in the field of frame semantics (Q2713996). This property would allow items in Wikidata (which may be linked to senses) to be mapped to the relevant frame in FrameNet. Dhx1 (talk) 16:05, 8 September 2022 (UTC)

Discussion

World Loanword Database word ID

   Under discussion
DescriptionID for a word in World Loanword Database (link to senses)
RepresentsWorld Loanword Database (Q104243588)
Data typeExternal identifier
Domainsense
Example 1L3341-S1 → 72212319483460164
Example 2L3341-S2 → 72212319483460164
Example 3L3334-S2 → 72212320343848176
Example 4L3334-S3 → 72212319922910265
Example 5L5725-S2 → 72212320295447116, 92181432331044465-1
Example 6L749443-S1 → 92181432215014552-1
Number of IDs in source57926 (target words only; the database also contains source words)
Formatter URLhttps://wold.clld.org/word/$1
See alsoWikidata:Property proposal/World Loanword Database meaning ID

GZWDer (talk) 15:52, 19 December 2022 (UTC)

Discussion

  •  Comment It's not really an external identifier for senses, but neither is it for lexemes - more for etymologies I guess but we don't really have a granularity level associated with that in Wikidata. I think I'd prefer it to be linked to lexemes rather than senses though. ArthurPSmith (talk) 17:19, 20 December 2022 (UTC)
    What is most similar to a "word form+meaning" combination is a sense, not a lexeme. GZWDer (talk) 10:24, 22 December 2022 (UTC)

semantic derivation

Explanation

For the benefit of those unfamiliar with Punjabi, the examples illustrate the following:

  1. The Punjabi adjective ਹਜ਼ਾਰਹਾ/ہزارہا (L1088987) is derived in form from the plural of the Persian noun هزار (L1088983). The primary sense of the Persian noun is the number “one thousand,” and the primary sense of the derived adjective is equivalent to “many thousands of,” for use in expressions with meanings like “thousands of years ago.” This may be considered a sense which elaborates on or extends from the original Persian sense.
  2. The Hindustani compound verb अंग लगना/انگ لگنا (L615866) may be considered a loan of Punjabi ਅੰਗ ਲਾਵਣ/انگ لاوݨ (L1093797) due to the fact that both its constituents are themselves borrowings from Punjabi. Sense 1 on the Hindustani lexeme may be described as “to be embraced,” or in a more literal translation of the definition given in Hindi/Urdu dictionaries, “to be held with the chest.” The corresponding sense 2 on the Punjabi lexeme is very close but not the same: “for bodies to have gone and touched each other.” This sense is more broadly applicable and is not necessarily restricted to chest hugs—there are other senses on similar compounds for expressing that in Punjabi. The Hindustani sense may be considered a contraction or narrowing of the original sense.
  3. The Hindustani compound verb अंग लगाना/انگ لگانا (L615868) may be similarly treated as a loan of Punjabi ਅੰਗ ਲੱਗਣ/انگ لگݨ (L1093794). However, these two lexemes have no senses which may be considered exact matches for translation between each other. The Hindustani sense used in the example has the meaning of “to take in marriage.” This meaning is a specific one which would be included in sense 4 on the Punjabi compound, which has a more generic meaning of “to bond with / form a relationship with.” This sense in Hindustani may be considered a contraction or narrowing of the original meaning.

Qualifier values

These values for mode of derivation (P5886) on senses are proposed. The items are based on a list provided in the book The Punjabi Language: Sources and Forms (Q115327155) (in Punjabi). This list is likely not exhaustive and could be expanded.

(A review of the English Wikipedia article section on this topic may be helpful for more context on where these values come from.)

Motivation

While it is currently possible to indicate some semantic derivation using derived from lexeme (P5191) on the lexeme level using qualifiers for the object and subject sense, this is not well suited for more complex situations. There may be multiple senses between lexemes which are unquestionably related to each other, but which cannot be considered translations or synonyms of one another. Linking these senses to each other avoids ambiguity about which senses correspond to each other between lexemes with multiple senses.

Any feedback and/or more examples would be very much welcome.

Note that “derived from sense” may also be an appropriate English label, but this name was taken by a page for a previous property proposal. -عُثمان (talk) 23:02, 17 April 2023 (UTC)

Discussion

  • Tend to  Support in general, as it seems like something whose scope could be broadened beyond the five modes of derivation given. Mahir256 (talk) 23:38, 17 April 2023 (UTC)

Klingon Word Wiki id

   Under discussion
Data typeExternal identifier
Domainsense
Allowed values^[A-Za-z-]+#[1-9]\d*$
Example 1'ul/ (L1174117-S1)-ul#1
Example 2DIS/ (L624951-S1)DIS#2
Example 3DIS/ (L624951-S3)DIS#1
Example 4Sop/ (L1001115-S1)Sop#1
Sourceklingon.wiki (English), klingon.wiki (German)
Number of IDs in source5119 senses (in 4788 lexemes)
Expected completenesseventually complete (Q21873974)
Formatter URLhttp://klingon.wiki/Word/$1 (English) http://klingon.wiki/Wort/$1 (German)
Applicable "stated in"-valueKlingon Word Wiki (Q122879303)
Distinct-values constraintyes

Motivation

klingon.wiki (Klingon Language Wiki (Q122886642)) is a wiki that documents the use of Klingon in books, films and TV shows. It also provides a dictionary (Klingon Word Wiki (Q122879303)) in English and German, of which the English version is the most complete.

Each page representing a lemma is namespaced with Word or Wort, depending on the target language.

Pages in this namespace represent lemmas, or groups of homograph lexemes. A lemma may contain different lexemes, like Word/HoH#1 (noun) and Word/HoH#2 (verb). Ultimately, each fragment (#1, #2) represents a sense, as seen in Word/DIS#1 (noun, sense #1) and Word/DIS#2 (noun, sense #2).

Every sense contains a section about the source of that particular sense. Most contain a usage example. All contain information about the lexical category and paradigm class of the associated lexeme.

To get a permalink to a sense, you can use the 🔗 link next to the headline.

While the lemmas are expected to be stable, the senses might shift to a different ordinal whenever a new sense emerges, which shouldn't happen often. –Shisma (talk) 09:07, 1 October 2023 (UTC)
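The lemma-plus-fragment structure described above can be sketched in Python as follows. The pattern here uses `[1-9]\d*` for the ordinal so that single-digit fragments such as `DIS#2` from the examples match; that is an assumption about the intended pattern. The function names and the language keys `"en"` and `"de"` are hypothetical.

```python
import re

# Assumed pattern: lemma, '#', then a sense ordinal starting at 1.
SENSE_ID = re.compile(r"^([A-Za-z-]+)#([1-9]\d*)$")

# Formatter URLs from the proposal, keyed by hypothetical language codes.
FORMATTERS = {
    "en": "http://klingon.wiki/Word/$1",
    "de": "http://klingon.wiki/Wort/$1",
}


def parse_sense_id(identifier: str) -> tuple:
    """Split an identifier into (lemma, sense ordinal)."""
    match = SENSE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a valid sense identifier: {identifier!r}")
    return match.group(1), int(match.group(2))


def sense_url(identifier: str, language: str = "en") -> str:
    """Build the permalink for a sense in the given dictionary language."""
    parse_sense_id(identifier)  # validate before building the URL
    return FORMATTERS[language].replace("$1", identifier)


if __name__ == "__main__":
    print(parse_sense_id("DIS#2"))
    print(sense_url("Sop#1", "de"))
```

Because the ordinal may change when a new sense is added, a consumer of this identifier would probably want to treat the lemma part as the stable component.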

Talk

Shisma (talk) 13:50, 30 September 2018 (UTC) Nick (talk) LydiaPintscher (talk) Luca Mauri (talk) 2020-06-27 EEMIV (talk) 16:59, 30 June 2020 (UTC) DGtal (talk) GHA (talk) 11:00, 1 November 2021 (UTC) Notified participants of WikiProject Star Trek. Shisma (talk) 10:38, 1 October 2023 (UTC)

Other

Jiten Online kanji ID

   Under discussion
Descriptionidentifier for a CJK character on Jiten Online
RepresentsJiten Online (Q115665803)
Data typeExternal identifier
DomainCJK character (Q53764732)
Allowed values\d+
Example 1(Q109757480)26151
Example 2(Q54901253)670
Example 3(Q54553170)168
Example 4(Q55414623)1773
Sourcehttps://kanji.jitenon.jp/
Number of IDs in source27693
Expected completenesseventually complete (Q21873974)
Implied notabilityWikidata property for an identifier that does not imply notability (Q62589320)
Formatter URLsee below – WRONG FORMAT, MISSING "$1"
See alsoGlyphWiki ID (P5467), Wikidata:Property proposal/Kanjipedia ID
Applicable "stated in"-valueJiten Online (Q115665803)
Single-value constraintyes
Distinct-values constraintyes
Wikidata projectWikiProject CJKV character (Q114615731), WikiProject Japan (Q8504015)

Motivation

Jiten Online is a Japanese online dictionary. It has a large amount of kanji data sometimes including hanzi or hanja.  – The preceding unsigned comment was added by Laftp0 (talk • contribs).

Discussion

formatter URL
  1. https://kanji.jitenon.jp/kanji/$1.html (normal rank)
     applies if regular expression matches: ^\d\d{0,1}$|^[1-4]\d{2}$|^500$
     series ordinal: 1
  2. https://kanji.jitenon.jp/kanji_/$1.html (normal rank)
     series ordinal: 2

Is this not allowed?

Syunsyunminmin (talk) 12:53, 25 November 2022 (UTC)

  •  Support --Okkn (talk) 05:50, 26 November 2022 (UTC)
  • I support Syunsyunminmin's formatter. The "kanji" prefix is really unnecessary, but it's a little unnatural for identifiers to start with /. Laftp0 (talk) 06:36, 26 November 2022 (UTC)
  •  Weak oppose I also find that site useful. However, the operating company is small and there are concerns about permanence. --Camillu87 (talk) 15:10, 27 November 2022 (UTC)
  • @Camillu87: As per consensus rules, the closing property creator is free to disregard claims that are not substantiated. Do you have a reference that supports the claim that the company might be in trouble? Infrastruktur (talk) 19:20, 28 November 2022 (UTC)
    The operating company was established in 2020 with capital of 3 million yen and 4 employees.([9],[10]) It is an obscure company and website: no useful mention of it can be found even by searching on Google etc., so it is doubtful whether it will still be in operation 5 or 10 years from now. I have no further information. --Camillu87 (talk) 13:44, 29 November 2022 (UTC)
    Looking at the Wayback Machine, we can confirm that the site has been operating since at least 2015, and there is no point in speculating, so please file a deletion request if and when a problem actually arises. Laftp0 (talk) 12:46, 3 December 2022 (UTC)
  • If there are no objections, I'll apply Syunsyunminmin's formatter above. Laftp0 (talk) 12:47, 3 December 2022 (UTC)
What Syunsyunminmin suggested won't work; Wikidata doesn't support applies if regular expression matches (P8460) (the property was created without any input from the Wikidata developers). - Nikki (talk) 18:55, 31 March 2023 (UTC)
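For anyone resolving these identifiers outside Wikidata, the two-formatter rule suggested above can be applied on the client side. The Python sketch below is only an illustration of that rule as stated in this discussion; the function name `kanji_url` is hypothetical, and whether the site's URL scheme really follows this rule for every identifier is not verified here.

```python
import re

# Condition from the suggested formatter: identifiers 0-99, 100-499 and 500.
LOW_RANGE = re.compile(r"^\d\d{0,1}$|^[1-4]\d{2}$|^500$")
ALL_DIGITS = re.compile(r"\d+")  # "Allowed values" from the proposal


def kanji_url(identifier: str) -> str:
    """Pick the URL directory according to the suggested regex condition."""
    if ALL_DIGITS.fullmatch(identifier) is None:
        raise ValueError(f"not a valid identifier: {identifier!r}")
    directory = "kanji" if LOW_RANGE.match(identifier) else "kanji_"
    return f"https://kanji.jitenon.jp/{directory}/{identifier}.html"


if __name__ == "__main__":
    for example in ("26151", "670", "168", "1773"):  # Examples 1 to 4
        print(example, kanji_url(example))
```

Under this rule, of the four example identifiers in the proposal only 168 resolves to the `kanji` directory; the others resolve to `kanji_`.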

World Loanword Database meaning ID

   Under discussion
DescriptionID for a meaning in World Loanword Database
RepresentsWorld Loanword Database (Q104243588)
Data typeExternal identifier
Domainitem
Example 1rabbit (Q9394)3-614
Example 2world (Q16502)1-1
Example 3murder (Q132821)21-42
Example 4petroleum (Q22656)23-195
Number of IDs in source1814
Formatter URLhttps://wold.clld.org/meaning/$1
See alsoWikidata:Property proposal/World Loanword Database word ID

GZWDer (talk) 15:30, 19 December 2022 (UTC)

Discussion

  •  Support this seems like a useful database to link to, and since these pages are multi-lingual, linking to items rather than lexemes is the right thing here. ArthurPSmith (talk) 17:22, 20 December 2022 (UTC)
  •  Oppose As far as I am aware, this database is no longer actively maintained beyond what is necessary to keep it online. As they do not take corrections or plan on expanding their database, the utility of linking to these is rather limited. -عُثمان (talk) 16:18, 27 December 2022 (UTC)
    Even if the database is not updated, it may still be a useful resource. GZWDer (talk) 17:26, 27 December 2022 (UTC)

Akkadian phonetic value

   Under discussion
DescriptionAkkadian phonetic value of cuneiform signs
Representscuneiform sign (Q23017336)
Data typeString
Domainitem, cuneiform sign (Q23017336)
Example 1𒀭 (Q87555087)Akkadian phonetic valuean
Example 2𒆠 (Q87555819)Akkadian phonetic valueki
Example 3𒆠 (Q87555819)Akkadian phonetic valueqi₂
Example 4𒆠 (Q87555819)Akkadian phonetic valueke
Example 5𒆠 (Q87555819)Akkadian phonetic valueqe₂
Planned useAssign Akkadian phonetic values to cuneiform signs
Expected completenesseventually complete (Q21873974)

Motivation

Adding Akkadian phonetic values to cuneiform signs. Sartma (talk) 18:51, 16 March 2023 (UTC)

Discussion

  •  Support --عُثمان (talk) 18:10, 24 April 2023 (UTC)
  •  Comment Is "item" really the correct domain? And can't another existing property be used, like IPA transcription (P898)? If not, is it really specific to Akkadian? Plus, "Akkadian phonetic value" only gives 8 results in Google; is it really the right name and appropriate data? @Situxx: who may tell us more. PS: is it the same thing as the sux-Latn representation and the value for transliteration (P2440) in L:L1000845#F1? Cheers, VIGNERON (talk) 11:39, 1 May 2023 (UTC)
    Hi there!
    @Item: Yes I think item is the correct domain, as signs themselves are currently represented using QIDs (the IDs of the Unicode code points) and from my point of view that is fine.
    There are of course numerous paleographic sign variants one could consider (which could get their own QIDs in the future), but the phonetic value of the sign usually does not change as far as I know.
    We have started to add more triples to e.g. https://www.wikidata.org/wiki/Q87555676 (cuneiform sign KA) for example (what it depicts, dictionary references a.s.o.) and would continue to do that for signs.
    @PropertyProposal:
    My question would be: What do we do with Sumerian, Hittite, Elamite and so on? Should we not rather define a property "phonetic value" and add the languages in which they occur with a qualifier?
    "The" reference list about that I know is Nuolenna: https://github.com/tosaja/Nuolenna/blob/master/sign_list.txt which contains about 11000 phonetic sign values, but to my knowledge irrespective of language.
    So I would think we would want a solution that fits all languages.
    "is it the same thing as the sux-Latn representation and the value for transliteration (P2440) in L:L1000845#F1?"
    ---> The way it is written in the proposal, it seems to follow the ORACC transliteration style. However, a phonetic value is just the value of a single syllable; in this example, "zi" would be one phonetic component.
    @IPATranscription:
    I would not say that this is appropriate, because as far as I know IPA is not the basis of the transliterations we use for cuneiform.
    @Transliteration:
    There are many competing notations for transliterating cuneiform texts, two of which I use for Sumerian (the CDLI and ORACC formats).
    For phonetic values, these differ mainly in the following points:
    - Subscript numbers vs. diacritics: some transliterations use diacritics (as in French) instead of subscript 2 and 3; some do not
    - Usage of sz vs. š, and some other similar character substitutions
    In Sumerian I currently use the CDLI notation for the sux-Latn representation and add transliterations to the forms.
    As I am in contact with a group of digital Assyriologists, we are discussing whether this is the right transliteration; if we decide otherwise, we could apply some rules to convert from CDLI to other transliteration styles for sux-Latn.
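The kind of rule-based conversion mentioned here could look roughly like the following. This is only a minimal sketch covering the two differences listed above (sz vs. š, and index digits vs. subscripts or accents); the function name `convert_value` and the `use_accents` switch are hypothetical, and the mapping of index 2 to an acute and index 3 to a grave accent on the first vowel is assumed from common Assyriological practice rather than taken from any specific CDLI or ORACC specification.

```python
import re
import unicodedata

# Plain digits -> Unicode subscript digits (qi2 -> qi₂).
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
# Assumed convention: index 2 = acute accent, index 3 = grave accent.
ACCENTS = {"2": "\u0301", "3": "\u0300"}


def convert_value(value: str, use_accents: bool = False) -> str:
    """Convert one ASCII-style sign value (e.g. 'sza3') to a Unicode style."""
    match = re.fullmatch(r"([^\d]+)(\d+)?", value)
    if match is None:
        return value  # leave anything unexpected untouched
    base, index = match.group(1), match.group(2)
    base = base.replace("sz", "š").replace("SZ", "Š")
    if index is None:
        return base
    if use_accents and index in ACCENTS:
        # Put the accent on the first vowel of the value.
        for pos, char in enumerate(base):
            if char.lower() in "aeiu":
                accented = unicodedata.normalize("NFC", char + ACCENTS[index])
                return base[:pos] + accented + base[pos + 1:]
    # Default (and fallback): keep the index as a subscript number.
    return base + index.translate(SUBSCRIPTS)


if __name__ == "__main__":
    for raw in ["qi2", "ke", "qe2", "sza3"]:
        print(raw, convert_value(raw), convert_value(raw, use_accents=True))
```

A real converter would need the full character tables of both styles; the sketch only illustrates that the conversion is mechanical once one notation is chosen as the stored form.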
    @Opinion about the property proposal:
    We have discussed that it would be beneficial to have the list of phonetic values available directly when querying a cuneiform sign in Wikidata, so in my opinion this would be a good addition.
    An alternative would be to query for lexemes that contain exactly the cuneiform sign we are interested in and get its phonetic values from the transliterations. But that seems far more cumbersome than adding a property like this for all cuneiform languages. Situxx (talk) 15:03, 6 May 2023 (UTC)
  •  Comment I agree with Situxx that it would be best to define a more general 'phonetic value' property and qualify it with the language at hand. This would then be useful not only for Akkadian, but for many other languages which use glyphs and ideograms in their writing system, such as Sumerian, Hittite, Ugaritic, Eblaite, Elamite, Old Persian, etc. I would also recommend qualifying such statements with the time period(s) when known, since these phonetic values do change depending on the time period, sometimes even collapsing different signs into the same reading in the first millennium BCE. Admndrsn (talk) 15:32, 6 May 2023 (UTC)
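To make the suggested modelling concrete, here is a minimal sketch of a general "phonetic value" statement qualified by language and, where known, by time period. The class name `PhoneticValue`, its field names and the helper `values_for` are hypothetical illustrations and not an existing Wikidata or Wikibase API; the three sample statements are the Akkadian examples from the proposal, and no period is given for them because the proposal does not state one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhoneticValue:
    """One 'phonetic value' statement on a sign item (hypothetical model)."""
    sign_qid: str                 # item of the cuneiform sign
    value: str                    # the phonetic value itself
    language: str                 # qualifier: language in which the value occurs
    period: Optional[str] = None  # qualifier: time period, if known


# The Akkadian examples from the proposal for 𒆠 (Q87555819).
STATEMENTS = [
    PhoneticValue("Q87555819", "qi₂", "Akkadian"),
    PhoneticValue("Q87555819", "ke", "Akkadian"),
    PhoneticValue("Q87555819", "qe₂", "Akkadian"),
]


def values_for(sign_qid: str, language: str,
               statements: List[PhoneticValue]) -> List[str]:
    """Return the values of one sign in one language, in statement order."""
    return [s.value for s in statements
            if s.sign_qid == sign_qid and s.language == language]


if __name__ == "__main__":
    print(values_for("Q87555819", "Akkadian", STATEMENTS))
```

With this shape, values for Sumerian, Hittite or Elamite would be further statements on the same sign with a different language qualifier, rather than requiring one property per language.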