Wikidata:Property proposal/Lexemes
See also
- Wikidata:Property proposal/Pending – properties which have been approved but which are on hold waiting for the appropriate datatype to be made available
- Wikidata:Properties for deletion – proposals for the deletion of properties
- Wikidata:External identifiers – statements to add when creating properties for external IDs
- Wikidata:Lexicographical data – information and discussion about lexicographic data on Wikidata
This page is for the proposal of new properties.
Before proposing a property
- Check whether the property already exists.
- Check whether the property has already been proposed.
- Check whether you can give it a label and definition similar to an existing Wikipedia infobox parameter, or whether it can be matched to an infobox to or from which data can be transferred automatically.
- Select the right datatype for the property.
- Read Wikidata:Creating a property proposal for guidelines you should follow when proposing your property.
- Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.
Creating the property
- Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
- Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
- See property creation policy.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2023/10.
Wikibase lexeme
character in this lexeme
Description | character(s) this lexeme consists of |
---|---|
Represents | cuneiform sign (Q23017336) |
Data type | Item |
Domain | lexeme, form |
Example 1 | ga/𒂷 (L726974) → 𒂷 (Q87555355) |
Example 2 | dumu/𒌉 (L643788) → 𒌉 (Q87556519) |
Example 3 | dingir/𒀭 (L724542) → 𒀭 (Q87555087) |
Planned use | Linking lexemes to character representations |
See also | Han character in this lexeme (P5425) which links Han Chinese characters in Japanese and Chinese lexemes to Unicode, https://www.wikidata.org/wiki/Wikidata:Property_proposal/Cuneiform_character_in_this_lexeme for the previous discussion on the property "cuneiform character in this lexeme" |
Motivation
Currently, we lack a property in Wikidata to link lexeme representations to the QIDs of characters of a given script. The examples above show how to link cuneiform lexemes to the character QIDs that represent their Unicode code points, but the property can be used to link any lexeme to relevant parts of the script it uses. The property Han character in this lexeme (P5425) already allows linking Han characters in Chinese, Japanese, and Vietnamese lexemes to their respective representations in Wikidata. This proposal aims either to generalize Han character in this lexeme (P5425) as "character in this lexeme" or to make the new property a superproperty of Han character in this lexeme (P5425).
See also the discussion of https://www.wikidata.org/wiki/Wikidata:Property_proposal/Cuneiform_character_in_this_lexeme which led to the creation of this property proposal instead.
Support seems fine to me, either as a superproperty or replacement. ArthurPSmith (talk) 20:06, 4 May 2023 (UTC)
Support this is an excellent idea, since the more general use of 'character' will allow for a number of languages with the same issue to progress in morpho-graphemic annotation, including: Sumerian, Akkadian, Hittite, Hurrian, Ugaritic, Elamite, Old Persian, just to name a few. Admndrsn (talk) 9:11, 8 May 2023 (EST)
Comment Would this also mean that it could be used to say rød (L2310) character in this lexeme Ø (Q28827) and D (Q9884)? — Finn Årup Nielsen (fnielsen) (talk) 18:45, 8 May 2023 (UTC)
- Yes, you could also use it for the Latin alphabet, as in the example you proposed, even though I see people using this more for languages written in Chinese characters or cuneiform, where the individual characters often express their own meaning.
- But why not? You could query all lexemes with Ø (Q28827) if that is interesting to do. Situxx (talk) 13:35, 9 May 2023 (UTC)
- You already can do that: See this query for example. - Nikki (talk) 20:03, 24 June 2023 (UTC)
Support will help and advance digital cuneiform studies Enki75 (talk) 11 May 2023
Strong oppose Having a generic property like this is a really bad idea. Linking characters in lexemes to the corresponding items can easily be done automatically, so there should be a really good reason to add links manually instead. For Han character in this lexeme (P5425), that is because items for Han characters have useful lexicographical data on them which would otherwise end up duplicated as lexemes. My opposition to the previous proposal was because items for Cuneiform characters do not have useful lexicographical data on them, and that is still the case, looking at the items in the examples.
- If this is added, people will surely start mass-adding it for every lexeme eventually. We are already having problems with the query service because of the amount of data, and adding millions more statements linking every character in a lexeme would only cause more problems for us. - Nikki (talk) 20:03, 24 June 2023 (UTC)
- Here's a simple script I just made to list the characters in a lexeme automatically: User:Nikki/LexemeLinkCharacters.js. - Nikki (talk) 21:27, 24 June 2023 (UTC)
- Hi!
- Thanks for your comment.
- In the previous proposal, you wrote about the following properties as examples which would constitute the lexicographical data you are missing in the cuneiform examples:
- I quote: "(e.g. stroke count (P5205), grade of kanji (P5277), radical (P5280), ideographic description sequences (P5753))"
- All of this information can be added, we are just lacking properties for that as well, hence you only see information which can be added right now with the properties we have, which are the following:
- - stroke count: Is currently proposed here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gottstein_code
- - radicals are currently represented using "has parts" relations (at least for Unicode signs which allow so) (see here for an example https://www.wikidata.org/wiki/Q87555001)
- - depicts relations describe what the character depicts (which is often different from the sense of the Lexemes using the character)
- - dictionary references to signlists
- You can find an example in this web application which would also illustrate the main use case I have in mind: https://situx.github.io/paleordia/c/?q=Q87554995&qLabel=%F0%92%80%80
- The script you posted is certainly useful for linking from a lexeme to its characters, but my use case is actually the opposite.
- I would for example like to know which Lexemes contain a cuneiform sign. Unless I have missed a better solution, the SPARQL query to achieve this would need a set of languages written in the cuneiform script and check the lemmas (maybe also forms) of all of these languages with regex matching.
- It is also on the homepage and runs only over Sumerian, but is already quite slow (https://situx.github.io/paleordia/c/?q=Q87554995&qLabel=%F0%92%80%80).
- If we had a property like the one proposed here I think it would be easier to query for the lexemes which fit a cuneiform sign or whatever other sign in other languages for that matter.
- Finally, there is the issue of paleographic sign variants:
- There might be certain Lexemes which are only written with sign variants of a specific shape.
- You can look at the sign AN here https://situx.github.io/paleordia/c/?q=Q87555087&qLabel=%F0%92%80%AD
- which looks different depending on the time period.
- We currently cannot express that either, but we have the information and will gradually add it to Wikidata as a prototype of a digital paleography. Situxx (talk) 11:38, 26 June 2023 (UTC)
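As an illustration of the two directions discussed in this thread (listing the characters of a lexeme, and finding the lexemes that contain a given character), here is a minimal offline sketch in Python. It is neither the proposed property nor Nikki's script: the `LEMMAS` table is hand-copied from the three examples in the proposal, and the function names `characters` and `build_index` are made up for this example. A real run would need lemmas from a lexeme dump or the API, which is not shown here.

```python
from collections import defaultdict

# Lemmas copied by hand from the examples in this proposal (assumption:
# only the cuneiform representation of each lemma is of interest).
LEMMAS = {
    "L726974": "𒂷",
    "L643788": "𒌉",
    "L724542": "𒀭",
}


def characters(lemma: str) -> list[str]:
    """Return the distinct non-space characters of a lemma, in order of first use."""
    return list(dict.fromkeys(ch for ch in lemma if not ch.isspace()))


def build_index(lemmas: dict[str, str]) -> dict[str, set[str]]:
    """Map each character to the set of lexeme IDs whose lemma contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for lexeme_id, lemma in lemmas.items():
        for ch in characters(lemma):
            index[ch].add(lexeme_id)
    return dict(index)


index = build_index(LEMMAS)
for ch, lexeme_ids in index.items():
    # Print the code point next to the character so the sign can be identified.
    print(f"U+{ord(ch):04X} {ch}: {sorted(lexeme_ids)}")
```

Iterating over a Python string yields whole code points, so cuneiform signs outside the Basic Multilingual Plane need no special handling. Whether such an index should live in Wikidata statements or be computed on demand is exactly the point under discussion above.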
use with article
Description | the article that the term (usually a proper noun) is normally used with, or novalue if no article should be used (by default, any article may be used if appropriate) |
---|---|
Data type | Lexeme |
Domain | lexeme, or sense or form if it applies only to some sense(s) or form(s) |
Example 1 | United States (L43377) → the (L2768) |
Example 2 | Scotland (L254526) → novalue |
Example 3 | Louvre (L749827) → le (L2770) |
Example 4 | Mississippi (L447503-S2) → the (L2768) |
Example 5 | Palouse (L749829-S2) → the (L2768) |
Example 6 | Carolinas (L254532-F2) → the (L2768) |
GZWDer (talk) 20:44, 20 December 2022 (UTC)
Discussion
Support. I added a French example. UWashPrincipalCataloger (talk) 21:51, 20 December 2022 (UTC)
Support -wd-Ryan (Talk/Edits) 22:01, 20 December 2022 (UTC)
Oppose We can already use requires grammatical feature (P5713) definite article (Q2865743) to say a definite article is required. - Nikki (talk) 13:41, 21 December 2022 (UTC)
- But Nikki, is there still a need to specify which definite article is used? UWashPrincipalCataloger (talk) 00:04, 24 December 2022 (UTC)
- @AdamSeattle: Do you have any examples where the article can't be inferred from other information (e.g. language, grammatical gender where applicable)? - Nikki (talk) 10:59, 20 May 2023 (UTC)
- @Nikki: I'm trying to think of an example in English where it wouldn't be "the". Perhaps something like Swede (L34485), where you can have "a Swede", or "the Swedes", or Swedes without an article (Swedes are often fluent in English). AdamSeattle (talk) 22:44, 20 May 2023 (UTC)
- Question Are there any examples from languages where there is more than one definite article available? -عُثمان (talk) 17:40, 7 March 2023 (UTC)
Comment leaning towards Oppose for the moment. @GZWDer: I don't really understand the proposal right now. For the English examples, it's a boolean (L2768 or novalue), the need for a value-centric property is not clear, and another property can be used. For the French example, it's a bit strange (there was no gender on Louvre (L749827), and le (L2770) covers both masculine and feminine); again, I don't see the need for this property. At the very least, more examples and explanations are needed. Cheers, VIGNERON (talk) 09:45, 29 April 2023 (UTC)
Dicionário Aberto ID
Description | identifier for entries on Dicionário Aberto |
---|---|
Data type | External identifier |
Domain | Portuguese lexemes |
Allowed values | [a-záàãâéêíóõôúç0-9\s'-.]+ |
Example 1 | atividade/actividade (L500628) → atividade |
Example 2 | vender (L52324) → vender |
Example 3 | para (L618867) → para |
Planned use | add to Portuguese lexemes or their forms |
Formatter URL | https://dicionario-aberto.net/search/$1 |
Motivation
Enaldodiscussão 23:32, 11 January 2023 (UTC)
Discussion
Online Etymology Dictionary ID
Description | ID of an entry in the Online Etymology Dictionary (with URI fragment) |
---|---|
Represents | URI fragment |
Data type | External identifier |
Domain | lexeme in English, Proto-Indo-European |
Allowed values | .*#etymonline_v_\d+ |
Example 1 | bow (L14698) → bow#etymonline_v_15679 |
Example 2 | bow (L184508) → bow#etymonline_v_15680 |
Example 3 | *ḱwṓ (L184995) → *kwon-#etymonline_v_52685 |
Formatter URL | https://www.etymonline.com/word/$1 |
GZWDer (talk) 17:58, 13 January 2023 (UTC)
Discussion
Support --Tinker Bell ★ ♥ 20:25, 11 May 2023 (UTC)
Support The word arboricide is absent from our current English dictionary properties (Oxford is paywalled so I don't count it). Online Etymology has it arboricide -عُثمان (talk) 17:21, 6 June 2023 (UTC)
Dictionary of the Russian Language (Ozhegov) ID
Dictionary of the Russian Language (Ozhegov) ID (vedu.ru)
Description | ID of a word in Dictionary of the Russian Language (Ozhegov) provided by www.vedu.ru |
---|---|
Represents | Dictionary of the Russian Language (Q4423784) |
Data type | External identifier |
Domain | lexeme in Russian |
Example 1 | поиск (L147604) → 24340 |
Example 2 | судно (L717298) → 34272 |
Example 3 | судно (L184116) → 34271 |
Formatter URL | https://www.vedu.ru/expdic/$1/ |
See also | Wikidata:Property proposal/Great Encyclopedic Dictionary ID |
Dictionary of the Russian Language (Ozhegov) ID (slovarozhegova.ru)
Description | ID of a word in Dictionary of the Russian Language (Ozhegov) provided by slovarozhegova.ru |
---|---|
Represents | Dictionary of the Russian Language (Q4423784) |
Data type | External identifier |
Domain | lexeme in Russian |
Example 1 | поиск (L147604) → 22117 |
Example 2 | судно (L717298) → 30962 |
Example 3 | судно (L184116) → 30961 |
Formatter URL | https://slovarozhegova.ru/word.php?wordid=$1 |
Note: It seems that the contents provided by the two websites are the same, but the IDs are different. GZWDer (talk) 12:02, 14 January 2023 (UTC)
Discussion
Support,
Notified participants of WikiProject Russia —MasterRus21thCentury (talk) 07:01, 17 January 2023 (UTC)
Comment Are you sure that this site is not a copyright violation? AndyVolykhov (talk) 07:53, 17 January 2023 (UTC)
- I strongly object. Both online versions of this dictionary are illegal. It is absolutely unclear who owns the websites. It is not even known which edition of the dictionary is published. Андрей Романенко (talk) 14:35, 17 January 2023 (UTC)
Oppose Agree. AndyVolykhov (talk) 12:09, 19 January 2023 (UTC)
- I don't speak Russian. I couldn't find any ownership information on the first site. For the second site, it appears to be based on an OCR copy of a dictionary written by someone who died in 1964. https://ru.wikipedia.org/?oldid=10326782 . Someone who knows Russian law should have a look to see whether this has fallen into the public domain. I couldn't find ownership information for the second site either. Infrastruktur (talk) 23:16, 12 March 2023 (UTC)
Aragonario ID (6th version)
Description | identifier for an Aragonese or Spanish lexeme in the Aragonese-Spanish online dictionary (version since January 2023) |
---|---|
Data type | External identifier |
Domain | Aragonese and Spanish lexemes |
Allowed values | [1-9][0-9]{6} |
Example 1 | augua (L8226) → 1074499 |
Example 2 | sangonera (L307650) → 1108110 |
Example 3 | abanderato (L647971) → 1070015 |
Source | https://aragonario.aragon.es/ |
Planned use | add to existing Aragonese and Spanish lexemes |
Number of IDs in source | between 75,137 (45,112 + 30,025) and 82,145 (1114581 - 1032437 + 1) |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | https://aragonario.aragon.es/words/$1/ |
See also | Aragonario ID (5th version) (P11071) |
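The allowed-values pattern and formatter URL in the table above can be exercised with a short Python sketch. It assumes that the pattern must match the whole value and that `$1` in the formatter URL is replaced by the ID; the function names `is_valid_id` and `entry_url` are made up for this example, and nothing here checks that the resulting page exists.

```python
import re

# Pattern and formatter URL copied from the proposal table.
ALLOWED_VALUES = re.compile(r"[1-9][0-9]{6}")
FORMATTER_URL = "https://aragonario.aragon.es/words/$1/"


def is_valid_id(value: str) -> bool:
    """True if the whole value is seven digits with a non-zero first digit."""
    return ALLOWED_VALUES.fullmatch(value) is not None


def entry_url(value: str) -> str:
    """Build the entry URL for a valid ID, or raise ValueError."""
    if not is_valid_id(value):
        raise ValueError(f"not a valid Aragonario ID: {value!r}")
    return FORMATTER_URL.replace("$1", value)


# The three example IDs from the proposal.
for example in ("1074499", "1108110", "1070015"):
    print(example, entry_url(example))
```

Because the check is purely syntactic, an ID from the previous version of the site could still pass it if it happens to have seven digits; the pattern cannot tell the two schemes apart.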
Motivation
It appears that this month, a new version of the Aragonario was launched with several thousand more entries compared to the original version, leading to the invalidation of all IDs from the previous version. This proposal covers IDs from the new version, in line with there being separate properties for new and former schemes.
(Those former IDs are not all lost, however, as the proposal for the property covering the previous version has a link to a spreadsheet with a complete list of all of those IDs I compiled a few months ago--they should continue to be added for posterity. I have now begun compiling a list of the newer IDs, and it is hoped that reconciling information between the two versions—which I intend to do myself—will be made easier as a result.) Mahir256 (talk) 17:57, 20 January 2023 (UTC)
Discussion
- @Aradgl, Uesca: and @Nikki, عُثمان, Bovlb: from the previous proposal. Mahir256 (talk) 17:57, 20 January 2023 (UTC)
- I'm afraid that, as expected, the Aragonario has changed its routes and the Aragonario ID no longer works.
- It is not useful to use a web ID that can change. If someone wants to look up that lexeme, they can do so by searching the Aragonario for the lemma itself, or by using another source of information (paper dictionaries, for example).
- I do not have any kind of control over the Aragonario, nor can we demand anything of its maintainers.
- @Mahir256 Uesca (talk) 18:15, 20 January 2023 (UTC)
- @Uesca: There is still merit to retaining the old identifiers; many of them are still accessible through the Internet Archive, and its function of serving as an identifier has not really diminished (see, in addition to the 'former scheme' properties, ones like ISOCAT ID (P2263), Google+ ID (P2847), and other properties for discontinued websites). As for the issue of changes in IDs, these too can be reflected in the data; if their ability to change made them not useful, then properties for social media accounts--whose IDs can frequently be changed by their users--would also not be useful. Mahir256 (talk) 18:23, 20 January 2023 (UTC)
- Do they have any policy about identifiers? Do we have any contacts on their team that can advise us? I'd love to be able to map this stuff, but it's not very satisfactory to support a property for an identifier that can be invalidated on a whim. Bovlb (talk) 18:57, 20 January 2023 (UTC)
- @Bovlb: I sent an email to the address posted on the 'Contacto' page of that site asking about the stability of their identifiers. Mahir256 (talk) 20:13, 20 January 2023 (UTC)
- @Uesca, Bovlb: After resending the message once, I eventually got a reply. Mahir256 (talk) 17:33, 3 February 2023 (UTC)
- Hmm. Thanks for following up.
- When they say "right now we are creating permanent and stable links", does that mean that the links they currently create are permanent and stable, or does it mean that they're currently designing yet another version of identifiers, this time to be permanent and stable? Bovlb (talk) 18:38, 3 February 2023 (UTC)
- @Bovlb: This is what they had to say about that. Mahir256 (talk) 14:10, 8 February 2023 (UTC)
- @Mahir256 Hmm. From that response, it doesn't sound like we should proceed with this property at this time. Bovlb (talk) 15:47, 8 February 2023 (UTC)
- @Bovlb: I would agree, but it appears @Uesca: has begun adding these new IDs (accidentally?) using the existing property intended for the old IDs (e.g. fuyita (L1016834) has an ID which would not have worked prior to January 2023); if they are to be shifted, it would need to be to this proposed new property (lest they be removed completely or shifted to described at URL (P973)). Mahir256 (talk) 18:41, 12 February 2023 (UTC)
Support Per above, at the very least these can be archived. --عُثمان (talk) 20:21, 20 January 2023 (UTC)
Dicionário inFormal ID
Description | identifier for an entry on Dicionário inFormal |
---|---|
Represents | Dicionário inFormal (Q116273055) |
Data type | External identifier |
Domain | dictionary entry (Q1580166) |
Example 1 | menino (L669443) → menino |
Example 2 | taxista (L447872) → taxista |
Example 3 | chato (L671156) → chato |
Example 4 | merda (L448068) → merda |
Planned use | Portuguese lexemes or forms |
Expected completeness | always incomplete (Q21873886) |
Implied notability | Wikidata property for an identifier that does not imply notability (Q62589320) |
Formatter URL | https://www.dicionarioinformal.com.br/$1/ |
See also | Infopédia entry (P11485) |
Wikidata project | WikiProject Brazil (Q11134020), WikiProject Portugal (Q11142608) |
Motivation
Crowdsourced dictionary with relevant lexical information that sometimes isn't available in traditional dictionaries, such as slang and gender-neutral language. –Guttitto (talk) 04:13, 21 January 2023 (UTC)
Discussion
Oppose. Unfortunately, it is full of redundant, meaningless or extremely offensive entries. I don't see how this can be useful for the lexemes. Enaldodiscussão 22:50, 25 January 2023 (UTC)
- See how many times it is linked on ptwiki or even enwikt; it's not heavily used, but I think it is notable enough to deserve an external identifier. I see it as a similar situation to Urban Dictionary: both are crowdsourced, and of course not every entry is good. Although the property proposal for Urban Dictionary wasn't done, I think it should be reproposed. The usefulness lies precisely in those entries which aren't available in traditional dictionaries; a term being offensive doesn't mean that it isn't a lexical unit, otherwise we wouldn't have merda (L448068) as a lexeme. –guttitto(talk · contribs) 03:22, 26 January 2023 (UTC)
Comment I'm not sure it's really much of an ID; it just returns definitions for whatever word you put in the web address. For example, I can try to look up the definition of "Wikidata" (which comes back empty). So I'm not sure about the need for this property when the lemma itself is the ID. --Yirba (talk) 13:14, 4 February 2023 (UTC)
- That makes sense, but we do have many other external IDs for dictionaries that are similar, e.g. Collins Online English Dictionary entry (P11230), Dicionário Priberam ID (P11526), The Britannica Dictionary entry (P11263) and Merriam-Webster online dictionary entry (P11130). –guttitto(talk · contribs) 16:32, 4 February 2023 (UTC)
gf-wordnet-lexeme
Data type | External identifier |
---|---|
Example 1 | apple (Q89) → https://cloud.grammaticalframework.org/wordnet#apple_1_N |
Example 2 | points of the compass (Q11114344) → https://cloud.grammaticalframework.org/wordnet#southwest_1_N |
Example 3 | household (Q259059) → https://cloud.grammaticalframework.org/wordnet#household_N |
Source | https://cloud.grammaticalframework.org/wordnet |
Number of IDs in source | about 30 000, more in the future |
Motivation
GF WordNet is a variant of WordNet which brings together lexicons for several languages. It is also compatible with the grammar libraries in GF which makes it possible to use the lexicon for parsing and natural language generation.
Unlike the traditional WordNet, a synset is a set of abstract word identifiers. Each identifier is then mapped to a word in a concrete language. The abstract words make it possible to preserve the translation equivalents, which is lost if only synsets are linked across languages.
Wikidata already links entities to WordNet. In fact half of the links are exported from GF WordNet. Linking to synsets, however, is not always enough to identify lexical items unambiguously. For example household_N is best linked to Q259059 but if the entire synset is linked then it also contains family_1_N, home_8_N and house_10_N.
Since we are now experimenting with using GF for natural language generation, it is useful if we preserve more explicit links. A prototype for the data already exists. – The preceding unsigned comment was added by Kr.angelov (talk • contribs) at 10:44, 8 February 2023 (UTC).
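The ambiguity described in the motivation can be shown with a small Python sketch. It assumes, based only on the examples given here (apple_1_N, southwest_1_N, household_N), that an abstract identifier has the shape lemma, optional sense number, category, separated by underscores; that reading, the `AbstractWord` type, the `parse_identifier` function and the two link tables are all hypothetical and not part of GF WordNet itself.

```python
from typing import NamedTuple, Optional


class AbstractWord(NamedTuple):
    lemma: str
    sense: Optional[int]  # None when the identifier carries no sense number
    category: str


def parse_identifier(identifier: str) -> AbstractWord:
    """Split an identifier such as 'apple_1_N' or 'household_N' (assumed shape)."""
    parts = identifier.split("_")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a recognised identifier: {identifier!r}")
    category = parts[-1]
    if len(parts) >= 3 and parts[-2].isdigit():
        return AbstractWord("_".join(parts[:-2]), int(parts[-2]), category)
    return AbstractWord("_".join(parts[:-1]), None, category)


# Linking the whole synset to Q259059 yields four candidate words ...
SYNSET_LINK = {"Q259059": ["household_N", "family_1_N", "home_8_N", "house_10_N"]}
# ... while linking the single abstract word yields exactly one.
WORD_LINK = {"Q259059": "household_N"}

candidates = [parse_identifier(i).lemma for i in SYNSET_LINK["Q259059"]]
exact = parse_identifier(WORD_LINK["Q259059"]).lemma
print(candidates)
print(exact)
```

With only the synset-level link, a generator has to pick among the four candidates; the word-level link removes that choice.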
SignPuddle page id
Description | the page id of this lexeme in the SignPuddle sign language database |
---|---|
Data type | External identifier |
Allowed values | \d+&sid=\d+ |
Example 1 | Lexeme:L1014011 → 53&sid=1295 |
Example 2 | Lexeme:L1013911 → 4&sid=2643 |
Example 3 | Lexeme:L1014152 → 53&sid=1645 |
Source | |
Formatter URL | https://www.signbank.org/signpuddle2.0/canvas.php?ui=1&sgn=$1 |
Motivation
SignPuddle is an open platform (CC BY-SA) where users can add signs in various sign languages. Each entry has a Puddle Page id. It holds each sign in FSW and SWU as well as a Sutton SignWriting representation.
The sgn parameter stands for the collection (e.g. a language); the sid parameter is the id of a particular sign within that collection.
You can find these ids as follows:
- Go to signbank.org/signpuddle.
- Choose a language (in the blue area).
- Choose Dictionary or an equivalent link in the selected language.
- Navigate to Search by Word or an equivalent link in the selected language.
- Use the search.
- At the bottom of each result there is a link like Puddle Page 123456.
When you click that link, a page like signbank.org/signpuddle2.0/canvas.php?ui=1&sgn=4&sid=2643 should open.
Alternatively, you can search the database XML dumps. Pick a collection like sgn53.spml for German Sign Language (Q33282). The first entry in the file would be 53&sid=1.
–Shisma (talk) 20:54, 11 February 2023 (UTC)
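The structure of the proposed value (collection number, then `&sid=`, then the sign id) can be made explicit with a short Python sketch. The pattern and formatter URL are copied from the proposal table; the `PuddlePage` type and the functions `parse_value` and `page_url` are made up for this example, and the sketch assumes that the pattern must match the whole value.

```python
import re
from typing import NamedTuple

# Allowed-values pattern from the proposal, with capture groups added.
VALUE_PATTERN = re.compile(r"(\d+)&sid=(\d+)")
FORMATTER_URL = "https://www.signbank.org/signpuddle2.0/canvas.php?ui=1&sgn=$1"


class PuddlePage(NamedTuple):
    collection: int  # the sgn parameter, e.g. 53 for German Sign Language
    sign_id: int     # the sid parameter within that collection


def parse_value(value: str) -> PuddlePage:
    """Split a value such as '53&sid=1295' into collection and sign id."""
    match = VALUE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid SignPuddle page id: {value!r}")
    return PuddlePage(int(match.group(1)), int(match.group(2)))


def page_url(value: str) -> str:
    """Build the canvas URL for a valid value."""
    parse_value(value)  # validate first; raises ValueError on bad input
    return FORMATTER_URL.replace("$1", value)


print(parse_value("53&sid=1295"))
print(page_url("4&sid=2643"))
```

The parse makes visible that one string packs two separate pieces of information, the collection and the sign id within it.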
Discussion
Oppose as currently proposed because this is not an ID, it's a URL fragment. It doesn't correspond to the ID attributes in the downloadable files, nor to the URLs used by SignPuddle 3 (e.g. the URL for "sgn=53&sid=1295" can also be https://signpuddle.com/client/#!/dictionary/gsg-DE-dictionary-public/entry/1295 for the UI or https://signpuddle.com/server/dictionary/gsg-DE-dictionary-public/search/id/1295 for the API). Each dictionary is a separate ID space, so if we want these as identifiers, they should be separate properties. If we're going to store something that only applies to one URL, then it might as well stay as a URL. Also, since SignPuddle can have numerous entries for a single sign (e.g. 1394, 2067, 2948, 3533 and 5298 for ASL), if we are going to link to it, the links should probably be on forms not lexemes. - Nikki (talk) 19:11, 24 February 2023 (UTC)
- And if you're thinking of importing the data into Wikidata: Please don't. The SignPuddle data is uncurated and unverified. Anyone can add new data and there's no way (for normal users at least) to edit, remove or even flag bad or duplicate entries. The only way you can distinguish the good data from the bad data is by being able to read SignWriting and knowing the sign it's supposed to represent. - Nikki (talk) 19:26, 24 February 2023 (UTC)
Oppose per above -عُثمان (talk) 17:53, 7 March 2023 (UTC)
part of other combined lexeme
Motivation
This property can be used to specify that components of a compound lexeme A appear due to another compound lexeme B actually being used within A. It is particularly important whenever the components coming from B are disconnected, reordered (as in the first two examples), or inflected (as in the last two examples) within A. Mahir256 (talk) 16:04, 10 April 2023 (UTC)
Discussion
- Comment: The use case in the first two examples is clear, but I am having trouble seeing what advantages a property like this offers for contiguous compounds. If the reason is to avoid adding forms on the compound lexeme, it doesn't remove the other reasons to add forms beyond their use in "combines" statements. For example, in order to put statements for "subject form" on usage examples on a given compound, it makes sense to have that form present. Simple compounds are also likely to form parts of other compounds, and it seems like it would be more confusing to be expected to represent a nest of compounds on each derived lexeme. For example, if we have a compound consisting of an adjective and a three-member verb construction like مار لے جاوݨ, then any additional lexeme employing that compound would have links from جاوݨ to مار لیݨ, لے جاوݨ, and مار لے جاوݨ. It seems a lot simpler just to link to the single compound, unless querying statements on the constituent lexemes has problems I am not aware of. -عُثمان (talk) 22:09, 11 April 2023 (UTC)
- On further thought, I am inclined to Oppose this proposal in its current form, for the concerns above, and because it would place the lexeme containing the senses which contribute to the expression in the qualifiers instead of the main statement, which runs counter to the expectation of how the constituent senses will be indicated. I also don't believe it is desirable or possible to leave lexemes like the compound in the last one formless as the tendencies governing their use are not regular or predictable. With respect to examples like the first one, I think it would be preferable to just use multiple values for series ordinal as in ਹੈ ਨਹੀਂ ਗਾ/ہے نہیں گا (L700902) or M563x814S11550494x564S2a20a491x600S11530491x638S35500483x446S1ec10518x474S21600519x465S2ef00545x474S30122481x506S37806438x535S37a00437x535S37a00437x578S37a00437x623S1f510491x697S14c10487x739S37a00437x671S37a00437x719S37a00437x766S37a06439x812S37906486x812S2a530489x775 (L1082244). Possibly these could benefit from some way to make it clearer how exactly the constituent lexemes are being combined. -عُثمان (talk) 02:12, 23 April 2023 (UTC)
Beta Code
Description | representation of Ancient Greek as ASCII characters |
---|---|
Represents | Beta Code (Q752325) |
Data type | String |
Domain | form |
Allowed values | [ "-}]+ |
Example 1 | Ἠέλιος/*)he/lios (L1095729) → *)he/lios |
Example 2 | εἲρειν (L1020946) → ei)\rein |
Example 3 | γλύφω (L7961) → glu/fw |
See also | ALA-LC romanization (P8991) |
Motivation
Beta Code is a form of representing Ancient Greek (Q35497) letters as ASCII used by some research institutions, e.g. Perseus. You can try it in this web converter.-- Kristbaum (talk) 14:43, 25 April 2023 (UTC)
Discussion
- Maybe it would be useful for forms too, but in most dictionaries it's only done once for the lemma. Kristbaum (talk) 14:51, 25 April 2023 (UTC)
Comment @Kristbaum: do we really need a property for that? I think it can simply be a form, see what I tried on L:L1095729#F1. Cheers, VIGNERON (talk) 11:24, 1 May 2023 (UTC)
- @VIGNERON: Cool idea, but wouldn't that conflict with the idea of a spelling variant? It's not a different spelling just a different representation. Is there maybe another example of a similar format to model this after? (If not I'm fine with your suggestion) --Kristbaum (talk) 12:14, 1 May 2023 (UTC)
- @Kristbaum: good question, the model is flexible enough to accept it, but is it okay to do it? I'm not sure. Maybe a discussion on WD:LD would help clarify things. Meanwhile, I don't have an exact equivalent, but I know that representations are used for very different things (from basic spelling variation L:L1473#F1 to script variation ਲੌਟਣ/لَوٹݨ (L1096159), or even syllables in Latin: L:L10907#F1 - the last probably being wrong), but I haven't often come across such a case. PS: see also the modeling on L:L1000845#F1. Cheers, VIGNERON (talk) 12:51, 1 May 2023 (UTC)
- @VIGNERON: Great examples, thank you! But wouldn't a property for Beta Code fit in with the numerous Romanization properties? E.g ALA-LC romanization (P8991) or ALA-LC romanization for Ukrainian (P9453)? Or do you think it would be possible to model them as spelling variants too? – The preceding unsigned comment was added by Kristbaum (talk • contribs) at 10:57, 1 May 2023 (UTC).
Support Ionenlaser (talk) 13:01, 6 May 2023 (UTC)
- Comment; I would consider a property like this to be more suitable than adding a variant representation on the form. On the example of ਲੌਟਣ/لَوٹݨ (L1096159) above, I would use ISO 15919 transliteration (P5825) if I wanted to add a Romanized transcription, as this information is for specialized purposes and not part of the general written use of the language, and because it is useful to add qualifiers to statements about transcriptions. I would actually suggest that this be used on forms rather than the lexeme itself—print dictionaries are not necessarily concerned with recording every form of a word and make necessary compromises for space. As we do not have these limitations on Wikidata, I think it makes sense to attach these transcriptions to forms (and this would be in line with how existing properties are used). --عُثمان (talk) 18:07, 8 May 2023 (UTC)
lexical unit
Represents | lexical unit (Q115862390) |
---|---|
Data type | Item |
Domain | lexeme, Q111352 |
Example 1 | lugal/𒈗 (L643713) lexical unit Mesopotamian city (Q117000106) |
Example 2 | MISSING |
Example 3 | MISSING |
Source | https://framenet.icsi.berkeley.edu/fndrupal/WhatIsFrameNet, https://metaphor.icsi.berkeley.edu/pub/en/index.php/Category:Frame |
Motivation
See discussion of the proposed property 'frame element of' for more on the frame semantics properties proposed here.
Discussion
- It's unclear whether this should be about a lexeme or sense. ChristianKl ❪✉❫ 11:41, 20 March 2023 (UTC)
Kamus Besar Bahasa Indonesia Daring entry
Description | identifier for an entry in the online version of Kamus Besar Bahasa Indonesia |
---|---|
Represents | Great Dictionary of the Indonesian Language (Q4200623) |
Data type | External identifier |
Domain | lexeme |
Allowed values | [a-z0-9\.,'\-_\(\); ]+ |
Example 1 | cagar budaya (L739124) → cagar budaya and cagar_budaya |
Example 2 | Yth. (L1119265) → Yth. |
Example 3 | Al-Qur'an (L1119263) → Al-Qur'an |
Example 4 | S-1 (L1119266) → S-1 |
Example 5 | umbi-umbian (L700147) → umbi-umbian |
Example 6 | patah tongkat berjeremang (L1119283) → patah tongkat berjeremang (patah sayap bertongkat paruh; patah tongkat bertelekan) |
Example 7 | pucuk dicinta, ulam tiba (L1119282) → pucuk dicinta, ulam tiba (hendak ulam pucuk menjulai) |
External links | Use in sister projects: [ar] • [de] • [en] • [es] • [fr] • [he] • [it] • [ja] • [ko] • [nl] • [pl] • [pt] • [ru] • [sv] • [vi] • [zh] • [commons] • [species] • [wd] • [en.wikt] • [fr.wikt]. |
Number of IDs in source | 119,345 |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | https://kbbi.kemdikbud.go.id/entri/$1 |
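The allowed-values pattern can be checked against the example IDs in the table above. The sketch below does this with the pattern exactly as written in the proposal; the helper name `rejected_ids` is hypothetical. The proposal does not say whether the constraint is meant to be case-sensitive, so treating it as case-sensitive here is an assumption.

```python
import re

# Pattern copied verbatim from the "Allowed values" row (lowercase letters only).
ALLOWED = re.compile(r"[a-z0-9\.,'\-_\(\); ]+")

# Identifier values taken from Examples 1 to 7 of the proposal.
EXAMPLE_IDS = [
    "cagar budaya",
    "cagar_budaya",
    "Yth.",
    "Al-Qur'an",
    "S-1",
    "umbi-umbian",
    "patah tongkat berjeremang (patah sayap bertongkat paruh; patah tongkat bertelekan)",
    "pucuk dicinta, ulam tiba (hendak ulam pucuk menjulai)",
]


def rejected_ids(ids):
    """Return the IDs that do not fully match the pattern (case-sensitive)."""
    return [entry_id for entry_id in ids if ALLOWED.fullmatch(entry_id) is None]


if __name__ == "__main__":
    for entry_id in rejected_ids(EXAMPLE_IDS):
        print("rejected:", entry_id)
```

Read case-sensitively, the pattern rejects the three examples that contain capital letters (Yth., Al-Qur'an and S-1), so the pattern may need `A-Z` added or the constraint may need to be applied case-insensitively.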
Motivation
KBBI is the dictionary most widely used by Indonesians. This property will help contributors at WikiProject Indonesia cite their source and give others a useful link to authoritative information when creating or editing Indonesian lexemes. It would also complement existing properties such as Oxford English Dictionary numeric ID (P5275), Collins Online English Dictionary entry (P11230), Lëtzebuerger Online Dictionnaire ID (P9397), and Cambridge Dictionary entry (British English) (P11422). Labdajiwa (talk) 06:00, 24 May 2023 (UTC)
Discussion
Support Seems ok to me. ArthurPSmith (talk) 20:49, 1 June 2023 (UTC)
- Comment: Given that this would potentially be as useful for the lexemes currently modeled as Malay (Q9237), I would like to see it clarified how either a) these languages should be merged or b) these languages will be maintained separately, that is, should identifiers from this dictionary be placed on equivalent/identical Malay and Indonesian lexemes, and is there a plan to ensure that information sourced from this dictionary is used to update lexemes in both varieties where applicable? -عُثمان (talk) 17:15, 6 June 2023 (UTC)
- Well, there was a discussion about this years ago, and folks over there don't seem to agree to merging Indonesian with Malay. I'd say this identifier will be used mainly for Indonesian. As for whether this dictionary can be used on Malay lexemes, plenty of its entries are marked as Malay (e.g. sama ada). Perhaps this dictionary can also be used on lexemes of regional languages in Indonesia, many of which don't have a reliable online dictionary of their own. And FYI, as of June 2023, Indonesian has 19,864 lexemes, while Malay has 2,729 lexemes. Labdajiwa (talk) 14:31, 7 June 2023 (UTC)
- @Labdajiwa: Are you sure the 'Mal' annotation doesn't simply mean 'used primarily in Malaysia' (i.e. Q15065–to be distinguished from the more general Q9237 to which at least a plurality of the entries in this dictionary would apply), in much the same way that there are Malay words used, say, mainly in Brunei or mainly in Singapore? You should clarify your 'perhaps' statement: can this or can this not be used for such regional languages? And lexeme count is not relevant when lexemes lack meanings; all Malay lexemes have at least one, while this cannot be said for the Indonesian ones. Mahir256 (talk) 04:39, 31 July 2023 (UTC)
- This Malay-Indonesian debate seems to stem from confusion. AFAIK, linguistically, in English, the official standardized form of the language in each country is called "Malaysian Malay" and "Indonesian" respectively. Both languages are descended from (or "grouped under", which is probably another correct term) "Malay". Malay, translated into Indonesian, is Melayu. In Indonesia, Melayu is commonly used to refer to "the language that is used in Malaysia". Malaysian Malay could be translated into Indonesian as Melayu Malaysia, but nobody uses that term in Indonesia. The merge proposal sounds like a proposal to merge Malaysian Malay and Indonesian, which I think is not possible because "A language is a dialect with an army and navy". Hddty (talk) 02:22, 12 August 2023 (UTC)
etymologiebank.nl ID
Description | identifier for an entry in the online Dutch etymology dictionary hosted by Instituut voor de Nederlandse Taal |
---|---|
Data type | External identifier |
Domain | Dutch lexemes |
Allowed values | [a-z]+[1-9]? |
Example 1 | spreken (L1130640) → spreken |
Example 2 | melk (L630689) → melk1 |
Example 3 | bier (L495801) → bier1 |
Source | https://etymologiebank.nl/ |
Planned use | adjust P973 values pointing to etymologiebank.nl to instead use this property |
Formatter URL | https://www.etymologiebank.nl/trefwoord/$1 |
See also | Oudnederlands Woordenboek GTB ID (P5937), Vroegmiddelnederlands Woordenboek GTB ID (P5938), Middelnederlandsch Woordenboek GTB ID (P5939), Wurdboek fan de Fryske taal GTB ID (P9158) |
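To illustrate how the allowed values and the formatter URL in the table above fit together, here is a minimal sketch that validates an ID and substitutes it for `$1`. The pattern and URL are copied from the proposal; the function name `entry_url` is hypothetical and does not correspond to any existing Wikidata tool.

```python
import re

# Both values are copied from the proposal table.
ALLOWED = re.compile(r"[a-z]+[1-9]?")
FORMATTER_URL = "https://www.etymologiebank.nl/trefwoord/$1"


def entry_url(entry_id: str) -> str:
    """Validate an ID against the allowed values, then fill in the formatter URL."""
    if ALLOWED.fullmatch(entry_id) is None:
        raise ValueError(f"ID does not match the allowed values: {entry_id!r}")
    return FORMATTER_URL.replace("$1", entry_id)


if __name__ == "__main__":
    for example in ["spreken", "melk1", "bier1"]:
        print(entry_url(example))
```

Note that the pattern as written allows at most one trailing digit from 1 to 9, so a hypothetical ID such as `melk10` would be rejected.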
Motivation
This property will provide some authority control for Dutch lexemes. Mahir256 (talk) 15:58, 25 June 2023 (UTC)
Discussion
Support ArthurPSmith (talk) 14:35, 27 June 2023 (UTC)
Vazhaju Word ID
Description | identifier for an entry in the Vazhaju Tajik dictionary |
---|---|
Data type | External identifier |
Example 1 | لیفت/лифт (L1149428) Kdo2KJn |
Example 2 | دیریناقلیمشناسی/дериниқлимшиносӣ (L1004914) XKdnZetm |
Example 3 | آب زدن/об задан (L584728) atm1ydo |
Formatter URL | https://vazhaju.tj/word/_/$1 |
Motivation
Vazhaju is an aggregate online dictionary of Tajik Persian which includes Perso-Arabic script lemmas. It would be useful for New Persian (Q56356571) lexemes. The entries are each drawn from a variety of sources; some from Dehkhoda Dictionary (Q1182988), some from Explanatory Dictionary of the Tajik Language (Q25592497), some from Q25582847, and a number with definitions from glossaries produced by the Vazhaju editors themselves. Example 1 is an entry for a recent loanword which does not appear in many Persian dictionaries. Example 2 is from a Tajik “translation” produced by Vazhaju of Dictionary of approved words of the Persian Language and Literature Academy (Q115664843), a glossary of technical neologisms current in Iran from which we have many lexemes. Example 3 is an entry for a compound verb drawing from a source other than Dehkhoda, which is already linked to many Persian lexemes. This site is also more stable than Dehkhoda, which has a tendency to go offline for days to weeks at a time (it is currently unavailable). -عُثمان (talk) 15:03, 16 August 2023 (UTC)
The formatter URL is https://vazhaju.tj/word/_/$1 (I don’t know why this is not in the property proposal template anymore.) -عُثمان (talk) 15:06, 16 August 2023 (UTC)
Discussion
Sayed Ganj Balochi Glossary ID
Description | entry in the Qamosona online reproduction of Sayad Zahoor Shah Hashmi’s Balochi dictionary |
---|---|
Data type | External identifier |
Example 1 | وٹ (L1122042) aea45e5cb1 |
Example 2 | انگوری (L1149859) aeaa5b5a |
Example 3 | واب (L1150090) aea55861b0 |
Formatter URL | https://qamosona.com/G3/index.php/term/,6f57b19b61545fad9b9ea5$1.xhtml |
Motivation
Sayed Ganj is one of the most important monolingual dictionaries of Balochi, and this online reproduction of it by Qamosona would be useful to link to Balochi lexemes via a property. The entry URLs work as follows: the part of the string spanning from term/ to ...ea5 is specific to this dictionary; entry URLs in other dictionaries on Qamosona begin with a different unique string and can have separate properties. Note that in this dictionary the English word “synonym” is used to mean the diacriticized form of the headword rather than a synonym, in case that is not clear. --عُثمان (talk) 14:37, 28 August 2023 (UTC)
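The URL structure described above can be sketched as a fixed site base, a dictionary-specific prefix, and the entry ID. The split into `BASE` and `SAYED_GANJ_PREFIX` follows the formatter URL in the proposal; the constant and function names are illustrative assumptions, and prefixes for other Qamosona dictionaries are not given here.

```python
# Site base and dictionary-specific prefix, split out of the proposed formatter URL.
BASE = "https://qamosona.com/G3/index.php/term/"
SAYED_GANJ_PREFIX = ",6f57b19b61545fad9b9ea5"


def entry_url(entry_id: str, dictionary_prefix: str = SAYED_GANJ_PREFIX) -> str:
    """Build an entry URL from the dictionary-specific prefix and the entry ID."""
    return f"{BASE}{dictionary_prefix}{entry_id}.xhtml"


if __name__ == "__main__":
    # Entry IDs from the three proposal examples.
    for example in ["aea45e5cb1", "aeaa5b5a", "aea55861b0"]:
        print(entry_url(example))
```

Keeping the prefix as a parameter mirrors the point that other dictionaries on the same site would differ only in that prefix and could have their own properties.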
Discussion
Adoption variant
Description | variant of this loanword with the same origin |
---|---|
Represents | adoption variant (Q116192653) |
Data type | Lexeme |
Domain | lexeme, loanwords |
Example 1 | カプッチーノ (L1163458) adoption variant カプチーノ (L691407) ↔ カプチーノ (L691407) adoption variant カプッチーノ (L1163458) (cappuccino (L618738)) |
Example 2 | ホチキス (L1159135) adoption variant ホッチキス (L943113) ↔ ホッチキス (L943113) adoption variant ホチキス (L1159135) (Hotchkiss (L943117)) |
Example 3 | リュックサック (L1163487) adoption variant ルックザック (L1163486) ↔ ルックザック (L1163486) adoption variant リュックサック (L1163487) (Rucksack (L826807)) |
Expected completeness | eventually complete (Q21873974) |
See also | homograph lexeme (P5402), said to be the same as lexeme (P11577), alternative form (P8530) |
Constraints
- all statements should be symmetric
- derived from lexeme (P5191) should be present on all lexemes with this property
Motivation
I have observed this relation between loanwords only in Japanese, but I'm sure it exists in other languages as well.
Example: cappuccino (L618738) has been adopted into Japanese with slight variations:
- カプッチーノ (L1163458) / kaputchīno
- カプチーノ (L691407) / kap(u)chīno
All the words I have found share exactly the same meaning, but I suppose they don't have to 🤷
Discussion
If you have non-Japanese examples, feel free to add them –Shisma (talk) 13:28, 22 September 2023 (UTC)
Wikibase form
rhythmic weight
Description | syllabification annotation for Urdu-based orthographies |
---|---|
Represents | vazn (Q115484594) |
Data type | String |
Domain | form |
Allowed values | (۰|۱|۲|\s)+ |
Example 1 | ਵੱਜਿਆ/وجّیا (L741125-F2) ISO 15919: (vaj.jiā); IPA: /ʋəd͡ʒˈjɑ/; Vazn: ۲۱ (21) |
Example 2 | ਆਇ/آۓ (L677817-F6) ISO 15919: (ā.i); IPA: /ɑ.i/; Vazn: ۰۰ (00) |
Example 3 | ISO 15919: (maṁg.vau.ṇa.gī.ā̃); IPA: /mɐŋɡˈʋɔ.ɳᵊˈɡɪ.ɑ̃/; Vazn: ۲۱۱۱۱ (21111) |
Example 4 | ݙاڈھے (L740480-F3) ISO 15919: (ḏ̣ā.ḍhe); IPA: /ᶑɑ.ɖʰɛ/; Vazn: ۱۱ (11) |
Source | an explanation may be observed by clicking on the “i” next to Vazn on Rekhta Dictionary entries |
Planned use | add to lexeme forms in Punjabi, Hindustani, and other languages used in Pakistan written with an Urdu-based orthography |
Motivation
We have the existing hyphenation property for indicating the syllabification of lexeme forms, but this is poorly suited to languages which are typically written with vowels omitted and/or a cursive script which is not legible when hyphenated. The term "vazn" is used in the English version of the Rekhta online Hindustani dictionary, hence the name used here, but this property could also be labeled as "rhythmic weight" in English if that is preferable. Either way, I do intend the scope for this property to be for languages used in Pakistan commonly written with an Urdu-based orthography - without context for the ways in which syllable timing occurs in other languages which use cursive scripts, or for the syllabification conventions used elsewhere, it is hard to say how applicable this might be. If anyone does think this could be applied more broadly however, I am interested to hear.
To explain how it works, the idea is that each syllable is given a number 1 (۱) if it contains one consonant or cluster, or 2 (۲) if it contains two. Typically, short vowels between consonants are not indicated, so we can use these numbers to infer where syllable boundaries occur. The rules have to be bent a little bit for Punjabi as vowel-only syllables are very common compared to Urdu - 0 (۰) will represent no consonant syllables, and stressed semi-vowels will be treated as consonants. A transliterated example for the benefit of those not able to read those given in the proposal, taking a Punjabi word and pronunciation rules. Singular oblique form لفظ /ləf.ɐz/ : 21. Plural oblique form لفظاں /ləf.zɑ̃/ : 22. For the purposes of this format, the letter ں nun gunna is treated as a consonant. Its actual sound is conditioned by context and can be a nasal consonant or nasalised vowel, the latter of which still makes sense to treat as consonant-like for syllabification. عُثمان (talk) 17:26, 27 November 2022 (UTC)
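The encoding step described above can be sketched as follows. The per-syllable counts are supplied by hand, since deciding what counts as a consonant or cluster is the linguistic judgement the proposal describes; the code only maps the counts to digits and checks them against the proposed allowed-values pattern. The function name `vazn` and the choice of Extended Arabic-Indic code points (U+06F0 to U+06F2) are assumptions.

```python
import re

# Digits for 0, 1 and 2, written as escapes to avoid confusion with the
# similar-looking Arabic-Indic digits (assumed to be the intended code points).
DIGITS = {0: "\u06f0", 1: "\u06f1", 2: "\u06f2"}

# Allowed-values pattern from the proposal, using the same code points.
ALLOWED = re.compile("(\u06f0|\u06f1|\u06f2|\\s)+")


def vazn(consonant_counts):
    """Map per-syllable consonant (or cluster) counts to a vazn string."""
    try:
        return "".join(DIGITS[count] for count in consonant_counts)
    except KeyError as exc:
        raise ValueError(f"each syllable must be weighted 0, 1 or 2, got {exc.args[0]!r}") from exc


if __name__ == "__main__":
    # /ləf.ɐz/: two consonants, then one -> 21
    print(vazn([2, 1]))
    # /ləf.zɑ̃/: two consonants, then z plus nun gunna -> 22
    print(vazn([2, 2]))
```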
Discussion
Comment I don't know anything about this topic, but I think "rhythmic weight" or some other English phrase would be a better label. "Vazn" can be used as an alias. — The Erinaceous One 🦔 07:41, 30 November 2022 (UTC)
- @The-erinaceous-one Fair enough; I have updated the title. I was thinking about it and there are ambiguities with وزن and other senses of the term; I will probably make this more specific in Punjabi as well. عُثمان (talk) 00:19, 17 December 2022 (UTC)
Comment Withdrawn in part due to lack of interest, and because I think when I proposed this my understanding of syllabification in these languages was oversimplified. -عُثمان (talk) 17:18, 6 June 2023 (UTC)
usage context of form or sense
Currently, when more than one type of context label property (Q116547761) is used, the statements are usually considered to be in an AND relation. When more than one value is used in the same property, the relation is usually also AND, except for variety of lexeme, form or sense (P7481), location of sense usage (P6084) and field of usage (P9488), where it can only be OR. This property will allow users to specify more complex context labels, such as a term that is obsolete in some places but common in another place. GZWDer (talk) 05:37, 1 February 2023 (UTC)
Discussion
Question How about
monitor (L7068-S2) location of sense usage (P6084) Hong Kong (Q8646)
monitor (L7068-S2) location of sense usage (P6084) United Kingdom (Q145)
monitor (L7068-S2) language style (P6191) archaism (Q181970) valid in place (P3005) United Kingdom (Q145)
? Not that this would equally apply to example 1, but there the property in question is the same in both statements, so perhaps an OR relation would typically be inferred from that fact anyhow? ―BlaueBlüte (talk) 06:09, 1 February 2023 (UTC)
- For example: OALD labeled buggery (L1010889) as "British English, taboo or law". This cannot be modeled if we only assume an AND relation. --GZWDer (talk) 14:44, 1 February 2023 (UTC)
Comment @GZWDer: Based on your above comment it seems important to note that the term “label” in this proposal apparently isn’t (always?) used in the typical Wikidata sense “label of an item”, but in a dictionary-specific sense that the English Wiktionary calls “context label”. I think it might be helpful to amend the proposal to clarify what exactly each occurrence of “label” means. ―BlaueBlüte (talk) 20:53, 1 February 2023 (UTC)
- Renamed to "usage context of form or sense".--GZWDer (talk) 21:00, 1 February 2023 (UTC)
- See also Wikidata:Lexicographical data/Usage context for proposed modelling.--GZWDer (talk) 21:23, 2 February 2023 (UTC)
form decomposition
Description | form decomposition |
---|---|
Data type | Item |
Domain | form |
Example 1 | nin9-zu/𒎐𒍪 (L643660-F2) → nin9/𒎐 (L643660-F1), nin9-zu/𒎐𒍪 (L643660-F2) → zu/𒍪 (L1116255-F1) (see also elaboration of Example 1) |
Example 2 | lugal/𒈗 (L643713-F10) → lugal[king][-ak][-ø] N.GEN.ABS (see elaboration of Example 2) |
Example 3 | in-pa3/𒅔𒅆𒊒 (L741253-F2) → i-n-pad[name][-ø] FIN.3-SG-H-A.V.3-SG-P (see elaboration of Example 3) |
Planned use | Linking forms to their compositions (Lexemes) and to attach the grammatical role of these compositions |
See also | combines lexemes (P5238) which allows to decompose a Lexeme into other lexemes which are parts of it |
Motivation
Sumerian, as an agglutinative language, derives its grammatical features from compositions of mainly suffixes which are attached to a Lexeme.
In Wikidata, we can already model Lexemes of the individual suffixes and we can create QIDs for the grammatical features that we need to describe a Lexeme Form.
What we miss is a way to decompose a lexeme form to represent how the suffixes represent the grammatical features which are assigned to the form.
One might argue that this is a trivial matter, as only suffixes are added and they can be described sufficiently to represent a grammatical feature.
However, in Sumerian, the interpretation of a word is usually broken down into a description of the chain of suffixes, or even vowels in suffixes, as exemplified here:
http://oracc.museum.upenn.edu/etcsri/parsing/index.html
This interpretation of a Sumerian form can become quite complex and is worth modeling in Wikidata, in my opinion.
To do that, we would need a property that allows for representing the decomposition of a form, similarly to "combines lexemes". Then, we would be able to list the individual suffixes or parts of suffixes in a list e.g. with "series ordinal" to explain the decomposition of the lexeme form completely in RDF.
Usage for other languages
There can be many other potential application cases for this property in other languages such as:
- Turkish, Japanese as agglutinative languages (even though maybe with a clearer representation of Suffixes), e.g. all forms of 因る/よる (L11476)
- Arguably Indo-European languages, e.g. German gehst (L1026-F4) "gehst" could be separated into "geh" - STEM and "st" "second person singular present, indicative, active"
- Akkadian Cuneiform will need similar patterns for verbs, but also includes verbal roots, maybe Arabic is then also applicable
Elaboration on examples
This section elaborates the aforementioned three examples for Sumerian.
Example 1: ninzu (nin9-zu/𒎐𒍪 (L643660-F2))
- Form: nin9-zu / 𒎐𒍪
- Grammatical interpretation: nin9=HEAD.zu=2-SG-POSS
This noun has a second person singular possessive case which is marked with the suffix zu/𒍪 (L1116255).
We would like to express that the suffix is marked with zu/𒍪 (L1116255) and that nin/𒊩𒌆 (L643660) is the HEAD and carries the meaning of the noun.
Representation in Wikidata
- nin9-zu/𒎐𒍪 (L643660-F2) form decomposition nin9/𒎐 (L643660-F1)
- nin9-zu/𒎐𒍪 (L643660-F2) form decomposition zu/𒍪 (L1116255-F1)
Example 2: lugal (lugal/𒈗 (L643713-F10))
Taken from: https://github.com/cdli-gh/CDLI-CoNLL-to-CoNLLU-Converter/blob/master/resources/P100065.conll
- Genitive absolutive form of lugal (king)
- r.1.4 lugal lugal[king][-ak][-ø] N.GEN.ABS
This example shows that the three forms (lugal/𒈗 (L643713-F7), lugal/𒈗 (L643713-F9), lugal/𒈗 (L643713-F10)) are written in the same way: "lugal".
Therefore, additional elaborations on why these forms are written in this way are needed.
The genitive absolutive case of lugal/𒈗 (L643713), lugal/𒈗 (L643713-F10) is comprised of three components:
- the STEM (lugal)
- the particle (-ak)
- the non-written marker for the absolutive case (it is always left empty)
In lugal/𒈗 (L643713-F10), the (-ak) is also not written, hence it is indistinguishable from the forms lugal/𒈗 (L643713-F7), lugal/𒈗 (L643713-F9) without additional context.
Hence, we would like to break down the grammatical composition with reference to the written and non-written parts of the form.
Representation in Wikidata
- lugal/𒈗 (L643713-F10) form decomposition lugal/𒈗 (L643713-F7)
- lugal/𒈗 (L643713-F10) form decomposition -ak/𒀝 (L1117316-F1)
- lugal/𒈗 (L643713-F10) form decomposition -ø (L1117775-F1)
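As a rough illustration of how the three statements above could be consumed, the sketch below treats each decomposition value as a part with a position and an optional written representation. The form IDs come from the proposal; the `Part` class, the `ordinal` values (standing in for a "series ordinal" qualifier) and the `written` field are hypothetical modelling choices and are not part of the proposal itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    form_id: str  # form ID taken from the proposal
    ordinal: int  # assumed position in the chain, as a series ordinal would give
    written: str  # what is actually written; "" for an unwritten part


# lugal (L643713-F10): lugal[king][-ak][-ø], deliberately stored out of order.
LUGAL_F10 = [
    Part("L1117775-F1", 3, ""),      # -ø, the unwritten absolutive marker
    Part("L643713-F7", 1, "lugal"),  # the stem
    Part("L1117316-F1", 2, ""),      # -ak, not written in this form
]


def ordered(parts):
    """Return the parts sorted by their ordinal."""
    return sorted(parts, key=lambda part: part.ordinal)


def surface(parts):
    """Concatenate the written parts in order to recover the written form."""
    return "".join(part.written for part in ordered(parts))


if __name__ == "__main__":
    print([part.form_id for part in ordered(LUGAL_F10)])
    print(surface(LUGAL_F10))
```

The point of the sketch is that the written form alone ("lugal") cannot distinguish F10 from F7 or F9, whereas the ordered list of parts can.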
Example 3: pad3 (in-pa3/𒅔𒅆𒊒 (L741253-F2))
Taken from: https://github.com/cdli-gh/CDLI-CoNLL-to-CoNLLU-Converter/blob/master/resources/P100065.conll
- r.3.3 in-pa3 i-n-pad[name][-ø] FIN.3-SG-H-A.V.3-SG-P
The example for inpad shows a representation of the in-pa3/𒅔𒅆𒊒 (L741253-F2) with the sense "to name" in Sumerian.
The verb describes its directly associated subject and its associated direct object with different grammatical parameters.
- Subject: The subject is described as "third person singular finite human agent", which manifests itself in the prefix "in"
- Direct Object: The direct object is described as "third person singular" and manifests itself in the non-written suffix -ø (L1117775-F1).
Representation in Wikidata
- in-pa3/𒅔𒅆𒊒 (L741253-F2) form decomposition in-/𒅔 (L1117776-F1)
- in-pa3/𒅔𒅆𒊒 (L741253-F2) form decomposition pad3/𒅆𒊒 (L741253-F1)
- in-pa3/𒅔𒅆𒊒 (L741253-F2) form decomposition -ø (L1117775-F1)
– The preceding unsigned comment was added by Situxx (talk • contribs) at 21:45, April 28, 2023 (UTC).
Discussion
Support this property is a useful and essential addition for abstracting complex linguistic issues. KaCeBe (talk) 11:53, 17 May 2023 (UTC)
Comment This seems useful but could be clarified a bit. Wouldn't object has role (P3831) be more appropriate as a qualifier given that the object is the form being linked to, and the subject is the form carrying the statement? Maybe even a different property (or a new one) would make more sense here. I will try to find some examples from other languages to see if that helps clarify anything --عُثمان (talk) 17:32, 6 June 2023 (UTC)
- @Situxx: OK, having thought about it a bit more I have some more specific comments: I am not sure if it is necessary to use subject stated as to qualify the statements. An issue with that property is that it does not allow specifying a language code for the string, and for languages where this information could be presented in multiple ways it is unclear how to use. One approach I have been going with for "zero morphemes", which are common in Punjabi, is to use non-printing Unicode characters as representations of individual forms. ("Left to Right Mark" and "Arabic Letter Mark" for LTR and RTL representations respectively.) This allows attaching additional data to the zero representation, and indicating that a form is an empty string without using a qualifier. See / (L718607) for example where this verbal suffix is most often unrealized, but has different forms historically and in some dialects. Rather than using subject named as, I think it would make sense to separate forms like this and select the combining form which has the correct representative string(s).
- Then, for example, it could be stated that ਉੱਠ/اُٹھّ (L1044310-F13) employs the suffix / (L718607-F1), while ਉੱਠੀ/اُٹھّی (L1044310-F14) employs ੀ/ی (L718607-F3). It would not be clear in the second case how to represent both ਈ and ی using subject named as whereas using the linked form in both cases we can get a representation of the combining form for each language/script code -عُثمان (talk) 20:25, 7 June 2023 (UTC)
- Thank you very much for your remarks. I think using zero morphemes as forms for the suffixes we have in Sumerian that can be omitted is a great idea. I will adapt that and update my proposal accordingly. As for subject has role vs. object has role I think you are right. It should be object has role as the role of the suffix is described and not the grammatical feature of the subject (the form) which is already described in the grammatical feature description. I will adapt that as well and give you a heads up once I am done. Situxx (talk) 13:47, 9 June 2023 (UTC)
Wikibase sense
FrameNet Frame ID
Description | identifier of a concept in FrameNet |
---|---|
Represents | FrameNet frame identifier (Q113847330) |
Data type | External identifier |
Domain | Wikibase item (Q29934200) |
Allowed values | [A-Z][a-zA-Z_]* |
Example 1 | addiction (Q12029) → Addiction |
Example 2 | parturition (Q34581) → Giving_birth |
Example 3 | laying eggs (Q65129133) → Giving_birth |
Example 4 | noxa (Q50379880) → Toxic_substance |
Source | https://framenet.icsi.berkeley.edu/fndrupal/frameIndex |
Number of IDs in source | 1214 |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | https://framenet2.icsi.berkeley.edu/fnReports/data/frame/$1.xml |
Applicable "stated in"-value | FrameNet (Q1322093) |
Single-value constraint | yes |
Distinct-values constraint | no |
Wikidata project | lexicographical data in Wikidata (Q51955175) |
Motivation
FrameNet is an English language reference in the field of frame semantics (Q2713996) which allows items in Wikidata (which may be linked to senses) to be mapped to a relevant frame in FrameNet. Dhx1 (talk) 16:05, 8 September 2022 (UTC)
Discussion
Support Looks interesting. ArthurPSmith (talk) 17:42, 9 September 2022 (UTC)
Support -Middle river exports (talk) 05:28, 11 September 2022 (UTC)
Question That's good, but shouldn't this be attached to lexemes? Eg at https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Giving_birth.xml the second Lexical Unit is beget. WD has no item of that name, but has a lexeme https://www.wikidata.org/wiki/Lexeme:L20916. Unfortunately I don't know much about lexemes: do they even have external-id properties? --Vladimir Alexiev (talk) 11:40, 14 September 2022 (UTC)
- This was proposed as a property for lexeme senses rather than lexemes themselves - lexemes typically do have external-id properties, but there are none that I know of for senses. Senses are often linked to each other using item for this sense (P5137), which can be used to derive information about how lexical items across (or within) languages have a concept in common with each other.
- I interpreted this proposal as suggesting that in adding this identifier to Wikidata items, there is some extra semantic data that is linked to on the item for the concept itself, that would then be indirectly linked any time a new sense is created which links to the item for that concept. If we were using this identifier on lexeme senses themselves, we would have to create duplicate statements for this. Middle river exports (talk) 04:18, 15 September 2022 (UTC)
Wait I find the example of parturition (Q34581) and laying eggs (Q65129133) problematic. Giving_birth looks to me like it refers to a superclass of parturition (Q34581) and laying eggs (Q65129133). I would prefer a distinct values constraint and generally creating superclasses in cases like this. Here, giving birth (Q3367001) should likely be fixed to be a superclass of parturition (Q34581) and laying eggs (Q65129133) and then linked with Giving_birth. ChristianKl ❪✉❫ 14:14, 28 February 2023 (UTC)
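The effect of a distinct-values constraint on the proposal's own examples can be sketched as below. The item IDs and frame names come from the proposal table; the function name `shared_values` is hypothetical, and this is only an illustration of the check, not how Wikidata's constraint system is implemented.

```python
from collections import defaultdict

# (item, FrameNet frame ID) pairs from Examples 1 to 4 of the proposal.
EXAMPLES = [
    ("Q12029", "Addiction"),
    ("Q34581", "Giving_birth"),
    ("Q65129133", "Giving_birth"),
    ("Q50379880", "Toxic_substance"),
]


def shared_values(statements):
    """Return every value used on more than one item, with the items that use it."""
    items_by_value = defaultdict(list)
    for item, value in statements:
        items_by_value[value].append(item)
    return {value: items for value, items in items_by_value.items() if len(items) > 1}


if __name__ == "__main__":
    for value, items in shared_values(EXAMPLES).items():
        print(value, "is used on", ", ".join(items))
```

With the examples as proposed, Giving_birth is the only value shared by two items, which is the case a distinct-values constraint would flag.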
World Loanword Database word ID
Description | ID for a word in World Loanword Database (link to senses) |
---|---|
Represents | World Loanword Database (Q104243588) |
Data type | External identifier |
Domain | sense |
Example 1 | L3341-S1 → 72212319483460164 |
Example 2 | L3341-S2 → 72212319483460164 |
Example 3 | L3334-S2 → 72212320343848176 |
Example 4 | L3334-S3 → 72212319922910265 |
Example 5 | L5725-S2 → 72212320295447116, 92181432331044465-1 |
Example 6 | L749443-S1 → 92181432215014552-1 |
External links | Use in sister projects: [ar] • [de] • [en] • [es] • [fr] • [he] • [it] • [ja] • [ko] • [nl] • [pl] • [pt] • [ru] • [sv] • [vi] • [zh] • [commons] • [species] • [wd] • [en.wikt] • [fr.wikt]. |
Number of IDs in source | 57926 (target words only; the database also contains source words) |
Formatter URL | https://wold.clld.org/word/$1 |
See also | Wikidata:Property proposal/World Loanword Database meaning ID |
GZWDer (talk) 15:52, 19 December 2022 (UTC)
Discussion
Comment It's not really an external identifier for senses, but neither is it for lexemes - more for etymologies I guess but we don't really have a granularity level associated with that in Wikidata. I think I'd prefer it to be linked to lexemes rather than senses though. ArthurPSmith (talk) 17:19, 20 December 2022 (UTC)
semantic derivation
Description | links a lexeme sense to a particular sense it is derived from |
---|---|
Data type | Sense |
Example 1 | ਹਜ਼ਾਰਹਾ/ہزارہا (L1088987-S1): derived from: هزار (L1088983-S1); mode of derivation (P5886): extension of meaning (Q115378929) |
Example 2 | अंग लगना/انگ لگنا (L615866-S1): derived from: ਅੰਗ ਲਾਵਣ/انگ لاوݨ (L1093797-S2); mode of derivation (P5886): contraction of meaning (Q115379188) |
Example 3 | अंग लगाना/انگ لگانا (L615868-S1): derived from: ਅੰਗ ਲੱਗਣ/انگ لگݨ (L1093794-S4); mode of derivation (P5886): contraction of meaning (Q115379188) |
Explanation
For the benefit of those unfamiliar with Punjabi, the examples illustrate the following:
- The Punjabi adjective ਹਜ਼ਾਰਹਾ/ہزارہا (L1088987) is derived in form from the plural of the Persian noun هزار (L1088983). The primary sense of the Persian noun is the number “one thousand,” and the primary sense of the derived adjective is equivalent to “many thousands of,” for use in expressions with meanings like “thousands of years ago.” This may be considered a sense which elaborates on or extends from the original Persian sense.
- The Hindustani compound verb अंग लगना/انگ لگنا (L615866) may be considered a loan of Punjabi ਅੰਗ ਲਾਵਣ/انگ لاوݨ (L1093797) due to the fact that both its constituents are themselves borrowings from Punjabi. Sense 1 on the Hindustani lexeme may be described as “to be embraced,” or in a more literal translation of the definition given in Hindi/Urdu dictionaries, “to be held with the chest.” The corresponding sense 2 on the Punjabi lexeme is very close but not the same: “for bodies to have gone and touched each other.” This sense is more broadly applicable and is not necessarily restricted to chest hugs—there are other senses on similar compounds for expressing that in Punjabi. The Hindustani sense may be considered a contraction or narrowing of the original sense.
- The Hindustani compound verb अंग लगाना/انگ لگانا (L615868) may be similarly treated as a loan of Punjabi ਅੰਗ ਲੱਗਣ/انگ لگݨ (L1093794). However, these two lexemes have no senses which may be considered exact matches for translation between each other. The Hindustani sense used in the example has the meaning of “to take in marriage.” This meaning is a specific one which would be included in sense 4 on the Punjabi compound, which has a more generic meaning of “to bond with / form a relationship with.” This sense in Hindustani may be considered a contraction or narrowing of the original meaning.
Qualifier values
These values for mode of derivation (P5886) on senses are proposed. The items are based on a list provided in the book The Punjabi Language: Sources and Forms (Q115327155) (in Punjabi). This list is likely not exhaustive and could be expanded.
- extension of meaning (Q115378929)
- contraction of meaning (Q115379188)
- pejoration of meaning (Q115379263)
- transfer of meaning (Q115379210)
- amelioration of meaning (Q115379237)
(A review of the English Wikipedia article section on this topic may be helpful for more context on where these values come from.)
Motivation
While it is currently possible to indicate some semantic derivation using derived from lexeme (P5191) on the lexeme level using qualifiers for the object and subject sense, this is not well suited for more complex situations. There may be multiple senses between lexemes which are unquestionably related to each other, but which cannot be considered translations or synonyms of one another. Linking these senses to each other avoids ambiguity about which senses correspond to each other between lexemes with multiple senses.
Any feedback and/or more examples would be very much welcome.
Note that “derived from sense” may also be an appropriate English label, but this name was taken by a page for a previous property proposal. -عُثمان (talk) 23:02, 17 April 2023 (UTC)
Discussion
- Tend to Support in general, as it seems like something whose scope could be broadened beyond the five modes of derivation given. Mahir256 (talk) 23:38, 17 April 2023 (UTC)
Klingon Word Wiki id
Data type | External identifier |
---|---|
Domain | sense |
Allowed values | ^[A-Za-z-]+#[1-9]\d*$ |
Example 1 | 'ul/ (L1174117-S1) → -ul#1 |
Example 2 | DIS/ (L624951-S1) → DIS#2 |
Example 3 | DIS/ (L624951-S3) → DIS#1 |
Example 4 | Sop/ (L1001115-S1) → Sop#1 |
Source | klingon.wiki (english), klingon.wiki (german) |
Number of IDs in source | 5119 senses (in 4788 lexemes) |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | http://klingon.wiki/Word/$1 (english) http://klingon.wiki/Wort/$1 (german) |
Applicable "stated in"-value | Klingon Word Wiki (Q122879303) |
Distinct-values constraint | yes |
Motivation
klingon.wiki (Klingon Language Wiki (Q122886642)) is a wiki that documents the use of Klingon in books, films and TV shows. It also provides a dictionary (Klingon Word Wiki (Q122879303)) in English and German, of which the English version is the most complete.
Each page representing a lemma is in the Word or Wort namespace, depending on the target language.
Pages in this namespace represent lemmas, or groups of homograph lexemes. A lemma may contain different lexemes, like Word/HoH#1 (noun) and Word/HoH#2 (verb). Ultimately, each fragment (#1, #2) represents a sense, as seen in Word/DIS#1 (noun, sense #1) and Word/DIS#2 (noun, sense #2).
Every sense contains a section about the source of that particular sense. Most contain a usage example. All contain information about the lexical category and paradigm class of the associated lexeme.
To get a permalink to a sense, you can use the 🔗 link next to the headline.
While the lemmas are expected to be stable, the senses might move to a different ordinal whenever a new sense emerges, which shouldn't happen often. –Shisma (talk) 09:07, 1 October 2023 (UTC)
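As a rough illustration of the identifier scheme described above, the following Python sketch checks a value against a pattern of the proposed shape and expands it with the English formatter URL. The pattern used here, `^[A-Za-z-]+#[1-9]\d*$`, is chosen so that single-digit ordinals such as `DIS#1` are accepted; the function names `is_valid_id` and `to_url` are hypothetical and not part of any existing tool.

```python
import re

# Pattern in the shape of the proposal's "Allowed values". The trailing \d*
# is an assumption made so that single-digit ordinals (e.g. "DIS#1") match.
ID_PATTERN = re.compile(r"^[A-Za-z-]+#[1-9]\d*$")

# English formatter URL from the proposal; "$1" is the value placeholder.
FORMATTER_URL = "http://klingon.wiki/Word/$1"


def is_valid_id(value: str) -> bool:
    """Return True if the value looks like '<lemma>#<ordinal>'."""
    return ID_PATTERN.fullmatch(value) is not None


def to_url(value: str) -> str:
    """Expand the formatter URL; '#' is kept as-is so it acts as a URL fragment."""
    if not is_valid_id(value):
        raise ValueError(f"not a valid identifier: {value!r}")
    return FORMATTER_URL.replace("$1", value)


if __name__ == "__main__":
    for example in ["-ul#1", "DIS#2", "DIS#1", "Sop#1"]:
        print(example, "->", to_url(example))
```

Because the ordinal sits in the URL fragment, a sense that moves to a different ordinal would still resolve to the lemma page, only the anchor would be off.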
Talk
Notified participants of WikiProject Star Trek –Shisma (talk) 10:38, 1 October 2023 (UTC)
Other
Jiten Online kanji ID
Motivation
Jiten Online is a Japanese online dictionary. It has a large amount of kanji data sometimes including hanzi or hanja. – The preceding unsigned comment was added by Laftp0 (talk • contribs).
Discussion
Comment Do you know what the different prefixes signify? I.e. is there a systematic difference between kanjib, kanjiy, etc.? --Nw520 (talk) 09:55, 25 November 2022 (UTC)
- It just seems to move to the next letter of the alphabet every 500 entries: 001~500 is "kanji" and 501~1000 is "kanjib". But it doesn't move on from "kanjiy", which has more than 15000 entries. Also, kanji[b-y] can be replaced with "kanji_", e.g. https://kanji.jitenon.jp/kanji_/26151.html Laftp0 (talk) 10:30, 25 November 2022 (UTC)
Support Looks like a useful site. Does this site just catalogue kanji? If so, I think "kanji" can be dropped from the identifier, leaving either a number or an ID in the form "b/670". Infrastruktur (talk) 11:37, 25 November 2022 (UTC)
Notified participants of WikiProject Japan --Nw520 (talk) 09:57, 25 November 2022 (UTC)
Support [proposed formatter URL statement] Is this not allowed? Syunsyunminmin (talk) 12:53, 25 November 2022 (UTC)
Support --Okkn (talk) 05:50, 26 November 2022 (UTC)
- I support Syunsyunminmin's formatter. "kanji" prefix is really unnecessary, but it's a little unnatural that identifiers start with /. Laftp0 (talk) 06:36, 26 November 2022 (UTC)
Weak oppose I also find that site useful. However, the operating company is small and there are concerns about permanence.(私もサイトは有用だと思います。しかし、運営会社が小さく永続性に不安があります。)--Camillu87 (talk) 15:10, 27 November 2022 (UTC)
- @Camillu87:As per consensus rules, the closing property creator is free to disregard claims that are not substantiated. Do you have a reference that supports the claim that the company might be in trouble? Infrastruktur (talk) 19:20, 28 November 2022 (UTC)
- The operating company was established in 2020 with capital of 3 million yen and 4 employees.([9],[10]) It is a little-known company and website for which no useful mention can be found even when searching on Google etc., so it is doubtful whether it will still be in operation 5 or 10 years from now. I have no further information.(運営会社は2020年に設立され、資本金300万円、従業員4人の会社です。googleなどで検索しても有益な言及が見つからない無名の会社、webサイトですので、5年先、10年先も運営されてるかは疑問があります。それ以上の情報は持っていません。)--Camillu87 (talk) 13:44, 29 November 2022 (UTC)
- If there are no objections, I'll apply the Syunsyunminmin's formatter above. Laftp0 (talk) 12:47, 3 December 2022 (UTC)
- @Syunsyunminmin: There is a bug in it, the first formatter should be "https://kanji.jitenon.jp/kanji$1.html", and the regex should be fixed up to match. Infrastruktur (talk) 14:42, 3 December 2022 (UTC)
- @Infrastruktur: There seems to be a misunderstanding between you and me. I expect the value of this property to be numeric only, e.g. 26151. Syunsyunminmin (talk) 14:59, 3 December 2022 (UTC)
- What Syunsyunminmin suggested won't work, Wikidata doesn't support applies if regular expression matches (P8460) (the property was created without any input from the Wikidata developers). - Nikki (talk) 18:55, 31 March 2023 (UTC)
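To make the URL scheme discussed in this thread concrete, here is a minimal Python sketch. It assumes the directory rule as Laftp0 describes it (a new letter every 500 entries, stopping at "kanjiy") and the "kanji_" shortcut from the example URL; neither has been verified against the site here, and the function names `directory_for` and `url_for` are hypothetical.

```python
import re

BASE_URL = "https://kanji.jitenon.jp"


def directory_for(entry_number: int) -> str:
    """Directory name implied by the 500-entries-per-letter rule in the thread.

    Entries 1-500 -> "kanji", 501-1000 -> "kanjib", and so on, capped at
    "kanjiy". This is a reading of the discussion, not a documented rule.
    """
    if entry_number < 1:
        raise ValueError("entry numbers start at 1")
    block = (entry_number - 1) // 500
    if block == 0:
        return "kanji"
    # Blocks 1..24 map to letters b..y; later blocks stay in "kanjiy".
    letter = chr(ord("a") + min(block, 24))
    return "kanji" + letter


def url_for(numeric_id: str) -> str:
    """Build a URL from a numeric-only identifier using the "kanji_" shortcut.

    Whether the shortcut also works for the first 500 entries is not stated
    in the discussion, so treat that case as unverified.
    """
    if not re.fullmatch(r"[0-9]+", numeric_id):
        raise ValueError(f"identifier must be numeric: {numeric_id!r}")
    return f"{BASE_URL}/kanji_/{numeric_id}.html"


if __name__ == "__main__":
    print(directory_for(501))
    print(url_for("26151"))
```

With a numeric-only value, a single formatter URL of the "kanji_" form would be enough, which avoids needing a regex-dependent formatter.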
World Loanword Database meaning ID
Description | ID for a meaning in World Loanword Database |
---|---|
Represents | World Loanword Database (Q104243588) |
Data type | External identifier |
Domain | item |
Example 1 | rabbit (Q9394) → 3-614 |
Example 2 | world (Q16502) → 1-1 |
Example 3 | murder (Q132821) → 21-42 |
Example 4 | petroleum (Q22656) → 23-195 |
Number of IDs in source | 1814 |
Formatter URL | https://wold.clld.org/meaning/$1 |
See also | Wikidata:Property proposal/World Loanword Database word ID |
GZWDer (talk) 15:30, 19 December 2022 (UTC)
Discussion
Support this seems like a useful database to link to, and since these pages are multi-lingual, linking to items rather than lexemes is the right thing here. ArthurPSmith (talk) 17:22, 20 December 2022 (UTC)
Oppose As far as I am aware, this database is no longer actively maintained beyond what is necessary to keep it online. As they do not take corrections or plan on expanding their database, the utility of linking to these is rather limited. -عُثمان (talk) 16:18, 27 December 2022 (UTC)
Akkadian phonetic value
Description | Akkadian phonetic value of cuneiform signs |
---|---|
Represents | cuneiform sign (Q23017336) |
Data type | String |
Domain | item, cuneiform sign (Q23017336) |
Example 1 | 𒀭 (Q87555087) → an |
Example 2 | 𒆠 (Q87555819) → ki |
Example 3 | 𒆠 (Q87555819) → qi₂ |
Example 4 | 𒆠 (Q87555819) → ke |
Example 5 | 𒆠 (Q87555819) → qe₂ |
Planned use | Assign Akkadian phonetic values to cuneiform signs |
Expected completeness | eventually complete (Q21873974) |
Motivation
Adding Akkadian phonetic values to cuneiform signs. Sartma (talk) 18:51, 16 March 2023 (UTC)
Discussion
Support --عُثمان (talk) 18:10, 24 April 2023 (UTC)
Comment Is "item" really the correct domain? And can't another existing property be used, like IPA transcription (P898)? If not, is it really specific to Akkadian? Plus, "Akkadian phonetic value" only gives 8 results in Google; is it really the right name and appropriate data? @Situxx: who may tell us more. PS: is it the same thing as the sux-Latn representation and the value for transliteration (P2440) in L:L1000845#F1? Cheers, VIGNERON (talk) 11:39, 1 May 2023 (UTC)
- Hi there!
- @Item: Yes I think item is the correct domain, as signs themselves are currently represented using QIDs (the IDs of the Unicode code points) and from my point of view that is fine.
- There are of course numerous paleographic sign variants one could consider (which could get their own QIDs in the future), but the phonetic value of the sign usually does not change as far as I know.
- We have started to add more triples to e.g. https://www.wikidata.org/wiki/Q87555676 (cuneiform sign KA), such as what it depicts, dictionary references and so on, and would continue to do that for signs.
- @PropertyProposal:
- My question would be: What do we do with Sumerian, Hittite, Elamite and so on? Should we not rather define a property "phonetic value" and add the languages in which they occur with a qualifier?
- "The" reference list about that I know is Nuolenna: https://github.com/tosaja/Nuolenna/blob/master/sign_list.txt which contains about 11000 phonetic sign values, but to my knowledge irrespective of language.
- So I would think we would want a solution that fits all languages.
- "is it the same thing as the sux-Latn representation and the value for transliteration (P2440) in L:L1000845#F1?"
- ---> The way it is written in the proposal it seems to follow the ORACC transliteration style. However, the phonetic value is just the value of one of the syllables in this example "zi" would be one phonetic compound.
- @IPATranscription:
- I would not say that this is appropriate, because as far as I know the IPA is not the basis of the transliterations we use for Cuneiform.
- @Transliteration:
- There are many competing notations of transliterations of cuneiform texts, two of which I use for Sumerian (CDLI and ORACC formats).
- For phonetic values, these differ in the main following points:
- - Subscript numbers vs. diacritics: Some transliterations use diacritics for subscript 2 and 3 like in French, some do not
- - Usage of sz vs. š and some other similar cases of characters
- In Sumerian I currently use the CDLI Notation for the sux-latn representation and add transliterations to the forms.
- As I am in contact with a group of Digital Assyriologists, we are thinking about whether this is the right transliteration, but could, if we decide differently, apply some rules to convert from CDLI to other transliteration styles for sux-latn.
- @Opinion about the property proposal:
- We have discussed that it would be beneficial to have the list of phonetic values available when querying a cuneiform sign in Wikidata directly, so in my opinion that would be a good addition.
- An alternative way would be to query for Lexemes which contain exactly the cuneiform sign that we are interested in and get its phonetic values from the transliterations. But that seems way more cumbersome than adding a property like this for all cuneiform languages. Situxx (talk) 15:03, 6 May 2023 (UTC)
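The kind of rule-based conversion between transliteration styles mentioned above could look roughly like the following Python sketch. It covers only the two differences named in this comment (ASCII "sz" versus "š", and plain index digits versus subscript digits) and assumes the index always appears as trailing ASCII digits; the accent-based style for indices 2 and 3 is not handled. The function name `to_subscript_style` is hypothetical.

```python
import re

# Translation table from ASCII digits to Unicode subscript digits.
_SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def to_subscript_style(value: str) -> str:
    """Convert an ASCII-style phonetic value to a subscript/diacritic style.

    Only two rules are applied, both taken from the discussion:
      1. "sz" is written as "š" (and "SZ" as "Š").
      2. A trailing numeric index is written with subscript digits.
    """
    converted = value.replace("sz", "š").replace("SZ", "Š")
    # Replace only the trailing run of ASCII digits, e.g. "qi2" -> "qi₂".
    return re.sub(
        r"[0-9]+$",
        lambda match: match.group().translate(_SUBSCRIPTS),
        converted,
    )


if __name__ == "__main__":
    for raw in ["an", "qi2", "qe2", "sza3"]:
        print(raw, "->", to_subscript_style(raw))
```

A real converter would need further character rules and the reverse direction, but the same table-driven approach would apply.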
Comment I agree with Situxx that it would be best to include a more general 'phonetic value' and qualify with the language at hand. This would then be useful for not only Akkadian, but for many other languages which use glyphs and ideograms in their writing system, such as Sumerian, Hittite, Ugaritic, Eblaite, Elamite, Old Persian, etc. I would also recommend qualifying such statements with the time period(s) when known, since these phonetic values do change depending on the time period, sometimes even collapsing different signs into the same reading in the first millennium BCE. Admndrsn (talk) 15:32, 6 May 2023 (UTC)