Wikidata:Property proposal/Lexemes
See also
- Wikidata:Property proposal/Pending – properties which have been approved but which are on hold waiting for the appropriate datatype to be made available
- Wikidata:Properties for deletion – proposals for the deletion of properties
- Wikidata:External identifiers – statements to add when creating properties for external IDs
- Wikidata:Lexicographical data – information and discussion about lexicographic data on Wikidata
This page is for the proposal of new properties.
Before proposing a property
- Search if the property already exists.
- Search if the property has already been proposed.
- Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
- Select the right datatype for the property.
- Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
- Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.
Creating the property
- Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
- Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
- See property creation policy.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2025/05.
Wikibase lexeme
Morphological category
Description | Morphology of the Arabic triliteral verbs |
---|---|
Data type | Lexeme |
Example 1 | ك ت ب (L2308) → 1st category (Q133366739) 2nd category (Q133369534) |
Example 2 | ش ر ب (L1215988) → 1st category (Q133366739) 4th category (Q133369952) |
Example 3 | ح س ب (L12112) → 1st category (Q133366739) 4th category (Q133369952) 5th category (Q133370261) 6th category (Q133370238) |
Motivation
We are planning to import a database that includes all patterns of Arabic verbs.--Michel Bakni (talk) 08:55, 19 March 2025 (UTC)
Discussion
- Support ForzaGreen (talk) 09:34, 19 March 2025 (UTC)
- Support --عبد الجليل 09 (talk) 14:21, 19 March 2025 (UTC)
- Support, Very useful property. Ahmed Naji Talk 19:19, 24 March 2025 (UTC)
- Oppose This information seems better placed on lexemes for the verbs themselves (possibly with conjugation class (P5186), in the way that Hebrew binyanim are currently indicated?), rather than on lexemes for their roots. Maybe @عُثمان: has other ideas. Mahir256 (talk) 17:32, 25 March 2025 (UTC)
- Oppose but recommend a property like this for use on verb lexemes per above. “Derivational class” seems to be a common English label for these categories. It also looks like the items used in these proposal examples can be merged with existing ones such as Form I (Q56070812). While these are currently linked with the conjugation class (P5186), a separate property would be more appropriate and technically accurate since these aren't exactly conjugation classes. -عُثمان (talk) 17:53, 25 March 2025 (UTC)
- Hello @عُثمان:,
- I just saw Form I (Q56070812), which is, for me, identical to 1st category (Q133366739). The problem is that we do not use these names in Arabic, and we have a slightly different model to address this issue, so when I searched for it I could not find it. I think we need to sit down together and work on a model for the Arabic language that fits how Arabic people address their language; that is better than just coming and posting an Oppose. Could you please check here to understand the model I am talking about.
- In general, we have a large data set taken from Arab Academy of Damascus (Q2822268), and we are trying to import it into Wikidata. So if we can meet to show you what we have, then you can propose the best way to do it in a way that makes sense for Arabic people. Is this OK for you? Michel Bakni (talk) 17:53, 27 March 2025 (UTC)
- @Michel Bakni This sounds good to me. I am not opposing the creation of a property for this use, and I encourage and welcome further discussion. I only oppose the specific details of the proposal above so it can be discussed in more detail. I am interested in seeing the data you have and your suggestions for modeling this. عُثمان (talk) 19:10, 27 March 2025 (UTC)
- Nice to hear that, thanks!
- I propose we meet using Google Meet, so I can share my screen and show you everything I have, and make use of your knowledge of Wikidata.
- My email is: MichelBakni@gmail.com, please send me an email with your time availability, and I will pick a slot and send you an invitation.
- Looking forward to meeting you! Michel Bakni (talk) 20:00, 27 March 2025 (UTC)
Arabic morphological pattern
Description | A feature to adjust the pattern of Arabic words in lexemes |
---|---|
Data type | Lexeme |
Example 1 | ك ت ب (L2308) → Q133367044 Q133367045 |
Example 2 | ش ر ب (L1215988) → Q133367044 Q133367096 |
Example 3 | ح س ب (L12112) → Q133367044 Q133367096 Q133367117 Q133367139 |
Motivation
We are planning to import a database that includes all patterns of Arabic verbs.--Michel Bakni (talk) 08:36, 19 March 2025 (UTC)
Discussion
- Support ForzaGreen (talk) 09:33, 19 March 2025 (UTC)
- Support --عبد الجليل 09 (talk) 14:20, 19 March 2025 (UTC)
- Support, Very important. Ahmed Naji Talk 19:23, 24 March 2025 (UTC)
- Oppose This information seems better placed on the lexemes for the verbs themselves, rather than on the lexemes for their roots. Maybe @عُثمان: has other ideas on the subject. Mahir256 (talk) 17:27, 25 March 2025 (UTC)
- Oppose as a property like this would be more appropriate for individual lexeme forms. Such a property could be labeled more generally as simply “morphological template”, as this would be applicable to multiple languages. If you look at the forms linked in the far right column of this table, there are existing statements to this effect employing the generic property uses (P2283): User:Marsupium/Arabic_morphological_patterns. A dedicated property for this purpose on lexeme forms would make it easier to model and query this information. -عُثمان (talk) 17:59, 25 March 2025 (UTC)
Description | suggest the relationship between similar Javanese lexemes, between its various registers (social variants), mainly ngoko (Q12500634) register (plain Javanese), krama (Q12492493) register (high/polite Javanese), and madya (Q13091955) register (middle Javanese) |
---|---|
Data type | Lexeme |
Domain | lexeme senses, in particular forms with spelling alternatives |
Example 1 | kowé/ꦏꦺꦴꦮꦺ/كووي/ꦏꦺꦴꦮꦺ (L2328) "ngoko" register and sampéyan/sampeyan/ꦱꦩ꧀ꦥꦺꦪꦤ꧀ (L1322036) "krama" register both mean "you" but belong to different social registers: the former is considered casual, and the latter more formal and polite. For reference, please see the online Javanese dictionary at https://www.sastra.org/leksikon (make sure to tick the "kata utuh" checkbox when searching to exclude partial matches). For more information regarding ngoko/krama, see the introduction in this Javanese-English dictionary: https://www.sastra.org/bahasa-dan-budaya/kamus-dan-leksikon/1703-javanese-english-dictionary-horne-1974-1968, especially sections 4.1. Organization of the Entries, and 5. SOCIAL STYLES. See also: en.wp, jv.wikt (https://jv.wiktionary.org/wiki/Wikisastra:Tabel_krama-ngoko) |
Example 2 | (update 18 August) gunung/ꦒꦸꦤꦸꦁ (L680638) (ngoko), redi/rêdi/ꦉꦢꦶ (L45622) (krama) |
Example 3 | (update 18 August) êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183) (ngoko), sirah/ꦱꦶꦫꦃ (L999025) (krama), mastaka/ꦩꦱ꧀ꦠꦏ (L413863) (krama inggil) |
Motivation
I'm planning to add more Javanese lexemes, but there are many words with different registers, and using synonym (P5973) is not correct: although they have the same meaning, they have different usage, and there are also many synonyms within the same register (for example, "you" has 4 or more synonyms in "ngoko", and 3 or more different words in "krama"). Using a dedicated property would make it possible to search and query the relationship between different registers. As you can see from the links provided above, the relationships between these registers are not one-to-one, and while the "ngoko" form is considered the default, not all "ngoko" words have a "krama" equivalent (only about 1000 without affixation, many more with affixation), much less "madya" and other registers ("krama inggil", etc.), and some "krama" words are equivalent to several "ngoko" words, because they are not true "synonym" equivalents, but rather substitution words for different social contexts. Therefore this property should support multiple relationships. For example:
- "you"
- ngoko: kowe, (synonym: ko'ên, kohên, kowên)
- madya: samang, andika, (synonym: dika)
- krama: sampeyan, (synonym: bênampeyan, bênangpeyan)
- krama inggil: panjênêngan, (synonym: nandalêm, paduka)
- "to say, to tell"
- ngoko: kandha, (synonym: clathu, ngomong, kêcap, wara, gotèk, cluluk, wuwus, etc.)
- krama: criyos, sanjang, (synonym: sajang, wicantên, etc.)
- krama andhap: matur
- krama inggil: andika, ngêndika, (synonym: unandika)
- kawi: angling
- (things related to hand / "tangan")
- ngoko: tangan, krama inggil: asta, simple noun, but the verbs get complicated:
- krama inggil: ngasta (ng- + asta) serve as substitutions for ngoko: 1 nyambut gawe (to work), 2 nggawa (to bring, take, carry), 3 nandang (to do), 4 nyekel (to hold, grasp, to handle), 5 mulang (to teach)
Bennylin (talk) 18:23, 9 August 2024 (UTC)
Update 18 August
Just to make it clearer, on behalf of Javanese speakers, we would like to request 5 new properties:
- ngoko variations (see ngoko (Q12500634))
- madya variations (see madya (Q13091955))
- krama variations (see krama (Q12492493))
- krama inggil variations (see krama inggil word (Q16893583))
- krama andhap variations (see krama andhap word (Q66724909))
The first and foremost reasoning is that most Javanese dictionaries (monolingual, bilingual jv-id, jv-en, jv-nl) separate Javanese lexemes into mainly these 5 registers and link to their counterparts seamlessly. Secondly, the current available property (synonym (P5973)) doesn't fit our need for specific-linking from one lexeme to another - besides, synonymy in Javanese is called dasanama (lit. ten names), instead of register (Jv: unggah-ungguh) - and in the future I believe using these 5 new properties would make it much easier to "transform" words, phrases, sentences from one register to another (e.g. via WikiFunctions or other tools).
I've given in the form above two new examples:
- mountain: gunung/ꦒꦸꦤꦸꦁ (L680638) (ngoko), redi/rêdi/ꦉꦢꦶ (L45622) (krama)
- L680638-S1, instead of having property "synonym: L45622-S1", should instead have property "krama variations: L45622-S1"
- Likewise L45622-S1, instead of having property "synonym: L680638-S1", should instead have property "ngoko variations: L680638-S1"
- both lexemes could have the following synonyms: ancala, indra, endra, ancala, ardi; ardya, arga, asalingga, awukir, aldaka, hyang parwata, imandri, himawan, himawat, nala, cala, dri, tambana, wanawasa, wukir, wukira, parsa of parswa, parasu, parswa = paraswa, praswa, parwaka, par(of pwar)wata, prawata, parja, pradesa, pra(of prê)bata, par(of pêr)bata, par(of pêr)bwata, par(of pêr)byata, padaka, jambangan, mahahimawan, mahendra, mèru, malaya, gana, gunungan, giri, gori, girindra, girinata, gorata, giriwara, gêgêr, basulingga, byata, ngasrama. These all mean "mountain" in the Javanese language.
- head: êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183) (ngoko), sirah/ꦱꦶꦫꦃ (L999025) (krama), mastaka/ꦩꦱ꧀ꦠꦏ (L413863) (krama inggil)
- êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183-S1) and S3 (ngoko), should have "krama variations: L999025-S1" only, while
- êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183-S2) (ngoko), should have "krama variations: L999025-S1", and "krama inggil variations: L413863-S1", while
- sirah/ꦱꦶꦫꦃ (L999025-S1) (krama), should have "ngoko variations: L413183-S1, S2, S3", and "krama inggil variations: L413863-S1", and
- sirah/ꦱꦶꦫꦃ (L999025-S2) (ngoko and krama), has no other variations
- mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1) (krama inggil) should have "ngoko variations: L413183-S2", and "krama variations: L999025-S1"
- mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S2) and S3 (ngoko and krama), has no other variations
- all three lexemes could have the following synonyms: utamăngga, hulu, cêngêl, rajawèni, katumangga, katumăngga, kapala, kumba, têndhas, swa, sidhira, pasuhunan, murda, mukyana. All of them mean "head".
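The sense-level links listed in the "head" example above can be sketched as a small lookup table. This is only an illustration in Python and not Wikidata's data model: the dictionary layout and the function name `variants` are hypothetical, while the sense IDs and register names are taken from the example (with "S3" read as L413183-S3).

```python
# Hypothetical in-memory model of the proposed per-register links.
# Keys are sense IDs from the "head" example; each value maps a register
# name to the sense IDs that are its paired variants in that register.
REGISTER_LINKS = {
    "L413183-S1": {"krama": ["L999025-S1"]},
    "L413183-S2": {"krama": ["L999025-S1"], "krama inggil": ["L413863-S1"]},
    "L413183-S3": {"krama": ["L999025-S1"]},
    "L999025-S1": {
        "ngoko": ["L413183-S1", "L413183-S2", "L413183-S3"],
        "krama inggil": ["L413863-S1"],
    },
    "L413863-S1": {"ngoko": ["L413183-S2"], "krama": ["L999025-S1"]},
}


def variants(sense_id, register):
    """Return the paired variants of a sense in the given register.

    Senses without an entry (for example L999025-S2, which the proposal
    says has no other variations) yield an empty list.
    """
    return list(REGISTER_LINKS.get(sense_id, {}).get(register, []))


if __name__ == "__main__":
    print(variants("L413183-S2", "krama"))
    print(variants("L413863-S1", "ngoko"))
    print(variants("L999025-S2", "ngoko"))
```

The lookup is keyed by the specific sense rather than by register alone, which is the distinction the proposal draws between "has the krama register" and "is the krama for this sense".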
Discussion
- Support Thersetya2021 (talk) 14:39, 13 August 2024 (UTC)
- Support Empat Tilda (talk) 01:20, 14 August 2024 (UTC)
- Support Alfiyah Rizzy Afdiquni (talk) 04:41, 14 August 2024 (UTC)
- Comment What's wrong with using something like language style (P6191) or variety of lexeme, form or sense (P7481) for this purpose? (Korean suffixes currently mark the register in which they are used with the former of these properties.) Mahir256 (talk) 16:53, 14 August 2024 (UTC)
- I don't think you get what I mean, so I am going to give another example later. Meanwhile could you give the link for said Korean suffixes, and preferably lexemes? Bennylin (talk) 12:03, 18 August 2024 (UTC)
- @Bennylin: There are a number of registers used in Korean, such as hasoseo-che (Q115744995), hapsyo-che (Q115744896), haeyo-che (Q115744904), and hae-che (Q115744915), where each is named for the verb meaning 'to do' in that language with the appropriate suffix used for indicative sentences in that language. The interrogative suffixes 나이까 (L749506), ᆸ니까 (L749614), and ᆯ까 (L1346003), to give examples of specific lexemes, have the same meaning(s) but differ only in the register used. More generally, though, it is not clear from this proposal why register differences between vocabulary items (especially register differences within a single language) should be treated differently from other stylistic differences between words in other languages with the same meaning (and indeed, the property 'language style', usable with a lot of language styles broadly construed, has at least five aliases containing the word 'register' in it) when an application (such as Ninai/Udiron and its deployment as Elemwala) can filter for senses in a language with particular language styles without requiring specialized links for them. Mahir256 (talk) 21:48, 20 August 2024 (UTC)
- Give a simple query each for these questions:
- What is the krama (Q12492493) for êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183-S2)?
- What is the krama inggil word (Q16893583) for sirah/ꦱꦶꦫꦃ (L999025-S1)?
- What is the ngoko (Q12500634) and krama (Q12492493) for mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1)?
- Bennylin (talk) 10:35, 22 August 2024 (UTC)
- They're incorrect
- The krama variant for êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183-S2) is only one: sirah/ꦱꦶꦫꦃ (L999025-S1). The rest of them, while they have the krama register, are not the krama _for_ êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183-S2).
- The ngoko variant for mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1) is only one: êndhas/ꦲꦼꦤ꧀ꦝꦱ꧀/endas/endhas (L413183-S2). The rest of them, while they have the ngoko register, are not the ngoko _for_ mastaka/ꦩꦱ꧀ꦠꦏ (L413863-S1).
- So, you see, many synonyms of endhas/sirah/mastaka (head) have the register ngoko, krama, or both, but none of them are paired as _the_ register variant to the triplet endhas/sirah/mastaka. Therefore we need dedicated properties to store these values. Most have one-to-one relations, while some rarely have one-to-two or two-to-one, but never one-to-many. Bennylin (talk) 11:04, 23 August 2024 (UTC)
- @Mahir256, would you like to give your opinion? Regards, ZI Jony (Talk) 18:31, 16 September 2024 (UTC)
- @Mahir256, would you like to give your opinion based on the response? Regards, ZI Jony (Talk) 08:54, 10 October 2024 (UTC)
- I am going to Oppose only because this property (these properties?) is specific to one language and has not as clearly been demonstrated to be useful (and distinct from language style (P6191)) for other languages with similar phenomena. While I don't claim to intuitively understand the system described better than the speakers that use it, from the links provided I'm not as convinced that the alignments are as rigid as claimed by the last response from the property's proposer (except perhaps for the Wiktionary table, although this was created/edited only by this property's proposer and has no external references provided on it); indeed, section 5.6 and 5.7 of the Horne dictionary's front matter suggests that there is fluidity in these correspondences. (There is also no need to add 'synonym' relationships between every pair of possible synonyms if the senses can be linked via item for this sense (P5137) to the same item.) If indeed there is a stronger correspondence between e.g. 'endhas' and 'sirah' compared to with e.g. 'utamangga' and 'pasuhunan' (despite the same Wikidata item being applicable to both), and if this correspondence may be substantiated with a more precise source, then this could perhaps be indicated using the synonym property only between the senses involved in that stronger correspondence qualified with 'language style' pointing to the register in question: 'endhas' 'synonym' 'sirah' ('language style' 'kromo'). Mahir256 (talk) 02:29, 11 October 2024 (UTC)
FVDP Vietnamese dictionary ID
Description | entry for a lexeme in the Free Vietnamese Dictionary Project’s monolingual Vietnamese dictionary |
---|---|
Represents | FVDP Vietnamese dictionary (Q130812916) |
Data type | External identifier |
Example 1 | xanh (L705061) 573415 |
Example 2 | gặp (L1011653) 104184 |
Example 3 | chuột (L1360864) 61855 |
Formatter URL | https://www.informatik.uni-leipzig.de/~duc/TD/td/index.php?bpos=$1&db=vv |
Motivation
This property is proposed for use as a reference to link to Vietnamese lexemes. -عُثمان (talk) 22:26, 3 November 2024 (UTC)
Discussion
- Support Mahir256 (talk) 22:44, 3 November 2024 (UTC)
Oppose I'm hesitant about claiming this ID means anything more than a very specific way to form a particular URL. I would support an external reference property about each of the FVDP dictionaries, but I think this property as proposed should be limited to qualifiers.
The Free Vietnamese Dictionary Project (FVDP) consists of a DICT (Q977872) Web server and desktop client, software to generate compatible dictionaries, and a collection of precompiled dictionaries compiled by a long-gone group of volunteers. [1] The software is licensed as open source, but I have no idea where to find the source code anymore. The provided dictionaries are available in two formats: StarDict Info (Q105858121) (which can be used with any compatible client and server) and a custom format specific to this client and server. [2]
This proposal relies on the FVDP Web server's bpos URL query parameter, which indicates the byte offset of the entry within the dictionary's index file (in which each entry is listed alphabetically, separated by 8 bytes). Specifically, it assumes the "DE1" server, one of two DICT servers that the author Hồ Ngọc Đức (Q102291268) runs out of the University of Leipzig. If you plug the same byte offset into a different server, it will likely return a different entry. For example, xanh (L705061) is 573415 on "DE1" but 573297 on "DE3" (which is currently malfunctioning). Another popular instance ("US2") no longer exposes bpos at all.
As I understand it, the purpose of this property is to durably link to an entry in the dictionary from a lexeme that inherently pertains to a specific word. The differences in byte offsets between servers illustrate that this is not an inherent property of a dictionary entry. The offsets have changed over time for a variety of reasons, such as adding more "00" front-matter entries and deleting duplicate entries. Moreover, the byte offset doesn't seem to be useful for offline distributions of this content. I think a primary external reference should follow wikt:Template:R:FVDP and its translations, which set the word parameter to the word itself. This would be a good way to indicate that the dictionary spells hóa/hoá differently in hóa đơn versus hoá nhi for no particular reason. – Minh Nguyễn 💬 18:19, 10 November 2024 (UTC)
- @عُثمان: The University of Leipzig recently reorganized their website, and I can no longer find Hồ Ngọc Đức's page or the original FVDP instance there. I think they took it offline at the beginning of this year. If so, then this property as proposed is no longer usable. – Minh Nguyễn 💬 01:40, 9 February 2025 (UTC)
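To make the concern in this discussion concrete, here is a minimal Python sketch of how the proposed formatter URL turns a stored ID into a link, and how the stored value is simply the bpos byte offset. The function names `expand` and `bpos_of` are hypothetical, and the sketch only builds and parses strings; it makes no network request and says nothing about whether the server is still reachable.

```python
from urllib.parse import parse_qs, urlsplit

# Formatter URL from the proposal; "$1" is the placeholder for the stored ID.
FORMATTER_URL = (
    "https://www.informatik.uni-leipzig.de/~duc/TD/td/index.php?bpos=$1&db=vv"
)

# Byte offsets reported in the discussion for xanh (L705061) on two servers.
XANH_OFFSETS = {"DE1": "573415", "DE3": "573297"}


def expand(formatter_url, identifier):
    """Substitute the identifier for the "$1" placeholder."""
    return formatter_url.replace("$1", identifier)


def bpos_of(url):
    """Read the bpos query parameter back out of an expanded URL."""
    return parse_qs(urlsplit(url).query)["bpos"][0]


if __name__ == "__main__":
    url = expand(FORMATTER_URL, XANH_OFFSETS["DE1"])
    print(url)
    print(bpos_of(url))
    # The same word has a different offset on another server, so the stored
    # value identifies a position in one server's index, not the entry itself.
    print(XANH_OFFSETS["DE1"] != XANH_OFFSETS["DE3"])
```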
Spanish-German Dictionary ID
Description | entry for a lexeme in the online Spanish-German dictionary hosted on Termania |
---|---|
Represents | Spanish-German dictionary (Q131078154) |
Data type | External identifier |
Example 1 | renuevo (L1324225) 6741728 |
Example 2 | florete (L1401239) 6731457 |
Example 3 | cuchilla (L1401236) 6728643 |
Number of IDs in source | 21,354 |
Formatter URL | https://www.termania.net/slovarji/spanish-german-dictionary/$1/_?ld=106 |
Motivation
This property may be used to add references to the many existing Spanish lexemes, and new ones. -عُثمان (talk) 15:08, 23 December 2024 (UTC)
- This may be of interest to @Hameryko عُثمان (talk) 15:09, 23 December 2024 (UTC)
Discussion
- Support Good for lexemes --Kdkeller (talk) 22:47, 4 February 2025 (UTC)
- Comment There's not much information here - I think it might be better used as a reference for translation statements if we want them? ArthurPSmith (talk) 15:58, 6 February 2025 (UTC)
reverse compound
Description | compound where the parts are reversed |
---|---|
Data type | Lexeme |
Domain | lexeme |
Allowed values | lexeme |
Example 1 | friweekend (L1281917) -> weekendfri (L1409318) |
Example 2 | menneskeabe (L1279633) -> abemenneske (L1153703) |
Example 3 | jægerstenalder (L1295265) -> stenalderjæger (L1298032) |
Source | should be fairly obvious in most cases |
Expected completeness | always incomplete (Q21873886) |
See also | combines lexemes (P5238) |
Motivation
Some compounds that consist of two parts have a parallel lexeme where the parts have been reversed, e.g., menneskeabe and abemenneske. It is a bit of a linguistic oddity and may not be of much use, but could be interesting to capture. I tried to make a SPARQL query to look for them, but it is apparently not that easy. My general query would time out. It is unclear how general the concept is. I can so far only think of Danish ones, but I suspect there might also be some in languages such as Swedish and German. I would say that it should only be used within-language and that interfixes are allowed. Some languages choose to decompose lexemes with more than three parts in different ways, so that could present an ambiguity about what the compound consists of.
The property should be symmetric, i.e., if lexeme A points to lexeme B, then lexeme B should point to lexeme A. Whether a lexeme should link to itself, e.g., for mormor (L43084), I do not know; perhaps not. Lexemes that have the property should always be an instance of compound or one of its subclasses.
I haven't been able to find any description of "reverse compounds" in the academic literature. My ChatGPT queries were not able to point me to a previously used term or source.
— Finn Årup Nielsen (fnielsen) (talk) 13:29, 20 January 2025 (UTC)
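As a rough illustration of the pairing described above, the following Python sketch finds reverse compounds offline from a table of two-part decompositions. Everything about the input is an assumption: the `COMPOUNDS` table is hypothetical, the parts are plain strings standing in for the lexemes that combines lexemes (P5238) would point to, and the decompositions shown are assumed rather than read from Wikidata. It is not the SPARQL query mentioned in the motivation.

```python
# Hypothetical input: lexeme ID -> ordered pair of its two parts.
# Strings stand in for part lexemes; the decompositions are assumed.
COMPOUNDS = {
    "L1279633": ("menneske", "abe"),  # menneskeabe
    "L1153703": ("abe", "menneske"),  # abemenneske
    "L1281917": ("fri", "weekend"),   # friweekend
    "L1409318": ("weekend", "fri"),   # weekendfri
    "L43084": ("mor", "mor"),         # mormor, reversing gives itself
}


def reverse_pairs(compounds):
    """Return unordered pairs of lexemes whose two parts are swapped.

    Each pair is a frozenset, so the result is symmetric by construction:
    A paired with B is the same pair as B paired with A. A lexeme is never
    paired with itself, so mormor produces no pair.
    """
    by_parts = {}
    for lexeme_id, parts in compounds.items():
        by_parts.setdefault(parts, []).append(lexeme_id)

    pairs = set()
    for lexeme_id, (first, second) in compounds.items():
        for other in by_parts.get((second, first), []):
            if other != lexeme_id:
                pairs.add(frozenset((lexeme_id, other)))
    return pairs


if __name__ == "__main__":
    for pair in sorted(sorted(p) for p in reverse_pairs(COMPOUNDS)):
        print(pair)
```

Indexing by the part tuple keeps the search linear in the number of compounds, which may be one way around the timeout of a general pairwise query, though that is untested against the real data.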
Discussion
- Oppose for lack of a use beyond what can be summed up as "seems neat". Mahir256 (talk) 16:18, 21 January 2025 (UTC)
- Support Looks interesting--Trade (talk) 18:45, 10 March 2025 (UTC)
cognate citation
Description | lexeme with the same etymological origin and where a source supports it. |
---|---|
Data type | Lexeme |
Domain | lexeme |
Example 1 | after (L3217) -> efter (L34829) -> stated in (P248) -> Concise etymological dictionary of the English language (Q131831089) p. 5 |
Example 2 | after (L3217) -> efter (L47263) -> stated in (P248) -> Concise etymological dictionary of the English language (Q131831089) p. 5 |
Example 3 | age (L3239) -> "aiws" (Gothic, not currently available) -> stated in (P248) -> Concise etymological dictionary of the English language (Q131831089) p. 5 |
Example 4 | apple (L3257) -> appel (L447551) -> stated in (P248) -> Concise etymological dictionary of the English language (Q131831089) p. 15 |
Motivation
I have previously suggested a "cognate" property, but this did not go through. The problem with the old proposal was that very many cognates can be established and the information might already be present with derived from lexeme (P5191). This new proposal is also for cognates, but with the added requirement that there should be a source. This should restrict the number of cognates listed. Furthermore, there is the issue where a source states the cognate but not the originating lexeme (needed for derived from lexeme (P5191)), that is, it does not go deep enough to establish appropriate cognates. I find this often occurs with Concise etymological dictionary of the English language (Q131831089). – The preceding unsigned comment was added by Fnielsen (talk • contribs) at 15:31, January 20, 2025 (UTC).
Previous proposal: Wikidata:Property proposal/cognate, ping participants @ArthurPSmith, Tinker Bell, Theklan, ImprovedWikiImprovment:. Cheers, VIGNERON (talk) 18:05, 3 February 2025 (UTC)
Discussion
- Neutral only because there isn't a good enforcement mechanism for sourcing requirements on properties. Mahir256 (talk) 16:20, 21 January 2025 (UTC)
- Support with the caveat that a constraint should be added only allowing this property on lexemes which have no "derived from" or "combines lexeme" statements. It is preferable to link etymologies where possible, but in some cases such as these word groups: capp vs. chapp we can only say they are related but we do not know how they are related. -عُثمان (talk) 23:22, 28 January 2025 (UTC)
- Weak oppose I don't see the real added value compared to derived from lexeme (P5191). If a source gives a pair of cognates without giving the common etymon, it's not a good sign for the quality of said source. In the given examples, it took me 10 seconds to find the common etymon (no surprise there, there are plenty of sources for the etymology of Germanic languages). I totally agree in theory with عُثمان's condition, but in practice this type of constraint is not followed and we just end up with redundant data. Cheers, VIGNERON (talk) 18:02, 3 February 2025 (UTC)
Peh-oe-ji
Description | writing system for Taiwanese Hokkien (Q36778) or other Southern Min (Q36495) language varieties in Fujian and South East Asia. |
---|---|
Represents | Pe̍h-ōe-jī (Q559173) |
Data type | String |
Template parameter | Any title or name in a Taiwanese subject, including romanized name for Taiwan place names. |
Domain | Any Taiwanese text. Geography and biography. Lexemes and forms. Should be tagged with nan-Latn-pehoeji . |
Allowed values | Any text. |
Example 1 | Louise Hsiao (Q700752) -> Siau Bí-khîm |
Example 2 | Lai Ching-te (Q3847080) -> Lōa Chheng-tek |
Example 3 | Taipei (Q1867) -> Tâi-pak-chhī |
Example 4 | Taiwan (Q22828636) -> Tâi-oân |
Example 5 | National Cheng Kung University (Q706708) -> Kok-li̍p Sêng-kong Tāi-ha̍k |
Example 6 | Yushan Main Peak (Q500275) -> Gio̍k-san |
Planned use | Marking Taiwanese Taigi place names in Taiwan, people from Taiwan, or published works. |
Motivation
Hanyu Pinyin has been added to the name of Taiwan's current Vice President Siau Bí-khîm, but she was actually a Taiwanese Taigi user from her father's side; Mandarin was not her mother tongue, and adding Hanyu Pinyin is quite strange. Pe̍h-ōe-jī is one of the writing systems for Taiwanese Hokkien (Q36778) or other Southern Min (Q36495) language varieties in Fujian and South East Asia. It was invented by missionary in 185221) and spread to Xiamen (Q68744) and Taiwan (Q865). Taiwanese Hokkien (Q36778) was long treated as an oral language, with few official documents or published works. But in recent years language revitalization in Taiwan has been an issue, and adding a new property to label Taiwanese Taigi in Pehoeji on old documents or newly published works is important. We will also propose Taiwanese Taigi Romanization System (Q56929) for labeling the Taiwan government-supported published works.
Notified participants of WikiProject Taiwan
Supaplex (talk) 16:26, 13 February 2025 (UTC)
Discussion
- Support. --TongcyDai ฅ • ω • ฅ 16:30, 13 February 2025 (UTC)
- Comment Why not simply add Peh-oe-ji labels to items (and similarly with Tailo)? Mahir256 (talk) 17:37, 13 February 2025 (UTC)
- As with Louise Hsiao (Q700752), where adding Hanyu Pinyin makes little sense, this example shows the complexity of Hanji, which has many different romanization or transliteration systems; in Hakka Sixian dialect (Q9668261) the name is seuˊ miˊ kimˇ. If Hanyu Pinyin exists as a property, why not Peh-oe-ji? This new property will help record names more accurately. Supaplex (talk) 04:56, 14 February 2025 (UTC)
- I guess the reason Hanyu Pinyin exists as a property is simply because we currently can't add labels, descriptions, aliases, and monolingual text in the corresponding language tag. -- Winston Sung (talk) 13:14, 16 February 2025 (UTC)
- Support--S8321414 (talk) 06:19, 14 February 2025 (UTC)
- Is the idea for this property to be used in the same way as Hanyu Pinyin transliteration (P1721)? The English label should probably then be "Peh-oe-ji transliteration"? In regard to Mahir256's comment - is the point that this is not how the language is written, but how it is transliterated to latin characters? ArthurPSmith (talk) 18:30, 18 February 2025 (UTC)
- Comment: As per Supaplex, the item Louise Hsiao (Q700752) explains the conundrum. POJ was developed to record Hokkien Southern Min in a systematic fashion, similar to Chữ Nôm (字喃) and Vietnamese alphabet/ Chữ Quốc ngữ (𡨸國語) on writing the Vietnamese language, also Alfabetul limbii române (Romanian alphabet) and Алфабетул молдовенеск (Moldovan Cyrillic alphabet). --Assanges (talk) 10:24, 24 February 2025 (UTC)
A Dictionary of Public Health entry ID
Description | identifier for an entry in online second edition of A Dictionary of Public Health |
---|---|
Represents | A Dictionary of Public Health (2nd ed.) (Q133990319) |
Data type | External identifier |
Domain | lexeme, item |
Allowed values | [1-9][0-9]* |
Example 1 | abandonment (L228102) --> 1 |
Example 2 | abortion (L13597) --> 7 |
Example 3 | dyslexia (L31979) --> 1215 |
Example 4 | echinococcosis (L1451241) --> 1225 |
Example 5 | isoniazid (L1451242) --> 2414 |
Example 6 | jet lag/jetlag (L1451243) --> 2426 |
Example 7 | lymphocyte (L227262) --> 2650 |
Example 8 | abortion (Q8452) --> 7 |
Example 9 | airbag (Q99905) --> 110 |
Example 10 | Earth Summit (Q751149) --> 1221 |
Example 11 | eclampsia (Q552348) --> 1226 |
Example 12 | cell-mediated immunity (Q189146) --> 645 |
Example 13 | Edward Jenner (Q40852) --> 2425 |
Example 14 | dipping tobacco (Q1114994) --> 5007 |
Example 15 | threshold dose (Q108378037) --> 4444 |
Example 16 | U.S. Consumer Product Safety Commission (Q2995379) --> 888 |
Source | https://www.oxfordreference.com/display/10.1093/acref/9780191844386.001.0001/acref-9780191844386 |
Planned use | adding to newly created lexemes or lexemes being edited or to new items or items being edited |
Number of IDs in source | over 5,000 (Cf. https://www.oxfordreference.com/display/10.1093/acref/9780191844386.001.0001/acref-9780191844386) |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | https://www.oxfordreference.com/display/10.1093/acref/9780191844386.001.0001/acref-9780191844386-e-$1 |
Applicable "stated in"-value | A Dictionary of Public Health (2nd ed.) (Q133990319) |
Distinct-values constraint | no |
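As a rough illustration of how the Allowed values pattern and the Formatter URL in the table above would work together, the following Python sketch validates a candidate ID and expands it into an entry URL. The function name `entry_url` and the constant names are assumptions made for this example; only the regular expression and the URL template come from the proposal.

```python
import re

# Both values are copied from the proposal table above.
ALLOWED_VALUES = re.compile(r"[1-9][0-9]*")
FORMATTER_URL = (
    "https://www.oxfordreference.com/display/10.1093/"
    "acref/9780191844386.001.0001/acref-9780191844386-e-$1"
)


def entry_url(entry_id: str) -> str:
    """Return the entry URL for entry_id (hypothetical helper name)."""
    # fullmatch requires the whole string to fit the pattern,
    # so leading zeros, signs, and stray whitespace are rejected.
    if ALLOWED_VALUES.fullmatch(entry_id) is None:
        raise ValueError(f"not a valid entry ID: {entry_id!r}")
    # "$1" is the placeholder that the formatter URL reserves for the ID.
    return FORMATTER_URL.replace("$1", entry_id)
```

With the values from the table, `entry_url("7")` gives a URL ending in `-e-7`, while an input such as `"007"` raises `ValueError` because the pattern does not allow a leading zero.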
Motivation
A Dictionary of Public Health (2nd ed.) (Q133990319), published by Oxford University Press (Q217595), provides definitions of over 5,000 terms used in public health science and practice, including areas such as communicable disease control, epidemiology, genetics, nutrition, toxicology, social work, sanitation and public health engineering, environmental sciences, and administration (https://www.oxfordreference.com/display/10.1093/acref/9780191844386.001.0001/acref-9780191844386). Some entries require a subscription or purchase, but many are open. AdamSeattle (talk) 03:51, 21 April 2025 (UTC)
Discussion
MANDALA Tibetan Living Dictionary ID
Description | entry for a lexeme in the Tibetan Living Dictionary by MANDALA |
---|---|
Represents | Tibetan (Q34271) |
Data type | Lexeme |
Example 1 | ཤེས་རིག་ (L1022935) → 202578 |
Example 2 | ཞི་མི་ (L405263) → 168408 |
Example 3 | ཆུ (L8249) → 62480 |
Formatter URL | https://mandala.library.virginia.edu/terms/$1/overview |
Motivation
The Tibetan Living Dictionary is a dictionary created by MANDALA, a project of the University of Virginia Library. This property is proposed for future use as more Tibetan lexemes are created on Wikidata. – The preceding unsigned comment was added by Unite together (talk • contribs) at 07:17, April 21, 2025 (UTC).
Discussion
Wikibase form
Wikibase sense
prototypical syntactic role of argument
Description | qualifier for has semantic argument (P9971) indicating the most basic/fundamental syntactic position of that argument for that verb sense (that is, when the argument structure is not subject to any alternations) |
---|---|
Data type | Item |
Domain | senses on verb lexemes |
Allowed values | any item indicating a syntactic position (subclasses of linguistic unit (Q11953984), but this may be too general?) |
Example 1 | (gather (L1163-S1) has semantic argument (P9971) gatherer (Q128357561)) → subject (Q164573) |
Example 2 | (liquefy (L332143-S1) has semantic argument (P9971) entity being liquefied (Q127789399)) → direct object (Q2990574) |
Example 3 | (accept (L5421-S2) has semantic argument (P9971) source of accepted entity (Q126380671)) → source location complement (Q3685157) |
Planned use | replace object of statement has role (P3831) qualifying has semantic argument (P9971) to use this property as a qualifier instead |
See also | predicate for (P9970), has semantic argument (P9971) |
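The planned use above, moving the syntactic role from object of statement has role (P3831) to the proposed qualifier, can be pictured with the Python sketch below. It is only an illustration under stated assumptions: the statement is a simplified dictionary rather than real Wikibase JSON, the function name is made up, the "before" state of the example is assumed, and `P_NEW` is a placeholder because the proposed property has no ID yet.

```python
from copy import deepcopy

HAS_SEMANTIC_ARGUMENT = "P9971"
OBJECT_HAS_ROLE = "P3831"
PROPOSED_QUALIFIER = "P_NEW"  # placeholder: the proposed property has no ID yet


def migrate_role_qualifier(statement: dict) -> dict:
    """Return a copy of statement with P3831 qualifiers moved to the placeholder."""
    result = deepcopy(statement)
    # Only "has semantic argument" statements are affected.
    if result.get("property") != HAS_SEMANTIC_ARGUMENT:
        return result
    qualifiers = result.setdefault("qualifiers", {})
    roles = qualifiers.pop(OBJECT_HAS_ROLE, [])
    if roles:
        qualifiers.setdefault(PROPOSED_QUALIFIER, []).extend(roles)
    return result


# Example 2 from the table, in an assumed simplified form.
before = {
    "subject": "L332143-S1",                # liquefy, sense 1
    "property": "P9971",                    # has semantic argument
    "value": "Q127789399",                  # entity being liquefied
    "qualifiers": {"P3831": ["Q2990574"]},  # direct object
}
after = migrate_role_qualifier(before)
```

The function returns a new dictionary and leaves its input untouched, so the old and new modeling can be compared side by side.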
Motivation
This property is intended as a substitute for object of statement has role (P3831) on the semantic arguments of verb lexemes, as it helps to clarify that the particular syntactic position taken by a semantic argument is not the only such position that can be taken.
Verb predicates are planned to be modeled so that they can be reasoned about in terms of the roles played by their arguments, rather than by any particular syntactic position they may take in a sentence. Instead of talking about 'liquefaction' as involving a 'subject' and a 'direct object', or an 'agent' and a 'patient', it is thought of as involving a 'liquifier' and an 'entity being liquefied'. This has the advantage that someone wanting to model instances of liquefaction (whether in Wikidata items or, in the future, in Abstract Wikipedia content) need not worry about the linguistic question of whether the 'entity being liquefied' is a theme or patient or whatever.
It is certainly possible to tie these roles to the syntactic positions they take with respect to the specific verb 'liquefy' in English by using object of statement has role (P3831), but this may imply to the viewer that, for example, when using the verb 'liquefy' in English the 'liquifier' is always the subject and the 'entity being liquefied' is always the direct object. In fact, for many verbs across languages there are documented variations in argument structure, and resources like ValPal and the Unified Verb Index record such alternations for a large number of verbs.
As an example in English, one of these is the Instrumental Subject alternation, which turns a phrase like "John liquefied the tomatoes with a blender" into "A blender liquefied the tomatoes". In the first sentence, 'John' is the liquifier (an agent (Q392648)), 'tomatoes' is the entity being liquefied (a patient (Q170212)), and 'blender' is a tool used in the liquefaction (an instrument (Q6535309)). In the second, the three nouns still retain their semantic argument roles (the blender did not somehow gain the kind of conscious awareness expected of an agent) but appear in the sentence in different places or are removed completely (the sentence now begins by mentioning the blender instead of John, and the question of who/what is controlling the blender is now unstated).
These variations, however, imply that there is some 'basic' or 'prototypical' syntactic arrangement of arguments that is being modified by that alternation, and this proposal serves to indicate that 'basic' or 'prototypical' syntactic arrangement. The proposal here is independent of the particular means of recording applicable alternations of a lexeme sense's argument structure, which is to be determined later. – The preceding unsigned comment was added by Mahir256 (talk • contribs) at 21:57, 24 October 2024 (UTC).
Discussion
- Comment I'm afraid I find this proposal confusing; can you explain it a bit better? How is a liquefied entity the direct object of "liquefy", for example? Wouldn't the direct object be the entity before it was liquefied? ArthurPSmith (talk) 21:30, 4 November 2024 (UTC)
- @ArthurPSmith: Sorry about the confusion; I added a more in-depth explanation above. (Also the label for the role taken by the direct object of 'liquefy' was not precise enough and has been adjusted.) Mahir256 (talk) 04:10, 12 November 2024 (UTC)
- For a property to work well it has to be understood by reading the property description. If it's necessary to read the motivation part to understand what a property does, there's a good chance that problems will arise in the practice of using the property. ChristianKl ❪✉❫ 14:06, 13 November 2024 (UTC)
- Comment I would like to give this more thought, but I think this proposal could use some more clarity. I will note that conscious awareness or animacy are not prerequisites for agency. Take this Hindustani sentence (source) for example:
- مشین نے ہمیں ٹکٹ دے دیا
- मशीन ने हमें टिकट दे दिया
“Ticket got given to us by machine,” roughly translated, where the machine is marked as an oblique agent with the ergative postposition, the personal pronoun is in the dative case as a recipient of the ticket, and the ticket is the subject of the sentence. Their roles are the giver, recipient, and given entity respectively. The ergative construction emphasizes the fact that we/us do not have agency; our receipt of the ticket is dependent on whether or not the machine produces it. Similarly to what the liquefy sentences above are intended to exemplify, we can alter the sentence to exclude the agent:
- ہمیں ٹکٹ دے گیا
- हमें टिकट दे गिया
However, the syntactic roles are exactly the same and the alternation is external to the lexeme for the main verb. (Hindustani generally is a language that I think would not have reason to use this property at all if implemented as proposed.)
Considering liquefy again, we can construct a sentence like:
- An igneous intrusion liquefied its surroundings with molten rock.
Here an inanimate agent is used with an instrument; by comparison, it is possible to see how the blender too can be considered as filling the agent slot (in some cases). ValPal gives an interesting example for the instrumental subject alternation in “fill” which is less ambiguous:
- Water filled the cup.
Unlike a sentence such as, “raindrops hit the ground,” the water is still an instrument in the above sentence. The alternation effectively emphasizes that the water can be used to complete the action. We can make a case for blender as instrument rather than agent by comparing sentences which emphasize the fact that the blender can be used to complete the action of liquefying:
- The blender liquefies food.
- John liquefies food with his blender.
In the first sentence, it is implied that the blender is able to liquefy food (to completion), and in the second the action is given a more habitual connotation as ascribed to its agent subject.
All that is to say, it is not clear at the moment how to demonstrate which syntactic roles can be considered prototypical, whether an alternation requires those roles to change for the arguments or is expressed through other means, and how to model the relationship of alternations to other features of the predicate sense such as telicity. -عُثمان (talk) 04:00, 14 November 2024 (UTC)