Wikidata talk:Lexicographical data/Archive/2020/01

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Grammatical features for mutation

Hi,

After thinking about it for a long time, I finally created items for mutated forms (Q77663874, Q77667037, Q77667126, Q77667305; general items not specific to Breton, so they can be re-used for other languages with mutations) and added them as grammatical features on ki (L69).

What do y'all think? Are there any questions, comments, remarks, etc. (before I deploy this structure on all Breton lexemes)?

Cheers, VIGNERON (talk) 10:51, 7 December 2019 (UTC)

I'm not sure of the context here, are these "grammatical" features exactly? Or are they just variants in the way the words are spelled/pronounced from place to place? ArthurPSmith (talk) 19:35, 9 December 2019 (UTC)
@ArthurPSmith: good point, that was actually one of my questions. It's a feature for sure and it's kind of grammatical (mostly, and depending on the type of mutation), so I thought it was close enough to be stored as a grammatical feature. If it's not a grammatical feature, what is it and where should it go? Cdlt, VIGNERON (talk) 18:37, 10 December 2019 (UTC)
@VIGNERON: I could see a new property or properties (unless they exist already) for this, something like
⟨ subject ⟩ antecedent grammatical context ⟨ object or value ⟩
⟨ subject ⟩ succedent grammatical context ⟨ object or value ⟩
(not using present participles (“preceding”) in the property labels so as to avoid confusion about the directionality of the property; and instead of “grammatical” one might be somewhat more specific and perhaps say “phonomorphologic”). Then you could make statements like
⟨ ma (Breton possessive adjective) ⟩ succedent grammatical context ⟨ spirant-mutating ⟩
and
⟨ L69-F7 ⟩ antecedent grammatical context ⟨ spirant-mutating ⟩
⟨ L69-F10 ⟩ antecedent grammatical context ⟨ soft-mutating ⟩
so that building phrases becomes a question of matching succedent contexts of one element with antecedent contexts of the next element: ma c'hi, not ma giez (since the contexts match only for L69-F7).
This could also work for French:
⟨ L2379-F1 ⟩ succedent grammatical context ⟨ non-liaising ⟩
⟨ L2379-F2 ⟩ succedent grammatical context ⟨ liaising ⟩
(see de (L2379)) and
⟨ L9265-F1 ⟩ antecedent grammatical context ⟨ liaising ⟩ (see homme (L9265))
⟨ L21306-F1 ⟩ antecedent grammatical context ⟨ non-liaising ⟩ (see honte (L21306))
so that you’d get de + homme → d’homme (“of man/human”) but de + honte → de honte (“of shame”).
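The matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two proposed properties do not exist, so they are modelled here as two plain dictionaries, the context labels are plain strings, and the names `SUCCEDENT`, `ANTECEDENT`, `can_follow` and `matching_forms` are hypothetical. The data are just the statements quoted above; a form with no recorded context is treated as non-matching, which a real model would have to handle more carefully.

```python
# Hypothetical stand-ins for the two proposed properties.
# Keys are the form IDs (or the lemma "ma") quoted in the discussion.
SUCCEDENT = {
    "ma": "spirant-mutating",    # Breton possessive adjective
    "L2379-F1": "non-liaising",  # French de (L2379), form F1
    "L2379-F2": "liaising",      # French de (L2379), form F2
}
ANTECEDENT = {
    "L69-F7": "spirant-mutating",
    "L69-F10": "soft-mutating",
    "L9265-F1": "liaising",       # homme (L9265)
    "L21306-F1": "non-liaising",  # honte (L21306)
}


def can_follow(first: str, second: str) -> bool:
    """True if the context imposed by `first` equals the one `second` requires."""
    imposed = SUCCEDENT.get(first)
    required = ANTECEDENT.get(second)
    return imposed is not None and imposed == required


def matching_forms(first: str, candidates: list[str]) -> list[str]:
    """Keep only the candidate forms whose antecedent context matches `first`."""
    return [form for form in candidates if can_follow(first, form)]


print(matching_forms("ma", ["L69-F7", "L69-F10"]))  # ['L69-F7']
print(can_follow("L2379-F2", "L9265-F1"))           # True
print(can_follow("L2379-F1", "L9265-F1"))           # False
```

A lookup like this only covers the case where the preceding element alone determines the mutation.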
(Pinging @Lea Lacroix (WMDE) to check on (grammatical understanding of) my French.)
In general, see also: w:en:Sandhi
BlaueBlüte (talk) 04:04, 11 December 2019 (UTC)
@BlaueBlüte: very interesting, it sounds more consistent. I need to think about that (at the very least, with that approach I don't need 4 but 8+ items for specific mutations in Breton, which would be subclasses of these 4 general mutations; and I also need to look at the sources to see if it's possible, especially for cases like ar (L35217) where the gender also impacts the mutation, qualifiers needed?). Maybe a dumb question, but if we go this way, why not just use follows (P155)/followed by (P156)? PS: mutation in Breton is not sandhi (not really; Breton also has sandhi but it's usually not written: in pemp bloaz, bloaz is unmutated but affected by sandhi and pronounced pemp ploaz; meanwhile, in ho ploaz, bloaz has a hard mutation that is always written). PPS: your examples in French are good (beau (L7026) F1 and F5 could be a good example too). Cdlt, VIGNERON (talk) 09:14, 11 December 2019 (UTC)
@BlaueBlüte, ArthurPSmith: I thought more about it and read some Breton grammars. I found a lot of examples where the mutation does not only/directly depend on the preceding word. It may also depend on the gender (an hini glas and an hini c'hlas, "the blue one": the mutation depends on the gender of the replaced word), the number, the syntax (un dra vat, "a good thing", but un dra mat-kenan, "a very good thing"), or the context (plasenn Kemper, in general, a square among any squares in Quimper (Q342), vs. plasen Gemper, a specific square named after Quimper (Q342)). And even without that, all words are affected by mutation; I'm not sure we can store that much data. So even if the antecedent/succedent idea is very interesting, I'm not sure how to deal with it (I still keep the idea in mind, as some words indeed « cause specific mutations afterward »). If there is no objection, I will go with the option of storing them as grammatical features, as on ki (L69). Maybe not the best solution, but the easiest and most logical. Cheers, VIGNERON (talk) 12:22, 21 December 2019 (UTC)
@VIGNERON: Not sure how you’re counting 8+ items with the anteceding–succeeding split-property approach. Maybe even on the contrary, 8 items might be needed when using grammatical features. One reason to propose two properties rather than just using the single property grammatical features was that a single property cannot distinguish between cause and effect, or anteceding and succeeding: If a form has the grammatical feature ‘spirant mutation’, does that mean that the form is affected by it, or causing it in other words? So maybe each of the 4 mutation variants needs 2 items: Q77667037 and “causing soft mutation”, Q77667126 and “causing hard mutation”, … However, I don’t have any objection to starting with a ‘grammatical-feature’ approach and expanding or changing course later if needed.
Regarding the dependence on gender and number, I don’t imagine that mutation (whether modeled as two new properties or as ‘grammatical features’) would be the only grammatical feature of the forms; there would still be grammatical features for gender and number etc. But it may be necessary to make “causes … mutation” statements on the form level rather than the lexeme level if the mutation that is caused depends on the gender and number.
And as for plasenn Kemper vs. plasen Gemper, can that perhaps be modeled as case (Q128234) (something like locative case (Q202142) vs. accusative case (Q146078)) and/or definiteness (Q1182686)?
BlaueBlüte (talk) 08:57, 26 December 2019 (UTC)
@BlaueBlüte: there are 3 "types" of mutation (Q77667037, Q77667126, Q77667305 + Q77663874 and Q79844370) but 4 "groups" of mutation, which makes 8 "cases" of mutation. These cases are often designated by numbers: 0, 1, 1a, 1b, 2, 3, 4, 5 (see for instance this page in French, which explains it; there is also a 6 in some grammars for nasalisation, but it only affects one single word in modern Breton: dor (L229826)); 0 is Q77663874, 1, 1a and 1b are Q77667037, 2 and 5 are Q77667305, and 4 is both Q77667037 and Q77667126 at the same time (and only affects verbs).
The type is super easy to find (it's morphological). "c'hi", "c'hlas" or "c'hallan" are obviously forms affected by soft mutation, and it is fairly easy to know they are in the second group. But it could be case 1, 1a, 1b or 4; it's impossible to know without the rest of the sentence (actually, in this case, I know that only "c'hallan" is a verb, so it has to be mutation 4, while "c'hi" and "c'hlas" are a noun and an adjective, so they can't be 4; they may be 1, 1a or 1b).
That said, I'm not a grammarian, and mutations are notably a hard subject (and on top of that, different dialects can have slightly different mutations). Maybe the best would be for someone to try it on an actual lexeme, as I'm not sure I fully see your suggested anteceding–succeeding approach.
PS: I see on Polainn (L42281) that Jimregan tried an approach similar to mine for Irish (Q9142).
Cheers, VIGNERON (talk) 12:08, 26 December 2019 (UTC)
@VIGNERON: I think it would be useful to have distinct items (‘grammatical features’) for all 8 ‘cases’. For example, based on the table in Les mutations consonantiques, in the case of a mutable word beginning with d, its forms would appear to differ depending on whether the grammatical context demands a Q77667037 (case 1) or a ‘form affected by case-1a mutation’. Therefore, grouping case 1 and case 1a together into a single item Q77667037 would lead to underdetermined forms; and similarly for all other cases. BlaueBlüte (talk) 01:34, 31 December 2019 (UTC)
Mhhh, that sounds interesting, thanks BlaueBlüte. I'm still not sure I understand your idea. I'm not sure you are reading this table right (but to be fair, it took me a decade to fully understand mutations, so I can relate); here 1, 1a and 1b are all cases of the group/type/class/whatever Q77667037. Could you try to test and apply your idea on a lexeme? (deiz (L2786) or dibenn (L3044) for words starting with D). Cheers, VIGNERON (talk) 10:32, 5 January 2020 (UTC)

RTL support

I am trying to make a Hebrew form RTL: בַּיְּלוּ!

The ! keeps going to the right, which is the beginning.

I tried adding & r l m ; but it showed up as text.

How do I solve this? Uziel302 (talk) 19:09, 3 January 2020 (UTC)

For the record, the concerned Lexeme is בייל/בִּיֵּל (L205750) and the form L205750-F7.
Indeed this is problematic. I don't see where the problem comes from nor how to solve it; pinging @Amire80, Amir Sarabadani (WMDE): who know a bit about RTL coding.
Cheers, VIGNERON (talk) 09:47, 5 January 2020 (UTC)
It's a bug. Thanks for bringing this up. @Amir Sarabadani (WMDE): already reported something very close at https://phabricator.wikimedia.org/T194311 and I added this example as a comment there.
In practice, you shouldn't use the exclamation mark here anyway. And don't use an rlm of any kind as a workaround. Just write the form as-is. This bug can probably be fixed soon.
(I'd go even further and suggest not to add conjugated forms manually or with a bot at all, but to wait for a proper framework for conjugated forms. It will take some time, but it's worth waiting for it.) --Amir E. Aharoni (talk) 11:34, 5 January 2020 (UTC)
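As background only, and not as a workaround (the advice above still stands), the behaviour of the trailing "!" can be inspected with Python's standard `unicodedata` module. This is a diagnostic sketch: it assumes the misplaced mark is related to the neutral bidirectional class of "!", which the bug report itself does not confirm here. The helper name `bidi_classes` is hypothetical.

```python
import html
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


BET = "\u05d1"  # Hebrew letter bet, the first letter of the form
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# The Hebrew letter is strongly right-to-left ('R'); "!" is a neutral ('ON'),
# so where it is displayed depends on the direction of the surrounding text.
print(bidi_classes(BET + "!"))  # ['R', 'ON']

# "&rlm;" is an HTML entity: only an HTML parser turns it into U+200F.
# A plain-text field keeps the six characters literally.
print(html.unescape("&rlm;") == RLM)   # True
print(unicodedata.bidirectional(RLM))  # R
```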

Merging Q74674702 with adverb (Q380057)

There is the item Q74674702, used on a lot of Basque words, with no English label. I don't speak Basque, but when checking whether a Spanish lexeme I was planning to create would be redundant, I found a word in Basque (which has a lot of Spanish vocabulary) that is marked as being this form. According to Google Translate, its label in Basque ("aditzoin") means "adverbs", and its description ("aspektu markarik gabeko aditz forma") means "unmarked verb form", which is confusing to me. So apparently this means adverb, but maybe not...? Should this be merged, or should it be labeled something like "adverb in Basque" and kept as its own item? DemonDays64 | Talk to me 06:11, 11 January 2020 (UTC) (please ping on reply)

I don't know if it should be merged (which is often a clue that it should not be merged), but the best way to know is to ask the creator: @Theklan:. I found this page www.ehu.eus/seg/morf/5/13/1 and it seems to be closer to a past participle (Q52434448) than an adverb (Q380057). I guess keeping it as its own item is better. Cdlt, VIGNERON (talk) 12:54, 15 January 2020 (UTC)
Thanks @VIGNERON:! @DemonDays64: No, "aditzoin" is not an adverb; it is the short form of the verb that is normally used in some forms (imperative, rhetorical questions, mixed sentences or injected sentences). It doesn't have a translation in other languages. Also, in Basque we have two kinds of adverbs: adverbial phrase (Q12252799) and Basque full adverb (Q12252798). -Theklan (talk) 16:30, 15 January 2020 (UTC)
I see 2 solutions: adding "aditzoin" as the label in all languages, or constructing a circumlocution to translate "aditzoin" in each language. Meanwhile, I added a description in English and French. @Theklan: could you point us to references about this aspect of Basque grammar so we can learn more (and source the item, which is quite empty right now)? Cheers, VIGNERON (talk) 17:19, 15 January 2020 (UTC)
@VIGNERON: The best description of aditzoin is here: https://www.ehu.eus/seg/morf/5/13/1. I don't have good information about the other ones. I will ask for help. -Theklan (talk) 17:26, 15 January 2020 (UTC)

Understanding the relationship between Wikidata Items and Lexemes

Is there a direct connection between the two types of data? More specifically, how does Items data explicitly use or reference Lexemes?  – The preceding unsigned comment was added by Ktharani (talk • contribs) at 20:58, 22 January 2020‎ (UTC).

Hi @Ktharani: there are many connections and ways to connect Items and Lexemes. If you take any Lexeme, you'll see that most properties link to Items. The other way round is very rare (logically, since Items to Lexemes is a one-to-many model); personal pronoun (P6553) is one of the few properties I remember where Items link to Lexemes. I hope I'm clear; feel free to ask more if you want. Cheers, VIGNERON (talk) 16:10, 23 January 2020 (UTC)
Thanks for taking the time to answer my rookie question,@VIGNERON:) So what this means is that Lexemes could be used to translate wikidata items from one language to another. Is there an example/prototype of such translation somewhere that you are aware of? Also, if you know of any literature/whitepaper that talks about Wikidata's Lexicographical data and its "ultimate" vision/potential, please let me know. Thanks, Ktharani (talk) 22:37, 27 January 2020 (UTC)
@Ktharani: No problem. Do you mean taking the Lexeme lemma and copying it into the corresponding Item label? I'm not sure it's a good idea, for multiple reasons, mainly: 1. there are far more labels than lemmata, so I'm not sure there is actually something to copy; 2. several lemmata correspond to one label (again, it's a one-to-many situation), so how would you choose which one to use?
Hmmm, I know that @Fnielsen: wrote some papers, like Validating Danish Wikidata lexemes or Danish in Wikidata lexemes. But I must say, I don't follow closely all publications.
Cheers, VIGNERON (talk) 11:59, 29 January 2020 (UTC)
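To see the one-to-many situation for a given Item, one could list the Lexemes whose senses point to it. The sketch below only builds the text of a SPARQL query; it does not contact any server. It assumes the property item for this sense (P5137) and the prefixes predefined by the Wikidata Query Service (`ontolex:`, `wikibase:`, `dct:`, `wdt:`, `wd:`), none of which are mentioned in this discussion, and the function name `lexemes_for_item_query` is hypothetical.

```python
def lexemes_for_item_query(item_id: str) -> str:
    """Build a SPARQL query listing Lexemes that have a sense linked to `item_id`.

    Assumes senses are linked to Items with P5137 (item for this sense).
    """
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not an Item ID: {item_id!r}")
    return (
        "SELECT ?lexeme ?lemma ?language WHERE {\n"
        "  ?lexeme ontolex:sense ?sense ;\n"
        "          wikibase:lemma ?lemma ;\n"
        "          dct:language ?language .\n"
        f"  ?sense wdt:P5137 wd:{item_id} .\n"
        "}"
    )


# Example with an Item ID that appears elsewhere on this page: case (Q128234).
print(lexemes_for_item_query("Q128234"))
```

If such a query returns several lemmata for one Item, even within a single language, that is the ambiguity described above: there is no obvious rule for picking which lemma would become the label.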

Gerund or present participle?

In Spanish on lexemes, is the correct label for the form that "haciendo" and "hablando" are in gerund (Q1923028) or present participle (Q10345583)? The latter has "gerund form" as an alias, which is weird if they are different. I've seen both used on different Spanish verbs here about the same form. I have the same question about English, too.

Which should I use? Should the items even be distinct, or should they merge? DemonDays64 | Talk to me 15:07, 24 January 2020 (UTC) (please ping on reply)

If you can't choose, create a new one :) (specifically for Spanish) --Infovarius (talk) 19:11, 24 January 2020 (UTC)
Mhhh, good question, thanks for asking.
Before everything: they cannot be merged because they are two different concepts (and they have two separate articles on some Wikipedias).
That said, as they both have the exact same form, I'm not sure how to proceed for forms. In a sentence, it's not difficult for me to make the distinction: « En arrivant, j'ai vu la personne arrivant. » The first « arrivant » is gerund (Q1923028) and the second is present participle (Q10345583). I'm guessing this is more or less the same situation in Spanish. In that case, I would either:
@DemonDays64, Infovarius: what do you think?
Cdlt, VIGNERON (talk) 18:44, 25 January 2020 (UTC)
@VIGNERON: huh, this is way more confusing than it seemed at the beginning. I'm learning Spanish and in the class the terms were kind of confusing including present progressive, gerund, and gerundio. Trying to find more about what the proper terms are, I just found this article about it. In Spanish, the present participle is called the gerundio, but there isn't a direct translation for "gerund". Here is what the examples in it say, at least how I understand them: you can't say "me gusta comiendo" (I like eating)—you have to say "me gusta comer". However, you can say something like "Pasé el día leyendo."—"I spent the day reading".
I am not a pro grammar person, so I don't understand the distinction super well. What I can understand is that you can't use the -iendo/-ando (-ing) form as a noun, but you can use it in more than just sentences that mean "I am verbing" or whatever. This makes it seem like there isn't a gerund, but Wikipedia:Gerund says this about gerunds that complicates things a lot:

most often, but not exclusively, one that functions as a noun.

That article might be enlightening to someone who spends more time thinking about this. I am not knowledgeable enough about grammar to solve this. The information is too contradictory for me to figure this out. DemonDays64 | Talk to me 19:36, 25 January 2020 (UTC)
@DemonDays64: I'm not a grammarian either (but what I read is, more or less, what I thought: if it's used conjugated then it's present participle (Q10345583), if not it's gerund (Q1923028)), but there is no need to be a seasoned grammarian. Use either of the two solutions I proposed; the basic information is there and would allow someone else to improve it afterward (that's how wikis work, never perfect on the first try). Cheers, VIGNERON (talk) 13:36, 27 January 2020 (UTC)