Wikidata talk:Lexicographical data/Archive/2019/08

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

masculine inanimate (Q52943434) and masculine inanimate (Q54020181)

Hello. What is the purpose of having masculine inanimate (Q52943434)? I think it should be merged with masculine inanimate (Q54020181). I do not see any advantage in having "masculine in French", "masculine in Spanish", "masculine in Italian", "masculine in German", ... Pamputt (talk) 18:26, 1 July 2019 (UTC)

@Paweł Ziemian: because you created this item, could you tell us what you think (reply in Polish if you want :)). And maybe KaMan has some opinion on this as well. Pamputt (talk) 08:21, 6 July 2019 (UTC)
When I created masculine inanimate (Q52943434) as "rodzaj męskorzeczowy / masculine inanimate" to use it for words in Polish, the masculine inanimate (Q54020181) did not exist yet. The suffix "in Polish" was added later. See the history and talk pages of the items. Paweł Ziemian (talk) 21:53, 6 July 2019 (UTC)

Lexeme existing in only one form

Hello everyone.

In Occitan, French and probably other languages, we have words that exist in only one form, like a lexeme which exists only as a masculine singular. I am wondering how I could show this specificity for the lexeme. Using singular (Q110786) + masculine (Q499327) does not seem enough, because it does not say that this is the only existing form; some people may think that the feminine or plural forms are missing when they just don't exist at all.

So I am looking for a way to say « nothing is missing, this is just the only existing form » to avoid any loss of information. Maybe a claim like « this lexeme has no feminine » or « this lexeme is only masculine », but I don't know which property I should use. And should I add the claim on the lexeme or on the form?

Here are some examples of the lexemes concerned:

  • All nouns: a noun is masculine or feminine (or neuter) but it can't be both.
  • "Accordaille" (an old French term for engagement (Q157512)) and "anglaises" ("English girls") are common nouns which only exist in the feminine plural.
  • antipasto (Q622440) (in French) is a common noun existing only in the masculine singular
  • "aiçò" (in Occitan) is a demonstrative pronoun which only exists in the neuter
  • "certains/certaines" are French determiners with no singular
  • chaque (L10023) is a determiner with no plural
  • "enceinte" (French for "pregnant") is an adjective with no masculine
  • some verbs like pleuvoir (L1917) only exist in the third person singular

--Aitalvivem (talk) 10:29, 25 July 2019 (UTC)

When you say "only exists", is that an absolute, or could there theoretically be a circumstance where you might have other forms, even if they haven't yet occurred in the written or spoken language? In most such cases I don't think you need to try to enforce the non-existence of alternate forms; just list the ones that are used. However, in English we do have cases like clothes (L4522) which have been indicated as only having a plural form, via instance of (P31) plurale tantum (Q138246). So in some cases (the ones with common situations) it is probably useful to use such a statement. ArthurPSmith (talk) 17:13, 25 July 2019 (UTC)
instance of (P31) + uninflected word (Q600894), though personally I'd create a new item for invariability to use with has characteristic (P1552). Circeus (talk) 22:37, 26 July 2019 (UTC)
When I say "only exists", I mean "only used": this is not an absolute, those other forms are just never used. Using instance of (P31) + plurale tantum (Q138246), instance of (P31) + singulare tantum (Q604984) or instance of (P31) + uninflected word (Q600894) seems good, thank you.
Several of the cases you named are really a matter of inherent gender rather than the form having a particular gender. Nouns have inherent gender, so "antipasto" is just a masculine noun like any other, its form should only be given as "singular" and not "masculine singular". The same for "accordaille" and "anglaises", whose gender is inherent in the noun rather than its forms. Occitan does not have a neuter gender, so how can "aiçò" be grammatically neuter? What nouns does it refer to if there are no neuter nouns? As for "enceinte", how would you use this with a masculine noun then? —Rua (mew) 14:49, 5 August 2019 (UTC)
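A rough sketch of how a tool might read these statements, so that a missing plural is not reported for a plurale tantum. The function name, the restriction to grammatical number, and the plain "singular"/"plural" labels are illustrative assumptions; only the three Q-ids come from the discussion above.

```python
# Hypothetical checker: P31 values from the thread above are read as
# "the other number forms are not used", not as "forms are missing".
PLURALE_TANTUM = "Q138246"
SINGULARE_TANTUM = "Q604984"
UNINFLECTED_WORD = "Q600894"


def missing_number_forms(instance_of: set[str], form_numbers: set[str]) -> set[str]:
    """Return the grammatical numbers that look missing.

    instance_of:  the lexeme's instance of (P31) values
    form_numbers: numbers ("singular"/"plural") already present among its forms
    """
    if UNINFLECTED_WORD in instance_of:
        expected: set[str] = set()  # one invariable form; nothing else expected
    elif PLURALE_TANTUM in instance_of:
        expected = {"plural"}
    elif SINGULARE_TANTUM in instance_of:
        expected = {"singular"}
    else:
        expected = {"singular", "plural"}  # default assumption for a noun
    return expected - form_numbers
```

With this, a lexeme like clothes (L4522) tagged as plurale tantum and carrying only a plural form reports nothing missing, while an untagged noun with only a singular form reports the plural as missing.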

Description of the same multilanguage senses

Hi, please tell me: if we link the same senses together, why are glosses given in many places? For example water (L3302). The description of L3302-S1 shows examples in 8 languages; is this the right decision? Should I remove them? I think this info should be in the translation (P5972) section, shouldn't it? Iniquity (talk) 20:23, 31 July 2019 (UTC)

No. It's fine for there to be both a gloss/definition and a translation (P5972). Adding one or the other where it's missing is appropriate. Circeus (talk) 21:15, 1 August 2019 (UTC)
Thanks for your answer! Can we ask the developers to automatically substitute the gloss from another language? Iniquity (talk) 21:44, 1 August 2019 (UTC)
Hello @Iniquity:, what do you mean by "automatically substitute gloss from another language"? Can you give an example? Thanks, Lea Lacroix (WMDE) (talk) 14:16, 8 August 2019 (UTC)
Hi @Lea Lacroix (WMDE):, for example, we have Russian gloss here (Lexeme:L189#S1) and Italian gloss here (Lexeme:L8214#S1), the native language values of these gloss fields should be displayed in translations here (Lexeme:L3302#S1) regardless of the user's language. I just don’t quite understand why this field is needed in different languages, since we connect the same senses. Iniquity (talk) 14:30, 8 August 2019 (UTC)
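A minimal sketch of the display Iniquity describes: starting from the translation (P5972) links on a sense, show each linked sense's gloss in that sense's own language. The data layout and the function name are hypothetical, the gloss strings are placeholders rather than the real glosses, and the links from L3302-S1 to L189-S1 and L8214-S1 are assumed for the example.

```python
# Hypothetical layout: sense ID -> (language code, gloss in that language).
# The gloss texts are placeholders, not the actual glosses on Wikidata.
SENSE_GLOSSES = {
    "L189-S1": ("ru", "<Russian gloss of L189-S1>"),
    "L8214-S1": ("it", "<Italian gloss of L8214-S1>"),
}

# Assumed translation (P5972) statements: sense ID -> linked sense IDs.
TRANSLATIONS = {"L3302-S1": ["L189-S1", "L8214-S1"]}


def native_glosses(sense_id: str) -> dict[str, str]:
    """Collect, per language, the native-language gloss of each linked sense."""
    result: dict[str, str] = {}
    for target in TRANSLATIONS.get(sense_id, []):
        if target in SENSE_GLOSSES:
            lang, gloss = SENSE_GLOSSES[target]
            result[lang] = gloss  # a later sense in the same language overwrites
    return result
```

In this simplified version, if two linked senses share a language, the later one overwrites the earlier one; a real display would need to keep both.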

P:P5186 or P:P5911. What should we use for inflection class?

I'm a little confused: why are there two separate properties, if one (conjugation class (P5186)) is a subtype of the other (paradigm class (P5911))? Which property should be used in a lexeme? That is, should I create a property for declension? Iniquity (talk) 18:12, 5 August 2019 (UTC)

We should probably only use the latter, since it already covers everything. —Rua (mew) 08:29, 6 August 2019 (UTC)
@Rua:, I'm just a little confused. In each language there are several forms of word formation besides the main ones, declension and conjugation. At the moment the scheme for adding these properties to a lexeme is very confusing. We have conjugation class (P5186) and paradigm class (P5911); how and in what cases should they be used? For example, in the lexeme магия (L57919) we have two paradigms, one declension paradigm and one Zaliznyak paradigm. How should we work with them? Is it necessary to use a separate main property for each word-formation scheme (for example, conjugation class (P5186), declension class (does not exist), Zalizniak classification (does not exist)), or should they be subproperties of the word formation class (paradigm class (P5911))? Or should we use this property only for unknown forms? Iniquity (talk) 23:40, 9 August 2019 (UTC)
Thanks @Iniquity:. Zaliznyak's classification system is widely used in the Russian language, and it is fairly complex. See it in ru, en, fr. The issue is that there are several aspects (axes) of that classification - inflection aspect (1-8), type of the word's stem (a-d), and a few other minor things. The first big question is whether each unique classification should be a Q-item of its own, e.g. "1a", "3d", or whether each should be broken into several aspects. Together with lesser-used types, the number of these items could grow as high as a few hundred (hard to estimate exactly, but judging by all templates in ruwikt, there are 239 unique ones, and some might be duplicates). If stored separately, we would need to create individual properties - one for 1-8, one for a-d, and possibly a few more for the random ones. Or we could always place several values into the same property - "Zaliznyak property" = [1, a].
Another aspect of this issue is that some words are compounded (two+ words merged together), forming a new word, but each part is inflected according to the rules for that part. Thus in complex cases, we need to indicate which part the classification applies to (either by the position in the compound word, or by the stem, or maybe both).
P.S. there was a related discussion a few years ago, so CCing: @Dominic Z.:, @Fnielsen:, @IvanP:, @Njardarlogar:, @Pamputt:, @VIGNERON:. --Yurik (talk) 00:31, 10 August 2019 (UTC)
Thanks @Yurik:. These are the example schemes we were thinking about:

Russian lexeme morphology storage ideas

Lexeme: мама (ru), noun
Statements:
 grammatical gender (P5185) = feminine (Q1775415)

Storing Zaliznyak classification (simple case)

Create a dedicated Q-item for each type in the Zaliznyak classification:

 paradigm class (P5911) = Q-item for "1c"

or as facets of a well-known Q-item:

 paradigm class (P5911) = Q-item for nouns in Zaliznyak classification
    has characteristic (P1552) = Q-item for stem type "1"
    has characteristic (P1552) = Q-item for inflection type "c"

Alternatively, we could create a dedicated property for the Zaliznyak classification instead of P5911.
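For the facet variant, a compact code such as "1c" would have to be split into its parts before it can be stored as qualifiers. A minimal sketch, assuming codes always start with a number followed by a single lowercase letter (codes with extra marks are only partly handled, via an unparsed remainder); the angle-bracket Q-items are hypothetical placeholders, since those items do not exist yet.

```python
import re

# Assumed code shape: number facet, then one lowercase letter facet, then an
# optional remainder for any extra marks (left unparsed in this sketch).
CODE_RE = re.compile(r"^(?P<number>\d+)(?P<letter>[a-z])(?P<rest>.*)$")


def split_code(code: str) -> dict[str, str]:
    """Split a compact code such as "1c" into its number and letter facets."""
    match = CODE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised classification code: {code!r}")
    return match.groupdict()


def facet_statement(code: str) -> dict:
    """Build the facet-style statement from the scheme above, as plain data.

    Property IDs are the ones used in the scheme; the values are placeholders.
    """
    parts = split_code(code)
    return {
        "property": "P5911",
        "value": "<Q-item: nouns in Zaliznyak classification>",
        "qualifiers": [
            ("P1552", f"<Q-item: number facet {parts['number']}>"),
            ("P1552", f"<Q-item: letter facet {parts['letter']}>"),
        ],
    }
```

The trade-off is visible here: with facets, only a handful of facet items are needed instead of one item per combination, at the cost of every consumer having to reassemble the code.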

Storing Zaliznyak for multi-part words

Multi-part words like молот-рыба (L58959) have an independent classification for each part:

 paradigm class (P5911) = Q-item for 1c in Zalyznyak classification
    word stem (P5187) = молот
    series ordinal (P1545) = 1
 paradigm class (P5911) = Q-item for 2a in Zalyznyak classification
    word stem (P5187) = рыб
    series ordinal (P1545) = 2

Or store stems as top-level properties, and use the position index to indicate which is which:

 paradigm class (P5911) = Q-item for 1c in Zalyznyak classification
    series ordinal (P1545) = 1
 paradigm class (P5911) = Q-item for 2a in Zalyznyak classification
    series ordinal (P1545) = 2
 word stem (P5187) = молот
    series ordinal (P1545) = 1
 word stem (P5187) = рыб
    series ordinal (P1545) = 2

or we could store word stem (P5187) both as a top-level property and as a qualifier (which seems odd).

 paradigm class (P5911) = Q-item for 1c in Zalyznyak classification
    word stem (P5187) = молот
    series ordinal (P1545) = 1
 paradigm class (P5911) = Q-item for 2a in Zalyznyak classification
    word stem (P5187) = рыб
    series ordinal (P1545) = 2
 word stem (P5187) = молот
    series ordinal (P1545) = 1
 word stem (P5187) = рыб
    series ordinal (P1545) = 2

Just like for the simple case, we could also use the Q-item for nouns in Zaliznyak classification with the classification facets plus the stem and position.
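In the layout that stores stems as top-level statements, nothing but series ordinal (P1545) ties a stem to its classification, so a consumer has to regroup them. A sketch of that regrouping, assuming statements arrive as (property, value, qualifiers) tuples and that every ordinal is a plain integer string; the Q-item values are placeholders.

```python
from collections import defaultdict

# The second layout above for молот-рыба (L58959), as plain tuples.
STATEMENTS = [
    ("P5911", "<Q-item for 1c>", {"P1545": "1"}),
    ("P5911", "<Q-item for 2a>", {"P1545": "2"}),
    ("P5187", "молот", {"P1545": "1"}),
    ("P5187", "рыб", {"P1545": "2"}),
]


def parts_by_ordinal(statements):
    """Group paradigm class / word stem statements by series ordinal (P1545)."""
    parts = defaultdict(dict)
    for prop, value, qualifiers in statements:
        ordinal = qualifiers.get("P1545")
        if ordinal is None:
            continue  # no ordinal: applies to the whole word, not to one part
        parts[int(ordinal)][prop] = value  # assumes ordinals like "1", "2"
    return [parts[k] for k in sorted(parts)]
```

Applied to STATEMENTS, this yields one dictionary per part, pairing молот with the 1c item and рыб with the 2a item, which is exactly the grouping the qualifier-based layout stores directly.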

How to store Declensions

 paradigm class (P5911) = Q-item for Declension (declension (Q188078))
     Property for Declension class (the same with conjugation class (P5186)) = Q-item for type from declension class 1st declension in Russian (Q66327367)
          --- or --
 paradigm class (P5911) = Q-item for Declension (declension (Q188078))
     has characteristic (P1552) = Q-item for type from declension class 1st declension in Russian (Q66327367)
          --- or --
 paradigm class (P5911) = Q-item for type from declension class 1st declension in Russian (Q66327367)
          --- or --
 Property for Declension class (the same with conjugation class (P5186)) = Q-item for type from declension class 1st declension in Russian (Q66327367)

Archive

We considered these for storing Z classification, but they don't work well with multi-part words.

       -- always store two+ values as top level props
 paradigm class (P5911) = Q-item for stem type 1
 paradigm class (P5911) = Q-item for inflection type c
multiple edits by: Iniquity (talk) 02:58, 10 August 2019 (UTC) and Yurik (talk) 04:13, 10 August 2019 (UTC)

Brazilian Portuguese as a language?

Wikidata supports Brazilian Portuguese (PT-BR) as a separate language from Portuguese (PT), which is usually taken to mean European Portuguese. That's quite useful to avoid dialect wars, and I assume there's no problem in making use of it in glosses when adding senses, but should it be added to the language defining the lexeme, like in this case Lexeme:L46853? What's the best way to deal with regionalisms (or in this case, the lack of them)? - Sarilho1 (talk) 23:27, 9 August 2019 (UTC)

I would be inclined to ONLY add pt-br lexemes that differ from standard pt in some way. However, it would be nice to have a way to indicate that a given lexeme is standard in several language variants. Maybe we need a property for this? ArthurPSmith (talk) 14:06, 12 August 2019 (UTC)
My understanding is that the assumption that pt-pt is more standard than pt-br is problematic. --Denny (talk) 18:11, 12 August 2019 (UTC)
@Denny: I think my first comment was quite confusing. There's PT (taken as pt-pt or non-pt-br) and PT-BR for Wikidata translations. I think that's fine; it's just a mechanic of the platform and it is useful. The question is that, when defining the language of the lexeme, I think only Q5146 should be used, as the language is, effectively, that one. I disagree with @ArthurPSmith: on that point, but I'm open to his suggestion, since it solves the upcoming problem: I don't think creating items for multiple dialects makes sense when they are essentially the same lexeme in the same language. A similar problem would arise if one decided to create lexemes for "one" for London English, Yorkshire English, Liverpool English, New York English, Californian English, Australian English, etc. I prefer the suggestion of @Circeus: of indicating the dialect (and/or possibly the geographical distribution) as one or several statements, where Q922399, Q750553, etc., could all be there if necessary. If no one is opposed, I would like to change the mentioned lexeme to the language Q5146, instead of Q750553; if I'm allowed to do so, I would still like to know whether properties should be added (or not) for lexemes that have language-wide distribution. - Sarilho1 (talk) 16:34, 14 August 2019 (UTC)
I'm pretty sure regional senses (and that applies to senses that are only pt-pt!) are treated under location of sense usage (P6084), regardless of whether there is a nonregional sense. I do agree that as a geographical equivalent of location of sense usage (P6084), the name leaves something to be desired, but in my experience getting a property renamed is basically impossible. Circeus (talk) 16:33, 13 August 2019 (UTC)

Dutch f/m shift (Q64448167) for Dutch

@MarcoSwart, ArthurPSmith, Infovarius: I noticed that this was created a while ago. I think this is a mistake and results from a misanalysis of the situation. What is going on here is not a change of historically feminine nouns to masculine, but rather a loss of the distinction between the two genders. A majority of Dutch speakers follows a common-neuter distinction, while a minority, primarily in Belgium, follows masculine-feminine-neuter. The pronoun used for nouns of the common gender is historically the masculine one, which gives the impression that nouns are becoming masculine. But for the speakers using these pronouns, there is actually no masculine or feminine gender at all, only common gender, so the change is actually feminine > common and masculine > common.

The official standard language does prescribe a feminine gender and gives lists of words that must always be feminine, while showing "f/m" for nouns that can be both. But this is prescriptive and stilted, and doesn't reflect the normal Dutch that people actually use. Wikidata, like Wiktionary, should be descriptive and follow actual usage, not what the standard prescribes or proscribes. English Wiktionary, for this reason, does not have the "f/m" distinction, but labels these nouns as feminine, which is the gender that is used by speakers that still have a feminine-masculine distinction in their speech. See wikt:Wiktionary:About Dutch#Gender and w:Gender in Dutch grammar for more details about this topic. —Rua (mew) 15:01, 5 August 2019 (UTC)

In general with Wikidata we allow conflicting opinions from reliable sources to both be shown. Is there a way to show it here both ways with a reference? ArthurPSmith (talk) 17:09, 5 August 2019 (UTC)
A prescriptive standard is not a reliable source on actual usage. —Rua (mew) 08:28, 6 August 2019 (UTC)
If I understand correctly, Dutch (Q7411), ISO 639-1 "nl" and ISO 639-3 "nld" all refer to the Dutch language as spoken for centuries (usually starting around 1500, after Middle Dutch (Q178806), ISO 639-3 "dum") in the Low Countries and the Caribbean. As far as I know it is undisputed that during this period there has been a gradual shift from a three gender system to a two gender system.
There is an ongoing debate between linguists whether this development should or should not be labeled masculinization, for instance:
"It is generally believed that with the disappearance of the distinction between masculine and feminine nominal gender, feminine nouns were masculinized, as former feminine nouns were increasingly referred to by masculine pronouns (Geerts 1966). However, Audring (2009: 89-91) debates whether masculine pronouns in fact agree lexically with common gender nouns in present-day Dutch and argues that when masculine pronouns are used with common gender nouns, this is always semantic agreement, not lexical agreement. A problem with this view, however, is that masculine pronouns appear to be used more easily with common gender nouns than with neuter gender nouns. This remains unexplained if masculine pronouns are not somehow considered to agree with common gender nouns"
- Kraaikamp, M. (2017). Semantic versus lexical gender: Synchronic and diachronic variation in Germanic gender agreement. Utrecht: LOT.
My use of the term is not based on a strong point of view in this matter, but simply to provide a simple explanation. Introducing common gender tends to make descriptions on how to use the property more complicated. The fact remains that Dutch in Belgium, in formal contexts in the Netherlands and in historic sources has a large class of feminine nouns that in present day colloquial Dutch in the Netherlands can be referred to with masculine pronouns. I fail to see how descriptions that reflect this actual usage can be considered "prescriptive". Just using the feminine marker means ignoring the fact that for some feminine nouns even in present day colloquial Dutch in the Netherlands feminine pronouns are still preferred, so the distinction between Dutch f/m shift (Q64448167) and feminine (Q1775415) serves a descriptive purpose too. The Word list of the Dutch language (Q462352) is updated on a ten year basis, based on corpus research. This should be frequent enough to be reliable for our purposes.
It is clear that en.wiktionary and nl.wiktionary have made different choices on this subject. Using Dutch f/m shift (Q64448167) in no way prevents en.wiktionary from maintaining its preference, but enables nl.wiktionary to do the same. --MarcoSwart (talk) 12:07, 8 August 2019 (UTC)
The problem I have is that, if this logic were applied to English, English could be argued to have made all nouns neuter gender. In reality, gender distinctions were lost, and the original neuter pronoun came to be used by default, but this doesn't mean that these nouns are therefore neuter in gender. The same happened to many Dutch speakers as well, where the distinction between masculine and feminine was lost and the original masculine pronoun came to be used by default. These nouns now have common gender, and trying to assign masculine or feminine gender to them based on the pronoun is meaningless. Masculine pronouns no more imply masculine gender for two-gender Dutch speakers, than neuter pronouns imply neuter gender for English speakers.
Secondly, I have strong doubts whether people consistently use feminine pronouns for the nouns where the standard prescribes it. I know I don't, and I can't say I've ever heard anyone else do it either. In practice, all feminine nouns can be referred to by masculine pronouns by speakers who have lost the distinction, whether the standard labels them f/m or just f. This is why I disagree with making this distinction; it doesn't actually describe how people speak Dutch. —Rua (mew) 14:11, 9 August 2019 (UTC)
Dutch f/m shift (Q64448167) was explicitly created for Dutch. There are some scholars sharing your view, but there are also scholars disputing it. In reality most Dutch dictionaries, including all major ones and nl.wiktionary, use Dutch f/m shift (Q64448167). A simple reason is that Dutch is not just the language spoken presently in the Netherlands, but also the written language for several centuries in a larger area. So this information is valuable independent of the different views of scholars on the former. Those who share your view are free to ignore this distinction, but it's clearly within Wikidata's goals to enable users who feel otherwise to share this information. I suggest we describe the different views on the matter in an appropriate place, and go on using this property. --MarcoSwart (talk) 20:23, 18 August 2019 (UTC)
I still disagree with this decision for the reasons I stated. These nouns are not, and have never been masculine. Instead, if you insist on having special items just for Dutch, then they should reflect reality. In other words, there should be "f/c" and "m/c" items, to reflect that these nouns became common gender, not masculine. The same can be applied to Swedish as well, which is in the same situation as northern Dutch nowadays but historically had three genders. —Rua (mew) 10:35, 19 August 2019 (UTC)
"Reality" is a difficult criterion to use on Wikimedia projects, as people may have different views of reality. For me, Hindu deity (Q979507) reflects reality because it is part of well-described normative system, not because I think that any of these deities has ever been observed or that one should subscribe to Hinduism. So maybe you could likewise accept that there is a well-described system in which Dutch f/m shift (Q64448167) provides useful information, even though you personally feel otherwise. I am not saying your view is wrong, I just acknowledge that there are other reasonable views too. I don't think we should try to decide this matter for others, because we may have to engage in original research to do so.
The goal of Wikidata is to "act as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others". At the moment Dutch f/m shift (Q64448167) is part of the structured data used by nl.wiktionary, because some readers use this information. In its present form it corresponds with several major sources for lexicographic information about Dutch. As Wikidata is a cooperation project not requiring formal qualifications to join, it seems unwise to sacrifice the possibility for the average user to easily find and check information on the altar of linguistic precision.
If there are reliable sources enabling editors to use the entities you propose, I would be fine with adding them too. As you pointed out, these data would be more descriptive so it is probable that they cannot be derived from the sources I mentioned. --MarcoSwart (talk) 15:21, 19 August 2019 (UTC)

Sense for the joining conjunction

The following Senses seem to all have the same meaning: og (L3833)-S1, и (L2108)-S1, e (L2774)-S1, i (L58847)-S1, and probably others.

Now we could cross-reference all of them to each other using translation (P5972), but that will lead to a lot of links. Alternatively, we could set item for this sense (P5137) to a Q-item, but to which one? logical conjunction (Q191081) seems a bit too specific, as it only allows for the conjunction of assertions, whereas all these senses also allow combining other parts of a sentence. Do we need a new Q-item, or is there one already and I am unable to find it?

I am just wondering about thoughts on this question (Obviously this will have further ramifications). --Denny (talk) 18:21, 12 August 2019 (UTC)

@Denny: We've had some discussions here and elsewhere about whether we should add Wikidata items for other parts of speech - pretty much every item right now corresponds to either a common noun (lower-case label in English) or a proper noun (upper-case). There are a very few other parts of speech among the items - for example small (Q24245823), but of course they all have some close relation to an existing item too (like size (Q322481)) so it's not clear if perhaps some other mechanism than item for this sense (P5137) might be better for handling this... ArthurPSmith (talk) 14:44, 13 August 2019 (UTC)

Thanks, @ArthurPSmith:! As I don't want to repeat old arguments, I would be very grateful if someone could point me to those previous discussions, so that I can catch up. --Denny (talk) 16:36, 14 August 2019 (UTC)

I think we should consider whether this relation is really transitive. I can imagine (but can't prove yet) that S1 can be translated as S2, and S2 as S3, but S1 can't be translated as S3. If there are such triples (or longer chains), then the idea of a central hub (Q-item or otherwise) is false. --Infovarius (talk) 14:24, 14 August 2019 (UTC)

Thanks, @Infovarius:! Your point is excellent and well taken, and I very much agree with you that in many cases transitivity probably cannot be assumed. But there seem to be a few cases where the meaning is so primitive and direct that a central hub and the assumption of transitivity seem warranted for numerous languages, such as the example here, the meaning of the English word 'and'. In those cases, maybe item for this sense (P5137) can do the job? --Denny (talk) 16:36, 14 August 2019 (UTC)

@Denny: On your earlier query - here's some of the previous discussion: Wikidata:Property proposal/adjective of, Wikidata:Requests for comment/What is the general position on adjectival items?, Wikidata talk:Lexicographical data/Archive/2018/11#Translations, Wikidata:Project chat/Archive/2018/11#[Consultation] Using Q-items for senses. None of these discussions seemed conclusive, perhaps because lexemes haven't been heavily adopted here yet... ArthurPSmith (talk) 18:03, 14 August 2019 (UTC)

@ArthurPSmith: Thank you for these pointers. I will read them, and maybe even try to come up with a document reflecting the current state if I feel adventurous. Thank you! --Denny (talk) 19:35, 14 August 2019 (UTC)
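Two points from this thread can be made concrete. Denny's concern about "a lot of links" is quadratic growth: fully cross-linking n equivalent senses with translation (P5972) takes n(n-1)/2 links, while pointing each sense at one shared item via item for this sense (P5137) takes n. Infovarius's concern can be checked mechanically by looking for chains S1-S2-S3 where the S1-S3 link is absent. A sketch of both, treating translation links as undirected pairs (an assumption of this sketch; the function names are hypothetical):

```python
from itertools import combinations


def pairwise_link_count(n_senses: int) -> int:
    """Undirected translation links needed to connect every pair of senses."""
    return n_senses * (n_senses - 1) // 2


def hub_link_count(n_senses: int) -> int:
    """Links needed if every sense points at one shared item instead."""
    return n_senses


def non_transitive_triples(links):
    """Find (a, b, c) where a-b and b-c are linked but a-c is not.

    `links` is an iterable of (sense_id, sense_id) pairs, treated as undirected.
    """
    pairs = {frozenset(p) for p in links}
    senses = sorted({s for p in pairs for s in p})
    triples = []
    for b in senses:
        neighbours = sorted(s for p in pairs if b in p for s in p if s != b)
        for a, c in combinations(neighbours, 2):
            if frozenset((a, c)) not in pairs:
                triples.append((a, b, c))
    return triples
```

For the four senses listed at the start of this section that is 6 pairwise links versus 4 hub links; for 20 languages it is 190 versus 20. A triple found this way is only a candidate: it may simply be a link nobody has added yet, and deciding whether translation really fails there is a linguistic judgment.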

Do we need to create another property for Declension class?

Further up, we described general patterns for using these properties. It seemed a little complicated, so let us ask more simply :) Is it necessary to create a separate property for w:en:Declension, like the one that was created for w:en:Grammatical conjugation, conjugation class (P5186)? Iniquity (talk) 20:12, 13 August 2019 (UTC)

The paradigm class (P5911) is a catch-all bucket for any form of word classification, whereas conjugation class (P5186) is more grammatically specific. In theory the first one can be used for everything, but I am not sure if the validation checker will be helpful. For example, if the word classification item only applies to Russian words, is it possible to say that the value of P5911 must match the language of that classification? Otherwise we may end up with a French word having a Russian classification, which is obviously incorrect. Creating a separate property specific to one classification system makes it easier - the whole property should only exist on Russian-language lexemes, and all values for that property must be instances of that specific classification system. But it means that every language will have its own properties like that - is this a good approach? --Yurik (talk) 20:50, 13 August 2019 (UTC)
I see no need for either conjugation or declension class properties. Conjugation is inflection as it applies to verbs, while declension is inflection as it applies to non-verbs. The distinction isn't meaningful cross-linguistically, and is a leftover from old Latin and Greek grammar. —Rua (mew) 17:51, 15 August 2019 (UTC)
Hm, so conjugation class (P5186) needs to be deleted? I'm not sure that declension is some kind of leftover, since all Slavic languages have this property. Iniquity (talk) 18:52, 15 August 2019 (UTC)
@Iniquity: I'm not saying the distinction can't apply to the Slavic languages. Indo-European languages in general have clear separation between verbal and nominal inflection. But that doesn't apply to all languages of the world. In Zulu, for example, adjectives inflect like verbs with an extra relative-clause prefix, while nouns have no such inflection, and it would be nonsensical to try to group them under "declension". This means that we cannot get rid of "inflection class" as a property; it needs to exist for cases like Zulu where the distinction between "conjugation" and "declension" is meaningless. So we have to decide whether we want to have just one property, "inflection class", for all languages, or whether we want to also have "conjugation class" and "declension class" alongside it for those cases where it is applicable. Since these two properties would be essentially redundant to "inflection class" and would differ only in terminology and scope, I don't think they should exist. —Rua (mew) 10:41, 19 August 2019 (UTC)
@Rua: Ah, I understand what you mean. Yes, I agree. In principle, this is exactly what my question is about. At the moment, in Russian lexemes, we will use only the inflection class. And since there will be many lexemes, this is likely to become the practice for other languages too. So I would like to understand how we should proceed. Iniquity (talk) 16:27, 19 August 2019 (UTC)
I am beginning to feel the same way -- there is relatively little point in having a dedicated conjugation class property when the item itself is a type of conjugation, and we don't have that many of them. Let's consolidate? --Yurik (talk) 19:10, 15 August 2019 (UTC)
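Yurik's validation concern does not necessarily require one property per classification system: if each classification item records which language it belongs to, a single generic property can still be checked. A sketch of such a check; the lookup table and function name are hypothetical, and in practice the language would have to come from statements on the classification items themselves.

```python
# Hypothetical lookup: classification item -> language it belongs to.
# Hard-coded here for illustration only.
CLASSIFICATION_LANGUAGE = {
    "Q66327367": "Russian",  # 1st declension in Russian
}


def paradigm_class_mismatches(lexeme_language: str, paradigm_classes: list[str]) -> list[str]:
    """Return paradigm class (P5911) values that belong to a different language.

    Items with no known language are not flagged, since nothing can be checked.
    """
    return [
        item
        for item in paradigm_classes
        if CLASSIFICATION_LANGUAGE.get(item) not in (None, lexeme_language)
    ]
```

With this, a French lexeme carrying "1st declension in Russian" is flagged, while the same value on a Russian lexeme passes.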

Different meanings of the word for the direction left

Does anyone know the meaning of this item: Q18340392? It looks like an early attempt to create lexicographical data. --Kolja21 (talk) 17:25, 22 August 2019 (UTC)

@Diwas: could you tell us the purpose of this item? Pamputt (talk) 18:13, 22 August 2019 (UTC)
I'm guessing the idea was to link to pages like en:Sinister - but there never seem to have been any actual page links on the page. I'd recommend it for deletion now. ArthurPSmith (talk) 18:22, 22 August 2019 (UTC)
see https://tools.wmflabs.org/reasonator/?q=Q18340392&lang=de . There are some disambiguation pages about words meaning left, sinister, gauche, ... in several variations. Wikidata should be able to show those groups of disambiguation pages and other groups of objects. Q18340392 shows a way to link objects that are not the same but have partly the same or similar meanings. I do not need this data object, but Wikidata would lose a value-added service if it forgot those links. I hope Wikidata:Lexicographical data will help to provide services that show users links between similar objects, like those, or like a link between a profession and a professional, if one Wikipedia language version has only an article about the profession and another language version has only an article about the professionals in this profession. --Diwas (talk) 02:41, 23 August 2019 (UTC)
I suppose that Lexemes could help: all these disambiguations can be linked to Lexemes which can be linked to some central item like Q18340392. Look e.g. left (L3350). --Infovarius (talk) 14:58, 23 August 2019 (UTC)
Ah, it was being used as a class to group disambiguation pages. That's an interesting approach, I guess it doesn't hurt to keep it then! ArthurPSmith (talk) 17:55, 23 August 2019 (UTC)
It should still have some sort of properties, but I'm rather stumped as to what they ought to be. I'll ask over at Project Chat. Circeus (talk) 19:09, 23 August 2019 (UTC)

Adding reference URL to lexeme forms

Hi, I tried to add a reference URL to this lexeme form, and based on the warning I got, I suppose that the reference should be added to a statement, not as a statement itself. Could anyone tell me the correct way to do that if I want to reference the form itself, not any statement? --Strepon (talk) 14:17, 25 August 2019 (UTC)

A usage example (P5831) statement seems like the most obvious option. As you seem to be basically saying "this is here because it's in dictionary X", though, maybe attested in (P5323) is a better option (big maybe; I can't say I'm enamored of it: it's a little too "let's import this dictionary wholesale" for my taste, and I assume that website is not likely to be free and open content). Mind you, a regular "form of" shouldn't require any sourcing whatsoever IMO, because it just follows from the normal properties of the word (i.e. its inflection class or declension) combined with the language's normal rules. I'd source a form separately only if it doesn't follow any of the usual patterns in the language (e.g. the past historic of verbs in -traire, which is a source of considerable hesitation in books that teach French conjugation). Circeus (talk) 03:37, 26 August 2019 (UTC)
You pointed out exactly what I am unsure about: how to source forms. I agree that natural and predictable patterns do not require any reference (on the other hand, if someone raises doubts about them, how do we prove they are correct?); however, sometimes I don't know which variants are precisely codified and I need to look them up in a dictionary, and in that case I think it is proper to add the source. Is there any consensus or rule on this topic?
Of your suggestions, attested in (P5323) sounds better to me, as I'm not adding examples; but I understand your concerns about non-free sources. --Strepon (talk) 20:01, 27 August 2019 (UTC)
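Some background on the warning: in Wikibase a reference cannot stand on its own; it always hangs off a statement. A minimal sketch, assuming the attested in (P5323) suggestion is followed, of the JSON shape of such a statement on a form, carrying a reference URL (P854) as its reference. The dictionary item ID and URL are hypothetical placeholders, and how the statement is submitted to the API is left open here.

```python
def item_value(qid: str) -> dict:
    """Wikibase JSON value for an item ID such as 'Q123456'."""
    return {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid}


def attested_in_with_url(dictionary_qid: str, url: str) -> dict:
    """Build an 'attested in (P5323)' statement whose only reference is a
    'reference URL (P854)' snak.

    dictionary_qid is a placeholder for the item describing the dictionary.
    """
    url_snak = {
        "snaktype": "value",
        "property": "P854",
        "datavalue": {"value": url, "type": "string"},
    }
    return {
        "type": "statement",
        "rank": "normal",
        # The main snak records *where* the form is attested...
        "mainsnak": {
            "snaktype": "value",
            "property": "P5323",
            "datavalue": {"value": item_value(dictionary_qid),
                          "type": "wikibase-entityid"},
        },
        # ...and the reference (never a statement on its own) holds the URL.
        "references": [{"snaks": {"P854": [url_snak]}, "snaks-order": ["P854"]}],
    }
```

For example, `attested_in_with_url("Q123456", "https://example.org/entry")` yields one statement with the URL nested under its `references`, which is where the warning suggests it belongs.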

Multiple Pronunciations per Form

There are often cases where the same written word form has different pronunciations. Each pronunciation has a number of properties of its own, e.g. the sound file, the IPA, the region in which it is used, or references to scholarly works about it. I would like to propose that we put pronunciation-related properties as qualifiers on a single property. The value of that property would be the written form of the word (possibly repeated across statements) with stress marks applied to it, which is important for some languages like Russian.

Lexeme = <word>  e.g. "поперчивший"  (<he> peppered ...)
Forms = [
  {
    Form = <word-form>   e.g.  "поперчивший" (the primary form is the same as the lemma)
    Statements = [
       Pronunciation Form = <word-form-pronounce1>    e.g. "попе́рчивший"  (one common usage, with the stress mark on the 2nd syllable)
          IPA = [pɐˈpʲert͡ɕɪfʂɨɪ̯]
          Sound = sound1.ogg
       Pronunciation Form = <word-form-pronounce2>    e.g. "поперчи́вший"  (another common usage, stress on the 3rd syllable instead)
          IPA = [pəpʲɪrˈt͡ɕifʂɨɪ̯]
          Sound = sound2.ogg
          Region = ...
          References = ...
    ]
  },
  ...
]

In some cases even the stress will be the same, e.g. the word сессия (session) has two pronunciations, [ˈsɛsʲ(ː)ɪɪ̯ə] and [ˈsʲesʲ(ː)ɪɪ̯ə] (see link for sound files). In this case we will simply have two identical pronunciation-form values, each with a different set of qualifier values. If there are no objections, I would like to create a new property.
P.S. Another good example: моветон, with 4 different pronunciations of the same word (with both IPA and sound files), two of which reference a source. --Yurik (talk) 21:04, 27 August 2019 (UTC)
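A minimal sketch of the proposed model as plain Python dataclasses (not Wikibase itself): each pronunciation is one statement whose value is the stressed written form, with IPA, audio, region and references as its qualifiers. The class and field names are hypothetical and do not correspond to the property that was eventually requested. It also checks that the stressed value, once the combining acute stress mark (U+0301) is stripped, is the form itself.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass, field

STRESS_MARK = "\u0301"  # COMBINING ACUTE ACCENT, used to mark Russian stress


def strip_stress(text: str) -> str:
    """Remove stress marks while keeping letters such as 'й' intact."""
    decomposed = unicodedata.normalize("NFD", text)
    # NFD also splits 'й' into 'и' + breve; NFC below recombines it.
    return unicodedata.normalize("NFC", decomposed.replace(STRESS_MARK, ""))


@dataclass
class Pronunciation:
    """One pronunciation statement: the stressed form plus its qualifiers."""
    stressed_form: str
    ipa: str | None = None
    audio: str | None = None
    region: str | None = None
    references: list[str] = field(default_factory=list)


@dataclass
class Form:
    representation: str
    pronunciations: list[Pronunciation] = field(default_factory=list)

    def add_pronunciation(self, p: Pronunciation) -> None:
        # The stressed value must be this very form, apart from stress marks.
        if strip_stress(p.stressed_form) != self.representation:
            raise ValueError(f"{p.stressed_form!r} does not match {self.representation!r}")
        self.pronunciations.append(p)


form = Form("поперчивший")
form.add_pronunciation(Pronunciation("попе\u0301рчивший", ipa="[pɐˈpʲert͡ɕɪfʂɨɪ̯]", audio="sound1.ogg"))
form.add_pronunciation(Pronunciation("поперчи\u0301вший", ipa="[pəpʲɪrˈt͡ɕifʂɨɪ̯]", audio="sound2.ogg"))
```

Because pronunciations are kept in a list rather than a set, the сессия case (two identical stressed values with different qualifiers) is simply two entries with the same `stressed_form`.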

Property request created, please support. --Yurik (talk) 02:06, 28 August 2019 (UTC)

Merging duplicates--is it possible?

(L1027) and aluminium/aluminum (L18179) are duplicates but the "Merge Wizard" doesn't appear to support merging of lexemes. Is there another way to merge these two lexemes together? Dhx1 (talk) 18:10, 24 August 2019 (UTC)

@Dhx1: There is Special:MergeLexemes. --Shinnin (talk) 19:48, 24 August 2019 (UTC)
@Dhx1, Shinnin: Done. Merged. The trick is that to merge Lexemes, they need to have the same main lemma in the same language. That was not the case here, which is why I changed it (Special:Diff/1004219174) before merging, to unblock the merge. Cheers, VIGNERON (talk) 15:04, 28 August 2019 (UTC)
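Before opening Special:MergeLexemes, a quick pre-flight check can save a failed attempt. This is a minimal sketch over the lexeme JSON that Wikidata serves; the rules it checks are assumptions: same language, same lexical category, and no spelling-variant code for which the two Lexemes give different lemma text (the kind of mismatch described above). The function names are hypothetical.

```python
from __future__ import annotations


def lemma_texts(lexeme: dict) -> dict[str, str]:
    """Map spelling-variant code -> lemma text, from Wikibase lexeme JSON."""
    return {code: lemma["value"] for code, lemma in lexeme.get("lemmas", {}).items()}


def merge_blockers(source: dict, target: dict) -> list[str]:
    """List reasons two lexemes probably cannot be merged (empty = looks OK).

    Assumed rules: same language, same lexical category, and no spelling
    variant whose lemma text differs between the two lexemes.
    """
    problems = []
    for key in ("language", "lexicalCategory"):
        if source.get(key) != target.get(key):
            problems.append(f"{key} differs: {source.get(key)} vs {target.get(key)}")
    src, tgt = lemma_texts(source), lemma_texts(target)
    for code in sorted(src.keys() & tgt.keys()):
        if src[code] != tgt[code]:
            problems.append(f"lemma conflict for {code!r}: {src[code]!r} vs {tgt[code]!r}")
    return problems
```

An empty list only means none of these assumed rules is violated; the merge page itself has the final say.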

LexData: easy to use python libary to edit Lexemes

Since I couldn't find any library for editing lexicographical data, I wrote my own. It's still "beta" but works properly and is quite easy to use. You can find it here [1] and the documentation (incl. code example) here. Have fun! I have already created a tool like "Wikidata games" to add senses to existing lexemes; I will publish it as soon as it's presentable. -- MichaelSchoenitzer (talk) 23:20, 27 August 2019 (UTC)

Thanks for posting, MichaelSchoenitzer! I have been writing to MW directly using the minimalistic pywikiapi library as part of my Russian lexeme import project, lexicator, but there are cases where you want simpler, higher-level abstractions like lexeme/form/sense. I also replied in your GitHub issues. --Yurik (talk) 00:08, 28 August 2019 (UTC)
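To illustrate the kind of lexeme/form/sense abstraction mentioned above, here is a minimal read-only sketch over the JSON Wikidata serves for a lexeme (e.g. at https://www.wikidata.org/wiki/Special:EntityData/L3350.json, under `entities`). It is not LexData's API; the function name is hypothetical, and only the standard fields `lemmas`, `forms` (`representations`, `grammaticalFeatures`) and `senses` (`glosses`) are read.

```python
def summarize(lexeme: dict) -> dict:
    """Flatten one lexeme's JSON into lemmas, forms and senses keyed by ID."""
    return {
        "lemmas": {code: t["value"] for code, t in lexeme.get("lemmas", {}).items()},
        # Form ID -> (list of representation texts, list of feature item IDs)
        "forms": {
            form["id"]: (
                [t["value"] for t in form.get("representations", {}).values()],
                form.get("grammaticalFeatures", []),
            )
            for form in lexeme.get("forms", [])
        },
        # Sense ID -> {language code: gloss text}
        "senses": {
            sense["id"]: {code: t["value"] for code, t in sense.get("glosses", {}).items()}
            for sense in lexeme.get("senses", [])
        },
    }
```

Editing goes the other way (building JSON and sending it), which is where a library like LexData earns its keep.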
Thanks a lot, that's amazing! I hope that will help people creating more tools for lexicographical data :) Lea Lacroix (WMDE) (talk) 08:47, 28 August 2019 (UTC)
@MichaelSchoenitzer: Gamification of adding senses is very welcome! There is some support for this in https://tools.wmflabs.org/hauki/browse/ru?sense=false but it is not ideal. --Infovarius (talk) 10:26, 28 August 2019 (UTC)

Incorrect datatype in the ttl file

While loading Wikidata into Stardog on AWS, I am getting an error that '-240000000-01-01T00:00:00Z' is not a valid value for datatype http://www.w3.org/2001/XMLSchema#dateTime. Can we fix this issue? I believe there are multiple such occurrences in the data, and they would cause problems for any loading process.  – The preceding unsigned comment was added by Tushar1080 (talk • contribs).

@Tushar1080: How is 240 million years ago related to any human language? --Infovarius (talk) 10:27, 28 August 2019 (UTC)
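For anyone hitting the same loader error, a minimal pre-processing sketch (not a fix to the dump itself). The regular expression follows the xsd:dateTime lexical pattern as given in XML Schema 1.1 Part 2, which permits years longer than four digits and a leading minus sign, so the value appears to be rejected because of the loader's year range rather than malformed syntax. The -9999..9999 bounds are an assumption about what such a loader accepts; literals outside them could be filtered or rewritten before import.

```python
import re

# Lexical pattern of xsd:dateTime (XML Schema 1.1, Part 2): the year may have
# more than four digits and a leading minus sign.
XSD_DATETIME = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"-(0[1-9]|1[0-2])"
    r"-(0[1-9]|[12][0-9]|3[01])"
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def is_lexically_valid(value: str) -> bool:
    """True if the string matches the xsd:dateTime lexical pattern."""
    return XSD_DATETIME.fullmatch(value) is not None


def year_of(value: str) -> int:
    """Signed year of a dateTime lexical value, e.g. -240000000."""
    match = re.match(r"(-?\d+)-", value)
    if match is None:
        raise ValueError(f"not a dateTime value: {value!r}")
    return int(match.group(1))


def needs_workaround(value: str, min_year: int = -9999, max_year: int = 9999) -> bool:
    """True if the year lies outside the range the loader is assumed to accept."""
    return not min_year <= year_of(value) <= max_year
```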