Wikidata talk:Lexicographical data/Archive/2019/02

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Is conjugation a lexeme?

Hi all,

Last year Jura1 created L766 (see Wikidata_talk:Lexicographical_data/Archive/2018/06#Conjugation tables as lexeme ?) but I'm not sure what to do with this entity. Is it really a lexeme? I could add more forms, but the main lemma and lexical category are very strange, so before going further and wasting my time, I'd like some more points of view.

Cheers, VIGNERON (talk) 11:16, 27 January 2019 (UTC)

@VIGNERON: Indeed it looks strange. I myself created lexemes for regular flexions, such as -owy (L23887) or -ość (L23224). KaMan (talk) 11:33, 27 January 2019 (UTC)
@KaMan: these lexemes about suffixes make sense (a suffix is a lexeme) but here, L766 is about a conjugation class (which is not a lexeme, or at the very best an odd one). I can't think of a main lemma, so I need help or ideas. Cheers, VIGNERON (talk) 15:15, 30 January 2019 (UTC)
This looks strange to me too. — Finn Årup Nielsen (fnielsen) (talk) 18:11, 30 January 2019 (UTC)
It certainly looks strange, but can it be used to autogenerate forms?--Micru (talk) 23:13, 30 January 2019 (UTC)
@Fnielsen: how can we make it un-strange?
@Micru: it could (with a lot of caveats: this entity is still incomplete and AFAIK there is still no tool for forms) but that's beside my point here: does this information need to be stored in a Lexeme? And if it is stored in a Lexeme, how should it be structured? And if it is stored elsewhere: where and how?
Regards, VIGNERON (talk) 13:41, 1 February 2019 (UTC)
@VIGNERON: I have the feeling that it would be nice to have something like that in a separate namespace. The main reason being that they are not actually lexemes, but particles that are used to create forms. I need to ponder it a bit longer.--Micru (talk) 18:34, 1 February 2019 (UTC)
To make sure people correctly understand, maybe we could change the category from "verb" to "conjugation class" and translate the label into Breton. --- Jura 14:45, 3 February 2019 (UTC)
@Jura1:
For the first point, that's more or less what I asked you last May…
For the second point, how is that of any help? "regular Breton conjugation"@en or "displegadur reizh brezhonek"@br seem equally unrelated to and disconnected from this entity and its content.
Cheers, VIGNERON (talk) 14:25, 5 February 2019 (UTC)

Toponymy in Finland

I have also started experimenting with toponymy, and I have a couple of questions.

  • My place name is Alavus (L42325). In Finnish we have very irregular ways of expressing movement to or from a place, or being in the place, and lexicographical data is superb for this! We could create text with Wikidata entries if we had access to the conjugations. So I hope the project will allow that. The question: I have been able to locate the grammatical cases for those three situations: locative case (Q202142), separative case (Q614341) and lative case (Q260425). The last one seems to be a broader concept instead. I am not a linguist and I am inviting the crowd (of linguists) to help me find or establish the correct case.
  • It would be great to link the demonym to the sense of the actual lexeme that it is related to, instead of the place item that it relates to. Can this be established and how should/will it be done? Perhaps a property could be devised for that? In my examples I have done it in two different ways.
    1. In the place name Alavus (L42325) it is declared as having a form demonym (P1549) "alavutelainen". There is no link between the place name lexeme and the demonym lexeme. I am not sure if this would be the correct way to do it anyway.
    2. In the demonym lexeme alavutelainen (L42333) it is declared as the demonym of (P6271) the place Alavus (Q5981). Here, too, it misses the connection between the place name lexeme and the demonym of it.

Thanks a million! – Susanna Ånäs (Susannaanas) (talk) 12:40, 11 February 2019 (UTC)
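
One way to picture the missing connection: if both lexemes point at the same place item, a link between them can be derived by joining on that item. The sketch below is only an illustration in Python. It uses plain dictionaries rather than the Wikibase data model, the field names are hypothetical, and it assumes that the place name lexeme records the place item on one of its senses, which is not stated above.

```python
# Simplified, hypothetical in-memory view of the two lexemes discussed above.
# Assumption: the place name lexeme points to the place item from a sense.
lexemes = {
    "L42325": {"lemma": "Alavus", "sense_items": ["Q5981"], "claims": {}},
    "L42333": {
        "lemma": "alavutelainen",
        "sense_items": [],
        "claims": {"P6271": ["Q5981"]},  # demonym of -> place item
    },
}


def place_name_lexemes_for(demonym_id, lexemes):
    """Return IDs of lexemes whose sense points at the demonym's place item."""
    places = set(lexemes[demonym_id]["claims"].get("P6271", []))
    return sorted(
        lexeme_id
        for lexeme_id, lexeme in lexemes.items()
        if lexeme_id != demonym_id and places & set(lexeme["sense_items"])
    )


print(place_name_lexemes_for("L42333", lexemes))
```

Such a join is indirect. A dedicated lexeme-to-lexeme property, as asked about above, would make the connection explicit instead.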

@Susannaanas: I don't know your language, but as far as I understand from other languages, "alavutelainen" is not a form of Alavus (L42325) but a separate lexeme (noun). See Wikidata:Lexicographical data/Glossary for what a form is. KaMan (talk) 07:36, 12 February 2019 (UTC)
I completely agree, and I made this one experiment to chart out the options and pitfalls. –Susanna Ånäs (Susannaanas) (talk) 10:33, 14 February 2019 (UTC)
@Susannaanas: I also think that the demonym is not a form of the place (and I guess this demonym has forms of its own) but I'm not sure, so I ping @Fnielsen: who could maybe help. Regards, VIGNERON (talk) 09:24, 12 February 2019 (UTC)
Thank you! I think this is quite an ordinary demonym, therefore I created the separate lexeme for that. I would like to be able to establish a connection between the demonym lexeme and the place name lexeme from which it is formed. Perhaps there is another property for that? –Susanna Ånäs (Susannaanas) (talk) 10:33, 14 February 2019 (UTC)

P31 on Lexemes

Hi,

I see that instance of (P31) is used a lot on Lexemes: 1129 Lexemes right now (query).

If we look into it more, we find that there are 96 different values for instance of (P31): 236 Lexemes using the value count noun (Q1520033), 83 diminutive noun (Q56682890), 78 reconstructed word (Q55074511), etc. (full query).

For most values (and at least the first two), I would say the value should be in the Lexical category, shouldn't it?

The third value is a different matter, but still I'm unsure that instance of (P31) is a good fit for that data.

Cheers, VIGNERON (talk) 10:01, 12 February 2019 (UTC)
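
For readers who want to reproduce this kind of count, here is a minimal sketch in Python. It assumes that lexemes are typed as ontolex:LexicalEntry in the query service and that results come back in the standard SPARQL JSON layout. The function name is illustrative, and the sample response is hand-written rather than live data; its three counts are simply the figures quoted above.

```python
# Sketch only: a SPARQL query counting instance of (P31) values on lexemes.
# Assumption: lexemes are typed ontolex:LexicalEntry in the query service.
P31_ON_LEXEMES_QUERY = """
SELECT ?value (COUNT(?lexeme) AS ?count) WHERE {
  ?lexeme a ontolex:LexicalEntry ;
          wdt:P31 ?value .
}
GROUP BY ?value
ORDER BY DESC(?count)
"""


def tally_p31_values(response):
    """Return (QID, count) pairs, most used first, from a SPARQL JSON response."""
    rows = []
    for binding in response["results"]["bindings"]:
        # Entity URIs end with the QID, e.g. .../entity/Q1520033
        qid = binding["value"]["value"].rsplit("/", 1)[-1]
        rows.append((qid, int(binding["count"]["value"])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def _row(qid, count):
    """Build one hand-written result row in SPARQL JSON form."""
    return {
        "value": {"type": "uri", "value": "http://www.wikidata.org/entity/" + qid},
        "count": {"type": "literal", "value": str(count)},
    }


# Hand-written sample (not live data), deliberately out of order.
sample_response = {
    "results": {
        "bindings": [
            _row("Q56682890", 83),
            _row("Q1520033", 236),
            _row("Q55074511", 78),
        ]
    }
}

for qid, count in tally_p31_values(sample_response):
    print(qid, count)
```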

If the value is itself a lexical category (and count noun (Q1520033), diminutive noun (Q56682890) probably qualify) then yes, that makes sense to me. But I think instance of (P31) is perfectly fine to use for other situations related to structure or etymology for example. ArthurPSmith (talk) 14:32, 12 February 2019 (UTC)
I think that high level lexical categories (verb, noun, adjective, etc) should be reserved for Lexical category, while instance of (P31) can be used for various other characteristics such as count noun (Q1520033), unadapted loanword (Q493000), etc. — Finn Årup Nielsen (fnielsen) (talk) 14:37, 12 February 2019 (UTC)
Consider jul (L37293) (Christmas). It is definitely a Danish noun, but it is less clear whether it is an appellative (Q498187) and a singulare tantum (Q604984) (to me at least). Using instance of (P31), we have the ability to reference the instance of (P31) statement and avoid the need to choose whether appellative (Q498187) or singulare tantum (Q604984) should have been used as the lexical category. — Finn Årup Nielsen (fnielsen) (talk) 14:59, 12 February 2019 (UTC)
I agree with Finn Årup Nielsen: lexical category is for a basic set of parts of speech like generic noun, verb etc. All other features can be set with P31. KaMan (talk) 15:03, 12 February 2019 (UTC)
Good points, adding a source makes use of statements somewhat better than using the built-in relations. ArthurPSmith (talk) 15:53, 12 February 2019 (UTC)
  • Given the use that is made of P31 on other entities (i.e. properties and items), I think it would be preferable to not use P31 at all on lexemes. The feature "lexical category" can be seen as the equivalent for P31 on lexemes. I don't think it would be good to repeat its value in P31 systematically on lexemes. For information currently included in P31, but not in the lexical category, "has quality" could do. --- Jura 16:09, 12 February 2019 (UTC)


If I may try to summarize, the points of view are as follows:
  • there are always some exceptions where instance of (P31) could be useful, for many reasons (everybody seems to agree on that, maybe except Jura)
  • instance of (P31) could be used to be more precise or to complement the Lexical category (and here the points of view diverge a bit; mostly it seems to be on a case-by-case basis)
@ArthurPSmith, Fnielsen, KaMan, Jura1: do you agree with this summary? And going forward, couldn't some of these values be used on another property or in another place? And what is expected in the Lexical category: only a few general values, or could more precise ones be acceptable (in the cases where there is no doubt, obviously)? We had some discussion about this in the past but as far as I can see, no clear consensus.
Cheers, VIGNERON (talk) 14:39, 14 February 2019 (UTC)
The summary is ok. I usually expect in lexical category one of w:en:Part of speech + phrase (Q187931). KaMan (talk) 11:38, 15 February 2019 (UTC)
@VIGNERON: Yes that about summarizes it. I was thinking just now about count noun (Q1520033) vs mass noun (Q489168). At least in English the same lexeme can frequently have either of these natures, depending on the sense - an example is the word 'mass' itself. So I don't think either of them actually makes sense to use at the "lexical category" level - unless we want to duplicate lexemes and subdivide the senses in these cases? So in general I think now I'm more in favor of limiting lexical category to just the most general possible terms - 'noun', 'verb', 'adjective', etc. though I do think proper nouns and numerals deserve their own categories. ArthurPSmith (talk) 16:15, 15 February 2019 (UTC)
@ArthurPSmith: thanks. So I guess you would agree with the start of this related discussion: Wikidata talk:Lexicographical data/Archive/2018/03#Lexemes categories and grammatical features vocabulary (with only 17 lexical categories). Not sure if it's the same problem but for "tour"@fr I decided to create 3 lexemes: tour (L2330), tour (L2332), tour (L2331). It's much simpler to use and easier to understand this way. Maybe the same solution can work for "mass"@en. Regards, VIGNERON (talk) 16:25, 15 February 2019 (UTC)
Oh, I missed that. I'm not sure that specific list is exactly what's wanted (and "conjunction" appears twice?) but something close to that, yes. Maybe we can codify it somewhere here? This might be a good topic for a "Wikiproject Lexemes" to get started with??? As to the "tour"@fr case - I think that's very different from "mass"@en; in English this problem affects thousands of noun lexemes, not just a few, and is unrelated to etymology or the forms. See this Wiktionary search for many many more cases. ArthurPSmith (talk) 16:35, 15 February 2019 (UTC)
@ArthurPSmith: this list is based on an external ontology, so it could be useful for linking with other lexical databases (conjunction appears twice because Universal POS tags make the distinction between conjunction and subordinating conjunction, but Wikidata doesn't really have that distinction; we now have subordinating conjunction (Q11655558) but it's far from perfect). I'll look into countable vs. uncountable but that seems to be a whole can of worms… Cheers, VIGNERON (talk) 16:31, 17 February 2019 (UTC)
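
For reference, the Universal POS tag set has 17 tags, two of which are conjunctions. The sketch below lists them in Python and picks out the conjunction tags. It shows the external tag set only, not necessarily the exact list from the 2018/03 discussion; the variable names are illustrative, and the only Wikidata ID included is the one mentioned above.

```python
# The 17 Universal POS tags with their English names.
UPOS_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}

# Only the item cited in the discussion; other mappings are left open on purpose.
KNOWN_ITEMS = {"SCONJ": "Q11655558"}

# Tags whose name mentions "conjunction": the reason it appears twice in the list.
conjunction_tags = sorted(
    tag for tag, name in UPOS_TAGS.items() if "conjunction" in name
)
print(len(UPOS_TAGS), conjunction_tags)
```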
@VIGNERON: what do you think about the case when some lexeme belongs to several categories simultaneously? Example: L:L2123. --Infovarius (talk) 15:21, 15 February 2019 (UTC)
@Infovarius: very good question. I'm not sure, but it depends. Is it really one lexeme or several lexemes? Is it in several categories because the sources themselves are contradictory, or because of a lack of sources? And anyway, even if a word really is in several categories, is instance of (P31) the best property? Why not - for example - a property "Lexical category" for this kind of case? Last, are there a lot of these cases or are they minor exceptions? (I don't believe we should build a whole structure based on anomalies; for a poor analogy: we still use father (P22) even if some species can reproduce without a father.) I have questions and no answers; I hope this discussion can help find some. Regards, VIGNERON (talk) 15:56, 15 February 2019 (UTC)

lexeme bot

Hi, is there a way to create many lexemes with a bot, importing from wiktionary.org and many other sites? Amirh123 (talk) 12:26, 12 February 2019 (UTC)

@Amirh123: I'm not sure I fully understand, but no. There is no API (except some basics) for bots on Lexemes right now; almost everything is done by hand (and that's a good thing, as the structure is not stable yet). Cheers, VIGNERON (talk) 13:44, 12 February 2019 (UTC)

Lexeme bot proposal

This proposal could probably use more comments - GZWDer has added 100 sample Welsh verbs from a Welsh-English dictionary that's in Project Gutenberg, and is proposing to import the rest. Is that OK with us here? Can we recruit somebody familiar with Welsh to comment? ArthurPSmith (talk) 16:21, 15 February 2019 (UTC)
