Wikidata talk:Lexicographical data/Archive/2022/02

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Same (spelling) forms, different pronunciation

I want to discuss an interesting case: wikt:ru:яр. There are two main meanings. One (II) has the simplest declension paradigm (1a); the other (I) can be declined in the same way, but also according to paradigm 1c, which differs only in pronunciation (stress on different syllables). Can we (and should we) model both cases in one lexeme (яр (L184179))? And even if not, what is the best way to model the 1a/1c variation? Should we create pairs of forms with the same spelling and different pronunciations, or collect both pronunciations in a single form? --Infovarius (talk) 21:29, 31 January 2022 (UTC)

I'd say having two lexemes might be the cleaner approach in this case, given that half the forms (and not just one or two) are affected by the choice of meaning for that word. As for modeling variation due to the choice of declension paradigm, perhaps "applies to part" (class 1a/1c) could qualify "pronunciation" on the affected forms? Mahir256 (talk) 00:51, 1 February 2022 (UTC)

Please advise - place names

Hello, I'm preparing an import of Czech municipality names into Wikidata. I have a few questions based on a study of existing lexemes:

In senses, should there be more than one language? See Třebíč (L437), which states that the word has the same meaning in both Czech and English. Is this desirable?
Aberdeen (L494798) has sense S1 for "city in Scotland" and sense S2 for other places of the same name. Is this what is expected? Vojtěch Dostál (talk) 11:52, 24 February 2022 (UTC)

@Vojtěch Dostál: The issue with L437 is a mistake by the user who added it, which I have fixed. As for L494798, the approach taken is something I would be comfortable with if it were applied more broadly (@ArthurPSmith: might be able to explain more), but at the moment that is not the case (see e.g. Guhkesjávri (L633165) and @Jon Harald Søby: who created it). Mahir256 (talk) 14:58, 24 February 2022 (UTC)

@Mahir256 Thank you for your help on this. I'll be using the current version of Lexeme:L437 for further work. Happy to hear details from others. Vojtěch Dostál (talk) 15:13, 24 February 2022 (UTC)

I was long reluctant to add proper nouns as Wikidata lexemes because that would mean every organization, location, person, etc. could have a lexeme in every language as well as an item. However, some proper nouns are very widely used, either because they represent a very prominent real-world entity or because many different real-world entities share that name, and it does seem useful to include them here. "Aberdeen" is an example of both. In practice, it seems silly to create a separate sense for each real-world entity with that name (how would we even do that with given names?), but maybe it's not a big burden for Wikidata; I don't really know. I have no strong feeling on that. ArthurPSmith (talk) 17:17, 24 February 2022 (UTC)