Wikidata talk:Lexicographical data/Archive/2020/03

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Redirecting senses or forms?

Is there a way to redirect senses or forms? I imagine that could be useful when a sense or form turns out to have been mistakenly assigned to a lexeme and needs to be moved, or when it turns out that what has been modeled as one lexeme should be split into more than one, for example along the lines of senses or (groups of) forms.―BlaueBlüte (talk) 04:00, 31 December 2019 (UTC)

  • I'm not really sure about doing that with senses. The S1 of "edit" could re-direct to the S3 of a Chinese word?
For forms, maybe we should go a step further, see #Form_(sub-)entities. --- Jura 14:49, 3 March 2020 (UTC)

Form (sub-)entities

Is there a way to create forms without connecting them to lexeme entities? Looking at c:Category:Pronunciation-files, I think it could be interesting to create forms for most of them and later connect the forms to lexemes.

The short answer is probably that currently one cannot, but maybe the development shouldn't be that complicated. --- Jura 09:33, 15 February 2020 (UTC)

Same Lexeme, different derivation: how to handle?

Hi - I'm looking at plane (L3794) and plane (L46157), the English noun "plane".

These two lexemes have identical languages, lemmas, and surface forms, so by the definition in the Data Model, they should be one lexeme. But, as currently defined, they have different derivations, through "planum" and "airplane". While arguably those two words share a Latin root, there are other cases, such as English "dub", where two completely different etymologies derive the same modern lexeme.

From anything but an etymological perspective, these are identical lexemes and should be merged, yes? Are we making any attempt to represent the derivational basis of different senses? Michaelrhanson (talk) 15:45, 25 February 2020 (UTC)

Yes, I think the intention from the beginning was to allow lexemes to be considered separate if they have distinct etymologies (and associated meanings) even if they are formally otherwise the same. ArthurPSmith (talk) 19:30, 25 February 2020 (UTC)
@Michaelrhanson: +1, I agree with @ArthurPSmith:, there are many reasons why 2 (or more) Lexemes could have the same attributes, including but not limited to etymology (from and to the lexemes). See also tour (L2330), tour (L2331), tour (L2332) (different etymology and different gender). PS: this is neither new nor specific to Lexemes, dictionaries have done the same forever. Cheers, VIGNERON (talk) 15:02, 27 February 2020 (UTC)
Got it, that makes perfect sense. I wonder if it would make sense to update paragraph 3 of the Lemma section of WikibaseLexeme/Data_Model [1] to include a sentence to this effect? Right now, that paragraph only mentions a difference in Form, which led me to wonder if a difference in etymology was considered a valid reason. 2620:10D:C090:500:0:0:5:20A0 17:04, 27 February 2020 (UTC)
I tried a better wording, is it clearer that way? Cheers, VIGNERON (talk) 14:51, 29 February 2020 (UTC)
Definitely clearer, thanks! Michaelrhanson (talk) 15:15, 9 March 2020 (UTC)
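
As an illustration of the outcome above, here is a minimal Python sketch (not part of the original discussion) that groups lexemes by language, lemma and lexical category and reports groups holding more than one lexeme, i.e. candidates that stay separate because of etymology or gender. The records are hand-written from IDs mentioned on this page; the tuple layout and the name `homograph_groups` are assumptions made for the example, not a Wikidata API.

```python
from collections import defaultdict

# Hand-written records: (lexeme ID, language, lemma, lexical category).
# IDs are taken from this page; the tuple layout is hypothetical.
LEXEMES = [
    ("L3794", "en", "plane", "noun"),
    ("L46157", "en", "plane", "noun"),
    ("L2330", "fr", "tour", "noun"),
    ("L2331", "fr", "tour", "noun"),
    ("L2332", "fr", "tour", "noun"),
    ("L267811", "es", "kriptón", "noun"),
]


def homograph_groups(lexemes):
    """Return {(language, lemma, category): [IDs]} for keys shared by 2+ lexemes."""
    groups = defaultdict(list)
    for lexeme_id, language, lemma, category in lexemes:
        groups[(language, lemma, category)].append(lexeme_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    for key, ids in homograph_groups(LEXEMES).items():
        print(key, ids)
```

Such a listing only flags lexemes that look identical on the surface; whether they should be merged or kept apart still depends on etymology and meaning, as discussed above.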

ain't, can't, couldn't, etc.

To hold the contracted forms, I made

The lemma seems suboptimal. Maybe it should be "not to be", but then we don't use "to be" either. --- Jura 10:31, 9 March 2020 (UTC)

why not use "isn't" as the lemma? ArthurPSmith (talk) 17:17, 9 March 2020 (UTC)
I guess I'm somewhat used to lexemes where only some forms contract, but not all (e.g. F3 of dudit (L19364)).
In the meantime I found will not (L17044) and cannot (L16211). Whatever the solution, "ain't" is probably hard to insert.
A suggestion on how to cross-reference them from non-bots at Wikidata:Property_proposal/has_contraction_2. --- Jura 13:04, 10 March 2020 (UTC)

homonymic forms

There are a number of "homoforms" generated by bot, like каракулю (L115099). It is a form of каракуле (L115094) and каракуля (L254375). Can we, and should we, create such pages? I know that English Wiktionary creates all such half-empty pages, but Russian Wiktionary tends not to create pages for a form of one word (which are in fact redirects to the main lemma). We can't redirect such "lexemes" because there are two possible targets... @Cinemantique, DonRumata, Yurik: dear colleagues, please discuss too. --Infovarius (talk) 12:30, 10 March 2020 (UTC)

These are not lemmas but forms of the lexeme. See Lexeme:L144760#F3, Lexeme:L144760#F6 and Lexeme:L144760#F2. Don Rumata 12:53, 10 March 2020 (UTC)
@Infovarius: I'm not sure I understand, what is the problem exactly? Anyway, there is nothing wrong with having several "homoform" lexemes (whatever that means; каракуле (L115094) and каракулю (L115099) have no forms right now, which is bad and makes it hard to tell) *if it's justified*. Cheers, VIGNERON (talk) 14:19, 10 March 2020 (UTC)
@VIGNERON: In russian wikt:ru:каракуля is a lemma with own forms and, at the same time, the form of lemma wikt:ru:каракуль. Don Rumata 15:03, 10 March 2020 (UTC)
@DonRumata: thanks, then both are Lexemes (a bit like tour (L2330), tour (L2331) and tour (L2332) in French, same lemmata and forms but different Lexemes). Could you add the forms to make it clearer. Cheers, VIGNERON (talk) 15:36, 10 March 2020 (UTC)
@VIGNERON: No, it's like two lemmas wikt:trouser and wikt:trousers and two forms — plural noun trousers and verb third-person singular simple present trousers. Don Rumata 16:27, 10 March 2020 (UTC)
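
To make the redirect problem concrete, here is a small Python sketch (added for illustration, not from the thread) that indexes surface forms by the lexemes owning them, using the "trouser"/"trousers" example above. The lexeme labels and grammatical feature names are illustrative placeholders, not real lexeme IDs, and `redirect_target` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative (lexeme label, surface form, grammatical features) triples.
# No real lexeme IDs are implied.
FORMS = [
    ("trousers (noun)", "trousers", ("plural",)),
    ("trouser (verb)", "trouser", ("infinitive",)),
    ("trouser (verb)", "trousers", ("third person", "singular", "simple present")),
]


def index_by_surface_form(forms):
    """Map each surface form to the (lexeme, features) pairs that use it."""
    index = defaultdict(list)
    for lexeme, form, features in forms:
        index[form].append((lexeme, features))
    return dict(index)


def redirect_target(index, form):
    """Return the single lexeme owning `form`, or None if unknown or ambiguous."""
    owners = {lexeme for lexeme, _ in index.get(form, [])}
    return owners.pop() if len(owners) == 1 else None


if __name__ == "__main__":
    index = index_by_surface_form(FORMS)
    print(redirect_target(index, "trouser"))   # one owner
    print(redirect_target(index, "trousers"))  # two owners, so no single target
```

A homonymic form such as "trousers" has more than one owner, so a one-to-one redirect cannot represent it; listing it as a form under each lexeme can.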

Spelling variant without specific dialect it is used in?

I have been adding lexemes in Spanish for the elements, and just made one for kriptón (L267811). However, I read that this element has two spellings: kriptón and criptón. I want to add a spelling variant, but this doesn't seem to be a specific thing to one dialect or another; the editor does not allow me to add a spelling variant that is also just under es.

If a spelling variant is not a regional variation, how can I add it to a lexeme? Should it be a separate lexeme or something? Thanks. DemonDays64 | Talk to me 04:43, 12 March 2020 (UTC) (please ping on reply)

@DemonDays64: good question, I would also say put the different variants as forms. For the main lemma, if you really want to put the two variants (as indeed both seem equally used according to Google Ngram), then you can use "bogus" codes like es-x-Q9922 and es-x-Q9820 (or anything else, I'm also thinking about calque (Q204826) - lexicalisation (Q687185), but they're not exactly the right concepts…). And it's a good thing that you cannot add two lemmas with the same code, otherwise the tools would be lost. Cheers, VIGNERON (talk) 10:49, 12 March 2020 (UTC)
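
A minimal Python sketch of the idea in this reply, under stated assumptions: lemmas are held in a map keyed by language code (modelled loosely on how a lexeme's lemmas are stored), and a private-use code of the form `es-x-Q…` distinguishes the second spelling. The helper names `variant_code` and `add_lemma` are hypothetical, and which spelling gets which "bogus" code is arbitrary here.

```python
def variant_code(base_language, qid):
    """Build a private-use language code such as 'es-x-Q9922'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not an item ID: {qid!r}")
    return f"{base_language}-x-{qid}"


def add_lemma(lemmas, code, value):
    """Add a lemma spelling under a language code, refusing duplicate codes."""
    if code in lemmas:
        raise KeyError(f"a lemma already exists for {code!r}")
    lemmas[code] = {"language": code, "value": value}
    return lemmas


if __name__ == "__main__":
    lemmas = {}
    # The assignment of spellings to codes is arbitrary in this example.
    add_lemma(lemmas, variant_code("es", "Q9922"), "kriptón")
    add_lemma(lemmas, variant_code("es", "Q9820"), "criptón")
    print(lemmas)
```

Because the map is keyed by code, a second lemma under plain `es` has nowhere to go, which is the behaviour the editor enforces.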

Adding forms with LexData

Hi,

Since every Lexeme should have at least one form, since most Lexemes in French didn't have any form, and since the Lemma is itself a form, I created a very simple script in Python that adds these lemmata as forms using the library LexData created by @MichaelSchoenitzer: (thanks!), and I wanted to share it here since it may be useful to other people.

#!/usr/bin/python3
import LexData

# Log in with your own Wikidata user name and password.
repo = LexData.WikidataSession("VIGNERON", "myverysekretpassword")

# IDs of the lexemes that should receive a form, e.g. "L123".
EditList = ["L123", "L1234"]

for lexeme_id in EditList:
    L = LexData.Lexeme(repo, lexeme_id)
    # Add the lemma itself as a form, tagged as singular (Q110786).
    L.createForm(L.lemma, ["Q110786"])

Cheers, VIGNERON (talk) 16:23, 2 March 2020 (UTC)

How did you generate the editlist?--So9q (talk) 23:19, 2 March 2020 (UTC)
@So9q: good question, I created it with a SPARQL query :
SELECT ?l ?lemma WHERE {
   ?l dct:language wd:Q150 ; wikibase:lemma ?lemma .
  FILTER NOT EXISTS {?l ontolex:lexicalForm ?form }
}
And then I did some formatting with Notepad++
Cheers, VIGNERON (talk) 12:45, 3 March 2020 (UTC)
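
For anyone who would rather skip the manual formatting step, here is a hedged Python sketch (not the method used above) that runs the same query against the Wikidata Query Service and turns the result into a list of bare lexeme IDs. The function names and the User-Agent string are made up for this example; `fetch_edit_list` needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Same query as above: French lexemes without any form.
QUERY = """
SELECT ?l ?lemma WHERE {
  ?l dct:language wd:Q150 ; wikibase:lemma ?lemma .
  FILTER NOT EXISTS { ?l ontolex:lexicalForm ?form }
}
"""


def lexeme_ids(results):
    """Extract bare lexeme IDs (e.g. 'L123') from SPARQL JSON results."""
    ids = []
    for row in results["results"]["bindings"]:
        uri = row["l"]["value"]
        if uri.startswith(ENTITY_PREFIX + "L"):
            ids.append(uri[len(ENTITY_PREFIX):])
    return ids


def fetch_edit_list(query=QUERY, user_agent="edit-list-sketch/0.1 (example)"):
    """Run the query and return the lexeme IDs (requires network access)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params, headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return lexeme_ids(json.load(response))
```

The returned list could be used in place of the hand-built `EditList` in the script above; reviewing it before editing is still advisable, since the query says nothing about which grammatical features fit each lexeme.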
Seems the bot request for the above task was already archived.
It's really something that should be created by default by Special:NewLexeme.
To add forms with QuickStatements, if others are interested, maybe one could ask Magnus to add that. --- Jura 14:46, 3 March 2020 (UTC)
There is none. No bots were harmed nor involved in these edits.
I'm not sure I understand...
Yes, it has been asked several times, including but not limited to phabricator:T220985. That's why I learned how to use Python instead of QuickStatements (+ timeo hominem unius libriinstrumenti).
Cheers, VIGNERON (talk) 15:27, 6 March 2020 (UTC)
It's at Wikidata:Bot_requests/Archive/2018/10#Copy_lemma_to_F1. --- Jura 13:09, 10 March 2020 (UTC)
This bot request was to simply copy the lemma as a form. LexData allows adding other information (like the grammatical features in my example above) at the same time. I could just do the simple copy, but I think it would be better to add richer data. I will look into it for the languages I know, and since I shared the tool, everyone can do it for the languages they know. Cheers, VIGNERON (talk) 17:07, 11 March 2020 (UTC)
Sure, the more the better. Maybe we could also create a script that autofills them based on some statement.
BTW I re-added it to Wikidata:Bot_requests#Copy_lemma_to_F1. Even if all French ones may be done, the other day I checked some incomplete English ones.--- Jura 16:34, 26 March 2020 (UTC)

Strange Lexemes in Chinese

Hi y'all,

I stumbled upon strange Lexemes in Chinese created by an IP: Special:Contributions/125.143.69.146 (invalid ID (L238134), invalid ID (L238135), invalid ID (L238136), invalid ID (L238137), invalid ID (L238138), invalid ID (L238139)).

Before making a request at Wikidata:Requests for deletions, I want to be sure: can someone confirm that these are not Lexemes at all? (My Chinese is not very good, but I can't find these lemmata in any dictionary.) And if they shouldn't be deleted, there is a lot of work to do to complete and correct them (at least the lexical categories are all wrong).

Cheers, VIGNERON (talk) 11:01, 19 March 2020 (UTC)

As a native Chinese speaker I can confirm that they are not Chinese words.--GZWDer (talk) 00:47, 23 March 2020 (UTC)
Thanks GZWDer, RD Done. Cheers, VIGNERON (talk) 20:25, 25 March 2020 (UTC)

item for this sense (P5137) for non-nouns

Hello, I know I am probably not the first to ask but still ... could someone explain to me (as simply as possible) how the senses for adjectives, adverbs and verbs are modelled using item for this sense (P5137), please? As it is now and what I understand from property talk page, it is not about senses of words but more about "it has something to do with X". I don't see what benefit it has like this. For example I see no way how to take synonyms or translations properly from this (it works quite well for nouns). --Lexicolover (talk) 21:42, 23 March 2020 (UTC)

@Lexicolover: I'm not sure there is a wide consensus, but indeed Property talk:P5137 is probably the best place to sum up the current status quo. The benefit here is: one and the same property for the same thing on all senses of all lexemes (as the lexical category does not affect the meaning), and because there is currently no other property (and creating properties for each lexical category seems like a very bad idea to me).
« it works quite well for nouns » does it? A noun in one language is often not translated by a noun in another language. To take a simple example: « J'ai faim » in French (noun; word for word « I have hunger ») is « I am hungry » in English (adjective; notice also how the verb "to have" is replaced by "to be"). I wouldn't use Lexemes for translation, not alone at least. That said, if you really want it, it's trivial to filter by lexical category.
Cheers, VIGNERON (talk) 09:14, 26 March 2020 (UTC)
@VIGNERON: Thank you for your reply, but it does not answer my question of how the senses should be modelled. What is that property for if not for expressing the sense of the word? Really just to say "it has something to do with X"? (I am not asking for a special property for each lexical category at all.) And yes, it works for nouns quite well (not perfectly, but quite well) because nouns by definition express concepts. On the other hand, adjectives, for example, modify other concepts: just linking an adjective to the concept does not express what that adjective actually means (in some cases it might, in other cases it doesn't). Two adjectives linking to the same concept might be synonyms or antonyms, or they might have some completely different relation between them, thus filtering by lexical category solves nothing. Your example is good, it explains why one can't translate word by word between different languages, but it is more of a syntactic or phraseological issue, not a lexical one. At this point I would not use this property for translation either (because of the above-mentioned issues), but it somehow kills the whole good idea of indirect synonyms and translations. --Lexicolover (talk) 16:05, 26 March 2020 (UTC)
@Lexicolover: yes, it is « for expressing sense of the word », emphasis on word: meaning any word and not just nouns.
« Two adjectives linking to the same concept might be synonyms or antonyms », they shouldn't. Do you have any example? Antonymic concepts should have distinct items (this is why there is opposite of (P461)). "peaceful" should link to peace (Q454) and "warring" to war (Q198), for instance. Am I missing something?
The status quo may not be perfect, but I think this is the best we have right now. Obviously, if you have other ideas, I'll be glad to hear them and talk about them.
Cheers, VIGNERON (talk) 16:50, 26 March 2020 (UTC)
@VIGNERON: Okay, some examples of what I have in mind:
  1. heavy × light
    Polar opposites. Both are sure to fall under the concept of mass (Q11423). For the sake of it we could create (sub)concepts of heaviness and lightness (or do we have them already?)
  2. beautiful × nice × so-so × ugly
    Concept of beauty (Q7242), different points on the scale. Again we can create (sub)concepts, but it starts to feel a little stretched. We don't know how many points that scale has, and different languages might differ in understanding where those points lie (thus we are getting a little messy here). Do we really expect that everyone is able to find the correct Q-item or create it?
  3. kamenný@cs × kamenitý@cs
    The first means 1) made of stone (as in 'kamenná socha' ~ 'stone statue'); 2) which evokes stone (as in 'kamenná tvář' ~ 'stone face'); the second means 1) which contains a lot of stones (as in 'kamenitá cesta' ~ ?'stony road'). Different senses, not interchangeable, both under stone (Q22731). I can't imagine having a Q-item kamenitost (the word itself sounds weird); even if we can have such Q-items, we are getting to the point of having special Q-items to deal with subtleties of a language which might or might not have a counterpart in any other language. And we are also getting to the point where we have special items just for the sake of one adjective (what should those items look like?), which is probably not the direction we want (because we want to use concepts, not words, because concepts are broader).
  4. válečný@cs (as in 'válečný zločin' ~ 'war crime') × válčící@cs (verbal adjective; as in 'válčící státy' ~ 'warring states')
    Analogous to the previous example. Not interchangeable, and even stranger to create a special Q-item.
I came to the solution how to deal with cases like actor × actress (eg. námořnice (L290307)) which is not consensual but it works for me until we have something better but it does not seem applicable for adjectives or verbs (not to mention we should have something generally accepted and working). --Lexicolover (talk) 18:25, 26 March 2020 (UTC)
@Lexicolover: your examples are a bit strange, are these true examples? "heavy" or "light" should *not* link to mass (Q11423) (except when "heavy" is used in the sense of "having mass, massive, weighty", but that's not the point here), which is obviously too general, or at least not alone. Same for 2): use the possibilities of Wikidata to create subitems, use qualifiers, multiple values and so on. Points 3) and 4) are more interesting but again seem to be the same as the previous ones. Indeed, if an item would be used by one and only one lexeme, it's a bad idea (both for items and lexemes), but whatever precision you would put in the item could go in the lexeme.
Yes, a qualifier like in námořnice (L290307) (and the 15 other lexemes, see query https://w.wiki/LLj, including fyr (L33928) in Danish) is what should be done! (if "námořnice" is indeed just for female and not just for feminine; unlike in French). I don't see how it's « not consensual » (that's done on a large number of lexemes with other qualifiers); to me it seems that this is what we have been doing on Wikidata for more than 7 years now ;) And it shouldn't be limited to adjectives, by the way; it would be useful for all words, including nouns.
Cheers, VIGNERON (talk) 09:11, 27 March 2020 (UTC)
@VIGNERON: Thank you for all your time, but I am back to my original question: how should I do it? I know I can use qualifiers or multiple values, but I don't know the desired syntax; I don't know how I should put it together so that it would be understandable, usable and useful. If I use multiple values, is the relation between them logical AND or logical OR (I personally think of it as logical OR and qualifiers as logical AND, but I don't know if that is correct; I have seen people proposing otherwise)? How far can I go with subitems? Do we have properties I could use as qualifiers to deal with the above-mentioned examples (it is not always easy to find properties matching something one has in mind)? Is there any reference material (internal or external) on what we want to achieve and best practices for how to achieve it? This whole time I haven't wanted to change the property used; this whole time I have wanted to know how I should do it so that it will be useful. Right now I only have that mentioned property talk page, and it has a really simplified approach to the issue.
I don't know what you mean by your question whether these are true examples. All of those words exist, and I tried my best to describe my thought process about the issue I see with them. Where should "heavy" link to, then? (I could use the example of "long" and "short", which fall under the concept of "length".) Creating a subitem like "heaviness" for the item "mass" (or whatever better item) really feels to me the same as creating a special occupation item for the opposite gender (which I think is not desired). I am not a veteran Wikidata editor who has gone through dozens of discussions and is able to say this item is okay and this one is not, or to easily find reference material on how to do something. I have to ask; it really isn't meant to offend anyone.
Using the "gender" qualifier is not consensual. When I asked about this way of doing it before, there were voices that preferred different approaches. I've just chosen this one. And since 15 out of 16 lexemes in that query were created by me, it shows it is not generally accepted (or that everyone else is working with languages where this is not an issue, or they don't see it as an issue). I am happy to see someone say it is correct.
Modelling sense of the word is IMO not comparable to common statements. I could come up with some way myself but if it is not accepted by others it would be useless. Thank you for all of your time and effort with me. --Lexicolover (talk) 21:18, 27 March 2020 (UTC)
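
To make the disagreement above easier to follow, here is a small Python sketch (added for illustration) of "indirect" synonym or translation lookup through a shared item for this sense (P5137) value, with the lexical-category filter mentioned earlier. The sense records are assumptions: "peaceful" and "warring" follow the items suggested in the thread, while linking both Czech adjectives to war (Q198) is a hypothetical modelling choice, used only to show the concern raised about examples 3 and 4.

```python
# Illustrative sense records: (word, language, lexical category, P5137 item).
# The Czech links are hypothetical; they are not claims about real lexemes.
SENSES = [
    ("peaceful", "en", "adjective", "Q454"),  # peace
    ("warring", "en", "adjective", "Q198"),   # war
    ("válečný", "cs", "adjective", "Q198"),
    ("válčící", "cs", "adjective", "Q198"),
]


def candidates(senses, word, language, same_category=True):
    """List other words sharing a P5137 item with `word`.

    With same_category=True, only words of the same lexical category count.
    """
    own = {(cat, item) for w, lang, cat, item in senses
           if (w, lang) == (word, language)}
    found = []
    for w, lang, cat, item in senses:
        if (w, lang) == (word, language):
            continue
        for own_cat, own_item in own:
            if item == own_item and (cat == own_cat or not same_category):
                found.append((w, lang))
                break
    return found


if __name__ == "__main__":
    print(candidates(SENSES, "warring", "en"))
    print(candidates(SENSES, "peaceful", "en"))
```

With these records, both Czech adjectives come back as candidates for "warring", even though only 'válčící' was glossed that way above. Filtering by lexical category does not separate them; a more specific item or a qualifier would be needed, which is the open question of this thread.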

Hi y'all,

For people who, like me, edit both Wiktionary and Wikidata Lexemes, here is a little gadget that adds links to the lexeme(s) with the same lemma as the title of the Wiktionary entry you are on: fr:wikt:Utilisateur:VIGNERON/LienLex.js (which is the gadget itself; you need to call it from your personal js page, like this: fr:wikt:Utilisateur:VIGNERON/common.js). The links are added to the "Tools" panel on the left of the screen.

It's a bit crude and slow (it's going through a SPARQL query, maybe the query could be improved or maybe there is a whole other way) but I think it could be useful so I'm sharing it here.

Every question, comment and remark is obviously welcome.

Cheers, VIGNERON (talk) 16:57, 26 March 2020 (UTC)

PS: special thanks to @Abbe98: (I stole the code from User:Abbe98/osm.js to begin) and @Darmo117: (for the JS debugging and improvements).

@VIGNERON: I get this message from the web console: The resource from “https://fr.wiktionary.org/w/index.php?title=/Utilisateur:VIGNERON/LienLex.js?action=raw&ctype=text/javascript” was blocked due to MIME type (“text/html”) mismatch (X-Content-Type-Options: nosniff). --Vriullop (talk) 14:08, 27 March 2020 (UTC)
@Vriullop: thanks for the feedback, but I'm not sure I understand: I get no error in my console (neither in Firefox nor in Chrome). I'll try to look into it. What web browser do you use? Could it be a conflict with some other gadget? My bad, I'm a bit dumb! Obviously, the URL should be "https://fr.wiktionary.org/wiki/Utilisateur:VIGNERON/LienLex.js?action=raw&ctype=text/javascript" and not just "/wiki/Utilisateur:VIGNERON/LienLex.js?action=raw&ctype=text/javascript" (which only works locally on the French Wiktionary, as the script is on the same project). Cheers, VIGNERON (talk) 14:36, 27 March 2020 (UTC)
@VIGNERON: I tried with wikt:ca:Special:Permalink/1539621. I have removed all gadgets in my preferences and it doesn't work for me with either Firefox or Chrome. I still get the same message in the web console. --Vriullop (talk) 11:49, 28 March 2020 (UTC)
@Vriullop: and if you put explicitly the prefix https:// instead of just // does it work? Cheers, VIGNERON (talk) 12:44, 28 March 2020 (UTC)
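
For reference, the URL in the console message has a leading slash in `title=` and a second `?`, so the `action=raw` part ends up inside the title. Here is a hedged Python sketch of building a fully qualified raw-script URL in the usual `index.php?title=…&action=raw` shape; the helper name `raw_script_url` is made up for this example, and whether this resolves the error above is not confirmed in the thread.

```python
from urllib.parse import urlencode


def raw_script_url(host, page_title):
    """Build an absolute https URL that serves a wiki page as raw JavaScript."""
    query = urlencode(
        {"title": page_title, "action": "raw", "ctype": "text/javascript"},
        safe=":/",  # keep the namespace colon and subpage slash readable
    )
    return f"https://{host}/w/index.php?{query}"


if __name__ == "__main__":
    print(raw_script_url("fr.wiktionary.org", "Utilisateur:VIGNERON/LienLex.js"))
```

Using an absolute URL with an explicit host matters when the script is loaded from another project, such as the Catalan Wiktionary in the test above.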