
Wikidata talk:Lexicographical data















Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2018/11.


Query Lexemes in the Query Service

Hello all,

Graph of Lexemes derived from L2087

I’m very happy to announce that another important feature for Lexicographical Data has been deployed: the ability to query Lexemes in the Query Service.

Here are a few examples:

The queries are based on the RDF mapping that you can find here. Feel free to help improve the documentation, so people can understand how to build queries out of Lexemes.

Thank you very much to Tpt who’s been doing a huge part of the work by mapping Lexemes in RDF, and Smalyshev (WMF) who made the RDF dumps available and integrated in the Query Service.

Feel free to play with it, bring some of these ideas of queries to life, and let us know if you find any issue or bug. These can be stored as subtasks of this one on Phabricator. If you have questions, you can also ping Stas onwiki or on IRC.

Cheers, Lea Lacroix (WMDE) (talk) 08:06, 16 October 2018 (UTC)

Many thanks to all involved. That's great news and a lot of testing to do :) KaMan (talk) 08:33, 16 October 2018 (UTC)
  • Good work. BTW, there seems to be a licensing incompatibility with some of the schemes referenced in the triples. Can they be replaced with "wikibase:"? That would make writing queries easier too. --- Jura 10:21, 16 October 2018 (UTC)
    • @Jura1: Could you explain a bit more about licensing? Smalyshev (WMF) (talk) 17:55, 16 October 2018 (UTC)
    • WMDE wanted lexemes to be CC0. If you are adding a primary mapping for key features of them to a scheme that isn't, somehow they fail that objective. Supposedly, you could still add it as a secondary mapping. It might also limit re-use of the software outside Wikimedia. --- Jura 13:15, 17 October 2018 (UTC)
      • @Jura1, Smalyshev (WMF): Are you talking about ontolex? The file at states a licence (dc:rights) value of CC-Zero, so it's fine. ArthurPSmith (talk) 15:17, 17 October 2018 (UTC)
        • That seems to be contradicted by statements elsewhere. Using the standard wikibase: seems preferable. --- Jura 15:24, 17 October 2018 (UTC)
          • (citation needed). Using a common standard makes federated querying easier and so is preferable to using custom URIs if the meaning is the same. ArthurPSmith (talk) 16:10, 17 October 2018 (UTC)
            • Supposedly you read Lea's announcement about not using others'. Obviously, it would have been easier to do this over at Wiktionary, especially as I came to the conclusion that the French one is actually fairly complete. --- Jura 16:18, 17 October 2018 (UTC)
              • You said this added a "primary mapping for key features of [lexemes] to a scheme that isn't [CC0]". What scheme is not CC0 in the new mapping? As I just linked, ontolex is definitely CC0. ArthurPSmith (talk) 17:21, 17 October 2018 (UTC)
    • @Smalyshev (WMF): forgot to ping you. Can we go ahead and change this? If we do it now, it's still fairly easy to update things. --- Jura 10:49, 18 October 2018 (UTC)
      • @Jura1: It's no easier to do it now than at any other moment in the future; however, I am not sure why we would do it. Technically it is possible, sure, but why? ontolex: is a standard ontology, which is used in the structured data world and would make it easier to integrate with non-wikibase resources. For querying, there's zero difference between them - it's just a class name. I'm still not sure what the license problem is - it seems to be CC0. It certainly can be changed, technically, but I'd like to understand the argument why. Smalyshev (WMF) (talk) 05:32, 19 October 2018 (UTC)
        • @Smalyshev (WMF): have a look at [1]. It's not really my role to check this, so you might want to ask the relevant staff. In any case, it doesn't seem to meet the intent of WMDE lexeme namespace proposal (maybe @Lydia Pintscher (WMDE): wants to clarify). For you, the effort may be the same, but for users it's better to fix this now than later. Even if a detailed study may find it to be compatible, I don't think there is much to be lost by making the prudent choice now. --- Jura 21:07, 23 October 2018 (UTC)
          • I've spent some time looking into this and for all I can tell we're fine based on among others --Lydia Pintscher (WMDE) (talk) 03:54, 26 October 2018 (UTC)
            • @Lydia Pintscher (WMDE): thanks for looking into this. The odd thing is that it's contradicted by the more detailed documents (the one I linked above and the full description of the framework). So the statement there might apply to the file, but not to the framework. Accordingly, users of Wikibase or Wikidata might run into problems assuming that it may be. Is there any downside in switching the prefixes? --- Jura 11:05, 26 October 2018 (UTC)
  • This is great! I created a page using the Wikidata list template to automatically do a few stats: Wikidata:Lexicographical data/Statistics/AutoGenerated. ArthurPSmith (talk) 14:15, 16 October 2018 (UTC)
    • @ArthurPSmith: Why did some numbers go down here and here? I don't see any recent activity that would explain it. Older versions seem to exhibit similar behaviour. --Njardarlogar (talk) 13:31, 27 October 2018 (UTC)
      • @Njardarlogar: and up and down again more recently. That's odd. I'm guessing possibly one of the servers responding to queries is missing some data? @Smalyshev (WMF): do you have any ideas why things might not be consistent from one day to the next? ArthurPSmith (talk) 14:50, 29 October 2018 (UTC)
  • @Lea Lacroix (WMDE): Hmm - to Jura's point just above, when I query using wikibase:Lexeme or wikibase:Form I get nothing, but using the ontolex types I find everything. It looks like the export doesn't quite match what is stated in mw:Extension:WikibaseLexeme/RDF mapping? ArthurPSmith (talk) 14:45, 16 October 2018 (UTC)
    • @ArthurPSmith: this is intentional, for performance reasons we only keep one class. The dump and RDF export have both. Smalyshev (WMF) (talk) 17:55, 16 October 2018 (UTC)
      • Ah, ok I guess that's fine. I'm running into an issue with query timeouts in lexemes, not sure why it should be happening since we don't really have many yet - do you want to hear about it?... ArthurPSmith (talk) 17:58, 16 October 2018 (UTC)
      • And whatever the performance problem was seems to have resolved - or maybe I just changed the query enough to get it to work now, but it's quite fast. I've added a number of specific examples to the Wikidata:Lexicographical data/Ideas of queries page.
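As a rough illustration of the point made in this thread (in the Query Service, Lexemes are found through the ontolex class rather than wikibase:Lexeme), here is a minimal Python sketch that only builds a query string and a request URL. The helper names are hypothetical, the predicate dct:language is taken from the RDF mapping as I understand it, and nothing here is sent over the network, so please treat it as a starting point rather than a reference query.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Prefixes are spelled out so the query does not rely on endpoint defaults.
PREFIXES = (
    "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
)


def lexemes_per_language_query(limit: int = 20) -> str:
    """Build a query counting Lexemes per language (hypothetical helper).

    Lexemes are matched via ontolex:LexicalEntry, the class that the
    discussion above says is kept in the Query Service.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        PREFIXES
        + "SELECT ?language (COUNT(?lexeme) AS ?count) WHERE {\n"
        + "  ?lexeme a ontolex:LexicalEntry ;\n"
        + "          dct:language ?language .\n"
        + "}\n"
        + "GROUP BY ?language\n"
        + "ORDER BY DESC(?count)\n"
        + f"LIMIT {limit}\n"
    )


def request_url(query: str) -> str:
    """Return a GET URL for the query; the caller decides how to fetch it."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Calling `request_url(lexemes_per_language_query(5))` gives a URL that can be pasted into a browser or passed to any HTTP client. Swapping the class for `ontolex:Form` should, by the same reasoning, count Forms instead.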

Danish missing genitive

Danish (Q9035) is said to have no genitive case (Q146233), see, e.g., [2] (in Danish (Q9035)). Nevertheless, in Danish (Q9035) an -s is added to the end of a word in a way similar to English (Q1860): "en uges varsel" -> "one week's notice" (no apostrophe in Danish (Q9035), the basic form is "uge"). My question is whether forms such as "uges" should be added to Wikidata. And if yes, what grammatical feature can we associate with these words? I was originally made aware of this issue by @Rua: [3]. — Finn Årup Nielsen (fnielsen) (talk) 17:57, 19 October 2018 (UTC)

  • If it's not a canonical genitive, but an actual genitive, I'd still use the item. Obviously, if Danish has some word for the form or case, use that instead. If not, you could always make a descriptive item, e.g. "Danish s-form". It's likely that we end up with forms that can be attested, grouped by feature, but that aren't described by every print dictionary. --- Jura 18:10, 19 October 2018 (UTC)
  • I think there is a certain difference between school grammar, which I think is heavily simplified and tries to map Latin grammar onto other languages (obviously citation needed, just my opinion), and the way a language uses certain constructs. School grammar teaches that German has 4 cases. But we have for example "in" (inessive case (Q282031)) and "heraus", which I would argue can be a word that exists only in elative case (Q394253). Perhaps cases should be viewed as more than just forms of nouns? (Disclaimer: I am not a language scientist). --Tobias1984 (talk) 18:55, 19 October 2018 (UTC)
  • As I mentioned, it's no different from the English possessive "'s", which can attach to any word that appears at the end of a noun phrase, not just to nouns. This makes it a clitic, rather than a case. Otherwise, we'd have to start adding genitive forms to everything, from prepositions to even verb forms! —Rua (mew) 19:49, 19 October 2018 (UTC)
  • I would definitely add "uges" as a form but I'm not sure about the feature. What is clear to me is that (almost) all attested lemmata are admissible and should be documented, whether they're considered correct or not. Cdlt, VIGNERON (talk) 08:54, 20 October 2018 (UTC)
    So would you include meds ("with's") as a genitive form of the preposition med? —Rua (mew) 10:17, 20 October 2018 (UTC)
    • That is an interesting question, but I think it's more about what to do when words aren't used in their main lexical category. --- Jura 10:22, 20 October 2018 (UTC)
    @Rua: I don't speak Danish, so if one day I find meds in a Danish book, yes, it would be useful to also find meds in Lexemes. I'm not sure how to structure it (it depends largely on the references) but I'm sure I want to find it. Cdlt, VIGNERON (talk) 10:40, 20 October 2018 (UTC)
    English Wiktionary decided to treat words with clitics as sum-of-parts and thus not includable. The Latin suffix -que was given the same treatment. —Rua (mew) 15:15, 20 October 2018 (UTC)
  • So where does that leave us? Should I erase the s-form words from the lexemes that I have already entered? Should I leave them as they are, just not enter new s-forms? Should we add s-forms but not call them "genitive"? My initial thought for adding the s-form was to ease computational lookup of the form, e.g., would "hus" be hus (L1111)-F1 or hu (L31704) in an s-form? As far as I can see Danish dictionaries do not include the s-form. — Finn Årup Nielsen (fnielsen) (talk) 17:30, 22 October 2018 (UTC)
    • I think they should be removed, for the reasons I outlined above. Any word can have this -s, so a word parser just has to be aware of this possibility. It can be compared to the aforementioned Latin -que and -ve, which could also theoretically be present on any word. Finnish -kin, -kaan, -ko, -han, -pa and others are also good examples of such clitics. The Finnish example is especially illustrative of the mess we get into if we start including them as forms: Finnish nouns not only have 15 cases, but also 6 possessive forms for each of those cases, giving 105 forms per noun. Now consider that all of these forms could appear with a clitic, or even multiple clitics combined, and you get a combinatorial explosion. This is not relevant for Danish of course, but I do think we should be consistent in our treatment of clitics across languages. —Rua (mew) 10:36, 23 October 2018 (UTC)
    • I'd keep them for Danish nouns. I have no opinion on features that may resemble them in some framework for fi-lexemes. --- Jura 20:52, 23 October 2018 (UTC)
      • But why only for nouns? —Rua (mew) 10:44, 24 October 2018 (UTC)
        • I guess that the genitive/s-form is used most often on nouns. In noun phrases, the last word would most often be a noun. nominalized adjective (Q4683152) exists in Danish (example with gammel (L31494): "de ældres helbred", "the old's (the old people's) health"/"health of old people"), as does a verb used as a noun (vælge (L32432): "de valgtes roller", "the elected's role"/"the role of the elected (people)"). I have a hard time coming up with examples for preposition (Q4833830) (such as meds, with's given above). There are a few cases for pronoun (Q36224): Den Danske Ordbog (Q1186741) notes the -s as genitive for "det" [4]; otherwise I have not run into dictionaries listing the genitive in Danish (I suppose that might also be because it is entirely predictable). — Finn Årup Nielsen (fnielsen) (talk) 12:50, 24 October 2018 (UTC)
  • @Lucas Werkmeister: I see that the "svenskt substantiv" forms in lexeme-forms ([5] [6]) include Swedish genitive. I imagine that the same question arises for Swedish as for German?
    • @Fnielsen: Whether Swedish has any cases in the first place is a matter of some dispute as well. I think sv:Kasus#Svenska explains it well. The conclusion is that a two-case system is the most traditional view, and that Svenska Akademiens grammatik supports the genitive as a case. --Vesihiisi (talk) 13:05, 24 October 2018 (UTC)
      • The situation for the genitive in Swedish and Norwegian is exactly the same as it is for Danish and English. It is a clitic that can attach to any word. See w:Swedish grammar#Genitive and w:Norwegian language#Genitive_of_nouns. The Swedish example given in the article is particularly illustrative. If the genitive were a true case, then in Konungen av Danmarks bröstkarameller the word "bröstkarameller" would be modified only by "Danmark", so the phrase would mean "the king of the cough drops of Denmark". But it's modified by the entire phrase "Konungen av Danmark", which means it's a clitic. —Rua (mew) 15:26, 24 October 2018 (UTC)
        • The statement that the -s can attach to "any" word is a wild exaggeration. Cases like "konungen av Danmark", where this makes any difference, are really rare, and it is less than a century since the only correct form was considered to be "konungens av Danmark bröstkarameller" with the -s on the noun in question. Furthermore, even when the -s moves to Danmarks, it is still on a noun, not on any kind of word. Sentences where you try to attach an -s to the end of a phrase such as "hästen jag rider på" (the horse I ride on) are frowned upon by the vast majority of native speakers (above the age of five). Thus, "pås" is not a Swedish word. The -s simply does not attach to a preposition (you can try, but it does not stick), only to nouns. I'd say Rua is wrong. But why not give Rua a chance to rule how Lexemes in Wikidata should work! That would make the whole project fail, and so the classic Wiktionary will be victorious. I'm not at all against this development. Go ahead! --LA2 (talk) 21:25, 30 October 2018 (UTC)

@LA2, Rua, Vesihiisi, Jura1, VIGNERON, Tobias1984: If we do not add the s-form then there is an issue when referring to a form. broderskab (L34118) has "De er udstyret med fornuft og samvittighed, og de bør handle mod hverandre i en broderskabets ånd." as a usage example (P5831) and could use demonstrates form (P5830) but what should it refer to? L34118-F1 (broderskab)? — Finn Årup Nielsen (fnielsen) (talk) 08:24, 30 October 2018 (UTC)
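To make the "a word parser just has to be aware of this possibility" suggestion from this thread a bit more concrete, here is a small Python sketch of a lookup that tries the surface word first and then the same word with a trailing -s stripped. The index is a hypothetical in-memory stand-in for stored forms; only the two lexeme IDs mentioned above (hus (L1111) and hu (L31704)) are used, and the function name is made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-index: surface form -> lexeme IDs that list it as a form.
# The two entries are the ones named in the discussion above.
FORM_INDEX: Dict[str, List[str]] = {
    "hus": ["L1111"],
    "hu": ["L31704"],
}


def analyse(word: str, index: Dict[str, List[str]] = FORM_INDEX) -> List[Tuple[str, bool]]:
    """Return candidate (lexeme_id, has_s_clitic) readings for a word.

    Direct matches come first; readings that assume a trailing -s clitic
    are appended afterwards, so no s-forms need to be stored as forms.
    """
    word = word.lower()
    candidates = [(lexeme_id, False) for lexeme_id in index.get(word, [])]
    if len(word) > 1 and word.endswith("s"):
        stem = word[:-1]
        candidates += [(lexeme_id, True) for lexeme_id in index.get(stem, [])]
    return candidates
```

With this index, `analyse("hus")` yields both readings from the question above: the plain form of L1111 and L31704 plus the -s. The sketch does not resolve the ambiguity, and it does not answer what demonstrates form (P5830) should point to when the s-form is not stored.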

Senses before forms

Senses are more important for identifying the word and are generally what people are more likely to look for than forms. Can they be listed before forms? This is especially important for words with dozens of forms. —Rua (mew) 12:09, 21 October 2018 (UTC)

I agree with that. Because of that, as a temporary solution for myself, I wrote a small script to quickly jump to senses from the ToC at the top of the page: User:KaMan/ToC_to_lexemes.js. KaMan (talk) 12:57, 21 October 2018 (UTC)
Information like the grammatical type of the lexeme is nevertheless very important, as we have several lexemes for the same string (for an example of the same string with different grammatical gender, see L:L2330 and L:L2332), so this information may be important for disambiguation and to avoid mistakes. (By the way, I noticed that there did not seem to be differences in the type, gender and forms of two homographs, L:L2331 and L:L2332; why are they different lexemes?) author  TomT0m / talk page 18:41, 21 October 2018 (UTC) I later removed this argument, as the discussion below proved it’s not really founded. Semantics is carried by Lexemes in « Lexical semantics », so senses are essential to identify an item. author  TomT0m / talk page 11:57, 22 October 2018 (UTC)
They might differ in information that hasn't been provided yet. —Rua (mew) 19:44, 21 October 2018 (UTC)
@TomT0m: exactly as Rua said (etymology for instance, you can already see the reverse etymology for tour de Babel (L474) or tourneur (L2334)). These 3 lexemes "tour"@fr have been discussed a lot already, look at the past discussions (Special:WhatLinksHere/Lexeme:L2330). The other way round, do you know even one source that says it's one lexeme? (*all* the dictionaries describe them as at least 2 lexemes, often 3 lexemes). Cdlt, VIGNERON (talk) 09:57, 22 October 2018 (UTC)
Sorry for the naive question, I don’t follow this page very closely :) I must admit I find articles such as fr:Lexème_(linguistique) hard to follow, as they are full of specialized terms and infinite nuances and lack examples. Maybe we need a Help:Lexeme page which sums up the current discussions, gives examples and details the definition and practices used on Wikidata for the layman? I’m just looking for guidelines right now, so I looked at the data model to see if there was a basic definition, and I see that in mw:Extension:WikibaseLexeme/Data_Model#Lemma it’s written
Two distinct lexemes with the same lexical category can exist in the same language if they have different morphology, that is, different forms.
that does not mention etymology for lexemes, so it’s inconsistent with the actual use of the page. My own intuition would have suggested that etymology is tied to senses, but that’s just me :). author  TomT0m / talk page 10:36, 22 October 2018 (UTC)
@TomT0m: no problem, it's always better to ask. It is hard to explain simply what a lexeme is, like it's hard to explain what a concept is for items. For a very simplistic approach: a lexeme = a word. A more precise approach would be: an entry of a dictionary (usually one lemma only has one entry but sometimes there is something like « 1. tour and 2. tour » when one lemma is borne by several lexemes, see "tour" in the TLFi). And more technically: an entity with specific information, including but not limited to morphology.
« etymology is tied to senses », yes, but more exactly « etymology is tied to senses of a word ». Anyway, "tour"@fr is a weird exception where several homographs with similar information are different lexemes; don't focus too much on it (and just look at references ;) if dictionaries say it's two lexemes, just follow them).
Cdlt, VIGNERON (talk) 11:17, 22 October 2018 (UTC)
@VIGNERON: I understand that practical approach but that does not really answer my question :) I finally took the approach of browsing the enwp article, and I found that the en:lexeme is defined by a field called en:Lexical_semantics which actually takes into account the meaning of the different lexical entities and carries semantics, so that explains it a little bit more. author  TomT0m / talk page 11:51, 22 October 2018 (UTC)
If there is general agreement that we should switch around Senses and Forms I'm happy to do that. Some more opinions please? --Lydia Pintscher (WMDE) (talk) 03:48, 26 October 2018 (UTC)
I   Support putting senses before forms (but after statements) on the page for a lexeme; people looking for a particular word should be informed quickly if they've gone to the wrong place. ArthurPSmith (talk) 15:13, 26 October 2018 (UTC)
Alright. I've opened phabricator:T208592 for it. --Lydia Pintscher (WMDE) (talk) 15:06, 2 November 2018 (UTC)

It's already live, though in older lexemes it appears after purge of page or some edit and refresh of the page. KaMan (talk) 10:09, 8 November 2018 (UTC)

Possible split on lexical category. Merging?

common noun (Q498187) and common noun (Q2428747) seem to be about the same thing as I understand it. Can someone confirm that they are the same? The Russian Wikipedia has two concepts [7] [8] though, so they cannot be merged directly. I think that Апеллятив should maybe have its own Wikidata item and the rest of the Wikipedia language links should be merged? I note that the item with the highest ID number has the most linked lexemes. I am not sure that the merge bot works on lexemes? — Finn Årup Nielsen (fnielsen) (talk) 21:20, 22 October 2018 (UTC)

I've seen KRbot fix some merge issues with lexemes before - specifically, various versions of "present participle" were merged together (see the archives of this page), and there were a lot of old lexemes that needed fixing. But that is probably new functionality and may be still in development. ArthurPSmith (talk) 18:16, 23 October 2018 (UTC)
I am not referring to the technical issue of merging, but rather the issue of whether they are the same, and if they are, then why are there two Russian articles? We would need some users with an understanding of Russian. — Finn Årup Nielsen (fnielsen) (talk) 08:26, 30 October 2018 (UTC)

Vote: Do we allow phoneme in the Lexeme namespace?

Hello, from this discussion and this one, I understood that there is a relative consensus that we do not allow storing phonemes in the Lexeme namespace. Thus, I added this to Wikidata:Lexicographical data/Notability. However, Jura1 (talk · contribs · logs) considers that the discussion is not over. Because I think it is (nobody has written a new message on that topic for a while), I propose a vote in order to validate this point. @KaMan, Nikki, VIGNERON, Njardarlogar, Rua, Infovarius: @Jura1, ArthurPSmith, Circeus, Lexicolover: I ping you because you participated in previous discussions on that topic.

Do we allow phoneme in the Lexeme namespace?



  • Phonemes have to be stored in the Q-namespace. Pamputt (talk) 07:46, 25 October 2018 (UTC)
  • Despite asking several times, there has been no explanation, justification or reason why the Qitems are not enough. VIGNERON (talk) 07:57, 25 October 2018 (UTC)
  • IIRC I already told it twice, phonemes are not lexemes. KaMan (talk) 11:26, 25 October 2018 (UTC)
  • None of the lexeme features (language, lexical category, forms with grammatical features, etc) apply to phonemes, as far as I am aware, so I don't see any benefit to putting them in Lexeme namespace at all. ArthurPSmith (talk) 15:00, 25 October 2018 (UTC)
  • Rua (mew) 15:58, 25 October 2018 (UTC)
  • Phonemes are not lexemes. Although it is theoretically possible to store non-lexeme phenomena as L-entities, such a practice is quite problematic and should only be allowed if there is a strong reason for it. The same applies to graphemes, btw.--Shlomo (talk) 06:40, 26 October 2018 (UTC)
  • --Njardarlogar (talk) 09:31, 29 October 2018 (UTC)


  • @Pamputt: As you don't contribute actively to lexemes on Wikidata, it's not clear how this would affect you and why you'd vote on this. Maybe you could outline the problem you are trying to solve. What alternatives do you propose? What is the urgency of this point? I find your overall posts to this page rather nonconstructive (who would start a topic called "L21070_should_not_exist" to seek positive input?). Please avoid breaking things. --- Jura 07:55, 25 October 2018 (UTC)
  • @Pamputt: What now with these voting results? KaMan (talk) 09:45, 9 November 2018 (UTC)
    @KaMan: I will wait until this section is archived and then add a section specifying explicitly that phonemes and graphemes are excluded from the Lexeme namespace, with a reference to this vote (that is why I need the discussion to be archived, in order to use a permanent link). Something similar to this. Pamputt (talk) 12:25, 9 November 2018 (UTC)

Linking with Wiktionary

Now we have nearly all main features (Forms, grammatical categories, Senses, translations...) I would like to have links to Wiktionary pages (wasn't it the main idea to have Wiktionary repository in Wikidata?). When (and in which form) is it planned? --Infovarius (talk) 15:11, 25 October 2018 (UTC)

@Infovarius: Wiktionary URLs are based on lemmata so it's trivial to generate a link (one fitting your needs, either the main lemma or a specific form, but not senses, as Wiktionaries don't have a structure for senses, and some Wiktionaries have a different structure for anchor links to languages and sections inside a page). Someone from the Lexeme team can confirm (or refute) this, but I remember that there is no plan to explicitly store links to Wiktionary on Wikidata (some people did so with some hacks around this though). Cdlt, VIGNERON (talk) 16:26, 25 October 2018 (UTC)
That's exactly the problem. Wiktionary pages are based on lemmata, so that one page can contain several lexemes with the same lemma. Wikidata lexicographical pages are based on lexemes and one page can contain several lemmata. Meaningful linking is in this situation surely not trivial. It could be done via statements (on WD side) and wikilinks in appropriate sections of each wiktionary, but I can't imagine the maintenance of such system.--Shlomo (talk) 05:45, 26 October 2018 (UTC)
Yes, there are several lemmata in "each" Wiktionary page, so I suppose we should have some template (using Lua access to WD Lexemes) in each section of it. But inversely, most Lexemes should correspond to unique Wiktionary page so it should be simple to add such linking. --Infovarius (talk) 08:07, 26 October 2018 (UTC)
  1. Many Wikidata Lexemes (L-items) have multiple lemmas and may (or may not) correspond to several Wiktionary pages. We don't know.
  2. The fact that there is a Wiktionary page with the name corresponding to a lemma doesn't mean the Wiktionary page contains a section with a lexeme corresponding to the Wikidata L-item.
--Shlomo (talk) 09:16, 26 October 2018 (UTC)
Pardon, Shlomo? Can you please provide an example of a Lexeme which has multiple lemmas? I cannot imagine it... --Infovarius (talk) 12:12, 29 October 2018 (UTC)
Sure. Check these: color/colour (L1347), colour/color (L791), מזל/mazal (L12373), вода/voda (L2068), 大きな/おおきな (L661), מֵם/מים (L8305), ном/ᠨᠣᠮ (L7957).--Shlomo (talk) 16:45, 29 October 2018 (UTC)
And this is just for the main lemma (which has just an indicative value); each form is a different Wiktionary entry and each sense is a different section of these entries. I'll try to make a schema to make clearer how complex linking is here. Cdlt, VIGNERON (talk) 11:51, 31 October 2018 (UTC)
We have no separate "forms" in ru-Wiktionary so it's not a problem. Links to sections are not necessary either. But "color/colour" is a problem, yes. Maybe we can suppose they are not numerous and just ignore them?.. Infovarius (talk) 14:35, 1 November 2018 (UTC)
I created phab:T195411 a while back asking for a special page which could be used with Cognate to make it possible to navigate between Wikidata and Wiktionary. I get the impression the developers aren't convinced though. - Nikki (talk) 11:28, 26 October 2018 (UTC)
  • Yes, I think it's time to add them. Aren't they already all centrally stored? So one could easily display at least the ones leading to the Wiktionary in the same language. --- Jura 18:39, 25 October 2018 (UTC)
  • I've been somewhat disappointed that enwiktionary seems to have pretty much everything I look at covered already pretty well. However, I just worked on fencing (L33095) where I realized the number of senses one gets from wikidata items with that label is quite a bit more than what enwiktionary had. So there's hope that we actually can be useful beyond interlanguage linking :) ArthurPSmith (talk) 20:59, 25 October 2018 (UTC)
To answer Vigneron's question: there is no plan for a specific development to add Wiktionary links, as it can easily be covered by statements. This would also mean that we don't have to follow the 1-n rule of the Wikipedia interwikilinks. Depending on how you decide to model it, a Lexeme could link to several Wiktionary pages, and several Lexemes could link to the same Wiktionary page. Lea Lacroix (WMDE) (talk) 09:37, 26 October 2018 (UTC)
Sounds good. Thanks for the quick reply. --- Jura 10:51, 26 October 2018 (UTC)
Could you explain what you have in mind when you say it can be easily covered by statements? I can't see any sane way of doing it. - Nikki (talk) 11:28, 26 October 2018 (UTC)

In the FAQ, the very first question is "Why will this project be useful for Wiktionary editors?" and the reply talks about Indonesian Wiktionary populating Estonian words from Wikidata. Was that just a dream, or has anything like that been implemented? Maybe the reply should be changed into something that resembles actual reality? --LA2 (talk) 17:44, 26 October 2018 (UTC)

@LA2: as far as I know, nothing has been implemented yet (no surprise there, Lexemes are still at an early stage) but this section and the example are still true; it's up to the Wiktionaries to decide to use Wikidata data (or not). Cdlt, VIGNERON (talk) 17:42, 27 October 2018 (UTC)
Yeah, I had it in mind! I thought that the initial plan was to have "central repository of lexicographical data" for Wiktionaries. But how can Wiktionary use it if they are not linked to each other? --Infovarius (talk) 12:12, 29 October 2018 (UTC)
@Infovarius: true, explicit links can make our life easier but per se, they are not needed to reuse Wikidata data. For proof, see all the templates on Commons who explicitly call for a specific Qid, Commons:Template:Creator or commons:Template:Artwork for the more common examples. Cheers, VIGNERON (talk) 11:51, 31 October 2018 (UTC)
  • It would be good to set up a version of Wikibase lexemes for Wiktionaries that are interested in having structured data without having to pay a high price. --- Jura 08:07, 30 October 2018 (UTC)
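For readers wondering what "trivial to generate a link" could look like in practice, and where it stops being trivial, here is a minimal Python sketch. It assumes the usual `https://<language code>.wiktionary.org/wiki/<page title>` pattern and uses hypothetical helper names; it does not check that the page exists or that it contains a matching section, which is exactly the objection raised in this thread.

```python
from typing import Iterable, List
from urllib.parse import quote


def wiktionary_url(wiki_lang: str, lemma: str) -> str:
    """Build the URL of the Wiktionary page titled after a lemma.

    wiki_lang is the language edition of Wiktionary (e.g. "en"), not the
    language of the lemma. Spaces become underscores as in wiki titles.
    """
    title = lemma.replace(" ", "_")
    return f"https://{wiki_lang}.wiktionary.org/wiki/{quote(title, safe='')}"


def candidate_urls(wiki_lang: str, lemmas: Iterable[str]) -> List[str]:
    """One candidate URL per distinct lemma of a Lexeme, in input order.

    A Lexeme with several lemmas (such as color/colour) yields several
    candidates, which is why the mapping is not one-to-one.
    """
    urls: List[str] = []
    for lemma in lemmas:
        url = wiktionary_url(wiki_lang, lemma)
        if url not in urls:
            urls.append(url)
    return urls
```

For color/colour (L1347), `candidate_urls("en", ["color", "colour"])` returns two pages, so any tool built on this idea would still have to decide which page, and which section of it, corresponds to the Lexeme.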

Dj Pava Hm Music

Lexeme:L34111 @Lea Lacroix (WMDE): What should we do with items accidentally created in lexeme namespace? (create lexeme is next to create item link in sidebar). Just request deletion or something else? KaMan (talk) 11:43, 30 October 2018 (UTC)

We currently have no tool or process to transform a Lexeme into an Item. So I guess request deletion is the safe way to go. Ideally, also write to the user and indicate the correct link. Lea Lacroix (WMDE) (talk) 12:30, 30 October 2018 (UTC)

A game with German articles

Hello all,

I just wanted to let you know about a game that I developed in my volunteer capacity for the Wikidata birthday. DerDieDas uses lexicographical data to present German nouns and lets you guess their grammatical gender. It's an idea I put in the ideas of tools a while ago, so I was happy to experiment with querying Lexemes and parsing the existing data :)

This is just a prototype to show what is possible to do with structured lexicographical data. The game doesn't have a lot of features and some issues may occur, but I hope it will give other people ideas to continue in this direction.

Also note that I adapted it for French, and the results are reflecting the current state of the French nouns in Wikidata (a lot of nouns ending with -ion were created, and they are currently very present in the game).

Cheers, Léa Auregann (talk) 13:11, 30 October 2018 (UTC)

And now a Danish Version by fnielsen, don't hesitate to make your own ;) (I'm a bit jealous it can't really be done in Breton :/ ). Cdlt, VIGNERON (talk) 11:13, 31 October 2018 (UTC)

Extensions of Ordia

The Toolforge tool Ordia at has now been extended. There are overviews of languages, lexemes, forms and senses. There is also a text-to-lexeme functionality though currently only enabled for four languages. Example: [9] — Finn Årup Nielsen (fnielsen) (talk) 20:39, 1 November 2018 (UTC)

  • Nice. I had done something similar (offline). Just lacks the select lexical category and create buttons. It seemed to time-out when I tried with a Wikipedia article. --- Jura 13:18, 4 November 2018 (UTC)
    • Rather than a time-out, I suspect it is a CORS problem that I need to look into. I hope to extend Ordia with a button for input. — Finn Årup Nielsen (fnielsen) (talk) 00:20, 8 November 2018 (UTC)

Relate Form to Sense

The forms of Bank (L34723) f. in German are not identical in the sense of bench and bank (e.g. nominative plural in the sense of bench is Bänke in contrast to Banken). How should these forms be related to the senses? --Mfilot (talk) 21:30, 1 November 2018 (UTC)

They should be entered as separate lexemes. As well as different forms, they also have different etymologies. - Nikki (talk) 21:37, 1 November 2018 (UTC)
Ok, makes sense. I created a new lexeme Bank (L34791) for the financial institution and cleaned up the translation (P5972) (see Bank (L34791), banque (L15448), bank (L3354)). --Mfilot (talk) 22:16, 1 November 2018 (UTC)
  • I think we still need a way to indicate forms that may be linked to specific senses. --- Jura 13:16, 4 November 2018 (UTC)

Forms dependent on the first letter of the following word

What grammatical category should we use, or how should we tag forms that are dependent on the first letter of the following word? For example: the English article 'a' has the form 'an' if followed by a word beginning with a vowel; and the English prefix 'in-' has the form 'im-' if it is followed by a base word starting with 'p' or 'b', etc. How, if at all, should we distinguish such forms? Liamjamesperritt (talk) 01:09, 4 November 2018 (UTC)

I have such cases in Polish too. For example nad (L14478), which can have the forms "nad" and "nade" depending on the next word. I mark them with vocalic form (Q55082724) and non-vocalic form (Q55082712), as they are tagged in Grammatical dictionary of Polish (Q55214514), which I use as the source of grammatical forms. KaMan (talk) 07:13, 4 November 2018 (UTC)
  • Normally, we would list the required forms somewhere. Maybe with the same as for Latin/French? Wikidata:Property_proposal/requires_form. --- Jura 12:04, 4 November 2018 (UTC)
    • Interesting idea. I left some thoughts about the "requires form" property on the proposal page, as I think something like this could solve the problem. Liamjamesperritt (talk) 00:18, 5 November 2018 (UTC)
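
The two English examples raised in this thread can be sketched as a small selection rule. This is only an illustration of the conditioning described above, not a proposal for the data model: the function names are hypothetical, and the rule looks at spelling only (the first letter), exactly as the question phrases it, and ignores pronunciation and the other cases hidden behind the "etc.".

```python
# Simplified, spelling-based sketch of forms that depend on the following word.
# Function names are hypothetical; real English usage depends on sounds, not letters.

VOWEL_LETTERS = set("aeiou")


def indefinite_article(next_word: str) -> str:
    """Return 'an' before a word starting with a vowel letter, otherwise 'a'."""
    return "an" if next_word[:1].lower() in VOWEL_LETTERS else "a"


def negative_prefix(base_word: str) -> str:
    """Return 'im' before a base word starting with 'p' or 'b', otherwise 'in'."""
    return "im" if base_word[:1].lower() in {"p", "b"} else "in"


if __name__ == "__main__":
    for word in ("apple", "word"):
        print(indefinite_article(word), word)
    for base in ("possible", "active"):
        print(negative_prefix(base) + base)
```

A "requires form" style property, or tags such as vocalic form and non-vocalic form, would record the same condition as data instead of code.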


I have been adding translation (P5972) to Dienstag (L6818) in the sense of day of the week. Currently (12:37, 4 November 2018 (UTC)) there are 16 translations. Each of those senses in turn has translation statements pointing to the German sense and to each other, resulting in 272 translation entries (17 senses with 16 statements each). Of course this is a bit redundant, but this is the way it is intended to work, isn't it? I'm aware that most senses have item for this sense (P5137) pointing at Tuesday (Q127), but it might be cumbersome to look for a translation by taking the path over the Q-item instead of translation (P5972). --Mfilot (talk) 13:03, 4 November 2018 (UTC)
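
The count above can be checked with a few lines of arithmetic. The sketch below assumes that every sense links directly to every other sense, and compares that with a hypothetical hub layout in which each sense carries a single link to one shared item; the function names are illustrative only.

```python
# Count the statements needed to connect n mutually translated senses.
# Function names are illustrative; nothing here is part of Wikidata itself.

def pairwise_links(n_senses: int) -> int:
    """Every sense points to every other sense via translation (P5972)."""
    return n_senses * (n_senses - 1)


def hub_links(n_senses: int) -> int:
    """Every sense points once to a shared item via item for this sense (P5137)."""
    return n_senses


if __name__ == "__main__":
    n = 17  # the German sense plus its 16 translations
    print("pairwise:", pairwise_links(n))
    print("hub:", hub_links(n))
```

The pairwise count grows quadratically with the number of languages, while the hub count grows linearly.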

  • Maybe the idea is that people focus on a limited number of language pairs for an unlimited number of lexemes rather than the opposite.
    The layout for these could be improved. The language should be visible at least as (language) code and the gloss could fall back to the language of the lexeme (sorry for the digression). --- Jura 13:16, 4 November 2018 (UTC)
  • That is a good thought: focusing on a limited number of languages will happen anyway, since for most lexemes it will not be that easy to identify the translations. I agree that the language of the translation (P5972) should be visible, and a more compact layout would also help. Some translations in Dienstag (L6818) display as a code instead of a label, e.g. L34322-S1 instead of вівторок. Is this related to my language settings? --Mfilot (talk) 13:47, 4 November 2018 (UTC)
  • It happens if there is no gloss in your interface language. With "code" I had in mind the language code (rather than the ID of the sense). I'm not even sure if it's a good idea to show the gloss in the interface language rather than the language of the linked lexeme. Maybe that should be a setting in preferences. --- Jura 14:01, 4 November 2018 (UTC)
Don’t we have items to represent meanings? This seems like the same situation as the interwikis on Wikipedias in the pre-Wikidata era. My understanding was that « item for this sense » plays the role of a central item representing a meaning, to which each sense with exactly that meaning would connect. This would avoid the translation explosion, and exact translation pairs would be easy to query:
< ?sense of a word in French > item for this sense < Wikidata item A >
< ?sense of a word in English > item for this sense < Wikidata item A >
. author  TomT0m / talk page 21:43, 4 November 2018 (UTC)
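
The two pseudo-triples above could be turned into a Query Service query along the following lines. This is a sketch under stated assumptions: the helper name is hypothetical, and the predicates (dct:language, wikibase:lemma, ontolex:sense, wdt:P5137) follow my reading of the RDF mapping for Lexemes, so they should be checked against that documentation before use. The Python code only builds the query text and does not contact any service.

```python
# Build a SPARQL query that pairs lexemes of two languages through a shared
# "item for this sense" (P5137). Helper name and query shape are assumptions.

def translation_pairs_query(source_lang_qid: str, target_lang_qid: str, limit: int = 100) -> str:
    """Return SPARQL text joining two senses on the same Wikidata item."""
    return f"""
SELECT ?sourceLemma ?targetLemma ?item WHERE {{
  ?sourceLexeme dct:language wd:{source_lang_qid} ;
                wikibase:lemma ?sourceLemma ;
                ontolex:sense ?sourceSense .
  ?targetLexeme dct:language wd:{target_lang_qid} ;
                wikibase:lemma ?targetLemma ;
                ontolex:sense ?targetSense .
  ?sourceSense wdt:P5137 ?item .
  ?targetSense wdt:P5137 ?item .
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Q150 is French and Q1860 is English.
    print(translation_pairs_query("Q150", "Q1860"))
```

The printed text can be pasted into the Query Service; no translation (P5972) statements are involved in this path.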
  • It seems redundant to have item for this sense (P5137) and translation (P5972) together in the same sense. Would it be possible to have a constraint that only one of those properties should be used for each sense? When the number of translation (P5972) statements grows beyond a certain number, a new item could be created for that sense to act as a hub for all languages. In my view a "sense" is similar to a Qitem that is embedded in a lexeme because it is convenient, but we shouldn't fear creating new Qitems when necessary.--Micru (talk) 18:44, 5 November 2018 (UTC)
  • @Micru: I've added a lot of senses (for English) - it seems relatively easy to find item for this sense (P5137) when the lexeme is a noun, but we almost never have existing Q items for other parts of speech (verbs or adjectives most commonly). Are we comfortable with adding Q items for verbs and adjectives? ArthurPSmith (talk) 20:05, 5 November 2018 (UTC)
  • @ArthurPSmith: I'm inclined to think that verbs and adjectives meet our notability criteria since they refer to clearly identifiable entities (especially if they exist in several languages) and they would fulfill a structural need. Of course, I'm open to hearing more arguments about this.--Micru (talk) 21:16, 5 November 2018 (UTC)
General comment: implementing sense entities still seems like a cleaner approach to me than putting senses for verbs, adjectives, adverbs etc. in the main namespace. It should also have a lot of benefits, like making it much easier to search through existing senses. We would also not have to worry about the notability policy for the main namespace; if one lexeme uses a sense entity, that should be good enough to keep it. We could also hard-code certain sense-specific behaviours, like making it possible to set the narrowest hyperonym (supersense, like blue for light blue) in a special field on the entity rather than using a property, and listing up all the widest hyponyms (subsenses) as well to aid navigation. --Njardarlogar (talk) 21:18, 5 November 2018 (UTC)
@Njardarlogar: How is a "sense entity" different than a regular item? --Micru (talk) 22:06, 5 November 2018 (UTC)
@Micru: I don't think of the extent to which a sense entity would be different from an item as the most important point, but rather that it is actually a separate type of entity with its own namespace. Having senses as their own entity type does mean that we can tailor them to their specific purpose, like I suggested above, and which I think potentially could get quite useful; but I don't think it is the most important reason.
Simply by keeping senses separate in their own namespace, we get
  • improved experience with the user interface: now we have to first enter a gloss, then select item for this sense (P5137) as the property to use, and only then can we select an entity. With sense entities, we would have a special field that would only accept sense entities as input and that would not require a gloss - just start searching among senses right away.
  • dedicated entities that do not contain content irrelevant for senses; for example, on a sense entity for a country, we would have no information on head of state, population, et cetera, et cetera. We could instead have a more prominent position for links to related senses, such as an inhabitant of the country, its language(s) and so on
  • easier navigation among existing senses: the main namespace is mostly composed of entries that will never be used as senses
--Njardarlogar (talk) 13:12, 6 November 2018 (UTC)
@Njardarlogar: I understand your point; however, I find that adding an additional namespace for senses would increase the complexity unnecessarily. 1) The feature that you suggest, a special field for senses, could be designed to accept Q-items as input, with no additional namespace necessary. 2) Entering the information for existing entities again would mean duplication of effort for creation and maintenance. 3) There is nothing that indicates that adding an additional namespace would make navigation easier.
On the other hand, using items for senses where relevant doesn't require any additional infrastructure, we can start doing it now if there is the will.--Micru (talk) 22:14, 6 November 2018 (UTC)
@Micru: Regarding 1), you can still expect many irrelevant suggestions from the main namespace; items that will never be used on a lexeme for a sense. As long as we can e.g. set example lexemes on the sense entities (or have them generated automatically by the MediaWiki software), it must necessarily be easier to navigate a dedicated sense namespace than the main namespace because the main namespace is filled with irrelevant items. There would be duplication with sense entities, particularly for nouns; but it would likely not be 100% unless adverbs and similar concepts were included in the main namespace independently of their use by the lexicographical project. The essence of a sense should not change over time, so maintenance should mainly be about dated language in the descriptions/definitions (including altered classifications of the concept the sense corresponds to).
All that said, there is one potentially important difference between how we would use sense entities versus items: items are currently not supposed to have lengthy definitions; an item description is not meant to be a dictionary definition but to be brief and act as a disambiguator. On sense entities, on the other hand, we could accommodate precise definitions and have a specific field for this purpose. Without definitions, the lexicographical project would be incomplete, surely. A property could potentially be used for this purpose on items, yes. --Njardarlogar (talk) 17:49, 7 November 2018 (UTC)
@Njardarlogar: Regarding irrelevant suggestions, that could be improved with a better suggester. From my experience it is not that bad, but if you have some examples of where the suggester was not offering you the items that you needed, then you should post them so that the developer team is informed.
As for the definitions, in the past I was under the impression that we need them. However the more I thought about it, the more I came to realize that the statements *are* the definition. For Wikidata it is not so relevant to come up with textual explanations of words (which btw normally have copyrights), that is the job of the wiktionaries, but what we can do is to transform those definitions into structured data (CC0).--Micru (talk) 12:24, 8 November 2018 (UTC)
  • @Micru: So far, all Wikidata items have been "conceptual or material entities", or some notable "thing". To start adding Q-Items for verb and adjective senses en masse would represent a substantial change to the essence of the Wikidata ontology. Since Wiktionary (and by extension, dictionary) entries are generally disallowed from entering Wikidata's Main namespace, it feels to me that entering senses of various lexical categories (verb, adjective, adverb, etc.) goes against that policy. A key reason the Lexeme namespace was created was to keep encyclopedic data separate from lexical data, and now we are coming to realise that lexemes alone are potentially insufficient to efficiently describe lexical data. Do we need to create another namespace to more fully attain what the Lexeme namespace was meant to fulfil, or do we shift the usage of the Main namespace, and potentially go against the purpose of creating a separate lexical namespace in the first place? Either way, it seems to me that the current data model is missing an important piece of the puzzle. Liamjamesperritt (talk) 21:28, 6 November 2018 (UTC)
  • @Liamjamesperritt: I have difficulty understanding why an adjective or a verb could not be considered a conceptual entity. In my view the definition of conceptual entity seems quite arbitrary and perhaps related to what can be considered encyclopedic, which generally does not apply to Wikidata. It is true that individual dictionary entries are generally disallowed from entering Wikidata's Main namespace, because for that we have the lexeme namespace; however, here we are talking about senses that are shared among a high number of languages. Even if we created an item for the sense "important", we would still need to create lexeme entities for each of the languages that have lexemes representing that concept, because each language has its own peculiarities regarding pronunciation, use, etc. By allowing the senses of verbs and adjectives in the main namespace, we are not going against the purpose of creating a separate lexical namespace, because such Q-items would be a complement to lexeme entities, not a replacement. As you say, we are missing a piece of the puzzle.--Micru (talk) 22:14, 6 November 2018 (UTC)
  • @Micru: Although I still feel that adding senses for verbs, adjectives and adverbs en masse would represent a substantial shift in the usage of the Wikidata Main namespace, if it is eventually concluded that such senses are valid Q Items, then I agree that this would be a nice solution to the problem, as we are already using Q Items to link noun senses. The next question is then: what would we make these items instances of / subclasses of? Or would we instead link them to their noun counterparts with a new property (e.g. "run" -> "running"; "beautiful" -> "beauty")? Or both? Liamjamesperritt (talk) 22:45, 6 November 2018 (UTC)
  • @Liamjamesperritt: It would definitely be a new practice to add items for senses, and as such it should be discussed thoroughly. I find your question about "instances of / subclasses of" too generic, because each one will have a different value, plus several other statements might help outline their meaning. I would say that "beautiful"<indicates quality>"beauty", about "running" I am not so sure because the item seems to conflate different concepts (sport, terrestrial locomotion), so probably it should be split.--Micru (talk) 00:00, 7 November 2018 (UTC)
So I've been reading up a bit on WordNet - what they've done for English is sort of create the conceptual items we are talking about. See the section on "Relations" on that page for how they differently handle noun, verb, and adjective relations; the hierarchies or groupings are quite different. The total number of items that might need to be created for verbs, adjectives and adverbs could be estimated from their counts - I would guess it will be well under 100,000 (most of their synsets are nouns already). So I don't think it would be in any way a big burden on Wikidata to add these concepts as items, they'll be at about the 1/1000 level of items. ArthurPSmith (talk) 16:30, 7 November 2018 (UTC)
  • @ArthurPSmith: Very interesting. This idea of a "synset" does make a lot of sense. And with the current state of the Wikidata data model, it certainly seems as though Items would be the simplest choice for representing these concepts, especially if it is determined that it would not be a burden on the Main namespace. Perhaps there should be a discussion about whether this direction should be taken? Liamjamesperritt (talk) 02:48, 8 November 2018 (UTC)
I agree, we need more input on this. I have started a thread on the project chat.--Micru (talk) 13:48, 8 November 2018 (UTC)
I would say that the synsets for nouns are already the Q-items. At least that is how I have used it. For instance, tape recorder (Q213777) is linked to WordNet via and exact match (P2888). — Finn Årup Nielsen (fnielsen) (talk) 01:50, 13 November 2018 (UTC)

How to split or merge pronoun (Q36224)

Some individual pronoun (Q36224) can either get their own lexeme or be grouped into one lexeme. Consider the English we (L483) which groups we, us, our, ourselves, while in Danish (Q9035) I have split vi (L35288) (basic form) and vores (L35289) (possessive pronoun). For Danish (Q9035), the dictionary Den Danske Ordbog (Q1186741) split these words/lexemes [10] [11]. I am unsure which way is the best. If we do not split, it seems that individual forms can be attached to different word classes. The same might go for the etymology where Danish (Q9035) vi/vor is based on vár/várr. — Finn Årup Nielsen (fnielsen) (talk) 17:23, 4 November 2018 (UTC)

In the Indo-European languages, many possessive determiners can inflect by themselves, unlike genitive cases which don't have any further inflections. This makes me think that they should be treated as lexemes in their own right. —Rua (mew) 20:19, 4 November 2018 (UTC)
I was surprised when I saw that "my" wasn't entered as a lexeme of its own. That's not what I was expecting at all, so I would be in favour of splitting them for English. The online OED also has separate entries for them all. - Nikki (talk) 20:29, 4 November 2018 (UTC)
I think I took care of most of the pronouns in English; the grouping was based on my reading of how to handle lexemes in such cases, such as this question and discussion. There aren't many of them so I suppose they could be split - but then how do we link them properly to indicate they have such related meanings? ArthurPSmith (talk) 23:01, 4 November 2018 (UTC)
I don't think we have anything suitable right now, we don't have many properties for linking lexemes. - Nikki (talk) 09:37, 5 November 2018 (UTC)

Lexemes that are defined grammatically in terms of other lexemes

One way that English Wiktionary avoids having to re-define words several times is by using special definitions that refer to another lexeme. For example, a word might be defined as the verbal noun or passive of some other verb. An example is the Northern Sami pair gávdnat (L35329) "to find" and gávdnot (L35330) "to be found", where the latter is a passive derivation of the former. Is there a way to give something like "passive of gávdnat" as the sense, instead of repeating all the senses of the base verb but in passive form? —Rua (mew) 20:17, 4 November 2018 (UTC)

Why should you regard them as different lexemes and not forms of one lexeme? --Infovarius (talk) 10:39, 7 November 2018 (UTC)
Because they are full lexemes in their own right. Passive verbs are full verbs, and have an infinitive and all the forms that any other verb might have. You can even derive new lexemes from one. That latter point is important, because we can only derive lexemes from other lexemes with our properties. —Rua (mew) 11:19, 7 November 2018 (UTC)
For example, gávdnon (L35767) "occurrence" derives from the aforementioned passive verb. —Rua (mew) 11:28, 7 November 2018 (UTC)

List of properties for Lexemes and List of Lexemes by language

Where can I find a list of properties for lexemes? I think a property list with usage examples would be very helpful for understanding how and where to use them. It is quite difficult to understand which property should be used for what. A list of Lexemes by language would be very helpful for finding language-specific lexemes and editing them. Regards,-Nizil Shah (talk) 05:16, 6 November 2018 (UTC)

@Nizil Shah: A list of all lexeme-related properties is here: Template:Lexicographical properties. Some of the properties have Wikidata property example for lexemes (P5192) specified, but if you are unsure how to use a property, just ask here. You can get a list of lexemes in your language in two ways. The easy one is to list all links to the language item from the lexeme namespace; for example, for all lexemes in Gujarati (Q5137) see here. The second method is to run a query. Hope this helps. KaMan (talk) 11:17, 6 November 2018 (UTC)
@Nizil Shah: Besides the template that KaMan has linked, there is also Wikidata:List of properties/linguistics, or you can browse properties using Prop explorer. In any case, you can also look at the showcased lexemes or already existing lexemes in your language and find inspiration there.--Micru (talk) 12:42, 6 November 2018 (UTC)
@Nizil Shah: You can also get a list of properties in Ordia. Finn Årup Nielsen (fnielsen) (talk) 01:36, 13 November 2018 (UTC)

Understanding properties and data model

I have been following the Lexeme project since its proposal. Over the years, the words used in the data model and properties have become too technical to understand for new people as well as for a person like me who has no linguistic knowledge. Sometimes I could not even understand a property or where to use it. I am working with the small Gujarati Wiktionary and other Gujarati Wiki people who were waiting for Senses in order to add 200000 words from a public domain dictionary. Now we are stuck because it has become difficult to explain the data model and which parts of a normal print dictionary should go where. Some technical terms like "Gloss" are difficult to understand and explain. A broad and simple explanation in the context of a print dictionary would be a great help to people like us. We tried to map our public domain print dictionary to Wikidata Lexemes (which thing should go where), but we are stuck. The print dictionary has a limited range of data in it. If we can map it, we might be able to create a simple editing tool via OAuth to edit Wikidata Lexemes without confusing editors with too many things. We could not even figure out Gujarati labels for properties and other technical labels due to the lack of simple explanations. In short, people need simpler explanations in the context of a print dictionary, because not all editors are linguists. Properties should also be explained this way, with simple, clear examples. Can we have that? If Wikidata Lexemes wants to attract editors, it needs simplicity in explaining technical things. Regards,-Nizil Shah (talk) 05:47, 6 November 2018 (UTC)

List of your lexemes that need senses

This URL lists all the lexemes you’ve created until 18 October 2018, the date senses became available. (Ever since, you’ve always added senses to your new lexemes, right? 😉) Perhaps it’s time to add some sense(s) to them? --Lucas Werkmeister (talk) 23:15, 7 November 2018 (UTC)

The other way is to query for all lexemes without any sense in your preferred language: here is an example for Esperanto. KaMan (talk) 10:00, 8 November 2018 (UTC)
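
A query of the kind mentioned above might look like the sketch below. It is not the query behind the link, only an assumed reconstruction: the helper name is hypothetical, and the predicates (dct:language, wikibase:lemma, ontolex:sense) follow my reading of the Lexeme RDF mapping. The Python code just assembles the SPARQL text for a given language item.

```python
# Build a SPARQL query listing lexemes of one language that have no sense yet.
# Helper name and query shape are assumptions, not the query linked above.

def lexemes_without_senses_query(language_qid: str, limit: int = 100) -> str:
    """Return SPARQL text selecting lexemes in a language with no ontolex:sense."""
    return f"""
SELECT ?lexeme ?lemma WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lemma ?lemma .
  FILTER NOT EXISTS {{ ?lexeme ontolex:sense ?sense }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Q143 is Esperanto; swap in the item of your own language.
    print(lexemes_without_senses_query("Q143"))
```

Dropping the FILTER line gives a plain list of lexemes for that language.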

Forms that also have idiomatic meanings

How are cases handled where a form has acquired meanings that can't be predicted grammatically from which form it is, but are idiomatic to that particular form? An example that comes to mind is English broken, which has meanings that don't follow from it being the past participle of break. —Rua (mew) 17:30, 10 November 2018 (UTC)

I would say create separate lexemes for new set of form(s) of new meaning(s). KaMan (talk) 18:07, 10 November 2018 (UTC)
How would you define its etymology? —Rua (mew) 18:34, 10 November 2018 (UTC)
With derived from form (P5548) of derived from (P5191). KaMan (talk) 18:40, 10 November 2018 (UTC)
@Rua, KaMan: Almost(?) all English participles can act as adjectives; I'm not sure it's really worth having a separate lexeme for all of them. And almost all the senses I see in enwiktionary for "broken" are also possible to associate with the original verb. But one or two of them perhaps not, so yes a distinct lexeme for that sort of case makes sense to me. ArthurPSmith (talk) 19:11, 10 November 2018 (UTC)
According to Wikipedia, participles are adjectival or adverbial by definition, so it's not surprising to see them acting like adjectives. The question is what to do with the ones that have semantically separated from the verb and become independent words. Even if "broken" is not a good example, there are plenty of examples across languages that are. Another example I can think of is ukudla in Zulu, which is both the infinitive of -dla "to eat" and lexicalised in the meaning "food". —Rua (mew) 23:00, 10 November 2018 (UTC)

Lemmata for Latin verbs

In Wiktionary, the main lemma for most Latin verbs is the first person conjugation. However, most of the Latin verbs in Wikidata so far have the infinitive as the lemma. I want to start adding Latin verbs to Wikidata, but I'm not sure if I should set lemmata to the first person or the infinitive. Any advice? Liamjamesperritt (talk) 04:02, 11 November 2018 (UTC)

  • Just follow the existing ones, similar to other languages. --- Jura 04:16, 11 November 2018 (UTC)
  • There's no reason to have the "main" lemma in the infinitive. AFAIK every serious dictionary uses the first person, and so do many Wiktionaries, including the Latin one. Alas, the English and German Wiktionaries use infinitive lemmas, so you can expect strong opposition from that side.--Shlomo (talk) 07:13, 11 November 2018 (UTC)
    Agreed with Shlomo, we should follow the way specialists and reference works on that language use. Pamputt (talk) 10:21, 11 November 2018 (UTC)
    Yeah, let's follow what specialist contributors already do. As Wiktionaries shouldn't be copied, we can't really follow them. --- Jura 12:57, 11 November 2018 (UTC)
    Many Wiktionaries follow standard dictionary protocols and contain a wealth of valuable lexical information, so why do you say that Wiktionaries should not be followed? Should we not do our best to align the Lexeme namespace with Wiktionaries in order to provide structured data support for Wiktionaries, as the Main namespace has done for Wikipedias, Wikiquotes, etc.? Liamjamesperritt (talk) 19:49, 11 November 2018 (UTC)
    This discussion is not about the information itself, but about structural issues. There are good reasons why the data model of Wiktionaries shouldn't (and can't) be followed:
    1. The software used for powering Wiktionaries is text-based, which is an appropriate solution for Wikipedias, Wikisources, Wikibooks etc., not so much for a dictionary, which is primarily a database of lexical information. Still it can be used after introducing many workarounds and strict rules concerning the pages' structure. Wikidata software is more appropriate to process the lexical information as data and doesn't have to follow all the Wiktionaries' workarounds and limitations.
    2. There are many Wiktionaries and they are autonomous projects. Various Wiktionaries use different solutions for the problems mentioned under (1), and Wikidata can't follow all of them. Even in the case discussed here, we can follow en.wikt and de.wikt (etc.) and use the infinitive as the "main" lemma, or we can follow la.wikt and fr.wikt (etc.) and use the 1st person sg. for this purpose. Or we can have multiple lemmas without saying which one is the main one, and let the user decide; this is not possible in Wiktionaries, but it is possible here.
    --Shlomo (talk) 09:14, 12 November 2018 (UTC)
    Um, just for your information, en.wiktionary uses the first-person singular present active indicative as the lemma for Latin verbs. See wikt:WT:Lemmas. —Rua (mew) 10:43, 12 November 2018 (UTC)
    Ehm, thanks, mea culpa. It was stuck somewhere in my memory, which seems to be not so reliable any more ;)--Shlomo (talk) 16:48, 12 November 2018 (UTC)
    • As Wiktionaries aren't CC0, we can't import most of their content and model. Obviously we shouldn't otherwise these valuable projects would get aborted by another WMF project. We can still link to their pages and they can do the same. Lexemes at Wiktionary would probably have been preferable, but somehow a series of users we have hardly seen editing lexemes since wanted otherwise. Now people like you and me who actually contribute are stuck with the current situation. --- Jura 05:39, 13 November 2018 (UTC)
  • It could be useful to include several forms in the lemma. --- Jura 12:57, 11 November 2018 (UTC)
    That's possible. The question was about the "main" lemma (whatever it is), which should, by definition, be only one. The way I understood it, the main lemma is the one with the plain language code (in this case la). The infinitive can be added as an alternative lemma, e.g. with the code la-x-Q179230.--Shlomo (talk) 15:28, 11 November 2018 (UTC)
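
The arrangement described in the last reply can be pictured as a small mapping from language codes to lemma spellings. This is a sketch of that reading only, not the Wikibase data model itself: the function names are hypothetical, and the verb amo / amare is chosen here purely as an illustration.

```python
# Sketch: several lemmas on one lexeme, keyed by language code.
# The plain code ("la") is treated as the main lemma; codes with an "-x-"
# private-use suffix hold alternatives. Function names are hypothetical.
from typing import Dict, Optional


def main_lemma(lemmas: Dict[str, str], language_code: str) -> Optional[str]:
    """Return the lemma stored under the plain language code, if any."""
    return lemmas.get(language_code)


def alternative_lemmas(lemmas: Dict[str, str], language_code: str) -> Dict[str, str]:
    """Return the lemmas stored under private-use variants of the language code."""
    prefix = language_code + "-x-"
    return {code: text for code, text in lemmas.items() if code.startswith(prefix)}


if __name__ == "__main__":
    # Illustrative example: first person singular as main lemma, infinitive as alternative.
    latin_verb = {"la": "amo", "la-x-Q179230": "amare"}
    print(main_lemma(latin_verb, "la"))
    print(alternative_lemmas(latin_verb, "la"))
```

Under this reading, a consumer that prefers infinitives can pick the alternative lemma without the lexeme having to declare which spelling is the main one.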

Senses are now displayed before Forms

Hello all,

Based on several requests, we now display the Senses section before the Forms one on Lexemes.

If you still see the Forms first on a Lexeme, you should purge the page or do an edit on it.

If you have any issues, the related ticket is this one; you can also ping me.

Cheers, Lea Lacroix (WMDE) (talk) 15:17, 12 November 2018 (UTC)

I noticed this change Friday or Saturday, was wondering when it would be announced - thanks! ArthurPSmith (talk) 17:11, 12 November 2018 (UTC)