Wikidata:Lexicographical data

Lexicographical data
Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2019/03.


Languages listed in dropdown

Some properties (like Property:P1559) require you to pick a language from a dropdown. I notice that the Tobelo language isn't listed. How do I add it to the list? HaEr48 (talk) 07:15, 22 February 2019 (UTC)

Hello, this property has a monolingual text datatype. On this help page you can find more information. Requesting a new language to be added takes place on Phabricator. Lea Lacroix (WMDE) (talk) 08:17, 22 February 2019 (UTC)
@HaEr48: when a language is not (or not yet) available, you can still use the code mis (the special ISO 639 code for uncoded languages, used when a code is required but the language has none). Cheers, VIGNERON (talk) 12:33, 5 March 2019 (UTC)
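
The fallback suggested above can be sketched roughly as follows. This is only an illustration: the helper name `monolingual_value` and the `SUPPORTED` set are hypothetical, the real list of accepted codes is maintained in the software configuration, and `tlb` is assumed here to be the ISO 639-3 code for Tobelo. The dictionary mirrors the general shape of a monolingual text value (a text plus a language code).

```python
# Hypothetical subset of the language codes offered by the dropdown.
SUPPORTED = {"en", "fr", "id"}


def monolingual_value(text, language):
    """Build a monolingual text value, falling back to 'mis' (uncoded language)."""
    code = language if language in SUPPORTED else "mis"
    return {"type": "monolingualtext", "value": {"text": text, "language": code}}


# 'tlb' (assumed code for Tobelo) is not in SUPPORTED, so 'mis' is used instead.
value = monolingual_value("example name", "tlb")
print(value["value"]["language"])
```

Once the language has been added through the Phabricator request, such values would need to be updated from `mis` to the proper code.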



Could you please help me find the full mapping of the ontology? I mean a link to the complete list of classes that I could open in Protégé. I am also interested in a description of all controlled vocabularies used so far and how they are connected with the ontology. I know it is still in development, but it would be nice to share the ontology and make it available to the community working on this matter, for example on the Ontolex mailing list.

By the way, I am on a project that involves mapping French Wiktionary to Lemon, Lexinfo and Lexicog, and I think the latter, a new module for lexicography, is a great improvement that may help here too. Noé (talk) 13:05, 28 February 2019 (UTC)

@Noé: It's not clear to me what you're looking for, but you might want to start by reviewing the wikibase RDF format. This can be queried via the Query Service (link in the nav bar here) from which you can obtain other RDF output that might more precisely match what you need. ArthurPSmith (talk) 15:19, 28 February 2019 (UTC)
Thanks for your answer. On the mentioned page, there is a link to, and this is the general URI for Wikibase ontology. I was looking for a more specific one for the Lexeme part. Noé (talk) 15:48, 28 February 2019 (UTC)
At the bottom of your link, we find some information about the lexemes. However, there is nothing about the Sense, so this is probably not up to date. Lea Lacroix (WMDE), could you help Noé, or perhaps ask around? Pamputt (talk) 17:05, 28 February 2019 (UTC)
Also pinging @Tpt: who worked on the ontology of Lexemes for the SPARQL endpoint (where ontolex is used so at least part of the mapping is done ;) ). Cheers, VIGNERON (talk) 12:31, 5 March 2019 (UTC)
Hello Noé. The OWL definition file for the WikibaseLexeme RDF vocabulary is here. The best description of how Wikidata Lexemes are represented in RDF is probably mw:Extension:WikibaseLexeme/RDF mapping. It uses LEMON where possible for the basic structures enforced by the WikibaseLexeme extension. The other data are represented using Wikibase statements. Feel free to ping me on IRC if you have specific questions. Tpt (talk) 12:42, 5 March 2019 (UTC)
Thanks, Tpt (merci). Both pages helped me understand where Ontolex/Lemon is used and where it is not. Some parts seem tedious to me, like the use of wikibase:lexicalCategory instead of lexinfo:PartOfSpeech or the use of rdfs:label rather than lemon:SenseDefinition, and the wikibase properties could be better documented. Still, those two pages would benefit from being more accessible to the audience, as that is key to making the mapping more interconnected. Discussions should continue in order to make the ontology clearer. Anyway, great job to you all. Noé (talk) 15:55, 5 March 2019 (UTC)
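
To make the mix of vocabularies discussed above more concrete, here is a minimal sketch that builds a SPARQL query combining the Ontolex class with the Wikibase-specific predicates named in this thread (wikibase:lemma, wikibase:lexicalCategory, dct:language). The function name `lexical_category_query` is hypothetical, and the exact shape of the data should be checked against the RDF mapping page mentioned by Tpt.

```python
def lexical_category_query(language_qid, limit=10):
    """Build a SPARQL query listing lemmas with their lexical category."""
    # Only accept plain item IDs so nothing unexpected ends up in the query.
    if not (language_qid.startswith("Q") and language_qid[1:].isdigit()):
        raise ValueError("expected an item ID such as 'Q150'")
    return (
        "SELECT ?lexeme ?lemma ?category WHERE {\n"
        "  ?lexeme a ontolex:LexicalEntry ;\n"
        "          wikibase:lemma ?lemma ;\n"
        "          wikibase:lexicalCategory ?category ;\n"
        f"          dct:language wd:{language_qid} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Q150 is the Wikidata item for the French language.
print(lexical_category_query("Q150"))
```

The printed text can be pasted into the Query Service, where the ontolex, wikibase, dct and wd prefixes are predefined.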

Gadget for linking to dictionaries on Wikisource


Could someone write a gadget (in JavaScript, I guess?) to link directly to dictionaries? For instance on Lexeme:L114#P1343, adding a link to fr:s:Index:Henry - Lexique étymologique du breton moderne.djvu. Or anything similar; the idea is to help with checking the source. Here the link is already on Q19216625, but making it 1 click away instead of 2-3 would make consultation easier.

Cheers, VIGNERON (talk) 13:26, 5 March 2019 (UTC)

Linking to Wiktionary would be even more expected... --Infovarius (talk) 08:33, 18 March 2019 (UTC)

derived from (P5191)

Can anybody build queries with this property that would be a workable alternative for Wikidata:Property_proposal/periphrasis ?

@EncycloPetey, Vive la Rosière: fyi: participants of earlier discussion. --- Jura 13:45, 10 March 2019 (UTC)

@Jura1: what do you want exactly? I don't see the link between derived from (P5191) and periphrasis... (but I must admit I don't really understand these proposals either).
Anyway, here is a query for all lexemes using derived from (P5191) (with the corresponding lemmata):
  SELECT ?l ?lemma ?derivedfrom ?derivedfromlemma WHERE {
    ?l a ontolex:LexicalEntry ; wikibase:lemma ?lemma ; dct:language ?language ; wdt:P5191 ?derivedfrom .
    ?derivedfrom wikibase:lemma ?derivedfromlemma ; dct:language ?language .
  }
If you want something more specific, just tell me what you want (I feel this is a subset of this general query, but I don't get which parameters are wanted here; I already restricted it to lexemes within one and the same language, so what else is needed to fit your idea?).
Regards, VIGNERON (talk) 15:15, 11 March 2019 (UTC)
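
For anyone who wants to run the triple patterns quoted above outside the browser, a minimal sketch follows. It only builds the request URL for the public Query Service endpoint; the SELECT clause, the LIMIT, and the helper names `derived_from_query` and `request_url` are assumptions added for illustration, not part of the original query.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def derived_from_query(limit=10):
    """Lexemes with a 'derived from' (P5191) value in the same language."""
    return (
        "SELECT ?l ?lemma ?derivedfrom ?derivedfromlemma WHERE {\n"
        "  ?l a ontolex:LexicalEntry ; wikibase:lemma ?lemma ;\n"
        "     dct:language ?language ; wdt:P5191 ?derivedfrom .\n"
        "  ?derivedfrom wikibase:lemma ?derivedfromlemma ; dct:language ?language .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = request_url(derived_from_query(5))
print(url)
```

Fetching that URL (with a descriptive User-Agent header) should return the bindings as JSON; further restrictions, for example on a single language, would be added as extra triple patterns.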
I don't either, but if the opposing argument is valid, it should be possible. Can you say so in the property creation discussion? --- Jura 08:35, 16 March 2019 (UTC)
I don't understand this proposal (nor the example - but from what I get, this is what this query does, no? - and search engines give no meaningful results for "periphrastic definition"), so no, I can't comment on something I don't understand (except by saying « I don't understand », which is not really constructive). Cheers, VIGNERON (talk) 15:00, 17 March 2019 (UTC)

Limba română

Please delete the combination "limba moldovenească" from all articles; it is a mistake! ONLY Limba română, please; moldovenească is a dialect for regional use!

It seems that you are talking about the fact that Moldovan and Romanian are considered the same language (more or less, I am simplifying). Dealing with this is a well-known, longstanding problem on Wikimedia projects, but it doesn't seem to be a problem (at least not yet) on Lexemes nor in lexicographical data (where we can store multiple - and even contradictory - statements about language and dialect). Ping to @Gikü: who created most of the 21 Lexemes in Romanian and is from Moldova.
Cheers, VIGNERON (talk) 14:40, 15 March 2019 (UTC)


As there is no separate talk page for Wiktionary issues in Wikidata apart from the Lexeme space, I have to announce this here: I am collecting information about the rather complex system of categories in Wiktionaries (primarily in Russian) in order to explain some obscure things. Many were confused, so here is a place for discussion. @LA2, Superchilum, Jura1: --Infovarius (talk) 13:05, 19 March 2019 (UTC)

I'm not sure what you aim to achieve here. But I note that for semantic categories, some Wiktionaries have them (English, Russian, Swedish) while others completely lack them (Polish). For Swedish, we are quite good at creating categories for concrete things (plants, animals) but seldom create categories for abstract concepts (philosophy, politics, feelings). Recently, I found out that there is a Swedish version of Roget's Thesaurus, published in 1930 by S.C. Bring, which is now out of copyright and available as a dataset, so I imported it to Swedish Wiktionary as Appendix:Bring. This appendix has 1000 subpages for semantic categories, the same as in Roget's Thesaurus, each subpage listing associated words, e.g. 366. Animals, 434. Red, and 460. Carelessness. I think these 1000 could just as well be actual wiki categories. (You will find "räv" (fox) both in animals and in red.) --LA2 (talk) 21:19, 19 March 2019 (UTC)
Same as LA2; I am not sure I understand what you want to do on this new page. Do you plan to match categories across different languages following a given pattern? Or something else? Pamputt (talk) 23:21, 19 March 2019 (UTC)
Any clarification is welcome. Let's see what Infovarius will produce and then we'll discuss :-) --Superchilum(talk to me!) 13:05, 20 March 2019 (UTC)
The first aim was to explain the difference between categories such as Category:Nouns (Q61945932), Category:Nouns by language (Q30431819), Category:Noun (Q9557799) and Q7773966. This explanation depends on the existence of different types of categories, so I had to introduce them too. I don't want to discuss internal rules for category content too much, but I want adequate interwiki linking for them. --Infovarius (talk) 16:10, 21 March 2019 (UTC)
