Wikidata talk:Lexicographical data

Lexicographical data

Place used to discuss any and all aspects of lexicographical data: the project itself, policy and proposals, individual lexicographical items, technical issues, etc.



On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/07.

Adpositional sense item trees


For a long time I've wanted to do something about lexical categories that are underrepresented among lexemes with item for this sense (P5137) statements on their senses, in particular adpositions (prepositions, circumpositions, postpositions, etc.), since these typically describe relations between objects (above, in, after and so on). I believe their items should be subclasses of relation (Q930933) in one way or another. I have written a partial proposal for an item tree model at [1], and I also referred to this idea in the property discussion [2], where it may have become lost in the broader discussion under that subject line. Unfortunately I haven't received much feedback on this idea, positive or negative, and I don't think I have enough authority to start building these item trees on my own, since if they are used, they may significantly affect how work on lexemes in general (not only adpositions) is conducted.

Therefore I'd like to ask for your comments here, both on the merits of the idea as such and on how we could come to an agreement on what to do. Adpositional item trees, yes or no? What's your opinion? Is there a better place than my personal subpage or the item for this sense (P5137) property talk page to discuss such trees? --SM5POR (talk) 16:55, 1 May 2024 (UTC)

I think it's a good initiative; you should try. But what tree do you mean? I can't see a hierarchy of adpositions on your page... Infovarius (talk) 23:54, 7 May 2024 (UTC)

@Infovarius, @Shisma, @VIGNERON, @ZI Jony: Sorry for the unclear reference. Immediately after the "in" section I have a section labelled [3], where you can expand my first tree for relation (Q930933); after it I have begun working on one for conjunctions and similar lexical operators. I don't want to put too much work into my personal subpage; I'd rather see a WikiProject dedicated to these item class trees. I also have trouble formatting and editing the language sample translation table, and wonder if there is a convenient tool to help me add an arbitrary column or row to an existing table and start filling it in with translations for comparison. After "in" I'd like to get on with "of", to help find replacement qualifiers for the various statements still using the of (P642) property, which is currently being deprecated. SM5POR (talk) 09:35, 8 May 2024 (UTC)

@Infovarius, @Mahir256: Under [4] I found an instruction I would like to challenge: "This property is used to link a sense representing a substantive concept (typically on a noun or adjective) to a Wikidata item representing the concept." Why the apparent restriction to nouns and adjectives? Then there is predicate for (P9970), which is stated to be used with verbs. I don't quite understand the sense model this documentation page appears to convey, and wonder whether it is considered up to date with current best practice. --SM5POR (talk) 09:12, 9 May 2024 (UTC)

@Infovarius, @Mahir256, @ZI Jony: Sorry for bugging you all about this, but I think the ball is in your half of the playground right now. I want to conduct this discussion as part of some active project in Wikidata, not merely in my personal wiki pages. Could you please help me establish a page or project for this discussion, if you think my idea is worth trying out? I have given you a number of references, including one above to documentation which I consider unclear or incomplete, in particular on whether item for this sense (P5137) should be used beyond nouns and adjectives, since using it with adpositions seems to do exactly that. --SM5POR (talk) 14:45, 13 May 2024 (UTC)

@SM5POR: I think any attempt at establishing a hierarchy in items of relationships—spatiotemporal or otherwise—typically expressed by adpositions should be backed up by sources, and these should try to be described in as language-neutral a fashion as possible (i.e. not make reference to specific languages or specific words in those languages). You may find a volume like Adpositions (Q119239595) useful as a starting point.

As for the comment regarding the documentation of P5137, the term "typically" doesn't introduce any sort of restriction; the use of "substantive concept" was intended to distinguish its primary (again, not a restrictive word!) use from that of P9970. I do hope to clarify it and other incomplete documentation subsections soon. Mahir256 (talk) 15:04, 13 May 2024 (UTC)

@Mahir256, @Infovarius: Thank you for clarifying this, and I'm sorry for misinterpreting the documentation. I'd like to add that the Swedish word for "noun" happens to be "substantiv", which possibly contributed to my reading "substantive concept" as more restrictive than it was meant. I appreciate your suggested literature reference and agree completely that it should be used as a source. Unfortunately I don't have access to that work myself, so I hope someone who does will be able to contribute citations for our item trees. But my practical question of where to conduct this discussion remains. Should we perhaps allocate a section of the documentation page, which doesn't seem to have a corresponding talk page of its own, or create a separate project page somewhere else? As a technical compromise, I could offer to create one among my personal pages, but then the issue becomes one of advertising it appropriately, so that anyone seeking information on item for this sense (P5137) will find the discussion and be able to participate. Where can I find current best practice with respect to project pages? Should we write something at [Wikidata:WikiProject_Interesting_Content#Suggestions_for_future_content]? Maybe it's a data quality issue (we have a page tree for those)? --SM5POR (talk) 12:27, 14 May 2024 (UTC)

You can try to download the book somewhere here. --Infovarius (talk) 19:16, 14 May 2024 (UTC)

snowclones in scope?


A snowclone (Q2338287) is a phrasal template (src: https://snowclones.org/about) like "old X never die, they just Y". Using X, Y, Z, A, etc. seems to be the consensus way to notate the 'blanks' of the template, but it strikes me as rather ad hoc when we're considering a semantic database that might be able to represent the concept of replaceability with better fidelity. Arlo Barnes (talk) 06:28, 17 May 2024 (UTC)

@Arlo Barnes: good point; I'm not exactly sure how to deal with it, especially as lemmas can contain X as a letter in itself and not as a placeholder (like in "X marks the spot" or "fragile X syndrome"). One way could be to explicitly put a placeholder in combines lexemes (P5238). PS: it's not only snowclones; many phrases are in a similar situation. Cheers, VIGNERON (talk) 14:19, 15 June 2024 (UTC)
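The ambiguity between X as a letter and X as a blank disappears if blanks are stored as typed markers rather than as letters inside the lemma string. Below is a minimal Python sketch of that idea, assuming a hypothetical in-memory model; `Slot`, `slots` and `fill` are illustrative names only and are not part of Wikibase, the Lexeme data model, or combines lexemes (P5238):

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Slot:
    """An explicit blank in a phrasal template (hypothetical model)."""
    name: str


# A template is a sequence of literal text pieces and Slot markers.
Part = Union[str, Slot]


def slots(template: list[Part]) -> list[str]:
    """Return the slot names in order of first appearance."""
    seen: list[str] = []
    for part in template:
        if isinstance(part, Slot) and part.name not in seen:
            seen.append(part.name)
    return seen


def fill(template: list[Part], values: dict[str, str]) -> str:
    """Substitute a value for every slot; literal text is copied unchanged."""
    missing = [name for name in slots(template) if name not in values]
    if missing:
        raise KeyError(f"no value for slot(s): {missing}")
    return "".join(values[p.name] if isinstance(p, Slot) else p for p in template)


snowclone = ["old ", Slot("X"), " never die, they just ", Slot("Y")]
literal = ["X marks the spot"]  # the letter X here is text, not a slot
```

Here `fill(snowclone, {"X": "soldiers", "Y": "fade away"})` yields "old soldiers never die, they just fade away", while `slots(literal)` is empty, so the X in "X marks the spot" is never mistaken for a blank.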

Still unable to add some lexemes


I'm trying to add a statement with a lexeme I created a few weeks ago, and I'm still not able to. Wondering when this will be fixed. The lexeme is "bio-" (L1327069). When you search for L:bio- you get it in the results, but it says "Unknown language, Unknown..." in the display of the terms retrieved by the search. When you view the lexeme it does show that it is an English prefix. When I try to add a statement with L1327069, it says "Not found". Any idea when this will be fixed? AdamSeattle (talk) 21:06, 1 June 2024 (UTC)

Lexicodays, online event dedicated to Lexicographical Data, on June 28-30, 2024


- Indonesian version below -

Hello all,

Have you ever wondered how Wikidata stores and models words? How to create and improve Lexemes in your languages? Or even why it is useful and which projects could benefit from it?

The Lexicodays 2024 will answer these questions, and many more. During this online event, you will be able to learn more about Lexicographical Data on Wikidata, to discover how to model words in your languages, and to try out various tools that make it easier to work on Lexemes. It offers a space for editors involved in creating and maintaining Lexemes to discuss their ideas, challenges and best practices.

The online event will take place on June 28, 29 and 30, with sessions replicated in different languages and at different times across time zones. It is co-organized by Wikimedia Deutschland and the Software Collaboration Team in Indonesia, and we will focus on the languages of Indonesia and the Wikidata community in Indonesia. The event is open to everyone regardless of their knowledge of Lexemes. Most sessions will be recorded and published after the event.

On the main event page, you can discover the structure of the program, which will keep evolving in the upcoming weeks. We are also welcoming proposals for the program until June 20th - we are particularly interested in introductions to Lexicographical Data in different languages, and discussions run by community members on how to improve modelling and documentation in a specific language.

We will launch registration for the event in the upcoming days - if you’re interested, stay tuned by following the talk page or joining the Lexicographical Data Telegram group.

If you have any questions, feel free to write on the talk page of the event. See you soon, Léa (Lea Lacroix (WMDE)) and Raisha (Fexpr).

---

Hello, friends!

Have you ever wondered how Wikidata stores and models words? How to create and improve Lexemes in the languages you speak? Why are Lexemes useful? Which projects will benefit from them?

Lexicodays 2024 will answer these questions, and many more. During this online event, you will be able to learn more about Lexicographical Data on Wikidata, discover how to model words in your language, and try out various tools that make it easier to edit Lexemes. The event opens a space for editors involved in creating and maintaining Lexemes to discuss ideas, challenges, and best practices with one another.

This online event will take place on June 28, 29, and 30, with sessions spread across several time zones and similar sessions delivered in different languages. The event is co-organized by Wikimedia Deutschland and the Software Collaboration Team in Indonesia. Its focus is on the languages spoken in Indonesia and the Wikidata community in Indonesia. The event is open to anyone, regardless of how familiar you are with Lexemes. We will record most sessions and publish them after the event.

You can find the schedule on the event's main page, which we will keep updating over the coming weeks. We are also holding an open call for session proposals until June 20. We are particularly interested in introductions to Lexicographical Data in different languages, and discussions led by community members on how to improve modelling and documentation in a specific language.

We will open registration for this event in the coming days. If you are interested, please keep following this talk page or join the Lexicographical Data Telegram group.

If you have any questions, don't hesitate to write on the talk page of the Lexicodays 2024 event. See you soon, Léa (Lea Lacroix (WMDE)) and Raisha (Fexpr). Lea Lacroix (WMDE) (talk) 09:00, 3 June 2024 (UTC)

Hello all,

As a reminder, the Lexicodays 2024, online event dedicated to Lexicographical Data on Wikidata, will take place on June 28, 29 and 30, with sessions replicated in different languages and at different times across time zones.

The event will take place on both Zoom and Jitsi, and access will be free, with no registration required (the access links will be added to the program page). However, if you’re planning to join, we invite you to add your username to the Participants page.

We also remind you that you can contribute to the program until June 20th by adding a proposal to the talk page. You’ll find more information here.

We are particularly interested in introductions to Lexicographical Data in different languages, and discussions run by community members on how to improve modelling and documentation in a specific language. You can also present tools or Lexeme use cases.

If you have any questions, feel free to reach out to Léa (Lea Lacroix (WMDE)) or Raisha (Raisha (WSC)).

We’re looking forward to seeing you at the Lexicodays! Lea Lacroix (WMDE) (talk) 10:28, 18 June 2024 (UTC)

Hello all,

The Lexicodays 2024 will take place this week, on June 28, 29 and 30!

The event will take place on both Zoom and Jitsi, and access will be free, with no registration required (the access links will be added to the program page). However, if you’re planning to join, we invite you to add your username to the Participants page. The event will include sessions replicated in different languages and at different times across time zones.

Here are a few interesting sessions that you will find in the program:

Introduction to Lexicographical data and how to model words in Wikidata
Discussions about modelling proverbs, sayings, compound words and predicates
Presentation of some useful tools
Modelling sessions and editathons in various languages of Indonesia
Introduction to Abstract Wikipedia and how it will work together with Lexemes
Exploring how to generate sentences with Lexemes

Note that most sessions will be recorded and available after the event.

If you have any questions, feel free to reach out to Léa (Lea Lacroix (WMDE)) or Raisha (Raisha (WSC)).

We’re looking forward to seeing you at the Lexicodays! Lea Lacroix (WMDE) (talk) 10:20, 24 June 2024 (UTC)

Moving a statement for irregular verbs?


Hi,

I noticed that around 1600 lexemes for verbs use instance of (P31): irregular verb (Q70235) (https://w.wiki/AX5f), where I would rather have used the more specific conjugation class (P5186): irregular verb (Q70235). What do you think: should we move these statements or not? And if so, does someone have a bot to move them?

Cheers, VIGNERON (talk) 12:24, 29 June 2024 (UTC)
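Whatever the modelling decision, reviewing the affected lexemes starts with a query like the one linked above. The Python sketch below only builds a SPARQL query string and parses a result set offline. It is not a reconstruction of the exact query behind https://w.wiki/AX5f; it assumes the Wikidata Query Service's predefined prefixes (`ontolex:`, `wikibase:`, `wdt:`, `wd:`) and its `format=json` URL parameter, and the function names are hypothetical:

```python
import urllib.parse

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def lexemes_with_statement(prop: str, item: str) -> str:
    """Build a query for lexemes with a truthy `prop -> item` statement."""
    return (
        "SELECT ?lexeme ?lemma WHERE {\n"
        "  ?lexeme a ontolex:LexicalEntry ;\n"
        f"          wdt:{prop} wd:{item} ;\n"
        "          wikibase:lemma ?lemma .\n"
        "}"
    )


def query_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results (format=json assumed)."""
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def lexeme_ids(results: dict) -> list[str]:
    """Extract L-ids from a result set in the W3C SPARQL 1.1 JSON results format."""
    ids = []
    for row in results["results"]["bindings"]:
        uri = row["lexeme"]["value"]  # e.g. http://www.wikidata.org/entity/L123
        ids.append(uri.rsplit("/", 1)[-1])
    return ids
```

For example, `lexemes_with_statement("P31", "Q70235")` targets the statements discussed here. Keeping the parser separate from any network call means it can be checked against a hand-written result set before anyone plans bot edits.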

irregular verb (Q70235) doesn't look like a specific conjugation class (Q53996674). --Infovarius (talk) 20:05, 30 June 2024 (UTC)

I don't think conjugation class (P5186) would be right; an irregular verb is a verb whose conjugation behaves irregularly in some way:

- The conjugation class is a property of a verb, so I don't think its values should be subclasses of verb (Q24905) (like how the gender of a masculine noun is "masculine", not "masculine noun").

- "irregular" isn't a specific conjugation class. The verb might still have a conjugation class but be irregular because it has one or more irregular forms. It might be irregular because it follows one conjugation class for some forms and another for the rest.

- Nikki (talk) 06:27, 1 July 2024 (UTC)

@Infovarius, Nikki: indeed, I'm not entirely convinced the current situation is right (especially as it's quite inconsistent), but my proposal is clearly not right either. I'll leave it (at least for now, until I have a better idea). Cheers, VIGNERON (talk) 15:37, 8 July 2024 (UTC)

pa'al (Q7265893) and Fa3aL (Q114419665) etc.


Should pa'al (Q7265893) and Fa3aL (Q114419665) (etc.) be linked in some way? My knowledge of Hebrew is very rudimentary and I haven't looked into the details, but these kinds of pairs seem to be related. Disclosure: I created items like Fa3aL (Q114419665) and started to use them for Arabic varieties in statements like كَتَب (L1331764-F1) uses (P2283) Fa3aL (Q114419665). --Marsupium (talk) 21:54, 1 July 2024 (UTC)
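One way to see why such pairs look related is that both labels appear to follow the traditional convention of naming a pattern by plugging placeholder radicals into it (f-ʿ-l in Arabic, p-ʿ-l in Hebrew). The Python sketch below assumes that F, 3 and L in Fa3aL, and p, ' and l in pa'al, mark the first, second and third root consonants, with all other characters literal. This reading of the labels is an assumption, and matching shapes is only a heuristic, not evidence that the items should be linked:

```python
# Assumed slot conventions: these characters stand for radicals 1, 2 and 3.
ARABIC_SLOTS = ("F", "3", "L")  # as in Fa3aL
HEBREW_SLOTS = ("p", "'", "l")  # as in pa'al


def apply_pattern(pattern: str, root: tuple[str, str, str],
                  slots: tuple[str, str, str] = ARABIC_SLOTS) -> str:
    """Interdigitate a triliteral root into a pattern; other characters are copied."""
    mapping = dict(zip(slots, root))
    return "".join(mapping.get(ch, ch) for ch in pattern)


def shape(pattern: str, slots: tuple[str, str, str] = ARABIC_SLOTS) -> str:
    """Reduce a pattern to a CV-style shape by replacing each radical slot with 'C'."""
    return "".join("C" if ch in slots else ch for ch in pattern)
```

With these assumptions, `apply_pattern("Fa3aL", ("k", "t", "b"))` gives "katab" and `apply_pattern("pa'al", ("k", "t", "v"), HEBREW_SLOTS)` gives "katav", and both patterns reduce to the same shape, "CaCaC".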

Proposal to bridge the gap between items and lexicographical data


Tracked in Phabricator
Task T368704

The idea is to link item labels, aliases and monolingual text with the corresponding lexemes. You can read more about it and post your comments on the Phabricator ticket. 5628785a (talk) 21:44, 3 July 2024 (UTC)

Interesting and good idea. This bridge is not easy to cross (for many reasons, including but not limited to homographs), but indeed more tools would make it easier. Cheers, VIGNERON (talk) 08:43, 5 July 2024 (UTC)

Oxford dictionaries


Hi y'all,

Soufiyouns made six property proposals for Oxford dictionaries. Right now, the proposals are oddly heading in various directions, and I thought it might be useful to centralize the discussion here.

First, the proposals were oddly placed on Wikidata:Property_proposal/Authority_control instead of Wikidata:Property proposal/Lexemes. I fixed that, but some people may have missed some of the proposals, and these pages already get little participation in normal times.

Mahir256 asked a very interesting question about the three English dictionaries: « That's enough with the English dictionaries, don't you think? » I'm wondering, how many is "enough"? I don't think it's just a question of number, but also of content. Sadly, the examples given are very common words and the motivation is short, so it's indeed hard to see the value of these identifiers. For example, are there words that can only be found in these dictionaries, which would make them unique? Also, are they really a "Highly authoritative source"? (Being published by the prestigious Oxford doesn't magically make them great; I had a quick look at the freely accessible part of https://www.oxfordreference.com/display/10.1093/acref/9780191739545.001.0001/acref-9780191739545 and I may have missed something, but it's not really impressive.)

Then, for me, the main problem is that these websites are not fully freely accessible. If the value were clear, maybe we could overlook this, but the two points together make me wonder... The main issue is with homographs: without seeing the content, how can I know which lexemes https://www.oxfordreference.com/display/10.1093/acref/9780191739545.001.0001/b-fr-en-00003-0000001 and https://www.oxfordreference.com/display/10.1093/acref/9780191739545.001.0001/b-fr-en-00003-0000002 (both "a") correspond to?

What do you all think?

Cheers, VIGNERON (talk) 15:55, 8 July 2024 (UTC)

@VIGNERON Note that while ordinarily I would not encourage the addition of entirely paywalled reference properties, Oxford Reference is freely accessible to Wikimedia users through the Wikipedia Library. I am not opposed to adding any of these properties, given that bilingual dictionary properties ultimately reduce the time non-English-speaking contributors need to add sourced information to lexemes. However, given the number of existing properties for English, German, and French lexemes in particular, those proposals are not a priority from my point of view. عُثمان (talk) 16:33, 8 July 2024 (UTC)

Hi @VIGNERON, thanks for opening the discussion. I agree with the various points made.

I supported the creation of the English-Italian dictionary property. The Italian language still needs references and identifiers to support it, especially for contemporary Italian.

Moreover, when you search for a word on Google, the 'dictionary box' at the top of the page presents results from Oxford Languages.

Regarding all the other properties of Oxford dictionaries, I too am not convinced by the quantity. It is easy for them to remain dormant properties.

Therefore, I will not vote for other properties that do not interest me.

Thanks, Luca.favorido (talk) 05:20, 9 July 2024 (UTC)
