Wikidata:Property proposal/Lexemes
See also
- Wikidata:Property proposal/Pending – properties which have been approved but which are on hold waiting for the appropriate datatype to be made available
- Wikidata:Properties for deletion – proposals for the deletion of properties
- Wikidata:External identifiers – statements to add when creating properties for external IDs
- Wikidata:Lexicographical data – information and discussion about lexicographic data on Wikidata
This page is for the proposal of new properties.
Before proposing a property
- Check whether the property already exists.
- Check whether the property has already been proposed.
- Check whether the property can be given a label and definition similar to those of an existing Wikipedia infobox parameter, or whether it can be matched to an infobox to or from which data can be transferred automatically.
- Select the right datatype for the property.
- Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
- Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.
Creating the property
- Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
- Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
- See property creation policy.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/05.
Wikibase lexeme
Duden node ID
Description | numeric identifier of an entry in the Duden |
---|---|
Represents | lexeme (Q111352) |
Data type | External identifier |
Domain | lexeme |
Allowed values | [1-9][0-9]{2,} |
Example 1 | zwischen (L302428) → 213628 |
Example 2 | Wikipedia (L221977) → 205799 |
Example 3 | Luftballon (L99) → 91371 |
Source | https://www.duden.de/ |
Planned use | add to existing German lexemes (and perhaps replace P8376 on those lexemes?) |
Number of IDs in source | current very loose upper-bound estimate: 399,000 (on seeing node '399770' for the word '2G-Regel' and on finding '569' to be the lowest node for '24-Sekunden-Regel') |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | https://www.duden.de/node/$1 |
See also | Duden lexeme ID (P8376), Duden sense ID (P12641) |
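
As a rough illustration of how the "Allowed values" pattern and the formatter URL in the table above fit together, here is a minimal Python sketch. The function name `duden_node_url` is hypothetical, and the sketch assumes that the whole value must match the pattern, which is how format constraints are usually applied.

```python
import re

# Both values are copied from the proposal table above.
ALLOWED_VALUES = re.compile(r"[1-9][0-9]{2,}")
FORMATTER_URL = "https://www.duden.de/node/$1"


def duden_node_url(node_id: str) -> str:
    """Return the formatter URL for node_id, or raise ValueError if it is malformed."""
    # Assumption: the entire value must match, so fullmatch is used rather than search.
    if ALLOWED_VALUES.fullmatch(node_id) is None:
        raise ValueError(f"not a valid Duden node ID: {node_id!r}")
    return FORMATTER_URL.replace("$1", node_id)


# Example 1 from the table: zwischen (L302428)
print(duden_node_url("213628"))
```

Note that the pattern requires at least three digits and no leading zero, so a value such as `10` would be rejected under this reading.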
Motivation
This property is intended as a substitute for (not as something to coexist with) P8376, mainly to avoid needing to deal with multiple values being present on a lexeme due to URL changes. It would provide a unique and hopefully a little more stable identifier for entries in the most trusted German dictionary.
The value for this property can be found both below the heading, when clicking "Als Quelle verwenden" (the quotation mark) and noting the value between 'node' and 'revision' within the URL, and in multiple parts within the <head> element as part of an image URL (presumably used in link previews): https://www.duden.de/og-image/569.png is the image for '24-Sekunden-Regel' and https://cdn.duden.de/lexem_images/399770.png that for '2G-Regel'.
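
The lookup described above could be sketched in Python roughly as follows. The three patterns are assumptions derived only from the URL shapes mentioned on this page (a citation URL with the ID between 'node' and 'revision', and the two image URLs); the function name `extract_node_id` is hypothetical, and Duden may use other URL shapes that this does not cover.

```python
import re
from typing import Optional

# Assumed URL shapes, taken from the description above; not an exhaustive list.
NODE_ID_PATTERNS = [
    re.compile(r"/node/([1-9][0-9]*)/revision/"),    # citation ("Als Quelle verwenden") URL
    re.compile(r"/og-image/([1-9][0-9]*)\.png"),      # link-preview image
    re.compile(r"/lexem_images/([1-9][0-9]*)\.png"),  # CDN image
]


def extract_node_id(url: str) -> Optional[str]:
    """Return the node ID embedded in url, or None if no known shape matches."""
    for pattern in NODE_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


print(extract_node_id("https://www.duden.de/og-image/569.png"))
print(extract_node_id("https://cdn.duden.de/lexem_images/399770.png"))
```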
(Regarding value stability: in the earliest captures of the formatter URL's prefix, the IDs captured before April 2019 were all greater than 1000000 (except for '10' and '1220'), and all pointed to non-entries. Those captured after April 2019, which appear uniformly to be within the range of IDs given above, still resolve to the same entries five years later: Panamaer (L931866) from 1 May 2019, Remix (L761972) from 30 April 2019, and Neger (L804760) from 26 April 2019.)
(I have begun an index of these values and will slowly add them to Mishramilan (মিশ্রমিলন) once this property is created. I would be open to performing the necessary substitutions of P8376 myself as well.) Mahir256 (talk) 13:53, 22 April 2024 (UTC)
Discussion
- Support Hmm, do we want to plan to replace the other ID with this one? ArthurPSmith (talk) 19:19, 22 April 2024 (UTC)
- If you want to add this in addition to the existing property, fine, but I would be opposed to replacing the existing property with this one:
- It's not what people actually use for linking (global search for "duden.de/node/" finds just 122 results, versus more than the limit of 10,000 for "duden.de/rechtschreibung/"), so only having the node ID would make our data difficult to compare with what other places have.
- The node IDs are not very easy to find or use. I had trouble finding one even when I knew it should be on the page somewhere. It can only be accessed by clicking certain things, and following the link redirects to a page without the node ID in the URL, which makes it hard to tell whether a particular link is for a particular entry or not. Both of those points make it more likely that people will add the wrong values to the property.
- A lexeme doesn't have a single current node ID: An entry typically has multiple pages (definition page, synonyms page, inflections page) and each page has its own node ID which has no obvious connection to the other pages for that entry, e.g. https://www.duden.de/rechtschreibung/Hund is 68862, https://www.duden.de/deklination/substantive/Hund is 385837 and https://www.duden.de/synonyme/Hund is 251193.
- The node IDs appear to be less stable than the normal URLs: In my experience, the normal URLs are pretty stable. The changes I've seen all seem to come from them recently simplifying the ones with multiple disambiguation parts (now they only include one), but the previous values redirect to the new values. (Much like how a merge on Wikidata results in the previous ID redirecting to the new ID). I looked at one page where the URL changed and immediately found that https://web.archive.org/web/20110627114010/http://www.duden.de:80/rechtschreibung/Band_Gewebestreifen_Fessel has "http://www.duden.de/zitieren/10020253/1.5", https://web.archive.org/web/20151013073540/http://www.duden.de/rechtschreibung/Band_Gewebestreifen_Fessel has "http://www.duden.de/node/651609/permalink?destination=node/651609" and https://web.archive.org/web/20220517054827/https://www.duden.de/rechtschreibung/Band_Gewebestreifen_Fessel has "https://www.duden.de/node/18218/revision/496432". Both the "zitieren" link and previous "node" link are 404s, while the original URL from 2011 still works now.
- If you want a single value for the current property, you should be using best ranked statements. If a lexeme has multiple best ranked statements, then either one now redirects and the other hasn't been set to preferred rank, or someone has conflated two entries.
- - Nikki (talk) 02:46, 24 April 2024 (UTC)
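
To make the point about best ranked statements concrete, here is a minimal Python sketch of the usual selection rule: preferred-rank statements win if any exist, otherwise normal-rank ones are used, and deprecated ones are never selected. The dictionary layout and the slug values are hypothetical and only stand in for two P8376 statements on one lexeme after a URL change.

```python
from typing import Dict, List


def best_ranked(statements: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the preferred-rank statements if any exist, else the normal-rank ones."""
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    # Deprecated statements are never considered best ranked.
    return [s for s in statements if s["rank"] == "normal"]


# Hypothetical values: the old slug stays at normal rank, the new one is preferred.
statements = [
    {"value": "old-slug", "rank": "normal"},
    {"value": "new-slug", "rank": "preferred"},
]

print(best_ranked(statements))
```

Under this rule a lexeme ends up with more than one best ranked value only when no statement has been marked preferred, or when several have.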
- @Mahir256: could you please clarify the comments above by @Nikki? Regards, ZI Jony (Talk) 18:30, 28 April 2024 (UTC)
- Oppose Maybe we should instead have a separate Drupal node ID property (analogous to MediaWiki page ID (P9675)). Duden has little incentive to keep node IDs stable; on the other hand, they have a strong incentive to keep the slug stable for SEO reasons. If it comes to a redesign, nobody is going to care about node IDs. This has happened before: "Archiv, das" at some point had the node ID 704401, which today is dead. The current node ID for "Archiv, das" is 8452. –Shisma (talk) 09:38, 21 May 2024 (UTC)
- Also note that all pages in this Drupal have a node ID, not just lexemes. For instance
- That's not an argument against having this property, but it might throw off your estimate of the number of IDs in source. – Shisma (talk) 11:04, 21 May 2024 (UTC)
Palula dictionary ID
Description | identifier for an entry in Henrik Liljegren’s dictionary of Palula |
---|---|
Data type | External identifier |
Allowed values | LX00[0-2][0-9]{3} |
Example 1 | پھاگ (L1321906) → LX001765 |
Example 2 | مُلئیۡ (L1321907) → LX001518 |
Example 3 | باد (L1321908) → LX000207 |
Formatter URL | https://dictionaria.clld.org/units/palula-$1 |
Motivation
Palula dictionary (Q125729009) is a well-referenced dictionary of the Palula language, organized in a database that would be well suited for linking to Wikidata lexemes. The URL format is https://dictionaria.clld.org/units/palula-$1 and a regex to match the ID values is LX00[0-2][0-9]{3}. --عُثمان (talk) 13:18, 1 May 2024 (UTC)
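
As a quick sanity check of the regex given above, the following Python sketch filters candidate values against it and counts how many distinct IDs the pattern admits. The helper name `matching_ids` and the two non-matching sample values are hypothetical; the three matching values are the examples from the proposal.

```python
import re

# Pattern copied from the proposal; matched against the whole value.
PALULA_ID = re.compile(r"LX00[0-2][0-9]{3}")


def matching_ids(candidates):
    """Return the candidates that fully match the Palula ID pattern."""
    return [c for c in candidates if PALULA_ID.fullmatch(c)]


# The first three are the proposal's examples; the last two are made-up negatives.
candidates = ["LX001765", "LX001518", "LX000207", "LX003000", "lx000207"]
print(matching_ids(candidates))

# One digit from 0-2 followed by three free digits: LX000000 through LX002999.
capacity = 3 * 10 ** 3
print(capacity)
```

If the dictionary ever grows past LX002999, the pattern would need to be widened.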
Discussion
- Support VIGNERON (talk) 09:33, 21 May 2024 (UTC)
- @عُثمان, VIGNERON: Done as Palula dictionary ID (P12738). Regards, ZI Jony (Talk) 06:42, 22 May 2024 (UTC)
Dictionary of Taiwan Hakka ID
Description | entry in the online Hakka dictionary of the Ministry of Education (Taiwan) |
---|---|
Data type | External identifier |
Allowed values | [1-9][0-9]* |
Example 1 | 𠊎/ngài (L226037) → 12754 |
Example 2 | 赌博/賭博/duˋ bogˋ (L1325085) → 18497 |
Example 3 | 光/gongˊ (L1142704) → 9030 |
Formatter URL | https://hakkadict.moe.edu.tw/search_result/?id=$1 |
Applicable "stated in"-value | Dictionary of Taiwan Hakka (Q125922550) |
Motivation
This proposal for an identifier for Dictionary of Taiwan Hakka (Q125922550) would add links to an authoritative source for Hakka Chinese from which we currently link to Mandarin and Min-Nan lexemes. -عُثمان (talk) 18:02, 15 May 2024 (UTC)
Discussion
Notified participants of WikiProject Taiwan Regards, ZI Jony (Talk) 19:39, 20 May 2024 (UTC)
- Support: Why not?--S8321414 (talk) 23:46, 20 May 2024 (UTC)
- Support: I support approving the first property with Hakka as its subject. --Allenwang6212a (talk) 00:15, 21 May 2024 (UTC)
- Support--Cbliu (talk) 07:38, 21 May 2024 (UTC)
- Comment The entries in this dictionary are split over six panes ("海陸腔, 大埔腔, 饒平腔, 詔安腔, 南四縣腔, 四縣腔"), each with different pronunciations and sometimes different senses; while most of these panes (if they are non-empty) pertain to the same written form, of the ~4300 entries I've indexed so far, about 30 pertain to more than one written form. Mahir256 (talk) 16:11, 22 May 2024 (UTC)
Slovenski etimološki slovar ID
Description | entry in the online third edition of Slovenski etimološki slovar |
---|---|
Data type | External identifier |
Allowed values | [1-9][0-9]* |
Example 1 | brada (L308432) → 4285227 |
Example 2 | smuč (L750012) → 4291964 |
Example 3 | skedenj (L641945) → 4291811 |
Number of IDs in source | 10,097 |
Formatter URL | https://www.fran.si/193/marko-snoj-slovenski-etimoloski-slovar/$1/_ |
Applicable "stated in"-value | no label (Q125938282) |
Motivation
This proposal would support Slovenian lexemes with links to a thorough and reputable lexicographic resource. -عُثمان (talk) 16:01, 18 May 2024 (UTC)
Discussion
Notified participants of WikiProject Slovenia Regards, ZI Jony (Talk) 19:44, 20 May 2024 (UTC)
- Support Eh, I guess I can add it to Mishramilan (মিশ্রমিলন). (There appear to be several other large dictionaries on fran.si that might be worth including.) Mahir256 (talk) 15:17, 22 May 2024 (UTC)
Meurgorf identifier
Description | identifier for a Breton lexeme in the historical dictionary Meurgorf |
---|---|
Represents | Meurgorf (Q110820044) |
Data type | External identifier |
Domain | Lexeme |
Allowed values | [1-9][0-9]{0,4} |
Example 1 | ki (L69) → 18117 |
Example 2 | lagad (L114) → 19505 |
Example 3 | Breizh (L1756) → 5637 |
Example 4 | sarpant-nij (L1320551) → 24912 |
Number of IDs in source | 60470 (as of 2024/05/21) |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | https://niverel.brezhoneg.bzh/br/meurgorf/$1 |
Robot and gadget jobs | for Mishramilan (মিশ্রমিলন) |
Motivation
Meurgorf (Q110820044) is one of the main historical dictionaries for Breton (including Middle and Old Breton). It was created and is maintained by the Ofis Publik ar Brezhoneg (Q1401194). It would be very beneficial for the Lexemes in Breton.
Cheers/A galon, VIGNERON (talk) 09:32, 21 May 2024 (UTC)
Discussion
- Support Mahir256 (talk) 14:37, 21 May 2024 (UTC)
milog.co.il entry ID
Description | identifier for an entry in the Hebrew online dictionary milog.co.il |
---|---|
Data type | External identifier |
Domain | Hebrew lexemes |
Allowed values | [1-9][0-9]* |
Example 1 | התחבר/הִתְחַבֵּר (L209475) → 158 |
Example 2 | הסביר/הִסְבִּיר (L208307) → 1611 |
Example 3 | שוטר/שׁוֹטֵר (L68258) → 2996 |
Source | https://milog.co.il/ |
Planned use | add to Mishramilan (মিশ্রমিলন) |
Number of IDs in source | 48,163 (at least) |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | https://milog.co.il/_/e_$1 |
See also | Ma'agarim ID (P11280), Strong's number (P11416) |
Motivation
This property will provide further authority control for Hebrew lexemes (including both for more modern terms and for compounds). Mahir256 (talk) 17:47, 22 May 2024 (UTC)
Discussion
- Support The more identifiers, the better (especially as Hebrew has few identifying properties for now). Cheers, VIGNERON (talk) 18:22, 22 May 2024 (UTC)