Wikidata talk:Lexicographical data/Archive/2021/11

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

space as a lexeme


Should I create a lexeme for the space (the character typed with the spacebar)? I wanted to use it in the following statements:

schwarz (L181224): series ordinal (P1545) → 1
[space]: series ordinal (P1545) → 2
Loch (L303361): series ordinal (P1545) → 3

As we do for interfixes. Is it a good idea? I don't know. If yes, which language would it belong to? Should every language have its own space? Some languages have different spaces with different meanings. --Shisma (talk) 07:42, 30 October 2021 (UTC)

@Shisma: I do not think having a lexeme for 'space', as used in most languages, is a good idea. Compared to infixes (which are much more likely to be heard and understood to carry meaning in speech), a space contributes very little meaning beyond 'separates words', so it feels to me like a lot of extra bloat on lexemes for no particular reason. In a prior section on this page (I think), I floated the idea of using decimal P1545 values for the different parts of a single word in a multi-part lexeme, and at least in my view that would be a less disruptive convention (see the example below for to a fare-thee-well (L201340)):

combines lexemes (P5238):
to (L248738): series ordinal (P1545) → 1
a (L2767): series ordinal (P1545) → 2
fare (L16725): series ordinal (P1545) → 3.1
thou (L18745): series ordinal (P1545) → 3.2
well (L3219): series ordinal (P1545) → 3.3

(One exception I might, although very marginally, consider useful to have is the ideographic space, as used in CJK scripts to set off the names of the highest-ranking individuals.) Mahir256 (talk) 12:18, 30 October 2021 (UTC)
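To make the decimal-ordinal convention concrete, here is a minimal Python sketch of how a tool could read the phrase back from such statements. It assumes (hypothetically) that each component is supplied as the form actually written in the phrase together with its P1545 value as a string, that the integer part numbers the word, and that the decimal part numbers the hyphen-joined parts inside that word. `parse_ordinal` and `assemble` are illustrative names, not part of any existing Wikidata tool.

```python
from collections import defaultdict

# Hypothetical input: (written form, P1545 value) pairs for to a fare-thee-well (L201340).
# "thee" is the form of thou (L18745) that actually appears in the phrase.
COMPONENTS = [
    ("to", "1"),
    ("a", "2"),
    ("fare", "3.1"),
    ("thee", "3.2"),
    ("well", "3.3"),
]


def parse_ordinal(value: str) -> tuple[int, int]:
    """Split a P1545 value like '3.2' into (word number, part number).

    A plain integer such as '2' is treated as a single-part word: (2, 0).
    """
    word, _, part = value.partition(".")
    return int(word), (int(part) if part else 0)


def assemble(components: list[tuple[str, str]], joiner: str = "-") -> str:
    """Join parts sharing a word number with `joiner`, and separate words with spaces."""
    words: dict[int, list[tuple[int, str]]] = defaultdict(list)
    for form, ordinal in components:
        word_no, part_no = parse_ordinal(ordinal)
        words[word_no].append((part_no, form))
    return " ".join(
        joiner.join(form for _, form in sorted(parts))
        for _, parts in sorted(words.items())
    )


print(assemble(COMPONENTS))  # to a fare-thee-well
```

Reading the two halves of the value as separate integers, rather than as one decimal number, keeps "3.10" after "3.9"; a float reading would also make "3.1" and "3.10" collide.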

@Shisma: +1 with Mahir256: a lexeme for space seems odd (and if we create one space, why not all of them? There are at least twenty different spaces). But we need some way to indicate this information, especially in French, where a lot of hyphens were removed in 1990 (see orthographic corrections of French in 1990 (Q486561)), and in English, where variants with and without a hyphen or space exist. Here is an idea:

combines lexemes (P5238):
schwarz (L181224): series ordinal (P1545) → 1; subject form (P5830) → schwarzes
no value: series ordinal (P1545) → 2; some property (object has role (P3831)?) → space (Q380933)
Loch (L303361): series ordinal (P1545) → 3

What do you think?

Cheers, VIGNERON (talk) 17:37, 13 November 2021 (UTC)
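For comparison, the "no value" separator idea could be read back roughly as follows. This is a sketch under stated assumptions: the mapping from space (Q380933) to a character, the `Component` class and `assemble_from_slots` are hypothetical, and the property that carries the role was still an open question in the proposal (object has role (P3831) is only a candidate).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from a separator's role item to the character it stands for.
# Only space (Q380933) comes from the proposal; other separators would need their own items.
SEPARATORS = {"Q380933": " "}


@dataclass
class Component:
    ordinal: int                # series ordinal (P1545)
    form: Optional[str]         # subject form (P5830); None models a "no value" slot
    role: Optional[str] = None  # role item of a "no value" slot, e.g. "Q380933"


def assemble_from_slots(components: list[Component]) -> str:
    """Concatenate components in P1545 order, turning separator slots into characters."""
    pieces = []
    for c in sorted(components, key=lambda c: c.ordinal):
        pieces.append(c.form if c.form is not None else SEPARATORS[c.role])
    return "".join(pieces)


# Deliberately out of order: the P1545 values, not list position, decide the order.
SCHWARZES_LOCH = [
    Component(3, "Loch"),
    Component(1, "schwarzes"),
    Component(2, None, role="Q380933"),
]
print(assemble_from_slots(SCHWARZES_LOCH))  # schwarzes Loch
```

Under this reading, a missing separator slot means the parts are written solid, so a closed compound needs no extra statement, while every space or hyphen costs one additional statement; that extra statement is the trade-off discussed in the next reply.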

@VIGNERON, Shisma: I am still not entirely certain that having separate statements for word boundaries is the most efficient way to do it. To elaborate on my initial reply from 30 October, I've provided an example within that reply (see the altered P1545 values), which I still contend is a less intrusive indication than an entirely new statement. (The exact P1545 format for marking a word boundary vs. a hyphenated-component boundary could be settled differently, so that unambiguous and ambiguous boundaries are marked distinctly.) Mahir256 (talk) 17:50, 13 November 2021 (UTC)

Is ‚abbreviation‘ a lexical category?


Or should it be a noun or verb depending on what the abbreviation stands for? -Shisma (talk) 10:37, 2 November 2021 (UTC)

Shouldn't it be a form on the unabbreviated lexeme? ArthurPSmith (talk) 17:11, 2 November 2021 (UTC)
The Duden does not agree: UN/United Nations – Shisma (talk) 09:51, 13 November 2021 (UTC)

Also, the abbreviated lexeme might have different properties, like derived from lexeme (P5191) or grammatical gender (P5185). This is not what forms are for. --Shisma (talk) 11:04, 13 November 2021 (UTC)

@Shisma: Could you provide an example where P5191/P5185 would be different for the abbreviation vs. for its unabbreviated form? (The judgment of Duden needn't be entirely authoritative on particular German lexemes, no?) Mahir256 (talk) 17:52, 13 November 2021 (UTC)

I can say they put a lot of thought into what constitutes a separate lexeme and what is merely a form. According to Duden, UN is a proper noun (Q147276) without gender, and United Nations is a plurale tantum (Q138246) of feminine gender. I'd guess UN is derived from United Nations, and United Nations… well, isn't 🤣 – Shisma (talk) 18:25, 13 November 2021 (UTC)

@Shisma, ArthurPSmith: hmm, tough question. I don't think "abbreviation" is a lexical category, but it probably should be stored somewhere as data, and a separate lexeme is probably a good idea. That said, it depends on the abbreviation and whether it's lexicalised or not. For instance, "laser" is obviously lexicalised and clearly a lexeme on its own (most people have probably forgotten it's an abbreviation). UN is probably lexicalised in most languages (at least in German per the Duden, in French per the TLFi (which doesn't even have the full form, BTW), and in English per the Cambridge, Oxford and Merriam-Webster dictionaries), but it needs to be attested by references. Cheers, VIGNERON (talk) 17:10, 13 November 2021 (UTC)

Automatically add translations?


Hi,

Do you think it would be a good idea to automatically add translations to each sense?

based on item for this sense (P5137): every sense that links to the same item through item for this sense (P5137), OR
based on existing translations: if pain (L13072) (S1) is a translation of pano (L290180) (S1), then pano (L290180) (S1) should also be a translation of pain (L13072) (S1).

I expected there to be a tool for that, but there doesn't seem to be one (?). Is it a good idea? Should I build it? --Shisma (talk) 16:12, 26 November 2021 (UTC)

@Shisma: The latter might be a good thing, as long as it doesn't start causing people to turn the set of sense links "A<->B<->C<->D<->E" into "A<->E"; the former will just end up adding bloat to lexemes and so I would advise against it. Mahir256 (talk) 17:27, 26 November 2021 (UTC)
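The second option, with Mahir256's caveat, amounts to adding only the direct inverse of each existing translation link and never chaining links together. A minimal Python sketch of that rule, assuming a hypothetical in-memory model where each sense ID maps to the set of sense IDs it already lists as translations (fetching and writing the actual statements is left out):

```python
# Hypothetical in-memory model: each sense ID maps to the set of sense IDs it
# already lists as translations. IDs use Wikidata's "L<lexeme>-S<sense>" notation.
Translations = dict[str, set[str]]


def missing_inverses(translations: Translations) -> list[tuple[str, str]]:
    """Return the (sense, translation) statements needed to make existing links two-way.

    Only the direct inverse of a link that already exists is proposed; links are
    never chained, so A->B and B->C do not produce A->C.
    """
    missing = set()
    for source, targets in translations.items():
        for target in targets:
            if source not in translations.get(target, set()):
                missing.add((target, source))
    return sorted(missing)


def add_inverses(translations: Translations) -> Translations:
    """Return a copy of `translations` with the missing inverse links filled in."""
    result = {sense: set(targets) for sense, targets in translations.items()}
    for sense, translation in missing_inverses(translations):
        result.setdefault(sense, set()).add(translation)
    return result


# pain (L13072) S1 lists pano (L290180) S1, but not the other way round.
EXAMPLE = {"L13072-S1": {"L290180-S1"}}
print(missing_inverses(EXAMPLE))  # [('L290180-S1', 'L13072-S1')]
```

Because only inverses are added, a chain such as A->B, B->C ends up as A<->B<->C rather than gaining an A<->C shortcut.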

Is there a property for lexemes that sound identical but have different meanings and spellings?


Like Laib (L493284) and Leib (L613407) --Shisma (talk) 09:17, 27 November 2021 (UTC)

(I remember @Quiddity: being interested in this question.) To my knowledge, @Shisma:, no such property exists, although you are always welcome to propose it. Mahir256 (talk) 16:28, 27 November 2021 (UTC)
Whenever a lexeme-related property doesn't exist, I assume it does not exist for a reason – Shisma (talk) 16:44, 27 November 2021 (UTC)
LOL --- Jura 16:49, 27 November 2021 (UTC)
@Shisma: I don't believe we have such a thing. We do have homograph lexeme (P5402) for lexemes that are spelled the same but have different meanings; so a "homophone lexeme" property for ones that sound the same would make sense to me. ArthurPSmith (talk) 16:34, 27 November 2021 (UTC)
See also some of the discussion on Wikidata:Property proposal/homonym. ArthurPSmith (talk) 16:36, 27 November 2021 (UTC)
- @ArthurPSmith, Jura1, Mahir256, Quiddity: Wikidata:Property proposal/homophone lexeme. Feel free to change it and to add better examples --Shisma (talk) 16:52, 27 November 2021 (UTC)
  It should probably go to specific forms. --- Jura 16:55, 27 November 2021 (UTC)
  But then again: why is it homograph lexeme (P5402) rather than homograph form? – Shisma (talk) 17:00, 27 November 2021 (UTC)
  Back when it was proposed, I don't think forms as values were possible. --- Jura 17:05, 27 November 2021 (UTC)
  Also, some lexemes have many forms that are spelled identically. Technically, all of these would be homograph forms, so it should probably be "homograph form of a different lexeme" 🤷 – Shisma (talk) 17:17, 27 November 2021 (UTC)
  For human contributors, it's much easier when there is a way to figure out which page to check too. --- Jura 17:19, 27 November 2021 (UTC)
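The candidate pairs for a "homophone lexeme" property, restricted as suggested above to forms of *different* lexemes, could be found with a small Python sketch like the one below. The records are hypothetical (lexeme ID, written form, IPA transcription) tuples; the IPA values are illustrative and not read from Wikidata, and `homophone_pairs` is not an existing tool.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (lexeme ID, written form, IPA transcription).
# The transcriptions are illustrative and not read from Wikidata.
FORMS = [
    ("L493284", "Laib", "laɪ̯p"),
    ("L613407", "Leib", "laɪ̯p"),
]


def homophone_pairs(forms):
    """Pair up forms of different lexemes that sound the same but are spelled differently."""
    by_sound = defaultdict(list)
    for lexeme, spelling, ipa in forms:
        by_sound[ipa].append((lexeme, spelling))
    pairs = set()
    for entries in by_sound.values():
        for a, b in combinations(entries, 2):
            same_lexeme = a[0] == b[0]    # forms of one lexeme are not homophone lexemes
            same_spelling = a[1] == b[1]  # identical spelling is the homograph case
            if not same_lexeme and not same_spelling:
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


print(homophone_pairs(FORMS))  # [(('L493284', 'Laib'), ('L613407', 'Leib'))]
```

Grouping by pronunciation at the form level, then reporting lexeme IDs, follows the suggestion that the link really concerns specific forms while still pointing contributors at the page to check.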