Wikidata:Property proposal/precedes word-initial

precedes word-initial

Originally proposed at Wikidata:Property proposal/Lexemes

Done: precedes word-initial (P6712) (Talk and documentation)

Description	the initial letter of the following word, which determines the form of the subject lexeme
Data type	Lexeme
Domain	Form
Allowed values	Lexemes of Lexical Category letter (Q9788)
Example 1	an (L2767-F2) → a (L20817), e (L20821), i (L20825), o (L20831), u (L20837), h (L20824)
Example 2	im- (L35042-F3) → b (L20818), m (L20829), p (L20832)
Example 3	ef- (L35044-F2) → f (L20822)
Planned use	To be added to Latin prepositions and to all Latin-based prefixes in any Latin-derived language (as well as English). Also useful for articles and conjunctions whose form depends on the initial letter of the following word. This is apparently also common in Polish.
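
As a rough illustration of how the statements in the examples above might be consumed, here is a minimal Python sketch. The Form and Lexeme IDs are copied from Examples 1 to 3; the dictionaries and the `form_applies` function are hypothetical names invented for this sketch and are not part of Wikidata or of any library. The check looks only at the written first letter, as the property does, not at pronunciation.

```python
# Illustrative model only: IDs come from the proposal's examples, while the
# data structures and the function name are assumptions made for this sketch.

# Each Form ID maps to the letter Lexemes given as its P6712 values.
PRECEDES_WORD_INITIAL = {
    "L2767-F2": {"L20817", "L20821", "L20825", "L20831", "L20837", "L20824"},  # an
    "L35042-F3": {"L20818", "L20829", "L20832"},  # im-
    "L35044-F2": {"L20822"},  # ef-
}

# Letter Lexeme IDs from the examples, keyed by the letter they represent.
LETTER_LEXEMES = {
    "a": "L20817", "b": "L20818", "e": "L20821", "f": "L20822",
    "h": "L20824", "i": "L20825", "m": "L20829", "o": "L20831",
    "p": "L20832", "u": "L20837",
}


def form_applies(form_id: str, next_word: str) -> bool:
    """Return True if the form lists the first letter of next_word as a P6712 value."""
    if not next_word:
        return False
    # Unknown letters map to None, which is never in a value set.
    letter_lexeme = LETTER_LEXEMES.get(next_word[0].lower())
    return letter_lexeme in PRECEDES_WORD_INITIAL.get(form_id, set())
```

For instance, `form_applies("L35042-F3", "possible")` looks up p (L20832) among the values listed for im- (L35042-F3).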

Motivation

I've been doing work on Latin-based prepositions and prefixes, and it would be great to have a way of stating why they take on certain forms. Almost all of them have alternative forms that are determined by the first letter of the following lexeme (for phonaesthetic purposes). Wiktionary describes this, but Wikidata currently has no property that can express it. Liamjamesperritt (talk) 08:50, 12 January 2019 (UTC)

Discussion

Support I seem to recall Jura tried to propose something like this but nobody could understand it. This proposal makes sense to me though, thanks. ArthurPSmith (talk) 17:56, 12 January 2019 (UTC)
- oh :) That was at Wikidata:Property proposal/requires form. I still think the actual list of letters should be available. Is this the case above? If yes, how is it done reliably? --- Jura 11:55, 26 January 2019 (UTC)
  - @Jura1: That is a good idea. I've changed the proposal examples so that they refer to actual letter classes, and not phonetic classes, as phonemes can be represented by a variety of letters. If one runs a query over all instances of vowel letter (Q9398093) or consonant letter (Q3841820), one will be able to obtain a list of all Latin letters in that class. In other cases, one can just point to the item for the appropriate letter as shown above. This makes it more concise, rather than, for example, having to point to every consonant. Liamjamesperritt (talk) 12:28, 28 January 2019 (UTC)
    - @Liamjamesperritt: Wouldn't the classes vary by language? Above you link to items that aren't language specific. I don't think the lists of letters would be that long nor need much maintenance once defined.--- Jura 16:28, 28 January 2019 (UTC)
    - @Jura1: Although the items aren't language specific, the subject lexeme is language specific and can be used to qualify a query so that only the letters applicable to that language are listed. And since many languages share alphabets, I don't see anything wrong with pointing to items of specific letters, and there would be nothing stopping one from listing all those letters. However, since there are 21 consonants in the ISO Basic Latin alphabet, I do believe that would be an unnecessarily long list of statements. Liamjamesperritt (talk) 22:18, 28 January 2019 (UTC)
      - 21 isn't exactly long. Some items have hundreds of statements. Anyways, would you have a working sample? Ideally with a letter that is considered differently in two languages. --- Jura 05:16, 29 January 2019 (UTC)
      - Of course many items have lots of statements, but just rarely for the same property. And since there is no consensus in the community about the inclusion of graphemes in the Lexeme space, it seems that pointing to items is the way to go. I don't have a working sample, but as long as the information about different alphabets has been encoded in the Main namespace, it shouldn't be a problem, otherwise one can just list the letters as you suggest. This way one has options. Liamjamesperritt (talk) 11:51, 29 January 2019 (UTC)
        ~~Oppose~~ I think the proposal needs some work. The samples given are currently not reliable. The earlier proposal didn't have this issue. --- Jura 14:18, 3 February 2019 (UTC)
        
        @Jura1: If you really feel that it is important to ensure specificity of each letter, I'll change the proposal's samples. I do, however, think that the letters should be Items and not Forms, since the majority of the community seems to be against the inclusion of graphemes in the Lexeme namespace. Let me know what you think given the changes, and whether this version is reliable enough. If not, what should change? Cheers. Liamjamesperritt (talk) 06:14, 4 February 2019 (UTC)
        Here one would just use letters. The samples are in English/Latin/Italian. I think that could easily work with the lexeme datatype for these languages. I think most active contributors support these. --- Jura 08:30, 10 February 2019 (UTC)
Comment @Liamjamesperritt: would you mind if I change the datatype to lexeme? --- Jura 03:35, 23 April 2019 (UTC)
- @Jura1: I've made the requested change to the proposal. Defining the property this way will require Lexemes of Lexical Category letter (Q9788) to be created for every language (even when languages share letters, which will create redundancy), which I'm not sure is something that the community is in agreement about, but if it means we can finally use this property, then I don't really mind. Liamjamesperritt (talk) 05:35, 23 April 2019 (UTC)
  - Looks good. --- Jura 04:23, 24 April 2019 (UTC)
Done please make good use of it. --- Jura 05:25, 29 April 2019 (UTC)
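
For anyone wanting to list the resulting statements, the language-qualified query mentioned in the discussion could look roughly like the sketch below. It only builds a SPARQL string and has not been run against any endpoint. The helper name `build_query` is hypothetical, and the query assumes the prefixes predefined by the Wikidata Query Service (`ontolex:`, `dct:`, `wikibase:`, `wdt:`, `wd:`) and the usual Lexeme RDF mapping. Only P6712 comes from this proposal.

```python
from typing import Optional


def build_query(language_qid: Optional[str] = None,
                property_id: str = "P6712",
                limit: int = 100) -> str:
    """Build a SPARQL query listing forms and the letter lexemes they precede.

    language_qid optionally restricts results to lexemes of one language,
    e.g. "Q1860" for English. This helper is an assumption of this sketch,
    not an existing API.
    """
    language_filter = (
        f"  ?lexeme dct:language wd:{language_qid} .\n" if language_qid else ""
    )
    return (
        "SELECT ?form ?representation ?letterLemma WHERE {\n"
        "  ?lexeme ontolex:lexicalForm ?form .\n"
        f"{language_filter}"
        "  ?form ontolex:representation ?representation ;\n"
        f"        wdt:{property_id} ?letter .\n"
        "  ?letter wikibase:lemma ?letterLemma .\n"
        "}\n"
        f"LIMIT {limit}"
    )
```

Calling `print(build_query("Q1860"))` yields text that can be pasted into a query editor. Restricting by the language of the subject lexeme is what keeps the letter list language specific.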