Wikidata:Property proposal/paronym

paronym

Originally proposed at Wikidata:Property proposal/Lexemes

Not done

Description	word whose spelling is almost identical to another
Data type	wikibase-lexeme-sense
Domain	Senses
Example 1	cousin (fr) → coussin (fr)
Example 2	→
Example 3	→
Example 4	→

JackPotte (talk) 17:51, 26 April 2018 (UTC)

Discussion

Support If I understand correctly, it is not just a random letter that changes between two words (in this case, it could have been determined by a SPARQL query) but it is linked to the pronunciation too. Tubezlob (🙋) 19:11, 26 April 2018 (UTC)
Weak support ~~Weak oppose~~ I think there is no need to store this information; it should be very easily queryable (we can even be more precise and use a specific Levenshtein distance (Q496939) as a threshold). @JackPotte, Tubezlob: what do you think? Cdlt, VIGNERON (talk) 10:56, 3 May 2018 (UTC)
Yes, but as there can be false positives in the query results (e.g. inflections of the current lemma), the human-validated option doesn't seem absurd to me. JackPotte (talk) 12:45, 3 May 2018 (UTC)
@JackPotte: Hmm, maybe. Could you give an example of a false positive? I guess that if the query is correctly made, there shouldn't be any. Cdlt, VIGNERON (talk) 13:44, 3 May 2018 (UTC)
If you search for "parent" in French, you shouldn't get "parente" because it's the same word. JackPotte (talk) 14:02, 3 May 2018 (UTC)
@JackPotte: it depends on what you call a "word" (especially as "parente"@fr could be either a noun, an adjective or a verb, 3 different lexemes). And it should be trivial in a query to filter out lemmas that are forms of the same lexeme. Cdlt, VIGNERON (talk) 15:03, 3 May 2018 (UTC)
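To make the exchange above concrete, here is a minimal Python sketch of the approach being discussed: compute the Levenshtein distance between spellings, keep candidates under a threshold, and drop forms that belong to the same lexeme. The `Form` record, the lexeme identifiers and the sample data are hypothetical and chosen only for illustration; this is not how the Wikidata query service works, and it does not address the pronunciation aspect raised elsewhere in the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    lexeme_id: str  # hypothetical identifier, not a real Wikidata lexeme ID
    language: str
    spelling: str


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def paronym_candidates(target: Form, forms: list[Form], max_distance: int = 1) -> list[Form]:
    """Forms in the same language, from another lexeme, within the threshold."""
    return [
        f for f in forms
        if f.language == target.language
        and f.lexeme_id != target.lexeme_id  # skip inflections of the same lexeme
        and 0 < levenshtein(target.spelling, f.spelling) <= max_distance
    ]


# Hypothetical sample data: "parent" and "parente" are modelled as one lexeme here.
parent = Form("lex-parent", "fr", "parent")
parente = Form("lex-parent", "fr", "parente")
cousin = Form("lex-cousin", "fr", "cousin")
coussin = Form("lex-coussin", "fr", "coussin")
forms = [parent, parente, cousin, coussin]

print(paronym_candidates(cousin, forms))
print(paronym_candidates(parent, forms))
```

With this sample data, "coussin" is returned for "cousin", while "parente" is filtered out for "parent" because both forms share a lexeme. If "parente" were modelled as a separate lexeme (a noun distinct from the adjective, say), it would be returned, which is the ambiguity VIGNERON points out.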

On the other hand, one concern I have is: "will the query endpoint be able to handle expensive queries on millions of lexemes/forms/senses?" @Lea Lacroix (WMDE): could you ask the devs for a technical point of view? For instance, "give me all homographs of XXX" should be fine (or not?), but will it be possible to ask "give me all homographs in French"? (See similar questions that have been raised on Wikidata:Lexicographical data/Ideas of queries.) VIGNERON (talk) 15:03, 3 May 2018 (UTC)
This is something we can't really answer right now. Testing will be needed after deploying the lexemes and making them queryable. As Vigneron said, restricting the query to one language may be less expensive. Sorry for the vague answer ^^ Lea Lacroix (WMDE) (talk) 06:45, 19 May 2018 (UTC)
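As a rough illustration of the two query shapes mentioned above, the following Python sketch finds all homographs within one language in a single pass by grouping lemmas by spelling. The tuple layout and identifiers are assumptions made for the example; it says nothing about how the actual query endpoint would perform, which, as noted, needs testing.

```python
from collections import defaultdict


def homographs_in_language(entries, language):
    """Group lexeme IDs by lemma spelling for one language.

    `entries` is an iterable of (lexeme_id, language, lemma) tuples;
    this layout is hypothetical and only serves the example.
    """
    by_spelling = defaultdict(set)
    for lexeme_id, lang, lemma in entries:
        if lang == language:  # restricting by language shrinks the working set
            by_spelling[lemma].add(lexeme_id)
    # Keep only spellings shared by more than one lexeme.
    return {lemma: sorted(ids) for lemma, ids in by_spelling.items() if len(ids) > 1}


# Hypothetical sample data.
entries = [
    ("A1", "fr", "parente"),  # e.g. a noun
    ("A2", "fr", "parente"),  # e.g. an adjective
    ("A3", "fr", "cousin"),
    ("A4", "en", "cousin"),
]

print(homographs_in_language(entries, "fr"))
```

Grouping by exact spelling needs only one pass over the entries, whereas a near-match search such as a Levenshtein threshold compares pairs of spellings and grows much faster with the number of lexemes.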
I agree with Tubezlob, it has more to do with pronunciation than with random spelling. A beginner must be careful when pronouncing certain words that can be confused. For example French poison/poisson, Catalan casa/caça, English ship/sheep, Spanish maya/malla. It is quite subjective and it depends on the phonetics of each language. --Vriullop (talk) 08:09, 25 May 2018 (UTC)

@JackPotte, Tubezlob, Vriullop, Micru, ArthurPSmith: for information, I have just realised that this property is similar to Wikidata:Property proposal/homograph lexeme. The overlap is not complete, so we may need both, and maybe we can define some constraints around them. Cdlt, VIGNERON (talk) 13:34, 29 May 2018 (UTC)

Comment @JackPotte, Tubezlob: is this still of interest? If so, could you update the samples to three actual ones? --- Jura 03:40, 23 April 2019 (UTC)
You can get a corpus with this search, but that's a bot job. JackPotte (talk) 07:15, 23 April 2019 (UTC)
@JackPotte: it's just that it would be good if the proposal had 3 working samples. cousin (L10083) exists now and others can be created if needed. --- Jura 04:11, 24 April 2019 (UTC)
No, the meaning of this word depends on the language. The description given does not correspond to the sense of the English word: see wikt:en:paronym. In French, paronyme implies phonetic proximity, but not only that: it also implies possible confusion. A classical example in French is conjecture/conjoncture. A property should always correspond to an objective concept (the same concept across languages), and be clear. As proposed, that's impossible, and I'm afraid it will always be impossible. It's possible only in wikis in a specific language, with the meaning usual for this language. Lmaltier (talk) 17:35, 26 April 2019 (UTC)