Wikidata:Property proposal/paronym

paronym

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: word whose spelling is almost identical to another
Data type: wikibase-lexeme-sense
Domain: Senses
Example 1: cousin (fr) → coussin (fr)

JackPotte (talk) 17:51, 26 April 2018 (UTC)

Discussion

  •   Support If I understand correctly, it is not just a random letter that changes between two words (in which case it could have been determined by a SPARQL query); it is linked to the pronunciation too. Tubezlob (🙋) 19:11, 26 April 2018 (UTC)
  •   Weak support   Weak oppose I think there is no need to store this information; it should be very easily queryable (we can even be more precise and set a specific Levenshtein distance (Q496939) as a threshold). @JackPotte, Tubezlob: what do you think? Cdlt, VIGNERON (talk) 10:56, 3 May 2018 (UTC)
    Yes, but as there can be false positives in the query results (e.g. inflections of the current lemma), the human-validated option doesn't seem absurd to me. JackPotte (talk) 12:45, 3 May 2018 (UTC)
    @JackPotte: Hmm, maybe. Could you give an example of a false positive? I guess that if the query is written correctly, there shouldn't be any. Cdlt, VIGNERON (talk) 13:44, 3 May 2018 (UTC)
    If you search for "parent" in French, you shouldn't get "parente" because it's the same word. JackPotte (talk) 14:02, 3 May 2018 (UTC)
    @JackPotte: it depends on what you call a "word" (especially as "parente"@fr could be either a noun, an adjective or a verb, i.e. 3 different lexemes). And it should be trivial in a query to filter out lemmas that are forms of the same lexeme. Cdlt, VIGNERON (talk) 15:03, 3 May 2018 (UTC)
    On the other hand, one concern I have is: "will the query endpoint be able to handle expensive queries on millions of lexemes/forms/senses?" @Lea Lacroix (WMDE): could you ask the devs for a technical point of view? For instance, "give me all homographs of XXX" should be fine (or not?), but will it be possible to ask "give me all homographs in French"? (See similar questions that have been raised on Wikidata:Lexicographical data/Ideas of queries.) VIGNERON (talk) 15:03, 3 May 2018 (UTC)
    This is something we can't really answer right now. Testing will be needed after deploying the lexemes and making them queryable. As Vigneron said, restricting a query to one language may make it less expensive. Sorry for the vague answer ^^ Lea Lacroix (WMDE) (talk) 06:45, 19 May 2018 (UTC)
  • I agree with Tubezlob: it has more to do with pronunciation than with random spelling. A beginner must be careful when pronouncing certain words that can be confused, for example French poison/poisson, Catalan casa/caça, English ship/sheep, Spanish maya/malla. It is quite subjective and it depends on the phonetics of each language. --Vriullop (talk) 08:09, 25 May 2018 (UTC)
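
For illustration only, the query-based approach discussed above (a Levenshtein distance threshold, skipping forms that belong to the same lexeme) could look roughly like the following sketch. It runs on a small in-memory sample, not on the Wikidata query service; the lexeme IDs, the `LEXEMES` table and the function `paronym_candidates` are hypothetical names invented for this example. As noted in the discussion, spelling distance alone says nothing about pronunciation, so the output can only be a list of candidates.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical sample data (assumption): made-up lexeme ID -> its forms.
LEXEMES = {
    "L-cousin": {"cousin", "cousins"},
    "L-coussin": {"coussin", "coussins"},
    "L-parent": {"parent", "parente", "parents", "parentes"},
    "L-poison": {"poison", "poisons"},
    "L-poisson": {"poisson", "poissons"},
}


def paronym_candidates(lexemes, max_distance=1):
    """Return pairs of forms from *different* lexemes within max_distance.

    Forms of one lexeme are never compared with each other, which avoids
    the "parent"/"parente" kind of false positive.
    """
    pairs = []
    by_id = sorted(lexemes.items(), key=lambda item: item[0])
    for (_, forms_a), (_, forms_b) in combinations(by_id, 2):
        for form_a in sorted(forms_a):
            for form_b in sorted(forms_b):
                if 0 < levenshtein(form_a, form_b) <= max_distance:
                    pairs.append((form_a, form_b))
    return pairs


if __name__ == "__main__":
    for pair in paronym_candidates(LEXEMES):
        print(pair)
```

On this sample, cousin/coussin and poison/poisson are reported as candidates, while parent/parente is not, because both are forms of the same sample lexeme. The sketch compares every pair of lexemes, so it does not address the cost concern raised above for millions of lexemes.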

@JackPotte, Tubezlob, Vriullop, Micru, ArthurPSmith: for information, I have just realised that this property is similar to Wikidata:Property proposal/homograph lexeme. The overlap is not complete, so we may need both, and maybe we can define some constraints around them. Cdlt, VIGNERON (talk) 13:34, 29 May 2018 (UTC)