Wikidata:Property proposal/notable misspellings

notable misspellings

Originally proposed at Wikidata:Property proposal/Lexemes

   Withdrawn
Description: misspelling that appears in an authoritative list of misspellings (use only on forms)
Represents: misspelled word
Data type: String
Domain: form
Example 1: L3280-F1 → "fuscia" (incorrect for fuchsia (L3280))
Example 2: L3280-F1 → "fuschia" (incorrect for fuchsia (L3280))
Example 3: L36116-F1 → "abbonnemang" (incorrect for abonnemang (L36116))
Example 4: L36116-F1 → "abbonemang" (incorrect for abonnemang (L36116))
See also: Wikidata:Property proposal/correct form

Motivation

This makes it easy to create, for example, a spell checker that recommends a correction. --So9q (talk) 21:23, 22 March 2020 (UTC)
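
As a rough illustration of the spell-checker use case, the four examples from the proposal can be inverted into a lookup from misspelling to correct spelling. This is only a sketch: the `FORMS` dictionary, `build_correction_index` and `suggest` are hypothetical names, and the data is hard-coded here rather than fetched from Wikidata.

```python
# Hypothetical in-memory data mirroring the proposal's four examples:
# each form ID maps to its correct spelling and its listed misspellings.
FORMS = {
    "L3280-F1": {"correct": "fuchsia", "misspellings": ["fuscia", "fuschia"]},
    "L36116-F1": {"correct": "abonnemang", "misspellings": ["abbonnemang", "abbonemang"]},
}


def build_correction_index(forms):
    """Invert form -> misspellings into misspelling -> correct spelling."""
    index = {}
    for form in forms.values():
        for wrong in form["misspellings"]:
            index[wrong.lower()] = form["correct"]
    return index


def suggest(word, index):
    """Return the recommended correction, or None if the word is not a listed misspelling."""
    return index.get(word.lower())


INDEX = build_correction_index(FORMS)
print(suggest("fuschia", INDEX))  # fuchsia
print(suggest("fuchsia", INDEX))  # None (not a listed misspelling)
```

A word that is absent from the index is simply not flagged, so such a checker would only ever correct misspellings that someone has explicitly recorded.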

We should define "notable misspelling". Here is my suggestion: the misspelling has to appear in one of the following sources:

  1. an authoritative source such as Retskrivningsordbogen (Q3398246)
  2. an article like [1] from an authoritative source, in this case Oxford University Press EL, the official global blog for Oxford University Press English Language Teaching
  3. one of Wikipedia's lists of misspellings, e.g. [2]
  4. Wikidata, as an item with instance of (P31) misspelling (Q1984758), e.g. Rzehakinacea (Q33188867)
  5. a corpus explicitly approved by this community, with an occurrence count over a certain threshold (corpus and threshold yet to be created and decided)

--So9q (talk) 18:45, 24 September 2020 (UTC)

Discussion

  Support I support this proposal in this form (more in the linked discussion) on the condition that we have an applicable definition of common misspelling. --Lexicolover (talk) 12:17, 24 March 2020 (UTC)

See discussion here: Wikidata_talk:Lexicographical_data#Common_misspellings_data --So9q (talk) 19:55, 24 March 2020 (UTC)
Lexicolover stated that they suggest only using misspellings from an authoritative source. --So9q (talk) 18:49, 24 September 2020 (UTC)

  Neutral we need something to solve this problem, but I'm not sure a simple property is the simplest solution here. A broader system for all sorts of variants would be more difficult but better in the long run, as correct/incorrect spelling is often not a binary situation (see "colour"/"color" in English; correctness is contextual here). Cheers, VIGNERON (talk) 20:44, 25 March 2020 (UTC)

@VIGNERON: I agree that there can be situations where it's more about style/culture than a clear misspelling. In that case I guess we would avoid marking it as a misspelling. Have you thought of a better way to handle the complexities of misspellings than the one I have proposed? --So9q (talk) 19:04, 24 September 2020 (UTC)

  Neutral: what is your definition of "common"? It sounds a bit arbitrary... Nomen ad hoc (talk) 07:30, 26 March 2020 (UTC).

@Nomen ad hoc: that point can easily be defined objectively by frequency. If a misspelling is over a threshold, let's say 5%, then it's "common". We can use a tool like Google Books Ngram Viewer to see the frequency. We can also rely on sources: dictionaries (especially the descriptivist ones) often give the common misspellings. Cheers, VIGNERON (talk) 08:55, 26 March 2020 (UTC)
@Nomen ad hoc: see proposed definition above. --So9q (talk) 10:41, 26 March 2020 (UTC)
@Nomen ad hoc: How would you define a common/notable misspelling? --So9q (talk) 18:51, 24 September 2020 (UTC)
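
The frequency criterion VIGNERON suggests above can be sketched as follows. The sketch assumes that the 5% refers to the misspelling's share of all occurrences of the word (misspelled plus correct), which the comment does not specify. The function names are hypothetical, and the counts in the usage lines are made up for illustration, not taken from any corpus.

```python
def misspelling_share(misspelling_count, correct_count):
    """Share of all occurrences (misspelled + correct) that use the misspelling."""
    total = misspelling_count + correct_count
    if total == 0:
        return 0.0  # no evidence at all: treat the share as zero
    return misspelling_count / total


def is_common(misspelling_count, correct_count, threshold=0.05):
    """True if the misspelling's share is strictly above the threshold (default 5%)."""
    return misspelling_share(misspelling_count, correct_count) > threshold


# Illustrative, made-up counts (not corpus data).
print(is_common(80, 920))  # True: 8% of occurrences
print(is_common(10, 990))  # False: 1% of occurrences
```

Under a different reading of the threshold (for example, frequency relative to the correct form only), the denominator would change, so the definition would need to be fixed before any such test is applied.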

  Support Finn Årup Nielsen (fnielsen) (talk) 11:25, 26 March 2020 (UTC)

@Fnielsen: WDYT about the definition of misspelling above? --So9q (talk) 19:04, 24 September 2020 (UTC)

I changed the proposal according to the suggestion from ChristianKl. A new vote has started; please vote again below. @ChristianKl, vigneron, fnielsen, jura1, Premeditated, ainali:@Nomen ad hoc: --So9q (talk) 08:17, 17 December 2020 (UTC)

Always the same: what's your definition of "authoritative"? Nomen ad hoc (talk) 09:08, 17 December 2020 (UTC).
@ChristianKl: got any input on this? I would say "an individual or organization working professionally with dictionaries or language teaching in the language in question". WDYT? --83.250.212.226 09:57, 17 December 2020 (UTC)
My input would be that it makes sense to define the term further. ChristianKl 13:18, 17 December 2020 (UTC)
  • I still prefer the inverse approach. BTW, so that users can see what you mean by "notable misspellings" from an "authoritative list": can you add corresponding references to the samples? Would autocorrects from OO qualify? --- Jura 10:26, 17 December 2020 (UTC)
  •   Comment I previously supported this proposal, but I am now uncertain where I stand. It seems to me that language is not as fixed as a structured knowledge graph can represent. I think there is a gradualness to formness. While some forms are definitely forms, there are some written words that most would say are not forms but just plain misspelt, and then there are those in between. In Danish, there are some forms that have official alternative forms, which we can interlink with alternative form (P8530) (see, e.g., https://ordia.toolforge.org/property/P8530). I recently added pizzeria (L348857) and accidentally added it as pizzaria. The issue is what "pizzaria" is. The form is not mentioned in the official Danish spelling, but it is listed in another important Danish dictionary: https://ordnet.dk/ddo/ordbog?query=pizzaria The official form is pizzeria, while "pizzaria" is an "unofficial, but common form".  – The preceding unsigned comment was added by Fnielsen (talk • contribs) at 17:04, 18 December 2020 (UTC).
  •   Comment @So9q: Isn't this property completely redundant? If a form is instance of (P31) misspelling (Q1984758), then its correct form is any other form of the lexeme with the same set of grammatical features that is not itself a misspelling. If notability is defined by authoritative source, it is already covered by references on the instance of (P31) misspelling (Q1984758) statement. If notability is defined by frequency, it can be inferred from a suitable corpus (or perhaps someday from more general frequency statements on forms). -- Robert Važan (talk) 17:51, 1 May 2021 (UTC)
@Robert Važan: Thanks for the comment. I agree with your points. Furthermore, I thought about what a misspelling really is. In my view it is intrinsically linked to 2 properties:
  1. cultural setting (what is a misspelling in one region might be accepted in another)
  2. time (over time, misspelled words can become appropriated and accepted)
These two properties increase the complexity of misspellings a lot, and if we add instance of (P31) misspelling (Q1984758) on forms, I think we should also add point in time (P585) and indigenous to (P2341) as qualifiers (ideally we would add start time and end time, but those are probably practically impossible to determine and find references for). I'm guessing we will have a hard time finding good references for misspellings. People seem more interested in correctly spelled words. An inverse approach of regarding all forms without a reference to an authoritative source as misspellings might be more fruitful. I marked the proposal as abandoned. --So9q (talk) 06:10, 2 May 2021 (UTC)
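
The inference Robert Važan describes above (the correct form is any other form of the same lexeme, with the same grammatical features, that is not itself marked as a misspelling) could look roughly like this. It is a sketch under stated assumptions: the `Form` class, the form IDs and the feature labels are hypothetical stand-ins for Wikidata forms, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    form_id: str                  # hypothetical ID, not a real Wikidata form ID
    representation: str
    features: frozenset           # grammatical features, here as plain labels
    is_misspelling: bool = False  # stands in for "instance of (P31) misspelling (Q1984758)"


def correct_forms_for(misspelled, lexeme_forms):
    """Other forms of the same lexeme with identical features that are not misspellings."""
    return [
        form
        for form in lexeme_forms
        if form.form_id != misspelled.form_id
        and not form.is_misspelling
        and form.features == misspelled.features
    ]


# Hypothetical forms of one lexeme.
FUCHSIA = Form("example-F1", "fuchsia", frozenset({"singular"}))
FUCHSIAS = Form("example-F2", "fuchsias", frozenset({"plural"}))
FUSCHIA = Form("example-F3", "fuschia", frozenset({"singular"}), is_misspelling=True)
LEXEME_FORMS = [FUCHSIA, FUCHSIAS, FUSCHIA]

print([f.representation for f in correct_forms_for(FUSCHIA, LEXEME_FORMS)])  # ['fuchsia']
```

The qualifiers discussed above (point in time, region) are not modelled here; adding them would make `is_misspelling` depend on context rather than being a single boolean.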