Topic on User talk:Matěj Suchánek

Wikipedia disambiguation in Wikidata labels

Strepulah (talkcontribs)

Hi there. I came across this edit, in which your bot imports labels from sitelinks. It also imports the part in parentheses, which Wikipedia uses for disambiguation. As you may know, Wikidata has a better means for that, namely the description field. Shouldn't the parenthetical disambiguation be omitted from the label? You could optionally add it to the description field instead. Kind regards,

Matěj Suchánek (talkcontribs)

Hi. You are right, and I am well aware of this. There are a few problems with it:

  • Some parenthetical fragments are not disambiguations.
  • The bot does this for every language. (It's impossible to make it understand every language.)
  • The parenthetical fragments are usually too short to become a useful description.

Therefore, I settled on the following tradeoff: if the sitelink title ends with a pair of parentheses, I try to match the parenthetical against the description in the corresponding language. If there is a match, the parenthetical is removed from the label. Otherwise it is kept.
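The tradeoff described above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `label_from_sitelink` is hypothetical, and "match" is assumed here to mean a case-insensitive substring test, which the thread does not specify.

```python
import re

# Trailing "(...)" at the very end of a title, e.g. "Mercury (planet)".
TRAILING_PARENS = re.compile(r"^(?P<base>.+?)\s*\((?P<paren>[^()]+)\)$")


def label_from_sitelink(title, description=None):
    """Return a label candidate for a sitelink title (hypothetical helper).

    The parenthetical is dropped only when it also occurs in the
    description for the same language; otherwise the title is kept as-is.
    """
    match = TRAILING_PARENS.match(title)
    if match is None or description is None:
        # No trailing parentheses, or nothing to compare against.
        return title
    paren = match.group("paren").strip().casefold()
    # Assumption: "match" means a case-insensitive substring test.
    if paren in description.casefold():
        return match.group("base")
    return title
```

For example, `label_from_sitelink("Mercury (planet)", "planet in the Solar System")` would drop the parenthetical, while the same title with no description, or with an unrelated one, would be left unchanged.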

I believe that such a suboptimal label, with no description in the user's language, might motivate them to fix the label AND add a new description. (However, I have also seen users add labels with parentheses manually.)

Strepulah (talkcontribs)

Thanks for your reply. First: I agree that the parenthetical fragments aren't ideal for use as a description, so better to leave that be. Second: it is true that some parenthetical fragments are not disambiguations, but I think it's safe to say that most, something like 90% or even more, are. Do you think that is a reasonable estimate? If so, wouldn't it be better to flip things around: always remove the parenthetical fragment, and let users put it back if it turns out to be part of the name?


Finally: you write that "It's impossible to make it understand every language." Is it? I mean, parentheses are the same in all languages and scripts. Or am I missing something?

Matěj Suchánek (talkcontribs)

"It's impossible to make it understand every language." was just a remark on why I need to make such a tradeoff (in other words, why it's impossible for me to decide whether the parentheses are appropriate for the label or not).

Maybe I could try this approach: if the label in any other language already keeps the same disambiguation, I will keep it in this label as well; otherwise I will remove it.
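A minimal sketch of that proposed rule, under stated assumptions: `label_keeping_shared_parens` and the `other_labels` mapping (language code to existing label) are hypothetical names, and "the same disambiguation" is taken to mean a literally identical parenthetical, which would not catch translated ones.

```python
import re

# Trailing "(...)" at the very end of a title, e.g. "Was (Not Was)".
TRAILING_PARENS = re.compile(r"^(?P<base>.+?)\s*\((?P<paren>[^()]+)\)$")


def label_keeping_shared_parens(title, other_labels):
    """Keep the parenthetical only if another language's label has it too.

    other_labels: dict mapping language code -> existing label (assumed shape).
    """
    match = TRAILING_PARENS.match(title)
    if match is None:
        return title
    suffix = "(" + match.group("paren") + ")"
    # Assumption: "same" means the identical string, not a translation.
    if any(label.endswith(suffix) for label in other_labels.values()):
        return title
    return match.group("base")
```

With this sketch, a title such as "Was (Not Was)" would survive intact when another language already uses the identical label, whereas "Mercury (planet)" would be shortened if no other label ends in "(planet)".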

Strepulah (talkcontribs)

Yeah, that sounds like a plan. Keep in mind that for other namespaces like "Category" and "Template" the disambiguation parts should not be removed.
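The namespace caveat could be handled with a guard like the one below. It is a sketch only: `strip_disambiguation` is a hypothetical name, and the unconditional stripping inside it merely stands in for whichever removal rule the bot ends up using. It relies on MediaWiki's numeric namespace IDs (0 for the main namespace, 10 for Template, 14 for Category) rather than on title prefixes, since prefixes are localized per wiki.

```python
import re

MAIN_NAMESPACE = 0  # MediaWiki ID of the main (article) namespace

# A trailing "(...)" plus any whitespace before it.
TRAILING_PARENS = re.compile(r"\s*\([^()]+\)$")


def strip_disambiguation(title, namespace_id):
    """Strip a trailing parenthetical, but only for main-namespace pages."""
    if namespace_id != MAIN_NAMESPACE:
        # Category (14), Template (10), etc.: leave the title untouched.
        return title
    stripped = TRAILING_PARENS.sub("", title)
    # Never return an empty label if the whole title was parenthesized.
    return stripped or title
```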