Wikidata:Requests for permissions/Bot/SaschaBot
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 11:35, 3 April 2015 (UTC)
SaschaBot
Operator: Sascha
Task/s: To remove disambiguation suffixes from the labels of Wikidata items about common names. See diff for one sample edit, and a handful of edits by this bot.
Function details: For every Wikidata item that is an instance of common name, the bot goes over the labels in each language. If a label ends with a suffix in parentheses, the bot extracts the suffix and checks it against the whitelist below. If the suffix is not whitelisted, the bot leaves the label unchanged. Otherwise, the bot strips the suffix from the label and checks whether the item already has a description in that language. If it does, the description is left unchanged; if the item has no description in that language yet, the bot adds the stripped-off suffix as the item's description.
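The procedure in the function details can be sketched in Python as follows. This is a minimal sketch, not the bot's actual code: it assumes each item's labels and descriptions are already loaded as dicts keyed by language code, it uses only a hypothetical subset of the whitelist, and the name `clean_item` is illustrative. It strips only the last parenthesized suffix, compares it exactly (case-sensitively) against the whitelist, and never overwrites an existing description.

```python
import re

# Hypothetical subset of the whitelist; the full list is given in the request.
SUFFIX_WHITELIST = {"given name", "Given name", "Vorname", "prénom", "imię", "disambiguation"}

# Matches a label ending in a parenthesized suffix, e.g. "Jan (given name)".
# "base" must contain at least one non-space character before the "(".
SUFFIX_RE = re.compile(r"^(?P<base>.*\S)\s*\((?P<suffix>[^()]+)\)\s*$")


def clean_item(labels, descriptions, whitelist=SUFFIX_WHITELIST):
    """Return new (labels, descriptions) dicts; the inputs are not modified."""
    new_labels = dict(labels)
    new_descriptions = dict(descriptions)
    for lang, label in labels.items():
        match = SUFFIX_RE.match(label)
        if match is None:
            continue  # no parenthesized suffix: nothing to do
        suffix = match.group("suffix").strip()
        if suffix not in whitelist:
            continue  # suffix not whitelisted: leave the label unchanged
        new_labels[lang] = match.group("base")
        # Add the suffix as description only if none exists in this language.
        if not descriptions.get(lang):
            new_descriptions[lang] = suffix
    return new_labels, new_descriptions
```

For example, with labels `{"en": "Jan (given name)", "de": "Jan (Vorname)", "fr": "Jan (ville)"}` and an existing German description, the English and German labels both become `"Jan"`, the French label keeps its non-whitelisted suffix, and only English gains a new description (`"given name"`).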
Impact: If the bot gets permission to run, it will change 1335 labels.
Motivation: I would like to experiment with algorithmic transliteration of entity names, using the transforms in Unicode CLDR. From past experience, I don't think the transliteration itself can be done fully algorithmically, because the quality would be too low. Instead, I imagine a setup where a script generates candidate labels and human native speakers confirm or improve each suggested edit. To get there, however, a first step is to fix the labels of common names in Wikidata, so that I have a clean data set to work with. In a later step, I'd like to run a similar script on surnames.
Here is the whitelist of suffixes that the bot strips away. I have considered special-casing "disambiguation" and similar suffixes, but I believe that cleaning up the descriptions would be easier in a second pass over the items.
Given name Naam Nahme Name Numm Nåmen Patronyme Virnumm Vorname Yutarō anthroponym apartigilo apellido cognome cognomen desambiguación desambiguação dezambiguizare disambiguasi disambiguation discretiva doorverwijspagina drengenavn eesnimi egyértelműsítő lap eiginnafn etunimi fornavn förnamn given name gmina homonymie homónimos ime imię izena jméno keresztnév kvinnenavn křestní jméno mannsnafn meno mjeno naam nafn nama nama kecil name namn navn nimi nom nombre nombre propio nome nomen nomo nomu nume nume feminin név osebno ime pigenavn prenom prenome prvé meno prénom příjmení surname voornaam žensko ime Ги שם 人名 ім'я име имя лӱм імя όνομα мъжко име значения значення осетинское имя
--Sascha (talk) 19:43, 20 March 2015 (UTC)
- Comments, anybody?--Ymblanter (talk) 09:19, 25 March 2015 (UTC)
- Unless anyone raises concerns within the next 3 days, I would like to approve this task; it looks fine to me. Vogone (talk) 15:30, 25 March 2015 (UTC)
- Support stripping the suffixes. When adding descriptions, keep in mind that the description should not be "disambiguation" but "Wikimedia disambiguation page", i.e. the label of Wikimedia disambiguation page (Q4167410). --Pasleim (talk) 18:00, 26 March 2015 (UTC)
- I think it's a good idea to strip the suffixes from labels where they have been included. Depending on the type of item, descriptions in English should be "name", "family name", "Wikimedia disambiguation page", or, for first names, "given name", "female given name" or "male given name", but not an arbitrary one. For Q2781139, it should be "male given name". It might be easier to sort these out manually. --- Jura 05:49, 27 March 2015 (UTC)
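The suggestions by Pasleim and Jura amount to deriving the English description from the item's type rather than from the stripped suffix. A minimal Python sketch, assuming the item's "instance of" (P31) values are already available as a list of QIDs: apart from Q4167410, which is cited above, the QIDs in the table are assumptions that should be checked on Wikidata before use, and `DESCRIPTION_BY_TYPE` and `english_description` are hypothetical names.

```python
from typing import Iterable, Optional

# Assumed mapping from "instance of" (P31) values to the English descriptions
# suggested in the discussion. Only Q4167410 is confirmed by the discussion;
# the other QIDs should be verified on Wikidata.
DESCRIPTION_BY_TYPE = {
    "Q202444": "given name",
    "Q12308941": "male given name",
    "Q11879590": "female given name",
    "Q101352": "family name",
    "Q4167410": "Wikimedia disambiguation page",
}


def english_description(instance_of: Iterable[str]) -> Optional[str]:
    """Return a description if the item's types point to exactly one, else None."""
    candidates = {
        DESCRIPTION_BY_TYPE[qid] for qid in instance_of if qid in DESCRIPTION_BY_TYPE
    }
    if len(candidates) == 1:
        return candidates.pop()
    # No known type, or conflicting types: leave the item for manual review.
    return None
```

Returning `None` instead of picking one of several candidates follows the point that the description should not be "a random one": ambiguous items are left for the manual sorting suggested above.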