Wikidata:Requests for permissions/Bot/SaschaBot 2
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Inactive request; create another bot request if you still want to consider this task in the future. Hazard SJ 06:22, 3 February 2016 (UTC)
SaschaBot
Operator: Sascha
Task/s: Add missing common names to Wikidata.
Function details:
To mine for missing common names, I went over all humans in Wikidata, extracted the first part of their English label (e.g., Simone de Beauvoir → Simone), and matched this against a list of all common names in Wikidata. Here is the result: List of common names that seem to be missing from Wikidata.
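The extraction and matching step could be sketched roughly as follows. This is only an illustration, not the operator's actual script: the function name `first_name_candidates` and the sample labels are hypothetical, and it assumes the labels and the set of known names have already been exported from Wikidata as plain strings.

```python
from collections import Counter


def first_name_candidates(labels, known_names):
    """Count first tokens of labels that are absent from known_names.

    labels: iterable of English labels of human items (assumed input).
    known_names: set of names that already have an item (assumed input).
    """
    counts = Counter()
    for label in labels:
        parts = label.split()
        if not parts:
            continue  # skip empty labels
        first = parts[0]  # "Simone de Beauvoir" -> "Simone"
        if first not in known_names:
            counts[first] += 1
    return counts


# Hypothetical sample data, for illustration only.
labels = ["Simone de Beauvoir", "Simone Weil", "Phil Collins", "Ada Lovelace"]
known = {"Ada"}
missing = first_name_candidates(labels, known)
```

The counter keeps the number of people per candidate name, which is useful later when deciding how rare a name may be before it gets an item.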
It should be easy to mine the gender of these common names. I think this would be best done in a later, separate pass, since this could then also check the gender of existing common names in Wikidata. After that step, another bot run would create descriptions (such as "Female common name") for all common names in Wikidata that don't have descriptions yet.
Impact: If the bot gets permission to run, it would create 108636 items. If we insert missing common names only when at least 2 people have that name, the bot would create 30893 items. If we restrict to names with at least 3 people, it would create 18361 items.
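The effect of such a minimum-people threshold can be illustrated with a small sketch. The helper `names_with_min_people` and the counts below are hypothetical and are not taken from the actual list.

```python
def names_with_min_people(counts, min_people):
    """Return the sorted candidate names borne by at least min_people people."""
    return sorted(name for name, n in counts.items() if n >= min_people)


# Hypothetical per-name counts, for illustration only.
counts = {"Simone": 2, "Phil": 1, "Empress": 3}

at_least_1 = names_with_min_people(counts, 1)
at_least_2 = names_with_min_people(counts, 2)
at_least_3 = names_with_min_people(counts, 3)
```

Raising the threshold shrinks the list, but as the sample shows, it does not by itself remove non-names that happen to be frequent.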
Caveats:
If you look at the list, you will see a couple of bogus entries. Some are not in the Latin script, or contain odd characters such as ":". I will make sure that these do not get inserted, but wanted to start the discussion now. However, there are also some entries that a script would not be able to detect, such as "Empress". What should we do about those? Is there a good tool so that others could help review the list? (I've made the spreadsheet world-editable on Google Docs.)
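One possible way to screen out the script-detectable cases is sketched below. This is an assumption about how such a filter might look, not the operator's implementation: `is_plausible_latin_name` is a hypothetical helper, and the choice to allow hyphens and apostrophes is also an assumption.

```python
import unicodedata


def is_plausible_latin_name(name):
    """Accept only names made of Latin-script letters, hyphens and apostrophes."""
    if not name:
        return False
    for ch in name:
        if ch in "-'":
            continue  # assumed to be acceptable inside names
        if not ch.isalpha():
            return False  # digits, colons and other punctuation
        # Unicode character names of Latin letters start with "LATIN".
        if not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


# Hypothetical sample entries, for illustration only.
samples = ["Simone", "Jos\u00e9", "Li:", "\u0421\u0430\u0448\u0430", "Empress"]
kept = [s for s in samples if is_plausible_latin_name(s)]
```

Note that "Empress" passes this check, which matches the caveat above: a character-level filter cannot tell titles from names, so those entries still need human review or a comparison against other lists.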
--Sascha (talk) 16:28, 26 March 2015 (UTC)
- Hey Sascha. At the moment the list is a mix of first names, last names (e.g. Li), pseudonyms (e.g. Seven) and other entries (e.g. Saint, K). To set proper descriptions and for later use, however, it is important that the type of name is known. Do you see a way to figure out the type automatically, or should all entries be reviewed by a human? --Pasleim (talk) 17:52, 26 March 2015 (UTC)
- Notify User:Jura1, the name expert in Wikidata --Pasleim (talk) 18:03, 26 March 2015 (UTC)
- Good idea. I had thought about doing that at some point as well, but I'm glad it's being taken up.
How about checking the names against some of the lists at Wikipedia? Special:Search/list of given names helps find some.
WikiProject Names describes how to structure the items.
To avoid problems, I usually leave out given names that are not first names (Chinese, Korean, Japanese, Hungarian).
This list provides most existing first names. --- Jura 20:34, 26 March 2015 (UTC)
BTW, I couldn't resist and created Phil (Q19685923). --- Jura 05:47, 27 March 2015 (UTC)
- @Sascha: Finally, do you plan to use the list or may I use it to create some of the missing names? --- Jura 16:36, 13 April 2015 (UTC)
- Apologies for the delay, I was traveling and just came back today. Sure, feel free to use the list. Sascha (talk) 11:36, 26 May 2015 (UTC)
- Thanks, but in the meantime I made one on Quarry and outlined a "top-down" approach on WikiProject Names. --- Jura 11:44, 26 May 2015 (UTC)
- @Sascha: Hello, what's the status on this? Hazard SJ 05:35, 28 December 2015 (UTC)