Wikidata:Bot requests/Archive/2013/12

Remove duplicate identifiers

Can somebody remove the duplicate identifiers from the 308 items that use Gene Ontology ID (P686)? The violations can be found here:

Preferably the bot should leave one of the identifiers that has a source. --Tobias1984 (talk) 09:33, 9 December 2013 (UTC)

@Tobias1984: Most of the objects are created by a bot and don't contain any links yet, only some labels and the Gene Ontology ID (P686). But I hope there are some plans to improve the data there. The task of removing duplicate identifiers is Done. --Zuphilip (talk) 11:35, 26 December 2013 (UTC)
@Zuphilip: The problem is that the bot missed a lot of existing items and has created a lot of duplicates. Merging will still take some time. Thanks for removing the duplicate entries. That helps a lot with cleaning up. --Tobias1984 (talk) 13:44, 26 December 2013 (UTC)
This section was archived on a request by: Zuphilip (talk) 14:18, 26 December 2013 (UTC)
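
The rule requested in this section (keep one identifier per value, preferably one that has a source) could be sketched roughly as follows. This is only an illustration under assumptions: the claim shape (a dict with `value` and `references`) and the function name `claims_to_remove` are hypothetical and are not the bot's actual code or the Wikidata API.

```python
from typing import Dict, List

# Hypothetical, simplified claim shape: {"value": str, "references": list}.
Claim = Dict[str, object]


def claims_to_remove(claims: List[Claim]) -> List[Claim]:
    """Return the duplicate claims to delete, keeping one claim per value.

    For each distinct value, the first claim with at least one reference is
    kept; if no claim for that value is sourced, the first claim is kept.
    """
    keep: Dict[object, Claim] = {}
    for claim in claims:
        value = claim["value"]
        current = keep.get(value)
        if current is None:
            keep[value] = claim
        elif not current["references"] and claim["references"]:
            # Prefer a sourced claim over an unsourced one.
            keep[value] = claim
    kept_ids = {id(c) for c in keep.values()}
    return [c for c in claims if id(c) not in kept_ids]
```

Only the returned claims would be deleted, so every identifier value survives exactly once on the item.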

Replace P160 with P463

Per Wikidata:Properties for deletion, P160 (P160) is going to be deleted. I have checked usage, and it seems that in all remaining items, it can be replaced with member of (P463). Thanks. --Zolo (talk) 10:21, 10 December 2013 (UTC)

@Zolo: Can I assume that the P160-statements do not have any qualifiers or references to be copied? How about ranks and the other new stuff? --Zuphilip (talk) 16:56, 15 December 2013 (UTC)
@Zuphilip: Some have ("imported from"). I think there are relatively few, so if you plan to make the move, just ignoring them and fixing them by hand afterwards should be fine. No rank appears to have been added to a P160 statement (they would be shown here). --Zolo (talk) 19:06, 15 December 2013 (UTC)
@Zolo: 895 statements without references or qualifiers: Done. 40 statements with references or qualifiers (by hand): Done. --Zuphilip (talk) 22:05, 16 December 2013 (UTC) Remaining 7 entities: Done. --Zuphilip (talk) 13:30, 23 December 2013 (UTC)
This section was archived on a request by: Zuphilip (talk) 16:58, 29 December 2013 (UTC)
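
The split used in this section (plain statements moved automatically, statements with references or qualifiers handled by hand) might look like the sketch below. The statement dicts and the helper names `partition_statements` and `move_property` are assumptions for illustration, not the code that was actually run.

```python
def partition_statements(statements):
    """Split statements into (simple, manual).

    'simple' statements carry neither qualifiers nor references and can be
    moved by a bot; everything else is left for manual review.
    """
    simple, manual = [], []
    for statement in statements:
        if statement.get("qualifiers") or statement.get("references"):
            manual.append(statement)
        else:
            simple.append(statement)
    return simple, manual


def move_property(statement, new_property="P463"):
    """Return a copy of the statement re-filed under new_property."""
    moved = dict(statement)
    moved["property"] = new_property
    return moved
```

A run would call `move_property` on each simple statement and report the manual list for follow-up.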

Set instance of (P31)=human (Q5) on items that have instance of set to either sahabah (Q188711) or ansar (Q25144)

The list of items that are sahabah or ansar can be found here. These are all humans (many of them have the DEPRECATED main type (GND) person), but their instance of is set to sahabah or ansar, which are just names of groups of people who were companions of the Islamic prophet Muhammad.

I guess you want different things for these 187 objects:
  1. Add instance of (P31) human (Q5)
  2. Delete P107 (P107)
  3. Move instance of (P31) companions of the Prophet (Q188711) to member of (P463) companions of the Prophet (Q188711) ?
  4. Move instance of (P31) Ansar (Q425144) to member of (P463) Ansar (Q425144) ?
A bot is already doing the replacement (1.+2.) over all. I am not sure if property P463 is appropriate for 3. and 4. Suggestions? --Zuphilip (talk) 12:21, 6 December 2013 (UTC)
Well, for the replacement (1 + 2), some of those items don't really have the P107 (P107) property set. As for 3 and 4, I'm totally open for suggestions here. --Ahmed Sobhi (talk) 22:00, 6 December 2013 (UTC)
Exactly 5 objects don't have P107 (P107) and all of them already have instance of (P31) human (Q5). --Zuphilip (talk) 23:25, 6 December 2013 (UTC)
@Emw: I have seen that you are changing statements of the form "instance of (P31) companions of the Prophet (Q188711)" to "occupation (P106) companions of the Prophet (Q188711)", for example Q26918. Why did you choose this property, and especially why is it okay to create a new constraint violation at Property_talk:P106, value type occupation (Q13516667)? --Zuphilip (talk) 21:17, 1 January 2014 (UTC)
Zuphilip, I chose occupation (P106) instead of member of (P463) because after reading the articles Sahabah and Ansar it seemed plausible to consider those terms as occupations, and occupations are probably the most common P31 values for humans other than Q5. I have no strong opinion on whether P106 or P463 is used as the replacement for P31 'Sahabah' and 'Ansar' claims, just as long as they are moved to another property and replaced by Q5. Emw (talk) 23:23, 1 January 2014 (UTC)
@Emw: How about part of (P361)? This would not give us any constraint violations, and it seems okay to me to state that a single person is part of a larger group of persons. --Zuphilip (talk) 12:32, 2 January 2014 (UTC)
Zuphilip, that seems reasonable. member of (P463) is a subproperty of part of (P361). Emw (talk) 13:05, 2 January 2014 (UTC)

Task is   Done with property part of (P361) as discussed above. --Zuphilip (talk) 20:12, 2 January 2014 (UTC)
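
As a rough illustration of the outcome agreed above (add instance of human, drop P107, and move the group values from P31 to P361), here is a minimal sketch. It assumes an item's claims are a plain mapping from property ID to a list of item IDs, which is a simplification; the group IDs are taken from the numbered list earlier in this section, and `retag_item` is a hypothetical name. It is meant only for items from the sahabah/ansar list.

```python
# Group items as written in the numbered list of this section.
GROUPS = frozenset({"Q188711", "Q425144"})


def retag_item(claims, groups=GROUPS):
    """Return new claims with P31=Q5, no P107, and group values under P361."""
    # Drop the deprecated P107 and copy the remaining value lists.
    new = {prop: list(values) for prop, values in claims.items() if prop != "P107"}
    p31 = new.get("P31", [])
    moved = [v for v in p31 if v in groups]
    remaining = [v for v in p31 if v not in groups]
    if "Q5" not in remaining:
        remaining.insert(0, "Q5")
    new["P31"] = remaining
    if moved:
        part_of = new.setdefault("P361", [])
        for value in moved:
            if value not in part_of:
                part_of.append(value)
    return new
```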

This section was archived on a request by: Zuphilip (talk) 11:16, 9 January 2014 (UTC)

Such as Q15301256 and Q15301257, which were created by me, but there are still a lot of pages in the project and template namespaces with sitelinks in their source code. --GZWDer (talk) 05:55, 12 December 2013 (UTC)

This section was archived on a request by: GZWDer (talk) 12:32, 8 March 2014 (UTC)

Add city and town values to instance of based on categories

MusicBrainz generates their list of Areas based on the presence of "instance of" "town" or "city" on items -- it'd be great to have those properties fully populated from the appropriate enwiki categories, like: w:en:Category:Towns_in_Connecticut. I know Legobot can do that, but it seems to be down currently. JesseW (talk) 06:27, 26 November 2013 (UTC)

Please note that city (Q515) and town (Q3957) are very ambiguous terms (see this discussion). I recommend adding town in the United States (Q15127012), city in the United States (Q1093829) etc. instead. --Pasleim (talk) 09:12, 26 November 2013 (UTC)
Fully agree! -- Lavallen (talk) 09:15, 26 November 2013 (UTC)
Switching away from city (Q515) and town (Q3957) will require coordination on the Musicbrainz side, so the new ones town in the United States (Q15127012), city in the United States (Q1093829) can't be used quite yet. Would anyone be willing to help with the original request? JesseW (talk) 07:40, 4 December 2013 (UTC)
I've now filed a bug report at Musicbrainz, so at least they are alerted: http://tickets.musicbrainz.org/browse/MBS-7050 . JesseW (talk) 07:48, 4 December 2013 (UTC)

Commonscat cleaning

The regrettable past and ongoing neglect of Wikimedia Commons in phase I of Wikidata makes consistency more difficult to achieve. Because no fully-fledged solution is foreseeable yet, we should work on comparing and completing the provisional interproject links.

The Commons category (P373) property should be cleaned:

  • if the linked Commons category is a redirect (including soft redirects), P373 should be changed to the target category.
  • if the linked Commons category doesn't exist, a bot should search for an alternative commonscat link in the linked Wikipedia articles, or search the target category name in the deletion log of the category.
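
The two rules above could be sketched as a lookup that follows redirects and reports missing categories. This is a minimal sketch under assumptions: `redirects` (category name to redirect target) and `existing` (set of non-redirect category names) are hypothetical inputs that would have to be built from Commons data, and the result is only a candidate value that may still need human review.

```python
def resolve_category(name, redirects, existing):
    """Follow category redirects and return the final existing category.

    Returns None when the chain ends at a category that does not exist or
    when the redirects form a loop; such cases need the fallback search.
    """
    seen = set()
    current = name
    while current in redirects:
        if current in seen:
            return None  # redirect loop, cannot be resolved automatically
        seen.add(current)
        current = redirects[current]
    return current if current in existing else None
```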

For every item which contains Commons category (P373), the item number should be exported to the Commons category page as the {{On Wikidata|Qnumber}} template.

For every Commons category containing interwiki links, the category name should be added as Commons category (P373) into corresponding Wikidata items. --ŠJů (talk) 23:31, 8 September 2013 (UTC)

  Oppose removing redirects, as they can be quite different in sense. Infovarius (talk) 19:19, 6 November 2013 (UTC)
Thus, do you prefer to keep thousands of invalid links without any maintenance? --ŠJů (talk) 17:18, 19 November 2013 (UTC)
This should be done manually because sometimes link to redirect is correct while link to its target would be wrong. --Infovarius (talk) 13:12, 22 November 2013 (UTC)
Are you prepared to monitor and manually correct thousands of links which are dead due to renaming of the target category? The basic problem is that category renames are not automated and don't involve fixing incoming links, interwikis etc. If no bot takes care of it, the links will be unsustainable. --ŠJů (talk) 17:19, 23 December 2013 (UTC)

Note that instance of (P31)=person (Q215627) appears more than 10000 times.--GZWDer (talk) 12:15, 21 October 2013 (UTC)

Did you discuss this anywhere? At Property talk:P31 the following example is given: Grace Hopper, a computer scientist, instance of person. -- Bene* talk 22:15, 21 October 2013 (UTC)
instance of (P31)=human (Q5) was decided as part of the migration from P107. User:Michiel1972
Thanks for noticing the outdated example, Bene. I was the one who put up the Grace Hopper example and one of the participants in the discussion about using 'instance of human' instead of 'instance of person'. I've updated that P31 example (diff). Emw (talk) 00:50, 22 October 2013 (UTC)

56.500 claims --Zuphilip (talk) 17:10, 9 November 2013 (UTC)
I guess someone is doing this (or maybe just as a side effect of another task?) 11.500 claims --Zuphilip (talk) 12:01, 6 December 2013 (UTC)

"Born in year" lists are not persons

The (several thousand) articles from it:Categoria:Liste di nati nell'anno apparently are all tagged as P107 (P107) person (Q215627), while they should be instance of (P31) Wikimedia list article (Q13406463) (probably plus is a list of (P360) human (Q5), or do we use person (Q215627) here?), so this is something completely different that the usual P107 migration jobs won't handle correctly. It would be good if a bot could take care of this special issue. --YMS (talk) 10:19, 2 December 2013 (UTC) PS: The category only lists 1943 pages, while a search for "Nati nel" here on Wikidata results in 3590 items. I didn't check where this difference comes from. --YMS (talk) 10:21, 2 December 2013 (UTC)

These "born in year" lists seem to be linked only to itwiki. There also exist categories like it:Categoria:Nati_nel_1844, which are linked in many languages. Moreover, we will add a statement about the birthday of every person. Is it still worth adding these "born in year" lists to Wikidata? --Zuphilip (talk) 11:17, 4 December 2013 (UTC)
Yes, all articles should have an item here. But what properties they should have is a trickier question. -- Lavallen (talk) 12:35, 4 December 2013 (UTC)
But in any case they should not be treated as persons. A bot that just removes those statements without any replacement would still be better than nothing. That said, of course it would be even better if the bot could already put the proper statements. --YMS (talk) 12:42, 4 December 2013 (UTC)
Okay, I believe you. - There are 13 objects with is a list of (P360) human (Q5) against 63757 objects with is a list of (P360) person (Q215627). Thus, it seems that the second choice is more often used. I tried to make a nice example by hand: Q3872215, but maybe that is too much to ask a bot to do... Btw, if you search "Nati nel" in Wikidata you will also receive category pages https://www.wikidata.org/w/index.php?title=Special:Search&limit=500&offset=3000&redirs=1&profile=default&search=Nati+nel --Zuphilip (talk) 18:39, 4 December 2013 (UTC)
Ah yes, of course. --YMS (talk) 18:50, 4 December 2013 (UTC)
If births in 2000 (Q3872215) is correct, I can do it via bot. --ValterVB (talk) 19:28, 4 December 2013 (UTC)
Please do it as list of "human", not list of "person". --Izno (talk) 02:29, 5 December 2013 (UTC)
@Izno: Should we then also change the 63757 objects with is a list of (P360) person (Q215627) to human (Q5)? --Zuphilip (talk) 23:35, 6 December 2013 (UTC)
I don't see a problem with that. --Izno (talk) 00:27, 7 December 2013 (UTC)
@Izno: For example, to have both births in 2000 (Q3872215) is a list of (P360) human (Q5) and deaths in 2011 (Q5140) is a list of (P360) person (Q215627) looks pretty inconsistent to me. Or is there a condition for when to use person and when to use human? --Zuphilip (talk) 10:12, 7 December 2013 (UTC)
Please read the conversation about classifying items like Coco Chanel if you haven't already. In brief, for reasons described at the linked discussion, 'human' is preferable to 'person' whenever classifying items like Coco Chanel. deaths in 2011 (Q5140) is a list of (P360) person (Q215627) should be changed to deaths in 2011 (Q5140) is a list of (P360) human (Q5), unless the list contains things that are not human. Emw (talk) 02:26, 9 December 2013 (UTC)
This may lead to situations like "the list for 2011 is a list of humans, while the list for 2010 is a list of persons, as it also includes some fictional characters or the like". I'm okay with this, as it seems to be the correct way to do it, but this might cause quite some maintenance effort and possible edit wars (a list only includes humans in language X, but also other persons in language Y, so without carefully reviewing the other language everybody thinks he's right with human OR person). --YMS (talk) 09:45, 9 December 2013 (UTC)

Chemical identifiers extraction

In order to extract data from open databases we need to identify the Q items in Wikidata about chemicals with a well-known identifier. That's the purpose of the ChemID Initiative. But the first step is to collect data from Wikipedia before starting any data import.

The idea is to extract some unique identifiers from the chemboxes of the different Wikipedias and to assess these data in order to clearly identify each Q item about a chemical in Wikidata.
To start this I will need the following lists:

  • From en:WP
    for each article with the Chembox template, extract the Q-number of the item associated with the article, the name of the article, the CASNo and the PubChem entries of the template.
  • From de:WP
    for each article with the Infobox Chemikalie template, extract the Q-number of the item associated with the article, the name of the article, the CAS and the PubChem entries of the template.
  • From fr:WP
    for each article with the Infobox Chimie template, extract the Q-number of the item associated with the article, the name of the article, the CAS and the PubChem entries of the template.

Be careful: in some cases each entry can have different values separated by comma or </br>. For the French articles, data can be enclosed in templates like {{CID|xxx}} or {{CAS|x|x|x|x|x|x|x|x}}. Extract the whole format and a second extraction/conversion will be performed on those raw data in order to obtain the final value in the right format.
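
The second extraction/conversion step mentioned above might be sketched as follows. This is a hedged illustration, not an agreed implementation: the function name `extract_values` is hypothetical, and it assumes that {{CAS|...}} receives the CAS number one digit per parameter, which should be checked against the actual template documentation before use.

```python
import re

# Values may be separated by a comma or by <br>, </br>, <br/> or <br />.
_SEPARATORS = re.compile(r"\s*(?:,|</?br\s*/?>)\s*", re.IGNORECASE)
# Only the two template names mentioned in the request are handled.
_TEMPLATE = re.compile(r"\{\{\s*(CID|CAS)\s*\|([^{}]*)\}\}", re.IGNORECASE)


def _expand(match):
    """Turn {{CID|x}} into x and {{CAS|d|d|...}} into a hyphenated CAS number."""
    name = match.group(1).upper()
    args = [a.strip() for a in match.group(2).split("|") if a.strip()]
    if name == "CAS":
        digits = "".join(args)  # assumption: one digit per parameter
        if digits.isdigit() and len(digits) >= 5:
            return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"
        return match.group(0)  # leave unexpected input untouched
    return args[0] if args else match.group(0)


def extract_values(raw):
    """Split a raw infobox field into a list of cleaned identifier strings."""
    expanded = _TEMPLATE.sub(_expand, raw)
    return [v.strip() for v in _SEPARATORS.split(expanded) if v.strip()]
```

Anything the sketch does not recognise is passed through unchanged, so it would still show up in the manual check of the lists.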

A manual check of these lists will be organized, and once each item about a chemical has received its corresponding identifiers, a data import from free databases will be performed. Thanks for your help. Snipre (talk) 19:31, 5 December 2013 (UTC)

Please check with the DBPedia team, which are extracting chemical identifiers already. Egon Willighagen (talk) 22:55, 6 December 2013 (UTC)
But I don't think they extracted the Q number of the associated wikidata item. Snipre (talk) 09:34, 7 December 2013 (UTC)

Create items for Wikipedia articles

Many Wikipedia articles are currently not linked to Wikidata, so new items should be created. Some of them may be duplicates, but that seems to be a small minority, and they are easier to spot if they are in Wikidata anyway. --Zolo (talk) 08:55, 11 December 2013 (UTC)

Is there a specific Wikipedia where more items are missing than in other Wikipedias? --Bene* talk 14:48, 11 December 2013 (UTC)
Some time ago many articles with sv:Template:Insjöfakta Sverige did not have any item here. -- Lavallen (talk) 15:24, 11 December 2013 (UTC)
The English Wikipedia, obviously.--Ymblanter (talk) 20:13, 11 December 2013 (UTC)
@Zolo: You stated: „Some of them may be duplicates, but that seems to be a small minority, and they are easier to spot if they are in Wikidata anyway“ (my emphasis). Is this knowledge or only a guess? How useful are newly created items without a single (useable) statement? --Succu (talk) 20:39, 11 December 2013 (UTC)
For the languages I have checked in the topics I have browsed: more missing items in Chinese and English than in French and German.
@Succu: Yes, this is only a guess based on non-random experience. To state things more rigorously, I have seen rather many Wikipedia articles that did not have any link to Wikidata, and almost all of those I checked could not be linked to any existing item. Once it is in Wikidata, an item gets a chance to get statements. Actually, what prompted me to make this request is user:Magnus Manske's new "Widar" tool, which allows adding statements based on a Wikipedia category but leaves out articles that are not in Wikidata (of course in this case, another solution would be to add an option for creating items through the tool ;). --Zolo (talk) 21:47, 11 December 2013 (UTC)
Short-term, I can offer Swedish people on en.wp without Wikidata item. --Magnus Manske (talk) 22:45, 11 December 2013 (UTC)
A lot of new items were created just after the start of Wikidata, but afterwards we didn't set up anything to keep importing new articles.
I wrote a bot and added it to Pywikibot to make it easy to create new items. I do a query to find articles without an item and use this as input for the bot.
The bot doesn't touch an article if it was created too recently (default: 3 weeks) or if it has been edited recently (default: 7 days). This prevents new items from being created when people might still be working on the article.
I'm thinking about setting up a shared account (on Toollabs) with this bot in it. It could work on every Wikipedia, but we should probably start with a smaller subset.
What do you think? Is this a good idea? Who wants to help to set up this bot and maintain it? Multichill (talk) 15:58, 8 March 2014 (UTC)
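
The age check described in this comment can be illustrated with a small sketch. It is not the Pywikibot script itself; the function name `is_eligible` and its parameters are hypothetical, and only the two defaults (3 weeks, 7 days) come from the comment.

```python
from datetime import datetime, timedelta, timezone


def is_eligible(created, last_edited, now,
                min_age=timedelta(weeks=3),
                min_quiet=timedelta(days=7)):
    """Return True if an article is old and quiet enough to get a new item.

    The article must have been created at least min_age ago and must not
    have been edited within the last min_quiet.
    """
    return (now - created) >= min_age and (now - last_edited) >= min_quiet
```

For example, with `now = datetime(2014, 3, 8, tzinfo=timezone.utc)`, an article created 30 days earlier and last edited 10 days earlier would pass, while one edited 2 days earlier would be skipped.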
It's a good idea. I hope you find a bot co-maintainer. :) Wikidata would need a tenfold increase of bot runners it seems, too much work on the table! --Nemo 17:42, 25 April 2014 (UTC)
My bot has been doing this task for two months for the first twenty languages of Wikipedia and Wikisource and all languages of Wikiquote. Amir (talk) 20:07, 25 April 2014 (UTC)
@Ladsgroup: I see that about 3000 pages from the French Wikiquote are still missing. Many of them correspond to an existing Wikidata item. I think that if the title matches a Wikipedia sitelink, it is safe to add them to the item. Could you look into it? For added security, it could also check that the page is linked from the "wikiquote" or the "q" parameter of the "Autres projets" template, but it may not be necessary, as it may cause more false negatives than it would avoid false positives. --Zolo (talk) 16:55, 20 July 2014 (UTC)
@Zolo: Hi, Can you give me some examples? Thank you Amir (talk) 11:06, 23 July 2014 (UTC)
@Ladsgroup: The first four pages I get with the item creator tool for fr.wikiquote's category "Citations:Racine". They are wikiquote:fr:Architecture, wikiquote:fr:Crime, fr:wikiquote:Culture de l'Europe and wikiquote:fr:Empereur. They correspond to the fr.wikipedia articles with the same titles (fr:Architecture, fr:Crime, fr:Culture de l'Europe, fr:Empereur). They are all linked from Wikipedia using {{Autres projets|wikiquote=}} or {{Autres projets|q=}}, but actually it seems the fr.wikiquote page virtually always has the same title as the corresponding fr.wikipedia page. --Zolo (talk) 12:35, 23 July 2014 (UTC)
@Zolo: Oh, I see it now. Usually my bot goes through all Wikiquote pages in certain namespace(s), and if it can find a corresponding article in Wikipedia, it adds the page to the related item (after some obvious checks). But having my bot go through a template in Wikipedia and add the related pages in Wikidata is a great idea. I will work on it today. Amir (talk) 12:53, 23 July 2014 (UTC)

@Zolo: I started the bot. It'll finish in the next week, and it has already added your examples (I gave them to the bot as a test) and some others. Amir (talk) 14:14, 23 July 2014 (UTC)
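
The title-matching idea discussed in this thread (attach a Wikiquote page to the item whose Wikipedia sitelink has the same title) could be sketched like this. The inputs are assumptions: `wikipedia_sitelinks` is a hypothetical mapping from fr.wikipedia title to item ID, and `match_by_title` is not the name of anything in the actual bot.

```python
def match_by_title(wikiquote_titles, wikipedia_sitelinks):
    """Pair Wikiquote pages with items whose Wikipedia sitelink has the same title.

    Returns (matched, unmatched): a dict of title -> item ID, and a list of
    titles that have no same-titled Wikipedia sitelink and need another check.
    """
    matched, unmatched = {}, []
    for title in wikiquote_titles:
        item_id = wikipedia_sitelinks.get(title)
        if item_id is not None:
            matched[title] = item_id
        else:
            unmatched.append(title)
    return matched, unmatched
```

The stricter variant mentioned in the thread would additionally require the page to be linked through the "Autres projets" template before accepting a match.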

This section was archived on a request by: GZWDer (talk) 07:02, 5 October 2014 (UTC)