User talk:Ivan A. Krestinin/To merge

Latest comment: 7 years ago by Wintik in topic IsRedirect check

Disambiguation pages

Possible improvement: there would be fewer false matches if you could discard pairs where one item has instance of (P31) = Wikimedia disambiguation page (Q4167410) and the other item has instance of (P31) = something other than Wikimedia disambiguation page (Q4167410). See the false match of Air Mali (Q407498) and Air Mali (Q1199298) for example. Thanks - LaddΩ chat ;) 12:56, 6 May 2014 (UTC)
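The filter suggested above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `is_type_mismatch` and the representation of each item's instance of (P31) values as a plain set of item IDs are assumptions.

```python
DISAMBIGUATION = "Q4167410"  # Wikimedia disambiguation page


def is_type_mismatch(p31_a, p31_b):
    """Return True when exactly one item is a disambiguation page
    and the other item states some other instance of (P31) value.

    p31_a, p31_b: sets of item IDs taken from each item's P31 claims
    (hypothetical input format, chosen for this sketch).
    """
    a_is_dis = DISAMBIGUATION in p31_a
    b_is_dis = DISAMBIGUATION in p31_b
    if a_is_dis == b_is_dis:
        # Both are disambiguation pages, or neither is: no mismatch.
        return False
    other = p31_b if a_is_dis else p31_a
    # An item without any P31 claim is kept as a candidate,
    # because nothing says it is not a disambiguation page.
    return len(other) > 0
```

A pair for which `is_type_mismatch` returns True would be dropped from the report; pairs where one item has no P31 at all stay listed, since they may simply be incomplete.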

page size

Hi Ivan, this is a brilliant resource. However, the page is too large at the moment (almost 1 MB). It would be easier to use if it were split into separate subpages, say: part 1, dewiki, eswiki, itwiki, etc. --Marcol-it (talk) 10:27, 7 May 2014 (UTC)

cswiki

Hello, could you add more languages? sk-cs, sk-en, cs-en... ? JAn Dudík (talk) 06:05, 12 May 2014 (UTC)Reply

jawiki #

Hello, jawiki (zh, ko) articles # are about numbers, not years, so you can exclude them. JAn Dudík (talk) 19:54, 12 May 2014 (UTC)Reply

Exclude items already fully linked

Hi Ivan, the merge of city of Japan (Q494721) and list of cities in Japan (Q735757) is proposed on fr-en ("rating 37"). Both items already have links to Wikipedia articles in both fr and en, so merging is unnecessary and inappropriate. Could you exclude such item pairs? Thanks - LaddΩ chat ;) 00:19, 14 May 2014 (UTC)
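A minimal sketch of the requested exclusion, assuming each item's sitelinks are available as a dict from wiki code to page title. The function name `conflicting_wikis` and the example titles are hypothetical and only serve the illustration.

```python
def conflicting_wikis(sitelinks_a, sitelinks_b):
    """Return the sorted list of wikis that both items already link to.

    A page can be linked from only one item, so any wiki present in
    both dicts means the two items point at two different pages there
    and cannot be merged without dropping one of the links.
    """
    return sorted(sitelinks_a.keys() & sitelinks_b.keys())


def is_mergeable(sitelinks_a, sitelinks_b):
    """A pair stays on the report only if no wiki is linked from both items."""
    return not conflicting_wikis(sitelinks_a, sitelinks_b)
```

For a pair like the one described above, where both items carry an fr and an en link, `conflicting_wikis` would return `["enwiki", "frwiki"]` and the pair would be skipped.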

Exclude disambiguation items with mismatching labels

Hi, another suggestion. To_merge/frwiki#enwiki has a section "'# (homonymie)' — '# (disambiguation)' (rating 43)" that proposes a number of merges of disambiguation items with other disambiguation items, like Morocco (Q488261) and Maroc (Q3294558). However, according to Wikidata:Disambiguation pages task force, Wikimedia disambiguation page (Q4167410) items should only link to WP pages with identical labels ("The item should only contain links to Wikipedia disambiguation pages with the exact same spelling..."). "Morocco" and "Maroc" must remain distinct, even if in English/French they represent the same country. There would be fewer false positives if you could exclude such item pairs. Thanks - LaddΩ chat ;) 01:26, 14 May 2014 (UTC)
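The "exact same spelling" rule quoted above could be checked along these lines. This is a sketch under assumptions: the helper names are invented, the page titles are illustrative, and stripping a trailing parenthetical qualifier such as "(homonymie)" or "(disambiguation)" is just one plausible way to obtain the base spelling.

```python
import re


def base_spelling(title):
    """Drop a trailing parenthetical qualifier and normalise case."""
    stripped = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    return stripped.strip().casefold()


def same_disambiguation_spelling(title_a, title_b):
    """True when two disambiguation page titles share the same base spelling."""
    return base_spelling(title_a) == base_spelling(title_b)
```

With this check, a pair of disambiguation items would only be proposed for merging when `same_disambiguation_spelling` holds for their titles, so "Maroc (homonymie)" and "Morocco (disambiguation)" would be left apart.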

nl?

Could you include nlwiki with your merge candidates? Would be nice :) Lymantria (talk) 05:23, 14 May 2014 (UTC)Reply

Checked items

Hi Ivan, where should we note if items are checked and the result is "don't merge"?

Example: 1919 in broadcasting (Q346668) and 1919 in radio (Q16831988)
#1: broadcasting (radio and television), #2: only radio

--Kolja21 (talk) 15:58, 17 May 2014 (UTC)Reply

Hi, the short answer to your question is: nowhere at the moment. But the situation is interesting. The bot uses existing items to find disconnected items, and "rating" is the number of existing connections. In your case, for example, there are 95 items like 1986 in radio (Q1299002), 1987 in radio (Q926278), 1988 in radio (Q923214) where #1 and #2 are connected. So to remove the discussed pair from the list we would need to split 95 items or merge 1 item... — Ivan A. Krestinin (talk) 16:33, 17 May 2014 (UTC)
Is it possible to use a list like User:Pasleim/whitelist to filter out false positives on the list here? There is probably overlap between items listed there and those found by your bot. Rigadoun (talk) 03:34, 17 July 2014 (UTC)Reply
  Done, the bot now uses the same page as its whitelist. — Ivan A. Krestinin (talk) 09:42, 19 July 2014 (UTC)
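To make the "rating" idea and the whitelist filter from this thread concrete, here is a rough sketch. It is a guess at the mechanism based only on the description above and on the "#" placeholders in the report section titles, not the bot's real code; all function names and the input formats are assumptions.

```python
import re
from collections import Counter


def to_pattern(title):
    """Replace every run of digits with '#', as in the report section titles."""
    return re.sub(r"\d+", "#", title)


def count_ratings(connected_title_pairs):
    """Rating of a pattern pair = number of already connected title pairs sharing it."""
    return Counter((to_pattern(a), to_pattern(b)) for a, b in connected_title_pairs)


def merge_candidates(unconnected, ratings, whitelist):
    """Yield (qid_a, qid_b, rating) for unconnected pairs that match a known pattern.

    unconnected: iterable of ((qid_a, title_a), (qid_b, title_b)) (hypothetical format)
    whitelist:   set of frozenset({qid_a, qid_b}) pairs checked as "don't merge"
    """
    for (qid_a, title_a), (qid_b, title_b) in unconnected:
        if frozenset((qid_a, qid_b)) in whitelist:
            continue  # already reviewed, do not list again
        if re.findall(r"\d+", title_a) != re.findall(r"\d+", title_b):
            continue  # the numbers behind '#' must agree
        rating = ratings[(to_pattern(title_a), to_pattern(title_b))]
        if rating:
            yield qid_a, qid_b, rating
```

Under this reading, a false positive backed by 95 connected pairs keeps its rating of 95 until those pairs are split, which is why a whitelist lookup is the cheaper way to suppress a single checked pair.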

svwiki

... is my wish. Would it be possible?

Also, I will link the lists from Help:Merge to make more users find them. Matěj Suchánek (talk) 12:09, 18 May 2014 (UTC)Reply

Updating

Hello, there is a problem when #1 and #2 are about the same subject in the linked languages, but the other links in #2 are about something else. When I move the link to the correct item, the pair is not removed from the list in the next update.

example for cs-sk:

Item 1
  • cs:1999, en:1999
Item 2
  • en:1998, es:1998, sk:1999

Luxembourg at the 1992 Winter Olympics (Q144061) - Luxembourg at the 1992 Summer Olympics (Q144968) is one of these situations. JAn Dudík (talk) 05:51, 28 May 2014 (UTC)

Indeed. 2012 Guinea-Bissau coup d'état (Q621827) and United Nations Security Council Resolution 2048 (Q15718627) is another example. Kind regards, Lymantria (talk) 09:09, 28 May 2014 (UTC)Reply
The bot uses dumps. New dumps appear twice a month. Between dumps, the bot only removes entries whose items have been deleted. — Ivan A. Krestinin (talk) 20:33, 4 June 2014 (UTC)
No problem, Your bot performs a great job! Lymantria (talk) 12:58, 10 June 2014 (UTC)Reply

User:Ivan A. Krestinin/To merge/enwikisource false positives

Category: YYYY works and Category: YYYY are, in general, not the same thing. E.g. take Category:1307 works (Q8090261), Category:1307 (Q6583731); you can see that w: has a page for both of those, suggesting that they aren't the same category. It Is Me Here t / c 18:11, 4 June 2014 (UTC)Reply

dawiki

Would it be possible to add dawiki? --Steenth (talk) 12:31, 10 June 2014 (UTC)

Suggestion: nationality adjectives

Perhaps it is an idea to let a bot search for merge candidates involving nationality adjectives in order to find couples like nl:Categorie:Thais saxofonist and en:Category:Thai saxophonists? Lymantria (talk) 06:56, 17 June 2014 (UTC)Reply

Good idea, but it is not simple to implement... — Ivan A. Krestinin (talk) 18:24, 17 June 2014 (UTC)
@User:Lymantria: There have been some lists with countries by User:Byrial, but unfortunately he has been inactive for almost a year now. - FakirNL (talk) 12:24, 4 August 2014 (UTC)Reply

False positives for Category:XYZ and XYZ

Hello,

Q200794 (en:Eurovision Song Contest 1961) - Q9014598 (en:Category:Eurovision Song Contest 1961) made it to the list. There's probably a way to avoid such false positives. Place Clichy (talk) 11:06, 29 July 2014 (UTC)Reply

This pair is listed correctly: one of the items mixes links to the main [1] [2] and category namespaces. And as Ivan wrote above, such entries are removed only when a new dump is available. JAn Dudík (talk) 11:41, 29 July 2014 (UTC)
OK then, I had not seen this past edit. Place Clichy (talk) 17:41, 29 July 2014 (UTC)Reply

elwiki done

@FakirNL: Hello,

I believe I have corrected everything at elwiki, and added the few false positives to User:Pasleim/whitelist. You may clean up this page, or run another dump. Place Clichy (talk) 16:59, 30 July 2014 (UTC)Reply

Good and thanks! Question to Ivan: does Pasleim's whitelist have any effect on your merge project? Another thing I realize now: maybe my timing wasn't ideal, suggesting this project to several users just as the operator seems to be less active (in his Q160169 maybe?) :-) - FakirNL (talk) 17:12, 30 July 2014 (UTC)
@FakirNL: #Checked items, User talk:Magnus Manske#Merge game. He also stated that he would be more active in two weeks. Matěj Suchánek (talk) 17:35, 30 July 2014 (UTC)Reply
Thanks Matěj! - FakirNL (talk) 17:40, 30 July 2014 (UTC)Reply

Sometimes your project gives false positives, and that could have to do with other false links, especially when a match is based on only 3 or 4 links. How hard would it be to have a button "based upon similar items here" above every caption? Just another idea. - FakirNL (talk) 22:20, 7 August 2014 (UTC)

  Done. Ivan A. Krestinin (talk) 20:32, 9 August 2014 (UTC)
Hi User:Ivan A. Krestinin, could you, in your next run, perhaps list slightly more of these "based upon"-links? How about 6 or 10 instead of 3? If a false positive is based upon 4 or 5 links, it's sometimes hard to find the fourth and fifth though it needs to be changed. - FakirNL (talk) 19:01, 21 September 2014 (UTC)Reply
Hello, the reports are too large already... Is this really a frequent situation? — Ivan A. Krestinin (talk) 19:15, 21 September 2014 (UTC)
Well, the reports are getting smaller every time! Polish had a lot of false positives and was decreased from over 1 Mb to 105 kb last time. The last run French was the largest but I decreased it from 378 kb to 147 kb and I believe now Russian is the largest at 167 kb but it has a lot of items fixed so it will be much smaller next time. Let's say 6 items (not 10) as workable compromise? - FakirNL (talk) 20:05, 21 September 2014 (UTC)Reply
I would like to repeat my request to give six "based on" items instead of three, and I'll explain why. If there is a false positive based on five items, only three are listed. If you handle those three, two false links remain; in the next run the pair will no longer be listed even though those two are still around, and they might be hard to find. If six items are listed, it's much easier to catch all of them in one fix. So, if possible, listing six items would be an improvement. - FakirNL (talk) 09:57, 31 October 2014 (UTC)
  Done, increased to 6 items. About 3 hours are needed for the reports to update. — Ivan A. Krestinin (talk) 19:04, 31 October 2014 (UTC)

Wikinews

Please add wikinews to the merging process. JAn Dudík (talk) 05:20, 21 August 2014 (UTC)

  Done, the bot will create the reports in 4-5 hours, if it does not crash :-). — Ivan A. Krestinin (talk) 17:57, 21 August 2014 (UTC)
There are no wikinews links in the latest Wikidata dump. We need to wait for a new dump. — Ivan A. Krestinin (talk) 18:47, 21 August 2014 (UTC)

Commons

Would it be possible to add commons to the merging process? --Steenth (talk) 08:04, 21 August 2014 (UTC)

A request

Would you please add guwiki, mrwiki, newiki, newwiki, sawiki, bhwiki, maiwiki and piwiki? I will try to cover all of them.--Vyom25 (talk) 17:49, 3 May 2015 (UTC)Reply

  • Hello, the bot processes these wikis but finds nothing to merge. It will create a report if it finds something. Also please note that the bot uses existing interwiki links to find additional unlinked pairs. — Ivan A. Krestinin (talk) 19:34, 3 May 2015 (UTC)
Okay... Thank you.--Vyom25 (talk) 06:19, 4 May 2015 (UTC)Reply

Hello, can you generate reports for tawiki, mlwiki and knwiki? thanks in advance.--Vyom25 (talk) 07:17, 11 October 2015 (UTC)Reply

A blacklist for false positives?

Hi Ivan, I was made aware of some false positives on your otherwise very useful lists: User:Ivan A. Krestinin/To merge/frwiki#enwiki, then all items in section Catégorie:Rameur (aviron) aux Jeux olympiques d'été de # — Category:Rowers at the # Summer Olympics. Do you provide a way to exclude these items via a blacklist? Otherwise Wikidata users tend to merge them, although they should not be merged. Thanks and regards, MisterSynergy (talk) 21:01, 15 December 2015 (UTC)Reply

@MisterSynergy: See #Checked items. Matěj Suchánek (talk) 14:03, 16 December 2015 (UTC)Reply
Thanks, I’ve just added these cases to the whitelist. —MisterSynergy (talk) 15:42, 16 December 2015 (UTC)Reply

IsRedirect check

Since updating occurs only once a week I would propose to add IsRedirect check info to the list (like 3rd column here). --Wintik (talk) 18:24, 23 August 2016 (UTC)Reply

If you want to see whether a page is already a redirect, add this to your stylesheet. Matěj Suchánek (talk) 18:45, 23 August 2016 (UTC)Reply
Thanks, good for me, nice point to create my own common.css --Wintik (talk) 19:20, 23 August 2016 (UTC)Reply