On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at User talk:Ivan A. Krestinin/Archive.

Constraints not updating?

Hi - I've been watching Wikidata:Database reports/Constraint violations/P356 and it updated 3 times in early January, but hasn't done so for over a week now. Something blocking the bot? ArthurPSmith (talk) 20:33, 22 January 2020 (UTC)

Hi Arthur, I fixed two issues. I hope the update frequency will increase. — Ivan A. Krestinin (talk) 20:32, 1 February 2020 (UTC)
Out of curiosity, what would it take to run these updates daily? Is the bottleneck computing power, or something else? Thanks. Mike Peel (talk) 20:39, 1 February 2020 (UTC)
The bottleneck is computing power. More efficient algorithms may also help. — Ivan A. Krestinin (talk) 18:15, 4 April 2020 (UTC)

Code?

Out of curiosity, is the code for KrBot available somewhere? I'm interested in looking at how you find the constraint violations, as that could improve the efficiency of some of Pi bot's tasks. Thanks. Mike Peel (talk) 20:29, 25 January 2020 (UTC)

Hi Mike, the code is not publicly available. In general, the bot downloads Wikidata dumps, loads them and checks all values against the different constraints. The code is written in C++ and uses several custom libraries, so I think it would not be very useful for you. But I can provide details about the parts that interest you. — Ivan A. Krestinin (talk) 20:26, 1 February 2020 (UTC)
Thanks for your reply. I used to code in C++, but nowadays I use Python, so perhaps I can't directly reuse your code, although I'd be interested in looking at it if you can make it publicly available. To pick a specific example, your bot updates the Commons link violations at Wikidata:Database reports/Constraint violations/P373. I wrote a Python script at [1] to try to remove the bad links automatically on a daily basis, but I re-run it through the constraint violation report every time your bot updates it, and that seems to find extra cases. I'd be interested in learning how you find those extra cases, so that pi bot can handle them quicker. Thanks. Mike Peel (talk) 20:34, 1 February 2020 (UTC)
The bottleneck of my bot is dump management. Currently the bot loads all values of all properties from the dumps into memory. This is a long process because of the dump size. I am improving this mechanism now, and I hope the improvements will increase the update frequency. You can try to load the values you need directly from the Wikidata DB. It should be possible on https://tools.wmflabs.org. This will give you more up-to-date data. About P373: I also have a bot that fixes wrong Commons category (P373) links. — Ivan A. Krestinin (talk) 21:00, 1 February 2020 (UTC)
I hope you don't mind, but I've raised this at Wikidata:Contact_the_development_team#Increasing_the_frequency_of_constraint_violation_report_updates - you provide a vital service, and it would be good if this was better supported. I'll follow up about P373 soon. Thanks. Mike Peel (talk) 21:33, 1 February 2020 (UTC)
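
The dump-based approach described in this thread (read the dump, collect the values of a property, check them against a constraint) can be sketched in Python. This is only an illustration, not KrBot's code, which is written in C++ and is not public. The function names and the sample data are hypothetical, and the sketch assumes a string-valued property such as P373 and the one-entity-per-line layout of the Wikidata JSON dumps.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entities from a JSON dump read line by line.

    Assumes one entity per line, inside a JSON array, with a trailing comma.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def single_value_violations(entities: Iterable[dict], prop: str) -> Dict[str, List[str]]:
    """Map item id -> values for items with more than one value statement of prop."""
    violations = {}
    for entity in entities:
        values = [
            claim["mainsnak"]["datavalue"]["value"]
            for claim in entity.get("claims", {}).get(prop, [])
            if claim.get("mainsnak", {}).get("snaktype") == "value"
        ]
        if len(values) > 1:
            violations[entity["id"]] = values
    return violations


def _claim(value: str) -> dict:
    """Hypothetical helper that builds a minimal string-valued statement."""
    return {"mainsnak": {"snaktype": "value", "datavalue": {"value": value, "type": "string"}}}


# Tiny in-memory stand-in for a dump file (made-up data).
SAMPLE_DUMP = [
    "[",
    json.dumps({"id": "Q1", "claims": {"P373": [_claim("Cats"), _claim("Felis")]}}) + ",",
    json.dumps({"id": "Q2", "claims": {"P373": [_claim("Dogs")]}}),
    "]",
]

print(single_value_violations(iter_entities(SAMPLE_DUMP), "P373"))
```

Streaming the file line by line keeps only one entity in memory at a time, which sidesteps the "load everything into memory" step mentioned above, at the cost of needing one pass per batch of checks.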

Unique/Single value constraints – one item/value listed twice as a violation

Hey, I just want to let you know that user:KrBot2 listed some violations in a weird manner: [2]. One item is listed twice as a unique value violation and one value is listed twice as a single value violation. Wostr (talk) 11:38, 1 February 2020 (UTC)

…and in many other reports, some really extreme ones being Wikidata:Database reports/Constraint violations/P2639 (ca. 3400 false positives) and Wikidata:Database reports/Constraint violations/P2250 (1000+ false positives). (@Wostr: The fact that the same item shows up as a unique value violation and the same value shows up as a single value violation does not necessarily mean that the bot is wrong: sometimes people add the same value twice, usually through some sort of (semi)automated edit, and then it’s right to list these as violations. But that’s not the case now, at least not for most of the reported violations.) —Tacsipacsi (talk) 12:28, 1 February 2020 (UTC)
Hi, thank you for the report. The issue is fixed. Please wait for the next update. — Ivan A. Krestinin (talk) 20:17, 1 February 2020 (UTC)
Just a comment that I noticed this too, particularly on the report for DOI (P356) - suddenly there are over 400,000 violations of the single-value constraint! ArthurPSmith (talk)

Hi Ivan, if it's bug fixing time, could you please look at the discussion here? Not sure if you noticed my ping there. The community thinks that deprecated values should not trigger the unique value violation... Is that fixable or not? Cheers, Vojtěch Dostál (talk) 22:22, 1 February 2020 (UTC)

@Ivan A. Krestinin: You made an update on Feb 1, 2020, but the bug is still not fixed. -- MovieFex (talk) 12:59, 5 February 2020 (UTC)
The same problem in Wikidata:Database reports/Constraint violations/P6359. -- MovieFex (talk) 13:16, 5 February 2020 (UTC)
The February 1 update was the one that went wrong (see the above timestamps), of course it’s not fixed. The next one should be good. —Tacsipacsi (talk) 13:52, 5 February 2020 (UTC)
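
The two points raised in this thread (an item should not be reported against itself, and deprecated values should arguably not trigger a unique value violation) can be illustrated with a short Python sketch. It is not KrBot2's implementation. The function name, the `(item, value, rank)` tuple layout and the sample data are assumptions made for the example, and skipping deprecated statements reflects the suggestion made above, not confirmed bot behaviour.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Statement = Tuple[str, str, str]  # (item id, value, rank); hypothetical layout


def unique_value_violations(
    statements: Iterable[Statement], skip_deprecated: bool = True
) -> Dict[str, List[str]]:
    """Map value -> sorted item ids for values used by more than one distinct item."""
    items_by_value = defaultdict(set)
    for item_id, value, rank in statements:
        if skip_deprecated and rank == "deprecated":
            continue
        # A set means the same item carrying the same value twice
        # is never counted as two different items.
        items_by_value[value].add(item_id)
    return {
        value: sorted(items)
        for value, items in items_by_value.items()
        if len(items) > 1
    }


# Made-up sample statements.
SAMPLE = [
    ("Q1", "10.1000/a", "normal"),
    ("Q1", "10.1000/a", "normal"),      # same item, same value: not a unique violation
    ("Q2", "10.1000/b", "normal"),
    ("Q3", "10.1000/b", "normal"),      # two items share a value: violation
    ("Q4", "10.1000/c", "normal"),
    ("Q5", "10.1000/c", "deprecated"),  # ignored when skip_deprecated is True
]

print(unique_value_violations(SAMPLE))
```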

KrBot and maxlag

Hi! I see from Special:Contributions/KrBot that this bot is currently editing at about 100 edits per minute, but maxlag is currently at 17. How do you account for this? See also Wikidata:Administrators'_noticeboard#WDQS_lag_is_terrible_(over_9_hours_now) and mw:Manual:Maxlag_parameter. Cheers, Bovlb (talk) 16:57, 10 February 2020 (UTC)
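
For readers unfamiliar with maxlag: a client adds `maxlag=N` to each API request, and when the servers are lagged by more than N seconds the API answers with an error whose code is `maxlag`, together with a `Retry-After` header. A well-behaved bot then waits and retries. The Python sketch below shows that loop under stated assumptions: `send` is a hypothetical callable that performs the HTTP request and returns `(json_body, headers)`, and the function name and retry limits are illustrative only.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes request parameters, returns (JSON body, headers).
Sender = Callable[[dict], Tuple[dict, Dict[str, str]]]


def call_with_maxlag(
    send: Sender,
    params: dict,
    maxlag: int = 5,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Send an API request with maxlag set, backing off while the server is lagged."""
    request = dict(params, maxlag=maxlag, format="json")
    for attempt in range(max_retries + 1):
        body, headers = send(request)
        if body.get("error", {}).get("code") != "maxlag":
            return body  # success, or an unrelated error for the caller to handle
        if attempt < max_retries:
            # Wait as long as the server asks; fall back to 5 seconds.
            sleep(float(headers.get("Retry-After", 5)))
    raise RuntimeError("server still lagged after %d retries" % max_retries)
```

Passing `send` and `sleep` in as arguments keeps the retry logic independent of any particular HTTP library and makes it easy to exercise without touching the live site.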

Wrong merge and consequences

Hi,

Last November, an IP wrongly merged brigand (Q20650523) and Dacoity (Q17176963). The merge has been undone, but in the meantime KrBot resolved the redirect (which was a good idea in theory but not in this specific case). Could KrBot now undo this batch https://tools.wmflabs.org/editgroups/b/KrBotResolvingRedirect/Q20650523_Q17176963/ ?

Cheers, VIGNERON (talk) 18:34, 11 February 2020 (UTC)

2020-02 KrBot2

Hello,

Could you run KrBot2 every 5 days? Visite fortuitement prolongée (talk) 14:19, 15 February 2020 (UTC)

Or at least, allow the bot to be triggered manually like ListeriaBot. That would be very useful, if you have time. Just noticed that the P7882 report is old. --Ysangkok (talk) 17:53, 27 February 2020 (UTC)
The bot cycle is ~5 days now. It runs automatically. My current attempt to reduce the time has failed. I hope the next attempt will be more successful. — Ivan A. Krestinin (talk) 18:08, 4 April 2020 (UTC)
Thank you very much. Visite fortuitement prolongée (talk) 22:39, 5 April 2020 (UTC)

Wrong ISBN-13 to ISBN-10 transfer

I have rolled back your update. You removed the correct ISBN-13 value and wrongly used it as a (wrong) ISBN-10 value. Geertivp (talk) 09:59, 18 February 2020 (UTC)
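
As background for this report: an ISBN-13 cannot simply be moved into an ISBN-10 field. Only ISBN-13 values with the 978 prefix have an ISBN-10 equivalent, and the check digit has to be recomputed. The Python sketch below shows the standard conversion; the function names are hypothetical and this is not a description of what the bot actually does.

```python
from typing import Optional


def isbn13_is_valid(isbn13: str) -> bool:
    """Check length, digits and the ISBN-13 checksum (weights 1, 3, 1, 3, ...)."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Return the unhyphenated ISBN-10 for a valid 978-prefixed ISBN-13, else None."""
    if not isbn13_is_valid(isbn13):
        return None
    digits = isbn13.replace("-", "")
    if not digits.startswith("978"):
        return None  # 979-prefixed ISBNs have no ISBN-10 form
    core = digits[3:12]  # drop the prefix and the ISBN-13 check digit
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-0-306-40615-7"))
```

A bot that validates both directions before editing (ISBN-13 checksum on input, recomputed ISBN-10 check digit on output) would leave a correct ISBN-13 statement in place and only add the derived ISBN-10.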

KrBot block

Hey Ivan, your bot has been ignoring the maxlag parameter since 2 PM today, so I have blocked it. You can see its edit pattern in these Grafana charts, particularly in the "Max Single User Edit Rate" panel. It is your bot which accounts for the ~100 edits/min in that chart, and it obviously does not stop during phases of high database load.

Since server resources are unfortunately very limited, I had to block the bot. All other bot operators do respect the maxlag parameter as indicated by the Wikidata:Bots policy, and it is only fair if you do so as well. Please let me know when you have implemented it properly, as the bot can then be unblocked again of course. In case of questions, feel free to ask. —MisterSynergy (talk) 17:48, 6 March 2020 (UTC)

Related to this, can you please collapse these edits into one edit? I assume you're using https://www.wikidata.org/w/api.php?action=help&modules=wbsetdescription , with https://www.wikidata.org/w/api.php?action=help&modules=wbeditentity you can just do it in one edit like in the example. Multichill (talk) 15:55, 7 March 2020 (UTC)
  •   Done, maxlag=5 is added, wbsetdescription is replaced with wbeditentity (except cases with conflicting descriptions). — Ivan A. Krestinin (talk) 17:50, 4 April 2020 (UTC)
    • Thanks, I have unblocked your bot account. —MisterSynergy (talk) 18:24, 4 April 2020 (UTC)
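
The change discussed in this thread, replacing several wbsetdescription calls with a single wbeditentity call, comes down to sending all descriptions in one `data` JSON object. The Python sketch below builds such a request. The function name and the sample descriptions are hypothetical, and a real request would also need authentication and a CSRF `token`, which are left out here.

```python
import json
from typing import Dict


def build_description_edit(entity_id: str, descriptions: Dict[str, str], summary: str = "") -> dict:
    """Build wbeditentity parameters that set several descriptions in one edit."""
    data = {
        "descriptions": {
            lang: {"language": lang, "value": text}
            for lang, text in descriptions.items()
        }
    }
    return {
        "action": "wbeditentity",
        "id": entity_id,
        "data": json.dumps(data, ensure_ascii=False),
        "summary": summary,
        "maxlag": 5,  # the value mentioned in this thread
        "format": "json",
        # "token": ...  # CSRF token required for a real edit; omitted in this sketch
    }


# Made-up example values (Q4115189 is the Wikidata sandbox item).
params = build_description_edit(
    "Q4115189",
    {"en": "sandbox item", "de": "Sandkasten-Objekt"},
    summary="set descriptions in one edit",
)
print(params["data"])
```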

Wikidata:Database reports/Constraint violations/P856

The last successful update to this page by your bot was over half a year ago. It should nowiki all links if it runs into the spam blacklist. (By the way, I think facebookcorewwwi.onion (Q24590047)’s constraint violation triggers the spam blacklist, but that’s not important, as the bot doesn’t need to detect it, just nowiki everything.) —Tacsipacsi (talk) 00:59, 23 March 2020 (UTC)
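
The fallback suggested here (wrap every link in nowiki when a save is rejected by the spam blacklist) could look roughly like the Python sketch below. The function name and the URL pattern are assumptions; a real report page may contain link forms that this simple regular expression does not cover.

```python
import re

# Rough pattern for bare or bracketed external URLs in wikitext (an assumption,
# not an exhaustive URL grammar).
URL_RE = re.compile(r"https?://[^\s\[\]<>|]+")


def nowiki_links(wikitext: str) -> str:
    """Wrap every external URL in <nowiki> so it is stored as plain text, not a link."""
    return URL_RE.sub(lambda m: "<nowiki>" + m.group(0) + "</nowiki>", wikitext)


print(nowiki_links("see http://example.org/x and [https://example.com label]"))
```

The bot would only need this on the retry path: try the normal save first, and apply `nowiki_links` to the page text only if the save is refused.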

Please remove **non**-violations from reports (green boxes)

The bot posts statistics about cases which are NOT violations. This complicates the task for checking properties as these properties then have links pointing to them as "Report/Pnnn/violations" when in fact there are none: we have to search each property by their Qid in some very long pages (very long because of violations reported there).

Please remove the "green" boxes for these report pages, only list the violations (red boxes). Notably for the generated tables for "Allowed types" listing all references (valid or not).

You may want to report the cases where no violations are detected on other pages instead (not "violation" pages), but I think it is just not needed at all.

Thanks. Verdy p (talk) 05:50, 23 March 2020 (UTC)

I think the green boxes are a quite useful feature, they help getting a picture about how the property is used. Maybe they could use external links instead of internal ones, although I don’t know how external links work with displaying labels (whether getting label generates a backlink, how much more complicated the module becomes with this extra feature etc.). —Tacsipacsi (talk) 00:01, 24 March 2020 (UTC)

Canadiana Authorities ID

Hi. I have seen your bot removing the suffixes E and F from the values. Please do not do that. The English entries with E are deprecated, as those have been merged into the NACO authority file. The French entries with F are still valid and are used as a base for the new "Canadian name authorities in French" file, which is part of VIAF. Example: [3] --Sotho Tal Ker (talk) 20:34, 15 April 2020 (UTC)

This job is stopped. — Ivan A. Krestinin (talk) 20:57, 15 April 2020 (UTC)

Wikidata:Database reports/Constraint violations/P345

Hello Ivan, your last update on 2020-04-20 used old data from 2020-04-13. With such long intervals between updates, it would be nice to have up-to-date data. At the moment it takes 2 updates to get data that is nearly 2 weeks old. That cannot be the point of an update. -- MovieFex (talk) 11:37, 21 April 2020 (UTC)

Hello, the current update interval is 5 days, unfortunately. — Ivan A. Krestinin (talk) 18:36, 25 April 2020 (UTC)
You do not understand what I'm trying to say. The last update was today (2020-05-05) with data from 2020-04-29. None of the corrections made since then were taken into account. Why didn't you take current data from today, or at least data no older than yesterday? Don't you see that the lists of constraint violations grow and grow? How should anyone work with an update that is completely outdated? -- MovieFex (talk) 22:22, 5 May 2020 (UTC)
The bot needs 5 days to process the data. So I understand the issue, but it is not simple to fix. — Ivan A. Krestinin (talk) 16:24, 8 May 2020 (UTC)
Hello, is it possible to show the labels the way User:Queryzo did here? -- MovieFex (talk) 18:17, 26 May 2020 (UTC)

category

Ivan, can you connect the Hebrew category קטגוריה:טקסונים שתוארו בידי תומאס הנרי האקסלי to Category:Taxa named by Thomas Henry Huxley, which exists in other languages? 2A01:6500:A051:379A:2008:67A5:30CE:7E4F 13:50, 23 April 2020 (UTC)
I have no idea what is wrong. It looks like a bug. It is better to discuss it here: MediaWiki_talk:Gadget-Merge.js#Unexpected_error_while_merging. — Ivan A. Krestinin (talk) 18:32, 25 April 2020 (UTC)

VIAF updates

Hi! A little question: I remember that KrBot, among other extremely useful functions, also updates VIAF ID (P214) removing deleted clusters and correcting redirects; in general, 1) how frequent are the updates 2) when was the last update and 3) when will the next update take place? Thank you very much also from @Bargioni:, --Epìdosis 14:52, 8 May 2020 (UTC)

Hello! The update frequency is limited by how often new dumps appear at http://viaf.org/viaf/data/. A new dump appeared today, so an update is in progress now. Usually this happens once per month. — Ivan A. Krestinin (talk) 16:11, 8 May 2020 (UTC)
Really great! Thank you very much! --Epìdosis 19:08, 9 May 2020 (UTC)
Thx, Ivan. -- Bargioni 🗣 09:48, 11 May 2020 (UTC)

Value Taipei, China (Q30940804) will be automatically replaced to value Chinese Taipei (Q216923)

Hi Ivan, I do not agree with the automatic deletion of Taiwan (Q865) from country for sport (P1532). KrBot replaces Taiwan with Chinese Taipei. The intent was to replace Taipei, China with Chinese Taipei. I do not agree with this one-sided following of Chinese politics and ignoring Taiwanese politics. --Florentyna (talk) 04:38, 14 May 2020 (UTC)

Hello, could you provide a link to the edit? — Ivan A. Krestinin (talk) 08:54, 14 May 2020 (UTC)
See for instance [4]. In the end it affects all people with occupation [P106] badminton player [Q13141064] and country Taiwan [Q865], where country for sport [P1532] always had both values: Taiwan [Q865] and Chinese Taipei [Q216923]. Taiwan was deleted from this property everywhere. --Florentyna (talk) 05:08, 15 May 2020 (UTC)
I disabled the autofix rules. — Ivan A. Krestinin (talk) 08:32, 15 May 2020 (UTC)
Thanks a lot! --Florentyna (talk) 09:15, 15 May 2020 (UTC)

GND ID replacement of redirected ids

Please stop [5]. The IDs are still valid and resolve. They are used in third party websites. Removing them breaks links to WD by P227 and the ability to see which IDs are merged in the GND database. Additionally, in the case above the inserted value was left with the "deprecated" label although it already existed as preferred, and the bot had been reverted before. Pinging @Kolja21: who works on GND. MrProperLawAndOrder (talk) 18:41, 18 May 2020 (UTC)

@Raymond: FYI: You made the same suggestion on deWP. --Kolja21 (talk) 19:22, 18 May 2020 (UTC)

Also, the edit has no edit group and no batch number, so how can one see all edits done in the same run? And can you in general provide more documentation of the bot runs? A page for each task? MrProperLawAndOrder (talk) 18:47, 18 May 2020 (UTC)

Thank you MrProperLawAndOrder and Kolja21 for bringing this up. I have been working on GND together with the "Deutsche Nationalbibliothek" for more than 10 years. The IDs are deprecated but still valid and resolve. This fact is important information for all users, incl. 3rd party users who use Wikidata as an authority control data hub. Raymond (talk) 19:44, 18 May 2020 (UTC)
@Kolja21, Raymond: I think they should even be imported by some bot. I don't know if this is in VIAF and KrBot can do it (monthly), or if it has to be done from a GND dump (not regular). Raymond, what is the exact terminology used by DNB for the merged/redirected values? Do they call it "deprecated"? Because in WD there are three levels; in the case above one value is preferred, one deprecated and no value normal. How to do this exactly should probably be discussed on the P227 talk page. Or better for all VIAF components, since other libraries probably also merge and redirect. MrProperLawAndOrder (talk) 20:49, 18 May 2020 (UTC)
Imho we don't need to import redirects. Many of them had a low cataloguing level or had been Tns. These IDs should only be kept in Wikidata when they were added with a source and used by other databases. --Kolja21 (talk) 20:56, 18 May 2020 (UTC)
Kolja21, the source would be GND. DtBio also stores them, I have seen that old IDs are redirected on their website. Very professional. But I think that that excludes Tns. When I proposed importing GND IDs, I only meant the redirects, not Tns. MrProperLawAndOrder (talk) 23:17, 24 May 2020 (UTC)

Similarly, I reverted you here: Q64633427. Your bot changed the value of a deprecated VIAF statement! Vojtěch Dostál (talk) 13:54, 24 May 2020 (UTC)

@Vojtěch Dostál: ongoing also with GND [6], breaking any resolver. MrProperLawAndOrder (talk) 23:17, 24 May 2020 (UTC)

BTW @Vojtěch Dostál: I would differentiate between:
--Kolja21 (talk) 00:34, 25 May 2020 (UTC)
@Kolja21 Good point, will do in the future. Vojtěch Dostál (talk) 05:47, 25 May 2020 (UTC)