About this board

Previous discussion was archived at: 2013, 2014, 2015, 2016

India statistics page

Titodutta (talkcontribs)
Pasleim (talkcontribs)
Gnoeee (talkcontribs)

Hello @Pasleim, since the bot has not been updating the page for quite some years, I'm trying to update the list based on the Lang statistics by Ibrahem once it gets updated. So could you stop the bot from editing the page?

MisterSynergy (talkcontribs)
Reply to "India statistics page"

Deltabot inverse claim = self reference

Lupin~fr (talkcontribs)
MisterSynergy (talkcontribs)
Lupin~fr (talkcontribs)

Thank you, I'll check the documentation of this property :)

Reply to "Deltabot inverse claim = self reference"

Number of main statements by property

Facenapalm (talkcontribs)
Facenapalm (talkcontribs)

On second thought, formatnum looks like a valid workaround. Still, I think it would be an improvement if the template returned only the number. :)

MisterSynergy (talkcontribs)

Well, it's a valid request. I have added onlyinclude tags to the script; as far as I am aware, this should solve the problem. A sketch of the idea follows below.
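
For illustration, a minimal Python sketch of the idea (hypothetical function and page text, not the actual script code): wrapping the bare number in onlyinclude tags means that a transclusion of the report page renders only that number.

def build_report_wikitext(count: int, details: str) -> str:
    """Wrap the bare count in onlyinclude tags: when the page is
    transcluded, only the content inside the tags is rendered,
    so callers get a plain number that works with {{formatnum:}}."""
    return f"<onlyinclude>{count}</onlyinclude>\n\n{details}"

print(build_report_wikitext(1234567, "Updated by the statistics script."))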

Facenapalm (talkcontribs)

Yup, that works too. Thanks for the quick fix!

Facenapalm (talkcontribs)
MisterSynergy (talkcontribs)

Yes, those are updated by the same script and they did receive this update as well.

Facenapalm (talkcontribs)

Great. Thanks once again!

Reply to "Number of main statements by property"

Humans with missing claims

Pallor (talkcontribs)
MisterSynergy (talkcontribs)

Thank you for the notice. There was indeed something wrong with PLbot, so I killed the stuck job that prevented others from running. It will catch up gradually now…

Pallor (talkcontribs)

Thank you!

MisterSynergy (talkcontribs)

It has almost caught up. "Humans with missing claims" is the last job to finish, but it is running currently.

Reply to "Humans with missing claims"

Wikidata:Requests for deletions

Fralambert (talkcontribs)
MisterSynergy (talkcontribs)

I've restarted it. Somehow it got stuck.

This post was hidden by MisterSynergy (history)
Reply to "Wikidata:Requests for deletions"

Complex constraints

Infrastruktur (talkcontribs)

I was thinking about going over and fixing the complex constraints that aren't working. Would it be possible to have your bot log which SPARQL queries either have errors or time out?

MisterSynergy (talkcontribs)

There is currently no logging in place. If anything goes wrong during querying, the bot just continues with the next constraint and processes it.

However, logging can easily be added; something along the lines of the sketch below. I will report back here when I can share results.
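
For illustration, a minimal sketch of what such logging could look like (hypothetical names throughout; run_query stands in for the bot's actual WDQS call):

import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("complex-constraints")

def run_query(sparql: str) -> list:
    """Placeholder for the bot's actual WDQS request."""
    raise TimeoutError("query timed out")  # simulate a failing query

# hypothetical (property, query) pairs harvested from the constraint templates
constraints = [("P1234", "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1")]

for prop, sparql in constraints:
    try:
        results = run_query(sparql)
    except Exception as exc:
        # previously the bot silently moved on; logging makes failures visible
        log.warning("complex constraint on %s failed: %s", prop, exc)
        continue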

MisterSynergy (talkcontribs)

Okay, do you have experience with wiki template coding?

There is a new version of this complex constraint bot coming, and it will present more details about the processing. Two places in particular should benefit from this:

  1. Rows in Wikidata:Database reports/Complex constraints will include additional information. This requires {{TR complex constraint}} to be extended with the following parameters:
    • query_http_status (integer value)
    • query_time (float value)
    • query_timeout ("True" or "False" currently, but can be made human-readable)
    • sparql (string, to link to WDQS)
    • errors (string, kinda bulky)
  2. I can also add a new (i.e. to be created) template containing similar information to each section on complex constraint violation report pages. All meta information about the constraint evaluation should be displayed inside this template.
    • I am looking at Template:Complex constraint section header as a page title, but this can easily be changed
    • It should probably be made translatable just as Template:Complex constraint is.
    • Parameters are:
      • label (string)
      • description (string)
      • sparql (string)
      • result_cnt (int)
      • query_http_status (int)
      • query_time (float)
      • query_timeout ("True" or "False" currently, but can be made human-readable)
      • errors (string, kinda bulky)

If you cannot or do not want to help, I will certainly figure this out by myself somehow, but it will take some more time.

Infrastruktur (talkcontribs)

Sure, I can help with templates. If you were thinking of including the whole call stack in the error message, that is IMO too much; errors should ideally not occur that often anyway, so it is better to just log the error as a boolean value. People can run the query themselves to see what's going on. As for the query parameter, I'm not sure whether I need certain characters ("{}|") escaped, or whether wrapping it in nowiki tags would work. Try the nowiki tags first, maybe? When you're ready, just post some sample output. I have two subpages under my user named Sandbox5 and Sandbox6 which you can use, or alternatively https://paste.toolforge.org/ .

MisterSynergy (talkcontribs)

I am currently thinking like this:

On Wikidata:Database reports/Complex constraints using {{TR complex constraint}}, there will be more parameter values available to the template. The transclusion will look like this:

{{TR complex constraint
|p=Pxxx
|label=(as given in Template:Complex_constraint)
|description=(as given in Template:Complex_constraint)
|violations=123
|query_http_status=200
|query_time=5.24
|query_timeout=False
|sparql=(as given in Template:Complex_constraint)
|errors=(short description what went wrong, if this is the case)
}}

We can then decide which info the template actually uses for display. I would not mind if some information is simply not used for display, at least in cases where everything went okay.

The template-to-be-created Template:Complex constraint section header works similarly; it should replace all metadata in the violation report sections. I think it would be best to place everything into some sort of box above the actual list of constraint violations. Again, I would not mind if some params passed to the template are not displayed.

The pipe character in the sparql parameter of Template:Complex constraint needs to be masked with {{!}} anyway; otherwise query execution fails (see the sketch below). This is an error that I have now seen a couple of times, and it is not well documented at the moment.
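
As a sketch of what this means for the bot (a hypothetical helper, assuming the bot receives the parameter as raw wikitext): the {{!}} masks have to be turned back into literal pipes before the query is sent to WDQS.

def unmask_pipes(sparql_param: str) -> str:
    """Restore pipes that were masked as {{!}} so the template
    parameter parses; WDQS needs the literal | characters."""
    return sparql_param.replace("{{!}}", "|")

# e.g. an alternative property path written with a masked pipe:
print(unmask_pipes("SELECT ?item WHERE { ?item wdt:P31{{!}}wdt:P279 wd:Q5 }"))
# -> SELECT ?item WHERE { ?item wdt:P31|wdt:P279 wd:Q5 }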

MisterSynergy (talkcontribs)
Infrastruktur (talkcontribs)

You already know templates pretty well, I see. I didn't do much: I made the look consistent and avoided printing the elapsed time if the query timed out. "results_cnt" should be changed to "violations" so the naming is consistent.

I made a sketch of the summary page at User:Infrastruktur/Sandbox5. Horizontal space is limited, so I only included the most important things. Feel free to edit the page directly and try things out.

How will DeltaBot be updated if Pasleim is not around?

MisterSynergy (talkcontribs)

Yeah, I will change the "results_cnt" parameter to "violations".

I think the summary page would benefit from having (some) more columns. This would help to keep it sortable by different criteria.

Regarding Deltabot: I have write access to its Toolforge space, which is sufficient to update the source code. Pasleim made me co-maintainer of all three of his Toolforge tools some months ago, since he is barely around anymore.

MisterSynergy (talkcontribs)

Alright: after I managed to finish a dry run without crashes or exceptions, I pushed the rewritten code to the deltabot account, which is now updating all complex constraint violation reports. We can still tweak the templates, and of course the bot source code as well. The bot is scheduled to run every night; a complete run takes around 4 hours.

MisterSynergy (talkcontribs)

The first iteration with the new complex constraints bot has finished, and everything looks good. We can now also migrate the summary page template as the information is all available as template parameters.

I am not sure, however, whether the summary page might be a bit heavy now. It grew from 224k to 693k in wikitext size, primarily due to the inclusion of all SPARQL queries. It could be useful to have these available directly in order to debug problematic cases, but I hope the page remains usable. What's your opinion?

Infrastruktur (talkcontribs)

Nice. As you say, it's convenient, and for the individual report pages we should keep the query. The summary page, however, would exceed the default maximum page size (2M) if the number of complex queries grew by a factor of 3 (693k × 3 ≈ 2.1M), which is not enough wiggle room. Instead of including the query, append the text " (query)" to the link to the talk pages to make it clear where to find it. It's two extra clicks, but that's fine.

I noticed the error message about parsing JSON. It should not be necessary to try to parse a reply as JSON if the HTTP status code is different from 200. AFAIK there are a couple of query outcome modes (a classification sketch follows after the list):

- HTTP 200, no failure: returns valid JSON.

- Syntax errors etc.: result in an immediate error message.

- Runtime errors like stack or heap overflows: result in a delayed error message.

- Timeout during compute: results in a delayed error message.

- Timeout during transmission of the result after compute is done: results in truncated (invalid) JSON plus an error message.

- (POST) request exceeds 1 MiB including protocol overhead: the Varnish caching proxy returns an HTTP 500-ish error; there is no error from Blazegraph.

- Special queries like ASK, CONSTRUCT, DESCRIBE: the CONSTRUCT one returns data in an RDF serialization format.

- Explain mode returns HTML.
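
A rough Python sketch of classifying these outcomes client-side (an illustration under the observations above, with a hypothetical function name; not the bot's actual code):

import requests

WDQS = "https://query.wikidata.org/sparql"

def classify_outcome(query: str) -> str:
    """Heuristically classify a WDQS query outcome."""
    try:
        r = requests.post(WDQS, data={"query": query, "format": "json"}, timeout=90)
    except requests.RequestException:
        return "no usable response (network error or client-side timeout)"
    if r.status_code == 400:
        return "syntax error (malformed query)"
    if r.status_code != 200:
        return f"server-side error (HTTP {r.status_code})"
    try:
        r.json()  # only parse JSON once we know the status is 200
    except ValueError:
        return "HTTP 200 but invalid JSON (likely truncated during transmission)"
    return "ok"

print(classify_outcome("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"))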

MisterSynergy (talkcontribs)

I agree, we do not need the query on the summary page. I just went through the summary and fixed a particular type of error (unmasked pipes and unrecognized template parameters), and I realized that I do not need the query on the report page. The SPARQL queries will disappear from the summary page on the next successful run.

On HTTP statuses and errors: this is all a bit of a heuristic procedure.

  • query_time is measured by the script.
  • The HTTP status code is read from the server response; the only exception is "HTTP status code=0" (very rare), which means the bot crashed during the request, so there is no useful response object to read a status code from.
  • query_timeout=True is inferred if query_time>60 (seconds) *and* there is either one of these "status code=0" situations, or the response is not valid JSON (i.e. cannot be parsed); see the sketch below. At this point I cannot distinguish "timeout during compute" from "timeout during transmission". Unfortunately, the server does not really seem to report what went wrong in many cases, so one has to infer it client-side. The existing documentation is not very useful.
  • If you happen to identify "syntax errors", "runtime errors", and the different "timeout errors" from the responses (HTTP status, query time, received content, etc.), then please let me know. I could try to report these explicitly once I know what to look for.
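
A compact sketch of that inference (hypothetical names, mirroring the bullet above):

def infer_timeout(query_time: float, http_status: int, json_valid: bool) -> bool:
    """query_timeout=True if the query ran longer than 60 s *and*
    either the request crashed (status 0) or the body was not valid JSON."""
    return query_time > 60 and (http_status == 0 or not json_valid)

print(infer_timeout(61.2, 0, False))  # True
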
Infrastruktur (talkcontribs)

Some of the HTTP response headers could help here; it might be worth asking one of the server technicians about it.

For the case of an oversized request, I got an HTTP 502 with the response header "server: ATS/9.1.4" (Apache Traffic Server), so the request didn't reach Blazegraph, as shown by the lack of an "x-served-by: wdqsXXXX" header. Requests that reach WDQS tend to have "server: nginx/1.14.2" in them, plus an x-served-by header indicating which server in the cluster served the query.

For the case of a timeout during transmission: I submitted a request for all triples and got an nginx response with HTTP status code 200 and truncated JSON. For the first request I got an error message appended, but for the second request I did not, which is weird. The response headers did not indicate that the second request was cached (x-cache-status: local-hit as opposed to x-cache-status: pass), so it is hard to say what is going on here. Anyway, it is very problematic that one can receive a truncated response with no clue that anything went wrong, other than checking whether the JSON is valid.

Timeouts during compute seem to give 500 responses; other errors likely use that status code as well.

Syntax errors seem to reliably give 400 responses.
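
For reference, a small sketch of how to inspect the headers mentioned above (header names as observed in this thread; values will vary):

import requests

r = requests.post("https://query.wikidata.org/sparql",
                  data={"query": "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
                        "format": "json"})
print(r.status_code)
print(r.headers.get("server"))          # e.g. nginx/... if the request reached WDQS
print(r.headers.get("x-served-by"))     # which wdqsXXXX host served it, if any
print(r.headers.get("x-cache-status"))  # e.g. pass vs local-hit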

MisterSynergy (talkcontribs)

Thank you! What a mess :-/

Starting from here, I will try to familiarize myself a little more with these cases. Maybe this finds its way into the source code as well, at least partially.

MisterSynergy (talkcontribs)
  • "malformed query" is now being reported by the bot.
  • it does not distinguish between "timeout during transmission" and "timeout during querying"; I don't think this really matters
  • Overall, the situation looks already much better now. There are 33 (or so) complex constraints with issues, mostly timeouts, out of a total of ~1200 defined complex constraints. Looking at them, most appear to be very difficult to fix, though.
  • If you think the bot can still be improved, please let me know. Thank you for your input, again!
Infrastruktur (talkcontribs)

Thanks for your hard work, it makes a big difference.

Reply to "Complex constraints"

Wikidata:Property proposal/Computing

Laftp0 (talkcontribs)
MisterSynergy (talkcontribs)

I'm guessing you are referring to Wikidata:Property proposal/Overview?! The subpage was indeed missing from there, but I have just added it and run an update of the report. It will be included in every update from now on as well.

Laftp0 (talkcontribs)

Yes, thank you!

Reply to "Wikidata:Property proposal/Computing"

Q189266

176.37.192.236 (talkcontribs)

Wrong wiki link: it points to :ru:Великая Отечественная война ("Great Patriotic War") instead of :ru:Восточный_фронт_(Вторая_мировая_война) ("Eastern Front (World War II)").

Reply to "Q189266"

PLbot comparison, not offer ...

Billinghurst (talkcontribs)
Pasleim (talkcontribs)

Unfortunately, this is quite hard to achieve.

It would mean that either I create a list of all these items with SPARQL, which is almost impossible because of the number of items, or I read the statements of each single item on the lists, which is extremely time-consuming.

Reply to "PLbot comparison, not offer ..."

Template:POTD

Nk (talkcontribs)

For info, PLbot missed a few updates at Q14334596.

Pasleim (talkcontribs)

I was able to identify and resolve a problem in my code.

I expect that the page will be updated more reliably now.

Reply to "Template:POTD"