Hi Magnus, https://sourcemd.toolforge.org/index_old.php has not been working for the past week or so. Hoping it will be up again soon as it is such a fabulous workhorse. MargaretRDonald (talk) 20:26, 23 September 2023 (UTC)
Hello Magnus, a blocked user has been registering new usernames regularly for weeks and is making thousands of erroneous edits with Mix'n'match. There is no improvement in sight, see User talk:Lizofon#Erroneous batch from Mix'n'Match. Can you restrict the use of Mix'n'match for newly created accounts?
Comment: This could be block evasion by the indefinitely blocked user Matlin. See this thread for more details. I'm also wondering whether there is any way to restrict access to Mix'n'Match for Matlin and his new sockpuppet accounts. Regards
replied there
I posted there that the code changes didn't seem to work, but didn't get a response. Here's another case where an account registered 4 days ago is doing stuff on MnM: https://mix-n-match.toolforge.org/#/rc/5708 (pretty sure it's another of his sockpuppets)
Not working for the 3rd day - https://mix-n-match.toolforge.org/#/jobs
Yes, Toolforge is switching from the compute grid to Kubernetes, and guess who has to adapt dozens of job scripts? Work in progress...
Should be running now, at least some of them
It looks like they moved once and then got stuck again overnight :(
The jobs are stuck again, for almost a week now :(
I was travelling. Restarted the jobs now.
Stuck again since 14 January.
kicked it
Jobs are stuck again after the glitch a few days ago. For some catalogs, "automatch by search" has now been in RUNNING status for 4 days without progressing at all. It used to take a few hours to fully complete and find ~30k preliminary matches; now it has been showing 34 preliminary matches for days.
looking into it
I changed a few things, seems to work better now
It's stuck again: the numbers for the RUNNING jobs haven't moved at all in 8+ hours.
After 48 hours I can confirm that the jobs are running very unstably. Most of the time they make no progress (the preliminary-match numbers don't change), then at some point they move for a little while before freezing again. It looks like they actually worked for only an hour or two over the last two days.
Deployed an update, let's see if that does the trick
It seems that didn't work and the queue has stopped again: none of the RUNNING "automatch by search" jobs has made any progress in the last hour (judging by the preliminary-match numbers).
Yeah, the situation hasn't changed. Since my previous post the numbers stayed flat until yesterday morning, when there was a jump in progress; now more than 24 hours have passed without any change again and the queue is not progressing.
The problem seems to persist. Is there anything you can do?
The queue progressed today for the first time since 31 July (but now seems to be stuck again). Meanwhile, autoscrape/update from file jobs have been stuck since 14 July - none of them have been completed since then.
Something happened yesterday and the job queue moved, but now some jobs are invisible, e.g. my own new catalogue #6010. The catalogue itself still says "job from 2023-08-15: update from tabbed file", but it doesn't show up in the general job list. Instead, newer catalogues are given "High Priority" for webscraping.
Just back from holiday now, looking into it. Problem is, the jobs run well for hours/days then stop for no obvious reason. I may have to restart the jobs several times to test things.
Welcome! :) Nice to see you back here, but even nicer to know you had yourself a holiday. Hope it was relaxing (even though I kept bugging you on various channels).
I don't know how much this helps or whether it's the cause, but I've noticed that when the queue hangs, the properties whose number of values can cause a timeout always end up in RUNNING. Right now NUKAT has the timeout error, and now there's also IMDb, which has a timeout problem due to 975k IDs on WD. Currently MnM can't open the IMDb manual sync page - https://mix-n-match.toolforge.org/#/sync/676 - because the WD query service almost always times out; MnM also always times out via the API response, while the MnM page is stuck on infinite loading. UPD: after I wrote this it looks like there was a reset and IMDb went from RUNNING to TODO.
UPD: never mind, the queue is now stuck on fairly small catalogues, both on the MnM and the WD side.
The queue has just been cleared of everything except autoscrape jobs. And it looks like autoscraping doesn't work anymore - all such jobs are just stuck in RUNNING status. #4712 and #4708 do a pretty simple ID grab from a single HTML or JSON file that used to complete in a few seconds, but they are now sitting idle.
The queue got clogged again a couple of days ago with five autoscrapes that won't complete. Is it possible to disable autoscrape jobs entirely so that other jobs can go through until autoscraping gets fixed?
I bumped the autoscrape task size up one level and restarted. It should run the others first now.
Apparently autoscrape tasks share slots with "update from file" tasks? They are still blocking those from going through.
No, but the priority assignment is a bit complicated. I have tuned it a bit, should be better now.
Seems to be fixed. Thanks!
Hi, can you please remove/stop the scraper from Mixnmatch:5951? It will likely fail again and only waste resources. I scraped on my own computer and imported manually.
Done
Hi Magnus,
I've just added a new set for Te Papa agent IDs, which supersedes a set from 2017. Can the old set be taken down? Thanks heaps.
Yes, https://mix-n-match.toolforge.org/#/catalog/362 deactivated.
Thanks Epidosis! Just reopening this because I messed up the IDs on the new dataset and think it'd be easiest if I loaded it fresh. Can you please also remove this one?
Hi Magnus, I've tried to create a catalogue for Mythoskop (https://mix-n-match.toolforge.org/#/catalog/6010). My CSV file is fine, but the catalogue turns up empty. Is it possible that MnM "loses" submitted CSV files if the jobs stay stuck for too long? I waited a fortnight the first time around, and I resubmitted my file last night.
As always, your help is much appreciated. In friendship, J.
Update: The catalogue is now online, but it has only 317 entries. There should be ca. 2,000.
working on it...
There was a regression where only "autoq" and not "q" was allowed for pre-matched entities. Fixed now. I believe there might still be a few missing, but that might be a UTF-8 encoding issue in the data file.
Is UTF-8 the desired format?
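(In case encoding turns out to be the culprit: a minimal sketch for checking whether a data file is valid UTF-8 and, if not, where the first bad byte sits. The filename is made up for illustration.)

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Hypothetical filename for illustration only.
    let bytes = fs::read("mythoskop.csv")?;
    match std::str::from_utf8(&bytes) {
        Ok(_) => println!("File is valid UTF-8"),
        // valid_up_to() gives the byte offset of the first invalid sequence.
        Err(e) => println!("Invalid UTF-8 starting at byte offset {}", e.valid_up_to()),
    }
    Ok(())
}
```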
There are still a few missing, but I don't know which ones. There should be 2,080 entries, MnM says 1,926. Should I send you the CSV file for reference?
yes please
Sent.
Hi Magnus! I am not sure what to do next. Should I put the description in my CSV file in "" quotation marks and re-upload?
I think that might fix it. Untested, obviously.
Thank you! It seems I have a lot to learn about CSV files. I will give it a try.
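(A minimal sketch of what "putting the descriptions in quotation marks" could look like when generating the tab-separated file programmatically, assuming the Rust csv crate; the column layout and filename are illustrative only, not the documented Mix'n'match import format.)

```rust
// Assumes the csv crate (csv = "1") as a dependency.
use csv::{QuoteStyle, WriterBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Column layout and filename are illustrative only.
    let mut wtr = WriterBuilder::new()
        .delimiter(b'\t')                // tab-separated ("tabbed file")
        .quote_style(QuoteStyle::Always) // wrap every field, including descriptions, in "..."
        .from_path("mythoskop.tsv")?;
    wtr.write_record(&["id", "name", "description"])?;
    wtr.write_record(&["m-0001", "Achilleus", "Greek hero, son of Peleus and Thetis"])?;
    wtr.flush()?;
    Ok(())
}
```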
Hi Magnus,
a week ago I uploaded the CSV file again with "" around the strings. Yesterday the MnM job queue finally fed the new tabbed file through. Unfortunately, the catalogue still has the same low number of entries (1,926, so more than 100 short). It's the same problem with MANTO.
I don't understand what goes wrong there. Could you perhaps take another look at both files? I'll send them to you again via email.
Greetings, Magnus!
Your bot runs tirelessly, but some pages have nevertheless lain fallow for a long time (e.g. the very useful report Wikidata:WikiProject Ancient Greece/reports/Set of mythological Greek characters). Is there any intended regularity to the updates (e.g. the report gets refreshed every three months)? If so, could you make that transparent?
In friendship,
Jonathan
Is it possible to make a catalog for Property:P6328?
Made it through manual scraping https://mix-n-match.toolforge.org/#/catalog/6031
Hi Magnus. I've previously scraped lots of sets using lookahead like
<li>(([^<]|<(?!/li>))*)</li>
(or the equivalent for a table row) to get a regex match for each item. I made a simple scraper at https://mix-n-match.toolforge.org/#/catalog/6013 but it failed, because a new requirement prohibits lookahead. The job log error reads:
regex parse error: <li>(([^<]|<(?!/li>))*)</li> ^^^ error: look-around, including look-ahead and look-behind, is not supported |
But I can't figure out a good way of doing this without lookahead. Any suggestions? How important is it to prohibit lookahead?
Hi, the reason for this is that I rewrote the background jobs in Rust, and the default Rust regex crate does not support lookahead. There is an alternative regex crate, but it's not a drop-in replacement, and I haven't had time to switch over yet.
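(For what it's worth: the default Rust regex crate does support lazy quantifiers, so a lookahead-free pattern such as (?s)<li>(.*?)</li> should capture the same items, since lazy matching stops at the first </li>, which is what the lookahead version effectively does as well. A minimal sketch, with a made-up HTML snippet:)

```rust
// Assumes the regex crate (regex = "1") as a dependency.
use regex::Regex;

fn main() {
    // Made-up HTML snippet for illustration only.
    let html = r#"<ul><li><a href="/id/42">Alpha</a></li>
<li><a href="/id/43">Beta</a></li></ul>"#;

    // (?s) lets `.` match newlines inside multi-line list items;
    // the lazy `.*?` stops at the first </li>, so no look-around is needed.
    let re = Regex::new(r"(?s)<li>(.*?)</li>").unwrap();
    for cap in re.captures_iter(html) {
        println!("{}", &cap[1]);
    }
}
```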
Hi, could you please check why this template doesn't work on pawiki?