About this board

Babel user information
de-N This user speaks German as a native language.
en-4 This user has near native speaker knowledge of English.

This page uses the Structured Discussions extension of MediaWiki (SD, formerly known as “Flow”). I think we really need something like this instead of the classical discussion approach, but I am also aware that not everything works smoothly in SD yet.

If you struggle to use this discussion page perfectly, do not worry and just leave a messy or broken comment for me. You do not need to figure out how to write a perfectly formatted comment with lots of trial-and-error edits. I will hopefully figure out what is going on; otherwise I will ask you. Thanks!

Btw. SD comments do not need to be signed, since the software manages all contributions and puts user names and timestamps above them. Experienced users might also like the wikitext mode of SD, which can be activated in the lower right corner of the input field.

Previous discussion was archived at User talk:MisterSynergy/Archive 1 on 2015-11-09.

Cl3phact0 (talkcontribs)

Hello MisterSynergy, I'm wondering if and how I erred in the creation of these items: Q117199095, Q117199128, Q117199159, Q117199254, and Q117205166. I had created these in preparation for en:wp drafts on the relevant subjects and planned to fill in references bit by bit. (I'm mostly editing on Wikipedia, but have become more active here too.) Very receptive to advice and guidance. Cheers

PS: "Q:" may not be the best shorthand for "Question" in this forum. Apologies for the ambiguity.

MisterSynergy (talkcontribs)

I have restored these items: Q117199095, Q117199128, Q117199159, Q117199254, Q117205166.

In their current form, they all qualify for immediate deletion again. So please make them compliant with the notability policy at Wikidata:Notability. If there are no sitelinks (yet), please add external references to serious sources so that other editors can easily identify what these items are about.

Cl3phact0 (talkcontribs)

Thank you, I will try to spend some time on this over the next few days in order to bring the items into compliance. Cheers

Cl3phact0 (talkcontribs)

I've filled in some external references for Q117199254 (Najla El Zein). Please have a look and let me know if more is needed. If all is well, I will also take care of the others in the same manner per above. (Again, my intention is to draft these articles and/or bring them to the attention of the WiR workgroup in the hope that another editor will draft them.)

MisterSynergy (talkcontribs)

Yes, this looks good.

Cl3phact0 (talkcontribs)

Ok, great. Thank you. I'll proceed. (Apologies for not doing so more promptly.)

Cl3phact0 (talkcontribs)

I have now filled in a fair amount of reference data for all but the last (Q117205166 – which will be an en:wp category: ÉCAL alumni), and will continue to improve the above too. I will also try to find time to create the missing category this weekend. In future, please drop me a note if any of my additions are incomplete or otherwise in need of improvement. (I will never knowingly add anything here that shouldn't be added, though sometimes I may work in a manner that appears out of sequence.) Thanks again

Reply to "Q: re: deleted items"
Infrastruktur (talkcontribs)

For the last month I see 2880 edits by people who were banned at some point in the last year. These edits are interesting only if there is more than one edit per user, sooo... user aggregation. I could supply a patch for your excellent and possibly underappreciated tool if you wish. I never announce anything I'm working on. And you still refuse to blow your own trumpet, I notice. You make good stuff and should be proud of it.

MisterSynergy (talkcontribs)

Yeah the tool…

I still consider it to be vastly unfinished. There are plans for a much more responsive and interactive UI that uses JavaScript to fetch data from an API, but that API itself is still to be developed. The tool is also supposed to provide in-tool patrolling functionality at some point. None of that is overly complex, but it needs some effort, and as long as the tool is as unfinished as it is now, there is little point in advertising it to users. But time is scarce, and I still have a lot to do with all the bots I am maintaining.

Re. user aggregation: I have never looked into edits by formerly blocked users, so I'd be interested to look into this. I have no idea what the reasons for their blocks were, and whether their editing is okay now. If you are willing, you can send a wikimail via Special:EmailUser/MisterSynergy with a couple of examples that I could have a look at. I think we do not need to bring anyone to public attention at this point.

Infrastruktur (talkcontribs)

Those users that returned to vandalize have already been dealt with. But since this will keep happening, having an overview will be useful.

The original script is located in the patrolling folder in my PAWS area.

I downloaded the sources for WDPD to my PAWS area. There were some exact version requirements for libraries, instances of hardcoded paths, and directories that needed to be created manually, so it's not exactly made to be portable. I think I should be able to get the backend to complete without crashing, so I can start working on having it query the block log and output a tabular data file for unpatrolled edits by previously blocked users.
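
For the block-log query, something like this minimal pywikibot sketch is what I have in mind (the output file and the exact cutoff handling are placeholders, not settled):

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')

# Collect users with a block or reblock log entry; logevents() yields
# entries newest first, so `total` caps how far back we look.
blocked_users = set()
for entry in site.logevents(logtype='block', total=500):
    if entry.action() in ('block', 'reblock'):
        blocked_users.add(entry.page().title(with_ns=False))

# Dump a simple one-column file for the backend to pick up.
with open('formerly_blocked_users.tsv', 'w') as fh:
    for user in sorted(blocked_users):
        fh.write(f'{user}\n')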

MisterSynergy (talkcontribs)

Yeah it is surely not packaged ideally yet :-D

As you suggest, it is probably not too difficult to query all (formerly) blocked users into a dataframe and merge it appropriately with the master dataframe of the backend script.
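
For illustration, a merge along these lines is what I would expect; the column names here are made up for the example, not the ones actually used by the backend:

import pandas as pd

# Toy stand-ins: `edits` for the backend's master dataframe of
# unpatrolled edits, `blocked` for the formerly blocked users.
edits = pd.DataFrame({'user': ['A', 'B', 'C'], 'rev_id': [1, 2, 3]})
blocked = pd.DataFrame({'user': ['B'], 'block_reason': ['vandalism']})

# A left merge keeps all edits and flags those by formerly blocked users.
merged = edits.merge(blocked, on='user', how='left')
merged['formerly_blocked'] = merged['block_reason'].notna()
print(merged)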

A potentially even more interesting idea would be to look at IPs and IP ranges that were formerly blocked and show activity in the recent changes table.

Reply to "Turns out banned people return"
Oravrattas (talkcontribs)
MisterSynergy (talkcontribs)
Reply to "Please restore Q118111705"

Restore the deletion of Bahasa Banjar Kuala (Q118356270)

Fexpr (talkcontribs)

Hi, I just had a discussion with contributors from the Banjarese-speaking community (@Volstand and @Arcuscloud). They have just started a project to put Banjarese lexemes into Wikidata Lexeme. We found out recently that the item for Bahasa Banjar Kuala, which is currently linked from the lemma variants of Banjarese lexemes (like this one: Lexeme:L1119292), was deleted because there was no information stored in that item. Could you help us restore this item, Q118356270? We are going to fill in the necessary information as soon as the item is restored. Thank you!

MisterSynergy (talkcontribs)
Fexpr (talkcontribs)

Thank you!

Reply to "Restore the deletion of Bahasa Banjar Kuala (Q118356270)"

hello - I need competent help again

Qwertzu111111 (talkcontribs)
MisterSynergy (talkcontribs)

Hello Qwertzu111111, everything is fine. There were two items for the church building: Q113632884 with the Commons link, and Q99317708 with the Wikipedia link. You merged the two, so that now only Q99317708 exists, with all the data and both sitelinks. That is exactly how it should be :-)

Qwertzu111111 (talkcontribs)

thanks

Qwertzu111111 (talkcontribs)
Reply to "hallo - brauche mal wieder kompetente Hilfe"
Laxeril (talkcontribs)

Hello! I wanted to discuss the list generated by your bot, specifically Wikidata:Database reports/Sitelink to redirect with unconnected target. First and foremost, thank you for your valuable work on this project! I have a query regarding certain items that appear on this list despite having been deleted quite some time ago. Examples include Q109284827, Q21789694, Q21789728, Q20650307, Q21789732, Q19530760. As a developer myself, I'm curious whether this is a cache problem, considering that these items were deleted back in 2022. I've been reviewing your code on GitHub, but I haven't been able to identify the reason behind this "failure". I apologize for my curiosity, and I hope you don't mind my inquiry. Wishing you a fantastic weekend!

MisterSynergy (talkcontribs)

Hey Laxeril, the relevant code is actually here. The reason why these long-deleted cases are in the results is that I am querying from a secondary data source, specifically from the page_props table of the client wikis. Consider Q109284827 with the redirect sitelink commons:Slums in Jakarta, for instance (the first result from the current list), which is still listed for that redirect at Wikimedia Commons:

MariaDB [commonswiki_p]> SELECT page_id, page_namespace, page_title, page_is_redirect, pp_propname, pp_value FROM page JOIN page_props ON page_id=pp_page WHERE pp_propname='wikibase_item' AND pp_value='Q109284827';
+-----------+----------------+------------------+------------------+---------------+------------+
| page_id   | page_namespace | page_title       | page_is_redirect | pp_propname   | pp_value   |
+-----------+----------------+------------------+------------------+---------------+------------+
| 122452632 |              0 | Slums_in_Jakarta |                1 | wikibase_item | Q109284827 |
+-----------+----------------+------------------+------------------+---------------+------------+

This entry should not be there, since the item Q109284827 was deleted a long time ago (October 2022), but the page_props table in client wikis is not always up to date. What helps in these cases is to touch (null-edit) the page on the client wiki so that the page_props table gets refreshed.
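
For reference, such a null-edit boils down to something like this with pywikibot (using the example page from above; the account running it needs to be logged in):

import pywikibot

# Null-edit the client-wiki page so that its page_props row gets rebuilt.
site = pywikibot.Site('commons', 'commons')
page = pywikibot.Page(site, 'Slums in Jakarta')
page.touch()  # re-saves the page unchanged, forcing a re-parse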

The bot is actually supposed to do that, but for some reason it does not get to this point in some cases. This needs some investigation, probably around this part of the source code: https://github.com/MisterSynergy/redirect_sitelink_badges/blob/10e25168b026a9f12d537a24f57e6c069c1a8dfa/main.py#L631. It's not a major issue, however, since there are only a few such cases in the entire report (for all Wikimedia projects).

MisterSynergy (talkcontribs)

Okay, line 631 is not relevant for sitelinks to redirects with an unconnected sitelink target (as in these cases). The filter in line 624 prevents this (particularly the df['target_qid'].notna() condition).

In order to make the bot touch such pages on client wikis, I would need to add another check in the write_unconnected_redirect_target_report function where this report is being written. It does not really fit there logically, but it would be the most efficient place to perform this check (and to touch pages if necessary).
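
A self-contained sketch of the idea behind such a check in write_unconnected_redirect_target_report; the dataframe here is a toy stand-in for the report data, and touch_client_page is a hypothetical helper, not a function from the actual code:

import pandas as pd

def touch_client_page(dbname: str, title: str) -> None:
    # Hypothetical helper: null-edit a page on the given client wiki.
    print(f'would touch {title} on {dbname}')

# Toy stand-in for the report dataframe built by the backend.
df = pd.DataFrame({
    'qid': ['Q109284827'],
    'target_qid': [pd.NA],
    'dbname': ['commonswiki'],
    'page_title': ['Slums_in_Jakarta'],
})

# Rows with an unconnected redirect target are the candidates for a
# null-edit on the client wiki (e.g. when their item has been deleted).
for _, row in df[df['target_qid'].isna()].iterrows():
    touch_client_page(row['dbname'], row['page_title'])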

Laxeril (talkcontribs)

Thank you for clarifying. I now understand.

Reply to "Deleted items in Wikidata:Database reports/Sitelink to redirect with unconnected target"
Karl Gruber (talkcontribs)

Hello MisterSynergy, Q1374463 slipped past me on RAT. So I have only now moved the article into the main namespace (ANR). Could you please also restore it on WD for me? Thanks K@rl (talk) 10:15, 17 May 2023 (UTC)

MisterSynergy (talkcontribs)

Sure, done!

Karl Gruber (talkcontribs)

Sorry, but it is still a red link, or am I misunderstanding something? ;-) --regards K@rl (talk) 14:59, 18 May 2023 (UTC)

Karl Gruber (talkcontribs)

That was a thinking error on my part; d:Q1374463 does exist again, thanks K@rl (talk) 15:07, 18 May 2023 (UTC)

Reply to "Recover"
Lost in subtitles (talkcontribs)

Hi, the page with this reference is getting linked to the Chucho pages in different languages; could this be avoided? Neither of them refers to that person (who seems to be trying to promote himself). I tried to undo the bot's edits, but it put them back. Thanks. :)

MisterSynergy (talkcontribs)

No, those sitelinks are correct. Recently someone tried to repurpose this item from "disambiguation page" to a human individual, which is not allowed. I have restored the correct version (from March this year). If someone needs an item for the human, a new one has to be created in compliance with the Wikidata:Notability policy.

Lost in subtitles (talkcontribs)

Oh, perfect. Thanks. I didn't notice the repurposing.

Reply to "Chucho (Q1089082)"

Instrumenting significant Wikidata bots

Infrastruktur (talkcontribs)

I was taking a preliminary look at the sources for DeltaBot. It doesn't say how it is deployed. Does every script get its own Kubernetes pod, or do they share a pod? Some of the scripts don't do a lot of error checking, so they should at the very least provide some sort of traceback that can be inspected retroactively after a crash. If the scripts all run in the same pod, how are they spun up? I would be interested in helping instrument this bot if I knew more. I haven't looked into Toolforge yet.

MisterSynergy (talkcontribs)

Okay, a couple of comments:

  • DeltaBot and PLbot have both been created by User:Pasleim, and both have similar characteristics.
  • Both bots run on Toolforge (tool accounts deltabot and pltools).
  • Since Pasleim is pretty much inactive, I volunteered last November to co-maintain his bots and troubleshoot problems. Most actions have been minor code fixes and job management until now, but there is indeed some need to optimize bot management in pretty much every aspect.
  • There are roughly 50 Python scripts running under these two bot accounts with varying frequency (every 10 minutes to monthly). Both bots have a virtual environment with a couple of required modules. Both use pywikibot to edit.
  • The code is old, up to ~8 years; Pasleim's style is to keep things simple, but this is not always that helpful if something goes wrong; there is some basic logging, but it is often not helpful for analyzing a problem.
  • There is a code repo for DeltaBot, but not for PLbot. However, I do not have write access (yet?), so the actual code for DeltaBot on Toolforge differs somewhat from the Github repo, since I have to change code on the Toolforge console.
  • Both bots run a couple of shell scripts in a Python 3.9 container. Each shell script just runs a sequence of Python scripts; if a Python script crashes, it usually continues with the next one in the shell script. See https://k8s-status.toolforge.org/namespaces/tool-deltabot/ and https://k8s-status.toolforge.org/namespaces/tool-pltools/ for details regarding the job status.
  • As far as I am aware, memory constraints on Toolforge are occasionally an issue.

I have very recently contacted Pasleim regarding the situation, in order to see what he thinks about a modernization. I would be willing and able to fix many of the issues, in order to make these bots more robust and future-proof. However, Pasleim seems pretty busy and has not responded yet.

Infrastruktur (talkcontribs)

Thanks for the explanation, that clears things up a lot.

For starters, you should edit the shell scripts started by the cronjobs so that every bot task gets its stdout and stderr redirected to a separate logfile, e.g.:

/home/botuser/useful-bot >> /var/log/useful-bot.log 2>&1

The user account the script runs under needs to have write access to the folder where the logs are stored.

If something throws, the backtraces should now go into the logfiles.

Then you should set up logrotate; hopefully that's installed on the system. This will keep the hard drive from filling up.

https://linuxconfig.org/logrotate

If the system doesn't have logrotate, it can be emulated by using the day of the month as part of the filename.

Rotating the logs daily and keeping a month's worth of logs makes it easier to see if something crashed.
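
If editing the Python scripts themselves is an option, the standard library can also handle the rotation directly; a minimal sketch, with the filename and retention period as examples only:

import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate at midnight and keep roughly a month of old files, which
# emulates a daily logrotate setup without any external tooling.
handler = TimedRotatingFileHandler('useful-bot.log', when='midnight', backupCount=31)
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.info('bot task started')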

Now that logging is set up, all that remains is keeping an eye on the logs to look for problems. Grepping them for the word "Traceback" is one way to do that.

Feel free to send me some tracebacks. I'll see if I can make a fix and send it back, but no guarantees.

It is possible that, due to the lack of error checking, some of the scripts that assume well-formed wikitext, such as the ones that deal with the property proposals, might lock up or produce bad output if you give them bad input. If this happens, it might be useful to manually log which wiki page and revision is being parsed, to make it possible to recreate the condition that led to the issue.

MisterSynergy (talkcontribs)

Let's see if and when Pasleim responds. I have addressed related issues in the email, and I am waiting for his input. At this point I am not aware how much interest/time he still has in these bots, but I am reluctant to make significant setup changes without his approval. If he does not respond, changes will inevitably come at some point, but probably after some major failure. The status quo is not ideal, but it is not a drama either.

If he approves, there are several options and I am not 100% sure which way to go. For my own bot (~15 tasks), there are individual Github repos per task, each task has its own Python venv, and each one has its own cronjob via k8s. Logging is done as needed, usually via Python's logging module. However, all tasks in my own bot tool account use a central pywikibot configuration (which itself relies on Toolforge's shared pywikibot).
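
By "central pywikibot configuration" I essentially mean a single shared user-config.py along these lines (the values shown are illustrative):

# user-config.py, shared by all tasks in the tool account
family = 'wikidata'
mylang = 'wikidata'
# `usernames` is predefined by pywikibot when it loads this file
usernames['wikidata']['wikidata'] = 'ExampleBot'  # illustrative account name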

The reason Pasleim's bot configuration has not been changed yet is that I have considered my co-maintainer role to be that of a troubleshooter in case of complaints, not a redesigner of this entire setup. I am just in the process of expanding my involvement here…

Reply to "Instrumenting significant Wikidata bots"
Chilocharlie (talkcontribs)

I believe that you deleted the "shadowhunter" entity and definition that I created (Q115103335) by mistake, apparently because it is not "notable" enough. This is inconsistent with several Wikidata entries for the films. The films are notable, the noun is notable, and it belongs in Wikidata. Please restore "shadowhunter".

MisterSynergy (talkcontribs)

This item needs external sources in order to have a clear definition. Can you provide such a source?

Chilocharlie (talkcontribs)

Sure. Sources are all the automatic options for "shadowhunter" that autocomplete while typing. The definition itself was written by me, based on one of the official trailers. See "THE MORTAL INSTRUMENTS: CITY OF BONES - Official Trailer" by Sony Pictures Entertainment on YouTube. I cannot post the link because of an automatic spam filter.

MisterSynergy (talkcontribs)

This needs some sort of an external source. It is not obvious to others what this is about.

Technically, the item needs to comply with the notability policy at Wikidata:Notability.

Chilocharlie (talkcontribs)

This statement is incorrect: "This needs some sort of an external source." Only two of the three criteria in the notability policy need to be fulfilled. An entity can be added even if it has no external sources. Still, this one has external sources, like the trailer for example, which could be referenced better if the link wasn't automatically blocked. A Google search for "shadowhunter" and the definition would also have provided relevant results. And Wikidata is meant to be co-curated. No one person needs to add everything. If someone adds a term, that is already a win; a term and a definition is a double win. Instead of deleting, it is better to leave the wiki spirit of co-curation and co-contribution to follow its course and let others add more data and links.

The term itself *is notable* as I explained above. It will definitely enhance the knowledge base, it refers to a clearly identifiable entity with plenty of public references, and will make statements in other items more useful. For example:

"Katherine McNamara as Clary Fairchild, a *shadowhunter* raised among mundanes (humans) who finds out her true heritage on her 18th birthday." https://en.wikipedia.org/wiki/Shadowhunters#Main

I hope that it is now clear that the term was correctly added and incorrectly deleted. I do not understand why the item was deleted before checking these things within Wikipedia first, and why all the documentation burden to restore an item that was correctly added in the first place should fall upon me. If correctly adding just one term takes this much struggle, then I had better give up.

MisterSynergy (talkcontribs)

I'm still not convinced, but at this point I think it would be best to just give it a try.

Re. "And Wikidata is meant to be co-curated.": sure, but this is also a fallacy. Particularly content with notability issues and a lack of external resources are usually not being edited by anyone else. In the current condition, you are the only one to improve the item.

Reply to "Please, restore "shadowhunter""