Hi! I've just noticed in the page history of Theophilos of Athens (Q12874082) that the article on fa.wikipedia was deleted in 2018 but its sitelink remained on Wikidata. Is it a known problem, or has it already been solved? Can a bot clean up any similar cases? Thank you as always!
Deletion on fa.wikipedia ineffective on Wikidata
Hey, things like that can happen for various reasons: network partitions, database lag, bugs in the code, permission issues (maybe the IP being blocked on Wikidata?), etc. If it happens all the time, we should look into it, but there will always be some cases that fail. Maybe we should have a way to spot and clean them up.
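A minimal sketch of such a check, assuming the sitelinks Wikidata holds for one wiki and the titles of pages that currently exist there have already been fetched (for example from dumps). The function names and input shapes are hypothetical, and only underscore/space normalization of titles is applied:

```python
def normalize_title(title: str) -> str:
    # MediaWiki treats underscores and spaces in page titles as equivalent.
    return title.replace("_", " ").strip()


def find_stale_sitelinks(sitelinks: dict[str, str], existing_titles: set[str]) -> list[str]:
    """Return item IDs whose sitelink points to a page that no longer exists.

    sitelinks: item ID -> linked page title on one wiki (hypothetical input).
    existing_titles: titles of pages that currently exist on that wiki.
    """
    existing = {normalize_title(t) for t in existing_titles}
    return sorted(
        item for item, title in sitelinks.items()
        if normalize_title(title) not in existing
    )


if __name__ == "__main__":
    links = {"Q12874082": "Page_A", "Q1": "Page B"}
    print(find_stale_sitelinks(links, {"Page B"}))
```

An item flagged this way would still need a manual or cautious check before its sitelink is removed.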
Sockpuppets of the sanctioned user (Mr,p_balçi) with both accounts and IPs
Hi, the mentioned user, who was banned from Wikipedia, has once again vandalized articles and engaged in sockpuppetry with multiple IPs and accounts.
Some of their sockpuppets were blocked in the same month, but many others are not. They have evaded the block more than ten times. I request that their main account be blocked indefinitely and globally.
My guess is that a check would uncover many more.
In several cases, accounts have been blocked for obscenity.
Several of their accounts have been globally locked (for obscenity):
Please handle this, thanks.
Hello, please mention these cases in fawiki's وپ:دبک. It doesn't belong here.
Hello, what? I joined recently and I haven't edited here.
Correct references through bot
Hi! I remember you run a very efficient bot, and in the past I asked you for some fixes that worked very well. Now I mostly do fixes through QuickStatements, which is a very good tool but still isn't able to fix references while leaving the statements unchanged. I sometimes notice big groups of items (thousands and tens of thousands) with references that are imprecise or wrong, and I don't know whom to ask for a correction. Could I gradually report some notable cases of references to be fixed, so that we can deal with them step by step through your bot? I think having exactly correct references is crucial for our data quality, whereas at the moment this is often not the case. Thank you very much in advance!
Hey, sure. I'll try to write something, but first I want to know the exact scope so I don't need to write similar code every time; I'd rather write something general and reuse it.
Can you give me a couple of examples?
OK, great! So, here is a detailed overview of the situation. I see three main types of errors to be corrected:
- first type: correct previous imprecise bot-edits
- one known case: all references containing here) should have stated in (P248) corrected to Internetowy Polski Słownik Biograficzny (Q96022943) (example); these wrong references were recently added by @Reinheitsgebot:, to whom I reported the problem without getting a correction + (you can easily infer a complete list from
- second type: properties which changed their format, so that references are now broken
- case 1: thousands of references containing HDS ID (P902) still have IDs with 4 or 5 digits (complete list): IDs now always require 6 digits, so one zero should be added before 5-digit IDs (example) and two zeros before 4-digit IDs (example); if absent, should be added before HDS ID (P902) (example)
- case 2: hundreds of references containing InPhO ID (P863) still have IDs containing only a number (complete list): now the IDs require the prefix "thinker/", which should always be added before the number (example)
- case 3: hundreds of references containing Spanish Biographical Dictionary ID (P4459) still have IDs containing only a number (complete list): IDs now also require a following part, "/" plus the name, which should always be added after the number; if, as nearly always happens, the ID in the reference numerically coincides with the main value of P4459, the main value of P4459 should be used to complete the value of P4459 used in the reference (example)
- third type: properties that have been added twice as references to the same statement, either with small differences or exactly equal; the two references should be merged, keeping all the properties except reference URL (P854) (possibly obsolete and in any case not stable): stated in (P248), the ID, named as (P1810) if present, and the most recent retrieved (P813) if present (the range of properties involved here is huge; I will give only some examples, with more to follow in the next weeks) - introductory example 1, introductory example 2
- note: the following queries concern only date of birth (P569), but should be repeated at least for date of death (P570) and possibly for all properties
- Bibliothèque nationale de France ID (P268): first list (example of merged references, another one) and second list (may time out; if necessary, use LIMIT) (example of coincident references, the oldest is removed, same here)
- Artsy artist ID (P2042): list (contains false positives) (example of coincident references, the one having P813 first is removed)
- Artnet artist ID (P3782): list (contains false positives) (example of a merge of three references)
- GND ID (P227): list (example: a reference with P248 but without an ID is markedly imprecise, so if another reference has both P248 and the ID, the imprecise one should simply be removed)
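Along the lines of the "something general" mentioned above, the second type of error could be handled by a table of per-property fixers applied to every reference. This is a minimal sketch, not Dexbot's actual code: it assumes a reference is a dict mapping property IDs to lists of string values, and it covers neither the first type nor the "if absent" addition for HDS ID (P902).

```python
# Assumption: a reference is modelled as a dict mapping property IDs to lists of
# string values, e.g. {"P248": ["Q..."], "P902": ["12345"]}. Real Wikibase JSON
# stores references as "snaks"; this sketch deliberately simplifies that.
Reference = dict[str, list[str]]


def pad_hds_id(value: str) -> str:
    """HDS ID (P902): left-pad 4- or 5-digit IDs with zeros to 6 digits."""
    return value.zfill(6) if value.isdecimal() else value


def prefix_inpho_id(value: str) -> str:
    """InPhO ID (P863): add the "thinker/" prefix to bare numeric IDs."""
    return "thinker/" + value if value.isdecimal() else value


def complete_dbe_id(value: str, main_values: list[str]) -> str:
    """Spanish Biographical Dictionary ID (P4459): replace a bare number with
    the statement's main value when its numeric part coincides."""
    if not value.isdecimal():
        return value
    for main in main_values:
        number, sep, _name = main.partition("/")
        if sep and number.isdecimal() and int(number) == int(value):
            return main
    return value  # no match: leave it for manual review


def fix_reference(ref: Reference, main_p4459: list[str]) -> Reference:
    """Apply the per-property fixes; properties without a fixer are copied as-is."""
    fixers = {
        "P902": pad_hds_id,
        "P863": prefix_inpho_id,
        "P4459": lambda v: complete_dbe_id(v, main_p4459),
    }
    return {
        prop: [fixers[prop](v) for v in values] if prop in fixers else list(values)
        for prop, values in ref.items()
    }
```

Keeping the fixers in one table means a new broken format only needs one more entry rather than a new bot task.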
If you have any questions, ask me! When you have the bot ready, please start with some test edits so that I can have a look. Thank you very much in advance!
Thanks. I'll try to tackle it next weekend. This weekend I'm drowning in something personal.
Hi! Any updates? Obviously no urgency, as I said; just a little message so that I don't forget the issue myself :)
Hey, sorry. I have been doing a million things and have been drowning in work but will get to it ASAP. I took some vacation for volunteer work :)
But it's on my radar, always has been. Don't worry.
Again, I have not forgotten about this. One day I will get it done. It's just that there are so many things to do :(
Okay, one part is done: the bot now takes a SPARQL query and removes references that are exact duplicates. Here's an example. I will write more in the coming weekends.
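Exact duplicates can be found by reducing each reference to an order-independent key and keeping the first reference per key. A minimal sketch, assuming a reference is a dict from property ID to a list of string values; it is not Dexbot's actual code, and turning the SPARQL results into statements is left out:

```python
# Assumption: a reference is a dict mapping property IDs to lists of string values.
Reference = dict[str, list[str]]


def reference_key(ref: Reference) -> tuple:
    """Order-independent key: sorted (property, sorted values) pairs."""
    return tuple(sorted((prop, tuple(sorted(values))) for prop, values in ref.items()))


def drop_exact_duplicates(refs: list[Reference]) -> list[Reference]:
    """Keep the first occurrence of each distinct reference, preserving order."""
    seen: set[tuple] = set()
    kept = []
    for ref in refs:
        key = reference_key(ref)
        if key not in seen:
            seen.add(key)
            kept.append(ref)
    return kept
```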
Very good, thanks!
A little case related to the third type: Benezit ID (P2843), which had been added as a reference in two different ways, the older one with reference URL (P854) and the more recent one with Benezit ID (P2843).
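The merge rule given for the third type (keep stated in (P248), the ID, named as (P1810) and the most recent retrieved (P813); drop reference URL (P854)) could look roughly like this for cases such as the Benezit one. A minimal sketch assuming references are dicts of property ID to string values and that retrieved dates are ISO "YYYY-MM-DD" strings, so the lexicographic maximum is the most recent date; it does not check that the references really describe the same source:

```python
# Assumption: a reference is a dict mapping property IDs to lists of string values,
# with retrieved (P813) dates as ISO "YYYY-MM-DD" strings.
Reference = dict[str, list[str]]


def merge_references(refs: list[Reference]) -> Reference:
    """Merge references on one statement, dropping P854 and keeping the latest P813."""
    merged: Reference = {}
    for ref in refs:
        for prop, values in ref.items():
            if prop == "P854":
                continue  # reference URL: possibly obsolete and not stable
            bucket = merged.setdefault(prop, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    if "P813" in merged:
        # ISO dates of equal format compare correctly as strings.
        merged["P813"] = [max(merged["P813"])]
    return merged
```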
Very good P4459!
Done now. Gosh, it took days :))) Let me fix type one now.
Can you give me a SPARQL query for the first type? I'm not good at queries involving refs :(
Very good. Waiting for part 3, which is obviously the most difficult, I have another task: all uses of described by source (P1343) in references (these thousands) should be substituted with stated in (P248), in order to avoid scope-constraint violations.
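The substitution itself is simple if references are modelled, as an assumption, as dicts from property ID to string values: rename the key, merging values if the reference already has stated in (P248). A sketch, not a tested bot task:

```python
# Assumption: a reference is a dict mapping property IDs to lists of string values.
Reference = dict[str, list[str]]


def replace_described_by_source(ref: Reference) -> Reference:
    """Move described by source (P1343) values to stated in (P248) in a reference."""
    fixed: Reference = {}
    for prop, values in ref.items():
        target = "P248" if prop == "P1343" else prop
        bucket = fixed.setdefault(target, [])
        for value in values:
            if value not in bucket:  # skip values already present, e.g. an existing P248
                bucket.append(value)
    return fixed
```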
The third type is not that hard. I thought it was done. Let me double check and clean up the mess.
I re-read what you wrote for the third type a couple of times and now I get what you want, but it's pretty complex. I'll try to see what I can do about it next weekend.
Hi! When you have time, could you have a look at these three?
- Wikidata:Bot requests#Accademia delle Scienze di Torino multiple references
- Wikidata:Bot requests#Archivio Storico Ricordi multiple references
- Wikidata:Bot requests#Library catalogs (2021-01-28)
They are probably less difficult than point 3 above, which I understand is quite difficult. See you soon!
Hey, sure. Just give me a week or two.
I wrote something that can clean up duplicates and subsets (i.e. a reference that is fully covered by another reference that has more). I already started the bot and it's cleaning. I will continue, but I don't think I can clean up more than that, as it gets really, really complicated.
Perfect! When it finishes, could you schedule it as periodic maintenance (e.g. once a month)? This would keep the quality stable.
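The "duplicates and subsets" rule can be expressed by flattening each reference into a set of (property, value) pairs: a reference is redundant if its set is a strict subset of another reference's set, or equal to an earlier one. A minimal sketch assuming references are dicts of property ID to string values; it is not Dexbot's implementation:

```python
# Assumption: a reference is a dict mapping property IDs to lists of string values.
Reference = dict[str, list[str]]


def as_pairs(ref: Reference) -> frozenset[tuple[str, str]]:
    """Flatten a reference into a set of (property, value) pairs."""
    return frozenset((prop, v) for prop, values in ref.items() for v in values)


def drop_covered_references(refs: list[Reference]) -> list[Reference]:
    """Drop references fully covered by another reference on the same statement.

    A reference is dropped if its pairs are a strict subset of another
    reference's pairs, or if it equals an earlier reference (exact duplicate).
    """
    pairs = [as_pairs(r) for r in refs]
    kept = []
    for i, ref in enumerate(refs):
        covered = any(
            pairs[i] < pairs[j] or (pairs[i] == pairs[j] and j < i)
            for j in range(len(refs))
            if j != i
        )
        if not covered:
            kept.append(ref)
    return kept
```

References that differ only in retrieved (P813) are never subsets of each other under this rule, which is one reason the remaining cases get complicated.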
It works based on SPARQL queries. Which queries you want me to run regularly?
Maybe after the cleanup Dexbot is doing now it won't be necessary anymore; I think these redundant references were inserted due to an error by Reinheitsgebot, so maybe the error has been fixed and such cases won't appear again. However, I may give you other queries (of the third type) in the future if I find similar problems with different properties.
Help for دبک
Is this user allowed to do this?
Especially since this sockpuppet itself is the subject of a checkuser case
and, at least in the following cases, it has been confirmed that these were their sockpuppets?
And why has their vandalism still not been dealt with after three months? Just today, several new sockpuppets of theirs were blocked indefinitely.
Here is one example.
Is there no will to put an end to their disruption?
Honestly, apart from you I have not seen a hard-working and trustworthy user; the other gentlemen don't care about users' nerves.
This person has a large number of blocked troll accounts, both globally and locally. They repeatedly insult northern, Kurdish, Armenian and other users, cause disruption with IPs, and insult users on smaller wikis.
Would it be possible for you to look into their checkuser case? Thanks.
Solomon Hill (Ilam Province) (Q15975213)
Hi Amir, can you have a look at Solomon Hill (Ilam Province) (Q15975213)? It seems to be completely wrong. I noticed it while working on heritage photos on Commons.
Thanks. It looks complicated. It's Iran's first national heritage site and used to be within Iran's borders (or maybe parts of it still are? I'll check), but now it's mostly in Iraq. I'll ask people who know better than I do.
People say its country should be Iraq, but technically it is still an Iranian national heritage site that ended up in Iraq after border changes in the mid-20th century.
The ID seems to be invalid? It should be at least two digits according to the constraint on Iranian National Heritage registration number (P1369). If it used to be in Iran, you should add the Iran information too, qualify both statements with start/end times, and make the current one preferred.
Done. I think the constraint regex is wrong. I changed it to [0-9] so it accepts 01 too (maybe it should accept a single digit instead? I don't know).
I think you messed up the ranks. Was your intention to state that this item was never in Iran? I currently ignore the one-digit entries on Commons completely because they are mostly mistakes.
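To see what a regex change accepts, candidate patterns can be tested against sample values. Wikidata format constraints are understood to be matched against the whole value, which re.fullmatch mirrors. The patterns below are hypothetical illustrations; the regex actually set on P1369 is not quoted here:

```python
import re

# Hypothetical candidate patterns for comparison only.
PATTERNS = {
    "at least two digits, no leading zero": r"[1-9][0-9]+",
    "at least two digits, leading zero allowed": r"[0-9][0-9]+",
    "one or more digits": r"[0-9]+",
}


def accepted(pattern: str, values: list[str]) -> list[str]:
    """Return the values that match the pattern in full."""
    return [v for v in values if re.fullmatch(pattern, v)]


if __name__ == "__main__":
    samples = ["1", "01", "12", "123"]
    for name, pattern in PATTERNS.items():
        print(f"{name}: {accepted(pattern, samples)}")
```

Using [0-9] rather than \d avoids also accepting non-ASCII digits such as Persian numerals, which Python's \d matches on str input.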
Fixed the rank
Can you explain to me why you gave them an indefinite block? I seem to have missed a recent discussion that might be considered harassment/intimidating behaviour. But I am probably missing something?
Hey, the one that triggered it was Wikidata:Requests_for_deletions/Archive/2021/02/20#Q105443300 (the anti-LGBT behavior), but there were two other reasons: the user is indef blocked on five other wikis as well for harassment, and the user has a history of being blocked for vandalism/edit warring here. If you feel it's too much, feel free to reduce it to a shorter duration.
Okay, that is malicious indeed. Indef is long, but let them start an unblock request themselves if they want to continue here.
Recently a researcher has interviewed me regarding my experiences with the ORES service here at Wikidata.
I took this as an opportunity to take a really close look at the results produced by the service, and found some things a bit questionable. A couple of charts involving ORES scores for unpatrolled changes (i.e. IP editors and newcomers) are presented at https://msbits.toolforge.org/wdcvn/index.php?action=oresquality and automatically updated every hour.
My greatest concern is how differently IP editors and new editors are treated; you can see this in the top two figures on the cited page. From Wikidata:ORES/List of features I understand that this is done intentionally, which is probably not the worst idea in fact. Assuming this page still describes the current configuration, I think "log(age + 1)" with (account) age in seconds is not an appropriate way to differentiate editors. Far too quickly, basically after a few minutes, the bonus of having an account is so strong that newly registered editors barely produce "bad scores", while IP editors practically always produce "bad scores". This effect is so strong that, in some sense, ORES produces "account age" scores rather than edit quality scores. From my patrolling experience, this does not seem appropriate since, unsurprisingly, the quality of newcomer and IP edits is very similar (on average pretty good, in fact).
Do you think that this configuration can be improved? Two ideas:
- Add a waiting period in the beginning of an account life that does not improve the rating (such as 4 days, 30 days, or tie it to the autoconfirmed status)
- Use "age in days" or even "age in months" rather than "age in seconds" in the log function in order to grow the bonus less quickly to a very robust value that always produces "good scores"
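A quick way to compare the two ideas is to compute how much of the one-year feature value an account has already reached at a given age. This sketch assumes the natural logarithm (the base only rescales the feature) and a 4-day grace period for the first idea; the function names are hypothetical and none of this is ORES's code:

```python
import math

SECONDS_PER_DAY = 86_400
ONE_YEAR = 365 * SECONDS_PER_DAY


def log_age_seconds(age_s: float) -> float:
    """Feature as described: log(age + 1) with age in seconds."""
    return math.log(age_s + 1)


def log_age_days(age_s: float) -> float:
    """Idea 2: same shape, but with age measured in days."""
    return math.log(age_s / SECONDS_PER_DAY + 1)


def log_age_with_grace(age_s: float, grace_days: float = 4) -> float:
    """Idea 1 (hypothetical variant): no bonus during a grace period, then grow in days."""
    return math.log(max(0.0, age_s / SECONDS_PER_DAY - grace_days) + 1)


if __name__ == "__main__":
    # Share of the one-year feature value already reached at a given account age.
    ages = [("10 minutes", 600), ("1 day", SECONDS_PER_DAY), ("30 days", 30 * SECONDS_PER_DAY)]
    for label, age in ages:
        for f in (log_age_seconds, log_age_days, log_age_with_grace):
            print(f"{label:>10} {f.__name__:<20} {f(age) / f(ONE_YEAR):.3f}")
```

One caveat: if the underlying classifier is tree-based, a monotonic rescaling such as seconds to days leaves the ordering of training examples, and hence the model, essentially unchanged, whereas a grace period that flattens the feature for new accounts does change what the model can distinguish.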
Hey, yes. The user age is currently used; you can see all of the features on this page and this and this. The problem is a bit more fundamental. ORES is not actively maintained anymore (and WMF is replacing it with a proper system soon), the code is in a bit of a messy state at the moment, and the data used to train the model for Wikidata has not been updated for a really long time now (five years, I think). This needs work, but @Lydia Pintscher (WMDE) needs to prioritize it (and I think it's on her radar as well).
Alright then, thanks for the answer. If it is going to be replaced "soon", I don't think it is necessary to work on the old system any longer.
Do you happen to know when the replacement will be available? For the counter-vandalism work it would be very useful to have filters that actually work; the current ORES scores have very limited use due to the configuration, to be honest.
Totally agreed that this is important. My current priority is getting more and current training data. I've been trying to get some of the large internet companies to help with this as it's fairly easy for them to do. Still working on this. Fingers crossed!
As for a replacement for ORES: That's on the WMF to do and as far as I understand it's actively being worked on but takes time. I've made it clear several times that this is important for us and I think that was well understood.
Amir: If any of what MisterSynergy was talking about is meaningful to do before a rework of ORES let me know please and we can talk about it.
Thank you, Lydia!
I have already learnt from the researcher who interviewed me last week that User:CAlbon (WMF) is apparently the person in charge of the next-generation machine learning service that could/should eventually replace ORES. Some sort of timeline would be interesting, and if community consultations are possible, I would be willing to provide input as well. Right now I am unable to find any on-wiki information about this, unfortunately, neither here, nor on Meta-Wiki, nor on mediawiki.org.
Based on Amir's input on ORES ("not maintained anymore", "code is in a mess state", "training data has not been updated for years"), I think we should not invest any resources into the old system, although it is barely useful right now. I do emphasize, however, that such a system would be extremely helpful once someone manages to build one whose output matches our human perception of edit quality much better than ORES's does.
Hi Ladsgroup, I might be mistaken, but it seems that Dexbot is missing items like Q32788643 for deletion. There are quite a few of those. Can you look into that? Thanks.
Thanks. I'll take a look. The thing is that the bot's criteria are pretty strict, since not deleting an item that should be deleted is a much smaller loss than deleting an item that shouldn't be. I'll double check to make sure cases like this get fixed, but I'm not 100% sure it'll work.
Let me know if it does not succeed. In that case I will try to deal with the items manually.
I checked, and it works fine and deletes the item; the problem is that the item doesn't show up in the list of items to check, which the bot gets from User:Pasleim/Items for deletion/Page deleted, and somehow this one wasn't in the list. If you can give me a way to build the list, I'll get them handled. Thanks!
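One way to turn a list page such as User:Lymantria/Test into bot input is to pull the item IDs out of its wikitext. A minimal sketch, assuming the wikitext has already been fetched and lists items as plain IDs (in links or templates); the function name is hypothetical:

```python
import re

# Item IDs: "Q" followed by a number without leading zeros.
QID_PATTERN = re.compile(r"\bQ[1-9][0-9]*\b")


def item_ids_from_wikitext(text: str) -> list[str]:
    """Extract item IDs from a list page's wikitext, deduplicated, in page order."""
    return list(dict.fromkeys(QID_PATTERN.findall(text)))
```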
I created a list at User:Lymantria/Test.
I just started the bot. This is fun!
Thank you! I added a (smaller) second batch at User:Lymantria/Test.
I see that MisterSynergy dealt with those manually.
"Lovifm.com artist ID"
Hello, please do this on Wikidata: "Lovifm.com artist ID"