@Infovarius https://editgroups.toolforge.org/b/CB/42881e438c6c/ reverted the whole batch because of the false positive you mentioned. Q422222 should only be run without aliases because of "dynamos"
User talk:So9q
Your main subject (P921) edits to add hypertension (Q95566669) resulted in bad results. See Diffusion in minerals at high pressure: a review (Q107269639) and High pressure effects on the Raman spectra of solid C6H6 (Q106111860), as examples. I suspect "high pressure" appears in quite a lot of physical science, maybe chemistry, articles.
Thanks for the report. I'll revert and add to the blocklist.
@UWashPrincipalCataloger reverted some edits, so I reverted the batch https://editgroups.toolforge.org/b/CB/83832773ab83/ and added it to the blocklist
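The two safeguards mentioned in the threads above (a blocklist, and running an item without its aliases) could be sketched roughly as follows. This is only an illustration and not the actual tool code: the names `BLOCKLIST`, `search_terms`, `title_matches` and `may_add_subject` are hypothetical, and "high pressure" being an alias of hypertension (Q95566669) is assumed from the report above.

```python
import re

# Hypothetical blocklist of QIDs that should never be added as main subject
# by the matcher (Q95566669 is taken from the report above).
BLOCKLIST = {"Q95566669"}


def search_terms(label, aliases, use_aliases=True):
    """Return the lowercase strings to look for in an article title."""
    terms = [label]
    if use_aliases:
        terms.extend(aliases)
    return [term.lower() for term in terms]


def title_matches(title, terms):
    """True if any term occurs in the title as a whole word or phrase."""
    lowered = title.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms
    )


def may_add_subject(qid, title, label, aliases, use_aliases=True):
    """Refuse blocklisted items outright, otherwise fall back to title matching."""
    if qid in BLOCKLIST:
        return False
    return title_matches(title, search_terms(label, aliases, use_aliases))
```

With aliases switched on, the title "Diffusion in minerals at high pressure: a review" would match through the assumed alias; with `use_aliases=False`, or with the item on the blocklist, it would not.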
Hi, I do not understand this change: https://www.wikidata.org/w/index.php?title=Property:P1591&diff=1635275079&oldid=1635274472
The property is used for more than Swedish decisions (like Q111795222 for example), and now the items show a warning. I assume this was an error, and I guess you were trying to add a constraint that was likely useful, except I do not get the logic, so I am not sure how to fix it (and do not want to revert before discussing).
Thanks for the hint. I reverted my edit. I don't know how to make the logic correct either 😅 The intention was to have a warning on P31-Q96482904 which says defendant/plaintiff are needed. Do you know how to make a constraint for that?
I think defendant/plaintiff is needed for all law cases, so just saying that if you have one, you need the other would do the trick?
I was thinking that we should warn the users also if both defendant and plaintiff are missing. IDK how to achieve that.
Hi So9q, I noticed that a couple of days ago you added The Plague (Q94041556) as main subject (P921) for quite a few items. The Plague (Q94041556) is a scholarly article published in 1899; presumably what was meant was plague (Q133780). Would you be able to take a look at these edits?
Reverted the job. IDK how that happened. I’m matching diseases and it must be an error in the graph that made it appear as a disease.
I think it probably was, I've seen a couple of unusual matches from the graph - I'm not sure how they make it in there in the first place. Thanks for sorting it out 🙂
I have reverted this edit. I surmise it was made with some sort of script which looks for key words and infers that one item is the main topic of another.
For the journal article "Why the Greenwich Meridian moved" the main topic is not petrology. The meridian moved because the direction of gravity at the Royal Observatory, Greenwich, is not precisely toward the center of the Earth. The distribution of rocks in the Earth's crust is one of several factors that affect the direction of gravity, but rocks are certainly not the main topic of the article.
Is there a way to prevent the script from making the same error over and over?
Hi, I reverted your edit and suggested you deprecate the statement instead. How does that sound?
The script can be changed/fixed, but in this case I don't see that it failed to work as intended. It simply matches the subject from Crossref to one of our items and adds it as main subject. I don't know if Crossref has a way to report errors in their database.
After reviewing your contribution history, it appears you are either running a bot, or engaging in bot-like behavior. I have re-reverted your edit. Please link to your bot approval and explain why you are not doing these edits from a bot account.
The edits are semi-automated. That means if I manually approve, the edit is made.
As for the merit of your edit, what is Crossref? What does an item in Crossref look like? What are Crossref's criteria for creating an item?
According to its short description, main subject (P921) is "primary topic of a work". This is in line with the property proposal that led to the creation of the property.
If the criterion in Crossref is to name the main subject of a published work, adding the property and deprecating it might be appropriate, if Crossref is important enough to take notice of. If the criterion in Crossref is something else, then you shouldn't be doing this task.
These are very valid questions. You can read more about crossref here https://www.crossref.org/
I have not asked them about the subjects yet. Do you want to send them an email? If you find an edit that is wrong feel free to deprecate it and provide a reason.
see my new user script here for easy jumping to the source User:So9q/crossref-link.js
Are you on Telegram? Mahir256 agrees with you that it should be removed rather than deprecated. See https://t.me/c/1224298920/74848
I removed it now, feel free to undo any edits that seem bogus or wrong.
The plain link to https://www.crossref.org does not make it obvious where to find answers to my questions. It would be helpful if you would email them.
I am not on Telegram, and am not sure what Telegram is.
Hi again. See https://meta.wikimedia.org/wiki/Telegram for more information about Telegram and the different Wikimedia-related chats. There is a LOT going on in those groups every day, and I find you often get answers to questions very fast, along with a feeling of being part of the community (there are 23k active editors and about 700 of them are in Telegram). The WMDE Wikidata community office hour is in Telegram also and I highly recommend attending.
Regarding Crossref I recently made a userscript to easily jump from WD->Crossref API. See User:So9q#User scripts
Argh! Another example of mistakes caused by overreliance on simple string matching when tagging articles as being about something. The title for Pleistocene diversification and speciation of White-throated Thrush (Turdus assimilis; Aves: Turdidae) (Q110451187) includes the word "thrush" but this refers to the bird thrush (Q26050), not the fungal infection candidiasis (Q273510). It looks like any paper on thrushes (the birds) will be tagged incorrectly. It would be great if your tools were clever enough to avoid doing this sort of thing. For example, any term which has multiple meanings (e.g., "thrush") should be avoided. A simple query to Wikidata will reveal if a term has multiple meanings. Can you undo all instances of tagging with candidiasis (Q273510)?
Thanks for reporting. We talked about this in the WikiCite group. The batch in question is being reverted and a fix of the tool proposed by ArthurSmith is being implemented.
Done, the fix has been pushed to master, see https://github.com/dpriskorn/ItemSubjector/pull/47
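For reference, the "simple query to Wikidata" suggested above could look something like the sketch below, which uses the `wbsearchentities` API module. It is not the fix that was merged in the pull request; the function names and the User-Agent string are made up for illustration, and "ambiguous" is assumed to mean that more than one item matches the term exactly by label or alias.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def fetch_search_results(term, language="en", limit=20):
    """Query wbsearchentities and return the list of hits (needs network access)."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        # Placeholder User-Agent, replace with your own contact details.
        headers={"User-Agent": "ambiguity-check-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("search", [])


def exact_match_qids(term, hits):
    """Return the QIDs whose matched label or alias equals the term exactly."""
    wanted = term.casefold()
    qids = []
    for hit in hits:
        matched_text = hit.get("match", {}).get("text", hit.get("label", ""))
        if matched_text.casefold() == wanted:
            qids.append(hit["id"])
    return qids


def is_ambiguous(term, hits):
    """Assumption: a term is ambiguous if more than one item matches it exactly."""
    return len(exact_match_qids(term, hits)) > 1
```

Calling `is_ambiguous("thrush", fetch_search_results("thrush"))` before starting a batch would be one way to decide whether the term needs manual review.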
Hello, please be careful.
I will! Thanks for fixing it.
Hi, I've noticed that you've made Q423930 the main subject of papers that include "aquarius" in the title, e.g. Q100953456 "Aquarius philippinensis sp.n., a large endemic water strider (Insecta: Heteroptera: Gerridae) from ancient crater lakes in South Luzon, Philippines" (there are others). In these papers Aquarius is Q2859194, a genus of insects. Is it possible to revert these edits? I guess this is the limitation of using simple text to determine subject, especially for items that have lots of synonyms that also match other items.
Big thanks for the report, I will revert the batch and be more careful going forward (the aliases are the culprit in this case)
I guess it's an occupational hazard of trying to determine what a paper is about based on its title. Taxonomic names can be a nightmare given all the possible name clashes. I wonder if what we need is something sophisticated enough to "know" whether a paper is likely to be on a species or a chemical compound, e.g. Q41799598.
Are you in the WikiCite Telegram group? There we talk about how to better categorize all this knowledge. This is a blunt tool; using AI to read the abstracts would be a huge improvement. @houcemeddine works on that if I am not mistaken, but few abstracts are available as open data at this time, I'm afraid.
Yes, I am in that group. Abstracts are one way to determine what a manuscript is about, but I wonder whether we can get useful information from the surrounding network of connections? Knowing something about the journals and the authors may often tell us whether it's likely that a string refers to a chemical compound or a taxon.
So something like -a Q423930 -exclude-journals-with-main-subjects botany (Q441) zoology (Q431)
This would exclude Q100953456, but only because I just added the main subjects :)
I like this idea.
We currently have ~22000 journals and ~12000 of them are missing a main subject.
So for this to work we first need to add main subjects to all those journals.
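The journal-based exclusion idea could be sketched like this. The `-exclude-journals-with-main-subjects` option above is only a proposal, so everything here is hypothetical: the function names, the shape of the input data, and the placeholder journal identifiers are assumptions made for the example.

```python
def excluded_by_journal(journal_subjects, excluded_subjects):
    """True if the journal has at least one of the excluded main subjects."""
    return bool(set(journal_subjects) & set(excluded_subjects))


def filter_candidates(candidates, journal_subjects_by_qid, excluded_subjects):
    """Keep only articles whose journal is not tagged with an excluded subject.

    candidates: list of (article_qid, journal_qid) pairs.
    journal_subjects_by_qid: maps a journal QID to its main subject QIDs.
    Journals without any main subject cannot be excluded, so their
    articles pass through unfiltered.
    """
    kept = []
    for article_qid, journal_qid in candidates:
        subjects = journal_subjects_by_qid.get(journal_qid, [])
        if not excluded_by_journal(subjects, excluded_subjects):
            kept.append(article_qid)
    return kept
```

Articles in journals that have no main subject still get through, which is why the missing journal subjects would need to be filled in first.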
Hi, as the creator of programming terminology (Q77654221), could you elaborate how it differs from computer science terminology (Q66747123) and by what criteria terms belong to it? I have not seen this distinction made elsewhere on Wikipedias, eg. enwiki just puts everything into Glossary of computer science.
It seems it could also be hard to decide which *-terminology a term belongs to...
The corresponding "programming term" item (analogue of computer science term (Q66747126)) also does not exist, so as of now the category cannot be used as an instance of (P31) value on terms.
I suggest we merge them 😀
Alright, sounds reasonable😊