Previous discussion was archived at User talk:SCIdude/Archive 1 on 2019-11-02.

M2k~dewiki (talkcontribs)
SCIdude (talkcontribs)

Hello @Codc. de:Naturstoff is currently linked to en:biomolecule, and de:Naturprodukt is linked to en:natural product. The hierarchy in English is natural product --> comprises natural material and biomolecule, see also enwp. Is that not correct?

SCIdude (talkcontribs)
SCIdude (talkcontribs)

Brockhaus, under "Biomolekül" (biomolecule):

Naturstoffe (natural substances): in the broader sense, all substances that occur in nature; in the narrower sense, organic compounds that can be isolated from animals, plants, and microorganisms.

Reply to "Naturstoff vs. Naturprodukt"
Wostr (talkcontribs)

FYI: we have an additional level in the classification of aldohexoses between aldehydo-hexose (Q105024342) and compounds like aldehydo-D-mannose (Q27117223) or aldehydo-L-mannose (Q27117227): aldehydo-mannose (Q106964021) (a group of two stereoisomers, L and D). This level was introduced for every aldohexose, even where there is no ChEBI equivalent, in an effort to standardise and clean up items about carbohydrates (more in User:Wostr/Carbohydrates; so far I have only managed to clean up the aldohexoses).

So I moved the subclass of (P279) aldehydo-hexose (Q105024342) statement that was added by your bot from items like aldehydo-D-mannose (Q27117223) to items like aldehydo-mannose (Q106964021).

SCIdude (talkcontribs)

Thanks, this was a bit experimental, and I'll be switching from SMILES to InChI for detection of core structures next. It's still possible to miss such connections. The goal, of course, is to have classes that can be checked, and substances added, (semi-)automatically.

Reply to "aldohexose (open form)"
Wostr (talkcontribs)

There is also one problem regarding the classification of cyclic compounds that has to be addressed. We have classes like tricyclic compound (Q3539074) and there are two ways such classes are defined in sources:

  1. n-cyclic compound = every compound has exactly three rings, no more, no less, in the whole structure
  2. n-cyclic compound = every compound has no less than three rings, but may have more

Selecting any of the options has serious consequences for the entire classification and may result in our classification being inconsistent with classifications from other sources.

The first option seems more logical and consistent, as every compound is classified according to the number of rings in the structure. However, classes like phenothiazine (Q16023748) or dibenzazepine (Q33416403) then cannot be subclasses of tricyclic compound (Q3539074) but only of polycyclic compound (Q426145) (as there is no certainty that every compound belonging to phenothiazine (Q16023748) or dibenzazepine (Q33416403) has exactly three rings). It is also not consistent with ChEBI, e.g. the pentacyclic LSM-20934 is classified under organic tricyclic compound. On the other hand, choosing the second option leaves us with a weird classification tree: tetracyclic compound (Q7706284) (four or more rings) should be a subclass of tricyclic compound (Q3539074) (three or more rings).

I have no good solution to this. I'd personally choose the first option, even if it means a lot of inconsistencies between databases and the need for carefully checking that each class and chemical compound is assigned to the appropriate n-cyclic compounds class.
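
The consequences of the two definitions can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a description of how Wikidata or ChEBI compute anything: the helper names (`exactly`, `at_least`, `is_subclass`) are made up for the example, and treating "polycyclic" as "two or more rings" is an assumption.

```python
from typing import Callable

# A class is modelled as a predicate over the number of rings in a compound.
RingPredicate = Callable[[int], bool]


def exactly(n: int) -> RingPredicate:
    """Option 1: members have exactly n rings."""
    return lambda rings: rings == n


def at_least(n: int) -> RingPredicate:
    """Option 2: members have n or more rings."""
    return lambda rings: rings >= n


def is_subclass(child: RingPredicate, parent: RingPredicate, max_rings: int = 12) -> bool:
    """True if every ring count accepted by `child` is also accepted by `parent`.

    Only ring counts from 0 to max_rings are checked, which is enough here.
    """
    return all(parent(r) for r in range(max_rings + 1) if child(r))


# Option 1: tricyclic and tetracyclic are disjoint sibling classes.
option1_tetra_under_tri = is_subclass(exactly(4), exactly(3))

# Option 2: tetracyclic (>= 4) is forced to be a subclass of tricyclic (>= 3).
option2_tetra_under_tri = is_subclass(at_least(4), at_least(3))

# A structural class whose members may carry extra rings on a tricyclic core
# (the phenothiazine situation) is modelled as "three or more rings".
core_with_extras = at_least(3)
polycyclic = at_least(2)  # assumption: polycyclic means two or more rings

option1_core_under_tri = is_subclass(core_with_extras, exactly(3))
option1_core_under_poly = is_subclass(core_with_extras, polycyclic)
```

Under option 1 the last two checks reproduce the point made above: such a class cannot sit under the tricyclic class, only under the polycyclic one.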

SCIdude (talkcontribs)

The classification of LSM-20934 looks like an error; note that all the LSMs under are two-star entries. What remains is the problem of derivatives adding a bridge to the core structure. I don't think this happens often, and such a compound is no longer a derivative (in my book). So I agree with you that option 1 is the most natural, but only if it applies to the core, not the whole structure, e.g. is still a naphthalene.

SCIdude (talkcontribs)
Reply to "n-cyclic compounds"

Call for participation in the interview study with Wikidata editors

Kholoudsaa (talkcontribs)

Dear SCIdude,

I hope you are doing well.

I am Kholoud, a researcher at King’s College London. As part of my PhD research, I work on a project that develops a personalized recommendation system to suggest Wikidata items to editors based on their interests and preferences. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I would love to talk with you to know about your current ways to choose the items you work on in Wikidata and understand the factors that might influence such a decision. Your cooperation will give us valuable insights into building a recommender system that can help improve your editing experience.  

Participation is completely voluntary. You have the option to withdraw at any time. Your data will be processed under the terms of UK data protection law (including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018). The information and data that you provide will remain confidential; it will only be stored on the password-protected computer of the researchers. We will use the anonymized (?) results to provide insights into editors' practices when selecting items to edit, and we will publish the results of the study at a research venue. If you decide to take part, we will ask you to sign a consent form, and you will be given a copy of this consent form to keep.

If you’re interested in participating and have 15-20 minutes to chat (I promise to keep the time!), please either contact me on kholoudsaa@gmail.com or use this form https://docs.google.com/forms/d/e/1FAIpQLSdmmFHaiB20nK14wrQJgfrA18PtmdagyeRib3xGtvzkdn3Lgw/viewform?usp=sf_link  with your choice of the times that work for you.

I’ll follow up with you to figure out what method is the best way for us to connect.

Please contact me using the email mentioned above if you have any questions or require more information about this project.

Thank you for considering taking part in this research.



Reply to "Call for participation in the interview study with Wikidata editors"
Bamyers99 (talkcontribs)

I have just added a note at the top of the EntitySchema directory indicating that it is programmatically generated. I incorporated some of your changes into the Configuration. I moved the molecular biology schemas to their own category. I added a See also link to the WikiProject Main classes and their canonical database. I didn't add the See also link to the WikiProject ShEx page since it is duplication of the data in the directory.

SCIdude (talkcontribs)

This is great work!

Reply to "EntitySchema directory updating"

Please stay away from the Merge tool in the near future

Maxim Masiutin (talkcontribs)

Your advice to "Please stay away from the Merge tool in the near future" is inappropriate. Please refrain from giving such advice here. ~~~~

Reply to "Please stay away from the Merge tool in the near future"
Wostr (talkcontribs)

I'm not sure about 2-phenylcyclopropan-1-amine (Q100423358). Right now the classification of DL-tranylcypromine (Q420885), (2S)-2-phenyl-1-cyclopropanamine (Q27163528), (1R,2R)-2-phenylcyclopropan-1-amine (Q27280143) and 2-phenylcyclopropan-1-amine (Q100423358) is quite messy. Before your edits, DL-tranylcypromine (Q420885) seemed to be about a group of stereoisomers (both stereocenters undefined; probably with some identifiers for stereochemically defined compounds); now I'm not sure how to change instance of (P31)/subclass of (P279) in the rest of the items.

Check 2-phenylcyclopropan-1-amine (Q100423358), (1R,2R)-2-phenylcyclopropan-1-amine (Q27280143) and (2S)-2-phenyl-1-cyclopropanamine (Q27163528) to make sure that I get your idea right. But I'm still unsure about DL-tranylcypromine (Q420885) — is this about a racemate, about a group of stereoisomers or about a specific (stereochemically defined) chemical compound?

SCIdude (talkcontribs)

@Wostr The product is the trans-racemate, i.e. (R,S) and (S,R), and I actually wanted to add P31 for this, but suddenly remembered someone emphasized not to mix group and racemate, so I stopped. Maybe the name should be changed to (RS*,SR*) to be more exact?

Wostr (talkcontribs)

Okay, now it makes more sense; I'll handle this. We need two new items for both stereoisomers to properly model this situation, and we need to move or delete a few properties that are not 100% true for a racemic mixture. I'll write again after doing this.

Wostr (talkcontribs)

I think all is done right now. 2-phenylcyclopropan-1-amine (Q100423358) and (2S)-2-phenylcyclopropan-1-amine (Q27163528) as group of stereoisomers, tranylcypromine (Q420885) as a racemate, trans-(−)-tranylcypromine (Q100429558), trans-(+)-tranylcypromine (Q100429273), (1S,2S)-2-phenylcyclopropan-1-amine (Q100430420) and (1R,2R)-2-phenylcyclopropan-1-amine (Q27280143) as specific stereoisomers.

If you come across similar situations with racemates in the future, feel free to point me to such items.

SCIdude (talkcontribs)


Reply to "tranylcypromine"
Charles Matthews (talkcontribs)
SCIdude (talkcontribs)

Your decision to remove it seems correct. I don't understand why you would then put it on the human protein, as MeSH treats it as a family. I doubt that main subject statements are affected at all, as I did not use MeSH for them. Do you have an example?

SCIdude (talkcontribs)
Charles Matthews (talkcontribs)

OK, thanks for the advice.

Reply to "Q29827740"
Daniel Mietchen (talkcontribs)

I'm glad to see that you are experimenting with bot work on citations and would be happy to help test these workflows and their output.

SCIdude (talkcontribs)

Yes, but I'm not planning large-scale operations at the moment. This is all in the context of SARS-CoV-2. One reason is that I'm scraping from PMC pages and I don't want to overdo that. However, that is probably also the most up-to-date source for this data. What do you think?

SCIdude (talkcontribs)
SCIdude (talkcontribs)

Oh I forgot. The output of it is to be fed to wikibase-cli like this: `wd ee --batch -s pmcart-cites --no-exit-on-error <output`.
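
A minimal sketch of how such an input file could be produced. It assumes that `wd ee --batch` reads one JSON edit object per line from standard input and that each object uses an `id`/`claims` shape; both should be checked against the wikibase-cli documentation. The helper name `cites_edit` is hypothetical, and the item IDs are Wikidata sandbox items used as placeholders, not output of the actual bot.

```python
import json
from typing import Iterable, List


def cites_edit(citing_qid: str, cited_qids: Iterable[str]) -> str:
    """Return one JSON line that adds P2860 (cites work) claims to an item.

    Assumed edit shape: {"id": <item>, "claims": {"P2860": [<cited items>]}}.
    """
    edit = {"id": citing_qid, "claims": {"P2860": list(cited_qids)}}
    # json.dumps never emits raw newlines, so each edit stays on a single line.
    return json.dumps(edit, separators=(",", ":"))


def batch_lines(citations: dict) -> List[str]:
    """Turn a {citing item: [cited items]} mapping into batch input lines."""
    return [cites_edit(citing, cited) for citing, cited in citations.items()]


# Placeholder data: Wikidata sandbox items, not real articles.
example = {"Q4115189": ["Q13406268", "Q15397819"]}
lines = batch_lines(example)

if __name__ == "__main__":
    # Redirect this output to a file and pass that file to the command above.
    print("\n".join(lines))
```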

SCIdude (talkcontribs)

Datapoint: of 25,668,149 items with DOI in a recent dump, 6,995,884 (27%) had at least one P2860 (cites work) statement.
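
The percentage can be rechecked from the two counts quoted above (the variable names are just for illustration):

```python
items_with_doi = 25_668_149    # items with a DOI in the dump
items_with_cites = 6_995_884   # of those, items with at least one P2860 statement

share = items_with_cites / items_with_doi
percent = round(share * 100)   # rounds to the 27% quoted above

if __name__ == "__main__":
    print(f"{share:.2%} of DOI items have at least one P2860 statement")
```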

Reply to "Bot for adding citations"
Reedy (talkcontribs)


Your bot is logging into Wikimedia projects nearly 23K times in a 48-hour period, which is excessive and shouldn't be necessary.

See https://phabricator.wikimedia.org/T256533#6261565

Can you do anything about this?


>If you are sending a request that should be made by a logged-in user, add assert=user parameter to the request you are sending in order to check whether the user is logged in. If the user is not logged-in, an assertuserfailed error code will be returned.
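
A small sketch of how the quoted advice could be applied on the client side, so that a bot logs in once, reuses its session, and logs in again only when the API reports `assertuserfailed`. The helper names are hypothetical and the sketch makes no network calls; it only shows how the `assert=user` parameter is added and how the error code is recognised in a decoded JSON response. It is not the bot's actual code.

```python
from typing import Any, Dict


def with_assert_user(params: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the API parameters with assert=user added."""
    out = dict(params)
    out["assert"] = "user"
    return out


def needs_login(response: Dict[str, Any]) -> bool:
    """True if a decoded API response carries the assertuserfailed error code."""
    error = response.get("error")
    return isinstance(error, dict) and error.get("code") == "assertuserfailed"


# Example: parameters for a hypothetical token query made by a logged-in bot.
params = with_assert_user({"action": "query", "meta": "tokens", "format": "json"})

# Decoded responses as a client might see them (structure assumed for illustration).
logged_out = {"error": {"code": "assertuserfailed", "info": "not logged in"}}
logged_in = {"query": {"tokens": {"csrftoken": "placeholder"}}}

relogin_required = needs_login(logged_out)   # session expired: log in once, then retry
relogin_not_needed = needs_login(logged_in)  # session still valid: no new login
```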

SCIdude (talkcontribs)

Thanks, I'm investigating and reporting at phabricator.

Reply to "High Scidudebot login rate"