User talk:Wostr/Structured Discussions Archive 1

About this board



I'm quite busy in my real life, so I may not respond swiftly to comments on this page.

Archived discussion (from before March 10, 2018) is available here.

RudolfoMD (talkcontribs)

Can you explain further? Your edit summary for https://www.wikidata.org/w/index.php?title=Q57055&oldid=prev&diff=1981715362 was: "this property is not designed for this type of data, please read the documentation first or propose other uses of this property on the discussion page".

As noted at Wikidata:Project chat#Basic question. How do I mark that a drug has a en:boxed warning?, I have read the documentation and am seeking input. If the data doesn't belong the way I've added it, please explain or, better yet, show me how it should be added. (Or are you trying to keep the info from being added at all?)

Wostr (talkcontribs)

klasyfikacja i oznakowanie substancji niebezpiecznych [safety classification and labelling] (P4952) was created and designed only for chemical hazard classification and labelling, not for any medicine-related info. It has a quite uncommon structure, with a legal act as the main value and classification/labelling information added as qualifiers. Therefore, it has several constraints (both simple constraints on the property page and complex constraints on the discussion page) which do not allow any other use of this property.

The second problem here is that Wikidata is not a Wikipedia article. You cannot add data in some random way, because WD data is not designed to be read by humans like an encyclopedic article. The data model must be designed in such a way that the data can then be re-used automatically, using various methods. If you add this information in one way and someone else adds it in another, the person who wants to re-use this data will not know how to get it from WD, or won't even know that such information is here. On the other hand, using the wrong property will result in re-users importing data from WD that they should not receive at all and do not expect to be there. And no one will manually review the results of such an import, where the number of imported items may amount to millions.

You should ask here first. Maybe there is a property that can be used for this kind of information (there are properties for the legal status of pharmaceutical products, like status prawny produktu leczniczego [legal status (medicine)] (P3493), but I'm not familiar with all medicine-related properties), maybe it was proposed earlier but was not accepted, maybe this kind of data would be better modelled as a qualifier, not a stand-alone property. That should be determined in a discussion, and I think that WikiProject Medicine is the best place to ask.

RudolfoMD (talkcontribs)

Thanks. I don't feel closer to accomplishing this task.

How do I mark that a drug has a en:boxed warning? Are you willing to show me how to do it right? It sounds like you aren't.

You can see I am not trying to add data in a random way, so your second paragraph seems strange: it states what I think the discussion I pointed you to shows I already know, and it gives only negative guidance, not positive.

You are telling me not to add it the way I did. Q879952 exists. How do I connect it to Q57055, if not with P4952? (I don't want to spend a ton of time learning confusing stuff I'll never use again. I just want to add the information with as little hassle as possible. I have read WD documentation, for hours, to no avail. Perhaps the wrong documentation.) Frustrated. It feels like you are unwilling to show me how to do it right, only how I'm doing it wrong. I'm not saying you aren't, but that's what it feels like. Wikidata:Database reports/Constraint violations/P4952#Scope shows many violations. Why single my edit out?


You are saying I'm not allowed to add this data until another property is created, right? But I'm not allowed to create a property, right?


You say I should ask here first. But it seems often no one is answering questions at Wikidata talk:WikiProject Medicine about drugs. viz:

https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine#Drug_categories

https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine#Drug_interactions_(P769)

https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine#The_appearance_/physical_attributes/_of_drugs

https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine#Drug_Indications

etc, etc. And I see you are helping with drugs on WD.

I don't understand what you are saying I should ask. You're saying Property:P4952 can't be used for what I used it for? If the constraints were rigid, then why didn't they prevent it? If not, why can't I expand them?

You say "Maybe there is a property that can be used for this kind of information (there are properties for legal status of pharmaceutical products like legal status (medicine) (P3493), but I'm not familiar with all medicine-related properties)" So are you saying I need to ask there if there is a property that can be used for this kind of information?

https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P4952 shows that Q7865437 is being used four times, even though it's not one of the Allowed values.  Why should we not just add it - Q7865437 - and Q879952 to P4952 as allowed values? Is that what you were suggesting when you wrote, "propose other uses of this property on the discussion page" ? If that's not an OK way to add it, how do we know what is or is not an ok way to add it?


For some reason your P links above are in a foreign language - Polish? This talk page is very constraining. I can't see what is wrong.

Wostr (talkcontribs)

It's a lot of questions, I'll try to answer them, but it won't be short.

1. I can't show you how to do it, because there is no valid way to do this that I know of. Wikidata is not about adding a sentence with the only problem being in which sections such sentence should be added. Wikidata is a structured data system, every piece of information has (or at least should have) a proper place.

2. For me, as a Wikidata editor, your edit seems very random. It's not uncommon here, 'add some information to a property that seems to be related to the one I'm looking for'.

3. You're asking the wrong question here: 'how to add this information?'. The right question in Wikidata is: 'how to add this information so that others can re-use it?'. If you add some information just for the sake of adding it, in a way that no one knows or expects, that information will be worthless in Wikidata. So, don't get me wrong, because this is not aimed at you personally and I will try to help you as best I can, but Wikidata is much more complex than the rest of the Wikimedia projects and it is necessary to learn the basics, which can take days or weeks.

4. Property constraints in Wikidata are not designed to prevent editing. They display specific warnings and are used to maintain data consistency. We have a total of millions of constraint violations across over 10,000 properties, and people work hard to manually check these violations and correct them. So no, I'm not going to allow another one just because you want to add one piece of information to one item.

5. You are free to propose a property (Wikidata:Property proposal), but without knowing the basics the chances of creating a property are low. That's why I suggested you should ask on the wikiproject discussion page.

6. With the data model adopted for the P4952 property, I do not see any possibility of including this type of information in it. It would more likely be included as a qualifier for Property:P3493 (maybe like paracetamol (Q57055) → status prawny produktu leczniczego [legal status (medicine)] (P3493) → FDA-approved (Q111972129) → charakterystyczna cecha [has characteristic] (P1552) → boxed warning (Q879952)), since this information is US-centric and related to the FDA drug approval process. Maybe it should be added via the Property:P1552 property as a stand-alone statement, not as a qualifier. But I don't know which way would be best. I don't even know whether this kind of information should be added to an item like Q57055, to items about specific pharmaceutical products like Q3245302, or as a qualifier to an item listed in Property:P3780 (like paracetamol (Q57055) → substancja aktywna w [active ingredient in] (P3780) → Tylenol (Q3245302) → charakterystyczna cecha [has characteristic] (P1552) → boxed warning (Q879952)). In Wikidata it is not that simple: sometimes information described in one Wikipedia article is split between several (or even hundreds in some cases) Wikidata items. For example, brand names of pharmaceutical products that appear in the Wikipedia article about the active substance do not belong in the item describing the active substance; every pharmaceutical product should have its own item in Wikidata. The first example, with a qualifier for P3493, would be my proposal of how it could be done, but I don't know how it fits into the data model of the medicine-related properties. That's why I wrote: ask in WikiProject Medicine.

7. I don't know why Template:P shows you the Polish labels (maybe because you don't have any language info on your user page), it should show the labels in your language. I stopped using this template in this thread, just plain links now.
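
To illustrate why the shape of a statement matters to re-users, here is a minimal Python sketch of the first option from point 6 (a P1552 qualifier on a P3493 statement). The `Statement` class and the `has_boxed_warning` helper are hypothetical names made up for this illustration; they are not part of any Wikidata API, and the model itself is only a proposal, not an agreed convention.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a Wikidata statement."""
    prop: str                                   # property ID of the main statement
    value: str                                  # item ID used as the main value
    qualifiers: dict[str, list[str]] = field(default_factory=dict)


def has_boxed_warning(statements: list[Statement]) -> bool:
    """True if any P3493 statement carries Q879952 as a P1552 qualifier.

    A re-user can only write this check if everyone models the data the
    same way; data added under another property would be missed silently.
    """
    return any(
        s.prop == "P3493" and "Q879952" in s.qualifiers.get("P1552", [])
        for s in statements
    )


# Proposed (not agreed) modelling for paracetamol (Q57055)
paracetamol = [
    Statement("P3493", "Q111972129", qualifiers={"P1552": ["Q879952"]}),
]

print(has_boxed_warning(paracetamol))
```

The same information stored under a different property, or as a stand-alone P1552 statement, would not be found by this check, which is the re-use problem described above.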

RudolfoMD (talkcontribs)
Wostr (talkcontribs)

The scale of this supports what I wrote: you should understand the basics of Wikidata, then discuss the whole project of yours in Wikidata. This is even more important in this situation because incorrect import of such a large amount of data may result in a lot of work going to waste and having to be corrected.

Any support in en.wiki has no impact on Wikidata as it is an independent project. Discuss your project in Wikiproject Medicine. I think you will find support for it here too. But before taking any action, you need to work out a solution on HOW to best do it.

U. M. Owen (talkcontribs)
Wostr (talkcontribs)

Not all statements refer to the mineral form. There are statements like InChI/InChIKey that are valid only for a type of chemical entity and would not be valid for a mineral.

U. M. Owen (talkcontribs)

Are you saying that, despite having a clearly defined chemical structure, minerals may no longer have chemical IDs? Is this a scientific or an ontological decision?

Wostr (talkcontribs)

As you can see, montmorillonite (Q422131) has many chemical external-IDs now. The problem here is that:

  1. montmorillonite (Q422131) describes a mineral form; I wouldn't say it has a defined chemical structure, at least not in a chemical sense, it is closer to a mixture of chemical entities than to a defined chemical entity.
  2. dialuminium tetrasilicon dodecaoxide monohydrate (Q26840848) describes a chemical entity; from a WD point of view it is not important whether such an entity exists or not, only whether there are some external-IDs for it.
  3. The chemical composition of the two entries differs and e.g. the InChI is valid for only one of them. An InChI refers to a specific chemical composition and structure, and this InChI does not correspond to the chemical composition or structure of montmorillonite (Q422131).
Poepchinezen (talkcontribs)
Wostr (talkcontribs)

Hi, my only edits in this field were to fix some relations that were incorrectly added using instance of (P31). I'm afraid I can't help with anything else here.

Poepchinezen (talkcontribs)

Thanks, I will wait for other opinions.

Jamie7687 (talkcontribs)

I see that you reverted my edit linking the two. While I can see a potential use for a more specific property, and agree that technically chemical compounds can be made many different ways, I feel like there should be some way to incorporate some aspect of this particular relationship into Wikidata, just as this link is present in many language editions of Wikipedia, and in numerous other databases. Do you think there's a way to express this relationship using existing properties, or do you think a new property would be needed to communicate that connection?

Wostr (talkcontribs)

I can't point you to any discussion right now, because I'm using my phone. I can just say that it was discussed earlier and probably the best option is to store that kind of data in items about reactions and using queries to retrieve the data for particular compounds. Having this kind of data in items about compounds would result in hundreds of statements in these items (something that was rejected earlier for chemical elements and 'is part' statements).

SCIdude (talkcontribs)

Please correct this. How many did you get wrong in this batch?

Wostr (talkcontribs)

Yes, you're right that this item describes a stereochemically defined compound. I did not know that there may be InChIs with "?" in sublayers /b or /t that are not an indication of an undefined stereocentre. The problem is the 3-iminopyrazol-1-yl group. The InChI from PubChem indicates that the double-bond stereochemistry of H-N=C< is undefined. However, it's hard to reproduce this in any software available to me, and even the PubChem structure redrawn in ChemDraw gives a different InChI.

I'll check all the InChIs in these 4 batches for possible 'false positives' in the /b sublayer. I can't tell you right now how big this problem is, but I'll contact you as soon as I know. Then I'll correct all incorrectly changed statements, but I can't tell you right now whether these kinds of errors are occasional and I can correct them manually, or whether I'll have to use semi-automatic tools.

SCIdude (talkcontribs)

It may be a problem with InChIs from ChEBI. If so, we should replace them in bulk if that solves the problem.

Wostr (talkcontribs)

After a quick check I see that there may be about 20–30 items in these batches that have to be checked. I'll do that manually (however, I don't know when; probably tomorrow or the day after, as I have a really busy week at work).

Wostr (talkcontribs)

There are 30 items that I'll be reviewing; all are listed here. It seems that most of the problems result from double bonds on nitrogen atoms or double bonds in rings. I'm not sure why the InChI in these items shows e.g. the oxime group HO-N=C< as a group that should have defined stereochemistry. However, there are situations like Heme O (Q620211): the InChI from PubChem shows an undefined configuration for many double bonds of the porphyrin, while the InChI from ChemSpider shows all those double bonds as stereochemically defined.

I'll try to check whether these InChIs from PubChem are correct for these items. Maybe we should have more than one InChI in such situations (even with a deprecated rank).

SCIdude (talkcontribs)

This problem also showed up with my current ChEBI InChIKey fixes, resulting in different keys. I agree multiple InChIs and keys are unavoidable. But when using different ranks, we should have a consistent way to assign them. For example, do we prefer not to specify oxime bonds? What about diazo -N=N- bonds? PubChem usually leaves them unspecified (I agree with this). As to porphyrin bonds, is the (E) configuration geometrically possible? If not, the bond does not need to be specified. This has to be defined on some project page. Could you please do this?

SCIdude (talkcontribs)

BTW I think I found out why the ChEBI InChIs may have a problem. Take the SMILES "C1C[C@@H]2CC[C@H]1C2", which is norbornane with redundant stereo information (the centers are potentially stereogenic, but not in this case). When it is input in PubChem, the stereo specs are removed automatically; when it is input in ChEBI, they are not. As a result the InChIs become different. So effectively it's a ChEBI software problem.

Wostr (talkcontribs)

As I thought that ~30 incorrect items is a very low number given the scale of QS batches, I checked the whole batch in Excel rather than trying to query it using SPARQL from WD.

I found 774 potential InChIs that may have ? in the /b sublayer and may not be a group of stereoisomers. I've manually checked all the items (unfortunately, most have only one source, the DSSTOX database, because they were created by GZWDer imports) and found:

  • about 58.5% are correct (mostly undefined configuration of C=C bonds or substituted diazo bonds)
  • about 23.8% have to be checked more carefully, however I think that most are correct (about 95% of these are undefined configurations of double bonds in rings with eight or more members; I checked that it is possible to have an eight-membered ring with at least one E double bond, so probably these items are correctly described as 'group of stereoisomers')
  • about 17.7% are probably incorrect (about 85% of these are unsubstituted imino groups that are treated in many databases as stereochemically undefined, however at least 3 different InChIs can be assigned in such situations; the rest are unsubstituted diazo groups, heterocyclic compounds or some weird borderline cases, plus some items in which InChIs from different sources differ).
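
For a rough sense of scale, the shares above can be converted back into approximate item counts. This is only a back-of-the-envelope sketch: the percentages are quoted as "about", so the counts are estimates, and the variable names are made up for the illustration.

```python
# Approximate item counts behind the quoted shares of the 774 checked InChIs.
TOTAL_CHECKED = 774

shares = {
    "correct": 0.585,
    "needs a closer look": 0.238,
    "probably incorrect": 0.177,
}

# Round each share of the total to the nearest whole item.
approx_counts = {
    label: round(TOTAL_CHECKED * share) for label, share in shares.items()
}

for label, count in approx_counts.items():
    print(f"{label}: ~{count} items")
```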

I'll post about this problem on the Wikidata:WikiProject Chemistry discussion page in a few days. Most incorrectly added 'groups of stereoisomers' for compounds with an unsubstituted imino group can be reverted using QS, so it won't be a technical problem, but we have to do it in a uniform way for all cases.

The problem you've mentioned about norbornane and redundant stereo descriptors may cause additional problems in the future. I added 'group of stereoisomers' to items that have ? in InChI sublayers /b or /t. However, there are also many InChIs for groups of stereoisomers that lack these sublayers: if a compound has 2 stereocentres and 1 is undefined, there is a /t sublayer with a ? descriptor for one stereocentre; if both stereocentres are undefined, there is no /t sublayer at all. So we still have thousands of 'groups of stereoisomers' (with all stereocentres undefined) that are classified as 'chemical compounds'. I asked Egon Willighagen about the script he wrote in 2019; items like norbornane may be false positives that we should try to exclude.
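
The heuristic described above (a ? in the /b or /t sublayer) can be sketched in a few lines of Python. This is an illustration only, not the script actually used for the batches: the function names are hypothetical, and the layer splitting is a simplification that just looks at the prefix letter of each "/"-separated segment after the version and formula.

```python
def undefined_stereo_layers(inchi: str) -> list[str]:
    """Return the /b and /t segments of an InChI that contain a '?'."""
    flagged = []
    # Segment 0 is the version ("InChI=1S"), segment 1 is the formula;
    # the remaining segments start with a one-letter layer prefix.
    for segment in inchi.split("/")[2:]:
        if segment[:1] in ("b", "t") and "?" in segment:
            flagged.append(segment)
    return flagged


def flagged_as_group_of_stereoisomers(inchi: str) -> bool:
    """Apply the '? in /b or /t' rule of thumb."""
    return bool(undefined_stereo_layers(inchi))


# One stereocentre undefined: the /t layer carries a '?'.
partly_defined = (
    "InChI=1S/C5H9NO3/c7-3-1-4(5(8)9)6-2-3/h3-4,6-7H,1-2H2,(H,8,9)"
    "/t3?,4-/m0/s1"
)
# Same connectivity with no stereo layers at all (all centres undefined).
fully_undefined = "InChI=1S/C5H9NO3/c7-3-1-4(5(8)9)6-2-3/h3-4,6-7H,1-2H2,(H,8,9)"

print(flagged_as_group_of_stereoisomers(partly_defined))
print(flagged_as_group_of_stereoisomers(fully_undefined))
```

The second call shows the gap mentioned above: an InChI with every stereocentre undefined has no /t sublayer, so this rule alone does not flag it. Conversely, a ? in /b can be a false positive (for example for unsubstituted imino groups), so a flag is a prompt for review rather than a verdict.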

Mabschaaf (talkcontribs)

Hi Wostr, may I ask you for the reason for this edit? According to SciFinder, CAS 6912-67-0 represents exactly the molecule without defined stereochemistry at either of the two stereocentres. Greetings --Mabschaaf (talk) 17:42, 3 July 2021 (UTC)

Wostr (talkcontribs)

I'm not sure right now, but:

  • sourced external-IDs should very rarely be deleted, even if such IDs are wrong; the better way in WD is to deprecate such a statement, because deleting wrong IDs is only temporary: sooner or later such IDs will be added again by some bot owner. So I usually deprecate CAS numbers instead of deleting them, as there were situations in the past when a deleted (wrong) external-ID reappeared after some mass import of data.
  • as I have not had access to SciFinder for several years now, in situations like this I have to use secondary sources (if a CAS Common Chemistry entry is not available). I think that, based on the ChemIDplus entry and maybe some others (which matched CAS: 6912-67-0 with InChI=1S/C5H9NO3/c7-3-1-4(5(8)9)6-2-3/h3-4,6-7H,1-2H2,(H,8,9)/t3?,4-/m0/s1), I chose to deprecate the CAS number in (4RS)-4-hydroxy-DL-proline (Q411237) and leave it with normal rank in (4RS)-4-hydroxy-L-proline (Q27102938).

If the SciFinder entry says otherwise, it should be changed: the statement in (4RS)-4-hydroxy-L-proline (Q27102938) deprecated and the one in (4RS)-4-hydroxy-DL-proline (Q411237) set back to normal rank.

Mabschaaf (talkcontribs)

I did as described. Thanks for your help.--Mabschaaf (talk) 11:08, 4 July 2021 (UTC)

Call for participation in the interview study with Wikidata editors

Kholoudsaa (talkcontribs)

Dear Wostr,

I hope you are doing good,

I am Kholoud, a researcher at King’s College London, and I work on a project as part of my PhD research that develops a personalized recommendation system to suggest Wikidata items for the editors based on their interests and preferences. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I would love to talk with you to know about your current ways to choose the items you work on in Wikidata and understand the factors that might influence such a decision. Your cooperation will give us valuable insights into building a recommender system that can help improve your editing experience.  

Participation is completely voluntary. You have the option to withdraw at any time. Your data will be processed under the terms of UK data protection law (including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018). The information and data that you provide will remain confidential; it will only be stored on the password-protected computer of the researchers. We will use the results anonymized to provide insights into the practices of the editors in item selection processes for editing and publish the results of the study to a research venue. If you decide to take part, we will ask you to sign a consent form, and you will be given a copy of this consent form to keep.

If you’re interested in participating and have 15-20 minutes to chat (I promise to keep the time!), please either contact me at [] or [] or use this form https://docs.google.com/forms/d/e/1FAIpQLSdmmFHaiB20nK14wrQJgfrA18PtmdagyeRib3xGtvzkdn3Lgw/viewform?usp=sf_link with your choice of the times that work for you.

I’ll follow up with you to figure out what method is the best way for us to connect.

Please contact me if you have any questions or require more information about this project.

Thank you for considering taking part in this research.

Regards

SCIdude (talkcontribs)

FYI I have added items with 13C and 14C to the aldehydo-hexoses:

aldehydo-hexose (Q105024342) ↑
├──aldehydo-(¹³C₆)hexose (Q82881663)
│   -aldehydo-L-(¹³C₆)idose (Q82842547)
│   -aldehydo-D-(¹³C₆)glucose (Q105108360)
├──aldehydo-galactose (Q100602655) ↑
│   -aldehydo-D-galactose (Q27102217)
│   -aldehydo-L-galactose (Q27117209)
├──aldehydo-allose (Q100604517) ↑
│   -aldehydo-D-allose (Q423216)
│   -aldehydo-L-allose (Q27117249)
├──aldehydo-gulose (Q101095964) ↑
│   -aldehydo-D-gulose (Q423227)
│   -aldehydo-L-gulose (Q27117231)
├──aldehydo-altrose (Q106941265) ↑
│   -aldehydo-D-altrose (Q423207)
│   -aldehydo-L-altrose (Q72437509)
├──aldehydo-glucose (Q106941538) ↑
│   -aldehydo-L-glucose (Q3266724)
│   -aldehydo-D-glucose (Q21036645)
│   -aldehydo-D-(6-¹³C)glucose (Q82694157)
│   -aldehydo-D-(1,6-¹³C₂)glucose (Q82694158)
│   -aldehydo-L-(1-¹⁴C)glucose (Q82877697)
│   -aldehydo-D-(6-¹⁴C)glucose (Q83061294)
│   =aldehydo-D-(¹³C₆)glucose (Q105108360)
├──aldehydo-idose (Q106947809) ↑
│   -aldehydo-D-idose (Q423179)
│   -aldehydo-L-idose (Q27277756)
│   =aldehydo-L-(¹³C₆)idose (Q82842547)
├──aldehydo-mannose (Q106964021) ↑
│   -aldehydo-D-mannose (Q27117223)
│   -aldehydo-L-mannose (Q27117227)
│   -aldehydo-D-(1,2-¹³C₂)mannose (Q82694075)
└──aldehydo-talose (Q107080868) ↑
    -aldehydo-D-talose (Q423195)
    -aldehydo-L-talose (Q27158868)
    -aldehydo-D-(2-¹³C)talose (Q82694379)

Wostr (talkcontribs)

Thanks for the info. However, with no support for Wikidata:Property proposal/isotopically modified form of, there should be some way to link e.g. aldehydo-D-(2-¹³C)talose (Q82694379) with aldehydo-D-talose (Q423195). With no dedicated property, there is probably only one way to do it: aldehydo-D-(2-¹³C)talose (Q82694379) subclass of (P279) aldehydo-D-talose (Q423195), but it's not possible right now with chemical compounds modelled as instances... I'll try to finish my proposal to switch 'instance of' to 'subclass of' for chemical compounds and post it on the WikiProject:Chemistry discussion page (sorry, I didn't have time to review your proposal there, I will probably have some days off at the end of the week).

Gremista.32 (talkcontribs)

I did something wrong by putting a space

I don't know if you have any recommendations

Call for participation in a task-based online experiment

Kholoudsaa (talkcontribs)

Dear Wostr,

I hope you are doing good,

I am Kholoud, a researcher at King's College London, and I work on a project as part of my PhD research, in which I have developed a personalised recommender system that suggests Wikidata items for the editors based on their past edits. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I am inviting you to a task-based study that will ask you to provide your judgments about the relevance of the items suggested by our system based on your previous edits.

Participation is completely voluntary, and your cooperation will enable us to evaluate the accuracy of the recommender system in suggesting relevant items to you. We will analyse the results anonymised, and they will be published to a research venue.

The study will start in late January 2022 or early February 2022, and it should take no more than 30 minutes.

If you agree to participate in this study, please either contact me at [] or use this form https://docs.google.com/forms/d/e/1FAIpQLSees9WzFXR0Vl3mHLkZCaByeFHRrBy51kBca53euq9nt3XWog/viewform?usp=sf_link

I will contact you with the link to start the study.

For more information about the study, please read this post: https://www.wikidata.org/wiki/User:Kholoudsaa

In case you have further questions or require more information, don't hesitate to contact me through my mentioned email.

Thank you for considering taking part in this research.

Regards
