Wikidata talk:WikiProject Chemistry/Archive/2016

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

eChemPortal

Latest comment: 8 years ago, 12 comments, 3 people in discussion

The OECD eChemPortal is a valuable database of information on chemical substances. I recommend including a link to that database in the items of chemicals. In the case of pseudocumene (Q376994) for example, the link is http://www.echemportal.org/echemportal/substancesearch/substancesearch_execute.action?allParticipants=true&numberType=CAS&number=95-63-6. --Leyo 15:55, 4 February 2016 (UTC)
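The link pattern in the comment above can be generated from any CAS number. This is a minimal sketch, assuming the query parameters shown in the example URL are still accepted by the portal; the function name `echemportal_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base address and parameter names are copied from the example link in this thread.
# Whether the portal still accepts them is an assumption.
BASE = ("http://www.echemportal.org/echemportal/substancesearch/"
        "substancesearch_execute.action")


def echemportal_url(cas_number: str) -> str:
    """Build an eChemPortal substance-search link for one CAS number."""
    params = {
        "allParticipants": "true",
        "numberType": "CAS",
        "number": cas_number,
    }
    return BASE + "?" + urlencode(params)


# Pseudocumene (Q376994), the example used above.
print(echemportal_url("95-63-6"))
```

Because the link depends only on the CAS number, it could in principle be derived from the existing CAS property rather than stored separately.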

Leyo No, because this is not a database but a weblink database. It can be a tool to find data in other databases, but it is better to link directly to the original databases where the data are than to point to a database which points to other databases. Snipre (talk) 18:09, 4 February 2016 (UTC)

OK, let's call it metadatabase. It's more than just a number of blind links to (possible) entries in other databases. The only problem for Wikidata is that there is no ID other than the CAS number. --Leyo 23:37, 4 February 2016 (UTC)

From WD point of view, we should have only one parameter defining a unique entry in the database. From what I see, this is not the case for this database. Snipre (talk) 16:01, 9 February 2016 (UTC)

The CAS number is “only one parameter”. Is the problem that it is not a parameter specific to the eChemPortal? --Leyo 23:21, 11 February 2016 (UTC)

Leyo We already have the CAS number as a property, so there is no need to create anything: people can search with this information using the search tool of the database.

But if you want to create a link using the CAS number, we already have Magnus' tool, which connects a CAS number to all databases that use the CAS number as a search parameter: see here for the case of methanol. Snipre (talk) 08:22, 15 February 2016 (UTC)

As far as I see, the tool is not linked in items of chemicals. However, my point is, that the eChemPortal should be accessible directly from there. --Leyo 00:43, 17 February 2016 (UTC) P.S. There are several dead links in Magnus' tool.

Leyo Seems that eChemPortal changed the way to link to their data. How do you want to create a link to eChemPortal ? Snipre (talk) 19:01, 17 February 2016 (UTC)

I am not sure how exactly it should be done. That's why I am asking. ;-) --Leyo 02:16, 18 February 2016 (UTC)

We should be using the InChI / InChIKey as main unique identifier for most compounds (e.g. all organic compounds). --Egon Willighagen (talk) 18:30, 14 February 2016 (UTC)

Egon Willighagen InChI is not really human-friendly for comparison purposes. InChIKey is better but still complex. But to be honest, we should have a tool which creates a drawing of the chemical and the corresponding SMILES, InChI and InChIKey. These four elements have to be created at the same time and shouldn't have different origins. In that respect PubChem is a good tool, because it creates these four elements together. Snipre (talk) 08:43, 15 February 2016 (UTC)

Snipre Yes, PubChem CID works for me. They have a bot that can help, see User:ProteinBoxBot and pinging User:Andrawaag. For the Wikidata:WikiProject_Medicine/Zika project I am using QuickStatements (see this source code: https://github.com/egonw/zikaVirus; both options use Bioclipse): 1. take a SMILES, generate InChI and InChIKey, lookup PubChem CID, and create a QuickStatement (linked to the paper in which the compound was mentioned); 2. take a ChEMBL ID, look up SMILES, InChI, and InChIKey in ChEMBL, and create a QuickStatement (with their permission to copy the data for these Zika-related compounds). In both cases, I visualize the 2D structures in Bioclipse (with the ui.view() command), to make sure things look OK. I have not had time for this, but need to learn how to write (and mostly use) bots, and then talk to the PubChem people, to autopopulate items with PubChem CIDs with additional CCZero/PD data from PubChem (as earmarked by them). Egon Willighagen (talk) 10:39, 15 February 2016 (UTC)
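The last step of both workflows, turning identifiers into QuickStatements, can be sketched as follows. This is not the Bioclipse code from the linked repository; it assumes the tab-separated QuickStatements (version 1) syntax with quoted string values, and the function name is hypothetical. Generating the InChI and InChIKey from a SMILES is left to a cheminformatics toolkit.

```python
# Wikidata property IDs for the identifiers named in the comment above.
PROPS = {
    "smiles": "P233",       # canonical SMILES
    "inchikey": "P235",     # InChIKey
    "pubchem_cid": "P662",  # PubChem CID
}


def quickstatements_lines(qid, identifiers):
    """Return one tab-separated QuickStatements (v1) line per identifier."""
    return [f'{qid}\t{PROPS[name]}\t"{value}"'
            for name, value in identifiers.items()]


# Q4115189 is the Wikidata Sandbox item, used here only as a safe placeholder.
# The identifier values are those of methanol.
lines = quickstatements_lines(
    "Q4115189",
    {
        "smiles": "CO",
        "inchikey": "OKKJLVBELUTLKV-UHFFFAOYSA-N",
        "pubchem_cid": "887",
    },
)
print("\n".join(lines))
```

A real batch would also attach a reference to each statement, which this sketch leaves out.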

Open Beauty Facts

Latest comment: 8 years ago, 1 comment, 1 person in discussion

Hey all

Notified participants of WikiProject Chemistry,

The volunteers behind Open Food Facts are attacking Cosmetics :-)

Just like what we did for food, we're going to create a worldwide open data base of all cosmetic products, with ingredients, allergens, categories, brands, reference photos, using mobile phones. The effort has started at http://world.openbeautyfacts.org

We'll get a full list of ingredients in your favorite lipstick, shampoo or creams. We're starting to realize how much chemistry is actually involved.

My first step was to get the UNII (P652) ids imported (Mix'n Match).

We're going to improve the cosmetics articles, and hopefully manage to better link them with the underlying molecules (parabens, quaterniums…) (hierarchy)

We're also going to try and get as many Colour Index International constitution ID (P2027) as possible in relevant chemistry items (they're only in labels right now, and not part of the Chemistry infobox), as they're often used on shampoos.

Let me know if you have any ideas, either for Open Beauty Facts or on how to improve the cosmetic situation on Wikidata.

--Teolemon (talk) 13:50, 9 February 2016 (UTC)

Elements

Latest comment: 8 years ago, 1 comment, 1 person in discussion

There are two ways to define elements:

1. Types of atoms with the same atomic number (aka. the set of all atoms for some atomic number)
2. Types of substances with only one type of atoms (in definition 1)

It seems that we have an interwiki conflict here, because en:Chemical elements uses 1. as a main definition, and, for example, fr:élément chimique uses the second. I'm afraid I failed to gain a consensus just for this pair of languages to align the definitions, so I guess we won't avoid the splitting of items; this will give work to WD:XLINK. For each element … The good news is that it will clarify some classification issues (assuming we solve the same problem for chemical substance, molecular entity and all):

⟨ Hydrogen (en) ⟩ subclass of (P279) ⟨ pure chemical substance ⟩

⟨ Hydrogène (fr) ⟩ subclass of (P279) ⟨ atom ⟩

⟨ Hydrogène (fr) ⟩ part of (P361) ⟨ Hydrogen (en) ⟩

Hydrogen (en) = hydrogène élémentaire/hydrogène pur (fr)

Hydrogène (fr) = Hydrogen atom (en)

⟨ Hydrogène (fr) ⟩ instance of (P31) ⟨ élément chimique (fr) ⟩

élément chimique (fr) = "type of atom occurring in some element" (en)

⟨ USS Akron (Q1456109) ⟩ has part(s) (P527) ⟨ Helium (en) ⟩

Does that seem correct ? @Emw, Snipre: of course.

I am very much in favor of splitting things. For several reasons: one "element" can have two or more substances. Oxygen has at least molecular oxygen and ozone, carbon has several, excluding all the pure-carbon molecular structures (buckyballs, graphenes). Is there consensus now? Egon Willighagen (talk) 06:39, 13 April 2016 (UTC)

Compounds with several CAS numbers

Latest comment: 8 years ago, 7 comments, 4 people in discussion

What should we do with tartaric acid (Q194322) for example?

CAS	Comment
133-37-9	DL
87-69-4	L-(+)
147-71-7	D-(–)
147-73-9	meso
526-83-0	D-(–) ?

--Kopiersperre (talk) 10:14, 10 March 2016 (UTC)

We have to separate mixture of isomers and isomers. So we will have 3 items at least:

- one for the mixture (DL)

- one for the D form

- one for the L form

Snipre (talk) 14:04, 10 March 2016 (UTC)

There should be separate items for dextrorotatory isomers, laevorotatory isomers, and racemic mixtures. I have been creating separate items for D- and L-isomers in certain cases. James Hare (NIOSH) (talk) 15:16, 10 March 2016 (UTC)

~~@James Hare (NIOSH): Can you provide the Q number of the items you created ? We should transfer the data from tartaric acid (Q194322).~~ In this case we have to create another item for the meso form which is a component too. For the second CAS number for D form we should check if these 2 numbers are correct and in that case check which one is the current valid one. Snipre (talk) 08:07, 11 March 2016 (UTC)
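One part of "checking if the numbers are correct" can be automated: the last digit of a CAS number is a check digit, computed from the other digits weighted by their position counted from the right. The sketch below (with a made-up function name) tests only this internal consistency. It cannot tell which of two well-formed numbers is the currently valid one for a substance.

```python
import re


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if the CAS number is well formed and its check digit matches."""
    m = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight 1 for the rightmost digit before the check digit, 2 for the next, ...
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


# The five numbers listed for tartaric acid in the table above.
for cas in ["133-37-9", "87-69-4", "147-71-7", "147-73-9", "526-83-0"]:
    print(cas, cas_check_digit_ok(cas))
```

All five numbers from the table pass this test, so the question about the two D-form numbers has to be settled against the registry or another source.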

@Kopiersperre: L-tartaric acid (Q23034944), D-tartaric acid (Q23034947) and (S)-tartaric acid (Q23034950). Snipre (talk) 13:23, 11 March 2016 (UTC)

There is still a problem: there is one CAS number for the racemic mixture which is different from that of the generic tartaric acid. Snipre (talk) 13:31, 11 March 2016 (UTC)

Very much supporting the split up. The more precise we are, the better we do. This particular case is important to research I do in various projects. Egon Willighagen (talk) 06:41, 13 April 2016 (UTC)

Silicic acids

Latest comment: 8 years ago, 5 comments, 3 people in discussion

silicic acids (Q16524585) and orthosilicic acid (Q422843) should be merged. Could you please help me investigate the right CAS numbers (see also silica gel (Q308976) and metasilicic acid (Q3604536))?--Kopiersperre (talk) 17:34, 10 March 2016 (UTC)

@Kopiersperre: Please look at the German articles for both items silicic acids (Q16524585) and orthosilicic acid (Q422843): one is about the family of silicic acids and the other about a specific form of silicic acid. The German links will prevent any merge, so perhaps you should analyze them first. In my view these items shouldn't be merged, just relabeled. Snipre (talk) 14:23, 12 March 2016 (UTC)

Rename orthosilicic acid (Q422843), create disilicic acid (Q23038943), ~~metasilicic acid (Q23038949)~~ and pyrosilicic acid (Q23038952). Snipre (talk) 14:43, 12 March 2016 (UTC)

Thanks for the solution. Created trisilic acid (Q23038984).--Kopiersperre (talk) 15:11, 12 March 2016 (UTC)

Trisilic or trisilicic acid ? --Chris.urs-o (talk) 12:27, 17 June 2016 (UTC)

Items for hypothetical compounds?

Latest comment: 8 years ago, 5 comments, 2 people in discussion

What do you think on User talk:Marsupium#Ammonia / ammonium hydroxide? Should a separate item be created for the hypothetical ammonium hydroxide described by AAT record 300266781? How are similar cases handled such as elements not existing under common conditions? Cheers and thanks for pinging! --Marsupium (talk) 14:21, 17 April 2016 (UTC)

@Marsupium: Create a different item, because ammonium hydroxide is only one type of molecule in an ammonia solution. Ammonia solution is a chemical substance, meaning a mixture of different types of molecules, and one class of these molecules is ammonium hydroxide. So you can connect ammonia solution with ammonium hydroxide using the property "part of". Snipre (talk) 19:59, 8 June 2016 (UTC)

OK, thanks! But the problem is that ammonium hydroxide seems not to exist actually by itself outside ammonia solution. Which instance of (P31) shall <ammonium hydroxide> get? --Marsupium (talk) 18:48, 13 June 2016 (UTC)

@Marsupium: I don't know but hypothetical compound is not correct: this compound exists but only in small quantity and in certain conditions. Snipre (talk) 15:15, 14 June 2016 (UTC)

OK, thank you! I thought about that. If there is no obligation to point that out, I'll simply create the item. --Marsupium (talk) 18:12, 14 June 2016 (UTC)

Import parts of UniProtKB

Latest comment: 8 years ago, 15 comments, 3 people in discussion

Hi, what is necessary for an import of 550.000 reviewed items with their properties "accession number, protein name, gene name, organism, GO - molecular and biological function, keywords, length, mass and sequence"? We already have their permission to import. Here's the archived discussion from the project chat and here's the section at Portal:Gene Wiki, thanks, --Ghilt (talk) 07:15, 8 June 2016 (UTC)

@Ghilt: You need

a spreadsheet with all the data or at least an API to extract data from UniProtKB
the list of all items about protein with the corresponding UniProtKB identifier
a matching table between the wikidata properties and the corresponding UniProtKB parameters
an agreement from contributors working in the field of biology to import all the mentioned data.
and finally a bot operator ready to do the job. Don't forget to ask him to add after each statement import the reference using as example help:Sources, section databases.

The goal of Wikidata is not to import all data from all databases. You should aim mainly for data which can be useful for Wikipedia. The best approach is first to analyze infoboxes from different WPs like en:WP, de:WP and fr:WP to see what kind of data is used in the articles. Then you can start to extract all corresponding data from UniProtKB. Snipre (talk) 19:55, 8 June 2016 (UTC)

Hi Snipre, thank you very much for the reply. The 555,000 items are not the full database, only their reviewed items. The data is used for writing protein articles on wikipedia. The matching table and the agreement shouldn't be a problem. But the API might, as it was difficult to get answers to my questions in either section (gene wiki on en.wp, wd Partnerships_and_data_imports and wd project chat) and i can't code sufficiently. Is there anybody who can help with that? --Ghilt (talk) 08:47, 10 June 2016 (UTC)

@Ghilt: Wikidata: Bot request. Snipre (talk) 11:29, 11 June 2016 (UTC)

Thanks again, i'll try that --Ghilt (talk) 21:58, 11 June 2016 (UTC)

@Ghilt: I am a little surprised to discover this proposal here and that you did not find the 377,000 UniprotKB items (SwissProt curated items, SPARQL query) we (project molbio/Gene Wiki team) already imported. We have all code in place and could do a full Swissprot import anytime required, but we prefer to do it species-wise, so we can link genes and proteins as described in the data model the Wikiproject Molecular Biology agreed on. Please see our papers on this [1] [2]. Sebotic (talk) 08:51, 16 June 2016 (UTC)

@Sebotic: Thanks for the reply. I had checked two typical protein items for molecular weight and length and didn't find the info, which is why i started at the project chat, followed by Portal:Gene_Wiki at en.wp, Partnerships and data imports, on this page and at Portal:Biology. And I finally found you! As i didn't intend to reinvent the wheel, your reply is a great help! This way, i don't need to import the 551,000. Should i discuss the creation of the properties "GO - molecular and biological function, keywords, length, mass and sequence" and the subsequent imports here or there? Cheers, --Ghilt (talk) 17:58, 16 June 2016 (UTC)

@Ghilt: The Wikidata protein items already have the full Gene Ontology annotations, which are maintained by our bot, directly from the original source QuickGo, so no need to add anything. Regarding length, mass and sequence: Length could be determined from sequence, so no need to add that, but there is a general agreement in WD project Molbio, not to add protein or nucleic acid sequences at this point, but let the users go to the original source if they need sequence info. This decision makes sense, as the current character limit for most WD text field properties is 400. Regarding mass: Several months ago, mass has been proposed as a property in the domain of chemistry, but it has been declined, because the mass of a molecule can be calculated from its chemical formula. Best, Sebotic (talk) 18:31, 22 June 2016 (UTC)
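Both derivations mentioned above are simple to sketch. The code below is an illustration under stated assumptions: the function names are made up, the weight table covers only a few elements with abridged standard atomic weights, and only flat formulas without brackets or charges are parsed. It shows the small-molecule case that motivated the declined property proposal; databases normally derive protein masses from the sequence rather than from a formula.

```python
import re

# Abridged standard atomic weights for a few common elements only.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def molar_mass(formula: str) -> float:
    """Molar mass in g/mol for a flat formula such as 'C2H6O' (no brackets)."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern did not consume completely.
    if "".join(sym + num for sym, num in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    return sum(ATOMIC_WEIGHT[sym] * int(num or 1) for sym, num in tokens)


def sequence_length(sequence: str) -> int:
    """Length of a protein or nucleic acid sequence given in one-letter codes."""
    return len(sequence)


print(round(molar_mass("H2O"), 3))
print(sequence_length("MKT"))
```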

By the way, here is the German version of the template infobox protein, cheers, --Ghilt (talk) 08:08, 17 June 2016 (UTC)

If sequences aren't feasible, how about importing the length? And I would really like to have the mass for writing protein articles without having to calculate each one or to go look at Uniprot. Cheers, --Ghilt (talk) 18:43, 22 June 2016 (UTC)

Moving this discussion to Project Molecular biology, cheers --Ghilt (talk) 20:42, 20 June 2016 (UTC)

BTW, i'll be in Esino Lario, who else? --Ghilt (talk) 15:11, 23 June 2016 (UTC)

Not possible for me. But if you have good experience there please feel free to report here your comments. Snipre (talk) 15:17, 23 June 2016 (UTC)

It actually was a great experience, the people of Esino Lario were incredibly welcoming. There were 'We welcome Wikipedians' signs on every fourth house and there were even drive-by hollers of 'I love Wikipedia'. The local bakery renamed its cookies to 'Wikipedia's cookies'. The talks were ok, they're accessible on youtube, but more important was meeting some of the wikipedians i only knew by writing and pinning a face and a character to their name. Cheers, --Ghilt (talk) 18:07, 29 June 2016 (UTC)

Thanks for comment. It is always a good thing when we have positive feedback: this can help us to take part to the events in the future. Snipre (talk) 07:11, 30 June 2016 (UTC)

Philadelphia ACS meeting

Latest comment: 7 years ago, 3 comments, 2 people in discussion

Hello! There will be a Wikipedia Edit-a-thon at the national ACS meeting in Philadelphia next month. Will anyone from this group be there, to show ignorant chemists such as myself how to contribute to chemistry on Wikidata? Would anyone be able to give a short talk on what Wikidata is and how it will (hopefully) be used within Wikipedia? Walkerma (talk) 22:52, 15 July 2016 (UTC)

@Walkerma: Sorry, I am living in Europe and have no plans for holidays in the coming weeks. I can only propose that you start by reading some help pages on the general structure of WD, and then, once you have more detailed questions, I will try to answer them. My reading suggestions:

Wikidata:Introduction and Help:About_data for general overview.
Wikidata:Tours. These 2 small tutorials are quite good as description of the WD interface.
Help:Contents: a bunch of help pages on different topics if you want to go further.

Snipre (talk) 08:06, 20 July 2016 (UTC)

Thanks - I'll try to work through these. If I get anywhere, I may try to contribute a couple of slides on it to the Edit-a-thon, just to explain the concept to the chemists who show up.. Walkerma (talk) 02:59, 21 July 2016 (UTC)

GHS hazard statements

Latest comment: 7 years ago, 13 comments, 4 people in discussion

We already have items with H phrases and with P phrases. In my opinion unsourced hazard statements should get deleted.--Kopiersperre (talk) 14:40, 5 August 2016 (UTC)

I would like to import the first big chunk of P728 (P728) and P940 (P940). I think the only viable way is creating one item for every possible phrase or phrase combination (see the list).--Kopiersperre (talk) 14:42, 5 August 2016 (UTC)

The statements are strings, not items. --Izno (talk) 15:39, 5 August 2016 (UTC)

I know, but this should be changed.--Kopiersperre (talk) 08:31, 6 August 2016 (UTC)

BTW what will be the source of the statements you want to import? ∼Wostr (talk) 17:27, 6 August 2016 (UTC)
- German Wikipedia, which basically means GESTIS database (Q15811170). It will be a test, no plans to import everything.--Kopiersperre (talk) 19:43, 6 August 2016 (UTC)

@Kopiersperre, Izno: Please don't use Wikipedia to import data into WD: we already have enough complaints about the quality of these data that we should look for other sources. For GHS data please use the data from the ECHA, available here as an Excel sheet. You can use the CAS number and the EINECS number to identify the item before the import. Thanks Snipre (talk) 09:56, 8 August 2016 (UTC)

As Snipre above. Neither Wikipedias nor unofficial sources/databases/MSDSs should be used for GHS properties. Only the ECHA database and harmonised classification (not notified classification, as it varies greatly depending on the producer) should be included in WD. I think we should also use applies to jurisdiction (P1001) = European Union (Q458), because classification in other parts of the world can be different from the European classification and labelling included in CLP and ATPs (e.g. U.S. OSHA may have its own official c&l for certain substances). ∼Wostr (talk) 13:13, 8 August 2016 (UTC)

I think we should perhaps change the data structure. My concern is about different sets of H phrases. For example, if source A says that compound C should be labelled with H202 and H400 and source B says that the labelling for C is H201 and H401, how can we later retrieve the right set of H phrases according to only one source?

Instead of having different P728 (P728) statements and having to filter them in order to get one unique labelling according to one source, we should create a new property Safety classification and group all H phrases as qualifiers.

Example:

Safety classification: GHS hazard statement (Q28360)

P728 (P728): H201

P728 (P728): H401

Stated in : Source B

Safety classification: GHS hazard statement (Q28360)

P728 (P728): H202

P728 (P728): H400

Stated in : Source A

Snipre (talk) 13:44, 8 August 2016 (UTC)
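The proposed structure can be pictured as one record per source, with the H phrases attached to that record instead of standing as independent statements. This is a sketch of the idea only: the class and function names are hypothetical, not an existing Wikidata data type, and the values repeat the example above.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyClassification:
    """One 'Safety classification' statement with its qualifiers and source."""
    system: str                                     # e.g. "GHS hazard statement (Q28360)"
    stated_in: str                                  # the source of this whole set
    h_phrases: list = field(default_factory=list)   # qualifiers, kept together


def phrases_from(classifications, source):
    """Return the complete H-phrase set given by a single source."""
    for c in classifications:
        if c.stated_in == source:
            return c.h_phrases
    return []


compound_c = [
    SafetyClassification("GHS hazard statement (Q28360)", "Source B", ["H201", "H401"]),
    SafetyClassification("GHS hazard statement (Q28360)", "Source A", ["H202", "H400"]),
]
print(phrases_from(compound_c, "Source A"))
```

Because each set travels with its own source, a query never mixes H202 from source A with H401 from source B.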

Table of valid phrases

Could you please help me fill out these tables? Some phrases (*) were altered by later ATPs.--Kopiersperre (talk) 10:36, 8 August 2016 (UTC)

@Kopiersperre: Please have a look at [3], page 34. Can you find a tool to extract data from a PDF? Snipre (talk) 11:17, 8 August 2016 (UTC)

@Kopiersperre: I found a better way: go to [4], then select all H phrases, select one language and choose the button "Download selected phrases as PLS" and you get an Excel sheet with all phrases. Repeat the same with other languages, then copy and paste the content of the different sheets into one document and you have your list. Snipre (talk) 11:32, 8 August 2016 (UTC)

@Snipre: Very good solution.--Kopiersperre (talk) 13:45, 8 August 2016 (UTC)

Approximate values of dipole moment

Latest comment: 7 years ago, 3 comments, 2 people in discussion

CRC Handbook of Chemistry and Physics (95th edition) (Q20887890) contains a table of dipole moments; some of them are given with good precision, but some are marked with "≈" ("Values measured in the gas phase that are questionable because of undetermined error sources are indicated as approximate") or enclosed in brackets ("Values obtained by liquid phase measurements, which sometimes have large errors because of association effects"). How can I add this information to WD? In propyl formate (Q421045) I tried to use sourcing circumstances (P1480) with circa (Q5727902) for [1,89] D, but I don't think it's a good option – this is not an approximate value, but just an undetermined uncertainty. sourcing circumstances (P1480) would be better with values marked with "≈", but I'm not sure if this is the right use of this property. ∼Wostr (talk) 23:10, 17 August 2016 (UTC)

@Wostr: The best approach is to use the original references rather than the Handbook for these values, in order to determine what the error is. Snipre (talk) 09:47, 18 August 2016 (UTC)

@Snipre: I checked the first value marked with "≈" in the original source and it is marked with "Q" = "Questionable value" (there is a serious question about the best value to select or where there is insufficient information on which to base meaningful estimate of accuracy (...) They may be regarded as giving a rough estimate of the magnitude of the moment but are not of sufficient accuracy for quantitative use). The tables in the CRC are based on 68 sources, mainly published before 2000, so for some compounds there may be better measurements of DM in the literature, but for some there may not be any other value. ∼Wostr (talk) 13:07, 18 August 2016 (UTC)

Pigments

Latest comment: 7 years ago, 3 comments, 3 people in discussion

I would like to add many printing and coating pigments (example Pigment Yellow 138 (Q26705718)). Am I right that there is no generic property for color?--Kopiersperre (talk) 16:18, 26 August 2016 (UTC)

Kopiersperre There is color (P462) for general color descriptions. But if you want to describe the color in more detail, there is sRGB color hex triplet (P465). Snipre (talk) 16:35, 26 August 2016 (UTC)

Colour Index International constitution ID (P2027). --Teolemon (talk) 15:10, 27 August 2016 (UTC)

DSSTOX substance identifier

Latest comment: 7 years ago, 1 comment, 1 person in discussion

Please also have a look at the proposed property for the EPA CompTox Dashboard identifier. User:ChemConnector has uploaded some 700 thousand InChIKey<>DTXSID mappings as CCZero to Figshare, and I want to include that information in Wikidata. For this, I will want to use a bot task, and will soon write up a task proposal. The goal would then be to add mappings for Wikidata entries with matching InChIKeys, but I can also imagine creating new compound entries for InChIKeys not found in Wikidata yet. Comments on that second part most welcome. --Egon Willighagen (talk) 08:53, 28 August 2016 (UTC)
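The two parts of the proposal correspond to the two outputs of a simple join on the InChIKey. The sketch below is not the planned bot. The function name is hypothetical and the sample keys and identifiers are obvious placeholders, not real data.

```python
def match_by_inchikey(wikidata_items, dtxsid_by_inchikey):
    """Join Wikidata items to DTXSIDs on the InChIKey.

    wikidata_items:      {QID: InChIKey} for items that already exist
    dtxsid_by_inchikey:  {InChIKey: DTXSID} from the uploaded mapping file
    Returns (matched, unmatched): matched maps QID to DTXSID, unmatched holds
    the mapping entries whose InChIKey is not on any existing item.
    """
    matched = {}
    used_keys = set()
    for qid, key in wikidata_items.items():
        if key in dtxsid_by_inchikey:
            matched[qid] = dtxsid_by_inchikey[key]
            used_keys.add(key)
    unmatched = {k: v for k, v in dtxsid_by_inchikey.items() if k not in used_keys}
    return matched, unmatched


# Placeholder values for illustration only.
items = {"Q_A": "KEY-1", "Q_B": "KEY-2"}
mapping = {"KEY-1": "DTXSID_X", "KEY-3": "DTXSID_Y"}
matched, unmatched = match_by_inchikey(items, mapping)
print(matched)
print(unmatched)
```

The `matched` part is the uncontroversial first step. The `unmatched` part is the list of candidates for new items, which is the second part on which comments are requested.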

Import of ChEBI

Latest comment: 7 years ago, 9 comments, 3 people in discussion

Hello everyone, I will start importing all actual chemical compounds represented in ChEBI. Furthermore, I would like to import and maintain the full ChEBI ontology structure. This would enable a unique representation of chemical compounds in Wikidata and would highly improve the quality of chemical compounds in Wikidata. I have done that successfully with the Gene Ontology, which has a similar size and complexity, and have therefore shown that this is feasible.

For long term maintenance: The source code for this will be AGPLv3, available on our bitbucket repo [5] so in worst case, somebody else could take over and run the bot. Nevertheless, I would like to know your opinion on this. Best, Sebotic (talk) 20:43, 22 June 2016 (UTC)

@Sebotic: Not in favor of importing an external ontology in WD. Why do we have to maintain in WD an ontology defined and modified in another website ? The goal of WD is not to integrate everything from other databases but to link databases.

The same reasoning applies to the import of all chemicals from ChEBI. I don't see the interest of just being a mirror of another website. It is better to work at the interface of the existing databases than to just copy-paste data from one. Instead of importing data from one database, I propose that you match data from different databases like ChEBI, ChemIDplus, ChemSpider, PubChem, ChEMBL or GESTIS and import the data which are the same in all databases. ChEBI is just one database among several others, so I don't understand why Wikidata should be the mirror of this database and not of the others. Snipre (talk) 11:38, 23 June 2016 (UTC)

@Snipre: Sorry for the delayed reply! The reason why I think ChEBI would be valuable is that it is the best chemical ontology currently available. It brings a ton of classification which could form the basis of further work by the WD community. The only thing which maybe should not be imported is tautomers, as they have the same InChI (key). In general, I would want to import data from several sources, but certainly not as a separate item per source but as a unified item with all the identifiers on it (CAS, InChI key, InChI, canonical SMILES, isomeric SMILES, CID, SID, ChEMBL, SureChEMBL, IUPHAR/GtoP, Drugbank, etc). The common id should be the InChI key, not perfect, but the best which is out there. Certainly, an important part is proper referencing, which is fairly easy as soon as the data sources have been determined. If we succeed, we would end up with the most high quality, open corpus of chemical compounds with most data/ids per compound anywhere to be found, which I think is great. Sebotic (talk) 01:13, 28 June 2016 (UTC)

@Sebotic: No problem for the delay. For the data I am sure you have good expertise. My only concern is to have a control process which works before the import of data. I am really tired of correcting statements and merging duplicates each time a large chemical data import is done because people didn't do a correct job of data matching before the import. My recommendations are the following:

Before creating any new item check if another item already shares an identifier with your data set. And don't use label or page title of Wikipedia article as matching criteria.
Import data in one item only if you can match at least two identifiers between your data set and the data already present in the item.
If during the data import you detect the existence of an existing value for the property you want to import, compare the existing value with the value you want to import and if there is a difference don't import your data but create a conflict report in order to analyze the item later
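The three recommendations above can be sketched as a small decision routine. The function names, the dictionary layout and the placeholder values are assumptions made for illustration and are not taken from any existing bot.

```python
def candidate_items(items, incoming_ids):
    """Rule 1: find existing items sharing at least one identifier value.

    items: {QID: {identifier_name: value}}; labels are deliberately not compared.
    """
    return [qid for qid, ids in items.items()
            if any(ids.get(name) == value for name, value in incoming_ids.items())]


def plan_import(existing_ids, incoming_ids, existing_values, incoming_values):
    """Rules 2 and 3: decide what to do with one matched item.

    Returns ("skip", []), ("conflict", [properties]) or ("import", [properties]).
    """
    shared = [name for name, value in incoming_ids.items()
              if existing_ids.get(name) == value]
    if len(shared) < 2:
        return "skip", []                 # fewer than two matching identifiers
    conflicts = [prop for prop, value in incoming_values.items()
                 if prop in existing_values and existing_values[prop] != value]
    if conflicts:
        return "conflict", conflicts      # report for later analysis, import nothing
    new_props = [prop for prop in incoming_values if prop not in existing_values]
    return "import", new_props


# Placeholder data for illustration only.
existing = {"CAS": "87-69-4", "InChIKey": "KEY-1"}
incoming = {"CAS": "87-69-4", "InChIKey": "KEY-1"}
print(plan_import(existing, incoming, {"prop_a": "1"}, {"prop_a": "1", "prop_b": "2"}))
print(plan_import(existing, incoming, {"prop_a": "1"}, {"prop_a": "9"}))
```

Returning a conflict list instead of overwriting keeps the decision with a human reviewer, which is the point of the third recommendation.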

For the question of the ontology, even if ChEBI is a good reference, we first have to check if the ChEBI ontology can match the overall Wikidata ontology. Wikidata can't be the sum of different ontologies if we want to have a unique way to query and to display data independently from the knowledge domains. For example, what happens if the ChEBI ontology allows items with both instance of/subclass of in an item but Wikidata does not?

I know that the ontology of Wikidata is very unclear but we need to be careful to keep a homogeneous system. Snipre (talk) 09:46, 28 June 2016 (UTC)

@Sebotic: I guess you have also seen the Mix'n'Match already? I love to see ChEBI fully in Wikidata. Now, ChEBI has a lot of ionic species (which becomes very clear when you run the Mix'n'Match in Game mode :) Do you also plan to include these? Also, will you include the links between the compounds, as the ChEBI ontology defined, particularly for these ions? --Egon Willighagen (talk) 08:40, 28 August 2016 (UTC)

@Egon Willighagen, Snipre: Well, after some more considerations and taking into account the concerns by Snipre, I think that importing all of ChEBI might not be too useful at this point. E.g. all the ions and enantiomers do not have enough chemical identifiers to be really useful in Wikidata. Moreover, ChEBI has many edges which currently don't exist in Wikidata, so they would all need to be proposed and approved (subclass of and has role already exist, so most of the core graph could be imported). What I will do definitely is to make sure that all 'primary' (organic) compounds make it into Wikidata. That said, I would have the bot code ready to do a full import, only things missing are edges (WD properties) and a general consensus that the full import should be done. Sebotic (talk) 18:05, 29 August 2016 (UTC)

@Snipre, Sebotic: These two aspects of chemical compounds, along with pureness (compounds vs substance) are important. How about we start ironing out how Wikidata should model these things? Are ions notable enough (probably, given that other databases support them?)? Should compounds with unspecified stereochemistry be instances or subclasses? And, quite related, how will we model compound classes and other "things" that are more than one distinct (isomeric) chemical structure on Wikipedia? It seems to me, we have critical mass. It seems to me that @ChemConnector, Walkerma, Pigsonthewing: (first two have been very active in the Wikipedia Chemistry team along, and Andy has been at the Royal Society of Chemistry (Q905549)) will likely join in these discussions too, and then we have critical mass. This defines a group of experienced chemists who think Wikidata should be used in science. I'd say, let's do it! Let's define the framework and do that final clean up. Within not too long, we can beat several popular scientific databases in quality. (And then we submit a paper to the Journal of Cheminformatics (Q6294930) with our results, along the lines of Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia (Q21957425). This will undoubtedly attract more scholarly chemists!) --Egon Willighagen (talk) 05:22, 30 August 2016 (UTC)

@Egon Willighagen:

* All ions are notable

* compounds with unspecified stereochemistry are defined as subclasses of chemical compounds and compounds with specified stereochemistry are defined as instance of items describing compounds with unspecified stereochemistry (see relations between L-lactide (Q24757824), D-lactide (Q24757832), (R,S)-lactide (Q24757839), and lactide (Q421313))

* Next problems to solve:

How should isotopic compound (Q22332141) be structured compared to chemical compound? Is heavy water (Q155890) an instance of water (Q283)?
How should we treat tautomers? Two items or one item? Which criterion can be used to decide whether a tautomer can have two items or not?
What is the granularity of the structure for chemical compounds: can we consider ethanol as an instance or as a class?

Snipre (talk) 10:10, 30 August 2016 (UTC)

@Snipre: Cool, thanks for the details! The first of the next problems is indeed interesting, because, ontologically speaking, an instance of an instance is not typically done. ChEBI actually models even water (Q283) as a class. That's not that unreasonable, as a water molecule instance is something in your mouth right now, and the 'chemical compound' water is just the concept of it. Tautomers are another hard one. Personally, I like to have all chemical graphs as separate entities, actually like ChEBI does. However, if you say chemical compounds have a 1-to-1 relation to the Standard InChI, then we have a problem. Worse, the Standard InChI does not consider everything a tautomer that a biologist/chemist would (it's an incomplete model). So, the current answer following from the compound<>InChI link is: both two items and a single item. The third problem to solve is related to the first. But this is the discussion we indeed need to have. What is the central concept of a chemical compound? That has major implications for the identifiers side of this. To me, the more explicit we are, the better we serve the scientific community. --Egon Willighagen (talk) 12:08, 30 August 2016 (UTC)

Importing COSING

[Discussion with Magnus: Matching CoSing numbers using multiple identifiers]

The CoSing number has recently been created for chemical compounds. It is the EU canonical identifier for chemistry and cosmetics; as a result, there are 25,000 identifiers, as well as identifiers for all the other chemistry systems, and interesting information for properties and labels.

I had first thought of truncating the file for import using Mix N'Match, but I wonder whether someone has the skills to make fuller use of the file.

Source

https://data.europa.eu/euodp/fr/data/dataset/cosmetic-ingredient-database-ingredients-and-fragrance-inventory/resource/33aa4726-d05c-4756-ad91-6c6297de9771

Snippet

COSING Ref No	INCI name	INN name	Ph. Eur. Name	CAS No	EINECS/ELINCS No	Chem/IUPAC Name / Description	Restriction	Function	Update Date
38946	ZEA MAYS STARCH	starch	maydis amylum	9005-25-8	232-679-6	Zea Mays Starch is a high-polymeric carbohydrate material usually derived from the peeled seeds of the Corn, Zea mays L., Gramineae	-	ABRASIVE, ABSORBENT, ANTICAKING, SKIN PROTECTING, VISCOSITY CONTROLLING	15/10/2010

Property talk:P3073#Importing the identifiers

@Teolemon: I don't think the Mix N'Match tool is necessary: the dataset contains the CAS number and the EINECS number, so you can use those identifiers to find the item to which the CoSing number should be added in WD. The best would be to check, when possible, whether the CAS and EINECS numbers in the item are identical to the ones present in the dataset from the CoSing database. Snipre (talk) 11:33, 29 August 2016 (UTC)
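The matching suggested above could look roughly like the following sketch. It is only an illustration under stated assumptions: the column names are taken from the snippet, while `WIKIDATA_ITEMS`, its placeholder item IDs, and the function `match_cosing` are hypothetical and stand in for whatever extract of CAS and EINECS values one has pulled from Wikidata.

```python
import csv
import io

# A reduced CoSing export (tab-separated), using column names from the snippet above.
COSING_TSV = (
    "COSING Ref No\tINCI name\tCAS No\tEINECS/ELINCS No\n"
    "38946\tZEA MAYS STARCH\t9005-25-8\t232-679-6\n"
)

# Hypothetical Wikidata extract: item ID -> identifiers stored on the item.
# The item IDs and the second entry are placeholders, not real data.
WIKIDATA_ITEMS = {
    "Q_PLACEHOLDER_1": {"cas": "9005-25-8", "einecs": "232-679-6"},
    "Q_PLACEHOLDER_2": {"cas": "0000-00-0", "einecs": "000-000-0"},
}


def match_cosing(tsv_text, items):
    """Return (CoSing number, matched item or None, status) for each row."""
    # Build one lookup table per identifier type.
    by_cas = {ids["cas"]: qid for qid, ids in items.items() if ids.get("cas")}
    by_einecs = {ids["einecs"]: qid for qid, ids in items.items() if ids.get("einecs")}
    results = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        via_cas = by_cas.get(row["CAS No"].strip())
        via_einecs = by_einecs.get(row["EINECS/ELINCS No"].strip())
        if via_cas and via_einecs:
            if via_cas == via_einecs:
                match, status = via_cas, "both identifiers agree"
            else:
                # The two identifiers point at different items: do not import.
                match, status = None, "conflict: check by hand"
        elif via_cas or via_einecs:
            match, status = via_cas or via_einecs, "one identifier only"
        else:
            match, status = None, "no match"
        results.append((row["COSING Ref No"], match, status))
    return results


for result in match_cosing(COSING_TSV, WIKIDATA_ITEMS):
    print(result)
```

Only rows with the status "both identifiers agree" would be safe to import automatically; the other cases are the ones where a manual check seems advisable.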

qualifier to indicate a conformer

Is there any way to indicate the conformer? Dipole moments are sometimes measured for a specific conformer (gauche, trans etc.), but I do not think there should be different items for every conformer, as they are the same molecule. ∼Wostr (talk) 21:56, 30 August 2016 (UTC)

From what I know, no. Snipre (talk) 22:10, 30 August 2016 (UTC)

Problem with mixture and solution

I have a problem with items describing mixtures, especially aqueous solutions of salts or other soluble substances. First, these items can't be classified as chemical compounds, but can we classify them as chemical substances or as mixtures? My problem with items describing solutions, like barium hydroxide solution (Q809681), is the large number of different solutions that can be represented by one item. If I take the IUPAC definition of chemical substance, I read "Physical properties such as density, refractive index, electric conductivity, melting point etc. characterize the chemical substance". As I understand the definition, barium hydroxide solution (Q809681) can't be classified as a chemical substance because I can't assign a single density or refractive index to the item barium hydroxide solution (Q809681): a density is valid only for one solution, for example water 70%/barium hydroxide 30%, but not for the solution water 99%/barium hydroxide 1%.

So this already solves a problem: barium hydroxide solution (Q809681) is not an instance of mixture or chemical substance but a subclass of mixture/subclass of chemical substance, as barium hydroxide solution (Q809681) represents an infinity of solutions with compositions ranging from 0.0001 to 99.9999%.

Then the next question: can we add a constraint that identifiers used to identify a pure substance can't be used to identify aqueous solutions of the same substance? Even if this is generally allowed by external rules outside of Wikidata? An example is the CAS number, which is used for pure substances and their aqueous solutions. But this creates a mess in our constraint reports, so I would like to formalize the restriction of the CAS number to pure substances only and exclude the use of the same CAS number for aqueous solutions. Comments? Snipre (talk) 21:09, 13 September 2016 (UTC)

I am not sure if we should use 'chemical substance' in opposition to mixture (or maybe we shouldn't use it at all). That's a very imprecise term and its definition may depend on language, author/source etc. (e.g. in Polish chemical literature from the 60s–70s chemical substances are divided into pure substances /compounds, elements/ and mixtures). Even the IUPAC definition is not as precise as it should be: the solution (mixture) of two substances with a specified composition would be a mixture and a substance at the same time (both conditions are fulfilled: constant composition, characteristic physical properties). And we also have legal definitions: a substance is a [chemical] mixture (pure substance + necessary additives + technological impurities) and a mixture is a mixture/solution of two or more substances [EU CLP definitions].

You're right with barium hydroxide solution (Q809681): it should be classified as 'subclass of' mixture (but IMHO better 'subclass of' saturated solution -> solution -> mixture).

And yes, we should limit the use of the CAS number to 'pure chemical substances' only. I think that the lack of distinction between solutions and compounds in the CAS Registry is not intentional, but a result of practical reasons only; so there is no substantive reasoning behind it. ∼Wostr (talk) 20:00, 16 September 2016 (UTC)

Annotation in which species chemical compounds are found

I am adding this to shed some light on what I am up to with Wikidata. At the moment I am close to the first steps of developing a bot based on the User:ProteinBoxBot code base and have made a first request. This bot can help import a lot of data, but also help add missing information. For example, Christopher Southan just reported a list of about 700 hundred PubChem CID (P662) for entries with SMILES: https://twitter.com/cdsouthan/status/769814678197460993 Pulling in this information is easy. For now, I will focus on the biology side of things, and plan to annotate compounds with the species they are found in, e.g. using knowledge in the WikiPathways database (see Wikidata:Requests_for_permissions/Bot/UreomiczBot 1). User:Wostr pointed out on my Discussion page that found in taxon (P703) can be used directly on the Wikidata entry being added/edited, instead of instance of (P31). There are quite a few species-specific metabolite databases where this can be sourced from. I stress how important it is to have this kind of information, because academic researchers now often face the problem that they have measured compounds of unknown chemical identity in human samples (in any typical untargeted metabolomics experiment). More info can be found in this report of a recent student project on Figshare (https://figshare.com/articles/Volatile_Organic_Compounds_A_Detailed_Account_of_Identity_Origin_Activity_and_Pathways/3466805) and the H2020 project proposal Enabling Open Science: Wikidata for Research (Wiki4R) (Q26707522). --Egon Willighagen (talk) 08:49, 28 August 2016 (UTC)

@Egon Willighagen: Before advertising the use of WD in scientific research, we have to implement a control system that allows us to present WD as a reference database. To reach that objective we should take one step: for each "instance of: chemical compound", a unique value for InChI (P234) with the corresponding InChIKey (P235) has to be provided.

But currently we have

21969 items with "instance of: chemical compound"
14471 items with a value for InChI (P234)
15519 items with a value for InChIKey (P235)
14584 items with "instance of: chemical compound" and a value for InChI (P234) and for InChIKey (P235) ???

In a word, we have to be able to offer in WD one fixed list of chemicals clearly identified with a coherent set of identifiers (mainly InChI, InChIKey and chemical structure) from the same source or generated by the same system. We are far from that situation now, so for me trying to sell WD as a tool for scientific research is just a bad idea and a way to lose any trust in the future. Snipre (talk) 12:25, 29 August 2016 (UTC)
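As a rough check of the coverage implied by the counts above, the following sketch computes the share of "instance of: chemical compound" items without an identifier. It takes the quoted figures at face value and assumes, purely for illustration, that the InChI and InChIKey counts refer to the same set of items, which the fourth figure suggests is not exactly the case; the function name `missing_share` is hypothetical.

```python
# Counts as quoted in the comment above (29 August 2016).
compounds = 21969       # items with "instance of: chemical compound"
with_inchi = 14471      # items with a value for InChI (P234)
with_inchikey = 15519   # items with a value for InChIKey (P235)


def missing_share(total, present):
    """Return (number missing, percentage missing) for one identifier."""
    missing = total - present
    return missing, 100 * missing / total


for label, present in (("InChI", with_inchi), ("InChIKey", with_inchikey)):
    missing, percent = missing_share(compounds, present)
    print(f"{label}: {missing} items missing ({percent:.1f}%)")
```

Under that assumption, roughly a third of the items lack an InChI and a little under 30% lack an InChIKey.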

@Snipre: I am not claiming Wikidata is perfect yet. There are indeed a number of problems, but I would like to see your results showing that Wikidata is doing worse than scientific databases. Many of the latter have a certain scope, and only a few use InChI as a basis. The above issues need a lot of attention, and the bot I am developing can help. E.g. it is trivial to add InChIs and InChIKeys for chemical compounds with a SMILES. Finding inconsistencies too. The fact that the number of "instance of: chemical compound" is currently higher than the number of InChIs does not worry me at all: many compound classes are annotated as "instance of: chemical compound" rather than "subclass of: chemical compound", and compound classes simply do not have an InChI. Furthermore, there are chemical substances annotated as compounds, etc. Yes, there is plenty to clean up, but that's why Wikidata should be at the center of science, as it is an open database to which all scholars can contribute, without having to worry about being able to reuse their own contributions later. I would love to sit down with you and a few other Wikidata chemists and iron out some ideas! What about a (virtual) meet up soon? --Egon Willighagen (talk) 13:07, 29 August 2016 (UTC)

@Egon Willighagen: The problem with the items defined as chemical compound without an InChI is that they are not completely identified. One quarter of our database is not fully defined and this is why I prefer to slow down the use of WD by external users. We can always discuss the next steps, but it would be great to put different options on paper first, in order to already have an idea of the possible work to perform before starting the discussion. My proposition is developed there. The talk page can be used to add other ideas and we will update the page once an agreement is found. Snipre (talk) 09:50, 30 August 2016 (UTC)

@Snipre: Great! Let's continue talking there then! Mind you, there are some people who want to solve this problem, including me and Sebotic. And I know for a fact ChemConnector has that interest too. These are scientists, not users, but developers and data providers. Over the next few days, I will run some scripts to quantify the current quality. There will be a lot of manual work to be done. Also, I think you overestimate with 'a quarter'... not everything now qualified as compound really is a compound that should have an InChI. More in that talk page asap! --Egon Willighagen (talk) 10:25, 30 August 2016 (UTC)

@Snipre: BTW, sn-glycerol 3-phosphate(2-) (Q26711901) is a new compound which, according to searching on PubChem CID and InChIKey, was not yet in Wikidata. Adding missing information (or correcting info, if needed) can be automated. Feedback on that new compound page is appreciated. --Egon Willighagen (talk) 12:26, 30 August 2016 (UTC)

@Egon Willighagen: Seems OK, but it would be perfect if you could follow the recommendation of Help:Sources#Databases and add at least the "retrieved date". The title is a good thing too, but less important. Snipre (talk) 22:15, 30 August 2016 (UTC)

@Snipre: Agreed about the 'retrieved date', but setting that requires a URL in the calendarModel property of the data, which causes the abuseFilter to overreact, so I cannot set that right now. See e.g. this log message. --Egon Willighagen (talk) 11:47, 31 August 2016 (UTC)

@Egon Willighagen: Please report the problem to the dev team or the abuse filter admin. Seems to be a programming problem. Snipre (talk) 19:13, 31 August 2016 (UTC)

@Snipre, Egon Willighagen: This is the current count of core identifiers of chemical items in Wikidata. By chemical items, I mean items that are either instance of or subclass of chem compound, or that just have a cas, cid, inchi(key) or smiles but lack an instance or subclass categorization. Snipre, I see your concerns, but I have invested quite some time into the chem compound space in WD now and I am confident that there will be substantial improvements in the coming weeks. Sebotic (talk) 17:46, 29 August 2016 (UTC)

chem items	24809	subclass of or instance of chemical compound, or having cas, cid, inchi(key), smiles.
article	16391	links to en.wikipedia.org
mass	637
chemSpider	11462
pubchem_cid	16906
unii	11826
mesh_id	927
kegg_id	4065
mesh_code	3
chebi	4464
drugbank	2682
chembl	5461
iuphar	1033
cas	19692
csmiles	15635
inchi	14470
inchi_key	14943
chemical_formula	19475
atc_code	1709
ismiles	16

Here is a list of 240 items with conflicts on the structure level, where the InChI key does not match for some of the identifiers on an item. This is usually due to identifiers with incomplete stereochemistry having been added, or just the wrong stereochemistry, or the wrong compound in the first place. I think I can clean up a good share of those by just taking the majority vote of identifiers for an InChI key and then using this key to populate the item. This is certainly not error free. Otherwise, these 240 could be fixed by hand; all that is required is to delete any PubChem CID, InChI key, ChEMBL, ChEBI, UNII, or ChemSpider ID which is incorrect. After that, one valid identifier on an item is sufficient to let the item be populated by my bots. Btw: these 240 are the result of a consistency check for 4000 items, so approximately a quarter of all compounds with a PubChem CID in Wikidata. Sebotic (talk) 09:20, 20 September 2016 (UTC)
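The majority vote mentioned above could be sketched as follows. This is not the bot's actual code, only a minimal illustration under assumptions: each external identifier on an item has already been resolved to the InChI key its source database reports, and the function name `majority_inchikey`, the source labels, and the placeholder keys are all hypothetical.

```python
from collections import Counter


def majority_inchikey(keys_by_source):
    """Pick the InChI key backed by a strict majority of sources.

    Returns (winning key, sorted list of dissenting sources), or (None, [])
    when no key has a strict majority and the item needs manual curation.
    """
    if not keys_by_source:
        return None, []
    counts = Counter(keys_by_source.values())
    key, votes = counts.most_common(1)[0]
    if votes * 2 <= len(keys_by_source):
        return None, []
    dissenters = sorted(src for src, k in keys_by_source.items() if k != key)
    return key, dissenters


# Placeholder keys: three sources agree, one reports a different structure.
votes = {
    "pubchem": "KEY_WITHOUT_STEREO",
    "chebi": "KEY_WITH_STEREO",
    "chembl": "KEY_WITH_STEREO",
    "unii": "KEY_WITH_STEREO",
}
print(majority_inchikey(votes))
```

The dissenting sources are the identifiers one would remove from the item; a tie is deliberately left unresolved, since, as noted, the vote is not error free.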

@Sebotic: Thanks for your work. But I can't work now on that curation, at least not before 2 weeks. Snipre (talk) 19:25, 22 September 2016 (UTC)

@Snipre: I will try to get as many of them resolved through other automatic means, so that finally, we end up with a number which can be handled more easily. Manual curation, in my experience, is a very time consuming process, so I think it should be the last resort. Let's see how quickly we can resolve it. The biggest challenge for some of these will be to choose the one with the most appropriate ('correct') stereochemistry. Sebotic (talk) 20:12, 22 September 2016 (UTC)

UPDATE: This is now the full list of 1,279 items, out of 17,709 chem items with a PubChem ID (7.2%), which need inspection and curation.
UPDATE: Now 1,284 compounds; a few new items appeared and a few fixed ones were removed.
UPDATE: Managed to bring it down to 1,030, updated list accordingly. What I see frequently are InChI keys that most major resources agree on, but PubChem has a different one (different connectivity), and the one agreed on by the other resources can be found in the PubChem data provider supplied descriptions; that's not ideal... Sebotic (talk) 08:57, 30 September 2016 (UTC)

http://www.wikidata.org/entity/Q5275604     Corrected, confusion between digermane (Q5275604) and digermanium (Q27183266)
http://www.wikidata.org/entity/Q905418
http://www.wikidata.org/entity/Q670450
http://www.wikidata.org/entity/Q5319233
http://www.wikidata.org/entity/Q5976757
http://www.wikidata.org/entity/Q421676
http://www.wikidata.org/entity/Q408221
http://www.wikidata.org/entity/Q937085
http://www.wikidata.org/entity/Q5144763
http://www.wikidata.org/entity/Q7272074
http://www.wikidata.org/entity/Q7841934
http://www.wikidata.org/entity/Q7182926
http://www.wikidata.org/entity/Q4596915
http://www.wikidata.org/entity/Q420070      Corrected, confusion between sodium percarbonate (Q420070) and sodium hydroperoxy(oxo)methanolate (Q27216890)
http://www.wikidata.org/entity/Q6508313
http://www.wikidata.org/entity/Q415798
http://www.wikidata.org/entity/Q2393155
http://www.wikidata.org/entity/Q1890177
http://www.wikidata.org/entity/Q744577
http://www.wikidata.org/entity/Q6948223
http://www.wikidata.org/entity/Q6714672
http://www.wikidata.org/entity/Q4115930
http://www.wikidata.org/entity/Q996659
http://www.wikidata.org/entity/Q4941885
http://www.wikidata.org/entity/Q15410255
http://www.wikidata.org/entity/Q4946240
http://www.wikidata.org/entity/Q2033359
http://www.wikidata.org/entity/Q262613
http://www.wikidata.org/entity/Q4122197
http://www.wikidata.org/entity/Q367994      Not corrected, two ways to represent the molecule: with an ionic bond or a covalent bond between the calcium atom and the nitrogen atom
http://www.wikidata.org/entity/Q5516411
http://www.wikidata.org/entity/Q15720554
http://www.wikidata.org/entity/Q7197544
http://www.wikidata.org/entity/Q409648
http://www.wikidata.org/entity/Q7234718
http://www.wikidata.org/entity/Q3529346
http://www.wikidata.org/entity/Q209404
http://www.wikidata.org/entity/Q13019044
http://www.wikidata.org/entity/Q4348637
http://www.wikidata.org/entity/Q20707800
http://www.wikidata.org/entity/Q425180
http://www.wikidata.org/entity/Q7914945
http://www.wikidata.org/entity/Q15409364
http://www.wikidata.org/entity/Q2817111
http://www.wikidata.org/entity/Q4737382
http://www.wikidata.org/entity/Q423829
http://www.wikidata.org/entity/Q18206495
http://www.wikidata.org/entity/Q473546
http://www.wikidata.org/entity/Q3055852
http://www.wikidata.org/entity/Q417819
http://www.wikidata.org/entity/Q15709156
http://www.wikidata.org/entity/Q4887561
http://www.wikidata.org/entity/Q3315722
http://www.wikidata.org/entity/Q2823219
http://www.wikidata.org/entity/Q241678
http://www.wikidata.org/entity/Q385657
http://www.wikidata.org/entity/Q2616064
http://www.wikidata.org/entity/Q5159421
http://www.wikidata.org/entity/Q2669979
http://www.wikidata.org/entity/Q9137074
http://www.wikidata.org/entity/Q11163063
http://www.wikidata.org/entity/Q7280124
http://www.wikidata.org/entity/Q5276483
http://www.wikidata.org/entity/Q417003
http://www.wikidata.org/entity/Q15410253
http://www.wikidata.org/entity/Q3008693
http://www.wikidata.org/entity/Q19903618
http://www.wikidata.org/entity/Q2640914
http://www.wikidata.org/entity/Q681387
http://www.wikidata.org/entity/Q420527
http://www.wikidata.org/entity/Q6960846
http://www.wikidata.org/entity/Q1014287
http://www.wikidata.org/entity/Q4807880
http://www.wikidata.org/entity/Q424481
http://www.wikidata.org/entity/Q4596785
http://www.wikidata.org/entity/Q424726
http://www.wikidata.org/entity/Q487064
http://www.wikidata.org/entity/Q19597253
http://www.wikidata.org/entity/Q15634253
http://www.wikidata.org/entity/Q15425783
http://www.wikidata.org/entity/Q135270
http://www.wikidata.org/entity/Q1280166
http://www.wikidata.org/entity/Q425085
http://www.wikidata.org/entity/Q5383934
http://www.wikidata.org/entity/Q4890795
http://www.wikidata.org/entity/Q409676
http://www.wikidata.org/entity/Q419170
http://www.wikidata.org/entity/Q7388910
http://www.wikidata.org/entity/Q5190964
http://www.wikidata.org/entity/Q661724
http://www.wikidata.org/entity/Q21099667
http://www.wikidata.org/entity/Q7777226
http://www.wikidata.org/entity/Q415768
http://www.wikidata.org/entity/Q5018816
http://www.wikidata.org/entity/Q15720561
http://www.wikidata.org/entity/Q1810456
http://www.wikidata.org/entity/Q29428
http://www.wikidata.org/entity/Q2710669
http://www.wikidata.org/entity/Q7119043
http://www.wikidata.org/entity/Q4759444
http://www.wikidata.org/entity/Q132037
http://www.wikidata.org/entity/Q5057301
http://www.wikidata.org/entity/Q4545648
http://www.wikidata.org/entity/Q4918616
http://www.wikidata.org/entity/Q133878
http://www.wikidata.org/entity/Q4682573
http://www.wikidata.org/entity/Q4676582
http://www.wikidata.org/entity/Q7507259
http://www.wikidata.org/entity/Q1318344
http://www.wikidata.org/entity/Q6606395
http://www.wikidata.org/entity/Q176525
http://www.wikidata.org/entity/Q7390603
http://www.wikidata.org/entity/Q3927866
http://www.wikidata.org/entity/Q5572308
http://www.wikidata.org/entity/Q5251502
http://www.wikidata.org/entity/Q3120938
http://www.wikidata.org/entity/Q3832015
http://www.wikidata.org/entity/Q337231
http://www.wikidata.org/entity/Q421246
http://www.wikidata.org/entity/Q4691976
http://www.wikidata.org/entity/Q2823250
http://www.wikidata.org/entity/Q6951349
http://www.wikidata.org/entity/Q5572322
http://www.wikidata.org/entity/Q2912342
http://www.wikidata.org/entity/Q239593
http://www.wikidata.org/entity/Q4890804
http://www.wikidata.org/entity/Q7814247
http://www.wikidata.org/entity/Q4864613
http://www.wikidata.org/entity/Q4119810
http://www.wikidata.org/entity/Q721202
http://www.wikidata.org/entity/Q4774534
http://www.wikidata.org/entity/Q4445816
http://www.wikidata.org/entity/Q15605490
http://www.wikidata.org/entity/Q11071947
http://www.wikidata.org/entity/Q4993812
http://www.wikidata.org/entity/Q739601
http://www.wikidata.org/entity/Q415484
http://www.wikidata.org/entity/Q5036049
http://www.wikidata.org/entity/Q5137363
http://www.wikidata.org/entity/Q418258
http://www.wikidata.org/entity/Q421389
http://www.wikidata.org/entity/Q15624043
http://www.wikidata.org/entity/Q10864413
http://www.wikidata.org/entity/Q3079150
http://www.wikidata.org/entity/Q415909
http://www.wikidata.org/entity/Q5891570
http://www.wikidata.org/entity/Q15720548
http://www.wikidata.org/entity/Q7039308
http://www.wikidata.org/entity/Q415920
http://www.wikidata.org/entity/Q4673300
http://www.wikidata.org/entity/Q15408428
http://www.wikidata.org/entity/Q425240
http://www.wikidata.org/entity/Q10859631
http://www.wikidata.org/entity/Q4811191
http://www.wikidata.org/entity/Q4748577
http://www.wikidata.org/entity/Q26998367
http://www.wikidata.org/entity/Q5150954
http://www.wikidata.org/entity/Q373791
http://www.wikidata.org/entity/Q7671452
http://www.wikidata.org/entity/Q10861060
http://www.wikidata.org/entity/Q422305
http://www.wikidata.org/entity/Q6678756
http://www.wikidata.org/entity/Q5849706
http://www.wikidata.org/entity/Q4737395
http://www.wikidata.org/entity/Q5441133
http://www.wikidata.org/entity/Q8059839
http://www.wikidata.org/entity/Q4596757
http://www.wikidata.org/entity/Q943416
http://www.wikidata.org/entity/Q96385
http://www.wikidata.org/entity/Q12062354
http://www.wikidata.org/entity/Q132442
http://www.wikidata.org/entity/Q7846702
http://www.wikidata.org/entity/Q5871697
http://www.wikidata.org/entity/Q5010980
http://www.wikidata.org/entity/Q421291
http://www.wikidata.org/entity/Q4639640
http://www.wikidata.org/entity/Q420284
http://www.wikidata.org/entity/Q4789030
http://www.wikidata.org/entity/Q425036
http://www.wikidata.org/entity/Q5057233
http://www.wikidata.org/entity/Q4024215
http://www.wikidata.org/entity/Q20880500
http://www.wikidata.org/entity/Q5637062
http://www.wikidata.org/entity/Q4748994
http://www.wikidata.org/entity/Q5278705
http://www.wikidata.org/entity/Q15628035
http://www.wikidata.org/entity/Q413572
http://www.wikidata.org/entity/Q245487
http://www.wikidata.org/entity/Q2363204
http://www.wikidata.org/entity/Q903824
http://www.wikidata.org/entity/Q390239
http://www.wikidata.org/entity/Q904599
http://www.wikidata.org/entity/Q410095
http://www.wikidata.org/entity/Q4596899
http://www.wikidata.org/entity/Q4596815
http://www.wikidata.org/entity/Q407373
http://www.wikidata.org/entity/Q411175
http://www.wikidata.org/entity/Q369048
http://www.wikidata.org/entity/Q1281115
http://www.wikidata.org/entity/Q1097932
http://www.wikidata.org/entity/Q6951323
http://www.wikidata.org/entity/Q7119896
http://www.wikidata.org/entity/Q5013768
http://www.wikidata.org/entity/Q10880252
http://www.wikidata.org/entity/Q426921
http://www.wikidata.org/entity/Q413703
http://www.wikidata.org/entity/Q15634177
http://www.wikidata.org/entity/Q3083814
http://www.wikidata.org/entity/Q20817136
http://www.wikidata.org/entity/Q4721853
http://www.wikidata.org/entity/Q4746184
http://www.wikidata.org/entity/Q417309
http://www.wikidata.org/entity/Q853845
http://www.wikidata.org/entity/Q7923143
http://www.wikidata.org/entity/Q408132
http://www.wikidata.org/entity/Q5057298
http://www.wikidata.org/entity/Q8043134
http://www.wikidata.org/entity/Q409818
http://www.wikidata.org/entity/Q5256385
http://www.wikidata.org/entity/Q7099512
http://www.wikidata.org/entity/Q419308
http://www.wikidata.org/entity/Q2090740
http://www.wikidata.org/entity/Q414824
http://www.wikidata.org/entity/Q7800111
http://www.wikidata.org/entity/Q15409399
http://www.wikidata.org/entity/Q3058085
http://www.wikidata.org/entity/Q6581446
http://www.wikidata.org/entity/Q2044136
http://www.wikidata.org/entity/Q4689286
http://www.wikidata.org/entity/Q5528046
http://www.wikidata.org/entity/Q5521314
http://www.wikidata.org/entity/Q4637036
http://www.wikidata.org/entity/Q2074372
http://www.wikidata.org/entity/Q5506958
http://www.wikidata.org/entity/Q554818
http://www.wikidata.org/entity/Q4353536
http://www.wikidata.org/entity/Q4982752
http://www.wikidata.org/entity/Q7698194
http://www.wikidata.org/entity/Q5010983
http://www.wikidata.org/entity/Q61184
http://www.wikidata.org/entity/Q2482223
http://www.wikidata.org/entity/Q7076760
http://www.wikidata.org/entity/Q21098925
http://www.wikidata.org/entity/Q4708928
http://www.wikidata.org/entity/Q7260199
http://www.wikidata.org/entity/Q258653
http://www.wikidata.org/entity/Q3066718
http://www.wikidata.org/entity/Q19596037
http://www.wikidata.org/entity/Q898299
http://www.wikidata.org/entity/Q16935646
http://www.wikidata.org/entity/Q18358153
http://www.wikidata.org/entity/Q419209
http://www.wikidata.org/entity/Q21098923
http://www.wikidata.org/entity/Q4650413
http://www.wikidata.org/entity/Q423692
http://www.wikidata.org/entity/Q411899
http://www.wikidata.org/entity/Q662425
http://www.wikidata.org/entity/Q6816337
http://www.wikidata.org/entity/Q15633962
http://www.wikidata.org/entity/Q424091
http://www.wikidata.org/entity/Q6374755
http://www.wikidata.org/entity/Q248891
http://www.wikidata.org/entity/Q419846
http://www.wikidata.org/entity/Q5200299
http://www.wikidata.org/entity/Q7316883
http://www.wikidata.org/entity/Q424871
http://www.wikidata.org/entity/Q414359
http://www.wikidata.org/entity/Q4764702
http://www.wikidata.org/entity/Q416534
http://www.wikidata.org/entity/Q4641512
http://www.wikidata.org/entity/Q3825917
http://www.wikidata.org/entity/Q6528191
http://www.wikidata.org/entity/Q20706932
http://www.wikidata.org/entity/Q21045149
http://www.wikidata.org/entity/Q4596810
http://www.wikidata.org/entity/Q5140608
http://www.wikidata.org/entity/Q4392082
http://www.wikidata.org/entity/Q15410989
http://www.wikidata.org/entity/Q413127
http://www.wikidata.org/entity/Q2983921
http://www.wikidata.org/entity/Q417304
http://www.wikidata.org/entity/Q5050928
http://www.wikidata.org/entity/Q5277314
http://www.wikidata.org/entity/Q763802
http://www.wikidata.org/entity/Q6172522
http://www.wikidata.org/entity/Q413036
http://www.wikidata.org/entity/Q15409373
http://www.wikidata.org/entity/Q310828
http://www.wikidata.org/entity/Q179619
http://www.wikidata.org/entity/Q1074417
http://www.wikidata.org/entity/Q4903628
http://www.wikidata.org/entity/Q2034517
http://www.wikidata.org/entity/Q5204319
http://www.wikidata.org/entity/Q19903180
http://www.wikidata.org/entity/Q1387655
http://www.wikidata.org/entity/Q6593308
http://www.wikidata.org/entity/Q3817447
http://www.wikidata.org/entity/Q7851139
http://www.wikidata.org/entity/Q7808830
http://www.wikidata.org/entity/Q419415
http://www.wikidata.org/entity/Q412994
http://www.wikidata.org/entity/Q7912519
http://www.wikidata.org/entity/Q3243737
http://www.wikidata.org/entity/Q658
http://www.wikidata.org/entity/Q411138
http://www.wikidata.org/entity/Q421074
http://www.wikidata.org/entity/Q5283993
http://www.wikidata.org/entity/Q5383826
http://www.wikidata.org/entity/Q21099568
http://www.wikidata.org/entity/Q7119205
http://www.wikidata.org/entity/Q7863562
http://www.wikidata.org/entity/Q2406759
http://www.wikidata.org/entity/Q15627472
http://www.wikidata.org/entity/Q6122828
http://www.wikidata.org/entity/Q5201339
http://www.wikidata.org/entity/Q419070
http://www.wikidata.org/entity/Q15927659
http://www.wikidata.org/entity/Q2629981
http://www.wikidata.org/entity/Q6997373
http://www.wikidata.org/entity/Q5264591
http://www.wikidata.org/entity/Q4646883
http://www.wikidata.org/entity/Q3132209
http://www.wikidata.org/entity/Q4596759
http://www.wikidata.org/entity/Q5024643
http://www.wikidata.org/entity/Q18344013
http://www.wikidata.org/entity/Q2912604
http://www.wikidata.org/entity/Q7050960
http://www.wikidata.org/entity/Q5057289
http://www.wikidata.org/entity/Q4545730
http://www.wikidata.org/entity/Q414394
http://www.wikidata.org/entity/Q594482
http://www.wikidata.org/entity/Q4918919
http://www.wikidata.org/entity/Q7762
http://www.wikidata.org/entity/Q5332581
http://www.wikidata.org/entity/Q420191
http://www.wikidata.org/entity/Q2117581
http://www.wikidata.org/entity/Q7395228
http://www.wikidata.org/entity/Q198473
http://www.wikidata.org/entity/Q15634055
http://www.wikidata.org/entity/Q7263592
http://www.wikidata.org/entity/Q423910
http://www.wikidata.org/entity/Q414619
http://www.wikidata.org/entity/Q7936365
http://www.wikidata.org/entity/Q5137434
http://www.wikidata.org/entity/Q15426238
http://www.wikidata.org/entity/Q7957934
http://www.wikidata.org/entity/Q7367466
http://www.wikidata.org/entity/Q5323095
http://www.wikidata.org/entity/Q1235560
http://www.wikidata.org/entity/Q6581305
http://www.wikidata.org/entity/Q408360
http://www.wikidata.org/entity/Q794084
http://www.wikidata.org/entity/Q420056
http://www.wikidata.org/entity/Q252251
http://www.wikidata.org/entity/Q6823338
http://www.wikidata.org/entity/Q409054
http://www.wikidata.org/entity/Q1101052
http://www.wikidata.org/entity/Q2943815
http://www.wikidata.org/entity/Q13566119
http://www.wikidata.org/entity/Q5571076
http://www.wikidata.org/entity/Q417484
http://www.wikidata.org/entity/Q470900
http://www.wikidata.org/entity/Q7321711
http://www.wikidata.org/entity/Q416677
http://www.wikidata.org/entity/Q2823194
http://www.wikidata.org/entity/Q722387
http://www.wikidata.org/entity/Q4981136
http://www.wikidata.org/entity/Q5120032
http://www.wikidata.org/entity/Q539395
http://www.wikidata.org/entity/Q45044
http://www.wikidata.org/entity/Q5134843
http://www.wikidata.org/entity/Q4680659
http://www.wikidata.org/entity/Q5275247
http://www.wikidata.org/entity/Q421634
http://www.wikidata.org/entity/Q5319
http://www.wikidata.org/entity/Q5102982
http://www.wikidata.org/entity/Q61416
http://www.wikidata.org/entity/Q416904
http://www.wikidata.org/entity/Q4807670
http://www.wikidata.org/entity/Q2823840
http://www.wikidata.org/entity/Q409216
http://www.wikidata.org/entity/Q416513
http://www.wikidata.org/entity/Q3007886
http://www.wikidata.org/entity/Q7671383
http://www.wikidata.org/entity/Q1586727
http://www.wikidata.org/entity/Q3973521
http://www.wikidata.org/entity/Q3029787
http://www.wikidata.org/entity/Q421255
http://www.wikidata.org/entity/Q596946
http://www.wikidata.org/entity/Q6456961
http://www.wikidata.org/entity/Q5418554
http://www.wikidata.org/entity/Q5332352
http://www.wikidata.org/entity/Q4797402
http://www.wikidata.org/entity/Q416641
http://www.wikidata.org/entity/Q6138969
http://www.wikidata.org/entity/Q4716536
http://www.wikidata.org/entity/Q6542719
http://www.wikidata.org/entity/Q11350933
http://www.wikidata.org/entity/Q5102980
http://www.wikidata.org/entity/Q5264607
http://www.wikidata.org/entity/Q3347765
http://www.wikidata.org/entity/Q422504
http://www.wikidata.org/entity/Q414591
http://www.wikidata.org/entity/Q4596737
http://www.wikidata.org/entity/Q413258
http://www.wikidata.org/entity/Q1076381
http://www.wikidata.org/entity/Q419900
http://www.wikidata.org/entity/Q223099
http://www.wikidata.org/entity/Q4673314
http://www.wikidata.org/entity/Q6839784
http://www.wikidata.org/entity/Q423412
http://www.wikidata.org/entity/Q7234707
http://www.wikidata.org/entity/Q419895
http://www.wikidata.org/entity/Q5199864
http://www.wikidata.org/entity/Q6482030
http://www.wikidata.org/entity/Q2408443
http://www.wikidata.org/entity/Q4596911
http://www.wikidata.org/entity/Q4736748
http://www.wikidata.org/entity/Q14200355
http://www.wikidata.org/entity/Q411484
http://www.wikidata.org/entity/Q4637034
http://www.wikidata.org/entity/Q15634079
http://www.wikidata.org/entity/Q7260204
http://www.wikidata.org/entity/Q417103
http://www.wikidata.org/entity/Q7675206
http://www.wikidata.org/entity/Q5015902
http://www.wikidata.org/entity/Q143289
http://www.wikidata.org/entity/Q3077500
http://www.wikidata.org/entity/Q15269704
http://www.wikidata.org/entity/Q2816006
http://www.wikidata.org/entity/Q419775
http://www.wikidata.org/entity/Q15409431
http://www.wikidata.org/entity/Q2823201
http://www.wikidata.org/entity/Q4811598
http://www.wikidata.org/entity/Q5319234
http://www.wikidata.org/entity/Q4650970
http://www.wikidata.org/entity/Q2581447
http://www.wikidata.org/entity/Q4646882
http://www.wikidata.org/entity/Q7646983
http://www.wikidata.org/entity/Q21098924
http://www.wikidata.org/entity/Q4747075
http://www.wikidata.org/entity/Q3607822
http://www.wikidata.org/entity/Q2823244
http://www.wikidata.org/entity/Q6931218
http://www.wikidata.org/entity/Q7573806
http://www.wikidata.org/entity/Q5057221
http://www.wikidata.org/entity/Q26272
http://www.wikidata.org/entity/Q7843285
http://www.wikidata.org/entity/Q409035
http://www.wikidata.org/entity/Q2708007
http://www.wikidata.org/entity/Q2602246
http://www.wikidata.org/entity/Q3469748
http://www.wikidata.org/entity/Q15634126
http://www.wikidata.org/entity/Q419642
http://www.wikidata.org/entity/Q425064
http://www.wikidata.org/entity/Q5443648
http://www.wikidata.org/entity/Q6787831
http://www.wikidata.org/entity/Q411844
http://www.wikidata.org/entity/Q4885099
http://www.wikidata.org/entity/Q407658
http://www.wikidata.org/entity/Q3980350
http://www.wikidata.org/entity/Q5198686
http://www.wikidata.org/entity/Q2850134
http://www.wikidata.org/entity/Q904668
http://www.wikidata.org/entity/Q4981048
http://www.wikidata.org/entity/Q114391
http://www.wikidata.org/entity/Q5748732
http://www.wikidata.org/entity/Q2742455
http://www.wikidata.org/entity/Q7838854
http://www.wikidata.org/entity/Q16069783
http://www.wikidata.org/entity/Q7777225
http://www.wikidata.org/entity/Q4445833
http://www.wikidata.org/entity/Q3388802
http://www.wikidata.org/entity/Q5359421
http://www.wikidata.org/entity/Q423275
http://www.wikidata.org/entity/Q13578067
http://www.wikidata.org/entity/Q1187513
http://www.wikidata.org/entity/Q7322878
http://www.wikidata.org/entity/Q15411007
http://www.wikidata.org/entity/Q424250
http://www.wikidata.org/entity/Q544393
http://www.wikidata.org/entity/Q4734058
http://www.wikidata.org/entity/Q421761
http://www.wikidata.org/entity/Q425065
http://www.wikidata.org/entity/Q2008962
http://www.wikidata.org/entity/Q5518498
http://www.wikidata.org/entity/Q4890905
http://www.wikidata.org/entity/Q424528
http://www.wikidata.org/entity/Q2614009
http://www.wikidata.org/entity/Q3553093
http://www.wikidata.org/entity/Q4832247
http://www.wikidata.org/entity/Q510113
http://www.wikidata.org/entity/Q5749096
http://www.wikidata.org/entity/Q2653981
http://www.wikidata.org/entity/Q6896941
http://www.wikidata.org/entity/Q3915149
http://www.wikidata.org/entity/Q687686
http://www.wikidata.org/entity/Q192553
http://www.wikidata.org/entity/Q15708273
http://www.wikidata.org/entity/Q11786072
http://www.wikidata.org/entity/Q4737384
http://www.wikidata.org/entity/Q411426
http://www.wikidata.org/entity/Q413278
http://www.wikidata.org/entity/Q415646
http://www.wikidata.org/entity/Q1829318
http://www.wikidata.org/entity/Q73972
http://www.wikidata.org/entity/Q3592644
http://www.wikidata.org/entity/Q5404857
http://www.wikidata.org/entity/Q4674302
http://www.wikidata.org/entity/Q1630230
http://www.wikidata.org/entity/Q5089008
http://www.wikidata.org/entity/Q704923
http://www.wikidata.org/entity/Q7074645
http://www.wikidata.org/entity/Q407446
http://www.wikidata.org/entity/Q4138107
http://www.wikidata.org/entity/Q5162311
http://www.wikidata.org/entity/Q7204785
http://www.wikidata.org/entity/Q4890770
http://www.wikidata.org/entity/Q2331543
http://www.wikidata.org/entity/Q2657418
http://www.wikidata.org/entity/Q4391972
http://www.wikidata.org/entity/Q7165030
http://www.wikidata.org/entity/Q413849
http://www.wikidata.org/entity/Q7071996
http://www.wikidata.org/entity/Q21098845
http://www.wikidata.org/entity/Q899416
http://www.wikidata.org/entity/Q419421
http://www.wikidata.org/entity/Q620084
http://www.wikidata.org/entity/Q15408415
http://www.wikidata.org/entity/Q5280075
http://www.wikidata.org/entity/Q3991659
http://www.wikidata.org/entity/Q3080860
http://www.wikidata.org/entity/Q2896809
http://www.wikidata.org/entity/Q4332794
http://www.wikidata.org/entity/Q2930096
http://www.wikidata.org/entity/Q4828930
http://www.wikidata.org/entity/Q907070
http://www.wikidata.org/entity/Q420138
http://www.wikidata.org/entity/Q6990833
http://www.wikidata.org/entity/Q415392
http://www.wikidata.org/entity/Q416716
http://www.wikidata.org/entity/Q417755
http://www.wikidata.org/entity/Q4673311
http://www.wikidata.org/entity/Q7514072
http://www.wikidata.org/entity/Q7116885
http://www.wikidata.org/entity/Q417250
http://www.wikidata.org/entity/Q407891
http://www.wikidata.org/entity/Q4586731
http://www.wikidata.org/entity/Q15712807
http://www.wikidata.org/entity/Q15427895
http://www.wikidata.org/entity/Q4454241
http://www.wikidata.org/entity/Q3814656
http://www.wikidata.org/entity/Q2594649
http://www.wikidata.org/entity/Q15088351
http://www.wikidata.org/entity/Q909931
http://www.wikidata.org/entity/Q5003182
http://www.wikidata.org/entity/Q424684
http://www.wikidata.org/entity/Q4747307
http://www.wikidata.org/entity/Q3680915
http://www.wikidata.org/entity/Q571037
http://www.wikidata.org/entity/Q2073868
http://www.wikidata.org/entity/Q423846
http://www.wikidata.org/entity/Q5104342
http://www.wikidata.org/entity/Q4938924
http://www.wikidata.org/entity/Q5509469
http://www.wikidata.org/entity/Q740439
http://www.wikidata.org/entity/Q1117877
http://www.wikidata.org/entity/Q779118
http://www.wikidata.org/entity/Q5519727
http://www.wikidata.org/entity/Q419193
http://www.wikidata.org/entity/Q4642883
http://www.wikidata.org/entity/Q5276413
http://www.wikidata.org/entity/Q7072002
http://www.wikidata.org/entity/Q421116
http://www.wikidata.org/entity/Q5199357
http://www.wikidata.org/entity/Q5443567
http://www.wikidata.org/entity/Q425059
http://www.wikidata.org/entity/Q2553496
http://www.wikidata.org/entity/Q3629883
http://www.wikidata.org/entity/Q2943814
http://www.wikidata.org/entity/Q5011453
http://www.wikidata.org/entity/Q2866762
http://www.wikidata.org/entity/Q411046
http://www.wikidata.org/entity/Q408805
http://www.wikidata.org/entity/Q7395917
http://www.wikidata.org/entity/Q15410941
http://www.wikidata.org/entity/Q3333710
http://www.wikidata.org/entity/Q421905
http://www.wikidata.org/entity/Q18386276
http://www.wikidata.org/entity/Q417538
http://www.wikidata.org/entity/Q21098950
http://www.wikidata.org/entity/Q7800905
http://www.wikidata.org/entity/Q13024951
http://www.wikidata.org/entity/Q7269871
http://www.wikidata.org/entity/Q427105
http://www.wikidata.org/entity/Q6583647
http://www.wikidata.org/entity/Q7197959
http://www.wikidata.org/entity/Q5332458
http://www.wikidata.org/entity/Q7099046
http://www.wikidata.org/entity/Q2823302
http://www.wikidata.org/entity/Q7670227
http://www.wikidata.org/entity/Q4132745
http://www.wikidata.org/entity/Q5268487
http://www.wikidata.org/entity/Q21099604
http://www.wikidata.org/entity/Q6951351
http://www.wikidata.org/entity/Q15708268
http://www.wikidata.org/entity/Q706868
http://www.wikidata.org/entity/Q4445835
http://www.wikidata.org/entity/Q21099637
http://www.wikidata.org/entity/Q4545805
http://www.wikidata.org/entity/Q425053
http://www.wikidata.org/entity/Q3706873
http://www.wikidata.org/entity/Q7842218
http://www.wikidata.org/entity/Q422777
http://www.wikidata.org/entity/Q4119955
http://www.wikidata.org/entity/Q15426208
http://www.wikidata.org/entity/Q5161074
http://www.wikidata.org/entity/Q7851973
http://www.wikidata.org/entity/Q413805
http://www.wikidata.org/entity/Q3599478
http://www.wikidata.org/entity/Q5200296
http://www.wikidata.org/entity/Q1951971
http://www.wikidata.org/entity/Q3641126
http://www.wikidata.org/entity/Q8052674
http://www.wikidata.org/entity/Q419226
http://www.wikidata.org/entity/Q423398
http://www.wikidata.org/entity/Q6806652
http://www.wikidata.org/entity/Q415872
http://www.wikidata.org/entity/Q5513695
http://www.wikidata.org/entity/Q4177124
http://www.wikidata.org/entity/Q19833284
http://www.wikidata.org/entity/Q5113892
http://www.wikidata.org/entity/Q2064889
http://www.wikidata.org/entity/Q15402123
http://www.wikidata.org/entity/Q6710338
http://www.wikidata.org/entity/Q7116606
http://www.wikidata.org/entity/Q4463083
http://www.wikidata.org/entity/Q7263189
http://www.wikidata.org/entity/Q3277932
http://www.wikidata.org/entity/Q808801
http://www.wikidata.org/entity/Q15410232
http://www.wikidata.org/entity/Q7051399
http://www.wikidata.org/entity/Q778163
http://www.wikidata.org/entity/Q6824053
http://www.wikidata.org/entity/Q4735601
http://www.wikidata.org/entity/Q7263674
http://www.wikidata.org/entity/Q7119395
http://www.wikidata.org/entity/Q418758
http://www.wikidata.org/entity/Q1065083
http://www.wikidata.org/entity/Q475631
http://www.wikidata.org/entity/Q7119048
http://www.wikidata.org/entity/Q7558263
http://www.wikidata.org/entity/Q7198091
http://www.wikidata.org/entity/Q7636084
http://www.wikidata.org/entity/Q4745975
http://www.wikidata.org/entity/Q4864584
http://www.wikidata.org/entity/Q287582
http://www.wikidata.org/entity/Q2993328
http://www.wikidata.org/entity/Q5199059
http://www.wikidata.org/entity/Q3940320
http://www.wikidata.org/entity/Q424958
http://www.wikidata.org/entity/Q18208892
http://www.wikidata.org/entity/Q164403
http://www.wikidata.org/entity/Q1989071
http://www.wikidata.org/entity/Q4161099
http://www.wikidata.org/entity/Q384709
http://www.wikidata.org/entity/Q27267
http://www.wikidata.org/entity/Q410521
http://www.wikidata.org/entity/Q8041945
http://www.wikidata.org/entity/Q18211886
http://www.wikidata.org/entity/Q15427885
http://www.wikidata.org/entity/Q15427926
http://www.wikidata.org/entity/Q961081
http://www.wikidata.org/entity/Q1046522
http://www.wikidata.org/entity/Q7678897
http://www.wikidata.org/entity/Q4652482
http://www.wikidata.org/entity/Q4491065
http://www.wikidata.org/entity/Q6004128
http://www.wikidata.org/entity/Q6961010
http://www.wikidata.org/entity/Q2701649
http://www.wikidata.org/entity/Q21099606
http://www.wikidata.org/entity/Q930170
http://www.wikidata.org/entity/Q21099663
http://www.wikidata.org/entity/Q421894
http://www.wikidata.org/entity/Q17074532
http://www.wikidata.org/entity/Q415024
http://www.wikidata.org/entity/Q423783
http://www.wikidata.org/entity/Q3072948
http://www.wikidata.org/entity/Q421598
http://www.wikidata.org/entity/Q5010218
http://www.wikidata.org/entity/Q5470216
http://www.wikidata.org/entity/Q3637333
http://www.wikidata.org/entity/Q410036
http://www.wikidata.org/entity/Q5955671
http://www.wikidata.org/entity/Q3849795
http://www.wikidata.org/entity/Q5199001
http://www.wikidata.org/entity/Q17299859
http://www.wikidata.org/entity/Q4637047
http://www.wikidata.org/entity/Q3429577
http://www.wikidata.org/entity/Q4748447
http://www.wikidata.org/entity/Q6535822
http://www.wikidata.org/entity/Q5379483
http://www.wikidata.org/entity/Q18349230
http://www.wikidata.org/entity/Q757702
http://www.wikidata.org/entity/Q905058
http://www.wikidata.org/entity/Q7669595
http://www.wikidata.org/entity/Q15410969
http://www.wikidata.org/entity/Q60457
http://www.wikidata.org/entity/Q329022
http://www.wikidata.org/entity/Q4652479
http://www.wikidata.org/entity/Q2930105
http://www.wikidata.org/entity/Q2627834
http://www.wikidata.org/entity/Q2315302
http://www.wikidata.org/entity/Q6003986
http://www.wikidata.org/entity/Q416490
http://www.wikidata.org/entity/Q6965855
http://www.wikidata.org/entity/Q422327
http://www.wikidata.org/entity/Q7294040
http://www.wikidata.org/entity/Q4841341
http://www.wikidata.org/entity/Q421320
http://www.wikidata.org/entity/Q408201
http://www.wikidata.org/entity/Q259015
http://www.wikidata.org/entity/Q2709086
http://www.wikidata.org/entity/Q14035740
http://www.wikidata.org/entity/Q782318
http://www.wikidata.org/entity/Q423531
http://www.wikidata.org/entity/Q4163873
http://www.wikidata.org/entity/Q409743
http://www.wikidata.org/entity/Q249208
http://www.wikidata.org/entity/Q5242815
http://www.wikidata.org/entity/Q144917
http://www.wikidata.org/entity/Q284367
http://www.wikidata.org/entity/Q413733
http://www.wikidata.org/entity/Q4673302
http://www.wikidata.org/entity/Q10861089
http://www.wikidata.org/entity/Q3410841
http://www.wikidata.org/entity/Q19597525
http://www.wikidata.org/entity/Q3026455
http://www.wikidata.org/entity/Q15409424
http://www.wikidata.org/entity/Q15411005
http://www.wikidata.org/entity/Q2622702
http://www.wikidata.org/entity/Q4673057
http://www.wikidata.org/entity/Q2261930
http://www.wikidata.org/entity/Q413559
http://www.wikidata.org/entity/Q900922
http://www.wikidata.org/entity/Q420934
http://www.wikidata.org/entity/Q5382029
http://www.wikidata.org/entity/Q5113817
http://www.wikidata.org/entity/Q412291
http://www.wikidata.org/entity/Q412191
http://www.wikidata.org/entity/Q421272
http://www.wikidata.org/entity/Q4117486
http://www.wikidata.org/entity/Q964482
http://www.wikidata.org/entity/Q5276420
http://www.wikidata.org/entity/Q7921024
http://www.wikidata.org/entity/Q4891024
http://www.wikidata.org/entity/Q15426197
http://www.wikidata.org/entity/Q15634054
http://www.wikidata.org/entity/Q4161299
http://www.wikidata.org/entity/Q7238143
http://www.wikidata.org/entity/Q4596853
http://www.wikidata.org/entity/Q3979404
http://www.wikidata.org/entity/Q3381514
http://www.wikidata.org/entity/Q5712560
http://www.wikidata.org/entity/Q3604267
http://www.wikidata.org/entity/Q4041747
http://www.wikidata.org/entity/Q5404502
http://www.wikidata.org/entity/Q15632788
http://www.wikidata.org/entity/Q420043
http://www.wikidata.org/entity/Q2823228
http://www.wikidata.org/entity/Q225854
http://www.wikidata.org/entity/Q3604498
http://www.wikidata.org/entity/Q7118739
http://www.wikidata.org/entity/Q368222
http://www.wikidata.org/entity/Q4637180
http://www.wikidata.org/entity/Q267896
http://www.wikidata.org/entity/Q1839256
http://www.wikidata.org/entity/Q7706543
http://www.wikidata.org/entity/Q1104482
http://www.wikidata.org/entity/Q13024942
http://www.wikidata.org/entity/Q988591
http://www.wikidata.org/entity/Q6072216
http://www.wikidata.org/entity/Q413299
http://www.wikidata.org/entity/Q81890
http://www.wikidata.org/entity/Q139883
http://www.wikidata.org/entity/Q15991360
http://www.wikidata.org/entity/Q909387
http://www.wikidata.org/entity/Q6078756
http://www.wikidata.org/entity/Q6839436
http://www.wikidata.org/entity/Q10861003
http://www.wikidata.org/entity/Q2267471
http://www.wikidata.org/entity/Q2985253
http://www.wikidata.org/entity/Q18604129
http://www.wikidata.org/entity/Q920725
http://www.wikidata.org/entity/Q20054555
http://www.wikidata.org/entity/Q886862
http://www.wikidata.org/entity/Q4068819
http://www.wikidata.org/entity/Q4834702
http://www.wikidata.org/entity/Q7698204
http://www.wikidata.org/entity/Q2790082
http://www.wikidata.org/entity/Q3032708
http://www.wikidata.org/entity/Q415410
http://www.wikidata.org/entity/Q4790694
http://www.wikidata.org/entity/Q5611751
http://www.wikidata.org/entity/Q425248
http://www.wikidata.org/entity/Q5383774
http://www.wikidata.org/entity/Q287745
http://www.wikidata.org/entity/Q415588
http://www.wikidata.org/entity/Q418735
http://www.wikidata.org/entity/Q7101735
http://www.wikidata.org/entity/Q4734921
http://www.wikidata.org/entity/Q6647969
http://www.wikidata.org/entity/Q419478
http://www.wikidata.org/entity/Q3757667
http://www.wikidata.org/entity/Q419361
http://www.wikidata.org/entity/Q4642874
http://www.wikidata.org/entity/Q2705859
http://www.wikidata.org/entity/Q3109285
http://www.wikidata.org/entity/Q15708035
http://www.wikidata.org/entity/Q7352933
http://www.wikidata.org/entity/Q15410921
http://www.wikidata.org/entity/Q172409
http://www.wikidata.org/entity/Q2700587
http://www.wikidata.org/entity/Q4634059
http://www.wikidata.org/entity/Q420354
http://www.wikidata.org/entity/Q6062315
http://www.wikidata.org/entity/Q647580
http://www.wikidata.org/entity/Q3276808
http://www.wikidata.org/entity/Q5578972
http://www.wikidata.org/entity/Q4644278
http://www.wikidata.org/entity/Q10858037
http://www.wikidata.org/entity/Q1033359
http://www.wikidata.org/entity/Q5057294
http://www.wikidata.org/entity/Q17318234
http://www.wikidata.org/entity/Q412742
http://www.wikidata.org/entity/Q965955
http://www.wikidata.org/entity/Q60279
http://www.wikidata.org/entity/Q5200429
http://www.wikidata.org/entity/Q7680336
http://www.wikidata.org/entity/Q7784698
http://www.wikidata.org/entity/Q5409893
http://www.wikidata.org/entity/Q3347162
http://www.wikidata.org/entity/Q411087
http://www.wikidata.org/entity/Q412874
http://www.wikidata.org/entity/Q10859487
http://www.wikidata.org/entity/Q15410276
http://www.wikidata.org/entity/Q7071299
http://www.wikidata.org/entity/Q407189
http://www.wikidata.org/entity/Q15303950
http://www.wikidata.org/entity/Q3985292
http://www.wikidata.org/entity/Q19595855
http://www.wikidata.org/entity/Q2777979
http://www.wikidata.org/entity/Q283033
http://www.wikidata.org/entity/Q3831365
http://www.wikidata.org/entity/Q3517399
http://www.wikidata.org/entity/Q1786341
http://www.wikidata.org/entity/Q894130
http://www.wikidata.org/entity/Q5748740
http://www.wikidata.org/entity/Q28775
http://www.wikidata.org/entity/Q7245902
http://www.wikidata.org/entity/Q18155805
http://www.wikidata.org/entity/Q15634052
http://www.wikidata.org/entity/Q417044
http://www.wikidata.org/entity/Q7848584
http://www.wikidata.org/entity/Q120384
http://www.wikidata.org/entity/Q26998317
http://www.wikidata.org/entity/Q21400577
http://www.wikidata.org/entity/Q7050437
http://www.wikidata.org/entity/Q3277888
http://www.wikidata.org/entity/Q5198906
http://www.wikidata.org/entity/Q4642864
http://www.wikidata.org/entity/Q3935171
http://www.wikidata.org/entity/Q4677960
http://www.wikidata.org/entity/Q18209997
http://www.wikidata.org/entity/Q5415983
http://www.wikidata.org/entity/Q3044728
http://www.wikidata.org/entity/Q5049003
http://www.wikidata.org/entity/Q7843270
http://www.wikidata.org/entity/Q5102988
http://www.wikidata.org/entity/Q7811972
http://www.wikidata.org/entity/Q5120191
http://www.wikidata.org/entity/Q5205953
http://www.wikidata.org/entity/Q413596
http://www.wikidata.org/entity/Q20707829
http://www.wikidata.org/entity/Q15410217
http://www.wikidata.org/entity/Q2031142
http://www.wikidata.org/entity/Q904411
http://www.wikidata.org/entity/Q426660
http://www.wikidata.org/entity/Q7250468
http://www.wikidata.org/entity/Q900926
http://www.wikidata.org/entity/Q868435
http://www.wikidata.org/entity/Q4737376
http://www.wikidata.org/entity/Q278972
http://www.wikidata.org/entity/Q420087
http://www.wikidata.org/entity/Q833649
http://www.wikidata.org/entity/Q26979
http://www.wikidata.org/entity/Q349427
http://www.wikidata.org/entity/Q8214050
http://www.wikidata.org/entity/Q7915670
http://www.wikidata.org/entity/Q3512695
http://www.wikidata.org/entity/Q4637178
http://www.wikidata.org/entity/Q7368629
http://www.wikidata.org/entity/Q18357634
http://www.wikidata.org/entity/Q15425284
http://www.wikidata.org/entity/Q5991162
http://www.wikidata.org/entity/Q411909
http://www.wikidata.org/entity/Q6913406
http://www.wikidata.org/entity/Q2813821
http://www.wikidata.org/entity/Q410875
http://www.wikidata.org/entity/Q7106486
http://www.wikidata.org/entity/Q5049581
http://www.wikidata.org/entity/Q1761300
http://www.wikidata.org/entity/Q4652498
http://www.wikidata.org/entity/Q4701917
http://www.wikidata.org/entity/Q367258
http://www.wikidata.org/entity/Q3570564
http://www.wikidata.org/entity/Q4732178
http://www.wikidata.org/entity/Q7280510
http://www.wikidata.org/entity/Q7120083
http://www.wikidata.org/entity/Q1960495
http://www.wikidata.org/entity/Q7181329
http://www.wikidata.org/entity/Q410614
http://www.wikidata.org/entity/Q4364572
http://www.wikidata.org/entity/Q408256
http://www.wikidata.org/entity/Q426524
http://www.wikidata.org/entity/Q4676694
http://www.wikidata.org/entity/Q2823286
http://www.wikidata.org/entity/Q15409437
http://www.wikidata.org/entity/Q4499058
http://www.wikidata.org/entity/Q4779987
http://www.wikidata.org/entity/Q4836836
http://www.wikidata.org/entity/Q5984942
http://www.wikidata.org/entity/Q7067904
http://www.wikidata.org/entity/Q5120034
http://www.wikidata.org/entity/Q419849
http://www.wikidata.org/entity/Q2629234
http://www.wikidata.org/entity/Q904475
http://www.wikidata.org/entity/Q4352981
http://www.wikidata.org/entity/Q12744507
http://www.wikidata.org/entity/Q5203006
http://www.wikidata.org/entity/Q161294
http://www.wikidata.org/entity/Q8074586
http://www.wikidata.org/entity/Q4745983
http://www.wikidata.org/entity/Q2972710
http://www.wikidata.org/entity/Q424223
http://www.wikidata.org/entity/Q6927482
http://www.wikidata.org/entity/Q412805
http://www.wikidata.org/entity/Q5047057
http://www.wikidata.org/entity/Q7706553
http://www.wikidata.org/entity/Q6808812
http://www.wikidata.org/entity/Q3351791
http://www.wikidata.org/entity/Q2706622
http://www.wikidata.org/entity/Q2288772
http://www.wikidata.org/entity/Q151446
http://www.wikidata.org/entity/Q618730
http://www.wikidata.org/entity/Q21045227
http://www.wikidata.org/entity/Q743705
http://www.wikidata.org/entity/Q6518814
http://www.wikidata.org/entity/Q44944
http://www.wikidata.org/entity/Q3596763
http://www.wikidata.org/entity/Q286793
http://www.wikidata.org/entity/Q958387
http://www.wikidata.org/entity/Q4674080
http://www.wikidata.org/entity/Q413762
http://www.wikidata.org/entity/Q18209791
http://www.wikidata.org/entity/Q418564
http://www.wikidata.org/entity/Q7316807
http://www.wikidata.org/entity/Q7277486
http://www.wikidata.org/entity/Q2281857
http://www.wikidata.org/entity/Q421235
http://www.wikidata.org/entity/Q18347446
http://www.wikidata.org/entity/Q19904197
http://www.wikidata.org/entity/Q424851
http://www.wikidata.org/entity/Q1490748
http://www.wikidata.org/entity/Q6951371
http://www.wikidata.org/entity/Q425165
http://www.wikidata.org/entity/Q424541
http://www.wikidata.org/entity/Q5519258
http://www.wikidata.org/entity/Q4641536
http://www.wikidata.org/entity/Q379123
http://www.wikidata.org/entity/Q4832281
http://www.wikidata.org/entity/Q553129
http://www.wikidata.org/entity/Q7181438
http://www.wikidata.org/entity/Q423223
http://www.wikidata.org/entity/Q424931
http://www.wikidata.org/entity/Q422652
http://www.wikidata.org/entity/Q5398839
http://www.wikidata.org/entity/Q748200
http://www.wikidata.org/entity/Q16634590
http://www.wikidata.org/entity/Q414317
http://www.wikidata.org/entity/Q4973576
http://www.wikidata.org/entity/Q8213894
http://www.wikidata.org/entity/Q6681547
http://www.wikidata.org/entity/Q2436886
http://www.wikidata.org/entity/Q3151476
http://www.wikidata.org/entity/Q908742
http://www.wikidata.org/entity/Q5009205
http://www.wikidata.org/entity/Q19597398
http://www.wikidata.org/entity/Q3429576
http://www.wikidata.org/entity/Q418886
http://www.wikidata.org/entity/Q7251822
http://www.wikidata.org/entity/Q3546864
http://www.wikidata.org/entity/Q4782227
http://www.wikidata.org/entity/Q7681179
http://www.wikidata.org/entity/Q14521943
http://www.wikidata.org/entity/Q15409426
http://www.wikidata.org/entity/Q4545640
http://www.wikidata.org/entity/Q1072477
http://www.wikidata.org/entity/Q417674
http://www.wikidata.org/entity/Q5261117
http://www.wikidata.org/entity/Q12746850
http://www.wikidata.org/entity/Q347621
http://www.wikidata.org/entity/Q21098991
http://www.wikidata.org/entity/Q620072
http://www.wikidata.org/entity/Q410281
http://www.wikidata.org/entity/Q420212
http://www.wikidata.org/entity/Q4639568
http://www.wikidata.org/entity/Q1027605
http://www.wikidata.org/entity/Q5385177
http://www.wikidata.org/entity/Q4674081
http://www.wikidata.org/entity/Q5135146
http://www.wikidata.org/entity/Q10859673
http://www.wikidata.org/entity/Q4914076
http://www.wikidata.org/entity/Q15628029
http://www.wikidata.org/entity/Q20707021
http://www.wikidata.org/entity/Q19414
http://www.wikidata.org/entity/Q4353551
http://www.wikidata.org/entity/Q7860340
http://www.wikidata.org/entity/Q415945
http://www.wikidata.org/entity/Q5272281

Bot importations

Latest comment: 7 years ago, 10 comments, 6 people in discussion

@ProteinBoxBot, SoCalChemBot, TaxonBot, @Doc Taxon, Sebotic, Andrawaag: Please announce your import campaigns for chemicals and proteins on this page, and indicate when each campaign ends. This will help with data curation and prevent bots from reimporting bad data after a manual correction.

Then we have to think about the future: once the data has been curated, we can't just let the bots keep operating in the same manner. Even if a database provides some data, we can't just erase what is present in WD after manual data curation. So future bot actions should avoid any data deletion and focus on data comparison, generating reports that indicate conflicts.

Finally, a remark for those bots adding molar mass as mass to chemicals. This is not a very good solution because this data can mix monoisotopic mass and average molecular mass. The best approach would be to provide only the number of each kind of atom and to let people calculate the molecular mass according to their own choice.

Thank you. Snipre (talk) 08:12, 7 October 2016 (UTC)

+1.--Kopiersperre (talk) 11:07, 8 October 2016 (UTC)

+1 --Ghilt (talk) 16:24, 9 October 2016 (UTC)

+1 --Mabschaaf (talk) 16:38, 9 October 2016 (UTC)

+1. A remark concerning TaxonBot: This task of adding ECHA Substance Infocard ID (P2566) based on CAS Registry Number (P231) was done based on pre-curated data. I am currently working on the remaining issues. --Leyo 17:40, 9 October 2016 (UTC)

@Snipre: OK, so if that runs as planned, ChEBI compound imports should be done by Wednesday. After that, there will be a UNII compound import run of another 5 days; maybe I will find a way to do imports faster. Regarding the curation: all of the newly imported/created items are in good shape, centered around an InChI key. The items which need human intervention are listed above (~1,030). These need to be centered around one InChI key too. In addition to those items, there are about another 1,000 which still carry some wrong IDs (CAS, UNII, ChEBI, ChEMBL), but whose SMILES, InChI (key), PubChem CID and ChemSpider are OK. I can remove these wrong IDs automatically with a bot.

For curation after a bot run: in order to make sure that a bot does not overwrite good curation, the curated values need to have good references according to the Wikidata ref guidelines for databases. Otherwise, any curation work is futile, as references are essential for a bot to recognize human curation. But this only works for statements; labels, descriptions and aliases do not have refs. In addition, it is not realistic to do only one-time imports, because the original data sources evolve, improve, and expand.

So these need to be kept in sync. What happens after a one-time import of data to Wikidata with no constant sync afterwards has been demonstrated by the chemical compound data imported from various infoboxes of various Wikipedias: it was imported basically once, and nobody cared about continuous syncs any more. This is one major contributor to why there are still a ton of issues in the chem space of Wikidata.

So if we can agree on the mandatory requirement for good refs, I will modify my bots to always keep the manually curated parts with good refs. Otherwise, there is no way to find out who made a good contribution. A list of user names is not a good way, because this will exclude anyone not on that list. Furthermore, keeping everything which has been contributed by users is also not a good way, because I have seen many wrong contributions, made by users who either had no idea of chemistry or were playing one of these curation games and got it wrong there. Ideas or suggestions?

Regarding the mass: What I import is the monoisotopic mass as stated in PubChem. I can add a qualifier to make that more explicit, but I cannot see how a data user should be able to calculate an average mass if the user does not know the isotopic distribution. But I can certainly add average mass as well (or any other). Sebotic (talk) 07:58, 10 October 2016 (UTC)
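
To make the difference between the two kinds of mass concrete, here is a minimal Python sketch that computes both from atom counts alone. It is only an illustration: the function name and the small element tables are assumptions made for this example, the atomic masses are rounded values for four common elements, and the average mass assumes natural isotopic abundance.

```python
# Illustrative only: rounded atomic masses for a few common elements.
# AVERAGE holds standard atomic weights (natural isotopic abundance assumed);
# MONOISOTOPIC holds the mass of the most abundant isotope of each element.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.99491462}


def mass_from_atom_counts(atom_counts, table):
    """Sum atomic masses over a mapping such as {"C": 1, "H": 4, "O": 1}."""
    return sum(table[element] * count for element, count in atom_counts.items())


methanol = {"C": 1, "H": 4, "O": 1}  # CH4O
monoisotopic = mass_from_atom_counts(methanol, MONOISOTOPIC)
average = mass_from_atom_counts(methanol, AVERAGE)
print(f"monoisotopic: {monoisotopic:.4f}, average: {average:.3f}")
```

For methanol the two values differ only in the second decimal place (roughly 32.026 against 32.042), which is why an unlabeled "mass" statement is easy to misread.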

@Sebotic: Thanks for your answer. My concern is currently the duplication of items about the same chemical: the bots add data once to one item, then to the second item. This can be solved by merging, but the problem is to be sure that both items are about the same chemical. Then we arrive at the second problem: the confusion between mixtures of stereoisomers and pure stereoisomers. This is a real problem, especially for the CAS number. I have a huge problem curating CAS numbers because too few databases provide this identifier.

Concerning curation, currently I delete and merge, with no new additions. The problem is that sometimes the original databases are wrong (typically a confusion between a mixture of stereoisomers and pure stereoisomers) and I can't replace the wrong data with correct data (typically for the CAS number). In that case a reference doesn't help. And this is why the sync is not a good idea: after data import and curation, WD is no longer a compilation of what is given by other databases, but is a database in its own right and should be considered as such. Future imports are no longer possible; only comparisons and conflict reports should be generated. No more massive bot actions, only manual correction based on bot comparison. I agree with you that references are the key element for judging whether a piece of data should be kept, but in the future bots will only play the role of data comparison, and no longer that of data import or correction (to a large extent at least).

Bot work is not a problem, but you have to agree that bot actions will change after the first import: sync is not the goal; the goal is to provide a coherent set of data about one topic. If we agree on that, then we can go to the next step, which is the definition of a system where bots provide reports and contributors use them to curate and correct. So please don't work alone in your corner but try to work with this project when dealing with chemicals. The case of the mass is a good example of the lack of discussion: even if your work is based on good reasoning, nobody knows which kind of mass you imported, and no rules are defined for future data imports or additions. So the risk is very high that, without guidelines, after some weeks people will mix different data under the same property. Snipre (talk) 08:53, 10 October 2016 (UTC)
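
As a rough illustration of the "compare and report" behaviour discussed above, here is a minimal Python sketch. It is not how any existing bot works: the function names, the dictionary layout for statements, and the placeholder values are all assumptions made for this example. The only rules it encodes are the ones proposed in this thread: never delete, add only where nothing exists, and flag every disagreement for a human.

```python
def compare_statement(current, incoming):
    """Decide what a bot should do with one value, without ever deleting data.

    `current` is a dict like {"value": ..., "references": [...]}, or None if the
    item has no statement yet; `incoming` is the value from the source database.
    """
    if current is None:
        return "add"                  # nothing on the item yet, safe to import
    if current["value"] == incoming:
        return "ok"                   # item and source agree
    if current["references"]:
        return "report"               # referenced (curated) value differs: keep it, flag it
    return "report-unreferenced"      # differs and has no reference: still only flag it


def build_report(item_statements, source_values):
    """Return (property, current value, source value, action) for each conflict."""
    report = []
    for prop, incoming in source_values.items():
        current = item_statements.get(prop)
        action = compare_statement(current, incoming)
        if action.startswith("report"):
            report.append((prop, current["value"], incoming, action))
    return report


# Placeholder data: "00-00-0" stands in for a wrong value in the source database.
item = {"CAS": {"value": "67-56-1", "references": ["some reliable source"]}}
source = {"CAS": "00-00-0", "InChIKey": "OKKJLVBELUTLKV-UHFFFAOYSA-N"}
print(build_report(item, source))
```

In this sketch the referenced CAS value stays on the item and the disagreement ends up in the report, while the missing InChIKey would simply be added.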

@Snipre: I agree that orienting around CAS is not a good idea, but this ID is so widely used that we should add it if we can. Therefore, I strongly advocate for using InChI keys, these are tied to the structure and uniquely identify a compound. So my basic premise is: The structure comes from scientific literature or chem/pharma companies, at Wikidata, we do not have the means to make a comment on the structure of a compound, it can be good or bad or incomplete. This is why I think that for some compounds, we will need to live with 2 or more versions of a compound, because the real, true structure is incomplete/wrong versions of the structure exist. The connectivity can serve as the common basis (in most cases there is no disagreement on that) but the isomerism might differ. These different isomers can be connected to each other using Wikidata properties, and can be detected by using SPARQL queries. And over time, hopefully, many of those will resolve, but certainly, we will not reach a point where each and every compound has a high quality structure. So if I add 2 compounds with the same connectivity but differeent isomer info, how do you know that one is better than the other, or they are just 2 different isomers? I see no problem in having parallel versions. We can also have a compound without any stereoisomeric info as a minimum requirement and one or more defined stereoisomers, ideally of good quality.

Regarding bot imports: I completely agree that Wikidata is an independent database and should not be the aggregator of other databases. On the contrary, we should make use of our flexibility and community curation. That said, senior figures in PubChem have told me personally that they are interested in tapping/using the community curation done in Wikidata. So I think we need to find mechanisms that leave the community-curated things untouched but still import the improvements made in the original sources. Moreover, we definitely need the new compounds added from those resources, because these are usually compounds of high interest (e.g. in Drugbank, UNII) with high medical/biological relevance, of public interest and with biological activity. As I said above, I think proper references are one way of doing that.

Regarding import efforts and import of special data: I agree that I should have discussed the import of 'mass' beforehand. I will also announce any import campaigns here beforehand.

For error detection, I think we should use SPARQL queries. I also have a bot which can continuously check whether SMILES and InChI (key) are consistent on an item and file a report if not. Sebotic (talk) 21:14, 10 October 2016 (UTC)

@Sebotic: I don't have any problem with CAS numbers; my problem is when you import a CAS number from PubChem and I delete it in WD because someone made a mistake by importing the wrong CAS number into PubChem. I can't provide the correct CAS number because I don't have access to SciFinder, and most of the time Google can't provide a good answer. The main problem is the data curation in PubChem: they should do the same as us and analyze their CAS numbers to check that they are consistent with their structures.

Just have a look at this compound in PubChem as an example: someone put the wrong chemical formula as the title for this entry in the PubChem database. I can't change that wrong data, so please don't reimport it with your bot. This is my only concern.

About stereoisomers, I think we should focus on only two kinds of compounds: those which are completely defined and those which are not defined at all, the latter having the role of grouping all possible completely defined stereoisomers. We don't need to create items for all possible stereoisomers in a systematic way, but when there is confusion between mixtures and pure forms we should split the data in order to avoid confusion in the future.

For your other remarks, I think we agree on the main principle. The only difference between my approach and yours concerns importing: I don't think we need to import data into WD in order to curate them afterwards. Once we have a fairly stable set of chemicals in WD, we can work by comparison using bots and then create conflict reports. Only after a manual check can we import the data from other databases. If we agree on that, we will avoid a lot of discussion later.

For now I am working with the reports of constraint violations: every day I can see the results of the curation. I don't need to use SPARQL for the moment. But if you want to create the queries, just do it. Snipre (talk) 22:28, 10 October 2016 (UTC)

@Snipre: Regarding CAS numbers: I agree that PubChem does not do the best possible job here. But the CAS numbers I import are actually from UNII and ChEBI, not directly from PubChem. Still, these could be incorrect. I have access to SciFinder, Reaxys, etc., but I am very sure that I am not allowed to do a systematic import of CAS or Beilstein numbers to WD. One way to go could be to ask ACS directly if they would be willing to contribute an InChI key to CAS number mapping file.

On the importation: In principle, I agree that it's a good idea to detect and log conflicts instead of overwriting. Two questions here: Is that feasible for thousands of items? And where would I post such a list of conflicts, so that it can be processed quickly? Text which then needs to be copied and pasted around by a curator is not a good way. I also agree on the stereochemistry part: either fully defined stereochemistry or no stereochemistry. But I think we need some flexibility here, because for very many important compounds the only stereochemistry which exists is partial. From what I have seen, this is very common for naturally occurring, larger molecules. But for cases where several stereoisomers exist, take only the fully defined ones. Sebotic (talk) 23:40, 10 October 2016 (UTC)
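A minimal sketch of the "detect and log conflicts instead of overwriting" idea, assuming both sides have already been reduced to plain dictionaries keyed by a shared identifier (for example an InChI key) with a single value each. The function name, the report layout and the sample values are hypothetical; nothing here writes to Wikidata.

```python
def build_conflict_report(wikidata_values, external_values):
    """Compare two {key: value} mappings without modifying either one.

    Returns three lists: keys where both sides agree, keys where they
    disagree (with both values), and keys only the external source has.
    """
    report = {"agree": [], "conflict": [], "missing_in_wikidata": []}
    for key in sorted(external_values):
        external = external_values[key]
        current = wikidata_values.get(key)
        if current is None:
            report["missing_in_wikidata"].append((key, external))
        elif current == external:
            report["agree"].append(key)
        else:
            # Keep both values so a curator can decide; never overwrite.
            report["conflict"].append((key, current, external))
    return report


# Hypothetical sample data: keys and CAS-like values are placeholders.
wd = {"KEY-1": "111-11-1", "KEY-2": "222-22-2"}
ext = {"KEY-1": "111-11-1", "KEY-2": "999-99-9", "KEY-3": "333-33-3"}

report = build_conflict_report(wd, ext)
for key, current, external in report["conflict"]:
    print(f"{key}: Wikidata has {current}, external source has {external}")
```

Because the comparison is a single pass over dictionaries, thousands of items are not a problem for the bot itself; the bottleneck remains the manual check of the conflict list.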

ZVG number (P679)

Latest comment: 7 years ago, 6 comments, 3 people in discussion

I suggest importing the remaining ZVG number (P679) based on CAS Registry Number (P231) and/or EC number (P232). This task is unlikely to create significant issues. Of the 8745 ZVG numbers available in an Excel list from there, we currently have slightly less than half. --Leyo 15:19, 10 October 2016 (UTC)

@Leyo: I'm reluctant to import entries like 900063. The remaining ZVG entries seem not relevant to me, but when anyone imports data, we should do so, too.--Kopiersperre (talk) 16:14, 10 October 2016 (UTC)

I did not ask for the creation of new items based on this list. IMHO it is sufficient to add those with a match in either CAS or EC number (or both). --Leyo 17:46, 10 October 2016 (UTC)

@Leyo: Sorry for getting you wrong. The import was done by this Mix-n-Match catalog and can be resumed at any time. But I think, there is not much to do.--Kopiersperre (talk) 21:46, 10 October 2016 (UTC)

Before any importation, can we first do a data comparison? For example: take the list of CAS numbers from Gestis, match the items by their CAS number and then compare the EINECS number from Gestis with the EINECS number from WD. If both the CAS number and the EINECS number match, then we can think about data importation, if and only if the CAS number is used only once. My concern about Gestis is that it can use the same CAS number/EINECS number several times, as for hydrogen chloride and hydrochloric acid solution (two ZVG numbers but one CAS number and one EINECS number).

But before any importation we have to solve all violations of the constraints for CAS numbers and EINECS numbers. Snipre (talk) 21:08, 10 October 2016 (UTC)

There are only very few constraint violations for the latter. Unclear cases should be skipped and, if possible, listed for manual review. --Leyo 23:37, 10 October 2016 (UTC)
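The rule discussed in this thread (add a ZVG number only when the CAS number and the EC/EINECS number both match and the CAS number is used only once, otherwise list the case for manual review) could be sketched as follows. The input layout, the function name, the item IDs and the ZVG values are hypothetical placeholders; only the CAS and EC numbers for methanol and hydrogen chloride are real.

```python
from collections import Counter


def propose_zvg(gestis_rows, wd_items):
    """Split Gestis rows into safe proposals and cases for manual review.

    gestis_rows: list of dicts with keys "zvg", "cas", "ec".
    wd_items: dict mapping an item ID to a dict with keys "cas", "ec".
    Rows without a matching item are skipped; no new items are created.
    """
    cas_in_gestis = Counter(row["cas"] for row in gestis_rows)
    cas_in_wd = Counter(values["cas"] for values in wd_items.values())
    item_by_cas = {values["cas"]: item_id for item_id, values in wd_items.items()}

    proposals, review = [], []
    for row in gestis_rows:
        item_id = item_by_cas.get(row["cas"])
        if item_id is None:
            continue
        cas_is_unique = cas_in_gestis[row["cas"]] == 1 and cas_in_wd[row["cas"]] == 1
        if cas_is_unique and wd_items[item_id]["ec"] == row["ec"]:
            proposals.append((item_id, row["zvg"]))
        else:
            review.append((item_id, row["zvg"]))
    return proposals, review


# Placeholder ZVG values and item IDs; two Gestis rows share one CAS number.
gestis = [
    {"zvg": "ZVG-1", "cas": "7647-01-0", "ec": "231-595-7"},  # hydrogen chloride
    {"zvg": "ZVG-2", "cas": "7647-01-0", "ec": "231-595-7"},  # hydrochloric acid solution
    {"zvg": "ZVG-3", "cas": "67-56-1", "ec": "200-659-6"},    # methanol
]
wd = {
    "Q_hcl": {"cas": "7647-01-0", "ec": "231-595-7"},
    "Q_methanol": {"cas": "67-56-1", "ec": "200-659-6"},
}

proposals, review = propose_zvg(gestis, wd)
print("add:", proposals)
print("manual review:", review)
```

In this sketch the hydrogen chloride case ends up in the review list for both ZVG numbers, which is the behaviour asked for above.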

Creating items for Cosmetic properties

Latest comment: 7 years ago, 3 comments, 2 people in discussion

In the COSING EU database about cosmetics, the chemical components have one or several cosmetic properties. We might want to create items for those before importing COSING data. --Teolemon (talk) 15:57, 16 October 2016 (UTC)

en:ABRASIVE
en:definition:Removes materials from various body surfaces or aids mechanical tooth cleaning or improves gloss

en:ABSORBENT
en:definition:Takes up water- and/or oil-soluble dissolved or finely dispersed substances

en:ANTICAKING
en:definition:Allows free flow of solid particles and thus avoids agglomeration of powdered cosmetics into lumps or hard masses

en:ANTICORROSIVE
en:definition:Prevents corrosion of the packaging

en:ANTIDANDRUFF
en:definition:Helps control dandruff

en:ANTIFOAMING
en:definition:Suppresses foam during manufacturing or reduces the tendency of finished products to generate foam

en:ANTIMICROBIAL
en:definition:Helps control the growth of micro-organisms on the skin

en:ANTIOXIDANT
en:definition:Inhibits reactions promoted by oxygen, thus avoiding oxidation and rancidity

en:ANTIPERSPIRANT
en:definition:Reduces perspiration

en:ANTIPLAQUE
en:definition:Helps protect against plaque

en:ANTISEBORRHOEIC
en:definition:Helps control sebum production

en:ANTISTATIC
en:definition:Reduces static electricity by neutralising electrical charge on a surface

en:ASTRINGENT
en:definition:Contracts the skin

en:BINDING
en:definition:Provides cohesion in cosmetics

en:BLEACHING
en:definition:Lightens the shade of hair or skin

en:BUFFERING
en:definition:Stabilises the pH of cosmetics

en:BULKING
en:definition:Reduces bulk density of cosmetics

en:CHELATING
en:definition:Reacts and forms complexes with metal ions which could affect the stability and/or appearance of cosmetics

en:CLEANSING
en:definition:Helps to keep the body surface clean

en:COSMETIC COLORANT
en:definition:Colours cosmetics and/or imparts colour to the skin and/or its appendages. All colours listed are substances on the positive list of colorants (Annex IV of the Cosmetics Directive)

en:DENATURANT
en:definition:Renders cosmetics unpalatable. Mostly added to cosmetics containing ethyl alcohol

en:DEODORANT
en:definition:Reduces or masks unpleasant body odours

en:DEPILATORY
en:definition:Removes unwanted body hair

en:DETANGLING
en:definition:Reduces or eliminates hair intertwining due to hair surface alteration or damage and, thus, helps combing

en:EMOLLIENT
en:definition:Softens and smooths the skin

en:EMULSIFYING
en:definition:Promotes the formation of intimate mixtures of non-miscible liquids by altering the interfacial tension

en: EMULSION STABILISING
en:definition:Helps the process of emulsification and improves emulsion stability and shelf-life

en: FILM FORMING
en:definition:Produces, upon application, a continuous film on skin, hair or nails

en: FLAVOURING
en:definition:Gives flavour to the cosmetic product

en: FOAM BOOSTING
en:definition:Improves the quality of the foam produced by a system by increasing one or more of the following properties: volume, texture and/or stability

en: FOAMING
en:definition:Traps numerous small bubbles of air or other gas within a small volume of liquid by modifying the surface tension of the liquid

en: GEL FORMING
en:definition:Gives the consistency of a gel (a semi-solid preparation with some elasticity) to a liquid preparation

en: HAIR CONDITIONING
en:definition:Leaves the hair easy to comb, supple, soft and shiny and/or imparts volume, lightness, gloss, etc.

en: HAIR DYEING
en:definition:Colours hair

en: HAIR FIXING
en:definition:Permits physical control of hair style

en: HAIR WAVING OR STRAIGHTENING
en:definition:Modifies the chemical structure of the hair, allowing it to be set in the style required

en: HUMECTANT
en:definition:Holds and retains moisture

en: HYDROTROPE
en:definition:Enhances the solubility of substance which is only slightly soluble in water

en: KERATOLYTIC
en:definition:Helps eliminate the dead cells of the stratum corneum

en: MASKING
en:definition:Reduces or inhibits the basic odour or taste of the product

en: MOISTURISING
en:definition:Increases the water content of the skin and helps keep it soft and smooth

en: NAIL CONDITIONING
en:definition:Improves the cosmetic characteristics of the nail

en: NOT REPORTED
en:definition:NOT REPORTED

en: OPACIFYING
en:definition:Reduces transparency or translucency of cosmetics

en: ORAL CARE
en:definition:Provides cosmetic effects to the oral cavity, e.g. cleansing, deodorising, protecting

en: OXIDISING
en:definition:Changes the chemical nature of another substance by adding oxygen or removing hydrogen

en: PEARLESCENT
en:definition:Imparts a nacreous appearance to cosmetics

en: PERFUMING
en:definition:Used for perfume and aromatic raw materials (Section II)

en: PLASTICISER
en:definition:Softens and makes supple another substance that otherwise could not be easily deformed, spread or worked out

en: PRESERVATIVE
en:definition:Inhibits primarily the development of micro-organisms in cosmetics. All preservatives listed are substances on the positive list of preservatives (Annex VI of the Cosmetics Directive)

en: PROPELLANT
en:definition:Generates pressure in an aerosol pack, expelling contents when the valve is opened. Some liquefied propellants can act as solvents

en: REDUCING
en:definition:Changes the chemical nature of another substance by adding hydrogen or removing oxygen

en: REFATTING
en:definition:Replenishes the lipids of the hair or of the top layers of the skin

en: REFRESHING
en:definition:Imparts a pleasant freshness to the skin

en: SKIN CONDITIONING
en:definition:Maintains the skin in good condition

en: SKIN PROTECTING
en:definition:Helps to avoid harmful effects to the skin from external factors

en: SMOOTHING
en:definition:Seeks to achieve an even skin surface by decreasing roughness or irregularities

en: SOLVENT
en:definition:Dissolves other substances

en: SOOTHING
en:definition:Helps lightening discomfort of the skin or of the scalp

en: STABILISING
en:definition:Improves ingredients or formulation stability and shelf-life

en: SURFACTANT
en:definition:Lowers the surface tension of cosmetics as well as aids the even distribution of the product when used

en: TANNING
en:definition:Darkens the skin with or without exposure to UV

en: TONIC
en:definition:Produces a feeling of well-being on skin and hair

en: UV ABSORBER
en:definition:Protects the cosmetic product from the effects of UV-light

en: UV FILTER
en:definition:Filters certain UV rays in order to protect the skin or the hair from harmful effects of these rays. All UV filters listed are substances on the positive list of UV filters (Annex VII of the Cosmetics Directive)

en: VISCOSITY CONTROLLING
en:definition:Increases or decreases the viscosity of cosmetics

The best option is to use the property has use (P366) for that list. Snipre (talk) 20:01, 16 October 2016 (UTC)

I actually have a multilingual taxonomy with many more languages - http://en.wiki.openbeautyfacts.org/Global_properties_taxonomy --Teolemon (talk) 21:07, 16 October 2016 (UTC)

Importing Pigment CICN numbers (Colour Index)

Latest comment: 7 years ago, 5 comments, 2 people in discussion

The Colour Index International constitution ID (P2027) was created a while ago.
Many CICN numbers are present in labels or infoboxes ("CI 12345", "C.I. 12345", "Colour Index 12345").
So far, while looking for Mix'n'match import candidates, I've only found short lists of pigments that are 20-100 values long.
However, the labels and external databases seem to have most of them.

Is it possible to source them from an external database or to extract them from labels with a regex?
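A minimal sketch of the regex route, covering the three spellings quoted above ("CI 12345", "C.I. 12345", "Colour Index 12345"). It assumes that constitution numbers are five digits, optionally followed by a colon and one digit; that assumption and the function name are not taken from this discussion, and any extracted value would still need checking before being added as a statement.

```python
import re

# Prefix spellings: "CI", "C.I.", "Colour Index" (also "Color Index").
# Assumed number shape: five digits, optionally ":<digit>".
CI_PATTERN = re.compile(
    r"\b(?:C\.?\s?I\.?|Colou?r\s+Index)\s*(\d{5}(?::\d)?)\b",
    re.IGNORECASE,
)


def extract_ci_numbers(label):
    """Return all Colour Index numbers found in a label, in order."""
    return CI_PATTERN.findall(label)


for text in ["CI 12345", "C.I. 12345", "Colour Index 12345", "no number here"]:
    print(repr(text), "->", extract_ci_numbers(text))
```

Labels that carry six or more digits after the prefix are deliberately not matched, so that unrelated numbers are less likely to be picked up.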

It would be tremendously useful for Open Beauty Facts; that way we could decipher what's in your shampoo: List of ingredients of your favorite shampoo

--Teolemon (talk) 08:46, 17 October 2016 (UTC)

Some data can be found here, but the CI number is a non-free system. Snipre (talk) 10:00, 17 October 2016 (UTC)

My understanding is that having the identifiers on an item to link to their system is not an issue? If they claim it's proprietary, this is very disturbing, since it is used on all the cosmetics you use daily, as if it were a standard… --Teolemon (talk) 12:11, 17 October 2016 (UTC)

List ready at the bottom of the page. It contains what I think are the CAS number and the CICN. I'm not quite sure how to add statements based on another statement. http://en.wiki.openbeautyfacts.org/Global_colour_index_taxonomy --Teolemon (talk) 12:53, 17 October 2016 (UTC)

The problem is not importing the data; the problem is accessing the data. I don't think there is a free database with all CI numbers. We can use the numbers but we can't import the whole database into WD. Snipre (talk) 14:02, 17 October 2016 (UTC)

Creating Wikidata Items for GHS hazard statements

Latest comment: 7 years ago, 7 comments, 4 people in discussion

Currently the GHS hazard statements are stored as strings in items. I feel that creating items for each GHS hazard statement could be interesting, especially since H302 will translate not only to an English sentence, but to sentences in many languages. Here's what I have done for Open Beauty Facts

--Teolemon (talk) 15:41, 16 October 2016 (UTC)

We can create the new properties under Wikidata:Property_proposal/Natural_science#Chemistry. Snipre (talk) 20:04, 16 October 2016 (UTC)

@Teolemon: That's what I proposed here. I think we need no property, we should just change P728 (P728) and P940 (P940) from string to property.--Kopiersperre (talk) 20:08, 16 October 2016 (UTC)

@Kopiersperre: We can't change the datatype of a property: we have to create a new one. Snipre (talk) 09:22, 24 October 2016 (UTC)

Since the properties are used in very few items, we may remove them all and then change the datatype. --Leyo 17:56, 24 October 2016 (UTC)

My source is : http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32001L0059:EN:HTML --Teolemon (talk) 21:31, 16 October 2016 (UTC)

Well, this is about the system previous to GHS (CLP in the EU), i.e. the Dangerous Substances Directive (Q899329). --Leyo 17:56, 24 October 2016 (UTC)

Some resources

Latest comment: 7 years ago, 1 comment, 1 person in discussion

To invite you to share your tools, I created a section for the SPARQL queries related to chemistry in Wikidata:WikiProject_Chemistry/Tools#SPARQL_queries. To avoid the reimport of wrong data from external databases, please report all errors in Wikidata:WikiProject_Chemistry/References#Report_of_errors_in_reference_databases. I hope we will eventually find a way to contact the administrators of the different databases to inform them about problems in their datasets. Snipre (talk) 11:46, 19 October 2016 (UTC)

IECIC id (cosmetics in China)

Latest comment: 7 years ago, 1 comment, 1 person in discussion

Please review https://www.wikidata.org/wiki/Wikidata:Property_proposal/IECIC_id --Teolemon (talk) 07:18, 20 October 2016 (UTC)

Qualifier for reactions

Latest comment: 7 years ago, 2 comments, 2 people in discussion

Notified participants of WikiProject Chemistry

Has anyone figured out how to document reactions? Or is there a place where this is already being discussed? Over on Meta-Wiki I discovered there has been a proposal for a Wikichem (see discussion), which I think is a great idea but could use more feedback. And I foresee its implementation depending on how much Wikidata can support. Devon Fyson (talk) 05:25, 11 November 2016 (UTC)

@Devon Fyson: With WD, we don't need to create a new structure like Wikichem. And when you see the activity in this project, which is the closest thing to a Wikichem, I think you can easily deduce that a Wikichem would have very few contributors.

About reactions, there is no rule or model yet, but if you want you can start a section under Wikidata:WikiProject_Chemistry/Tools and put a draft of a reaction model there. Snipre (talk) 20:06, 15 November 2016 (UTC)