Wikidata:Property proposal/Natural science
See also
- Wikidata:Property proposal/Pending – properties which have been approved but which are on hold waiting for the appropriate datatype to be made available
- Wikidata:Properties for deletion – proposals for the deletion of properties
- Wikidata:External identifiers – statements to add when creating properties for external IDs
- Wikidata:Lexicographical data – information and discussion about lexicographic data on Wikidata
This page is for the proposal of new properties.
Before proposing a property
- Search if the property already exists.
- Search if the property has already been proposed.
- Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
- Select the right datatype for the property.
- Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
- Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.
Creating the property
- Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
- Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
- See property creation policy.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/12.
Physics/astronomy
- Please review Wikidata:WikiProject Physics before proposing. Ping members of project using {{Ping project|Physics}}
- See also Wikidata:Property proposal/Pending for approved items awaiting the deployment of currently unavailable datatypes.
- Please look at Wikidata:List of properties/science/natural science before proposing a property.
SIMBAD catalog properties (used more than 1 million times)
Gaia Data Release 2 ID
Description | identifier for an astronomical object in Gaia Data Release 2 |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | [0-9]{18,19} |
Example 1 | BS Cnc (Q2889194) → 661284024235415808 |
Example 2 | Gliese 450 (Q5880899) → 4031586157514097024 |
Example 3 | TYC 3645-2080-1 (Q75838267) → 1943381923013901440 |
Source | Gaia Data Release 2 (Q51905050) |
Planned use | migrate all P528 values qualified with P972 Q51905050 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR2%20$1 |
2MASS ID
Description | identifier for an astronomical object in the Two Micron All Sky Survey |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | J[0-9]{8}[+-][0-9]{7} |
Example 1 | BS Cnc (Q2889194) → J08390909+1935327 |
Example 2 | Gliese 450 (Q5880899) → J11510737+3516188 |
Example 3 | TYC 3645-2080-1 (Q75838267) → J23350993+4851114 |
Source | 2MASS (Q1454942) |
Planned use | migrate all P528 values qualified with P972 Q1454942 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1 |
Tycho-2 Catalogue ID
Description | identifier for an astronomical object in the Tycho-2 Catalogue |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | [0-9]{1,4}-[0-9]{1,4}-1 |
Example 1 | BS Cnc (Q2889194) → 1395-2445-1 |
Example 2 | Gliese 450 (Q5880899) → 2526-2357-1 |
Example 3 | TYC 3645-2080-1 (Q75838267) → 3645-2080-1 |
Source | The Tycho-2 catalogue of the 2.5 million brightest stars (Q2725928) |
Planned use | migrate all P528 values qualified with P972 Q2725928 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=TYC%20$1 |
Gaia Data Release 1 ID
Description | identifier for an astronomical object in Gaia Data Release 1 |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | [0-9]{18,19} |
Example 1 | BS Cnc (Q2889194) → 661284019938140032 |
Example 2 | Gliese 450 (Q5880899) → 4031586157514097024 |
Example 3 | TYC 3645-2080-1 (Q75838267) → 1943381923012780160 |
Source | Gaia Data Release 1 (Q37859523) |
Planned use | migrate all P528 values qualified with P972 Q37859523 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR1%20$1 |
SDSS object ID
Description | identifier for an astronomical object in the Sloan Digital Sky Survey |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Allowed values | J[0-9]{6}\.[0-9]{2}[+-][0-9]{6}\.[0-9] |
Example 1 | BS Cnc (Q2889194) → J083909.03+193532.4 |
Example 2 | Gliese 450 (Q5880899) → J115106.57+351627.2 |
Example 3 | TYC 3645-2080-1 (Q75838267) → J233509.93+485111.4 |
Source | Sloan Digital Sky Survey (Q840332) |
Planned use | migrate all P528 values qualified with P972 Q840332 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=SDSS%20$1 |
OGLE-III object ID
Description | identifier for an astronomical object in the Optical Gravitational Lensing Experiment |
---|---|
Data type | External identifier |
Domain | astronomical objects |
Example 1 | R99 (Q22087000) → BRIGHT-LMC-MISC-429 |
Example 2 | R85 (Q28406638) → BRIGHT-LMC-MISC-9 |
Example 3 | SV* HV 2827 (Q74703824) → LMC-CEP-4689 |
Source | The Optical Gravitational Lensing Experiment. The OGLE-III catalog of variable stars. I. Classical Cepheids in the Large Magellanic Cloud (Q67054966) |
Planned use | migrate all P528 values qualified with P972 Q67054966 to this property |
Formatter URL | https://simbad.u-strasbg.fr/simbad/sim-id?Ident=OGLE%20$1 |
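As a rough illustration of how the "Allowed values" pattern and the "Formatter URL" of one of these proposals would work together, here is a minimal Python sketch. The patterns and URLs are copied from the 2MASS and Tycho-2 tables above; the `CATALOGS` table and the `build_url` helper are hypothetical names, and percent-encoding the identifier before substituting it for `$1` is an assumption about how the link should be built, not a description of Wikidata's own code.

```python
import re
from urllib.parse import quote

# Patterns and formatter URLs copied from the 2MASS and Tycho-2 tables above.
# The dictionary keys are illustrative labels, not Wikidata identifiers.
CATALOGS = {
    "2MASS": (
        r"J[0-9]{8}[+-][0-9]{7}",
        "https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1",
    ),
    "TYC": (
        r"[0-9]{1,4}-[0-9]{1,4}-1",
        "https://simbad.u-strasbg.fr/simbad/sim-id?Ident=TYC%20$1",
    ),
}


def build_url(catalog, identifier):
    """Return the formatter URL for a valid identifier, or None if it is invalid."""
    pattern, formatter = CATALOGS[catalog]
    # The whole identifier must match the pattern, not just a prefix of it.
    if re.fullmatch(pattern, identifier) is None:
        return None
    # Assumption: the value is percent-encoded, so "+" becomes "%2B".
    return formatter.replace("$1", quote(identifier, safe=""))


print(build_url("2MASS", "J08390909+1935327"))
print(build_url("TYC", "3645-2080-1"))
```

Encoding the plus sign matters for the 2MASS identifiers, since an unencoded "+" in a query string is commonly read as a space.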
Motivation
The specific combination of catalog code (P528) qualified by catalog (P972) is used in 24 million statements, the vast majority of which are for astronomical objects. About 14 million of these statements come from six catalogues, so migrating those statements to use these properties would remove the 14 million triples taken up by the P972 qualifiers. (Another 18 catalogues have more statements than the number of statements for inventory number (P217) with qualifier collection (P195) The Palace Museum (Q2047427), which stood at 127545 as of 6 August 2024.)
(This migration would be similar to the migration that took place after the properties proposed at Wikidata:Property proposal/proper motion components were created. While this page intends to handle only the six largest catalogues, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment.) Mahir256 (talk) 21:56, 6 August 2024 (UTC)
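The migration described above can be sketched in Python under some loud assumptions: statements are modelled as plain dictionaries rather than real Wikibase JSON, the target property IDs (`P_GAIA_DR2`, `P_2MASS`) are hypothetical placeholders because the proposed properties have not been created, and `triple_count` is a toy cost model that counts one triple per value and one per qualifier.

```python
# Hypothetical placeholders: the proposed properties do not have IDs yet.
CATALOG_TO_PROPERTY = {
    "Q51905050": "P_GAIA_DR2",  # Gaia Data Release 2
    "Q1454942": "P_2MASS",      # 2MASS
}


def migrate(statement):
    """Rewrite a P528 statement qualified with P972 to a dedicated property."""
    qualifiers = statement.get("qualifiers", {})
    target = CATALOG_TO_PROPERTY.get(qualifiers.get("P972"))
    if statement["property"] != "P528" or target is None:
        # Not a catalog code, or a catalogue without a dedicated property.
        return statement
    # Drop only the P972 qualifier; any other qualifiers are kept.
    remaining = {k: v for k, v in qualifiers.items() if k != "P972"}
    return {"property": target, "value": statement["value"], "qualifiers": remaining}


def triple_count(statement):
    """Toy cost model: one triple for the value plus one per qualifier."""
    return 1 + len(statement.get("qualifiers", {}))


before = {
    "property": "P528",
    "value": "661284024235415808",
    "qualifiers": {"P972": "Q51905050"},
}
after = migrate(before)
print(after["property"], triple_count(before) - triple_count(after))
```

In this simplified model each migrated statement saves exactly one triple, which is the saving the motivation refers to.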
Discussion
- @Mahir256 Is there any specific reason why we want to reduce number of P528 statements? Ghuron (talk) 00:03, 7 August 2024 (UTC)
- @Ghuron: We have dedicated external identifier properties rather than lumping them all in a single property and qualifying them, just as we have dedicated website account properties rather than always using website account on (P553) qualified with website username or ID (P554). This proposal is intended as a logical parallel of both of those decisions. Mahir256 (talk) 17:18, 12 August 2024 (UTC)
- @Mahir256: Let me rephrase how I understood your rationalization: if p:P528/pq:P972 wd:Q51905050 occurs more than a million times, then it is both a necessary and sufficient condition for creating a new property, since it reduces the number of triplets and thus reduces the risk of Blazegraph crashing. Is that a correct summary? Ghuron (talk) 22:44, 12 August 2024 (UTC)
- @Ghuron: I would not phrase it quite so absolutely, but I do want to see the number of triples reduced and believe this is a way to do it; an extremely high number of identically structured uses of a generic identification property like catalog code (P528) with the same qualifiers suggests that a more specialized identifier property is worth introducing to streamline things, just as has been done multiple times before. Mahir256 (talk) 16:50, 13 August 2024 (UTC)
- See also Wikidata:Property proposal/New General Catalogue ID, and various failed proposals for properties for astronomical catalogues such as Wikidata:Property_proposal/Archive/15#HD.--GZWDer (talk) 12:28, 7 August 2024 (UTC)
- NGC ID is actually an example of a misleading external id. This is a very old catalog, and historians are debating how their IDs correspond to objects in modern catalogs. The most authoritative source for that discussion is this site, which is difficult to assign as "formatter URL". SEDS which is used now, is ok only for ~80% of elements. Ghuron (talk) 18:58, 7 August 2024 (UTC)
- I also propose (since we have mul aliases) adding each of the catalog IDs as mul aliases. This is controversial though.--GZWDer (talk) 12:32, 7 August 2024 (UTC)
- Support Having unique identifiers for astronomical objects and being able to correlate them is important; something hard to do with catalog code. ArthurPSmith (talk) 20:45, 7 August 2024 (UTC)
- Oppose I don't think this proposal will improve anything. If anything it may cause further confusion:
- As stated by Ghuron, is there any reason why we need to reduce the number of P528 statements? In the first place there are millions of Gaia IDs because of the import of the Simbad database (I am NOT against this import btw).
- Also, I wonder why only some catalogues would have their own properties. This will create a weird in-between for catalogues in P528 vs catalogues having their own properties. This makes no sense imo.
- Romuald 2 (talk) 15:31, 8 August 2024 (UTC)
- There is nothing wrong with having separate external id properties for most used identifiers with the correct "url formatter".
But I have 2 major objections:
- I don't see any reason to use https://simbad.u-strasbg.fr/simbad/sim-id?Ident= as a URL. For items that are on SIMBAD, we already have Property:P3083 with the link to SIMBAD. For the rare items that are not on SIMBAD, this link will result in a 404.
- With (1) in mind, it would make sense to link to really useful external stores that are only partially synchronized with SIMBAD (like HyperLEDA or the Gaia Archive). That leads us to questions about the proposed set of properties:
- Why did we choose Gaia DR2, given that its IDs are only temporary and the permanent ones are Gaia DR3?
- Why did we choose Tycho-2, given that it is pretty much 100% imported into SIMBAD?
- Ghuron (talk) 12:52, 9 August 2024 (UTC)
- @Romuald 2: Reducing the number of RDF triples that Wikidata consists of is generally a good thing, as there is a lot of discussion going on about the health of the Query Service and how reducing the number of triples that a single running Blazegraph instance holds is generally a good thing. Also I had noted that there were 18 other catalogs with more entries than the most frequent inventory number source; I only didn't add them to this page because it would have got too long. If these six go through, then I will promptly propose properties for those 18 (and as I stated in the motivation above, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment). Mahir256 (talk) 17:18, 12 August 2024 (UTC)
- @Ghuron: The reason I selected the SIMBAD formatter URL is that the external IDs I tried with that URL all seemed to resolve to the right objects; if there are in fact objects for which this resolution doesn't work, it would be great if you could name some. The caveat "(used more than 1 million times)" in the title of this property proposal page is important; because your imports did not yield more than 1 million Gaia DR3 identifiers, I did not think to propose a property for it here, though I'd gladly support one for Gaia DR3 if you think it would be useful. I don't know who "we" is as regards either Gaia DR2 or Tycho-2; you're the one who mass-imported the objects, so I'm working with the catalog codes I see on those objects. Mahir256 (talk) 17:18, 12 August 2024 (UTC)
- @Ghuron and @GZWDer, would you like to give your opinions? Regards, ZI Jony (Talk) 18:37, 16 September 2024 (UTC)
- I view external identifiers somewhat differently than @Mahir256. In my understanding, a new external identifier is needed when it provides a link to a new, previously unrelated external data source. In the proposed cases, we are getting a connection to the same SIMBAD that we are already connected to via Property:P3083. Personally, the proposed identifiers will not bring me any value (nor will they cause any harm).
I understand the idea that this will reduce the number of triplets, but I think that the measly few million that we are discussing here are a drop in the ocean. Our goal is to upload data to Wikidata, and not try to optimize it in a way that makes life easier for the foundation's engineers. Let them do their job and we will do ours. Ghuron (talk) 19:00, 16 September 2024 (UTC)
Biology
- Please visit Wikidata:WikiProject Taxonomy for more information. To notify participants use {{Ping project|Taxonomy}}
- Please visit Wikidata:WikiProject Biology for more information. To notify participants use {{Ping project|Biology}}
mode of reproduction
Description | ways for living organisms to propagate or produce their offspring |
---|---|
Data type | Item |
Domain | taxon (Q16521) or organisms known by a particular common name (Q55983715) |
Allowed values | item |
Example 1 | mammal (Q7377)→sexual reproduction (Q182353) |
Example 2 | bacteria (Q10876)→cell division (Q188909) |
Example 3 | plant (Q756)→asexual reproduction (Q173432) |
Example 4 | plant (Q756)→sexual reproduction (Q182353) |
Planned use | Would like to enable specifying mode(s) of reproduction for any organism or taxon via this property, preferably with references. |
Expected completeness | always incomplete (Q21873886) |
Motivation
Currently, for the hundreds of thousands of Wikidata records related to taxa or organisms, there is no easy way to specify the mode of reproduction. This proposed property is intended to fill a gap. --Zhenqinli (talk) 04:37, 30 August 2024 (UTC)
Discussion
Notified participants of WikiProject Biology. –Samoasambia ✎ 09:33, 30 August 2024 (UTC)
- This seems to be unnecessarily repetitive. All mammals (all vertebrates, even) reproduce by sexual means; all bacteria by cell division. We don't need to record this for every species of mammal, nor all species of bacteria. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:01, 30 August 2024 (UTC)
- Agreed that there is no need to specify this property for every species. For some, specification at the highest level of taxa would suffice. However, there is a great deal of diversity and variability in the biological world. Even just for vertebrates, the mode of reproduction could be: oviparity (Q212306), viviparity (Q120446), and ovoviviparity (Q192805). In short, this property would provide an option for clarifications when more explicit explanation(s) are needed. --Zhenqinli (talk) 13:28, 30 August 2024 (UTC)
- Oppose Taking into account Pigsonthewing's comment, I think has characteristic (P1552) with any subclass of mode of biological reproduction (Q130077803) is sufficient. --Tinker Bell ★ ♥ 21:15, 1 September 2024 (UTC)
- Thanks for the feedback. Indeed, having has characteristic (P1552) with any subclass of mode of biological reproduction (Q130077803) is better than having no information regarding an organism's mode(s) of reproduction in Wikidata. Currently there are almost 300 taxon-related properties. Many of them could have been implemented in similar ways as suggested. In my personal opinion though, having a roundabout way to state a key feature of an organism is not ideal. --Zhenqinli (talk) 21:46, 1 September 2024 (UTC)
- P.S. The description of has characteristic (P1552) does mention: "Use a more specific property when possible". This property is currently used in more than 200,000 statements, without constraints on subject (organism or taxon) or value (mode of reproduction) as this proposal would prefer. These facts will likely discourage systematic input of useful data and eventual WDQS query of mode of reproduction information using this property in Wikidata. --Zhenqinli (talk) 02:25, 2 September 2024 (UTC)
- Support; Zhenqinli makes a strong case against using has characteristic (P1552).
However, the proposal should be revised to reflect Andy's note – it's standard practice to apply statements only at the highest class (or taxon) at which they are universally true (and sometimes even higher, with qualification like nature of statement (P5102)=often (Q28962312)), a principle that Example 1 (at least) violates.[Edit: fixed 18:17, 12 September 2024 (UTC)] It doesn't seem like this property carries any special encouragement to violate that principle, but if it does, that could be addressed in a property usage note. Swpb (talk) 17:56, 9 September 2024 (UTC)
- Agree that in the first example, Homo sapiens (Q15978631) should probably be replaced by mammal (Q7377). As parent taxon (P171) is a subproperty of subclass of (P279), statements describing organisms at higher taxon ranks do not need to be re-stated at lower ranks of the class, so there will be no redundancy issue. --Zhenqinli (talk) 18:49, 9 September 2024 (UTC)
- I hope anyone who still has reservations about this proposal could help clarify whether there are remaining open issues or alternatives to be discussed further. While diel cycle (P9566) does have more than 284,000 statements for animals, I believe this proposed property for all living organisms should require far fewer statements, since mode of reproduction is typically better defined biologically and more commonly stated at higher taxon ranks than diel cycle (diel cycle could also be modified due to domestication). --Zhenqinli (talk) 18:09, 12 September 2024 (UTC)
- Weak support Infoboxes on Wikipedia might want to include the mode of reproduction, and thus it's good to have it in its own property that's separate from has characteristic (P1552).
- Currently, the problem is that the examples of the property are bad. It's not true that all plants have both sexual and asexual reproduction and thus it would be bad to make the statement for plants. ChristianKl ❪✉❫ 12:35, 1 October 2024 (UTC)
- Such a statement for plants could be qualified by nature of statement (P5102)=often (Q28962312), but I agree that an unqualified always-true statement would make a better example. Anything wrong with examples 1 and 2? Swpb (talk) 14:01, 1 October 2024 (UTC)
- Thanks for supporting the proposal. I, too, would like to see better examples. But I also think more examples could be introduced, improved or updated later. I believe the mode of reproduction is well-documented scientifically and systematically. Once introduced to Wikidata, this property can have comparable or better data quality and utilization compared with similar taxon-related properties such as is pollinated by (P1703), seed dispersal (P3741), longest observed lifespan (P4214), and diel cycle (P9566). --Zhenqinli (talk) 07:00, 18 October 2024 (UTC)
- This is a relatively complex field. Human (and mouse) parthenogenesis has been achieved, on an embryonic level. Gynogenesis is present in vertebrates, as is hybridogenesis. I imagine the viral reproduction we are familiar with is called lysogenesis, but I also imagine that there's more to viruses than they are letting on, and certainly there can be gene mixing (indeed there can be inter-species and even inter-kingdom gene mixing). So I suppose we would want a list with custom allowed. Would we also allow the use of this property on things that reproduce but are not normally considered living? All the best: Rich Farmbrough, 13:28, 19 November 2024 (UTC).
- Thanks for the informative comments. Indeed, this is an important and broad concept that is currently missing among existing Wikidata properties. Personally, I hope to see a new simple property to serve as a common denominator applicable to all taxa and organisms. The complexity of reproduction in the biological world could still be captured within combinations of value items and qualifiers, on an as-needed basis. For example, the fact that sheep can be reproduced via cloning can be expressed in the following statement: sheep (Q7368)→cloning (Q120877), with qualifiers observed in (P6531)=cloned mammal (Q57813806) and model item (P5869)=Dolly the Sheep (Q171433). --Zhenqinli (talk) 00:20, 20 November 2024 (UTC)
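The principle raised in the discussion above, that a statement made at a higher taxon does not need to be repeated at lower ranks, could be sketched in Python as a walk up the parent-taxon chain. The item IDs are taken from the discussion, but the data is a toy: the chain from Homo sapiens to mammal is collapsed to a single step, and the dictionaries and the `inherited_modes` helper are hypothetical names, not an existing Wikidata API.

```python
# Toy data: item IDs come from the discussion above, but the parent chain
# is collapsed to one step and is not the real taxonomy.
PARENT_TAXON = {
    "Q15978631": "Q7377",  # Homo sapiens -> mammal (intermediate ranks omitted)
}

MODE_OF_REPRODUCTION = {
    "Q7377": {"Q182353"},   # mammal -> sexual reproduction
    "Q10876": {"Q188909"},  # bacteria -> cell division
}


def inherited_modes(taxon):
    """Return the modes stated on the taxon or on its nearest ancestor."""
    seen = set()
    while taxon is not None and taxon not in seen:
        seen.add(taxon)  # guard against accidental cycles in the data
        if taxon in MODE_OF_REPRODUCTION:
            return MODE_OF_REPRODUCTION[taxon]
        taxon = PARENT_TAXON.get(taxon)
    return set()


print(inherited_modes("Q15978631"))
```

A lookup of this kind is what would let a single statement on mammal (Q7377) serve every species below it, while a more specific statement on a lower taxon would take precedence because it is found first.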
Duocet Wiki of Plants ID
Description | ID of a topic in Duocet Wiki of Plants |
---|---|
Data type | External identifier |
Example 1 | Orchidaceae (Q25308) => 兰科 |
Example 2 | Fendlera (Q144481) => 岩爪梅属 |
Example 3 | Lun Kai Dai (Q18984067) => 戴伦凯 |
Example 4 | Asteraceae (L1365348) => Asteraceae (note: this website has dedicated entries for taxonomic names) |
Number of IDs in source | 18757 |
Formatter URL | https://duocet.ibiodiversity.net/index.php?title=$1 |
Motivation
An online encyclopedia of plants.--GZWDer (talk) 18:06, 28 September 2024 (UTC)
Discussion
Notified participants of WikiProject Botany Regards, ZI Jony (Talk) 07:25, 18 October 2024 (UTC)
- For identifiers, I expect to see a Wikidata item of the database. It is not listed here. Bluerasberry (talk) 13:08, 18 October 2024 (UTC)
homonymous taxon
Description | taxon item of which the taxon name is an exact homonym |
---|---|
Represents | homonym (Q902085) |
Data type | Item |
Example 1 | Nectria (Q18616886)←homonym of→Nectria (Q2708290) |
Example 2 | Lactarius (Q1900906)←homonym of→Lactarius (Q748899) |
Example 3 | Leptosomus (Q2623737)←homonym of→Leptosomus (Q67015908) |
Type constraint – instance of | taxon (Q16521) |
Wikidata project | WikiProject Taxonomy (Q8503033) |
Property constraints
- subject type constraint (Q21503250)
- value-type constraint (Q21510865)
- item-requires-statement constraint (Q21503247)
- symmetric constraint (Q21510862)
- allowed qualifiers constraint (Q21510851)
- property scope constraint (Q53869507)
I am not sure about the following one. Until now I have used it to make it clear that two homonyms are different and should not be confused, but since the new property will have a symmetric constraint, I don't know if it is necessary. I would add that constraint, but if someone has arguments against it, fine:
Christian Ferrer (talk) 15:55, 3 December 2024 (UTC)
Motivation
This proposal comes after this discussion [1] (see also this query), so at first it is planned to use it in order to harmonize and regulate the current ways of modeling taxonomic homonymy in Wikidata. In addition, I will add it every time I come across homonyms in the course of my contributions, which happens regularly. Note that this new property should have a symmetric constraint, and maybe an item-requires-statement constraint with different from (P1889). Christian Ferrer (talk) 21:03, 2 December 2024 (UTC)
- Also look at Wikispecies:Category:Taxon disambiguation pages to get an idea of the potential use of such a property. Christian Ferrer (talk) 21:24, 2 December 2024 (UTC)
Discussion
- Support. Here are 142 statements where the proposed property would aid in the depopulation of the to-be-deprecated of (P642). I would suggest, rather than an item-requires-statement constraint, that the proposed property simply be a child property of different from (P1889). Swpb (talk) 21:20, 2 December 2024 (UTC)
- Support I support this but wish to draw attention to an issue that may also need to be addressed here: cross-code pseudo-homonyms. Although no two names within a code can be identical, you can have, for example, a plant and an animal with identical names. There are many of these in existence. In that situation both names are valid, and hence there can be no suppression of one over the other. However, they should still be flagged for disambiguation purposes. I would also suggest that any declaration of a name as a junior homonym needs to be supported by the primary literature, as there are names where the senior homonym is not the accepted name (for example Testudo terrestris, where the senior homonym is actually the oldest name for the Mata mata but its junior homonym is considered valid). Hence the exact understanding of how the nomenclature was sorted needs to be considered. Cheers Scott Thomson (Faendalimas) talk 09:56, 3 December 2024 (UTC)
- This property is indeed for disambiguation purposes, and is not at all intended to highlight the potential validity of those names nor to say whether the names depend on the same code. That being said, one can very well consider using a qualifier such as object of statement has role (P3831) with this new property for that kind of purpose, e.g. Nectria (Q18616886) homonym of Nectria (Q2708290) object of statement has role (P3831) hemihomonym (Q36033662), all symmetrically. Christian Ferrer (talk) 11:10, 3 December 2024 (UTC)
- The label should be "homonymous taxon" since the current one is too ambiguous.--GZWDer (talk) 11:38, 3 December 2024 (UTC)
- Well, this makes sense; it will prevent the property from being used wrongly. Thank you. Christian Ferrer (talk) 14:10, 3 December 2024 (UTC)
- Update: as per the suggestion above, the label has been changed from "homonym of" to "homonymous taxon". If the new label is not satisfactory we can also use something like "homonym (biology)". Christian Ferrer (talk) 14:10, 3 December 2024 (UTC)
- Update: I added at the top a list of property constraints that I think could be useful. Christian Ferrer (talk) 15:55, 3 December 2024 (UTC)
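To make the symmetric constraint mentioned in this proposal concrete, here is a small Python sketch that lists the inverse statements still missing from a set of "homonymous taxon" statements. Statements are modelled as (subject, value) pairs of item IDs taken from the examples; the `missing_inverses` helper is a hypothetical name and this is only an illustration of the idea, not how Wikidata's constraint checker is implemented.

```python
def missing_inverses(pairs):
    """Return the (subject, value) pairs needed to make the relation symmetric."""
    existing = set(pairs)
    # For every A -> B, the constraint expects B -> A to exist as well.
    return sorted((v, s) for s, v in existing if (v, s) not in existing)


statements = [
    ("Q18616886", "Q2708290"),  # Nectria -> Nectria
    ("Q2708290", "Q18616886"),  # inverse already present
    ("Q1900906", "Q748899"),    # Lactarius -> Lactarius, inverse missing
]
print(missing_inverses(statements))
```

A report of this kind is what a symmetric constraint would surface, so that each homonym pair ends up linked in both directions.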
taxon known by this common name
Description | taxon item to which this common name refers |
---|---|
Represents | organisms known by a particular common name (Q55983715) |
Data type | Item |
Example 1 | bird of prey (Q48428)→Accipitriformes (Q21736) |
Example 2 | bat (Q115690288)→Chiroptera (Q28425) |
Example 3 | mouse (Q2751034)→Muridae (Q25916) |
Planned use | replacement of of (P642) stored values in the results of this query |
Wikidata project | WikiProject Taxonomy (Q8503033) |
Property constraints
- subject type constraint (Q21503250)
- value-type constraint (Q21510865)
- property scope constraint (Q53869507)
Motivation
Proposal following this discussion. The new property will make it possible to store the values currently stored with of (P642) within all the instances of organisms known by a particular common name (Q55983715). Christian Ferrer (talk) 13:33, 4 December 2024 (UTC)
Discussion
- Support Swpb (talk) 15:09, 4 December 2024 (UTC)
Biochemistry/molecular biology
- Please visit Wikidata:WikiProject Molecular biology for more information. To notify participants use {{Ping project|Molecular biology}}
Chemistry
- Please visit Wikidata:WikiProject Chemistry for more information. To notify participants use {{Ping project|Chemistry}}
molecular formula
Description | Description of chemical compound giving element symbols and counts |
---|---|
Represents | molecular formula (Q188009) |
Data type | Item |
Domain | type of chemical entity (Q113145171) group of stereoisomers (Q59199015) |
Allowed values | molecular formula (Q188009) |
Example 1 | 2-hydroxy-5-octanoylbenzoic acid (Q209407)→C₁₅H₂₀O₄ (Q129998552) |
Example 2 | abscisic acid (Q332211)→C₁₅H₂₀O₄ (Q129998552) |
Example 3 | Santonic acid (Q7420590)→C₁₅H₂₀O₄ (Q129998552) |
Example 4 | silver bicarbonate (Q27260276)→CHAgO₃ (Q130044611) |
Expected completeness | always incomplete (Q21873886) |
See also | chemical formula (P274) |
Wikidata project | WikiProject Chemistry (Q8487234) |
Notified participants of WikiProject Chemistry
Motivation
This proposal addresses the need for improved data structure and maintenance within Wikidata’s chemical compound data. Currently, the Wikidata:WikiProject Chemistry manages approximately 1 million chemical items, with many of them linked to chemical formula (P274) and mass (P2067). The main issues are:
Redundancy in Data: With about 300,000 unique chemical formula strings in use, redundancy is a significant problem. Some strings are associated with over 1,000 items, which complicates data management (see https://w.wiki/B2ax).
Efficiency and Maintenance: Transitioning from string-based formulas to item-based ones will simplify maintenance, reduce redundancy, and optimize query performance, especially for SPARQL queries involving formulas or masses.
Data Optimization: Moving mass (P2067) statements to the newly created formula items will reduce the number of triples and make data management more efficient. Additionally, this change will facilitate the use of different units for masses and allow for better structured data.
Improved Modeling: Switching to item-based formulas could eliminate the need for overly complex has part(s) (P527) statements on chemicals, allowing cleaner, more precise data models (e.g., identifying all chemical formulas containing more than five oxygen atoms).
This change is expected to bring numerous benefits, including reduced redundancy, improved query efficiency, and better data maintenance. The potential downside of increased label editing can be managed, and the overall gain for Wikidata’s chemical data justifies this proposal. If approved, I am prepared to create the necessary items and migrate existing data.
Any further input to refine this proposal is more than welcome!
P.S.: I have no strong opinions if current chemical formula (P274) should be deleted or used on the new items as "Chemical Formula String" – The preceding unsigned comment was added by AdrianoRutz (talk • contribs) at 15:00, August 28, 2024 (UTC).
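The kind of query mentioned under "Improved Modeling" (finding formulas with more than five oxygen atoms) can be illustrated with a short Python sketch that splits a formula string into element counts. This is only a sketch under stated assumptions: it handles flat formulas such as C₁₅H₂₀O₄, with either subscript or plain digits, and does not handle parentheses, charges, or isotopes; the helper names `atom_counts` and `has_more_oxygen_than` are hypothetical.

```python
import re

# Map Unicode subscript digits (as used in the item labels above) to ASCII.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# An element symbol is one capital letter plus an optional lowercase letter,
# followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)([0-9]*)")


def atom_counts(formula):
    """Return a dict of element symbol -> atom count for a flat formula."""
    counts = {}
    for symbol, number in TOKEN.findall(formula.translate(SUBSCRIPTS)):
        # A missing count means a single atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def has_more_oxygen_than(formula, threshold):
    """True if the formula contains more than `threshold` oxygen atoms."""
    return atom_counts(formula).get("O", 0) > threshold


print(atom_counts("C₁₅H₂₀O₄"))
print(has_more_oxygen_than("C₁₅H₂₀O₄", 5))
```

With item-based formulas, counts like these could be stored once on the formula item instead of being recomputed from strings for every compound that shares the formula.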
Discussion
- Support sounds great! Egon Willighagen (talk) 15:25, 28 August 2024 (UTC)
- Comment Last night on the boat between Finland and Sweden I thought of another aspect where this would help model the chemistry in Wikidata better. If chemical formulas are items (and thanks to GZWDer for showing that various Wikipedias decided it was useful too), then they can also subclass each other. We can have an isotope-agnostic chemical formula (the common case) and subclasses for chemical formulas with isotopes. As such it does much more than being something technical (e.g. just about scalability): it actually improves how we talk about the chemistry. Egon Willighagen (talk) 07:07, 29 August 2024 (UTC)
- Some comments:
- I will oppose "Additionally, this change will facilitate the use of different units for masses and allow for better structured data." - For consistency and machine-readability we should stick to one unit. I instead propose Wikidata:Property proposal/formula weight.
- Many wikis have pages like C15H20O4 (Q1250089). Some wikis treat them as disambiguation pages, some as set indices; we need to discuss how to handle such existing items. GZWDer (talk) 21:10, 28 August 2024 (UTC)
- I looked at the English Wikipedia sitelinked page, and that actually looks exactly like a page about a chemical formula. To be honest, this actually sounds like an argument in favor of this proposal and that C15H20O4 (Q1250089) should be of type chemical formula (Q83147). The same goes for the French WP page: neither says it is a disambiguation page, and both are far more like a category of things with the same property. Just like this proposal, no? Egon Willighagen (talk) 06:58, 29 August 2024 (UTC)
- I was only partially able to follow your mind here. In your proposal, you mention this property if created, thus you would support it? I believe the discussion about mass (P2067) (and units) or other properties is an interesting one this proposal would allow to better discuss/implement, and what I mentioned about these or what is currently on the example item are just ideas, if this new property allows for these things to also improve, even better! AdrianoRutz (talk) 08:51, 30 August 2024 (UTC)
- Weak oppose I cannot question arguments raised here about efficiency, but I don't see this as a proper way forward. This proposal completely fails to take into account the fact that for a given chemical entity there may be many – equally correct – chemical formulae (simple example in Q27260276#P274). Moving chemical formulae to another item will not help at all with the most important purpose for which WD exists – using this data. I would see the new property as being created only to assist with specific activities – but not to replace existing properties – and with appropriate disclaimers in the name and constraints that it is a strictly technical property only. Wostr (talk) 22:21, 28 August 2024 (UTC)
- I think this proposal has no problems with alternative formula notations, e.g. like CHAgO₃ (Q130044611). Or? Egon Willighagen (talk) 06:51, 29 August 2024 (UTC)
- CHAgO₃ and AgHCO₃ are not the same chemical formula. Just as e.g. XeF4O and XeOF4 which would require two different items for the same compound. In fact, for some compounds several new items would need to be created. For some chemical species we would have formulae that have different number of atoms of elements: C30H40F2N8O9, C15H17FN4O3·1,5H2O and C30H34F2N8O6·3H2O are correct formulae for the same compound, but I don't see a way for this to be reflected correctly by the current proposal. Everything looks fine if you consider only simple organic compounds and their formulae in Hill notation, but it's not that simple especially if we consider some inorganic compounds which are not molecules. Wostr (talk) 12:34, 29 August 2024 (UTC)
- Thank you for this important point! I removed the single value constraint, thus allowing for what you mention. AdrianoRutz (talk) 08:47, 30 August 2024 (UTC)
- Good point about non-molecular substances. I think the chemical concept we are trying to capture is that of isomerism: chemical entities are isomers when they have the same molecular formula (Q188009) or (non-structural) formula unit (Q1437643), enabling one molecule/ion/unit of the first chemical entity to be rearranged into one molecule/ion/unit of the second chemical entity by moving atoms/bonds around.
- For example, the ionic compounds with structural formulas [CrCl(H₂O)₅]Cl₂•H₂O and [Cr(H₂O)₆]Cl₃ are (hydration) isomers, which we can recognise by assigning them the same formula H₁₂Cl₃CrO₆. This shows that all species in the crystal lattice of a compound should be combined together into a single entity when determining the formula. In the example you give above, the correct formula would be C₃₀H₄₀F₂N₈O₉, derived from combining together 2C₁₅H₁₇FN₄O₃•3H₂O, the smallest formula unit with integer multiples of all species.
- Likewise, the molecular substance CO(NH₂)₂ and ionic compound NH₄OCN are considered isomers, which we can recognise by assigning them the same formula CH₄N₂O. This is the molecular formula of urea and the formula unit of ammonium cyanate, showing how molecular and non-molecular substances can be isomeric.
- For ions, fulminate(1−) (Q27110286) (with structural formula CNO-) and cyanate anion (Q55503523) (with structural formula OCN-) are isomers, which we can recognise by assigning them the same formula CNO-.
- Clathrates are similar to coordination compounds. E.g. methane clathrate (Q389036) has structural formula 4CH₄•23H₂O, yielding the formula C₄H₆₂O₂₃. Likewise, the endohedral fullerene CH₄@C₆₀ should have formula C₆₁H₄.
- Compounds should not usually map to multiple formulas: if C links to two different formulas, one the same as A (from reference 1) and one the same as B (from reference 2), this implies C is isomeric with A, and C is isomeric with B, but A is not isomeric with B. This only makes sense if 1 and 2 disagree as to what the correct formula of C ought to be.
- When references disagree, we may need to support multiple formulas. Historically, w:en:copper monosulfide was thought to have structure [Cu2+][S2-], corresponding to the formula CuS. It has now been assigned the structure [Cu+]₃[S2-][S₂-], which would correspond to Cu₃S₃. However, PubChem still has the old formula. We might want to update Wikidata to the new formula while also keeping the PubChem-referenced formula (with a note that it's not the correct formula).
- Non-stoichiometric compounds, alloys, and mixtures of indeterminate composition are more complicated to support. E.g. pyrrhotite (Q421944) has formula Fe1-xS (x = 0 to 0.125). Rather than trying to support formula units with atom counts that are algebraic expressions (e.g. 1 - x), I think it would be easier if we could list the formulas of the endpoints: Fe₇S₈ and FeS. Similarly, superconducting yttrium barium copper oxide (Q414015) has formula YBa2Cu3O7−x (x = 0 to 0.65), with endpoint formulas YBa2Cu3O6.35 (i.e. Y20Ba40Cu60O127) and YBa2Cu3O7. I think it's hard to come up with a perfect solution though. InChI (P234) has similar issues for non-stoichiometric compounds: https://doi.org/10.1186/s13321-015-0068-4#Sec45.
- Preimage (talk) 17:47, 31 August 2024 (UTC)
- Support I also see more benefits than downsides. Wostr, I am not sure I understand how this would be a problem, even for entities which could be described using different MF sequences of atoms like Q27260276#P274. Indeed, the has part(s) (P527) and quantity (P1114) statements of the MF entity (see C₁₅H₂₀O₄ (Q129998552)) would allow us to efficiently retrieve such compounds represented in different MF notation systems. What exactly would be the inconvenience in this particular case? GrndStt (talk) 06:22, 29 August 2024 (UTC)
- Support, conditional on change of representation to molecular formula (Q188009). As noted in w:en:chemical formula#Types, chemical formula (Q83147) has four separate meanings: empirical formula (e.g. formaldehyde and glucose both have empirical formula CH₂O), molecular formula (e.g. urea and ammonium cyanate both have molecular formula CH₄N₂O in Hill notation, indicating they are isomers), structural formula (a graphical representation of the structure, not so relevant here), and condensed (or semi-structural) formula (e.g. urea has condensed formula CO(NH₂)₂ whereas ammonium cyanate has condensed formula [NH₄][OCN]). Molecular formulas "indicate the simple numbers of each type of atom in a molecule, with no information on structure", which is what we need for mass calculations. They also avoid the issue raised by Wostr regarding non-uniqueness of chemical formulas (e.g. NH₄NO₃ and H₄N₂O₃ are both valid formulas for ammonium nitrate), as each chemical should have a single canonical molecular formula in Hill notation (with the exception of rare cases where there is disagreement regarding structure, e.g. w:en:copper monosulfide). One last potential issue: molecular formulas are often defined as not including isotopes, e.g. PubChem lists both deuterated chloroform and chloroform as having molecular formula CHCl₃. Egon Willighagen's suggestion to have a subclass of [molecular] formulas with isotopic information would resolve this issue though, I think. Preimage (talk) 12:22, 29 August 2024 (UTC)
- Just revised the naming to change to molecular formula (Q188009), as suggested. 👍🏼 AdrianoRutz (talk) 07:16, 24 September 2024 (UTC)
- Oppose A chemical formula is an abstract entity and not one that has a mass.
- It's worth noting that Unicode can't capture all chemical formulas, and the Mathematical expression datatype could express more. ChristianKl ❪✉❫ 16:29, 25 September 2024 (UTC)
- You're wrong about that. Each chemical formula has a defined number of atoms of a defined number of elements. Although each element has multiple isotopes, for every element with stable isotopes there is a standard mass associated with it which is the atomic weight which will be found with a typical sample. So the molecular weight of a particular chemical formula very much can be expressed. David Newton (talk) 09:58, 27 September 2024 (UTC)
- Currently, in Wikidata a chemical formula is a notation. Notations don't have inherent mass. The NCI description of a chemical formula is "representation of a substance using symbols for its constituent elements". It's not the object that it's describing. While the object that a formula describes can have mass, the formula itself doesn't. It's a Document in NCI's ontology. In PROCO it's a quality, and also not something that has mass. Instances of material entity (Q53617407) have mass, and molecular formula (Q188009) isn't one. ChristianKl ❪✉❫ 12:47, 9 October 2024 (UTC)
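The convention Preimage describes above (add up every species in the formula unit, then write a single formula) can be sketched as follows. This is only an illustration under stated assumptions: `combine_units` and `hill_formula` are hypothetical helper names, the atom counts are typed in by hand from the structural formulas quoted in the thread, and the ordering follows the usual Hill rule (C first, then H, then the rest alphabetically; purely alphabetical when there is no carbon).

```python
from collections import Counter


def combine_units(units):
    """Sum atom counts over (multiplier, {element: count}) pairs."""
    total = Counter()
    for multiplier, atoms in units:
        for element, count in atoms.items():
            total[element] += multiplier * count
    return total


def hill_formula(counts):
    """Render atom counts as a Hill-ordered formula string."""
    elements = [e for e, n in counts.items() if n > 0]
    if "C" in elements:
        rest = sorted(e for e in elements if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in elements else []) + rest
    else:
        order = sorted(elements)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


water = {"H": 2, "O": 1}

# 2 C15H17FN4O3 . 3 H2O, the hydrate raised by Wostr
hydrate = combine_units([(2, {"C": 15, "H": 17, "F": 1, "N": 4, "O": 3}), (3, water)])
# 4 CH4 . 23 H2O, methane clathrate
clathrate = combine_units([(4, {"C": 1, "H": 4}), (23, water)])
# [CrCl(H2O)5]Cl2 . H2O and [Cr(H2O)6]Cl3, the hydration isomers
isomer_a = combine_units([(1, {"Cr": 1, "Cl": 1, "H": 10, "O": 5}), (2, {"Cl": 1}), (1, water)])
isomer_b = combine_units([(1, {"Cr": 1, "H": 12, "O": 6}), (3, {"Cl": 1})])

print(hill_formula(hydrate))
print(hill_formula(clathrate))
print(hill_formula(isomer_a), isomer_a == isomer_b)
```

The two chromium compounds end up with identical atom counts, which is the point of the isomerism argument. Under strict Hill ordering the shared string comes out as Cl3CrH12O6 rather than the H₁₂Cl₃CrO₆ ordering written in the thread; the counts are the same either way.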
Medicine
- Please visit Wikidata:WikiProject Medicine for more information. To notify participants use {{Ping project|Medicine}}
Mineralogy
- Please visit Wikidata:WikiProject Mineralogy for more information. To notify participants use {{Ping project|Mineralogy}}
Computer science
- Please visit Wikidata:WikiProject Informatics for more information. To notify participants use {{Ping project|Informatics}}
Geology
Please visit Wikidata:WikiProject Geology for more information.
Geography
Linguistics
Please visit Wikidata:WikiProject Linguistics for more information. To notify participants use {{Ping project|Linguistics}}
Mathematics
Please visit Wikidata:WikiProject Mathematics for more information. To notify participants use {{Ping project|Mathematics}}
Material
Please visit Wikidata:WikiProject Materials for more information. To notify participants use {{Ping project|Materials}}
Meteorology
Glaciology
All
reference illustration
Description | an illustration of this subject to provide a detailed reference for its appearance. It should be ideally tied to the primary literature on the item. |
---|---|
Represents | scientific illustration (Q63385677) |
Data type | Commons media file |
Example 1 | Gallotia simonyi (Q268724)→File:Gallotia_simonyi-female.norarte.jpg |
Example 2 | optic chiasm (Q1071710)→Gray773.png |
Example 3 | spur-thighed tortoise (Q504549)→File:Testudo_graeca_-_1700-1880_-_Print_-_Iconographia_Zoologica_-_Special_Collections_University_of_Amsterdam_-_UBA01_IZ11600011.tif |
Example 4 | goniometer (Q1126161)→File:Britannica_Goniometer_Contact.png |
Planned use | Manually fill technical/scientific illustrations for subjects of interest, especially biological taxa |
See also | image (P18), schematic (P5555) |
Motivation
While we do have a property for schematics (schematic (P5555)), in several cases there are informative scientific illustrations that are not schematic, but rather detailed representations of a particular subject.
For some cases where a photo is available (e.g. spur-thighed tortoise (Q504549)), the use of image (P18) for the illustration would be disputable, justifying the need for a more specific property.
TiagoLubiana (talk) 13:40, 15 November 2024 (UTC)
Edit: besides the content in the template, it would be good to add "scientific illustration" and "technical illustration" as aliases. TiagoLubiana (talk) 17:29, 21 November 2024 (UTC)
Discussion
- Support - Jvcavv (talk) 21:31, 20 November 2024 (UTC)
Tend to Oppose for the following reason: simply put it into the image property. --Prototyperspective (talk) 22:46, 20 November 2024 (UTC)
- I see the point there, but I argue there are many cases where a dedicated property for a particular facet exists for image (P18), like aerial view (P8592) or image of backside (P7417).
- The illustrations add extra visual information, thus complementing P18. Overloading P18 with multiple kinds of image is usually a bad idea, because it limits reuse, say, on infoboxes or dedicated queries. TiagoLubiana (talk) 12:24, 21 November 2024 (UTC)
- You could argue the same a thousand times for each and every subtype of image.
- Having a separate property for this type of image is also a problem
- it's not showing up in queries for image
- lots of files have these illustrations in image instead (and most users will not learn there's a separate prop for this either btw)
- Also when people don't see a file in scientific illustration they may assume no such exists/is set when it's in the Images tab
- like videos these then won't be suggested for Wikipedia articles via structured tasks
- I don't see any concrete, tangible need or benefit for splitting it out like that. The infobox can already show scientific illustrations by making that the primary image, and when there is a photo of an animal, enabling the user to easily see the illustration is not needed and is more cluttering than anything else.
- Prototyperspective (talk) 16:43, 21 November 2024 (UTC)
Notified participants of WikiProject Biodiversity TiagoLubiana (talk) 12:25, 21 November 2024 (UTC)
Weak Support - Do not see the harm in it and it may be beneficial for those seeking this subset of images. Loopy30 (talk) 14:37, 21 November 2024 (UTC)
Support - I think this would be a welcome enrichment of Wikidata. However, I find "scientific" a bit too vague, which leaves room for different interpretations. How about calling this property "Reference illustration" instead? --Andrawaag (talk) 16:48, 21 November 2024 (UTC)
- Good point! "Reference illustration" makes sense. I am okay with the name being "reference illustration" and having "technical illustration" and "scientific illustration" as aliases. Would that work? TiagoLubiana (talk) 16:52, 21 November 2024 (UTC)
Support and support of rename and aliases as just above. Ainali (talk) 17:18, 21 November 2024 (UTC)
Support I also support the rename as it makes the property of more general application. - Ambrosia10 (talk) 19:31, 21 November 2024 (UTC)
- I am somewhat neutral as it is currently defined. If you're going to make a reference illustration of a scientific object such as a species, for example, then to be classified as a reference illustration there should be a high degree of certainty that it is what it is intended to be, not some random photo from Commons that purports to portray the species. So it should be an illustration/photo that is tied to the primary literature on the item. Cheers Scott Thomson (Faendalimas) talk 19:39, 21 November 2024 (UTC)
- This is a good point. I think both cases are valid, but it is, indeed, not 100% clear. Would it improve the situation if I added to the description saying that the image should be ideally an "illustration that is tied to the primary literature on the item"? TiagoLubiana (talk) 00:09, 22 November 2024 (UTC)
- yes that should help and I would support it then. Cheers Scott Thomson (Faendalimas) talk 02:51, 22 November 2024 (UTC)
- I am a bit on the fence about such a long name, leaning towards not. Yes, there is the issue of uncertainty with the images on Commons, but don't you think that the reference and qualifier features on statements on Wikidata provide the means to say something about the provenance of the selected image or illustration? --Andrawaag (talk) 07:21, 22 November 2024 (UTC)
- To add: we might also build on the possibility of using the statements that come with SDC on Commons. My preference would be to stick with the short name and then use those features in both Commons and Wikidata to say something about the provenance. --Andrawaag (talk) 07:23, 22 November 2024 (UTC)
Support this will cover a large number of historical illustrations in Commons. Lmalena (talk) 21:07, 21 November 2024 (UTC)
Support I approve of "reference illustration"; very often in entomology we have a photo of an actual insect, and a detailed idealised line drawing from the paper describing that species. They're both important forms of illustration but are not interchangeable, and being able to link both to a single Wikidata taxon would be useful. If we can have nighttime view, aerial view etc. in items for cities, we could have a couple of image properties for species. Giantflightlessbirds (talk) 08:58, 24 November 2024 (UTC)
Comment I would prefer an expansion of schematic (P5555) which could have this title as alias, but also "historical image", "architectural drawing" (which would work well with image of design plans (P3311)), "symbolic representation", etc. A schematic for the tortoise could be this: File:Catalogue of the fossil Reptilia and Amphibia in the British Museum (Natural history) By Richard Lydekker (1888) (20571159432).jpg. I also think for species in particular, some sort of image property qualifier for common region- or season-specific variants could be proposed. Jane023 (talk) 08:22, 22 November 2024 (UTC)
- @TiagoLubiana (and other supporters) can you comment on this suggestion? — Martin (MSGJ · talk) 12:51, 26 November 2024 (UTC)
- Thanks @Jane023 and @MSGJ. This is an interesting suggestion. The example is good for turtle, I think it stands as a good schematic on its own.
- I think it may still be useful to keep things in separate properties for the moment, though. There are some cases where line drawings are very educational e.g. on neuron --> schematic but it is not quite a technical/scientific illustration so to speak.
- Oh, and for qualifiers for common region/season specific variants, that is a great idea! There are many facets of taxa to cover too (e.g. eggs for birds, flowers / fruits / seeds for plants). It is a great topic for some modelling study/design. TiagoLubiana (talk) 13:48, 26 November 2024 (UTC)
Support "reference illustration" seems better to me than the initial proposal "scientific illustration" -- Lucas.Belo (talk) 18:07, 24 November 2024 (UTC)