Wikidata:Property proposal/Pango lineage code
PANGO lineage code
Originally proposed at Wikidata:Property proposal/Natural science
Description | identifier of a lineage of the SARS-CoV-2 virus |
---|---|
Represents | PANGO lineage (Q107054569) |
Data type | External identifier |
Domain | variant of SARS-CoV-2 (Q104450895) |
Allowed values | [A-Z]{1,2}(\.\d{1,3}){0,3} |
Example 1 | SARS-CoV-2 Alpha variant (Q104376647) → B.1.1.7 |
Example 2 | SARS-CoV-2 Beta variant (Q104400171) → B.1.351 |
Example 3 | SARS-CoV-2 Theta variant (Q106171157) → P.3 |
Example 4 | Lineage B.1.1.207 (Q106171219) → B.1.1.207 |
Source | [1], en:Variants_of_SARS-CoV-2 |
Planned use | Add the correct Pango code to all SARS-CoV-2 strains represented on Wikidata; add the strains in the Pango lineage database that are not yet on Wikidata. |
Number of IDs in source | on the order of hundreds (increasing) |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | https://cov-lineages.org/lineages/lineage_$1.html |
Applicable "stated in"-value | PANGOLIN (Q105185169) |
Distinct-values constraint | yes |
Wikidata project | WikiProject COVID-19 (Q87748614) |
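As a rough illustration of how the "Allowed values" pattern and the "Formatter URL" in the table above fit together, here is a minimal Python sketch. The regular expression and the URL template are copied from the table; the function names `is_allowed` and `lineage_url` are hypothetical, and the sketch assumes the pattern must match the whole value.

```python
import re

# Pattern copied from the "Allowed values" row of the proposal.
ALLOWED = re.compile(r"[A-Z]{1,2}(\.\d{1,3}){0,3}")

# Template copied from the "Formatter URL" row; "$1" stands for the code.
FORMATTER_URL = "https://cov-lineages.org/lineages/lineage_$1.html"


def is_allowed(code: str) -> bool:
    """Return True if the whole string matches the proposed pattern.

    Assumption: the constraint is applied to the entire value,
    not to a substring.
    """
    return ALLOWED.fullmatch(code) is not None


def lineage_url(code: str) -> str:
    """Build the external URL for a code (hypothetical helper)."""
    if not is_allowed(code):
        raise ValueError(f"not a valid lineage code under the proposed pattern: {code!r}")
    return FORMATTER_URL.replace("$1", code)


if __name__ == "__main__":
    # The four example values from the proposal table.
    for code in ["B.1.1.7", "B.1.351", "P.3", "B.1.1.207"]:
        print(code, is_allowed(code), lineage_url(code))
```

Under these assumptions, all four example values pass. A name with four numeric levels, such as the unaliased form "B.1.1.28.1" mentioned in the discussion, would be rejected, because the pattern allows at most three numeric levels; only its alias "P.1" would pass.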
Motivation
SARS-CoV-2 strain information has been getting a lot of attention recently, and there are many Wikipedia pages and Wikidata items about the strains. This property helps to organize that information by linking the strains in the Wikimedia ecosystem to a major external identifier resource. TiagoLubiana (talk) 20:19, 17 March 2021 (UTC)
Discussion
- Notified participants of WikiProject COVID-19 TiagoLubiana (talk) 20:19, 17 March 2021 (UTC)
- WikiProject Taxonomy has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. TiagoLubiana (talk) 20:19, 17 March 2021 (UTC)
- WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. TiagoLubiana (talk) 20:19, 17 March 2021 (UTC)
Oppose: Advertising. --Succu (talk) 20:42, 17 March 2021 (UTC)
- Comment:@Succu: False claim, not advertising. It is one naming system that is being widely adopted, even on Wikipedia articles across projects. It is part of the sum of human knowledge. TiagoLubiana (talk) 12:30, 19 March 2021 (UTC)
- @Succu: can you substantiate that? --Egon Willighagen (talk) 12:40, 19 March 2021 (UTC)
- @Egon Willighagen: Maybe "Advertising" is a little bit harsh, but we are talking about a dynamic nomenclature proposal used in a single database and not reused in others. WD would be the first.
- Above is stated "Add the strains in the Pango lineage database that are not on Wikidata yet". Hm, it's a dynamic nomenclature proposal. The lineage "P.1" is an alias for the lineage "B.1.1.28.1". There are a lot of withdrawn lineages, e.g. "B.1.5". So how should this dynamic be handled here, TiagoLubiana? --Succu (talk) 19:35, 19 March 2021 (UTC)
- @Succu: Thank you very much for the clarification. Good point, the dynamic nomenclature will be tricky to handle. We will have to adapt the statements (ranks and values) to follow the development of the nomenclature and fit the standard as well as possible. As the lineages have aliases, it will not be possible to have a cardinality restriction, though. I agree that it is not the best standard possible, but the current situation is that different standards are kept as labels in the items, which is suboptimal (and makes it hard even to tell PANGO aliases from other, unrelated names, for example). TiagoLubiana (talk) 00:17, 20 March 2021 (UTC)
- I doubt this "magic" works. So how to relate them to the more commonly used term variant of concern (Q105758262)? --Succu (talk) 20:38, 23 March 2021 (UTC)
- @Succu: Each variant of concern on Wikidata is an instance of a "variant of concern" and "variant of concern" is just a subclass of "virus strain". How do you propose PANGO codes are organized on Wikidata, if not via a specific property? Is your position that they should not be modelled at all? TiagoLubiana (talk) 23:42, 24 March 2021 (UTC)
Support The codes are abundantly used in news media. --Egon Willighagen (talk) 12:40, 19 March 2021 (UTC)
Support Yes, standard IDs for strains are essential and NCBI don't systematically provide taxon IDs below species level. This strain naming system appears to have become the de facto agreed ID system. Bmeldal (talk) 15:49, 19 March 2021 (UTC)
- @Bmeldal: A "lineage (Q6553369)" is not a strain (Q855769). --Succu (talk) 21:45, 23 March 2021 (UTC)
Support as per above --Andrawaag (talk) 16:06, 19 March 2021 (UTC)
Comment @Succu: Do you still oppose this property? Do you have any alternative ways to model this on Wikidata? TiagoLubiana (talk) 16:04, 17 April 2021 (UTC)
- Yes. I doubt most of the „Pango lineages“ are notable enough to get an item. Only a handful of them are widely known. variant of SARS-CoV-2 (Q104450895) does not really fit. A subclass of lineage (Q6553369)? --Succu (talk) 20:01, 17 April 2021 (UTC)
- @Succu: Pango lineage codes are widely used for strains, though. See https://en.wikipedia.org/wiki/Variants_of_SARS-CoV-2#Overview. The numbers are small, that is true, but given that they are killing thousands of people every day, it is important to capture that info on Wikidata. TiagoLubiana (talk) 00:46, 19 April 2021 (UTC)
- A strain and a lineage are different things. The information e.g. about B.1.617 (en:Variants of SARS-CoV-2#Lineage_B.1.617) is very limited. BTW: A coding system does not kill people. A disease does. --Succu (talk) 20:26, 19 April 2021 (UTC)
- Support I have merged my duplicate proposal at Wikidata:Property proposal/PANGO lineage into this proposal. --Dhx1 (talk) 12:17, 2 June 2021 (UTC)