Wikidata:Property proposal/Pango lineage code

PANGO lineage code

Originally proposed at Wikidata:Property proposal/Natural science

Description: identifier of a lineage of the SARS-CoV-2 virus
Represents: PANGO lineage (Q107054569)
Data type: External identifier
Domain: variant of SARS-CoV-2 (Q104450895)
Allowed values: [A-Z]{1,2}(\.\d{1,3}){0,3}
Example 1: SARS-CoV-2 Alpha variant (Q104376647) → B.1.1.7
Example 2: SARS-CoV-2 Beta variant (Q104400171) → B.1.351
Example 3: SARS-CoV-2 Theta variant (Q106171157) → P.3
Example 4: Lineage B.1.1.207 (Q106171219) → B.1.1.207
Source: [1], en:Variants_of_SARS-CoV-2
Planned use: Add the correct Pango code for all SARS-CoV-2 strains represented on Wikidata; add the strains in the Pango lineage database that are not on Wikidata yet.
Number of IDs in source: on the order of hundreds (increasing)
Expected completeness: eventually complete (Q21873974)
Formatter URL: https://cov-lineages.org/lineages/lineage_$1.html
Applicable "stated in" value: PANGOLIN (Q105185169)
Distinct-values constraint: yes
Wikidata project: WikiProject COVID-19 (Q87748614)
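The allowed-values pattern and formatter URL above can be checked with a short sketch. It assumes the pattern must match the entire identifier (which is how Wikidata applies format constraints) and that `$1` is simply replaced by the identifier; the helper names `is_valid_pango_code` and `formatter_url` are hypothetical and not part of any Wikidata tool.

```python
import re
from urllib.parse import quote

# Allowed-values pattern from the proposal above.
PANGO_PATTERN = re.compile(r"[A-Z]{1,2}(\.\d{1,3}){0,3}")

# Formatter URL from the proposal; "$1" is the placeholder for the identifier.
FORMATTER_URL = "https://cov-lineages.org/lineages/lineage_$1.html"


def is_valid_pango_code(code: str) -> bool:
    """Hypothetical helper: True if the WHOLE string matches the pattern."""
    return PANGO_PATTERN.fullmatch(code) is not None


def formatter_url(code: str) -> str:
    """Hypothetical helper: substitute the URL-escaped identifier for "$1"."""
    return FORMATTER_URL.replace("$1", quote(code))


if __name__ == "__main__":
    # The four examples from the proposal, plus the unaliased form of P.1.
    for code in ["B.1.1.7", "B.1.351", "P.3", "B.1.1.207", "B.1.1.28.1"]:
        print(code, is_valid_pango_code(code), formatter_url(code))
```

All four proposal examples pass, while B.1.1.28.1 (four numeric levels, the full form of P.1 mentioned in the discussion) is rejected by the `{0,3}` limit, so only the short aliased code would be stored under this pattern.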

Motivation

SARS-CoV-2 strain information has been getting a lot of attention recently, and there are many Wikipedia pages and Wikidata items about the strains. This property would help organize that information by linking the strains on Wikimedia projects to a major external identifier resource. TiagoLubiana (talk) 20:19, 17 March 2021 (UTC)

Discussion

  Oppose: Advertising. --Succu (talk) 20:42, 17 March 2021 (UTC)

  •   Comment: @Succu: False claim, not advertising. It is one naming system that is being widely adopted, even in Wikipedia articles across projects. It is part of the sum of human knowledge. TiagoLubiana (talk) 12:30, 19 March 2021 (UTC)
  • @Succu: can you substantiate that? --Egon Willighagen (talk) 12:40, 19 March 2021 (UTC)
    • @Egon Willighagen: Maybe "Advertising" is a little bit harsh, but we are talking about a dynamic nomenclature proposal used in a single database and not reused in others. WD would be the first.
    • Above is stated "Add the strains in the Pango lineage database that are not on Wikidata yet". Hm, it's a dynamic nomenclature proposal. The lineage "P.1" is an alias for the lineage "B.1.1.28.1". There are a lot of withdrawn lineages, e.g. "B.1.5". So how should this dynamic nature be handled here, TiagoLubiana? --Succu (talk) 19:35, 19 March 2021 (UTC)
      • @Succu: Thank you very much for the clarification. Good point, the dynamic nomenclature will be tricky to handle. We will have to adapt the statements (ranks and values) as the nomenclature develops so that they fit the standard as well as possible. As lineages have aliases, it will not be possible to have a cardinality restriction, though. I agree that it is not the best possible standard, but currently different standards are kept as labels in the items, which is suboptimal (and makes it hard even to tell PANGO aliases from other, unrelated names). TiagoLubiana (talk) 00:17, 20 March 2021 (UTC)
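The alias problem raised in this thread can be illustrated with a small sketch: one lineage can be written both as a short aliased code and as its expanded form, so the same item may legitimately carry more than one code. The alias table here is hypothetical and contains only the prefix implied by the "P.1 is an alias for B.1.1.28.1" example; the real alias list is maintained by the Pango project and changes over time. The helpers `expand_alias` and `all_codes` are likewise hypothetical.

```python
# Hypothetical alias table: only the entry implied by "P.1 is an alias for B.1.1.28.1".
# The real Pango alias list is larger and changes as the nomenclature evolves.
ALIASES = {"P": "B.1.1.28"}


def expand_alias(code: str) -> str:
    """Hypothetical helper: replace a leading alias prefix with the lineage it stands for."""
    prefix, sep, rest = code.partition(".")
    if prefix in ALIASES:
        full = ALIASES[prefix]
        return f"{full}.{rest}" if sep else full
    return code  # not an alias: return unchanged


def all_codes(code: str) -> set[str]:
    """Hypothetical helper: every code under which the same lineage could be recorded."""
    return {code, expand_alias(code)}


if __name__ == "__main__":
    print(expand_alias("P.1"))  # B.1.1.28.1
    print(expand_alias("B.1.1.7"))  # B.1.1.7
    print(sorted(all_codes("P.1")))  # ['B.1.1.28.1', 'P.1']
```

Because a lineage such as P.1 maps to two strings, a single-value constraint would not fit, which matches the point about cardinality restrictions made above; withdrawn lineages such as B.1.5 would additionally need ranks or qualifiers, which this sketch does not model.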

  Support The codes are abundantly used in news media. --Egon Willighagen (talk) 12:40, 19 March 2021 (UTC)

  Support Yes, standard IDs for strains are essential and NCBI don't systematically provide taxon IDs below species level. This strain naming system appears to have become the de facto agreed ID system. Bmeldal (talk) 15:49, 19 March 2021 (UTC)

@Bmeldal: A "lineage (Q6553369)" is not a strain (Q855769). --Succu (talk) 21:45, 23 March 2021 (UTC)

  Support as per above --Andrawaag (talk) 16:06, 19 March 2021 (UTC)

  Comment @Succu: Do you still oppose this property? Do you have any alternative ways to model this on Wikidata? TiagoLubiana (talk) 16:04, 17 April 2021 (UTC)

Yes. I doubt most of the "Pango lineages" are notable enough to get an item. Only a handful of them are widely known. variant of SARS-CoV-2 (Q104450895) does not really fit. A subclass of lineage (Q6553369)? --Succu (talk) 20:01, 17 April 2021 (UTC)
@Succu: Pango lineage codes are widely used for strains, though. See https://en.wikipedia.org/wiki/Variants_of_SARS-CoV-2#Overview. They are few in number, that is true, but given that these variants are killing thousands of people every day, it is important to capture that information on Wikidata. TiagoLubiana (talk) 00:46, 19 April 2021 (UTC)
A strain and a lineage are different things. The information e.g. about B.1.617 (en:Variants of SARS-CoV-2#Lineage_B.1.617) is very limited. BTW: A coding system does not kill people. A disease does. --Succu (talk) 20:26, 19 April 2021 (UTC)
@TiagoLubiana, Succu, Egon Willighagen, Bmeldal, Andrawaag, Dhx1: Done: PANGO lineage code (P9632) Pamputt (talk) 07:51, 5 June 2021 (UTC)