Wikidata talk:WikiProject Tabular data

Phabricator

COVID-19 tabular data

I am planning to import a COVID-19 case dataset (https://github.com/stccenter/COVID-19-Data) as tabular data on Commons, and would like some input here. The dataset focuses on COVID-19 data for subnational divisions across the world. The data are collected, manually checked and curated by the NSF Spatiotemporal Innovation Center, which is jointly operated by George Mason, Harvard and UCSB. (COI disclosure: I am collaborating with them on this project to collect COVID-19 data, though I am not a member of the institution.) Currently there are data from about 4,500 administrative divisions. I have uploaded a few samples on Commons (for example [1], [2]), as well as a few summary tables (for example [3], [4]). Any thoughts? --Stevenliuyi (talk) 21:44, 7 September 2020 (UTC)

TiagoLubiana 01:35, 16 March 2020
Daniel Mietchen 01:42, 16 March 2020 (UTC)
Jodi.a.schneider 02:45, 16 March 2020 (UTC)
Chchowmein 02:45, 16 March 2020 (UTC)
Dhx1 03:38, 16 March 2020 (UTC)
Konrad Foerstner 06:02, 16 March 2020 (UTC)
Netha Hussain 06:19, 16 March 2020 (UTC)
Bodhisattwa 06:56, 16 March 2020 (UTC)
Neo-Jay 07:04, 16 March 2020 (UTC)
John Samuel 07:31, 16 March 2020 (UTC)
KlaudiuMihaila 07:53, 16 March 2020 (UTC)
Salgo60 09:11, 16 March 2020 (UTC)
Andrawaag 10:12, 16 March 2020 (UTC)
Whidou 10:16, 16 March 2020 (UTC)
Blue Rasberry 15:07, 16 March 2020 (UTC)
TJMSmith 16:15, 16 March 2020 (UTC)
Egon Willighagen 16:49, 16 March 2020 (UTC)
Nehaoua 20:32, 16 March 2020 (UTC)
Andy Mabbett (UTC)
Peter Murray-Rust 00:00, 17 March 2020 (UTC)
Kasyap 02:45, 17 March 2020 (UTC)
Denny 16:21, 17 March 2020 (UTC)
Kwj2772 16:56, 17 March 2020 (UTC)
Joalpe 22:47, 17 March 2020 (UTC)
Finn Årup Nielsen (fnielsen) 10:59, 18 March 2020 (UTC)
Skim 11:45, 18 March 2020 (UTC)
SCIdude 15:15, 18 March 2020 (UTC)
Evolution and evolvability 01:23, 20 March 2020 (UTC)
Susanna Ånäs (Susannaanas) 07:05, 20 March 2020 (UTC)
Mlemusrojas 15:30, 20 March 2020 (UTC)
Yupik 20:23, 20 March 2020 (UTC)
Csisc 23:05, 20 March 2020 (UTC)
OAnick 10:26, 21 March 2020 (UTC)
Gnoeee 12:28, 21 March 2020 (UTC)
Jjkoehorst 14:27, 21 March 2020 (UTC)
So9q 08:58, 22 March 2020 (UTC)
Nandana 14:58, 23 March 2020 (UTC)
Addshore 15:56, 23 March 2020 (UTC)
Librarian lena 18:19, 24 March 2020 (UTC)
Jelabra 19:19, 24 March 2020 (UTC)
AlexanderPico 23:34, 27 March 2020 (UTC)
Higa4 02:51, 29 March 2020 (UTC)
JoranL 19:56, 29 March 2020 (UTC)
Alejgh 11:04, 1 April 2020 (UTC)
Will (Wiki Ed) 17:36, 1 April 2020 (UTC)
Ranjithsiji 04:47, 2 April 2020 (UTC)
AntoineLogean 07:35, 2 April 2020 (UTC)
Hannolans 17:22, 2 April 2020 (UTC)
Farmbrough 21:15, 3 April 2020 (UTC)
Ecritures 21:26, 3 April 2020 (UTC)

  Notified participants of WikiProject COVID-19 --Stevenliuyi (talk) 21:56, 7 September 2020 (UTC)

@Stevenliuyi: Looks good. I see that you are using cumulative data. I mean that the November 2020 case count means cases that happened during that month plus all previous cases, not just cases that occurred during the month. We should either have a way to document that, or just decide that all data should work the same way. Though using cumulative data may sound more natural for COVID, I am not sure it would be the best solution overall, especially for longer time series. --Zolo (talk) 12:07, 8 March 2021 (UTC)
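
To illustrate the difference between the two conventions, here is a minimal Python sketch that converts a cumulative series into per-period counts and back. The function name and the numbers are hypothetical, and the sketch assumes the series starts from zero cases before the first period.

```python
from itertools import accumulate


def cumulative_to_increments(cumulative):
    """Convert cumulative totals into per-period counts.

    Assumes there were zero cases before the first period.
    """
    increments = []
    previous = 0
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


# Hypothetical cumulative case counts at the end of three consecutive months.
cumulative_cases = [100, 250, 400]

# Cases that occurred during each month only.
monthly_cases = cumulative_to_increments(cumulative_cases)

# Summing the per-month counts restores the cumulative series.
restored = list(accumulate(monthly_cases))
```

Since either form can be derived from the other, the main point is that each table should state which convention it uses.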

Community Wishlist Survey 2021

  Notified participants of WikiProject Tabular data

I've added the following for the wishlist survey, that you may want to consider:

Jheald (talk) 19:46, 17 November 2020 (UTC)

Structuring and documenting tabular data

If we want data to be easily usable, we should try to have predictable, documented data structures.

For tabular case data (P8204), we already have at least two different structures:

The second file has more columns, which sounds ok. But there are also columns that have the same meaning in both files, but with different names. That does not sound good.

My proposal would be:

  • Recommend starting column names with the Wikidata property number when possible
  • Document how the data should be structured in the relevant Wikidata property. We could start with creating a "suggested fields" property-type property that would provide guidelines about how to use tabular-data properties. For instance:

<tabular case data (P8204)> <Recommended fields> point in time (P585), number of cases (P1603), etc.
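
As a rough sketch of how such a recommendation could be checked, the Python snippet below looks at the field list of a table schema and reports which recommended properties are missing. The "Recommended fields" mapping, the column names (such as "P585_date") and the helper functions are all hypothetical; the schema layout with "name", "type" and localizable "title" is assumed to follow the Commons tabular data format.

```python
import re

# Hypothetical "recommended fields" for tabular case data (P8204),
# using the two properties named in the proposal above.
RECOMMENDED_FIELDS = {"P8204": ["P585", "P1603"]}


def property_id(field_name):
    """Return the leading property ID of a column name, or None if there is none."""
    match = re.match(r"P\d+", field_name)
    return match.group(0) if match else None


def missing_recommended_fields(schema, table_property):
    """List recommended properties that no column name in the schema starts with."""
    present = {property_id(field["name"]) for field in schema["fields"]}
    return [p for p in RECOMMENDED_FIELDS[table_property] if p not in present]


# Hypothetical schema: one column follows the naming proposal, one does not.
schema = {
    "fields": [
        {"name": "P585_date", "type": "string", "title": {"en": "Date"}},
        {"name": "deaths", "type": "number", "title": {"en": "Deaths"}},
    ]
}

missing = missing_recommended_fields(schema, "P8204")
```

Here the check would flag number of cases (P1603) as missing, because no column name starts with that property ID.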

Of course the data themselves are on Commons, but there is no really relevant place for this kind of discussion on Commons, and Wikidata is the place to go for data-related issues.


Pinging users who contributed the data: user:Stevenliuyi and user:Mxn. --Zolo (talk) 10:07, 7 March 2021 (UTC)

@Zolo: Tabular data fields have localizable titles in addition to names, so it would be pretty reasonable to standardize on QIDs or property IDs. Not sure which is better though. Several Wikipedia templates and modules would need to be updated to recognize the new field names, and so would the scripts that keep these tables up to date. (At the moment, it looks like I'm the only one still actively updating COVID case tables via a somewhat scripted process...) – Minh Nguyễn 💬 11:36, 14 August 2021 (UTC)
That said, I would caution against treating most of this tabular case data as something to be aggregated across geographies, which seems to be a motivation behind ideas about querying tabular data. Every source has different methodology, especially from one geography to another. In particular, there are different practices around retroactively updating past data, which is the main argument in favor of maintaining historical case data as tabular data instead of as Wikidata items. – Minh Nguyễn 💬 04:31, 15 August 2021 (UTC)

Scope and Deletion/Undeletion discussion at Commons

  Notified participants of WikiProject Tabular data

Following the deletion of a number of .tab data files on Commons, participants in this group may like to know that over the last couple of weeks there has been a discussion thread at Commons:Village Pump about the deletions, which has now been followed by the opening of an undeletion request for discussion. Input from here may be useful. Jheald (talk) 20:17, 31 May 2023 (UTC)
