Wikidata:WikiProject COVID-19/Registry

We need a registry for COVID-19 knowledge -- data, models, forecasts + analyses, derived data + outputs.

Motivation

  1. The data in COVID templates (roughly 300 on en:wp alone) should be in Wikidata.
  2. Major datasets in use by researchers, visualizers, and modelers should each have a Wikidata entry.
    cf. CORD-19 and other collections of source material, from articles to topical databases.
  3. Derivative datasets are flourishing, and it is helpful to see which underlying data they depend on (family trees of related sources), so that the least-remixed source can be cited where appropriate.

Of course these overlap: the data used widely in COVID templates on WP is among the data most widely referenced by modellers and other disease researchers, and by journalists reporting on the spread of the pandemic. (However, many researchers prefer a more consistent or explicit account of how data sources are selected for a time series, and may not use the WP tables themselves. They then duplicate some of the selection and source-cleaning, and correct their copy over time for known or reported errors.)

The general case is: a time series of ~daily data for {a COVID-related property, counted for a geographic region}. What's the right way to do this in Wikidata, both now and in an ideal future?
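To make the general case concrete, here is a minimal sketch of one possible shape for such a series. It is not a proposal for the actual Wikidata model: the `Observation` fields, the region and property labels, the source names, and all the values are made up for illustration. On Wikidata itself, one option would be one statement per observation with a point-in-time qualifier and a reference, but that mapping is an assumption here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Observation:
    """One data point: a property counted for a region on a given day."""
    region: str   # hypothetical region label
    prop: str     # hypothetical property label, e.g. "confirmed cases"
    day: date
    value: int
    source: str   # where the number came from (hypothetical names below)


def to_series(observations, region, prop):
    """Return {day: value} for one (region, property) pair, ordered by day."""
    points = {
        o.day: o.value
        for o in observations
        if o.region == region and o.prop == prop
    }
    return dict(sorted(points.items()))


def missing_days(series):
    """List the days without a value between the first and last observation."""
    if not series:
        return []
    first, last = min(series), max(series)
    span = (last - first).days
    all_days = (first + timedelta(days=i) for i in range(span + 1))
    return [d for d in all_days if d not in series]


# Invented example values, only to show the shape of the data.
observations = [
    Observation("Region A", "confirmed cases", date(2020, 4, 7), 100, "source-x"),
    Observation("Region A", "confirmed cases", date(2020, 4, 8), 120, "source-x"),
    Observation("Region A", "confirmed cases", date(2020, 4, 10), 150, "source-y"),
    Observation("Region B", "confirmed cases", date(2020, 4, 8), 30, "source-x"),
]

series = to_series(observations, "Region A", "confirmed cases")
gaps = missing_days(series)
```

Keeping the source on every observation is deliberate: it lets a reader see when a series switches sources partway through, which is the kind of selection detail researchers ask for.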

We need this for COVID, but more generally this sort of registry for fast-changing data will be needed for any future epidemic. Sj (talk) 22:28, 10 April 2020 (UTC)

Description

A COVID-19 Registry would highlight and map available data about COVID research, spread, and associated needs.

Components:

  • Data hub -- a custom collection paralleling DBpedia's datahub
  • Registry of sources -- a catalog of sources for primary-source data, secondary-source refinement, reconciliation, entity resolution, and context
  • Registry of layers -- a catalog of refined + enriched feature layers for related features, combining the above with OpenRefine and other tools
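As a rough illustration of how a registry of sources could support the "family tree" idea from the Motivation section, the sketch below records for each entry which other entries it is derived from, then walks back to the least-remixed sources. The registry layout, the `primary_sources` helper, and every dataset name are hypothetical; a real registry would presumably live in Wikidata items rather than a Python dict.

```python
def primary_sources(registry, name):
    """Return the entries with no upstream sources that `name` depends on.

    `registry` maps each dataset name to the list of datasets it is
    derived from; an empty list marks a primary source.
    """
    roots, seen, stack = set(), set(), [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        parents = registry[current]
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)
    return roots


# Hypothetical registry: names and relationships are invented.
registry = {
    "health-ministry-reports": [],
    "news-scrape": [],
    "aggregated-dashboard": ["health-ministry-reports", "news-scrape"],
    "model-input-v2": ["aggregated-dashboard"],
}

least_remixed = primary_sources(registry, "model-input-v2")
```

Here `least_remixed` holds the two primary entries that the model input ultimately rests on, which are the ones to cite where appropriate.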

Requirements:

  • To stay consistent with free licensing and free access to data, which are fundamental principles of the WMF projects, any git repositories used should be hosted on services that follow these principles. This explicitly excludes GitHub (owned by w:Microsoft, which has a strong track record of opposing free licensing of software, although it has evolved to some degree recently) and GitLab's own server: both GitHub and GitLab block readers in many countries ("Denies Service to Crimea, Cuba, Iran, North Korea, Sudan, Syria"; see w:Comparison of source-code-hosting facilities). We cannot allow WMF wikis to promote a stranglehold on data. The GitLab software itself is free-licensed and can be installed anywhere, e.g. on WMF servers.

Related projects