User:John Cummings/Archive/Publishing open data

This page provides an overview of open data for data publishers, including information on database rights, best practices and adding data to Wikidata.

Open data definition

  • Open data is data that anyone can freely and easily access, use and share.
  • Linked open data is open data that is linked to other datasets.

Resources

About Wikidata

Wikidata is a multilingual free knowledge base about the world that can be read and edited by humans and machines alike. The data on Wikidata is added by a community of volunteers both manually and by using software, much like other Wikimedia projects including Wikipedia. Wikidata has millions of items, each representing things like a person, a place, an artwork, an abstract concept, or some other type of entity.

Resources

Database rights

Databases generally do not fall under copyright; instead they fall under sui generis database rights. This is a property right, comparable to but distinct from copyright, that exists to recognise the investment made in compiling a database, even when this does not involve the "creative" aspect that is reflected by copyright. Sui generis rights differ between jurisdictions. Individual facts cannot be protected using database rights, but it is unclear how much of a database can be copied before database rights are infringed. For data to be considered open data, users must be allowed to freely access, use and share it.

Resources

Benefits of open data

Common benefits of open data include:

  • Transparency
  • Releasing social and commercial value
  • Participation and engagement

Open Knowledge International identifies open data as being a contributor to:

  • Meeting global challenges
  • Enhancing research, science, and culture
  • Strengthening citizens, democratic accountability and governance
  • Holding business accountable to consumers

Resources

Open data publishing best practices

Tim Berners-Lee, the inventor of the Web, has suggested a 5-star deployment scheme for open data.

Number of stars | Description | Properties | Example format
★ | make your data available on the Web (whatever format) under an open license | Open license | PDF
★★ | make it available as structured data (e.g., Excel instead of image scan of a table) | Open license; Machine readable | XLS
★★★ | make it available in a non-proprietary open format (e.g., CSV instead of Excel) | Open license; Machine readable; Open format | CSV
★★★★ | use URIs to denote things, so that people can point at your stuff | Open license; Machine readable; Open format; Data has URIs | RDF
★★★★★ | link your data to other data to provide context | Open license; Machine readable; Open format; Data has URIs; Linked data | LOD
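
The cumulative nature of the scheme can be sketched in a few lines of Python. This is only an illustration, not part of the scheme itself: the property names and the star_rating function are hypothetical labels chosen for this example. It shows that each star requires every property of the stars below it.

```python
# Properties of the 5-star scheme, in the order each star adds them.
# The identifiers are hypothetical names used only in this sketch.
STAR_PROPERTIES = [
    "open_license",      # 1 star
    "machine_readable",  # 2 stars
    "open_format",       # 3 stars
    "has_uris",          # 4 stars
    "linked_data",       # 5 stars
]


def star_rating(properties):
    """Return the number of stars earned by a set of property names.

    Stars are cumulative, so counting stops at the first missing property.
    """
    stars = 0
    for name in STAR_PROPERTIES:
        if name not in properties:
            break
        stars += 1
    return stars


# A CSV file published under an open license
print(star_rating({"open_license", "machine_readable", "open_format"}))
```

Note that a dataset without an open license earns no stars under this sketch, however well structured it is, because the open license is the first requirement.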


Open data producers can use Wikidata IDs as identifiers in their datasets to make their data 5-star linked open data. Importing data into Wikidata also makes it 5-star data. The more stars the data has, the easier it will be to import into Wikidata; in practice, the minimum required is 2 stars.
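
As a rough sketch of using Wikidata IDs as identifiers, the Python example below adds a Wikidata entity URI to each row of a CSV file. The column names wikidata_id and wikidata_uri, the helper functions and the sample rows are assumptions made for this illustration; a real dataset would use its own column names and would need its IDs matched to the correct Wikidata items first.

```python
import csv
import io
import re

# Wikidata item URIs are formed by appending the item ID to this prefix.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def wikidata_uri(qid):
    """Build the entity URI for a Wikidata item ID such as 'Q42'."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return WIKIDATA_ENTITY_PREFIX + qid


def add_wikidata_uris(csv_text, id_column="wikidata_id"):
    """Return the CSV text with an extra 'wikidata_uri' column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fieldnames = list(reader.fieldnames) + ["wikidata_uri"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["wikidata_uri"] = wikidata_uri(row[id_column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-row dataset with a Wikidata ID for each record
sample = "name,wikidata_id\nDouglas Adams,Q42\nLondon,Q84\n"
print(add_wikidata_uris(sample))
```

Keeping the plain ID column alongside the full URI lets reusers choose whichever form suits their tools.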

Resources

Certification and badging

To help people to know that data is available in a format that makes it easy to reuse, several organisations are working on certification and badging:

Organisations working on Open Data

There are several organisations working on open data including:

Open data platforms

There are many options for publishing open data that can be categorised in two ways:

  1. Self-publishing on your own website
  2. Publishing to external data platforms

Software for hosting data

  • CKAN: a data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.

External platforms

Organisations producing open data

Adding data to Wikidata

Once data has been published it can be added to Wikidata by: