Wikidata:Linked open data workflow

There are many considerations when contributing data, media or other assets to Wikimedia projects. This chart attempts to list the tools and scripts in the linked data workflow, which is especially useful to GLAM institutions. This is based on the data and media partnerships chart on Outreach Wiki. Public shortcut to this page: w.wiki/87CA

  • PREPARE and normalize source data and media
  • RECONCILE with Wikimedia modeling and coverage
  • INGEST data, media, and free content
  • ANALYZE, correct, and enrich
  • RE-USE content intra-wiki and externally
  • REPORT and measure impact

Notes (PREPARE)

Try finding a similar project or collection set on Wikidata or Commons to see how it has been done in the past.


Ask questions at the main Project chat on Wikidata or the Village Pump on Commons.


Those donating content should ensure assets are released under a free license or that copyright has expired. An easy way to prepare images for Commons is to upload collections to Flickr and set the proper license for the images (CC0, CC-BY, CC-BY-SA). Do not use non-commercial (NC) licenses.

Wikidata uses a CC0 license: any contributed data must be dedicated as CC0 or public domain.

Notes (RECONCILE)

For Wikidata, a "crosswalk database" is usually needed to map terms from the source data set (a CSV file or records from an API) to Wikidata terms. This can be built with OpenRefine, a custom mapping in Google Spreadsheets, or both.
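As a rough illustration of what a crosswalk does, the Python sketch below maps a source column to Wikidata item IDs and collects terms that still need a mapping. The column names, sample rows, and the `apply_crosswalk` helper are hypothetical, and the QIDs are examples that should be verified on Wikidata before use; in practice OpenRefine handles this step interactively.

```python
import csv
import io

# Hypothetical crosswalk: source column -> {source term: Wikidata QID}.
# The QIDs are illustrative; verify each one on Wikidata before relying on it.
CROSSWALK = {
    "object_type": {
        "painting": "Q3305213",
        "sculpture": "Q860861",
    },
}


def apply_crosswalk(rows, crosswalk):
    """Add a '<column>_qid' field to each row and collect unmapped terms."""
    mapped, unmapped = [], set()
    for row in rows:
        out = dict(row)
        for column, mapping in crosswalk.items():
            # Normalize case and whitespace so "Painting" matches "painting".
            term = row.get(column, "").strip().lower()
            qid = mapping.get(term)
            if qid is None:
                unmapped.add((column, term))
            out[column + "_qid"] = qid
        mapped.append(out)
    return mapped, unmapped


# Made-up sample data standing in for an institution's CSV export.
SAMPLE_CSV = """accession_number,title,object_type
12.34,Still Life with Flowers,Painting
56.78,Untitled,Sculpture
90.12,Untitled,Tapestry
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
mapped_rows, unmapped_terms = apply_crosswalk(rows, CROSSWALK)
```

Terms that end up in `unmapped_terms` are the ones to reconcile by hand or to add to the crosswalk before ingesting.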


Check to see what entities and properties already exist in Wikidata and what categories and templates are used for Commons.

Find out how items are modeled in Wikidata, in order to set the proper "instance of" (P31) and "subclass of" (P279) properties for new items.

Need case study here.

Notes (INGEST)

Try uploading small test batches before doing large data sets.

When ingesting collection metadata and media files to Wikidata and Commons, you need a way to make sure they are correlated. Inventory or accession number (P217) is often used for objects, with a qualifier for collection (P195) and institution. A Commons best practice for filenames is to incorporate the institution/source, inventory number and possibly a descriptive title.


Putting the inventory number (P217) in a Wikidata item description may help distinguish items with very similar names (e.g. Untitled, or Still Life with Flowers).
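The filename and description conventions above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the " - " separator, the wording of the description, and the set of replaced characters are choices made for this example, not an official Commons or Wikidata rule.

```python
import re


def commons_filename(institution, inventory_number, title=None, extension="jpg"):
    """Build a filename from institution, inventory number and optional title."""
    parts = [institution, inventory_number, title]
    name = " - ".join(p.strip() for p in parts if p and p.strip())
    # Replace characters that are not allowed or are awkward in wiki page titles.
    name = re.sub(r"[#<>\[\]|{}/:]", "-", name)
    # Collapse runs of whitespace left over from messy source titles.
    name = re.sub(r"\s+", " ", name).strip()
    return f"{name}.{extension}"


def item_description(object_type, institution, inventory_number):
    """Build a description that includes the inventory number (P217 value)."""
    return f"{object_type} in the {institution}, accession number {inventory_number}"


# "Example Museum" and the numbers below are made-up sample values.
filename = commons_filename("Example Museum", "12.34", "Still Life with Flowers")
description = item_description("painting", "Example Museum", "12.34")
```

Because both strings carry the same inventory number, a file on Commons and an item on Wikidata can be matched later even when several works share a title.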

Need case study here.

Notes (ANALYZE)

Depending on the success of the import and uploading process, you may need to deal with duplicates or conflicts with other editors.


For Commons, you may need to move files around or add additional categories.


You may want to create custom maintenance queries to keep track of your contributed content over time, or to keep adding properties and metadata beyond the initial contribution.
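One possible shape for such a maintenance query is sketched below: a small Python helper that builds a SPARQL query listing items in a collection (P195) that lack a given property. The helper name and its parameters are hypothetical, and the collection QID in the usage line is only an example value; the resulting text is meant to be pasted into the Wikidata Query Service.

```python
import re

QID_PATTERN = re.compile(r"^Q[1-9]\d*$")
PID_PATTERN = re.compile(r"^P[1-9]\d*$")


def missing_property_query(collection_qid, property_id, limit=100):
    """Return SPARQL for items in a collection that lack `property_id`."""
    # Validate identifiers so arbitrary text cannot be spliced into the query.
    if not QID_PATTERN.match(collection_qid):
        raise ValueError(f"not a valid item ID: {collection_qid!r}")
    if not PID_PATTERN.match(property_id):
        raise ValueError(f"not a valid property ID: {property_id!r}")
    return (
        "SELECT ?item ?itemLabel ?inventory WHERE {\n"
        f"  ?item wdt:P195 wd:{collection_qid} .\n"
        "  OPTIONAL { ?item wdt:P217 ?inventory . }\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:{property_id} [] . }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Example: items in a collection that have no image (P18) yet.
# Q160236 is used as a sample collection ID; substitute your own institution.
query = missing_property_query("Q160236", "P18")
```

Saving a few of these queries per property gives a simple to-do list that shrinks as the contributed items are enriched.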

Notes (RE-USE)

Scripts and templates can generate on-wiki content such as tables and infoboxes from Wikidata.


If identifiers/authority control records have been imported, then Wikidata can act as a crosswalk database to explore mappings among many different databases.

Notes (REPORT)

Show the impact of contributions by tracking metrics such as file usage or impressions over time. For partnerships, this can help validate the work being done or encourage more collaboration.


Some tools are on-demand (GLAMorgan) and some are regularly reported based on Commons categories of GLAM institutions.


You may also want to use the Wikidata Query Service to build custom reports on coverage or usage.

Tools and scripts (PREPARE)

Convert PDF files to structured data. If your source data is not well formatted, try a scraping tool like Tabula (https://tabula.technology/).

Tools and scripts (RECONCILE)

OpenRefine video tutorial from GLAM WIKI 2018 conference with Sandra Fauconnier

Tools and scripts (INGEST)

Pattypan is the most popular way to do batch media uploads, using a spreadsheet to gather the metadata needed for each file. Find the correct template for artwork, photos or other media and identify the proper categories for organizing files.


QuickStatements takes spreadsheet-generated CSV directives to create Wikidata statements.
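To give a feel for what such directives look like, the sketch below generates commands in QuickStatements' tab-separated "V1" text format (an alternative to the CSV input) for one new item. The helper name, the sample values, and the choice of statements are assumptions made for this example; check the output against the QuickStatements help page and run a small test batch first.

```python
def quickstatements_create(label, description, instance_of,
                           inventory_number, collection_qid, lang="en"):
    """Return V1 commands that create one item with a few basic statements."""

    def quoted(text):
        # String values are wrapped in double quotes. Swapping inner double
        # quotes for single quotes is a simplification made for this sketch.
        return '"' + text.replace('"', "'") + '"'

    commands = [
        ["CREATE"],
        ["LAST", f"L{lang}", quoted(label)],        # label
        ["LAST", f"D{lang}", quoted(description)],  # description
        ["LAST", "P31", instance_of],               # instance of
        ["LAST", "P195", collection_qid],           # collection
        # Inventory number with the collection as a qualifier.
        ["LAST", "P217", quoted(inventory_number), "P195", collection_qid],
    ]
    return "\n".join("\t".join(parts) for parts in commands)


# Sample values only; the collection QID stands in for your own institution.
batch = quickstatements_create(
    label="Untitled",
    description="painting, accession number 12.34",
    instance_of="Q3305213",
    inventory_number="12.34",
    collection_qid="Q160236",
)
```

Generating one block like this per spreadsheet row and pasting the result into QuickStatements mirrors what many contributors do with spreadsheet formulas.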


The MediaWiki API can be scripted with Python (using Pywikibot, or notebooks on PAWS) to do more advanced work.

Tools and scripts (ANALYZE)

Tracking property completeness:

Wikidata:WikiProject sum of all paintings/Property statistics - User:Multichill script on GitHub

InteGraality - User:Jean-Frédéric script to generate custom dashboards of property coverage for a given part of Wikidata.

Properties dashboard for Metropolitan Museum of Art
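The idea behind these completeness dashboards can be shown with a small Python sketch that computes, for each property, the share of items carrying at least one value. The data structure, item keys, and function name are hypothetical; the tools listed above work on live Wikidata data and may compute their figures differently.

```python
def property_coverage(items, properties):
    """Return {property ID: percentage of items with at least one value}."""
    total = len(items)
    coverage = {}
    for pid in properties:
        # An item counts as covered if it has a non-empty value list.
        have = sum(1 for claims in items.values() if claims.get(pid))
        coverage[pid] = round(100 * have / total, 1) if total else 0.0
    return coverage


# Made-up sample: item key -> {property ID: list of values}.
SAMPLE_ITEMS = {
    "item-1": {"P217": ["12.34"], "P170": ["artist A"], "P18": ["a.jpg"]},
    "item-2": {"P217": ["56.78"], "P170": ["artist B"], "P18": ["b.jpg"]},
    "item-3": {"P217": ["90.12"], "P170": ["artist C"]},
    "item-4": {"P217": ["34.56"], "P18": []},
}

report = property_coverage(SAMPLE_ITEMS, ["P217", "P170", "P18"])
```

Properties with low percentages in such a report point to where enrichment work would have the most effect.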

Wikimedia Commons Data Roundtripping project and report

Tools and scripts (RE-USE)

Infobox tutorials:

Wikidata:Infobox_Tutorial - how to create Wikidata-powered infoboxes or other templates for Wikipedia and other projects connected to Wikidata

Wikidata-driven infoboxes on Commons categories:

Template:Wikidata Infobox - created by User:Mike Peel

Tools and scripts (REPORT)

Wikidata Queries to show stats on Met Museum open access contributions to Wikidata:

PAWS notebook by User:Fuzheado

Case studies

  • Add yours here