Wikidata:Linked open data workflow
There are many considerations when contributing data, media or other assets to Wikimedia projects. This chart attempts to list the tools and scripts in the linked data workflow, which is especially useful to GLAM institutions. This is based on the data and media partnerships chart on Outreach Wiki. Public shortcut to this page: w.wiki/87CA
PREPARE and normalize source data and media | RECONCILE with Wikimedia modeling and coverage | INGEST data, media, and free content | ANALYZE, correct, and enrich | RE-USE content intra-wiki and externally | REPORT and measure impact |
---|---|---|---|---|---|
Notes
Try finding a similar project or collection set on Wikidata or Commons to see how it has been done in the past.
Ask questions at the main Project chat on Wikidata or the Village Pump on Commons.
Wikidata uses a CC0 license: any contributed data must be dedicated as CC0 or public domain. |
Notes
For Wikidata, a "crosswalk database" is usually needed to map terms from the source data set (a CSV file or records from an API) to Wikidata terms. This can be done with OpenRefine, a custom mapping using Google Spreadsheets, or both.
Find out how items are modeled in Wikidata, in order to set the proper "instance of" (P31) and "subclass of" (P279) properties for new items. Need case study here. |
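As a minimal sketch of the crosswalk idea in the notes above, the following Python maps terms from a source CSV to Wikidata item IDs through a lookup table and reports the terms that still need mapping. The column name `object_type`, the sample rows and the function name `apply_crosswalk` are hypothetical, and the QIDs are illustrative values that should be verified on Wikidata before use. In practice, OpenRefine reconciliation would often be used to build such a table.

```python
import csv
import io

# Hypothetical crosswalk: local vocabulary term -> Wikidata item ID.
# The QIDs are illustrative and should be verified on Wikidata before use.
CROSSWALK = {
    "painting": "Q3305213",
    "sculpture": "Q860861",
    "photograph": "Q125191",
}


def apply_crosswalk(csv_text, column="object_type", crosswalk=CROSSWALK):
    """Return (mapped_rows, unmapped_terms) for a CSV given as text."""
    mapped, unmapped = [], set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize the local term so "Painting" and "painting " both match.
        term = row[column].strip().lower()
        qid = crosswalk.get(term)
        if qid is None:
            unmapped.add(term)
        else:
            mapped.append({**row, "wikidata_qid": qid})
    return mapped, sorted(unmapped)


# Hypothetical sample data standing in for an institution's export.
SAMPLE = "inventory_number,object_type\n1975.1.1,Painting\n1975.1.2,Tapestry\n"
rows, missing = apply_crosswalk(SAMPLE)
print(rows)     # rows that received a Wikidata item ID
print(missing)  # terms that still need a crosswalk entry
```

Keeping the unmapped terms in a separate list makes it easy to review them by hand before any data is uploaded.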
Notes
Try uploading small test batches before ingesting large data sets. When ingesting collection metadata and media files into Wikidata and Commons, you need a way to keep them correlated. The inventory or accession number (P217) is often used for objects, with a qualifier for collection (P195) and institution. A Commons best practice for filenames is to incorporate the institution or source, the inventory number and possibly a descriptive title.
Need case study here. |
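The filename practice described above could be sketched as follows. This is only one possible convention, not an official Commons rule: the function name `build_commons_filename`, the order of the parts and the " - " separator are assumptions. The pattern removes characters that MediaWiki does not allow in page titles, plus ":", "/" and "\", which are assumed here to be worth avoiding in file names.

```python
import re

# Characters MediaWiki does not allow in page titles (# < > [ ] | { }),
# plus ":", "/" and "\" which are avoided here as an assumption.
_FORBIDDEN = re.compile(r'[#<>\[\]|{}:/\\]')


def build_commons_filename(institution, inventory_number, title="", extension="jpg"):
    """Join institution, inventory number and optional title into one filename."""
    cleaned = []
    for part in (institution, inventory_number, title):
        text = _FORBIDDEN.sub(" ", part)
        text = re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace
        if text:
            cleaned.append(text)
    return " - ".join(cleaned) + "." + extension.lstrip(".").lower()


# Hypothetical institution name and object title.
name = build_commons_filename("Example Museum", "1975.1.1", "Portrait of a Woman: Study #2")
print(name)
```

Because the inventory number is part of the filename, the same value can be matched against the P217 statement on the corresponding Wikidata item.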
Notes
Depending on how well the import and upload process went, you may need to resolve duplicates or conflicts with other editors.
|
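One simple way to spot likely duplicates after an import is to group items by collection and inventory number. The sketch below assumes the records have already been exported into Python dictionaries; the keys `item`, `collection` and `inventory_number`, the function name `find_duplicates` and all identifiers in the sample data are hypothetical placeholders.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group records sharing the same (collection, inventory number) key."""
    groups = defaultdict(list)
    for record in records:
        key = (record["collection"], record["inventory_number"].strip())
        groups[key].append(record["item"])
    # Keep only keys that point at more than one item.
    return {key: items for key, items in groups.items() if len(items) > 1}


# Placeholder identifiers, not real Wikidata items.
RECORDS = [
    {"item": "Q_EXAMPLE_1", "collection": "Q_MUSEUM", "inventory_number": "1975.1.1"},
    {"item": "Q_EXAMPLE_2", "collection": "Q_MUSEUM", "inventory_number": "1975.1.2"},
    {"item": "Q_EXAMPLE_3", "collection": "Q_MUSEUM", "inventory_number": "1975.1.1 "},
]
dupes = find_duplicates(RECORDS)
print(dupes)
```

Candidate pairs found this way still need human review before merging, since two distinct objects can occasionally share a number across departments.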
Notes
Scripts and templates can generate on-wiki content such as tables and infoboxes from Wikidata.
|
Notes
Show the impact of contributions by tracking metrics such as file usage or impressions over time. For partnerships, this can help validate the work being done or encourage more collaboration.
Some tools run on demand (GLAMorgan), while others report regularly based on the Commons categories of GLAM institutions.
|
Tools and scripts
Convert PDF files to structured data. If your source data is not well formatted, try a scraping tool like Tabula (https://tabula.technology/). |
Tools and scripts
OpenRefine video tutorial from the GLAM WIKI 2018 conference with Sandra Fauconnier |
Tools and scripts
Pattypan is the most popular way to do batch media uploads, using a spreadsheet to gather the metadata needed for each file. Find the correct template for artwork, photos or other media and identify the proper categories for organizing files.
|
Tools and scripts
- Tracking property completeness: Wikidata:WikiProject sum of all paintings/Property statistics - User:Multichill script on GitHub
- InteGraality - User:Jean-Frédéric script to generate custom dashboards of property coverage for a given part of Wikidata. Properties dashboard for Metropolitan Museum of Art
- Wikimedia Commons Data Roundtripping project and report |
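To illustrate what "property completeness" means, the sketch below computes the share of items that carry each property. It is not how the tools listed above are implemented; it assumes the items have already been fetched into a dictionary, and the function name `property_coverage` and the sample data are hypothetical.

```python
def property_coverage(items, properties):
    """Return {property: percent of items with at least one value}."""
    total = len(items)
    coverage = {}
    for prop in properties:
        count = sum(1 for claims in items.values() if claims.get(prop))
        coverage[prop] = round(100 * count / total, 1) if total else 0.0
    return coverage


# Placeholder items; P170 = creator, P571 = inception, P18 = image.
ITEMS = {
    "item-1": {"P170": ["artist-a"], "P571": ["1890"], "P18": ["file-a.jpg"]},
    "item-2": {"P170": ["artist-b"], "P571": ["1901"]},
    "item-3": {"P170": ["artist-a"]},
    "item-4": {},
}
stats = property_coverage(ITEMS, ["P170", "P571", "P18"])
print(stats)
```

Low percentages point at the properties where enrichment work would have the most effect.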
Tools and scripts
- Infobox tutorials: Wikidata:Infobox_Tutorial - how to create Wikidata-powered infoboxes or other templates for Wikipedia and other projects connected to Wikidata
- Wikidata-driven infoboxes on Commons categories: Template:Wikidata Infobox - created by User:Mike Peel |
Tools and scripts
Wikidata Queries to show stats on Met Museum open access contributions to Wikidata: PAWS notebook by User:Fuzheado |
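A query of the kind mentioned above might count the items that list an institution as their collection (P195). The sketch below only builds the SPARQL text, which could then be pasted into the Wikidata Query Service; it is not taken from the PAWS notebook. The function name `build_collection_count_query` is hypothetical, and the QID used for the Metropolitan Museum of Art is an assumption that should be verified on Wikidata.

```python
import re

_QID = re.compile(r"^Q[1-9][0-9]*$")


def build_collection_count_query(collection_qid, type_qid=None):
    """Build a SPARQL query counting items whose collection (P195) is the given item."""
    for qid in (collection_qid, type_qid):
        if qid is not None and not _QID.match(qid):
            raise ValueError(f"not a valid item ID: {qid!r}")
    lines = [
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {",
        f"  ?item wdt:P195 wd:{collection_qid} .",
    ]
    if type_qid is not None:
        # Optionally restrict the count to one "instance of" (P31) class.
        lines.append(f"  ?item wdt:P31 wd:{type_qid} .")
    lines.append("}")
    return "\n".join(lines)


# Q160236 is assumed here to be the Metropolitan Museum of Art; verify before use.
MET_QUERY = build_collection_count_query("Q160236")
print(MET_QUERY)
```

Running the same query at regular intervals and recording the count gives a simple measure of growth over time.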
Case studies
- Add yours here
Links
- Data and media partnerships workflow - General considerations for data and media partnerships, including a series of tools for Wikidata and Wikimedia Commons.
- Content Partnerships Hub/Software/Tool prioritization survey end 2022
- GLAM CSI - 2024 project to analyze the toolset, workflows, and user stories of the GLAM wiki community