User:John Cummings/data import guide

Data import workflow

To do

  • Matching Q numbers and Properties section is a mess
Database name, source and description | Structure of data within Wikidata | Create and import data into spreadsheet | Matching Q numbers and Properties | Importing | Date import complete and notes
Memory of the World Outline:

Notes:

Issues:

To do:

Link:

Done:

To do:

Link:

Done:

To do:

Date complete:

Notes:

Steps to import data into Wikidata

  1. Database name, source and description: An outline of the data, including the name of the dataset, a link to an online source, who created the dataset, and a description of the dataset.
  2. Structure of the data within Wikidata: An outline of the structure within Wikidata: which parts of the data will be items, properties and values, and whether any qualifiers are needed. Also any notes or issues about the data, e.g. whether the data is complete and whether it is related to any other datasets, plus a to-do list for anything that needs to be done, e.g. proposing properties.
  3. Create and import data into spreadsheet: Create a spreadsheet, add a link to it, and import the data into it, with a to-do list for any additional formatting that must be done.
  4. Matching Q numbers and Properties:
  5. Date import complete and notes: The date on which the data import into Wikidata was completed, and any notes about issues or anything else people should know in order to improve or maintain the data.

Guide for uploading data into Wikidata

This guide has been created for anyone wishing to import data into Wikidata.

Prerequisite skills

  • Database management or data wrangling
  • Spreadsheet or list manipulation
  • Social interaction online with Wikimedians

Steps

  1. Modelling the metadata in Wikidata terms
  2. Choosing data to import
  3. Choose a programme to create spreadsheets in
  4. Plan out the way the data fits into Wikidata
  5. Prepare the data in a spreadsheet
  6. Importing the data into Wikidata
    1. Ask others to import the data
    2. Import the data yourself
  7. Checking the data and repairing issues

Modelling the metadata

The data-modelling phase is the most important and most difficult of the entire process. Identify the types of data quality problems that may occur, including but not limited to the following considerations:

  1. data type redefinitions from your definition to the Wikidata property definition (e.g. date format, alphas in dates and numbers, embedded information in codes that need redefinition in Wikidata properties, implied content from keycodes);
  2. unformatted or irregular content (e.g., multiple uses for a single field, freeform text values, corrupted data, checked data vs. unchecked data);
  3. record relationships (e.g. broken chains in set relationships such as artworks by creators whereby you have the data for the artwork but don't know the Wikidata item number for the creator, orphan records, etc.);
  4. possible invalid content (e.g. values out of defined range, code fields not on a valid list of values or lookup table, inconsistent use of default values);
  5. context changes (e.g. historic changes to parameters, such as creator reattributions or collection moves like department renaming, deaccessioning or collection reassignment; items on show or offsite; synchronization issues);
  6. upload issues (e.g., variations in actual metadata from previously planned constraints of size, data type, validation rules, and relationships).

Describe the strategy to be used to ensure metadata quality before and after metadata upload to Wikidata. Describe the approach to data scrubbing and quality assessment of metadata before they are uploaded to Wikidata. Describe the manual and/or automated controls and methods to be used to validate the upload and to ensure that all metadata intended for Wikidata has been checked. Describe the process for data error detection and correction, and the (manual?) process for resolving anomalies. It's all about building trust and covering your butt in case things go wrong in a big way and the community suddenly needs to clean up tons of incorrect items (the "horror of horrors").
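As an illustration of the automated checks described above, here is a minimal sketch in Python. The column names (id, date, type, creator_qid) and the list of valid types are hypothetical and stand in for whatever your own dataset uses; values are assumed to be strings. It only covers three of the problem types listed (date format, values not on a valid list, missing creator item numbers) and is not a complete quality strategy.

```python
import re
from datetime import datetime

# Hypothetical rules for an artwork dataset; replace with your own.
ALLOWED_TYPES = {"painting", "sculpture", "drawing"}
QID = re.compile(r"Q[1-9]\d*")


def check_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # Dates must parse as year-month-day, with no stray letters.
    try:
        datetime.strptime(row.get("date", ""), "%Y-%m-%d")
    except ValueError:
        problems.append("date does not parse as YYYY-MM-DD")
    # Code fields must be on the list of valid values.
    if row.get("type") not in ALLOWED_TYPES:
        problems.append("type is not on the list of valid values")
    # Broken chain: the artwork is known but the creator's item is not.
    if not QID.fullmatch(row.get("creator_qid", "")):
        problems.append("creator has no Wikidata item number")
    return problems


def report(rows):
    """Map each record ID to its problems, skipping clean records."""
    return {row["id"]: p for row in rows if (p := check_row(row))}
```

Running report() over the whole spreadsheet before upload gives a list of records to fix by hand, and running it again afterwards is one way to confirm that every record intended for Wikidata has been checked.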

Choosing data to import

  1. Within the scope of Wikidata
  2. From a reliable source
  3. Able to fit within the structure of Wikidata (if unsure, ask on Wikidata project chat: https://www.wikidata.org/wiki/Wikidata:Project_chat)

Choosing a programme to create the spreadsheets on

Importing data into Wikidata is done through spreadsheets, so you need a spreadsheet programme. Options include:

  • Google Sheets: free, and several people can work on a sheet together
  • LibreOffice: open source
  • OpenOffice
  • Excel: often already installed on Windows computers
  • Excel Online

Plan out the way the data fits into Wikidata

Decide which parts of the data will be items, properties, values, references, etc.

Properties

  1. Search the existing properties: https://www.wikidata.org/wiki/Wikidata:List_of_properties/all
  2. If you can’t find an existing property:
    1. Ask on Project chat: https://www.wikidata.org/wiki/Wikidata:Project_chat
    2. Request a new property or properties: https://www.wikidata.org/wiki/Wikidata:Property_proposal (allow at least seven days for this process)

Decide on a reference or set of references for the data, preferably ones that are available online.

Preparing the data in spreadsheets

Import all the data into a spreadsheet. It is very helpful to include ID numbers if they exist.

Create two spreadsheets:

  1. A matching spreadsheet, used to match rows in your table to Wikidata items. Mix n’ Match matches rows to existing items and creates items where needed; the resulting items are then added to the second spreadsheet along with the properties and values.
  2. A statements spreadsheet, outlining the properties and values you want to import into each Wikidata item matched in the previous step.

If the items can’t be auto-matched, it will take time for the Mix n’ Match list to be processed by the Wikidata community.

Mix n’ Match spreadsheet

ID number:

Name of item:

Description of item:

Type: ????

URL: A reference URL to help people understand what the item is

https://tools.wmflabs.org/mix-n-match/import.php
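A spreadsheet with the fields above can be turned into tab-separated text with a short script. The sketch below is in Python and assumes that the import page accepts one entry per line with the columns in the order listed here (ID, name, description, type, URL); that order, and the dictionary keys used, are assumptions, so check the import page before relying on it.

```python
# Assumed column order, following the field list in this section.
FIELDS = ("id", "name", "description", "type", "url")


def clean(value):
    """Collapse tabs, newlines and repeated spaces into single spaces."""
    return " ".join(str(value).split())


def to_mixnmatch_tsv(records):
    """Turn a list of dicts into tab-separated text, one entry per line."""
    lines = []
    for record in records:
        if not clean(record.get("id", "")):
            raise ValueError("every row needs an ID number")
        # Missing fields become empty columns so the order stays fixed.
        lines.append("\t".join(clean(record.get(f, "")) for f in FIELDS))
    return "\n".join(lines)
```

Cleaning each field first matters because a stray tab or line break inside a description would otherwise shift every following column.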

QuickStatements spreadsheet

How to structure the spreadsheet so that it's clear how the data is structured in Wikidata

Order

Qs

Ps

Values

References

Matching Wikidata item to each row in your spreadsheet

Items

Properties

Values

References

If you have more than one value in a field it must be split into separate fields
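One reading of this rule is that each value should end up on its own row, so that every row describes exactly one statement. A minimal Python sketch of that, assuming the values in a field are separated by a semicolon (the separator and the field names in the test data are hypothetical):

```python
def split_multivalued(rows, field, separator=";"):
    """Return new rows in which `field` holds exactly one value per row."""
    result = []
    for row in rows:
        values = [v.strip() for v in row[field].split(separator) if v.strip()]
        if not values:
            # Keep rows with an empty field unchanged.
            result.append(dict(row))
            continue
        for value in values:
            new_row = dict(row)  # copy so the input rows are not modified
            new_row[field] = value
            result.append(new_row)
    return result
```

For example, a row with the creator field "Alice; Bob" becomes two rows that are identical except for the creator.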

Columns, in order:

  1. Item you are adding information to
  2. Property
  3. Value
  4. Qualifier
  5. S number (e.g. S143) to say that what follows is a reference
  6. Reference

https://tools.wmflabs.org/wikidata-todo/quick_statements.php
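To show what one row of such a spreadsheet looks like as text, here is a minimal Python sketch that builds a single tab-separated line: item, property, value, then an optional S number and reference. The function name and its checks are illustrative only, and qualifiers are left out to keep it short. Q4115189 in the usage note is the Wikidata sandbox item.

```python
import re


def quickstatement(item, prop, value, source_prop=None, source_value=None):
    """Build one tab-separated line: item, property, value, [S number, reference]."""
    if not re.fullmatch(r"Q[1-9]\d*|LAST", item):
        raise ValueError(f"not an item: {item!r}")
    if not re.fullmatch(r"P[1-9]\d*", prop):
        raise ValueError(f"not a property: {prop!r}")
    parts = [item, prop, value]
    if source_prop is not None:
        # A reference uses S instead of P, e.g. S143.
        if not re.fullmatch(r"S[1-9]\d*", source_prop):
            raise ValueError(f"not a source property: {source_prop!r}")
        parts += [source_prop, source_value]
    return "\t".join(parts)


def string_value(text):
    """Wrap plain text in double quotes, as string values are written."""
    return f'"{text}"'
```

Calling quickstatement("Q4115189", "P31", "Q5", "S143", "Q328") gives one line with five tab-separated columns. Test any new batch on the sandbox item before running it on real items.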

Importing the Data

Option 1: Request that the data be imported by other people

https://www.wikidata.org/wiki/Wikidata:Bot_requests

Option 2: Self import

  1. Check with Wikidata project chat https://www.wikidata.org/wiki/Wikidata:Project_chat
  2. All the tools

Mix n’ Match

Matches items in your database to Wikidata items and creates items

It also adds an ID statement

Quick Statements

Checking the data and repairing issues

  1. Wikidata queries
    1. Learn how to do them
    2. Request a query https://www.wikidata.org/wiki/Wikidata:Request_a_query
  2. All the tools
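As a starting point for learning queries, the Python sketch below builds a simple SPARQL query that lists items having a given property, which is one way to check that imported statements have arrived. The function names and the User-Agent string are illustrative; the query is sent to the public Wikidata Query Service endpoint, and run_query() needs network access.

```python
import json
import re
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(property_id, limit=100):
    """Return a SPARQL query listing items that have the given property."""
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a property: {property_id!r}")
    return (
        "SELECT ?item ?value WHERE { "
        f"?item wdt:{property_id} ?value . "
        "} "
        f"LIMIT {int(limit)}"
    )


def run_query(query):
    """Send the query to the query service and return the result rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Illustrative User-Agent; replace it with one that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "data-import-check-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

For an import that added an ID property, build_query() with that property's number and a small limit gives a quick sample to compare against the spreadsheet.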

Finally

Don't forget that you can include Wikidata IDs in your database, as these organisations have done: https://www.wikidata.org/wiki/Wikidata:Wikidata_for_authority_control