User:Geertivp/training/QuickStatements
With QuickStatements you can automatically edit Wikidata in batch mode.
- You can prepare a (large) transaction file using Excel, LibreOffice Calc, or OpenRefine
- Possibly from a Wikidata Query
- Then execute transactions via https://quickstatements.toolforge.org (copy/paste; format V1 or V2)
- You can create items, and amend statements
- Support for OpenRefine
Transactions
You can either generate transactions via:
- Wikidata Query
- OpenRefine
- CLI tools
- any tool generating a list of items, and statements (property/value pairs)
or by using other sources or tools. Manual input is also possible, but then you might as well edit Wikidata directly, unless you want to create a lot of new items.
After loading the input file, you can choose between online mode (slower, with more control) and offline (batch) mode (faster, because there are no round-trip delays).
Formats
There are two formats:
- V1: TAB file with one statement per line
- V2: CSV file with pivot statements (qid in first column, P-numbers in the header, Q-numbers or values in the cells); the loading process translates the pivot into linear transactions, one statement after the other.
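As a rough illustration of the two formats, the sketch below writes the same two statements once as V1 lines and once as a V2 pivot. The item Q4115189 (the Wikidata sandbox item) and the property/value pairs are example data, and the helper names `to_v1` and `to_v2` are hypothetical, not part of QuickStatements.

```python
import csv
import io

# Example data (assumption): two statements for the Wikidata sandbox item.
statements = [
    ("Q4115189", "P31", "Q5"),
    ("Q4115189", "P21", "Q6581097"),
]

def to_v1(rows):
    """V1: one TAB-separated statement per line."""
    return "\n".join("\t".join(row) for row in rows)

def to_v2(rows):
    """V2: pivot with qid in the first column and P-numbers in the header."""
    properties = []
    items = {}
    for qid, prop, value in rows:
        if prop not in properties:
            properties.append(prop)
        # Simplification: one value per item and property; a later value overwrites.
        items.setdefault(qid, {})[prop] = value
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["qid"] + properties)
    for qid, values in items.items():
        writer.writerow([qid] + [values.get(p, "") for p in properties])
    return out.getvalue()

print(to_v1(statements))
print(to_v2(statements))
```

The pivot here holds a single value per property and item; several values for the same property would need repeated columns or rows, which this sketch does not attempt.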
V2 string formats
Double quotes: when using Excel input files, you must double all double quotes and add one extra " before and after the string value.
- "single double quotes" for Lxx, Dxx, and Axx
- "xx:""triple double quotes""" for language dependent properties
- """triple double quotes""" for other strings
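If the CSV is generated by a script instead of typed in a spreadsheet, a CSV library can do the doubling. The sketch below uses Python's standard `csv` module; the helper names and the sample values are assumptions made for illustration.

```python
import csv
import io

def csv_line(cells):
    """Serialize one row; csv doubles embedded quotes and wraps such cells."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(cells)
    return out.getvalue().rstrip("\n")

def plain_string(text):
    # Cell content for an ordinary string value: the text inside one pair of quotes.
    return '"' + text + '"'

def monolingual(lang, text):
    # Cell content for a language-dependent value: language code, colon, quoted text.
    return lang + ':"' + text + '"'

print(csv_line([plain_string("113230702")]))       # prints """113230702"""
print(csv_line([monolingual("en", "some text")]))  # prints "en:""some text"""
```

The helpers do not escape quotes inside `text` itself; values that contain double quotes would need separate handling.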
Techniques
- Run a Wikidata query
- Detect and add missing data
- Detect and correct wrong data
- Detect and resolve constraints
- Generate a transaction file
- Verify the transaction file
- Import the transaction file
- Run QuickStatements (interactive or batch)
- Review errors
- Correct errors manually
- Amend any remaining problems manually
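The step from query results to a transaction file can be sketched as follows. The code assumes the query returns an `?item` variable and that the results were saved in the standard SPARQL JSON layout; the function names, the sample result, and the statement being added are hypothetical.

```python
def qid_from_uri(uri):
    """Extract the Q-number from an entity URI such as .../entity/Q42."""
    return uri.rsplit("/", 1)[-1]

def transactions_from_results(results, prop, value):
    """Build one TAB-separated V1 line per item found by the query."""
    lines = []
    for binding in results["results"]["bindings"]:
        qid = qid_from_uri(binding["item"]["value"])
        lines.append("\t".join([qid, prop, value]))
    return lines

# Hypothetical query output: one item found to be missing a statement.
sample = {
    "results": {
        "bindings": [
            {"item": {"type": "uri",
                      "value": "http://www.wikidata.org/entity/Q4115189"}},
        ]
    }
}

for line in transactions_from_results(sample, "P31", "Q5"):
    print(line)
```

The generated lines should still be verified by hand before they are pasted into QuickStatements.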
Authentication
- To use QuickStatements the user account needs to be autoconfirmed.
- You need WiDaR to authorize your QuickStatements session (Wikimedia account)
- Transactions are logged under your userID (username is visible in the application)
- You are responsible for any damage you cause to Wikidata, and for cleaning it up afterwards
Attention points
Caveat
- First try with one example; verify the results before executing 1000s of transactions
- You can pause, stop, and resume the script
- The order of execution in QuickStatements is extremely important since for every language the combination Qid/Lxx/Dxx must be unique
Error handling/post processing
- Handle any left-over errors, inconsistencies, or conflicts manually via the interactive Wikidata editing tool (verify the history of transactions)
- If you have only a few transactions, you might use (only) the standard Wikidata edit functionality instead of the tool
- Consider using OpenRefine for better control over selective execution
Pitfalls
- This is a (very) dangerous tool: you are responsible for correcting any errors caused by your batch transactions
- Take care; avoid mistakes; double verify your transactions
- Pay attention to proper use of Properties
- Do not create duplicate items/statements
- Activities are logged on your account
- Run one command of the list, then interrupt, verify, and resume if OK
- Be prepared for negative feedback
- When creating a new item, add at least an "instance of" (P31) statement and other statements; otherwise it is considered an empty item and risks subsequent deletion
- When creating items, it is better to do it interactively, so that you can immediately amend the new Q-numbers
- Otherwise prefer batch mode: it is faster, and you do not have to keep your laptop connected to the network
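To avoid creating empty items, the creation command and its first statements can be generated together. The sketch below relies on the V1 `CREATE` and `LAST` commands; the function name, the label, the description, and the class Q16970 are example values chosen for illustration.

```python
def create_item_commands(lang, label, description, statements):
    """Return V1 lines that create one item and fill it via LAST."""
    lines = ["CREATE"]
    lines.append(f'LAST\tL{lang}\t"{label}"')
    lines.append(f'LAST\tD{lang}\t"{description}"')
    for prop, value in statements:
        # LAST refers to the item created by the preceding CREATE.
        lines.append(f"LAST\t{prop}\t{value}")
    return lines

commands = create_item_commands(
    "en",
    "Example church",                # hypothetical label
    "church building in Example",    # hypothetical description
    [("P31", "Q16970")],             # example class; substitute your own
)
print("\n".join(commands))
```

Because every statement hangs off `LAST`, the lines for one new item must stay together and in this order.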
Versions
There is a new version of the tool; see Q29032512. This version is easier to use, allows CSV import, and allows deleting statements by prefixing them with a "-".
Known problems
Prefer V2 of the application over the obsolete V1.
For version 1:
- Click away the HOWTO to see the log file
- Use the Lxx and Dxx separately (otherwise only the first operation is executed...)
- Screen logging does not scroll down automatically -- Use Ctrl-End to see the current transaction
- Network problems could stop the processing; once the network connection is re-established, process only the rest of the file
For version 2:
- The labels are in English only (not translated into user language)
- TAB is not automatically converted to a comma for CSV input format (although this should be transparent) ⇒ use notepad to change TAB to comma
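As an alternative to search-and-replace in a text editor, the conversion can be scripted. The sketch below assumes the TAB-separated file was exported by a spreadsheet using the usual CSV quoting rules; `tsv_to_csv` is a hypothetical helper name.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Re-serialize TAB-separated text as CSV, quoting cells where needed."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

print(tsv_to_csv("qid\tLen\nQ4115189\tsandbox, test\n"))
```

Unlike a plain TAB-to-comma replacement, this keeps a cell that itself contains a comma (here `sandbox, test`) in one column by quoting it.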
For both versions:
- Some edits might result in a (constraint) error
- Some manual corrections might be needed
- Wikidata Query runs on a replica of the live database, so it can be a couple of minutes behind the edits made with QuickStatements (to verify your results with Wikidata Query you might have to wait up to 5 minutes). Verify with "View history" to be sure.
You should see "All done!" at the end.
- Under certain circumstances LibreOffice Calc generates “” instead of "", which causes verbatim “” inserts in Wikidata text columns; please be very careful...
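One way to guard against the curly-quote problem is to normalize the exported text before loading it. This is a minimal sketch; the function name is hypothetical, and it replaces every curly double quote, including any that were intended as part of a label or description.

```python
# Map the typographic double quotes to the straight ASCII quote.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})

def straighten_quotes(text):
    """Replace curly double quotes with straight ones."""
    return text.translate(CURLY_TO_STRAIGHT)

line = "Q4115189\tLen\t\u201csandbox\u201d"
print(straighten_quotes(line))
```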
Invalid entity ID
You must use uppercase for the commands, otherwise you get an error:
Q17277055 Dnl "kerkgebouw in Aarlen, België"
Q17277055 Lfr "Église du Sacré-Cœur d'Arlon"
Processing Q3581386 (Q3581386 dnl "kerkgebouw in Aarlen, België") ERROR (set_string) : Invalid entity ID.
Processing Q17277055 (Q17277055 lfr "Église du Sacré-Cœur d'Arlon") ERROR (set_string) : Invalid entity ID.
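A transaction file can be checked for lowercase commands before it is loaded. The sketch below assumes TAB-separated V1 lines and corrects only the entity and command letters; `fix_command_case` is a hypothetical helper, and its patterns cover just the simple three-column case.

```python
import re

def fix_command_case(line):
    """Uppercase the entity and command letters of one TAB-separated V1 line."""
    fields = line.split("\t")
    if re.fullmatch(r"[qp]\d+", fields[0]):
        fields[0] = fields[0].upper()
    if len(fields) > 1:
        command = fields[1]
        if re.fullmatch(r"p\d+", command):
            fields[1] = command.upper()
        elif re.fullmatch(r"[ldas][a-z][a-z_-]*", command):
            # Only the leading letter is the command; the language code stays lowercase.
            fields[1] = command[0].upper() + command[1:]
    return "\t".join(fields)

print(fix_command_case('Q17277055\tdnl\t"kerkgebouw in Aarlen, België"'))
```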
Duplicate Description
The combination Qid/Lxx/Dxx must be unique. When assigning a Description you might get a duplicate key.
Processing Q27959405 (Q27959405 Dfr "édifice religieux belge") ERROR (set_desc) : Item Q22668173 already has label "église Sint-Martinus de Zaventem" associated with language code fr, using the same description text.
- Correct one of the Labels, or Descriptions
- Merge the two Qids when they describe the same item
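Clashes inside your own batch can be found before running it. The sketch below groups planned label/description pairs per language; it cannot see items that already exist in Wikidata. The function name is hypothetical, and the sample rows are modelled on the error message above, assuming both items would receive the same French label.

```python
from collections import defaultdict

def find_duplicates(rows):
    """rows: iterable of (qid, lang, label, description) tuples.

    Returns the (lang, label, description) keys shared by more than one item.
    """
    seen = defaultdict(list)
    for qid, lang, label, description in rows:
        seen[(lang, label, description)].append(qid)
    return {key: qids for key, qids in seen.items() if len(qids) > 1}

planned = [
    ("Q27959405", "fr", "église Sint-Martinus de Zaventem", "édifice religieux belge"),
    ("Q22668173", "fr", "église Sint-Martinus de Zaventem", "édifice religieux belge"),
]
for key, qids in find_duplicates(planned).items():
    print(key, qids)
```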
Replication delay with Query
Query runs on a replicated database. After updating the live database with QuickStatements, it can take minutes before your updates are visible in Query.
Do not re-execute your updates in the meantime; that would produce duplicate updates.
Excel drops leading zeros
Pay attention when loading a TSV or CSV file into Excel. Reinstate the leading zeros as necessary.
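When the identifier has a known fixed length, the zeros can be reinstated with a small helper. This is a sketch under that assumption; the function name and the width of 8 are illustrative only.

```python
def restore_leading_zeros(value, width):
    """Left-pad a purely numeric value with zeros up to the given width."""
    return value.zfill(width) if value.isdigit() else value

print(restore_leading_zeros("12345", 8))  # prints 00012345
```

Values that are not purely numeric are returned unchanged, so the helper can be applied to a whole column.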
Geographic coordinates do not load properly
Reason?
Empty cells are not skipped
You must execute your transactions in separate batches, which is inconvenient.
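A possible alternative to separate batches, not described in the original notes, is to flatten the pivot yourself into V1 lines and leave out the empty cells. The sketch below assumes a CSV with `qid` in the first column; `pivot_to_v1` is a hypothetical helper and the sample data is illustrative.

```python
import csv
import io

def pivot_to_v1(csv_text):
    """Turn a V2-style pivot into TAB-separated V1 lines, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        qid = row["qid"]
        for column, cell in row.items():
            # Skip the key column, surplus cells, short rows, and blank cells.
            if column is None or column == "qid" or cell is None:
                continue
            if cell.strip() == "":
                continue
            lines.append("\t".join([qid, column, cell]))
    return lines

sample = "qid,P31,P21\nQ4115189,Q5,\nQ4115189,,Q6581097\n"
print("\n".join(pivot_to_v1(sample)))
```

A cell written as `"""text"""` in the CSV is decoded to `"text"`, so string values keep their surrounding quotes in the V1 output.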
False errors
You can ignore the following error:
- No success flag set in API result
- The transaction was executed, but the status was lost. Do not re-execute the statement, because it got executed anyway.