A very simple Pywikibot script to replicate missing language labels.

Example: Select all Belgian persons (politicians, engineers, business people, ...) having missing language labels.

This was the first Python script I ever wrote.

At first I implemented it as a Wikidata script with QuickStatements, but a fully automated Python script is much more effective (no manual data manipulation in Excel required).



You can run this script from the shell, or from PAWS:

./ [input language] [output languages]...

Simplified script


The simplified script below gives you an idea of what it does.

Note: You can consult the complete script code history, including technical details, and more documentation.


  • You can select the country, the source, and the target languages
  • I am running this on an always-on Raspberry Pi (actually a Piwikibot 🙂)
    • Low power consumption, and it is serving other functions in the home anyway...
    • My laptop does not need to stay online, which saves electricity and lets me move to another network while the script keeps running on the Pi

import sys
import time
import pywikibot
from datetime import datetime
from pywikibot import pagegenerators as pg

def wd_proc_all_items():
    """Copy the inlang label to outlang for every item that lacks an outlang label."""
    QUERY = """# Search for Belgian citizens with a missing target-language label
SELECT ?item WHERE {
  VALUES ?instance { wd:Q5 }
  VALUES ?country { wd:Q31 }
  ?item wdt:P31 ?instance;
    wdt:P27 ?country;
    rdfs:label ?itemLabel.
  FILTER((LANG(?itemLabel)) = '""" + inlang + """')
  MINUS {
    ?item rdfs:label ?label.
    FILTER((LANG(?label)) = '""" + outlang + """')
  }
}
"""
    wikidata_site = pywikibot.Site("wikidata", "wikidata")
    generator = pg.WikidataSPARQLPageGenerator(QUERY, site=wikidata_site)

    i = 0               # Transaction counter
    errsleep = 0        # Accumulated wait penalty after errors
    print('Getting data')
    now = datetime.now()

    for item in generator:
        i += 1
        status = 'OK'
        label = ''

        try:
            item.get(get_redirect=True)     # Load the item data
            if inlang in item.labels:
                label = item.labels[inlang]

            if label == '':     # Label not available; skip update
                status = 'Ignore'
            elif outlang in item.labels: # Target label already updated; skip duplicate update
                status = 'Skip'
            else:               # Update the target item label
                item.editLabels( {outlang: label}, summary="Pwb copy " + inlang + " label" )
                errsleep = 0
        except KeyboardInterrupt:
            sys.exit()          # Stop on user request
        except Exception:       # Single transaction error; continue with the next item
            status = 'Error'
            totsecs = int((datetime.now() - now).total_seconds())       # Calculate technical error penalty
            if totsecs >= 30:   # Technical error
                errsleep += totsecs * 5
            if errsleep > 0:    # Allow the servers to catch up
                print('%d seconds maxlag wait' % errsleep)
                time.sleep(errsleep)

        prevnow = now
        now = datetime.now()
        isotime = now.strftime("%Y-%m-%d %H:%M:%S")
        totsecs = (now - prevnow).total_seconds()
        print('%d\t%s\t%s\t%f\t%s\t%s' % (i, isotime, status, totsecs, item.getID(), label))

param = sys.argv                # Get the command parameters
if len(param) <= 2:             # Welcome the user
    print('Usage: %s in out...' % param[0])
    sys.exit(1)                 # Source and target languages are both required

param.pop(0)                    # Skip the name of the executable
inlang = param.pop(0)           # P1 = Source language (mandatory parameter)

for outlang in param:           # Loop over all target languages (mandatory parameter)
    if inlang != outlang:       # Skip input language
        wd_proc_all_items()     # Process all items for one language



You need to install and configure Pywikibot on a (virtual, private) Linux system, or use PAWS on a shared server.

Known problems


It is important to have a proper error handler so that the script can recover from single transaction errors. Without a proper error handler the script would fail (repeatedly) with a fatal error on the (same) first failing transaction and would never continue with the rest of the transactions.
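A minimal sketch of such a handler, independent of Pywikibot so it can be tried anywhere. The names process_items, update and flaky_update are illustrative assumptions and not part of the original script; update stands in for the real edit. Each transaction gets its own try block, so one failure is recorded and the loop moves on.

```python
def process_items(item_ids, update):
    """Apply update() to every item; record failures instead of aborting.

    update is a hypothetical callable standing in for the real edit
    (for example a wrapper around item.editLabels).
    """
    results = {}
    for item_id in item_ids:
        try:
            update(item_id)
            results[item_id] = 'OK'
        except KeyboardInterrupt:
            raise                       # Let the user stop the run
        except Exception as error:      # Any single transaction error
            results[item_id] = 'Error: %s' % error
    return results


def flaky_update(item_id):
    """Demo update that fails for one item only."""
    if item_id == 'Q2':
        raise RuntimeError('edit conflict')


statuses = process_items(['Q1', 'Q2', 'Q3'], flaky_update)
print(statuses)
```

The failing item ends up with an error status while the items before and after it are still processed.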

Execute by item

  • Updates should be executed by item, instead of by language (avoid multiple watch notifications)
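One possible way to do this is to collect all missing target labels for an item first and then save them in a single edit. The sketch below uses a plain dict in place of item.labels; the function name missing_labels and the demo label are made up for illustration.

```python
def missing_labels(labels, inlang, outlangs):
    """Return {lang: label} for every target language that lacks a label."""
    source = labels.get(inlang, '')
    if not source:                      # No source label; nothing to copy
        return {}
    return {lang: source for lang in outlangs
            if lang != inlang and lang not in labels}


# Demo with a plain dict standing in for item.labels (made-up label)
labels = {'nl': 'Example Person', 'fr': 'Example Person'}
todo = missing_labels(labels, 'nl', ['en', 'fr', 'de', 'nl'])
print(todo)

# With Pywikibot, a single call could then save all labels at once:
#     if todo:
#         item.editLabels(todo, summary='Pwb copy nl label')
```

Because editLabels accepts a dictionary of several languages, the item is saved once, which should result in one watchlist notification instead of one per language.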

User errors

  • WARNING: Http response status 400
    • Syntax error in the SPARQL code
  • ERROR: An error occurred for uri ... WARNING: Waiting 240 seconds before retrying.
    • Too many items in query: add additional filters to reduce the number of items
    • You can relax the filters after data gets processed
  • WARNING: Http response status 429
    • LIMIT too high or missing
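To keep the result set small, the query can be built from parameters with an explicit LIMIT. This is only a sketch: QUERY_TEMPLATE, build_query and the default limit of 1000 are assumptions for illustration, not part of the original script; the triple patterns follow the same idea as the simplified script.

```python
QUERY_TEMPLATE = """SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5;
    wdt:P27 wd:%(country)s;
    rdfs:label ?itemLabel.
  FILTER(LANG(?itemLabel) = '%(inlang)s')
  MINUS {
    ?item rdfs:label ?label.
    FILTER(LANG(?label) = '%(outlang)s')
  }
}
LIMIT %(limit)d"""


def build_query(inlang, outlang, country='Q31', limit=1000):
    """Build the SPARQL query for one language pair (hypothetical helper)."""
    return QUERY_TEMPLATE % {
        'country': country,     # Country of citizenship (P27) item ID
        'inlang': inlang,       # Source language code
        'outlang': outlang,     # Target language code
        'limit': limit,         # Maximum number of items per run
    }


query = build_query('nl', 'en', limit=500)
print(query)
```

The resulting string could be passed to pg.WikidataSPARQLPageGenerator in place of the hard-coded QUERY; re-running the script then picks up the next batch of items.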

Data errors

  • WARNING: wikibase-form datatype is not supported yet.
  • WARNING: wikibase-lexeme datatype is not supported yet.
  • WARNING: API error modification-failed: Item Q682310 already has label "Gerrit Schimmelpenninck" associated with language code en, using the same description text.
    • The target label remained empty
    • Another item had the same label and description
      • Edit the 2 items to have a unique description (e.g. adding the birth/death date)
      • You should add different from (P1889) to both items
      • Possibly you might need to merge 2 identical items
  • WARNING: API error editconflict: Edit conflict. Could not patch the current revision.

Login failure

  • WARNING: API error badtoken: Invalid CSRF token: general problem with authentication (temporary)

Server errors

  • MaxlagTimeoutError: retry later (replication server busy)
  • OtherPageSaveError: ignore (the update was still made; verify the item update history)
  • ReadTimeoutError: retry later (HTTPS network error)
  • WARNING: API error failed-save: The save has failed. (general error)
  • Sleeping for 9.0 seconds, 2020-06-11 00:17:54
    • The script deliberately runs slowly so as not to overload the servers (about 10 transactions per minute; use put_throttle = 6)
    • Set noisysleep = 60.0 to avoid too many "Sleeping" messages
    • When a transaction error occurs, the application sleeps for some minutes (maxlag wait suspected)
    • Create and configure a bot account (higher transaction speed allowed)
    • You can increase the execution speed by assigning a lower value to put_throttle
  • Maximum retries attempted due to maxlag without success.
  • Maximum retries attempted without success.
  • WARNING: API error readonly: The database has been automatically locked while the replica database servers catch up to the master
  • WARNING: API error internal_api_error_JobQueueError
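The two throttle settings mentioned in the list above go into the Pywikibot user-config.py file, which is plain Python. A sketch with the values from these notes; the last two lines are only there to show the arithmetic and do not belong in the configuration file.

```python
# Fragment for user-config.py (values taken from the notes above)
put_throttle = 6        # Minimum number of seconds between two page saves
noisysleep = 60.0       # Only report sleeps that last longer than this

# Illustration only: the resulting upper bound on the edit rate
edits_per_minute = 60 / put_throttle
print(edits_per_minute)
```

A lower put_throttle raises this bound; whether the servers accept the higher rate depends on the account type and the current server load.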

Network errors

  • requests.exceptions.ConnectionError: HTTPSConnectionPool(host='', port=443): Read timed out.
  • requests.exceptions.ConnectionError: HTTPSConnectionPool (network error)
  • Remote end closed connection without response


  • "Username unknown" problem with OAuth and sitelinks to special namespaces

Workaround: create the missing username with e.g. wikisource:Special:UserLogin


  1. You should manually amend failed transactions.
  2. Transactions that were skipped because of a transient error can be retried later.
  3. You should wait until the transactions are replicated to the SPARQL reporting instance before re-executing the script, to avoid duplicate transactions.

