Wikidata Bot This user account is a bot with a bot flag. The bot is operated by EvanProdromou.

This is a soon-to-be-proposed bot account for setting the UN/LOCODE property for world cities. It is run by User:EvanProdromou.

It uses the UN/LOCODE data set from UNECE with the following columns:

  • "Ch" - change flag
  • "ISO 3166-1" - ISO 3166-1 2-letter country code
  • "LOCODE" - 3-character code identifying the city (or city-like feature), unique within the country
  • "Name" - Name for the entity
  • "NameWoDiacritics" - Name with only ASCII characters
  • "SubDiv" - 2- or 3-character code for the subdivision; this is the part of the ISO 3166-2 region code after the "-", not including the country code.
  • "Function"
  • "Status"
  • "Date" - date added
  • "IATA"
  • "Coordinates" - latitude and longitude of the center (?) of the entity, in a fixed-width format, "DDMM(N|S) DDDMM(E|W)" (degrees and minutes, latitude first).
  • "Remarks"
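
As an illustration of the "Coordinates" format, here is a minimal sketch that converts a "DDMM(N|S) DDDMM(E|W)" value to decimal degrees. The function name `parse_locode_coordinates` is hypothetical and not taken from the bot's code; the sketch assumes the field is either empty or follows the fixed-width layout exactly.

```python
import re

# Pattern for the fixed-width layout described above: "DDMM(N|S) DDDMM(E|W)".
_COORD_RE = re.compile(r"^(\d{2})(\d{2})([NS]) (\d{3})(\d{2})([EW])$")


def parse_locode_coordinates(text):
    """Return (lat, lon) in decimal degrees, or None if empty or malformed.

    Hypothetical helper for illustration only.
    """
    match = _COORD_RE.match(text.strip())
    if match is None:
        return None
    lat_deg, lat_min, ns, lon_deg, lon_min, ew = match.groups()
    # Degrees plus minutes/60; south and west are negative.
    lat = int(lat_deg) + int(lat_min) / 60
    lon = int(lon_deg) + int(lon_min) / 60
    if ns == "S":
        lat = -lat
    if ew == "W":
        lon = -lon
    return lat, lon
```

For example, `parse_locode_coordinates("4530N 07336W")` gives a point at about 45.5 degrees north and 73.6 degrees west.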

In addition, it uses a previously extracted set of city data from the Wikidata Query Service. For each ISO 3166-2 region code, it uses the following query to fetch all urban settlement items in that region, with "%s" replaced by the region code.

SELECT DISTINCT ?city ?cityLabel ?location WHERE {
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,fr,es,ar,ru,zh". }
 ?region wdt:P300 "%s" .
 ?city wdt:P31/wdt:P279* wd:Q124250988 .
 ?city wdt:P131* ?region .
 ?city wdt:P625 ?location .
}
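
A minimal sketch of how the query above might be filled in and how its ?location values might be read. The names `build_city_query` and `parse_wkt_point` are hypothetical. The sketch assumes the region code is validated before substitution, and that coordinates come back as WKT literals of the form "Point(longitude latitude)", which is how the query service serializes coordinate values.

```python
import re

# The query from above, with %s standing in for the ISO 3166-2 region code.
QUERY_TEMPLATE = """SELECT DISTINCT ?city ?cityLabel ?location WHERE {
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,fr,es,ar,ru,zh". }
 ?region wdt:P300 "%s" .
 ?city wdt:P31/wdt:P279* wd:Q124250988 .
 ?city wdt:P131* ?region .
 ?city wdt:P625 ?location .
}"""

# ISO 3166-2 codes: two letters, a hyphen, then up to three alphanumerics.
_REGION_RE = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")
# WKT point literal, longitude first, e.g. "Point(-73.5617 45.5089)".
_POINT_RE = re.compile(r"Point\(([-+0-9.eE]+) ([-+0-9.eE]+)\)", re.IGNORECASE)


def build_city_query(region_code):
    """Substitute a validated region code into the query template."""
    if not _REGION_RE.fullmatch(region_code):
        raise ValueError(f"not an ISO 3166-2 code: {region_code!r}")
    return QUERY_TEMPLATE % region_code


def parse_wkt_point(literal):
    """Return (lat, lon) from a WKT point literal, or None if malformed."""
    match = _POINT_RE.fullmatch(literal.strip())
    if match is None:
        return None
    try:
        lon, lat = float(match.group(1)), float(match.group(2))
    except ValueError:
        return None
    return lat, lon
```

Validating the code before substitution keeps stray quotes or braces out of the query text.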

To determine the LOCODE for a city, it compares the city's label with the "Name" and "NameWoDiacritics" columns, and the city's ISO 3166-2 region code with the "ISO 3166-1" and "SubDiv" columns of the LOCODE data joined by a hyphen. Only exact string matches are accepted.
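
The matching rule can be sketched as follows. This is an illustration under assumptions, not the bot's actual code: each LOCODE row is assumed to be a dict keyed by the column names listed above, and `region_code` and `match_locode_rows` are hypothetical names.

```python
def region_code(row):
    """Join country code and subdivision into an ISO 3166-2 style code."""
    return f'{row["ISO 3166-1"]}-{row["SubDiv"]}'


def match_locode_rows(rows, city_label, city_region):
    """Return the LOCODE rows that exactly match a city's label and region.

    rows: iterable of dicts keyed by the LOCODE column names (assumed shape).
    city_label: label of the Wikidata item.
    city_region: ISO 3166-2 code of the region containing the city.
    """
    matches = []
    for row in rows:
        if not row["SubDiv"]:
            continue  # no subdivision, so no region code to compare
        if region_code(row) != city_region:
            continue
        # Exact string comparison only: no case folding, no fuzzy matching.
        if city_label in (row["Name"], row["NameWoDiacritics"]):
            matches.append(row)
    return matches
```

Because the comparison is exact, a label such as "Montreal" matches a row whose "Name" is "Montréal" only through the "NameWoDiacritics" column.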

If there is a match, and the "Coordinates" field is set in the LOCODE data set, it does a quality check to make sure the location of the city defined in Wikidata is less than 10 km from the point defined in the LOCODE data set.
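
One way to implement such a distance check is the haversine great-circle formula, sketched below. The source does not say which distance formula the bot uses, so haversine is an assumption here, as are the names `haversine_km` and `passes_distance_check` and the mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def passes_distance_check(wikidata_point, locode_point, max_km=10.0):
    """Return True if the two (lat, lon) points are less than max_km apart.

    If the LOCODE row has no coordinates (locode_point is None), the check
    is skipped and the match is kept, as described above.
    """
    if locode_point is None:
        return True
    return haversine_km(*wikidata_point, *locode_point) < max_km
```

At a 10 km threshold the difference between haversine and more precise ellipsoidal formulas is small compared with the threshold itself.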

At the end of the process, it has a set of LOCODEs for cities matched on ISO 3166-2 region code and full name, and quality-checked by location.

It then sets the LOCODE property for each matched city that does not yet have one.