Wikidata:Requests for permissions/Bot/StreetmathematicianBot
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 19:35, 16 November 2021 (UTC)
StreetmathematicianBot
Operator: Streetmathematician
Task/s: Correct incorrect coordinates for locations in Mexico. These were imported incorrectly from INEGI's data.
Code:
Function details: Coordinates were mangled while data was being imported from INEGI, along some unknown path via Serbian Wikipedia (Q200386): minutes were treated as hundredths of a degree, so 21°40′ became 21.40° rather than the correct 21.6667°. This visibly distorted the distribution of points in Mexico on the Wikidata maps.
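To make the arithmetic concrete, here is a minimal Python sketch of the correct conversion next to the mangled one. The function names are illustrative only and are not part of the bot.

```python
def dm_to_decimal(degrees, minutes):
    """Correct conversion: one minute is 1/60 of a degree."""
    return degrees + minutes / 60


def mangled_decimal(degrees, minutes):
    """The faulty import: minutes read as hundredths of a degree."""
    return degrees + minutes / 100


correct = dm_to_decimal(21, 40)    # about 21.6667
wrong = mangled_decimal(21, 40)    # about 21.40
error_in_degrees = correct - wrong
print(f"correct={correct:.4f} wrong={wrong:.2f} error={error_in_degrees:.4f}")
```

The error grows with the minutes value and vanishes when the minutes are zero, which is why only some items are visibly displaced.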
There are, unfortunately, too many of these items to fix without using automation. What I've chosen to do is to use Wikibase CLI to run what's essentially a long shell script of wd invocations. I'm not sure whether this counts as a proper "bot", but I've been advised it might and that asking permission for it is the right thing to do, so I'll call it one. Note, however, that it's only intended to run once; I understand that if it is to run again with different code or parameters, a new request will be required.
This bot checks items matching all of these criteria:
- item has an INEGI identifier
- item has a single coordinate position claim
- that claim is sourced, if at all, only through Serbian Wikipedia (Q200386)
- the claim does not match the actual INEGI data
- the claim does, approximately, match the INEGI data after the minutes-as-hundredths-of-degrees transformation (the approximation is because I don't know how sub-minute coordinates were mangled)
- item has not been edited recently
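The criteria above could be expressed roughly as follows. This is a sketch in Python, not the bot's actual shell code: the `Item` record, the 0.01° tolerance, and the 30-day "not edited recently" window are all assumptions made for illustration, and coordinates are compared by magnitude to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SERBIAN_WIKIPEDIA = "Q200386"


@dataclass
class Item:  # hypothetical record, not a Wikibase data structure
    has_inegi_id: bool
    coordinate_claims: list   # [(lat, lon)] in decimal degrees
    reference_sources: set    # item ids the claim's references point to
    last_edited: datetime
    inegi_lat: tuple          # (degrees, minutes) from INEGI
    inegi_lon: tuple


def correct(dm):
    degrees, minutes = dm
    return degrees + minutes / 60


def mangled(dm):
    degrees, minutes = dm
    return degrees + minutes / 100


def near(a, b, tol):
    return abs(a - b) <= tol


def should_fix(item, now, tol=0.01, quiet=timedelta(days=30)):
    """Return True only if every criterion holds (tol and quiet are assumed)."""
    if not item.has_inegi_id or len(item.coordinate_claims) != 1:
        return False
    # Sourced, if at all, only through Serbian Wikipedia.
    if not item.reference_sources <= {SERBIAN_WIKIPEDIA}:
        return False
    lat, lon = (abs(v) for v in item.coordinate_claims[0])
    matches_inegi = (near(lat, correct(item.inegi_lat), 1e-4)
                     and near(lon, correct(item.inegi_lon), 1e-4))
    matches_mangled = (near(lat, mangled(item.inegi_lat), tol)
                       and near(lon, mangled(item.inegi_lon), tol))
    if matches_inegi or not matches_mangled:
        return False
    return now - item.last_edited >= quiet
```

The loose tolerance on the mangled comparison reflects the uncertainty about how sub-minute coordinates were handled; the tight one on the INEGI comparison only guards against touching items that are already correct.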
If all of the criteria are met, the bot:
- adds a new claim stating the correct coordinates from INEGI
  - this claim has a single reference with two properties:
    - reference URL (P854) set to the INEGI URL for displaying the relevant data
    - retrieved (P813) set to 2021-11-07, when data was retrieved from INEGI
- removes the existing claim
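As an illustration of the two steps, the edits for one item can be pictured as a small plan: add the sourced claim first, then remove the old one. The dictionary layout below is purely hypothetical and is not the Wikibase CLI input format; using P625 (coordinate location) as the claim's property is also an assumption, since the request does not name it, and the ids and URL in the usage line are placeholders.

```python
from datetime import date


def build_edit_plan(item_id, old_claim_id, lat, lon, inegi_url,
                    retrieved=date(2021, 11, 7)):
    """Return the two edits for one item, in the order they would be applied."""
    return [
        {
            "action": "add_claim",
            "item": item_id,
            "property": "P625",  # assumed: coordinate location
            "value": {"latitude": lat, "longitude": lon},
            # A single reference carrying exactly two properties.
            "references": [{"P854": inegi_url,
                            "P813": retrieved.isoformat()}],
        },
        {"action": "remove_claim", "claim": old_claim_id},
    ]


plan = build_edit_plan("Q_EXAMPLE", "Q_EXAMPLE$old-claim",
                       21.6667, -99.1333, "https://example.org/inegi-record")
```

Adding before removing means that an abort between the two steps leaves the item with an extra claim rather than with none.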
The net result of this is that the coordinates will be corrected, but only if the incorrect coordinates were previously used. Since a single coordinate claim remains, the constraints should continue to be met to the extent they previously were.
The bot uses shell scripting around the Wikibase CLI code to do its work. It is triggered manually and aborts if it encounters an error. It rate-limits itself by performing requests serially and sleeping between them.
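The control flow described here (serial requests, a pause between them, stop on the first error) looks roughly like this when sketched in Python. The actual bot is a shell script; the delay value and the placeholder commands are assumptions, and no real wd invocations are shown.

```python
import subprocess
import time


def run_serially(commands, delay_seconds=5.0, run=None, sleep=time.sleep):
    """Run commands one at a time, sleeping between them.

    Any failure raises and stops the loop, so later commands never run.
    """
    if run is None:
        # check=True raises CalledProcessError on a non-zero exit status.
        run = lambda cmd: subprocess.run(cmd, check=True)
    completed = 0
    for index, cmd in enumerate(commands):
        run(cmd)
        completed += 1
        if index < len(commands) - 1:
            sleep(delay_seconds)
    return completed
```

A call such as `run_serially([["echo", "edit 1"], ["echo", "edit 2"]], delay_seconds=1.0)` would run the two placeholder commands one second apart; the `run` and `sleep` parameters exist only so the loop can be exercised without touching a real wiki.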
If any modifications are required for this bot to achieve consensus, I'll be happy to look into that!
The code is essentially complete, and I've used it to modify the 74 affected items that had a label. The edited items look fine to me (but, again, if there are suggestions for improvement I'll be happy to investigate them).
A proper test run using the bot account can (and, IMHO, should) be scheduled (unless, of course, there is consensus on doing this another way, or not doing it at all).
Let me finish by pointing out that the data as it currently stands is grossly misleading; the bot will not make that data perfect, but the risks and costs should be contrasted with the significant benefit of not displaying coordinates dozens of miles from their actual locations.
Further improvements can be made using the INEGI data; in particular, for each locality it lists a name, an altitude, and the type of settlement. However, I initially want to focus on fixing the systematic coordinate error.
--Streetmathematician (talk) 17:30, 8 November 2021 (UTC)
- looks good to me. thanks for doing this. BrokenSegue (talk) 19:05, 8 November 2021 (UTC)
- test edits here. Does that look okay to everyone? Streetmathematician (talk) 18:51, 10 November 2021 (UTC)
- I will approve the bot in a couple of days provided no objections have been raised.--Ymblanter (talk) 19:58, 11 November 2021 (UTC)