Wikidata:Requests for permissions/Bot/Tildebot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 07:47, 4 February 2024 (UTC)
Tildebot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Mxn (talk • contribs • logs)
Task/s: Backfill AARoads Wiki article ID (P12052)
Code: https://gist.github.com/1ec5/a01abb67799b0fc2cf17e2ac320ff976
Function details: Upload about 16,000 AARoads Wiki article ID (P12052) statements based on reconciliation in OpenRefine. As the code shows, most of the items were matched automatically based on English Wikipedia article titles, since the AARoads Wiki was forked from there, but several hundred others required matching by hand. No new items will be created automatically, since I've already created them by hand. Incremental updates may follow in the future if the volume requires a bot flag (though it may be small enough to do manually using a normal account). – Minh Nguyễn 💬 03:32, 5 October 2023 (UTC)
- are you going to add references? how many of the statements did you hand validate? BrokenSegue (talk) 19:34, 5 October 2023 (UTC)
@BrokenSegue: Sure, I can add references if that would be appropriate. Should that just be stated in (P248) AARoads Wiki (Q122452106)?
OpenRefine says that it automatically matched 15,194 records for which the enwiki sitelink exactly matched the AARoads Wiki article title. I expect each of these matches to be accurate, since the AARoads Wiki only forked from the English Wikipedia a month ago or so and wouldn't have swapped article titles or anything tricky like that. A spot-check of these matches confirms that they're good matches. Most likely some of these items are already internally inconsistent due to, for example, Wikipedia editors morphing an article into a list article without updating Wikidata. I fixed several dozen of these items along the way (since I also want them to be consistent for OpenStreetMap to use), but at least the AARoads Wiki article ID (P12052) statement will be consistent with the enwiki sitelink in every case.
Following that automated match, I had it match 410 records to the best enwiki sitelink match based on string similarity and reviewed each one by hand, undoing 94 erroneous matches. Finally, I hand-matched these 94 and 280 other records. In most cases, I was able to find the correct item by searching for the item whose English label matches the AARoads Wiki article title exactly.
This process was helped by the fact that articles on U.S. and Canadian numbered roads generally have systematic names. In some cases where the AARoads Wiki has already created a new article not found on Wikipedia, or where they've split a Wikipedia article, I had to manually create an item beforehand (for example Mississippi Highway 533 (Q122939380)). Please let me know if I can provide any additional detail about this process.
– Minh Nguyễn 💬 00:57, 6 October 2023 (UTC)
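The tiered matching described in the comment above can be sketched roughly as follows. This is only an illustration, not the operator's actual workflow: the reconciliation was done in OpenRefine, whereas this sketch substitutes Python's standard `difflib` for the string-similarity step. The function name `match_titles`, the `threshold` value, and the shape of the `enwiki_sitelinks` mapping are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def match_titles(aaroads_titles, enwiki_sitelinks, threshold=0.8):
    """Split AARoads Wiki titles into exact, fuzzy, and unmatched groups.

    enwiki_sitelinks is a hypothetical dict mapping an enwiki article
    title to the ID of the item that holds that sitelink.
    """
    exact = {}      # title -> item ID (sitelink equals the title exactly)
    fuzzy = {}      # title -> (item ID, matched sitelink, similarity score)
    unmatched = []  # titles left for fully manual matching

    for title in aaroads_titles:
        # Tier 1: the enwiki sitelink is identical to the AARoads title.
        if title in enwiki_sitelinks:
            exact[title] = enwiki_sitelinks[title]
            continue

        # Tier 2: best candidate by string similarity. These are only
        # suggestions and still need review by hand.
        best_title, best_score = None, 0.0
        for candidate in enwiki_sitelinks:
            score = SequenceMatcher(None, title, candidate).ratio()
            if score > best_score:
                best_title, best_score = candidate, score

        if best_title is not None and best_score >= threshold:
            fuzzy[title] = (enwiki_sitelinks[best_title], best_title, best_score)
        else:
            # Tier 3: nothing close enough; match or create an item by hand.
            unmatched.append(title)

    return exact, fuzzy, unmatched
```

A title such as "Mississippi Highway 533" would score very close to a sitelink named "Mississippi Highway 53" in this sketch, which is the kind of erroneous suggestion that makes the hand review of the second tier necessary.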
- yeah that sounds good. maybe also with based on heuristic (P887) or matched by identifier from (P11797). BrokenSegue (talk) 03:19, 6 October 2023 (UTC)
- Good idea, I’ve added that to the schema I’m working with. Minh Nguyễn 💬 04:57, 6 October 2023 (UTC)
- @BrokenSegue: I've uploaded some test edits incorporating your feedback. Minh Nguyễn 💬 05:22, 7 October 2023 (UTC)
- looks good. BrokenSegue (talk) 20:55, 7 October 2023 (UTC)
- I don't think we should encourage people to add stated in (P248) unless the pages actually provide Wikidata IDs. Saying that the site says the site's ID is the site's ID is tautological. We also explicitly say on Help:Sources that external IDs like these don't need references. - Nikki (talk) 07:43, 13 October 2023 (UTC)
- Thanks @Nikki, that makes sense. Some articles like "Road map" do explicitly link to Wikidata entities, so this isn't a hypothetical distinction. I'm happy to exclude stated in (P248) but keep based on heuristic (P887) in the statements' references. Minh Nguyễn 💬 08:59, 16 October 2023 (UTC)
- I agree with Nikki. Yes, keeping the heuristic qualifier is preferred. So9q (talk) 10:43, 2 January 2024 (UTC)
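For readers unfamiliar with how such a statement is structured, here is a minimal sketch of the outcome agreed above: an AARoads Wiki article ID (P12052) statement whose reference carries based on heuristic (P887) and no stated in (P248). It follows the general Wikibase JSON layout for statements. The function name `build_claim` is hypothetical, the item passed as the P887 value is a caller-supplied placeholder (the thread does not say which item was used), and the form of the article ID is not specified in the thread either.

```python
def build_claim(article_id, heuristic_qid):
    """Build a P12052 statement referenced only with P887.

    article_id:    the external ID value (its exact format is an assumption).
    heuristic_qid: ID of the item describing the heuristic, e.g. "Q123";
                   which item to use is not stated in the discussion.
    """
    reference_snak = {
        "snaktype": "value",
        "property": "P887",  # based on heuristic
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {
                "entity-type": "item",
                "numeric-id": int(heuristic_qid[1:]),
                "id": heuristic_qid,
            },
        },
    }
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P12052",  # AARoads Wiki article ID
            "datavalue": {"type": "string", "value": article_id},
        },
        # Deliberately no P248 (stated in) snak, per the objection above.
        "references": [
            {"snaks": {"P887": [reference_snak]}, "snaks-order": ["P887"]}
        ],
    }
```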
- Support thanks for your clarifications and test edits 😀 So9q (talk) 10:44, 2 January 2024 (UTC)
- Support --Rschen7754 22:40, 3 February 2024 (UTC)