Wikidata:Requests for permissions/Bot/MajavahBot

MajavahBot

MajavahBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Taavi (talk • contribs • logs)

Task/s: Import version and metadata information for Python libraries from PyPI.

Code: https://gitlab.wikimedia.org/toolforge-repos/majavah-bot-wikidata/-/blob/main/majavah_wd_bot/pypi_sync/main.py

Function details: For items with PyPI project (P5568) set, imports the following data from PyPI:

Additionally, the PyPI project (P5568) value will be updated to the normalized name if it is not already in that form.
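As a rough illustration of the normalization step, the sketch below applies the name normalization rule from PEP 503 (runs of `-`, `_` and `.` collapsed to a single `-`, then lowercased). The function names `normalize_pypi_name` and `needs_update` are hypothetical and are not taken from the bot's code; this assumes the bot's "normalized name" means the PEP 503 form.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Return the PEP 503 normalized form of a PyPI project name."""
    # Collapse any run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def needs_update(current_value: str) -> bool:
    """Hypothetical check: does a stored P5568 value differ from its normalized form?"""
    return current_value != normalize_pypi_name(current_value)
```

For example, a stored value of `Flask_SQLAlchemy` would be rewritten to `flask-sqlalchemy`, while `requests` would be left untouched.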

Taavi (talk) 19:54, 11 July 2023 (UTC)

How many statements do you think this will add? Don't some packages have... lots of versions? BrokenSegue (talk) 20:05, 11 July 2023 (UTC)
Good point. There are about 200k releases it could import (for about 2k packages total, so about 90 per package on average). Taking an approach similar to github-wiki-bot and only importing the most recent releases could bring it down to 75k for the last 100 (33 per package on average) or 50k for the last 50 (22 per package on average). Taavi (talk) 20:50, 11 July 2023 (UTC)
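(Illustrative note: a per-package cap like the one described above could be sketched as follows. This is only a sketch under assumptions, not the bot's code: the function names and the `(version, upload_time)` tuple shape are hypothetical, and "last N" is assumed to mean the N most recently uploaded releases.)

```python
from datetime import datetime


def latest_releases(releases, limit):
    """Keep the `limit` most recently uploaded (version, upload_time) tuples."""
    ordered = sorted(releases, key=lambda release: release[1], reverse=True)
    return ordered[:limit]


def total_after_cap(packages, limit):
    """Count how many releases remain across all packages once the cap is applied."""
    return sum(len(latest_releases(rels, limit)) for rels in packages.values())


# Hypothetical sample data: package name -> list of (version, upload_time).
sample = {
    "example-a": [
        ("1.0", datetime(2021, 1, 1)),
        ("1.1", datetime(2022, 1, 1)),
        ("2.0", datetime(2023, 1, 1)),
    ],
    "example-b": [("0.1", datetime(2023, 6, 1))],
}
```

Packages with fewer releases than the cap are unaffected, which is why the average per package ends up well below the cap itself.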
I don't suppose major releases only is an option? BrokenSegue (talk) 20:54, 11 July 2023 (UTC)
I don't think there's a consistent enough definition for that. For example, Home Assistant (Q28957018) now uses year.month.patch-style releases, so a change in the first number isn't really meaningful.
However, I can filter out all packages generated from https://github.com/vemel/mypy_boto3_builder, as those are all very similar and not intended for direct human use anyway. That cuts the total number of versions to a third (~70k) even before applying any other per-package limits. Taavi (talk) 21:15, 11 July 2023 (UTC)
See also Wikidata:Requests for permissions/Bot/RPI2026F1Bot 5 for discussion of a previous similar task (which seems not to be active). Github-wiki-bot also imports version data from GitHub (see e.g. the history of modelscope (Q120550399)); however, you should be aware that version numbers may differ between GitHub and PyPI. --GZWDer (talk) 11:38, 12 July 2023 (UTC)
Oh yes, the RPI2026F1Bot task looks somewhat similar. I'm aware of Github-wiki-bot, but there are quite a few PyPI projects that are not hosted on GitHub. I think my code should be able to handle items with data from both sources and ensure, for example, that the two bots don't start edit warring. Taavi (talk) 17:23, 12 July 2023 (UTC)