Wikidata:Requests for permissions/Bot/BsivkoBot 2
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Not done @Bsivko: This request seems to be abandoned, please reopen it if that is not the case. Thanks. Mike Peel (talk) 21:25, 18 January 2022 (UTC)
BsivkoBot
BsivkoBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Bsivko (talk • contribs • logs)
Task/s:
- Check website URLs (Great Russian Encyclopedia Online ID (old version) (P2924), Store norske leksikon ID (P4342), RIA Novosti reference (P6081)) and deprecate invalid claims. The URLs usually still resolve, but they lead to an empty page, which disappoints users. The reason is that a significant number of claims link to nowhere. For instance, Q174044 points to
https://bigenc.ru/text/1810933
where the page shows "Здесь скоро появится статья" ("An article will appear here soon"), which in fact means that the article is absent. Example of processing.
Code:
- I use pywikibot. The following piece of code gets the property, builds the URL, requests it, reads the page text, detects that the article is absent, and switches the claim to deprecated rank if the absence keywords are found:
import pywikibot
import requests


def url_checking(title, page):
    # Resolve the Wikidata item connected to the wiki page, if any.
    try:
        item = pywikibot.ItemPage.fromPage(page)
    # Named NoPage in the Pywikibot release used here; newer releases call it NoPageError.
    except pywikibot.exceptions.NoPage:
        return
    if item:
        item.get()
    else:
        return
    if not item.claims:
        return
    # Placeholder replaced by the identifier value when building the URL.
    id_macros = "##ID##"
    cfg = [
        {
            'property': 'P2924',
            'url': 'https://bigenc.ru/text/' + id_macros,
            'empty_string': 'Здесь скоро появится статья',
            'message': 'Article in Great Russian Encyclopedia is absent'
        },
        {
            'property': 'P4342',
            'url': 'https://snl.no/' + id_macros,
            'empty_string': 'Fant ikke artikkelen',
            'message': 'Article in Store norske leksikon is absent'
        },
        {
            'property': 'P6081',
            'url': 'https://ria.ru/spravka/00000000/' + id_macros + '.html',
            'empty_string': 'Такой страницы нет на ria.ru',
            'message': 'Article in RIA Novosti is absent'
        },
    ]
    for single in cfg:
        if single['property'] in item.claims:
            for claim in item.claims[single['property']]:
                # Skip claims that are already deprecated.
                rank = claim.getRank()
                if rank == 'deprecated':
                    continue
                value = claim.getTarget()
                url = single['url'].replace(id_macros, value)
                print("url:" + url)
                r = requests.get(url=url)
                print("r.status_code:" + str(r.status_code))
                # The page loads (HTTP 200) but only shows the "no article" text.
                if r.status_code == 200:
                    if single['empty_string'] in r.text:
                        claim.changeRank('deprecated',
                                         summary=single['message'] + " (URL: '" + url + "').")
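The detection step of the function above can be separated from the network request and from the Wikidata edit, so that it can be checked without contacting the sites. This is only a minimal sketch: the names `ABSENCE_MARKERS` and `is_placeholder_page` are hypothetical and not part of the bot, and the marker strings are copied from the configuration above.

```python
# Absence markers per property, copied from the bot configuration.
# The dictionary name is an assumption made for this sketch.
ABSENCE_MARKERS = {
    "P2924": "Здесь скоро появится статья",
    "P4342": "Fant ikke artikkelen",
    "P6081": "Такой страницы нет на ria.ru",
}


def is_placeholder_page(prop: str, status_code: int, body: str) -> bool:
    """Return True only when the page loaded (HTTP 200) and its text
    contains the absence marker configured for this property."""
    marker = ABSENCE_MARKERS.get(prop)
    if marker is None or status_code != 200:
        # Unknown property, or the site answered with an error:
        # nothing can be concluded about the article here.
        return False
    return marker in body
```

With this split, a response other than HTTP 200 (for example, when the site is temporarily offline) never leads to a rank change, which matches the `r.status_code == 200` check in the function above.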
Function details:
- The bot works in the background while processing other articles in ruwiki, so it does not perform a broad scan. Also, there are not many bad URLs, so the activity is low (a few contributions per month). Bsivko (talk) 12:49, 8 May 2020 (UTC)
- Why would Great Russian Encyclopedia Online ID (old version) (P2924), Store norske leksikon ID (P4342), RIA Novosti reference (P6081) become invalid? What would be the reason for deprecation? --- Jura 13:31, 16 May 2020 (UTC)
- That's a question for the websites or for the person who added the property. I didn't investigate it; I only see the fact of the data inconsistency. Bsivko (talk) 10:27, 19 May 2020 (UTC)
- The identifier doesn't become invalid merely because the webpage is offline. --- Jura 10:37, 19 May 2020 (UTC)