Wikidata:Requests for permissions/Bot/SixTwoEightBot

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.

Approved--Ymblanter (talk) 19:51, 10 October 2019 (UTC)

SixTwoEightBot

Operator: SixTwoEight

Task/s:

Imports all English adverbs from English Wiktionary as Wikidata lexemes, except for those that already exist.

Code:

https://pastebin.com/raw/w9PMbUrZ; usage instructions at top of page (note that some entity IDs used are for the test instance of Wikidata)

Function details:

For every page in Wiktionary's Category:English_adverbs, it performs a search query containing the name of the lexeme, across all English lexemes. If any search results are found, it does nothing. Otherwise, it creates a new English adverb lexeme. The newly created lexeme has one form: the adverb itself, as adverbs don't have multiple forms.
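The check-then-create flow described above can be sketched roughly as follows. This is not the bot's actual code (see the pastebin link for that); `lexeme_exists` and `create_lexeme` are hypothetical callables standing in for the Wikidata search and creation requests, and the record built by `build_adverb_record` is an illustrative structure rather than the exact API payload. The item IDs Q1860 (English) and Q380057 (adverb) are assumed to be the production Wikidata IDs and should be verified before use.

```python
from typing import Callable, Dict, Iterable, List

ENGLISH = "Q1860"    # assumed item ID for the English language
ADVERB = "Q380057"   # assumed item ID for the lexical category "adverb"


def build_adverb_record(lemma: str) -> Dict[str, object]:
    """Build an illustrative lexeme record with exactly one form: the lemma."""
    return {
        "lemma": lemma,
        "language": ENGLISH,
        "lexicalCategory": ADVERB,
        "forms": [{"representation": lemma, "grammaticalFeatures": []}],
    }


def import_adverbs(
    titles: Iterable[str],
    lexeme_exists: Callable[[str], bool],
    create_lexeme: Callable[[Dict[str, object]], None],
) -> List[str]:
    """Create a lexeme for every title that has no search hit; return those titles."""
    created = []
    for title in titles:
        if lexeme_exists(title):
            continue  # any search result means the bot does nothing
        create_lexeme(build_adverb_record(title))
        created.append(title)
    return created
```

Passing the two network operations in as callables keeps the decision logic testable without touching Wikidata, for example by running it against an in-memory set of existing lemmas.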

The bot has very low maxlag values set, at one second for checking if a lexeme exists on Wikidata, one second for creating items, and five seconds for searching Wiktionary in batches of 500 results at a time. It waits at least 10 seconds to retry a request if it fails. Requests to check if an item exists on Wikidata are done in parallel in batches of 500, with a 0.034 second gap between requests. Lexeme creations are done sequentially.
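As a minimal sketch of the retry behaviour described above, assuming a hypothetical `send` callable that performs one API request and returns the parsed JSON response: the request carries the MediaWiki `maxlag` parameter, and any failure (a network error or an `error` key in the response) leads to a wait of at least 10 seconds before the next attempt. The attempt limit is an assumption added for the example, not something stated in the request.

```python
import time
from typing import Any, Callable, Dict


def call_with_retry(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
    maxlag: int = 1,
    min_wait: float = 10.0,
    max_attempts: int = 5,  # assumed limit, chosen for illustration
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Send a request with a maxlag value, waiting min_wait seconds after each failure."""
    request = dict(params, maxlag=maxlag, format="json")
    for attempt in range(1, max_attempts + 1):
        try:
            response = send(request)
        except OSError:
            response = None  # treat network-level failures as retryable
        if response is not None and "error" not in response:
            return response
        if attempt < max_attempts:
            sleep(min_wait)
    raise RuntimeError(f"request failed after {max_attempts} attempts")
```

Lexeme creations would call this one at a time, while the existence checks could be dispatched in batches with a short gap between requests.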

I chose to import adverbs, as they have no extra forms or conjugation to deal with, meaning they can be directly imported to Wikidata easily.

Questions:

Do I need permission from an admin to do a test batch of 50-250? (I have tested the bot on test.wikidata so far)

--SixTwoEight (talk) 00:17, 18 March 2019 (UTC)

No, please perform your test edits. Lymantria (talk) 06:24, 18 March 2019 (UTC)

Done. See https://www.wikidata.org/wiki/Special:Contributions/SixTwoEightBot --SixTwoEight (talk) 10:43, 18 March 2019 (UTC)

Hmm, mostly this seems fine, but some of them seem a bit questionable. Do we really want initialisms like 'a.s.'? Or archaic misspellings like 'aboord'? Uncommon borrowings from other languages, like 'à outrance'? Also I didn't catch any issues with this, but how are you handling UK/US spelling differences? ArthurPSmith (talk) 14:32, 18 March 2019 (UTC)

As for US/UK words, it would create two different lexemes. For example, it would create a "colorfully" lexeme and a "colourfully" lexeme. Unfortunately, Wiktionary has no category or other programmatic way to check whether a word has an alternative spelling, so the bot creates two different lexemes. --SixTwoEight (talk) 15:12, 18 March 2019 (UTC)

For ou versus o you can create a list of all words containing -ou- and from that create a list of all words where another word exists that contains -o- instead, but is otherwise identical. These words can then be treated separately. More generally, you could look for {{standard spelling of}} in the source code of the entries. The categories British English and American English could also be helpful. --Njardarlogar (talk) 08:12, 19 March 2019 (UTC)
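The -ou- versus -o- comparison suggested above could be sketched like this, assuming the adverbs are already available as a plain list of strings. The function name `find_ou_o_pairs` is made up for the example; it replaces one "ou" at a time with "o" and keeps the pair when the result is also in the list.

```python
from typing import Iterable, List, Tuple


def find_ou_o_pairs(words: Iterable[str]) -> List[Tuple[str, str]]:
    """Return sorted (ou_spelling, o_spelling) pairs where both words are in the input."""
    vocabulary = set(words)
    pairs = set()
    for word in vocabulary:
        start = word.find("ou")
        while start != -1:
            # Swap this single "ou" for "o" and leave the rest of the word unchanged.
            candidate = word[:start] + "o" + word[start + 2:]
            if candidate in vocabulary:
                pairs.add((word, candidate))
            start = word.find("ou", start + 1)
    return sorted(pairs)


print(find_ou_o_pairs(["colourfully", "colorfully", "loudly", "quickly"]))
# prints [('colourfully', 'colorfully')]
```

The resulting pairs are only candidates: two words can differ by exactly this letter and still be unrelated, so they would need to be treated separately, as suggested.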

I wrote a script to scan through Category:British English forms, and only four adverbs are in that category (anti-clockwise, chock-a-block, colourwise, haemodynamically), which I manually added. As for {{standard spelling of}}, I checked its inverse, {{alternative spelling of}}. That has 141 adverbs which aren't on Wikidata. I believe that since most of those aren't different dialects of English, but just different spellings, the proper course of action is to create two lexemes, then add derived from lexeme (P5191) on both, which I will add once all the adverbs are created. – The preceding unsigned comment was added by SixTwoEight (talk • contribs).

@ArthurPSmith, Njardarlogar: are we ready to flag this bot?--Ymblanter (talk) 15:56, 10 October 2019 (UTC)

I don't have an opinion for or against; I was just suggesting a way to treat a specific subset of lexemes earlier. --Njardarlogar (talk) 16:18, 10 October 2019 (UTC)

Sure, I'm fine with running this now. ArthurPSmith (talk) 16:26, 10 October 2019 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.