Wikidata:Requests for permissions/Bot/Pi bot 2
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 16:30, 23 April 2018 (UTC)[reply]
Pi bot 2
Pi bot
Operator: Mike Peel
Task/s: Add commons sitelinks based on query-selected values of Commons category (P373)
Code: Available on BitBucket
Function details: The code runs a SPARQL query to select candidates, then for each returned entry it checks that there is only one value for P373, that the Wikidata item doesn't already have a Commons sitelink, and that the Commons category exists and doesn't already have a sitelink, before adding the sitelink (e.g., [1]). The query is the tricky part, but @Jheald: has made various suggestions at Property_talk:P373#Any_suggestions_on_how_to_identify_values_of_P373_that_should_be_copied_to_the_commons_sitelink? that can be used there, and the idea is that we try various queries and only run them if they will have a high success rate (as Jheald has already done for artists). Thanks. --Mike Peel (talk) 21:54, 26 March 2018 (UTC)[reply]
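As a rough illustration of the checks listed in the function details, the sketch below shows one way the per-item decision could be written in Python. It is not the bot's actual code (that is on BitBucket). The function name `commons_sitelink_to_add` and all of its parameters are hypothetical, and it assumes the four facts have already been looked up elsewhere and that P373 values are stored without the `Category:` prefix.

```python
from typing import List, Optional


def commons_sitelink_to_add(
    p373_values: List[str],
    item_commons_sitelink: Optional[str],
    category_exists: bool,
    category_linked_item: Optional[str],
) -> Optional[str]:
    """Return the Commons page title to add as a sitelink, or None to skip.

    Hypothetical helper: every argument is assumed to have been fetched
    beforehand (from Wikidata and Commons) by the caller.
    """
    # Check 1: the item must have exactly one P373 value.
    if len(p373_values) != 1:
        return None
    # Check 2: the item must not already have a Commons sitelink.
    if item_commons_sitelink:
        return None
    # Check 3: the Commons category must actually exist.
    if not category_exists:
        return None
    # Check 4: the Commons category must not already be linked from an item.
    if category_linked_item:
        return None
    # Assumption: P373 holds the bare category name, so add the namespace.
    return "Category:" + p373_values[0]
```

For example, `commons_sitelink_to_add(["Teide Observatory"], None, True, None)` passes all four checks, while an item with two P373 values is skipped regardless of the other inputs.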
- The query I would suggest is this one (tinyurl.com/y9c3ogcb), adjusted for the exact form of the output that's most convenient (this one is set so you can cut and paste output straight into QuickStatements). Increase the OFFSET in steps of 10,000 from 0 to 2,000,000. User:Multichill has a variant that's a bit tighter, but this one seems to work fine. There are about 600,000 in all to add. The query seems to be picking up about 2,500 to 3,000 on each step, which is about right. (I've been using a variant today to add sitelinks for persons). There will be some that get missed on the first pass, because the SELECT is not completely deterministic. But this should get us most of the way there. Jheald (talk) 22:31, 26 March 2018 (UTC)[reply]
- I've updated the query in the code, and added offset steps. Thanks. Mike Peel (talk) 23:51, 27 March 2018 (UTC)[reply]
- @Mike Peel: In the query, I suggest you lose (or comment out) the "SERVICE wikibase:label" line -- it can slow a query right down; and this one could sometimes be quite time sensitive.
- I'd also suggest checking to see whether the query fails, and perhaps re-trying it once or twice if it does -- the time taken, even for the same query, can be quite variable.
- Also a little surprised that you're not using the P373 lookup from the query -- but I suppose, safety first for a bot.
- Note that the query should already check whether either the item or the Commons category already have a sitelink -- but I suppose, again, safety first.
- Unlike your code, the query currently is okay with there being a category-item with a P373 to the same Commonscat, so long as it isn't pointed to by a P910. If you want to be stricter than this, and insist there is no other P373 at all, then the query could be simplified to require this.
- Note that in production use, you should be expecting to see about 2500 to 3500 hits per step of 10,000 -- but I presume your maxnum=10 is for supervised testing. Jheald (talk) 00:36, 28 March 2018 (UTC)[reply]
- @Jheald: The SERVICE line is gone. So far the query has not yet failed, if it does so later then presumably reducing the step size will help. I'd already written the code for the P373 value and whether the sitelink already existed, it may now be redundant but I don't think it does any harm. Note that the check for multiple P373's in the code is for multiple values within the wikidata entry, not the same P373 across multiple items. maxnum=10 is for testing only (I'll increase the number significantly for bot runs), as is the 'Save?' prompt (which I'll remove for the bot runs). Thanks. Mike Peel (talk) 00:51, 28 March 2018 (UTC)[reply]
- @Mike Peel: Oops. I'd missed the need to check that the P373 is single-valued. Oh well, not the greatest upset if it's arbitrarily chosen one or the other a couple of times.
- I tested the OFFSET right up to 2000000 this morning, and it seemed to work fine. But I think sometimes the server can be a bit busy -- or maybe sometimes the wrong tables are in cache. At any rate, with my manual query (that has a join in the initial SELECT, which makes it that much slower), there have been maybe two or three times so far that I've had to run it a second time after it timed out the first time. Jheald (talk) 01:09, 28 March 2018 (UTC)[reply]
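The OFFSET stepping and the "re-try it once or twice" suggestion discussed above could be combined roughly as in the following Python sketch. This is only an illustration under stated assumptions, not the bot's code: `offsets`, `run_with_retry` and the `run_query` callable are hypothetical names, the caller is assumed to supply a function that runs the query for a given OFFSET and raises an exception on timeout, and the 30-second pause between attempts is an arbitrary choice.

```python
import time
from typing import Callable, List

STEP = 10_000           # step size suggested in the discussion
MAX_OFFSET = 2_000_000  # upper OFFSET suggested in the discussion


def offsets(step: int = STEP, max_offset: int = MAX_OFFSET) -> range:
    """OFFSET values from 0 up to and including max_offset."""
    return range(0, max_offset + 1, step)


def run_with_retry(
    run_query: Callable[[int], List[str]],
    offset: int,
    retries: int = 2,
    wait_seconds: float = 30.0,
) -> List[str]:
    """Call run_query(offset), retrying up to `retries` times if it raises.

    `run_query` is a caller-supplied (hypothetical) function; any exception
    it raises is treated as a failed attempt. Assumes retries >= 0.
    """
    for attempt in range(retries + 1):
        try:
            return run_query(offset)
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == retries:
                raise
            # Query time varies a lot, so pause briefly before trying again.
            time.sleep(wait_seconds)
    raise ValueError("retries must be >= 0")
```

A run would then loop over `offsets()` and call `run_with_retry(run_query, offset)` for each step, falling back to a smaller step size (as Mike Peel suggests) if a step still fails after the retries.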
- Question: Was consensus ever reached to add sitelinks to categories en masse from non-category items? --Pasleim (talk) 08:28, 28 March 2018 (UTC)[reply]
- @Pasleim: Well 680,000 have been added organically (stats) and seem to be accepted by the community.
- A user recently tried to create a category-item here, to support a Commons category sitelink. It was rapidly deleted at Rfd and the sitelink switched to the corresponding existing article-item. At the same time the guidance at WD:N was also changed.
- There's now an RfC at WDT:N to consider that change, and the wider question of notability of subjects that only have Commons categories. So far, not a single voice in the RfC has argued to oppose article-item to Commons category sitelinks (provided there is no category item here, and no gallery item at Commons).
- It's hugely beneficial to Commons to have sitelinks -- P373s are good for WDQS queries and links from Wikipedias; but from Commons, sitelinks are more useful for interwiki, for writing templates, and for SQL queries; and because of their guaranteed 1-to-1 nature.
- It now seems established we don't want to create category-type items, just for the purpose of linking to Commons. Given 1,280,000 currently-identified possible connections (ie for Commons categories matched to article-items, that are not matched to category-items) it makes no sense to have sitelinks for 680,000 but not for the other 600,000. Much better now to systematically have sitelinks for all 1,280,000 -- for easier and more efficient templates on Commons; for systematic Wikidata-driven interwiki; for effective access to Wikidata matches from SQL; and above all, to encourage more Commonscats to be matched to Wikidata (a real priority with Structured Data on the horizon). Jheald (talk) 09:00, 28 March 2018 (UTC)[reply]
- @Pasleim: There is consensus on Commons to widely use Wikidata infoboxes in categories, and that relies on Commons category sitelinks (as arbitrary access is more computationally expensive), hence why I'm working on improving them here (also see my other bot proposals that have been doing P373+commons sitelink cleanup). There's the RfC currently running here that Jheald linked to above, and although I've seen editors here confused about commons category sitelinks there doesn't seem to be opposition to them. We could do a specific RfC on the issue here, but that seems like overkill at this point. Thanks. Mike Peel (talk) 10:32, 28 March 2018 (UTC)[reply]
- I'm not completely against adding Commons category links to article items, but it should happen in a consistent way. If I interpret the bot code correctly, sitelinks are only added if there is not yet a sitelink to a Commons gallery page and no corresponding Wikipedia category exists yet. What happens if, after this bot run is done, somebody creates a Commons gallery page or a corresponding Wikipedia category? Will the bot edit be undone and the sitelink to the Commons category moved from the article item to the category item? Or will the sitelink stay and thus introduce a structure which depends on the chronological order of Commons category and Wikipedia category creation? Neither is satisfying. I would like to see a plan for how to consistently deal with all kinds of Commons pages, independent of the pre-existence of other Commons pages or Wikipedia categories. --Pasleim (talk) 10:47, 2 April 2018 (UTC)[reply]
- @Pasleim: We already have that chronological dependency, TBH. I'm aware that there are sitelinks to categories that need to be moved to a category item (see Wikidata:Database_reports/Complex_constraint_violations/P910#Items_that_link_to_a_Commons_category), and I'll probably write some bot code to sort those out in the (nearish) future. The more tricky case is where someone creates a gallery and we don't already have a category item, in which case maybe a bot could create entries like Category:Teide Observatory (Q51274107) to handle that, but that would need a wider discussion. Personally I prefer to take things a step at a time, and view those as problems for the future rather than ones that need to be solved before the existing problem of large numbers of missing sitelinks is sorted out. Thanks. Mike Peel (talk) 11:48, 2 April 2018 (UTC)[reply]
- I got confused by this sitelink/P373 thing as well: c:User talk:Mike Peel#Alternative Wikidata infoboxes?. So I support this request, I see no obvious problems with it. Alexis Jazz (talk) 15:01, 5 April 2018 (UTC)[reply]
- Any more comments? Shall I do a test run, and if so, how many edits should the bot make? Thanks. Mike Peel (talk) 15:43, 11 April 2018 (UTC)[reply]
- Around 50 edits.--Ymblanter (talk) 12:43, 12 April 2018 (UTC)[reply]
- @Ymblanter: Done. Seems to have worked fine. Thanks. Mike Peel (talk) 16:45, 12 April 2018 (UTC)[reply]
- @Jheald:, @Pasleim:, any further objections?--Ymblanter (talk) 05:22, 13 April 2018 (UTC)[reply]
- @Ymblanter: It's been a week since the last comment. Any eta on this? Thanks. Mike Peel (talk) 12:17, 20 April 2018 (UTC)[reply]
- Well, I would like to see statements that they no longer object to running the bot.--Ymblanter (talk) 12:28, 20 April 2018 (UTC)[reply]
- OK. I don't think @Jheald: was objecting (we worked together on this - he wrote the query, I wrote the bot code!), so we're just waiting for @Pasleim: then. Thanks. Mike Peel (talk) 12:48, 20 April 2018 (UTC)[reply]
- I've done about 100,000 of these with QuickStatements, a little less carefully than Mike's bot. I am all for doing this, and as soon as possible. It adds real value over at Commons -- much better interwiki links; proper underpinning of Commons infoboxes by a 1:1 wikidata link; and the ability to identify Commons categories matched to Wikidata items in Commons SQL queries. Please can the bot be allowed to get on with it. Jheald (talk) 13:55, 20 April 2018 (UTC)[reply]
- I'm still not happy at all with this request. It is not a long-term solution, nor does it have anything to do with structured data. But since many individual Commons users are already adding wrong links and Jheald did 100,000 similar edits without asking for permission, you can now complete it and run this job. --Pasleim (talk) 06:44, 21 April 2018 (UTC)[reply]
Post-approval note
- I recall the consensus was that non-category items shouldn't be site-linked to categories including wikimedia ones. Duplication of P373 in site-links neither brings any additional information into the database nor makes it more structured. Honestly, I fail to see any value in these edits :( Salmin (talk) 12:22, 18 May 2018 (UTC)[reply]
- @Salmin: That consensus has changed over the years, and as far as I'm aware it's no longer in place apart from in a few bits of documentation. There's a related discussion going on at Wikidata_talk:Notability#RfC:_Notability_and_Commons that you might be interested in. In terms of benefits to Wikidata, the sitelinks are automatically updated when pages on Commons move, and through separate bot tasks I've been using this + sitelinks that point towards category redirects on Commons to update quite a few outdated P373 values. Most of the benefit is on Commons, though, as it means that interwiki links are finally available in a huge number of additional categories, and those categories can also now have commons:Template:Wikidata Infobox added to them to provide the category's context in multiple languages. Thanks. Mike Peel (talk) 12:46, 18 May 2018 (UTC)[reply]
- Thanks for the link, I wasn't aware of this discussion. Salmin (talk) 14:50, 18 May 2018 (UTC)[reply]