Wikidata:Requests for permissions/Bot/RottenBot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 10:05, 5 October 2021 (UTC)
RottenBot
RottenBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Notsniwiast (talk • contribs • logs)
Task/s: Add Rotten Tomatoes score (percentage and average) to movies (items with Rotten Tomatoes ID (P1258) beginning with 'm/').
Code: https://github.com/winstontsai/RottenBot
Function details: Add statements for review score (P444) with qualifiers review score by (P447) (with value Rotten Tomatoes (Q105584)) and determination method (P459) (with value Tomatometer score (Q108403393) or Rotten Tomatoes average rating (Q108403540)), along with associated reference and other qualifiers. This is related to the BRFA on Wikipedia. --Winston (talk)
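The statement shape described above may be clearer as data. The following is a minimal sketch of one review score (P444) statement in the Wikibase JSON format, with the review score by (P447), point in time (P585) and determination method (P459) qualifiers. It is not taken from the RottenBot repository. The helper names are hypothetical, the normal rank is an assumption, and the example score and date are made up.

```python
from datetime import date

ROTTEN_TOMATOES = "Q105584"  # value for review score by (P447), per the task description
TOMATOMETER = "Q108403393"   # determination method (P459): Tomatometer score
RT_AVERAGE = "Q108403540"    # determination method (P459): Rotten Tomatoes average rating


def item_snak(pid, qid):
    """Value snak that points at another item (datatype wikibase-item)."""
    return {
        "snaktype": "value",
        "property": pid,
        "datatype": "wikibase-item",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        },
    }


def string_snak(pid, text):
    """Value snak with a plain string (review score (P444) uses the string datatype)."""
    return {
        "snaktype": "value",
        "property": pid,
        "datatype": "string",
        "datavalue": {"type": "string", "value": text},
    }


def day_snak(pid, day):
    """Value snak with a proleptic Gregorian date at day precision (precision 11)."""
    return {
        "snaktype": "value",
        "property": pid,
        "datatype": "time",
        "datavalue": {
            "type": "time",
            "value": {
                "time": f"+{day.isoformat()}T00:00:00Z",
                "timezone": 0,
                "before": 0,
                "after": 0,
                "precision": 11,
                "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
            },
        },
    }


def review_score_statement(score, method_qid, day):
    """One review score (P444) statement with the qualifiers named in the task."""
    return {
        "type": "statement",
        "rank": "normal",  # assumption for this sketch
        "mainsnak": string_snak("P444", score),
        "qualifiers": {
            "P447": [item_snak("P447", ROTTEN_TOMATOES)],
            "P585": [day_snak("P585", day)],
            "P459": [item_snak("P459", method_qid)],
        },
    }


# Example: the percentage and the average for the same (made-up) date.
percent = review_score_statement("92%", TOMATOMETER, date(2021, 9, 4))
average = review_score_statement("7.6/10", RT_AVERAGE, date(2021, 9, 4))
```

Each run would therefore produce a pair of statements that differ only in the main value and the determination method; the reference block is omitted here.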
- I'm having trouble testing the bot since the API is giving RottenBot an abusefilterwarning for trying to remove old Rotten Tomatoes statements as a new editor. Winston (talk) 10:13, 4 September 2021 (UTC)
- Sounds like API is doing that well. Why remove old statements? --- Jura 11:22, 4 September 2021 (UTC)
- A few movies right now already have Rotten Tomatoes score, and I felt it would be cleaner to remove those and start fresh with this bot. I can just leave those though and only add new statements.
- Current plan is to check if the most recent percent and average are up-to-date and have the same point in time (P585) value. If any of this is missing or not true, the bot will update/add statements accordingly. I'll run some test edits soon. The newly added/updated claims will have preferred rank and others will be set to normal. Winston (talk) 13:26, 4 September 2021 (UTC)
- Existing ratings shouldn't be overwritten. --- Jura 13:53, 4 September 2021 (UTC)
- Got it. The bot will only add new statements. Old statements are not touched (besides potential change of rank). Winston (talk) 13:58, 4 September 2021 (UTC)
- Ok. Thanks. Given the nature of data, I suppose we wouldn't want new ones more than once a year. BTW, I'm not sure about the rank to use. Maybe normal is sufficient. Let users select the ones they prefer. --- Jura 14:40, 6 September 2021 (UTC)
- API wouldn't let bot edit Kick-Ass (Q2201) because of "Possible vandalism by adding badwords or similar trolling words". Winston (talk) 15:13, 4 September 2021 (UTC)
- Some test edits have been completed. Winston (talk) 15:24, 4 September 2021 (UTC)
- Accidentally added claims to A Gang Story (Q593) twice. Manually fixed. Winston (talk) 15:24, 4 September 2021 (UTC)
- Thanks for doing this. Was on my to-do list. Looks good to me. BrokenSegue (talk) 22:04, 4 September 2021 (UTC)
- Could you place the qualifiers in the order of review score by, number of ratings/reviews, point in time and determination method? @Notsniwiast:--Trade (talk) 23:54, 4 September 2021 (UTC)
- Ok, I've implemented this order. Winston (talk) 05:40, 5 September 2021 (UTC)
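In the Wikibase JSON format, the display order of qualifiers is carried by a statement's "qualifiers-order" list. A minimal sketch of computing that list in the order requested above follows. The property ID for "number of ratings/reviews" is not given in this discussion, so NUM_REVIEWS_PID is a hypothetical placeholder, and sorting any remaining qualifiers by ID is an arbitrary choice for this sketch.

```python
# Requested order: review score by (P447), number of ratings/reviews,
# point in time (P585), determination method (P459).
NUM_REVIEWS_PID = "P_NUM_REVIEWS"  # hypothetical placeholder: substitute the real property ID

PREFERRED_ORDER = ["P447", NUM_REVIEWS_PID, "P585", "P459"]


def qualifiers_order(present_pids):
    """Return a "qualifiers-order" list: requested properties first, in the
    requested order, then any other qualifier properties sorted by ID."""
    present = set(present_pids)
    ordered = [pid for pid in PREFERRED_ORDER if pid in present]
    extras = sorted(present.difference(PREFERRED_ORDER))
    return ordered + extras


# Example: qualifiers gathered in arbitrary order come out in the requested order.
order = qualifiers_order(["P459", "P585", NUM_REVIEWS_PID, "P447"])
```

A statement that lacks one of the requested qualifiers keeps the relative order of the others.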
- When editing Wikipedia, the bot will also add missing Rotten Tomatoes ID (P1258) values or update incorrect Rotten Tomatoes ID (P1258) values that it encounters. For example, the Rotten Tomatoes ID (P1258) for Veronika Voss (Q703188) might be changed to 'm/veronika_voss'. Winston (talk) 07:31, 5 September 2021 (UTC)
- Comment I reviewed the test edits − looks good! Some comments:
- I don’t think there is absolute consensus on this, but I know some editors prefer to use Rotten Tomatoes ID (P1258) in the reference, rather than reference URL (P854). Wondering whether having both is good? ^_^
- I’m not too sure that this is correct usage of determination method (P459), but I’m not particularly well versed in that property. I would not have much better to offer anyhow besides has characteristic (P1552).
- Thanks for the work! Jean-Fred (talk) 10:11, 9 September 2021 (UTC)
- Oh, yeah, "reference url" shouldn't be used if there is an id property. Also, title, publisher, language, and retrieved seem redundant. Maybe determination method should be the main value? Also, the ranking mentioned earlier is a problem, as Rotten ends up being preferred over other ratings. --- Jura 12:19, 10 September 2021 (UTC)
- Oh I see. I will now be following the guideline here, which has an example reference. Originally I was following the guideline for web pages. Also, you are right about ranks; I didn't think of that. All ranks will be set to normal. Winston (talk) 14:57, 10 September 2021 (UTC)
- Help:Sources#Databases itself doesn't mention all the other properties you had been using. Also, the difference from the statements here is that review score requires the qualifier point in time (P585). Also, will the scores ever be added to items that don't have Rotten Tomatoes ID (P1258) with the same value? If not, I'd avoid title as well. --- Jura 05:46, 12 September 2021 (UTC)
- @Jura1 I'm confused. Could you list all the properties you think should be in the reference? Winston (talk) 15:56, 12 September 2021 (UTC)
- Like this --- Jura 09:16, 14 September 2021 (UTC)
- Also, in this particular case, there seems to be a difference between the title suggested by the url and the actual one at Rotten, so these edits would help too. --- Jura 09:27, 14 September 2021 (UTC)
- Okay, so the bot won't add title (P1476) or retrieved (P813) to the reference. Also, if the title at Rotten Tomatoes does not match the English label (edited to add: beyond just capitalization), then subject named as (P1810) will be added to the Rotten Tomatoes ID (P1258) statement and an alias will be added to the item if missing. Winston (talk) 19:17, 14 September 2021 (UTC)
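The title rule just described can be sketched as a small decision function. This is an illustration, not the bot's code: the function name is hypothetical, and treating an alias as "missing" only when no existing English alias matches case-insensitively is an assumption of this sketch.

```python
def plan_title_edits(rt_title, en_label, en_aliases):
    """Decide what to add when the Rotten Tomatoes title differs from the English label.

    Returns (add_subject_named_as, alias_to_add):
    - add_subject_named_as: True if a subject named as (P1810) qualifier should be
      added to the Rotten Tomatoes ID (P1258) statement;
    - alias_to_add: the RT title if it should become an English alias, else None.
    """
    # Differences in capitalization alone are ignored.
    if rt_title.casefold() == en_label.casefold():
        return False, None
    # Assumption: an alias already present in any capitalization counts as present.
    known = {alias.casefold() for alias in en_aliases}
    alias_to_add = None if rt_title.casefold() in known else rt_title
    return True, alias_to_add
```

For example, plan_title_edits("Kick-Ass", "kick-ass", []) changes nothing, while a genuinely different title yields the P1810 qualifier plus, if not already present, a new alias.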
- Why not use retrieved (P813)? Surely that’s helpful to know when the information was retrieved from the database? Sure, it is sort-of-redundant with point in time qualifier, but based on the discussion at Help_talk:Sources there seems to be at least some support for having complete references. Jean-Fred (talk) 11:28, 15 September 2021 (UTC)
- @Jean-Frédéric I'm new to Wikidata and I was going to follow the Help:Sources#Databases guideline, but @Jura1 seemed to greatly dislike redundancy. Personally I believe redundancy is fine and prefer complete sources. I think the source and the claim should be handled separately, i.e. there should be no such thing as redundancy between them. What I'm worried about is disagreements over source style getting in the way of this bot task. Winston (talk) 14:40, 15 September 2021 (UTC)
- It's not clear what information retrieved (P813) would provide, nor what discussion on Help_talk:Sources this would relate to. --- Jura 09:36, 16 September 2021 (UTC)
- retrieved (P813) would provide, like, the date at which the information was retrieved? To make the reference complete?
- I searched through the archive of Help_talk:Sources for discussions/decisions regarding the use of P813 on Databases-As-Sources and did not find any. The only mention I found (but I may not have been exhaustive in my search) argued that there was value in having "complete" references.
- @Notsniwiast: Fair concern :-) I personally do not care enough to block this bot task on these grounds, and in general I don’t believe my personal preferences should block it (because as far as I can see, that’s what this is: Help:Sources is a guideline, and it’s unclear to me how much consensus lies behind it).
- Jean-Fred (talk) 12:53, 16 September 2021 (UTC)
- Can there be a difference between the retrieval date and the date of the rating? Why did you point us to Help_talk:Sources? --- Jura 15:45, 19 September 2021 (UTC)
- Probably no difference, for sure, but that’s not the point. The point, as I said, is to have a reference complete.
- I pointed to Help_talk:Sources because it can be helpful to see what has been discussed before. In this case, I only found Help_talk:Sources#No_save_button_and_other_issues where @ArthurPSmith: opined that “In this case it sounds like retrieved (P813) isn't necessary but doesn't hurt to help fully define the reference” − hence my “there seems to be at least some support for having complete references.”
- Jean-Fred (talk) 16:22, 19 September 2021 (UTC)
- Some new test edits have been completed. Winston (talk) 20:28, 11 September 2021 (UTC)
- See comment above. --- Jura 05:46, 12 September 2021 (UTC)
- So it seems that there is general support for this bot, with some different opinions about sourcing style. As there is no right answer, I'm simply going to follow the guideline at Help:Sources#Databases since I find that adequate. So the source will have stated in (P248), Rotten Tomatoes ID (P1258), title (P1476), and retrieved (P813). Winston (talk) 18:35, 18 September 2021 (UTC)
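A minimal sketch of what a reference with these four properties could look like in the Wikibase JSON format. It is illustrative only: using Rotten Tomatoes (Q105584) as the stated in (P248) target, "en" as the title language, and the example ID, title and date are all assumptions.

```python
from datetime import date


def _snak(pid, datatype, value_type, value):
    """Generic value snak in Wikibase JSON form."""
    return {
        "snaktype": "value",
        "property": pid,
        "datatype": datatype,
        "datavalue": {"type": value_type, "value": value},
    }


def database_reference(rt_id, title, retrieved, title_lang="en"):
    """Reference with stated in, Rotten Tomatoes ID, title and retrieved."""
    snaks = {
        # stated in (P248): Rotten Tomatoes (Q105584) is assumed to be the right target
        "P248": [_snak("P248", "wikibase-item", "wikibase-entityid",
                       {"entity-type": "item", "numeric-id": 105584, "id": "Q105584"})],
        # Rotten Tomatoes ID (P1258) is an external identifier stored as a string
        "P1258": [_snak("P1258", "external-id", "string", rt_id)],
        # title (P1476) is monolingual text; the language code is an assumption
        "P1476": [_snak("P1476", "monolingualtext", "monolingualtext",
                        {"text": title, "language": title_lang})],
        # retrieved (P813): the date the data was taken from the database
        "P813": [_snak("P813", "time", "time", {
            "time": f"+{retrieved.isoformat()}T00:00:00Z",
            "timezone": 0, "before": 0, "after": 0, "precision": 11,
            "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
        })],
    }
    return {"snaks": snaks, "snaks-order": list(snaks)}


ref = database_reference("m/example_film", "Example Film", date(2021, 9, 18))
```

Under this layout, the leaner reference Jura argues for amounts to dropping the "P1476" and "P813" entries.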
- Support Jean-Fred (talk) 20:28, 18 September 2021 (UTC)
- Oppose the proposed redundancy given the absence of supporting discussion on Help_talk:Sources. Adding the qualifier subject named as (P1810) to the identifier statement is preferable over repeating its value in reference statements instead. --- Jura 15:51, 19 September 2021 (UTC)
- Well, Help:Sources#Databases literally says “Add the following qualifiers to your reference […] publication date (P577) − If no publication date is provided use retrieved (P813), the date when the data was taken from the database” (no exception indicated) (and has featured File:Source_example.png, which also says to use retrieved (P813))... Jean-Fred (talk) 16:22, 19 September 2021 (UTC)
- Lots of things while technically redundant to a human are still added to Wikidata, and provide value by increasing data linkage/structure, not just for humans but for bots. One example is that if a reference URL is provided as part of a source, most of the other claims in that source such as stated in (P248) and publisher (P123) are then "redundant". I think your approach may be a bit too minimalist. Winston (talk) 03:45, 20 September 2021 (UTC)
- Also, absence of supporting discussion does not mean something is incorrect or discouraged. Winston (talk) 05:29, 20 September 2021 (UTC)
- I know it's tempting, but contrary to the sample, the item already holds the identifier in its external links section and the date as qualifier. Not duplicating it avoids several 100k additional triples for each rating on every update. --- Jura 22:39, 26 September 2021 (UTC)
- There are about 70k items with a Rotten Tomatoes movie id. Of those, many won't actually have a score to be added. Finally, after an initial run, there will be many fewer triples added per update since the scores are relatively stable. The bot only adds statements if the numbers have changed.
- On another note, is the number of claims added to Wikidata even something to worry about in most cases? I thought the available storage for Wikidata is massive and even a million triples is like a drop in the bucket. Winston (talk) 23:14, 26 September 2021 (UTC)
- I hope Rotten Tomatoes won't be the only score. If too much gets added to an item, these become uneditable. We had to split country items because things got out of hand. --- Jura 10:05, 27 September 2021 (UTC)
- @BrokenSegue: @Trade: @Ymblanter: Looking for some others' opinions on what should be included in the reference. Jean-Fred and I prefer more complete references and would include stated in (P248), Rotten Tomatoes ID (P1258), title (P1476), and retrieved (P813). But Jura feels that title (P1476) and retrieved (P813) are redundant and should not be included. This seems to be the only point of contention with this bot. Winston (talk) 23:15, 27 September 2021 (UTC)
- @Notsniwiast: Personally I would not want to see this bot stopped over a small dispute over reference style. @Jura1: is right that it is a little redundant but I don't see that as a large problem. Being very (maybe overly) explicit is good. If it were me I would drop title/reference url and keep retrieved. But honestly, I personally just don't care and would suggest we approve this bot and not waste a ton of time talking about it. One thing I do want clarity on is how often this bot will update scores. I would suggest having a policy of not updating more frequently than once a quarter and only updating films that have been released for at least a few weeks. BrokenSegue (talk) 13:57, 28 September 2021 (UTC)
- Agree. As said above, I think explicit is better but I don't care that much; go ahead without if that's easier. --Jean-Fred (talk) 17:33, 28 September 2021 (UTC)
- The request is only for a one-time addition. I don't think more frequently than once per year would be suitable. While the redundancy may seem trivial if you look at a single statement on a given item, there are just too many ratings out there and we don't want film items to become uneditable because people started adding ratings to them. For the more general problem, see phab:T282790. --- Jura 14:11, 28 September 2021 (UTC)
- I was planning on a run every couple of months if I can remember to do it (so probably less frequently than once a quarter). In any case the bot does not touch items whose scores and review counts have not changed, so after an initial run most movies won't be updated again. Winston (talk) 22:06, 28 September 2021 (UTC)
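The refresh policies proposed above (BrokenSegue: at most once a quarter and only for films released at least a few weeks ago; Jura: at most once a year) can be expressed as a simple eligibility check. This is a hypothetical sketch: the function name, the 91-day and 4-week defaults, and treating an unknown release date as eligible are all assumptions.

```python
from datetime import date, timedelta

# Hypothetical defaults: roughly a quarter between runs, at least four weeks since release.
MIN_REFRESH_INTERVAL = timedelta(days=91)
MIN_FILM_AGE = timedelta(weeks=4)


def should_refresh(today, last_added, release_date,
                   min_interval=MIN_REFRESH_INTERVAL, min_age=MIN_FILM_AGE):
    """Return True if a film is eligible for a new round of score statements.

    last_added: date of the bot's most recent score statements, or None if none exist.
    release_date: the film's release date, or None if unknown (treated as eligible).
    """
    if release_date is not None and today - release_date < min_age:
        return False  # released too recently
    if last_added is not None and today - last_added < min_interval:
        return False  # scores were added too recently
    return True
```

Passing min_interval=timedelta(days=365) gives the once-a-year policy instead; this check would sit before, not replace, the "only add if the numbers changed" test.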
- You mean, no new statements are added, given [1]? --- Jura 05:09, 30 September 2021 (UTC)
- If the bot has previously added statements to an item, it will not add new statements to that item if the score, number of reviews, and average haven't changed. Winston (talk) 05:47, 30 September 2021 (UTC)
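The rule just stated compares three values between runs. A minimal sketch, assuming the bot can recover the values from its own most recent statements on the item; the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class ScoreSnapshot(NamedTuple):
    """The three values compared between runs (names are illustrative)."""
    tomatometer: str   # e.g. "92%"
    review_count: int
    average: str       # e.g. "7.6/10"


def needs_new_statements(last_by_bot: Optional[ScoreSnapshot],
                         scraped: ScoreSnapshot) -> bool:
    """True if the bot has no statements on the item yet, or any value changed.

    The caller is expected only to add statements, never to edit or remove
    existing ones; an item for which this returns False is skipped entirely.
    """
    return last_by_bot is None or last_by_bot != scraped
```

Because NamedTuple instances compare field by field, a change in any one of the score, the review count, or the average is enough to trigger a new pair of statements.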
- @Lymantria: can you review and possibly flag the bot? Even if I think it's preferable that format is further improved, it's already considerably better than some other edits. --- Jura 08:14, 30 September 2021 (UTC)
- @Jura1: The bot will no longer add title (P1476) to the references. Also, sometimes when the title in Rotten Tomatoes is different than the Wikidata label, it is because the title in RT is not English and so should not be added to English aliases. So I have removed the alias adding behavior (but it will still add subject named as (P1810) to the Rotten Tomatoes ID (P1258) statement in these cases). Winston (talk) 08:31, 30 September 2021 (UTC)
- Original language film titles can actually be the English language alias. --- Jura 08:34, 30 September 2021 (UTC)
- Oh ok. Aliases will be added then! What about if the Wikidata item does not have an English label (I do not know if this might occur). Can the bot simply add the title from RT as the English label even if the title may not be English? Winston (talk) 08:56, 30 September 2021 (UTC)
- Actually, I think it is allowed because Help:Label#Items_without_pages_on_Wikimedia_sites indicates that transliterations are allowed and Rotten Tomatoes always uses Latin alphabet (I believe). Winston (talk) 09:01, 30 September 2021 (UTC)
- For films, we tried to avoid filling labels just for filling labels' sake, so I'd rather not see it put as the label. --- Jura 09:21, 30 September 2021 (UTC)
- I notice that there seems to be sufficient support for the request now. I plan to approve the request in a couple of days, if no new objections will be raised. Lymantria (talk) 10:45, 30 September 2021 (UTC)
- @Lymantria Can I start running the bot? Winston (talk) 06:56, 5 October 2021 (UTC)