
Wikidata:Property proposal/subscribers

2018 subscribers


Under discussion
Description: (qualifier only) number of subscribers in 2018
Data type: Quantity
Domain: qualifier for subreddit (P3984), etc.
Allowed values: >1
Allowed units: none
Example 1: Spain (Q29) subreddit (P3984): es → 10,677
Example 2: Spain (Q29) subreddit (P3984): spain → 15,414
Example 3: Internet Relay Chat (Q73) subreddit (P3984): irc → 6,206
Example 4: Ariana Grande (Q151892) Instagram username (P2003): arianagrande → 147,023,953
Example 5: Katy Perry (Q42493) Twitter username (P2002): katyperry → 107,659,952
Number of IDs in source: Number of IDs with a number of subscribers (P3744) qualifier: Twitter 80000, Subreddit 2500
Robot and gadget jobs: change 2018 data to the above
See also: number of subscribers (P3744)

Motivation

We probably agree that the numbers are useful, but the current qualifier number of subscribers (P3744) doesn't work that well. We haven't really come up with a good solution for this yet. Maybe the following could work: we could store previous years in separate qualifiers. Above is a proposal for 2018. If we have earlier data, we should create qualifiers for the relevant years as well. There is a recent project chat discussion here; another approach was discussed at Number of twitter follower. --- Jura 06:32, 23 June 2019 (UTC)

Options:

  • (A) this proposal (see above)
  • (B) overwrite number with new data
  • (C) add several statements for the same account, each with a different point in time qualifier (or retrieved date in reference): sample, sample 2
  • (D) create an item for each account and add the number of subscribers there
    • (D0) for any number of subscribers
    • (D1) if the number of subscribers exceeds a threshold (e.g. 10,000,000 subscribers). Sample: Q42493#P2002 and Q65665844
  • (E) create a property "number of <service> users" and use point in time qualifier to differentiate them (previous proposal)
  • (F) add multiple number of subscribers (P3744) qualifiers to the same statement. This would make it difficult to link them to a date.
  • (G) develop qualifiers on qualifiers. This is currently not planned and IMHO unlikely to happen.
  • (H) store a datapage for all on Commons (not queryable, possibly not accessible even by Lua)
    • H0 annually
    • H1 by account
    • H2 by user
    • H3 periodically by service
  • (I) use number of subscribers (P3744) as property only and add data there

--- Jura 15:22, 5 July 2019 (UTC), updated 09:05, 25 July 2019 (UTC), updated 08:23, 1 October 2019 (UTC)

I crossed out the ones that I don't think are practical/currently feasible. --- Jura 04:41, 25 July 2019 (UTC)
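
To make the difference between options (A), (B) and (C) concrete, here is a minimal Python sketch. It is an illustration only: the dictionaries are a simplified, hypothetical stand-in for statements and qualifiers, not the real Wikibase JSON format. P2002, P3744 and P585 are the properties named on this page; `P_2018` is a placeholder for the proposed qualifier, which has no ID. The 2018 number is taken from Example 5, while the 2019 number and both dates are invented for the example.

```python
# Simplified, hypothetical model of statements; NOT the real Wikibase JSON.
SUBS_2018 = 107_659_952   # Example 5 on this page
SUBS_2019 = 108_000_000   # invented value, for illustration only


def option_a():
    """(A) One statement; a per-year qualifier preserves the 2018 value."""
    return [{
        "property": "P2002", "value": "katyperry",
        # "P_2018" is a placeholder name for the proposed qualifier.
        "qualifiers": {"P_2018": SUBS_2018, "P3744": SUBS_2019},
    }]


def option_b():
    """(B) One statement; the new number overwrites the old one."""
    return [{
        "property": "P2002", "value": "katyperry",
        "qualifiers": {"P3744": SUBS_2019},
    }]


def option_c():
    """(C) One statement per data point, dated with point in time (P585)."""
    return [
        {"property": "P2002", "value": "katyperry",
         "qualifiers": {"P3744": SUBS_2018, "P585": "2018-12-31"}},  # date assumed
        {"property": "P2002", "value": "katyperry",
         "qualifiers": {"P3744": SUBS_2019, "P585": "2019-07-01"}},  # date assumed
    ]


def latest_subscribers(statements):
    """Return the subscriber count of the most recently dated statement."""
    # ISO 8601 date strings sort chronologically when compared as text.
    newest = max(statements, key=lambda s: s["qualifiers"]["P585"])
    return newest["qualifiers"]["P3744"]


if __name__ == "__main__":
    print(len(option_a()), len(option_b()), len(option_c()))
    print(latest_subscribers(option_c()))
```

In this toy model, (A) and (B) keep a single statement per account, but only (A) retains the 2018 number; (C) keeps the full history at the cost of repeating the identifier value once per data point, which is the concern raised about adding P3984 several times.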

Discussion

Sounds like the usual qualifiers of qualifiers problem. Generally we resolve this by creating an item - do we have/want items for youtube channels, twitter feeds, etc? ArthurPSmith (talk) 14:53, 24 June 2019 (UTC)
I think we don't, at least not for each that currently uses P3744 --- Jura 09:35, 25 June 2019 (UTC)
My first impulse would be that we don't want to store the data for the subscribers of the reddit about spain for every year. There's already a lot of data in the items of countries, more data increases page loading time and few of the people who are looking at the item for the country care about the reddit followers. ChristianKl 09:28, 25 June 2019 (UTC)
There are >2000 reddits with that data. I suppose we could skip the country ones. Btw, this isn't meant to be limited to reddit --- Jura 09:35, 25 June 2019 (UTC)
  Question The number of subscribers can change over a year. Will this property store the number at 01.01.2018 00:00:00 or at 31.12.2018 23:59:59 or at an arbitrary other point in the year? --Pasleim (talk) 11:30, 25 June 2019 (UTC)
The last, maybe even all three, but most reddit numbers are from May. I don't expect the numbers for previous years to change. --- Jura 11:48, 25 June 2019 (UTC)
  Oppose why not just use point in time (P585)? --DannyS712 (talk) 16:22, 28 June 2019 (UTC)
  •   Question @DannyS712: I'm not sure if I understand. Can you do a sample in a sandbox, e.g. with current and 2018 number of subscribers? --- Jura 16:25, 28 June 2019 (UTC)
@Jura1: if point in time (P585) were allowed to be used as a qualifier, then Special:Permalink/970594325 wouldn't show errors, and would work --DannyS712 (talk) 16:29, 28 June 2019 (UTC)
  • Even. I don't think it's a good idea to add P3984 (value "spain") several times. (BTW, this generates also a "distinct value" constraint violation). --- Jura 16:34, 28 June 2019 (UTC)
@Jura1: That violation was only because the value "spain" is used elsewhere - Special:Permalink/970611124 generates no such error --DannyS712 (talk) 17:12, 28 June 2019 (UTC)
Good point. I corrected my comment. The option was mentioned here, but not implemented. --- Jura 21:32, 28 June 2019 (UTC)
  Comment What is range of it? You can have different number of subscribers every year, month, day, hour and probably even minute. Eurohunter (talk) 21:34, 29 June 2019 (UTC)
  • The 2018 qualifier would be for data from January 1 to December 31, but as mentioned, I don't expect more than we already have, e.g. those from May for subreddit. --- Jura 17:21, 30 June 2019 (UTC)
  • In any case, that would prevent 2018 data from being overwritten by 2019 data [1]. --- Jura 17:05, 4 July 2019 (UTC)
  •   Comment Imho "number of subscribers" should only be used when it is accompanied by "point in time", otherwise it does not make much sense. But as soon as point in time is used the number is valid. --Gereon K. (talk) 20:22, 4 July 2019 (UTC)
    • @Gereon K.: The earlier statement had the date in "retrieved [on]" [2]. Would you just overwrite all these numbers continuously? I find it a bit odd to use the qualifier "point in time" on statements where the main value is rarely subject to change. --- Jura 22:07, 4 July 2019 (UTC)
      • @Jura1: Yet the number of subscribers changes daily, so it is valid only at a certain point of time. This would be exactly the same date as "retrieved on", but "retrieved on" is meant for the source. And the source regarding the used name and the number of subscribers in this case is Twitter itself. --Gereon K. (talk) 06:12, 5 July 2019 (UTC)
        • @Gereon K.: According to your last sentence, the source is an archived version, for example in the Wayback Machine, which is the version you can access, not Twitter directly. Eurohunter (talk) 15:00, 5 July 2019 (UTC)
        • I'm aware of that. As neither is optimal, I wrote this proposal. Also, above, I listed various other options that came up. --- Jura 15:17, 5 July 2019 (UTC)
  •   Comment Well, I'm already wondering why P3744 does not already have mandatory qualifier constraint (Q21510856). Since the property changes over time, a date must be mandatory (just as the fact that there is no "number of subscriber march 2018"). My choice: (F), but improved. —Eihel (talk) 19:44, 11 July 2019 (UTC)
    • @Eihel: mandatory qualifier constraint (Q21510856) wouldn't have any effect in the above use case, as here P3744 is used as a qualifier. To be sure what you have in mind, would you do a sample for "(F), but improved" in Q4115189, preferably using data that already exists on Wikidata? It seems to me that (F) is the least desirable solution. --- Jura 09:47, 13 July 2019 (UTC)
    @Jura1: Sorry for the delay, here's my thought:
     
    Effectively, the change will be on subreddit (P3984) with mandatory qualifier constraint (Q21510856) and point in time (P585). And the Items will become like this:
     
    Properties with a single use by Item (Q19474404) must be modified at the same time. Idem for all Prop. —Eihel (talk) 03:01, 24 July 2019 (UTC)
    • @Eihel: Thanks. Actually, I had seen this more as option (C) above. After thinking more about the various options, I'd move to using a combination of D1 and either C or A. --- Jura 03:34, 24 July 2019 (UTC)
    @Jura1: Well, I'm always at the same point. I do not understand more D1: if I doubt on a Property, it is not to add a multitude of Items instead. First of all, I was thinking about the economy: having P's or Q's for 2016, 2017, 2018, 2019, I think it's not very productive, not very db. I would take the problem upstream, hence my first intervention. Any property likely to change over time must be accompanied by a temporal sense. Example: We have mass (P2067), on the other hand we do not have several Properties to say that one is in kilogram (Q11570) and the other in ounce (Q48013), etc. Your proposal is "2018 subscribers", which means that there will be "2019 subscribers" and "2020 subscribers" (etc.)? Bof! If there is no date for Twitter, reddit or Instagram, you have to put it. If there is no date for any object likely to change in time… QED —Eihel (talk) 05:24, 24 July 2019 (UTC)
    Any of the options above have also disadvantages. I don't think there is a clear advantage of (C) over (A). At some point we have to opt for one or the other. BTW, I haven't added any of May 2018 subscriber data. The date is currently set with "retrieved date". --- Jura 05:31, 24 July 2019 (UTC)
    So   Oppose for the proposal —Eihel (talk) 10:30, 25 July 2019 (UTC)
  •   Comment added another option: D1. --- Jura 12:45, 13 July 2019 (UTC)
  •   Comment I made a bot request for D1 at Wikidata:Bot_requests#Monthly number of subscribers. --- Jura 14:24, 19 July 2019 (UTC)
  •   Oppose I would favour a solution involving storing all the subscriber data as tabular data on Commons, possibly with one data page per account per year (this would prevent the data pages from exceeding the page size limit). I'm surprised this hasn't been mentioned yet – Wikidata is an inefficient method of data storage compared to tabular data (measured by both time and density), and there are so many data points (more than one per second per account, if we were to gather all the data points) that it would quickly become impractical to store the data in Wikidata even if the data were only updated once a week. The newest data point would still be obtainable with Lua. There might be problems with database rights, but I don't think this would become an issue. Jc86035 (talk) 08:53, 25 July 2019 (UTC)
    • It seems that people overwrite subscriber numbers daily for some accounts (go figure why?), but I don't think anyone suggests storing all weekly values, so the suggested approaches should currently be practical and shouldn't become impractical anytime soon.
      If there were a data page on Commons, we could obviously link that; I added it as another option above. I don't think this would be queryable anytime soon. --- Jura 09:05, 25 July 2019 (UTC)
      • I do not see the relevance of saving this kind of data more than once a month. —Eihel (talk) 10:30, 25 July 2019 (UTC)
        • @Eihel: agree. For most sites, annual should do. Thus this proposal. Anyways, I think this discussion was fairly productive. Even if this proposal finds no support, a few alternatives have been identified. --- Jura 12:15, 25 August 2019 (UTC)
          • I tell you that right now (I have to go back in history)… —Eihel (talk) 12:25, 25 August 2019 (UTC) Hello Jura, Know it: I think you're doing a great job on WD. I think it would be pretty easy to convince DannyS712 and Jc86035 because I do not understand their comment myself. For the rest, there are valid oppositions including me: I give you a valid solution where there is no need to create another property. By changing the existing properties, we arrive at the same result. Moreover, by correctly changing the property (as on my images), you have the leisure to put the number of subscriber to the month, or even the day (your initial idea). You can even ask that the country be in the constraints (ChristianKl), etc. Initially, I made a comment, because I know you have good ideas and it was ultimately that I made an objection, regretfully. In addition, if one gives a subscriber number at a time, it will be necessary to recreate a property to restore the number of subscribers at another time. Again, the idea I've provided you works ad vitam eternam — in 10 years too. Please accept my cordial greetings. —Eihel (talk) 13:11, 25 August 2019 (UTC)
        • "I do not see the relevance of saving this kind of data more than once a month" Like most things, we need to treat this on a case-by-case basis, rather than trying to encode a hard rule. Consider an account which jumps from 100 subscribers on Monday to 10,000 on Tuesday - that's worth recording. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:43, 23 September 2019 (UTC)
  •   Oppose as above. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:43, 23 September 2019 (UTC)
  •   Question Hello @Jura1:, how are you since time? Have you "taken the temperature" of the community on this proposal? According to the opinions, your proposal does not seem to lead to a Property. Open since June 23, is it possible to close it in not ready for archiving or do you have other ideas? Sincerely. —Eihel (talk) 07:45, 1 October 2019 (UTC)
    • @Eihel: Thanks for your kind remarks above. Somehow I should probably have presented it differently, as my idea was definitely not to update the data on a daily basis (I just meant to say that some do that). DannyS712's approach seems to match yours. Something in line with Jc86035's approach (H above) is Property:P4150, but it might not be efficient unless we get coherent blocks of data for a service or a user. Besides, there is no way to retrieve it through the query server anytime soon. In any case, despite the views of the participants here, users seem to mainly apply (B), possibly without much thought about what they are doing. Maybe people don't want to add the same identifier several times.
An alternative I added as (I) now would be to use the property as initially defined. In that case, adding several statements there wouldn't be much of a problem.
As we need to sort this out, maybe we will come across other solutions. Until someone tries to do a large scale update, there isn't really a hurry. --- Jura 08:23, 1 October 2019 (UTC)