Wikidata:Requests for comment/When multiple sources are cited for a fact, should IMDB be deleted as one of them when used

An editor has requested the community to provide input on "When multiple sources are cited for a fact, should IMDB be deleted as one of them when used" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

Again this goes back to User:Nikkimaria. Should she be removing IMDB as one of the multiple sources for a fact? I assume she does so because she considers it unreliable. You can see an example here. Can we also have a ruling that she does not remove any data without gaining consensus first? She just goes from source to source as we rule on each, using her own rules on what she thinks is unreliable. It creates double the work to reverse the removals as each ruling is decided. It would be easier for us to be proactive rather than react to each set of deletions she introduces. --Richard Arthur Norton (1958- ) (talk) 20:26, 1 July 2017 (UTC)


  •   Keep We accept IMDB as a source at Wikidata and multiple sourcing is good. Deletion prevents us from doing internal statistics on how many records have IMDB giving a different date from, or the same date as, other sources. --Richard Arthur Norton (1958- ) (talk) 20:26, 1 July 2017 (UTC)
  • @Richard Arthur Norton (1958- ): You are abusing the RFC process here - it's not meant as a first stop for discussion nor for every type of edit you happen to disagree with. You are also very incorrect in your premises. "We" do not consider IMDb and other similar sources to be blanket reliable; Help:Sources/Wikidata:Verifiability do not support that judgement, and for example it appears on the blacklist of the primary sources tool. Further, if there is a reliable source available for a fact, there is no reason to also use an unreliable source - that's why eg. replacing Wikipedia "imported from" statements with reliable sources is A Good Thing. Nikkimaria (talk) 22:12, 1 July 2017 (UTC)
    • It isn't abuse, it is where you go to get community consensus on rules. The answer at Findagrave was NOT to be removing the references. I am not sure why you interpret that to mean you should move on to another source and do the same thing. We have to address these one at a time because you keep moving on to new sources you find unreliable. I hope this RFC will address your behavior so I do not have to keep writing new RFCs. If you can explain to us what the "primary sources tool" function is, and why certain sites were blacklisted on it, that would be great. Otherwise you just searched for "blacklist" and "IMDB" appearing together. If you can point me to a discussion that demands we remove certain sources, I will gladly help you delete them. --Richard Arthur Norton (1958- ) (talk) 22:53, 1 July 2017 (UTC)
      • Since you like chapter and verse, see WD:RFC: "You are more than welcome to open a new RFC process to get opinions over a topic, but that should be done after a long discussion via the other channels" (my emphasis). Where have you discussed the issue of IMDB?
        • The discussion of you removing sources is on your talk page and the previous RFC. There is no need to start a new one there each time you decide that a new source is unreliable. The RFC is to get you to comply, and stop removing all sources. The previous RFC made clear that the current consensus is to not remove sources. The Nikkimaria rule of reliability does not apply to Wikidata. --Richard Arthur Norton (1958- ) (talk) 22:50, 2 July 2017 (UTC)
          • The Wikidata rule of reliability applies to Wikidata, as outlined above. Nikkimaria (talk) 23:54, 2 July 2017 (UTC)
      • You can read about the primary sources tool at WD:PST; the blacklist is a means of removing unreliable sources. I've already pointed you to other pages emphasizing the importance of using reliable ones. Nikkimaria (talk) 01:30, 2 July 2017 (UTC)
    • Nikkimaria: it's past time you 1) stopped accusing people of acting in bad faith when they challenge your unhelpful edits and 2) stopped making disruptive removals of content from this project. Also, the blacklist to which you refer in your second comment is not a Wikidata blacklist but a blacklist used by the operators of a single tool. It has no wider community consensus and is not binding on other editors. I have already made this clear, on your talk page, in a previous discussion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:38, 2 July 2017 (UTC)
I can't find the discussion on Findagrave and IMDB, but I remember the reason we did not upload them, in their entirety, was the notability of the people in the data set, not the reliability of the data. --Richard Arthur Norton (1958- ) (talk) 18:32, 4 July 2017 (UTC)
  • As far as the process goes, I agree that RfCs aren't the best vehicle. Having the debate in the project chat would work better. ChristianKl (talk) 13:53, 2 July 2017 (UTC)
  • The blacklist of the primary sources tool isn't a general blacklist against specific sources being added by Wikidata-editors but a decision against automatizing the import of data from specific sources.
    Having multiple sources is useful when some sources say the person was born in X and other sources say Y. In cases where there are 10 sources, I think there's a case to be made to not list every one and focus on the most authoritative sources, but when there are three or fewer sources I'm opposed to removing sources like IMDB. ChristianKl (talk) 13:53, 2 July 2017 (UTC)
    • Having an unreliable source that says X does not provide value: either there are reliable sources that say X in which case the unreliable ones are superfluous, or there are not in which case we shouldn't make the claim. This is especially true with regards to living people. Nikkimaria (talk) 00:00, 3 July 2017 (UTC)

So here's an idea: start a discussion to get consensus to change Help:Sources, Wikidata:Verifiability, Wikidata:Living people and all related pages to say that Wikidata does not have quality standards for sourcing. Perhaps in such a case it would be best to mark WD:V and WD:BLP as rejected as well. After all, if what is claimed about sourcing is truly the will of the community, our documentation should reflect that to avoid future disputes or misunderstandings. Nikkimaria (talk) 00:48, 3 July 2017 (UTC)

It's clear that the pages you cite variously don't say what you think they say, or don't have the authority you assume that they do (or, indeed, both of those things). For example, both Wikidata:Verifiability and Wikidata:Living people carry the notice "The proposal may still be in development, under discussion, or in the process of gathering consensus for adoption.", and in their current, flawed forms they are not supported by consensus, as recent discussion of each has made clear. Rather than doing further harm to this project, you need to listen to what people are telling you, here and elsewhere. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:37, 3 July 2017 (UTC)
Help:Sources refers to Wikidata:Verifiability for identification of reliable sources, and Wikidata:Verifiability refers to en:WP:RS. If you (speaking generally here, not you specifically) don't believe that chain is supported by consensus, and that following that chain is harmful to this project, you should seek to change it, and develop guidance that makes explicit a lack of quality standards for sourcing. Because at the moment, the guidance we are directed to explicitly says that IMDb is generally unacceptable. Nikkimaria (talk) 12:05, 3 July 2017 (UTC)
As has already been pointed out to you, Wikidata:Verifiability is NOT a consensus document in wikidata, and is marked as such. Help:Sources is also somewhat outdated relative to current practice - as a Help page it is intended to be helpful, not to define consensus policy for this wikimedia site. Both could use people working on them to make them closer to agreement with current consensus, but that consensus is also evolving. Please review the recent RFC's on verifiability and living people to get a sense of where things are here. I don't understand why you think in any way removal of sourcing information is helpful or a good use of your time when there are so many other things in wikidata needing urgent attention. ArthurPSmith (talk) 15:36, 3 July 2017 (UTC)
I consider addressing poor sourcing to be an issue in need of urgent attention, but if you do not, develop a policy that says poor sourcing is a-okay. If current consensus is that Wikidata has no quality standards for sources, our policies, guidelines and help documentation should reflect that, so that everyone from newbies to experienced editors can see where things stand. Nikkimaria (talk) 18:15, 3 July 2017 (UTC)
If I could summarize, I believe the current consensus here is: use the best (most verifiable) source you have for data entry, but provide *some* source for every piece of data if possible. There is no consensus at this time for any deletion of sources, or for any deletion of unsourced or poorly sourced statements; rather, better sources should be added when they can be. However, there are exceptions when deletion of sources is OK: when the source disagrees with the current value of a statement; or when a more recent edition of the source supports the statement (see recent RFC on updating from databases). There is no blacklist of "unreliable" sources or whitelist of reliable ones - I'm not sure we even agree that "reliable" is the right criterion for data - "verifiable" has been more consistently mooted here. It might be helpful if somebody tried to codify this via an RFC or some edits on Help:Sources etc, but I'm not sure what I've just phrased is even the general consensus at the moment. Feel free to attempt this yourself, but I suggest rereading all the relevant RFC and Project Chat discussions first. Bot approvals also often get into discussions of sourcing so you might want to look at the archive of recent bot approval requests. ArthurPSmith (talk) 19:18, 3 July 2017 (UTC)
  • This feels like a repeat of the FindAGrave RFC. Please go back to the drawing board and define the criteria you have for a good source. Just saying "if you have more sources, then always delete IMDB" does not sound as if you have given it any thought at all. You cannot define quality as "everything except IMDB" and assume that if we upload things like FindAGrave, Wikitree, etc. for the same fact they will be better. I miss some deeper thoughts on what a good source is. - Salgo60 (talk) 13:01, 3 July 2017 (UTC)
  • Keep multiple sources, including Imdb, as long as ImDB is not blacklisted on wikidata. Wikipedia rules are not to be automatically applied on wikidata. If you don't want imdb source on enwiki, then filter it out in the templates, please. --Hsarrazin (talk) 19:14, 3 July 2017 (UTC)
  • IMDB is an unreliable source for this kind of information and should not be used. Wikidata links to countless other knowledge bases, if you need to rely on IMDB you aren't trying hard enough. Gamaliel (talk) 03:16, 4 July 2017 (UTC)
How is it unreliable if it had the same date as the other source? --Richard Arthur Norton (1958- ) (talk) 03:33, 4 July 2017 (UTC)
A generally unreliable source can still occasionally be accurate. If the other source is reliable, why is that not sufficient? Gamaliel (talk) 03:41, 4 July 2017 (UTC)
If you want to discuss error rates you will have to show some error metadata, and compare it to other data sets to see what the rate is per 100,000 records. --Richard Arthur Norton (1958- ) (talk) 18:34, 4 July 2017 (UTC)
So all sources you prefer are considered accurate until someone else can provide a mass of data proving otherwise? That's not how this works. Gamaliel (talk) 19:13, 4 July 2017 (UTC)
You wrote: "That's not how this works". This isn't about what I think, or what you think. It is based on community consensus; if you want to change that consensus, you will have to persuade us with statistics, not emotions and anecdotes. Every data set contains errors, and when you correct errors you also introduce new errors. Take a college level course in information theory. We have been discussing what error markers to use to judge a data set that uses dates of birth and death, such as dying before you were born, or the number of people over 110 years of age compared to the general population. --Richard Arthur Norton (1958- ) (talk) 03:19, 5 July 2017 (UTC)
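For readers wondering what such error markers might look like in practice, here is a minimal Python sketch of the two checks named above (death before birth, lifespan over 110 years) plus a rate per 100,000 records. Everything in it is an assumption made for illustration: the sample records are invented, the names `error_markers` and `rate_per_100k` are hypothetical, and nothing here reflects real IMDB or Wikidata data or any agreed community method.

```python
from datetime import date

# Hypothetical sample records: (label, date of birth, date of death or None).
# These are invented for illustration and come from no real data set.
RECORDS = [
    ("A", date(1900, 1, 1), date(1980, 5, 2)),
    ("B", date(1950, 3, 4), date(1940, 1, 1)),  # death before birth
    ("C", date(1800, 6, 7), date(1915, 6, 8)),  # lifespan over 110 years
    ("D", date(1970, 9, 9), None),              # no death date recorded
]

MAX_AGE_YEARS = 110  # threshold taken from the "over 110 years" marker


def age_in_years(birth, death):
    """Completed years between birth and death."""
    years = death.year - birth.year
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years


def error_markers(records):
    """Return labels flagged by each marker; records without a death date are skipped."""
    died_before_born = []
    over_max_age = []
    for label, birth, death in records:
        if death is None:
            continue
        if death < birth:
            died_before_born.append(label)
        elif age_in_years(birth, death) > MAX_AGE_YEARS:
            over_max_age.append(label)
    return died_before_born, over_max_age


def rate_per_100k(flagged, total):
    """Scale a count of flagged records to a rate per 100,000 records."""
    return flagged / total * 100_000


died_before_born, over_max_age = error_markers(RECORDS)
flagged = len(died_before_born) + len(over_max_age)
print(died_before_born, over_max_age, rate_per_100k(flagged, len(RECORDS)))
```

Running the same checks over two sources' records for the same people would give comparable rates, which is the kind of comparison asked for earlier in this thread. A lifespan over 110 years is only a marker of a possible error, not proof of one.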
  •   Keep, though deprecate if it is the primary source. Do not add if there is already a verifiable source existing. Don't go around routinely removing sources "just because"; they should only be removed for a significant reason.  — billinghurst sDrewth 04:30, 8 July 2017 (UTC)
  •   Keep per billinghurst. Marcus Cyron (talk) 17:10, 14 September 2017 (UTC)
  •   Comment: "Deprecate if something better shows up" sounds like a good plan. Any bulk-removals could be done by a bot when desired, not piece-meal. – 14:19, 13 October 2017 (UTC)
  •   Keep, IMDb is a great resource if no others are available, and if others are available it could serve as "an auxiliary source" (think of "link rot"). -- 徵國單  (討論 🀄) (方孔錢 💴) 13:49, 9 November 2017 (UTC)