Wikidata talk:Notability

This is the talk page for discussing improvements to Notability.
Use the "Add topic" button in the upper righthand corner to begin a new discussion, or reply to one listed below.

For discussion of the "Exclusion criteria" section of this guideline, please see the /Exclusion criteria subpage.

Previous discussion at Wikidata:Project chat

See Wikidata:Requests_for_comment/Notability.

Notability of translated Meta-Wiki pages

Q76833338 is an item for the Spanish translation of the privacy policy on meta (item: Wikipedia:Privacy policy (Q4994089)). Should there really be a separate item for these translated pages? --Pyfisch (talk) 13:40, 17 October 2020 (UTC)

Since it is a metawiki sitelink, it is per policy a "case-by-case decision". I would recommend nominating it for deletion and elaborating a bit on why this is a special case (metawiki per criterion 1.9, but translation page per 1.1). Should be a clear case to delete the item IMO. ---MisterSynergy (talk) 22:37, 17 October 2020 (UTC)
@MisterSynergy: Additionally, we should let users know what's the meaning of "To be valid, a link must not be ... translations page, ...". --Liuxinyu970226 (talk) 01:56, 25 October 2020 (UTC)

Odd revision to #1.4

"1. It contains at least one valid sitelink to a page on [..] Wikimedia Commons."
".."
"4. Category items with a sitelink only to Wikimedia Commons are not permitted, unless either a) there is a corresponding main item which has a sitelink to a Commons gallery or b) the item is used in a Commons-related statement, such as category for pictures taken with camera (P2033)."

Somehow the revision debated at Wikidata_talk:Notability/Archive_5#Change_to_1._4._regarding_Commons can be misread as allowing the creation of items for Commons categories or Commons galleries as long as they are linked from items that are not category items. Was this intended? --- Jura 17:21, 3 December 2020 (UTC)

@Jura1: What you call a "misreading" was, I think, the case per the letter of the policy even before Ghouston's diff and that discussion. Taken together with 1.1, it essentially means that if something has a category on Commons, but is describable as a thing in its own right (i.e. not just as a category on Commons), then it can have an item here. That doesn't seem a bad thing to me, and it is allowing c:template:wikidata infobox to be used on a lot of Commons categories, powered by items here on Wikidata. Looking at the two sides of that diff, that seems to be what the policy text said both before and after the conversation you linked to. Jheald (talk) 18:19, 3 December 2020 (UTC)
Seems like the main change went through in March 2018 ([1], change by User:Mahir256). Odd that I didn't notice it back then. --- Jura 18:33, 3 December 2020 (UTC)
There have been several discussions over a longer period, starting roughly in late 2017 or early 2018; some of them were held on the Project chat. To my understanding, the reason for those discussions was that the Wikidata Infobox was deployed to Commons categories in large numbers at that time. Overall, the tendency was indeed to implement something like the status quo, so I do not think that this was unintended in any way. Yet the various problems with this approach have unfortunately not changed, and Commons problems have now become Wikidata problems. Wikimedia Commons is still pretty much dysfunctional regarding notability and deleting spurious content, there is a serious spam problem which quite a few promotional editors exploit because of that, and Commons content remains generally unsourced. —MisterSynergy (talk) 18:40, 3 December 2020 (UTC)
I don't think most of these discussions would have happened if the change in March 2018 had been noticed. If one just reads point 4 and not the beginning, the result is somewhat different. So we now end up with people creating items for anybody with a category on Commons --- Jura 18:49, 3 December 2020 (UTC)
Yeah, unfortunately. And with spammers dropping content at Commons to become Wikidata-notable. These things have explicitly been discussed, but there was nevertheless clearly more support for the now-Status Quo than opposition. I'm still not a fan of this change, but I accept it. —MisterSynergy (talk) 19:12, 3 December 2020 (UTC)
Somehow I like to think that there are still plenty of things with Commons categories that need an item but don't need this (possibly unintended) change to be notable. I don't really see why we need to include all potential problems as well to make this happen. --- Jura 19:29, 3 December 2020 (UTC)
@Jura1: Are you saying that the problems Wikidata has with Commons are coming from people continually discarding the "if and only if" clause in my rephrasing of that point at that time? I'm not sure what the change I made then has to do with the change @Ghouston: proposed more than a year later. Mahir256 (talk) 20:41, 3 December 2020 (UTC)
  • @Mahir256: Depending on how the text is read by deleting admins, it may not matter. The first line of #1 says a sitelink to Commons is sufficient. The rephrasing allows #1.4 to be (mis)understood as meaning that only sitelinks from category items are limited further. (Maybe it should be mentioned that many items are notable for other reasons and can include a single sitelink that is to a Commons category, if it exists.) --- Jura 11:55, 4 December 2020 (UTC)
Well, the benefits of creating an even broader acceptance of Wikimedia Commons categories will outweigh the negatives. Wikidata already has more pages than Wikimedia Commons (from memory, 96,000,000 vs. 87,000,000), and Wikidata has myriads of pages dedicated to celestial bodies that an astronomer has maybe viewed once, vaguely, through a computer-generated zoom on a telescope, while nationally operating companies and businesses with hundreds of franchises sometimes aren't "notable" enough for Wikidata purely because nobody on a Wikipedia has taken their free time to write about that subject. I think that Wikidata should just "bite the bullet" and accept Wikimedia Commons categories outright and face "a spampocalypse" that will require a lot of interproject work to clean up, because without Wikidata items Structured Data on Wikimedia Commons (SDC) is a lot less functional. Wikimedia Commons also faced "the Selfiepocalypse" when it opened up to mobile users, and I find random Indian and Thai selfies scattered across random categories all the time, but for each of those, hundreds of good edits are done by mobile users every day. The question isn't even necessarily whether this broader definition is needed: Wikimedia Commons is the only Wikimedia website with no notability guideline, but it still has a project scope that excludes a large number of things, as all content should be deemed to be educational.
What is “spam” is also a vague definition: uploading an image of a local business may be deemed “educational” by some but “promotional” or even outright “spam” by others, though such photographs usually tend to be categorised by the building (house number + street + human settlement) rather than by the current occupant. Products are similar: someone uploading a photograph of, let’s say, a sports car might own that vehicle but have no stake in the financial success of its producer (some may see this as “a conflict of interest”, while others wouldn't), so what constitutes “spam” is always left to the interpreter. Anyhow, the true value of accepting all Wikimedia Commons categories lies in the benefits for the whole Structured Data on Wikimedia Commons (SDC) project, which is currently developing; many editors have often complained that items for a certain category may be missing. Structured Data on Wikimedia Commons could become “an inferior version of the Commonswiki category tree” if it can only use broader representations of the subject at hand. Wikimedia Commons categories can also be more specific indexes like “underwear -> blue underwear -> blue striped underwear -> blue striped male underwear -> blue striped men's underwear” (fictional example, but such methods are often employed). If Wikidata is used for this, the person searching with SDC may only be able to look up more general terms, and then the old Wikimedia Commons category system remains superior for media discovery. Broadening the notability standards to allow ALL Wikimedia Commons categories would allow the SDC project to start expanding much faster and much more efficiently. At the end of the day, Wikidata should first and foremost exist to provide data that helps with the structural needs of Wikimedia websites, not be the arbiter of what is and isn’t allowed to be properly structured.
Of course, the current phrasing is already broad but not broad enough. -- Donald Trung/徵國單  (討論 🀄) (方孔錢 💴) 22:36, 3 December 2020 (UTC)
I think you miss the point here, completely. The Wikidata project is generally open for more content from Commons, and I think it is safe to claim that many here would love to see a prosperous SDC project. Yet, in the current situation the content from Commons is often a threat to Wikidata, for basically two reasons:
  1. Commons content is generally unsourced, and thus difficult to verify. One of the core principles of the Wikidata notability policy is that content needs to be at least borderline verifiable, in the sense that practically any user, not just insiders, can directly and without their own research understand what the item describes, and can ideally verify (parts of) the data provided in the item. This is usually achieved by links to serious external resources, or links to Wikipedia pages which typically contain sources themselves.
    The auxiliary content at Commons, such as categories, is however usually unsourced. You can drop and claim practically anything at Commons and it is highly likely that nobody will ever question your claims as long as your edits seem otherwise technically fine. This unsourced data is then imported to Wikidata in order to be displayed in the Wikidata Infobox at Commons. Commons's problem with unsourced content is now Wikidata's problem as well, and (potential) data users complain that Wikidata is a messy and unsourced pile of dubious data.
  2. The other issue is the vague definition of the "educational project scope" at Commons. Aside from the fact that everybody seems to understand it differently and the widest possible interpretation is often applied, it seems to me that most new content is in fact not even compared against this scope policy, and Commons just takes whatever is uploaded technically correctly. This is not totally surprising, given that content is unsourced and difficult to verify anyway.
You need to be aware that Wikidata cannot grow indefinitely. Even a 10 times larger Wikidata is not feasible in many ways currently, both from technological and sociological standpoints. A setting where "everything is notable" is not possible here, but the Commons policy is not far from that in fact.
We also regularly see spammers who drop some content at Commons, in order to create notable Wikidata items which they think boost their Google ranking. Apart from the fact that we cannot verify their data—it usually stems from themselves only and cannot be verified against independent sources—this is a clear abuse of community resources purely for their commercial purposes. You need to be aware that data needs to be taken care of occasionally, and each and every item creates some workload every now and then. All of this meanwhile happens on a pretty professional level: you can hire agencies which then automate the creation of promotional content here at Wikidata, including images and categories at Commons (and in fact promotional articles on a couple of websites which are then being used as "references").
Another risk is that the lack of sources for Commons content can be exploited for libellous activities. Since content can be uploaded by anyone anonymously, and barely verified, it is not difficult to deliberately add wrong content about someone else, in order to potentially cause harm to them. Although we do not see this very often, there have been cases like this, so suddenly there is another Commons problem which is now Wikidata's problem as well.
If the Wikimedia Commons project were to adopt a more sourced-content-based workflow for its auxiliary content, I would not worry about larger imports from there. In the current situation, however, it seems that Wikidata accepts quite a lot of extra risk and is expected to solve problems which got out of hand at Commons, all of that without having much influence on Commons itself. Not really a desirable situation, if you ask me. —MisterSynergy (talk) 23:50, 3 December 2020 (UTC)
I think the problems of spam are similar on Commons and Wikidata. If spam is linked between the two, the solution would be to delete it on both projects. I'm not convinced that intersection categories on Commons, like the "blue striped men's underwear" example, really need to have category items at Wikidata, and this has been discussed here in the past. That's why when I proposed my change to Commons notability a while ago, I tried to restrict it to fix only the specific problem that I had, which was that once a Commons gallery had been linked to Wikidata, there was nowhere to link the Commons category. Ghouston (talk) 00:41, 4 December 2020 (UTC)
Those are all good points. Regarding the "blue striped men's underwear" example, this is mostly for the search feature of Structured Data on Wikimedia Commons (SDC), as Wikimedia Commons has its own Google-like or Ecosia-like search engine based on SDC. As for the spamming issue, the solution might be technical: if a Wikidata item is created that links only to a Wikimedia Commons category but not to any sources, the software could no-index it for search engines by default; if a source is added, this would automatically be overridden. Spammers can continue spamming but will be unaware that their efforts won't accomplish their goal of higher rankings in search engines (Google, Microsoft Bing, Yahoo! Search, Ecosia, Lycos, etc.). Lowered notability standards won't be an issue if certain items don't appear on search engines; Wikidata can then both fulfill Wikimedia Commons' structural needs and keep its reputation. -- Donald Trung/徵國單  (討論 🀄) (方孔錢 💴) 00:29, 5 December 2020 (UTC)
I'm afraid this is not feasible. The search engines might use their regular crawler bots to feed the search results, but I am pretty sure that many data users, including Big Tech, use the SPARQL endpoint or data dumps for their products, particularly for the interesting ones. "Noindex" does not work there if it is not the regular web search engine.
Besides that, spammers are not stupid. By now, the clear majority of them manage to drop some promotional articles on a couple of websites and use them as sources here. I delete quite a lot of promotional content and occasionally have email interaction with some of those editors, who either complain about deleted items or request deletion of their content because they were not happy with it. Some are pretty open about their motivations. The scheme they describe is roughly that if you want to appear prominently on the Internet, you hire a web reputation agency; this company places (promotional) content/articles about you on several websites that sound important, and creates profiles on relevant platforms. Wikidata is just one of many, and it is used because the general assumption is that it secures them a Google Knowledge Graph entry. I would not even be surprised if many of the spammers no longer cared about the Wikidata item once they had made it to the Knowledge Graph.
In general, Wikidata's reputation is closely related to the reference situation. References are crucial for this project, in order to be able to verify information and to even justify its existence on this platform. We often talk in a context where everybody assumes that a Wikidata entry would be generally desirable, but this by no means applies to all persons and companies described here. When building a database such as this one, one should always consider the ethical responsibility that comes with collecting and publishing data about others. It might be seen as helpful and promotional by some, but also, rightly, as infringing on privacy or in some way damaging their reputation by others. If we rely on information that is already published by independent third parties elsewhere and link these sources, we are pretty much on the safe side. Otherwise we are on dangerous territory, as the project might harm people because it publishes information that should better not be published, or is abused by bad-faith actors who deliberately publish wrong or libellous information about someone else. —MisterSynergy (talk) 01:15, 5 December 2020 (UTC)

How many notability criteria are there?

Before listing three criteria, the project page reads: "An item is acceptable if and only if it fulfills at least one of these two goals, that is if it meets at least one of the criteria below"

So should an established regular user (with permissions) change this to "three goals"?  – The preceding unsigned comment was added by Dimmer (talk • contribs).

I don't think so, the previous sentence lists only two goals. Ghouston (talk) 01:27, 11 March 2021 (UTC)
I've mentioned "three" but MisterSynergy reverted it; care to clarify why three "was semantically incorrect"? Fgnievinski (talk) 14:25, 22 March 2021 (UTC)
You messed up the top-level numbering by mixing list syntax (#) with indentation syntax (:). —MisterSynergy (talk) 15:12, 22 March 2021 (UTC)