Wikidata:Project chat/Archive/2023/10

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.


Help to resolve triggering an abuse filter via OpenRefine upload

Hi, I'm trying to upload new items to Wikidata via OpenRefine. The load will create new items for each issue of a journal title. I'll then do another load of article metadata to link to each issue Qid. I've successfully done this before for other journal titles, but this time I discovered my load was failing due to abuse filter 36: non notable subpages. https://www.wikidata.org/wiki/Special:AbuseLog?wpSearchUser=HelsKRW&wpSearchPeriodStart=2023-09-18T00%3A05%3A02.000Z&wpSearchPeriodEnd=&wpSearchTitle=&wpSearchImpact=0&wpSearchAction=any&wpSearchActionTaken=&wpSearchGroup=0&wpSearchFilter= I'm unsure what it is that Wikidata isn't liking, or where to ask for help as I can't see a discussion page on the abuse filter page itself https://www.wikidata.org/wiki/Special:AbuseFilter/36 Any advice or help much appreciated. Thanks HelsKRW (talk) 10:12, 29 September 2023 (UTC)

I think what's happening here is that the filter is meant to stop the creation of items with a sitelink (to another wiki) containing specific strings, e.g. /archive, to stop people creating items for things like discussion subpages. Your upload has one URL with /archive in it, and I wonder if somehow that's triggering things?
I'm not very experienced with these, so it's not obvious to me whether the filter is only stopping sitelinks or stopping any new lines at all. Pinging @Matěj Suchánek, who was the most recent editor there and will understand it better... Andrew Gray (talk) 12:39, 29 September 2023 (UTC)
Thanks for your help @Andrew Gray I can replace the /archive URL with something else, which I've just done and am trying to upload, but OpenRefine is currently sitting at 0% complete! I'll have another go on Monday morning and report back! HelsKRW (talk) 14:55, 29 September 2023 (UTC)
I updated the edit filter not to check URL addresses. You should now be able to create those items. Sorry for the disruption. --Matěj Suchánek (talk) 08:16, 30 September 2023 (UTC)
Thank you both very much for your help - happy to report the upload has worked this morning! HelsKRW (talk) 08:41, 2 October 2023 (UTC)
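For readers wondering how a URL could trip a filter aimed at sitelinks: the sketch below contrasts a check scoped to sitelink titles with a naive check that scans every string in an edit. It is a hypothetical illustration; the subpage pattern and the function names are assumptions, and the actual rules of filter 36 are not reproduced here.

```python
import re

# Hypothetical subpage markers; the real filter's pattern list is not shown in this thread.
SUBPAGE_PATTERN = re.compile(r"/(archive|sandbox)\b", re.IGNORECASE)


def flag_sitelinks(sitelink_titles):
    """Scoped check: only look at titles of sitelinks to other wikis."""
    return [title for title in sitelink_titles if SUBPAGE_PATTERN.search(title)]


def flag_all_strings(strings):
    """Naive check: scan every string in the edit, including URL statement values."""
    return [s for s in strings if SUBPAGE_PATTERN.search(s)]


# A new journal-issue item with no sitelinks but one URL value containing "/archive".
url_values = ["https://example.org/journal/archive/vol-12"]
print(flag_sitelinks([]))            # []
print(flag_all_strings(url_values))  # ['https://example.org/journal/archive/vol-12']
print(flag_sitelinks(["User talk:Example/Archive 1"]))  # ['User talk:Example/Archive 1']
```

Narrowing what the check scans keeps it catching subpage sitelinks while letting ordinary URL values through.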

How can I best help the Wikidata project?

I'm stuck in bed for the next couple of months. How can I help? Bablebooks (talk) 01:56, 2 October 2023 (UTC)

@Bablebooks: If you're really bored, you can go to https://wikidata-game.toolforge.org/distributed/ and play one of the games. Make sure you understand the rules of Wikidata before contributing too much, though. BrokenSegue (talk) 02:17, 2 October 2023 (UTC)
If you are able to write bots, you could also take a look at https://www.wikidata.org/wiki/Wikidata:Bot_requests ChristianKl 10:49, 2 October 2023 (UTC)
I'd look up some subjects you are interested in, notice things that annoy you because they are wrong or missing, and fix them. Then use the query service to produce lists of similar things that are missing or wrong. Soon you'll be using https://quickstatements.toolforge.org/#/batch to fix mistakes en masse, and you'll be hooked! Vicarage (talk) 10:52, 2 October 2023 (UTC)
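The query-then-batch workflow can be sketched in Python. This assumes the public endpoint https://query.wikidata.org/sparql with format=json and the tab-separated QuickStatements V1 line format (item, property, value). The example query (writers lacking a date of birth) and the helper names are illustrative. The sketch only builds the request URL and the batch text and sends nothing.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: humans with occupation (P106) writer (Q36180) and no date of birth (P569).
MISSING_BIRTH_DATE = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106 wd:Q36180 .
  FILTER NOT EXISTS { ?item wdt:P569 [] }
}
LIMIT 100
"""


def wdqs_url(sparql: str) -> str:
    """Build a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def quickstatements_v1(rows) -> str:
    """Join (item, property, value) triples into tab-separated QuickStatements lines."""
    return "\n".join("\t".join(row) for row in rows)


url = wdqs_url(MISSING_BIRTH_DATE)
print(url)
print(quickstatements_v1([("Q42", "P31", "Q5")]))  # Q42<TAB>P31<TAB>Q5
```

Fetching the URL with any HTTP client (with a descriptive User-Agent) returns the result rows, which you would review before turning them into a batch.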

Are diff links borked?

Go to this URL: https://www.wikidata.org/w/index.php?title=Q33126477&diff=1985218651&oldid=1586416561 Does the title of the item read as "<span class="wikibase-title-id">(Q33126477)</span>"? I get this on multiple browsers, with and without incognito mode. Does nobody else notice this? It's been happening for a while for me. BrokenSegue (talk) 02:16, 2 October 2023 (UTC)

phab:T347578 MisterSynergy (talk) 07:28, 2 October 2023 (UTC)

Wikidata weekly summary #596

Where to gain traction for proposed technical improvements?

I know how to create Phabricator tickets, but not how to get them prioritized. I think there is some recurring event where the community votes on improvements for the developers to prioritize, but I can't find it. Does anyone know what I'm talking about? Swpb (talk) 16:29, 29 September 2023 (UTC)

M2k~dewiki (talk) 20:40, 29 September 2023 (UTC)
There's the wishlist survey. Besides that, you can also talk to Lydia Pintscher (WMDE), who's responsible for the main prioritization around Wikidata. ChristianKl 08:33, 30 September 2023 (UTC)
Thanks! Swpb (talk) 19:11, 3 October 2023 (UTC)

Need help with Q202014

Hello, I’m new to the site. How do I change the incorrect date on the inception of The Blue Marble (Q202014) to the correct one? The year is one digit off. Jarrod Baniqued (talk) 11:02, 2 October 2023 (UTC)

Click on edit, type the new value, and click publish. If you can, add a reference for the correct value: click add reference, select 'reference URL' in the left box, paste the URL of the page with the correct value into the right box, and click publish. This applies to website references; there are different methods for references that are books, other Wikidata items, or Wikipedia pages. Simonc8 (talk) 12:27, 2 October 2023 (UTC)
I presume I can only do this on desktop, since the mobile version won’t allow me to make edits to properties there. Jarrod Baniqued (talk) 03:43, 3 October 2023 (UTC)
@Jarrod Baniqued: I've gone ahead and fixed the date for you. But yes, in the future, follow Simonc8's advice. Huntster (t @ c) 23:13, 2 October 2023 (UTC)
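Behind the edit form, the statement and its reference end up in the Wikibase JSON data model roughly as below. This is a sketch for orientation, not a call to the editing API. The year and URL are placeholders (not the actual corrected date for Q202014). It assumes inception (P571), reference URL (P854), year precision (9) and the proleptic Gregorian calendar item (Q1985727).

```python
import json

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar


def inception_statement(year: int, source_url: str) -> dict:
    """Inception (P571) at year precision, referenced with reference URL (P854)."""
    time_value = {
        "time": f"+{year:04d}-00-00T00:00:00Z",  # year precision leaves month/day as 00
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 9,  # 9 = year, 10 = month, 11 = day
        "calendarmodel": GREGORIAN,
    }
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P571",
            "datavalue": {"value": time_value, "type": "time"},
        },
        "type": "statement",
        "rank": "normal",
        "references": [
            {
                "snaks": {
                    "P854": [
                        {
                            "snaktype": "value",
                            "property": "P854",
                            "datavalue": {"value": source_url, "type": "string"},
                        }
                    ]
                }
            }
        ],
    }


statement = inception_statement(2000, "https://example.org/source")  # placeholder values
print(json.dumps(statement, indent=2))
```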

Wikimedia category vs Wikimedia administration category

Should items such as Category:Articles with dead external links from October 2023 (Q122917476) be instances of Wikimedia category (Q4167836) or Wikimedia administration category (Q15647814)? If it is the second, should a bot be used to change this for a given set of items such as those containing "Category:Articles with dead external links from..." Kk.urban (talk) 04:34, 3 October 2023 (UTC)

Wikimedia administration category (Q15647814) is more precise; yes, there should be a periodical substitution, but it hasn't been engineered yet. --Epìdosis 08:01, 3 October 2023 (UTC)
Is it better to be more precise? Has this been agreed upon in general on Wikidata? Kk.urban (talk) 16:51, 3 October 2023 (UTC)
And if this is agreed upon, somebody could use a bot to change this on all items with a description starting with certain text, such as "Category:Articles with dead external links from..." I could provide a list of such text strings for the bot to operate on. Kk.urban (talk) 01:57, 4 October 2023 (UTC)
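A bot run of that kind could be sketched as follows. It selects items whose label starts with one of the supplied prefixes and that are still instances of Wikimedia category (Q4167836), then emits QuickStatements lines that add Wikimedia administration category (Q15647814) and remove the old value. Several things here are assumptions: it matches on the label rather than the description, it takes the item data as an already-fetched list, and it relies on QuickStatements V1's convention that a leading "-" removes a statement. Removing a statement also drops any references or qualifiers on it.

```python
OLD_CLASS = "Q4167836"   # Wikimedia category
NEW_CLASS = "Q15647814"  # Wikimedia administration category


def swap_class_commands(items, prefixes):
    """items: iterable of (qid, label, set_of_P31_values) already fetched from Wikidata."""
    lines = []
    for qid, label, p31_values in items:
        if OLD_CLASS not in p31_values:
            continue
        if not any(label.startswith(prefix) for prefix in prefixes):
            continue
        if NEW_CLASS not in p31_values:
            lines.append(f"{qid}\tP31\t{NEW_CLASS}")  # add the more precise class
        lines.append(f"-{qid}\tP31\t{OLD_CLASS}")     # remove the generic class
    return lines


items = [
    ("Q122917476", "Category:Articles with dead external links from October 2023", {OLD_CLASS}),
    ("Q999999999", "Category:Physics", {OLD_CLASS}),  # hypothetical non-matching item
]
commands = swap_class_commands(items, ["Category:Articles with dead external links from"])
print("\n".join(commands))
```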

Property for conveying accessibility information about venues and events

We currently use the property disabled accessibility (P2846) to convey information about the accessibility of venues and events.

The property is presently used in about 7100 instances. In most of the cases, it is used to indicate the "wheelchair" accessibility of the place or event, as the label would suggest. In about 60 cases, however, it is used to indicate whether a place/event is accessible for people with visual impairment.

I am currently cooperating with the Sitios Association, which is the result of a recent merger of several initiatives in the field of accessibility information in Switzerland. The goal of the organization is to further build up and maintain a database with accessibility information about places/venues in Switzerland. The database, which can be found at www.ginto.guide (name change pending), currently contains about 12'000 entries - a number that is expected to double over the coming two years. We are about to start (1) linking the Sitios database to Wikidata (bidirectional linking via IDs), (2) complementing Wikidata entries based on the Sitios database, (3) adding high-level accessibility information directly to the respective Wikidata items, with links to detail pages of the Sitios database.

The dimensions for high-level accessibility information we would like to insert into Wikidata comprise:

  • Wheelchair accessibility (in general; toilets; parking place)
  • Availability of a magnetic loop (for the hearing impaired)
  • Availability of an offer in sign language
  • Accessibility for people with visual impairments
  • Accessibility for people with cognitive impairments

I think it would make sense to use the same property (disabled accessibility (P2846)) to convey the different types of accessibility information. In that case, however, I would suggest changing its label to "accessibility of location or event". Are there any thoughts about this?

--Beat Estermann (talk) 10:08, 30 September 2023 (UTC)

I agree with the proposal to change the label as indicated and allow values that are subclasses of accessibility (Q555097) or inaccessibility (Q73019188). -- William Graham (talk) 14:07, 30 September 2023 (UTC)
Is there any international standard body that defines categories for this? ChristianKl14:51, 30 September 2023 (UTC)
There used to be a W3C "Linked Data for Accessibility Community Group". It was disbanded earlier this year due to prolonged inactivity. Our Swiss partner, the Sitios association (and its predecessor organizations), has been in contact with the German lead of that Community Group. We have decided to take the standardization efforts forward in a more "bazaar"-style way, feeding best practices established at a national level into Wikidata and expecting other (national) initiatives to follow suit, hopefully imitating each other and building upon each other's experiences. This might lead to a harmonization of practices over time. If divergences in best practices appear, we are ready to take it forward from there... --Beat Estermann (talk) 12:12, 3 October 2023 (UTC)
@Beat Estermann If there are national level best practices, then it might be useful to have items for the individual national classifications. I think it's desirable to have people use standards that are established somewhere.
In general, I don't think reusing disabled accessibility (P2846) is a good idea. I think it's better to create a new property for this purpose, even if its use overlaps a bit with that one. ChristianKl 13:16, 4 October 2023 (UTC)

Ranges of values

How do I use ranges of values in a statement? For example, very high frequency (Q152466) covers the range from 30 to 300 MHz. That information is currently only stored as text. The same applies to all parts of the electromagnetic spectrum (Q133139): things like ultraviolet B (Q22914692) are defined by minimum and maximum values of frequencies or wavelengths, yet we only have them as text in the description.

It seems like I could set frequency (P2144) to 30 MHz and then have applies to part (P518) specify that this number represents the minimum (Q10585806) of the whole range (and then do the same for the maximum). Is that how it's supposed to work? El Grafo (talk) 15:22, 3 October 2023 (UTC)

@El Grafo: In Wikidata the Quantity datatype intrinsically allows a range of values - for example see tritium (Q54389) where several properties such as half-life (P2114) have a value with a +- uncertainty attached. However this is slightly tricky to use and I'm not sure is the right solution for wide ranges such as you are interested in; your approach with qualifiers may be the best option here. I guess let's see what others think? ArthurPSmith (talk) 19:14, 3 October 2023 (UTC)
@ArthurPSmith mmmh, that's a different thing though, where something is characterized by a single value that has some uncertainty to it. El Grafo (talk) 07:58, 4 October 2023 (UTC)
@El Grafo: I've never done it before but perhaps two statements frequency (P2144) with a qualifier applies to part (P518) and either maximum (Q10578722) or minimum (Q10585806)? --Jahl de Vautban (talk) 08:54, 4 October 2023 (UTC)
@Jahl de Vautban Yes, that's what I was thinking about (but apparently failed to convey clearly). Thing is, I've never seen it being done that way, so I'm unsure. TBH, it feels like something that should be possible without awkwardly messing around with qualifiers at all.
There's also upper limit (P5448) and lower limit (P5447), but their use is even more awkward, requiring a dummy when used as a qualifier. It's pretty much only used this way for modelling the medical reference range (Q1626599) via the dummy reference range as qualifier (Q55426051). That feels super obscure and I don't think we want to create a dummy value for everything that may be expressed as a range of values ...? El Grafo (talk) 10:11, 4 October 2023 (UTC)
Or maybe I'm overcomplicating this: I guess we could just add lower limit (P5447) = 30 MHz directly to very high frequency (Q152466) (after adding instance of (P31) = radio frequency range (Q25110567) and/or part of (P361) = radio frequency range (Q16023911))? El Grafo (talk) 10:11, 4 October 2023 (UTC)
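To compare the options discussed, here is a small Python sketch. It reads a range from two frequency (P2144) values qualified with applies to part (P518) = minimum (Q10585806) or maximum (Q10578722), and stores it as a lower/upper pair, the same information lower limit (P5447) and upper limit (P5448) would carry. The data shapes are simplified assumptions rather than the Wikibase JSON format, and whether endpoints are inclusive is a modeling choice the sketch does not settle.

```python
from dataclasses import dataclass

MHZ = 1_000_000
MINIMUM = "Q10585806"  # minimum, as the applies to part (P518) qualifier value
MAXIMUM = "Q10578722"  # maximum


@dataclass(frozen=True)
class FrequencyRange:
    lower_hz: float  # what lower limit (P5447) would hold
    upper_hz: float  # what upper limit (P5448) would hold

    def contains(self, hz: float) -> bool:
        # Treats both endpoints as inclusive; that choice is an assumption.
        return self.lower_hz <= hz <= self.upper_hz


def range_from_qualified(statements) -> FrequencyRange:
    """statements: (value_in_hz, applies_to_part_qid) pairs for frequency (P2144)."""
    by_part = {part: value for value, part in statements}
    return FrequencyRange(by_part[MINIMUM], by_part[MAXIMUM])


vhf = range_from_qualified([(30 * MHZ, MINIMUM), (300 * MHZ, MAXIMUM)])
print(vhf.contains(100 * MHZ))  # True
print(vhf.contains(3 * MHZ))    # False
```

Either way, a data consumer has to know which convention was used. The quantity datatype's upper and lower bounds, by contrast, express uncertainty around a single amount rather than a span.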

Anyone able to spot the difference?

One is supposed to be a "former" company, but then replaced by the other, of exactly the same name? Reedy (talk) 01:17, 2 October 2023 (UTC)

@Reedy Not exactly the same. The first is s.r.o., a "limited liability company", the second is "a.s.", a stock company. Many things have the same name but are a different concept here in Wikidata. Vojtěch Dostál (talk) 08:33, 5 October 2023 (UTC)

Property for body/organization that an event occurred in?

What's the best property to use on an item for an event to specify the organization under which it occurred? Case in point: removal of Kevin McCarthy (Q122927660) ??? United States House of Representatives (Q11701). What property expresses this relationship?

location (P276) is wrong, since it's not about physical location. legislative body (P194) is for specifying responsibility for legislating for the subject. applies to jurisdiction (P1001) is for specifying where the subject has authority. court (P4884) is close, but it's specific to court cases (I suppose that's a subproperty).

73.223.72.200 21:42, 4 October 2023 (UTC)

organizer (P664) maybe? -- William Graham (talk) 22:24, 4 October 2023 (UTC)
Perhaps, but it seems subtly different. E.g., McCarthy's ouster occurred within the framework of the House's rules, but it seems weird to say that the House "organized" it. A different example: if someone resigns unilaterally, that would constitute an event within an organization, but clearly the organization didn't organize it. 73.223.72.200 22:38, 4 October 2023 (UTC)
If you consider the motion to vacate to be an election, then you could use office contested (P541) to relate it to Speaker of the United States House of Representatives (Q912994), which is already related to United States House of Representatives (Q11701). Bovlb (talk) 04:33, 5 October 2023 (UTC)

Wikidata birthday: join the festivities and prepare a birthday present!

Hello all,

As you may know, Wikidata was launched on October 29, 2012, and every year in October, we celebrate the anniversary of the project with birthday wishes, events and cake. As we will celebrate the 11th birthday of Wikidata this year, we wanted to share with you how you can get involved in this event!

1. Prepare a birthday present

Every year for the birthday, people prepare some presents for the Wikidata community. These presents can be useful, fun or interesting: a new Wikidata tool, a new WikiProject, a logo or another piece of art, a blog post, an important community discussion… They can be worked on alone or in collaboration with other people.

If you have ideas for a Wikidata birthday present, now is the perfect time to start thinking about it and finding other people to work with you! Once the present is ready to be announced, you can add it to the Presents page. If you're looking for inspiration, you can check what was done for the previous anniversaries on the Eleventh birthday page.

2. Attend the WikidataCon 2023

The WikidataCon 2023 will take place on October 28-29 in a hybrid format: people from Taiwan and the neighbouring regions can join the onsite event in Taipei, while people from all around the world will participate online. Most sessions will be broadcast and recorded, in English and Chinese.

If you would like to participate, you can register for the event, and check out the program.

3. Present your birthday gift during the WikidataCon 2023

You prepared a present for Wikidata's birthday and would like to present it to the community? You can sign up for the birthday presents lightning talks session that will take place online during the WikidataCon 2023, on Day 2.5, October 29, at 14:15 UTC.

To register for a slot, please read the instructions and add your project to this page.

Please make sure that you are registered for the WikidataCon 2023 in order to access the session.

4. Join or organize other events

You can also autonomously organize a distributed birthday event with your community or in your area: when your event is ready to be announced, please add details on the Distributed events page.

To connect with other people organizing Wikidata-related events, feel free to join the Wikidata Events Telegram group.

If you have any questions or need support, feel free to contact me or to write a message on this talk page.

Looking forward to interacting with you all around the Wikidata birthday!

Best, Lea Lacroix (WMDE) (talk) 06:34, 5 October 2023 (UTC)

UKRI external IDs all broken

Daask recently brought my attention to this on my talk page: Topic:Xqlgc2p7wdeqsbur.

It looks as though all of the Gateway to Research organisation ID (P8501) values are now broken / obsolete. My hunch after a quick look is that they've generated an entirely new set of IDs for all of the organisations in their system. I don't have much free time at the moment so thought I'd flag to project chat if anyone wants to pick up the cause of investigating / resolving. Also perhaps someone knows more about the organisation and the change to these IDs. SilentSpike (talk) 22:53, 2 October 2023 (UTC)

@SilentSpike The GTR website lists some form of major data update in September 2023, which sounds plausibly like the source of the problem. I think you're right that all the data has been regenerated, which is a bit of a mess. Andrew Gray (talk) 20:49, 5 October 2023 (UTC)

Delaware County

Delaware County (Q27844) does not have an English site link because someone on the English Wikipedia moved the page to something unrelated, the wikidata item sitelink changed, then the article was moved back with no redirect left behind so it was deleted.

Can someone add it back? 115.188.126.180 22:50, 6 October 2023 (UTC)

  Done — Martin (MSGJ · talk) 23:01, 6 October 2023 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. RVA2869 (talk) 11:49, 13 October 2023 (UTC)

Add sitelink

Can outreach:Wikimedia:Sandbox be added to project:Sandbox (Q3938), thanks 115.188.126.180 09:23, 9 October 2023 (UTC)

Also wikifunctions:Help:Contents to Help:Contents (Q914807) 115.188.126.180 09:52, 9 October 2023 (UTC)
  Done Michgrig (talk) 20:04, 9 October 2023 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. RVA2869 (talk) 11:49, 13 October 2023 (UTC)

F.C. Lumezzane VGZ A.S.D.

This football club is now called Football Club Lumezzane. Is it possible to change the name of the page? Thank you! Raven10 (talk) 15:43, 9 October 2023 (UTC)

  Done Michgrig (talk) 20:07, 9 October 2023 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. RVA2869 (talk) 11:49, 13 October 2023 (UTC)

Q120242535

seems like bad modelling to me. Turkey (Q43) did not exist in her lifetime; also, I have no idea whether Ottoman Empire (Q12560) even had a clear concept of citizenship, and whether it would have included women. - Jmabel (talk) 23:45, 12 October 2023 (UTC)

Yes, just remove it. ChristianKl 09:50, 13 October 2023 (UTC)
  Done
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. RVA2869 (talk) 11:48, 13 October 2023 (UTC)

suicide location data still present - Golden Gate Bridge (Q44440)

This edit from September 11 has instance of (P31) = suicide location (Q47408425). I found it by checking what links to the item about suicide locations: https://www.wikidata.org/wiki/Special:WhatLinksHere/Q47408425. From the previous discussion about this topic, I understand we don't want any human to cease living because of what we add here on Wikidata. Quoting ChristianKl (talk • contribs • logs):

I don't think any property should used with it. I generally prefer not to do things on Wikidata that have a good chance to lead to people dying.

The only thing that starts with "suicide hots" now on WD is Suicide hotspots, interventions, and future areas of work at a Californian university (Q90126907). Onceuponmasshysteria (talk) 07:01, 9 October 2023 (UTC)

Data Modelling Days, online gathering, November 30 - December 2, 2023

Hello all,

Following the past events dedicated to data quality and data reuse, the Wikidata team wanted to host a new gathering dedicated to data modelling.

The Data Modelling Days will take place online over three days and will host a variety of discussions, workshops and practical sessions on the topics of Wikidata ontologies, EntitySchemas, modelling issues and various other challenges.

The event is open to everyone, regardless of your experience with modelling data on Wikidata. We particularly encourage people who are working on specific topics to join the event and present their modelling challenges.

If you know people or groups who are already discussing modelling issues on Wikidata, or would have something interesting to contribute, please share this message with them!

You can find more information on the dedicated page, where you can sign up and let us know what you are interested in. You can already propose discussions and workshops on the talk page until November 19th.

If you cannot attend, don’t worry, most sessions will be recorded, notes will be taken and slides will be shared.

We are looking forward to seeing you and learning more about your modelling challenges during the Data Modelling Days! If you have any questions, feel free to reach out to me. Best, Lea Lacroix (WMDE) (talk) 14:26, 9 October 2023 (UTC)

Help requested with merging Q202241 and Q22110899

It looks like we have two Wikidata items for the pygmy hippopotamus: Q202241 and Q22110899. According to the Pygmy hippopotamus Wikipedia page, there has been a back and forth between the two scientific names (Hexaprotodon liberiensis and Choeropsis liberiensis), but they both refer to the same species of animal, so there's no reason why we should have two Wikidata items for it. I tried to merge the two with merge.js, but there's a conflict because there are two Wikispecies entries, one for each scientific name (Hexaprotodon liberiensis and Choeropsis liberiensis). Does anyone have any thoughts on how to remedy this? BaduFerreira (talk) 19:29, 5 October 2023 (UTC)

The two wikispecies pages would need to be merged first — Martin (MSGJ · talk) 21:39, 5 October 2023 (UTC)
I've left a message at the village pump — Martin (MSGJ · talk) 21:42, 5 October 2023 (UTC)
Awesome, thank you Martin! BaduFerreira (talk) 22:04, 5 October 2023 (UTC)
I believe we keep synonymous taxons as separate items. BrokenSegue (talk) 22:26, 5 October 2023 (UTC)
The two taxa have been merged on Wikispecies. I'm not sure what the status quo is in terms of keeping synonymous taxa separate or not, but Q202241 has 46 linked Wikipedia articles and Q22110899 has 14 Wikipedia articles linked to it. I can't speak to articles in other languages, but the English and Portuguese articles are definitely on the same topic and cover the same scope. I believe the correct course of action is to merge the two. BaduFerreira (talk) 00:54, 6 October 2023 (UTC)
I have merged the two items, please revert if this is incorrect. But it seems we should not have two different items for the same species. — Martin (MSGJ · talk) 11:40, 6 October 2023 (UTC)
In fact the convention is to keep the items separate for each taxon name, with the corresponding data and identifiers in each item, so the information is not "lost". The relation between the items is made with different properties (taxon synonym (P1420), original combination (P1403), basionym (P566), replaced synonym (for nom. nov.) (P694)). Korg (talk) 12:19, 6 October 2023 (UTC)
Oh okay, is there any way that Wikipedia articles can point to two Wikidata items, then? This all started when I saw that the Portuguese article supposedly didn't have an English version, but it just pointed to a different Wikidata item. BaduFerreira (talk) 13:32, 6 October 2023 (UTC)
Wikidata items can only point to one Wikipedia article and vice versa, but we can do something similar with sitelinks to redirects. We can have a link to the main article on one item, and a link to the corresponding redirect on the other. DoublePendulumAttractor (talk) 16:52, 6 October 2023 (UTC)
On some wikis, you can use Template:Interwiki extra (Q21286810) to annotate the article on the client project with additional Wikidata items from which to draw interwiki links. Bovlb (talk) 20:28, 8 October 2023 (UTC)
This edit shows how this can be done when there are Wikipedia articles and Wikimedia entries with different taxon names that all refer to the same taxon (moving sitelinks was done using the "Move" gadget at Special:Preferences#mw-prefsection-gadgets). Thus, the different articles and entries are interconnected in one item. The fact that not all entries have the same taxon name is an indication that some of them need to be updated. Korg (talk) 20:49, 9 October 2023 (UTC)

Request for input on Wikidata:List of properties’s poor condition

I am currently refactoring plenty of Pasleim’s User:PLbot/User:DeltaBot scripts, and one of them is maintaining property lists at Wikidata:List of properties with daily updates. Turns out this job in particular is not in good shape, and I am here to seek input what to do with it.

The five-columned sections on Wikidata:List of properties are currently identified by this query (column "labels"), and there is a complex structure of subpages that loosely and incompletely resembles the subclass tree of Wikidata property (Q18616576) that currently consists of 628 items. For those who are interested, the pywikibot script that maintains these lists is at GitHub.

Anyways, I would claim that this "overview" provides zero value for anyone in its current form, so it should either be retired or replaced by something else. It is relatively expensive to compile at the same time, i.e. the bot uses quite some resources to update these seemingly useless lists every day.

But what do we need instead? I am happy to hear your suggestions. —MisterSynergy (talk) 20:12, 7 October 2023 (UTC)

I usually point people at the Wikidata Property Explorer https://prop-explorer.toolforge.org/ Bovlb (talk) 20:09, 8 October 2023 (UTC)
Thank you.
Given there was no other reaction to my posting, I have now removed the bot-maintained property lists from Wikidata:List of properties. The tools advertised on that page are superior by every measure anyways. —MisterSynergy (talk) 22:56, 9 October 2023 (UTC)

Wikidata weekly summary #597

Opportunities open for the Affiliations Committee, Ombuds commission, and the Case Review Committee

Hi everyone! The Affiliations Committee (AffCom), Ombuds commission (OC), and the Case Review Committee (CRC) are looking for new members. These volunteer groups provide important structural and oversight support for the community and movement. People are encouraged to nominate themselves or encourage others they feel would contribute to these groups to apply. There is more information about the roles of the groups, the skills needed, and the opportunity to apply on the Meta-wiki page.

On behalf of the Committee Support team,

Elements corresponding Wikipedia articles - enrich WD item titles and descriptions based on WP articles

When I jump from a Wikipedia article that has a lot of interwikis to the corresponding Wikidata item, I notice the following: although the article titles exist in plenty of languages, they aren't copied into the Wikidata item's labels in the corresponding languages.

The item's descriptions are also empty in these languages. In the first paragraph of the article, I can easily see a very short description of the subject that could be copied into the Wikidata item description.

What do you think about it? Is such copying worth doing? Is there any conceptual problem with such an approach? Is there a project that helps to fill out such Wikidata items automatically or semi-automatically? Nikolay Komarov (talk) 16:59, 9 October 2023 (UTC)

Also see
M2k~dewiki (talk) 21:10, 9 October 2023 (UTC)
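A semi-automatic approach could start by proposing labels from sitelink titles for languages that have no label yet. The sketch below makes several assumptions. The sitelinks and labels are already fetched as dictionaries shaped like Wikidata's entity JSON. The site-to-language table is deliberately small and hypothetical, since real site codes do not always equal label language codes. A trailing parenthetical disambiguator is stripped. Its output is a set of suggestions for a human to review, and it does not attempt descriptions.

```python
import re

# Hypothetical, deliberately small mapping; many sites need special handling.
SITE_TO_LANGUAGE = {"enwiki": "en", "dewiki": "de", "ptwiki": "pt", "svwiki": "sv"}

# Trailing disambiguator such as " (planet)"; a heuristic, not a rule.
DISAMBIGUATOR = re.compile(r"\s*\([^)]*\)$")


def suggest_labels(sitelinks: dict, labels: dict) -> dict:
    """Return {language: suggested_label} for languages with a sitelink but no label."""
    suggestions = {}
    for site, link in sitelinks.items():
        language = SITE_TO_LANGUAGE.get(site)
        if language is None or language in labels:
            continue  # unknown site, or a label already exists
        suggestions[language] = DISAMBIGUATOR.sub("", link["title"])
    return suggestions


sitelinks = {
    "enwiki": {"site": "enwiki", "title": "Mercury (planet)"},
    "dewiki": {"site": "dewiki", "title": "Merkur (Planet)"},
    "frwiki": {"site": "frwiki", "title": "Mercure (planète)"},
}
labels = {"de": {"language": "de", "value": "Merkur"}}
print(suggest_labels(sitelinks, labels))  # {'en': 'Mercury'}
```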

Acronyms as labels

Hi everyone! I noticed this sentence on the Help:Label/fr page: "Les acronymes ne peuvent être utilisés pour les libellés, ils doivent figurer dans les alias." ("Acronyms cannot be used as labels; they must be listed in the aliases.") No guidance about the spelled-out vs. acronym form is found on the same page in English (Help:Label). Who is right? Should we consider the sentence on the French page a rule specific to labels in French?

Thank you for your help! Dipsode87 (talk) 17:19, 2 October 2023 (UTC)

I think that's good general guidance even in English. BrokenSegue (talk) 17:27, 2 October 2023 (UTC)
Thank you for your answer @BrokenSegue. I should have looked to the aliases help page that indicates explicitly that aliases are intended for acronyms and abbreviations. But the page specifies that it applies only to items, not to properties, and I should have mentioned that my question concerns properties (in particular, Property:P356 has an acronym as the label, while Property:P8091, another identifier schema, has a developed form). I found my answer in other help pages. I sum up my findings below, in case someone else would wonder.
For items, acronyms should be mentioned in aliases (Help:Aliases).  For properties, there is no similar rule, but other rules apply: 1)  "Unlike items, property labels must be unique." and 2) " Label, description, and aliases of a property are first discussed by the community during the property proposal process. Major changes to a property should be discussed on its talk page first." (Help:Properties)
Apologies for asking before having read the whole documentation! Dipsode87 (talk) 09:37, 3 October 2023 (UTC)
But there are definitely items for entities where the acronym or initialism has become the effective (or even legal) name: IBM (Q37156), laser (Q38867). And I don't see any problem with RCMP Technical Security Branch (Q7276109) (at least the English label) even though we are correct to spell out Royal Canadian Mounted Police (Q335175). Well, I do see one problem: surely those items should have some relation to one another. Also, interesting case: National Organization of Spanish Blind People (Q1750397), labeled "ONCE" in several languages, spelled out in others. That's an interesting case, because I bet a lot of people have heard of it (or of the eponymous former cycling team ONCE (Q2790516), which I see has no expressed relation to any of its historic eponyms) as "ONCE" with no idea what it stands for, because they are probably better known for running a lottery than for the work they do! - Jmabel (talk) 05:05, 10 October 2023 (UTC)

Cisgender and transgender as subclasses of person

cisgender (Q1093205) is a subclass of person (Q215627), but transgender (Q189125) is not a subclass of person (Q215627) (unless there's some path I'm missing between these items). Does this make sense? In that case, is "cisgender" used to refer to both "the concept of cisgender" and "the group of people who are cisgender"? If it only applies to the first, then I don't think it should be a subclass of person, but in either case I'm pretty sure that either both cisgender and transgender, or neither, should be subclasses of person. Kk.urban (talk) 04:30, 3 October 2023 (UTC)

I would agree that they should be parallel, and I'd also guess that at the moment each of these items is a bit of a conflation. - Jmabel (talk) 04:47, 10 October 2023 (UTC)

I need help

Hello fellow administrators! I find that Almohad Caliphate (Q199688) and Almohads (Q10406562) have the same content, namely the Almohads, so I am asking to merge them to avoid confusion. Regards Badak Jawa (talk) 09:31, 10 October 2023 (UTC)

One is an item about the dynasty and the other about the country. That's not the same even if Wikipedia articles are similar. ChristianKl 07:51, 11 October 2023 (UTC)
@ChristianKl Which one is the dynasty and which one is the country? Badak Jawa (talk) 10:51, 11 October 2023 (UTC)
@Badak Jawa: Some Wikipedias have merged the two subjects, eg. en:Almohads is a redirect to en:Almohad Caliphate (and German Wikipedia has the opposite redirection), but some other languages have separate articles for the subjects eg. sv:Almohadkalifatet and sv:Almohader. MKFI (talk) 13:14, 11 October 2023 (UTC)
@MKFI So, do you think the Wikidata items can be merged? Badak Jawa (talk) 13:22, 11 October 2023 (UTC)
No, they should not be. MKFI (talk) 14:26, 11 October 2023 (UTC)
If that's the question you are asking, it's quite obvious that you lack the understanding of Wikidata needed to know what gets merged. You can tell which is which from the instance of (P31) values of the items.
Just because Wikipedia articles are similar does not mean we merge items. ChristianKl 14:59, 11 October 2023 (UTC)

I take it I have not modeled spouse correctly here for someone whose spouse has no Wikidata item. Could someone please fix that for me? Thanks in advance. - Jmabel (talk) 23:17, 10 October 2023 (UTC)

I replaced name with object named as, since the object is a spouse without an item. -- William Graham (talk) 01:49, 11 October 2023 (UTC)

How do I use item Historical person

There is historical person (Q110897910). How should this be modeled for a person who "probably" was a living human being, recorded in e.g. a saga (Q180494) or chronicle (Q185363) but with no "official" records elsewhere? Should historical person be used instead of instance of (P31) human (Q5), or both? For example, should Thorir Hund (Q2530678) have instance of (P31) historical person (Q110897910) only? Pmt (talk) 14:09, 9 October 2023 (UTC)

I'm not really sure what this item is about, i.e. what its instances in Wikidata should be. Taking its edit history into consideration, its main purpose seems to be to serve as a superclass for Ogham Person Concept (Q110897921). This class has a couple of instances, e.g. acto (Ogham Person) (Q110897903). So, currently, acto (Ogham Person) (Q110897903) would be an indirect instance (and example) of historical person (Q110897910). I'm not familiar with this subject matter so I can't judge if it makes sense or not. @Fthierygeo: could you explain the purpose of this item?
Apart from this: For persons (humans) whose existence is disputed/dubious we have human whose existence is disputed (Q21070568). - Valentina.Anitnelav (talk) 16:49, 9 October 2023 (UTC)
Hi, I changed it to "human whose existence is disputed". Fthierygeo (talk) 21:55, 10 October 2023 (UTC)
What "it" do you mean? Pmt (talk) 19:22, 11 October 2023 (UTC)

Submitting taxon names (as published)

The definition of taxon name (Property:P225) is 'correct scientific name of a taxon (according to the reference given)'. This is the definition I would expect: a data source should capture the scientific name given by a publication, so that anyone using the data source can later make their own decisions about taxonomy and which names are synonymous/accepted.

However, it appears to be very common to find instances where 'accepted' names have been given rather than names as published. For example, the page related to quinine Q189522, provides a list of taxa the compound is found in. Considering the references given for Cinchona pubescens, a couple of these references do not actually mention Cinchona pubescens but instead a synonym - Cinchona succirubra; and so these should be references for the presence of quinine in Cinchona succirubra rather than Cinchona pubescens.

Am I correct in my assessment of this? And if so what could be done to improve this situation? The https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy page provides guidance for users wanting to add data to taxon items, but it is not explicit in how this situation should be handled by contributors - maybe this could be improved? Adamrb12345 (talk) 12:06, 10 October 2023 (UTC)

@Adamrb12345 I agree with your conclusions. found in taxon (P703) should always link to taxon names used in the referenced publication. Vojtěch Dostál (talk) 17:36, 11 October 2023 (UTC)

Basic question. How do I mark that a drug has a en:boxed warning?

Basic question. How do I mark that a drug has a en:boxed warning? I notice that boxed warning (Q879952) exists, but I just can't figure out what statements I should add. I don't see that binary values are supported (0/1 or No/Yes...). I'm not sure how to do the source property stuff. Looks like I have to look up the Q#s for each thing? Or no? Use the UNII? Even once I figure out what to do, I need help. Can't use https://quickstatements.toolforge.org/#/ - I'm not autoconfirmed. See https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine#Black_box_warnings_project . RudolfoMD (talk) 21:31, 25 September 2023 (UTC)

@RudolfoMD Basically, the item would be attached to the drug by a property - "X [has effect] Y" or "X [is a type of] Z" or whatever. The underlying problem is deciding what property to use for modelling it, since these have to be taken from a specific list (or created) rather than just being a yes/no value linked to a specific item. Once you've worked out the model, actually adding the statements would be relatively straightforward.
I'm not an expert here, but after a bit of reading it seems that maybe the property legal status (medicine) (P3493) would seem appropriate, using a qualifier to indicate the boxed requirement - eg in Q5515069#P3493 you have a qualifier saying the Schedule 10 ranking applies in Australia, and there could be another qualifier to indicate it needs a boxed warning. (I'm not immediately sure what that qualifier property would be, but we could definitely figure something out.) That approach would make sense if the boxed warning only applies in the specific context of the national licensing for the drug - which I think is true? The US might require a boxed warning, and France use an additional pictogram, while Germany has nothing, or whatever.
The odd thing here is that legal status (medicine) (P3493) is very rarely used for some reason - but I'm guessing that whatever source you're importing this data from would let you also import the underlying licensing status. If not, an alternative approach would be to use legal status (medicine) (P3493):boxed warning (Q879952) on its own, separate from the existing licensing statements if they exist.
Finally, the ultimate fallback is to create a new property ("boxed warning requirement" or something) which would then have a "requires boxed warning" or "does not require boxed warning" value - there's a process for property creation but if it's needed it could certainly be done. However, it feels like going down the qualifier route might make most sense. Andrew Gray (talk) 13:06, 29 September 2023 (UTC)
Thanks for the constructive guidance. So Q5515069#P3493, using P1001- that makes sense to me as a model, so looks promising. ( The data source doesn't identify the reason for each FDA-mandated boxed warning, so I can't use [has effect] or [is a type of]. ) Yes, applies to USA. My first attempt got shot down, but I couldn't get any constructive guidance 'till now. I think this looks reasonable and appropriate. Q57055#P3493. RudolfoMD (talk) 02:05, 1 October 2023 (UTC)
@RudolfoMD I think adding a reference there would be appropriate (presumably there's a list somewhere?) but otherwise seems OK to me. Once you've done a few edits manually you should be autoconfirmed and able to try a batch through QuickStatements. Andrew Gray (talk) 20:52, 5 October 2023 (UTC)
Thanks. I had added a reference (and mentioned it on WP). Since there are >16k labels with boxed warnings, it seemed to make sense to add the reference once. I guess https://www.wikidata.org/w/index.php?title=Q879952&diff=prev&oldid=1981610608 was the wrong way? IIRC, I was copying something I'd seen. RudolfoMD (talk) 19:30, 6 October 2023 (UTC)
@RudolfoMD (sorry for delayed reply - travelling this week). Ideally we'd want a reference on each specific statement - knowing that the FDA issues boxed labels in general isn't the same as knowing they've specifically issued one for aspirin or whatever. If we just source the central item boxed warning (Q879952) then there isn't any way to distinguish whether a given item linking to it was on that list, or added by mistake later on, while statement-level references give a bit more reliability. Andrew Gray (talk) 21:38, 11 October 2023 (UTC)
I'm still not autoconfirmed. How many edits does that take, and do they have to be to the main namespace? Oh answered at Wikidata:Autoconfirmed users. Only halfway to 50. RudolfoMD (talk) 04:38, 9 October 2023 (UTC)
RVA2869, this isn't quite resolved yet. Not certain (to me, anyway) that we have a working solution yet - that the way I've done a few will work for bulk populating wiki pages. Maybe we do... But we have yet to populate a single wiki page.-RudolfoMD (talk) 22:27, 13 October 2023 (UTC)
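A minimal sketch of the batch step, assuming the model used at Q57055#P3493: legal status (medicine) (P3493) = boxed warning (Q879952), qualified with applies to jurisdiction (P1001) = United States of America (Q30), plus a statement-level reference URL (S854) as suggested in the thread. It only prints QuickStatements V1 (tab-separated) commands for pasting into the tool once autoconfirmed; the helper names are hypothetical and the reference URL is a placeholder.

```python
# Hypothetical helper: emit QuickStatements V1 commands (tab-separated) that add
# legal status (medicine) (P3493) = boxed warning (Q879952) to drug items.
# Assumed model: qualifier applies to jurisdiction (P1001) = United States of America (Q30),
# with a per-statement reference via reference URL (S854).

LEGAL_STATUS_MEDICINE = "P3493"
BOXED_WARNING = "Q879952"
APPLIES_TO_JURISDICTION = "P1001"
UNITED_STATES = "Q30"
REFERENCE_URL = "S854"  # "S" prefix marks a reference (source) property in QuickStatements


def quickstatements_line(drug_qid: str, source_url: str) -> str:
    """Build one command: item, property, value, qualifier pair, reference pair."""
    fields = [
        drug_qid,
        LEGAL_STATUS_MEDICINE,
        BOXED_WARNING,
        APPLIES_TO_JURISDICTION,
        UNITED_STATES,
        REFERENCE_URL,
        f'"{source_url}"',  # string/URL values are wrapped in double quotes
    ]
    return "\t".join(fields)


def quickstatements_batch(sources: dict) -> str:
    """One command per drug item, each carrying its own reference URL."""
    return "\n".join(quickstatements_line(qid, url) for qid, url in sources.items())


if __name__ == "__main__":
    # Placeholder reference URL; replace it with the page listing the specific drug.
    print(quickstatements_batch({"Q57055": "https://example.org/boxed-warnings/Q57055"}))
```

Putting the reference on every command, rather than only on boxed warning (Q879952), follows the statement-level sourcing advice given above.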

male and female

When I go to enter gender=, male and female are not showing up in the pulldown list. Is it just me? RAN (talk) 03:34, 11 October 2023 (UTC)

For some reason, male (Q6581097) and female (Q6581072) were removed by StarTrekker in this edit, then readded to the end of the list: [1], [2]. See also Property talk:P21#Suggestions list. Korg (talk) 10:22, 11 October 2023 (UTC)
If someone were to remove all the constraints, then add them all back manually in the "correct" position in a separate edit, is that the only way to influence the order? —Xezbeth (talk) 12:34, 11 October 2023 (UTC)
I'd much rather we just let male and female be the only options in the pulldown list. The number of edits that require the other genders is minimal, to say the least. Trade (talk) 12:48, 11 October 2023 (UTC)
@Xezbeth: I removed and re-added them because I had noticed that they were showing up very far down the list while more recently added items showed up higher; I figured re-adding them might solve the issue. StarTrekker (talk) 20:14, 11 October 2023 (UTC)
  • Thanks! It is a shame we do not have a bot that can reorder "one-of constraint" automatically based on number of times used. I am currently building a list for Property:P511. We have had several database imports where we imported the honorific prefix as part of the person's name. I am keeping "Sir" (per current consensus, see for instance Sir John Freind Robinson, 1st Baronet (Q55615178)), but moving "Capt." and "Dr." and "Rev." to their own properties. At one time we were capturing first_name when the property was left empty, but it was picking up and migrating the honorific prefix in error. --RAN (talk) 18:35, 12 October 2023 (UTC)
  • Rearranged manually but the 'binarity' genders are still 5th and 6th in the order. --Wolverène (talk) 10:39, 13 October 2023 (UTC)
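The bot RAN wishes for, one that reorders one-of constraint values by how often they are used, could start from something like the sketch below. It covers only the ordering step and assumes the usage counts have already been fetched (for example with a Query Service count); the counts and the Q_OTHER_* entries are made-up placeholders, and nothing here edits the constraint on Property:P21.

```python
def order_by_usage(allowed_values: list, usage_counts: dict) -> list:
    """Return allowed values sorted from most to least used.

    Values missing from usage_counts count as 0. sorted() stays stable even with
    reverse=True, so values with equal counts keep their current relative order.
    """
    return sorted(allowed_values, key=lambda qid: usage_counts.get(qid, 0), reverse=True)


if __name__ == "__main__":
    # Hypothetical current order, with male (Q6581097) and female (Q6581072) at the end.
    current = ["Q_OTHER_A", "Q_OTHER_B", "Q6581097", "Q6581072"]
    # Made-up counts for illustration only.
    counts = {"Q6581097": 300, "Q6581072": 100, "Q_OTHER_A": 5, "Q_OTHER_B": 2}
    print(order_by_usage(current, counts))
```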

Should Q123456789 be special?

We have just started filling in item (Q) IDs in the 123 million range, so Q123456789 will be coming up in the next few months. Count von Count (Q12345) was special, though I don't think any of the other IDs like that were; should we assign something similar for this one? ArthurPSmith (talk) 19:45, 12 October 2023 (UTC)

Sounds like a nice idea. StarTrekker (talk) 22:09, 12 October 2023 (UTC)
  Support RVA2869 (talk) 07:49, 13 October 2023 (UTC)
How will we try to 'catch' that unusual QID with such a frequency of creations? Or is it assumed that it's possible to reserve any QID in advance? Actually, I'd love to see if the future item will be for some special subject. --Wolverène (talk) 10:59, 13 October 2023 (UTC)
Another opportunity is the upcoming P12345. It would be interesting to see if someone could come up with a funny property. Any databases on OCD research? Infrastruktur (talk) 11:59, 13 October 2023 (UTC)
  Oppose; special QID assignments have been done in the very beginning 11 years ago, but I think we should refrain from doing so again. Besides the question of how to technically achieve this, we would need to settle for a suitable entity that is not covered yet, and this would furthermore set a precedent for future "special" numerical QIDs that someone deems worthy for whatever reason. IMO, the accidental omission of Q100000000 for technical reasons was a good moment to end any thoughts about peculiarity of numerical QIDs. —MisterSynergy (talk) 15:35, 13 October 2023 (UTC)
I could argue this one is the last "special" number (of course there are nice round numbers with 00's or other patterns but this one is somewhat unique in using all the digits in order - maybe the only additional one like this would be Q9876543210 which I don't think we'll see for a while...). The technical question is definitely a concern though - I know initially some Q numbers were reserved ahead of time but probably that can't be done now? So maybe it's not practical any more. ArthurPSmith (talk) 16:21, 13 October 2023 (UTC)

3 locator map images for Poltava region

Hi, I'm afraid nobody is monitoring that Talk page so I'll duplicate my question here.

For some reason the item Poltava oblast has 3 different maps for "locator map image" property. And in fact only first of them is "locator map image", that is "geographic map image which highlights the location of the subject within some larger entity". Is there any sense in that or just a mistake? 95.24.118.176 16:35, 13 October 2023 (UTC)

  Fixed RVA2869 (talk) 18:08, 13 October 2023 (UTC)

Seems like there is a similar situation with Krasnoyarsk Krai

2023 elections in Ecuador

These two items, Q111944027 and Q122730883, are about the same thing, right? I'm asking here before merging them, because I may be missing something. Kacamata (talk) 02:24, 14 October 2023 (UTC)

From Enwiki:
Snap general elections were held in Ecuador on 20 August 2023 to vote for President of Ecuador, members of the National Assembly and two referendums
So 2023 Ecuadorian presidential election (Q122730883) and 2023 Ecuadorian parliamentary election (Q105554774) are parts of 2023 Ecuadorian general election (Q111944027). And thus I don't think there are any merges that make sense, because the component elections have articles on both Eswiki and Frwiki. -- William Graham (talk) 03:04, 14 October 2023 (UTC)
Thanks @William Graham. I was really missing something. Fortunately, I asked here first. In Spanish, they created two separated articles, while in English and other languages, they created a single one. Again, thank you. Kacamata (talk) 03:19, 14 October 2023 (UTC)

Representing the winners of Eurovision Song Contest (Q276)

We are presently using at least four different ways to model winners of Eurovision Song Contest (Q276)! I have started a discussion at Wikidata_talk:WikiProject_Eurovision#We_need_guidance_and_example_queries_for_how_to_model_winners in the hope of establishing community guidance on the best way to do this and to work towards a complete data set. Anyone want to help resolve an ontological mess? MartinPoulter (talk) 20:09, 14 October 2023 (UTC)

Wikidata weekly summary #598

Review and comment on the 2024 Wikimedia Foundation Board of Trustees selection rules package

You can find this message translated into additional languages on Meta-wiki.

Dear all,

Please review and comment on the Wikimedia Foundation Board of Trustees selection rules package from now until 29 October 2023. The selection rules package was based on older versions by the Elections Committee and will be used in the 2024 Board of Trustees selection. Providing your comments now will help them provide a smoother, better Board selection process. More on the Meta-wiki page.

Best,

Katie Chan
Chair of the Elections Committee

01:12, 17 October 2023 (UTC)

From a metaphysical perspective, the subject seems wrong. (If I am wrong on this, can someone enlighten me on P31 and P279?)

But there are items in such a relationship, e.g. A = GDP per capita (Q93392206); B = economic indicator (Q1167393); C = gross domestic product (Q12638). (Should we fix it, and how?) JuguangXiao (talk) 08:02, 14 October 2023 (UTC)

It's not ideal, and it would likely be good to clean up the area. ChristianKl 14:53, 15 October 2023 (UTC)
In most such cases the first instance of (P31) should be subclass of (P279). Subclass is usually the best representation of relations between abstract concepts, unless you are really trying to create metaclasses which should be done with care. ArthurPSmith (talk) 19:22, 17 October 2023 (UTC)

Register for the online or onsite WikidataCon 2023

Hello all,

As you may know, the WikidataCon 2023, the conference dedicated to the Wikidata community, will take place on October 28-29.

People around the world can join online, access the content of the conference live or in replay, and discuss with other participants on the interactive and friendly platform Gathertown. People living in the neighbouring regions have the possibility to join the onsite event taking place at the National Taipei University.

Whether you join online or onsite, we strongly encourage you to sign up by filling out the registration form before October 22, so you can receive all the information you need to attend the event.

While waiting for the WikidataCon to start, you can already have a look at the exciting program put together by community members from the ESEAP region and all around the world.

Finally, you can also consider preparing a present for the Wikidata birthday and presenting it during the birthday presents lightning talks session, and join or organize a satellite event with your local community.

Questions? Feel free to contact the organizing team by writing on this talk page or at contact@wikidatacon.org.

We are looking forward to seeing you at the WikidataCon 2023!

Lea Lacroix (WMDE) (talk) 06:23, 17 October 2023 (UTC)

Thank you for the reminder, Léa! Just one question, and admittedly a silly one, but I would like confirmation nonetheless: The time slots on pretalx page are local time (Taipei), correct? Jonathan Groß (talk) 07:11, 17 October 2023 (UTC)
Thanks for asking! On Pretalx, it should be displayed by default in your local time zone. On the top right corner of the page, there's a field that displays the time zone and allows you to switch from the official event timezone (Asia / Taipei) to your local time zone. Lea Lacroix (WMDE) (talk) 07:34, 17 October 2023 (UTC)

Is this a valid reason for deprecation? Trade (talk) 12:46, 11 October 2023 (UTC)

I think you'll have to ask a more specific question. Bovlb (talk) 16:21, 12 October 2023 (UTC)
Wondering why the English description for broken English (Q20504733) says it is a 'pejorative term' (personally I never regarded the term like this). If the term is really pejorative, it can't be a 'reason for deprecation' - it's assumed that values for P2241 are at least neutral. --Wolverène (talk) 11:20, 13 October 2023 (UTC)
I used the item as a reason for deprecation on an identifier on Marcos loyalism (Q97161678). I just wanted to know if it was a valid reason. --Trade (talk) 22:09, 13 October 2023 (UTC)
I think it's not entirely clear what is broken about the identifier value's English and why that should lead to the statement deprecation... Vojtěch Dostál (talk) 18:49, 18 October 2023 (UTC)

How to add member countries to international organizations?

I am very new to Wikidata, but I would like to add member countries to international organizations. If you look at the Organization of the Petroleum Exporting Countries (Q7795) you will see countries added to the statement has part(s) (P527). While a country is part of OPEC, I thought this would be used for parts of the organization itself, such as a public relations branch. The listing for NATO (Q7184) does not have any countries listed as members. I tried to update the item for North-East Atlantic Fisheries Commission (Q15106130) using the statement signatory (P1891), as a country has to sign a formal document to be a member, but I get an error "Entities using the signatory property should be instances or subclasses of document or contract (or of a subclass of them), but North East Atlantic Fisheries Commission currently isn't." What is the correct statement to use when adding member countries to an international organization? GeoMorgan (talk) 12:53, 16 October 2023 (UTC)


I may have found an answer to my question. It seems countries are tagged with member of (P463) and the organization, but the organization doesn't have the members listed. Did I get that correct?  – The preceding unsigned comment was added by ‎GeoMorgan (talk • contribs) at 13:13, 16 October 2023 (UTC).

I suppose you could use has part(s) (P527). If it works for The Beatles (Q1299) ... --El Grafo (talk) 13:24, 16 October 2023 (UTC) don't listen to me, I didn't read properly --El Grafo (talk) 13:30, 16 October 2023 (UTC)
You don't need to add anything. They are already available like this. Midleading (talk) 04:12, 18 October 2023 (UTC)
Thanks for the reply. Your solution works. I am still learning and appreciate the help. Using this method, I have started tagging countries that are members of the Antarctic Treaty System (Q182814). I have been looking up the members on their official website. I have added India as a member, but it does not show up in the query. Here is what I wrote (https://w.wiki/7ppj). Any idea why it is not being included as a member? GeoMorgan (talk) 15:12, 18 October 2023 (UTC)
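A sketch of the query side, assuming membership is stored only on the member items as member of (P463) pointing at the organization. It just builds SPARQL text and a Wikidata Query Service URL (the helper names are hypothetical) and sends nothing. The second query reads statements through p:/ps: and returns each statement's rank. One thing worth checking when an expected member is missing: wdt: only matches best-rank statements, so a P463 statement that is deprecated, or outranked by a preferred P463 statement on the same item, will not appear in the first query.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def members_query(org_qid: str) -> str:
    """Members via truthy (best-rank) member of (P463) statements."""
    return f"""SELECT ?member ?memberLabel WHERE {{
  ?member wdt:P463 wd:{org_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def members_query_all_ranks(org_qid: str) -> str:
    """Every P463 statement pointing at the organization, with its rank."""
    return f"""SELECT ?member ?memberLabel ?rank WHERE {{
  ?member p:P463 ?statement .
  ?statement ps:P463 wd:{org_qid} ;
             wikibase:rank ?rank .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def query_url(sparql: str) -> str:
    """GET URL for the query service, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Antarctic Treaty System (Q182814), from the thread.
    print(members_query_all_ranks("Q182814"))
    print(query_url(members_query("Q182814")))
```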

Create a biography of Tito Oses

I want to contribute by creating a biography of one of my favorite singers. I notice that he doesn't have a biography on Wikipedia, and he is an important singer in my country (Costa Rica). Abrilzv (talk) 05:51, 18 October 2023 (UTC)

Hello and welcome. We do not handle requests for Wikipedia article creation here; this is the Wikimedia database. The item for him is Tito Oses (Q22338204). When you need to connect a Wikipedia article (I did not find one...), just click 'Edit' in the Wikipedia sidebar, then enter the language code and the name of the article, then click 'Publish'. --Wolverène (talk) 07:35, 18 October 2023 (UTC)

Sarcocaulon camdeboense vs Monsonia camdeboensis

Hi there! These two articles are in fact the same species and synonyms of one another. Regards. Oesjaar (talk) 05:57, 19 October 2023 (UTC)

@Oesjaar Actually, different taxon names have independent items here in Wikidata, even if they are thought by someone to refer to the same species. These two are indeed properly linked via basionym (P566). Vojtěch Dostál (talk) 08:57, 19 October 2023 (UTC)

Mismatch Finder tool improvements: Ability to report mismatches on Qualifiers

Hello everyone,

We have a quick update regarding the Mismatch Finder tool. In the next deployment scheduled for November 1, the tool will let you report mismatches on qualifiers in addition to the main part of a statement.

A Quick Overview

For those who might not be familiar with the Mismatch Finder, it's a tool that helps spot potential mismatches between Wikidata Items and external databases, presenting them to editors for review and correction. This tool is also used to suggest new statements that should be part of Wikidata, but need a human-review step before adding them. You can explore more about it at Wikidata:Mismatch Finder.

What's on the Horizon?

The Mismatch Finder currently allows mismatch providers to report mismatches to data that is stored in the main part of a statement. In the upcoming deployment, mismatch providers will have the option to report mismatches found within qualifiers, as important data is stored there as well. This change will require mismatch uploaders to adjust their workflow, as we will introduce a new 'type' column to the accepted CSVs. The CSV upload will look as follows:

CSV for a Q42 statement with qualifier, where both the statement and the qualifier are mismatched:

item_id,statement_guid,property_id,wikidata_value,external_value,external_url,type
Q42,Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9,P3373,Q14623673,"Shoshanna Adams",example.com,statement
Q42,Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9,P1039,Q10943095,"cousin",example.com,qualifier

CSV for a Q42 statement with qualifier, where only the qualifier is mismatched:

item_id,statement_guid,property_id,wikidata_value,external_value,external_url,type
Q42,Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9,P1039,Q10943095,"cousin",example.com,qualifier
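For uploaders adjusting their workflow, a minimal sketch of producing the new format with Python's standard csv module, using the Q42 rows from the examples. It assumes "statement" and "qualifier" are the only values of the type column, as in those examples; the helper is hypothetical and not part of the Mismatch Finder. The csv writer quotes fields only when needed, so "Shoshanna Adams" is written without quotes, which should be equivalent under standard CSV parsing.

```python
import csv
import io

COLUMNS = ["item_id", "statement_guid", "property_id", "wikidata_value",
           "external_value", "external_url", "type"]
ALLOWED_TYPES = {"statement", "qualifier"}  # assumed from the examples above


def mismatch_csv(rows: list) -> str:
    """Serialize mismatch rows, including the new 'type' column, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unexpected type: {row['type']!r}")
        writer.writerow(row)
    return buffer.getvalue()


GUID = "Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9"
rows = [
    # Mismatch in the main part of the statement
    {"item_id": "Q42", "statement_guid": GUID, "property_id": "P3373",
     "wikidata_value": "Q14623673", "external_value": "Shoshanna Adams",
     "external_url": "example.com", "type": "statement"},
    # Mismatch in a qualifier of the same statement (same GUID)
    {"item_id": "Q42", "statement_guid": GUID, "property_id": "P1039",
     "wikidata_value": "Q10943095", "external_value": "cousin",
     "external_url": "example.com", "type": "qualifier"},
]

if __name__ == "__main__":
    print(mismatch_csv(rows), end="")
```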

This new feature is currently being developed under T313467, where you can find more detailed information and follow the progress of this enhancement. If you have questions or concerns or want to provide feedback, please use the linked Phabricator ticket or leave us a note at Wikidata talk:Mismatch Finder.

Cheers, Mohammed Abdulai (WMDE) (talk) 14:22, 19 October 2023 (UTC)

To take away my birthday

My name as a singer is Michael Raitner; my real name is Rattner. I no longer want my date of birth in any description. Thanks for helping me. 2A01:CB01:1035:5C89:E337:27FA:B603:BA7C 11:01, 18 October 2023 (UTC)

It may be very difficult to remove this information as it is quite easily available from multiple sources (including some of the references used in the French wikipedia article about you). Google's "Knowledge panel" also displays your date of birth, as do a number of music database sites (e.g., Encyclopédisque), and "Quel âge a le chanteur Michael Raitner" is one of Google's suggested searches. You may have better luck requesting this directly at French Wikipedia (if their policy is to accommodate such requests) – although this would only apply to their article and may have little impact on other sources. Good luck! -- Cl3phact0 (talk) 11:52, 18 October 2023 (UTC)
Birth date is not covered by our living people policy. Even if we remove it, if it's well known on the internet someone is likely to add it back.
If some service you use has your birth date as a security question, I suggest switching providers. You will be better off, as they have demonstrated their incompetence and, in my opinion, deserve to go bankrupt. Infrastruktur (talk) 18:04, 19 October 2023 (UTC)

First edition of the Language & Internationalization newsletter

Hello everyone, We are thrilled to introduce the first edition of the Language & Internationalization newsletter, available at this link: https://www.mediawiki.org/wiki/Wikimedia_Language_engineering/Newsletter/2023/October.

This newsletter is compiled by the Wikimedia Language team. It provides updates from the July–September quarter on new feature development, improvements in various language-related technical projects and support efforts, details about community meetings, and ideas for getting involved in projects.

To stay updated, you can subscribe to the newsletter on its wiki page. If you have any feedback or ideas for topics to feature in the newsletter, please share them on the discussion page, accessible here: https://www.mediawiki.org/w/index.php?title=Talk:Wikimedia_Language_engineering/Newsletter. Cheers, Srishti - MediaWiki message delivery (talk) 01:12, 20 October 2023 (UTC)

Updating official site

I used to be moderately active in wikidata but not recently. I checked out:

Nebraska Cornhuskers women's basketball (Q6984699)

which lists the official website as: this site

That generates a 404

I tried to update it with a better URL

https://huskers.com/sports/womens-basketball

But trying to publish it gave me an error asking me to set the preferred rank. I'm not familiar with that concept. Can someone help? Sphilbrick (talk) 12:54, 18 October 2023 (UTC)

For information on value ranking see Help:Ranking. Basically when a value changes but the previous value was at one point correct (or thought to be correct at the time), we will deprecate that value and set a reason for deprecated rank (P2241) qualifier. In the example of the official URL that no longer works, I would set the qualifier reason to link rot (Q1193907). Then you can add the current correct value and either leave it as normal rank or set it to preferred rank. There is also a qualifier property of reason for preferred rank (P7452), but I don't see that being used very often. -- William Graham (talk) 16:55, 18 October 2023 (UTC)
Thanks Sphilbrick (talk) 18:53, 18 October 2023 (UTC)
To offer a slight correction, we do not deprecate values that were once correct. The correct modelling would be to use end time (P582) and end cause (P1534) and only set preferred rank on the latest value. Deprecation is for sourced statements which are false. SilentSpike (talk) 09:00, 20 October 2023 (UTC)
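SilentSpike's modelling can be illustrated with a small sketch of how ranks decide which values count as current: the "best rank" values are the preferred-rank statements if any exist, otherwise the normal-rank ones, and deprecated statements are never included (this is also what the Query Service's wdt: prefix returns). The statement dictionaries are a simplified, hypothetical stand-in for real Wikidata JSON; the old URL and the qualifier values are placeholders, since the original link is not preserved here.

```python
def best_rank_values(statements: list) -> list:
    """Values of the best-ranked statements: preferred if any, else normal.

    Deprecated statements are never returned.
    """
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


official_website = [
    # Old URL that was once correct: keep normal rank and close it with
    # end time (P582) and end cause (P1534) qualifiers instead of deprecating it.
    {"value": "https://example.org/old-url",  # placeholder for the dead link
     "rank": "normal",
     "qualifiers": {"P582": "<end date>", "P1534": "<end cause>"}},
    # Current URL: the only statement with preferred rank.
    {"value": "https://huskers.com/sports/womens-basketball",
     "rank": "preferred",
     "qualifiers": {}},
]

if __name__ == "__main__":
    print(best_rank_values(official_website))
```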

Chemical entity and Type of chemical entity

There are lots of specific chemicals that are marked as instances of type of chemical entity (Q113145171) whereas (as I believe) they should be instances of chemical entity (Q43460564). I thought that correct examples of the type of chemical entity (Q113145171) are real types of chemicals like acid, salt, etc. and not specific chemicals like quercitrin (Q1649777). Am I wrong? Nikolay Komarov (talk) 12:46, 20 October 2023 (UTC)

Also see
M2k~dewiki (talk) 16:25, 20 October 2023 (UTC)

Can someone reimport chemical formula (P274)?

chemical formula (P274) statements imported from PubChem (Q278487) are of very low quality. Look at strontium sulfate (Q414440) chemical formula (P274) O₄SSr, lithium chloride (Q422930) chemical formula (P274) ClLi, zirconium(IV) sulfate (Q27231449) chemical formula (P274) O₈S₂Zr. All these need to be corrected. Midleading (talk) 07:49, 17 October 2023 (UTC)

@Midleading You may need to start a discussion on Wikidata:WikiProject Chemistry because technically, these all seem to be correct formulae imported from PubChem. I am not sure how the importer decided which formula to import in case several were available. Tagging @Sebotic who seems to have done the import. Vojtěch Dostál (talk) 09:03, 19 October 2023 (UTC)
Items like iron(II) acetate (Q2657418) now have two chemical formula (P274) values: a high-quality one from German Wikipedia (Q48183), iron(II) acetate (Q2657418) chemical formula (P274) Fe(CO₂CH₃)₂, and a low-quality one from PubChem (Q278487), iron(II) acetate (Q2657418) chemical formula (P274) C₄H₆FeO₄. I would like to just remove those imported from PubChem (Q278487). Midleading (talk) 09:22, 19 October 2023 (UTC)
@Midleading We usually do not remove sourced statements. If they are deemed incorrect, we can deprecate them... Vojtěch Dostál (talk) 11:06, 19 October 2023 (UTC)
Cross-posted here: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry#Can_someone_reimport_chemical_formula_(P274)? --Egon Willighagen (talk) 09:49, 19 October 2023 (UTC)
Part of the problem is that "chemical formula" is an umbrella term. Of those two values for iron (II) acetate, one is the condensed formula and one is the molecular formula. Another issue, I think, is that the PubChem importer just went for the most alphabetized version and ignored everything else (how are you supposed to put the anion after the cation if you're not even keeping track of which atoms are part of which ion?). DS (talk) 12:41, 19 October 2023 (UTC)
There is the statement zinc sulfate (Q204954) chemical formula (P274) O₄SZn with criterion used (P1013) Hill system (Q900739). However, the condensed formula is seldom used for inorganic compounds. Midleading (talk) 13:10, 19 October 2023 (UTC)
I really don't see a problem here. Formulae imported from PubChem are correct, but the notation may not be the one you are looking for. There are many ways to write a chemical formula, Hill notation is the best to use in databases, but may not be preferred in other uses. So the problem here is the lack of other formulae you are looking for. You have a good example in zinc sulfate (Q204954), however, I'm not sure there is a proper source that can be used to correctly import formulae in other notations. Wostr (talk) 00:02, 20 October 2023 (UTC)
PubChem includes the correct chemical formula for most of these compounds, and it is featured right in the HTML title [3]. We should just mark those old PubChem chemical formulae with criterion used (P1013) Hill system (Q900739) and then reimport the most common notation from the HTML title. Midleading (talk) 03:03, 20 October 2023 (UTC)
There are no "old" chemical formulae and there is no need to "reimport" anything. Hill notation is a standard, unambiguous system of writing chemical formulae in chemistry; other notations are methods that are used in specific areas. In other words, Hill notation is the preferred way to write chemical formulae in a database. And it is not easy to do what you want; first, you'd have to check which chemical formulae present in WD are in fact Hill formulae and which are not, then mark them with a proper qualifier. After that you could import other formulae. Wostr (talk) 23:36, 20 October 2023 (UTC)
It seems many editors have just edited chemical formula (P274) statements without changing references, or deleted them and added new ones. Technically this is correct, as PubChem also has them. I just restored those Hill system (Q900739) formulae, labelled them with criterion used (P1013), and marked the other formulae as best rank. Hill system (Q900739) formulae are easy to identify with simple rules as well. Midleading (talk) 03:41, 21 October 2023 (UTC)
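The "simple rules" for spotting a Hill formula can be sketched in Python. The Hill system puts carbon first and hydrogen second when carbon is present, then every other element alphabetically; without carbon, all elements (hydrogen included) are alphabetical. This is a minimal sketch assuming flat formula strings with Unicode subscripts, as stored in P274; charges, hydrates, isotopes and bracketed groups are simply reported as "not Hill".

```python
import re
from typing import List, Optional

# Map Unicode subscript digits (as used in P274 values) to ASCII digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# One element symbol (capital + optional lowercase letter) with an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def elements_in_order(formula: str) -> Optional[List[str]]:
    """Element symbols in written order, or None if the string is not a
    flat run of element/count tokens (e.g. it contains brackets)."""
    ascii_formula = formula.translate(SUBSCRIPTS)
    tokens = TOKEN.findall(ascii_formula)
    if not tokens or "".join(sym + n for sym, n in tokens) != ascii_formula:
        return None
    return [sym for sym, _ in tokens]


def hill_order(symbols: List[str]) -> List[str]:
    """Order element symbols according to the Hill system."""
    if "C" in symbols:
        rest = sorted(s for s in symbols if s not in ("C", "H"))
        return ["C"] + (["H"] if "H" in symbols else []) + rest
    return sorted(symbols)


def is_hill_formula(formula: str) -> bool:
    symbols = elements_in_order(formula)
    if symbols is None or len(set(symbols)) != len(symbols):
        return False  # bracketed groups or repeated elements are not Hill form
    return symbols == hill_order(symbols)
```

With this rule, O₄SSr, ClLi and C₄H₆FeO₄ are Hill formulae, while SrSO₄, LiCl and Fe(CO₂CH₃)₂ are not, which matches the split discussed above.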

Infobox person/Wikidata (enwiki usage)

I've been experimenting with the use of the {{Infobox person/Wikidata|fetchwikidata=ALL}} template (which seems like an enormously powerful and practical use of the data we have) on a number of enwiki articles that I'm editing, and have encountered some behaviours that I'm not able to explain (e.g., a test on w:Jan Schalauske only picks up the website and photograph from Jan Schalauske (Q28548119): German politician). What am I missing? (If this is a question that should be posted elsewhere, please let me know where.) -- Cl3phact0 (talk) 08:32, 21 October 2023 (UTC)

I'd guess it's because the statements don't have (suitable) references. Almost all of the statements that have a reference are using imported from Wikimedia project (P143) which I assume is ignored by the template/module. M2Ys4U (talk) 08:44, 21 October 2023 (UTC)
Ok, that makes sense. I'll try to improve the refs and see if that does the trick. -- Cl3phact0 (talk) 08:51, 21 October 2023 (UTC)
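To illustrate the guess above, here is a sketch of the kind of filter such a module might apply, written in Python against the Wikidata entity JSON layout (each statement has a `references` list whose `snaks` are keyed by property ID). Treating only imported from Wikimedia project (P143) as "not a real source" is M2Ys4U's assumption; the actual enwiki module may use different rules.

```python
# Reference properties assumed NOT to count as a real source (an assumption
# about the enwiki module; its actual rules may differ).
IGNORED_REFERENCE_PROPERTIES = {"P143"}  # imported from Wikimedia project


def has_usable_reference(statement: dict) -> bool:
    """True if at least one reference block cites something other than
    the ignored properties."""
    for reference in statement.get("references", []):
        cited = set(reference.get("snaks", {}))
        if cited - IGNORED_REFERENCE_PROPERTIES:
            return True
    return False


def usable_statements(entity: dict, prop: str) -> list:
    """Statements for `prop` that an infobox following this rule would show."""
    return [s for s in entity.get("claims", {}).get(prop, [])
            if has_usable_reference(s)]
```

Under this rule, a statement referenced only with P143 is dropped, while adding any proper reference (for example a reference URL) makes it eligible again.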

Possible duplicate category

Charles Gregory (Q67472631) and Charles Gregory (Q61657032). Not enough data about the former to give any confidence that it differs from the latter. - Jmabel (talk) 19:02, 22 October 2023 (UTC)

One is a contributor on a paper that lists the affiliation as National Institute of Arthritis and Musculoskeletal and Skin Diseases (located in Maryland, USA) and the other has an ORCID profile listing a country of Australia. I think they're unlikely to be the same person. -- William Graham (talk) 20:56, 22 October 2023 (UTC)

How/where to find the deleted topics from this chat list?

As in the subject line, particularly the topics I created and replied to. Thanks! :-) JuguangXiao (talk) 05:23, 23 October 2023 (UTC)

Wikidata:Project chat/Archive. —Justin (koavf)TCM 05:35, 23 October 2023 (UTC)

Wikidata weekly summary #599

The Greco-Italian War (Q223604) interrupted the 1940–41 Greek Cup (Q2168801). How do I properly add that?

How do I properly add a reference for 1940–41 Greek Cup (Q2168801) regarding the property sports season of league or competition (P3450), where one of the qualifiers is series ordinal (P1545) = no value? Greece was invaded by, and at war with, Italy in the Greco-Italian War (Q223604); the conflict interrupted the Greek Football Cup, so this edition does not count (it got stuck between cup 4 and 5 due to the war).

(TL;DR) I added series ordinal = no value and my question is:

How do I properly add, as a reference, that this sports competition was interrupted by the Greco-Italian War (Q223604), which led to it not officially counting in the Greek Football Cup (Q1122820)? MythsOfAesop (talk) 23:06, 23 October 2023 (UTC)

Google Knowledge Graph ID

On Rhinos Price (Q8550383) there is a statement for Google Knowledge Graph ID. This leads to a Google Scholar search for "Rhinos Prize". The search results are articles about rhinos that have nothing to do with the prize. Is this as it should be? Kk.urban (talk) 17:58, 21 October 2023 (UTC)

@Kk.urban: I removed the offending statement. Duckmather (talk) 01:30, 24 October 2023 (UTC)
Thank you! Kk.urban (talk) 01:34, 24 October 2023 (UTC)
I, for one, would love to know what purpose the Freebase IDs and Google Knowledge Graph IDs serve. As far as I can see, they add nothing of value. I know that Google's "donation" of Freebase data (which was basically just a collection of search terms) was considered a big deal back then, but it didn't cost Google anything as they had abandoned the project, and Wikidata in turn gained very little ... If I'm wrong, I'd love to be educated. Cheers, Jonathan Groß (talk) 19:18, 24 October 2023 (UTC)

Dates are not being parsed correctly

I typed 04/10/2023 into a publication date (P577) field for a reference and it showed the incorrect date of 10 April 2023, when it should be 4 October 2023. Piecesofuk (talk) 08:58, 24 October 2023 (UTC)

Yeah, that seems to be the case. Ambiguous dates get parsed as MDY instead of DMY for British English, when this should only be true for American English. I don't think these things get fixed unless they are reported on Phabricator. Personally, I tend to enter dates either in ISO-ish form (YYYY.MM.DD) or with a three-letter month abbreviation; this works regardless of locale. Infrastruktur (talk) 17:13, 24 October 2023 (UTC)
Use ISO! ArthurPSmith (talk) 17:56, 24 October 2023 (UTC)
I just copy and paste dates from websites into date statements, usually the date is formatted like 24 October 2023 which works fine, but if it's something like 04/10/2023 then it reverses the day and month, 23/10/2023 works fine though. Piecesofuk (talk) 18:06, 24 October 2023 (UTC)
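The behaviour described above is a plain field-order ambiguity, which can be sketched in Python. This is only an illustration with the standard `datetime` module, not Wikidata's own date parser: a slash date is ambiguous exactly when both day-first and month-first readings are valid and differ, which is why 04/10/2023 flips but 23/10/2023 does not.

```python
from datetime import date, datetime


def parse_slash_date(text: str, day_first: bool) -> date:
    """Parse an all-numeric NN/NN/YYYY date with an explicit field order
    instead of leaving the order to the interface locale."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


def is_ambiguous(text: str) -> bool:
    """True when both field orders give a valid and different date."""
    try:
        dmy = parse_slash_date(text, day_first=True)
        mdy = parse_slash_date(text, day_first=False)
    except ValueError:  # one reading is impossible (e.g. month 23)
        return False
    return dmy != mdy


def to_iso(d: date) -> str:
    """ISO 8601 (YYYY-MM-DD) reads the same in every locale."""
    return d.isoformat()
```

For example, `to_iso(parse_slash_date("04/10/2023", day_first=True))` gives `2023-10-04`, the form recommended above for entering dates.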

How to call a specific statement from a QID in Wikipedia, but only if it has a reference from a specific domain?

Hi all

I know there's a template you can use to import statements, images, etc. from Wikidata into Wikipedia; I've used it before to import images from Wikidata, e.g.

{{subst:wikidata|property|raw|QID|P18|format=\File:%p \}}

My question is: is there a magic bit of code I can add to say only import the statements if they have a reference from a specific domain (not a specific URL, a whole domain), e.g. powo.science.kew.org/

Thanks very much

John Cummings (talk) 10:35, 24 October 2023 (UTC)
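Whether the enwiki template or module supports such a filter is not answered here; the sketch below only shows, in Python, the logic it would need. It assumes the domain appears in reference URL (P854) snaks of the entity JSON; references that cite the source through other properties (for example an external identifier plus stated in) would be missed by this simplified check.

```python
from urllib.parse import urlparse

REFERENCE_URL = "P854"  # Wikidata property "reference URL"


def url_in_domain(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().strip("/")
    return host == domain or host.endswith("." + domain)


def cited_from_domain(statement: dict, domain: str) -> bool:
    for reference in statement.get("references", []):
        for snak in reference.get("snaks", {}).get(REFERENCE_URL, []):
            url = snak.get("datavalue", {}).get("value")
            if isinstance(url, str) and url_in_domain(url, domain):
                return True
    return False


def statements_from_domain(entity: dict, prop: str, domain: str) -> list:
    """Statements for `prop` backed by a reference URL on `domain`."""
    return [s for s in entity.get("claims", {}).get(prop, [])
            if cited_from_domain(s, domain)]
```

Matching on the host rather than on a URL prefix means any page under powo.science.kew.org qualifies, while look-alike hosts such as notpowo.science.kew.org do not.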

How to re-open a declined property proposal?

Wikidata:Property proposal/Linkinfo ID was rejected last December due to lack of support. I've been working on filling out Wikidata's now-extensive coverage of enumerated mathematical knots and links for some time, and I've now reached the point at which LinkInfo IDs would be extremely useful for building out the linked data web that encompasses the disparate sources for knots and links, which have a wide variety of different notations for identifying both, and one of which, Knotscape, died last year, taking its data with it.

LinkInfo is also a very useful data source, and it seems crazy to add data from LinkInfo without a backlink to the source of that data. Having these backlinks would also have the likely effect of making Internet archival sites make copies of these pages, so the information will not be lost if a site goes down.

One complicating factor is that the mapping from Wikidata items to LinkInfo identifiers will be one-to-many, with 2^N LinkInfo identifiers for a link with N components. However, for reasonable numbers of crossings (up to and including 12), N does not get very large, so I think this does not present a serious problem. If it were to be a problem, as an alternative to listing all 2^N identifiers, I could always identify the (0..) oriented variant of the link as canonical, and just give that.
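The growth described above can be made concrete with a short Python sketch. The `<name>{b1,...,bN}` label format here is hypothetical, chosen only to mirror the "(0..)" notation; it is not a statement of how LinkInfo actually writes its identifiers, nor of whether it lists all 2^N variants or fewer.

```python
from itertools import product


def orientation_variants(base_name: str, components: int) -> list:
    """All 2**components orientation labels for a link, in a hypothetical
    '<name>{b1,...,bN}' notation where each b is 0 or 1."""
    return [f"{base_name}{{{','.join(map(str, bits))}}}"
            for bits in product((0, 1), repeat=components)]


def canonical_variant(base_name: str, components: int) -> str:
    """The all-zero orientation, proposed as the single canonical label."""
    return orientation_variants(base_name, components)[0]
```

For a 3-component link this yields 8 labels; the all-zero one is what the fallback proposal would store instead of the full list.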

I could also use the "object named as" qualifier to make it clear when things like object polynomial properties have different versions for each distinct orientation of the components within the link; this seems to me to be more elegant than creating a Wikidata item for every oriented version of each link.

I'd like to repropose this property now, as I think I can now make a strong case for its inclusion, but I cannot see any mechanism for doing so politely, as I don't want to just ignore the previous rejection.

Can anyone suggest how I should go about this? @DannyS712: is this something you can advise me on? The Anome (talk) 19:48, 24 October 2023 (UTC)

Generally, create a new proposal and link to the existing one while pinging everyone involved in the previous discussion. ChristianKl 20:08, 24 October 2023 (UTC)
Thanks -- what should I name the new proposal, as I don't want to change the proposed property name, but don't want to clobber the old proposal? Would Wikidata:Property proposal/Linkinfo ID, second proposal work? The Anome (talk) 20:30, 24 October 2023 (UTC)
Yes, that would work. ChristianKl 21:08, 24 October 2023 (UTC)

By the way, I'd also note that Corinne Cerf's knot tables seem to be gone; they are archived here, though: https://web.archive.org/web/20120326045322/http://at.yorku.ca/t/a/i/c/31.dir/cerf.htm All this precious information, lost, like tears in rain. We can rescue these resources by linking them from Wikidata! The Anome (talk) 20:36, 24 October 2023 (UTC)

Importing airport timezone information from FAA Chart Supplements

I would like to import airport timezone information from the Chart Supplements provided as PDF files on the FAA website. Since timezones often have an illustrious history of changes, I thought about matching the information given in the PDF files (e.g. UTC-6(-5DT)) to current IANA timezone data, then importing claims of the type ?airport located in time zone (P421) ?ianaTimeZone where ?ianaTimeZone instance of (P31) IANA time zone (Q17272692), effectively outsourcing the documentation of historical and future changes to IANA.

The PDF extraction code (Rust) and the Wikidata import code (Python) are mostly done, as is a first pass at timezone mapping. The import will probably be a one-time process; airports do not have the tendency to change their geographical location, nor do geographical locations have the tendency to relocate into different IANA timezones.

Anything I should keep in mind before performing the import? WhiteTimberwolf (talk) 22:52, 24 October 2023 (UTC)
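One point worth keeping in mind is that an offset pair alone does not always identify a single IANA zone. The Python sketch below parses strings in the form quoted above (UTC-6(-5DT)); the regular expression assumes that exact notation, and the mapping table is a hypothetical, deliberately incomplete example, not the mapping used in the actual import code.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "UTC-6(-5DT)" or "UTC-7"; other notations in the PDFs would need handling.
OFFSET_PATTERN = re.compile(r"^UTC([+-]\d{1,2})(?:\(([+-]\d{1,2})DT\))?$")

# Hypothetical, incomplete mapping from (standard, daylight) offsets to IANA zones.
# Several zones share offsets, so a real import would also need the airport's location.
CANDIDATE_ZONES = {
    (-5, -4): "America/New_York",
    (-6, -5): "America/Chicago",
    (-7, -6): "America/Denver",
    (-7, None): "America/Phoenix",
    (-8, -7): "America/Los_Angeles",
    (-10, None): "Pacific/Honolulu",
}


def parse_offsets(text: str) -> Tuple[int, Optional[int]]:
    """Return (standard offset, daylight offset or None) in hours."""
    match = OFFSET_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time zone string: {text!r}")
    standard = int(match.group(1))
    daylight = int(match.group(2)) if match.group(2) else None
    return standard, daylight


def candidate_zone(text: str) -> Optional[str]:
    """A first-guess IANA zone, or None when the table has no entry."""
    return CANDIDATE_ZONES.get(parse_offsets(text))
```

Unmatched pairs return `None` rather than a guess, so those airports can be reviewed by hand; the same applies to places such as Arizona, where neighbouring areas with different DST rules share a standard offset.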

Proposed End-User Paintings Download Interface

I am an end-user on a Windows 10 desktop. This status, and my objective, appear incompatible with, or at least not at the level of sophistication of, other entries appearing on this page. Lacking knowledge of a better alternative, I will proceed here nonetheless, requesting redirection to a better forum as appropriate.

I see that Wikidata has an enormous collection of images of famous paintings in its Sum of all paintings WikiProject. I would like to view and learn about some such paintings. One way to do this, which may or may not be easily achieved on the Windows end, would be to display such paintings in a slideshow, perhaps as a screensaver, with information about each such painting (e.g., artist, title, date) optionally superimposed on the relevant painting. Ideally, there would also be a checkbox allowing me to choose whether to view each such painting again at some later point, or to exclude it from further viewing with either a positive vote (i.e., I already know I like it) or a negative vote. Upon completing the process of voting on each such painting, those with positive votes could yield a stable slideshow for personal enjoyment and self-education.

As a step toward achieving something of that nature, I would like to download a set of images of suitable size (e.g., 4K) and appeal. A useful Wikidata interface for this purpose would feature drop-down menus with checkboxes to narrow the universe to specified artists, time periods, and other criteria. Within the resulting display of tentatively preferred paintings (with a view optionally narrowed to focus on subsets), each record would offer a thumbnail view (as in one presently available Wikidata webpage) and a checkbox permitting deselection. The interface would allow me to save that search. I could then request download of the selected paintings, subject to restrictions as needed (on e.g., speed or volume of download). The target folder on my computer would connect with Wikidata to avoid duplicative downloads after return visits to Wikidata to refine my search.

Given that wish list, my question is whether and how I could begin to approach that ideal within Wikidata, or by known means related to Wikidata. Raywood1 (talk) 19:03, 24 October 2023 (UTC)

Before you dig in to something like that, you might want to take a look at Crotos (Q48029227). Jane023 (talk) 11:13, 25 October 2023 (UTC)
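If an existing tool such as Crotos does not cover the download step of the wish list above, a minimal Python sketch of that step might look like the following. Assumptions: painting (Q3305213), creator (P170) and image (P18) are used to select works; the Wikidata Query Service returns Commons `Special:FilePath` URLs, and a `width` parameter requests a scaled copy; the User-Agent string is a placeholder you should replace. Skipping files that already exist in the folder is a simple stand-in for "avoid duplicative downloads".

```python
import json
import os
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
USER_AGENT = "PaintingSlideshowSketch/0.1 (replace with your contact details)"
INVALID_WINDOWS_CHARS = '<>:"/\\|?*'

# Paintings (Q3305213) by one creator (P170) that have an image (P18).
QUERY_TEMPLATE = """
SELECT ?painting ?paintingLabel ?image WHERE {{
  ?painting wdt:P31 wd:Q3305213;
            wdt:P170 wd:{creator};
            wdt:P18 ?image.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def build_query(creator_qid: str) -> str:
    return QUERY_TEMPLATE.format(creator=creator_qid)


def sized_image_url(file_path_url: str, width: int = 3840) -> str:
    """Ask Commons for a copy scaled to `width` pixels (3840 is roughly 4K)."""
    if file_path_url.startswith("http://"):
        file_path_url = "https://" + file_path_url[len("http://"):]
    return f"{file_path_url}?width={width}"


def local_filename(file_path_url: str) -> str:
    """Commons file name from the end of the URL, made safe for Windows."""
    name = urllib.parse.unquote(file_path_url.rsplit("/", 1)[-1])
    return "".join("_" if c in INVALID_WINDOWS_CHARS else c for c in name)


def run_query(query: str) -> list:
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def download_new_images(creator_qid: str, folder: str) -> None:
    """Download one painter's images one at a time, skipping files already present."""
    os.makedirs(folder, exist_ok=True)
    for row in run_query(build_query(creator_qid)):
        image = row["image"]["value"]
        target = os.path.join(folder, local_filename(image))
        if os.path.exists(target):
            continue  # fetched on an earlier run
        request = urllib.request.Request(sized_image_url(image),
                                         headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response, open(target, "wb") as out:
            out.write(response.read())
```

Calling `download_new_images("Q5582", "paintings")` would fetch images of paintings by Vincent van Gogh (Q5582) into a `paintings` folder; the checkbox, voting and slideshow parts of the wish list are outside this sketch.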

Webpage content request: on item page, display its downstream/part list

I think I had asked for this one a year ago, when I was a newbie on Wikidata; then I learnt to solve it via SPARQL. But I still think it would be convenient for lay users to view it on the item page.

For example, parent taxon (P171) points upwards and has no inverse property (that is fine). Given Homininae (Q242047) parent taxon (P171) Hominidae (Q635162), I would much appreciate seeing a list of the child taxa of Hominidae (Q635162) on its item page. Since such a pseudo-property (like `list of child taxa`) can be derived from our current data structure, the only/main effort is on the webpage side (rather than the database). Thanks JuguangXiao (talk) 08:54, 25 October 2023 (UTC)

There is a "relateditems" gadget available, which adds a button to the bottom of item pages that, when clicked, shows such inverse statements. You can enable it by going to preferences > gadgets. It's near the bottom of the wikidata-centric group of gadgets. M2Ys4U (talk) 16:32, 25 October 2023 (UTC)
Great! Many thanks! Gadgets are new oil :-) JuguangXiao (talk) 16:58, 25 October 2023 (UTC)
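The inverse view discussed above is an inverted index: statements point from child to parent, and the list of child taxa is obtained by reversing those edges. A small Python sketch of both the in-memory inversion and the equivalent SPARQL lookup follows; the QIDs used in the example are the ones from the thread, and the query text is only an illustration.

```python
from collections import defaultdict


def invert_edges(child_to_parents: dict) -> dict:
    """Turn 'child -> parents' statements (like parent taxon, P171) into a
    'parent -> children' index, i.e. the inverse view the gadget shows."""
    parent_to_children = defaultdict(list)
    for child, parents in child_to_parents.items():
        for parent in parents:
            parent_to_children[parent].append(child)
    return dict(parent_to_children)


def child_taxa_query(parent_qid: str) -> str:
    """SPARQL for the same inverse lookup on the Wikidata Query Service."""
    return (
        "SELECT ?child ?childLabel WHERE {\n"
        f"  ?child wdt:P171 wd:{parent_qid}.\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
```

For example, `invert_edges({"Q242047": ["Q635162"]})` returns `{"Q635162": ["Q242047"]}`: Homininae listed under Hominidae.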

Interested in vs. Field of work

I've made over 100,000 edits to Wikidata and just now noticed interested in (P2650). After a brief perusal of the examples on the property page and actual uses of the property on Wikidata, I find that I would have used field of work (P101) for all of these claims. Can someone explain to me why one is better than another for any particular claim? Should they be merged? Daask (talk) 12:32, 18 October 2023 (UTC)

I suggest you ask at field of work (P101) chat too, but they sound too close to be useful. I use field of work (P101) a lot, I will modify my queries to include interested in (P2650) just in case. Vicarage (talk) 08:00, 19 October 2023 (UTC)
I agree with @Vicarage here. I thought that interested in (P2650) is more focused on non-professional areas, but the examples on the property page suggest otherwise. As it stands, I think we could merge all statements from interested in (P2650) to field of work (P101). Tagging original discussion participants @Joshbaumgartner, Pigsonthewing, Izno:. Vojtěch Dostál (talk) 08:55, 19 October 2023 (UTC)
It could possibly be used for hobby botanists who were also politicians, where field of work (P101) was their professional dossier (housing, immigration, etc) versus their possibly more significant interest in botany. I don't think that occupation (P106) needs splitting between paid and unpaid occupations, so for such a person, both botany and dossiers can be added under field of work (P101). If the one is meant explicitly in the present tense (such as for Wikiprojects) then the description needs some clarification. Currently the discussion only lists it as the inverse of a rejected property, without linking to the rejected property (which would be the inverse reasoning for this one). Jane023 (talk) 09:42, 19 October 2023 (UTC)
The original intent, in the property proposal, was "Allows a community or project to make a list of items that are of interest to them". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:05, 19 October 2023 (UTC)
I guess the only way you'd know it was a project documenting itself is if the citation was within the project's official website, which feels clumsy. I think the meaning of field of work, in English at least, does not have any emphasis on the work being paid or part of a formal appointment, while occupation does. I think the minnow interested in (P2650) with 17k uses could easily be incorporated in the 1m use field of work (P101), as the size differential is based on obscurity rather than any genuine distinction. Vicarage (talk) 04:50, 20 October 2023 (UTC)
Ah thanks Andy I missed that initial comment - it should be added to the property description, because obviously that is a better "raison d'être". So the short answer to this whole question is yes, all current uses that are not items for WikiProjects should probably have their interested in (P2650) switched to field of work (P101) and all WikiProjects currently using the interested in (P2650) should probably switch to using it as a qualifier to on focus list of Wikimedia project (P5008), which would send the casual reader to the proper WikiProject on Wikidata. And if there is no proper item for the WikiProject, one should be made that points to the proper Wikidata namespace page (see e.g. my pet project on WikiProject Provenance (Q98801351)). I don't think we should be facilitating Wikipedia projects on Wikidata without requiring some sort of Wikidata namespace page + item on Wikidata. For local projects, the Wikidata page can just show a short English description pointing to the specific pages in Wikipedia space on their language-pedia. I know WikiProjects are a mess (one that I contribute wholeheartedly, yet haphazardly to) but some navigation through the chaos is required. Jane023 (talk) 07:12, 20 October 2023 (UTC)
After reading the original proposal to which Andy pointed, I still think this property is too similar to P101 and should be merged. The current definition of the property is "item of special or vested interest to this person or organisation" and we should not repurpose it lightly. Instead, let's merge the duplicates and start a new proposal if required for some other (or perhaps the originally intended) use case. Vojtěch Dostál (talk) 19:00, 20 October 2023 (UTC)
I surely support the proposal of merging the two properties made by @Vojtěch Dostál:; many users have often asked me which one should be used for indicating the field of study of scholars and I think the two properties, as currently defined, are too similar and are often used in both ways with no significant difference. --Epìdosis 19:14, 20 October 2023 (UTC)
@Jane023: I don't see why WikiProjects can't use field of work (P101) for their areas of activity as well. Automatically creating inverse statements of on focus list of Wikimedia project (P5008) seems fine and appropriate. However, I don't understand your proposal to use interested in (P2650) as a qualifier for on focus list of Wikimedia project (P5008) statements. Can you give an example? Daask (talk) 23:51, 22 October 2023 (UTC)
I was thinking in my case about WikiProject Provenance (Q98801351) and how it may be useful for Wikiprojects to have qualifiers that point to other WikiProjects or concepts. In this case I could see adding a qualifier to on focus list of Wikimedia project (P5008) on items to match an interested in (P2650) on the WikiProject. In my case for WikiProject Provenance (Q98801351) I suppose you could say Nazi plunder (Q328376) was an interest for the project as well as credit line (Q23726180) for items of art dealers that donated to institutions but who also traded in lost or stolen art in WWII. I have no idea whether interested in (P2650) is in use for any projects, but it certainly pre-dates on focus list of Wikimedia project (P5008). It just occurred to me as I was trying to think of uses. We don't have much infrastructure for WikiProjects and this could be useful. Jane023 (talk) 07:26, 23 October 2023 (UTC)
Hi, I'll leave here some examples of the use with fictional character items that should be taken into consideration, I think. I did not take a deeper look and I did not check if all of these uses are well modelled, but with respect to some of these examples field of work (P101) seems a bit off. Here my query: https://w.wiki/7sbV . - Valentina.Anitnelav (talk) 11:12, 23 October 2023 (UTC)
That certainly shows the trivia that Trekkies record about Jean-Luc Picard (Q16276). But while field of work should be substantial, we shouldn't really be recording anyone's hobbies. Vicarage (talk) 15:21, 23 October 2023 (UTC)
Maybe we should, but not using this property - its definition does not really overlap with "hobbies". Vojtěch Dostál (talk) 06:43, 24 October 2023 (UTC)
@Daask, Vicarage, Vojtěch Dostál, Jane023, Pigsonthewing, Valentina.Anitnelav: I have proposed the deletion of P2650; probably the discussion can continue there. --Epìdosis 19:42, 25 October 2023 (UTC)
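Until the deletion discussion is settled, queries can cover both properties, as suggested earlier in the thread. A minimal Python sketch that builds such a query with a SPARQL property-path alternative (`|`) follows; the topic QID is whatever item you are interested in, and `DISTINCT` avoids listing a person twice when both properties point at the same topic.

```python
def people_by_topic_query(topic_qid: str, include_interested_in: bool = True) -> str:
    """SPARQL for items whose field of work (P101), and optionally
    interested in (P2650), points at the given topic item."""
    path = "wdt:P101|wdt:P2650" if include_interested_in else "wdt:P101"
    return (
        "SELECT DISTINCT ?person WHERE {\n"
        f"  ?person {path} wd:{topic_qid}.\n"
        "}"
    )
```

If the properties are merged, only the `path` string needs to change back to `wdt:P101`.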

Excluding bot edits from watchlist history

I don't really care if some bot makes the 100th mundane change to an item this week and the massive amount of them just makes it impossible to find more meaningful edits made by normal users. So is there a way to exclude bot edits from the watchlist history? Adamant1 (talk) 04:51, 25 October 2023 (UTC)

Yes, you can use the watchlist filters to hide bot edits. See mw:Help:New filters for edit review/Filtering for more info on how filters work. –FlyingAce✈hello 06:30, 25 October 2023 (UTC)
Awesome. I didn't know that was a thing. Thanks. --Adamant1 (talk) 06:28, 26 October 2023 (UTC)
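For anyone reading the watchlist from a script rather than the web interface, the MediaWiki Action API offers the same filter: `list=watchlist` with `wlshow=!bot` excludes edits flagged as bot edits. The Python sketch below only builds the request URL; it has to be sent from a logged-in session (or with the watchlist token parameters), which is not shown here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def watchlist_without_bots_url(limit: int = 50) -> str:
    """Build an Action API request for recent watchlist edits, minus bots."""
    params = {
        "action": "query",
        "list": "watchlist",
        "wlshow": "!bot",  # exclude edits flagged as bot edits
        "wlprop": "title|user|timestamp|comment",
        "wllimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)
```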

Suggestion for student activities on Wikidata

Hi folks. My Wikidata class is now in its second edition. You can find list of students at https://outreachdashboard.wmflabs.org/courses/Hanyang_University/Understanding_Small_and_Big_Data_with_Wikis_(2023) (syllabus). I will be posting my tasks for them at https://www.wikidata.org/wiki/User:Hanyangprofessor2 (I did not do it for the first edition last year, unfortunately). I'd be happy to hear what ideas for tasks/assignments you'd have for them, and I hope my students will be useful not disruptive for Wikidata. You can see the last edition at https://outreachdashboard.wmflabs.org/courses/Hanyang_University/Understanding_Small_and_Big_Data_with_Wikis_(Fall_2022) . If you see any red flags from back then or now (students repeatedly making wrong type of edits, etc.) do let me know so I can try to prevent bad edits by revising tasks or adding new ones, etc. Final notes: most of my students are ESLs (Korean and Chinese), and I expect this class to run once each year for the foreseeable future. Cheers, Hanyangprofessor2 (talk) 07:28, 23 October 2023 (UTC)

Hello @Hanyangprofessor2:, there are always unconnected articles, for example:
which could be connected to existing wikidata objects or new objects could be created, if not yet existing. M2k~dewiki (talk) 11:58, 23 October 2023 (UTC)
Some matching strategies are described at de:Benutzer:M2k~dewiki/FAQ#Wie_finde_ich_ein_bestehendes_Wikidata-Objekt_zu_einem_Artikel? in German; possible approaches are to look them up by ID (e.g. VIAF-ID for people, IAAF-ID for athletes, Transfermarkt-ID for footballers, IMDb for movies, CAS-ID for chemicals, ASN-ID for flight incidents, ...) or by coordinates for geo-referenced objects (buildings, places, rivers, mountains, ...)
M2k~dewiki (talk) 12:04, 23 October 2023 (UTC)
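Matching by identifier, as described above, is essentially a join on the shared external ID. A minimal Python sketch, assuming you already have the IDs extracted from the articles and the same IDs from a Wikidata query (both inputs are hypothetical here): IDs that occur on more than one item are left out, so those cases go to manual review instead of being connected automatically.

```python
def match_by_identifier(articles: dict, items: dict) -> dict:
    """articles: article title -> external ID found in the article
    items: Wikidata QID -> the same kind of external ID (e.g. from a query)
    Returns article title -> QID for IDs held by exactly one item."""
    by_id = {}
    duplicates = set()
    for qid, ext_id in items.items():
        if ext_id in by_id:
            duplicates.add(ext_id)  # ambiguous: several items share this ID
        by_id[ext_id] = qid
    return {title: by_id[ext_id] for title, ext_id in articles.items()
            if ext_id in by_id and ext_id not in duplicates}
```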
Also activation of
might help. M2k~dewiki (talk) 15:12, 23 October 2023 (UTC)
Not a question for the OP, but is there any gadget that marks editors who are registered on the Outreach Dashboard? This might give administrators more options in dealing with any problems. I'm thinking especially of the unfortunate cases where we have to block an editor because they continue to make mistakes, and they refuse to communicate. If we knew they were editing under the mentorship of another editor, then it would give us one more option to try before blocking. Bovlb (talk) 16:29, 23 October 2023 (UTC)
@Bovlb Excellent idea. Ask (suggest it) at en:WP:EDUN. Hanyangprofessor2 (talk) 05:24, 27 October 2023 (UTC)
Thanks for the suggestion. I started a topic. Bovlb (talk) 15:38, 27 October 2023 (UTC)

Two species that need merging - California sea cucumber

They are the same species, different Latin names. The merging here is not trivial, because it generates warnings. I never merged anything on Wikidata (yet), so I don't want to ruin stuff.

Tupungato (talk) 11:48, 26 October 2023 (UTC)

How to make reference to LMÍ

I have about 1000 GPS coordinates that for the past 11 months I've been connecting to Wikidata items in an Excel sheet. I've imported about 50 items as a test (example Q16428164). I just put the website where you can download the data as a reference. But I've noticed that they actually want us to write: "Inniheldur gögn frá IS 50V gagnagrunni Landmælinga Íslands frá 10/2023" (See: https://www.lmi.is/is/um-lmi/starfsemi/skilmalar-og-gjaldskra/gjaldskra ). It translates to "Includes data from IS 50V database from National Land Survey of Iceland from 10/2023". How would I write something like that in the reference tab?

Also, I've added the unique identifier for the data with the gps coordinates of Hofsá Q16428164. Is that OK? Steinninn (talk) 19:16, 14 October 2023 (UTC)

Are you trying to describe the whole item as having data from the IS 50V database or just a specific property value? -- William Graham (talk) 21:43, 16 October 2023 (UTC)
Just the specific property value. --Steinninn (talk) 00:46, 22 October 2023 (UTC)
No one wants to help me with this one? --Steinninn (talk) 12:58, 26 October 2023 (UTC)
That sentence is an example of how it should be, but they are requesting the dataset, the institution and the date retrieved to be specified. Since this is a structured database, it makes more sense to put the required information in statements than in a sentence. Most of them I have added in Special:Diff/1999553009; I'm just not sure how to add the dataset. @William Graham Maybe Property:P1476 would be ok for a dataset source, in this case "IS 50V"? Snævar (talk) 23:37, 28 October 2023 (UTC)
Apologies for not replying sooner. There isn't a great way to include an exact string for this purpose as a reference. So I'm going to talk through what properties exist in the hope that there can be further discussion.
There was a comment property that has been deprecated for use in items, and author name string (P2093) is probably meant more for a person or organization's exact name. So this is probably best expressed in a reference using other properties.
So create an item for "IS 50V". Then in a reference it would look something like:
stated in (P248) IS 50V
author (P50) National Land Survey of Iceland (Q106704593)
retrieved (P813)
I think that does an ok job of expressing the string "Includes data from IS 50V database from National Land Survey of Iceland from 10/2023" -- William Graham (talk) 00:15, 29 October 2023 (UTC)
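For a bulk upload, the reference above would need to be written out as data. The Python sketch below builds it in the layout Wikidata uses for references in its entity JSON (snaks keyed by property, plus `snaks-order`). The QID of the "IS 50V" item is a hypothetical parameter because that item does not exist yet; the retrieved date uses month precision (10) to reflect "10/2023".

```python
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar


def item_snak(prop: str, qid: str) -> dict:
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        },
    }


def month_snak(prop: str, year: int, month: int) -> dict:
    # Month precision (10): the day field is written as 00.
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "time",
            "value": {
                "time": f"+{year:04d}-{month:02d}-00T00:00:00Z",
                "timezone": 0, "before": 0, "after": 0,
                "precision": 10,
                "calendarmodel": GREGORIAN,
            },
        },
    }


def lmi_reference(dataset_qid: str, year: int, month: int) -> dict:
    """dataset_qid: QID of a future 'IS 50V' item (hypothetical here)."""
    snaks = {
        "P248": [item_snak("P248", dataset_qid)],   # stated in: IS 50V
        "P50": [item_snak("P50", "Q106704593")],    # author: National Land Survey of Iceland
        "P813": [month_snak("P813", year, month)],  # retrieved
    }
    return {"snaks": snaks, "snaks-order": list(snaks)}
```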

Merging

Hi everyone. When merging, I thought items were redirected to the older ones. Is that the case? If so, why is d:Q56149564 a redirect to d:Q97654352? strakhov (talk) 06:02, 28 October 2023 (UTC)

@Strakhov I think El Pantera didn't turn on "Always merge into the older entity (uncheck to merge into the 'Merge with' entity)". --Tmv (talk) 11:02, 28 October 2023 (UTC)
Ah. I see. Well, I re-merged the items properly. Thanks, Tmv. strakhov (talk) 11:55, 28 October 2023 (UTC)

Sitelinks to Wiktionary

On the WIKTIONARY Sitelinks help page it is said:

"The links to pages of the main namespace should not be stored in Wikidata. A link to a Wiktionary entry should not be linked on a Q-item on Wikidata, since Q-items are about concepts and not words."

I find this argument questionable. For a Q-item there already are a label and a description. This combination already has the structure and the purpose of a dictionary article (it explains the concept in a natural language), but we don't remove labels and descriptions from the entities. If there can be a short description, why can't there be a Wiktionary link to a word label with a longer description?

Yes, a concept may or may not have a corresponding word in this or that language. This aspect could be handled by actually having or not having a link to a corresponding word. What's the problem with that approach?

When we make natural text generators in Abstract Wikipedia, we are going to need these relations from concepts to words. Do we have them in the current state?

Was there a discussion regarding that topic? I couldn't find it. Nikolay Komarov (talk) 10:11, 22 October 2023 (UTC)

As I understand it, Lexemes are to Wiktionary, what Wikidata is to Wikipedia, basically a machine-readable version of the key information. It is correct that Wiktionary should not be linked to Wikidata for the stated reason. You can link lexemes to items using item for this sense (P5137) with a mapping relation type (P4390) qualifier; this establishes a stronger link between the word and what it represents. The property discussion from 2018 doesn't really say much, but you could try asking User:Denny about Abstract Wikipedia. Infrastruktur (talk) 10:38, 22 October 2023 (UTC)
I think you meant "lexemes are to Wiktionary, what items are to Wikipedia" — Martin (MSGJ · talk) 13:41, 23 October 2023 (UTC)
Because items are about objects and not their names. One object can have several names and so several (many-many!) Wiktionary sitelinks corresponding to it, how to choose? Infovarius (talk) 08:54, 26 October 2023 (UTC)

We have 82 items matching a search for "soft redirect to Wiktionary"; for example Q109284139; they mostly have that string as a description. What should be done with them? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:41, 29 October 2023 (UTC)

How to patrol edits in Indonesian?

Hello,

I am suspicious of the edits by Special:Contributions/114.79.3.149. They replaced the Indonesian description of many items of castles and forts with "bangunan kuil", which DeepL says means "temple building". However, I don't feel very comfortable mass-reverting edits in a language that I personally don't understand.

Where can I find Indonesian speakers interested in Wikidata who can confirm whether the edits should be reverted? GrandEscogriffe (talk) 11:29, 28 October 2023 (UTC)

https://www.wikidata.org/wiki/Wikidata:WikiProject_Indonesia. I've reviewed the English changes, and they are generally for the good Vicarage (talk) 15:38, 28 October 2023 (UTC)
Hello, I speak Indonesian, and I agree with you, I myself do not feel comfortable descripting historical (some abandoned) military fortresses as "temple building" since temple or kuil in Indonesian mean "place of worship" or "place to honor things that are transcendent". However, as far as I know, there are attempts to "standardize" properties for historical or cultural building and other heritages to make them easier to query. I will forward this to the Indonesian Wikidata Community for further discussion. Thank you for pointing that | HA (talk) 00:27, 29 October 2023 (UTC)
Hello @GrandEscogriffe, someone on WikiProject Indonesia confirmed this is vandalism; you can just mass-revert them. HA (talk) 04:34, 29 October 2023 (UTC)

Specifying countries for historical events

I think that where historical events like military operation (Q645883) have a country (P17), it should be the modern country, not the historical one in which the operation occurred, while participant in (P1344) should refer to the participating states as named at the time. This lets anyone doing a geographical query know that results will be split only 200-odd ways, and allows deduction from location (P276) as a backup. We already have a warning mechanism to ensure the correct state is used, and modern borders are better defined than ancient ones. Battle of the Boyne (Q644960) is an example. I'm less keen on removing country (P17) entirely and only using location (P276), as I think people expect to see explicitly that the Battle of Waterloo (Q48314) was fought in Belgium, even though the state did not exist at the time. Note that the latter example still has the date constraint bug claiming that 18 June 1815 is later than year 1815 (raised as https://phabricator.wikimedia.org/T349971). Vicarage (talk) 10:33, 29 October 2023 (UTC)

Please delete Q123230797

I accidentally created an unnecessary item, apologies. Piotrus (talk) 02:39, 29 October 2023 (UTC)

Hello, have you tried asking through Requests for Deletions? Thank you. HA (talk) 04:30, 29 October 2023 (UTC)
Already has been done. Ymblanter (talk) 20:08, 29 October 2023 (UTC)

Interlanguage link issue

I can't figure out how to correct the issue, so I'm trying here. Foreign relations of East Timor (Q13588629) correctly contains en:Foreign relations of East Timor and de:Außenpolitik Osttimors, but when using the interlanguage link sidebar on the English page and clicking "Deutsch", you get de:Osttimor#Außenpolitik. Can anyone fix that? Semsûrî (talk) 21:17, 29 October 2023 (UTC)

@Semsûrî Not on Wikidata. There is a manual interwiki link on en.wp for some reason. --Emu (talk) 22:02, 29 October 2023 (UTC)
Ah! I've removed it now. Thank you. Semsûrî (talk) 22:03, 29 October 2023 (UTC)

Michael Goldstein (Q104032136) is no longer a DAB page

en:Michael Goldstein used to be a disambiguation page. I recently turned it into a real article, but Q104032136 still says it's an instance of Wikimedia disambiguation page. Rather than me trying to fix it and probably making a mess, could I impose on somebody to do the right thing there? Thanks. RoySmith (talk) 13:52, 30 October 2023 (UTC)

I created Q123247038. Ymblanter (talk) 15:42, 30 October 2023 (UTC)

Welcome to the 600th Wikidata Weekly Summary!

Request for Comment: Quonter Vandal

I have a tool in alpha that I'd like some feedback on. It's called Quonter Vandal and it is a tool that surfaces likely bad edits using machine learning. It's meant to be more efficient than looking at recent changes. For more information see Wikidata:Quonter Vandal.

General questions I'd like answered are: 1) would a tool like this be useful 2) are there areas you would want to see its accuracy improved 3) is it missing some kinds of vandalism that matter to you 4) what other functionality would you want from a tool like this.

BrokenSegue (talk) 19:13, 30 October 2023 (UTC)

Get stations of a line

Hi, I'm a Wikivoyager on javoy and I'm writing an article about the rail line now, so I want to get a list of the stations from Wikidata using Lua, but there are problems.

To begin with, I don't know which property is for the stations. This may be because of my insufficient experience and exploration in Wikidata. Although I found P527, I think this is too ambiguous, so is there a better property?

And the second problem is order. When you list stations in articles, it makes no sense if the order is messed up. However, from what I can see, the order in which the values are added seems to be reflected as is. Do you have any idea on this issue? Thanks, Tmv (talk) 11:30, 22 October 2023 (UTC)

Hello @Tmv: for example see
M2k~dewiki (talk) 12:16, 23 October 2023 (UTC)
@M2k~dewiki Thanks! But... I don't see any case where only a line is specified and a list of stations on that line is displayed, so do I have to get the last-stop station, then reference the adjacent station (P197) there and repeat this several times? This method requires access to Wikibase several times to tabulate stations on a single route, so the load will be heavy :( If this is the only method to generate a list of stations on a line, okay, I would think of some measures (or simply give up and rely on the manual work of editors). --Tmv (talk) 11:56, 24 October 2023 (UTC)
Is this what you need: https://w.wiki/7ygz ? Bouzinac💬✒️💛 09:26, 31 October 2023 (UTC)
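Tmv's idea of starting at a terminus and following adjacent station (P197) from station to station does not have to mean one Wikibase lookup per step: if every station's neighbours on the line are fetched up front (for example with a single SPARQL query), the ordering can be done locally. A minimal sketch, written in Python rather than Lua for illustration; the dictionary input, the station IDs and the function name are hypothetical, and it assumes the neighbour lists are already restricted to the one line and that the line neither branches nor loops.

```python
from typing import Dict, List, Optional, Set


def order_stations(adjacent: Dict[str, Set[str]]) -> List[str]:
    """Return the stations of one unbranched line in track order.

    `adjacent` maps each station ID to its neighbours *on this line only*
    (a stand-in for already-fetched adjacent station (P197) values).
    """
    # On a linear line, only a terminus has exactly one neighbour.
    termini = sorted(s for s, nbrs in adjacent.items() if len(nbrs) == 1)
    if not termini:
        raise ValueError("no terminus found: loop line or inconsistent data?")

    order: List[str] = []
    previous: Optional[str] = None
    current: Optional[str] = termini[0]  # either end works; pick one deterministically
    while current is not None:
        if current in order:
            raise ValueError(f"revisited {current}: inconsistent adjacency data")
        order.append(current)
        # Move on to every neighbour except the station we just came from.
        onward = [n for n in adjacent.get(current, set()) if n != previous]
        if len(onward) > 1:
            raise ValueError(f"branch at {current}: this sketch handles linear lines only")
        previous, current = current, (onward[0] if onward else None)
    return order


# Hypothetical four-station line A - B - C - D.
print(order_stations({"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}))
```

The design choice is to pay for one bulk fetch instead of one lookup per station; the same loop could be written in Lua over a table filled from the query results. Lines with branches or loops would need extra handling.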

Display date format

How do I make sure that dates on entity pages are displayed as YYYY-MM-DD, like the example below?

point in time
  1920-01-26

A diehard editor (talk) 19:50, 29 October 2023 (UTC)

Despite enthusiastic requests for this option from users in the East Asian region, the demand for this feature has been ignored (phab:T63958). The reasons for this are not entirely clear, but it may be because it's not relevant to Western users. Afaz (talk) 06:36, 31 October 2023 (UTC)

Linking Q108917642 to a userspace page

Hello! This is my first substantive discussion at Wikidata, so I am not entirely sure if I am doing this correctly. I originally started a discussion at the appropriate talk page, but Dorades recommended I bring this here for wider discussion.

Q108917642 connects three essays talking about how various Wikipedia editions are biased (e.g. biased towards chemistry, biased against alchemy). All three essays currently connected to the item are in projectspace, so the item is undoubtedly notable. However, all three essays were originally translations of en:User:Guy Macon/Yes. We are biased. In light of this, I think it is appropriate to make an exception to the general rule that userspace pages are not linked to items. Thoughts? HouseBlaster (talk) 18:39, 31 October 2023 (UTC)

An exception is difficult to keep up permanently. I recommend to make cross-namespace redirect in English Wikipedia, and use the redirect as a sitelink. —MisterSynergy (talk) 18:56, 31 October 2023 (UTC)
That might work, but the inter-language links would not work when viewing the English Wikipedia version. It is better than nothing, but is there not a way to put a note to future editors that it is not a mistake? HouseBlaster (talk) 19:43, 31 October 2023 (UTC)
There is no default place to look for such a note. Editing is also to a large extent automated, and bots might miss that note as well or simply not understand it. This is what I meant with "difficult to keep up permanently". —MisterSynergy (talk) 19:48, 31 October 2023 (UTC)
Having clear rules is good. This means that people over at Wikipedia can take our rules into account when deciding what they want to put on user pages and what they want to put in project space.
If Guy Macon wants his article to be interlinked, maybe ask him to move it into the normal projectspace? ChristianKl20:42, 31 October 2023 (UTC)

Votes

In the items representing UN resolutions I would like to add information about which countries voted for, against or blank. Is there any existing property that could be used for this purpose, or should I suggest a new property? Cavernia (talk) 07:32, 31 October 2023 (UTC)

I am also unsure whether sponsor (P859) is the correct property for countries sponsoring the resolution. --Cavernia (talk) 07:41, 31 October 2023 (UTC)
I don't think we have an existing property for this, so a new proposal would make sense. It's also not as clear how we would want to model this, as we usually don't have properties with 193 values. It's worth discussing whether to store this kind of voting information in a table on Commons. ChristianKl20:29, 31 October 2023 (UTC)
@ChristianKl: Thanks for your answer. I see the challenge of having a large number of values; a notable example is the study Measurement of the top quark mass using proton-proton data at √s = 7 and 8 TeV (Q56192409), with 2309 authors, which takes time to load. I am not familiar with tables on Commons; the main question is whether it's possible to retrieve the data with a SPARQL query. --Cavernia (talk) 10:39, 4 November 2023 (UTC)

Concept of bot edits

There is a problem I would like to ask the community about. The description will be long, I will ask the specific questions at the end.

Vojtěch Dostál imported data on hundreds of thousands of individuals from the Czech National Library's NKC database this summer. This is a big and important project. Although the data was incomplete and sometimes wrong, I think I was not the only one who was basically happy with the project, followed the import, and corrected and completed the items.

Another editor, Frettie, with the help of his bot (Frettiebot), started to add more data to the items: occupations, birthplaces, languages spoken, etc. This also meant forcing in a lot of problematic data. I say "forcing" because currently, if one corrects or deletes an erroneous statement, Frettiebot adds the same data again; if one deletes it again, it adds it again, in an endless cycle. Unfortunately, communication with Frettie is at a very low level, despite being told repeatedly and repeatedly that what he is doing is a problem, he neglects requests and usually gives a condescending answer: if you don't like the data added by the bot, change it to "deprecated". His edits have led to edit wars: between editors and Frettiebot on the one hand, and with other bots on the other (the latter has led to two bots being blocked from a page).

Let's look at the problem with occupation data: obviously all NKC-identified persons have at least one occupation, but it is common to include two or three statements for the P106 property. For an import of hundreds of thousands of persons, this means hundreds of thousands of statements. If only ten per cent of these are incorrect or redundant, that is still on the order of tens of thousands; if one per cent, still on the order of thousands. The fundamental problem is that for data imports of this magnitude, building a project around correcting data "manually" over and over again is the wrong methodology. The right thing to do would be for the bot not to overwrite data that has already been corrected.

This is not a problem for me, but I note that if a source database supplies this much erroneous data, the reason for deprecated rank (P2241) qualifier on the deprecated statements will eventually be source known to be unreliable (Q22979588), which in turn brands the entire Czech National Library database as unreliable. But I don't think the source is that unreliable; the problem is a bad concept of data distribution, and the bot operator doesn't hear the warning signal.

The conceptual question is: where do we import from, and how much do we build on the source data? The person database of the Czech National Library is not a biographical database, just as other library catalogues are not. The intention of the database creators was simply to be able to distinguish between identical forms of names in some way. Therefore they rarely, if ever, include detailed biographical data: no exact dates and places of birth or death (perhaps only years), no exact occupations or education, and the data obviously cannot be used as archontological data. For example, Hrvatski biografski leksikon ID (P8581) or Vienna History Wiki ID (P7842) point to biographical databases, but neither NKC, VIAF nor OSZK is a biographical database. Data imported from the latter should be treated with a certain degree of caution, rather than forcibly rewritten into the items over and over again. Here, however, it seems that despite all the feedback requesting corrections, the bot operator treats the NKC data as if they were dead certain.

The countless incorrect or unnecessary statements added in this way will only turn Wikidata items into a swamp. Why, for example, do you need five or ten occupations, three to five of which should be set to obsolete because they are either wrong or simply add nothing? Let's look at the typical mistakes:

For example, if a person's occupation is Lutheran pastor (Q96236305), but is recorded in the NKC as priest (Q42603), parson (Q955464) or pastor (Q152002), Frettiebot will add it to the existing Lutheran pastor (Q96236305), sometimes all of them. If someone is known to have been born in the 18th arrondissement of Paris, but the NKC only records his place of birth as Paris, Frettiebot comes and adds Paris to the item, even though it already records that he was born in the 18th arrondissement. If the item is not a person but several people (e.g. a duo, twins, a married couple), then certain properties belong not on the group item but on the individual items with P31 Q5. One such property is P1412, which is added to each person rather than to the group, but Frettiebot ignores this caveat.

These are just a few examples, obviously I have not brought this to the attention of the community just because of three problems, but because there are countless - in my opinion conceptually flawed, unnecessary - bot editing practices.

The specific question is: is it correct for a bot to repeatedly enter the same data into an element if that data is incorrect, redundant or out of place? Is it correct to extract specific biographical data not from a biographical database but from a non-specialised catalogue? Is it right to put the burden of correction so much on the users when it could be done by the bot operator?

Of course, I'm also waiting for Frettie's reply, because, although asked, he never described what justifies the bot having to re-enter redundant, unnecessary or incorrect data over and over again, i.e. why is it better for the user to set it to obsolete, rather than the bot changing the data entry? Pallor (talk) 20:18, 21 October 2023 (UTC)

Bot is not fortune-teller, bot cannot know what has been deleted by other user. That's the main problem. It could find this from the history, but it would make the script run longer (a lot). Moreover, I personally think that deprecated values are better left, because we are able to detect that it is wrong in the source data and possibly have it fixed. Which is sometime done, the cooperation between WM CR and NK CR is mutual. By the way, I'm a man. --Frettie (talk) 21:40, 21 October 2023 (UTC)
I would have thought a bot should only be used if its edits are correct. If it adds substandard information, for example, then please no. Automated undoing of corrections? Um, Maculosae tegmine lyncis (talk) 21:47, 21 October 2023 (UTC)
Given you know there are problems with your data sources, can't you record your inputs for each tranche and only apply the differences on each run? It seems very bad practice to reapply the complete dataset knowing that it will recreate errors. And deprecated values are for "often thought to be true, but actually not", not to act as a database of problems in your data sources. To create that, you should run a report from WD and compare it with your sources outside the WD system. Vicarage (talk) 22:05, 21 October 2023 (UTC)
@Vicarage I don't think your definition of deprecated rank is correct. It's also used to mark sourced, but incorrect statements (i.e. information that was never correct, but was at some point thought to be). Vojtěch Dostál (talk) 18:14, 22 October 2023 (UTC)
Yes, but I don't think WD should be used as a staging area for fixing other people's data, as @Frettie merely hints they might do, particularly in this case when the approach is irritating others. Vicarage (talk) 18:59, 22 October 2023 (UTC)
Why shouldn't the bot be suspended until at least it stops edit warring? Assuming "Unfortunately, communication with Frettie is at a very low level, despite being told repeatedly and repeatedly that what he is doing is a problem, he neglects requests and usually gives a condescending answer: if you don't like the data added by the bot, change it to "deprecated". His edits have led to edit wars: between editors and Frettiebot on the one hand, and with other bots on the other." is accurate, are you unwilling or unable to resolve the problems, starting with stopping it from edit warring? I think Vicarage is right. RudolfoMD (talk) 03:52, 23 October 2023 (UTC)
@Frettie? RudolfoMD (talk) 18:13, 23 October 2023 (UTC)
Iam disagree with: "Unfortunately, communication with Frettie is at a very low level". It's fact, stopping of edit warring is by leave "mistake" with deprecate status. It is correct way. --Frettie (talk) 19:16, 23 October 2023 (UTC)
@Frettie, I see you continue to refuse to explain why is it better for the user to set it to obsolete, rather than the bot changing the data entry. This is unacceptable: if a person's occupation is Lutheran pastor (Q96236305), but is recorded in the NKC as priest (Q42603), parson (Q955464) or pastor (Q152002), Frettiebot will add it to the existing Lutheran pastor (Q96236305), sometimes all of them. Draceane is right. As you refuse to fix the bot, it should be blocked. It would be bad if the bot added them with the flag obsolete, but at least that would make leaving the bot running defensible. Adding them as it's doing is indefensible. RudolfoMD (talk) 19:29, 23 October 2023 (UTC)
I see that Frettiebot is still being run while complaints about its use are being discussed here. This is inexcusable. Vicarage (talk) 19:42, 23 October 2023 (UTC)
If is people part of Lutherian pastor and priest, so it is ok, because, he is pastor AND priest, no Pastor OR priest, it's my point of view. So, if bot would be fixed – how? What is best practice? Do you have some ideas? If some value is imported and later removed, bot dont have this information. Bot can save pairs "QID" + "PROPERTY" + "VALUE" from all runs and if this is again ready to save, bot does not save this. It can be possible, but it will be slower. @Vojtěch Dostál: – what do you think? Adds new values only once. --Frettie (talk) 06:36, 24 October 2023 (UTC)
@Pallor From my point of view, Wikidata is a database aggregator. We collect data (with a bot) and then we sometimes curate them (usually by manually setting ranks). That's how I understand Wikidata's general approach. P.S. I note that your examples with Lutheran pastor (Q96236305) and Paris (Q90) aren't in fact examples of incorrect data, am I right? Vojtěch Dostál (talk) 18:18, 22 October 2023 (UTC)
Vojtěch Dostál yes, we collect data, but we are lucky that we are human beings, not machines: we can make decisions that machines cannot. We also operate the machines and can tell them what to do and what not to do. With all this in mind, the aim obviously cannot be to put all the variations of all occupations, or all the occurrences of a settlement, on an item and increase the noise to infinity, because that would turn the Wikidata database into a swamp. We can make good decisions and bad ones. The Lutheran pastor, Paris, and all the other examples not listed here show that it is possible to pour data into Wikidata that makes a previously precisely defined piece of data redundant or ambiguous. I can give a particularly bad example: for an album of historical sites by a graphic artist/photographer, the descriptive data stated that the author was a historian; your talk page also had a case of incorrect data. All my examples support the point that you should not spread data like this: you should give users a chance to correct what the source does not know well, and you should not force incorrect and redundant data in at all costs. Pallor (talk) 18:42, 22 October 2023 (UTC)
You are obviously right about the importance of humans for Wikidata and I understand that. But I have hard time understanding how the presence of "less precise" professions turns Wikidata into a swamp. How is the "profession:priest" statement preventing you from querying all Wikidata for all lutheran pastors? I see how it would be a problem in Wikipedia, but isn't it a purely aesthetic problem for Wikidata? And on the contrary, if the source for "lutheran pastor" is later deemed incorrect and the corresponding statement deprecated, because the person actually was a priest but not lutheran, we still have a rough idea about his profession with the less precise statement... Vojtěch Dostál (talk) 18:57, 22 October 2023 (UTC)
I feel this is still our (my and Vojtěch's) ongoing dispute over data representation. IMO WD should be not only machine readable, but also human readable. For you it's just aesthetics, for many others this is the matter of usability. — Draceane talkcontrib. 14:47, 23 October 2023 (UTC)
Yes, I have the same feeling about this discussion :) It's about the desire of a part of the Wikidata community to turn it into a second Wikipedia :-). Vojtěch Dostál (talk) 06:52, 24 October 2023 (UTC)
WD is a curated database, yes we might reduce the workload by using bots for mass import, but if a human decides the information is wrong, I think they should remove it. There are clearly techniques in the AI world where a learning machine can absorb vast quantities of machine scraped information and do probabilistic assessment of which facts are most likely to be correct, but using them here would overwhelm the GUI we have, and I agree with @Pallor we'd have a swamp. Vicarage (talk) 19:07, 22 October 2023 (UTC)
We agree that we have to record certain data even if it is not true: this could be, for example, a historical error or a poorly drawn conclusion, if it is widespread, and by recording it we help to refute it. But we usually do this on the basis of reliable sources and thus help to refute incorrect/erroneous data. Here, however, the source itself is not perfect either, since, as I explained above, we do not take the data from a biographical database but from a library catalog. The aim of the librarian was not to position the person between the denominations, but to distinguish him from a person of the same name, perhaps born in the same year, and for this it was sufficient to describe a more general, schematic occupation. It's like the system of labels and descriptions in Wikidata: you don't have to be extremely precise there either, but when you fill in the P106 field, you're obviously trying to create the most accurate model of reality; you're not forced to rough out the description. If someone is a high school teacher, we don't have to state that he is an educator, an instructor, AND a high school teacher. The last one is enough, and there is no need to add the other two, especially not if our source is not completely reliable in this regard. Pallor (talk) 19:33, 22 October 2023 (UTC)
I personally don't think that this is a majority view. I would be surprised if the community here really thinks that we should remove incorrect sourced statements rather than deprecating them. Can we somehow determine what the consensus really is? Let's write it down somewhere afterwards, because I feel I already had this discussion somewhere. Vojtěch Dostál (talk) 19:15, 22 October 2023 (UTC)
A bot is a machine. If some type of wrong edit is made very often, it is good to add an exception to the bot.
But beyond this case, it would be good to have a universal solution. What about a bot which would deprecate statements that are one level above some other statement? When there is e.g. genre=adventure film (Q319221), the statement genre=film (Q11424) would be marked as deprecated. The same for occupation, place of birth, category combines topics etc. JAn Dudík (talk) 07:44, 23 October 2023 (UTC)
That bot job would be against Wikidata rules. True statements should never be deprecated. Vojtěch Dostál (talk) 14:12, 23 October 2023 (UTC)
I don't generally agree. By applying this rule literally, we could add to all items instance of (P31) entity (Q35120), and to all people place of birth (P19) Earth (Q2). Yeah, it's true, but um... If you added all superclasses of the statements, you would just make a WikiSwamp, incomprehensible for humans. — Draceane talkcontrib. 14:47, 23 October 2023 (UTC)
@Draceane That would be absurd, but I don't see a relevant source that collects all people born on Earth as opposed to people born on other planets :). Vojtěch Dostál (talk) 06:45, 24 October 2023 (UTC)
That's exactly Draceane's point. The examples he gives are true statements, yet as absurd as the additions the bot owner is being asked to stop making, and you are saying should not be deprecated merely because they are true. Your argument makes no sense. It seems like Frettie is trying, hard, to not understand, but AGF makes me assume it's a language barrier. (For clarity, I'm referring to the notion that "It is difficult to get a man to understand something, when his ego depends upon his not understanding it!")
Are the edits the bot is making so valuable as to outweigh the problems it's causing? I suggested an admin suspend the bot. RudolfoMD (talk) 00:45, 25 October 2023 (UTC)
It is sadly not Frettie who does not understand. Actually, I think other people find it hard to understand elementary rules of Wikidata: 1) Wrong sourced claims should not be removed but deprecated and 2) Preferred claims are marked with ranks, not by removing less precise yet true claims. These rules are essential to the way Wikidata operates and cause no significant problems at all to reuse of Wikidata, but it is sometimes difficult for Wikipedians to get a grasp of them. Vojtěch Dostál (talk) 14:19, 25 October 2023 (UTC)
After reading all this I feel a strong urge to express my agreement with Pallor and Vicarage. Not because I have new points to add in favor of their opinion, but as a counterpoint to Vojtěch Dostál’s claim that their point of view marks a misunderstanding of Wikidata's principles. I think this comes close to assaulting them and like-minded Wikidata users like me on a personal level. In my opinion, this discussion is too important to be bogged down like this. Let's try and keep the exchange productive and respectful, please.
On the point of Frettie's alleged "not understanding": My argument applies here too (mutatis mutandis). But I must confess I have a hard time understanding what you are trying to say, Frettie, because of your English phrasing. Maybe the same is true for others? Jonathan Groß (talk) 16:38, 25 October 2023 (UTC)
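JAn Dudík's proposal above, detecting a value that is only a more general version of another value of the same property, comes down to a superclass check. A minimal sketch in Python, assuming a hand-written parent map standing in for subclass of (P279) data; the labels are illustrative and real Wikidata hierarchies may differ. Given the objection that true statements should not be deprecated, it changes nothing: it only flags (general, specific) pairs and lists the most specific values, which a human could then mark as preferred instead of deprecating anything.

```python
from typing import Dict, List, Set, Tuple


def ancestors(value: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Every transitive superclass of `value` in `parents` (cycle-safe)."""
    seen: Set[str] = set()
    stack = list(parents.get(value, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, set()))
    return seen


def redundant_pairs(values: List[str], parents: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """(general, specific) pairs where `general` is a superclass of `specific`.

    Only flags candidates for review; it does not change any statement.
    """
    pairs: List[Tuple[str, str]] = []
    for specific in values:
        above = ancestors(specific, parents)
        pairs.extend((general, specific) for general in values
                     if general != specific and general in above)
    return sorted(pairs)


def most_specific(values: List[str], parents: Dict[str, Set[str]]) -> List[str]:
    """Values that are not a superclass of any other value: preferred-rank candidates."""
    generals = {general for general, _ in redundant_pairs(values, parents)}
    return [v for v in values if v not in generals]


# Illustrative hierarchy taken from the genre example above.
parents = {"adventure film": {"film"}, "film": {"work"}}
print(redundant_pairs(["film", "adventure film"], parents))  # [('film', 'adventure film')]
print(most_specific(["film", "adventure film"], parents))    # ['adventure film']
```

Run over the occupation (P106) values of one item, this would only report that one value sits above another in the class tree; whether the general statement is redundant or a separate true fact (Frettie's "pastor AND priest" reading) remains a human call.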

Discussion after bot suspension

A day ago, following our request on the administrators' noticeboard (Wikidata:Administrators'_noticeboard#Suspend_a_bot;_remove_incorrect_admin_claims?), Frettiebot was suspended until this discussion is closed. I'd like to lay down some basics (although I've already mentioned some of them).

  1. The transfer of data from the NKC database to Wikidata is fundamentally good, so it benefits Wikidata.
  2. Frettiebot has some useful edits.
  3. The goal is not to make a rule that says: a bot cannot fix or override a person's edit (see e.g. the {{Autofix}} template, which I think is useful)
  4. At the same time, we also don't want a bot to fill up Wikidata with unnecessary and/or wrong data in a way that CANNOT BE OVERWRITTEN.

If others agree with this point 4, then we respectfully ask Frettie to improve the operation of the bot, upload all data from NKC only once, and accept when this data is corrected or deleted. I am pinging a few people who have participated in the debate or have previously made a request to Frettie in a similar matter, to say whether they can support point 4. Of course, VD and Frettie can also ping people who have previously commented on the question anywhere.

@Maculosae tegmine lyncis, Vicarage, RudolfoMD, Draceane, Jonathan Groß, GrandEscogriffe: @Emu, Canley, U. M. Owen, Andrew Gray, RAN, Jackie Bensberg: @Polarlys, Vanbasten 23: (I apologize to those who are no longer interested in the topic, but still had to come here) Pallor (talk) 23:00, 27 October 2023 (UTC)
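The "upload only once" request in point 4 resembles the option Frettie describes above: the bot keeps every (item, property, value) triple it has ever added and skips it on later runs, so a value that a human has removed stays removed. A minimal Python sketch under that assumption; the class name, the JSON ledger file and the triple format (which ignores qualifiers and references) are hypothetical, and fetching the item's current statements is left out. It also skips any value already on the item at any rank, so a deprecated statement is not added again next to itself.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional, Set, Tuple

# (item QID, property ID, value ID); a simplification that ignores qualifiers and references.
Triple = Tuple[str, str, str]


class AddOnceLedger:
    """Remember every statement the bot has ever added, so it is never re-added."""

    def __init__(self, path: Optional[Path] = None) -> None:
        self.path = path
        self.seen: Set[Triple] = set()
        if path is not None and path.exists():
            # JSON stores tuples as lists; convert back so set lookups work.
            self.seen = {tuple(t) for t in json.loads(path.read_text())}

    def filter_new(self, candidates: Iterable[Triple],
                   existing_any_rank: Set[Triple]) -> List[Triple]:
        """Keep only candidates never added before and not on the item at any rank."""
        return [t for t in candidates
                if t not in self.seen and t not in existing_any_rank]

    def record(self, added: Iterable[Triple]) -> None:
        """Remember statements that were actually saved, and persist the ledger."""
        self.seen.update(added)
        if self.path is not None:
            self.path.write_text(json.dumps(sorted(self.seen)))
```

On each run the bot would call `filter_new` with the source data and the item's current statements, save only what comes back, then `record` it. The ledger grows with every run, which is the slowdown Frettie anticipates, though a set lookup per candidate should keep that cost modest next to the edits themselves.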

  Support for point 4, although even if this were to become consensus (which it should), the assessment of what is "wrong data" will always be a point of contention. In any case, thank you for this clear and constructive comment. Jonathan Groß (talk) 05:40, 28 October 2023 (UTC)
  Oppose I find myself perfectly in accordance with Vojtěch's vision of what is Wikidata. We should aggregate first, sort (and not delete) later. I think Frettiebot is doing an important job of providing references to P106 that are too often not referenced, making them basically worthless. Frankly, I'd even wish other bots would do the same with LC or GND. Now sure, as it was said, NKC is only a library catalog, therefore it might not be the best source available, nevertheless it is a legitimate source. I think the real problem here isn't much the bot's edits but rather how do we model competing or hierarchical values for P106? The bot is only exposing the problem, but it would have come sooner or later. --Jahl de Vautban (talk) 06:38, 28 October 2023 (UTC)
Yes, you have explained the problem very well. If a person is a footballer, it makes no sense to also add that he is an "athlete" or a "sportsperson", because we would be filling Wikidata with useless data. Many Wikipedias use this data for their templates, and the result is that these infoboxes are full of professions that do not inform readers of anything; on the contrary, they confuse them more. These users see it as normal and there is no room for much discussion. --Vanbasten 23 (talk) 07:54, 28 October 2023 (UTC)
  Support importing bots should only attempt to add data once. @Frettie allows his bot to do this multiple times, while not engaging, or even pausing his bot after multiple complaints. The huge differential in human time in setting a bot in action, and reviewing and flagging the results means anyone running a bot needs to be cautious, and what we have here is reckless behaviour. Vicarage (talk) 08:40, 28 October 2023 (UTC)
"importing bots should only attempt to add data once" is in practice super complicated and IMO not feasible in most situations. Use ranks to indicate which claims should be visible (and which ones not) to end users; and ask the bot operator not to import already existing values regardless of their ranks. —MisterSynergy (talk) 09:05, 28 October 2023 (UTC)
Since I have been pinged: I probably don’t understand all nuances but it seems to boil down to “do import unless an issue is raised with an edit or a type of edit, in this case resolve manually”. That’s the general idea with mass edits anyway, so yeah, no reason to act differently in this case. On a more general note: Vojtěch Dostál is right, no notes from me on that issue. --Emu (talk) 09:10, 28 October 2023 (UTC)
  Support for point 4 of course. Also I agree with Emu that there are two different issues. First a general good practice of bot programming that bots should always accept human corrections, and never get into edit wars. This should be consensual. Second the more fundamental question of which kind of data should appear in Wikidata, which I am surprised has not been resolved earlier in the history of the project. There I am in the Pallor/Vicarage/Draceane/RudolfoMD/Jonathan Groß camp. I think that some statements are both true, sourceable, and useless because they are superseded by a more precise true statement, and that such statements should not appear in Wikidata at all. Yes Vojtěch Dostál my opinion is informed by Wikipedia, but what is the problem with that? Isn't supplying the Wikipedias the main original mission of Wikidata? Are there use cases of Wikidata where it is in fact useful to have large lists of redundant imprecise statements? --GrandEscogriffe (talk) 11:10, 28 October 2023 (UTC)
Sure but that doesn’t mean that we have to answer to the whims of infobox programmers from other projects, to put it bluntly. I often find it quite helpful (when researching and/or disambiguating) to have many statements of varying precision and even accuracy. This gives me a fuller picture of what is generally known about a person – whether true or not, whether precise or not. It also sometimes helps to trace how inaccuracies over time evolved into falsehoods. This is different from Wikipedia where we generally only strive for the best available version of the received opinion of the truth. --Emu (talk) 11:41, 28 October 2023 (UTC)
Can you give an example? GrandEscogriffe (talk) 12:18, 28 October 2023 (UTC)
I don’t have a good example for occupation (P106) at hand (and most of those cases would be hard to explain, since there is often an element of language dependency and I mostly work with German sources), but in a previous, very similar discussion I mentioned Q94694204#P569 as an example in that direction. --Emu (talk) 22:18, 28 October 2023 (UTC)
@Emu: I agree with you and Epìdosis below that keeping incorrect sourced statements as deprecated is useful. My problem is mostly with redundant (and therefore correct) statements. In your example, I do not see what use the correct but redundant statement date of birth (P569) 1831 could have, unlike the deprecated 1841 values, which tell users not to add 1841 at normal rank. Every user (human or bot) who is tempted to add the imprecise 1831 should already "see" that 1831-09-09 is present. So the imprecise 1831 does not play the safeguarding role that deprecated common falsehoods do.
Also, this example has only one best-ranked value (as it should), so it does not clutter the data for external users*. A big problem with Frettiebot is that it puts everything at normal rank. I would be much less bothered if it upgraded the already existing, more precise statement to preferred rank every time it adds a less precise statement, although even then I would not really see the point.
*Of these users, I am familiar with Wikipedia, but I guess other external users also rely on the rank system, and I am really curious who these other users are. Perhaps Wikidata should not be at the whims of infobox programmers specifically, but it should make (and keep) itself useful to the people who use it. GrandEscogriffe (talk) 21:25, 3 November 2023 (UTC)
To take the example of Lina Wasserburger (Q94694204): the probably correct precise value is sourced with user-generated content and a primary source. The statement with year precision, however, has a secondary source, as do the other deprecated statements. In theory, you could also query statements sourced by Österreichische Schriftstellerinnen 1880–1938 (Q104601081) against our best guess, thereby estimating the accuracy (and precision) of a given source, which to me is quite an interesting use case. And finally: precise values can be deleted for all sorts of legitimate reasons, resulting in missing statements instead of other sourced statements with lower precision.
Don’t we have a bot job that periodically sets a preferred rank in those cases? Of course it would be ideal if Frettie took care of it, but I imagine it’s not that simple. --Emu (talk) 21:53, 3 November 2023 (UTC)
  Oppose point 4 (with one clarification at my point 4 below). First of all, I very much agree with the comment by @Jahl de Vautban: above: aggregate data from authoritative sources (among which national authority files surely count) and then rank the statements; the phrase "The bot is only exposing the problem" (of managing competing or hierarchical values for P106) perfectly summarizes the situation. (BTW, since these topics are clearly of general interest, I think they would deserve an RfC in order to involve more users; the Project chat has dozens of messages each day and is very difficult to follow.) However, since I understand the concerns of users who have criticized some aspects of Frettiebot's activity, I would like to try to address these concerns by proposing a few solutions that do not require changing the bot's present activity (points 1 and 2); I add a short comment about edit wars (point 3); finally, I would like to propose one change to the bot's activity myself which, as far as I can see, wasn't mentioned above (point 4). I apologize in advance for the length, but I think the importance of these themes deserves a detailed analysis.
  1. "is it correct for a bot to repeatedly enter the same data into an element if that data is incorrect, redundant or out of place?" (the initial question by @Pallor:): I think these three categories need to be considered separately (and since, as appears both from the comments above and from my personal experience, the most frequent problem is redundant data, I will devote more space to that part):
    1. incorrect data can be entered repeatedly by a bot, if supported by an authoritative source (as I said above, IMHO national authority files are authoritative sources), for two reasons: 1) as a principle, "Wikidata simply provides information according to specific sources; those sources may or may not reflect contemporary thought or scientific consensus" (quotation from Help:Ranking); 2) technically, "'importing bots should only attempt to add data once' is in practice super complicated and IMO not feasible in most situations" (I'm not a bot operator, but I trust @MisterSynergy:, who is, so I'm quoting his comment above). Given this premise, in order to prevent incorrect data from reaching Wikipedia and other data reusers (a legitimate concern, which I obviously share), these incorrect data need to be set to deprecated rank (as stated by Help:Ranking#Deprecated rank), with the qualifier reason for deprecated rank (P2241) = error in referenced source or sources (Q29998666) (or typographical error (Q734832), useful in some specific cases). Of course, keeping incorrect data as deprecated clutters the items, worsening their readability for humans (a legitimate concern, although I think it's rare to see more than 1 or 2 incorrect deprecated statements in the same item): this can be addressed in at least two ways, the first being collapsing not-best-ranked values (see point 2 below) and the second being data round-tripping, which I treat here at point 1.1.1.
      1. Data round-tripping (Wikidata:Data round-tripping) is crucial for Wikidata data quality because, if some authoritative database outside Wikidata contains mistakes, these mistakes can damage Wikidata in many ways for as long as they exist (the most problematic case is when, e.g., a deprecated incorrect statement from one import is removed on Wikidata, perhaps by a user in good faith who judges it useless, and then another import re-adds it with normal rank, reintroducing the mistake in full force; the less serious, but still problematic, case is that deprecated incorrect statements clutter items). Ideally we should have a workflow in which a) when we notice that statement X, supported by entry Z of the authoritative database Y, is incorrect, we are able to report this mistake to database Y; b) database Y reads our reports and resolves them on a regular basis; c) once entry Z is fixed, we can remove statement X (I think that, once the supporting source is fixed, removing the statement has more advantages than keeping it as deprecated); ideally the removal should be performed by the curators of database Y at the same time as they fix entry Z. This workflow should be improved (see e.g. phab:T312718); the more efficient it is, the less time incorrect statements remain on Wikidata. Of course, improving this workflow is a task for the Wikidata community and not for bot operators; however, if a bot operator has a longstanding collaboration with the curators of a database they periodically import to Wikidata, they could encourage those curators to improve this workflow (and to remove from Wikidata incorrect statements sourced by their entries once they have fixed those entries).
    2. redundant data can be entered repeatedly by a bot, for reason 1 given above for incorrect data. Redundant data clutter the items, worsening their readability for humans (a legitimate concern, especially in the case of occupation (P106), and one I very much share; in fact I periodically remove unsourced redundant values of P106 to reduce the issue a bit, which is very serious): IMHO this can be addressed in one main way, i.e. by collapsing not-best-ranked values (see point 2 below). Redundant data can also clutter Wikipedia and other data reusers receiving them (another legitimate concern), and this should be avoided using ranks. It should be noted here that deprecated rank is designed "for statements that are known to include errors (i.e. data produced by flawed measurement processes, inaccurate statements) or that represent outdated knowledge (i.e. information that was never correct, but was at some point thought to be)" (quotation from Help:Ranking), and so not for redundant statements, which aren't wrong stricto sensu. I propose two different procedures for ranking redundant values:
      1. for properties with a single-best-value constraint (Q52060874) (mainly date of birth (P569), date of death (P570), place of birth (P19), place of death (P20)), if there are 2(+) values all supported by authoritative sources, the most precise one should get best rank; if the values only differ in precision (i.e. day vs year, or village vs municipality), the best rank can be justified with the qualifier reason for preferred rank (P7452) = most precise value (Q71536040). A few years ago I requested that this be done for dates by a bot (preferably, but not necessarily, operated by the same bot operator who adds the less precise values), and I think it is presently done by BorkedBot (per this task approved in 2021; @BrokenSegue: could you confirm?); programming a bot to do the same for places, on the basis of recursive located in the administrative territorial entity (P131), should be doable and I would support it. Of course, the automation has some limitations both for dates and places (see the mentioned bot task): e.g. if a birth date has the values 1948, 31/10/1948 and 1949 (or a birth place has the values Paris, XVIII arr. of Paris and Saint-Denis), we need a human to choose whether 31/10/1948 (or XVIII arr. of Paris) deserves best rank, but in fact a bot can safely operate in most cases.
      2. for properties allowing multiple values (mainly occupation (P106)), which are more seriously affected by redundancy, two choices are possible: a) set all "good" values to best rank (with the qualifier reason for preferred rank (P7452) = most precise value (Q71536040)), leaving redundant values at normal rank; b) set all redundant values to deprecated rank (with the qualifier reason for deprecated rank (P2241) = [value to be decided]), leaving "good" values at normal rank. Since the number of "good" values is in most cases higher than the number of redundant values, I would probably prefer solution b), simply because it would require changing fewer ranks than option a); however, solution b) has the drawback of deprecating statements which are redundant but not wrong stricto sensu, and this contradicts the present definition of deprecated rank. I think this choice deserves further reflection and discussion. Once we choose one option, it can mostly be applied by a bot, as in the previous case: we just need a bot operating on the basis of recursive subclass of (P279), which will allow it to know which values are redundant and which aren't; of course I would support such a bot.
    3. out-of-place data (which I would define as values that are neither incorrect nor redundant, but problematic because they are placed under the wrong property) must not be entered by a bot, neither once nor multiple times. Given this principle, let's draw some practical consequences, outlining different responsibilities: 1) the community (not bot operators) should add constraints to properties wherever possible, so that out-of-place data get marked as constraint violations; 2) bot operators must avoid adding data which trigger constraint violations, ideally using a mechanism which is always synchronized with the constraints (which are frequently added, edited and sometimes removed); 3) if a guideline states that a certain property-value combination is out of place and should be changed to another one, but this guideline has not been "translated" into a constraint, bot operators are not required to know it (guidelines are scattered among various WikiProjects and it's often difficult to keep all of them in mind); 4) however, if a user writes to a bot operator reporting that a certain property-value combination is out of place according to a certain guideline and should be changed to another one, the bot operator must comply with that guideline as soon as possible (I remember one such case, in which I had no complaint about Frettie's answer).
  2. "I feel this is still our (my and Vojtěch's) ongoing dispute over data representation. IMO WD should be not only machine readable, but also human readable. For you it's just aesthetics, for many others this is the matter of usability." (comment by @Draceane:). I very much agree with this comment by Draceane: Wikidata should be readable not only by machines but also by humans. In points 1.1 and 1.2 I supported keeping both incorrect statements (with deprecated rank) and redundant statements (either with deprecated rank, or at normal rank with the most precise statements at best rank) inside items; the use of ranks I propose solves the issue of machine readability, meaning that Wikipedia and other data reusers can read only best-ranked data, thus avoiding incorrect and redundant data. To make items also easily readable for humans, I propose collapsing not-best-ranked values: if a property has 2(+) values and these values have 2 or 3 different ranks, a button appears next to the property allowing the user to collapse (= hide) all values which don't have the best rank (i.e. if at least one value has preferred rank, all non-preferred values are collapsed; if a property has only normal and deprecated values, the deprecated values are collapsed). I think a gadget like this would make items perfectly readable; users should also be able to enable it by default (i.e. not-best-ranked values are collapsed when the item is loaded, and users can just click the button next to a given property to show its not-best-ranked values if they are interested).
  3. about edit wars between bots: of course they should not happen; I see basically two solutions: 1) bot operators should encode in their bots a constraint like "if you make the same edit on the same item a total of N times (with e.g. N = 3), stop editing the item" (I think we have no precise guideline about this, but it would be positive IMHO); 2) we probably need an admin bot which monitors items and, if an edit war between bots develops on one item (e.g. bots A and B adding and removing the same statement on the same item N times, with e.g. N = 3), blocks both bots indefinitely from editing that item and sends a message to both bot operators about it. Solution 2 would make solution 1 not strictly necessary, and I hope it's not too difficult to implement.
  4. finally [clarification], @Frettie: my request for one improvement to Frettiebot's handling of some occupation (P106) values: I have noticed that, for "composite" occupations recorded in NKC, Frettiebot sometimes duplicates them, adding both the composite occupation (correctly) and the basic occupation (incorrectly, introducing a redundancy absent from NKC). To be clearer, some examples: for people who are both historians and art historians, sources often support both values (e.g. Renate Kohn (Q66685235)) and so everything is fine, but in other cases (e.g. Renata Zemanová (Q95156951) before my last edit) the NKC source has only "historičky umění" ("art historians") as occupation, yet the bot also added the basic occupation "historian", which is in fact wrong because it is absent from NKC; I have seen other similar cases with "historian" wrongly added where NKC has only "historian of X". Another example: for people who are both professors and university professors, in nearly all cases (e.g. Elliott R. Jacobson (Q112427327) before my last edit) NKC has only "vysokoškolští učitelé" ("university teachers"), but Frettiebot also added the basic occupation "professor". In these cases the mistake lies in how Frettiebot imports the data from NKC; I would ask you to avoid such mistakes when the bot restarts its activity, and possibly to try to spot existing cases like the ones outlined above and remove these values (there is no need for deprecation here, because the source cited in the references does not in fact contain such values). This is the only change in the bot activity I would request.
--Epìdosis 14:47, 28 October 2023 (UTC) P.S. I have added a subparagraph "Discussion after bot suspension" for better readability, feel free to edit it
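The ranking rules proposed in points 1.2.1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BorkedBot's or any other bot's actual code: the `DateStatement` class and all function names are hypothetical, and only the Wikibase time precision codes (9 = year, 10 = month, 11 = day) come from the data model. `best_ranked` returns the values a reuser would read (and that the proposed gadget would leave uncollapsed); `promote_most_precise` marks the most precise date as preferred only when every other value differs from it solely in precision, and returns `None` on a conflict such as 1948 vs 1949, where a human has to decide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Wikibase time precision codes (9 = year, 10 = month, 11 = day).
YEAR, MONTH, DAY = 9, 10, 11


@dataclass
class DateStatement:
    """Hypothetical, simplified stand-in for a P569/P570 statement."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    precision: int = YEAR
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def best_ranked(statements: List[DateStatement]) -> List[DateStatement]:
    """Preferred values if any exist, otherwise normal ones; never deprecated."""
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


def only_less_precise(coarse: DateStatement, fine: DateStatement) -> bool:
    """True if `coarse` agrees with `fine` on every component it specifies."""
    if coarse.year != fine.year:
        return False
    if coarse.precision >= MONTH and coarse.month != fine.month:
        return False
    if coarse.precision >= DAY and coarse.day != fine.day:
        return False
    return True


def promote_most_precise(statements: List[DateStatement]) -> Optional[DateStatement]:
    """Mark the most precise non-deprecated value as preferred (the qualifier
    would be P7452 = Q71536040), but only if every other value differs from
    it solely in precision. Return None when a human has to decide."""
    candidates = [s for s in statements if s.rank != "deprecated"]
    if not candidates:
        return None
    finest = max(candidates, key=lambda s: s.precision)
    if not all(only_less_precise(s, finest) for s in candidates):
        return None
    finest.rank = "preferred"
    return finest
```

With values mirroring the Q94694204 discussion (a year-only 1831 next to 1831-09-09), the day-precision value becomes the only best-ranked one; with 1948, 31/10/1948 and 1949, nothing is changed.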
Thank you for your work! I agree, just two notes:
  1. As you said, deprecating true statements should be avoided – enforcing our ranking rules is difficult enough as it is now without an ad-hoc extra rule just for a set of cases.
  2. Do we have examples where “statement clutter” is a real problem for human readability? I would imagine that our current color coding for ranks (enabled by default AFAIR) is helpful. In some cases, rearranging values by hand (best values first, followed by normal ranks, deprecated ranks at the end) might be helpful. Collapsed values always carry the danger of overlooking important data and even adding those statements a second time. --Emu (talk) 22:36, 28 October 2023 (UTC)
@Epìdosis As for (4), adding historian AND art historian and how it happens: we actually have a conversion table that prevents cases like this and tries to understand the whole phrase "art historian" in descriptions (see [4]). The occupation "historian" was added to that item by me two years ago, when this specific handling of occupations was not yet possible. Vojtěch Dostál (talk) 15:31, 29 October 2023 (UTC)
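The conversion table described here implies longest-phrase-first matching: once a composite phrase such as "historičky umění" has been recognised, the shorter "historičky" inside it should not yield a second, broader occupation. The sketch below is an assumption about how such a table could work, not Frettiebot's actual code (linked as [4]); the table entries and English labels are illustrative only.

```python
import re
from typing import List

# Illustrative conversion table (hypothetical, not Frettiebot's real one):
# Czech NKC occupation phrases -> English occupation labels.
CONVERSION = {
    "historičky umění": "art historian",
    "historičky": "historian",
    "vysokoškolští učitelé": "university teacher",
    "učitelé": "teacher",
}


def occupations_from(description: str) -> List[str]:
    """Match longer phrases first and blank out each match, so a shorter
    phrase contained in it (e.g. "historičky") cannot match again."""
    text = description.lower()
    found = []
    for phrase in sorted(CONVERSION, key=len, reverse=True):
        # \b is Unicode-aware for str patterns, so "č", "ě", "é" count as word characters
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            found.append(CONVERSION[phrase])
            text = re.sub(pattern, " ", text)  # consume the matched phrase
    return found
```

A description reading only "historičky umění" then yields "art historian" alone, which is the behaviour Epìdosis asked for in point 4.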
I'm still preparing an answer, but it's taking longer because of my work (an anon archived the section) Pallor (talk) 09:37, 31 October 2023 (UTC)
Thank you for your patience.
I also thank Epìdosis for the very detailed summary. Many strategic questions have now been highlighted, but I still feel that we would be saying yes to a data entry method that will, in the long term, gradually make Wikidata more difficult for both machines and humans to read. It is like the waves of the sea washing over the shore: we take it for granted and do not try to stop it. But when the water starts washing garbage ashore, we can no longer say "this is the order of nature" and let it happen. In that case, something must be done to keep the coast free of waste; we must install some kind of filter to protect both the water and the coast from garbage.
Let's start with the most important, the source.
You write that you consider the national authority files to be authoritative data. I already wrote about this above: it is a library database, that is, it serves to record descriptive data about books. It is complemented by a database that describes the authors of the books only to the depth absolutely necessary to distinguish authors with the same name. What this resource can be used for is to find out the title of each book, its author and publisher, where and when it was published, its size, weight, number of pages, binding, subject, etc. For these data, the NKC is just as authoritative as any other national library. However, this database cannot be used to find out the authors' reliable and precise(!) biographical data. It does not record who studied where or what education they obtained when, nor their family relationships, nor even exact birth and death dates. It is satisfied with the year in which someone was born and died, not where or on what day, simply because the NKC database does not need this; it fulfills the purpose it was created for without it. The same is the case with occupations: it is enough for the NKC to describe someone as a priest, teacher or athlete. This is a necessary superficiality that satisfies the NKC's needs, but not Wikidata's.
It is like using the database of a company that trades in agricultural products as a source for SI units, on the grounds that they also use the terms ton and meter, or using the product range of a paint factory as a source for the properties of chemical compounds, on the grounds that chemical engineering is also behind it. Both example databases can of course be used, but only in the right place. The database of the Czech National Library can also be used where books are concerned; in fact, it can be used to create items for persons missing from Wikidata. But where precise data is concerned, a biographical source must be sought, rather than constantly rewriting superficial data just because someone somewhere on the World Wide Web spat it out. I would emphasize again that the problem is not that this data was found, but that, although it obviously came from an inappropriate source, it was written back again and again.
If, for example, for every birth date already entered precisely (year, month, day) we also added a value containing only the year, or if for every place of birth or death narrowed down to a specific administrative unit we also entered the broader unit one level above it, could this data be wrong? No, it is just less precise than what is already in Wikidata. If we accept Epìdosis's argument, we open the door to writing in any more superficial data from any database. In fact, we could even do it automatically ourselves, since none of it would be a lie, and sooner or later we would surely find a source that supports it. Imagine entering a year-only value under every date of birth: would Wikidata be better for it? We have to enforce practical principles that preserve the coherence of Wikidata. And if we don't want to write 1720 next to the exact date of birth (May 8, 1720) in every item, then we have to follow a similar principle for occupations: we don't want to write "pastor" next to "Lutheran pastor", "pedagogue" next to "secondary school teacher", or "engineer" next to "hydraulic engineer", because it is completely unnecessary. This would just flood Wikidata with unnecessary and meaningless data.
(I'm pointing out one more problem in Frettiebot's editing, which some may find correct, but which I think is grossly unnecessary. Some positions have an item that applies to a specific country and a specific position. For example, for members of a country's parliament, using position held (P39) = member of parliament (Q486839) is obviously not an error, but where a country-specific item exists, we use it (see …).
In contrast, Frettiebot indiscriminately added Q486839 for every person for whom the source mentioned it, even though the more precise item was already present. This query shows the current situation: members of parliament whose specific position item is a subclass of Q486839, but who also have P39 = Q486839. There are currently 958 results, of which 459 are Czech or Slovak. Let's look at two: Q1294312 and Q895898. Both have Q486839 with five or six references, all of them NKC. Do we need this? No. Can we expect the NKC to describe the precise position in the given context the way Wikidata would model it? Again, no. Whether Frettiebot added this unnecessary item or it was already there, it is clear that this position data would shrink by 50 percent if Q486839 were deleted; seen the other way round, the meaningless values increased it by 100 percent. If we project this onto birth dates, places and occupations, we can see how much Wikidata would swell if we accepted that superficial data should also be included. I only examined this for a single position; if you consider how many presidents, finance ministers, museum directors, fire chiefs, etc. are in the database, all of whom could also be labelled simply president, minister, director or commander using a more superficial database as a source, we could essentially "expand" Wikidata indefinitely without adding a single meaningful piece of data. Not to mention: if a fellow editor adds it, we can correct it, but if a bot does, we can't?)
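The query described above and Epìdosis's point 1.2.2 rest on the same check: a value is redundant when another value of the same property on the item is a (transitive) subclass of it via subclass of (P279). A minimal sketch, assuming the class graph is already available locally; the labels and edges below are a toy example and are not copied from Wikidata.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Toy "subclass of" (P279) edges with illustrative labels; a real bot would
# have to fetch the class hierarchy from Wikidata instead.
SUBCLASS_OF: Dict[str, List[str]] = {
    "member of the Chamber of Deputies of the Czech Republic": ["member of parliament"],
    "Lutheran pastor": ["pastor"],
    "hydraulic engineer": ["engineer"],
    "art historian": ["historian"],
}


def superclasses(value: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Every class reachable from `value` through one or more P279 steps."""
    seen: Set[str] = set()
    queue = deque(graph.get(value, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the seen set also guards against cycles
            seen.add(parent)
            queue.extend(graph.get(parent, []))
    return seen


def redundant_values(values: Iterable[str], graph: Dict[str, List[str]]) -> List[str]:
    """Values that are a superclass of another value on the same item."""
    values = list(values)
    covered: Set[str] = set()
    for v in values:
        covered |= superclasses(v, graph)
    return [v for v in values if v in covered]
```

The same function covers both the P39 case (the generic "member of parliament" next to a country-specific position) and the P106 cases ("historian" next to "art historian").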
Of course, I understand the part of the argument that says that, if a common biographical error needs to be corrected, an excellent method is to record the data, source it, deprecate it, and state the correct data (according to more recent research) precisely and with sources; but I think it is quite clear that this is not what is happening here.
I still maintain that bot editing should stop at the point where the bot uploads a piece of data; it should then be left to community members (people) to decide whether the data is important and necessary, and to act responsibly, without having to fight a bot. Pallor (talk) 15:49, 3 November 2023 (UTC)
Both of the positions expressed throughout this thread are understandable, but I think Pallor's post here is an excellent representation of the general approach to weighing which sourced statements belong in Wikidata. My own opinion on this has been formed by reading help pages over the years and finding their advice well reasoned and appropriately opinionated.
In summary, this thread revolves around three concepts:
  • Imprecise statements (to be deprioritised). There are infinitely many true statements under an open world assumption; as such, imprecise ones are unnecessary where a more precise statement is available. The exception to this rule is when their sourcing makes the imprecision somehow notable (e.g. a birth date known only to the year, thought to be irrecoverable and widely sourced as such, and later discovered precisely in historic records).
  • Incorrect statements (to be deprecated). There are infinitely many incorrect statements under an open world assumption. As such, these are unnecessary where their sourcing is insignificant or not authoritative.
  • Appropriate sourcing. This is the crux of the issue discussed here, because it applies to both of the above. I think Pallor has covered it well in the message I'm responding to, but we probably shouldn't be pulling biographical data from a library database unless there is no better source already present.
As for the question of the bot, I agree that not restoring removed statements seems appropriate. Adding them in the first place may generate some cruft, but that's not a huge deal, which is why removing it should also be respected. SilentSpike (talk) 09:20, 5 November 2023 (UTC)
I’m not sure why everybody seems to be so hung up on the fact that NKČR is a library database (or at least has its origins in this field). Why does this make the database less authoritative? --Emu (talk) 11:37, 5 November 2023 (UTC)
Let's look at a specific example: Walt Whitman is a well-known American poet and essayist, so we can use his data to examine whether NKC data is suitable as a source. This is what the NKC record looks like: jn19990009101
This is what the records in some other biographical databases look like:
It can be seen at a glance that the NKC's data are incomplete and simplified. This is not because the NKC is bad, but because it has enough data for its own purpose: distinguishing the American writer Walt Whitman from, for example, the American actor Walt Whitman. For Wikidata, however, this is not enough, because Wikidata strives for completeness. More is needed here.
But I would also add that it is not a problem that many new items have been created based on the NKC, because each new item opens a door to expanding it, supplementing and correcting incorrect or incomplete data, and removing unnecessary data. The problem is that this data is written back again and again by the bot; you can't get rid of it. It is as if the bot were saying that there is no more accurate data than the NKC data, although we can clearly see that the data is insufficient because it comes from a database that does not provide a complete biography. This makes the concept flawed. Pallor (talk) 12:20, 5 November 2023 (UTC)
  • I had some problems with the bot over the summer but those were fixed. My thoughts on the general principles -
  1. Wikidata has our own data model, and it may not view the world in exactly the same way as other databases. This is fine - we don't need to mirror the exact structure and content of every other database. For example, whether a certain thing goes in occupation (P106) vs position held (P39) was the issue I had problems with. Similarly we may not want to have a generic item for something (like member of parliament (Q486839), mentioned above) when we can have a more specific one. So if Wikidata prefers to use a different property, or something more precise, we should not worry about imports being moved or updated afterwards.
  2. A bot should not be edit-warring with people or with autofix bots. If its edits are being repeatedly undone - especially on a day-by-day basis - it should not keep making them. It might be the autofix bot is wrong - so fix that instead, don't just keep making edits that will get undone.
  3. Considering 1 and 2 above, "Only upload data once" is a good rule of thumb to aim for. Reuploading data should only be done when you are doing it intentionally and you have a reason for doing it.
  4. Deprecating "wrong information" is good, but it shouldn't be done just because we imported it in the wrong way: if it's something like "this value should be in P39 instead of P106", then keeping a deprecated value around is just going to confuse people. It implies the value is incorrect / outdated when it's simply misplaced. Andrew Gray (talk) 23:23, 3 November 2023 (UTC)
Three notes on @Andrew Gray's points: 1) We *are* trying to get the bot to understand the Autofix templates and NOT edit-war with the autofix bots. This is sometimes difficult for us (I think no other bot is trying to do the same thing as we are), and it would be better for everyone to come up with a systematic solution for all bots. Currently it is difficult for bots to load all these autofix commands and keep them updated in our code. 2) We are not asking the community to deprecate our statements in cases where the value was just moved from one property to another based on Autofix rules. 3) However, in basically all other cases, as MisterSynergy pointed out, it is virtually impossible for bots to avoid re-adding a statement unless we stick to our rules and deprecate incorrect sourced statements. Therefore, we are asking the community to respect this rule, so that content-adding bots have a place in Wikidata. Vojtěch Dostál (talk) 09:49, 5 November 2023 (UTC)
Vojtěch Dostál: let's be careful not to read what others have written one-sidedly. The problem is not that the bot sometimes enters incorrect or unnecessary data (although we talked a lot about choosing the right source, didn't we?). Sometimes people mess up data entry; it has happened to me too. That's not the problem, because it can be fixed.
The problem is that the bot uploads incorrect and unnecessary data again and again and again, even though people delete it, which means it CANNOT BE CORRECTED. This should be changed. Pallor (talk) 10:08, 5 November 2023 (UTC)
Actually, as you know, the bot does *not* reinsert the wrong statement if it is not removed but deprecated instead. So it is not true that the wrong data entered by the bot cannot be corrected. Vojtěch Dostál (talk) 10:29, 5 November 2023 (UTC)
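The two re-adding policies debated here can be contrasted in a short sketch. `already_present` models the behaviour Vojtěch describes (a value that exists at any rank, deprecated included, is not added again, whereas a deleted value looks absent); `AddLedger` models the "upload once" alternative from Pallor and Andrew Gray, with a `max_adds` parameter that also covers Epìdosis's "stop after N = 3 identical edits" idea. All names are hypothetical; this is not Frettiebot's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Statement:
    prop: str
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def already_present(statements: List[Statement], prop: str, value: str) -> bool:
    """True if the value exists at any rank, including deprecated."""
    return any(s.prop == prop and s.value == value for s in statements)


@dataclass
class AddLedger:
    """Remembers how often the bot has added each (item, property, value)."""
    max_adds: int = 1  # 1 = "upload once"; 3 = the N = 3 guard
    counts: Dict[Tuple[str, str, str], int] = field(default_factory=dict)

    def should_add(self, item: str, statements: List[Statement],
                   prop: str, value: str) -> bool:
        if already_present(statements, prop, value):
            return False  # present at some rank: nothing to add
        # absent: allowed only while the bot is under its add limit
        return self.counts.get((item, prop, value), 0) < self.max_adds

    def record(self, item: str, prop: str, value: str) -> None:
        key = (item, prop, value)
        self.counts[key] = self.counts.get(key, 0) + 1
```

Without a ledger, a value a human deleted looks absent and is re-added on the next run; with `max_adds=1`, deletion by a human is final. That is the trade-off between the two positions in this thread.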
Then let's start over, because it seems the essence of the discussion didn't get through.
The request is that the bot not upload the same data over and over again. The bot is a machine and cannot decide whether the data is unnecessary or incorrect. Sometimes data are both incorrect and unnecessary. Part of the reason for this is that the bot spreads them based on an inappropriate source.
However, people can decide, and they have the ability to correct it, either by deprecating it or by deleting it. My proposal is to leave this decision to the people. Let the bot upload the data once and let people decide what to do with it.
I see that there is a consensus that certain erroneous data should be preserved and deprecated (at least that's what I communicated). Perhaps there is agreement that certain unnecessary data should simply be deleted. Should we agree that this bot decides, or should we leave it to the people? I prefer the latter. Pallor (talk) 11:25, 5 November 2023 (UTC)
Respectfully, I know what your proposal is. However, if humans remove incorrect statements, Wikidata will be a much more difficult world for bots. I am merely suggesting that we humans agree to not remove unnecessary or incorrect data - and rather set ranks to them, as is the official Wikidata policy. I feel that we both already know what the other wants, and it's now on the community to either go my way or suggest amendments to Help:Ranking Vojtěch Dostál (talk) 17:06, 5 November 2023 (UTC)
  Oppose. Generally it is possible to remove an incorrect statement, because it might have been added by mistake (even with a source). But if some bot keeps re-adding it, it is better to deprecate the statement and prevent bot revert-warring. JAn Dudík (talk) 07:02, 8 November 2023 (UTC)

Summary

This conversation will be archived soon, so I would like to summarize it.

The proposal raised was: "At the same time, we also don't want a bot to UNOVERWRITABLE fill up Wikidata with unnecessary and/or wrong data." In other words, Frettiebot should upload a piece of data only once, and then leave the judgment and fate of that data to the (human) users.

Some of the contributors to the discussion expressed their support or opposition using a template; by that count, opinions are evenly split:

supported by Jonathan Groß, Vicarage, GrandEscogriffe
opposed by Jahl de Vautban, Epidosis, JAn Dudík.

The others did not use a template, but you can reconstruct from their comments whether they supported or opposed it (if I drew the wrong conclusion, please let me know):

opposed by @Emu, Vojtěch Dostál, Frettie:
supported by @Vanbasten 23, SilentSpike, Andrew Gray: and finally myself, Pallor [edit: adding myself - RudolfoMD - I also expressed support]

I judged that @MisterSynergy: suggested a third, intermediate solution.

From the summary, I conclude that several people support the position that the bot should add a given value to Wikidata only once.

I hope this lesson can be used for future data dissemination by other bots. Questions such as choosing the right source or creating a project sheet to record a significant amount of data, for example, were not discussed, but this discussion may provide ammunition for a debate about these later. Pallor (talk) 10:11, 14 November 2023 (UTC)

I honestly don't know what the lesson from this discussion is. Some people agree with our rules at Help:Ranking, some don't, but I don't see a consensus for change. Many prolific bot operators explained why it would not work to remove wrong statements instead of deprecating them. Still, one bot is blocked as a result of this inconclusive discussion. Vojtěch Dostál (talk) 20:22, 14 November 2023 (UTC)
Also, could you please stop making the false claim that Frettiebot added 'unoverwritable' data again and again? I explained numerous times that this is not true, and Frettiebot would not add the data again if they are correctly deprecated. --Vojtěch Dostál (talk) 20:25, 14 November 2023 (UTC)
Unfortunately, from the very first moment, I have felt that the communication runs at a level where you react to whatever you want, but you do not respond to the comments for which you have no adequate answer; in fact, you pretend they were never written. I had felt this before, but I did not want to make this discussion personal. Under these circumstances it is naturally hopeless to reach a consensus, and waiting for one only serves to delay the conclusion of the debate. In this situation, of course, there is no option other than to accept the majority opinion.
If you really don't understand what the seven editors who agreed with my suggestion were trying to achieve, then at this point in the discussion I can only recommend that you re-read the conversation. If you do understand what it's about and just want to dramatize the situation, then please find another partner, because I don't want to take part in this play.
I will propose to remove the restriction of the bot's operation, but with the guarantee that it will only write a piece of data to Wikidata once. Pallor (talk) 22:50, 14 November 2023 (UTC)
I’m still puzzled by this discussion: What exactly seems to be the problem? I think we can all basically agree with the idea that a bot shouldn’t UNOVERWRITABLE fill up Wikidata with unnecessary and/or wrong data. I for one would support such a statement albeit not in the context it was put forward: It has been shown how to handle wrong data. As to “unnecessary“, well, there seems to be some disagreement about what constitutes necessity – but that’s not really a bot issue per se, is it? --Emu (talk) 00:19, 15 November 2023 (UTC)
Was it ever established what fraction of the bot's changes are regarded as unhelpful? Before we re-enable it we need to know how much human time will be spent clearing up after it, so we can assess whether it is of net benefit to WD. Vicarage (talk) 04:43, 15 November 2023 (UTC)
Is unnecessary the same as unhelpful? If so, the core of the problem still doesn’t seem to be the potential various misdeeds of the bot but rather different opinions about necessity and helpfulness … --Emu (talk) 06:27, 15 November 2023 (UTC)
Emu: I want the bot to publish data for an item only once. After that, whatever happens to this data (community members deprecate it, delete it or fix it), the bot would not publish it again. This is clearly a bot operation issue. Pallor (talk) 09:45, 15 November 2023 (UTC)
Of course, I also oppose, btw. Allowing a bot to work only by importing data ONCE is a very dangerous precedent and may threaten Wikidata as an up-to-date (and still current) database. This would de facto set a precedent where any human-edited item can never be overwritten or edited by a bot. And I see that as a huge threat.--Frettie (talk) 10:38, 15 November 2023 (UTC)
I agree. This proposal would mean that each database could only be imported by a bot once. This would eliminate one of the main advantages of bot usage: Updating statements isn’t exactly fun and we need fun to attract and keep human volunteers. Therefore, this cumbersome process should be left to bots if possible. And they can’t do that if they can only touch statements or even items once. --Emu (talk) 11:07, 15 November 2023 (UTC)
Sorry, but there is some fatal misunderstanding here. That's the summary, the discussion continued one stage higher. What you are writing here, you have already partially described above, I think it is unnecessary to describe it in every section. If I did not open a new section for the summary, the discussion would have been archived today.
What has not been answered at all is, for example, the inappropriate choice of sources. The issue of mitigating the redundant data (for example - but not exclusively - what will be the fate of "National Assembly representatives"). Low-quality communication (even now I could point to a section on Frettiebot's discussion board that is unresolved). And of course I could give other examples that could have been discussed in the above section, but did not take place.
For my part, I insist that Frettiebot not get his editing rights back as long as he is in danger of uploading unnecessary data to Wikidata, because that poses a greater threat to our database than the issue of updates, and as the dispute stands it seems that bothers several editors. Pallor (talk) 11:37, 15 November 2023 (UTC)
But those are very different things. Not responding at all is a problem, that is true. Not fixing obvious problems is a problem too. But uploading data you consider to be low quality or unnecessary isn’t a problem per se. The discussion has clearly shown that your views on those concepts aren’t exactly consensus. --Emu (talk) 11:42, 15 November 2023 (UTC)
Volunteers who find their accurate changes overridden by a bot won't stay. At least with a dispute with a person there can be discussion and the time devoted is equal both sides. That's not true with a bot, especially if the owner will not engage. Remember frettiebot continued to run well after the issues were raised. Vicarage (talk) 11:57, 15 November 2023 (UTC)
Please be aware that accurate changes, when done properly, are never overridden by Frettiebot. Vojtěch Dostál (talk) 12:28, 15 November 2023 (UTC)
Of course, I understand that, from the point of view of the storage space, uploading unnecessary data is not a problem, as it will fit. I also understand that it is not a problem from the point of view of some queries either, because whoever is looking for version "A" does not mind that there is also "A1" and "A2" and "A3". However, there is a problem when we perform maintenance and want to clarify the data entered with the A3 version and correct it to "A" or/and "B" or/and "C".
It should also be seen that the description of Wikidata data states that the data should serve to better understand the given thing. If I write "pastor" next to the occupation of an evangelical pastor, do we understand better? I do not think so. If I write "member of parliament" next to the position of a member of the Czech parliament, will it be more understandable? No. This is a method that goes against the basic principles of Wikidata. Pallor (talk) 12:37, 15 November 2023 (UTC)
I understand that you only want the most precise version of a given information to be in Wikidata. Several people including me have tried to explain why this might seem like a good idea at first glance but at the same time is a flawed concept and indeed in many cases detrimental to the project (the occupation (P106) issue is a little more nuanced, I’ll give you that, but we seem to be beyond nuances at this point). In any case, this wish can’t be grounds for blocking a bot. --Emu (talk) 16:51, 15 November 2023 (UTC)

SIMPLE: Lymantria blocked Frettiebot "Until resolution of issues on Frettiebot's editing". The consensus that issues with the bot's editing require code changes (which were not forthcoming) is what caused the block and is the reason it's still in place. Code changes to address the issues haven't been made. There are no grounds for unblocking the bot. The end.

Folks, if you don't understand what the issues are, then "re-read the conversation". If you still don't understand what it's about, leave it to those who do.

RudolfoMD (talk) 05:49, 19 November 2023 (UTC)

Nope, this is not an accurate summary of this discussion. I am afraid I see no clear consensus for change. @Lymantria, what do you think the bot owner should do in this case to qualify for unblocking? Vojtěch Dostál (talk) 08:56, 23 November 2023 (UTC)
Lymantria: we did not receive substantial answers to some of the questions that arose, so no consensus could be formed. At the same time, Vojtěch also admitted that some of the entered data was unnecessary. In the summary, I quantified that more people opposed the previous operation of the bot than supported it. Of course, the operation of the bot cannot be blocked forever, but the previous operating principle does not have adequate support. These aspects must prevail. Pallor (talk) 09:40, 23 November 2023 (UTC)
@Pallor Yes, there indeed is some opposition to how the bot operates, but the discussion should not be evaluated by the number of 'votes'. Furthermore, that opinion collides with some of our key principles outlined in our written documentation, which I've linked before. To me, the fundamental question arising from this discussion is how we should operate bots when there are clashes on these fundamental principles - should this be further discussed here, or should the bot operator start an RfC on these fundamental topics, or should it be discussed via a Request for a Bot Permission? This is why I tagged @Lymantria who I think is experienced in these matters, but of course, anyone else's opinion is also appreciated. Vojtěch Dostál (talk) 10:00, 23 November 2023 (UTC)
Is there anything concrete (beyond your rather far-reaching ideas about necessity and usefulness) that the bot operator could fix? --Emu (talk) 18:13, 23 November 2023 (UTC)
Emu Yes, the data should be uploaded by the bot only once.
I realize that I am not considered a long-standing editor, because I have only been here for 5 years. But I have never, ever seen a data import that added the same data to an item multiple times. So far, the examples I have seen were bots entering data into an item once; after that, the editors decided whether that data was appropriate and relevant or not. If Frettiebot also works this way, then I will be satisfied. Pallor (talk) 10:32, 24 November 2023 (UTC)
I think it has been sufficiently shown by MisterSynergy that this is not a reasonable thing to ask from a bot operator. --Emu (talk) 11:16, 24 November 2023 (UTC)
I don't think that a bot should judge whether sufficiently sourced data is "unnecessary" or not. I do think, however, that correct ranking can be requested from a bot. A bot should not be asked to deprecate correct data, but it can be asked to give preferred rank to more (in fact the most) precise data, which it can determine by subclass of (P279). Is Frettie capable of changing the bot to take care of this? If data is wrong but sourced, it should be deprecated once a (bot or a) human notices that. I noted that Frettiebot recognises deprecated data and does not change its ranking. Correct but possibly "unnecessary" data I judge as unproblematic if it comes from a source that has been shown to be useful, as is the case here. --Lymantria (talk) 19:52, 23 November 2023 (UTC)
The request to assign a preferred rank if a more precise information is already available seems fair to me. --Emu (talk) 09:17, 24 November 2023 (UTC)
Lymantria I'm sorry that you see it this way, since I wrote at length about the fact that the source used is not optimal, that there are better sources, and I supported this with examples. Much of the data entered is too imprecise, redundant or simply not Wikidata-compatible. With such a decision, we are opening the door to adding, for all parliamentarians, ministers, mayors, ambassadors, etc., the general designation next to their already existing specific position, all this just to deprecate it afterwards: Minister of Foreign Affairs in Belgium (Q1670832) = minister (Q83307) or Lord Mayor of London (Q73341) = mayor (Q30185), etc. We are sure to find a source where these common names are mentioned. And it actually costs nothing to find a generic, unnecessary name for anything and fill Wikidata with it. So I still maintain that this is not a good source for the uploaded data.
If I ask you to write an RfC for this, will you do it? I have not done this before and my English is not strong enough. Pallor (talk) 10:23, 24 November 2023 (UTC)
I fixed the indentation on your comment, Pallor. Also, I think it falls to someone wanting the bot reactivated to make the case/write an RFC, rather than on Lymantria. My summary was accurate. RudolfoMD (talk) 11:39, 24 November 2023 (UTC)
Okay, let’s be more specific: Are you suggesting that the bot shouldn't import certain positions that are unsuitable for occupation (P106) usage because they belong to position held (P39) and are too unspecific? And could you come up with a list of those values? This could be a compromise that is beneficial to Wikidata. --Emu (talk) 14:31, 24 November 2023 (UTC)
The solution has been presented umpteen times. The bot should keep track of what it has added (or use wikidata history) to not override manual deletions. Again, this is not just about P106. RudolfoMD (talk) 21:55, 24 November 2023 (UTC)
I repeat: I think it has been sufficiently shown by MisterSynergy that this is not a reasonable thing to ask from a bot operator. --Emu (talk) 22:03, 24 November 2023 (UTC)
I don't. It wasn't shown. And he did NOT say it was infeasible in this situation. FS! RudolfoMD (talk) 22:24, 24 November 2023 (UTC)
To expand on this: the comment of "09:05, 28 October 2023 (UTC)" doesn't 'show' anything. It makes a claim. And not the one you present.
Clarifying what is the most appropriate solution IS productive, IMO. RudolfoMD (talk) 22:32, 24 November 2023 (UTC)
Also, the bot doesn't have to literally keep track of what it has added or use wikidata history; when re-run, it could only add new data by only extracting new data to add in the first place. RudolfoMD (talk) 23:48, 24 November 2023 (UTC)
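A minimal sketch of the "only extract new data" idea, assuming the bot keeps a copy of its previous source export as a set of (item, property, value) triples. The `Claim` type, the `new_claims` function and the Q-IDs are hypothetical placeholders for illustration, not Frettiebot's actual code or data.

```python
from typing import Set, Tuple

# Hypothetical claim format: (item ID, property ID, value ID).
Claim = Tuple[str, str, str]


def new_claims(previous_export: Set[Claim], current_export: Set[Claim]) -> Set[Claim]:
    """Claims present in the current source export but absent from the previous one.

    Anything the bot already saw in an earlier export is never proposed again,
    so a statement a human removed from Wikidata stays removed unless the
    source itself starts asserting it anew.
    """
    return current_export - previous_export


# Placeholder IDs for illustration only.
previous = {("Q1", "P106", "Q100")}
current = {("Q1", "P106", "Q100"), ("Q1", "P106", "Q200")}
print(sorted(new_claims(previous, current)))  # [('Q1', 'P106', 'Q200')]
```

Because only the source side is compared, this needs no access to Wikidata edit history. Updated source records still show up as new triples and get imported; handling superseded values is outside this sketch.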
I don't mind if we decide to run a bot which up-ranks the most precise occupations, and if this is the only issue standing in the way of unblocking the bot, I am sure Frettie would assist and we could together devise such a bot job. But can you please make this proposal clearer? Because we need to define such a job and we need the help of those who propose it. For example, we might want to up-rank all occupations when no other statement is subclass of that occupation. We probably want to do this for all statements, not just the sourced ones. However, we might want to skip the statements which already have a non-normal rank. And we might also skip all items where no occupation statement is a subclass of another occupation statement. This is already getting quite complicated and it shows why ranking is usually left to human editors... Vojtěch Dostál (talk) 19:58, 24 November 2023 (UTC)
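The up-ranking job described above could be sketched roughly as follows, assuming a locally available map of subclass of (P279) edges and a simplified in-memory statement model. All names and the example class hierarchy are hypothetical, and ignoring deprecated statements when comparing precision is an extra assumption not stated in the comment.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical local copy of "subclass of" (P279) edges: class -> direct superclasses.
SubclassGraph = Dict[str, Set[str]]


@dataclass
class Statement:
    value: str            # occupation (P106) value; labels used for readability
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def superclasses(cls: str, graph: SubclassGraph) -> Set[str]:
    """All classes reachable from `cls` through one or more P279 edges."""
    seen: Set[str] = set()
    stack = list(graph.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def statements_to_prefer(statements: List[Statement], graph: SubclassGraph) -> List[Statement]:
    """Normal-rank statements that no other statement on the item is a subclass of.

    Returns an empty list when no statement is a subclass of another, so such
    items are left untouched. Deprecated statements are ignored when judging
    precision (an assumption, not part of the proposal).
    """
    active = [s for s in statements if s.rank != "deprecated"]
    values = {s.value for s in active}
    # Values that some other active statement on the item is a subclass of.
    more_general: Set[str] = set()
    for s in active:
        more_general |= superclasses(s.value, graph) & (values - {s.value})
    if not more_general:
        return []
    # Skip statements that already have a non-normal rank.
    return [s for s in active if s.rank == "normal" and s.value not in more_general]


# Illustrative edges, not copied from Wikidata.
graph = {"evangelical pastor": {"pastor"}, "pastor": {"cleric"}}
item = [Statement("pastor"), Statement("evangelical pastor")]
print([s.value for s in statements_to_prefer(item, graph)])  # ['evangelical pastor']
```

Under these rules an unrelated occupation on the same item (one that is neither a subclass nor a superclass of the others) would also be up-ranked, which is one of the details that would still need community agreement.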
I'm skeptical Frettie is willing to make such a bot. Evidence is needed. (Also, my read is that there is much opposition to this solution, as there is a lot of support for not adding low-quality position info when there is high-quality position info; the bot should simply be modified to stop adding low-quality position info when there is high-quality position info. Many maintain it is the case that Frettiebot kept adding 'unoverwritable' data again and again because deprecation is not the correct solution; saying it is over and over doesn't make it so. And you've been chastised for pushing this over and over already, e.g. by Jonathan Groß.) There's already a ton of info on what the bot should not add for Frettie to act on, but no interest expressed in doing so that I have seen. RudolfoMD (talk) 21:17, 24 November 2023 (UTC)
Please try to be productive. --Emu (talk) 22:12, 24 November 2023 (UTC)
Please clarify. Clarifying what the situation is and what the most appropriate solution is IS productive, IMO. RudolfoMD (talk) 22:35, 24 November 2023 (UTC)
Frettie hasn't yet replied to Vicarage's comment of 22:05, 21 October 2023 (UTC), far above. There is reason for skepticism. RudolfoMD (talk) 23:52, 24 November 2023 (UTC)
Please bear in mind that we are all volunteers here and nobody is under any obligation to respond in a certain time frame or at all. --Emu (talk) 08:26, 25 November 2023 (UTC)
I find that comment inappropriate. I asked you to clarify and you are avoiding/refusing to do so. On Wikipedia, at least, there is an expectation (PAG) that admins, especially, respond to reasonable questions. Not here. The comment I asked you to clarify was implicitly threatening me with your tools, and tersely/harshly critical, yet you refuse to clarify. I would ask that you strike it if you won't clarify it, or at least drop the matter. A comment below supports that my skepticism about willingness is well-founded. RudolfoMD (talk) 09:44, 26 November 2023 (UTC)
It is useless to fix a bot at a time when there is discussion about possibly banning all bots that actively update data (all except insert-only-once bots). --Frettie (talk) 11:59, 25 November 2023 (UTC)
Surely it's trivial to flag pairs of occupations where one is a subclass of the other, and remove the most generic. Vicarage (talk) 22:50, 24 November 2023 (UTC)
Removal where? --Emu (talk) 23:25, 24 November 2023 (UTC)
From the person. But it equally well applies for all the military museums that are also instances of museum and tourist attraction. Vicarage (talk) 06:34, 25 November 2023 (UTC)
Valid referenced statements should never be deleted Piecesofuk (talk) 08:12, 25 November 2023 (UTC)
Exactly. --Emu (talk) 08:27, 25 November 2023 (UTC)
As other sources so often do not match the WD ontology, they can pollute as well as inform. WD needs to be a consistent, editable, queryable resource, not a rag-bag of other people's facts. Vicarage (talk) 11:28, 25 November 2023 (UTC)
I like this. This is a good direction. Everyone should think about this. Pallor (talk) 01:39, 25 November 2023 (UTC)
This could be part of a solution. It would also need to address other axes: locations? dates? Remove the most generic, yes? (As you mentioned: Paris, year of death...) I'll be pleasantly surprised if it's easier than avoiding overrides. RudolfoMD (talk) 07:20, 25 November 2023 (UTC)
I would not participate on developing a bot which *removes* sourced statements, as opposed to up-ranking. Vojtěch Dostál (talk) 07:12, 25 November 2023 (UTC)
I agree. I also foresee a lot of criticism from other users who don't want everything deleted that three users here wish to delete.--Frettie (talk) 12:01, 25 November 2023 (UTC)
Frettie, I still see this as low level communication. On the one hand, because you know full well that it is not the wish of three users, since I have aggregated how many editors disagree with the editing principle of your bot. On the other hand, because what you read is a suggestion in the direction of compromise. You don't have to accept it, but not to discuss it is to reject the compromise. Please consider this to be the first suggestion in the debate between the two positions that points in the direction of a possible solution. (However, it is possible that RudolfoMD is right, and that it is a more complicated solution than setting the bot to edit once per data, but in a democracy sometimes the more complicated and costly solutions represent the consensus.) Pallor (talk) 12:15, 25 November 2023 (UTC)

Second Summary

To sum up again: The bot is currently blocked "[u]ntil resolution of issues on Frettiebot's editing". When asked what specifically has to change, a few ideas emerged:

  1. Data should only be added once: MisterSynergy’s assessment of the impracticality of this request has not been substantially questioned, at least I haven’t found a rebuttal when reading the whole discussion again.
  2. The bot should keep track of what it has added and not override manual deletions: Do you have the same doubts that apply to your response to request #1, @MisterSynergy?
  3. The bot should set the most precise occupations to preferred rank: There seems to be no real opposition but it seems to be questionable if that’s really the problem.
  4. Certain values should be avoided: The interested parties haven’t come up with a list of those values.
  5. The bot should delete imprecise statements even when sourced.
  6. Low-quality communication should be improved upon.

The main problem with #5 seems to be that this goes against several Wikidata principles. The problem with #6 seems to be that it’s unclear what should change and how change would be measured. --Emu (talk) 14:42, 25 November 2023 (UTC)

What I contributed earlier to this discussion still stands. Bot editing is effectively a stateless operation; a bot does not have sufficient access to its previous edits, or to edits others have made to a given page. While revision histories and contribution lists can be accessed to read revision metadata, it is super difficult to extract useful information from it regarding the actual editorial content of an edit. It is thus reasonable to assume that by default all bots do not know anything about past activity; and that every bot operates based only on the current state of an item page, and the content of an external source (in this case).
In order to change that, a bot operator would somehow need to set up a shadow database regarding previous edits of their bot, but given the wide range of different edits a bot can make, it is unclear how this could work in a reliable way and there is no existing solution one could readily use. If a bot would be required to do this, it would effectively render its operation impossible.
In other words: #1 and #2 would kill this bot, and set a dangerous precedent for future cases. —MisterSynergy (talk) 18:33, 25 November 2023 (UTC)
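To illustrate the statelessness point: a bot that decides only from the current item state cannot tell a deleted statement from one it never added, while a deprecated statement is still present and is therefore skipped. The ledger variant is a minimal form of the "shadow database" mentioned above. The data model and function names are hypothetical, and the sketch ignores persisting the ledger across runs, edits made by others, the many kinds of edits a bot can make, and source updates, which are exactly the reliability questions raised.

```python
from typing import Mapping, Set, Tuple

# Hypothetical representation: a claim is (item ID, property ID, value ID);
# an item's current statements map (property ID, value ID) -> rank.
Claim = Tuple[str, str, str]
ItemState = Mapping[Tuple[str, str], str]


def should_add_stateless(claim: Claim, current: ItemState) -> bool:
    """Decide from the item's current state alone.

    A value present at any rank, including deprecated, counts as present and
    is skipped. A deleted value looks exactly like one never added, so it is
    proposed again.
    """
    _, prop, value = claim
    return (prop, value) not in current


def should_add_with_ledger(claim: Claim, current: ItemState, ledger: Set[Claim]) -> bool:
    """Same check, plus a minimal record of claims the bot has added before."""
    return should_add_stateless(claim, current) and claim not in ledger


claim = ("Q1", "P106", "Q100")  # placeholder IDs
print(should_add_stateless(claim, {("P106", "Q100"): "deprecated"}))  # False
print(should_add_stateless(claim, {}))                                # True
print(should_add_with_ledger(claim, {}, ledger={claim}))             # False
```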
I think #1 is a perfectly reasonable request in this context, which is a bot that got blocked mostly because it kept adding data even when people were removing it. I don't agree that we can't ask for it because it wouldn't work for all bots ever.
Is it impossible for this bot to make a reasonable attempt to not upload the same data? (I don't think anyone has said yes or no on this - only talked about general precedents.) I don't know how its data is generated, but it feels like this should be achievable. No need for edit-history parsing or 100% accuracy, just a reasonable good-faith attempt to avoid pushing the same data into WD over and over again. Most bots & batch uploaders seem to manage it. Andrew Gray (talk) 00:47, 26 November 2023 (UTC)
After rereading the whole thing again, I'm still not convinced that this is a bot problem at all.
1 and 2 at least wouldn't arise if people deprecated validly sourced statements instead of deleting them because they think the source is worthless. I certainly do think that some sources are worthless, but not national library authority files. I have only seen Walt Whitman (Q81438) put forward as an example of why library authority files shouldn't be used, and apart from the dates being at year rather than day level, I don't see anything wrong with the data.
3 might be a good idea in theory, but if it means pushing unsourced values to preferred rank, I don't think it's progress. At a basic level, we also need to be sure of the quality of our subclass modelling.
4 As for dates, following the example I gave earlier, an improvement would be not to import dates when a more precise and sourced value is available, though I'm not sure a bot can tell that one value is more precise than another without a qualifier saying so. However, my main concern with Frettiebot is when it edit-wars with KrBot over autofixed values. This is what led to the situation at László Szalay (Q1294312) and Ferdinand Friedensburg (Q895898) with member of parliament (Q486839). But Vojtěch said previously that keeping track of all autofix templates is difficult, and I see no reason not to believe them. Still, that would be a really good thing.
5 is an absolute no.
6 frankly, I have seen a lot of passive-aggressive comments or outright mistrust of good faith in this thread and I think it isn't only on the bot operator side to improve their communication. --Jahl de Vautban (talk) 07:01, 26 November 2023 (UTC)
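A minimal sketch of the date suggestion in point 4, assuming the bot can read the precision of existing time values (Wikidata time values carry a precision code, with 9, 10 and 11 meaning year, month and day). `TimeValue`, `ExistingDate` and `should_import_date` are hypothetical names; calendar models and BCE dates are ignored.

```python
from dataclasses import dataclass
from typing import Iterable

# Wikidata time precision codes for year, month and day.
YEAR, MONTH, DAY = 9, 10, 11


@dataclass(frozen=True)
class TimeValue:
    year: int
    month: int = 0  # 0 when not known
    day: int = 0
    precision: int = YEAR


@dataclass(frozen=True)
class ExistingDate:
    time: TimeValue
    sourced: bool
    rank: str = "normal"


def agrees_down_to(a: TimeValue, b: TimeValue, precision: int) -> bool:
    """True if a and b denote the same period at the given (coarser) precision."""
    if a.year != b.year:
        return False
    if precision >= MONTH and a.month != b.month:
        return False
    if precision >= DAY and a.day != b.day:
        return False
    return True


def should_import_date(incoming: TimeValue, existing: Iterable[ExistingDate]) -> bool:
    """Skip the import if a sourced, non-deprecated value at the same or finer
    precision already agrees with the incoming date."""
    for e in existing:
        if e.rank == "deprecated" or not e.sourced:
            continue
        if e.time.precision >= incoming.precision and agrees_down_to(
            e.time, incoming, incoming.precision
        ):
            return False
    return True


on_item = [ExistingDate(TimeValue(1819, 5, 31, DAY), sourced=True)]
print(should_import_date(TimeValue(1819), on_item))  # False: day-level value already there
print(should_import_date(TimeValue(1820), on_item))  # True: conflicting year, left for humans
```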


Correct. I agree w/ Andrew Gray. As I have proposed earlier, the bot doesn't even have to literally keep track of what it has added (though that is certainly do-able) or use wikidata history; when re-run, it could only add new data by only extracting new data to add in the first place. This could work in a reliable way too. #1 does NOT effectively render its operation impossible. Bots are not necessarily stateless. (Felt the need to reply as MisterSynergy had said something new - responded to explain what he saw as the hurdles. I still sense glacial progress.)
5 is a straw man. RudolfoMD (talk) 23:28, 26 November 2023 (UTC)
A few notes on what @Andrew Gray and @Jahl de Vautban have written (thank you both for civil comments, appreciated). This discussion may create an impression that this bot's edits have a high revert rate. I don't think this is true - 99.99% of edits are OK. The largest share of the edit-warring is the aforementioned Autofix template, which develops over time and it is sometimes hard for us to keep pace with it - even though we try to do our best. This issue is relevant for all potential similar bots and I would like us to build a framework that provides easy access to Autofix commands to all bots in real-time. This is something we are actively thinking about. The last concern is the "add-only-once" policy. Adding our data only once is difficult (among other reasons) because the entries improve over time, and we want to make these updates appear in Wikidata. Therefore, a more complex system outlined by @MisterSynergy would be required. It is not impossible but I think that a better solution - more in touch with our existing policies - is to deprecate the (very few) outright-false statements which may appear in National Authority files. Vojtěch Dostál (talk) 07:44, 26 November 2023 (UTC)
@Vojtěch Dostál I agree that updated records can be a problem, but some of the issues here were data being added and re-added five or six times in a single week - it seems likely that was a bot setup issue and not the original source being continually re-checked? Hopefully that sort of thing should be relatively easy to fix. Again, I don't think we need to aim for 100% perfection or checking the item history or anything, just setting things up so it isn't likely to happen too much.
I don't disagree with "deprecate the outright false statements", but how we define 'false' is still a bit of a question mark - is it just things that are objectively wrong (born in 1787, not 1878) or things that are correct when stated elsewhere in the data model (we use P39:mayor, so deprecate P106:mayor)? I'd not be keen on that last one, since it feels like the only reason we have the statement is to avoid bot problems - it feels like it would be better to find some way of preparing the upload so that things go into the right properties in the first place. Andrew Gray (talk) 00:19, 28 November 2023 (UTC)

How long will we argue over how many angels can dance on the head of a pin? I think this conversation should be closed; at this rate, it'll be months to get anywhere close to resolution. Not worth it; seems more disruptive than productive. Unsubscribing not sufficient, so closing, based on state of discussion. No new or sound old arguments presented.

SIMPLE: Lymantria blocked Frettiebot "Until resolution of issues on Frettiebot's editing". The consensus that issues with the bot's editing require code changes (which were not forthcoming) is what caused the block and is the reason it's still in place. Code changes to address the issues haven't been made, and Frettie has expressed little interest in making any. There are no grounds for unblocking the bot. The end.

4: Misconstrues. There's a lack of consensus on those values.
How long will we argue over how many angels can dance on the head of a pin? I think this conversation should be closed; at this rate, it'll be months to get anywhere close to resolution. Not worth it; seems more disruptive than productive. Unsubscribing. RudolfoMD (talk) 12:17, 26 November 2023 (UTC)
It would be simple if you would use ranks, just as it is the norm in Wikidata and as it was suggested several times in this discussion. IMO the bot should immediately be unblocked without any requirements. —MisterSynergy (talk) 12:51, 26 November 2023 (UTC)
I do not disagree with this assessment of the current state of this discussion. --Emu (talk) 20:11, 26 November 2023 (UTC)
Discussion closed. Documentation at Template:Closed is wrong or missing; template import is incomplete; 'result' argument is being ignored. Unsubscribing not sufficient, so closing, based on state of discussion. No new or sound old arguments presented. --RudolfoMD (talk) 23:11, 26 November 2023 (UTC) .
What I am missing is a notification by Frettie of willingness to implement some ranking, which IMHO there is consensus on. --Lymantria (talk) 08:06, 27 November 2023 (UTC)
Hi, @Lymantria, thanks a lot for your response, we will try to implement some realistic ranking process. It is not yet clear what the process should look like. We would be happy to have the community help us with this. --Frettie (talk) 08:32, 27 November 2023 (UTC)
I have given this reaction some time. It seems to me that it is time to "release" Frettiebot. --Lymantria (talk) 19:46, 4 December 2023 (UTC)
What was the resolution of the issues that led to the block? There's been no change to Frettiebot noticed here. So no resolution of the issues that led to the block, or consensus that the block was improper. ISTM, reversing the block would be wheel warring. RudolfoMD (talk) 02:02, 5 December 2023 (UTC)
There seem to be no issues left that are still relevant after this discussion, at least not in a way that would make blocking necessary. --Emu (talk) 08:00, 5 December 2023 (UTC)
I judged the same as Emu. As blocking admin I don't see how wheel warring comes into play when I am the unblocking admin as well. --Lymantria (talk) 08:09, 5 December 2023 (UTC)
@Lymantria: this is a strange and surprising decision. Nothing has been fixed, and Frettie has admitted that he is not going to fix anything now because he does not know what to do. Frettiebot is back doing problematic edits like these redundant places of birth/death. The practical consequence of this edit was degrading the infoboxes of several language editions of Wikipedia with duplicate data.
The decision process should have been (and could still be): stop the bot -> reach consensus on what the bot should generally aim for -> reach consensus on what exactly the bot should do -> implement it in the bot's code -> restart the bot. We are barely at step 2. GrandEscogriffe (talk) 22:09, 5 December 2023 (UTC)
Please try not to delete statements just because you consider them to be redundant. --Emu (talk) 23:27, 5 December 2023 (UTC)
Adding sourced statements, perhaps redundant, is not problematic for the aim of Wikidata. They should never be removed. The correct way to deal with them is to use ranking, if necessary. Frettie has expressed his willingness to adjust the bot accordingly and asked for help to practically do so. I suppose, GrandEscogriffe, you have offered such help to deal with cases like this one. --Lymantria (talk) 07:38, 6 December 2023 (UTC)
I and several others disagree with your position on redundant sourced statements. Even if I had the time and ability (currently I have neither, if this means bot programming expertise) I would not help you implement something that I disagree with and which is apparently not consensual. I think at this point the way forward is a formal vote (or request for comments) on the divisive questions.
In any case I am not going to have much time for Wikidata, but I had to at least support Pallor's and Rudolfo's point. GrandEscogriffe (talk) 17:49, 6 December 2023 (UTC)
Indeed. (But my mistake on the wheel warring!) RudolfoMD (talk) 01:50, 6 December 2023 (UTC)

I really regret this decision. I think it is clear from the summary of the discussion that the majority of participating users do not agree with the operation of the bot. It is also clear that no change has been achieved in this - even by offering compromise solutions. I do not consider this to be a democratic solution, I would like to indicate that I only acknowledge the decision out of necessity. Pallor (talk) 08:51, 5 December 2023 (UTC)

I challenge all interested to come up with a realistic proposal for how the profession-ranking job should be set up. I outlined some basic concepts and challenges above on 19:58, 24 November 2023, and we really are serious in the promise that we would assist with it, but a more systematic discussion will be required to turn it into reality. Possibly start a proposal at Wikidata:Bot_requests and tag me and Frettie there... Vojtěch Dostál (talk) 10:34, 5 December 2023 (UTC)

Vojtěch Dostál It's a sympathetic gesture, but I don't operate a bot, so I can't ask the relevant questions. It would be helpful if you opened that discussion and asked the questions that knowledgeable bot operators could answer. On the other hand, I feel that this is more of a practical problem than a theoretical one. As I understand it, we want the values of existing statements not to be subclasses of the new statement's value (and vice versa). The main question is whether the items are properly filled out: for example, is the pastor item for each church stated to be a subclass of pastor? Pallor (talk) 09:30, 6 December 2023 (UTC)
On a related note, what will happen when the subclass arrangement changes and occupation1 ceases to be a subclass of occupation2? Would the bot then be expected to change the ranking scheme again? How would it know whether the ranking scheme is actually the outcome of a manual edit by a user? The situation then becomes very complex very quickly. This shows how difficult this bot job will be. I can't even see all the possible repercussions and, like you, can't ask all the relevant questions. Vojtěch Dostál (talk) 10:27, 6 December 2023 (UTC)
In edge cases, the bot could just err on the side of doing nothing. GrandEscogriffe (talk) 17:50, 6 December 2023 (UTC)
If the class structure is subject to change, who is to say that the source's meaning of the word matches WD's? And what if the source changes? This whatiffery needs to be balanced against clutter. A bot doesn't care about clutter, and a query doesn't care about clutter (unless it times out, a common occurrence now), but people eyeballing the entries certainly do. The subclass system passing information to the highest relevant node is key to WD's brevity and usability; using external sources with coarser precision undermines that. Vicarage (talk) 18:20, 6 December 2023 (UTC)
With all due respect: basically you (and others) are saying that you have aesthetic objections if there are too many statements. That seems like a job for a userscript or gadget, not a reason for blocking bots. --Emu (talk) 20:13, 6 December 2023 (UTC)
I have started a bot request here. I can't fail to notice that while I am among those who were fine with Frettiebot, I am the one starting a bot request to find a solution to problems others are seeing. --Jahl de Vautban (talk) 20:22, 6 December 2023 (UTC)
I think my vision for WD differs from yours. I think of it as a human-curated set of facts, with bots used to aid their collection and justification, but a single set of hierarchical answers. Some AI system can clearly collect the opinions of worldwide authorities and other AI tools can present them, but that's not what I want WD to be. My vision requires that human assessment is key, so usability for humans is key, and aesthetics are vital for that. Perhaps WD will fork as the AI tools develop, and the automated one will prevail, as Google did over Yahoo. But I think the Yahoo approach has merit. Vicarage (talk) 20:29, 6 December 2023 (UTC)
Again: If human preferences are an issue, you should create a user script that hides any imprecise information that might bother you. --Emu (talk) 21:28, 6 December 2023 (UTC)
Is there a user script that hides all the citation qualifiers? I rarely have any interest in any of that. Or one that hides all the deprecated information which the bot-human interaction is likely to generate. Vicarage (talk) 21:51, 6 December 2023 (UTC)
  • If you want a certain bot policy, take the effort to think through our existing policy documents and make proposals about how to change them. Having a long, drawn-out discussion on the project chat that's going to be read by relatively few people is not a good way to change our bot policy. ChristianKl 02:14, 12 December 2023 (UTC)