Wikidata:Project chat/Archive/2021/08

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Adding a category when it's in use

Hi. I wanted to add Commons Category:Confederação Nacional da Indústria to Q1125075. I do this by clicking Multilingual sites, specifying commons, then linking. When I do it that way a link to the category appears in the sidebar in WP, but in this case the cat is linked to item Category:Confederação Nacional da Indústria (Q27601937). What is the recommended way of dealing with this? Leave as is, or is there a way to do it? TIA Gbawden (talk) 11:20, 30 July 2021 (UTC)

Could I fix it? Please check. Obrigad@. --E4024 (talk) 14:17, 30 July 2021 (UTC)

Looks like it. Thanks @E4024: Gbawden (talk) 14:21, 1 August 2021 (UTC)

Wikimedia disambiguation page

Are there plans, when we have an entry for a Wikimedia disambiguation page like John McGovern (Q6247812), to list all the entries at Wikidata like they are listed in Wikipedia? It would be great, when looking up "John T. McGovern", if I could see all the people we already have at "John McGovern", or are we worried that we would be overwhelmed by all the "John Smiths"? --RAN (talk) 18:12, 30 July 2021 (UTC)

No. There are no such plans. --Tagishsimon (talk) 18:38, 30 July 2021 (UTC)
So some (including me) do that and others don't. But now I don't have a specific plan as to when to do it. I do it when it suits me. But I also think it's good if the data object also contains the entries from the Wikipedia disambiguation page. --Gymnicus (talk) 18:57, 30 July 2021 (UTC)
From which DAB page, Gymnicus, given that an item for a DAB can be connected to many distinct DAB pages? How exactly do you deal with the issue that the contents of each linked DAB page will differ one from another? --Tagishsimon (talk) 19:04, 30 July 2021 (UTC)
I would like to list such items in Wikidata too. Why not use different from (P1889) or something else? Sometimes different from (P1889) is used, for example in Masterpiece (Q933985), but if it says criterion used (P1013) title refers to multiple creative works (Q55761780), what is the sense of that? There is also the problem that one song may have, for example, 15 different releases of its single, so with just 15 same-titled songs you can have 150 releases of their singles, plus music videos and lyric videos with the same title. Eurohunter (talk) 19:51, 30 July 2021 (UTC)
different from (P1889) is used to indicate that this item differs from that item. It is not much use as a solution to the question of how a DAB item deals with the disparate content of language wiki DAB pages. In other news, WD already lists this stuff; just, sensibly, in a series of discrete items, rather than all in a single item. There is a school of thought that wishing to replicate the contents of a language wikipedia list/DAB in a WD item is to mistake, completely, pretty much everything that WD is about. --Tagishsimon (talk) 20:07, 30 July 2021 (UTC)
Wikidata does need better and more easily accessible tools to allow humans (not robots) to more quickly locate ambiguously named items, but in my opinion listing them all on disambiguation items is not the way to do it. We have multiple similarly named disambiguation items already (e.g. Tom Smith (Q529269), Thomas Smith (Q423054), Thomas A. Smith (Q18230777), Thomas B. Smith (Q18381926), Thomas R. Smith (Q18230781), Thomas S. Smith (Q18230783)...), which could have a lot of overlapping listings. And since any individual items (there could easily be hundreds) listed on a disambiguation page won't have descriptions or any other distinguishing features displayed, except perhaps unique middle names, disambiguating items isn't necessarily any easier, and more clutter is created. The better solution I think is some sort of advanced in-house search option for people who don't know SPARQL: e.g. the ability to limit "Tom Smith" results to humans born after 19XX, whose occupation is X or subclass of X. Wikidata Query Service comes close, but could be revamped to be much more user friendly and intuitive, with more lookup boxes and options in the Query Helper. -Animalparty (talk) 20:09, 30 July 2021 (UTC)
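By way of illustration of the kind of advanced lookup suggested above, a minimal WDQS sketch; the English label "Tom Smith", the birth-year cutoff and politician (Q82955) as the occupation are all arbitrary example choices, not part of the original suggestion:

SELECT ?person ?personLabel ?born WHERE {
  ?person wdt:P31 wd:Q5 ;
          rdfs:label "Tom Smith"@en ;
          wdt:P569 ?born ;
          wdt:P106/wdt:P279* wd:Q82955 .   # occupation is politician or a subclass of it
  FILTER( YEAR(?born) > 1900 )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}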
We can create dedicated pages such as Wikidata:James Baker, but it is difficult to maintain thousands of such pages.--GZWDer (talk) 08:17, 31 July 2021 (UTC)
  • I think the structure would be to have the entry "John Smith" use has_part "John A. Smith" and "John B. Smith" as well as the entries for people only known as "John Smith". "John A. Smith" would use has_part "John Aloysius Smith" and "John Adolphus Smith". "John Aloysius Smith" would use has_part "John Aloysius Smith I" and "John Aloysius Smith II". It would have to be automated. --RAN (talk) 17:27, 31 July 2021 (UTC)
Perhaps rather than thinking about embarking on the building of this folly, you might reflect that WD already provides means by which lists of people can be assembled by reference to their names. To return to your original question "Are there plans when we have an entry for a Wikimedia disambiguation page like John McGovern (Q6247812) to list all the entries at Wikidata like they are listed in Wikipedia?", neither you nor Gymnicus has seen fit to address the issue that the contents of similarly named language DAB pages differ one from another, presumably complicating or frustrating any attempt to replicate their contents in a single item. Not content with skipping away from that flawed premise, you're now contemplating a hierarchical array of people name pages seemingly detached from any need or purpose. It's entertaining enough, but please start making it make sense. --Tagishsimon (talk) 17:51, 31 July 2021 (UTC)

Where are the 4 fallback languages for the "In more languages" section defined for unlogged users?

I understand that for logged in users, the languages displayed depend on Babel preferences, but what determines the language selection for IP users? Iketsi (talk) 04:28, 31 July 2021 (UTC)

  • The system looks up your country and then lists the commonly spoken languages according to a table. There were previous discussions but I don't know whether there's a page somewhere that documents it well. ChristianKl 06:59, 1 August 2021 (UTC)

"Mayor of X" (the job title) versus "mayor of X" (the generic office)

Which do we standardize on? We have President of the United States (Q11696) and I think all lower offices should be capitalized. We currently have a mix of both forms, with editors switching between the two to their preferred version. If we standardize, it will save the time and effort of switching back and forth. --RAN (talk) 04:58, 31 July 2021 (UTC)

You presuppose that standardization is desirable, and/or that there is any consensus as to what that standard might be. Taking office/function versus position, versus custom in the many countries that have mayors, it's possible to imagine that your one-size-fits-all approach is wholly erroneous. --Tagishsimon (talk) 00:27, 2 August 2021 (UTC)

ejector (Q784245) - ejector vs ejection port

Today, I found the item for ejector (Q784245) and merged the Polish equivalent (Q9380101) into it. However, I then realized that the German label says "ejection port". I checked the history and found that it was originally a German item about an ejection port. Later, someone merged the Norwegian item about an ejection port (Q19394045) into it. Then someone changed the subject of this item from an ejection port to an ejector. This is wrong, since QIDs are supposed to be permanent and stable, so third parties can rely on them. This is why we do redirects and such.

My question is: how do I fix all this mess? The end state should be two separate items: one for ejector and one for ejection port. Should I reuse the existing IDs for that? If yes, then which ones for which subject? Or should I create new IDs? Please help. --Tengwar (talk) 21:54, 1 August 2021 (UTC)

referencing specific statements

It took me some time to find out that specific statements have a URI like https://www.wikidata.org/wiki/Q1073242#P8533. How can I use this as a reference? reference URL (P854) rejects it because it contains "wikidata.org", and inferred from (P3452) only accepts items. What property should I use? --SCIdude (talk) 08:18, 1 August 2021 (UTC)

You can even link to specific statements by using the statement id, as in https://www.wikidata.org/wiki/Q1073242#Q1073242$687de105-40fc-29a9-4366-0bbadb353e39.
However, there is no property which directly supports this sort of URL. I think something similar to inferred from (P3452) would be desirable, to keep these "internal" references separate from external URLs. —MisterSynergy (talk) 08:33, 1 August 2021 (UTC)
How to propose a property if there is not even a Wikibase datatype (Q19798645) for statements or claims? --SCIdude (talk) 08:51, 1 August 2021 (UTC)
See https://phabricator.wikimedia.org/T285157 --SCIdude (talk) 08:57, 1 August 2021 (UTC)
I don't think we want such properties. What if someone deletes the statement/claim and remakes it? BrokenSegue (talk) 19:05, 1 August 2021 (UTC)
The same happens with URLs, the solution there is the retrieved date. --SCIdude (talk) 14:09, 2 August 2021 (UTC)
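For reference, the statement nodes mentioned above can be retrieved with WDQS; a minimal sketch using the item and property from the original question (note that the statement URI uses "-" where the page anchor uses "$"):

SELECT ?statement WHERE {
  wd:Q1073242 p:P8533 ?statement .   # one statement node per P8533 claim on Q1073242
}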

Wikidata weekly summary #480

New WD:DEV

Hello, Would you like {{Discussion navigation}} to return to the new WD:RATP page? To give your opinion, let's meet there. —Eihel (talk) 00:23, 3 August 2021 (UTC)


Call for Candidates for the Movement Charter Drafting Committee

Movement Strategy announces the Call for Candidates for the Movement Charter Drafting Committee. The Call opens August 2, 2021 and closes September 1, 2021.

The Committee is expected to represent diversity in the Movement. Diversity includes gender, language, geography, and experience. This comprises participation in projects, affiliates, and the Wikimedia Foundation.

English fluency is not required to become a member. If needed, translation and interpretation support is provided. Members will receive an allowance to offset participation costs. It is US$100 every two months.

We are looking for people who have some of the following skills:

  • Know how to write collaboratively. (demonstrated experience is a plus)
  • Are ready to find compromises.
  • Focus on inclusion and diversity.
  • Have knowledge of community consultations.
  • Have intercultural communication experience.
  • Have governance or organization experience in non-profits or communities.
  • Have experience negotiating with different parties.

The Committee is expected to start with 15 people. If there are 20 or more candidates, a mixed election and selection process will happen. If there are 19 or fewer candidates, then the process of selection without election takes place.

Will you help move Wikimedia forward in this important role? Submit your candidacy here. Please contact strategy2030 wikimedia.org with questions. --YKo (WMF) (talk) 03:57, 3 August 2021 (UTC)

Delay of the 2021 Board of Trustees election

We are reaching out to you today regarding the 2021 Wikimedia Foundation Board of Trustees election. This election was due to open on August 4th. Due to some technical issues with SecurePoll, the election must be delayed by two weeks. This means we plan to launch the election on August 18th, which is the day after Wikimania concludes.

For information on the technical issues, you can see the Phabricator ticket.

We are truly sorry for this delay and hope that we will get back on schedule on August 18th. We are in touch with the Elections Committee and the candidates to coordinate next steps. We will update the Board election Talk page and Telegram channel as we know more. --YKo (WMF) (talk) 13:12, 3 August 2021 (UTC)

Importing help

I'm looking to import data on enrollments (students count (P2196)) at higher education institutions from English Wikipedia via the |students= parameter of w:Template:Infobox university. Would this be something Harvest Templates could do, or would there be another tool? I haven't done something like this before and documentation is severely lacking (at least as far as I can find; on Harvest Templates the button is literally grayed out), so I'd appreciate assistance from anyone more familiar with the process. {{u|Sdkb}}talk 18:54, 3 August 2021 (UTC)

I think Harvest Templates may be frustrated by w:Template:Infobox university tending to use w:Template:HESA student population to provide values. You'd probably be as well downloading the source from https://www.hesa.ac.uk/data-and-analysis/students/where-study and using the UKPRN of the HE provider as a key in wikidata. --Tagishsimon (talk) 19:27, 3 August 2021 (UTC)
Given that students count (P2196) has a mandatory qualifier point in time (P585), something like Tagishsimon's suggestion above will be required anyway. You can prepare your import in a table processor such as Microsoft Excel and then import it with QuickStatements. See Help:QuickStatements, that's one of the better help pages available around here. Vojtěch Dostál (talk) 19:35, 3 August 2021 (UTC)
Thanks both. I looked at QuickStatements, but doing this seems beyond my technical ability. I'm not sure how I'd take the UKPRN and match that up with Wikidata items so that I could then add the students count in the row over (same thing with IPEDS data for the U.S., which is more my area of focus). {{u|Sdkb}}talk 21:32, 3 August 2021 (UTC)
A simple WDQS query will give you a lookup between UKPRN and Wikidata QIDs. So a VLOOKUP in a spreadsheet would attach student numbers such that you could craft the fairly simple QuickStatements which add a statement and a qualifier, along the lines of: Q12345 tab P2196 tab "8000" tab P585 tab +2020-01-01T00:00:00Z/9 tab S854 tab "http://somewhere.or.other". For practical purposes, you'd need to extend the query to check that the item does not already have a P2196 for the year you're intending to add. Wikidata:Request a query is the place to get help crafting whatever reports will assist you. --Tagishsimon (talk) 21:38, 3 August 2021 (UTC)
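A minimal sketch of such a lookup, assuming P4971 is the UK Provider Reference Number (UKPRN) property; the optional part surfaces any students count (P2196) statements already present, with their point in time (P585):

SELECT ?item ?itemLabel ?ukprn ?when WHERE {
  ?item wdt:P4971 ?ukprn .                  # assumed UKPRN property; verify before running
  OPTIONAL {
    ?item p:P2196 ?st .
    ?st pq:P585 ?when .                     # existing students count statements and their dates
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}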

URLs statistics for Discogs (Q504063) and MusicBrainz (Q14005)

Hi everyone,

Following some feedback by Azertus (thanks!), I collected statistics on the most frequent Web domains that occur in Discogs (Q504063) and MusicBrainz (Q14005). It looks like some of them may be candidates for identifier property creation, while others stem from a failed match against known properties, mainly due to inconsistencies in URL match pattern (P8966), format as a regular expression (P1793), and formatter URL (P1630) values.

You can have a look at them here: m:Grants:Project/Hjfocs/soweego_2/Timeline#July_2021

It would be great to gather thoughts on the next steps. Two main questions:

  1. should we go for a property proposal for each of the candidates?
  2. what's the best way to fix URL match pattern (P8966), format as a regular expression (P1793), and formatter URL (P1630) values, so that the next time we can convert URLs to proper identifiers?

Thanks! --Hjfocs (talk) 13:35, 29 July 2021 (UTC)

@BrokenSegue: thank you for your feedback, much appreciated! Let me apply your suggestion to an edit example to see if I got it correctly:
  1. add statement Birth Control (Q560732) exact match (P2888) https://www.musik-sammler.de/artist/birth-control/;
  2. add reference node (based on heuristic (P887), artificial intelligence (Q11660)), (stated in (P248), MusicBrainz (Q14005)), (MusicBrainz artist ID (P434), 4d89b968-cda2-471b-87c5-873436a100ae), (retrieved (P813), 30 July 2021).
How does that sound?
With respect to URL regexes, yes, the main problem was failed matches due to the variability of URLs, such as presence/absence of www and the like. I'm not sure it's a good idea to update all relevant URL match pattern (P8966) and format as a regular expression (P1793) regexes, also based on what Lockal has just added below.
Cheers --Hjfocs (talk) 09:39, 30 July 2021 (UTC)
Yeah those proposed edits look good though I'm unsure if this is really artificial intelligence (and not just a join between two datasets). As for updating the match patterns I think this is going to be mostly a manual process unfortunately. We could check for things like optional wwws but we're never going to be able to fully check all cases automatically. BrokenSegue (talk) 13:12, 30 July 2021 (UTC)
  1. "a property proposal for each of the candidates" - should be solved case by case. There are reasons why most of these websites were not added yet.
    • www.youtube.com, user namespace: non-canonical and unstable representation of channel namespace. Therefore Google itself prefers to use channel namespace.
    • itunes.apple.com - seems to be the same as music.apple.com. Consider adding https?://itunes\.apple\.com/(?:[a-z]+/)?artist/[^/]+/id(\d+) as URL match pattern. Another popular domain variant for Apple Music looks like https://geo.music.apple.com/album/id1564530719. It redirects users to the regional version of the service, described here and used, for example, by Last.fm.
    • lyrics.wikia.com - was moved to lyrics.fandom.com and died recently due to financing issues. There was a bot on MusicBrainz to add "end date" for such URLs, iirc. Also you can map all fandom.com URLs to Fandom article ID (P6262).
  2. URL match pattern (P8966) and URL match replacement value (P8967) were created specifically for converting arbitrary URLs to Wikidata format. Please, do not use formatter URL (P1630) to implement non-trivial URL mapping! P1630 is hardcoded in Wikibase configuration and has very limited usage: for example it must have only one preferred value (otherwise it will "break" database dumps or tools like Reasonator). It does not support imdb-like replacements (see Wikidata External ID redirector) by design.
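As a quick way to see what is already defined, a minimal sketch listing URL match pattern (P8966) values and any accompanying URL match replacement value (P8967):

SELECT ?property ?propertyLabel ?pattern ?replacement WHERE {
  ?property wdt:P8966 ?pattern .
  OPTIONAL { ?property wdt:P8967 ?replacement }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?propertyLabel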
@Lockal: I'm totally grateful for your detailed analysis and will include it in the soweego project reports if you don't mind.
A special thanks to the comment on formatter URL (P1630), as I'm using it when URL match pattern (P8966) is not available for a given identifier property: unfortunately, this happens quite frequently, due to the low coverage of URL match pattern (P8966). I'll have a look at URL match replacement value (P8967) and see if I can get rid of formatter URL (P1630).
Following your points on the Web domains, my feeling is that it may be worth engineering regexes just for the most frequent ones, and discarding the others.
With respect to the dead URLs/domains, I have a plan to run validation and resolution on all the URLs I encounter in a given catalog like MusicBrainz: this would be beneficial both for Wikidata (avoiding processing useless URLs) and for target catalogs (notifying their communities about dead URLs).
Cheers --Hjfocs (talk) 10:08, 30 July 2021 (UTC)
Sure! Btw, there is no need to have a high coverage of URL match pattern (P8966).
  • You can derive thousands of regexps by concatenating formatter and format (just don't forget to escape dots, question marks and other special characters in formatter)
Got it, I believe I have a similar mechanism in place, although it's probably less robust than yours, as it doesn't really concatenate regexps: it partitions a given URL using formatter URL (P1630) values, then fires the format as a regular expression (P1793) regexp when available. I'll open an issue with your suggestion.
  • There are approximately 90 properties with "wd" (a slug word) in the formatter but without a URL match pattern at this moment. All of these properties should have a URL match pattern if you want to parse them.
I think I see what you mean. BTW, this SPARQL query should count all the relevant properties we mentioned.
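A minimal sketch of such a count, here simply external-ID properties with a formatter URL (P1630) but no URL match pattern (P8966); it does not reproduce the "wd" slug filter mentioned above:

SELECT (COUNT(DISTINCT ?property) AS ?count) WHERE {
  ?property wikibase:propertyType wikibase:ExternalId ;
            wdt:P1630 ?formatter .
  FILTER NOT EXISTS { ?property wdt:P8966 ?pattern }
}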
The soweego system as a whole includes supervised machine learning, so that's the main reason why I'm adding that piece of the reference node. However, I completely agree this specific task is rule-based: the results are still based on heuristic (P887), but do you have any ideas on a more suitable QID value?
@Lockal: please see my replies inline. Thanks again! --Hjfocs (talk) 13:31, 30 July 2021 (UTC)
If the process that results in these edits uses AI (like a machine learning model or something) then I think it's best to annotate thusly. If it just compares string distances or something then AI wouldn't be right I don't think. It would be hard for me to suggest an item representing the heuristic given I don't know the internals of your system. You could make your own heuristic item subclassing heuristic (Q201413) if you plan to make lots of such edits and want to be really specific about the method. BrokenSegue (talk) 13:39, 30 July 2021 (UTC)
@BrokenSegue: I believe I found a relevant item, which exactly looks like the task the bot is carrying out: record linkage (Q1266546). I'm going to use it instead of artificial intelligence (Q11660). Thanks! --Hjfocs (talk) 10:49, 4 August 2021 (UTC)

[Breaking change] Languages of entity stubs in RDF output

This breaking change is relevant for anyone who consumes Wikidata RDF data through Special:EntityData (rather than the dumps) without using the “dump” flavor.

When an Item references other entities (e.g. the statement P31:Q5), the non-dump RDF output of that Item (i.e. a request without ?flavor=dump) would include the labels and descriptions of the referenced entities (e.g. P31 and Q5) in all languages. That bloats the output drastically and causes performance issues. See Special:EntityData/Q1337.rdf as an example.

We will change this so that for referenced entities, only labels and descriptions in the request language (set e.g. via ?uselang=) and its fallback languages are included in the response. For the main entity being requested, labels, descriptions and aliases are still included in all languages available, of course.

If you don’t actually need this “stub” data of referenced entities at all, and are only interested in data about the main entity being requested, we encourage you to use the “dump” flavor instead (include flavor=dump in the URL parameters). In that case, this change will not affect you at all, since the dump flavor includes no stub data, regardless of language.

This change is currently available for testing at test.wikidata.org. It will be deployed on Wikidata on August 23rd. You are welcome to give us general feedback by leaving a comment in this ticket.

If you have any questions please do not hesitate to ask.

Cheers, -Mohammed Sadat (WMDE) (talk) 13:14, 2 August 2021 (UTC)

@Mohammed Sadat (WMDE): how to retrieve the full RDF for one item after this change? Multichill (talk) 17:00, 2 August 2021 (UTC)
@Multichill Special:EntityData still returns the full RDF of one item, regardless of this change. It will only return less of the RDF of other items (a subset of their labels instead of all of them). Lucas Werkmeister (WMDE) (talk) 08:53, 3 August 2021 (UTC)
@Mohammed Sadat (WMDE), Lucas Werkmeister (WMDE): I thought my question was clear, but apparently not, so let's rephrase: How do I get the exact same RDF output that we get now? So the current item and linked items fully expanded for all languages instead of the stripped down output. Multichill (talk) 18:13, 3 August 2021 (UTC)
@Multichill There is no single URL that will still give you the same RDF output. If you really need all the labels of all linked items in all languages, you’ll have to make more requests and combine the results yourself. Lucas Werkmeister (WMDE) (talk) 08:22, 4 August 2021 (UTC)

Vandalized item

Q7802. --E4024 (talk) 15:44, 4 August 2021 (UTC)

fixed. --Tagishsimon (talk) 16:22, 4 August 2021 (UTC)

Wanted: Uninvolved users

I wonder if James Haworth (Q106715007) requires more participation from users other than those who frequently revert each other's contributions... --E4024 (talk) 19:17, 4 August 2021 (UTC)

That would be welcomed :) Apologies for taking editors time. Sputnik12 (talk) 20:22, 4 August 2021 (UTC)

How to mask "||" (OR filter condition) in SPARQL template?

I couldn't figure out how a SPARQL filter condition like filter(!bound(?x) || ?x=7) can be used in a SPARQL or SPARQL2 template: The display of the saved query aborts at "||", executing the code with "Try it" results in a syntax error (example). Help much appreciated - Jneubert (talk) 05:45, 5 August 2021 (UTC)

Possibly use || ... which, in template syntax, is {{!}}{{!}}. --Tagishsimon (talk) 06:50, 5 August 2021 (UTC)
Thank you - works perfectly! --Jneubert (talk) 07:47, 5 August 2021 (UTC)
Alternatively, use double negation and De Morgan's laws (Q173300):
filter( !!( !bound(?x) || ?x = 7 ) ) -> filter( !( bound( ?x ) && ?x != 7 ) ).
--Matěj Suchánek (talk) 11:12, 5 August 2021 (UTC)
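For completeness, the rewritten filter in a runnable context; number of children (P1971) is used here purely as an illustrative optional value:

SELECT ?human ?x WHERE {
  ?human wdt:P31 wd:Q5 .
  OPTIONAL { ?human wdt:P1971 ?x }        # illustrative optional value
  FILTER( !( bound(?x) && ?x != 7 ) )     # equivalent to: !bound(?x) || ?x = 7
}
LIMIT 10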

Naseer (Q6966675)

The item Naseer (Q6966675) has either been vandalized or wrongly merged. Is it a "male given name" item or a "disam page"? Looking at the page history (with my heart beating high in case I made something stupid about it :) I saw that someone had even added "Occupation: Javelin thrower" in the past... Would someone like to handle this mess? --E4024 (talk) 00:41, 5 August 2021 (UTC)

Looks like you and BrokenSegue have cleared it up. Circeus (talk) 15:22, 6 August 2021 (UTC)
Thanks, BrokenSegue. --E4024 (talk) 15:54, 6 August 2021 (UTC)

New tags for edits done via Wikidata’s user interface

Hello,

As you may know, revisions made on Wikidata are associated with special tags that make it possible to query which interface an edit was made with, such as ios app edit for edits made from the mobile app for iOS, or quickstatements [2.0] for edits made by the QuickStatements tool.

There is currently no way on Wikidata itself to easily find out which edits are done via the Wikidata user interface directly (Wikibase View) or via Wikipedia and co (Wikibase Client), and so we are proposing a "wikidata-ui" tag for "Wikidata User Interface" edits (as well as some other tags, e.g. for "Sitelink Change from Connected Wiki"). This will enable users to filter out all automated edits, for example.

We kindly request for an Admin to create the tags for this.

If you have any questions, please do not hesitate to ask.

Cheers, -Mohammed Sadat (WMDE) (talk) 10:46, 5 August 2021 (UTC)

@Mohammed Sadat (WMDE): I am surprised that these are supposed to be on-wiki managed tags when they are only supposed to be applied by the software. On-wiki managed tags can actually be managed by anyone, i.e. added to irrelevant changes as well as removed from tagged changes (and admins can even disable them, which could break the whole wiki). Why not use the built-in registry to "protect" them from this? --Matěj Suchánek (talk) 07:27, 6 August 2021 (UTC)
@Matěj Suchánek These are good points. The underlying problem is that the user interface accesses Wikidata’s information using the Wikidata API in most situations. As a consequence, there is no easy way to safely distinguish these edits from other edits on the server-side. To solve this issue, we also considered other means of storing that information, which would have meant that the information would become less accessible to Wikidata editors. Using on-wiki managed tags is indeed a compromise, but it seems like the best available solution. -Mohammed Sadat (WMDE) (talk) 09:52, 6 August 2021 (UTC)
"As a consequence, there is no easy way to safely distinguish these edits from other edits on the server-side." Oh, I missed that point. So it indeed seems like the best available solution.
But I think client-automatic-update could be an exception. --Matěj Suchánek (talk) 10:20, 6 August 2021 (UTC)

American World War I casualty

Most American WWI casualties are marked with "event=American World War I casualty". Can someone write a query to find Americans killed in action during WWI that might have been missed? Maybe figure out how to include ones not marked with a "military casualty classification"; maybe find deaths in 1918 for soldiers that do not have a "military casualty classification" yet. Going through the list at Wikipedia, they were marked in several ways, some missing conflict=WWI and some missing a "military casualty classification"; almost all had a death date of 1918. I want to have a full list ready for Memorial Day. --RAN (talk) 23:59, 5 August 2021 (UTC)

Would you class victims of the 1918-1919 flu pandemic as "World War I casualty"? If not, a search for deaths of American soldiers in those years will include a lot of false positives. You can still collect the data but be careful in the interpretation. From Hill To Shore (talk) 04:09, 6 August 2021 (UTC)
If they were in military service at the time of the flu death, that counts as "event=American World War I casualty". Military records give "military casualty classification=non-battle casualty" for those deaths. I think there are 6 military casualty classifications that I can find in the records now online that were used during WWI. --RAN (talk) 17:50, 6 August 2021 (UTC)
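A minimal sketch of the kind of query requested above: US citizens linked to World War I via conflict (P607) who died in 1918 but have no military casualty classification yet. P1347 is assumed here to be the "military casualty classification" property; verify the property ID before running.

SELECT ?person ?personLabel ?death WHERE {
  ?person wdt:P31 wd:Q5 ;                  # human
          wdt:P27 wd:Q30 ;                 # country of citizenship: United States
          wdt:P607 wd:Q361 ;               # conflict: World War I
          wdt:P570 ?death .                # date of death
  FILTER( YEAR(?death) = 1918 )
  FILTER NOT EXISTS { ?person wdt:P1347 ?classification }   # assumed property ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}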

Start time qualifiers that are aligned with an item's inception

I've run into this issue a few times and would like to see if the community can decide on the best way to handle it. Consider an item for a country that started off its existence called Foo and later changed its name to Bar. For official name (P1448), we'd want to have two values, with Bar preferred-ranked with reason for preferred rank (P7452) set to most recent value (Q71533355). For Bar, we'd want to have start time (P580) with the date of the renaming, and probably end time (P582) set to "no value" to indicate that the new name has persisted through the present. For Foo, we'd want to have end time (P582) with the renaming date, but what should we have for start time (P580)? In many places, I've seen it just duplicate the value of inception (P571), which is one way to do it. But another option would be to leave it out, since a property without a start time can be assumed to apply to the item starting with its inception. Going further down that path, the way we signify something is intentionally omitted is through setting it to "no value", so perhaps we should set the start time for Foo to "no value" just like we do for the end time of Bar. This would be more elegant, since it'd prevent us having to record the inception in multiple places, but it'd introduce the possibility of the "no value" being misread as the value having existed forever. What do you all think is best practice here? {{u|Sdkb}}talk 21:18, 3 August 2021 (UTC)

There does not seem to be a great problem in Foo having a start time equal to the inception statement value (presuming that marks the start of Foo). All of the alternatives seem more problematic. Not least, failing to state the start time leaves the reader wondering whether or not it might be the same as the inception date, and WD best avoids this sort of ambiguity, at the small cost of duplicating values - but each time supplying them with a useful context: the start of the thing; the start of this name of the thing. --Tagishsimon (talk) 21:32, 3 August 2021 (UTC)
There is also "some value" ("unknown value") as an option, if you're not sure that was the name when the entity was created, or inception date is unknown. ArthurPSmith (talk) 16:43, 4 August 2021 (UTC)
Please do not set end time (P582) to no value. This is never correct.
--Quesotiotyo (talk) 02:45, 5 August 2021 (UTC)
@Quesotiotyo: Do you have any suggested alternative for indicating that something extends through the present? P582 doesn't have any way to set something to "present" that I know of. {{u|Sdkb}}talk 04:28, 6 August 2021 (UTC)
Why would you use end time (P582) for something that has not ended? Perhaps reading through Help:Evolving_knowledge will provide some clarity.
--Quesotiotyo (talk) 16:55, 6 August 2021 (UTC)
Interesting page, but doesn't look too developed. I think we'd want to have some way of being able to note that a value is current. Perhaps some additional tools are needed. {{u|Sdkb}}talk 17:11, 6 August 2021 (UTC)
It's both developed and accepted by the WD community. You're flogging a dead horse. There's no practical way of providing a no end date value that assures the reader that a statement is valid now. At best, we could conceive of a value which states that at this timestamp there was no end date, but as you'll appreciate, that timestamp will not be now, but whenever the statement validity was last checked. For practical purposes, the heuristic no statement end date and no item end date (e.g. date of death, date dissolved, abolished or demolished) = statement is valid seems to work about as well as at this timestamp there was no end date. --Tagishsimon (talk) 17:43, 6 August 2021 (UTC)
@Tagishsimon: The page is only 3500 bytes long, has seven edits over its lifespan, and when I just fixed some pretty glaring typos, my edit was the first since 2018. If that qualifies as a well-developed page, we're in serious trouble. I'm interested in discussing this topic to reach a consensus, since I haven't come across any substantial discussion on it before and have heard contradictory views/seen contradictory approaches in the showcase items. It's fine for you to have whatever views on the topic you have, but saying "stop making noise about this, it's already been discussed and settled" without pointing to any actual prior discussion on it or established guidance is not helpful. If such prior discussion does exist and just no one here has found it yet, that would genuinely be great and I'd like to see it. {{u|Sdkb}}talk 23:42, 7 August 2021 (UTC)
You've floated two suggestions here: an idea that statements should not have a start time if the start time matches the inception ... although without explaining why that's a problem in the first place. The suggestion got no support and some knockback. Your suggestion number 2 is that statements should affirmatively specify that they have not ended. You've ducked the argument that such affirmation would always be historic, and thus not much more use than the current assumptions users work under; and you're instead arguing the toss over whether a help page which has been settled since 2018 is in fact settled. You say you want to develop consensus. But consensus is already well evidenced by the millions of statements which are not end dated, and by the absence of any mechanism - or indeed, demand for a mechanism, yours aside - for an affirmative no-end-date solution. One of the excellent things about WD is how taciturn its users are; and in this case I think the users have spoken by declining to join in the conversation. So I say again: this horse is not pining for the fjords, but has in fact shuffled off its mortal coil. --Tagishsimon (talk) 00:42, 8 August 2021 (UTC)
I would just add that "a property without a start time can be assumed to apply to the item starting with its inception" is not true. Wikidata uses the w:en:Open-world_assumption so any information not present is not assumed (though obviously in practice it would be impractical to add a start time to every statement ever, my general rule of thumb is that if the statement applies for the entire item lifetime then there's no need). --SilentSpike (talk) 08:51, 5 August 2021 (UTC)
Ah, good point. {{u|Sdkb}}talk 04:28, 6 August 2021 (UTC)

Flying ace

I want to tag all the WWI flying aces, do you think it should be "award=flying ace" or "significant event=flying ace"? --RAN (talk) 17:23, 5 August 2021 (UTC)

Neither. It's not an award. It's not an event. Perhaps has characteristic (P1552)? --Tagishsimon (talk) 17:38, 5 August 2021 (UTC)
That sounds better; currently there is no single way they are tagged. --RAN (talk) 00:00, 6 August 2021 (UTC)
Occupation seems to be the most commonly used property for flying ace with over a thousand hits, eg Alan Jerrard (Q4706977) and Ernst Udet (Q57179) Piecesofuk (talk) 16:58, 6 August 2021 (UTC)
True - https://w.wiki/3nmC - yet it's well arguable that their occupation is aviator, whilst they have the distinction of being flying aces; & so occupation is a poor property choice. --Tagishsimon (talk) 17:48, 6 August 2021 (UTC)
  • there are several fields containing "flying ace", and many more unmarked, and some only marked in the description; that is why I want to harmonize it so it can be found in a single, simpler search. --RAN (talk) 17:52, 6 August 2021 (UTC)
Occupation aviator with has characteristic (P1552) or subject has role (P2868) as qualifier is an option as well. subject has role (P2868) can be used as main statement as well. The most important thing is that we're consistent. I recently finished a similar project, where I harmonized the use of (super)centenarian. I settled on subject has role (P2868), see https://www.wikidata.org/wiki/Q312082#P2868.
There's a lot of these items that are used as a way of tagging items/people. millionaire (Q1075912) is another. Curiously, both of these can (sometimes) be inferred from other properties (net worth (P2218) and age at death). I don't think it'd be easy to settle on a single way of tagging, since it's dependent on what properties are available in the field of interest, but it's maybe something to think about. Or at the least we could start documenting the various approaches. --Azertus (talk) 15:59, 7 August 2021 (UTC)

Addition and future update of References to WikiTree site for selected statements.

I intend to add references to some statements that have data available on the WikiTree site, for persons that are linked to https://www.WikiTree.com. This is in addition to the identifier WikiTree person ID (P2949) that is used on over 215K person pages. The properties I initially intend to add references to are:

The reference statements I intend to add are:

I also intend to maintain the references as the data on WikiTree changes in the future.

I have several Questions:

1) Should the reference also be added to the Identifier WikiTree person ID (P2949) defining the link?

2) What are the preferred statements in the reference for this case? Are P248, P2949 and P813 the optimal solution? I could add just some of them. I could also add more, like subject named as (P1810). I noticed that reference URL (P854) is often used, but I see no point in adding it, since P2949 already contains the link.

3) How are the references maintained in case of a change on the WikiTree site? Should I only add an additional reference to the new value with the new retrieved date? Or should the old reference be deleted? If it should be deleted, should the old statement also be deleted if there are no other references on it?

4) Is there a statement that I could use to identify the reference as automatically added by me? That way I would know that I can delete it if it is no longer valid.

5) I am doing all my updates using QuickStatements. Is that the optimal/preferred way to do it?

Here is an example of how I would add a reference for the father.

Q96305421	P22	Q96305756	S248	Q1074931	S2949	"Bennet-71"	S813	+2021-09-01T00:00:00Z/11

SPARQL queries: References to WikiTree, Additional statements on references to WikiTree
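A minimal sketch of the kind of reference query linked above, restricted to father (P22) statements and using Q1074931 for WikiTree as in the QuickStatements example:

SELECT ?item ?father WHERE {
  ?item p:P22 ?statement .
  ?statement ps:P22 ?father ;
             prov:wasDerivedFrom ?reference .
  ?reference pr:P248 wd:Q1074931 .         # stated in: WikiTree
}
LIMIT 100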

Thanks for your responses. Lesko987a (talk) 22:03, 1 August 2021 (UTC)

Good plan. 1. No need for reference; Id value links to wikitree (although retrieved (P813) may be useful). 2. IMO it would be beneficial to have reference URL (P854), which makes it easier for users to traverse to the reference source. Schlepping down to P2949 will be non-obvious to many users, a PITA on items with very many statements. Useful to add subject named as (P1810) as a qualifier of P2949 since it can be useful for signalling mistakes, providing new aliases, &c; but P1810 not required on each statement ref. 3. I think you can delete dead refs where you have a new valid ref to add. Otherwise, I think, leave in place. 4. Probably not. 5. QS works well, and your QS string looks good. You could also consider https://github.com/maxlath/wikibase-cli although in this case, probably not any great advantage. --Tagishsimon (talk) 22:34, 1 August 2021 (UTC)
1) I was thinking of adding retrieved (P813), but I do validate the property each week and correct them in case of a change or remove them in case the profile on WikiTree was deleted. So all WikiTree person ID (P2949) are up to date. 2) Adding reference URL (P854) doesn't add much to the reference, since P2949 is automatically translated to the target URL. It just adds another thing to maintain. I don't understand what PITA stands for. However adding subject named as (P1810) on Relatives statements seems like a good point. 5) I think I will have a problem with QS, since it can't delete a single reference. It seems only the whole statement can be deleted. I will have to find something else. wikibase-cli looks good, but I can't use JS apps on my end. Probably I will have to use API directly. Lesko987a (talk) 09:35, 2 August 2021 (UTC)
I made the first parent references for father: http://www.wikidata.org/entity/statement/Q100446957-43c5b448-6090-4fed-a8d2-342b00a47289 and http://www.wikidata.org/entity/statement/Q100447009-8ceb5020-72d3-4560-a450-7bd0ecf63da6 with
Q100447009	P22	Q100446957	S248	Q1074931	S2949	"Sears-62"	S1810	"Capt Paul Sears (20 Feb 1638 - 20 Feb 1707)"	S813	+2021-08-03T00:00:00Z/11
Q100446957	P40	Q100447009	S248	Q1074931	S2949	"Sears-63"	S1810	"Paul Sears II (15 Jun 1669 - certain 17 Feb 1740)"	S813	+2021-08-03T00:00:00Z/11
Would that be OK? Note that the "named as" values contain the name of the relative. I guess that is the way it should be. Lesko987a (talk) 23:29, 2 August 2021 (UTC)
subject named as (P1810) should contain the name of the person (subject) of the item where you have set the statement. If you are wanting to record how the name of an object (relative) is recorded in the source, you will need to use object named as (P1932). From Hill To Shore (talk) 00:27, 4 August 2021 (UTC)
  • Great plan! "Named as" is important for wives, since they may be here in Wikidata under their married name, or maiden name, or a combo of the two with their maiden name as the middle name. --RAN (talk) 23:18, 3 August 2021 (UTC)
  • @Lesko987a: The referencing format should be Help:Sources#Databases, not Help:Sources#Web_page (with P854).
    I don't think it's acceptable to use Wikidata as a temporary mirror, that is to add and remove references and statements as the other database changes. Statements should be deprecated (or marked as preferred). See Help:Ranking about this.
    Also, can you spell out the agreed way of handling changing parent-child relationships somewhere (maybe at Wikidata:WikiProject_Genealogy), and mention it on the relevant property talk pages. I don't think Wikidata is well equipped for multiple hypotheses. --- Jura 06:08, 7 August 2021 (UTC)
    • I figured out it shouldn't be a Web Page (P854). But it is used by people (see the Query).
      It is not meant as a mirror, but removing the wrong data seems like a reasonable thing to do. Having two fathers is always confusing. I guess queries ignore the deprecated rank. It can also be a case of an invalid WikiTree-profile-to-Wikidata-item match. You definitely don't need that as part of the history. Same goes for the dates. There are cases of typos, conflicting sources, ...
      I wasn't aware of WikiProject_Genealogy. It certainly makes sense to move this discussion there. I will do it with some changes, that I already got resolved. Lesko987a (talk) 15:26, 7 August 2021 (UTC)
      • We don't want ever changing statements on Wikidata. Once it's added, only the rank should change or additional references added. Maybe you want to wait some time until you import new data from (or references to) Wikitree (e.g. 6 months after addition to Wikitree). --- Jura 07:59, 8 August 2021 (UTC)

Master’s student/advisor properties

In some fields, a master’s degree is a terminal degree (for example, architecture or fine art). So for these fields and academics in them, it is important to have properties master’s advisor and master’s student, corresponding to PhD properties doctoral advisor (P184) and doctoral student (P185). Does that make sense? Have I missed existing properties to use for these? Perhaps to limit clutter, they should only be used for such fields? —Michael Z. 18:31, 6 August 2021 (UTC)

You may be right that WD would benefit from master’s advisor and master’s student properties, but I'm not convinced by your assertion that some fields peg out at the Master level - https://www.arct.cam.ac.uk/courses/postgraduate/phd-in-architecture - https://www.southampton.ac.uk/wsa/postgraduate/research_degrees/courses/phd-fine-art.page --Tagishsimon (talk) 18:48, 6 August 2021 (UTC)
Having properties for noting Master's degree advisors would be beneficial. Some universities may publish this metadata online. We also have opponent during disputation (P3323) and thesis committee member (P9161), which are general enough to be used for Master's theses as well. Vojtěch Dostál (talk) 06:15, 8 August 2021 (UTC)

Double entry, single Dutch something

Q7497525 ("Shinichiro Kobayashi"), which until today propagated the fiction that Kobayashi was born in 1939, is about the same person that Q17222216 ("Shin'ichirō Kobayashi") is about. Some joker dreamt up 1939 for this unreferenced edit, which I fixed today (14 months later). (The man's name is normally pronounced in Japanese Kobayashi Shin'ichirō. Kobayashi is the surname; the order is normally reversed for anglophone consumption. Macrons and apostrophes are often dropped.) Really, it's the same one person, born in 1956. I fixed the fictional YoB in Q7497525 and then attempted to use Special:MergeItems, but was told (in red!) "Failed to merge Items, please resolve any conflicts first. / Error: Conflicting descriptions for language nl." Maybe I'm sleepy, but I don't notice anything Dutch here. Over to you! -- Hoary (talk) 00:32, 9 August 2021 (UTC)

Done? --E4024 (talk) 00:47, 9 August 2021 (UTC)
Thank you, E4024. I was going to merge them in the opposite direction, but I don't suppose that the direction matters. I have to say that I'm most puzzled by all this use of "imported from Wikimedia project: [language] Wikipedia"; I'd been intending to add this (PDF of a leaflet put out by a museum) and similar as references wherever possible: but hadn't yet figured out how to do so: WD seemed to demand an item, and I was deterred from creating one by the warning that an item had to be "notable", and surely a single sheet of A4 was not notable. -- Hoary (talk) 02:07, 9 August 2021 (UTC)

Unique constraint violation with deleted object

Hello, the newly created object d:Q107986623 shows a unique constraint violation for GND and VIAF. According to the error message, GND and VIAF are also included in object d:Q107166925, which recently has been deleted. Can the two objects be merged? How can a deleted object have a VIAF and GND? --M2k~dewiki (talk) 21:19, 7 August 2021 (UTC)

Most likely lag in the system somewhere. The item was deleted earlier today. Grafana is showing only 30 mins WDQS lag right now, but WDQS thinks the item still exists - https://w.wiki/3oVe . The constraint warning arises from a WDQS query, afaik. So maybe the processing of deletions is delayed. Come back tomorrow, maybe. --Tagishsimon (talk) 21:34, 7 August 2021 (UTC)
Thanks a lot! Now the error message disappeared. The deleted object ID is still used in the VIAF record (see history of the VIAF entry): https://viaf.org/viaf/80232010/ --M2k~dewiki (talk) 21:43, 7 August 2021 (UTC)
Mmm. That's unfortunate. Don't quite know the background, but wonder if it's appropriate / possible to redirect from Q107166925 to Q107986623 - e.g. by undeleting and then merging? @Mahir256: based on https://www.wikidata.org/wiki/Special:Log?page=Q107166925 --Tagishsimon (talk) 21:50, 7 August 2021 (UTC)
I'd rather the lag take care of things rather than cave in to the contributions of a globally banned user. Mahir256 (talk) 22:00, 7 August 2021 (UTC)
We'll have to trust that VIAF is on the ball, then. --Tagishsimon (talk) 22:03, 7 August 2021 (UTC)
We need to find a list of VIAF entries referring to items that no longer exist.--GZWDer (talk) 15:01, 8 August 2021 (UTC)
The unique constraint violations for GND and VIAF for d:Q107986623 and d:Q107166925 are now shown again in object d:Q107986623 (before, the error message disappeared after a while). --M2k~dewiki (talk) 07:35, 9 August 2021 (UTC)
WDQS still thinks the item exists. Given the constraint violation appears and disappears, the probability is that the item exists on some of the WDQS servers, does not exist on others. A trip to phab is indicated. --Tagishsimon (talk) 07:56, 9 August 2021 (UTC)
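A minimal way to check this directly in WDQS (the exact query behind the shortlink above is not reproduced here):

ASK { wd:Q107166925 ?p ?o }   # true while WDQS still holds triples for the deleted item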
Deletions on query server are currently "manual", i.e. WMF launches a script that deletes items (and lexemes) once in a while. There was a report about it at Wikidata:Contact the development team/Query Service and search, but somehow GZWDer made it complicated to find it. --- Jura 08:57, 9 August 2021 (UTC)

Wikidata weekly summary #477

Bot mistake

Some bots (or scripts, templates, whatever it is called) make a strange Turkish language description for names. Please use "kadın adı" for female given name and not "kadın ismidir" which means "It is a female name." (You cannot write a complete sentence w/o capitalization and the full stop.) Same thing for male given name. Please change the Turkish description script to "erkek adı". "erkek ismidir" is a complete sentence, as such unnecessary, and written in a wrong way w/o capitalization and full stop. Both these options use a now forgotten Arabic word "isim" instead of the modern and pure Turkish "ad/adı". BTW surname = soyadı. --E4024 (talk) 01:57, 9 August 2021 (UTC)

@Jura1: Yours: https://www.wikidata.org/w/index.php?title=Q21401534&type=revision&diff=912121647&oldid=912121361 --Tagishsimon (talk) 10:39, 9 August 2021 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── It is not a great issue, but what I defend is better. Nobody among Turkish participants would support an old word like "isim". "Complete sentence" is a common fact; I understand we avoid them. Still less would we write one without capitalization or a period/full stop at the end. (OTOH the script correctly uses soyadı for Turkish and not "soyismi" or "soyisim". Ask a primary school student about these two latter words and probably s/he will not understand.)

BTW at the above thingy I just noticed something like 'İtalya\'nın Komünleri'... In Turkish we do not write ordinary nouns with a capital letter. It should be İtalya'nın komünleri. (I don't know if the slash sign there responds to any technical necessity.) Always ready to help with Turkish wording. Thanks for the input. --E4024 (talk) 15:33, 9 August 2021 (UTC)

@E4024: Looks like the .js is being maintained, in response to requests made on its talk page - MediaWiki talk:Gadget-autoEdit.js. That's probably the best place to take this next so as to improve handling of Turkish. --Tagishsimon (talk) 16:52, 9 August 2021 (UTC)

Welcome, Manuel!

Hey everyone :)

I’m super excited to announce that Manuel Merz has joined the Wikidata team as Analytics Product Manager!

Manuel and I will work closely together in supporting Wikidata’s development and you all. As Wikidata's second product manager, Manuel will especially focus on strengthening our data analytics and long-term maintenance work. This includes the product responsibility for bug reports from the Wikidata community, so you will see him more and more on Phabricator.

Manuel is a long-time Wikimedian: He started as a volunteer Wikipedia editor in 2006, has built up Wikimedia Deutschland's impact orientation since 2012, and in 2019, in his free time, published his Ph.D. thesis on the Wikipedia community. He also runs a small software development company, which adds to his product management experience.

Please leave a note on his talk page at User:Manuel Merz (WMDE) or write to him directly anytime at manuel.merz@wikimedia.de. He will also be at Wikimania, e.g. the Wikidata Pink Pony session, so this would be an excellent way to get to know him!

Manuel, it's great to have you on the team!

Cheers --Lydia Pintscher (WMDE) (talk) 18:45, 9 August 2021 (UTC)

Another "how to" question

How do I relate (link) Category:Songs written by Olcayto Ahmet Tuğsuz (Q32732721) with Category:Songs composed by Olcayto Ahmet Tuğsuz (Q32732722)? (BTW I cannot believe that this guy has created several songs for the Eurovision Song Contest and has no WP articles!..) --E4024 (talk) 15:07, 7 August 2021 (UTC)

New WikiProject: Neighborhood Public Art in Boston

Hi, we've just started a new WikiProject focusing on public art in Boston neighborhoods, starting with Roxbury (Q20138) and South End (Q2304457). It's still early days; we'll be expanding the project page with more details, but would definitely welcome feedback and/or new participants. Thanks! --Ballerlikemahler (talk) 20:40, 10 August 2021 (UTC)

Istanbul Airport (Q3661908)

The Airport was inaugurated on 29.10.2018 but began functioning on 05.04.2019. I could not find out how to introduce this "operational since" date. --E4024 (talk) 15:51, 11 August 2021 (UTC)

service entry (P729) maybe? --Tagishsimon (talk) 16:33, 11 August 2021 (UTC)
Exactly! Tyvm. --E4024 (talk) 16:41, 11 August 2021 (UTC)

How to return administrative / municipal boundary data (e.g. city, township) based on a Wikidata ID?

I'm trying to use the Wikidata Query Service to return the geojson polygon data for administrative boundaries, based on a Wikidata ID.

For example, given the Wikidata ID for Chicago (Q1297), I would like to return the geojson polygon information for it. I'm trying to use this query in the SPARQL Editor:

SELECT ?item ?itemLabel ?geoshape ?geoshapeLabel WHERE {
  VALUES ?item { wd:Q1297 }
  ?item wdt:P3896 ?geoshape.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

But I'm getting "No matching records found."

Is there a problem with my syntax in the query statement? Is what I'm trying to do possible? Any help would be much appreciated

Chicago (Q1297) does not have a geoshape (P3896) statement. As a proof of concept for your report, here it is with Spain (Q29) added as a VALUE.
SELECT ?item ?itemLabel ?geoshape ?geoshapeLabel WHERE {
  VALUES ?item { wd:Q1297 wd:Q29 }
  ?item wdt:P3896 ?geoshape.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
--Tagishsimon (talk) 20:10, 11 August 2021 (UTC)

Yet another "How to" question

Hello! What's the correct way to link to a former official website of a defunct project? The domain name was obtained by a third party and doesn't point at the said website anymore. Myuno (talk) 20:00, 11 August 2021 (UTC)

Presuming it needs doing: either or both of adding an end date qualifier, and setting the rank of the current official website to 'preferred' whilst leaving the former official website statement as 'normal'. --Tagishsimon (talk) 20:12, 11 August 2021 (UTC)

Separate entries for persons and for biographical articles

I posted in Requests for comment, but this may be a better place. I am working on naval biographies, and have found several people who have entries both for the person and for one or more biographical articles, for example John James Onslow (Q5933370) (British Royal Navy officer); Onslow, John James (Q96779042) (entry in the Royal Naval Biography); and Onslow, John James (NBD) (Q24020295) (entry in the Naval Biographical Dictionary). There is no apparent link between these items, which is not helpful. I can see two possible approaches: (1) merge the items into a single person item with the biographies specified by "described by source (P1343)"; or (2) retain the existing items and add a "main subject (P921)" property to each of the biographical items to refer to the person. Is there a policy/preference on this? Thanks Kognos (talk) 20:13, 11 August 2021 (UTC)

The more important of the two is main subject (P921) to point the biog item at the person item. However described by source (P1343) from the person to the biog item is also welcome. A person item is quite useful even if it does not point to all/any biographical sources. A biography item is less useful if it does not identify the subject of the biography. --Tagishsimon (talk) 20:22, 11 August 2021 (UTC)
@Kognos: because the biography item in your case has more information than just an external web link (it has an author and a link to Wikisource), it would not be possible to merge the two items by distilling the biography item into a single statement and adding that statement to the person item. Mahir256 (talk) 20:25, 11 August 2021 (UTC)
Thanks, very helpful Kognos (talk) 20:54, 11 August 2021 (UTC)

Blazegraph is unmaintained and does not scale - is JanusGraph a viable replacement for WDQS?

I'm investigating alternatives to Blazegraph (Q20127748). JanusGraph (Q56856628), based on Apache TinkerPop (Q20942022), is the best alternative I could find; it scales well and is used by giants like Netflix, RedHat, IBM, AWS, HADOOP, Microsoft, and others. It is FLOSS and in active development, BUT it uses the Gremlin (Q5607337) graph traversal language instead of SPARQL...

I found a SPARQL->Gremlin traversal transpiler (stale since 2020) that could help us migrate because it seems to support most of the SELECT queries we currently have in our examples.

I really like the idea of federation, and TinkerPop seems to sort of support that too, see https://github.com/unipop-graph/unipop (stale since 2018 though)

Let's face it. Amazon killed Blazegraph[2]. It's not going to get any bug fixes or new features. It's brittle (ask the WMF operations team or take a look in phabricator) and it is not built for the number of triples now in WD (according to Lydia in the WD telegram channel).

Time to move with the rest of the industry, adopt TinkerPop as the backend framework, and retire Blazegraph as a legacy solution after a transition period?

Maybe WMF/WMDE can encourage someone else to run/develop Blazegraph or any other viable SPARQL 1.1 Update-capable database that scales (there are currently no others with a feature set on par with Blazegraph, to my knowledge). If you ask me, Blazegraph seems like a dead end, and SPARQL 1.1 has not seen mass adoption (many endpoints listed here are now dead :/) and I find it unlikely that it ever will.

After AWS acqui-hired Blazegraph and renamed it to Amazon Neptune, they immediately added support for Gremlin and now mention it first on their promotional pages. This, if anything, is a sign that SPARQL is not doing so well. Note that "An acqui-hire basically is a fancy way to say your company is being bought predominantly for the fabulous team you've assembled and not for the product/service you were (trying) to bring to market.", which again points to the fact that Blazegraph was not ready for market, but the competent team seems to have since made a viable PaaS in-house at AWS.

AWS did the math and decided they could not market a graph PaaS WITHOUT support for Gremlin. Microsoft decided not to support SPARQL in CosmosDB because of lack of demand, but offers both Gremlin and Cassandra APIs, which is another nail in the coffin if you ask me.

Gremlin has an active user group and 2600+ questions on Stack Overflow (SPARQL has 5000+, but that could be because it is a much older language).

I'm certain that moving away from SPARQL/RDF would kill Wikidata; it would become just another data island, even if Gremlin were twice as popular as it is now. Graph weaver (talk) 14:39, 12 August 2021 (UTC)Graph weaver

I say we go with the flow and immediately start building a working prototype of WDQS with JanusGraph as the backend, WDYT?--So9q (talk) 10:40, 26 July 2021 (UTC)

See also this comparison of Dgraph, Neo4j and JanusGraph.
My guess is that whatever we choose, we are going to lose some functionality in the beginning until the new backend is improved according to our needs (we can encourage coding the features we need with a round of grants focused on this). Keeping a legacy Blazegraph-WDQS around can help the transition go more smoothly. I suggest we keep a WDQS where the scholarly articles are filtered out, to get below the threshold that Blazegraph can handle. Those who need to search scholarly articles via SPARQL can then either set up their own Blazegraph-Wikibase or use the query language in the new system (likely Gremlin, because that seems to have the widest adoption at the moment).--So9q (talk) 11:41, 26 July 2021 (UTC)
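To gauge how much the scholarly-article subgraph contributes, a count along these lines could be run; it will very likely time out on the public endpoint at the current graph size, which rather illustrates the problem (Q13442814 is the 'scholarly article' class):

SELECT (COUNT(*) AS ?scholarlyArticleItems) WHERE {
  ?article wdt:P31 wd:Q13442814 .   # instance of: scholarly article
}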
Yes of course. I would love to see WMF/WMDE involve the community more and communicate about this challenge. It is probably too big a task to fit in a max $100,000 grant, so maybe the best way forward is for WMDE to be tasked with this challenge. The problem is that WMDE, from what I know, already has their hands full. Anyone who feels like it can start hacking on this. The problem is only going to get worse. WD is growing a lot and Blazegraph is a bottleneck.
Has anyone here tried loading WD into another graph database like Dgraph or JanusGraph? What are the obstacles? Are the datatypes supported? Do the triples need to be converted? It seems that for Dgraph a schema is needed.--So9q (talk) 21:23, 26 July 2021 (UTC)
@So9q: so I'm a SWE with too much free time at the moment. Investigating and prototyping this is something I'd be interested in. Just the infrastructure costs of this project would take non-trivial money but I'm not sure why you think $100k is not enough (and I'm unsure where that number comes from, do you mean 10k)? I'd also be keen to collaborate with people on this. BrokenSegue (talk) 23:58, 26 July 2021 (UTC)
Thanks for your insight and thoughts on the issue of scaling up WDQS's graph backend. This is one of the top priorities for the WMF Search team this fiscal year (in collaboration with WMDE), as we know Blazegraph is not currently scaling well and is also end of life, as you have pointed out. There are a number of criteria (~50 or so last time graph backends were evaluated) that the Search team will need to consider in a Blazegraph alternative (I notice that you are aware of and active on the phab ticket to find an alternative). This is something we want to work closely with the community on, and to learn from your experience with different alternatives and how they suit your needs (or not), so I appreciate that the conversation around this is already happening here. If you end up building out a prototype on a different graph backend, we'd love to see it!
On our end, we're in the process of analyzing the volume of queries to WDQS and structure of WD for ways to optimize based on current usage patterns: this includes better understanding potential benefits of graph-splitting, as you mentioned with the case of scholarly articles. We are finalizing WDQS user research initiatives right now, and are also planning on speaking more about WDQS scaling challenges at WikidataCon 2021. The Search team has a lot on our plate right now, and we appreciate your patience and continuing participation with us as we work on improving our services. MPham (WMF) (talk) 12:34, 27 July 2021 (UTC)
  • It would be interesting to see a trial installation of JanusGraph on toolforge with a subset of Wikidata items. Maybe one with everything directly linked from instances of film (Q11424). This would help compare the two query languages and experiment with subsets of items. --- Jura 08:02, 29 July 2021 (UTC)
    • I think even such a small subset of items would need substantially more than toolforge's default quota. I wonder if they'd be willing to grant us more to try this out. @MPham (WMF): do you know? would such a trial be useful to you? BrokenSegue (talk) 13:18, 30 July 2021 (UTC)
      • @BrokenSegue: It is a little too early for the Search team to know exactly all the tests we'd need to run on which potential Blazegraph alternatives, as we have other immediate commitments for this quarter. Unfortunately, this means there aren't exact specifications for how we'd want to evaluate each candidate to determine if it will meet our scaling needs. For consideration though, here is the output of the last time graph backend candidates were evaluated, and the criteria that were considered -- please keep in mind though that this document is years old and does not necessarily reflect all the candidates, or the priorities and criteria we currently need for scaling WDQS.
      • @BrokenSegue: It does not have to run on Toolforge. Petscan runs in a VPS in https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS (where Toolforge is also running). If I were to poke at this, I would request a VPS and start from there.--So9q (talk) 12:45, 4 August 2021 (UTC)
I don't anticipate that my team can allocate resources to help with external trials at the moment, but if you provided a proposal/request with more detail, it is something we could officially consider. MPham (WMF) (talk) 15:22, 3 August 2021 (UTC)
@MPham (WMF): Thanks for letting us know about your status.--So9q (talk) 12:45, 4 August 2021 (UTC)
@MPham (WMF): Seems the status hasn't much progressed since the community raised the problem more than three years ago (Wikidata:Contact_the_development_team/Archive/2018/07#Future_of_query_server). Maybe it's time to seek external help on the issue. The approach suggested by BrokenSegue could be one. --- Jura 05:54, 7 August 2021 (UTC)
  • While I see your concerns about Blazegraph, you seem to completely underestimate what happens when you move away from it. It's not just another query language, it's a completely different data model. The point of RDF is that it has universal identifiers that are used by many people out there; that's the beauty of Wikidata in its current form. With SPARQL/RDF we can fetch data from Wikidata that we don't have to maintain in our own endpoints (see the federation sketch at the end of this thread). I'm one of many people/companies/organizations that do that, and you cannot simply take that away and say "here is Gremlin, good luck". I would even argue that the selling point of Wikidata is that it is available as Linked Data in RDF. If that is gone, using the data will be waaaay harder, and for me the point of Wikidata is to maintain data here that is useful for others, so we all share the work and profit from it. When this is no longer the case and it's hard to link it with other RDF-based data, I would no longer see a single motivation to use Wikidata and would move back to DBpedia. I don't have a problem with the data being available in other query languages as well, but none of them will be as useful, as the result is siloed data, not a re-usable data model like RDF. I know this is very hard to understand for some who are not working in this stack, but for the love of god, don't f... this up. For moving forward: I would be interested in discussing potential alternatives. Off the top of my head, I see two ways forward: implement an RDF & SPARQL layer on top of another DB (Postgres was a candidate we talked about at a W3C Graph Workshop in Berlin in 2019), or get more funding/development for one of the existing RDF/SPARQL DBs. I have a pretty good idea of the space as this is my daily business, and the one I would talk to here is Oxigraph. That is currently mainly a one-man show, but it's written in Rust and the guy is really skilled. Given additional resources/money, I could imagine this could be a potential candidate for scaling. TheKtk (talk) 12:44, 9 August 2021 (UTC)
    • I concur; moving away from RDF and SPARQL would make Wikidata a lot less useful. For scaling there are other SPARQL-capable stores, including open-source ones that can scale beyond what Blazegraph could on identical hardware (e.g. Virtuoso Open Source). Dropping RDF means dropping ShEx for quality control in practical terms. I also don't believe JanusGraph actually scales to provide the required performance at the required scale. Adding new properties, and doing so automatically, requires significant custom work that would also be required by tools that map SPARQL to SQL. Developing that would allow the Wikimedia team to reuse existing skills for managing databases and have a much wider selection of implementations than what TinkerPop allows. A solvable problem is migrating the special SERVICE calls implemented for Blazegraph. These are also very hard to migrate to JanusGraph etc. Jerven
    • Update: I got pointed to this issue, apparently the consensus is that SPARQL is not optional: https://phabricator.wikimedia.org/T206560 TheKtk (talk) 14:32, 9 August 2021 (UTC)
    • Just, please don't. RDF and SPARQL are open standards, which is why Wikidata is such a useful source for shared knowledge. Dropping it would be a great loss. Tomasz Pluskiewicz
  • I share the concern about using an abandoned solution such as Blazegraph and I was wondering about how long Wikidata will continue to use it myself. However, I never thought about it in terms of abandoning RDF and SPARQL just because one implementation, even though it is the currently used one, was abandoned. There are many other triplestore and SPARQL implementations that can be used instead. Access to Wikidata in RDF is my primary use case and without it, the data would lose a lot of its usability and with it, motivation to contribute to Wikidata. I think this would be a huge mistake. Jakub Klímek (talk) 06:34, 10 August 2021 (UTC)
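As a concrete illustration of the federation point made in this thread: a third-party SPARQL endpoint can pull data live from Wikidata with a SERVICE clause instead of maintaining a local copy. A minimal sketch, with wd:Q42 as a placeholder item and assuming the third-party store lets you declare the Wikidata prefixes:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?itemLabel WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    VALUES ?item { wd:Q42 }          # placeholder item
    ?item rdfs:label ?itemLabel .
    FILTER(LANG(?itemLabel) = "en")
  }
}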

Nasty suggester script interfering with sitelinks-wikipedia

One can try the following exercise (tested on the Vector skin, both current and legacy, with no suspicious gadgets or user scripts enabled).

  1. Go to #sitelinks-wikipedia on any entity and press edit.
  2. Go to the lower left input field (defined by the code <input class="noime ui-suggester-input" placeholder="wiki"…>) and type ru into it.
  3. Leave the field (by Tab ↹ or pressing a mouse button).

We see our input replaced with qu.

I deem this interference unreasonable, error-prone and frustrating. If there is no way to disable it via Special:Preferences, then at least suggest which piece of your JS makes the bad change. Incnis Mrsi (talk) 15:41, 1 August 2021 (UTC)

You need to select the ru option from the dropdown list of ~4 options presented to you. If you merely tab, you in effect select the first on the list, which happens to be qu. You can argue that this is poor UI, but there it is. --Tagishsimon (talk) 21:09, 1 August 2021 (UTC)
The question (namely, how to identify and disable bad JS) was not addressed by Tagishsimon and stands intact. Dismiss the reply above. Incnis Mrsi (talk) 06:12, 2 August 2021 (UTC)
@Incnis Mrsi: you are blocked on multiple wikis for the way you interact with other users. Please try to be civil and friendly. I'll give an example reply:
@Tagishsimon: thanks for your reply, but I'm still looking for the piece of javascript that is causing this. Do you have any idea what is causing this or how to find it? Multichill (talk) 18:21, 3 August 2021 (UTC)
t/y Multichill. I suspect tab selecting the top of the list of values is by design, not error. Seems useful to me, if a trap for the unwary. Incnis Mrsi would need to convince any of us that the behaviour is unreasonable, error-prone and frustrating, rather than, say, quite handy, especially for lists of length 1. --Tagishsimon (talk) 18:37, 3 August 2021 (UTC)
Partial solution: disable the site ID to interwiki to be able to type ruw (for ruwiki) and similar. Found the source of interference with some help from Google many days ago; contrast it to the human regulars who either contributed nothing or directed the thread to my person. Incnis Mrsi (talk) 16:11, 10 August 2021 (UTC)
You have still yet to establish that there is a problem, rather than a feature you dislike; per the above thread. --Tagishsimon (talk) 16:20, 10 August 2021 (UTC)
Am not obliged to Wikidata, and am not much interested which opinions hold the majority of regulars (or a plurality thereof). I deem jQuery.ui.languagesuggester intrusive and silly, whereas anyone is free to agree or disagree. Incnis Mrsi (talk) 08:37, 11 August 2021 (UTC)
Sure. But you'd need to build consensus that there is a problem in order to get a change made to the UI. As you don't seem to have any skills at all in that department, the probability that the UI will be changed as a result of your intervention is vanishingly small. So this whole thread works quite well as performance art, less well as a serious attempt to improve the WD UI. --Tagishsimon (talk) 08:46, 11 August 2021 (UTC)
The user interface is personal, isn’t it? Do we seriously need to remind that to a user who pops into a technical thread? Now look above: it’s allegedly Incnis_Mrsi who “doesn't seem to have any skills at all”. Incnis Mrsi (talk) 10:32, 12 August 2021 (UTC)
There's a single design of user interface, which we all use. If you wish to change the behaviour of the user interface, it changes for everyone. That's why consensus would be required. This is not difficult. --Tagishsimon (talk) 10:35, 12 August 2021 (UTC)

Wikimania talk - input on important stuff over the last year

Hey folks :)

I have a talk at Wikimania in about a week to talk about cool/interesting/important things that happened around Wikidata over the past year. If you have something that you think is important to mention please let me know.

Cheers Lydia Pintscher (WMDE) (talk) 14:05, 8 August 2021 (UTC)

Hello @Lydia Pintscher (WMDE): here are some of the projects that the German-language communities (Wikidata:WikiProject Germany, Wikidata:WikiProject Austria, Wikidata:WikiProject Switzerland) have been working on:

See Wikidata:WikiProject_COVID-19. --M2k~dewiki (talk) 09:03, 9 August 2021 (UTC)


Thanks so much everyone! I'll not be able to mention all of it but I'll try to cover some. <3 --Lydia Pintscher (WMDE) (talk) 18:39, 9 August 2021 (UTC)

Wikidata Query Service (WDQS) User Survey 2021

In the 2021-2022 fiscal year, the WMF Search team is working on scaling up Wikidata Query Service (WDQS) to handle increasing graph size and queries -- we will be following up shortly with more updates regarding this. In order to design a querying service that works for most use cases, it is important for us to understand the needs of WDQS users.

We are trying to cover the diverse use cases of WDQS. Whether you are an occasional user of the service, running queries regularly, for example for maintenance purposes, or hitting the WDQS daily with your tools, your feedback is welcome on this short survey. If you are interested in providing feedback, please fill out our survey: https://docs.google.com/forms/d/e/1FAIpQLSe1H_OXQFDCiGlp0QRwP6-Z2CGCgm96MWBBmiqsMLu0a6bhLg/viewform?usp=sf_link

Thanks for your participation!

This survey will be conducted via a third-party service, which may subject it to additional terms. For more information on privacy and data-handling, see the survey privacy statement. For technical issues with the survey, please contact mpham@wikimedia.org. MPham (WMF) (talk) 15:44, 12 August 2021 (UTC)

Notability of aircraft

A very simple question: are planes, helicopters etc. (I mean individual ones, not types etc.) notable enough to have their own individual items, named most likely after their registration? For ships the answer is yes, as far as I can see, but what about aircraft? Many have their categories on Commons, but I know that is obviously not enough as a reason at WD. I'm looking forward to reading your opinion. Powerek38 (talk) 17:27, 12 August 2021 (UTC)

They meet WD:N criterion 2; aircraft have long been publicly documented for the plane-spotter community, and authorities such as the UK CAA make their register publicly available. [3]. A rough count of the figures in w:en:List of most-produced aircraft suggests ~1M aircraft have been built. So, on the face of it, in scope and not of a volume that would cause great concern. --Tagishsimon (talk) 19:43, 12 August 2021 (UTC)

Some script issue?

I see that several items, when a German description was added, lack the first letter. Example: Selin Sayek Böke (Q21044822). The German description says "ürkische Wirtschaftswissenschaftlerin". I am correcting this to türkische Wirtschaftswissenschaftlerin, without knowing German, but I assure you this is not the first such case I have noticed. I corrected several others some time ago, although I do not remember the item titles now. Somewhere there must have been an issue that causes this. --E4024 (talk) 20:33, 12 August 2021 (UTC)

The editing was definitely done by the Edoderoobot. But it's new to me that bots write German descriptions. To be honest, I've never seen this before. --Gymnicus (talk) 20:39, 12 August 2021 (UTC)
Some more, fwiw. --Tagishsimon (talk) 21:08, 12 August 2021 (UTC)
Clear mistake; of course “türkischer” (men) and “türkische” (women) are correct. According to this edit there might be a problem with Edoderoobot’s edits taken from w:Vorlage:Personendaten. I fixed 9 women but there are still 75 “ürkischer” men left. --Emu (talk) 21:10, 12 August 2021 (UTC)
  Done per Edit group 7f8c4d478c9 --Emu (talk) 21:21, 12 August 2021 (UTC)
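For the record, the remaining broken descriptions can be found with a query along these lines (restricted to people with Turkish citizenship to keep it fast; a sketch, not necessarily the exact query used above):

SELECT ?item ?description WHERE {
  ?item wdt:P31 wd:Q5 ;                  # human
        wdt:P27 wd:Q43 ;                 # country of citizenship: Turkey
        schema:description ?description .
  FILTER(LANG(?description) = "de")
  FILTER(STRSTARTS(?description, "ürkisch"))
}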

How to mark items of nonsense/unidentifiable subjects? GeoNames problem.

Many items based on ceb+sv Wikipedia articles are linked using GeoNames ID (P1566) to GeoNames (Q830106), a low-quality "geographic" project with many errors, nonsense entries, garbled names, and inaccurate or nonsensical coordinates. In some cases, it is possible to trace which subject the item should apply to, and then the coordinates and name can be corrected. But what about items where the data are so erroneous and ambiguous that they cannot be assigned to any real place or object? Is there any property that can be used to mark an item as invalid or untrusted without suggesting that it be deleted? (See Dračice (Q23821033) as an example – there exist many watercourses called "Červený potok" but none of them at the coordinates stated here.) --ŠJů (talk) 15:19, 11 August 2021 (UTC)

Maybe instance of (P31) possibly invalid entry requiring further references (Q35779580). --Matěj Suchánek (talk) 09:45, 13 August 2021 (UTC)
  • So many typographical errors and duplicates; we worked for months to merge duplicates and flag errors. But, to be fair, all large data sets contain errors. We have lists of them awaiting correction at the source here: Wikidata:WikiProject_Data_Quality. It also has a list of queries we run to identify errors. For instance, VIAF contains errors that are caused by Wikidata: they mirror OUR errors when we do a bad merge by conflating people with a similar name. --RAN (talk) 17:18, 13 August 2021 (UTC)
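A minimal sketch of a query that lists items already flagged with the statement suggested by Matěj Suchánek above, together with their GeoNames IDs, so they can be reviewed later:

SELECT ?item ?geonamesId WHERE {
  ?item wdt:P31 wd:Q35779580 ;   # possibly invalid entry requiring further references
        wdt:P1566 ?geonamesId .  # GeoNames ID
}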

subclass

There is an unsolved conflicting type constraint regarding musical ensemble (see ABBA). It notes that a musical group is a group of humans but is not a human.
human and group of humans are not effectively connected. - Coagulans (talk) 22:07, 12 August 2021 (UTC)

It would be helpful if you could point more precisely to an instance of the issue. The item for Abba does not have a value of musical ensemble; items that do, do not have a type constraint issue that I can see. 'group of humans' is connected to 'human' ... effectiveness, or not, perhaps rests more with the setup of the constraint than the contents of the item? As I cannot see an instance of the issue, and as you have not pointed to the property on which the constraint exists, I think we might be in the dark. --Tagishsimon (talk) 22:23, 12 August 2021 (UTC)
12 IDs with type constraint issues on ABBA page: Munzinger, Theatricalia, RollDaBeats, Victoria and Albert Museum, Juno Download, Moov, NME, The DJ List, IFPI Danmark, SNEP, AccuRadio, Filmdatenbank. "Entities using (...) ID property should be instances of human (or of a subclass of it), but ABBA currently isn't." - Coagulans (talk) 01:36, 13 August 2021 (UTC)
So the cure is to fix the constraint definitions: example. --Tagishsimon (talk) 08:55, 13 August 2021 (UTC)
For all of them? There are a lot of person IDs on WD. It's about the absurdity of actually dealing with "human" and "group of humans" as if they were totally unrelated entities. We should find a way to effectively connect them. - Coagulans (talk) 11:12, 13 August 2021 (UTC)
They /are/ effectively connected. But constraint type definitions look either for instance of or subclass of, and a group of humans is not an instance of or a subclass of a human. It isn't really too much to ask that the constraint type statement for a property carries the values necessary to make it perform correctly; that is preferable to requesting a bodge to the modelling of the connection between group of humans, and human, to avoid setting up the constraint definition properly. As for quantity, see Quickstatements. --Tagishsimon (talk) 11:26, 13 August 2021 (UTC)
They are inconsistently connected. The item for group of humans is a subclass of group of living things but human is not a subclass of living thing (just a metaclass), hence the issue mentioned. - Coagulans (talk) 17:57, 13 August 2021 (UTC)
It may well be the case that there is inconsistency in how both of them are connected to other things, but that has no bearing on the constraint violation issue you raised. There is a lot wrong with many WD class trees. --Tagishsimon (talk) 18:53, 13 August 2021 (UTC)
Indeed, it wouldn't solve the problem. Maybe human or group of humans? - Coagulans (talk) 19:10, 13 August 2021 (UTC)
The search for the w:en:Philosopher's stone continues. --Tagishsimon (talk) 19:17, 13 August 2021 (UTC)
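For anyone wanting to apply the constraint fix discussed above at scale, a sketch of a query listing properties whose type constraint allows human but not group of humans; the constraint-related identifiers used here (Q21503250 'type constraint', P2308 'class', Q16334295 'group of humans') are my assumptions, so double-check them before acting on the results:

SELECT ?property WHERE {
  ?property p:P2302 ?constraint .
  ?constraint ps:P2302 wd:Q21503250 ;   # type constraint on the subject
              pq:P2308 wd:Q5 .          # allowed class: human
  FILTER NOT EXISTS { ?constraint pq:P2308 wd:Q16334295 . }   # but not: group of humans
}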

Using Wikidata as vanity spam?

Added today on Wikidata, with a Commons pic, is what I believe is a blatant piece of vanity spam with obvious spam links... i.e. using Wikidata as a personal promotional web space, see here. Can those who have greater Wikidata experience, and editorial oversight, please look at this? Thanks. Acabashi (talk) 10:18, 13 August 2021 (UTC)

Yes. Very spammy. Have asked for its deletion. --Tagishsimon (talk) 10:27, 13 August 2021 (UTC)
Good catch. Just blatant self-promotion. --Christian140 (talk) 10:47, 13 August 2021 (UTC)

content analysis property?

Hello. For items representing bibliographic citations, is there a property that one could use to link to relevant content analysis? For example, the textual analyser Voyant Tools (Q28405731) produces a summary analysis of words in this article. Is there a corresponding content analysis property? Thanks. -- Oa01 (talk) 14:27, 13 August 2021 (UTC)

Presuming there isn't, one could use described at URL (P973) with a qualifier object has role (P3831) taking a value such as content analysis (Q653137). --Tagishsimon (talk) 14:35, 13 August 2021 (UTC)
@Tagishsimon: Thanks for the suggestion! Tried it out here. -- Oa01 (talk) 14:51, 13 August 2021 (UTC)
is there value in such links? BrokenSegue (talk) 14:39, 13 August 2021 (UTC)
It's a different way of reading. Useful for quickly understanding a work, for example. -- Oa01 (talk) 14:56, 13 August 2021 (UTC)
I'm not questioning the value of content analysis. I'm questioning the value of the links. I assume these are machine generated so every document could have such a link. BrokenSegue (talk) 15:07, 13 August 2021 (UTC)
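For later retrieval, a minimal sketch of a query that lists statements using the pattern suggested above, i.e. described at URL (P973) qualified with object has role (P3831) = content analysis (Q653137):

SELECT ?item ?analysisUrl WHERE {
  ?item p:P973 ?statement .
  ?statement ps:P973 ?analysisUrl ;
             pq:P3831 wd:Q653137 .   # object has role: content analysis
}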

Mistakes in TR:WP

Mistakes in TR:WP cause issues like this. This is not a disam page; it is only about a "mahalle" (small urban settlement). Can someone make it only a "mahalle" (neighbourhood) item in so many languages please? We need a bot. --E4024 (talk) 20:20, 13 August 2021 (UTC)

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. User:E4024

Idea: 2 types of deletion

  • Soft deletion (the item can be "recycled" with any content). E.g. let's say that:
    a deletion discussion on Q9876543210 is closed as soft delete due to no notability. Then an administrator could soft-delete Q9876543210, and when one later finds a link to Q9876543210, one could recreate it with a notable subject and sources.
  • Hard deletion (the item can't be recycled, only undeleted. This is what we have.)

This allows for fewer item names that just exist unused because they got deleted and can't be recreated. Comments on this? Alfa-ketosav (talk) 16:43, 13 August 2021 (UTC)

Recycling item IDs such that at time a they mean one thing, and time b they mean another, is a poor idea, for the reason that we do not know what third party uses have been made of the ID whilst it was associated with its first meaning; and so changing the meaning is liable to make things that point to the ID point to a thing with, in context, a 'wrong' meaning. Besides which, there is no shortage of numbers; no reason to try to conserve and re-use numbers. --Tagishsimon (talk) 17:32, 13 August 2021 (UTC)
Generally, ID stability is a value for databases, and while Wikidata isn't always perfect about upholding it, this proposal means that we would be worse at that metric while getting little in return. ChristianKl 20:32, 13 August 2021 (UTC)
One idea is to redirect all soft-deleted items to one item, though it may break if there are external sites referring to such IDs.--GZWDer (talk) 01:05, 14 August 2021 (UTC)
I just regret this fact about the deletion process: it is hard to find out what the content of the Q was before its deletion, so it's difficult to say whether the deletion was justified and then to contest it afterwards. Bouzinac💬✒️💛 06:40, 14 August 2021 (UTC)

Lachenalia wrightii

Guys, this is an independent species as far as I can establish. Your views? Regards. Oesjaar (talk) 19:05, 15 August 2021 (UTC)

Kew seems to think it's a thing. http://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:537272-1 --Tagishsimon (talk) 20:03, 15 August 2021 (UTC)

Is there a way to use a gadget on the mobile version of wikidata?

I tried using the desktop view but I can't seem to find the top bar with the gadgets..  – The preceding unsigned comment was added by Jimman2003 (talk • contribs) at 15 August 2021, 21:25‎ (UTC).

Probably not. I believe no gadgets are written to work on mobile anyway. --Matěj Suchánek (talk) 16:12, 16 August 2021 (UTC)

Wikidata weekly summary #481

Universal Code of Conduct - Enforcement draft guidelines review

The Universal Code of Conduct Phase 2 drafting committee would like comments about the enforcement draft guidelines for the Universal Code of Conduct (UCoC). This review period is planned to begin 17 August 2021.

Community and staff members collaborated to develop these draft guidelines based on consultations, discussions, and research. These guidelines are not final but you can help move the progress forward. Provide comments about these guidelines by 17 October 2021. The committee will be revising the guidelines based upon community input.

In Wikidata, you can participate through the discussion page for the Wikidata community.

There are planned live discussions about the UCoC enforcement draft guidelines:

Conversation hours - 24 August, 31 August, 7 September @ 03:00 UTC & 14:00 UTC
Roundtable calls - 18 September @ 03:00 UTC & 15:00 UTC

The facilitation team supporting this review period hopes to reach a large number of communities. Having a shared understanding is important. If you do not see a conversation happening in your community, please organize a discussion. Facilitators can assist you in setting up the conversations.

Discussions will be summarized and presented to the drafting committee every two weeks. The summaries will be published here.

Please let me know if you have any questions. --YKo (WMF) (talk) 14:31, 17 August 2021 (UTC)

  Moved from Help talk:Merge

It doesn't seem appropriate to have a separate item for states (current or historical). Could we please find a way to merge these? Courtesy pinging User:Ghouston, User:TomT0m, and User:Infovarius, who have discussed this before here and here. {{u|Sdkb}}talk 20:06, 16 August 2021 (UTC)

When it was discussed previously, several times, I didn't see any point in having separate "former entity" items. Wikidata doesn't need that. However, I don't think there was any consensus. I think the largest discussion (by number of contributors) was Wikidata:Project_chat/Archive/2019/04#Classes_for_defunct_entities, and Project Chat seems like the right place for such discussions. Ghouston (talk) 23:59, 16 August 2021 (UTC)
I think I’m the only advocate of a scheme with 3 classes: « old state », « present-day state » and « present and past state ». It makes sense if both « old state » and « new state » are subclasses of « present and past ».
With only two classes, like in the title of this discussion, it works poorly however, so we would need to add historical country (Q3024240). État (ancien et actuel) (Q96196009) should not have instances.
With that in mind, there are probably not a lot of supporters besides myself; I think it’s safe however to ignore the idea. author  TomT0m / talk page 07:41, 17 August 2021 (UTC)

Request for comment notification

Here is a link to an RFC on Meta concerning all Wikimedia projects. Lionel Scheepmans Contact Désolé pour ma dysorthographie, dyslexie et "dys"traction. 23:23, 16 August 2021 (UTC)

this RFC perhaps applies more to wikidata than other wikis since we hold much of the valuable data. BrokenSegue (talk) 01:38, 18 August 2021 (UTC)
The RFC concerns 'Wikimedia Enterprise'. Not sure why it's taken til the third post to mention that, but there we are. --Tagishsimon (talk) 01:40, 18 August 2021 (UTC)

Wikidata:Query_Builder has been deployed

Hi everyone,

As you may know, the Wikidata development team has been working on a form-based query builder. It will allow people who don’t know SPARQL to build lists of Items. I’m happy to announce that the Wikidata Query Builder has now been deployed: https://query.wikidata.org/querybuilder/

Example queries

One important thing to keep in mind: The Query Builder is not meant to allow you to build queries with the full power of SPARQL. It gives you access to some really important features of SPARQL that are already pretty powerful. We hope that it will allow even more people to get their first experience building queries and then maybe advance to writing SPARQL directly once they have figured out the basics with the Query Builder and become interested.

We started development last year and many of you participated in the testing sessions and gave us valuable feedback along the way. Thanks to you all!

Here are some of the things you can do with the Query builder:

  • Query for any property (supported data types: string, Item, external identifier) with or without a specific value
  • Add or remove several query conditions
  • Include results from subclasses when querying for Item values
  • Have multiple query conditions and connect them via AND as well as OR
  • See the interface of the Query Builder in several languages by adding ?uselang= to the URL (example - Translatewiki page)
  • Show IDs instead of labels in the results (to prevent timeouts)
  • Add a limit to the number of results (to prevent timeouts)
  • Show the query in the Wikidata Query Service

Please try it out and let us know if you encounter any issues.

Cheers, -Mohammed Sadat (WMDE) (talk) 16:29, 17 August 2021 (UTC)
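For comparison, hand-written SPARQL roughly equivalent to a simple Query Builder query (condition: occupation astronaut, including subclasses) might look like the sketch below; this illustrates the idea and is not the Builder's literal output:

SELECT DISTINCT ?item ?itemLabel WHERE {
  ?item wdt:P106/wdt:P279* wd:Q11631 .   # occupation: astronaut, including subclasses
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100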

fr-N

Please, someone tell me where the charismatic French actor Grant Lawrens was born ("Où es-tu né?", i.e. "Where were you born?").  – The preceding unsigned comment was added by Coagulans (talk • contribs) at 16:58, August 11, 2021‎ (UTC).

I don't understand why this edit was rolled back even though it shows a lot of errors:

Potential issues:
allowed qualifiers constraint
lexeme is not a valid qualifier for female form of label – the only valid qualifiers are: 
• reason for deprecation
• reason for preferred rank
• named after
• applies to part

In addition, the page is listed here as faulty: Single value per language.

Could you please have a look at that. --HarryNº2 (talk) 00:43, 18 August 2021 (UTC)

@Infovarius: might have more to say, but ... property constraints are often ill-considered and outdated. I think that might be the case here. It seems to be sensible/useful to relate lexemes to 'form of label' values, and so the problem flips to being the constraint design rather than lexemes as qualifiers. --Tagishsimon (talk) 01:00, 18 August 2021 (UTC)
Single value per language is a case in point. Father and Dad are very commonplace English labels for a male parent. The Single value per language constraint report takes no account of that; but points vaguely to the possibility of 'false positives' and mutters about the need to get around to putting together an exceptions list. All constraint warnings should be examined for sanity before being acted on, since as often as not, they reflect a failure of the imagination of whoever constructed the constraint, or, as noted, the passage of time, such as the introduction of new properties undreamt of at the time the constraint was written. WD is a work in progress. Its constraint warning system doubly so. --Tagishsimon (talk) 01:18, 18 August 2021 (UTC)
Also, if you use templates in headers, you break the navigation from e.g. watchlists & histories, to the sections. Wish people wouldn't do that; it's not clever. --Tagishsimon (talk) 01:27, 18 August 2021 (UTC)
Then change that. HarryNº2 (talk) 01:37, 18 August 2021 (UTC)
Etiquette is, in general, not to edit other people's work on talk pages. So 'then don't do it' is more to the point. --Tagishsimon (talk) 01:44, 18 August 2021 (UTC)
I just wonder how long the list will be if you include every label/name in every language. HarryNº2 (talk) 01:37, 18 August 2021 (UTC)
Who knows. Not as long as a bunch of other problem children, I assure you. The better question is, why would we wish to omit valid data in the pursuit of short lists; that's not what we're here for. --Tagishsimon (talk) 01:44, 18 August 2021 (UTC)
Perhaps another property should be suggested for this case. I think female form of label (P2521)/male form of label (P3321) is not the place for that. HarryNº2 (talk) 03:24, 18 August 2021 (UTC)
Not the place for what? Associating the label value with the lexeme? Why not? The values don't exist on any other property; they're found on female form of label (P2521) and male form of label (P3321). That makes them the natural place to do it. Do it on another property and all you've done is a) duplicated the values and b) moved the supposed issue to the new property/ies. That makes no sense at all.
Or hold a single label per language on female form of label (P2521) and male form of label (P3321) and put other labels on another property? Again, why? What does that achieve? How do you select which is placed on which property? Again, it makes no sense. --Tagishsimon (talk) 03:52, 18 August 2021 (UTC)

Q107614784 should be deleted

Q107614784 conflicts with Q107635470. The former item and the category [4] within it should be deleted. 佛祖西来 (talk) 15:51, 18 August 2021 (UTC)

They've been merged. --Tagishsimon (talk) 15:59, 18 August 2021 (UTC)

EN:WP typing mistake reflects here

Sabire Aydemir (Q6082164) has an EN:WP article with her correct name, but inside the text the surname was written (by mistake) as Aydenir. Now some stupid bot is adding an "alias" to her name as Sabire Aydenir (sic). Can someone working in that WP kindly correct the typing mistake, please? Thanks in advance. --E4024 (talk) 18:58, 18 August 2021 (UTC)

Done. --Tagishsimon (talk) 19:06, 18 August 2021 (UTC)

Comment notification update

Update: The promised update, which will let you "subscribe" to or follow individual discussions on a talk page, is   Done! If you go to Special:Preferences#mw-prefsection-betafeatures and enable "Discussion tools", you will get a [subscribe] button on talk pages like this one. Please try it out, and ping me or leave a note at  mw:Talk:Talk pages project/Notifications if you run into any problems. Whatamidoing (WMF) (talk) 19:01, 18 August 2021 (UTC)

Wikimedia disambiguation page (Q64875536)

I don't understand the need for this item; however, I guess Gospel Church (Q106695881) and a couple of other items should not be linked here. If we remove those, maybe I will be able to understand what purpose it serves. --E4024 (talk) 19:48, 18 August 2021 (UTC)

High level moderation question

I am part of a software engineering research team at University of California, Irvine focusing on educational interventions driven by the public representation of knowledge. We are intrigued by Wikidata's potential as a public source of high-quality, linked data. We are curious, however, how the various contributions are moderated - we'd love to have more insight into whether every new contribution to Wikidata is reviewed by at least one moderator, and if so, how quickly, in general, statements are reviewed. We are asking because we are exploring ways of leveraging Wikidata in classroom settings and would love to understand more about the quality of the data before we present it to the students and, in the end, ask them to work with the data and perhaps even make contributions. Thanks!

Not all edits are ever reviewed. There is no centralized moderation system, there are fewer active people patrolling than on English Wikipedia, and we lack an ML-powered vandalism detection robot. That said, the vast majority of edits are "good", but also remember that the majority of edits are made by bots. BrokenSegue (talk) 17:42, 18 August 2021 (UTC)
There is a layer of WikiProjects that should (!) be contacted before new, high volume data imports. Their active members often use watchlists and constraint conflict logs to have a certain level of control. In principle, doing great harm requires a bot account, and knowledge to use it (which is one reason IMO for the relatively high quality of WD). In my experience (molecular biology, chemistry), most erroneous information comes from database imports, i.e. the databases themselves have it, and casual Wikipedia users. --SCIdude (talk) 08:17, 19 August 2021 (UTC)

Wikidata Query Service scaling update Aug 2021

Wikidata community members,

Thank you for all of your work helping Wikidata grow and improve over the years. In the spirit of better communication, we would like to take this opportunity to share some of the current challenges Wikidata Query Service (WDQS) is facing, and some strategies we have for dealing with them.

WDQS currently risks failing to provide acceptable service quality due to the following reasons:


1. Blazegraph scaling

a. Graph size. WDQS uses Blazegraph as our graph backend. While Blazegraph can theoretically support 50 billion edges, in reality Wikidata is the largest graph we know of running on Blazegraph (~13 billion triples), and there is a risk that we will reach a size limit of what it can realistically support. Once Blazegraph is maxed out, WDQS can no longer be updated. This will also break Wikidata tools that rely on WDQS.
b. Software support. Blazegraph is end of life software, which is no longer actively maintained, making it an unsustainable backend to continue moving forward with long term.

Blazegraph maxing out in size poses the greatest risk for catastrophic failure, as it would effectively prevent WDQS from being updated further, and inevitably fall out of date. Our long term strategy to address this is to move to a new graph backend that best meets our WDQS needs and is actively maintained, and begin the migration off of Blazegraph as soon as a viable alternative is identified.

In the interim period, we are exploring disaster mitigation options for reducing Wikidata’s graph size in the case that we hit this upper graph size limit: (i) identify and delete lower priority data (e.g. labels, descriptions, aliases, non-normalized values, etc); (ii) separate out certain subgraphs (such as Lexemes and/or scholarly articles). This would be a last resort scenario to keep Wikidata and WDQS running with reduced functionality while we are able to deploy a more long-term solution.


2. Update and access scaling

a. Throughput. WDQS is currently trying to provide fast updates, and fast unlimited queries for all users. As the number of SPARQL queries grows over time alongside graph updates, WDQS is struggling to sufficiently keep up in each dimension of service quality without compromising anywhere. For users, this often leads to timed out queries.
b. Equitable service. We are currently unable to adjust system behavior per user/agent. As such, it is not possible to provide equitable service to users: for example, a heavy user could swamp WDQS enough to hinder usability by community users.

In addition to being a querying service for Wikidata, WDQS is also part of the edit pipeline of Wikidata (every edit on Wikidata is pushed to WDQS to update the data there). While deploying the new Flink-based Streaming Updater will help with increasing throughput of Wikidata updates, there is a substantial risk that WDQS will be unable to keep up with the combination of increased querying and updating, resulting in more tradeoffs between update lag and querying latency/timeouts.

In the near term, we would like to work more closely with you to determine what acceptable trade-offs would be for preserving WDQS functionality while we scale up Wikidata querying. In the long term, we will be conducting more user research to better understand your needs so we can (i) optimize querying via SPARQL and/or other methods, (ii) explore better user management that will allow us to prevent heavy use of WDQS that does not align with the goals of our movement and projects, and (iii) make it easier for users to set up and run their own query services.

Though this information about the current state of WDQS may not be a total surprise to many of you, we want to be as transparent with you as possible to ensure that there are as few surprises as possible in the case of any potential service disruptions/catastrophic failures, and that we can accommodate your work as best as we can in the future evolution of WDQS. We plan on doing a session on WDQS scaling challenges during WikidataCon this year at the end of October 2021.

Thanks for your understanding with these scaling challenges, and for any feedback you have already been providing. If you have new concerns, comments and questions, you can best reach us at this talk page. Additionally, if you have not had a chance to fill out our survey yet, please tell us how you use the Wikidata Query Service (see privacy statement)! Whether you are an occasional user or create tools, your feedback is needed to decide our future development.

Best,

WMF Search + WMDE  – The preceding unsigned comment was added by MPham (WMF) (talk • contribs).

Thank you for working on this! IMO, separating scholarly papers into a subgraph would not do much harm (especially if it is an interim solution). As it is, one can hardly do any Wikidata queries on scholarly articles right now due to their sheer number (more complex queries tend to time out anyway). Vojtěch Dostál (talk) 14:26, 20 August 2021 (UTC)

Khanum (Q16119358)

I do not quite understand how Khanum (Q16119358) ended up being a disam page. Its English description is: honorific title; female equivalent of "khan". Also see the connected "article" in EN:WP please. --E4024 (talk) 01:11, 19 August 2021 (UTC)

Probably because frwiki and ruwiki articles are marked as disambiguations. --Matěj Suchánek (talk) 13:21, 19 August 2021 (UTC)
Looking at the history, the first & only P31 applied to Khanum (Q16119358) was Wikimedia disambiguation page (Q4167410); but that was after 5 sitelinks had been added, only one of which was a DAB page. Khanum (Q37127819) was established at its outset as a family name item, but never got any sitelinks. I've redistributed the sitelinks amongst Khanum (Q37127819), Khanum (Q16119358) and Hanum (Q702967) - the last b/c two of the sitelinks were to articles about a commune in Germany. Not WD's finest hour :( --Tagishsimon (talk) 03:05, 20 August 2021 (UTC)
Continue spending your sleep hours to WD like this scribe and at odd hours of night you will discover more stupidities and will feel obliged to continue work with the eyelids falling with gravity... --E4024 (talk) 03:19, 20 August 2021 (UTC)
BTW I just found another disam page with this title and merged it into Khanum (Q16119358). --E4024 (talk) 03:26, 20 August 2021 (UTC)
Ah yes. Khanum (Q108165988). Very good plan. Suspect some more of the sitelinks will want to be decanted into that. --Tagishsimon (talk) 03:31, 20 August 2021 (UTC)

data size (P3575) limitations in constraints - Different languages for the same Firefox version have different data sizes

Firefox 91.0.1 downloads in English, French and Hungarian have the following sizes:

  1. English, 76050533 bytes in size
  2. French, 77141836 bytes in size
  3. Hungarian, 77579483 bytes in size

In the constraints for data size (P3575) there is no allowed qualifiers constraint (Q21510851) entry for a language property. A language specification is necessary because I've seen no proof of there being a "default language download" for Firefox from Mozilla. Oduci (talk) 13:04, 20 August 2021 (UTC)

It's not uncommon for qualifier constraint definitions on properties to omit qualifiers which are of legitimate use, such as is suggested in this instance. The cure is to change the constraint. They're not set in stone. Neither, IMO, does it need prior discussion; BOLD, revert, discuss, as on WP, works well here. --Tagishsimon (talk) 13:10, 20 August 2021 (UTC)
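A sketch of how such a qualifier could be consumed once the constraint allows it: listing Firefox version statements with their data size and, if present, a language qualifier. Q698 for Mozilla Firefox and language of work or name (P407) as the language qualifier are assumptions here:

SELECT ?version ?size ?language WHERE {
  wd:Q698 p:P348 ?statement .                    # Q698 assumed to be Mozilla Firefox
  ?statement ps:P348 ?version ;
             pq:P3575 ?size .                    # data size qualifier
  OPTIONAL { ?statement pq:P407 ?language . }    # language qualifier, if one gets added
}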

Data modelling: transport routes

Following in the footsteps of the creators of the series of items on London Buses (Q1192411) routes, I have recently created a number of route items for Warsaw Tramway (Q601176), bus transport in Warsaw (Q9162513) and other local transport networks in my native Poland (at the same time I'm also cleaning up the related photos on Commons). While working on this, I realised there are some data modelling challenges involved. The most important issue, at least in my view, is the following: how do we reflect route changes in WD? My idea is to use terminus (P559) with the qualifiers start time (P580) and end time (P582), but that's just one option. The other thing: how to properly record the fact that a route was created, then disbanded and then reactivated with the same number, but sometimes a different route. Powerek38 (talk) 14:59, 20 August 2021 (UTC)
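In the meantime, a sketch of how the model proposed above could be queried for route history, assuming the route items are linked to their network with transport network (P16) and that the terminus statements carry start/end time qualifiers:

SELECT ?route ?terminus ?from ?until WHERE {
  ?route wdt:P16 wd:Q601176 .                 # routes of the Warsaw Tramway network (P16 assumed)
  ?route p:P559 ?statement .
  ?statement ps:P559 ?terminus .
  OPTIONAL { ?statement pq:P580 ?from . }     # start time
  OPTIONAL { ?statement pq:P582 ?until . }    # end time
}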

Is it possible to link to a specific entry of a property? ie. https://www.wikidata.org/w/index.php?title=Q55671&oldid=1448171241#P348#2

Is it possible to link to a specific entry of a property inside a property item? ie. https://www.wikidata.org/w/index.php?title=Q55671&oldid=1448171241#P348#2

Using the link above I wanted to link to the 2nd version entry, "5.0.2". Is there a way? Oduci (talk) 10:33, 17 August 2021 (UTC)

This, which relies on the statement node string? https://www.wikidata.org/w/index.php?title=Q55671&oldid=1448171241#Q55671$77ded0cf-4ec6-63f1-5577-b9c1f2f74b47 --Tagishsimon (talk) 10:59, 17 August 2021 (UTC)
@Tagishsimon Thank you, can you help me by referencing anything that will help me find the node string for myself (i.e. help instructions, help guides, etc.)? I find this feature very helpful when I want to be very specific. If there is no documentation, can you please provide me with instructions on how to find a node string for anything I want? Oduci (talk) 13:10, 20 August 2021 (UTC)
Nevermind. I just realized I can look at a page's source code on a Wikidata item and then I find it. Now I know it's possible. Thank you Tagishsimon for your help! Didn't know this existed! Oduci (talk) 13:30, 20 August 2021 (UTC)
Exactly that, Oduci; the page information link on the item (e.g. https://www.wikidata.org/w/index.php?title=Q55671&action=info ) and select the rdf file from Alternate views - https://www.wikidata.org/wiki/Special:EntityData/Q55671.rdf?flavor=dump ... or via WDQS. --Tagishsimon (talk) 13:54, 20 August 2021 (UTC)
Or "DESCRIBE" on the same page. --- Jura 15:08, 20 August 2021 (UTC)
(bad sample here, as there are too many ;) .. It generally works. --- Jura 15:10, 20 August 2021 (UTC)
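Expanding on the WDQS route: a minimal sketch listing the statement nodes for the example above. Each node's URI contains the GUID used in such anchors (written with a hyphen after the item ID in RDF, where the page anchor uses a $):

SELECT ?statement ?version WHERE {
  wd:Q55671 p:P348 ?statement .   # statement nodes for software version (P348)
  ?statement ps:P348 ?version .
}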

this seems like it should be a built-in feature of wikibase (i.e. click a button to get a link to this statement). guess it could be an extension. BrokenSegue (talk) 16:00, 20 August 2021 (UTC)

There is a user script by Nikki that works well: User:Nikki/AnchorLinks.js It adds little arrows next to statements and statement groups to generate the links. --Lydia Pintscher (WMDE) (talk) 16:07, 20 August 2021 (UTC)

Gwen Ifill

Wanted to make sure her cause of death was recorded properly Sales2Knowledge (talk) 04:01, 20 August 2021 (UTC) Sales2Knowledge

@Sales2Knowledge: Does this refer to Gwen Ifill (Q5623430)? Do you have an issue with the cause of death currently listed - and cited - on that item? Can you provide an alternative citation? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:12, 20 August 2021 (UTC)

Bot to preemptively protect highly transcluded pages

After observing the recent wave of template vandalism here on Wikidata, I spoke to @DannyS712 about bringing w:User:MusikBot II/TemplateProtector to Wikidata. This would automatically protect highly transcluded pages, based on thresholds the community defines. DannyS712 and I agreed this seemed worthwhile, so I filed Wikidata:Requests for permissions/Bot/MusikBot II. Then it occurred to us that preemptive protection of templates might need a proper RfC. I would think not, as this is standard practice on nearly all wikis, tantamount to protecting the Main Page. Once things become too visible, they become high-risk and warrant preemptive protection. Note again that you, the community, decide on the "thresholds" the bot uses, but let's save that for a different discussion.

So before I file an RfA for my bot, which is also needed, I thought I'd get a quick straw poll here: does preemptively protecting high-risk templates require an RfC? Or perhaps enough show of support here will suffice? — MusikAnimal talk 22:07, 18 August 2021 (UTC)

How to choose a class of item in JavaScript

Hello, I don't really have any particular knowledge in Javascript but I'm smart enough to managed to copy small portions of code and to built small easy things for me. One of them is e.g.: $(document).ready(function() { if (mw.config.get('wgNamespaceNumber') !== 0 || mw.config.get('wgAction') !== "view" ) return; mw.util.addPortletLink('p-tb', 'https://wcqs-beta.wmflabs.org/embed.html#%23defaultView%3AImageGrid%0ASELECT%20%3Ffile%20%3Fimage%0AWITH%0A{%0A%20%20SELECT%20%3Fitem%20%0A%20%20WHERE%0A%20%20{%0A%20%20%20%20SERVICE%20%3Chttps%3A%2F%2Fquery.wikidata.org%2Fsparql%3E%0A%20%20%20%20{%0A%20%20%20%20%20%20%20%20%3Fitem%20wdt%3AP171%2Fwdt%3AP171*%20wd%3A' + mw.config.get('wgPageName') + '.%0A%20%20%20%20%20}%20%0A%20%20}%0A}%20AS%20%25get_items%0AWHERE%0A{%0A%20%20INCLUDE%20%25get_items%0A%20%20%3Ffile%20wdt%3AP180%20%3Fitem%20.%0A%20%20%3Ffile%20schema%3AcontentUrl%20%3Furl%20.%0A%20%20BIND(IRI(CONCAT(%22http%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FSpecial%3AFilePath%2F%22%2C%20wikibase%3AdecodeUri(SUBSTR(STR(%3Furl)%2C53))))%20AS%20%3Fimage)%0A}', 'Image query (for taxa)', null); });

The code above gives me a link in the left sidebar to a query depending on the item I'm on. However, this query is specific to taxa, and I would like the link to be displayed in the left sidebar only when I'm on an item that is an instance of (P31) taxon (Q16521). Is there anybody able to modify the above code for that purpose? Christian Ferrer (talk) 09:22, 21 August 2021 (UTC)

@Christian Ferrer: User:Succu/taxobox.js is an example of a similar check. --Lockal (talk) 13:11, 21 August 2021 (UTC)
Thanks, the check is done on a property present on taxa items, so it is indeed what I asked for. I guess it is also easily adaptable to make different checks for other properties and specific values within those properties. I will study this, thanks again. Christian Ferrer (talk) 14:47, 21 August 2021 (UTC)

Duplicate Polangen Volost

Q16468482 = Q86002800 89.12.222.242 11:40, 21 August 2021 (UTC)

Hopefully it's really the same. Merged. Vojtěch Dostál (talk) 17:23, 21 August 2021 (UTC)

Enable all ISO 639 codes

Is there a way to enable all ISO 639-3 (Q845956) codes on Wikidata? Wikidata only supports a few hundred languages, whereas there are more than 7,000 languages in the world.

I tried creating a lexeme for bàbò (Nupe (Q36720) for Lagenaria siceraria (Q1277255)), but technical restrictions prevented me from doing so. I also could not add the Nupe name to Lagenaria siceraria (Q1277255), since Wikidata items cannot be linked to Incubator pages. In the future, I would like to add lexemes for dozens of African languages that do not yet have any officially launched wikis, but it appears that Wikidata cannot yet support this. Sabon Harshe (talk) 09:18, 21 August 2021 (UTC)

See phab:T195740 and phab:T168799. Currently you need to use mis, or mis-x-Qxxxx if there is more than one (this only applies to lexemes; monolingual text supports mis but not mis-x-Qxxxx; terms support neither).--GZWDer (talk) 13:08, 21 August 2021 (UTC)
@GZWDer: I just did so here. It looks kind of clunky and awkward, so let me know if there's a better way to do it. Also, is there a way to mass import these vernacular plant names in currently unsupported languages from CSV/TSV formats via, for example, QuickStatements? Sabon Harshe (talk) 08:20, 22 August 2021 (UTC)
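Until better codes are available, names recorded with the generic mis tag can at least be found again later; a sketch, assuming the vernacular names are stored as taxon common name (P1843):

SELECT ?taxon ?vernacularName WHERE {
  ?taxon wdt:P1843 ?vernacularName .
  FILTER(LANG(?vernacularName) = "mis")
}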

Structure on WD for UN Security Council (or any other organisation) sessions & resolutions

Background: organizations with an assembly, commission or similar body hold regular meetings, and in some of them they vote and issue a resolution. In WD we have one item for each UNSC resolution and just a few for meetings.
Problem: I am trying to record, in the resolution items, the number of the session in which each resolution was adopted. As there is no property for "session number" or similar, I need to find a good alternative to hold this information. I tried two options:
1) Add it in a new statement related to the topic:

See example in United Nations Security Council Resolution 610 (Q2876306)

2) Modifying the existing P9681 statement:

See example in United Nations Security Council Resolution 611 (Q3120119)

Question: which of these (or another proposal) should be used? As alternatives to P790, I tested editor (P98), place of publication (P291), publisher (P123), creator (P170) and author (P50), but they feel forced, and they trigger a lot of constraint violations because they are intended for persons or companies, not for organizational structures. Thanks a lot for your suggestions. Amadalvarez (talk) 12:13, 21 August 2021 (UTC)

@Laddo: Any suggestions ? Amadalvarez (talk) 14:49, 22 August 2021 (UTC)
For both of those properties, I'd be inclined to create items for each meeting (these would have P279=Q108172332); use the meeting item as the value for P790 and/or P9681; and use series ordinal (P1545) as a qualifier of a part of the series (P179) value in the meeting item; in much the same way as legislation is arranged into series, such as at Q100164823#P179. The problem right now is that you're trying to overload a statement in a situation where you can relegate the conjunction of "it's this sort of meeting" with "having this number" to a discrete item and save all the bother. Presumably there's lots of additional info which could be added to the meeting items - date, location, participants, described at, full text of &c. --Tagishsimon (talk) 15:00, 22 August 2021 (UTC)
Thanks, @Tagishsimon:, you're right. But there are almost 9,000 meetings for 2,600 resolutions. It's a pharaonic amount of work, especially because the most important aspect is "the resolution", which is the piece of information found in the Wikipedias; the meetings are not. Some documents are available as PDFs, but I'm not sure I can find a source to bulk upload the other properties: attendees, the vote of each of them, etc. Moreover, I'm not sure there is information for meetings without a resolution (and maybe it isn't worth it). I had discarded any use of P179 because, as a main property, it belongs to the resolution; your proposal to use it is really the correct one. I'll evaluate the effort and decide whether to upload the meeting numbers or wait for my next life to do it. Many thanks for your answer. Amadalvarez (talk) 11:23, 23 August 2021 (UTC)

Hi guys! I am still busy creating flora species on the Afrikaans Wikipedia, focusing predominantly on fynbos. Geissorhiza ixioides is now seen as an independent species, see [5] and [6]. It was previously seen as part of Geissorhiza leipoldtii. Can we please have Wikidata updated? I am not an expert here. Regards. Oesjaar (talk) 07:57, 22 August 2021 (UTC)

@Oesjaar: well you're more expert than me on the subject matter. I tried my best based on what you said. I made Geissorhiza ixioides (Q108203307) with the name of the new species, marked it as different from Geissorhiza leipoldtii (Q15564151) and migrated the af wikipedia article link to that new item page. Does this look right to you? BrokenSegue (talk) 23:08, 22 August 2021 (UTC)
@BrokenSegue: spot on! Thank you (dankie in Afrikaans). The expert remark, I meant Wikidata! Regards! Oesjaar (talk) 05:11, 23 August 2021 (UTC)

[BREAKING CHANGE] Pagename/filename normalization on saving an edit

Hello,

As you may know, Wikibase currently does not normalize pagenames/filenames on save (e.g. underscores in the input for properties of datatype Commons media are allowed). At the same time, Wikidata’s quality constraints extension triggers a constraint violation after saving if underscores are used. This is by design, in line with long-established community practices. As a result, this inconsistency leaves users with unnecessary manual work.

We will update Wikibase so that when a new edit is saved via UI or API, and a pagename/filename is added or changed in that edit, then this pagename/filename will be normalized on save ("My file_name.jpg" -> "My file name.jpg").

More generally, the breaking change is that a user of the Wikibase API may send one data value when saving an edit, and get back a slightly different (normalized) data value after the edit was made: it is no longer the case that data values are either saved unmodified or totally rejected (e.g. if a file doesn’t exist on Commons). Since this guarantee is being removed with this breaking change announcement, we may introduce further normalizations in the future and only announce them as significant changes, not breaking changes.

The change is currently available on test.wikidata.org and test-commons.wikimedia.org. It will be deployed on Wikidata on or shortly after September 6th. If you have any questions or feedback, please feel free to let us know in this ticket.

Cheers, Lucas Werkmeister (WMDE) (talk) 11:48, 23 August 2021 (UTC)

Wikidata weekly summary #482

Arabic script (Q107017430)

Why do we need Arabic script (Q107017430)? It has been created by an experienced user; however, I see nothing linked there. (Commonscat is redundant, IMHO.) Over at Commons, there are several redundant items related to anything "Arabic", from "Arabic chicken dishes" to "Arabic script", again IMHO. Why don't we merge script and alphabet? --E4024 (talk) 20:42, 18 August 2021 (UTC)

@E4024: You could have perhaps pinged me? 'script' and 'writing' are two separate concepts, which need separate items. I have no idea of how 'chicken' features. Thanks. Mike Peel (talk) 20:57, 18 August 2021 (UTC)
I don't understand the distinction you're making between the items either. Arabic script (Q1828555) is the item for the Arabic script, so why doesn't the Commons category for the Arabic script belong there?
In fact, I don't understand what Commons is even doing. I would've thought "Arabic writing" would be about text in the Arabic language, especially given that it's in the category "Arabic language", but it contains files about the Arabic script plus the category "Arabic script" itself.
- Nikki (talk) 22:56, 23 August 2021 (UTC)

Handy little script

Just thought I'd drop off a handy script I wrote a short while ago that makes it easier to copy-paste Q/P-ids. It's been helping me a lot: User:Inductiveload/scripts/ShowQsAndPs.

 

Unless there's a built-in way to show this? Inductiveload (talk) 20:14, 20 August 2021 (UTC)

There isn't a built-in way - we have phab:T249580 asking for a gadget.
I have a similar script at https://www.wikidata.org/wiki/User:Nikki/ShowIDs.js which works after edits and on lexemes as well.
- Nikki (talk) 22:20, 23 August 2021 (UTC)

Item merge conflict due to distinct Wikimedia Categories; Palazzo di Spagna (Q9054484) and Embassy of Spain to the Holy See (Q56326462)

These two items Palazzo di Spagna (Q9054484) and Embassy of Spain to the Holy See (Q56326462) do not have a conflict in terms of Wikipedia links, but do have a conflict with their Wikimedia Commons categories. I could not figure out how to resolve it. Shushugah (talk) 22:22, 23 August 2021 (UTC)

Embassy of Spain to the Holy See (Q56326462) is apparently for the embassy, not the building it's housed in. Ghouston (talk) 07:07, 24 August 2021 (UTC)
Yes. They should not be merged. A building and the organization that is "now" housed in it are not the same thing. Gare d'Orsay (Q2698691) no longer functions as a railway station; it houses a museum, Musée d'Orsay (Q23402). Amadalvarez (talk) 16:31, 24 August 2021 (UTC)

In Commons: You should remove Category:Embassies in Rome to the Holy See and Category:Embassies to the Sovereign Military Order of Malta from Category:Palazzo di Spagna, "to begin with". After that it will be easier to see the situation and make more edits if you deem so. --E4024 (talk) 02:09, 25 August 2021 (UTC)

Blacklist of sources

Do we have any blacklist of sources not to use? If yes, what are the criteria to add anything there? --E4024 (talk) 04:46, 24 August 2021 (UTC)

We have the Wikidata-specific and global spam blacklists, which work for some particularly egregious cases, but as far as I know there's no equivalent of English Wikipedia's deprecated sources. Vahurzpu (talk) 12:57, 24 August 2021 (UTC)

Findagrave tool

I think at one time someone was working on a Findagrave tool, where you would enter the Findagrave ID and it would create a Wikidata entry by transferring the data. Does anyone know if it was created? --RAN (talk) 22:03, 24 August 2021 (UTC)

Aliases for people

A salutary tale: Prior to the closing session of Wikimania yesterday, I failed to find the item about one of the panellists, Peter Singer.

The item existed, as Q7172444, but was labelled (in English) as "Peter A. Singer", with no "Peter Singer" alias (neither in English nor any other language).

  1. Please try to add all plausible aliases when creating or updating items about people.
  2. Can our search be made to find such "fuzzy" matches?
  3. Is there a user script to add plausible aliases [1]? Or can someone please make one?
  4. Can bots that create biographical items be made to do this?
  5. Can a bot do this for existing items? Even if only a subset with western names, it would be a start.

[1] In this case, "Peter A. Singer" would generate "Peter A Singer|Peter Singer|P. A. Singer|P.A. Singer|P A Singer|P. Singer|P Singer|Singer|Singer P|Singer P.|Singer P. A.|Singer P A|Singer PA" as per this edit by User:Daniel Mietchen.

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:26, 18 August 2021 (UTC)
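For point 3, a rough, untested sketch of what such an alias-generating helper could look like: it only produces the variant list from footnote [1]; actually writing the aliases (e.g. via wbsetaliases) is left out.

 // Sketch only: generate plausible alias variants for a western-style name,
 // along the lines of the "Peter A. Singer" example in footnote [1].
 function nameVariants(label) {
     var parts = label.trim().split(/\s+/);
     if (parts.length < 2) { return []; }
     var surname = parts[parts.length - 1];
     var given = parts.slice(0, -1);                                  // e.g. ["Peter", "A."]
     var initials = given.map(function (p) { return p.charAt(0); });  // e.g. ["P", "A"]
     var dotted = initials.map(function (i) { return i + '.'; });     // e.g. ["P.", "A."]
     var variants = [
         given.join(' ').replace(/\./g, '') + ' ' + surname,   // Peter A Singer
         given[0] + ' ' + surname,                              // Peter Singer
         dotted.join(' ') + ' ' + surname,                      // P. A. Singer
         dotted.join('') + ' ' + surname,                       // P.A. Singer
         initials.join(' ') + ' ' + surname,                    // P A Singer
         dotted[0] + ' ' + surname,                             // P. Singer
         initials[0] + ' ' + surname,                           // P Singer
         surname,                                               // Singer
         surname + ' ' + initials[0],                           // Singer P
         surname + ' ' + dotted[0],                             // Singer P.
         surname + ' ' + dotted.join(' '),                      // Singer P. A.
         surname + ' ' + initials.join(' '),                    // Singer P A
         surname + ' ' + initials.join('')                      // Singer PA
     ];
     // de-duplicate and drop anything identical to the existing label
     return variants.filter(function (v, i) {
         return v !== label && variants.indexOf(v) === i;
     });
 }

Whether all of these variants actually belong as aliases is exactly the question discussed below.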

I support all points, but I have a doubt about the quoted edit by @Daniel Mietchen:. Since I read in Help:Aliases (which is a draft, still to be approved) that aliases should not include "Alternative word order for people names (first name followed by last name vs. last name, comma, first name)", I tend to think that forms like surname only ("Singer") and surname followed by initials ("Singer P|Singer P.|Singer P. A.|Singer P A|Singer PA") should also be avoided as aliases, at least until the problem is definitively settled. --Epìdosis 12:33, 18 August 2021 (UTC)
Note that solving (2) would make the rest a bit redundant. If you go to the bottom of the drop-down to "Search for pages containing..." (i.e. the main search page here) it does generally work to find people's names like this, but they may be well down the page. Making it easier to add things like "haswbstatement:P31=Q5" would maybe help a bit? ArthurPSmith (talk) 17:12, 18 August 2021 (UTC)
Why did you fail to find the item? Special:Search/Peter Singer -book, for example, finds plenty of Peter Singers, including P. W. Singer (Q320696) who isn’t labeled or aliased “Peter Singer” in any language. Aliases can be useful, of course, but I think it’s rather excessive and entirely unnecessary to record every minor variation you can think of (“P. A. Singer” and “P.A. Singer” and “P A Singer”, really?). --Lucas Werkmeister (talk) 17:13, 18 August 2021 (UTC)
(simultaneous edit) Arthur said basically the same thing, except that I wanted to avoid haswbstatement in my example search ^^ --Lucas Werkmeister (talk) 17:14, 18 August 2021 (UTC)
Hard to say, now that the item has been updated; but P. W. Singer appears in the latter half of the second page of search results for "peter singer" (without quotes), below results such as "Where Borders Become Meeting Places: Review of Charles C. Camosy, Peter Singer & Christian Ethics: Beyond Polarization" and "The moral status of the human embryo according to Peter Singer: individuality, humanity, and personhood". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:44, 21 August 2021 (UTC)

Reversed personal name as alias

It has some usage and should be included; otherwise no result is returned in the drop-down search field. - Coagulans (talk) 11:22, 24 August 2021 (UTC)

... a common cause for accidental duplicates - Coagulans (talk) 07:07, 25 August 2021 (UTC)

Public domain statements will become stale

When a work is in the public domain due to 'post mortem auctoris (Q2105922)' (i.e. the author's been dead for a certain amount of time), I have seen the following done:

However, this will become stale over time. Is there a better way to handle this? A qualifier of some sort that includes the relevant death date? Inductiveload (talk) 22:57, 21 August 2021 (UTC)

(EC) Putting to one side that in your example, there's a disjuncture between 50 and 100 years pma ... my reading of 50 years PMA is the author died 50 or more years ago. Unclear to me how that would go stale. --Tagishsimon (talk) 00:15, 22 August 2021 (UTC)
I'm running some copyright bots, and I am planning to work on a bot that does a yearly check of whether the jurisdiction can change from 70 to 100 years pma, to worldwide, etc. I was also wondering if a calculated property with Functions could replace this after a while. --Hannolans (talk) 11:06, 22 August 2021 (UTC)
What is a "calculated property with Functions"? Have I missed a memo? --Tagishsimon (talk) 15:04, 22 August 2021 (UTC)
I assume they are referring to the upcoming abstract wikipedia. [7]. My guess is it will have lots of implications for wikidata. BrokenSegue (talk) 15:38, 22 August 2021 (UTC)
@Jura1, Tagishsimon: because every year on the first of January (assuming all countries use 1st Jan), perhaps hundreds of thousands of works will transition from, say "PD in 50 years pma countries" to "PD in 60 years pma countries"; "60" to "70", "70" to "80", "80" to "90", "90" to "95" and finally (currently) "95" to "100".
The data won't become inaccurate as such (because it will remain PD in the 50 year countries), but it will be incomplete (because it's not 50 years any more, it's 60). Notably, if, say, WS were to attempt to use WD as a source for licensing, via some template, it would become wrong for roughly 10% of texts every year. Locally, WS uses the actual date of the author's death to calculate the template to be used, so the templates update every year as needed. Inductiveload (talk) 20:56, 22 August 2021 (UTC)
yeah sounds like we either should wait for wikifunctions or setup a bot. BrokenSegue (talk) 23:00, 22 August 2021 (UTC)
  • Wouldn't it be better to just state how many years need to pass from the person's death for his/her work to become PD? Any end-user of our data could then calculate whether it's PD using date of death (P570). Things like Wikidata queries or Wikipedia templates would be easy to set up that way. Vojtěch Dostál (talk) 09:36, 23 August 2021 (UTC)
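As a rough illustration of that approach (a sketch only; date precision, missing death dates, and exactly when the term expires differ by jurisdiction and need more care than shown here):

 // Sketch: full calendar years since an author's death, derived from P570,
 // so "X years pma" statuses can be computed at read time instead of stored.
 async function yearsSinceDeath(qid) {
     const url = 'https://www.wikidata.org/wiki/Special:EntityData/' + qid + '.json';
     const entity = (await (await fetch(url)).json()).entities[qid];
     const claim = (entity.claims.P570 || [])[0];
     if (!claim || claim.mainsnak.snaktype !== 'value') { return null; }
     // Wikibase time values look like "+1921-05-04T00:00:00Z"
     const deathYear = parseInt(claim.mainsnak.datavalue.value.time.match(/^[+-]?(\d+)/)[1], 10);
     return new Date().getUTCFullYear() - deathYear;
 }
 // e.g. a work is typically PD in "70 years pma" countries once
 // yearsSinceDeath(author) > 70, i.e. from the following Public Domain Day.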
I recall having a conversation about this a long time ago, probably with Jarekt. My proposal was to create items for the 50 years (1970 - 1921):
Every year on public domain day:
That isn't a lot of work to set this up and to do this once a year. Multichill (talk) 19:05, 23 August 2021 (UTC)
  • As for Wikidata statements in general, I think we should stick with incremental additions and not do any overwrite. If new statements can be added by new criteria, add these. There is no need to overwrite past ones. --- Jura 11:42, 25 August 2021 (UTC)

Actor and voice actor

Pekcan Koşar was the voice of Tarık Akan (Sleep in peace, both) in his films. How can we set this relation in their respective items? --E4024 (talk) 14:03, 25 August 2021 (UTC)

@E4024: I think the standard way to do this would be to use performer (P175) on the characters they played and note that one subject has role (P2868) voice actor (Q2405480). Alternatively if they had a relationship where they worked closely together you could do significant person (P3342) Pekcan Koşar (Q6088880) subject has role (P2868) voice actor (Q2405480) on Tarık Akan (Q2064734)'s item. BrokenSegue (talk)

Sports

How do we know who is a "powerlifter" and who is a "weightlifter"? Which one is the classical olympic sport that we all know? Thanks in advance. --E4024 (talk) 01:48, 25 August 2021 (UTC)

probably one should be a subclass of the other. people often confuse profession and occupation. i think the former is meant to be something you are paid for. I'd say neither is strictly a profession so both should be occupations. there are lots of such inconsistencies in wikidata. just use your best judgment. BrokenSegue (talk)
@E4024: In relation to athletes, I almost always use job and occupation at the same time, because of course there are certainly athletes who practice their sport professionally, but there are also athletes who only do it as an occupation. That's why I usually use both, even if, as BrokenSegue said, it's actually inconsistent.

WikidataCon update - August 2021

Hello all,

As the organization team at Wiki Movimento Brasil and Wikimedia Germany is moving forward with designing and preparing the conference, we will be able to share more exciting news with you over the next few weeks. Today, we would like to remind you of the general context and structure of the WikidataCon 2021, to let you know about our experimental grant program for Latin America and the Caribbean, and to give you a glimpse of the next steps and milestones for the conference.

First of all: the WikidataCon 2021 is taking place in a bit more than two months, on October 29th, 30th and 31st. Since the current state of the pandemic doesn’t allow onsite events in Berlin or São Paulo, nor international travels, the conference will take place entirely online and all sessions, presentations and workshops will be accessible through our remote event platform Venueless. We hope that this remote format will allow more people from various countries to participate in the WikidataCon 2021.

The first day of the conference will include various keynotes and presentations curated by the organization team. The detailed program will be available soon, but we can already mention a few topics that will be highlighted on Day 1, related to the main topic of the conference, envision a sustainable future for Wikidata: we will talk about knowledge justice and decolonization with Whose Knowledge, discover the preliminary results of Reimagining Wikidata from the margins, address the technical challenges of Wikidata, dive into the experiments of Brazilian cultural institutions, and explore the future of Wikidata and Abstract Wikipedia.

The first day will also include the now traditional birthday celebration sessions - birthday presents, community awards, and of course, a birthday party! These events will be organized together with some active community members, and we can’t wait to reveal more about how you can contribute to Wikidata’s ninth anniversary.

The second and third day of the conference will be entirely yours! An open unconference format will allow participants to schedule discussions, presentations and workshops, organized in different tracks, and covering a large time slot to accommodate various time zones. More information will come soon and you will be able to schedule sessions starting at the beginning of October. In the meantime, you can already think about the kind of sessions you would like to organize or participate in, and discuss on this talk page.

In 2021, with the collaboration between Wikimedia Germany and Wiki Movimento Brasil, the WikidataCon has a focus on Latin America and the Caribbean, and we would especially like to encourage people from these regions to participate in the conference. This is why we designed a grant process that will support affiliates from Latin America and the Caribbean to organize pre-conference online events, in order to onboard their local communities to Wikidata and ensure a smooth transition to the conference. The grant will also cover an “e-scholarship” to support people from these communities to attend the WikidataCon in the best possible conditions.

You can find more information about the grant here. The call for proposals is now open until September 12th.

As this process is new for the organizers (collaboration between two Wikimedia chapters, allocating grants to other affiliates for online events), and requires a significant amount of human resources, we decided to focus our experiment on Latin America and the Caribbean. This means that unfortunately, people and organizations from other areas of the world will not be able to request support from the WikidataCon organizing team for local events and e-scholarships.

As the conference is approaching, we will update the different sub-pages of Wikidata:WikidataCon 2021 and post updates more regularly on this talk page. Here’s an overview of the next steps and what you can expect to come in the next few weeks:

  • More information about the program of Day 1, the organizations supporting the conference, and the tools that we will use to run the online conference: beginning of September
  • Call for proposal for the Latin America and the Caribbean grant: August 25 to September 12 (see more details here)
  • Registration opens: September 15th
  • More information about the unconference tracks and the community awards: end of September
  • Open unconference schedule opens: first week of October

Thanks for reading this update! If you have any questions or suggestions, feel free to reach out to us by leaving a comment on this talk page or contacting info[at]wikidatacon.org.

For the WikidataCon organization team, Lea Lacroix (WMDE) (talk) 15:16, 25 August 2021 (UTC)

Pasha Yiğit Bey (Q7141944)

Why do I receive an exclamation mark when I add the noble title "Pasha" to Yiğit Bey (Q7141944), if he died in 1413 (15th century) and the title of Pasha was also introduced in the 15th century? There should not be a time conflict, should there? --E4024 (talk) 15:54, 25 August 2021 (UTC)

It's not quite clear where the fail is; one of two places, probably. 1) the constraint calculation ignoring time precision - Pasha has a date to precision 7, which is Century, and a value of 1500-01-01. Simple 1413 / 1500 comparison fail. 2) the constraint calculation having a different idea about what the stored data means than the user interface - the WD UI takes 1500-01-01 precision 7 as C15. The constraint calculation may not have got this memo. --Tagishsimon (talk) 16:46, 25 August 2021 (UTC)
Probably the former, see T168379. Lucas Werkmeister (WMDE) (talk) 17:35, 25 August 2021 (UTC)

Wikimedia Language Diversity

Wikimedia Language Diversity (Q96745657) has a label and description in English only... Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:46, 25 August 2021 (UTC)

I have added labels in 17 languages. Some communities (de:, nl: to name just two of them) use the name in English. B25es (talk) 18:49, 25 August 2021 (UTC)

To scrape or not to scrape

As many of you may have seen already, there has been an interesting public discussion about Wikipedia & Wikidata and scraping versus querying.

It started with a blog post:

And a response by @Denny:

There were also some threads of the discussion on Facebook and Twitter, not linked here. Denny's post has also been mentioned in the latest Wikidata summary by the tireless @Mohammed Sadat (WMDE). Thanks for your work!

As one of the commenters already noted, one of the things that really surprised me was that the Wikidata Query Service tutorial by Wikimedia Israel was not linked anywhere on Wikidata! It's also really interesting when one of the projects gets candidly discussed "out in the wild", and maybe it could spark further discussion here. All of it is well worth reading, IMO. --Azertus (talk) 18:25, 27 August 2021 (UTC)

  • Just some random thoughts: Some of the larger wikis finally use Wikidata for their infoboxes, so scraping there should be less interesting. Some others don't. Some topics spread across multiple languages are hard to get directly from Wikipedia .. There is still a lot to do with content that is in list form in wikis .. --- Jura 20:10, 27 August 2021 (UTC)

Shouldn't there be a QID specific for sprinkles made of chocolate?

hagelslag (Q152547) is a popular candy topping in many cultures, and many of them employ chocolate in those sprinkles. However, researching a bit, I came across a few links about rainbow sprinkles, and from the looks of it, they aren't made of chocolate at all. Should I create a QID for sprinkles that are made of chocolate? Tetizeraz (talk) 18:52, 27 August 2021 (UTC)

I don't think that is necessary. For example a statement like: book (Q571) made from material (P186) paper (Q11472) isn't a problem even if there exists a book made of all gold, or leather, or something else. That statement is fine because some books are made of paper.
Also hagelslag (Q152547) isn't a brand name product with a specific ingredient list. Maybe you want to create a new item for Betty Crocker Rainbow Sprinkles? Justin0x2004 (talk) 01:31, 28 August 2021 (UTC)
There is room for considerable improvement in the sprinkles ontology. I've removed the claim: made from material (P186): chocolate (Q195), from hagelslag (Q152547) since, (at least per the EN article) this should be the generic sprinkles item. And I've made nonpareils (Q316561) a subclass of hagelslag (Q152547), again based on (my reading of) the EN wiki article. That article makes mentions of, for instance, two specific hagelslags: chocoladehagelslag (chocolate sprinkles) and cacaofantasie or cacaofantasie hagelslag (cacao fantasy sprinkles), and so it is clear that these are identifiable and distinct subclasses of sprinkles for which we have no item, but which seem likely to meet WD:N. So long as there are serious and publicly available references backing up the item, or a structural need for the item, then there can be no objections to creating items. --Tagishsimon (talk) 02:08, 28 August 2021 (UTC)

Newbies

Hey I'm dineo can someone please show me how to create an account here  – The preceding unsigned comment was added by 41.115.112.69 (talk • contribs) at 12:02, 28 August 2021 (UTC).

There's a "create account" link at the top left part of the screen that you can use to sign up. If you have an account on any other Wikimedia sites, such as Wikipedia or Wikimedia Commons, you can use that here. Vahurzpu (talk) 15:38, 28 August 2021 (UTC)

Splitting Q1743902

I think Q1743902 should be split because a smock-frock isn't the same as a kiel. The smock-frock is something specific to England and Wales, while the kiel is from Belgium. The English Wikipedia page even mentions it as a similar dress (under the Walloon name). So should it be split? Jhowie Nitnek 14:43, 28 August 2021 (UTC)

On the face of it, yes. However you will need to check through the various language articles to check what they're referring to, & whether they're attached to the correct WD item. --Tagishsimon (talk) 15:41, 28 August 2021 (UTC)
Then I will split them and check all connected WD items Jhowie Nitnek 20:34, 28 August 2021 (UTC)

External Wikipedia-related discussion: Create template/link for things that have Wikidata items, but not articles

https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(idea_lab)#Create_template/link_for_things_that_have_Wikidata_items,_but_not_articles Lectrician1 (talk) 00:52, 29 August 2021 (UTC)

Remove datasets from COVID-19 items?

COVID-19 pandemic in Hungary (Q87119811) now seems to be 4MB(!) of structured data, which is causing significant problems for reusing the data, e.g., in the Wikidata Infobox on Commons. 99% of that seems to be values for number of medical tests (P8011), number of recoveries (P8010), number of cases (P1603) and number of deaths (P1120). Do we really need all of these values in Wikidata - or could they be recorded in a data table on Commons instead, and we could just show the latest and/or milestones here? Is anyone making use of these datasets? Pinging @Bencemac: since they've recently been editing that item (if you know of other people working on this, please also ping them!) Thanks. Mike Peel (talk) 18:05, 20 August 2021 (UTC)

I was having a look at bloated items earlier this week, leading to a report: Predicates implicated in non-scholarly-article item bloat and some shroud-waving on twitter in which I predict biochemists will cause the untimely demise of WD. I get no feeling of comfort that WD is handling / addressing the large-tabular-datasets-in-items issue, nor am I comfortable that the 'many-to-many relationships, where there are *huge* numbers at both ends' issue is even on the radar. --Tagishsimon (talk) 18:25, 20 August 2021 (UTC)
@Mike Peel: Thanks for raising this topic. I agree, this data is better placed on Commons. Additionally, I want to add that the data about deaths is scientifically almost worthless, because countries and regions of the world have different ways and criteria for counting what is and is not a COVID-related death, and autopsy conclusions (a high degree of certainty about the cause of death) are mixed with guesses based on days after infection.--So9q (talk) 20:19, 20 August 2021 (UTC)
TiagoLubiana 01:35, 16 March 2020 Daniel Mietchen 01:42, 16 March 2020 (UTC)
Jodi.a.schneider 02:45, 16 March 2020 (UTC)
Chchowmein 02:45, 16 March 2020 (UTC)
Dhx1 03:38, 16 March 2020 (UTC)
Konrad Foerstner 06:02, 16 March 2020 (UTC)
Netha Hussain 06:19, 16 March 2020 (UTC)
Bodhisattwa 06:56, 16 March 2020 (UTC)
Neo-Jay 07:04, 16 March 2020 (UTC)
John Samuel 07:31, 16 March 2020 (UTC)
KlaudiuMihaila 07:53, 16 March 2020 (UTC)
Salgo60 09:11, 16 March 2020 (UTC)
Andrawaag 10:12, 16 March 2020 (UTC)
Whidou 10:16, 16 March 2020 (UTC)
Blue Rasberry 15:07, 16 March 2020 (UTC)
TJMSmith 16:15, 16 March 2020 (UTC)
Egon Willighagen 16:49, 16 March 2020 (UTC)
Nehaoua 20:32, 16 March 2020 (UTC)
Andy Mabbett (UTC)
Peter Murray-Rust 00:00, 17 March 2020 (UTC)
Kasyap 02:45, 17 March 2020 (UTC)
Denny 16:21, 17 March 2020 (UTC)
Kwj2772 16:56, 17 March 2020 (UTC)
Joalpe 22:47, 17 March 2020 (UTC)
Finn Årup Nielsen fnielsen) 10:59, 18 March 2020 (UTC)
Skim 11:45, 18 March 2020 (UTC)
SCIdude 15:15, 18 March 2020 (UTC)
Evolution and evolvability 01:23, 20 March 2020 (UTC)
Susanna Ånäs (Susannaanas) 07:05, 20 March 2020 (UTC)
Mlemusrojas 15:30, 20 March 2020 (UTC)
Yupik 20:23, 20 March 2020 (UTC)
Csisc 23:05, 20 March 2020 (UTC)
OAnick 10:26, 21 March 2020 (UTC)
Gnoeee 12:28, 21 March 2020 (UTC)
Jjkoehorst 14:27, 21 March 2020 (UTC)
So9q 08:58, 22 March 2020 (UTC)
Nandana 14:58, 23 March 2020 (UTC)
Addshore 15:56, 23 March 2020 (UTC)
Librarian lena 18:19, 24 March 2020 (UTC)
Jelabra 19:19, 24 March 2020 (UTC)
AlexanderPico 23:34, 27 March 2020 (UTC)
Higa4 02:51, 29 March 2020 (UTC)
JoranL 19:56, 29 March 2020 (UTC)
Alejgh 11:04, 1 April 2020 (UTC)
Will (Wiki Ed)) 17:36, 1 April 2020 (UTC)
Ranjithsiji 04:47, 2 April 2020 (UTC)
AntoineLogean 07:35, 2 April 2020 (UTC)
Hannolans 17:22, 2 April 2020 (UTC)
Farmbrough 21:15, 3 April 2020 (UTC)
Ecritures 21:26, 3 April 2020 (UTC)
  Notified participants of WikiProject COVID-19 WDYT?--So9q (talk) 20:24, 20 August 2021 (UTC)
Please set ranks as appropriate. --- Jura 20:43, 20 August 2021 (UTC)
@Jura1: The ranks are irrelevant, the problem is the size of the item that needs to be loaded. Thanks. Mike Peel (talk) 20:52, 20 August 2021 (UTC)
Can you point to the infobox that has problems? --- Jura 20:54, 20 August 2021 (UTC)
I already did, it's the Commons category linked to the wikidata item. See [8]. Thanks. Mike Peel (talk) 20:57, 20 August 2021 (UTC)
If Commons prefers to use Commons data instead of Wikidata, I think the solution is simple: do that. You don't need to do random deletions on Wikidata. --- Jura 21:00, 20 August 2021 (UTC)
@Jura1: Huh? The problem is loading Wikidata information, the solution might be to store it on Commons instead of here, but the infobox would still be loading the info from Wikidata? Also, what happened to your sig - it seems to include random css that makes pinging you difficult. Thanks. Mike Peel (talk) 21:08, 20 August 2021 (UTC)
If the infobox shouldn't load data from Wikidata, you could set that directly on Commons. --- Jura 21:12, 20 August 2021 (UTC)
@Jura1: If Wikidata can't serve something as simple as an infobox, there's a problem on Wikidata. Thanks. Mike Peel (talk) 21:32, 20 August 2021 (UTC)
Thank you for the ping. I'd be in favor of moving datasets to Commons, but I note that there are users on Wikidata who frown upon data deletion. In the CovidDatahubBot discussion (Wikidata:Requests for permissions/Bot/CovidDatahubBot), there was a clear veto on the deletion of historic data. In my view, we could strive for a community consensus on time series. My proposal would be to have a guideline to keep only the most recent number on Wikidata and store time series as Tabular Data on Commons whenever needed, but I am not familiar with how general guidelines come to be. TiagoLubiana (talk) 20:46, 20 August 2021 (UTC)
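For anyone unfamiliar with the format: a tabular-data page on Commons (mw:Help:Tabular Data) is a CC0 JSON page roughly along the following lines; the two rows below are made up purely to show the shape, with zeros as placeholder values.

 {
     "license": "CC0-1.0",
     "description": { "en": "COVID-19 figures for Hungary (illustrative rows only)" },
     "schema": {
         "fields": [
             { "name": "date",   "type": "string", "title": { "en": "Date" } },
             { "name": "cases",  "type": "number", "title": { "en": "Confirmed cases" } },
             { "name": "deaths", "type": "number", "title": { "en": "Deaths" } }
         ]
     },
     "data": [
         [ "2021-08-01", 0, 0 ],
         [ "2021-08-02", 0, 0 ]
     ]
 }

Such a page can be read cross-wiki by Lua modules, while Wikidata itself would only keep the latest figure and/or milestones.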
  • I acknowledge that 4MB of data is too much for a single item, especially considering that the growth in size will continue with no end point. If anyone says this is creating query problems then we should consider other options. Wikidata is supposed to be general reference data, not comprehensive data, and if the amount of data is causing technological problems then we should take this seriously. Who is harmed by pruning this data? Who is harmed by not pruning it? How serious is this problem at this time? Blue Rasberry (talk) 21:15, 20 August 2021 (UTC)
    • Eventually we need to find a better solution (for cases other than wikis interested in using their own data), but rather than deleting data, I think we should consider making separate items by year. --- Jura 05:45, 21 August 2021 (UTC)
      • Ideally such data should be stored in Commons, but we can also consider creating new items for yearly or monthly data and remove any historical data from the main item.--GZWDer (talk) 05:47, 21 August 2021 (UTC)
@Mike Peel, I'd love to see a solution for this issue, because I'm a bit tired of updating the item, and even of loading it. I'm open to anything, but if possible, please don't throw my edits away (if it's necessary, of course I'm okay with it). Regards, Bencemac (talk) 06:45, 21 August 2021 (UTC)
Numerical datasets like this do not belong to Wikidata and are better placed on Commons. Vojtěch Dostál (talk) 17:20, 21 August 2021 (UTC)
Note that these data are used by multiple templates (20+) on multiple wikis and are usually considered a successful example of WD use (fetching data from WD instead of daily updates of articles). It is a pity that WD is not able to handle this technically (scalability issues).--Jklamo (talk) 13:24, 24 August 2021 (UTC)
@Mike Peel, As part of the COVID-19 task force, I have been involved in adding information to COVID-19-related items since last year. We added information to WD that was later used to display information in article infoboxes in various languages. But now it's not easy to load those items, so I started adding the vaccination information to Commons instead of here. I don't feel we should make separate items for each year, though. I agree that we can add the milestone data here and keep the tabular data on Commons, which can also be reused here as well as on other wiki projects.-❙❚❚❙❙ GnOeee ❚❙❚❙❙ 15:41, 29 August 2021 (UTC)

Data Quality Days: let's talk about data quality on Wikidata

Hello all,

The Data Quality Days, a series of community-powered events on the topic of data quality, will take place online from September 8th to 15th. Together, we hope to start some interesting discussions about data quality and highlight this topic through various angles, to explore what data quality means in different areas of Wikidata, to bring together people who are working on data quality on Wikidata and who want to contribute, and to highlight and create tools that can be useful when working on data quality.

The sessions are taking place online, at any time during the 8 days of the event, and the program is built by and for the Wikidata community. If you have any idea of presentation, workshop or discussion you would like to facilitate during the Data Quality Days, feel free to add it directly into the schedule. If you have any questions or need some help with preparing a session, feel free to ask on the talk page or to reach out to me directly.

Looking forward to talking about data quality with you all! Cheers, Lea Lacroix (WMDE) (talk) 07:45, 25 August 2021 (UTC)

EPA water regulation modeling

In the US, the EPA sets upper limits on the amount of contaminants in tap water. For example, the maximum contaminant level of barium (Q1112) in tap water is 2 mg/L.

Any tips on how to model that? My work in progress (with references) is here: Maximum Contaminant Level of Barium (Q108319193).

And an example of tap water with contaminants is Ashtabula County Tap Water (Q108226511). Justin0x2004 (talk) 15:46, 29 August 2021 (UTC)

@Justin0x2004: Not sure this is the right way to go for the long-term. Wikidata isn't built for complex numerical data. There will be different numbers for hundreds of countries of the world, and the numbers will evolve (so you'll have multiple statements for each country with start time (P580) and end time (P582) qualifiers). Maybe a data table for each element (or for each country) will better serve this purpose, see mw:Help:Tabular Data Vojtěch Dostál (talk) 16:09, 29 August 2021 (UTC)
@Vojtěch Dostál: But if we use a table for water quality data it won't be SPARQL query-able, right? I think my main interest in contributing to Wikidata is because of the availability of the SPARQL endpoint. Justin0x2004 (talk) 19:11, 29 August 2021 (UTC)
@Justin0x2004: That's true at the moment, although there are tentative plans to make it available to the Wikidata Query Service (phab:T181319 cc @Yurik, Smalyshev: from the ticket discussion). In the meantime, I think the tables are accessible with Lua scripts in templates. Vojtěch Dostál (talk) 19:30, 29 August 2021 (UTC)

Q67701136

At 1940 Floyd Bennett Field midair crash (Q67701136), listing the number of deaths and also listing the QIDs of the dead people as participants are flagged as mutually exclusive. Anyone know why? It seems that both should be used. I would think we want to know both, and we may not have QIDs for all the dead people. --RAN (talk) 18:55, 29 August 2021 (UTC)

@Richard Arthur Norton (1958- ): In order to understand the error, you have to take a closer look at the property number of deaths (P1120). As of yesterday, Trade added a new restriction there because new properties were created. Instead of the participant (P710) property, the properties perpetrator (P8031) and victim (P8032) should now be used. In addition, more loosely "participating" persons can be linked with the property significant person (P3342). --Gymnicus (talk) 19:13, 29 August 2021 (UTC)
@Gymnicus: I tried solving the puzzle by looking at other disasters to see how they were modeled, and all I saw were the same error messages. If you look at the changes now made at 1940 Floyd Bennett Field midair crash (Q67701136), while resolving the participant error message, it has now cascaded into five new error messages. Can you take a peek? --RAN (talk) 20:20, 29 August 2021 (UTC)
Is it better now? @Richard Arthur Norton (1958- ):--Trade (talk) 20:54, 29 August 2021 (UTC)

Aandblom (Q149336)

Guys, can somebody help me please? Aandblom must be moved to Hesperantha. Aandblom is actually a disambiguation page which I will fix later on the Afrikaans Wikipedia. The current links of Aandblom must remain. Regards. Oesjaar (talk) 11:37, 29 August 2021 (UTC)

@Oesjaar: I don't understand the request. There is no concept of moving items in Wikidata. Everything uses a unique ID. Do you want the statements moved? The sitelinks moved? The item renamed? Etc. You should be able to do these things yourself. BrokenSegue (talk) 19:52, 29 August 2021 (UTC)
Thanks, I was not sure about moving articles. I will battle this one out. Regards. Oesjaar (talk) 05:14, 30 August 2021 (UTC)

Using wikidata as backing database for semi-independent project

Hi there,

I'm an academic scientist who, in short, is interested in developing a database that collates, stores, and makes queryable the specifications of scientific instruments. As a short example, instrument foo might have a mass of 10 kg (https://www.wikidata.org/wiki/Q11423) and a capability of spectrophotometry (https://www.wikidata.org/wiki/Q332084). This is a simple case, and I imagine I'd have to propose quite a lot of new Wikidata properties to make the database ontology work for all the things I'd like to capture in the database.

I'd like this database to be collaboratively edited and administered without strong institutional ties (à la Wikipedia), and machine-readable and queryable via the semantic web (à la Wikidata). A semantic graph database seems well suited to this, and Wikidata is, to my knowledge, probably the most user-friendly version of this concept. At least, it is the software I am most familiar with for this type of database.

My question: Is it allowable to use Wikidata as the backing database for such a project? In other words, I'd like to make an independent GUI / website / branding for the database (I need citations to professionally advance, after all), and a GUI interface for doing useful "views" of the data (a thin wrapper around SPARQL queries, e.g. let's say you wanted to get a list of all the instruments that can do spectrophotometry), but the user authentication / editing / data storage / ontology maintenance could be done on Wikidata. From a copyright / commercial perspective, this would always be a freely (as in beer) available academic project, with open-culture copyright licensing - excepting some things like photos / videos that are being used under fair use and would presumably have to live on a server I'd host, or very large files like CAD files that aren't well suited to Wikidata or Wikimedia Commons. I'm clearly just at the early planning stages, so any feedback would be appreciated.

--Photocyte (talk) 17:03, 29 August 2021 (UTC)
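To make the "thin wrapper around SPARQL queries" idea concrete, here is a minimal sketch of such a view; note that the property choices below (has use (P366) for the capability, mass (P2067)) are illustrative assumptions, not an agreed modelling.

 // Sketch: list items that (hypothetically) declare spectrophotometry (Q332084)
 // as a use (P366), together with their mass (P2067) where present.
 const endpoint = 'https://query.wikidata.org/sparql';
 const query = `
   SELECT ?instrument ?instrumentLabel ?mass WHERE {
     ?instrument wdt:P366 wd:Q332084 .
     OPTIONAL { ?instrument wdt:P2067 ?mass . }
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
   }`;
 fetch(endpoint + '?query=' + encodeURIComponent(query), {
     headers: { Accept: 'application/sparql-results+json' }
 })
     .then(function (r) { return r.json(); })
     .then(function (json) { console.log(json.results.bindings); });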

@Photocyte: The main limitation here is that Wikidata is not intended for original research. I always point to FactGrid (Q90405608) (https://database.factgrid.de/wiki/Main_Page) as a wikibase for scientists, although that's probably mostly for historians.Vojtěch Dostál (talk) 19:33, 29 August 2021 (UTC)
I think there are two questions here. Can you use Wikidata as a backing datastore for your website? Can you store the data you are working with on Wikidata? If you used a dump of Wikidata instead of hitting Wikidata live you would definitely avoid the first question, but you may not want to do that. I don't know what level of load your website would impose on Wikidata, but presumably at some low enough load it'd be fine. Maybe caching queries on your end would limit the load on Wikidata sufficiently. The answer to the second question depends on the exact nature of the data you are uploading. What you describe is not obviously a bad fit, but more information would be needed to be sure. BrokenSegue (talk) 19:50, 29 August 2021 (UTC)
Wikidata is what's hosted on this website. The underlying software is called Wikibase. Scholia would be an independent GUI that uses Wikidata as its data source.
It's both possible to have your project interact with Wikidata the way Scholia does and it's also possible to have your own Wikibase installation.
Listing scientific instruments is completely in line with Wikidata's mission. I don't see any original research concerns here, given that there's likely citable documentation for the scientific instruments.
When it comes to integrating pictures/videos from other websites besides Commons into Wikidata, there's a limit to what our community agreed to be possible, and we decided against having properties that freely integrate pictures from elsewhere into Wikidata. Generally, it's also questionable whether hosting pictures/videos for the purpose of them being integrated on another website is permissible under fair use. ChristianKl 09:44, 30 August 2021 (UTC)

Church building azimuth

Hello, I'd like your opinion on modelling the azimuth (angular direction) of church buildings. They often point east, but there are nuances. I have a source at my disposal which measured the azimuth of various church buildings in degrees (0-360). This is specified as the angle of the axis between the main entrance and the presbytery of the church building. More generally, it is probably defined by the axis between the main entrance and the opposite side of the building. What would be a good way to model this in Wikidata? Do you think heading (P7787) could be redefined to fit this purpose? Or should there be a new property for "building orientation azimuth"? Vojtěch Dostál (talk) 17:14, 21 August 2021 (UTC)

Yes, IMO, heading (P7787) should be a general purpose directional property. For your use, I'd be inclined to qualify the statement with object has role (P3831) taking a value for an azimuth item ... whether one of the existing ones, or a new item describing the situation for these church measurements. --Tagishsimon (talk) 17:27, 21 August 2021 (UTC)
@Tagishsimon Or maybe object has role (P3831) : orientation of churches (Q351064) ? Vojtěch Dostál (talk) 18:08, 21 August 2021 (UTC)
Excellent! --Tagishsimon (talk) 18:13, 21 August 2021 (UTC)
A thing on heading (P7787) to support this is Property talk:P7787#P1629 - the Wikidata item of this property (P1629) of the property is heading (Q4384217) which points to the navigational heading concept. A second thing on heading (P7787) to support this is in the proposal discussion, where, as I parse it, it got at least one support vote for the idea that it could be applied more widely than the photographic exemplar. Turning the argument on its head, I cannot see a good reason why a property entitled heading should be reserved to a use describing camera orientation, nor any harm done to the current use set if its scope is taken to be general. Why would we want, for instance, to have a second 'heading' property for say, buildings, and a third for cars & places & ships? Makes no sense to me. --Tagishsimon (talk) 12:12, 22 August 2021 (UTC)

OK, if there are no further objections, I will widen the scope of the property (also supported by Multichill at Property talk:P7787) Vojtěch Dostál (talk) 10:23, 24 August 2021 (UTC)

How would we avoid the problems mentioned in Wikidata:Property proposal/orientation? What's the downside of sticking with that property for buildings? --- Jura 11:56, 25 August 2021 (UTC)
If you mean problems with the definition of the orientation of buildings, I used object has role (P3831) : orientation of churches (Q351064), which is clearly defined in both the Czech and English Wikipedia articles. Vojtěch Dostál (talk) 06:44, 26 August 2021 (UTC)
@Jura1: You called for discussion, so please discuss. Vojtěch Dostál (talk) 16:03, 29 August 2021 (UTC)
I think the finding from the discussion with @Swpb, Thierry_Caro, Tinker_Bell, Arbnos: Wikidata:Property proposal/orientation was that a label "orientation" by itself is potentially ambiguous, something that at least orientation of churches (Q351064) in its English version doesn't exclude with certainty. A way around it could be to create a quantity-datatype property for orientation, or to use "heading" as a qualifier to another property, maybe to orientation (P7469) with "somevalue" as value. The first suggestion might be easier to work with. --- Jura 11:29, 30 August 2021 (UTC)
Counter-proposal: heading (P7787) should be a general purpose directional property, not a property reserved for the direction of cameras. Making it general purpose does not impinge on its use for camera direction. Making additional other 'heading' properties is needless & unhelpful speciation. --Tagishsimon (talk) 11:39, 30 August 2021 (UTC)
I think you are repeating yourself, but didn't address the point raised in the "orientation" discussion beyond stating that it's "needless & unhelpful". --- Jura 11:50, 30 August 2021 (UTC)
@Jura1: Can you explain your claim that "a label "orientation" by itself is potentially ambiguous, something that at least orientation of churches (Q351064) in its English version doesn't exclude with certainty" ? I've read through the entire article and cannot find anything that would relativize the definition in the introduction of the article. Maybe you are reading between the lines. To me, church orientation is well-defined and its use in object has role (P3831) makes it easy for everyone to understand what the definition is. Vojtěch Dostál (talk) 12:29, 30 August 2021 (UTC)
I referred to the property proposal discussion. --- Jura 12:31, 30 August 2021 (UTC)
@Jura1: The property proposal discussion at Wikidata:Property proposal/orientation only stresses the need for a definition of what is meant by "building orientation". I've already specified that definition in the statements by linking to orientation of churches (Q351064), which is a terminus technicus. Vojtěch Dostál (talk) 13:13, 30 August 2021 (UTC)

Referencing data on a webpage that mentions licensing its data

The item I referenced the data on is Ashtabula County Tap Water (Q108226511) and the data comes from this webpage https://www.ewg.org/tapwater/system.php?pws=OH0400711 which says "EWG’s Tap Water Database is provided solely for your personal, non-commercial use. You may not copy, reproduce, republish or distribute information from EWG’s Tap Water Database without EWG’s prior written permission. For information about licensing EWG data and analyses, contact TWDrequests[at]ewg.org."

Each statement where I reference that data has a reference URL (P854) to the source webpage and an account is not required to view the webpage. Can EWG prevent Wikidata from referencing their webpage content? More specifically, can I continue to reference EWG webpage data? Justin0x2004 (talk) 01:32, 30 August 2021 (UTC)

Check out m:Wikilegal/Database_Rights. Copyright does not protect information or ideas per se, but there might be database rights protection.--Jklamo (talk) 10:17, 30 August 2021 (UTC)

Wikidata weekly summary #483

Any reason not to link non-free images appropriately saved at two wikipedias?

French and English Wikipedias each have their own copy of this image which is the logo of La Manif pour tous: (@ en-wiki, @ fr-wiki). This is a non-free image which is available via fair use at both wikis, but the license doesn't permit it to be stored at Commons. Is there any reason not to link these two images via a data item, or perhaps better phrased, should they be linked? (ping me; thanks!) Mathglot (talk) 03:00, 31 August 2021 (UTC)

Commons is only for free files I guess. There's a brand new wiki started by WikiProject Med - https://nccommons.org/wiki/Main_Page - maybe it will be an answer to your needs.Vojtěch Dostál (talk) 05:55, 31 August 2021 (UTC)
Or I wonder if it could be moved to Commons; I assumed not, but I should check first. Mathglot (talk) 14:08, 31 August 2021 (UTC)

dysfunctional property

Chamber of Deputies of Romania person ID (P9524)

Created four months ago, still not working. - Coagulans (talk) 05:27, 31 August 2021 (UTC)

Heh, I wanted to tag the proposer of the property but it's you :). What is wrong with the property - what is not working? I see that the links are dead, which is usually a problem of the website, not of Wikidata.Vojtěch Dostál (talk) 05:37, 31 August 2021 (UTC)
And how to fix it? The characters "&" and "=" are not properly read. - Coagulans (talk) 05:51, 31 August 2021 (UTC)
They are working in Property talk:P9524 - Coagulans (talk) 05:58, 31 August 2021 (UTC)
Yes, they are not working, because all external identifiers must be url-encoded before substitution. I fixed it by replacing formatter with https://wikidata-externalid-url.toolforge.org/ , but the fix will work after ~24h, because it takes some time to refresh cache on Wikibase side. --Lockal (talk) 09:52, 31 August 2021 (UTC)
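To illustrate what "url-encoded before substitution" means in practice (the identifier value below is made up):

 // An identifier such as "idm=123&cam=2&leg=2020" contains "&" and "=", so pasting
 // it raw into a $1 formatter URL splits it into extra query parameters.
 // Percent-encoding first keeps it as a single value:
 var id = 'idm=123&cam=2&leg=2020';       // made-up example value
 console.log(encodeURIComponent(id));     // "idm%3D123%26cam%3D2%26leg%3D2020"

As I understand it, the toolforge redirect mentioned above performs this encoding step before forwarding to the target site.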

Sondre Ørland and Stavanix

@Oiszevrvr: Q82444339 please figure out how Sondre Ørland and Stavanix are related. I thought Sondre Ørland took over this item but it may be the other way around. The items for these two (?) persons were merged and they may be the same person. --Pyfisch (talk) 11:17, 28 August 2021 (UTC)

@Pyfisch: They are the same person, Stavanix was the "musical stage name" of Sondre Ørland so they should be merged. Please revert all changes you made to this item. --Oiszevrvr (talk) 12:59, 31 August 2021 (UTC)
@Oiszevrvr: Thanks for making clear that they are indeed the same person. I don't understand however why you removed all information related to the musician Sondre Ørland? Please have a look at Wikidata:BLP, if you are Sondre Ørland yourself you can request the removal of information. An item only about Sondre Ørland the web developer and IT consultant may not be relevant to Wikidata (see also Wikidata:N). --Pyfisch (talk) 15:07, 31 August 2021 (UTC)