Wikidata:Report a technical problem/WDQS and Search/Archive/2022

This is a subindex of archived discussions from Wikidata:Report a technical problem/WDQS and Search. To place a page in Category:WDQS and Search archive, place {{Archive|category=WDQS and Search archive}} at the top of the page.

June



To work on items without any claims and possible other queries, it would be helpful if one could directly query to exclude that a sitelink is a redirect.

This would be similar to the automatic deletion of any sitelink (when it's deleted on a client wiki) or the update (when it's being moved on a client wiki).

An update on a client wiki could add or remove this with an automatic badge.

Scope of the question: there are currently 495,000 redirects in sitelinks, even though a feature allowing them hasn't been implemented.

This is independent of any other development about redirects (should they be possible, how to implement them, what (manual) badges to use to qualify them, etc).

@MisterSynergy, Epìdosis, DCausse (WMF): --- Jura 18:46, 11 December 2021 (UTC)[reply]

It would IMO be sufficient to make the manual redirect badges sitelink to redirect (Q70893996) and intentional sitelink to redirect (Q70894304) actually usable without any hacks. I would be able to create a bot that syncs these with the existing ~0.5M redirects in no time, and that keeps it synced on a daily basis or so. Unfortunately, we still cannot save redirect sitelinks without a temporary deactivation of the redirect on the client wiki (which some client wiki communities consider as vandalism if done systematically). —MisterSynergy (talk) 19:00, 11 December 2021 (UTC)[reply]
I surely agree with MisterSynergy, having the possibility to add redirect-sitelinks with no further problem would be very very useful to work on these items. --Epìdosis 19:07, 11 December 2021 (UTC)[reply]
I don't see an advantage of having to check several badges and having to wait for a bot to resync them.
Wikidata's key assumption that there is a page on a client wiki at the sitelink is currently broken. --- Jura 19:09, 11 December 2021 (UTC)[reply]
@DCausse (WMF), Lydia Pintscher (WMDE): which way do you prefer? Neither seems to require big developments and both approaches would lead to an improvement in data quality. --- Jura 13:12, 4 January 2022 (UTC)[reply]
From what I understand what you suggest does not change the mw:Wikibase/Indexing/RDF_Dump_Format so this is already a good thing. For the rest, about how changes are propagated between client wikis and wikidata I'm not knowledgeable enough to judge if it's easily feasible/fixable or not. I think that the scope of the problem you raise here is broader than just WDQS here and probably deserves being discussed on a page not dedicated solely to WDQS. DCausse (WMF) (talk) 09:28, 5 January 2022 (UTC)[reply]
I asked Lydia for input as well.
The relevant part of mw:Wikibase/Indexing/RDF_Dump_Format#Sitelinks is that wikibase:badge will be present for these. The idea is to find a pragmatic solution that doesn't require a lot of development. --- Jura 10:12, 5 January 2022 (UTC)[reply]
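For reference, if the manual redirect badges were populated, redirect sitelinks could be picked out through wikibase:badge. This is a minimal sketch, assuming that sitelink to redirect (Q70893996) and intentional sitelink to redirect (Q70894304) are actually set on the sitelinks, which per the discussion above is not yet reliably the case. The 100-row limit is arbitrary.

```sparql
# Sketch only: lists sitelinks carrying one of the two redirect badges.
# Assumes the badges have been set; results are empty or partial otherwise.
SELECT ?item ?sitelink ?badge WHERE {
  VALUES ?badge { wd:Q70893996 wd:Q70894304 }
  ?sitelink schema:about ?item ;
            wikibase:badge ?badge .
}
LIMIT 100
```

The same pattern inside FILTER NOT EXISTS would exclude such sitelinks from another query.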

delete triples from WDQS: ?a rdf:type schema:Article

SELECT (COUNT(*) as ?count) 
{
  [] rdf:type schema:Article
}

The above counts some 80,000,000 triples. These are part of the sitelink mapping on items. Sample:


SELECT * { <https://en.wikipedia.org/wiki/Wikidata> ?a ?b }

I don't think they serve any real purpose. I suggest to not export them to WDQS going forward/delete them. Obviously, after announcing the change. --- Jura 20:06, 22 December 2021 (UTC)[reply]

I also see little value to these particular triples but changing the RDF dump format is something that takes time & effort (communication/tests/migration). On the other hand this seems a very valid candidate for deletion if we ever enact the Wikidata:SPARQL_query_service/Blazegraph_failure_playbook, please feel free to forward your suggestion to its talk page, thanks! DCausse (WMF) (talk) 10:08, 3 January 2022 (UTC)[reply]
I think it's a cleanup that can be done even without Blazegraph about to fail. --- Jura 00:37, 4 January 2022 (UTC)[reply]

index "street address" (P6375) strings


It would be interesting to be able to search for street address (P6375)-values, e.g. Special:Search/Getreidegasse Salzburg should find Q37970995. --- Jura 19:11, 28 December 2021 (UTC)[reply]

Thanks for the feedback, I agree that the current status quo about textual data in the search index (in general) is not ideal. We created phab:T240334 a while ago but given the lack of hardware resources we did not investigate further. But thanks to phab:T265621 (new dedicated search cluster for wikidata & commons) we might reconsider this and index more textual content. DCausse (WMF) (talk) 09:53, 3 January 2022 (UTC)[reply]
What would be the relative impact? We do seem to have a large number of statements with P6375, but for buildings (sample Q37970995 above), much would otherwise need to be added to the description and/or as alias. --- Jura 00:35, 4 January 2022 (UTC)[reply]
Impact is hard to evaluate before-hand. The two main things we will look at is space requirement (this is where having a dedicated search cluster will help) and then how it affects precision as by improving recall we will certainly degrade precision a bit and we should make sure it's not degraded too much. DCausse (WMF) (talk) 09:07, 5 January 2022 (UTC)[reply]
I suppose, in the meantime, we would have to copy the building addresses to aliases. --- Jura 10:14, 5 January 2022 (UTC)[reply]
@DCausse (WMF) where can I find the list of string properties that are currently indexed? --- Jura 11:59, 5 January 2022 (UTC)[reply]
The properties indexed (and searchable via haswbstatement) are the ones with the following data types:
  • string
  • external-id
  • url
  • wikibase-item
  • wikibase-property
  • wikibase-lexeme
  • wikibase-form
  • wikibase-sense
Minus these properties (that are explicitly excluded):
The intent here was to index only properties that do not contain natural language (only codes, IDs and the likes). It is very likely that switching from P969 to P6375 made this particular property unsearchable.
The purpose of phab:T240334 is specifically to make these other datatypes searchable. DCausse (WMF) (talk) 12:36, 6 January 2022 (UTC)[reply]
It may be that addresses were indexed when we used Property:P969 (string datatype). This was however lost when the content was moved to Property:P6375 (monolingual text datatype). --- Jura 12:48, 5 January 2022 (UTC)[reply]

SPARQL queries with MWAPI EntitySearch do not use the continue mechanism


The SPARQL API does multiple queries to the MWAPI using the continue mechanism by default. However, that is not the case with the MWAPI EntitySearch.

For instance, this query which uses the MWAPI EntitySearch always returns at most 50 results (depending on the mwapi:search parameter):

SELECT ?item ?itemLabel WHERE {
SERVICE wikibase:mwapi {
  bd:serviceParam wikibase:endpoint "www.wikidata.org";
  wikibase:api "EntitySearch";
  mwapi:search "York"; 
  mwapi:language "en".
  ?item wikibase:apiOutputItem mwapi:item.
}  
SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}

A query to MWAPI EntitySearch returns a maximum of 50 results, but action=wbsearchentities supports continue, so when querying it outside of a SPARQL query, one can obtain n times 50 results.

However, there seems to be no automatic continuation for the MWAPI EntitySearch inside of a Wikidata SPARQL query, judging from the behavior.

Is that a bug or is that intended behavior? In case it's a bug: Can it be fixed?

Breslibomy (talk) 12:46, 5 January 2022 (UTC)[reply]

Indeed, this is unfortunately a known limitation of the current implementation, please see phab:T229291. DCausse (WMF) (talk) 13:04, 6 January 2022 (UTC)[reply]

A question about the background map


When I run a query with #defaultView:Map I get a background map with a lot of rendering errors in it. That name labels are partly missing or hidden I can live with. But when lakes like Mälaren (Q184492), Vänern (Q173596) and Vättern (Q188195) are rendered as land and not water it gets problematic. Especially when I work with lighthouses and need to see the shoreline. Remarkably Hjälmaren (Q211425) and smaller lakes show up just fine. All lakes are rendered just fine on openstreetmap.org, so there should not be any modeling error. I also note that Lake Huron (Q1383) and Lake Erie (Q5492) are also rendered correctly, so the problem is not related to size. Is there perhaps a limit on how many nodes a polygon can contain to render correctly? /ℇsquilo 12:31, 11 February 2022 (UTC)[reply]

This query shows the problem for the 3 Swedish lakes that are not shown as water, and Hjälmaren (Q211425), which is OK:
#defaultView:Map
SELECT ?lake ?coordinates
WHERE
{
  VALUES ?lake { wd:Q184492 wd:Q173596 wd:Q188195 wd:Q211425 }
  ?lake wdt:P625 ?coordinates
}
Compare with https://www.openstreetmap.org/#map=7/59.590/15.276 --Dipsacus fullonum (talk) 10:13, 13 February 2022 (UTC)[reply]
Tickets have been raised for a number of map issues, such as T288897 Wikimedia map tiles don't show some natural features (e.g. lakes) after zoom 10, T240755 Victoria lake is missing in our maps which points to current action on the issue at T218097 OSM DB degradation during sync as a result of missing features, as well as my own modest and mostly unheeded T289101 Bring WMF map tile feature sets into line with OSM default feature sets. 11 January 2022 seems to be the most recent WMF activity on a couple of those threads indicating there has been (and we hope, still is) active work ongoing. --Tagishsimon (talk) 11:13, 13 February 2022 (UTC)[reply]


Documentation for wikibase:identifiers


A new predicate, wikibase:identifiers, was created in 2017 (according to phab:T144476), but it was never documented in mw:Wikibase/Indexing/RDF Dump Format. I have now added a description with the edit mw:Special:Diff/5073859. Please check if my description is correct. --Dipsacus fullonum (talk) 09:09, 17 February 2022 (UTC)[reply]

Wikidata Query Service erroneously formats/fills partial dates into full dates


When querying some places with population having partial dates of point in time, the results display an erroneous full date instead of the exact partial date. For example, Ruinen (Q1007156) has a population of 1,624 with point in time 1830 (census). But when running the query, it displays 1 January 1830 instead of just 1830. How can this be fixed?:

SELECT ?place ?placeLabel ?populationLabel ?populationDate WHERE {
  ?place wdt:P131 wd:Q835108;
    p:P1082 ?place_statement.
  ?place_statement ps:P1082 ?population;
    pq:P585 ?populationDate.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?place) (?populationDate)

Sanglahi86 (talk) 16:19, 23 February 2022 (UTC)[reply]

@Sanglahi86: Mainly by interrogating WDQS for the date precision associated with the date - iirc 9 is year, 10 is month and 11 is day, and then utilising that to do something like BIND(YEAR(?date) as ?year)
SELECT ?place ?placeLabel ?populationLabel ?populationDate ?precision ?date WHERE {
  ?place wdt:P131 wd:Q835108;
    p:P1082 ?place_statement.
  ?place_statement ps:P1082 ?population;
    pqv:P585 [
                wikibase:timePrecision ?precision;
                wikibase:timeValue ?populationDate ].
  BIND(IF(?precision=9,YEAR(?populationDate),"") as ?date)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?place) (?populationDate)
--Tagishsimon (talk) 16:47, 23 February 2022 (UTC)[reply]
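The same precision check can be extended to months and days. This is a sketch, assuming the precision codes mentioned above (9 = year, 10 = month, 11 = day) and four-digit CE years. ?displayDate is a name introduced here for illustration, and other precisions fall back to the raw timestamp.

```sparql
SELECT ?place ?placeLabel ?population ?displayDate WHERE {
  ?place wdt:P131 wd:Q835108;
    p:P1082 ?place_statement.
  ?place_statement ps:P1082 ?population;
    pqv:P585 [
      wikibase:timePrecision ?precision;
      wikibase:timeValue ?populationDate ].
  # Cut the ISO timestamp (YYYY-MM-DD...) down to the stated precision.
  BIND(
    IF(?precision = 9,  SUBSTR(STR(?populationDate), 1, 4),   # year
    IF(?precision = 10, SUBSTR(STR(?populationDate), 1, 7),   # year-month
    IF(?precision = 11, SUBSTR(STR(?populationDate), 1, 10),  # full date
       STR(?populationDate)))) AS ?displayDate)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?place ?populationDate
```

Returning a string rather than a number keeps all three precisions in one column, at the cost of losing date arithmetic on ?displayDate.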

Query Service has occasional issues (20:04, 6 March 2022 (UTC))


I keep getting servers that lag, but Grafana doesn't show it. Can you disconnect the borked server? @DCausse (WMF):

@DCausse (WMF): --- Jura 20:04, 6 March 2022 (UTC)[reply]

Grafana showed it. https://twitter.com/Tagishsimon/status/1500534060095000580 ... wdqs1006 is down to ~17 hours right now, from greater than 24 hours, in the last four hours. Should sort itself out in the next 8 hours. wdqs1007 is the other problem child, at ~10 hours. --Tagishsimon (talk) 20:10, 6 March 2022 (UTC)[reply]
Seems I looked at the wrong chart (or the right chart the wrong way). Anyways, it's (now) visible on the one linked on the top of this page. --- Jura 20:32, 6 March 2022 (UTC)[reply]
Should be healthy again. https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts?orgId=1&from=now-6h&to=now&refresh=1m is also a useful view. Sjoerd de Bruin (talk) 21:29, 6 March 2022 (UTC)[reply]
Too many WDQS machines went down during the weekend, causing an outage. Having some machines killed by heavy queries/load is something we expect, but sadly these failures sometimes go unnoticed for too long and expose these lag issues to users when the machines get back online. Our current plan to mitigate this is to avoid having machines stay down for too long, so that they no longer expose such high lag. DCausse (WMF) (talk) 12:06, 8 March 2022 (UTC)[reply]

wikibase:isSomeValue SPARQL function no longer working?


Is it just me or is the wikibase:isSomeValue SPARQL function no longer working recently? If I run the query below, I get records where the ?coord is the .well-known IRI when I specifically wanted to filter those records out. —seav (talk) 01:07, 8 March 2022 (UTC)[reply]

SELECT ?marker ?coord WHERE {
  ?marker wdt:P31 wd:Q21562164 ;
          p:P625 ?coordStatement .
  ?coordStatement ps:P625 ?coord .
  FILTER NOT EXISTS { ?coordStatement pq:P582 ?endTime }
  FILTER (!wikibase:isSomeValue(?coord)) .
}
Thanks for the report, it does seem that the server configuration was mistakenly changed recently. This should be resolved soon (please see attached Phabricator ticket). DCausse (WMF) (talk) 11:48, 8 March 2022 (UTC)[reply]
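Until the function is restored, a string test on the IRI is one possible stopgap. This sketch assumes that unknown values are exported as IRIs under http://www.wikidata.org/.well-known/genid/, which matches the .well-known IRI mentioned in the report but should be checked against actual results.

```sparql
SELECT ?marker ?coord WHERE {
  ?marker wdt:P31 wd:Q21562164 ;
          p:P625 ?coordStatement .
  ?coordStatement ps:P625 ?coord .
  FILTER NOT EXISTS { ?coordStatement pq:P582 ?endTime }
  # Assumed prefix for "unknown value" nodes; drop rows whose value starts with it.
  FILTER (!STRSTARTS(STR(?coord), "http://www.wikidata.org/.well-known/genid/"))
}
```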

Label service seems to have problems with 'values' statements.


The testcase results in 36 results instead of the expected 10.

SELECT * WHERE {
  VALUES ?val { 1 1 1 1 2  2 9 9 9 9 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}

Don't know if it is a bug, but it looks kinda bug-like, ran under the sofa when I tried to whack it with a newspaper. Infrastruktur (talk) 06:11, 8 May 2022 (UTC)[reply]

Another testcase. Expected rows 2, returned 4. Infrastruktur (talk) 05:40, 9 May 2022 (UTC)[reply]

SELECT * WHERE {
  VALUES ?item { wd:Q42 wd:Q42 }
  ?item wdt:P569 ?dob.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Indeed having duplicated entries in the VALUES block seems to confuse the label service, please feel free to file a bug report at https://phabricator.wikimedia.org/. In the meantime I suppose that the workaround is to make sure that no duplicated values are present in the list of values. DCausse (WMF) (talk) 07:45, 30 June 2022 (UTC)[reply]
@Infrastruktur, DCausse (WMF): I have filed a ticket in Phabricator, as this seemed not to have been done before now. --Dipsacus fullonum (talk) 07:29, 16 September 2022 (UTC)[reply]
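One way to follow the suggested workaround without editing the list by hand is to deduplicate it in a subquery. This is a sketch, not a confirmed fix: it assumes the label service behaves normally once each value reaches it only once, and it returns one row per distinct item rather than one per listed value.

```sparql
SELECT * WHERE {
  # Collapse repeated entries before they reach the label service.
  { SELECT DISTINCT ?item WHERE { VALUES ?item { wd:Q42 wd:Q42 } } }
  ?item wdt:P569 ?dob.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```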

Updater issue


User:Bouzinac found what appears to be an issue in the new updater or something closely related. https://www.wikidata.org/wiki/Wikidata:Request_a_query#Curious_results_of_a_MINUS Infrastruktur (talk) 20:39, 2 June 2022 (UTC)[reply]

Updater issue ?


This query tracks differences between articles categorized in the « good article » category on frwiki and articles with the good article badge (Q17437798) on Wikidata. Curiously, it now finds two articles which are neither categorized nor have or have had any badges according to their respective histories:

The following query

select ?item ?link ?badge {
   values ?item {
     wd:Q111030044 
     wd:Q17061596
   }
  ?link schema:about ?item ;
        wikibase:badge ?badge .
}

confirms they currently have a badge.

For reference, the recorded page with the data in case the badges are later mysteriously autoremoved.

author  TomT0m / talk page 17:29, 14 June 2022 (UTC)[reply]

Hum, it seems this could be a redirection handling issue.

At some point, 27 February it seems, the page fr:Gayant was a redirect to fr:Gayant_(folklore), which is a good article on frwiki. It was later (June 3) added by the bot to the tracking page; the previous update had been on February 12, so the redirect change occurred in between.

Did not verify, but Catherine Schwaub also had « not to confuse with » strings added, so maybe a mixup with another item also occurred at some point? author  TomT0m / talk page 17:40, 14 June 2022 (UTC)[reply]

A quick test adding and removing a good article badge on the wikidata sandbox resulted in the badge being removed from the graph as well. The updater is probably not to blame in this case. Infrastruktur (talk) 22:53, 14 June 2022 (UTC)[reply]
As far as I know, nothing but the updater could possibly add the badge to the SPARQL export. This case nevertheless seems way more complex than your quick test, as the sitelinks changed in between. There is no trace in the article history of somebody adding a badge on a sitelink. Maybe it's never the case?
What seems to have happened in the « Gayant » case is that the article was in the past at the page fr:Gayant_(folklore), was renamed and so transformed into a redirect and unlinked from the original item, then the redirect was moved to a disambig article.
Problem: it seems the GA badge for the page stayed in the RDF for the redirect page while the page was renamed. It should have been removed.
The original page is now linked to Gayant (Q3099646) and we can see the GA badge for it, added by Dexbot years ago. author  TomT0m / talk page 18:53, 15 June 2022 (UTC)[reply]


Listeria List for archive


I’ll try a workaround to remove one, so before that I am putting a Wikidata list here for the bot to store the results.
{{Wikidata list|sparql=
select ?item ?link ?badge {
  values ?item { wd:Q111030044 wd:Q17061596 }
  ?link schema:about ?item ;
        wikibase:badge ?badge .
}
|columns=?item,?link,?badge
}}
The workaround worked, fr:Gayant has now disappeared. I added a badge for the page in its item, and later removed it.

Large list of valid SPARQL query and results


Is there a way to see a history of SPARQL Queries? I want to work on an English to SPARQL translation model. It would be very helpful to retrieve valid queries and whether or not it was the last query run in a series. For example, it would be perfect if I could obtain an anonymous list of query help requests and the subsequently built queries. SPARQL has the potential to greatly help in completing AGI, but I believe we need a natural language to SPARQL pipeline in order to make SPARQL more accessible -- especially for intelligent agents. I've been building free and open-source projects in AI for the past few years. https://github.com/MikeyBeez MikeBee2020 (talk) 02:04, 26 June 2022 (UTC)[reply]

You might be interested in having a look at what askplatypus has achieved in this domain. For query help you can have a look at Wikidata:Request_a_query, this is where users request assistance on SPARQL (archives are available). WDQS query logs have been analyzed and a sample has been made public here if you think this could be helpful. DCausse (WMF) (talk) 12:18, 22 September 2022 (UTC)[reply]

United States (Q30) does not have "English" as official language


Using wdt:P37, you only get "American English": https://w.wiki/5t9Q .

Using p:P37/ps:P37, you get "Hawaiian", "Spanish", "American English", "English": https://w.wiki/5t9U .

Telling from https://w.wiki/5t9s , I think the reason is that only "American English" has wikibase:BestRank and wikibase:PreferredRank .

Shouldn't "English" have wikibase:BestRank and/or wikibase:PreferredRank, too? (I am not sure which of the two is relevant for inclusion into wdt:P37).

Note that both "English" and "American English" have the role (P3831) "de facto". Hannah Bast (talk) 11:12, 29 October 2022 (UTC)[reply]
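The ranks can be inspected directly. This sketch relies on the general rule that wdt: only carries statements of the best available rank (preferred statements if any exist, otherwise normal ones), so wikibase:BestRank is the marker that decides inclusion. ?isBest is a name introduced here for illustration.

```sparql
SELECT ?language ?languageLabel ?rank ?isBest WHERE {
  wd:Q30 p:P37 ?statement .
  ?statement ps:P37 ?language ;
             wikibase:rank ?rank .
  # ?isBest is bound only for statements that also appear under wdt:P37.
  OPTIONAL { ?statement a wikibase:BestRank . BIND(true AS ?isBest) }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Under that rule, a normal-rank "English" statement is left out of wdt:P37 for as long as another P37 statement on the item has preferred rank.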

Fewer results from wdqs20* than wdqs10*


Running a simple query for all historic Dutch Senators consistently returns different results from all wdqs10* servers, than from all wdqs20* servers. I've been noticing odd results from this for many months, but it was remarkably difficult to track down, because it turns out that any queries I run locally only ever go to wdqs10*; the only way I have been able to replicate the behaviour is to run the query via Github Actions, which hits both sets of servers.

https://github.com/tmtmtmtm/ghatest2 shows this in action: the query in holders.js (including a run-time-specific comment to bypass caching) produces the outputs in results/ (named after the server handling the request and the date/time)

The results are consistent across the servers in each cluster: today each wdqs10* server returns 1361 rows, whereas each wdqs20* only returns 1357. These differences are actually made up of 8 people not being returned at all from wdqs20* (e.g. Q2053506 and Q2770742), 4 people who get returned twice from wdqs20* but only once from wdqs10* (e.g. Q2257961 and Q3469514), and one person who gets the same P39 data returned from each set of servers, but with a different schema:dateModified on each: (Q1845502, last modified on 2022-05-05 in the wdqs10* results, but on 2022-04-15 in the wdqs20* results).

I have seen similar behaviour from people with a position held (P39) of member of the German Bundestag (Q1939555), though not from any of the several thousand other political positions I track. --Oravrattas (talk) 09:50, 4 November 2022 (UTC)[reply]

@DCausse (WMF), Lydia Pintscher (WMDE): is there any other info I can give that would help with investigating this? --Oravrattas (talk) 10:44, 10 November 2022 (UTC)[reply]
@Oravrattas: Thanks for the report! I filed phab:T322869 to investigate this issue. DCausse (WMF) (talk) 17:33, 10 November 2022 (UTC)[reply]
@DCausse (WMF): Great, thanks. Let me know if there's anything more I can add. --Oravrattas (talk) 20:57, 10 November 2022 (UTC)[reply]
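For a single item, the state held by whichever server answers can be read off directly. This is a sketch using Q1845502 from the report. It relies on schema:dateModified and schema:version being exported for items, and the leading comment is only there to vary the query text, as in the linked repository.

```sparql
# Change this comment between runs to bypass caching.
SELECT ?modified ?revision WHERE {
  wd:Q1845502 schema:dateModified ?modified ;
              schema:version ?revision .
}
```

Comparing ?revision with the item's page history shows how far behind a given server's copy is.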

Endrick's Google Knowledge Graph ID doesn't work


Hi.

The ID of Endrick Felipe https://www.wikidata.org/wiki/Q110644260, a Brazilian footballer, doesn't work. I do not know if the code is wrong. Could you please fix it?

Thank you! Acascon (talk) 12:24, 17 December 2022 (UTC)[reply]

@Acascon Seems to work fine for me ([1]) - possibly a temporary glitch? Andrew Gray (talk) 16:44, 17 December 2022 (UTC)[reply]
Perhaps... I do not know, really... I tried to find the ID through the URL but there wasn't any code in Wikidata. I added it myself two days ago or so. But I don't know if this code /g/11rnhvy_25 is the right one.
Here is the URL where you can find the ID: https://www.google.com/search?q=%2Fg%2F11rnhvy_25&rlz=1C1GCEU_esES1027ES1027&oq=%2Fg%2F11rnhvy_25&aqs=chrome.0.69i59j69i58.439j0j7&sourceid=chrome&ie=UTF-8 Acascon (talk) 16:51, 17 December 2022 (UTC)[reply]
@Acascon I think it doesn't work if you search for /g/11rnhvy_25 in Google - that will look for pages that have that code in - but it does work if you put it into the URL formatter https://www.google.com/search?kgmid=/g/11rnhvy_25 . It is correct, it's just not meant to be used in normal search. Andrew Gray (talk) 18:18, 17 December 2022 (UTC)[reply]
Ah, right. Thank you so much for your help! Acascon (talk) 08:39, 18 December 2022 (UTC)[reply]