Wikidata:Contact the development team/Query Service and search/Archive/2020/08

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Search by statements also at Wikipedia (July 20)

Occasionally, I use text search at other wikis and then filter the resulting items based on statements on these items.

If Wikipedia (or other sites) directly indexed statements, one could use e.g. w:Special:Search/John Doe haswbstatement:P31=Q5.

The same syntax works at Wikidata (Special:Search/John Doe haswbstatement:P31=Q5), but obviously doesn't use the full text from enwiki.

If there is more interest in this (at Wikipedia), maybe this could/should be developed. --- Jura 08:09, 20 July 2020 (UTC)
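
For reference, the Wikidata search quoted above can also be issued through the MediaWiki Action API (`action=query`, `list=search`). The following is a minimal sketch that only builds the request URL; the helper name `build_search_url` is hypothetical, and the endpoint is assumed to be the standard `api.php` on Wikidata.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki Action API endpoint on Wikidata.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(text, statements, api=WIKIDATA_API):
    """Return an Action API URL for a text search filtered by statements.

    `statements` is a list of (property, value) pairs such as ("P31", "Q5").
    This helper is illustrative only; it is not part of any library.
    """
    # Each pair becomes one haswbstatement: keyword appended to the query text.
    filters = " ".join(f"haswbstatement:{prop}={value}" for prop, value in statements)
    query = f"{text} {filters}".strip()
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


url = build_search_url("John Doe", [("P31", "Q5")])
print(url)
```

Pointing `api` at a Wikipedia endpoint would only give the desired result if that wiki indexed statements, which is the feature being asked for here.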


Indeed, this might be handy and would allow joining textual data (Wikipedias and sister projects) with structured data from Wikidata. While it would be very efficient at query time, I fear that it would add too much load to the Elasticsearch cluster when updating Wikidata items. Editing John Doe (Q3181360) alone would cause 13x more updates (13 sitelinks), with one update per wiki referencing this item. I don't have numbers or precise estimates at hand, but I'm afraid that this might not be doable with our current infrastructure and would require provisioning more hardware to support such a feature. DCausse (WMF) (talk) 10:08, 20 July 2020 (UTC)
@DCausse (WMF): True. Part of it probably already happens now (WP, not Elastic). Maybe inclusion in (WP-)Elastic could be much more limited, as it already is at Wikidata. For example, most identifiers are probably already in some template at Wikipedia, so including them isn't useful.
Limiting it to "P31=Q5" and (any) P17 could be a reasonable start. Ideally the GUI at Wikipedia would support these somehow. --- Jura 11:45, 20 July 2020 (UTC)
Sure, I think the main point to solve first is to get an estimate of the impact on the current infrastructure. I'll bring this up with the rest of the search team. DCausse (WMF) (talk) 07:57, 21 July 2020 (UTC)
I discussed this with the rest of the team and the same impression was shared. It is unclear whether the search team will have time to work on this in the short term, but I created T258923 so as not to lose track of this discussion. DCausse (WMF) (talk) 09:37, 27 July 2020 (UTC)
The ticket seems overly broad. Even on WD, not that much is indexed. --- Jura 07:01, 12 August 2020 (UTC)
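
The update fan-out mentioned in this thread can be illustrated with a small back-of-the-envelope sketch. It assumes one index update on Wikidata per item edit, plus one per sitelinked wiki if statements were also indexed there; the function name is hypothetical and the model ignores batching, deduplication, or any limit on which statements get indexed.

```python
def index_updates_per_edit(sitelink_count, index_on_client_wikis):
    """Estimate search-index updates triggered by one item edit.

    Assumption: one update for the Wikidata index itself, plus one update
    per sitelinked wiki when statements are also indexed on client wikis.
    """
    wikidata_updates = 1
    client_updates = sitelink_count if index_on_client_wikis else 0
    return wikidata_updates + client_updates


# John Doe (Q3181360) had 13 sitelinks at the time of the discussion.
current = index_updates_per_edit(13, index_on_client_wikis=False)
proposed = index_updates_per_edit(13, index_on_client_wikis=True)
extra = proposed - current
print(current, proposed, extra)
```

Under these assumptions the 13 additional updates correspond to the 13 sitelinks; restricting indexing to a few statements such as P31 and P17 would reduce how often an edit triggers them, not the per-edit fan-out.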

Cirrus search and deprecated statements (August 12)

[1] currently finds Q5078504 which has P31=Q5 with deprecated rank. I don't think this should be found. Any deprecated statements should be skipped. --- Jura 07:01, 12 August 2020 (UTC)

Why shouldn't it be found? The documentation for haswbstatement doesn't exclude deprecated rank. --Dipsacus fullonum (talk) 08:03, 12 August 2020 (UTC)
That documentation just describes how it's currently configured. Help:Ranking tells us that most other applications do not use deprecated statements by default ("For templates and queries, deprecated statements will never be used unless that is specifically requested."). --- Jura 08:39, 12 August 2020 (UTC)

@DCausse (WMF): fyi --- Jura 05:03, 17 August 2020 (UTC)

I've created a phab task to track this. It looks to me like we need more discussion to understand what the ideal solution is in this case. Feel free to comment on the phab task. --GLederrey (WMF) (talk) 13:07, 19 August 2020 (UTC)
They are actually not valid statements (except maybe in the technical sense). It seems to me that this was mostly an oversight.
@Lydia Pintscher (WMDE): what's the view for the "product"? --- Jura 13:11, 19 August 2020 (UTC)
I think there are valid reasons to search all or only the best ranked statements. So something worth considering is offering a second keyword to cover both cases. But I am not sure about the technical implications of this option yet. --Lydia Pintscher (WMDE) (talk) 11:22, 20 August 2020 (UTC)
I have cases where I'd definitely like to find a deprecated ID via the haswbstatement search (for example in a case where one ID has been redirected to another in the source database). If this is to be changed it should be in some way configurable. But I don't see the harm in being more inclusive for this particular type of search generally. ArthurPSmith (talk) 17:22, 19 August 2020 (UTC)
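
The options discussed in this thread (match all statements, skip deprecated ones, or match only the best-ranked ones) can be sketched as a filter over statement ranks. This is only an illustration of the semantics, not how CirrusSearch indexes statements; the function name, the mode names, and the sample data are hypothetical. The rank values `preferred`, `normal`, and `deprecated` are the ones Wikibase uses.

```python
def select_statements(statements, mode="all"):
    """Filter statements of one property by rank.

    mode="all":            keep every statement, including deprecated ones
    mode="non_deprecated": drop deprecated statements
    mode="best":           keep preferred statements if any exist,
                           otherwise keep normal-rank statements
    """
    if mode == "all":
        return list(statements)
    if mode == "non_deprecated":
        return [s for s in statements if s["rank"] != "deprecated"]
    if mode == "best":
        preferred = [s for s in statements if s["rank"] == "preferred"]
        if preferred:
            return preferred
        return [s for s in statements if s["rank"] == "normal"]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical P31 statements on a made-up item, for illustration only.
p31 = [
    {"value": "Q5", "rank": "deprecated"},
    {"value": "Q_EXAMPLE", "rank": "normal"},
]

all_values = [s["value"] for s in select_statements(p31, "all")]
best_values = [s["value"] for s in select_statements(p31, "best")]
print(all_values, best_values)
```

With two keywords, one could map to `all` (useful for finding deprecated or redirected IDs) and the other to `best` or `non_deprecated`.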