Wikidata:Contact the development team/Archive/2019/10

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

SPARQL query OFFSET by ?var (instead of number)

Items used: film (Q11424)

Properties used: instance of (P31)

SELECT * 
WITH
{ 
  SELECT (COUNT(*) as ?all) (xsd:integer(ROUND(RAND() * ?all)) as ?test) WHERE { ?item wdt:P31 wd:Q11424 }
} as %all
WHERE
{
    INCLUDE %all 
    ?item wdt:P31 wd:Q11424 
}
OFFSET ?test
#OFFSET 25341
LIMIT 10

To get a random set of ten items, I used the above query. OFFSET is determined by RAND() to avoid getting the same ones every day. However, it gives me the following error:

MalformedQueryException: Encountered " <VAR1> "?test "" at line 11, column 8. Was expecting: <INTEGER> ...

Would be good if it was possible. --- Jura 12:16, 2 September 2019 (UTC)

It looks like the requirement comes from the SPARQL standard. Obviously, WMF Blazegraph could still support it.

Is there another way to create somewhat random selections directly on the query server? Please don't mention SAMPLE().

It could be done by calculating the offset in a separate step, or by retrieving all the data and selecting afterwards, but doing it at query time would be much more convenient. Any suggestions? --- Jura 10:50, 8 September 2019 (UTC)
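The "separate step" workaround mentioned above could look roughly like the following Python sketch. It assumes the total row count has already been fetched with a separate COUNT query; the function name, template and parameters are illustrative and not part of any existing tool.

```python
import random

# The film query from above, with a literal OFFSET filled in client-side.
QUERY_TEMPLATE = """SELECT ?item WHERE {{
  ?item wdt:P31 wd:Q11424 .
}}
OFFSET {offset}
LIMIT {limit}"""


def build_random_page_query(total, limit=10, rng=random):
    """Return the film query with a random literal OFFSET.

    `total` is the row count obtained beforehand, e.g. from a separate
    SELECT (COUNT(*) AS ?all) query; this function does no network access.
    """
    if total < 0 or limit < 1:
        raise ValueError("total must be >= 0 and limit >= 1")
    # Keep the window inside the result set so the page is not cut short.
    max_offset = max(total - limit, 0)
    offset = rng.randint(0, max_offset)
    return QUERY_TEMPLATE.format(offset=offset, limit=limit)
```

Since the query has no ORDER BY, the row order is not guaranteed to be stable between runs, so this gives a "somewhat random" window rather than a uniform sample.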

@Jura1: You may be interested in this approach:

SELECT * WITH { 
  SELECT DISTINCT ?item WHERE { ?item wdt:P279 wd:Q18602249 }
} AS %all WITH {
  SELECT (COUNT(*) AS ?count) { INCLUDE %all }
} AS %count WITH {
  SELECT (?item AS ?item2) WHERE { INCLUDE %all }
} AS %all2 WITH {
  SELECT ?item (SUM(?x) AS ?i) WHERE {
    INCLUDE %all .
    INCLUDE %all2 .
    BIND( IF( STR( ?item ) > STR( ?item2 ), 1, 0 ) AS ?x ) .
  } GROUP BY ?item
} AS %main WHERE {
  INCLUDE %main .
  INCLUDE %count .
  BIND( FLOOR( RAND() * ?count ) AS ?rnd ) .
  FILTER( ?rnd = ?i ) .
}

You need to associate each entry with a unique number. It can be the number of items which are "lower". But it only works with smaller datasets. Matěj Suchánek (talk) 18:58, 11 September 2019 (UTC)
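The numbering idea described above (rank each entry by how many entries are "lower") can be sketched client-side in Python. This is only an illustration of the technique, not a translation of the query; the function names are made up for the example.

```python
import random


def rank_by_lower_count(items):
    """Map each distinct item to the number of items that sort below it.

    Mirrors the STR(?item) > STR(?item2) comparison in the query: every
    item is compared with every other one, so the work grows as n * n.
    """
    distinct = set(items)
    return {a: sum(1 for b in distinct if a > b) for a in distinct}


def pick_random(items, rng=random):
    """Pick the item whose rank equals a random integer in [0, n).

    Raises ValueError when `items` is empty.
    """
    ranks = rank_by_lower_count(items)
    target = rng.randrange(len(ranks))
    return next(item for item, rank in ranks.items() if rank == target)
```

The pairwise comparison is what makes the approach quadratic, which fits the remark that it only works with smaller datasets.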

I think it does exactly the same (with much more computing). I could run it on ≈3000 items: Wikidata:WikiProject Movies/reports/random/film is a good start. --- Jura 19:49, 11 September 2019 (UTC)

I changed it slightly, but it seems to be limited to results with 3000-4000 items. --- Jura 07:05, 1 October 2019 (UTC)

A solution could be to create a service that numbers the rows in results. On some level, Blazegraph already counts them. --- Jura 07:05, 1 October 2019 (UTC)

Open sourcing Wikidata

What's the status on phabricator:T226735? I think it really hinders editing that Wikibase doesn't provide the source.

Somehow I think we missed that Wikibase implemented a closed source approach for this. --- Jura 07:29, 29 August 2019 (UTC)

The task has not been picked up yet. I'll try to get someone from the team having a closer look at it. Lea Lacroix (WMDE) (talk) 09:30, 29 August 2019 (UTC)

Thanks. --- Jura 09:32, 29 August 2019 (UTC)

@Lea Lacroix (WMDE): what's the outcome? If I understand correctly, the issues are:

1. editing isn't efficiently possible as the original string isn't made available
2. searching isn't possible on query server or Special:Search as the string isn't indexed
3. users who need query server to retrieve the original string currently can't do so

This might have led to the datatype being the only one where the overall number of statements is regressing. The design considerations are:

- currently the content isn't open source as users can't retrieve the original string
- new prefixes in RDF would be a solution specific to this datatype (similar to other datatypes that have special prefixes). A separate triple is the standard way of storing data.
- including it in the MATH markup would require a bespoke way of retrieving the string and build in a dependency on possible changes to that markup. Retrieval may be inefficient or break on each update to MATH markup

As long as 1,2,3 are solved, I'm obviously indifferent how we get there, but I don't want to see it break on every update. --- Jura 08:52, 5 September 2019 (UTC)

Your assumptions are all correct. We will be working on this task over the next weeks. Lea Lacroix (WMDE) (talk) 11:44, 6 September 2019 (UTC)

Sounds good. Beyond the use case mentioned in the ticket, it could be interesting to look at two others:

title in LaTeX (P6835) is available when the original title (P1476) is hard to read or unreadable without LaTeX markup. We have a fairly large number of such titles that need conversion. When retrieving titles on the query server, I don't think the entire markup is needed, and the raw string should be available.
in defining formula (P7235)-qualifiers and values will make it possible to identify various elements within a defining formula (P2534)-value. To build and check these, I think the raw string should be used. Sample uses at Q862169#P2534.

From a dev point of view, I suppose the question is whether one wants to spend time on an import/export function for math markup or simply on adding the triple.

Hope this helps.--- Jura 17:22, 9 September 2019 (UTC)

If it's too complicated or time consuming for development, we could try to mirror math markup in a string-datatype property (qualifier). What do you prefer? --- Jura 11:28, 29 September 2019 (UTC)
- @Lea Lacroix (WMDE): what do you/Lydia think? --- Jura 09:33, 7 October 2019 (UTC)

We agreed on following the solution described in phab:T195765. It still needs work to be correctly understood and evaluated by the development team. Lea Lacroix (WMDE) (talk) 10:18, 7 October 2019 (UTC)

@Lea Lacroix (WMDE): It's not clear to me what it would do now that the ticket was mostly re-written. Besides, there doesn't seem to be any demand from the Wikidata community for that approach. How would one be able to retrieve the source value? Currently, I don't think it solves your closed source problem and afaik WMDE development is committed to open source. --- Jura 11:28, 8 October 2019 (UTC)

Inability to edit or delete "also known as" field entry if the text overflows the visible space

Tracked in Phabricator
Task T234804

At Warren G. Hildenbrand, Jr. (Q69343171), for example, I no longer have the ability to edit or delete "also known as" field entries if the text overflows the visible space. This has been the case for about a month. It happens whether I am logged in or out, and in any browser. If I ask another editor, they can edit/delete. --RAN (talk) 15:40, 30 September 2019 (UTC)

Hello @Richard Arthur Norton (1958- ):

Thanks a lot for reporting it! I can indeed reproduce it: long aliases are not wrapping, both in read and edit mode. I created a task.

I also tried editing a long alias in the sandbox. I can edit the hidden part by clicking in the field, then moving the cursor with my right arrow key until I see what I want to change. Is this workflow working for you? I acknowledge that it's not the best, but it can work until we fix the issue. Cheers, Lea Lacroix (WMDE) (talk) 10:06, 7 October 2019 (UTC)

I can't click in the field at all! Thanks for helping me. --RAN (talk) 12:29, 7 October 2019 (UTC)

I see, we will try to reproduce it on our side. Can you provide more information about your setup: OS, browser, any script or plug-in you use that could interact with Javascript?

In the meantime, if you want to edit an alias, you can use the special page https://www.wikidata.org/wiki/Special:SetLabelDescriptionAliases/Q4115189/en or even try our brand new mobile edition interface :) Lea Lacroix (WMDE) (talk) 15:23, 7 October 2019 (UTC)

Thanks, it looks like you just fixed it, you can now click on aka fields, that overflow with text, to edit or delete them. --RAN (talk) 18:45, 7 October 2019 (UTC)

Suggestion to extend P1630 and other formatters with "capture regex" qualifier

There's a suggestion at Wikidata:Property proposal/urn formatter that it would be nice to extend formatter URL (P1630) and the new proposed property by being able to additionally specify a "capture regex" qualifier, that would allow more sophisticated regex capture groups to be specified, that could then be included in P1630 statements as $2, $3 etc.

A comment from the development team would be useful as to whether this extension of functionality for formatter URL (P1630) and friends would be easy enough to implement, and any thoughts for/against its desirability, given that these url formatters are some of our most important properties for downstream re-users, eg formatter URI for RDF resource (P1921) for the Linked Open Data community. Thanks, Jheald (talk) 07:59, 8 October 2019 (UTC)

@Jheald: I don’t think that would be easy to implement, because we have no way to safely evaluate user-specified regexes in PHP. The WikibaseQualityConstraints extension uses the query service for this (which is why format constraint (Q21502404) is among the slowest constraint types to check), but in Wikibase itself we can’t rely on the existence of a query service (Beta and Test Wikidata don’t have one, for instance). --Lucas Werkmeister (WMDE) (talk) 12:20, 8 October 2019 (UTC)
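To make the proposal concrete, here is a Python sketch of how a "capture regex" could feed $2, $3 etc. into a formatter. Everything here is hypothetical: the function name, the `capture_regex` parameter and the mapping of regex group 1 to $2 are assumptions for illustration, not an existing Wikibase feature.

```python
import re


def expand_formatter(formatter, value, capture_regex=None):
    """Fill $1 with the whole value and $2, $3, ... with capture groups.

    `capture_regex` stands in for the proposed qualifier; it is a
    hypothetical name, not an existing Wikibase feature.
    """
    substitutions = {"1": value}
    if capture_regex is not None:
        match = re.fullmatch(capture_regex, value)
        if match is None:
            raise ValueError(f"{value!r} does not match {capture_regex!r}")
        # Group 1 of the regex becomes $2, group 2 becomes $3, and so on.
        for index, group in enumerate(match.groups(), start=2):
            substitutions[str(index)] = group or ""
    # Replace each $N placeholder, leaving unknown placeholders untouched.
    return re.sub(
        r"\$(\d+)",
        lambda m: substitutions.get(m.group(1), m.group(0)),
        formatter,
    )
```

For example, `expand_formatter("https://example.org/$2/$3?id=$1", "ab-123", r"([a-z]+)-(\d+)")` yields `https://example.org/ab/123?id=ab-123`. The sketch does nothing about the safety concern raised in the reply above: a user-supplied pattern is evaluated as-is, with no protection against expensive regexes.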

Ranking of references

We made a Lua-script at nowiki to rank the best references, given some cultural bias. That list is effective for limiting the number of reference marks for each infobox entry, but it is not optimal for limiting the total number of reference entries. It would help if we could make a global list of most reused entries, and use that as a first level sorting of our preferred references.

Such a list would first bin the references on identifiers, like URL and ISBN, and then try to further bin unassigned references on weaker identifiers such as title, or slightly stronger author and title. This could be used for merging of references, but that could lead to reuse of entries that isn't portable. An obvious example would be the property retrieved (P813).
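The two-level binning described above might be sketched as follows in Python. The dictionary field names (`url`, `isbn`, `author`, `title`) are assumptions for the example, and the normalisation is deliberately crude; retrieved (P813) is simply left out of the key.

```python
from collections import Counter


def _norm(text):
    """Crude normalisation so trivially different spellings share a bin."""
    return text.strip().lower()


def reference_key(ref):
    """Return a bin key, preferring strong identifiers over weak ones."""
    for field in ("url", "isbn"):  # strong identifiers
        if ref.get(field):
            return (field, _norm(ref[field]))
    if ref.get("author") and ref.get("title"):  # slightly stronger fallback
        return ("author+title", (_norm(ref["author"]), _norm(ref["title"])))
    if ref.get("title"):  # weakest fallback
        return ("title", _norm(ref["title"]))
    return None  # nothing to bin on


def most_reused(references):
    """Count references per bin and list the bins from most to least reused."""
    counts = Counter(k for k in map(reference_key, references) if k is not None)
    return counts.most_common()
```

Lower-casing whole URLs can merge addresses that differ only in path case, so a real implementation would need a more careful normalisation.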

It is possible to build such a page-global list in Lua, but that would have to be redone for each used statement, unless the whole infobox is created with the same call. I believe the most common way to create infoboxes is row by row, but I could be in error. Jeblad (talk) 00:37, 19 September 2019 (UTC)

Hello Jeblad, and thanks for your suggestion. As preliminary research, we looked at how many of the existing statements in Wikidata have more than 3 references. As of September 23rd, there are 1,280,767 statements with 3 or more references, out of 751,832,149 in total, so 0.17% of the existing statements have 3 or more references attached to them. I think this is a number to consider before starting to develop a new feature.

What about using the Query Service to build a list of the most used sources? Lea Lacroix (WMDE) (talk) 07:43, 24 September 2019 (UTC)

@Lea Lacroix (WMDE): Sorry for not responding, I just now noticed your reply. Yes, I know there are pretty few entries with a lot of references, but those are the most visible entries at nowiki. I believe this is a minor issue, but it was a real blocker for using references in the infobox at nowiki. I am pretty sure some version of a ranking algorithm is necessary for other Wikipedia communities to accept reuse of references, or some other way to limit the visual impact.

At the moment we have 8845 pages with four or more references for a statement (w:no:Kategori:Sider med fire referanser fra utsagn), out of 52509 pages with references (w:no:Kategori:Sider med referanser fra utsagn), or about 17% of the pages with references. This is within the “fix the 90%” rule. If the number is normalized over the total number of pages it becomes 1.7%. This is still within the “fix the 99%” rule. The difference between my numbers and your numbers is that my numbers come from references in statements that are actually used in the infobox, and then only those we have chosen as targets for an initial test.
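The two shares quoted in this thread can be rechecked with a few lines of Python; only the figures given in the comments are used. The 1.7% figure depends on the total number of nowiki pages, which is not stated here, so it is not recomputed.

```python
def share(part, whole):
    """Return part / whole as a percentage."""
    return 100 * part / whole


# Wikidata-wide figure: statements with 3 or more references.
wikidata_share = share(1_280_767, 751_832_149)
# nowiki figure: pages with four or more references for a statement.
nowiki_share = share(8845, 52509)

print(f"{wikidata_share:.2f}%  {nowiki_share:.1f}%")
```

This gives roughly 0.17% and 16.8%, matching the values quoted in both comments. The two numbers measure different things (all statements versus pages whose infobox uses referenced statements), which is the point made above.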

The problem becomes most troublesome at high visibility articles like Henrik Ibsen (Q36661) with 14 references on birth date [1], Knut Hamsun (Q40826) with 14 references on birth date [2], Fridtjof Nansen (Q72292) with 9 references on birth date [3], and Roald Amundsen (Q926) with 7 references on birth date [4].

If there were some better method to connect references and avoid duplicates, then perhaps the community would be more forgiving, but as it is now, statements with many references also have a pretty high risk of containing duplicates. That is, a reference used for birth date is also used for birth place, but in a shorter form. I wonder if the references in the statement should be links to common reused entries of some kind.

Hopefully I have explained some “why” and “how”. Jeblad (talk) 21:01, 11 October 2019 (UTC)

Transliteration of labels

At nowiki the community has chosen to use transliterated names for persons that otherwise can't be readily written in Norwegian. They have made a gadget for this to make it easy to transliterate a name on Wikipedia. Still the name has to be manually written into the Wikidata entry. That isn't so much of a hurdle for a single entry, but updating family members and similar could be a real workload.

ICU has a pretty decent transliteration engine, and PHP has an extension for this. So, I guess the question is pretty simple; what about adding transliteration for labels until it is necessary to fill them in manually? Jeblad (talk) 22:21, 11 October 2019 (UTC)

Thanks for reporting this use case.

Adding new fallback rules is quite complex because transliteration depends on the target language, on the type of entities, etc. That's why we think this could be done by a bot, but not directly in the interface software. Lea Lacroix (WMDE) (talk) 11:02, 14 October 2019 (UTC)

Tools & features down

It seems to me that a couple of tools & features are down. As already mentioned above, the primary sources tool is not working. Additionally, I tested nameGuzzler on different PCs and it no longer works (error message: notoken). Also, the copy-insert references functionality is gone. Probably more tools & features are affected. Florentyna (talk) 09:29, 14 October 2019 (UTC)

There seem to be some issues with https://tools.wmflabs.org/ at the moment, and some tools may be impacted. I don't know much more but I will have a look. Lea Lacroix (WMDE) (talk) 11:03, 14 October 2019 (UTC)

Mixup of ⓬ and ⑫. ⑿ is ok

Tracked in Phabricator
Task T233204

Any idea why it's mixed up on query server: https://w.wiki/88J (the same appears in both columns) ?

It's correct on Q36977#P487.

Same for the others I checked. The ones in parentheses are fine. --- Jura 21:21, 9 September 2019 (UTC)

Is this a problem with server load or the export function? --- Jura 05:24, 12 September 2019 (UTC)

I had a look but couldn't find any explanation. Would you mind creating a Phab task so I can ping some people who could help? Lea Lacroix (WMDE) (talk) 10:55, 12 September 2019 (UTC)

I would have to find password for the account I use there and even if I did, I don't think there is much I could add and it would probably not go anywhere. Pinging should work here too. --- Jura 16:10, 12 September 2019 (UTC)

I can try to delete and re-add them, but that might hinder checks on your side. I think the same happened with some aliases. Hope there is no bug in the export functions. Just figured out why you wouldn't ping Stas. Too bad. --- Jura 11:05, 16 September 2019 (UTC)

At the moment, there is no dedicated person at WMF working on the Query Service. Things should evolve in the next months. Thanks for your patience. Lea Lacroix (WMDE) (talk) 07:57, 17 September 2019 (UTC)

Can you create the ticket and liaise with the team? (AFAIK that is your role as community liaison). I suppose WMF has several full stack developers. --- Jura 10:02, 17 September 2019 (UTC)

Thanks for investigating this and coming up with a test for it:

SELECT ("⓬" AS ?negativeCircled) ("⑫" AS ?circled) {}

It also results in incorrect wdt: triples: compare the following two:

SELECT (COUNT(*) as ?count) { wd:Q36977 wdt:P487 [] }

SELECT (COUNT(*) as ?count) { wd:Q36977 p:P487 [] }

It probably also explains why items for 1, 4, and 5 have different results on query server: statements have been added earlier differently.

In addition to:

it seems to mix also:

Still, if it's limited to these, we could probably live with it in the short term. However, we should probably try to determine a test to make sure it doesn't negatively impact queries on lexemes in currently lesser used languages. --- Jura 13:23, 21 September 2019 (UTC)
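As a client-side sanity check, the Python sketch below confirms that the three characters in the section title are distinct code points. It says nothing about where the query service merges them; it only shows that the mix-up cannot be explained by the input strings being identical. The helper names are illustrative.

```python
import unicodedata

# The three characters from the section title, written as escapes so the
# example does not depend on how the page itself is encoded.
SAMPLES = {
    "negative circled": "\u24ec",  # ⓬
    "circled": "\u246b",           # ⑫
    "parenthesized": "\u247f",     # ⑿
}


def describe(char):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


def all_distinct(values):
    """True when no two strings are equal under plain code point comparison."""
    values = list(values)
    return len(set(values)) == len(values)
```

A similar check over a wider range of characters could serve as the basis for the test suggested above for lesser-used languages, by comparing what was sent with what the query service returns.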

Sample test: [5]. --- Jura 14:10, 22 September 2019 (UTC)

It seems there was already a report for this before at Property_talk:P487#Distinct_values_constraint_bug. --- Jura 05:44, 15 October 2019 (UTC)

Search is very poor at mis-spellings

Tracked in Phabricator
Task T235496

The search function seems very poor at presenting possible items, if the search text has been slightly mis-spelt (or spelt slightly differently to the exact labels or aliases we have) -- for example if I type one character wrong, or mistype one extra character in my search request, WD search will typically fail to find corresponding items.

Search should return near matches, not just exact matches.

Search on wikipedias seems to handle this much better, and to be much more robust at returning possible near matches. Can we investigate why?

This is such an important issue for helping people find the right items or properties, and to reduce the creation of duplicates. Jheald (talk) 12:33, 12 October 2019 (UTC)

There could be two issues here. If you are referring to the "Did you mean" suggestions in Special:Search, these are based on page titles: this search functionality was designed for classic wikis and has not yet been fully integrated with the Wikidata data model, which is why the feature has been disabled. There are multiple challenges to make this happen, so I've created a task to follow up on this. If you are referring to search-as-you-type, the problem is similar: classic wikis benefit from the mw:Extension:CirrusSearch/CompletionSuggester, which is not enabled on Wikidata. Enabling the completion suggester on Wikidata is harder due to the size of this wiki and would require significantly more hardware. DCausse (WMF) (talk) 12:51, 15 October 2019 (UTC)
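For readers unfamiliar with what "near matches" means in practice, here is a small Python sketch using the standard library's `difflib`. It is not how CirrusSearch or the completion suggester works; it only illustrates tolerance to a one-character typo on a short list of labels, and the function name and cutoff are assumptions.

```python
import difflib


def near_matches(search_text, labels, limit=5, cutoff=0.8):
    """Return labels that are close to search_text, best match first.

    Comparison is case-insensitive; the original label spelling is returned.
    """
    by_folded = {}
    for label in labels:
        by_folded.setdefault(label.casefold(), label)
    hits = difflib.get_close_matches(
        search_text.casefold(), list(by_folded), n=limit, cutoff=cutoff
    )
    return [by_folded[hit] for hit in hits]
```

A linear scan like this compares the search text with every label, which would not scale to a wiki the size of Wikidata; that is in line with the hardware concern mentioned in the reply.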

be-x-old has been renamed as be-tarask

Tracked in Phabricator
Task T235505

Hello,

be-x-old Wikipedia has been renamed as be-tarask, but wikidata interwikis still contain be_x_old and I see no way to change it.

Bots are complaining: WARNING: pywikibot-core/pywikibot/site.py:1897: UserWarning: Site wikipedia:be-tarask instantiated using different code "be-x-old"

Is there something planned to have this changed globally in wikidata?

Thank you for your help. – The preceding unsigned comment was added by Vargenau (talk • contribs).

Hello @Vargenau: Thanks a lot for reporting the issue! I just created a ticket, and we will investigate this issue. Lea Lacroix (WMDE) (talk) 13:55, 15 October 2019 (UTC)

Thank you, Léa! Vargenau (talk) 14:15, 15 October 2019 (UTC)

P4839 datatype change

Tracked in Phabricator
Task T234221

Following the discussion at Property_talk:P4839#Data_type (also announced on project chat), please change P4839 from string to external-id. --- Jura 14:48, 26 September 2019 (UTC)

Ticket created. This should be done in the next weeks. Lea Lacroix (WMDE) (talk) 14:23, 30 September 2019 (UTC)

P7007 datatype change

Also, following the discussion at Property_talk:P7007#Data_type (also announced on project chat), please change P7007 from string to external-id. --- Jura 06:49, 19 October 2019 (UTC)

Stack Overflow on the query service (on a query used to document the Map visualisation)

Tracked in Phabricator
Task T235540

see this. At the time I’m writing of course. author TomT0m / talk page 08:51, 23 October 2019 (UTC)

(SAMPLE(?location) AS ?location)

no longer works. Use

(SAMPLE(?location) AS ?location1)

instead. --- Jura 16:42, 23 October 2019 (UTC)

Actually, it's the MAX part. --- Jura 09:09, 24 October 2019 (UTC)

Is the unit conversion file published somewhere ?

How can we know which units are converted into what ? author TomT0m / talk page 13:35, 23 October 2019 (UTC)

In the git repo: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/unitConversionConfig.json. --Matěj Suchánek (talk) 09:00, 27 October 2019 (UTC)

Annoying message

Hello. This annoying message appears in Windows 7 while I am contributing; it covers everything and forces me to restart the computer. Why is it showing and how do I get rid of it? Thanks, David (talk) 16:26, 27 October 2019 (UTC)

@ديفيد عادل وهبة خليل 2: Likely a browser issue, you should remove all your browser extensions and plugins and see what happens.--GZWDer (talk) 17:52, 27 October 2019 (UTC)

Query no longer working

The following query was working fine until recently (see history of ListeriaBot updates). Now, there is a systematic timeout error. Could you have a look at it, please?

SELECT (wd:Q5 as ?item)(COUNT(DISTINCT ?item) as ?count)
  WHERE {
    hint:Query hint:optimizer "None".
    ?item wdt:P131+ wd:Q142.
    ?item (wdt:P31/wdt:P279*) wd:Q16970
    MINUS { ?item wdt:P625 [].}
  }

Thanks. Ayack (talk) 10:43, 26 October 2019 (UTC)

@Ayack: have a look at hints mentioned here or the subqueries. --- Jura 22:12, 5 November 2019 (UTC)

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Lea Lacroix (WMDE) (talk) 13:16, 18 November 2019 (UTC)

Bar chart

Hi
I asked if I can force colors in the bar chart view and was told that it's not possible (for now?). I also tried to order by the x-axis, without success (see the query below).
So, I would like to know whether these features are planned for the bar chart view.

#defaultView:BarChart
select distinct ?législatureLabel (count (distinct ?item) as ?count) ?groupeLabel ?rgb where {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
?item p:P39 ?fonction ; 
      wdt:P39 wd:Q15964890 .
?fonction ps:P39 wd:Q15964890 ;
          pq:P4100 ?groupe ; 
          pq:P2937 ?législature ; 
          pq:P768 ?circonscription . 
  optional { ?groupe wdt:P465 ?rgb }
}
group by ?législatureLabel ?groupeLabel ?count ?rgb
order by desc (?législatureLabel)

Simon Villeneuve (talk) 13:25, 28 October 2019 (UTC)

Currently we unfortunately don't have the resources to work on that. Sorry :/ --Lydia Pintscher (WMDE) (talk) 21:31, 5 November 2019 (UTC)

I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Lea Lacroix (WMDE) (talk) 13:16, 18 November 2019 (UTC)