
Wikidata:Contact the development team


Wikidata development is ongoing. You can leave notes for the development team here, on the #wikidata IRC channel, or on the mailing list, or you can report bugs on Phabricator. (See the list of open bugs on Phabricator.)

Regarding the accounts of the Wikidata development team, we have decided on the following rules:

  • Wikidata developers can have clearly marked staff accounts (in the form "Fullname (WMDE)"), and these can receive admin and bureaucrat rights.
  • These staff accounts should be used only for development, testing, spam-fighting, and emergencies.
  • The private accounts of staff members do not get admin and bureaucrat rights by default. If staff members desire admin and bureaucrat rights for their private accounts, those should be gained going through the processes developed by the community.
  • Every staff member is free to use their private account just as everyone else, obviously. Especially if they want to work on content in Wikidata, this is the account they should be using, not their staff account.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2019/10.



Open sourcing Wikidata

What's the status on phabricator:T226735? I think it really hinders editing that Wikibase doesn't provide the source.

Somehow I think we missed that Wikibase implemented a closed source approach for this. --- Jura 07:29, 29 August 2019 (UTC)

The task has not been picked up yet. I'll try to get someone from the team having a closer look at it. Lea Lacroix (WMDE) (talk) 09:30, 29 August 2019 (UTC)
Thanks. --- Jura 09:32, 29 August 2019 (UTC)

@Lea Lacroix (WMDE): what's the outcome? If I understand correctly, the issues are:

  1. editing isn't efficiently possible as the original string isn't made available
  2. searching isn't possible on query server or Special:Search as the string isn't indexed
  3. users who need query server to retrieve the original string currently can't do so

This might have led to the datatype being the only one where the overall number of statements is regressing. The design considerations are:

  • currently the content isn't open source as users can't retrieve the original string
  • new prefixes in RDF would be a solution specific to this datatype (similar to other datatypes that have special prefixes). A separate triple is the standard way of storing data.
  • including it in the MATH markup would require a bespoke way of retrieving the string and build in a dependency on possible changes to that markup. Retrieval may be inefficient or break on each update to MATH markup

As long as 1,2,3 are solved, I'm obviously indifferent how we get there, but I don't want to see it break on every update. --- Jura 08:52, 5 September 2019 (UTC)

Your assumptions are all correct. We will be working on this task over the next weeks. Lea Lacroix (WMDE) (talk) 11:44, 6 September 2019 (UTC)
Sounds good. Beyond the use case mentioned in the ticket, it could be interesting to look at two others:
  1. title in LaTeX (P6835) is available when the original title (P1476) is hard to read or unreadable without LaTeX markup. We have a fairly large number of such titles that need conversion. When retrieving titles on query server, I don't think the entire math markup is needed and the raw string should be available.
  2. in defining formula (P7235)-qualifiers and values will make it possible to identify various elements within a defining formula (P2534)-value. To build and check these, I think the raw string should be used. Sample uses at Q862169#P2534.
From a dev point of view, I suppose the question is whether one wants to spend time on an import/export function for math markup or simply on the triple addition.
Hope this helps.--- Jura 17:22, 9 September 2019 (UTC)
  • If it's too complicated or time consuming for development, we could try to mirror math markup in a string-datatype property (qualifier). What do you prefer? --- Jura 11:28, 29 September 2019 (UTC)
We agreed on following the solution described in phab:T195765. It still needs work to be correctly understood and evaluated by the development team. Lea Lacroix (WMDE) (talk) 10:18, 7 October 2019 (UTC)
  • @Lea Lacroix (WMDE): It's not clear to me what it would do now that the ticket was mostly re-written. Besides, there doesn't seem to be any demand from the Wikidata community for that approach. How would one be able to retrieve the source value? Currently, I don't think it solves your closed source problem and afaik WMDE development is committed to open source. --- Jura 11:28, 8 October 2019 (UTC)

SPARQL query OFFSET by ?var (instead of number)

SELECT * 
WITH
{ 
  SELECT (COUNT(*) as ?all) (xsd:integer(ROUND(RAND() * ?all)) as ?test) WHERE { ?item wdt:P31 wd:Q11424 }
} as %all
WHERE
{
    INCLUDE %all 
    ?item wdt:P31 wd:Q11424 
}
OFFSET ?test
#OFFSET 25341
LIMIT 10


To get a random set of ten items, I used the above query. OFFSET is determined by RAND() to avoid getting the same ones every day. However, it gives me the following error:

MalformedQueryException: Encountered " <VAR1> "?test "" at line 11, column 8. Was expecting: <INTEGER> ...

Would be good if it was possible. --- Jura 12:16, 2 September 2019 (UTC)

  • It looks like the requirement comes from the SPARQL standard. Obviously, WMF Blazegraph could still support it.
Is there another way to create somewhat random selections directly on query server? Please don't mention SAMPLE().
It could be done by calculating the offset in a separate step or by retrieving all data and selecting afterwards, but doing it within the query would be much more convenient. Any suggestions? --- Jura 10:50, 8 September 2019 (UTC)
@Jura1: You may be interested in this approach:
SELECT * WITH { 
  SELECT DISTINCT ?item WHERE { ?item wdt:P279 wd:Q18602249 }
} AS %all WITH {
  SELECT (COUNT(*) AS ?count) { INCLUDE %all }
} AS %count WITH {
  SELECT (?item AS ?item2) WHERE { INCLUDE %all }
} AS %all2 WITH {
  SELECT ?item (SUM(?x) AS ?i) WHERE {
    INCLUDE %all .
    INCLUDE %all2 .
    BIND( IF( STR( ?item ) > STR( ?item2 ), 1, 0 ) AS ?x ) .
  } GROUP BY ?item
} AS %main WHERE {
  INCLUDE %main .
  INCLUDE %count .
  BIND( FLOOR( RAND() * ?count ) AS ?rnd ) .
  FILTER( ?rnd = ?i ) .
}
You need to associate each entry with a unique number. It can be the number of items which are "lower". But it only works with smaller datasets. Matěj Suchánek (talk) 18:58, 11 September 2019 (UTC)
I think it does exactly the same (with much more computing). I could run it on ≈3000 items: Wikidata:WikiProject Movies/reports/random/film is a good start. --- Jura 19:49, 11 September 2019 (UTC)
I changed it slightly, but it seems to be limited to results with 3000-4000 items. --- Jura 07:05, 1 October 2019 (UTC)
  • A solution could be to create a service that numbers the rows in results. On some level, Blazegraph already counts them. --- Jura 07:05, 1 October 2019 (UTC)
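
The "separate step" option mentioned in this thread can be sketched client-side. This is only an illustration in Python, not something the query service provides: the function name build_random_page_query is hypothetical, and the total is assumed to come from running the count query first. Because SPARQL only accepts an integer literal after OFFSET, the random number is chosen outside the query and written into the query text.

```python
import random

# Step 1 (run separately): count the matching items.
COUNT_QUERY = "SELECT (COUNT(*) AS ?all) WHERE { ?item wdt:P31 wd:Q11424 }"


def build_random_page_query(total, limit=10, rng=random):
    """Step 2: build a query with a random literal OFFSET.

    `total` is the ?all value returned by COUNT_QUERY (assumed to be
    fetched by the caller); `rng` only needs a randint method.
    """
    # Highest offset that still leaves a full page of `limit` results.
    max_offset = max(total - limit, 0)
    offset = rng.randint(0, max_offset)
    return (
        "SELECT ?item WHERE { ?item wdt:P31 wd:Q11424 }\n"
        f"OFFSET {offset}\n"
        f"LIMIT {limit}"
    )
```

Without an ORDER BY the result order is not guaranteed to be stable between runs, which is harmless here since the goal is only a somewhat random selection.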

Mixup of ⓬ and ⑫. ⑿ is ok

Any idea why it's mixed up on query server: https://w.wiki/88J (the same appears in both columns) ?

It's correct on Q36977#P487.

Same for the others I checked. The ones in parentheses are fine. --- Jura 21:21, 9 September 2019 (UTC)

Is this a problem with server load or the export function? --- Jura 05:24, 12 September 2019 (UTC)
I had a look but couldn't find any explanation. Would you mind creating a Phab task so I can ping some people who could help? Lea Lacroix (WMDE) (talk) 10:55, 12 September 2019 (UTC)
I would have to find the password for the account I use there, and even if I did, I don't think there is much I could add and it would probably not go anywhere. Pinging should work here too. --- Jura 16:10, 12 September 2019 (UTC)
I can try to delete and re-add them, but that might hinder checks on your side. I think the same happened with some aliases. Hope there is no bug in the export functions. Just figured out why you wouldn't ping Stas. Too bad. --- Jura 11:05, 16 September 2019 (UTC)
At the moment, there is no dedicated person at WMF working on the Query Service. Things should evolve in the next months. Thanks for your patience. Lea Lacroix (WMDE) (talk) 07:57, 17 September 2019 (UTC)
Can you create the ticket and liaise with the team? (AFAIK that is your role as community liaison). I suppose WMF has several full stack developers. --- Jura 10:02, 17 September 2019 (UTC)

Thanks for investigating this and coming up with a test for it:

SELECT ("⓬" AS ?negativeCircled) ("⑫" AS ?circled) {}


It also results in incorrect wdt: triples: compare the following two:

SELECT (COUNT(*) as ?count) { wd:Q36977 wdt:P487 [] }


SELECT (COUNT(*) as ?count) { wd:Q36977 p:P487 [] }


It probably also explains why items for 1, 4, and 5 have different results on query server: statements have been added earlier differently.

In addition to:

it seems to mix also:

Still, if it's limited to these, we could probably live with it in the short term. However, we should probably try to determine a test to make sure it doesn't negatively impact queries on lexemes in currently lesser used languages. --- Jura 13:23, 21 September 2019 (UTC)

  • Sample test: [1]. --- Jura 14:10, 22 September 2019 (UTC)

Line charts viz weirdness and feature request

Following a request in WD:Request a query to graph the number of members in an organisation

I find it weird that a query like this one does not work: it seems really similar in spirit to this line chart documentation example, except that the datatype of « year » is number instead of string. It seems that a sum is performed over all the rows of the result table, for a reason unclear to me. (It makes sense to treat it as a number: this makes it easier to treat missing information as gaps, without the problem of, say, the points for years 2000, 2010 and 2015 being shown at equal distances if we have only 3 datapoints.)

Which leads to the feature request, motivated by a query request: it seems that everything works fine when keeping the date datatype instead of trying to extract the year, but … the intent was to display only the year on the chart, whereas the x-labels of the datapoints are displayed as 1-1-year, ignoring the precision of the date. Shouldn't the display omit the month and day if the precision is « year »? This would avoid the need to extract the year from the date (as a number). author  TomT0m / talk page 21:06, 16 September 2019 (UTC) Now that I think about it, this is a stupid feature request in that case, as the precision is not available with simple RDF values of statements. As this query does use simple values, I don’t know how WDQS could be aware of the precision of the value …

  • As a second thought, the feature request would simply be to display only the year in the X-axis label if it seems there is no more than one value per year, or something like that. author  TomT0m / talk page 17:42, 17 September 2019 (UTC)
I think the issue you’re encountering is the long-standing bug T168341 (but perhaps I’m misunderstanding something). --Lucas Werkmeister (WMDE) (talk) 17:35, 18 September 2019 (UTC)
The reason for the weirdness seems to be tied to how columns are parsed to identify which ones are the X-axis and Y-axis (which, per the doc, are defined to support Number, Label and Datetime) and which columns define series. For the given query it identifies both org and year as series determiners, breaking the logic of the graph. It might be fixed by converting year to text, in the same way as an example in the mentioned doc does; see the link. The conversion does not necessarily have to be defined in the inner query; converting in the outer select also works: see fixed query.  – The preceding unsigned comment was added by Igorkim78 (talk • contribs).
@Lucas Werkmeister (WMDE): The problem is that I wanted to avoid doing this conversion; I was aware this would kind of work. But if there is a missing year, or several values for a year for some reason in the series (say one per month for just one year), the query becomes more complicated for no good reason imho: a year gap will result in an irregular spacing of the points (say 1980 is missing, the space between 1979 and 1981 will ). I could fix this; I think I found solutions in the past (querying the items of the years between the max and min dates, for example, can help add a phantom point on the X-axis). Some sample or mean point would have to be chosen if there are several points for a year, or a filter applied according to date precision, or something like that. But it’s trickier, and correctly having a date-kind X-axis instead makes it easier to make the query robust. author  TomT0m / talk page 19:32, 30 September 2019 (UTC)

Ranking of references

We made a Lua-script at nowiki to rank the best references, given some cultural bias. That list is effective for limiting the number of reference marks for each infobox entry, but it is not optimal for limiting the total number of reference entries. It would help if we could make a global list of most reused entries, and use that as a first level sorting of our preferred references.

Such a list would first bin the references on identifiers, like URL and ISBN, and then try to further bin unassigned references on weaker identifiers such as title, or slightly stronger author and title. This could be used for merging of references, but that could lead to reuse of entries that isn't portable. An obvious example would be the property retrieved (P813).

It is possible to build such a page-global list in Lua, but that would have to be redone for each used statement, unless the whole infobox is created with the same call. I believe the most common way to create infoboxes is row by row, but I could be in error. Jeblad (talk) 00:37, 19 September 2019 (UTC)
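
The binning described above (strong identifiers first, weaker ones as a fallback) could look roughly like the following. This is a minimal Python sketch, not the Lua script used at nowiki: the dictionary field names (url, isbn, author, title) and the function names are assumptions made for illustration. The retrieved date (P813) is deliberately not part of any key, since it is the non-portable part of a reference.

```python
from collections import Counter

# Hypothetical key order: strong identifiers first, weaker ones last.
KEY_FIELDS = [("url",), ("isbn",), ("author", "title"), ("title",)]


def normalize(field, value):
    """Crude normalization: trim everything, case-fold free-text fields."""
    value = value.strip()
    return value if field in ("url", "isbn") else value.casefold()


def bin_key(ref):
    """Return the strongest available binning key for one reference dict."""
    for fields in KEY_FIELDS:
        if all(ref.get(f) for f in fields):
            return (fields, tuple(normalize(f, ref[f]) for f in fields))
    return None  # nothing usable to bin on


def rank_references(refs):
    """Count how often each bin occurs; most reused bins come first."""
    counts = Counter(k for k in map(bin_key, refs) if k is not None)
    return counts.most_common()
```

The resulting list could serve as the first-level sort suggested above, with the existing per-statement ranking applied within each bin.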

Hello Jeblad, and thanks for your suggestion. As preliminary research, we looked at how many of the existing statements in Wikidata have 3 or more references. As of September 23rd, there are 1,280,767 statements having 3 or more references, out of 751,832,149 in total, which means 0.17% of the existing statements have 3 or more references attached to them. I think this is a number to consider before starting to develop a new feature.
What about using the Query Service to build a list of the most used sources? Lea Lacroix (WMDE) (talk) 07:43, 24 September 2019 (UTC)
@Lea Lacroix (WMDE): Sorry for not responding, I just now noticed your reply. Yes, I know there are pretty few entries with a lot of references, but those are the most visible entries at nowiki. I believe this is a minor issue, but it was a real blocker for using references in the infobox at nowiki. I am pretty sure some version of a ranking algorithm is necessary for other Wikipedia communities to accept reuse of references, or some other way to limit the visual impact.
At the moment we have 8845 pages with four or more references for a statement (w:no:Kategori:Sider med fire referanser fra utsagn), out of 52509 pages with references (w:no:Kategori:Sider med referanser fra utsagn), or about 17% of the pages with references. This is within the “fix the 90%” rule. If the number is normalized over the total number of pages it becomes 1.7%. This is still within the “fix the 99%” rule. The difference between my numbers and your numbers is that my numbers come from references in statements that are actually used in the infobox, and then only those we have chosen as targets for an initial test.
The problem becomes most troublesome at high visibility articles like Henrik Ibsen (Q36661) with 14 references on birth date [2], Knut Hamsun (Q40826) with 14 references on birth date [3], Fridtjof Nansen (Q72292) with 9 references on birth date [4], and Roald Amundsen (Q926) with 7 references on birth date [5].
If there were some better method to connect references and avoid duplicates, then perhaps the community would be more forgiving, but the way it is now, statements with many references also have a pretty high risk of having duplicates. That is, a reference used in birth date is also used in birth place, but then in a shorter form. I wonder if the references in the statement should be links to common reused entries of some kind.
Hopefully I have explained some “why” and “how”. Jeblad (talk) 21:01, 11 October 2019 (UTC)

MacBook Pro TrackPad inaccuracy using Google Chrome

Hi. I've found that my trackpad misses the mark when I try to highlight something in a text box here in Wikidata. I can select text accurately almost anywhere else in Wikidata or on other sites. When in a WD text box, I find the mouse highlights a few letters to the right of what I'm actually trying to highlight. Sometimes highlighting doesn't work. I don't think it's my MacBook, but maybe it is. Trilotat (talk) 14:26, 25 September 2019 (UTC)

Hello, one of my colleagues tried on his own MacBook. He selected content, both in and out of edit mode, without any issue.
Have you tried with a different browser? Lea Lacroix (WMDE) (talk) 14:50, 26 September 2019 (UTC)

P4839 datatype change

Following the discussion at Property_talk:P4839#Data_type (also announced on project chat), please change P4839 from string to external-id. --- Jura 14:48, 26 September 2019 (UTC)

Ticket created. This should be done in the next weeks. Lea Lacroix (WMDE) (talk) 14:23, 30 September 2019 (UTC)

IP Range Queries

I created Wikidata:Property_proposal/IP_range_start, but I'm wondering if there is a technical solution that could be implemented that would prevent the data from having to be duplicated? If so, how would that be implemented? U+1F360 (talk) 15:50, 26 September 2019 (UTC)

Just mentioning here that the discussion continues on the property proposal talk page. Lea Lacroix (WMDE) (talk) 12:14, 30 September 2019 (UTC)

Del borked triples

phab: discussed here: it seems a simple thing to fix, but I don't think it has been done yet.

When querying, it can give confusing results (multiple triples appear where there should be just one).

Can you look into it? --- Jura 11:26, 29 September 2019 (UTC)

Thanks for letting us know. Could you try editing one of the items, then run the query a few hours later? It may be cached somehow.
If this doesn't solve the problem, we can consider reloading the Query Service with a new dump - I'm not sure how long this would take. Lea Lacroix (WMDE) (talk) 12:31, 30 September 2019 (UTC)
Stas used to have a procedure to re-load individual entities to the Query Servers in case there is something wrong with them. He used it occasionally when edits were missing from WDQS servers for whatever reason. It seems to be the best idea if that script was used here, but we cannot do it as simple Wikidata users. Mind that we cannot do "nulledits" on entities, and finding a possible change for all entities seems not adequate here. --MisterSynergy (talk) 13:15, 30 September 2019 (UTC)
I agree, you can't really expect users to edit tens of thousands of items. Here are some:
SELECT * { { ?st wikibase:timeCalendarModel wd:P1985727 } UNION { ?st wikibase:timeCalendarModel wd:P1985786 }  }
Maybe talk to Lucas Werkmeister, supposedly sitting next to you. I think he tracked down much of the issue. --- Jura 13:59, 30 September 2019 (UTC)
SELECT ?st (str(?unit) as ?str_unit) 
{
    ?st wikibase:quantityUnit ?unit .
    FILTER( strstarts( str(?unit), "http://www.wikidata.org/entity/P" ) )
}
Here is a second query with triples to delete. It doesn't solve it entirely, but deleting these two groups would improve the situation for critical details and might be fairly simple to do.
A point to check might be if the code is really fixed. The reported problem with P7295 is recent: the property was created on 8 September 2019‎ only. --- Jura 17:22, 30 September 2019 (UTC)

Inability to edit or delete "also known as" field entry if the text overflows the visible space

At Warren G. Hildenbrand, Jr. (Q69343171), for example, I no longer have the ability to edit or delete "also known as" field entries if the text overflows the visible space. This has been for about a month. It is independent of logged in/out and browser independent. If I ask another editor, they can edit/delete. --RAN (talk) 15:40, 30 September 2019 (UTC)

Hello @Richard Arthur Norton (1958- ):
Thanks a lot for reporting it! I can indeed reproduce it: long aliases are not wrapping, in both read and edit mode. I created a task.
I also tried editing a long alias in the sandbox. I can edit the hidden part by clicking in the field, then moving the cursor with my right arrow key until I see what I want to change. Is this workflow working for you? I acknowledge that it's not the best, but it can work until we fix the issue. Cheers, Lea Lacroix (WMDE) (talk) 10:06, 7 October 2019 (UTC)
I can't click in the field at all! Thanks for helping me. --RAN (talk) 12:29, 7 October 2019 (UTC)
I see, we will try to reproduce it on our side. Can you provide more information about your setup: OS, browser, any script or plug-in you use that could interact with JavaScript?
In the meantime, if you want to edit an alias, you can use the special page https://www.wikidata.org/wiki/Special:SetLabelDescriptionAliases/Q4115189/en or even try our brand new mobile edition interface :) Lea Lacroix (WMDE) (talk) 15:23, 7 October 2019 (UTC)
  • Thanks, it looks like you just fixed it, you can now click on aka fields, that overflow with text, to edit or delete them. --RAN (talk) 18:45, 7 October 2019 (UTC)

Suggestion to extend P1630 and other formatters with "capture regex" qualifier

There's a suggestion at Wikidata:Property proposal/urn formatter that it would be nice to extend formatter URL (P1630) and the new proposed property by being able to additionally specify a "capture regex" qualifier, that would allow more sophisticated regex capture groups to be specified, that could then be included in P1630 statements as $2, $3 etc.

A comment from the development team would be useful as to whether this extension of functionality for formatter URL (P1630) and friends would be easy enough to implement, and any thoughts for/against its desirability, given that these url formatters are some of our most important properties for downstream re-users, eg formatter URI for RDF resource (P1921) for the Linked Open Data community. Thanks, Jheald (talk) 07:59, 8 October 2019 (UTC)

@Jheald: I don’t think that would be easy to implement, because we have no way to safely evaluate user-specified regexes in PHP. The WikibaseQualityConstraints extension uses the query service for this (which is why format constraint (Q21502404) is among the slowest constraint types to check), but in Wikibase itself we can’t rely on the existence of a query service (Beta and Test Wikidata don’t have one, for instance). --Lucas Werkmeister (WMDE) (talk) 12:20, 8 October 2019 (UTC)
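
To make the proposed semantics concrete, here is a small Python sketch of how a "capture regex" could feed $1, $2 etc. into a formatter URL. It only illustrates the suggestion: the function name format_url is hypothetical, the plain $1 substitution is a simplification of what formatter URL (P1630) does today, and nothing here addresses the concern raised above about safely evaluating user-specified regexes (Python's re module has no built-in timeout either).

```python
import re


def format_url(formatter, value, capture_regex=None):
    """Fill $1, $2, ... in `formatter` from `value`.

    Without a capture regex, $1 stands for the whole value (simplified
    current behaviour). With one, $N stands for capture group N.
    Returns None when the value does not match the regex.
    """
    if capture_regex is None:
        return formatter.replace("$1", value)

    match = re.fullmatch(capture_regex, value)
    if match is None:
        return None

    def fill(placeholder):
        number = int(placeholder.group(1))
        if number > match.re.groups:
            # Unknown group number: leave the placeholder untouched.
            return placeholder.group(0)
        # Optional groups that did not participate become empty strings.
        return match.group(number) or ""

    return re.sub(r"\$(\d+)", fill, formatter)
```

For example, with the made-up formatter "https://example.org/$1/$2" and the capture regex (\d{4})-(\d+), the value "2019-42" would expand to "https://example.org/2019/42".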

Primary Sources tool down

After a lot of grant money went into developing the primary sources tool, it seems down at the moment. What's the current status? Will it come back online again in the future? ChristianKl❫ 07:59, 8 October 2019 (UTC)

I just pinged its author. Lea Lacroix (WMDE) (talk) 11:01, 14 October 2019 (UTC)

Transliteration of labels

At nowiki the community has chosen to use transliterated names for persons that otherwise can't be readily written in Norwegian. They have made a gadget for this to make it easy to transliterate a name on Wikipedia. Still the name has to be manually written into the Wikidata entry. That isn't so much of a hurdle for a single entry, but updating family members and similar could be a real workload.

ICU has a pretty decent transliteration engine, and PHP has an extension for this. So, I guess the question is pretty simple; what about adding transliteration for labels until it is necessary to fill them in manually? Jeblad (talk) 22:21, 11 October 2019 (UTC)

Thanks for reporting this use case.
Adding new fallback rules is quite complex because transliteration depends on the target language, on the type of entities, etc. That's why we think this could be done by a bot, but not directly in the interface software. Lea Lacroix (WMDE) (talk) 11:02, 14 October 2019 (UTC)

Search is very poor at mis-spellings

The search function seems very poor at presenting possible items, if the search text has been slightly mis-spelt (or spelt slightly differently to the exact labels or aliases we have) -- for example if I type one character wrong, or mistype one extra character in my search request, WD search will typically fail to find corresponding items.

Search should return near matches, not just exact matches.

Search on wikipedias seems to handle this much better, and to be much more robust at returning possible near matches. Can we investigate why?

This is such an important issue for helping people find the right items or properties, and to reduce the creation of duplicates. Jheald (talk) 12:33, 12 October 2019 (UTC)

There could be two issues here. If you are referring to the "Did you mean" suggestions in Special:Search, they are based on the page titles; this search functionality was designed for classic wikis and has not yet been fully integrated with the Wikidata data model. This is the reason the feature has been disabled. There are multiple challenges to make this happen, and thus I've created a task to follow up on this. If you are referring to search-as-you-type, the problem is slightly similar: classic wikis benefit from the mw:Extension:CirrusSearch/CompletionSuggester, which is not enabled on Wikidata. Enabling the completion suggester on Wikidata is slightly harder due to the size of this wiki and would require significantly more hardware. DCausse (WMF) (talk) 12:51, 15 October 2019 (UTC)

Tools & features down

It seems to me that a couple of tools & features are down. As already mentioned above, the primary sources tool is not working. Additionally, I tested nameGuzzler on different PCs: it no longer works (error message: notoken). Also, the copy-insert references functionality is gone. Probably more tools & features are affected. Florentyna (talk) 09:29, 14 October 2019 (UTC)

There seem to be some issues with https://tools.wmflabs.org/ at the moment, so some tools may be impacted. I don't know much more but I will have a look. Lea Lacroix (WMDE) (talk) 11:03, 14 October 2019 (UTC)

be-x-old has been renamed as be-tarask

Hello,

be-x-old Wikipedia has been renamed as be-tarask, but wikidata interwikis still contain be_x_old and I see no way to change it.

Bots are complaining: WARNING: pywikibot-core/pywikibot/site.py:1897: UserWarning: Site wikipedia:be-tarask instantiated using different code "be-x-old"

Is there something planned to have this changed globally in wikidata?

Thank you for your help.  – The preceding unsigned comment was added by Vargenau (talk • contribs).

Hello @Vargenau: Thanks a lot for reporting the issue! I just created a ticket; we will investigate this issue. Lea Lacroix (WMDE) (talk) 13:55, 15 October 2019 (UTC)
Thank you, Léa! Vargenau (talk) 14:15, 15 October 2019 (UTC)