
Wikidata:Contact the development team



Wikidata development is ongoing. You can leave notes for the development team here, on #wikidata connect and on the mailing list or report bugs on Phabricator. (See the list of open bugs on Phabricator.)

Regarding the accounts of the Wikidata development team, we have decided on the following rules:

  • Wikidata developers can have clearly marked staff accounts (in the form "Fullname (WMDE)"), and these can receive admin and bureaucrat rights.
  • These staff accounts should be used only for development, testing, spam-fighting, and emergencies.
  • The private accounts of staff members do not get admin and bureaucrat rights by default. If staff members desire admin and bureaucrat rights for their private accounts, those should be gained going through the processes developed by the community.
  • Every staff member is free to use their private account just as everyone else, obviously. Especially if they want to work on content in Wikidata, this is the account they should be using, not their staff account.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2019/09.



Open sourcing Wikidata

What's the status on phabricator:T226735? I think it really hinders editing that Wikibase doesn't provide the source.

Somehow I think we missed that Wikibase implemented a closed source approach for this. --- Jura 07:29, 29 August 2019 (UTC)

The task has not been picked up yet. I'll try to get someone from the team to have a closer look at it. Lea Lacroix (WMDE) (talk) 09:30, 29 August 2019 (UTC)
Thanks. --- Jura 09:32, 29 August 2019 (UTC)

@Lea Lacroix (WMDE): what's the outcome? If I understand correctly, the issues are:

  1. editing isn't efficiently possible as the original string isn't made available
  2. searching isn't possible on query server or Special:Search as the string isn't indexed
  3. users who need query server to retrieve the original string currently can't do so

This might have led to the datatype being the only one where the overall number of statements is regressing. The design considerations are:

  • currently the content isn't open source as users can't retrieve the original string
  • new prefixes in RDF would be a solution specific to this datatype (similar to other datatypes that have special prefixes). A separate triple is the standard way of storing data.
  • including it in the MATH markup would require a bespoke way of retrieving the string and would build in a dependency on possible changes to that markup. Retrieval may be inefficient or break on each update to the MATH markup

As long as 1, 2 and 3 are solved, I'm obviously indifferent to how we get there, but I don't want to see it break on every update. --- Jura 08:52, 5 September 2019 (UTC)

Your assumptions are all correct. We will be working on this task over the next weeks. Lea Lacroix (WMDE) (talk) 11:44, 6 September 2019 (UTC)
Sounds good. Beyond the use case mentioned in the ticket, it could be interesting to look at two others:
  1. title in LaTeX (P6835) is available when the original title (P1476) is hard to read or unreadable without LaTeX markup. We have a fairly large number of such titles that need conversion. When retrieving titles on the query server, I don't think the entire markup is needed, and the raw string should be available.
  2. in defining formula (P7235) qualifiers and values will make it possible to identify various elements within a defining formula (P2534) value. To build and check these, I think the raw string should be used. Sample uses at Q862169#P2534.
From a dev point of view, I suppose the question is whether one wants to spend time on import/export functions for math markup or simply on adding the triple.
Hope this helps. --- Jura 17:22, 9 September 2019 (UTC)

Adding a label that is already in use

I tried to add צבי to the Hebrew (he) label of Zvi (Q231342)

I got this error message: "Could not save due to an error. Item Tzvi (Q55362931) already has label "צבי" associated with language code he, using the same description text."

What should be done in such a case? The word in Hebrew is the same word, but some people transliterate it Zvi and some Tzvi. Thanks, Uziel302 (talk) 13:52, 30 August 2019 (UTC)

@Harmonia Amanda: do you have an idea? Lea Lacroix (WMDE) (talk) 08:55, 2 September 2019 (UTC)
I cleaned up the descriptions so it would be clear that Zvi (Q231342) is about the Latin-script name and Tzvi (Q55362931) is about the Hebrew name. I added the other transliteration as an alias for that one. --Harmonia Amanda (talk) 08:58, 2 September 2019 (UTC) edit:I also moved the sitelinks, which were all speaking of the Hebrew name, not the Latin-script one. --Harmonia Amanda (talk) 09:00, 2 September 2019 (UTC)
  • BTW, it's a standard feature of Wikidata to require a different description text. --- Jura 09:30, 2 September 2019 (UTC)

SPARQL query OFFSET by ?var (instead of number)

SELECT * 
WITH
{ 
  SELECT (COUNT(*) as ?all) (xsd:integer(ROUND(RAND() * ?all)) as ?test) WHERE { ?item wdt:P31 wd:Q11424 }
} as %all
WHERE
{
    INCLUDE %all 
    ?item wdt:P31 wd:Q11424 
}
OFFSET ?test
#OFFSET 25341
LIMIT 10

Try it!

To get a random set of ten items, I used the above query. OFFSET is determined by RAND() to avoid getting the same ones every day. However, it gives me the following error:

MalformedQueryException: Encountered " <VAR1> "?test "" at line 11, column 8. Was expecting: <INTEGER> ...

It would be good if this were possible. --- Jura 12:16, 2 September 2019 (UTC)

  • It looks like the requirement comes from the SPARQL standard. Obviously, WMF Blazegraph could still support it.
Is there another way to create somewhat random selections directly on query server? Please don't mention SAMPLE().
It could be done by calculating the offset in a separate step, or by retrieving all data and selecting afterwards, but doing it within the query would be much more convenient. Any suggestions? --- Jura 10:50, 8 September 2019 (UTC)
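One way to do the "separate step" approach is to run a COUNT query first, compute a random offset client-side, and then send a second query with a literal OFFSET baked into the query text. A minimal Python sketch; the function names are illustrative and the User-Agent value is a placeholder:

```python
import json
import random
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"  # public SPARQL endpoint

def build_offset_query(where_clause: str, offset: int, limit: int = 10) -> str:
    """Blazegraph only accepts an integer constant after OFFSET,
    so the random value has to be written into the query text."""
    return f"SELECT ?item WHERE {{ {where_clause} }} OFFSET {offset} LIMIT {limit}"

def random_offset(total: int, limit: int = 10) -> int:
    """Pick an offset so the window [offset, offset + limit) stays in range."""
    return random.randint(0, max(total - limit, 0))

def run_query(query: str) -> dict:
    """Send a query to WDQS and decode the JSON result."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "example-tool/0.1 (mailto:user@example.org)"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A first `run_query` with `SELECT (COUNT(*) AS ?all)` would supply `total`; the second call then uses `build_offset_query` with the computed offset.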
@Jura1: You may be interested in this approach:
SELECT * WITH { 
  SELECT DISTINCT ?item WHERE { ?item wdt:P279 wd:Q18602249 }
} AS %all WITH {
  SELECT (COUNT(*) AS ?count) { INCLUDE %all }
} AS %count WITH {
  SELECT (?item AS ?item2) WHERE { INCLUDE %all }
} AS %all2 WITH {
  SELECT ?item (SUM(?x) AS ?i) WHERE {
    INCLUDE %all .
    INCLUDE %all2 .
    BIND( IF( STR( ?item ) > STR( ?item2 ), 1, 0 ) AS ?x ) .
  } GROUP BY ?item
} AS %main WHERE {
  INCLUDE %main .
  INCLUDE %count .
  BIND( FLOOR( RAND() * ?count ) AS ?rnd ) .
  FILTER( ?rnd = ?i ) .
}
Try it!
You need to associate each entry with a unique number. It can be the number of items which are "lower". But it only works with smaller datasets. Matěj Suchánek (talk) 18:58, 11 September 2019 (UTC)
I think it does exactly the same (with much more computing). I could run it on ≈3000 items: Wikidata:WikiProject Movies/reports/random/film is a good start. --- Jura 19:49, 11 September 2019 (UTC)
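The indexing trick in the query above (number each item by how many items sort lower, then draw one index at random) can be illustrated outside SPARQL. A toy Python sketch, assuming distinct values:

```python
import random

def pick_random(items):
    """Number each entry by how many entries sort lower than it
    (yielding a unique index in 0..n-1 when values are distinct),
    then draw one index at random and return the matching entry."""
    indexed = {item: sum(other < item for other in items) for item in items}
    rnd = random.randrange(len(items))
    return next(item for item, i in indexed.items() if i == rnd)
```

As in the SPARQL version, this compares every entry against every other entry, so the cost grows quadratically with the dataset size.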

possible new tool: VIAF identifier importer

Hi folks -- I'm not sure quite the right place to ask this, so if there's a better place to do so, please let me know.

I've been developing a tool to make it easier to import VIAF-linked identifiers into Wikidata. It's far enough along now that I'd like to get your input on the utility of the tool and how we might leverage it to be useful for the Wikidata community as a whole. Right now, it's hosted on my own web space. I'd prefer not to link to it publicly, so I've made a quick video showing how it works.

It looks up a Q-item and a VIAF ID, then looks at all the other identifiers linked from VIAF. It formats them as necessary, validates them against the known format as a regular expression (P1793) associated with the identifier, and spits out the appropriate QuickStatements-formatted data for all the identifiers it validated.
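The validate-then-emit step could be sketched roughly as follows. This is a guess at the logic described above, not the tool's actual code, and the regex patterns passed in would be simplified stand-ins for real format as a regular expression (P1793) values:

```python
import re

def make_quickstatements(qid, identifiers, patterns):
    """Check each identifier value against its property's regex and
    emit QuickStatements (v1, tab-separated) lines for the ones that
    pass; unknown properties and failed checks are collected as errors."""
    ok, errors = [], []
    for prop, value in identifiers.items():
        pattern = patterns.get(prop)
        if pattern is None:
            errors.append((prop, value, "no rule for this property"))
        elif re.fullmatch(pattern, value):
            ok.append(f'{qid}\t{prop}\t"{value}"')
        else:
            errors.append((prop, value, "failed format check"))
    return ok, errors
```

The `ok` lines can be pasted into QuickStatements; the `errors` list corresponds to the items "noted as an error at the bottom" mentioned below.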

Not explained in the video is what happens to data that doesn't work out: if the tool doesn't know what to do with an identifier, or the identifier fails the regex check, it's noted as an error at the bottom and not put into the QS list.

Does this look like something that would contribute to the Wikidata world and that isn't already accomplished elsewhere? If so, what would be an appropriate development path for bringing it into production?

Some areas for development I have in mind include:

  • adding rules for handling some identifiers it currently doesn't know what to do with
  • there could be a bot that does the same thing for records that already have VIAF identifier (Q19832964) statements, bypassing QuickStatements and just adding the extra IDs automatically

What do you think?

Thanks - Kenirwin (talk) 19:55, 4 September 2019 (UTC)

Hello,
Thanks for your work! I think you should bring this discussion to the mailing list or the project chat page, where more people can give feedback about it :) In general, people autonomously develop tools and add them to the Wikidata:Tools page. Feel free to do the same. Lea Lacroix (WMDE) (talk) 10:42, 9 September 2019 (UTC)

Designing to avoid query limits

Hello all,

A Python project I am working on keeps getting HTTP code 429, which has resulted in a ban from the service. I continued to produce requests after the initial result in an (unsuccessful) attempt to figure out the problem (after reading the link here, I still don't understand how to use a header).

The goal of the project is to iterate through a list of wikidata items and retrieve the value for a preselected property of the items. (For example, it could grab the run times of a saved list of films)

Any advice on how to get the ban lifted, and especially on how I can design my project in order to avoid query limits (for instance, do I need to code in a pause after each iteration through the list?), would be greatly appreciated. Reason&Squalor (talk) 20:26, 4 September 2019 (UTC)

Hello, does your tool send a User-Agent header? Tools without one may be blocked on the Query Service. Lea Lacroix (WMDE) (talk) 10:46, 9 September 2019 (UTC)
I assume it doesn't. I've read the page you linked a few times (both when looking up the 429 and after your reply), and I can't figure out whether I have one, or how to add one if I don't. The tool is a Python file made following the basic Django tutorial. I'm not sure if this is relevant, but the tool was not initially blocked; it took a dozen or so uses to get the 429.

Thanks for your reply. Reason&Squalor (talk) 16:10, 10 September 2019 (UTC)

You can find an example here: m:User-Agent_policy. Lea Lacroix (WMDE) (talk) 07:02, 11 September 2019 (UTC)
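For a Python tool, setting the header amounts to attaching one line to each request. A minimal sketch using the standard library; the User-Agent string itself is only an example — per m:User-Agent_policy it should name your tool and give a way to contact you:

```python
import urllib.request

# Example User-Agent per m:User-Agent_policy: tool name/version plus
# contact information. Replace with your own details.
USER_AGENT = "MyWikidataTool/0.1 (https://example.org/mytool; mailto:me@example.org)"

def make_request(url: str) -> urllib.request.Request:
    """Attach the User-Agent header; requests without one may be
    rejected by the Query Service with a 403 or 429."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

With the `requests` library, the equivalent is `requests.get(url, headers={"User-Agent": USER_AGENT})`. Pausing between iterations (e.g. `time.sleep(1)`) and backing off when a 429 arrives also helps stay under the limits.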

mw.loadData and wikibase limit

I bumped into this more or less by accident, but what if wikibase made its own mw.loadData() with a higher load budget and better caching? Now mw.loadData() caches for the duration of one page, but what if we could cache for a longer time and invalidate the cache as necessary? It would make it possible to use higher load limits, because the computed data can be reused. The generated data could be stored as a pageprop blob, and the module generating the blob would have to track all items it depends upon, but that is pretty straightforward.

A sufficient criterion for the data set to be cached would be that it does not use implicit loading of a connected item. A necessary criterion for reusing the cached data would be that the page requesting the data has a current revision older than the timestamp of the cached data. That is necessary, but not sufficient, as some other item might have been updated without the data set being invalidated yet. (The timestamp might be different from the current revision if the pageprops table is manipulated, but it is probably better to do something like that in a separate table. Or memcached.)
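The reuse and invalidation rules described above can be modelled in a few lines. A toy Python illustration (the class and method names are invented for this sketch, not an existing MediaWiki API):

```python
class CachedBlob:
    """Toy model of a cached data blob that tracks the items it was
    computed from, so it can be dropped when any of them changes."""

    def __init__(self, data, depends_on, timestamp):
        self.data = data
        self.depends_on = set(depends_on)  # item IDs read while computing
        self.timestamp = timestamp         # when the blob was generated

    def reusable_for(self, page_revision_timestamp):
        """Necessary condition only: the blob must be newer than the
        requesting page's current revision (not sufficient on its own)."""
        return self.data is not None and self.timestamp >= page_revision_timestamp

    def on_item_change(self, item_id, change_timestamp):
        """Invalidate when a tracked item changed after the blob was built."""
        if item_id in self.depends_on and change_timestamp > self.timestamp:
            self.data = None
```

The gap between "necessary" and "sufficient" is visible here: `reusable_for` can return True until `on_item_change` has actually been delivered for every tracked item.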

It would solve problems like generating GDP indexes for countries and sorting the list to get a ranking, the flag lookup for sports articles, or listing ESC winners (and Nobel winners, etc.) in the individual articles. It is, though, most interesting in articles depending on some computation or special formatting of the data, for example when the GDP indexes must be expressed in a local currency. (Yes, this can be done in WQS.)

It would probably also make a very nice fit together with the Wikibase Query service, but it solves a slightly different problem. I'm not sure WQS can expose all traversed relations, and thus be able to invalidate the cached data. That could be a showstopper.

A function mw.wikibase.loadData() could actually make it (nearly) unnecessary to integrate the Wikibase Query service in Wikipedia. (Now I'm going to be unpopular, again…) Jeblad (talk) 21:57, 8 September 2019 (UTC)

@Jeblad: What exactly (statements of an entity?) would you load via that new method? Why are the current access methods not enough for these use cases (they are also, although only for a limited number, in-memory cached)? -- Hoo man (talk) 16:25, 16 September 2019 (UTC)
@Hoo man: This is typically pages that need a large fixed set of statements for comparison or presentation. Information about nationality for winners of a large sports contest, typically such things as flags. Comparing normalized demographic (also GDP, land use, etc.) statistics to create a ranking; those numbers are non-static and must be recalculated when the numbers for any country change. Rankings of mountain tops and lake areas can be put into the items, as this is information that will not change, although in those cases too it could be better to calculate the ranking in a separate module. A third use case would be to collect pages about numerical units, to do such things as getting the short forms, simplifying units into normalized short forms, or inverting unit names; this particular use case can be solved by autogenerating redirects and a few additional properties. A fourth use case is to build inheritance trees, but that should probably be solved explicitly. Right now we avoid at all costs traversing instance of (P31) and subclass of (P279), but properly checking the type hierarchy is important for creating navigation templates.
[rant] Navigational templates could perhaps be solved with mw.wikibase.getReferencedEntityId( fromEntityId, propertyId, toIds ), but I have not tried to reimplement any of the existing modules. It is probably (?) implemented slightly wrong. It should filter down the set of toIds into the ones that satisfy the constraint, not only return a random match. It should be two methods, one mw.wikibase.isReferencedEntityId and another mw.wikibase.filterReferencedEntityId.
The current blocker for creating a module like that is the limit of 500 requests. Unwinding information of all countries eats a good chunk of the load budget, and we are hard pressed to create pages that stay within the limit. Jeblad (talk) 18:33, 16 September 2019 (UTC)
@Jeblad: So what you want is to have a pre-computed (and cached) collection of statements, collected according to certain rules (probably by a Lua module that can be invoked stand-alone to collect this data into a table)?
Regarding mw.wikibase.getReferencedEntityId: If that is needed often, we could potentially create a similar function that checks for the presence of all ids and return those that are referenced. --Hoo man (talk) 18:22, 17 September 2019 (UTC)
@Hoo man: On the first part; yes, and it would be pretty close to Wikibase Query service integrated into Wikipedia. Most use cases will be better served with WQS, but some would be hard to implement with WQS alone. You might view this as a map-reduce problem, where map is implemented as WQS and reduce (or conditioning) as Lua code.
On the second part; what I did was to create standardized navigational templates for administrative areas. (The same thing happens for a lot of such templates.) Typically I have a set of statements from contains administrative territorial entity (P150), but must filter that down to real municipalities. Statements like Q103732#P150 contain weirdness like Dalarna County Council (Q3231325). It is not an administrative area, it is the administration itself. The entities from P150 must be filtered on type municipality of Sweden (Q127448), or even better municipality (Q15284) or second-level administrative country subdivision (Q13220204). Trying to traverse the hierarchy iteratively is awfully heavy, but it can be done much more efficiently with mw.wikibase.getReferencedEntityId, especially if it is implemented somewhat better. Just put in a flag to signal whether any hit is sufficient or whether filtering the whole set is necessary. [Seems like what I describe is in the opposite direction of what getReferencedEntityId does…] Jeblad (talk) 19:36, 17 September 2019 (UTC)

Mixup of ⓬ and ⑫. ⑿ is ok

Any idea why it's mixed up on query server: https://w.wiki/88J (the same appears in both columns) ?

It's correct on Q36977#P487.

Same for the others I checked. The ones in parentheses are fine. --- Jura 21:21, 9 September 2019 (UTC)

Is this a problem with server load or the export function? --- Jura 05:24, 12 September 2019 (UTC)
I had a look but couldn't find any explanation. Would you mind creating a Phab task so I can ping some people who could help? Lea Lacroix (WMDE) (talk) 10:55, 12 September 2019 (UTC)
I would have to find the password for the account I use there, and even if I did, I don't think there is much I could add, and it would probably not go anywhere. Pinging should work here too. --- Jura 16:10, 12 September 2019 (UTC)
I can try to delete and re-add them, but that might hinder checks on your side. I think the same happened with some aliases. Hope there is no bug in the export functions. Just figured out why you wouldn't ping Stas. Too bad. --- Jura 11:05, 16 September 2019 (UTC)
At the moment, there is no dedicated person at WMF working on the Query Service. Things should evolve in the next months. Thanks for your patience. Lea Lacroix (WMDE) (talk) 07:57, 17 September 2019 (UTC)
Can you create the ticket and liaise with the team? (AFAIK that is your role as community liaison). I suppose WMF has several full stack developers. --- Jura 10:02, 17 September 2019 (UTC)

What happened with "undo"?

It seems I can't find it any more when selecting multiple edits. I could click rollback, but that is generally meant for something else. --- Jura 13:53, 10 September 2019 (UTC)

Hello, is the issue still happening for you? I tried again and I see "undo". Lea Lacroix (WMDE) (talk) 13:59, 11 September 2019 (UTC)
Looks like it's back. That was quick! Thanks to all involved. --- Jura 18:20, 11 September 2019 (UTC)

phab:T209208, can this be deployed more quickly?

There are local agreements now. --Liuxinyu970226 (talk) 22:17, 12 September 2019 (UTC)

Thanks for the ping, we'll move forward with this as soon as our resources allow it. Lea Lacroix (WMDE) (talk) 11:55, 16 September 2019 (UTC)

Distinct page background colo(u)r based on P31 (for Q5)

In a discussion on project chat, @Simon Villeneuve: brought this up. I think it would be an interesting addition. Could we try one for items with instance of (P31)=human (Q5) to start with? Pick whatever color GUI designers suggest. --- Jura 14:08, 13 September 2019 (UTC)

I think this is a good idea for a user script, but not for a default feature for all users. Lea Lacroix (WMDE) (talk) 14:55, 13 September 2019 (UTC)
Another similar request: when an item has a dissolved, abolished or demolished (P576) statement, show a box or an icon saying "closed" or something like that... (perhaps asking for the moon :) ) Bouzinac (talk) 15:07, 13 September 2019 (UTC)
  • Re: Background color: maybe a gadget? We can then discuss if it should be on by default. The problem is that once users figure out how to use user scripts, they wouldn't really need it anymore.
    Looking at the HTML source of a page, I don't think there is any feature that could use CSS for it. Could we include some support? --- Jura 15:09, 13 September 2019 (UTC)
Lydia and I have been looking again at the original discussion. We acknowledge the issue about duplicates, but we don't think that adding a background color is a good way to solve it. Also, background color is not the kind of information that everyone can interpret correctly, and is not accessible for everyone. So we are not going to add this as a default feature. Lea Lacroix (WMDE) (talk) 11:09, 16 September 2019 (UTC)

Some automatic edit summaries are useless

Hello,
Automatic edit summaries are necessary to fight vandalism efficiently. But sometimes I see the automatic edit summary "‎Updated item" (which is useless). Why?
For instance:

  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (‎a change of a label) here.
  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (additions of a label and a description) here.
  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (‎an addition of an alias) here.
  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (‎a change of an alias) here.

Regards --NicoScribe (talk) 22:10, 13 September 2019 (UTC) → 18:58, 14 September 2019 (UTC)

Hello, these edits come from the new mobile termbox. We're currently working on improving these edit summaries, per phab:T220696: in the upcoming weeks, the new edit summaries will be slightly more explicit ("Changed label, description and/or alias in # languages") and more improvements will follow. Feel free to check edits again in a few weeks, and let me know if you still encounter this issue. Lea Lacroix (WMDE) (talk) 10:11, 16 September 2019 (UTC)

Line charts viz weirdness and feature request

Following a request in WD:Request a query to graph the number of members in an organisation

I find it weird that a query like this one does not work: it seems really similar in spirit to this line chart documentation example, except that the datatype of « year » is number instead of string. It seems that a sum is performed over all the lines of the table result, for a reason unclear to me. (It makes sense to treat it as a number; this makes it easier to treat missing information as gaps, without having to deal with the problem of, say, year 2000, 2010 and 2015 points being shown with equal distances if we have only 3 datapoints.)

Which leads to the feature request, motivated by a query request: it seems that everything works fine when keeping the date datatype instead of trying to extract the year, but… the intent was to display only the year on the chart, whereas the x-labels of the datapoints are displayed as 1-1-year, ignoring the precision of the date. Should this not display the month and day if the precision is « year »? This would avoid the need to extract the year of the date (as a number). author  TomT0m / talk page 21:06, 16 September 2019 (UTC) Now that I think about it, this is a stupid feature request in that case, as the precision is not available with the simple RDF values of statements. As this query does use simple values, I don't know how WDQS could be aware of the precision of the value…

  • On second thought, the feature request would be to simply display the year in the X axis label if there is no more than one value per year, or something like that. author  TomT0m / talk page 17:42, 17 September 2019 (UTC)