Wikidata:Contact the development team/Archive/2018/11

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Time-series properties - Wikidata UI performance

Dear development team, we are discussing on the WD project chat the modeling of statistical properties with time series, such as nominal GDP (P2131). One user mentioned here that Wikidata may not be well suited to time-series data. I have made an estimate for already available properties and for new (not yet proposed) properties for which we have lists in Wikipedia. For example, for United States of America (Q30):

  • If we import data for all available properties for the USA (population, inflation rate, etc.), as we have already done for nominal GDP (P2131), then we would have ca. 1,650 new statements (old values for these properties remain as deprecated)
  • If I also count statements for new (not yet proposed) properties that are used in Wikipedia, then I come to ca. 1,800 new statements in total

This is just a rough estimate, without counting other new properties that could be added in the future.

Can the Wikidata UI handle this? --Datawiki30 (talk) 20:27, 6 November 2018 (UTC)

Are we talking about 1800 statements on one item? Then Special:LongPages and https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel-statements?refresh=30m&orgId=1 say yes, it is possible. Will it make you happy? Very unlikely - especially on high-profile pages that get visited a lot. As has been said on Project chat, Wikidata isn't very good at this kind of high-granularity time-series data. It's not what it's made for. (And it can't be good for everything. We have to make trade-offs.) --Lydia Pintscher (WMDE) (talk) 13:04, 9 November 2018 (UTC)
Hi Lydia, and thank you for your answer. Where is the bottleneck: the database used, the data model, or the UI? If the UI is the bottleneck, wouldn't it be possible to first query the total number of statements and then decide: a) if the number of statements is below some threshold x, the current UI is used, and b) if the number of statements exceeds x, only a few statements per property are shown and an AJAX call (or something like that) is used to load further statements on demand? Could this help? --Datawiki30 (talk) 22:10, 9 November 2018 (UTC)
In the future possibly yes. But it's not something we can do anytime soon unfortunately. --Lydia Pintscher (WMDE) (talk) 14:27, 10 November 2018 (UTC)
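
For illustration, a rough sketch of the counting step described above: fetch an item's JSON and total its statements before deciding how to render them. This is only a sketch; the threshold of 1000 is an arbitrary placeholder, not anything the current UI does.

  # Rough sketch: count the statements on an item via the public EntityData JSON,
  # then apply a hypothetical threshold for switching to a collapsed/paged view.
  import requests

  def statement_count(qid):
      """Return the total number of statements on the given item."""
      url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
      entity = requests.get(url).json()["entities"][qid]
      return sum(len(statements) for statements in entity.get("claims", {}).values())

  THRESHOLD = 1000  # arbitrary illustrative cut-off, not a real UI setting

  count = statement_count("Q30")
  print(count, "statements:", "collapse per property" if count > THRESHOLD else "render as today")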

Unhelpful error message returned by some Lua functions

I suppose it has already been discussed, but I can't find it. The most annoying thing is that it does things like this:

  • mw.wikibase.getEntityObject("QXX") -> "The ID "QXX" is unknown to the system. Please use a valid entity ID."
  • mw.wikibase.getAllStatements("Q1", " P31") -> "failed to serialize data"

I think the first of those messages has been slightly improved over a previous version, but still... The really annoying thing is that there is no backtrace from inside the Lua code, as there is for other Lua errors. -Zolo (talk) 14:52, 10 November 2018 (UTC)

Wikidata:Development plan

@Lea I think Wikidata:Development plan needs some curation... --Succu (talk) 20:38, 20 October 2018 (UTC)

Thanks for the reminder. We're currently working on the plan for 2019 and next, as soon as I have information that I can share, I will update the page. Lea Lacroix (WMDE) (talk) 07:20, 21 October 2018 (UTC)

@Lydia Pintscher (WMDE): Thanks. I didn't comment earlier, as it's actually a complicated stage for Wikidata.

  1. Bearing in mind that most basic features are in place, the question is now how to ensure that Wikidata can grow into different magnitudes. We need to ensure we are able to do that in terms of capacity (server capacity, etc., and tools to import and edit) and in terms of being able to handle quality.
  2. I can't really comment on what's technically needed to grow into another magnitude, but users (like me) should come to a good understanding of the other points. I will try to comment on this aspect another day. Sync with other databases could be a feature that helps with that.
  3. Of the basic features, maybe time precision, quantity range and astronomical coordinates could be addressed.
  4. For interested Wikipedias, there should be a clear way to ensure infoboxes and annotated lists can be used.

Some of the points in the plan might take a few months to realize; for others we might just see a step within the next year. Obviously, there are always some tweaks to the GUI and other things that come in, but these might not necessarily be key. --- Jura 05:58, 14 November 2018 (UTC)

@Lydia Pintscher (WMDE): Thanks for posting the plan. Like Jura, I also believe that Wikidata has most basic features in place, and I hope that it can be used as a promotional tool outside of the Wikimedia world. For instance, it would be nice to establish contacts with the corporate world and start some kind of pilot project (like Wikidata:FactGrid was for the humanities) to see what the needs of the projects they might envision are, so that eventually there are more Wikibase users.

I find some of the tasks in the roadmap a must (Citoid, easier queries, federation, client editing, and mobile web support), but there are other tasks that could be there as well, for instance building the tools to enable Wiktionaries to reuse the lexicographical data generated or imported here. It is also not very clear which data Wiktionaries might need (I suppose initially translation lists), or even how to navigate from items to Lexemes.

I would also appreciate it if there were some long-term investment in research. The Wikidata structure has limits in representing knowledge, and if some day we want to achieve an "abstract Wikipedia", we'll need to think beyond yearly plans. --Micru (talk) 21:54, 14 November 2018 (UTC)

Current rate limits

Hey dev team, after all the trouble with dispatching throughout the past year, there were edit rate limits in place until recently (IIRC 80 or 90 edits per minute in times of low maxlag). Is this still the case? (Probably not, as User:QuickStatementsBot has operated at 180/min on average over the past 24 hours [1].)

So I’d like to ask which limits apply at the moment, and where I can look up this value at any time. I’m considering speeding up my bot account as well, but I do not want to run into trouble at all… —MisterSynergy (talk) 16:30, 8 November 2018 (UTC)

@Ladsgroup: Can you help? Ideally with a link so people can look it up themselves in the future? <3 --Lydia Pintscher (WMDE) (talk) 13:07, 9 November 2018 (UTC)

Hey, the way to handle it is to just respect maxlag: stop editing once the maxlag is more than 5 "seconds". Hope this helps. Amir (talk) 14:05, 9 November 2018 (UTC)

Thanks! So "No limits" in case everything runs smoothly :-)
However, are you sure about "5 seconds"? It used to be 5 minutes before the value was recently reduced to the current range; WD:Bots still talks about "60" (seconds), which has always been an outdated value of course. —MisterSynergy (talk) 14:57, 9 November 2018 (UTC)
I think the documentation was misunderstood. You need to provide maxlag=5 to the API when making an edit; the API returns an error when the server lag is actually higher than the value you sent (in this case 5). You then need to wait for some time and try again. The 5 can be seconds of replication lag (between master and replicas) or minutes of dispatch lag between Wikidata and the median of its clients (so maxlag=5 returns an error if either of these two happens). Amir (talk) 17:16, 9 November 2018 (UTC)
Okay, thanks, I think I understand most of it. As I am using pywikibot in its standard configuration, I should be on the safe side anyway. I have observed auto-throttling due to high server load several times in the past. —MisterSynergy (talk) 17:58, 9 November 2018 (UTC)
@Amir: If maxlag=5 is a requirement, why does this parameter exist at all? --Succu (talk) 21:04, 10 November 2018 (UTC)
User:Succu: Technically, we want to give as much freedom as possible. In other words, we let users/the community decide how to handle the pressure. Amir (talk) 14:54, 14 November 2018 (UTC)
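
For reference, a minimal sketch (not pywikibot; authentication and edit tokens omitted) of the maxlag retry pattern described above: send maxlag=5 with every request and back off whenever the API answers with a maxlag error.

  # Minimal sketch of the maxlag retry pattern; only the maxlag error is handled.
  import time
  import requests

  API = "https://www.wikidata.org/w/api.php"

  def api_request(session, params, retries=5, backoff=5):
      """Call the action API with maxlag=5, backing off while the servers report lag."""
      params = dict(params, maxlag=5, format="json")
      for _ in range(retries):
          response = session.post(API, data=params)
          reply = response.json()
          if reply.get("error", {}).get("code") != "maxlag":
              return reply
          # Lag is too high (replication or dispatch): wait, then try again.
          time.sleep(int(response.headers.get("Retry-After", backoff)))
      raise RuntimeError("API still lagged after several retries")

  with requests.Session() as session:
      print(api_request(session, {"action": "wbgetentities", "ids": "Q42", "props": "labels"}))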

Root category - PT Wikipedia dump

Hi!

I am developing a Natural Language Processing (NLP) project and using Hadoop/MapReduce/Google Cloud to process wiki dumps. I've used the Wikipedia Miner project (https://github.com/dnmilne/wikipediaminer/wiki) to extract the English dump file (https://dumps.wikimedia.org/enwiki/20181020/enwiki-20181020-pages-articles-multistream.xml.bz2). Wikipedia Miner needs a configuration file specifying the Wikipedia root category to extract. I used the category Contents and it works fine! My real problem happens when trying to process the Portuguese dump (https://dumps.wikimedia.org/ptwiki/20181020/ptwiki-20181020-pages-articles-multistream.xml.bz2). I used the category Conteúdo as the root category and it DOESN'T work. I tried several changes and combinations for the Portuguese root category (Conteúdos, Categoria:Conteúdos, Artigos, Conceitos, etc.) and nothing worked. Could someone help me, please? I need this dump extraction for my PhD research.

Valtemir - PhD research, valtemir.alencar@gmail.com

@Hoo_man: Can you help here? :) Lea Lacroix (WMDE) (talk) 10:47, 24 November 2018 (UTC)

True duplicates clean up?

Maybe we could do another check for true duplicates?

I hadn't come across them in a while, but here is another recent pair:

  Done --Liuxinyu970226 (talk) 01:32, 5 October 2018 (UTC)

@Adam Shorland (WMDE): Could you have a look please? Also see phabricator:T44325. --Lydia Pintscher (WMDE) (talk) 15:08, 13 October 2018 (UTC)

A new pair. Q57422402 Q57422403 (Please don't merge). --- Jura 03:18, 21 October 2018 (UTC)

Another: Q57549898 Q57549899 --- Jura 12:56, 25 November 2018 (UTC)

Misalignment of SPARQL

I found a lot of items that were deleted some days ago, but the query service still returns these elements in the results. For example, in this list generated today, you can find many items deleted as long as 10 days ago. Is it possible to solve this? --ValterVB (talk) 18:43, 6 November 2018 (UTC)

@Smalyshev (WMF): Any idea what's going on? --Lydia Pintscher (WMDE) (talk) 12:59, 9 November 2018 (UTC)
Looks like some updates have been missed, I'll take a look. Smalyshev (WMF) (talk) 19:09, 11 November 2018 (UTC)
Any news about this? I think the problem continues; I am finding items deleted 2 days ago. --ValterVB (talk) 19:07, 14 November 2018 (UTC)
@ValterVB: Working on it. See phab:T210044 for updates. Smalyshev (WMF) (talk) 20:46, 26 November 2018 (UTC)
@Smalyshev (WMF): Have you checked? The problem persists, and the number of deleted items in Query Service continues to increase. --ValterVB (talk) 09:31, 24 November 2018 (UTC)
I support this request; there are several deletion-related worklists I work with which are populated by deleted items that don’t go away for weeks meanwhile. —MisterSynergy (talk) 20:22, 25 November 2018 (UTC)
Looks like we have a breakage in delete reporting: phab:T210451. As soon as this is fixed, I'll manually re-update all deletes, and then deletes should work better again. Smalyshev (WMF) (talk) 22:33, 26 November 2018 (UTC)
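
As a side note, a quick way to check a single suspect item is an ASK query against the query service. A small sketch; Q4115189 (the sandbox item) is only a placeholder for an item deleted on-wiki.

  # Sketch: ask WDQS whether an item still has triples, e.g. to spot items that
  # were deleted on-wiki but not yet removed from the query service.
  import requests

  WDQS = "https://query.wikidata.org/sparql"

  def still_in_wdqs(qid):
      """Return True if the query service still holds triples for the item."""
      query = f"ASK {{ wd:{qid} ?p ?o }}"
      reply = requests.get(WDQS,
                           params={"query": query, "format": "json"},
                           headers={"User-Agent": "stale-item-check/0.1 (example)"})
      return reply.json()["boolean"]

  print(still_in_wdqs("Q4115189"))  # substitute the ID of a deleted item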

Constraint violations in the query service

Being able to query constraint violations in the query service sounded really cool, but I've never been able to use it for anything because it only returns a tiny fraction of the actual violations (e.g. for one constraint I checked, it only finds 138 when there are really 36000+). When is it going to start returning all of them? - Nikki (talk) 10:02, 27 November 2018 (UTC)

Hey Nikki, we're working on it :) See phab:T204031 and its subtasks. Unfortunately I can't give you a precise date of deployment for now. Lea Lacroix (WMDE) (talk) 13:11, 27 November 2018 (UTC)
Some more background info after grilling the devs. What still needs to happen: write a script to create some items on the test systems, make this work on both beta and test Wikidata, and change a config variable. I hope we can still get this done before Christmas. --Lydia Pintscher (WMDE) (talk) 17:04, 27 November 2018 (UTC)
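
For context, a sketch of the kind of query being discussed, counting violation results per constraint. It assumes the wikibase:hasViolationForConstraint predicate documented for the constraints extension is the one exposed in WDQS, and it will of course only count the incomplete results that are currently there.

  # Sketch: count constraint violation results per constraint in WDQS.
  # Assumption: violations are exposed via wikibase:hasViolationForConstraint.
  import requests

  WDQS = "https://query.wikidata.org/sparql"
  QUERY = """
  SELECT ?constraint (COUNT(?statement) AS ?violations) WHERE {
    ?statement wikibase:hasViolationForConstraint ?constraint .
  }
  GROUP BY ?constraint
  ORDER BY DESC(?violations)
  LIMIT 10
  """

  reply = requests.get(WDQS, params={"query": QUERY, "format": "json"},
                       headers={"User-Agent": "constraint-count/0.1 (example)"})
  for row in reply.json()["results"]["bindings"]:
      print(row["constraint"]["value"], row["violations"]["value"])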

Unstable number of senses in lexemes

I have observed a strange thing. In this report, the number of English senses floats up and down regardless of real progress. @ArthurPSmith: is adding senses nearly every day, and the number should always be increasing. I observed it last month and marked in red in the table when the number went down: https://ibb.co/mn2oLV . Today it happened again. I feel there is something to seriously worry about, but I'm not experienced enough to name it. KaMan (talk) 10:50, 27 November 2018 (UTC)

Hello, thanks for reporting and tracking this! We're going to investigate. Maybe there's nothing to worry about, and some Lexemes are just deleted as duplicates or something similar. I'm looking for more detailed data. Lea Lacroix (WMDE) (talk) 13:15, 27 November 2018 (UTC)
@Lea Lacroix (WMDE): Thanks. I doubt it's about deleting duplicates. It's a decrease of 2749-2659=90 in six hours. There are not that many duplicates, English lexemes are mostly the one-man work of ArthurPSmith, and he says he is mainly adding content. KaMan (talk) 13:26, 27 November 2018 (UTC)
Here's a query reporting the deleted Lexemes. Indeed, there are not many. I created a ticket. The issue could be related to a problem existing these days with the Query Service not returning the proper number of results, and as far as I know Listeria uses the Query Service. Lea Lacroix (WMDE) (talk) 13:53, 27 November 2018 (UTC)
User:Liamjamesperritt has been adding English senses recently, and I believe some other users have as well. Not sure if one of them might also be deleting, but it sounds like not on this scale... ArthurPSmith (talk) 17:33, 27 November 2018 (UTC)
Strange. I have not deleted any senses (maybe one at one time or another, but not that I can remember). From my end, the number of English senses should just be increasing. Liamjamesperritt (talk) 18:20, 27 November 2018 (UTC)
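
One way to cross-check the Listeria numbers independently is to count English senses directly in the query service. A small sketch, assuming the standard lexeme RDF mapping (dct:language, ontolex:sense):

  # Sketch: count senses of English lexemes straight from WDQS, to compare with
  # the numbers in the Listeria report discussed above.
  import requests

  WDQS = "https://query.wikidata.org/sparql"
  QUERY = """
  SELECT (COUNT(?sense) AS ?senses) WHERE {
    ?lexeme dct:language wd:Q1860 ;   # Q1860 = English
            ontolex:sense ?sense .
  }
  """

  reply = requests.get(WDQS, params={"query": QUERY, "format": "json"},
                       headers={"User-Agent": "sense-count/0.1 (example)"})
  print(reply.json()["results"]["bindings"][0]["senses"]["value"])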