Wikidata:Contact the development team/Archive/2014/09

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Featured list badge

Here an unsupported badge item for featured lists is mentioned. What's the state of this one? Isn't it quite obvious that it's needed, as quite a few wikis use en:Template:Link FL to mark interwiki links? As far as I understand, it can't be replaced with the FA badge item, as it's needed to distinguish featured lists that aren't articles (or at least not proper articles). E.g., no badges have been imported for Q462671 yet and there doesn't seem to be a proper way to do it. 11:34, 30 August 2014 (UTC)

Please start a topic on Wikidata:Project chat saying you want it. If there is no objection within a few days we will add it. --Lydia Pintscher (WMDE) (talk) 16:19, 1 September 2014 (UTC)

Redirects to redirects

If one merges more than two items in the wrong order, one may create a redirect to a redirect. However, there is no simple way to edit the redirect to fix this. Are there any thoughts on that matter? I solved it very inelegantly here, faking a new merge through merge.js. Lymantria (talk) 07:45, 1 September 2014 (UTC)
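A minimal sketch of how a chain of redirects could be collapsed, assuming the redirects are available as a simple mapping from source item to target item. The function names and the merge history are hypothetical; this is not how Wikibase or merge.js handles it.

```python
def resolve_redirect(redirects, item_id):
    """Follow redirects until reaching an item that is not itself a redirect."""
    seen = set()
    while item_id in redirects:
        if item_id in seen:
            raise ValueError(f"redirect loop at {item_id}")
        seen.add(item_id)
        item_id = redirects[item_id]
    return item_id


def flatten_redirects(redirects):
    """Return a new mapping in which every redirect points at its final target."""
    return {source: resolve_redirect(redirects, source) for source in redirects}


# Hypothetical merge history: Q3 was merged into Q2, then Q2 into Q1.
redirects = {"Q3": "Q2", "Q2": "Q1"}
flat = flatten_redirects(redirects)
```

After flattening, both Q3 and Q2 point directly at Q1, so no redirect targets another redirect.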

We have bugzilla:69167 for that. --Lydia Pintscher (WMDE) (talk) 16:16, 1 September 2014 (UTC)

"format=txt has been deprecated"

Hello, today I found the warning "format=txt has been deprecated. Please use format=json instead" in an API response. My bot has used this format for 4 years already. This format works fine (at least on my side). Is it really necessary to deprecate it? How much time do I have to move my code to JSON? — Ivan A. Krestinin (talk) 14:25, 1 September 2014 (UTC)
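For bots facing this move, a rough sketch of what the JSON side could look like. The helper names are hypothetical, and the sample response body is made up and abbreviated; its exact shape is an assumption, so check a real response before relying on it. No network request is made here.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_request_url(params):
    """Build an API URL that always asks for JSON output."""
    query = dict(params, format="json")  # replaces any older format=txt
    return API_ENDPOINT + "?" + urlencode(query)


def extract_warnings(response_text):
    """Return the warnings object of a JSON response, or an empty dict."""
    return json.loads(response_text).get("warnings", {})


url = build_request_url({"action": "wbgetentities", "ids": "Q42", "format": "txt"})

# Made-up, abbreviated response body used only to exercise the parser.
sample = '{"warnings": {"main": {"*": "format=txt has been deprecated."}}, "success": 1}'
warnings = extract_warnings(sample)
```

Forcing the format in one helper means the rest of the bot never has to know which output format was requested.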

Hey :) That's something Wikimedia Core did I believe. Nothing specific to Wikidata. I don't know how long it'll stay around. Sorry. --Lydia Pintscher (WMDE) (talk) 16:15, 1 September 2014 (UTC)

New badges and missing categories in cawiki

Today, all categories of "featured articles in other languages" show empty in cawiki, and a wrongly named red category has appeared in ca:Categoria:Viquipèdia:Articles destacats en la Viquipèdia en German. I'm afraid it could be related to new badges. I don't see the same problem in other wikipedias except for astwiki, where such categories are empty, too.--Pere prlpz (talk) 08:58, 28 August 2014 (UTC)

Problem solved by reverting an inaccurate edit on local templates. Anyway, once the badges are available for Wikipedias, local templates like Link FA will be removed and there will be no categorization. --Vriullop (talk) 12:12, 28 August 2014 (UTC)
@Ladsgroup: Can you have a look please? --Lydia Pintscher (WMDE) (talk) 09:37, 29 August 2014 (UTC)
French Wikipedia has the same issue. They have categories that are being populated by these templates, so we can't remove them now. It's possible to write a Lua module to add the categories, but the question is: in which template do you want to invoke the module? I think they'll do it in Template:Portal in French Wikipedia. Amir (talk) 13:44, 29 August 2014 (UTC)
I think we should create a special page to query all badges on Wikidata and then we can drop the categories on the Wikipedias. -- Bene* talk 11:10, 30 August 2014 (UTC)
I'm analyzing a bot request in Portuguese Wikipedia to remove the {Link FA} from articles, and Ladsgroup told me about this problem. I can try to create a tool in Tool Labs to generate the lists, but I can't find where the badges are in the Wikidata database; they are not in the wb_items_per_site table. Does anyone know where the badges are in the database? Danilo.mac (talk) 02:37, 31 August 2014 (UTC)
Italian Wikipedia has the same issue, too. And probably every other wiki that has Category:Featured articles needing translation from foreign-language Wikipedias (Q8445069) and similar. --Ricordisamoa 00:46, 4 September 2014 (UTC)

27 char date strings

At some point, 27-char time values were added instead of the standard 28-char ones (like [1]). Is it OK now? Can I make a bot request to remove them? --Zolo (talk) 14:36, 3 September 2014 (UTC)
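A sketch of how a bot could detect and pad such values. It assumes the 28-character form is a sign, an 11-digit year and then -MM-DDThh:mm:ssZ, and that the short values are missing one leading zero of the year; neither detail is stated in this thread, so verify against real data first. The function name and sample string are hypothetical.

```python
import re

# Assumed layout: sign, year digits, then -MM-DDThh:mm:ssZ.
TIME_PATTERN = re.compile(r"^([+-])(\d+)(-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)$")
YEAR_DIGITS = 11  # assumption: yields 28 characters in total


def normalize_time_string(value):
    """Left-pad the year with zeros so the string has the assumed length."""
    match = TIME_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized time string: {value!r}")
    sign, year, rest = match.groups()
    return sign + year.zfill(YEAR_DIGITS) + rest


short = "+0000002014-09-03T00:00:00Z"  # hypothetical 27-character value
fixed = normalize_time_string(short)
```

Values that already have the assumed length pass through unchanged, so the check is safe to run over all time values.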

I am not aware of any remaining issues there. --Lydia Pintscher (WMDE) (talk) 15:53, 3 September 2014 (UTC)

Problem with file?

There is some problem with the images. If you look at the main page, for example, all the images are missing and namespace 6 is empty. --ValterVB (talk) 20:50, 3 September 2014 (UTC)

Small hiccup with a configuration change deployed across the cluster which affected all wikis. Resolved now. John F. Lewis (talk) 20:54, 3 September 2014 (UTC)

Remove badges

@Bene*: Removing badges, or selecting more than one not in sequence, via CTRL+Click is not very intuitive for new users. For example, would it be possible to put a checkbox next to each badge, so users can select/deselect more easily? Or, alternatively, to add a button or a link "Deselect all badges"? Do you think this could be a useful feature? Thanks! --β16 - (talk) 08:40, 4 September 2014 (UTC)

I think checkboxes is the plan, per Aude (talk) 14:59, 4 September 2014 (UTC)

Coordinates precision

Auto precision of coordinates is broken. If I enter decimal values with six decimals, instead of 1e-6 precision the UI sets 1/1000 of an arcsecond precision. --JulesWinnfield-hu (talk) 09:49, 5 September 2014 (UTC)
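To make the gap concrete, here is a small sketch comparing the precision implied by six typed decimals with 1/1000 of an arcsecond, both in degrees. The function name and the sample coordinate are hypothetical, and this is not the UI's actual detection code.

```python
from fractions import Fraction


def precision_from_decimal(text):
    """Precision in degrees implied by the number of decimals typed."""
    _, _, decimals = text.partition(".")
    return Fraction(1, 10 ** len(decimals))


# 1/1000 of an arcsecond, expressed in degrees (1 degree = 3600 arcseconds).
MILLIARCSECOND = Fraction(1, 3600 * 1000)

expected = precision_from_decimal("47.497912")  # six decimals
ratio = expected / MILLIARCSECOND
```

Under these definitions the expected 1e-6 degree precision is 3.6 times coarser than 1/1000 of an arcsecond (about 2.78e-7 degrees).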

Also, when I set 1e-6 precision the value displayed at “will be displayed as:” is wrong and doesn't match the actual value displayed after I save and reload the page. --JulesWinnfield-hu (talk) 09:56, 5 September 2014 (UTC)

Une erreur inattendue s’est produite : $1 / Invalid token

Yo !

In wp:fr (I don't know if it is just wp:fr, but I haven't seen discussions about other wikis), we get an error when we add an item in the pop-up interface (when there is no interwiki). It says: "Une erreur inattendue s’est produite : $1" ("An unexpected error occurred: $1"), "Invalid token". So we have to go to Wikidata, search for the item and add the link manually (I don't know if a lot of people know how to do that). See this discussion: fr:Wikipédia:Le Bistro/6 septembre 2014#Problème d'interwiki avec wikidata (encore). The problem first occurred 1 or 2 days ago, from my personal experience. --Nouill (talk) 08:14, 6 September 2014 (UTC)

Is the same on Problem occurred this morning. --ValterVB (talk) 08:29, 6 September 2014 (UTC)
Possibly the same problem as what was fixed here, in case anyone knows where the code of "Add Links" pop-up lies. LaddΩ chat ;) 10:35, 6 September 2014 (UTC)
And as what was fixed here too. Dr Brains (talk) 13:53, 6 September 2014 (UTC)

Same in Vietnamese Wikipedia also. It DID affect editors of all languages. Alphama (talk)

I've added a link to this page on MediaWiki Bugzilla 70488--Robby (talk) 15:56, 6 September 2014 (UTC)
I don't know how you guys can deal with this error, but for me it is actually very annoying. Every time, I have to add interlinks on Wikidata instead of in the popups of local wikis. Some large-scale wikis may have a technical team to fix this error, but what about small-scale wikis? We have 287 languages. Alphama (talk) 05:35, 7 September 2014 (UTC)
According to the information here: this issue should be fixed on Monday 8th September 2014. --Robby (talk) 10:38, 7 September 2014 (UTC)
Same in Bengali (bn) Wikipedia also. --Aftab1995 (talk) 14:05, 7 September 2014 (UTC)
Same in Slovene (sl) Wikipedia, see w:sl:Wikipedija:Pod lipo#Interwiki_ne_deluje for screenshot. --Romanm (talk) 11:34, 8 September 2014 (UTC)
By now everything is back to normal (I tested it from lb-wikipedia). --Robby (talk) 06:51, 9 September 2014 (UTC)

When moving an article, it removes the badge

As seen here: [2]. In my opinion the badge should follow the page move. --Stryn (talk) 19:22, 8 September 2014 (UTC)

Actually, the badges should be removed when a sitelink gets edited on Wikidata but they should stay when the page is moved through the client as shown in your example. Therefore, this seems to be a bug. -- Bene* talk 21:16, 8 September 2014 (UTC)
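The intended rule can be sketched as follows. The Sitelink class, the function name and the badge identifier are all hypothetical; this is only an illustration of the behaviour described, not Wikibase code.

```python
from dataclasses import dataclass, field


@dataclass
class Sitelink:
    title: str
    badges: list = field(default_factory=list)


def update_title(sitelink, new_title, moved_on_client):
    """Return the updated sitelink.

    Badges are kept when the page was moved on the client wiki and
    dropped when the sitelink was edited directly on Wikidata.
    """
    kept = list(sitelink.badges) if moved_on_client else []
    return Sitelink(title=new_title, badges=kept)


link = Sitelink("Old title", ["featured-article-badge"])
after_move = update_title(link, "New title", moved_on_client=True)
after_edit = update_title(link, "Other page", moved_on_client=False)
```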
Indeed. Can one of you please file a bug for it? Thanks! --Lydia Pintscher (WMDE) (talk) 20:47, 10 September 2014 (UTC)
Done bugzilla:70687 -- Bene* talk 20:52, 10 September 2014 (UTC)

Wikidata broken by design?

Excuse me for re-activating the following discussion since @Jeblad: made a quite impressive statement that illuminates the problems in a more technical fashion. I did not check back earlier since I am sort of disappointed by the fact that this fundamental topic is not regarded as important as it should be. Random knowledge donator (talk) 09:56, 25 August 2014 (UTC)

Trying to get some answer here since the project chat discussion about how to properly capture uncertainty did not result in any valuable input.
I am still not sure how to properly capture those two cases of uncertainty I listed. An answer that Wikidata does and will not support that is fair enough - although, as far as I know, the intention of Wikidata was not to be a plain fact database but indeed allow modeling uncertainty. Listing the original description of the issues - any answer appreciated (please excuse the catchy headline, just trying to get some more attention than in the project chat):
The first one was on Abraham of Freising (Q330885): According to the reference, the person may have died either on 7 June 993 or 7 June 994. This could be reflected by using a time range or a data type specific qualifier like "alternative date". But, actually, these are two discrete values, basically a list of dates. Eventually, I added both which, at first, seems reasonable and was done before. However, when querying for people having died in 993, one would receive Abraham of Freising (Q330885) without any hint that this information is not certain. Consequently, when querying for people having died in 993, one would assume that this person, in fact, died in 993 and uncertainty becomes fact.
Another example is Wolfgang Carl Briegel (Q1523127). According to one reference, the person may have been a student of Johann Erasmus Kindermann (Q466635). Qualified by the same time range, I added "unknown value" and Johann Erasmus Kindermann (Q466635) for student of (P1066). However, split into two separate statements, that does not really reflect what the reference expresses and applying both statements, backed by the same reference, seems even odder than backing different values for date of death (P570) with the same reference. Expressing that Johann Erasmus Kindermann (Q466635) may have taught Wolfgang Carl Briegel (Q1523127) using student (P802) on Johann Erasmus Kindermann (Q466635) seems kind of impossible without some weird qualifier expressing "may be false". One could argue to just drop that uncertain information and use "unknown value" exclusively, but, well, that would be a loss of information and I am sure such problems occur in other situations as well (an example of a more prominent topic may be to model something like "Roger Godberd (Q7358238) might have been Robin Hood (Q122634)"). Random knowledge donator (talk) 06:56, 25 June 2014 (UTC)

I think those are different cases and each case should be treated differently. For instance, for the case of the date of death, I would mark it as "unknown value" with qualifiers earliest date (P1319)/latest date (P1326). For the second case you could propose a qualifier "source certainty" that would indicate how sure the sources are about the provided information.
But you shouldn't expect to get "ultimate answers". Anyone can give suggestions, and if you don't get feedback, that means that you can come up with a proposal of your own.
OTOH, I agree that Wikidata is broken by design; however, that applies not only to Wikidata but to any piece of software or reality-representation :) The trick is to move closer little by little every day and not to expect perfect data or knowledge, because by definition it doesn't exist. --Micru (talk) 08:30, 25 June 2014 (UTC)
Thanks for your answer. Using earliest date (P1319) and latest date (P1326) would imply a range though. "Source certainty" is a nice idea. However, one would need to define a constraint of exclusive values (which probably would need to be items to be machine-readable) and what would these values be? Items for "high certainty", "normal certainty", "low certainty"?
Technically, I would like to simply flag values that can be regarded uncertain. When issuing a query, these values could be marked/filtered/whatever easily. As for the first example, it would even be better to allow some kind of alternative values on single statements since the value is basically a list of possible values - but that is probably hard to model from a technical perspective. Flagging statements uncertain could, for example, be simply(?) achieved by extending the "value type" options though: "custom value" (as opposed to "no value" and "unknown value") would be split into something like "certain value" (default) and "uncertain value". In my opinion, the amount of uncertainty ("source certainty") should be left to the reference/content of the reference since capturing that is out of scope for Wikidata as it involves subjective rating. Random knowledge donator (talk) 14:07, 28 June 2014 (UTC)
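A rough sketch of what such a flag could look like from the query side. The enum below is hypothetical and mirrors the value types proposed above; it is not the existing Wikibase data model. Years are plain integers and the second item ID is a made-up placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class SnakType(Enum):
    NO_VALUE = "no value"
    UNKNOWN_VALUE = "unknown value"
    CERTAIN_VALUE = "certain value"      # proposed default
    UNCERTAIN_VALUE = "uncertain value"  # proposed flag


@dataclass(frozen=True)
class Statement:
    item: str
    prop: str
    value: object
    snak_type: SnakType = SnakType.CERTAIN_VALUE


def died_in(statements, year, include_uncertain=True):
    """Yield (item, is_uncertain) for date of death (P570) matches."""
    for s in statements:
        if s.prop != "P570" or s.value != year:
            continue
        uncertain = s.snak_type is SnakType.UNCERTAIN_VALUE
        if uncertain and not include_uncertain:
            continue
        yield s.item, uncertain


statements = [
    Statement("Q330885", "P570", 993, SnakType.UNCERTAIN_VALUE),
    Statement("Q330885", "P570", 994, SnakType.UNCERTAIN_VALUE),
    Statement("item_x", "P570", 993),  # hypothetical item, certain value
]
marked = list(died_in(statements, 993))
strict = list(died_in(statements, 993, include_uncertain=False))
```

A query for 993 can then either mark Abraham of Freising as uncertain or leave him out, instead of silently presenting the date as fact.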
Seems like my inquiry was not successful once again. Still, I think this is a fundamental problem. I do not demand that the issue has to be solved right now but it needs to be addressed. However, the only outcome of my question is that no one really cares. I refrain from editing data as long as there is no strategy to resolve such a fundamental issue. Random knowledge donator (talk) 08:46, 2 July 2014 (UTC)
Random knowledge donator, how do you expect it to be successful if you don't file a property proposal with whatever property you think it could help you model uncertainty? I agree that it would be nice to have a confidence option for sources, but I am not the one setting the priorities, and I also think that for now we can do that with a property or a qualifier, so we can learn about the needs and possible uses.--Micru (talk) 09:31, 2 July 2014 (UTC)
Repeating myself: Personally, I do not think a property is appropriate. I would be fine if someone would explain how a property would solve the issue. Random knowledge donator (talk) 09:35, 2 July 2014 (UTC)
Random knowledge donator, if we create a new property it could be used as a qualifier: [qualifiers] expand on, annotate, or contextualize beyond just a property-value pair. Saying "date of birth:1850" is not the same as saying "date of birth:1850" with qualifier "source certainty:low". Both statement and qualifier form a whole and the statement is incorrect if you don't take both into account.--Micru (talk) 10:48, 2 July 2014 (UTC)
I really appreciate your answers and understand your argumentation. However, having a "source certainty" property involves subjectivity by rating the amount of certainty of a source or the fact stated by the source (which even are two different things but that is more of a different story). And which values would be allowed for "source certainty"? Low, normal, high, very low? Ultimately, I would not support having such subjectivity in Wikidata. How is one supposed to rate the certainty of a reference anyway? That is a very scientific matter. In my opinion, the amount of uncertainty should not be subject of Wikidata - however, having a qualifier like "is uncertain" pointing to a boolean "true" seems pointless as well. Random knowledge donator (talk) 11:42, 3 July 2014 (UTC)
Random knowledge donator, when there is source uncertainty it happens mainly because of two reasons: either the source is stating their self-assessed level of uncertainty, or the circumstances do not allow to consider properly the information contained in the source (physical support degradation, obsolete methodology, wrong assumptions, etc). You could generalize both cases with a general "sourcing circumstances" qualifier with objective values like: significant self-assessed uncertainty, incomplete source, source ambiguity, etc. To model information that is disputed by other sources we already have statement disputed by (P1310).--Micru (talk) 08:01, 4 July 2014 (UTC)
OK, I get the point. Still, I have concerns though. Sorry! First off, a generic property is not really usable since users need to figure out upfront that (a) the property exists, (b) it is the one they are actually looking for and (c) what values are supposed to be used for the property. The concept is just really hard to understand resulting in the property not being used at all. And what data type would the values of "sourcing circumstances" have? Are these supposed to be items, individual text or something else? Apart from that, one needs to be aware of the properties ("source uncertainty", "disputed by" and whatever is there and there to come) that mark uncertainty when querying to be able to filter those values eventually. And in the end, still, I think it involves too much subjectivity and detail. How can I judge that a source is incomplete, outdated or whatever? Yes, there are those really obvious matters like the flat earth theory - however, there are sources with much more subtle issues and the reason why a source may be regarded uncertain can be of diverse scientific matters and I would probably not put my head above the parapet and ascertain a reason why a reference may be regarded uncertain. Instead, I would recommend having a look at the original reference to the reader. Even more, in a secondary source, the reason why something is uncertain may not be supplied at all, like for the two dates of the example in the initial post. I am afraid, the concept of using one or more properties to mark uncertainty, still, seems too complex and - please, excuse me - naive. However, I think the two of us are not getting towards a solution here... what about the developers anyway? Random knowledge donator (talk) 07:22, 8 July 2014 (UTC)
I would like to see something done with qualifiers first to see how it is being used. We can then decide about what to do next and if it is worth investing more time into and if it is worth complicating the user interface and data model for it. --Lydia Pintscher (WMDE) (talk) 09:21, 8 July 2014 (UTC)
A rank "uncertain" would be nice, but I do not know which property could represent that... suggestions welcome, Random knowledge donator. --Micru (talk) 19:28, 11 July 2014 (UTC)
No offense, but waiting for "something being done with qualifiers" is not really helpful. Personally, I do not see a sane way to get that resolved with qualifiers (see all my statements). I would rejoice if there is... Using an additional rank is problematic since that would interfere with the original concept of ranks (see discussion on the corresponding help page).
Still, I stick to another snak type technically being the most sane solution. If there is another method to flag statements - fine. But regarding qualifiers: Qualifiers do not allow flagging since snaks always consist of a property and a specific value (unless you choose another snak type - you get the point...). You would need to restrict the value to one particular item which is true (Q16751793). Regardless of true (Q16751793) being a strange item, the method would be too technical, too complex and not prove usable since, even more, you never would use False (logic) (Q5432619) for a property "is uncertain" - instead, you would just not assign the qualifier.
I really think that I made my point clear in all the lines above. If you want to see something made up with qualifiers - I cannot offer that since I am not convinced of that being a proper solution; And since nobody else seems to be interested in specifying uncertainty, we can also wipe that topic off the table since "doing something with qualifiers" is unlikely to happen unless you guys take action and figure out a proper solution.
If there is another proper solution (in terms of being logical and usable), yes, I would gladly accept it but, to me, it seems like uncertainty was not really considered in the original concept of the software. However, being labeled a "knowledge base" in contrast to a "fact database", I would suppose modeling uncertainty should be a core concept of Wikidata. Random knowledge donator (talk) 10:17, 16 July 2014 (UTC)
@Random knowledge donator: The other day I was doing some tests with a qualifier "type of statement" to specify uncertainty, universal quantification (∀) and existential quantification (∃). All these options are necessary (perhaps integrated into the software) if some day we want to move away from mere fact collection into the "knowledge base" realm. You can also do some experiments (create properties and items as needed) on the test instance of wikidata. See for instance:
--Micru (talk) 10:58, 16 July 2014 (UTC)
I get the point, but still not convinced, sorry. That seems to capture the logic but for the price of poor usability. "type of statement" is far too generic for users to be aware of using it for specifying uncertainty. Given the huge amount of data that is supposed to be managed in Wikidata, Wikidata cannot rely on experts that have dug into the concepts. I think, being able to represent uncertainty should be as obvious as possible. If not, users will simply not enter data or, probably even worse, enter data as if it was not uncertain... But maybe/probably I am the only one who regards that as important. Random knowledge donator (talk) 09:44, 17 July 2014 (UTC)
@Random knowledge donator: "Uncertainty" is not the only meta-information that statements require. There were users also requesting for a system to protect statements, maybe both features belong together, I don't know. But without testing first what is needed we will not be able to know what to ask for.--Micru (talk) 09:57, 17 July 2014 (UTC)
(Sorry for editing in an archive, but I think this issue is more important.)
I agree with you that Wikidata is broken in this respect; there are a lot of things that aren't correct when it comes to the handling of simple values. I'll add bits and pieces as I read your text, but the general idea goes like this:
  • Our mental model of a value is quite complex, but we must represent it in some simple way
  • Our value can be a range or a bag of values, probably also an ordered set of values
  • Our value can have several uncertainties and error sources attached to it, and two values in a list might not use the same error model
  • Any value should refer to some kind of datum, but values that share the same dimension might be compared (not always; we know from sources how some relate to each other but we don't know their absolute value)
Let me give an example: A box can have 3 lengths, those are width, depth and height. We could call them extent. We could then say
width: a
depth: b
height: c
or slightly better
extent: a
extent: b
extent: c
or perhaps even for a list of values
extent: {a b c}
If a, b, and c are 1, 2 and 3 then it might be valid to say
extent: [a c]
Those two last forms are very important if you want to keep the values together in a statement, and they are ordered and unordered sets. The first form is typically called a seq in rdf and the last form a bag.
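The difference between an ordered and an unordered collection of values can be sketched in Python. The choice of a tuple for the seq and a Counter for the bag is an assumption made for illustration; it says nothing about how such values would be stored in Wikidata.

```python
from collections import Counter

# Ordered collection ("seq"): position matters.
seq_a = (1, 2, 3)
seq_b = (3, 2, 1)

# Unordered collection ("bag"): only the values and their counts matter.
bag_a = Counter([1, 2, 3])
bag_b = Counter([3, 2, 1])

same_as_seq = seq_a == seq_b
same_as_bag = bag_a == bag_b
```

The two tuples differ because their order differs, while the two bags compare equal.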

But the values themselves (the a, b, and c), what if we need to model them more accurately? If we have simple values as the main snak then we can describe that value by using qualifiers; that is, it is a reified statement anyhow. But if we have separate values inside a bag or seq inside a main snak, then we need to create the values as blank nodes themselves and we put the additional stuff inside that blank node. The qualifiers for the statements as in the Wikidata UI will then refer to the whole bag or seq of stuff, while we keep the very specific additions inside the blank node. We sort of add another level of qualifiers.

(Note that we can add qualifiers to describe uncertainty for a value in a statement, but when that statement is multivalued this might not be appropriate.)
So in your case with Abraham von Freising (Q330885) you will have
died: { "7 June 993" "7 June 994" }
Simply two dates, no mystery at all except for two dates where most people would expect one. That can be solved with a reference to some publication that describes the situation. It will, however, be troublesome in some contexts where you want to use a single value.
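A minimal sketch of such a multivalued statement and of the single-value problem, under the assumption that the alternatives are held in a plain set. The function names are hypothetical, and Python's date type (proleptic Gregorian) is used only as a container, without any claim about the calendar of the source.

```python
from datetime import date

# One statement whose value is a set of alternative dates.
died = frozenset({date(993, 6, 7), date(994, 6, 7)})


def possibly_in_year(values, year):
    """True if at least one alternative falls in the given year."""
    return any(d.year == year for d in values)


def certainly_in_year(values, year):
    """True only if every alternative falls in the given year."""
    return bool(values) and all(d.year == year for d in values)


def single_value(values):
    """Return the value only if there is exactly one, else None."""
    return next(iter(values)) if len(values) == 1 else None
```

A consumer that needs one date gets None from single_value(died) and has to decide for itself what to do, which is the troublesome context mentioned above.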
Messing around with this opens up some quite funny simplifications. What about open-ended intervals for a birth date? The nice thing with this is that we can say something about a value without spiraling down the "we need yet another special property" path.
There is also a situation where a statement holds a reference to a vector. I played around with various representations of that, and it seems solvable. Some very important cases are where there exists some kind of statistical analysis, like a w:en:Five-number summary. Those represent something that would probably have been squashed into a single value in the present model.
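For readers unfamiliar with it, a five-number summary (minimum, lower quartile, median, upper quartile, maximum) can be computed as in this sketch. The sample data is made up, and the "inclusive" quartile method is one convention among several, chosen here as an assumption.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a non-empty sample."""
    ordered = sorted(data)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary([5, 1, 4, 2, 3])
```

The result is a vector of five numbers describing one quantity, which is the kind of value that does not fit into a single-value snak.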
There are several core concepts that should be implemented, and it should be possible to either write some parseable strings or it should be possible to build them some other way. Personally I like parseable strings, take a look at w:en:Well-known text for example.
I think some of the problems regarding the modeling of this are due to a lack of expertise in statistics, probability and real-life vetting and analysis of data. It was simply expected that this was a simple problem with simple solutions, but it is not; it is quite complex. Jeblad (talk) 21:01, 17 August 2014 (UTC)
Thanks a lot Jeblad for the extensive write-up. I agree with your concept of modeling. Featuring collections of values/multivalued statements would be the most sane way of representing uncertainty when it comes to representing possible alternatives, like for the dates in the example. It would be a step in the right direction although there would still be the need for a solution regarding uncertain values without alternatives (assumptions), like the other example of "someone might have been someone else". However, with multivalued statements, the impact on the data model would probably be quite huge and according to the attention the topic receives, I am convinced that this will not be implemented in the far future and the problem will not be highlighted before there is a lot of misleading data in the database already... Random knowledge donator (talk) 09:56, 25 August 2014 (UTC)
@Jeblad, Random knowledge donator: I've got these ideas, but I don't know if that would be practical. This is also related to some properties that take as value "items", but sometimes with just a string would be enough (P:P1420, I'm looking at you...).--Micru (talk) 13:51, 25 August 2014 (UTC)

Adding more expressive features to Wikidata

Hi all. @Jeblad, Random knowledge donator, Micru: Jeroen asked me to comment here, so here is my attempt at an answer. What you are discussing has many different aspects and the discussion has already turned into something rather hard to digest. I might be missing some of the proposals. But on a general note, in order to ever achieve a result in such discussions, it is important not to come up with new examples in each reply. You will always find something else that does not work. A better approach would be to make a list of use cases (in the form of statements that one could want to make). Then one can decide if we want to support these statements or not, whether we already have a way to model them or not, and whether we want to have new features in some future to capture them or not.

I hope that we all agree that there will always be knowledge that cannot be captured accurately in a computer, whatever tool or system we use. But I have a feeling that not all of us might fully be aware of all of the reasons for this. I guess most of us are aware of several kinds of "knowledge" that are clearly out of scope for Wikidata (and maybe any computer):

  1. Beliefs and feelings. One could call this "vague knowledge" but it's not the kind of vagueness that the discussion here was about. It's highly personal and not something that people can easily nail down even in words, not to mention statements in a fixed format.
  2. Understanding. Somebody who studied, e.g., Wittgenstein's philosophy for ten years may (hopefully) have acquired a kind of knowledge that is more than a mere collection of facts. Learning all the facts by heart won't be enough to obtain this. Deep understanding leads to new ways of thinking. This cannot be captured in a database of any kind, since databases don't think.
  3. Process knowledge. Knowledge like "how to knit a pair of socks" is process knowledge that involves a lot of implicit skills that one gets only by practising (if you ever read a written manual on knitting, you know what I mean). This type of knowledge is largely out of scope for Wikidata.

One could add more, but this is only to clarify that some things we might call "knowledge" are without dispute beyond our reach. But even if we restrict to a "formal" kind of knowledge that seems more liable to machine representation, we have to be aware of certain limitations:

  1. Computability and undecidability. There are many fully formal systems of knowledge with a clear and "mechanical" meaning that cannot be fully implemented in computers. These insights go back to Turing and Goedel: there are mathematical functions that no computer can compute and there is no computer that can draw all logical conclusions.
  2. Practicality and complexity. Even if a computer can compute a desired answer in principle, the task might be extremely hard in the sense that there cannot be any tractable algorithm for solving it. Such hardness results can be shown by mathematical proof.

How do these things affect us here? The answer is that answering queries (as used in the examples of Random knowledge donator) is a computing task. When we make the knowledge model more powerful, this task will become harder, maybe even undecidable. Then we have managed to express the world in greater detail and yet will not get the answers that this formalised knowledge should give us. Since Wikidata is a data management platform and not a theorem prover, this would not be a very good situation to be in.

We really want to be able to answer all queries in reasonable time (low complexity). I am completely missing this whole aspect in the above discussion (maybe I overlooked it?). It seems to me that you are discussing features solely from the viewpoint of more powerful modelling. "I need to express this, hence we should add a new feature." You seem to presume that, once you have a way of expressing something in the system, anything that a human can reasonably conclude from it would also be computable with some algorithm. Goedel taught us that this is not the case. Turing taught us that even when it is, it might require exceedingly high amounts of time and memory.

I am emphasizing this here because already the proposed features are known to make query answering much harder. The first of the use cases is a requirement for disjunctive information: you want to say that one of two possible things might be the case. This makes query answering intractable (NP-hard in the size of the data). If you combine it with additional features, it can even be much worse (e.g. the lightweight ontology language OWL EL, which is in polynomial time, jumps to ExpTime if you add disjunction).
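To give a feel for where the cost comes from, here is a naive sketch that enumerates every "possible world" of a tiny dataset with disjunctive values. All data except the first entry is hypothetical, and this brute-force approach is only an illustration; it is not a proof of the complexity results and not how a real system would evaluate queries.

```python
from itertools import product

# Each item maps to its alternative death years.
death_year_options = {
    "Q330885": [993, 994],  # the disjunctive statement from the thread
    "item_b": [993],        # hypothetical item with one known year
    "item_c": [990, 993],   # hypothetical item with two alternatives
}


def possible_worlds(options):
    """Yield every combination that picks one alternative per item."""
    items = list(options)
    for combo in product(*(options[i] for i in items)):
        yield dict(zip(items, combo))


def answers(options, year):
    """Return (number of worlds, certain answers, possible answers)."""
    worlds = list(possible_worlds(options))
    certain = {i for i in options if all(w[i] == year for w in worlds)}
    possible = {i for i in options if any(w[i] == year for w in worlds)}
    return len(worlds), certain, possible


world_count, certain, possible = answers(death_year_options, 993)
```

This particular query could be answered item by item without enumeration. The point of the sketch is the growth: n items with two alternatives each give 2^n worlds, and queries that relate several items to each other may have to reason over those combinations.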

If you go further into the modelling of "vagueness" then you will encounter probability as a useful modelling paradigm. Complexity of query answering in probabilistic databases (even with very simple notions of probability) is again well known to be computationally intractable (#P even for simple scenarios with far-reaching assumptions of probabilistic independence). If you study the literature on probabilistic databases, you will see that there is more than one way of adding probability to a database.
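As a sketch of the simplest case only: if each statement carries a probability and all statements are assumed to be independent, some aggregate questions are easy to compute. The probabilities below are made up, the item names are partly hypothetical, and the independence assumption is exactly the kind of far-reaching assumption mentioned above; more general queries are where the hardness arises.

```python
from math import prod

# Made-up probabilities that each item's death year is 993.
p_died_993 = {"Q330885": 0.5, "item_b": 1.0, "item_c": 0.25}


def prob_at_least_one(probabilities):
    """P(at least one event holds), assuming the events are independent."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def expected_count(probabilities):
    """Expected number of events that hold (needs no independence)."""
    return sum(probabilities)


some_death_in_993 = prob_at_least_one(p_died_993.values())
expected_deaths = expected_count(p_died_993.values())
```

Even here there are at least two reasonable answers to "who died in 993?" (a probability per item, or an expected count), which illustrates that there is more than one way of adding probability to a database.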

What I am saying here is that this is not a one-dimensional problem, where we just have to add as many features as possible until we are happy. Every new feature has a cost, which has to be paid by all users and consumers of the data. The above discussion revolves around the simplest of the problems: how could we best represent (or even just: write down) this knowledge. This is the first and most simple step. A much harder question is how this information should be taken into account when answering queries. Some might say that this is left to the consumer, but then we might still see the same misleading query results that Random knowledge donator argued against (and I fully agree with her/him on this).

How to move forward. It is important to consider expressive power and query answering behaviour together. The world of databases offers many query languages, many ontology/constraint languages, and many data models from which we can get some understanding on how to structure this discussion. For example, adding disjunction to a database ("born in 943 or 944") is a completely different matter than adding probability. The algorithms you need are quite different. For other statements, like "he might have been Robin Hood", I don't even know what to make of it as a human. What kind of information does this give me? How should this influence query results? A much more detailed description of the desired behaviour would be needed to decide if this can be supported and what this would require. Anyway, my main point for now is that these things are completely and fundamentally different and should be discussed in separate threads. As it is now, this discussion will never lead to anything that can be implemented. (Of course, the heading chosen for this discussion suggests that there is not much to discuss or implement here anyway, don't you think? Maybe it would help to take a more constructive perspective if you really want to have some impact.)


Markus Krötzsch (talk) 14:18, 31 August 2014 (UTC)

P.S. I am notoriously bad at replying to talk page discussions. Apologies in advance for ignoring any replies given here. If there is a concrete proposal (which use cases should we support, what behaviour is desired, maybe what technology could be used) then feel free to give me a shout and I will have a look.

The problem is that expressive power and query answering are jumbled together and end up as a faulty data model. Correct the value model first, and then add the constraints necessary to make the queries feasible. As it is now, the query model creeps into the value model and makes it very weird. The values already have limits or error bounds, but they are messy and ill-defined. I wonder if the problem arose from an attempt to make query answering easy, and because of this error bounds were added as a limited and easy-to-implement solution. The query problem is not the important one, though; the data representation is. Wikidata shall represent data used in other projects; that is the primary purpose, and making queries about those data is secondary. It is still important to answer questions, but those questions could easily be limited to the plain and simple values and not use the underlying value distributions. If we don't use those distributions we make a choice not to do so, but if we model them we can say which kinds of distributions we do want to use and which we don't. In the case of disjunctions we simply state that we don't want to consider error distributions at all. That is not the case now, because now we have jumbled simple values and error bounds together in one messy representation. Jeblad (talk) 14:37, 6 September 2014 (UTC)
Note that finite enumerated sequences give a finite number of answers, while a finite continuous interval can give an infinite number of answers unless the query is rewritten. Unbounded distributions will more often than not give an infinite number of answers and be NP-hard. Real errors are often both continuous and within bounds, and the answer is often finite after a rewrite of the query. I.e., solve the bounded case first (which is what is attempted in the current model) and then go for the unbounded one. Jeblad (talk) 14:48, 6 September 2014 (UTC)

Money for numeric data type with units


It seems that several contributors are waiting for that data type so that WD reaches a complete set of basic functions. It also seems that the development team's resources are too limited to address it in the near future.

What do you think of making an internal funds request to temporarily hire an external person, or to pay a current contributor who has the skills to develop this data type in a short time? Snipre (talk) 11:13, 5 September 2014 (UTC)

This will probably take us more time than we have if we want to do this properly. (I do this for another non-profit in my spare-time.) Instead I am looking for a small team of students to help out with this. If anyone is interested or knows someone please let me know. --Lydia Pintscher (WMDE) (talk) 20:46, 10 September 2014 (UTC)
@Lydia Pintscher (WMDE): Hello, yes, you already explained your plan to me, but looking for someone to work for nothing is more difficult than finding someone who will be paid. So the question is: if you have money, will you more easily find a group of students or a freelancer? What is more difficult: finding someone who has the skills, or someone who is ready to do it for nothing? If the problem is the second one, having money will solve your problem. The point is not to put pressure on the development team but to help it, and money is often the key factor in solving a problem faster. Snipre (talk) 10:02, 12 September 2014 (UTC)
If budget is the problem, I'm also in favor of starting a fundraiser to get this done because it is crucial for many items and possible collaborations like this one. Plus I'm sure on Commons they would appreciate to be able to enter object dimensions.--Micru (talk) 12:53, 12 September 2014 (UTC)

AbuseFilter and redirects

AbuseFilter cannot log and tag creating redirects. But if you examine a revision (example) and load the filter, it says "The filter matched this change." Matěj Suchánek (talk) 08:27, 6 September 2014 (UTC)

Just to add: this apparently affects Q-items only. The filter works for other namespaces such as User and Wikidata. [3] whym (talk) 02:21, 7 September 2014 (UTC)
Seems to me this is due to the api-approach. Somehow these edits do not reach the abusefilter "live". Kind regards, Lymantria (talk) 11:00, 8 September 2014 (UTC)
Yeah we need to look into this. Can one of you please file a bug? Thanks! --Lydia Pintscher (WMDE) (talk) 20:47, 10 September 2014 (UTC)
@Lydia Pintscher (WMDE): Done. Matěj Suchánek (talk) 14:10, 11 September 2014 (UTC)

Property reference counting seems off

Have a look at date of death (P570) @ Edward Nelson (Q381346). For me, at least, it says "1 reference". But there are two, surely (imported from Wikimedia project (P143) and reference URL (P854))? It Is Me Here t / c 11:22, 11 September 2014 (UTC)

There is only one reference with two properties. VIAF ID (P214) has three references. --Succu (talk) 12:46, 11 September 2014 (UTC)
Sorry, I don't understand. As far as I can tell, Q381346 currently says (rightly or wrongly) that the date of death was (a) as stated on en:, and (b) as stated at [4]. How is that not two references? It Is Me Here t / c 20:56, 13 September 2014 (UTC)
Hi, you must know that one reference can consist of several claims. So this reference here consists of the two claims "stated on en: and stated at [5]". Thus the reference says that the statement is imported from enwiki but was originally stated in the other website. -- Bene* talk 21:47, 13 September 2014 (UTC)
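The distinction can be sketched with a simplified statement. The layout below loosely follows the Wikibase JSON format (a `references` list whose entries group `snaks` by property), but it is trimmed down for illustration and the helper names are made up.

```python
# Simplified, hypothetical statement: one reference made of two snaks.
statement = {
    "mainsnak": {"property": "P570"},
    "references": [
        {
            "snaks": {
                "P143": [{"property": "P143"}],  # imported from Wikimedia project
                "P854": [{"property": "P854"}],  # reference URL
            }
        }
    ],
}


def count_references(stmt):
    """Number of reference groups, which is what the UI reports."""
    return len(stmt.get("references", []))


def count_reference_snaks(stmt):
    """Number of individual snaks across all references."""
    return sum(
        len(snak_list)
        for reference in stmt.get("references", [])
        for snak_list in reference["snaks"].values()
    )
```

Here `count_references(statement)` gives 1 while `count_reference_snaks(statement)` gives 2, which matches the "1 reference" shown for the statement discussed above.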

Feature request: Commons filepages that are referenced by Wikidata items

There are various properties which Wikidata items can have, that store the name of a media file on Commons as their data type.

For a full list, see: d:Wikidata:Database_reports/Constraint_violations/All_properties and sort by 'type'.

Shouldn't we be noting that a File is being identified in this way on its Commons file pages? (Similarly to the way we note when a file is being used on a wiki).

I proposed a bot should be created for the task, adding a template to the Commons file page, but it has been pointed out to me that, with almost 600,000 file pages to add the template to, adding such templates would take about 2 months for a pywikibot with proper throttling (6 edits a min) [6], never mind keeping them updated.
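As a rough check of that estimate, assuming the bot runs around the clock at exactly the throttled rate (variable names are illustrative only):

```python
# Figures taken from the comment above.
file_pages = 600_000
edits_per_minute = 6

# Assumption: continuous operation with no retries or downtime.
total_minutes = file_pages / edits_per_minute
total_days = total_minutes / (60 * 24)
```

That works out to 100,000 minutes, or a little over 69 days, which is consistent with "about 2 months" for a single pass before any later updates.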

@Zhuyifei1999: therefore suggested it would make more sense to modify the MediaWiki code instead.

Thoughts? Jheald (talk) 14:25, 12 September 2014 (UTC)

@Jheald, Zhuyifei1999: That is bugzilla:46358.--GZWDer (talk) 09:07, 13 September 2014 (UTC)


XML-Serialization

Hi, it seems the XML serialization is broken:

<?xml version="1.0" ?> 
 <api success="1">
   <Q17825490 pageid="19423476" ns="0" title="Q17825490" lastrevid="155531600" modified="2014-09-03T05:49:34Z" id="Q17825490" type="item">
   <aliases /> 
    <label language="sv" value="Thalictrum spurium" /> 
   <descriptions /> 
   <claims /> 
    <sitelink site="svwiki" title="Thalictrum spurium">
     <badges /> 

The Entity-node is replaced with a Q17825490-node. --Succu (talk) 06:27, 3 September 2014 (UTC)

And another old bug is back: Internal error in ApiFormatXml::recXmlPrint: (P225, ...) has integer keys without _element value. --Succu (talk) 12:07, 3 September 2014 (UTC)
Thanks! Katie is looking into it. We have bugzilla:70299. The other one is likely related. --Lydia Pintscher (WMDE) (talk) 15:56, 3 September 2014 (UTC)
News about serialization problem? --ValterVB (talk) 16:46, 5 September 2014 (UTC)
@Lydia Pintscher (WMDE): Is the problem with the XML format (and also with JSON) a bug, or is it a new format? If it is a bug, do you know when it will be fixed? --ValterVB (talk) 17:35, 10 September 2014 (UTC)
My bot depends on xml deserialization of an item. So I can do nothing at the moment. --Succu (talk) 17:52, 10 September 2014 (UTC)
It is a bug, yeah. I am on vacation at the moment. @Aude: Can you give an update on this? --Lydia Pintscher (WMDE) (talk) 20:44, 10 September 2014 (UTC)
@Lydia Pintscher (WMDE): probably also @Aude: is on vacation. :) --ValterVB (talk) 07:07, 14 September 2014 (UTC)
Update: Daniel is working on it :) --Lydia Pintscher (WMDE) (talk) 15:33, 15 September 2014 (UTC)
Update 2: Katie will try to get it live tonight. If that doesn't work then it'll be on Monday. --Lydia Pintscher (WMDE) (talk) 10:39, 18 September 2014 (UTC)
We deployed the fix! While investigating the issue, we also found some inconsistencies (bugzilla:70531) in the xml format (e.g. <claim id="P1464"> when it should be <property id="P1464">, and there was an extra <claim>property</claim>). We fixed those issues, and added tests that should help avoid future breakage in the xml format. Aude (talk) 15:07, 18 September 2014 (UTC)
Thank you, Aude. --Succu (talk) 15:55, 18 September 2014 (UTC)
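For bot authors hit by this kind of breakage, one defensive option is to locate the entity by its attributes rather than by its tag name. The sketch below is only an illustration: both sample documents are hypothetical, modelled on the excerpt at the top of this thread, and the real API output contains more structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the expected node name versus the broken one.
SAMPLE_EXPECTED = (
    '<api success="1"><entity id="Q17825490" type="item">'
    '<label language="sv" value="Thalictrum spurium" /></entity></api>'
)
SAMPLE_BROKEN = (
    '<api success="1"><Q17825490 id="Q17825490" type="item">'
    '<label language="sv" value="Thalictrum spurium" /></Q17825490></api>'
)


def find_entities(xml_text):
    """Return elements that look like entities, whatever their tag is called."""
    root = ET.fromstring(xml_text)
    return [
        element
        for element in root.iter()
        if element.get("id") and element.get("type") in ("item", "property")
    ]


expected_nodes = find_entities(SAMPLE_EXPECTED)
broken_nodes = find_entities(SAMPLE_BROKEN)
```

Both calls return a single element with id Q17825490, so code written this way would not depend on whether the node is named after the entity.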

<?xml version="1.0"?>
<api servedby="mw1141">
  <error code="internal_api_error_MWException" info="Exception Caught: Internal error in ApiFormatXml::recXmlPrint: (P1151, ...) has integer keys without _element value. Use ApiResult::setIndexedTagName()." xml:space="preserve" />

-- Vlsergey (talk) 03:10, 17 September 2014 (UTC)

Vlsergey, see #XML-Serialization. --Succu (talk) 06:20, 17 September 2014 (UTC)

Datatype of snak is missing

For any snak (mainsnaks, qualifiers, reference) datatype field is missing. -- Vlsergey (talk) 03:18, 17 September 2014 (UTC)

I've filed bugzilla:70995 for it. --Lydia Pintscher (WMDE) (talk) 10:43, 18 September 2014 (UTC)

Wrong AppleTouchIcon

On my ipad, the bookmark icon is the Wikipedia "W" while it should be the Wikidata icon (cf. Manual:$wgAppleTouchIcon). Thanks. — Ayack (talk) 18:58, 30 July 2014 (UTC)

The same on my Android tablet's Opera, which, as far as I know, doesn't use the Apple touch icons but some other files that look very much the same in the end. --YMS (talk) 19:39, 30 July 2014 (UTC)
Does anyone know which icons those are? The Favicon here seems fine so I assume it is not that. --Lydia Pintscher (WMDE) (talk) 10:37, 4 August 2014 (UTC)
I still don't know which file Opera uses (did not find anything in the page source code - is it actually the AppleTouchIcon?), but the Apple one is this file. --YMS (talk) 10:53, 4 August 2014 (UTC)
Ok thanks. Will investigate some more. --Lydia Pintscher (WMDE) (talk) 13:30, 11 August 2014 (UTC)

Since no change happens, I reactivate this thread. Should I file a bug? Thanks. — Ayack (talk) 09:45, 18 September 2014 (UTC)

I've filed bugzilla:70996. --Lydia Pintscher (WMDE) (talk) 10:46, 18 September 2014 (UTC)

Wrong rounding by displaying time values with precision decade

Hey. There is a bug in displaying time values with precision 8 (decade). The time value +00000001994-01-01T00:00:00Z with precision 8 is shown as 1990s, which is OK. But the value +00000001995-01-01T00:00:00Z with precision 8 is displayed as 2000s.--Pasleim (talk) 09:10, 12 September 2014 (UTC)

I found a similar problem: Year 100 with precision "century" is displayed as 1. century, years 101 and 120 too, while 150 is displayed as 2. century. Year 101 is clearly 2. century. (First century 0-99, second 100-199 etc.) However, it is not possible to enter "2" and choose century to get 2. century in the display. Furthermore, it is not displayed in German, which I've chosen in the language selector. Acts of Peter (Q1231858) --Giftzwerg 88 (talk) 11:14, 13 September 2014 (UTC)
Yeah we need to fix that one -.- bugzilla:64742 --Lydia Pintscher (WMDE) (talk) 15:37, 15 September 2014 (UTC)
I found another similar problem: it is difficult to enter time values in BCE, the user interface does not provide sufficient means. Also 550 BCE and century adds up to 5. century BCE but it is 6. century BCE. --Giftzwerg 88 (talk) 14:10, 19 September 2014 (UTC)
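The decade output reported at the top of this thread is what one would get from rounding to the nearest decade instead of truncating. This is only a guess at the cause, not the actual formatter code; the function names are made up and the sketch covers positive years only.

```python
def decade_truncated(year):
    """Start of the decade that contains the year."""
    return year // 10 * 10


def decade_rounded(year):
    """Nearest multiple of ten, with halves rounded up (the suspected behaviour)."""
    return (year + 5) // 10 * 10


def decade_label(decade_start):
    """Format a decade start such as 1990 as '1990s'."""
    return f"{decade_start}s"
```

Under this assumption, `decade_label(decade_rounded(1995))` yields "2000s" as in the report, while `decade_label(decade_truncated(1995))` yields the expected "1990s".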

Wikinews Phase 2

Is it ready to be enabled?--GZWDer (talk) 12:32, 19 September 2014 (UTC)

My feeling is that there is still too much trouble with phase 1 and how people want it to work. --Lydia Pintscher (WMDE) (talk) 10:08, 20 September 2014 (UTC)

Add a statement on initial page creation with Special:NewItem?

Currently Special:NewItem can be prefilled with

Could we add the possibility to directly add a statement as well? For example, instance of (P31) = human (Q5)?

This would give:

Or does it already exist? --- Jura 15:17, 20 September 2014 (UTC)

Hey :) How/where would you want to use it? --Lydia Pintscher (WMDE) (talk) 16:09, 23 September 2014 (UTC)
Today I could have usedófilo&label=Teófilo&description=male+given+name&property=P31&value=Q12308941
when making Q18121215. --- Jura 19:05, 24 September 2014 (UTC)
Ok that makes sense. I don't think we'll get to that anytime soon but maybe someone else will pick it up, so feel free to file it. --Lydia Pintscher (WMDE) (talk) 13:40, 25 September 2014 (UTC)
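A small sketch of how such a prefill link could be assembled. The `label` and `description` parameter names are taken from the example above; `property` and `value` are the proposed parameters and, going by this thread, are not supported by Special:NewItem. The helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.wikidata.org/wiki/Special:NewItem"


def new_item_url(label, description, prop=None, value=None):
    """Build a prefill URL; prop and value are the proposed, unsupported extras."""
    params = {"label": label, "description": description}
    if prop and value:
        params["property"] = prop
        params["value"] = value
    # urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = new_item_url("Teófilo", "male given name", prop="P31", value="Q12308941")
```

Building the query string with `urlencode` avoids hand-escaping labels that contain accented characters or spaces.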

Error message when trying to create a new property with HHVM activated

Hi, when I try to create a new property with HHVM activated, I've got the following error message:

Wikimedia Foundation

Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.
If you report this error to the Wikimedia System Administrators, please include the details below.
Request: POST http://www.wikidata.org/wiki/Special:NewProperty, from XXXXXXX via cp1053 cp1053 ([XXXX]:3128), Varnish XID 2642604231
Forwarded for: XXXXXXXX
Error: 503, Service Unavailable at Tue, 23 Sep 2014 08:37:04 GMT

It works fine without HHVM. — Ayack (talk) 08:46, 23 September 2014 (UTC)

I don't know if there is a link, but the property number has just jumped from P1484 (deleted) to P1529... — Ayack (talk) 08:55, 23 September 2014 (UTC)
@Ayack: I can not reproduce it in Wikidata or test Wikidata.--GZWDer (talk) 11:12, 23 September 2014 (UTC)
There are some known issues of hhvm crashing with some Wikibase code, due to a memory issue or something. We are working to add a workaround in Wikibase and investigate the problem in hhvm. Aude (talk) 12:05, 25 September 2014 (UTC)
@GZWDer, Aude: Ok, thanks. Since I have some other random problems with HHVM, I disabled it on Wikidata. — Ayack (talk) 12:37, 25 September 2014 (UTC)
I had created a series of items when HHVM got activated. Some of them couldn't be edited and had to be deleted: WD:Requests_for_deletions/Archive/2014/09/22#Deletion_request_.28Jura_1.29. --- Jura 23:25, 25 September 2014 (UTC)

The same issue (503 but different message) when I want to open e.g. Wikidata:Database reports/Constraint violations/P373 with HHVM. JAn Dudík (talk) 05:24, 26 September 2014 (UTC)

We are still looking into the issue and if necessary, might disable the hhvm beta feature until we can address the problems and be more confident that hhvm works okay for Wikidata. We've already made a lot of progress, but the fixes are not deployed yet. Aude (talk) 16:02, 26 September 2014 (UTC)
We disabled the hhvm beta feature for now. It might be as soon as Tuesday or Wednesday that we can try it again, or might take longer. hhvm is still enabled (for everyone) on test.wikidata. Aude (talk) 10:26, 27 September 2014 (UTC)
hmm.... I see that hhvm is still listed in beta feature preferences, but am sure the feature itself is actually disabled. Aude (talk) 10:33, 27 September 2014 (UTC)