Wikidata talk:Showcase Queries


Questions/comments about criteria

If referring to a specific criteria point, please specify

  • 1.1/1.2 (Complete/Knowable) If we take a query that asks a question like listing all cities with over a million inhabitants that have a female mayor, I'm unsure how we would go about evaluating whether those two criteria are being met. ChristianKl 08:44, 11 May 2019 (UTC)
    • I would say: both of those two criteria are “knowable” - in the sense that there IS a “correct” answer to each. The nuance comes in how to define “city”, and that’s where it’s important that evaluation also checks the quality of the sourcing for the statements, and the clarity of the scope of the query. The quality of the sourcing is referred to in criterion 2.2.1 - which I have just amended to be more overt in requiring relevant and reliable sources (diff). The clarity of the scope of the query is accounted for in the “nomination statement” (that it should include a clear explanation of the question that it is trying to answer) [also criterion 3.2 - which calls for a title as a ‘comment’ within the query code]. The “completeness” of the answer is much the same and relies on choosing appropriate/relevant and reliable sources. We are not trying to pretend that Wikidata itself knows the complete answer - we merely can represent/structure the knowledge that is provided to us through the sources. Does this answer your question adequately, User:ChristianKl? - Wittylama (talk) 09:47, 11 May 2019 (UTC)
      • If someone nominated a query like that today, how would you go about verifying that it's complete? ChristianKl 10:07, 11 May 2019 (UTC)
        • Yeah, it’s tricky. Mainly because “complete” is contextual to the topic and there isn’t a way to “automatically” check such a thing across all the potential topics that could be nominated. My first thought is to look at what the English Wikipedia “featured topic” criteria say. The relevant part is:
“(c) All articles or lists in the topic are linked together, preferably using a template, and share a common category or super-category.
(d) There are no obvious gaps (missing or low-quality articles) in the topic. A topic must not cherry pick only the best articles to become featured together.”
Thus, they check completeness by looking for a consistent element (in their case it would often be a navigation template) and then it comes down, in practice, to the nomination statement. I note that all the current nominees (listed here) start their nominations with a short explanation of how the scope of the set they’re nominating is “complete”. For example “The London station group is a group of 18 railway stations served by the National Rail network in central London.” or equally, “This topic comprises all of the torpedo cruisers built by the Italian Regia Marina from the 1870s to the 1890s.” Perhaps the explanation of the “nomination statement” here should be made more explicit to indicate that a nominee should not only describe their scope but also explain why they believe it to be “complete” - ideally by referring to an applicable third party reliable source. Wittylama (talk) 20:53, 11 May 2019 (UTC)
  • That sounds like a topic such as the list of female mayors of cities with more than 1 million inhabitants couldn't be a showcase query when there's no existing external source that already lists the female mayors. This rule seems to say that the most valuable queries of Wikidata, those that answer questions that couldn't be answered without Wikidata, can't be showcase queries. Is that your intention? ChristianKl 14:35, 12 May 2019 (UTC)
  • While I said "ideally by referring to an applicable third party reliable source" I did not mean for you to understand that as a mandatory element for proving "completeness". If there is such a thing for a given topic, that's ideal. But the rest of my comment was all about how the nomination statement would need to explain the context and why the nominator felt justified in saying it is complete. Wittylama (talk) 17:52, 13 May 2019 (UTC)
  • 1.3. (Comprehensive - Broad): Should there be a set minimum number of items, and/or a minimum number of statements for this criterion? My first instinct is no. I think there might also be a risk in the current text of ruling out any “short but beautiful” queries - e.g. things which have a scope that naturally limits to fewer than 50 results (e.g. prime ministers of Australia...) I wouldn’t want to exclude those automatically, so is there a better way to frame it? Wittylama (talk) 23:06, 5 May 2019 (UTC)
I agree that a numerical limit would rule out some valuable queries. ChristianKl 11:48, 10 May 2019 (UTC)
  • 1.4 (Stable): While this makes sense for some types of queries, the power of sharing data that is updated (which is the sort that usually gets obsolete in projects with too few users to keep everything up to date) gets lost. Take, for example, a list of members of a national parliament. This data is expected to change, and when it is kept up to date, that is when we really get the benefit of only having to update it in one place. If we disqualify such queries, we only showcase a fraction of why Wikidata is useful. Ainali (talk) 23:08, 29 January 2021 (UTC)
    • I'd say that, as with most things - the answer to this would be a combination of "the answer depends on context" and "the community would decide"! For the example you give, members of a national parliament, I would suggest two possibilities: 1) the scope of the query that is listed as 'showcase' could be restricted to a specific time period (e.g. 1950-2020), and 2) there could be some kind of 're-certification' process of the award, each time the person maintaining that data wants to expand the recognition to a larger scope (e.g. after the next election). Wittylama (talk) 11:43, 31 January 2021 (UTC)
      • That's one way to think about it, but I would rather see this as "maintainability" or "has active maintainers", as what is most important about any data is that it is up to date/current. That kind of indication would make data more reusable for Wikipedias. Sure, it is fine to have a list of what the previous term looked like, but where we would really shine is if you find the current set of parliamentarians in whatever language version you happen to read. Ainali (talk) 15:03, 31 January 2021 (UTC)
        • If the criteria required demonstration of “active maintainers” then I would worry that the recognition becomes partially about “who” is associated with the query, rather than merely the quality of the content. I don’t think we want to build a culture that requires a declaration of implied “ownership” of content. Wittylama (talk) 15:26, 31 January 2021 (UTC)
          • No, and that is a really good point. That is why I first suggested "maintainability". And to show it, there should be a documented (and preferably somewhat automated) process for keeping the list in top-notch shape. For example, where can you see that the data is updated? Is it possible to build a tool that notifies somewhere that data is outdated, or that actually makes the edit (if the source is good enough to be trusted)? For parliamentarians it could be an RSS feed with updates, or a tool on Toolforge that checks once per day whether the official API has been updated. Ainali (talk) 17:12, 31 January 2021 (UTC)
  • 2.1.1. (Quality - Items - Labels): How many labels/aliases/description languages would be needed? There could be a fixed minimum (like, the UN languages) or it could be left entirely to context. Also, should "English label" be made the mandatory minimum? Wittylama (talk) 23:06, 5 May 2019 (UTC)
While I consider English to be important for matters of setting policy in Wikidata, I don't think it's important in this case. If someone writes a query about Chinese provinces that provides information that's valuable to Chinese speakers I see no reason why the query would need to have all labels in English filled. Maybe there could be a standard that there's at least one language where all the results have labels and descriptions. ChristianKl 11:48, 10 May 2019 (UTC)
I'm not sure why we need a requirement for aliases here. ChristianKl 11:37, 11 May 2019 (UTC)
The text of the criteria reads “Labels, descriptions (and aliases, where appropriate)”. I imagine that in some circumstances aliases are quite important to the usefulness of a query. Perhaps it’s about plants (botanical and common names) or about people who’ve been known by different names over time (just two things off the top of my head). I think the “where appropriate” gives a lot of contextual leeway :-) Wittylama (talk) 20:34, 11 May 2019 (UTC)
  • 2. (Quality - all sub points): Is there a way to easily automate a check in the reviewing process of whether these criteria are actually fulfilled? I don't think it would be feasible to manually check each item and statement appearing in what could potentially be very long query results. Wittylama (talk) 23:06, 5 May 2019 (UTC)
  • 4.1 (Useful - Demonstrable): I suspect this is the most strenuous criterion - it says the query must already have been used in a 'real-world' situation. The most likely would be "in a Wikipedia article". However, a lower-difficulty alternative to this could be that it viably COULD be used in an educational setting. This shifts the criterion from proving a previous use (most likely by providing a URL) to describing an example use-case. Wittylama (talk) 23:06, 5 May 2019 (UTC)
I'd skip this requirement altogether. Maybe replace it with "interesting/wow". Because that's what you want the viewer to say. This leaves the community in charge of the evaluation. --99of9 (talk) 11:59, 8 May 2019 (UTC)
I've swapped around the text to make it about "...be able to be used in a public-facing, independently reviewed setting, to educate..." and removed the requirement that it must have already done so. Consequently I've highlighted that internal-facing usage (for wikimedia content analysis or wikimedia content improvement workflow) is insufficient. Wittylama (talk) 13:55, 8 May 2019 (UTC)
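
On the question under point 2 of automating part of the review: one possible approach, offered only as a rough sketch, is to run the nominated query with the label service and scan the JSON results for rows with no usable label. It assumes the standard SPARQL JSON results shape and that the label service falls back to the bare Q-id when no label exists in the requested language. The variable names `item` and `itemLabel` and the sample data are hypothetical, and an item whose genuine label looks like a Q-id would be flagged wrongly.

```python
import re

# Hypothetical sample in the SPARQL JSON results shape ("results" -> "bindings").
SAMPLE = {
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
             "itemLabel": {"type": "literal", "value": "universe"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"},
             "itemLabel": {"type": "literal", "value": "Q2"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q3"}},
        ]
    }
}

# Assumption: a label that is only "Q" plus digits is the fallback, not a real label.
QID = re.compile(r"Q\d+")


def rows_missing_label(results, item_var="item", label_var="itemLabel"):
    """Return item URIs whose label is absent or is just a bare Q-id."""
    missing = []
    for row in results["results"]["bindings"]:
        label = row.get(label_var, {}).get("value", "")
        if not label or QID.fullmatch(label):
            missing.append(row[item_var]["value"])
    return missing


if __name__ == "__main__":
    for uri in rows_missing_label(SAMPLE):
        print("no label:", uri)
```

A reviewer could run the same check once per candidate language to find at least one language in which every result has a label; a similar pass over a reference column could flag unreferenced statements.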

stable link to query results at time of nomination

Is it possible to "timestamp" a query - not just a stable URL to the query code, but a way to get a "stable result"? This would allow reference back to the contents at the moment that it was judged to be Showcase-worthy (and, by extension, comparison with any subsequent changes). The best way currently, I suspect, is to use Template:Wikidata list to create an on-wiki static output of the query and THAT will give a stable URL for each revision. However, the constraints of the template/ListeriaBot might limit the flexibility of the Wikidata query service. Wittylama (talk) 23:53, 5 May 2019 (UTC)

As Wikidata improves, the results of the query will improve. I think some of the best queries are not static. --99of9 (talk) 11:57, 8 May 2019 (UTC)
I'm not expecting that the query results *should* be static, but that it's useful to refer to the specific moment when the query was deemed to be "showcase worthy". Of course the "live" query should be highlighted too, but a) for data-comparison purposes and b) for third-parties who may wish to embed/download something that is vetted to be high-quality/complete/vandalism free, it would be important to say "THIS version is checked." Listeria seems to be the best (only?) way to do this...
And the 'stable' criterion is saying that the results shouldn't be in major flux - not that they shouldn't ever change. This is similar to Featured Articles on WP: they can and do still get edited, just they're not expected to be changing by large degrees. Wittylama (talk) 14:55, 8 May 2019 (UTC)
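
For illustration only, the "THIS version is checked" idea could also be approximated outside Listeria by storing the query text together with the rows it returned at review time, then diffing that record against the live results later. The sketch below is not an existing Wikidata tool; the function names and the choice of plain item IDs as rows are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot(query, rows, checked_at=None):
    """Bundle a query with the rows seen at review time, plus a content hash."""
    checked_at = checked_at or datetime.now(timezone.utc)
    ordered = sorted(rows)
    canonical = json.dumps(ordered, ensure_ascii=False)
    return {
        "query": query,
        "checked_at": checked_at.isoformat(),
        "rows": ordered,
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


def changes_since(snap, live_rows):
    """Return (added, removed) rows relative to the reviewed snapshot."""
    old, new = set(snap["rows"]), set(live_rows)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    # Hypothetical item IDs standing in for real query results.
    reviewed = snapshot("SELECT ?item WHERE { ... }", ["Q2", "Q1"])
    print(changes_since(reviewed, ["Q1", "Q3"]))
```

The hash gives a cheap way to tell whether the live results still match the reviewed version, and a large diff could be read as a sign that the results are in "major flux".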

Feedback by Jura1

There was some previous discussion about this at Wikidata:Requests_for_comment/Data_quality_framework_for_Wikidata and its talk page.

Maybe important takeaways from that discussion for this new approach are:

  1. references in result should not be identical for all or most statements
  2. each statement should have multiple references (uncorrelated ones)
  3. the query shouldn't rely on external identifiers to select/de-select
  4. it should be possible to feature results from a potentially infinite group

Personally, (5.) I'm not really sure if it's a good idea to combine "featured lists" and "featured query syntax". --- Jura 08:48, 12 May 2019 (UTC)

Thanks for this context Jura. I've changed your bullet points to numbers, so I can reply point-by-point more clearly:
  1. Good point. I've made this amendment to the proposed criteria on this basis (Diff).
  2. Personally, I disagree that for something to be "showcase quality" ALL [non-trivial] statements should have multiple independent sources. I think in many cases it would be fine to have a single source. The text of the criteria currently says "References to relevant, reliable sources for all non-trivial statements." - which leaves it debatable whether this requires "sources" (plural) for individual statements. Personally I think this is the kind of thing that should be debated based on the context of the specific content of a nomination - depending on how controversial/contested a statement is. Some might definitely call for multiple, and that would definitely be raised by a reviewer if the fact was sufficiently contested. Perhaps in the future this criterion could be revisited to make it more stringent, when Wikidata becomes ever-more-referenced :-)
  3. I'm not sure if I understand the significance of this. But is this concept not already covered within criterion 1.1, "A query should not arbitrarily exclude or include items."? In any community review of a nomination, discussion of [mis]use of an external ID system would fall within that criterion - no?
  4. Wouldn't a potentially-infinite query time out on the WDQS? That is, unless you put an arbitrary LIMIT clause in it? This might be a matter of semantics perhaps.... "All people, in order of personal wealth" is unlimited, but "top 100 richest people" (or "list of billionaires") is finite. In the SPARQL code this is simply a matter of using the LIMIT clause. Easy! It's primarily a matter of the query's scoping statement. Or, is there an example of where "infinite" is indeed beneficial?
  5. Do you mean: you think that a "community recognition of high quality answer to a question" (the items and statements) should be differentiated from "community recognition of a well formed question" (the query)? If so - then I understand that these are two different things: one is 'content' and the other is 'code' (and in a particularly obscure software language too!). However, my argument for combining the two is this: if I've learned anything on this steep learning curve of Wikidata, it is that "how you ask the question" is integral to the quality of the answer you receive. Differentiating a community-recognition for quality-content from one for quality-query is, in my opinion, a "distinction without a difference" - since you can't obtain a "quality answer" without both. On the level of principle I think the combination is important for Wikidata's re-use potential (in Wikipedia, and third party places like academia and data-journalism) and on a practical level I have faith that there are always going to be talented people available willing to help 'fix' a sub-optimal SPARQL :-)
Wittylama (talk) 19:40, 13 May 2019 (UTC)
  • About #
    2: I don't think that it applies to statements that generally aren't considered needing sources.
    3: See Wikidata:Requests_for_comment/Data_quality_framework_for_Wikidata#Would_you_add_any_other_dimensions_to_the_ones_already_listed?. I don't think it's covered by the point you mentioned. Maybe not using external-ids in the query would be a good rule for featured queries.
    4: Infinite is what could be in Wikidata, not what actually is. Christian brought up a similar point above.
    5: I think it's less complicated than that: if you look at featured lists at (en) Wikipedia, the equivalent Wikidata query is generally trivial. For a suitable display in Listeria, sometimes a layer of SPARQL needs to be added that isn't particularly "educational".
--- Jura 04:49, 14 May 2019 (UTC)
  • 2: Do you mean that you think the criteria should specify that each non-trivial statement should have at least two entirely independent sources? If so, I personally think that's too high a bar to make as a standard for all; and the referencing requirements for specific nominations should be taken on a case-by-case basis.
  • 3: I've added "...or use an external ID property for its scope." (diff) as an exclusion to "completeness" criteria.
  • 4: Then, apparently I'm not understanding what you're suggesting. Can you rephrase?
  • 5: What do you prefer? Wittylama (talk) 09:41, 14 May 2019 (UTC)
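
To make the "unlimited versus top 100" distinction in point 4 concrete, here is a small sketch in which the same ordered query becomes finite once a LIMIT clause is appended. The helper name `top_n` is hypothetical, and the property and item IDs in the query text are assumptions to be checked against Wikidata before use.

```python
# Assumed IDs, to be verified on Wikidata:
# P31 = instance of, Q5 = human, P2218 = net worth.
UNBOUNDED = """\
SELECT ?person ?worth WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P2218 ?worth .
}
ORDER BY DESC(?worth)"""


def top_n(query, n):
    """Append a LIMIT clause so the query returns at most n rows."""
    if n <= 0:
        raise ValueError("n must be positive")
    return f"{query}\nLIMIT {n}"


if __name__ == "__main__":
    # "All people, in order of personal wealth" scoped down to the top 100.
    print(top_n(UNBOUNDED, 100))
```

The scoping statement of a nomination would then describe the limited form ("top 100"), while the open-ended form describes what could eventually be in Wikidata.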

Excellent idea

In Sweden we speak about "base data" datasets that should be used by Governments when doing Open data. I can see Wikidata as an enabler for also doing that for culture data, which today is very slow in adopting new technologies or creating common datasets. As we in Sweden have created some good datasets in Wikidata and are starting to have contact with the National Archives, I can see

can be good candidates. I have also played around with SPARQL federation with Nobelprize.org to check data quality (see T200668), and my intention is to do SPARQL federation for the datasets above with a project called Tora from the National Archives in Sweden (see T199977). Combining Show Case Queries with

Salgo60 (talk) 21:36, 20 August 2019 (UTC)

Support

I really like this proposition. I think it would be useful to go further and make it real. Maybe we need to have something like a namespace or a convention for query pages. It could be something such as Wikidata:Query:. PAC2 (talk) 05:12, 10 June 2022 (UTC)

RfC

For your information, there is a new request for comments on a related topic: Wikidata:Requests for comment/Documented and featured SPARQL queries. PAC2 (talk) 20:35, 22 July 2022 (UTC)
