Wikidata:Showcase Queries

Lightning talk at Wikimania 2019 on the proposal.

Wikidata currently has "showcase items" and "example queries". The former is about individual Wikidata items in isolation, and the latter is about demonstrating features of SPARQL code. However, Wikidata does not yet have a process to identify, praise, and share with the world its true superpower: the combination of statements from many different items, which makes it possible to uncover knowledge spread across the database.

This is a proposal to create Showcase Queries: a form of community recognition for a high-quality query that brings together high-quality content from across Wikidata to answer a question.

This would be a native Wikidata "featured" award that highlights the special value of Wikidata's content and recognises those who work to curate complete sets of high-quality information. The recognition would serve both an internal and an external purpose: to motivate ourselves, and to highlight our best work to the Wikiverse and to the world. A list of showcase queries would also serve as a set of "ready to use" content for those Wikipedia editions willing to incorporate live Wikidata information. When multiple language editions use the same information, this would create a positive feedback loop for maintaining the quality of that content.

Context

On many Wikipedias there is the concept of the Featured Topic, whose purpose is to demonstrate high-quality, complete "sets". To quote the English Wikipedia description, it is "a collection of articles or lists that represents Wikipedia's best work in covering a subject comprehensively and with items of consistently good quality." (selection criteria). There are also Wikipedia Featured Articles (selection criteria). Both of these processes are relevant to this proposal, and parts of them have been adopted in the proposed criteria below.
In some cases, the existing Featured Lists on Wikipedias (selection criteria) would make good candidates to start building Wikidata Showcase Queries, as their references have already been checked; they just need to be imported and properly curated as structured data.

Proposed criteria

 

A nomination should include a nomination statement which explains:

  • The context and content of the query;
  • A link to the code, ready to be run at query.wikidata.org;
  • A description of scope, explaining the question it is trying to answer;
  • An explanation for how it could be used for educational purposes;
  • A demonstration that the items and statements included have complete references, labels, etc.

In order for a query to be considered a Showcase Query, it must pass four key tests. It must be comprehensive, of high quality, clean, and useful:

  1. Comprehensive. The results of the query should provide a full and true answer to the question being asked. The scope of the query should be:
    1. Complete. There are no obvious gaps (neither missing items nor missing statements) in the query result. A query should not arbitrarily exclude or include items, or use an external ID property to define its scope.
    2. Knowable. The question being asked must have a scope which makes a complete answer feasible and attainable. A query should not have a scope whose answer is potentially unlimited or undefinable.
    3. Broad. The results of the query should be longer, more detailed, and/or more complex to research than could feasibly be done manually.
    4. Stable. The results of the query should be subject neither to frequent significant changes nor to ongoing edit wars, although the content may steadily improve over time.
  2. Quality. The content of the query results should be of verifiably high quality:
    1. Items appearing in the query result should have:
      1. Labels, descriptions (and aliases, where appropriate) in each language that is especially relevant to the query.
    2. Statements appearing in the query result should have:
      1. References to relevant, reliable sources for all non-trivial statements. They should be fully formed with applicable qualifiers (for example publication date (P577), archive URL (P1065), or page(s) (P304)). The references for the various statements should not be identical, and none should be imported from Wikimedia project (P143).
      2. Ranked and qualified, as applicable.
      3. Without "ordinary" or "mandatory" constraint violations.
  3. Clean. The query is as simple as is necessary to achieve its goal, and as easy as possible for others to understand and adapt. The SPARQL code of the query should be:
    1. Efficient. It should take the minimum time and computing power to obtain the necessary results.
    2. Explained. The query should include a title as a comment which explains its purpose and scope. The code should include clear comments for each major element, in order for future users to be able to understand why and how it works.
    3. Standardised. It should follow best-practice coding principles and structures for achieving its purpose.
  4. Useful. The results of the query must serve an actual, not merely potential, purpose to educate or inform. The results of the query should be:
    1. Demonstrable. The query results must be usable in a public-facing, independently reviewed setting, to educate or provide information. This could include use in a Wikimedia project, embedding in a third-party website, use in an academic conference presentation, citation in a print publication, etc. Use only as part of a Wikimedia project content analysis or improvement workflow is insufficient.
    2. Coherent. The query should not arbitrarily combine concepts or demonstrate coding techniques merely to prove that they can be done.
    3. Convenient. The information should be expressed in a manner that is easy for the recipient to understand, such as an appropriate data visualisation. What counts as appropriate depends on the circumstances of the demonstrated use.

What’s missing

In order for this idea to move from “proposal” to reality, a few more things would be needed:

  • A nomination page, process, and form. I assume this would look very similar to the “Generic property proposal” process, with a one-week minimum discussion time and a transcludable proposal form (equivalent to the property proposal template).
  • A reviewing template
    • In particular, an automated process to easily check that all "quality" criteria are met, especially that all the relevant statements have well-formed references.
  • A method to share the specific query result as it was on the day the recognition was confirmed (a stable link to the result).
  • A graphic that can be displayed by the successful nominee(s) on their userpage/project page. Each Wikipedia has a “featured article” icon, so we’d need something equivalent to that.
  • Once a few have been approved, a discussion about whether there should be a space on the main page (and/or elsewhere) to feature the latest showcase queries (equivalent to “picture of the day” on Commons, “article of the day” on Wikipedia, etc.).
  • An RFC to adopt the new process into policy.
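To make the automated "quality" check listed above more concrete, here is a minimal Python sketch of what part of it might look like. It is hypothetical, not an existing tool: it assumes each item is available as an entity dict in the JSON layout served by Special:EntityData (`labels`, `descriptions`, and `claims` holding statements with `references`), and the function name `check_item` and its `trivial_properties` parameter are illustrative inventions. It only covers criterion 2.1 and part of 2.2: missing labels or descriptions in the relevant languages, unreferenced statements, and references that use imported from Wikimedia project (P143).

```python
"""Hypothetical sketch of an automated 'quality' check for one Wikidata item.

Assumes `entity` is a single entity dict in the Special:EntityData JSON
layout: labels/descriptions keyed by language code, and claims keyed by
property id, each statement optionally carrying a list of references.
"""

IMPORTED_FROM = "P143"  # imported from Wikimedia project


def check_item(entity, languages, trivial_properties=frozenset()):
    """Return a list of human-readable problems found on one item.

    languages: language codes especially relevant to the query.
    trivial_properties: property ids the reviewer has decided need no
    reference (an assumption; the criteria leave "trivial" to judgement).
    """
    problems = []
    item_id = entity.get("id", "?")

    # Criterion 2.1: a label and a description in each relevant language.
    for lang in languages:
        if lang not in (entity.get("labels") or {}):
            problems.append(f"{item_id}: no label in {lang}")
        if lang not in (entity.get("descriptions") or {}):
            problems.append(f"{item_id}: no description in {lang}")

    # Criterion 2.2 (partial): every non-trivial statement has at least one
    # reference, and no reference uses "imported from Wikimedia project".
    for prop, statements in (entity.get("claims") or {}).items():
        if prop in trivial_properties:
            continue
        for statement in statements:
            sid = statement.get("id", "?")
            refs = statement.get("references") or []
            if not refs:
                problems.append(f"{sid} ({prop}): no reference")
            elif any(IMPORTED_FROM in (ref.get("snaks") or {}) for ref in refs):
                problems.append(f"{sid} ({prop}): reference uses {IMPORTED_FROM}")
    return problems


if __name__ == "__main__":
    # Made-up item for illustration; the ids do not refer to real data.
    example = {
        "id": "Q_EXAMPLE",
        "labels": {"en": {"language": "en", "value": "Example"}},
        "descriptions": {},
        "claims": {
            "P31": [{"id": "Q_EXAMPLE$1", "references": [{"snaks": {"P143": []}}]}],
            "P571": [{"id": "Q_EXAMPLE$2"}],
        },
    }
    for problem in check_item(example, ["en", "pt"]):
        print(problem)
```

A reviewer could run something like this over every item returned by a nominated query and treat an empty list as a pass for these particular checks. Deciding which statements are trivial, whether the sources are reliable, and the remaining criteria would still need human review.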

Suggested Examples

Joalpe & Ederporto: would you two be interested in writing a 'nomination' statement for this query for this [proposed] award? I don't really know the best way to go forward in making this proposed award an actual thing, so my best guess is that it is useful for the community to see what a "real world" proposal would look like. If you like the idea of this award, I would be really interested in your interpretation of the criteria I wrote, in the form of an actual nomination statement. We could show the nomination, and the proposal in general, to the wider Wikidata community, and if all went well it could become the de facto first recipient of the award. Wittylama (talk) 16:02, 4 January 2021 (UTC)
@Wittylama: Hey. Happy to do it. Where should this be done? --Joalpe (talk) 00:45, 18 January 2021 (UTC)
@Joalpe: how about at a ‘nomination subpage’ /List of people killed by and disappeared during the Brazilian military dictatorship? Wittylama (talk) 13:29, 18 January 2021 (UTC)
@Wittylama: OK. Have started it here. Will continue later. Not sure what this should look like, so please feel free to change anything you think should be adjusted. Thanks! --Joalpe (talk) 15:42, 18 January 2021 (UTC)
@Wittylama: Our proposal is ready, we think. What happens now? :) --Joalpe (talk) 00:55, 22 January 2021 (UTC) and Ederporto
@Joalpe: I think criterion 3.2 still needs a response. That could mean changing the query itself, or explaining why it shouldn't be changed. More generally: when you're ready, I think the 'next step' is for me to go to the Project Chat page and create a new discussion section asking to renew debate about formalising the process of 'showcase queries', using your proposal as the test case. I've asked about the general idea before, and people seemed to like it, but it was an abstract question. NOW that we have a specific case, it is easier to decide about the process by using your query as the first example. Wittylama (talk) 14:39, 22 January 2021 (UTC)
@Wittylama: Hey. Now I think we are good to go :) --Joalpe (talk) 17:05, 29 January 2021 (UTC)
Thank you Joalpe! I will now start a discussion on the Project Chat about the suggestion to create the concept of a 'showcase query' award, using this proposal as the test case. Wittylama (talk) 18:11, 29 January 2021 (UTC)

Questions and comments

Please write on the Discussion page