Wikidata:Requests for comment/Documented and featured SPARQL queries

An editor has requested the community to provide input on "Documented and featured SPARQL queries" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

SPARQL queries are essential to explore the Wikidata database, get insights from Wikidata and check the quality of the data.

Right now, we can find queries in the following places :

That's great, but those queries are often poorly documented. They have a title but little explanation, context or insight, and few warnings or quality checks to tell whether the results are reliable.

I've recently made a proposal to create dedicated pages to document a query with a basic skeleton to explain the context, the quality checks, the construction of the query, etc. See User:PAC2/Documented queries and the first example : Wikidata:WikiProject France/Queries/List of current French departments.

A few years ago, there was a proposal to create featured queries (see Wikidata:Showcase Queries). I think that we need both. "Documented queries" is just a template for documenting a query. If the documentation is of high quality, the query is optimised and the results are reliable, the query can be nominated as a featured query.

Also, if we want a community to discuss documented and featured queries, we need to create a dedicated WikiProject.

My proposal:

  1. Documented queries can be created
    • In a user space
    • In a WikiProject
    • Anywhere in the Wikidata namespace
  2. All documented queries should be in the category Category:Documented query
  3. Documented queries can be proposed as featured queries. One needs to create a new subpage for the discussion and the vote. This could be something such as Wikidata:WikiProject France/Queries/List of current French departments/Featured query.
  4. If the vote is conclusive after a few weeks, the documented query can be considered a featured query. (We may create a category and a small template with a barnstar.)
  5. Create a WikiProject dedicated to Documented and featured queries

I think it's worth discussing this proposal for a few weeks before submitting a new proposal for voting.

 – The preceding unsigned comment was added by PAC2 (talk • contribs).

Discussions

I find the SPARQL learning curve near vertical, as after the basic syntax is explained, all the example queries, which I generally rely on for learning, seem to be scattered across the whole knowledge base, and often seem to be showing off rather than informative. If setting up a new set of model queries, I'd like to see them focus on a few linked entities, and show all the different tricks for extracting reports from them. So 3 items that are members of 1 or 2 classes, and for an item, how to show information from a class that has entries with qualifiers.

Having those well documented would be invaluable. Vicarage (talk) 06:17, 23 July 2022 (UTC)
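
As an illustration of the kind of small model query asked for above, here is a minimal sketch in Python that holds one SPARQL query reading a statement together with its qualifier, plus two helpers that build links to the Wikidata Query Service. The choice of item and properties (France, Q142; population, P1082; point in time, P585) and the helper names are assumptions made for this example and are not part of the proposal.

```python
from urllib.parse import quote, urlencode

# One item, one property, one qualifier: the p:/ps:/pq: pattern.
QUALIFIER_QUERY = """\
SELECT ?population ?pointInTime WHERE {
  wd:Q142 p:P1082 ?statement .        # France -> population statement node
  ?statement ps:P1082 ?population ;   # main value of the statement
             pq:P585 ?pointInTime .   # its "point in time" qualifier
}
ORDER BY ?pointInTime
"""


def wdqs_ui_link(query: str) -> str:
    """Link that opens the query in the WDQS web interface (hypothetical helper)."""
    return "https://query.wikidata.org/#" + quote(query, safe="")


def wdqs_api_url(query: str) -> str:
    """URL that asks the WDQS SPARQL endpoint for JSON results (hypothetical helper)."""
    return "https://query.wikidata.org/sparql?" + urlencode(
        {"query": query, "format": "json"}
    )


if __name__ == "__main__":
    print(wdqs_ui_link(QUALIFIER_QUERY))
```

A documented query page could show such a link next to an explanation of each triple pattern, so that readers see how the statement node connects the main value and the qualifier.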

Thanks for your feedback. I agree that there is a need for well documented basic examples. PAC2 (talk) 04:52, 25 July 2022 (UTC)
  • Hi PAC2, it looks like you've created a template ('Querydoc') for a documented query, can you explain a bit about how that works? Is it just adding the category? It would be nice if there was a standard format for the documented query page as a whole but I guess a MediaWiki template might not be the right tool for that. ArthurPSmith (talk) 16:51, 25 July 2022 (UTC)
@ArthurPSmith: Yes, {{Querydoc}} adds a message at the top of the page and the category. I plan to develop another template which would create the basic structure for a documented query page. PAC2 (talk) 19:26, 25 July 2022 (UTC)

I'm experienced in SPARQL and Blazegraph thanks to Wikidata. I find Wikidata so interesting that I installed my own private instance of Blazegraph and developed a frontend for it with Vue.js (Q24589705). As it is intended for private use and allows anyone with the access password to directly run SPARQL 1.1 Update (Q86817302), I cannot share it with you. But I would like to share some ideas with you based on my experience.

  • There should be plenty of featured and well documented queries. These queries need to be featured on the home page, just like how Wikimedia Commons features its galleries and categories.
  • There should be a dedicated "Query:" namespace hosting these documented queries, so that they can use a different content model that allows them to be executed, displayed, edited and transcluded. It is an essential portal to individual items.
  • The SPARQL queries are documented.
  • The SPARQL queries are accompanied by additional machine-readable properties that do various things. For example, a user friendly name can be attached to each column and so on.
  • Most importantly, the SPARQL queries are enhanced with methods to edit data directly in their results. That is, a "column of statement GUID" and "property ID or datatype" pair is assigned to each column intended to be edited by the user. The column that serves as the statement GUID for other editable columns can be hidden from the GUI, but its data can be used as the master key to associate new data back to the item. This makes SPARQL queries feel like a spreadsheet and ensures schema consistency. Of course, Vue components need to be developed for each possible data type.
  • The SPARQL queries are also accompanied by methods to create an item that would be found in this query. This can be used to add a missing item when you cannot find what you want, and also allows you to transform a CSV table into new items programmatically.
  • The items are accessed through the SPARQL queries. If you cannot find an item through one of the featured queries on the home page or anywhere else, then the data quality of this item is bad. No need to use entity schema.
  • Complex constraints can also be created without being part of any property, to encourage their use. These complex constraints are accompanied by human readable descriptions or even fix-it buttons.
  • The items themselves need a better page layout. Let's hope Abstract Wikipedia (Q96807071) will be successful.
  • The query is linked to Wikidata Query Builder (Q105550739) or other means of easy modification. That's something I don't have, I just use bare SPARQL.

Midleading (talk) 02:52, 6 November 2022 (UTC)

@Midleading: I agree that query pages should have their own content model, because then you'd no longer have to annoyingly escape | as {{!}}, but that does not necessitate a dedicated MediaWiki namespace and in fact I think it would be best to keep queries in the Wikidata:* namespace because then it's easier to link them via relative links.
I just developed a small MediaWiki extension to facilitate this (just 230 lines of PHP), you can try it out at https://demo-wiki.push-f.com/wiki/index.php?title=Code. Feel free to experiment there :)
Pages with names ending in .rq[1] are automatically rendered as SPARQL. Notice how my extension also makes it easy to automatically linkify Wikidata identifiers within SPARQL code and to link the WDQS for a given code page via a dedicated special page.
I am not sure what the process is for getting an extension installed on this wiki; I guess I'd open a task on Phabricator? I think it would make sense for this to be installed on https://test.wikidata.org/ first.
--Push-f (talk) 18:50, 2 December 2022 (UTC)[reply]
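
To make the idea of linkifying identifiers via regular expressions concrete, here is a minimal sketch in Python. It is not the code of the extension mentioned above (which is written in PHP and configured per wiki); the pattern, the function name `linkify` and the target URLs are assumptions chosen for this illustration.

```python
import html
import re

# Q-items and properties such as Q5 or P569; the exact pattern is an assumption.
ENTITY_ID = re.compile(r"\b([QP][1-9][0-9]*)\b")


def linkify(sparql: str) -> str:
    """Return HTML in which every Wikidata identifier in the code is a link."""
    escaped = html.escape(sparql)  # escape first so the code cannot inject markup

    def to_link(match: re.Match) -> str:
        entity_id = match.group(1)
        # Properties live in the "Property:" namespace, items in the main one.
        title = "Property:" + entity_id if entity_id.startswith("P") else entity_id
        return f'<a href="https://www.wikidata.org/wiki/{title}">{entity_id}</a>'

    return ENTITY_ID.sub(to_link, escaped)


if __name__ == "__main__":
    print(linkify("?item wdt:P569 ?dateOfBirth .  # humans are wd:Q5"))
```

Because the pattern is applied to the whole text, identifiers inside SPARQL comments are linked as well, which matches the behaviour described for comments later in this thread.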
The query tag you've created is cool, but it looks like the improvements and additional configuration can be made upstream in Extension:SyntaxHighlight (Q21678766) instead (sites like https://en.cppreference.com/ can use this to link to an online compiler). Transclusion and rendering of query results are not supported. Anyway, the Query namespace already exists and is supposed to be used just in the way I described, without the features related to editing. I hope the development team will work on this one day (phase 3 of Wikidata). --Midleading (talk) 03:26, 3 December 2022 (UTC)
My extension in fact does not contain anything specific to Wikidata, all Wikidata specifics are provided via configuration. I just implemented integration with Scribunto to allow the highlighted code to be postprocessed via Lua (which we could use to display labels for identifiers), as well as allowing the displaying of additional content after the code blocks via Lua ... so my extension could very well be used with Lua API proposed in phab:T67626 once it's introduced. I implemented a proof of concept via mw:Extension:UnlinkedWikibase, see [2] ... this is obviously not yet ready for production but it shows that my extension is flexible enough to support result embedding.
I am certain that a dedicated namespace for queries is a bad idea because then query pages couldn't be subpages of regular pages which is IMO the best way to structure things, see for example Wikidata:Topics/Wikidata, which I created placing the queries on subpages.
I just filed phab:T324395 to get some more opinions on my extension.
--Push-f (talk) 16:04, 3 December 2022 (UTC)[reply]
What about both of them? A subpage in the Wikidata namespace, written in wikitext, hosts the SPARQL and the documentation (so that the documentation can use wikitext and templates, and other wiki pages can transclude it to embed and display its SPARQL source). The page in the Query namespace also transcludes the SPARQL source, but there the users can see the cached query results and possibly edit them in the way I proposed (this requires new user interface elements like the EntitySchema, so it is only possible with the Query namespace). The page in the Query namespace can also be transcluded, and the cached query result will be displayed on whatever page transcludes it, instead of the SPARQL source. Template parameters can be used to parameterize the SPARQL query, allowing use cases like "List of people born in {year}". In fact, the documentation page in the Wikidata namespace is likely to transclude both pages (item editing is likely only supported in the Query namespace though, as it requires JavaScript libraries to be loaded). --Midleading (talk) 17:21, 3 December 2022 (UTC)
Everything you are talking about can be implemented via page content models, nothing of this requires a new MediaWiki namespace (MediaWiki is very powerful and flexible, e.g. MediaWiki extensions can also dynamically embed JavaScript on any page). I think visiting a page that doesn't exist and pasting the SPARQL code is all it should take to put a query on the wiki; you shouldn't have to create two pages.
I think queries should be documented within SPARQL comments because code and documentation should live together on one page or otherwise they are bound to get out of sync. I don't think templates are necessary for documentation. Note that the regular expressions used to linkify Wikidata identifiers also work within comments. If a couple of more features are desired for documentation they could be implemented but I don't think we need all of the power of wikitext for documentation ... but if we really do we could also render SPARQL comments as wikitext.
To be honest I don't understand how "SPARQL queries [that] feel like spreadsheets" should work ... how would you represent an aggregate function or a subquery in spreadsheets? Or are you just talking about basic SPARQL queries that don't use any of that?
Parameterization of queries is a very interesting topic, thanks for bringing it up. I think it could work via simple string substitution. E.g. a query could contain ?item wdt:P569 ?dateOfBirth and when embedding the query you could specify ?dateOfBirth="2001-5-11"^^xsd:dateTime, which would perform a string substitution, changing the query to ?item wdt:P569 "2001-5-11"^^xsd:dateTime.
--Push-f (talk) 01:07, 4 December 2022 (UTC)[reply]
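
The string substitution described above could look roughly like the following Python sketch. The function name `bind_parameter` is hypothetical, and the literal in the usage example is written in the full xsd:dateTime lexical form (`2001-05-11T00:00:00Z`), which is an assumption of this example rather than the form used in the comment. The value is inserted verbatim, so this sketch does no escaping of its own.

```python
import re


def bind_parameter(query: str, variable: str, value: str) -> str:
    """Replace every occurrence of a SPARQL variable with a fixed term.

    The lookahead keeps longer variable names intact, so binding ?dateOfBirth
    does not touch ?dateOfBirthYear.
    """
    name = variable.lstrip("?")
    pattern = re.compile(r"\?" + re.escape(name) + r"(?![A-Za-z0-9_])")
    # A function is used as the replacement so that backslashes in `value`
    # are not interpreted as regex escapes.
    return pattern.sub(lambda _match: value, query)


if __name__ == "__main__":
    template = "?item wdt:P569 ?dateOfBirth"
    print(bind_parameter(template, "?dateOfBirth",
                         '"2001-05-11T00:00:00Z"^^xsd:dateTime'))
```

One limitation of purely textual substitution is that the variable is also replaced where it appears in the SELECT clause, which would make the query invalid; a real implementation would probably need to handle that case separately.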
If your demo query tag works like the map extension, which creates an interactive user element within any wiki page, that is very cool.
I don't think ad-hoc regular expressions can do what should be done by wikitext and templates. It's like going back to the age of magic links, and it prevents i18n and many other essential or innovative uses. Alternatively, featured queries can be translated like other pages using the Translate extension. Wikitext either must be rendered inside a <div> tag, or cannot use templates, as in edit summaries.
If templates are allowed in SPARQL page, parameterization of queries is as simple as placing {{{1}}} in the SPARQL source. But soon you realize it doesn't work when the parameters contain SPARQL keywords and punctuation (similar to SQL injection (Q506059)). What works actually is passing the parameter to another template to tidy it up (like many systems prohibit anything with SQL keywords or quotes from being entered). In your example, a hypothetical template that transforms "2001" into FILTER(YEAR(?dateOfBirth) = "2001"^^xsd:integer) can be created, and then you place {{exampletemplate|?dateOfBirth|{{{1}}}}} at the end of the SPARQL source.
Ideally the query results page works like phpMyAdmin (Q188104). But as we don't parse the SPARQL query, information needs to be provided manually for each column. The minimum required information is the datatype and the name of the column holding the claim GUID for wbsetclaimvalue (the query might have to be modified to include such a column). Unsupported columns, such as those produced by an aggregate function, cannot be edited. An important aspect of a featured query is the completeness of the query results. To be honest, users must edit Wikidata (sometimes also Wikipedia) when doing a query, because many important items you want may have 0 statements right now!
I think we should call for broader input from the community. We should fundamentally change the workflow on Wikidata from working on item pages to working on query pages, to improve efficiency and ensure consistency across items. Midleading (talk) 13:45, 4 December 2022 (UTC)
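
The hypothetical "tidy-up" template described above, which turns a year such as "2001" into a FILTER clause, can be sketched in Python as a function that validates its inputs before emitting any SPARQL. The name `year_filter` and the accepted formats (a plain variable name and one to four ASCII digits) are assumptions of this example.

```python
import re


def year_filter(variable: str, raw_year: str) -> str:
    """Build FILTER(YEAR(?var) = "yyyy"^^xsd:integer) from untrusted input.

    Anything that is not a plain variable name or a short run of ASCII digits
    is rejected, so SPARQL keywords and punctuation never reach the query.
    """
    if not re.fullmatch(r"\?[A-Za-z_][A-Za-z0-9_]*", variable):
        raise ValueError(f"not a SPARQL variable: {variable!r}")
    if not re.fullmatch(r"[0-9]{1,4}", raw_year):
        raise ValueError(f"not a year: {raw_year!r}")
    return f'FILTER(YEAR({variable}) = "{int(raw_year)}"^^xsd:integer)'


if __name__ == "__main__":
    print(year_filter("?dateOfBirth", "2001"))
```

Here the parameter is accepted only if it matches the expected type, which is a stricter check than filtering out individual keywords or quotes.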
My query tag currently doesn't require interactivity. My point was just that there are no technical benefits to a Query namespace.
Your claim that regular expressions prevent i18n is very much unsubstantiated. Quite the contrary is the case: if we automatically display a localized label, e.g. date of birth (P569) instead of P569, within SPARQL code via regex, then this is a big improvement to i18n. Documentation certainly does not require a Turing-complete templating system (as evidenced by the many programming languages that very successfully use a much simpler markup language, e.g. Rust just uses Markdown and it works fine). Actually, if we care about accessibility, it is better to restrict the markup language to simple features.
Yes support for mw:Extension:Translate would indeed be awesome. I just looked into it and Extension:Translate is in fact also very flexible via message groups. So I should be able to implement my own message group to automatically make any comment on a query page translatable ... without needing any <translate> tags.
I firmly believe that the text of query pages should be syntactically valid SPARQL, which would let us automatically find pages that contain syntax errors and would let consumers easily analyze SPARQL queries by just parsing them. SPARQL queries are also just data, and data does not belong within wikitext for the simple reason that wikitext is way too complex (the best third-party parser for wikitext is called mwparserfromhell for a reason).
I agree that parameters would need to be escaped but that is very much trivial. Note that the solution to code injection is escaping and certainly not blacklisting specific keywords.
Ah thanks for elaborating on your spreadsheet idea ... I didn't get that you meant that the results should be editable ... that's indeed a very good idea! However because of the involved complexity I think that this is better implemented by an application like the Wikidata Query GUI (Q114902143) (which could be linked from the wiki or perhaps even embedded via iframes), but I don't see any reason to implement this directly in a MediaWiki extension.
I agree that more opinions would be nice. In particular I am looking for feedback on my extension. I already suggested in the Phabricator ticket that it should be installed on test.wikidata.org, so that the community can properly try it out (with label display for identifiers, which I cannot easily do on my demo wiki).
--Push-f (talk) 15:18, 4 December 2022 (UTC)[reply]
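
For the escaping approach mentioned above, a minimal Python sketch for string parameters might look as follows. It covers the backslash, double quote, newline, carriage return and tab escapes of SPARQL string literals; the function name `sparql_string_literal` is hypothetical, and other parameter types (IRIs, numbers, dates) would need their own handling.

```python
# Characters that must be escaped inside a double-quoted SPARQL string literal.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def sparql_string_literal(text: str) -> str:
    """Wrap untrusted text in double quotes, escaping it character by character."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in text) + '"'


if __name__ == "__main__":
    # The embedded quote cannot terminate the literal early.
    print(sparql_string_literal('Douglas" } DROP ALL #'))
```

Escaping per character avoids the ordering problems of chained replacements, where a backslash added for a quote could be escaped a second time.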
The same feature has been available as a SPARQL Jupyter notebook. Midleading (talk) 15:35, 5 February 2024 (UTC)

See also