User:Rdrg109/1/4

Introduction

This page contains a proposal for storing SPARQL queries using structured data.

Use case

Suppose you want to show a native Ukrainian speaker who is very familiar with Wikidata and OpenStreetMap what can be done with SPARQL. They would understand it most easily from SPARQL examples that (1) contain comments and variable names in Ukrainian, (2) use the SPARQL endpoints of Wikidata and OpenStreetMap, and (3) use the item Ukraine.

Here are some more ideas, with their motivation:

  • Queries that use #defaultView:Timeline and the item mathematics (Q395)
  • Queries that use the predicates wikibase:sitelinks and schema:isPartOf
    • Motivation: It might help someone understand the usefulness of those predicates.
  • Queries that use the largest number of prefixes
  • Queries that use the largest number of SPARQL endpoints
    • Motivation: It might help someone understand SPARQL federated queries.
  • Queries that use the SPARQL endpoints of Wikidata (Q2013) and Lingua Libre (Q60024037)
  • Queries whose variable names and comments are written in Spanish (Q1321)
    • Motivation: It might help a native Spanish speaker find examples that are easier to understand.
  • Queries that don't contain comments
    • Motivation: It might help someone find queries that are more difficult to understand, as a challenge for getting more familiar with SPARQL.
  • Queries that are minimal working examples of SPARQL string functions (e.g. STRLEN, SUBSTR, UCASE)
    • Motivation: It might help someone understand how SPARQL treats strings.
  • Queries that use the SPARQL keyword IN
    • Motivation: Someone might want to know whether a subquery can be passed as a parameter to the IN keyword. They would search for examples that use IN to find out whether this is possible, or to see how people have tackled a problem that involves IN.
  • Queries that show a graph and whose results include some national hero of Peru
    • Motivation: Suppose you are a student of the history of Peru and want to know how people have used Wikidata data to generate a graph that includes an item that is a national hero of Peru.

Next steps

Classes for classifying SPARQL queries

A SPARQL query can be an instance of a given class. There may be different ideas about which classes to use for classifying SPARQL queries. I propose the following classes:

SPARQL template query

Queries whose code lets someone change one or more parameters to obtain results for those parameters.

For example, the template "List of people with a given occupation who were born in a given country" (A) is used by the template "List of tennis players who were born in a given country" (B). In turn, (B) is used by the queries "List of tennis players who were born in India" and "List of tennis players who were born in Australia".
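The chain from (A) to (B) to a concrete query can be sketched in Python with string.Template. This is only an illustration, not part of the proposal: the placeholder names $occupation and $country, the variable names, and the choice of properties (P106 for occupation, P19 and P17 for country of birth) are assumptions made for this sketch.

```python
from string import Template

# Hypothetical template (A): people with a given occupation born in a given country.
# The placeholders $occupation and $country are assumptions of this sketch.
TEMPLATE_A = Template("""\
SELECT ?person ?personLabel {
  ?person wdt:P106 wd:$occupation;
          wdt:P19/wdt:P17 wd:$country.
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}.
}""")

# Template (B): fix the occupation to tennis player (Q10833314) and
# leave $country open. safe_substitute keeps unknown placeholders as-is.
TEMPLATE_B = Template(TEMPLATE_A.safe_substitute(occupation="Q10833314"))

# Concrete queries derived from (B): India (Q668) and Australia (Q408).
born_in_india = TEMPLATE_B.substitute(country="Q668")
born_in_australia = TEMPLATE_B.substitute(country="Q408")

print(born_in_india)
```

Storing the link between a query and the template it was derived from would make it possible to list every query that instantiates (A) or (B).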

Some examples are:

  • Number of libraries in a given country
  • Radio stations that were founded before a given year
  • Number of people with a given occupation born in a given year

SPARQL explorative query

Queries that obtain information about the data in the whole Wikidata environment.

Some examples are:

  • Number of uses of each property in instances of a given class
  • Properties used in instances of a given class
  • Number of uses of each property in references
  • Wikidata item with the largest number of sitelinks

SPARQL informative query

These are queries that show specific results and are not templates. They are usually derived from templates.

Some examples are:

  • Map of libraries in Argentina

This query uses the templates "Map of instances of a class located in Argentina" and "Map of libraries located in a given country". These two templates are derived from "Map of instances of a class located in a given country".

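#defaultView:Map
SELECT
  ?coordinates
  ?item
  ?itemLabel
WITH {
  # Items linked to Argentina (Q414) through any chain of
  # location (P276) or administrative territory (P131) statements
  SELECT DISTINCT ?item {
    ?item (wdt:P276|wdt:P131)* wd:Q414.
  }
} AS %0
WITH {
  # Keep only instances of library (Q7075) or of one of its subclasses
  SELECT DISTINCT ?item {
    INCLUDE %0.
    ?item wdt:P31/wdt:P279* wd:Q7075
  }
} AS %1
{
  INCLUDE %1.
  # Coordinate location (P625) is what the map view plots
  ?item wdt:P625 ?coordinates.
  SERVICE wikibase:label {bd:serviceParam wikibase:language "es"}.
}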
  • Radio stations that were founded before the year 2000

SPARQL demonstrative query

Simple queries that show how to perform a specific, basic task.

Some examples are:

  • Get the label of a Wikidata item in a given language using SERVICE wikibase:label.
SELECT ?item ?itemLabel {
  VALUES ?item {wd:Q935}.
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}.
}
  • Get the label of a Wikidata item in a given language using rdfs:label
SELECT ?item ?label {
  VALUES ?item {wd:Q935}.
  ?item rdfs:label ?label FILTER(LANG(?label) = "zh").
}
  • Get the creation date of a Wikidata item using SERVICE wikibase:mwapi
SELECT ?item ?created {
  BIND(wd:Q18517638 AS ?item).
  BIND(SUBSTR(STR(?item), 32) AS ?title).
  SERVICE wikibase:mwapi {
    bd:serviceParam
      wikibase:endpoint "www.wikidata.org";
      wikibase:api "Generator";
      wikibase:limit "once";
      mwapi:generator "allpages";
      mwapi:gaplimit 1;
      mwapi:prop "revisions";
      mwapi:gapfrom ?title;
      mwapi:gapto ?title;
      mwapi:rvprop "timestamp";
      mwapi:rvdir "newer".
    ?created wikibase:apiOutput 'revisions/rev/@timestamp'.
  }
}
  • Get the description of a Wikidata item
  • Get the alternative label of a Wikidata item
  • Get the last modification date of a Wikidata item
  • Get the Wikipedia articles of a Wikidata item

Properties for the queries

Some properties that would be useful for the queries are:

  • uses SPARQL endpoint
    • Possible values would be
      • Wikidata SPARQL endpoint
      • Wikimedia Commons SPARQL endpoint
      • Lingua Libre SPARQL endpoint
      • OpenStreetMap SPARQL endpoint
      • DBpedia SPARQL endpoint
      • etc.
  • uses predicate
    • Possible values would be
      • schema:about
      • schema:isPartOf
      • schema:dateModified
      • etc.
  • uses prefix
    • Possible values would be
      • wdt
      • wd
      • etc.
  • uses SPARQL keyword
    • Possible values would be
      • OPTIONAL
      • FILTER NOT EXISTS
      • FILTER EXISTS
      • BIND
      • BOUND
      • etc.
  • uses SPARQL function
    • Possible values would be
      • SUBSTR
      • REGEX
      • STRENDS
      • STRSTARTS
      • REPLACE
      • etc.
  • results are shown using
    • #defaultView:Timeline
    • #defaultView:BarChart
    • #defaultView:BubbleChart
    • etc.
  • defined prefixes are
  • results include entities
    • Some queries might return more than 10K items. Storing all of those items in this property might not bring any benefit.
  • comments written in language
  • variable names written in language
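As a rough illustration of how such properties could answer the use cases once they are stored, the Python sketch below models each query as a record and filters a small catalogue. The class name QueryRecord, its field names, the find helper, and all catalogue entries are made up for this sketch. A real implementation would store the values as structured-data statements rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueryRecord:
    """Metadata for one stored query; fields mirror some proposed properties."""
    title: str
    endpoints: set = field(default_factory=set)   # "uses SPARQL endpoint"
    entities: set = field(default_factory=set)    # "results include entities"
    default_view: Optional[str] = None            # "results are shown using"
    comment_language: Optional[str] = None        # "comments written in language"


def find(records, *, view=None, entity=None, endpoints=()):
    """Return the titles of the records that match every given criterion."""
    wanted = set(endpoints)
    return [
        record.title
        for record in records
        if (view is None or record.default_view == view)
        and (entity is None or entity in record.entities)
        and wanted <= record.endpoints
    ]


# Made-up catalogue, for illustration only.
catalogue = [
    QueryRecord("example-1", endpoints={"Wikidata"},
                entities={"Q395"}, default_view="Timeline"),
    QueryRecord("example-2", endpoints={"Wikidata", "Lingua Libre"},
                comment_language="es"),
    QueryRecord("example-3", endpoints={"Wikidata"},
                entities={"Q414", "Q7075"}, default_view="Map"),
]

timeline_maths = find(catalogue, view="Timeline", entity="Q395")
federated = find(catalogue, endpoints={"Wikidata", "Lingua Libre"})
print(timeline_maths, federated)
```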

Parser for automating the addition of structured data

Implement a parser that reads a query and adds structured data describing its syntax (for example, the prefixes, keywords and functions it uses).

Until the parser is implemented, users can add structured data manually.
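A first approximation of such a parser could be based on regular expressions. The Python sketch below is a minimal illustration under stated assumptions, not a real SPARQL parser: the function names, the keys of the returned dictionary, and the short KEYWORDS and FUNCTIONS lists are invented here, and the token pattern only approximates the SPARQL grammar (for instance, it does not handle triple-quoted strings).

```python
import re

# Illustrative subsets only; a real parser would cover the whole grammar.
KEYWORDS = ["OPTIONAL", "FILTER NOT EXISTS", "FILTER EXISTS", "BIND", "GROUP BY"]
FUNCTIONS = ["SUBSTR", "REGEX", "STRENDS", "STRSTARTS", "REPLACE", "LANG", "BOUND"]

# Strings and IRIs are matched first so that a "#" inside them is not
# mistaken for the start of a comment.
_TOKEN = re.compile(r"""
    "(?:[^"\\\n]|\\.)*"     # double-quoted string literal
  | '(?:[^'\\\n]|\\.)*'     # single-quoted string literal
  | <[^<>\s]*>              # IRI, which may contain a fragment
  | \#[^\n]*                # comment up to the end of the line
""", re.VERBOSE)


def split_code_and_comments(query):
    """Return (code, comments); literals, IRIs and comments are blanked in code."""
    comments = []

    def blank(match):
        text = match.group(0)
        if text.startswith("#") and text[1:].strip():
            comments.append(text[1:].strip())
        return " "

    return _TOKEN.sub(blank, query), comments


def _mentions(code, keyword):
    # Allow any whitespace between the words of a multi-word keyword.
    pattern = r"\b" + r"\s+".join(map(re.escape, keyword.split())) + r"\b"
    return re.search(pattern, code, re.IGNORECASE) is not None


def describe_query(query):
    """Collect values for some of the proposed properties from a query string."""
    code, comments = split_code_and_comments(query)
    views = [c.split(":", 1)[1] for c in comments if c.startswith("defaultView:")]
    return {
        "default_view": views[0] if views else None,
        "comments": [c for c in comments if not c.startswith("defaultView:")],
        "prefixes": sorted(set(re.findall(r"\b(\w+):(?=\w)", code))),
        "keywords": [k for k in KEYWORDS if _mentions(code, k)],
        "functions": [f for f in FUNCTIONS
                      if re.search(rf"\b{f}\s*\(", code, re.IGNORECASE)],
        "entities": sorted(set(re.findall(r"\bwd:(Q\d+)\b", code))),
    }


EXAMPLE = """\
#defaultView:Map
# Libraries in Argentina
SELECT ?item ?coordinates {
  ?item wdt:P31/wdt:P279* wd:Q7075;
        wdt:P625 ?coordinates.
  OPTIONAL { ?item rdfs:label ?label FILTER(LANG(?label) = "es #1") }
}
"""

info = describe_query(EXAMPLE)
print(info)
```

The output of describe_query could then be turned into statements such as "uses prefix" or "uses SPARQL function". Detecting the language of comments and variable names would need a separate step and is not attempted here.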


Pending questions

  • How to store the statement "query uses GROUP BY"?
  • How to store the statement "query shows results in ascending order"?
  • How to store the statement "query shows results in descending order"?
  • How to store the statement "query has a property path with 3, 4, or the largest number of properties"?
  • How to list queries that use the largest number of anonymous subqueries?
  • How to list queries that turn off the Wikidata optimizer?
  • How to list queries that use the largest number of named subqueries (e.g. WITH)?
  • How to list queries that contain wdt:P31/wdt:P279*?
  • How to specify the section of a query in which a statement is true? Suppose there is a query of 1000 lines of code: how can the part of the query that runs on the SPARQL endpoint of Lingua Libre be pointed out?
  • How to store execution times?
    • How to answer the following question: which query's execution times have the highest standard deviation?
  • How to list queries that use VALUES with the largest number of n-tuples?
  • How to list queries that use VALUES with the largest number of variables?
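
For the question about execution times, one possible approach (an assumption of this sketch, not a decision of the proposal) is to store every measured run as a separate value and compute the spread afterwards. The Python sketch below uses made-up timings and hypothetical query identifiers to show the computation.

```python
from statistics import pstdev

# Made-up execution times in seconds, keyed by a hypothetical query identifier.
execution_times = {
    "query-a": [1.0, 1.0, 1.0, 1.0],
    "query-b": [0.5, 4.5, 0.5, 4.5],
    "query-c": [2.0, 3.0, 2.0, 3.0],
}

# Population standard deviation of the recorded runs of each query.
spread = {name: pstdev(times) for name, times in execution_times.items()}

# The query whose execution times vary the most.
most_variable = max(spread, key=spread.get)
print(most_variable)
```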