User:Rdrg109/1/4
Introduction
This page contains a proposal for storing SPARQL queries using structured data.
Use case
Suppose you want to show a native Ukrainian speaker who is very familiar with Wikidata and OpenStreetMap what is possible with SPARQL. They would understand it best by being shown SPARQL examples that (1) contain comments and variable names in Ukrainian, (2) use the SPARQL endpoints of Wikidata and OpenStreetMap, and (3) use the item Ukraine.
Here are some more ideas with their motivation:
- What SPARQL queries use #defaultView:Timeline and use the item mathematics (Q395)?
- Motivation: The timeline might be of interest to someone who is studying the history of mathematics.
- Queries that use the predicates wikibase:sitelinks and schema:isPartOf
- Motivation: It might help someone understand the usefulness of those predicates.
- Queries that use the largest number of prefixes
- Motivation: It might be very interesting to a curious Wikidata editor (Q28859215).
- Queries that use the largest number of SPARQL endpoints
- Motivation: It might help someone understand SPARQL federated queries.
- Queries that use the SPARQL endpoints of Wikidata (Q2013) and Lingua Libre (Q60024037)
- Motivation: It might motivate a Wikidata user to contribute to Lingua Libre (Q60024037).
- Queries whose variable names and comments are written in Spanish (Q1321)
- Motivation: It might help a native Spanish speaker find examples that are easier to understand.
- Queries that don't contain comments
- Motivation: It might help someone find queries that are harder to understand, as a challenge for getting more familiar with SPARQL.
- Queries that are minimal working examples of SPARQL string functions (e.g. STRLEN, SUBSTR, UCASE, etc.)
- Motivation: It might help someone understand how SPARQL treats strings.
- Queries that use the SPARQL keyword IN
- Motivation: Someone might want to know whether it is possible to pass a subquery as a parameter to the IN keyword, so they would search for examples that use the IN keyword to find out whether this is possible, or to see how people have tackled a problem that involves using IN.
- Queries that show a graph and whose results include a national hero of Peru
- Motivation: Suppose you are a student of the history of Peru and want to know how people have used Wikidata data to generate a graph that includes an item that is a national hero of Peru.
Next steps
Classes for classifying SPARQL queries
A SPARQL query can be an instance of a given class. There are different possible ways to classify SPARQL queries; I propose the following classes.
SPARQL template query
Queries whose code lets someone change one or more parameters to obtain results for those parameter values.
For example, the template "List of people with a given occupation who were born in a given country" (A) is used by the query "List of tennis players who were born in a given country" (B). (B) is in turn used by the queries "List of tennis players who were born in India" and "List of tennis players who were born in Australia".
Some examples are:
- Number of libraries in a given country
- Radio stations that were founded before a given year
- Number of people with a given occupation born in a given year
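The chain from template (A) to template (B) to the concrete queries could be sketched in Python with string.Template. This is only an illustration: the placeholder names, the SPARQL text, and the property and item IDs (P106, P19, P17, Q10833314, Q668, Q408) are assumptions chosen for the example and should be checked before use.

```python
from string import Template

# Hypothetical template (A): people with a given occupation born in a given country.
# $occupation and $country are placeholder names invented for this sketch.
PEOPLE_BY_OCCUPATION_AND_COUNTRY = Template("""\
SELECT ?person ?personLabel {
  ?person wdt:P106 wd:$occupation;
          wdt:P19/wdt:P17 wd:$country.
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}.
}""")

# Template (B): fix the occupation (assumed ID for "tennis player") and leave
# the country open. safe_substitute keeps the unfilled $country placeholder.
TENNIS_PLAYERS_BY_COUNTRY = Template(
    PEOPLE_BY_OCCUPATION_AND_COUNTRY.safe_substitute(occupation="Q10833314")
)

# Concrete queries derived from (B); country IDs are assumed to be India and Australia.
tennis_players_india = TENNIS_PLAYERS_BY_COUNTRY.substitute(country="Q668")
tennis_players_australia = TENNIS_PLAYERS_BY_COUNTRY.substitute(country="Q408")
```

Storing the template once and recording which parameters each derived query fixes would make the "is used by" relation between (A), (B) and the concrete queries explicit.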
SPARQL explorative query
Queries that obtain information about the data in the whole Wikidata environment.
Some examples are:
- Number of uses of each property in instances of a given class
- Properties used in instances of a given class
- Number of uses of each property in references
- Wikidata item with the largest number of sitelinks
SPARQL informative query
Queries that show specific results and are not themselves templates. They are usually derived from templates.
Some examples are:
- Map of libraries in Argentina
This query uses the templates "Map of instances of a class located in Argentina" and "Map of libraries located in a given country". These two templates are derived from "Map of instances of a class located in a given country".
#defaultView:Map
SELECT
  ?coordinates
  ?item
  ?itemLabel
WITH {
  SELECT DISTINCT ?item {
    ?item (wdt:P276|wdt:P131)* wd:Q414.
  }
} AS %0
WITH {
  SELECT DISTINCT ?item {
    INCLUDE %0.
    ?item wdt:P31/wdt:P279* wd:Q7075
  }
} AS %1
{
  INCLUDE %1.
  ?item wdt:P625 ?coordinates.
  SERVICE wikibase:label {bd:serviceParam wikibase:language "es"}.
}
- Radio stations that were founded before the year 2000
SPARQL demonstrative query
Simple queries that show how to perform a specific, basic task.
Some examples are:
- Get the label of a Wikidata item in a given language using SERVICE wikibase:label.
SELECT ?item ?itemLabel {
  VALUES ?item {wd:Q935}.
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}.
}
- Get the label of a Wikidata item in a given language using rdfs:label
SELECT ?item ?label {
  VALUES ?item {wd:Q935}.
  ?item rdfs:label ?label FILTER(LANG(?label) = "zh").
}
- Get the creation date of a Wikidata item using SERVICE wikibase:mwapi
SELECT ?item ?created {
  BIND(wd:Q18517638 AS ?item).
  BIND(SUBSTR(STR(?item), 32) AS ?title).
  SERVICE wikibase:mwapi {
    bd:serviceParam
      wikibase:endpoint "www.wikidata.org";
      wikibase:api "Generator";
      wikibase:limit "once";
      mwapi:generator "allpages";
      mwapi:gaplimit 1;
      mwapi:prop "revisions";
      mwapi:gapfrom ?title;
      mwapi:gapto ?title;
      mwapi:rvprop "timestamp";
      mwapi:rvdir "newer".
    ?created wikibase:apiOutput 'revisions/rev/@timestamp'.
  }
}
- Get the description of a Wikidata item
- Get the alternative label of a Wikidata item
- Get the last modification date of a Wikidata item
- Get the Wikipedia articles of a Wikidata item
Properties for the queries
Some properties that would be useful for the queries are:
- uses SPARQL endpoint
- Possible values would be
- Wikidata SPARQL endpoint
- Wikimedia Commons SPARQL endpoint
- Lingua Libre SPARQL endpoint
- OpenStreetMap SPARQL endpoint
- DBpedia SPARQL endpoint
- etc.
- uses predicate
- Possible values would be
schema:about
schema:isPartOf
schema:dateModified
- etc.
- uses prefix
- Possible values would be
wdt
wd
- etc.
- uses SPARQL keyword
- Possible values would be
OPTIONAL
FILTER NOT EXISTS
FILTER EXISTS
BIND
BOUND
- etc.
- uses SPARQL function
- Possible values would be
SUBSTR
REGEX
STRENDS
STRSTARTS
REPLACE
- etc.
- results are shown using
#defaultView:Timeline
#defaultView:BarChart
#defaultView:BubbleChart
- etc.
- defined prefixes are
- results include entities
- Some queries might return more than 10,000 items; storing all of those items in this property might not bring any benefit.
- comments written in language
- variable names written in language
Parser for automating the addition of structured data
Implement a parser that reads a query and adds structured data describing its syntax.
Until the parser is implemented, users can add structured data themselves.
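A first pass at such a parser could look like the Python sketch below, assuming regular expressions are good enough to start with. The function name describe_query, the keyword and function lists, the ex: prefix and the example.org endpoint are all hypothetical. A real implementation would probably need a full SPARQL grammar, since this sketch does not handle multi-line string literals or unusual whitespace.

```python
import re

# Strings and IRIs are matched before comments so that a "#" inside them
# (for example in <http://example.org/ns#>) is not mistaken for a comment.
_TOKEN = re.compile(
    r'''(?P<str>"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')'''
    r'''|(?P<iri><[^<>\s]*>)'''
    r'''|(?P<comment>#[^\n]*)'''
)

# Illustrative subsets only, not the full SPARQL grammar.
KEYWORDS = ["OPTIONAL", "FILTER NOT EXISTS", "FILTER EXISTS", "BIND", "GROUP BY", "VALUES"]
FUNCTIONS = ["SUBSTR", "REGEX", "STRENDS", "STRSTARTS", "REPLACE", "STRLEN", "UCASE"]


def _blank(query, kinds):
    """Replace every token whose kind is in `kinds` with a single space."""
    return _TOKEN.sub(lambda m: " " if m.lastgroup in kinds else m.group(), query)


def describe_query(query):
    """Return a dict of candidate statements for one SPARQL query."""
    comments = [m.group()[1:].strip() for m in _TOKEN.finditer(query)
                if m.lastgroup == "comment"]
    code = _blank(query, {"comment"})                # strings and IRIs kept
    bare = _blank(query, {"comment", "str", "iri"})  # syntax only
    return {
        "results are shown using": [c.split(":", 1)[1] for c in comments
                                    if c.startswith("defaultView:")],
        "comments": [c for c in comments if not c.startswith("defaultView:")],
        "defined prefixes": sorted(set(
            re.findall(r"\bPREFIX\s+([A-Za-z][\w-]*):", bare, re.I))),
        "uses prefix": sorted(set(
            re.findall(r"(?<![\w?$%])([A-Za-z][\w-]*):(?=\w)", bare))),
        "uses SPARQL keyword": [k for k in KEYWORDS if re.search(
            r"\b" + k.replace(" ", r"\s+") + r"\b", bare, re.I)],
        "uses SPARQL function": [f for f in FUNCTIONS if re.search(
            r"\b" + f + r"\s*\(", bare, re.I)],
        "uses SPARQL endpoint": re.findall(
            r"\bSERVICE\s+(?:SILENT\s+)?<([^<>\s]*)>", code, re.I),
    }


# Made-up query used only to exercise the sketch.
SAMPLE = '''#defaultView:Map
PREFIX ex: <http://example.org/ns#>
SELECT ?item ?itemLabel ?coordinates {
  ?item wdt:P31/wdt:P279* wd:Q7075;  # libraries
        wdt:P625 ?coordinates.
  OPTIONAL {
    SERVICE <https://example.org/sparql> { ?item ex:note ?note. }
  }
  FILTER(STRSTARTS(STR(?item), "http://www.wikidata.org/entity/Q"))
  SERVICE wikibase:label {bd:serviceParam wikibase:language "es"}.
}'''
```

Calling describe_query(SAMPLE) returns one entry per proposed property, which a bot could then turn into statements. The languages of comments and variable names are left out because they cannot be read off the syntax alone.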
Pending questions
- How to store the statement "query uses GROUP BY"?
- How to store the statement "query shows results in ascending order"?
- How to store the statement "query shows results in descending order"?
- How to store the statement "query has a property path of 3, 4 or the largest number of properties"?
- How to list queries that use the largest number of anonymous subqueries?
- How to list queries that turn off the Wikidata optimizer?
- How to list queries that use the largest number of named subqueries? (e.g. WITH)
- How to list queries that contain wdt:P31/wdt:P279*?
- How to specify the section in which a statement is true? Suppose there is a query of 1000 lines of code: how can the part of the query that runs on the SPARQL endpoint of Lingua Libre be pointed out?
- How to store execution times?
- How to answer the following question: which query has execution times with the highest standard deviation?
- How to list queries that use VALUES with the highest number of n-tuples?
- How to list queries that use VALUES with the highest number of variables?
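Some of these questions could probably be answered by computing simple metrics per query and storing them as quantities. The Python sketch below is one possible approach, not a settled design. The function names are hypothetical, the regular expressions assume that VALUES blocks contain only IRIs or prefixed names, and the dictionary of execution times is an invented structure.

```python
import re
import statistics


def count_named_subqueries(query):
    """Count distinct `WITH { ... } AS %name` declarations via their `AS %name` tails."""
    return len(set(re.findall(r"\}\s*AS\s+%(\w+)", query, re.I)))


# Matches `VALUES ?x { ... }` and `VALUES (?x ?y) { ... }`.
_VALUES = re.compile(r"\bVALUES\s*(\([^)]*\)|[?$]\w+)\s*\{([^}]*)\}", re.I)


def values_shapes(query):
    """Return (number_of_variables, number_of_tuples) for each VALUES block.

    Assumes no spaces, braces or parentheses occur inside the listed values.
    """
    shapes = []
    for head, body in _VALUES.findall(query):
        if head.startswith("("):
            n_vars = len(re.findall(r"[?$]\w+", head))
            n_rows = len(re.findall(r"\([^)]*\)", body))
        else:
            n_vars, n_rows = 1, len(body.split())
        shapes.append((n_vars, n_rows))
    return shapes


def most_variable_query(execution_times):
    """Return the query id whose run times have the highest sample standard deviation.

    `execution_times` maps a query id to a list of run times in seconds;
    queries with fewer than two measurements are skipped.
    """
    eligible = {q: t for q, t in execution_times.items() if len(t) >= 2}
    return max(eligible, key=lambda q: statistics.stdev(eligible[q]))
```

With numbers like these stored per query, "highest number of n-tuples" or "largest number of named subqueries" becomes an ordinary sort over a numeric property.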