Wikidata:Request a query/Archive/2022/04



How to query qualifiers? edit

Hello everyone! I am looking for a way to query all gold medallists from Poland from the 1988 Summer Olympics. The property "award received" P166 is always created as a qualifier of the statement "participant in" P1344. Does anyone knows a solution? Thank you very much!

Number of artists / cultural organisations from Québec edit

Hi! I am looking to find out how many artists (Q483501) or cultural institutions (Q3152824) / art organizations (Q7062022) from the province of Quebec (Q176) are listed actually on Wikidata (all disciplines combined). I have no clue as to where to start in creating a SPARQL query, but here are some properties I found that could be useful (I think), if someone (anyone) is willing to give me a hand. Thanks in advance for your help.

  • Wikidata property related to art (Q27918607)
  • Québec (Q176)
  • Artists (Q483501)
  • Cultural institution (Q3152824)
  • not-for-profit arts organization (Q7062022)
  • nonprofit organization (Q163740)
  • field of this occupation : Art (Q735)
  • sub-class of : Art (Q2018526)
  • part of / field of work : Culture (Q11042)
  • has quality of : art genre (Q1792379) , art style (Q1792644)
  • Catégorie:Arts (Q4104783)
  • Catégorie:Association ou organisme culturel au Québec (Q49656487)
  • Catégorie:Association ou organisme culturel (Q8809115)
Bonjour. I'm no expert in SPARQL queries so I won't attempt to write the query. But, I know you will need different queries for persons and for organizations. Individual artists can be queried on the basis of their work location (P937) or residence (P551) or place of birth (P19). However, these properties are often incomplete, so the query may miss several artists. Organizations can usually be queried directly with located in the administrative territorial entity (P131) (that is the recommended property in the WikiProject Performing arts), but many items use headquarters location (P159) and music bands use location of formation (P740). As to classes for P31 values, I recommend performing arts group (Q105815710) for the performing arts domain: it's a superclass for all incorporated and unincorporated performing arts and live music organizations. I will let other Wikimedians take it from there. Fjjulien (talk) 18:33, 1 April 2022 (UTC)[reply]

Server error: Unexpected end of JSON input edit

The WikiProject Performing arts has a number of recurrent queries that we run to track progress over time. However, one of them no longer works. It now yields: "Server error: Unexpected end of JSON input". This query is meant to retrieve production companies (organizations listed at least once as production company of a performing arts production).

SELECT DISTINCT ?item ?itemLabel 
WHERE 
{
  ?production wdt:P31/wdt:P279* wd:Q43099500.
  ?production wdt:P272 ?item.
  ?item wdt:P31/wdt:P279* wd:Q43229.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!

Thanks in advance for your help. --Fjjulien (talk) 18:10, 1 April 2022 (UTC)[reply]

@Fjjulien: It seemed to want some go-faster stripes, which I've applied as hints. Some more documentation here fwiw.
SELECT DISTINCT ?item ?itemLabel 
WHERE 
{
  ?production wdt:P272 ?item . hint:Prior hint:runFirst true.
  ?production wdt:P31/wdt:P279* wd:Q43099500 . hint:Prior hint:gearing "forward".
  ?item wdt:P31/wdt:P279* wd:Q43229 . hint:Prior hint:gearing "forward".
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Tagishsimon (talk) 18:16, 1 April 2022 (UTC)[reply]
It works fine the count is consistent with the value we had when the query was last run. Thanks! Fjjulien (talk) 21:24, 1 April 2022 (UTC)[reply]

Tennis events without a winner edit

Hi, I am looking for an overview of all tennis events (P31 = Q46190676) where the men's singles (P2094 = Q16893072) event does not yet have a winner (P1346) indicated. To make it a bit trickier this list should only contain tournaments on the highest ATP Tour level, so no Challenger or Futures level events. This can be derived from the tennis event which has a part of field (P361) indicating the tournament. The tournament item also has apart of field and the value should be a year edition of the ATP Tour or Grand Prix circuit. Example: men's singles tennis event (Q19839695) is part of tournament (Q3786643) which is in turn part of the 1998 ATP Tour (Q299855). Thanks! --Wolbo (talk) 11:44, 2 April 2022 (UTC)[reply]

@Wolbo: I assume you mean "grand slam" when you say grand prix? Otherwise please elaborate. It seems the ATP tour tournaments and grand slam tournaments are organized differently in Wikidata, so the query has a UNION branch for each:
SELECT ?item ?itemLabel ?type ?tournament ?tournamentLabel
WHERE
{
  ?item wdt:P31 wd:Q46190676 . # Instance of tennis event
  ?item wdt:P2094 wd:Q16893072 . # men's singles
  MINUS
  {
    ?item wdt:P1346 ?winner .
  }
  ?item wdt:P361 ?tournament .
  {
    ?tournament wdt:P361 ?tour .
    ?tour wdt:P3450 ?season_of .
    VALUES ?season_of { wd:Q300008 } # ATP World Tour
    BIND ("ATP world tour" AS ?type)
  }
  UNION
  {
    ?tournament wdt:P31 ?a_grand_slam .
    ?a_grand_slam wdt:P361 wd:Q102113 . # part of grand slam
    BIND ("Grand Slam" AS ?type)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
--Dipsacus fullonum (talk) 14:47, 2 April 2022 (UTC)[reply]
@Dipsacus fullonum: Thanks for the response. No, I mean the Grand Prix tennis circuit (1970 to 1989) which was the precursor to the ATP Tour (which started in 1990). For example, the 1975 Pacific Southwest Open tournament (Q16971727), was part of the 1975 Grand Prix circuit (Q3465233). Hope that helps.--Wolbo (talk) 15:06, 2 April 2022 (UTC)[reply]
@Wolbo: Thanks for the explananation. I don't know much about tennis history. I hope this query is better:
SELECT ?item ?itemLabel ?tournament ?tournamentLabel
WHERE
{
  ?item wdt:P31 wd:Q46190676 . # Instance of tennis event
  ?item wdt:P2094 wd:Q16893072 . # men's singles
  MINUS
  {
    ?item wdt:P1346 ?winner .
  }
  ?item wdt:P361 ?tournament .
  ?tournament wdt:P361 ?tour .
  ?tour wdt:P3450 ?season_of .
  VALUES ?season_of { wd:Q300008 wd:Q3775082} # ATP World Tour or Grand Prix tennis circuit
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
--Dipsacus fullonum (talk) 15:53, 2 April 2022 (UTC)[reply]
Seems to work well. Thanks! --Wolbo (talk) 20:43, 2 April 2022 (UTC)[reply]

Logical text search in item description edit

Dear all,

I'd like to match several strings (ignoring upper/ lower case) in the item description as follows:

mylist is a list of lists of strings: { {a_1, ..., a_m}, ..., {z_1, ... , z_q} }

The query shall return items where the item description contains at least ONE string of EACH sub-list of mylist

Example: mylist = {{"mathematician"}, {"algebra", "geometry"}}

=> The query should return mathematicians who worked/ contributed in algebra OR geometry (OR both).

Thank you Oliver

Hi Oliver. It isn't possible to do this for the general case with a SPARQL query without timeout. I don't suppose it can be done without downloading and analyzing a database dump.
You can however do smaller and more specific things like your example by calling cirrus search with the WMAPI service like this:
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?en_description
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam wikibase:api "Search" .
    bd:serviceParam mwapi:srsearch "algebra OR geometry mathematician " .
    bd:serviceParam mwapi:srlimit "max" .
    bd:serviceParam mwapi:srnamespace "0" .
    ?item wikibase:apiOutputItem mwapi:title .
  }
  ?item schema:description ?en_description .
  FILTER (LANG(?en_description) = "en")
  BIND (LCASE(?en_description) AS ?lc_desc)
  FILTER CONTAINS(?lc_desc, "mathematician")
  FILTER (CONTAINS(?lc_desc, "algebra") || CONTAINS(?lc_desc, "geometry"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
Note if you edit the query that use of "OR" and "AND" in a cirrus search string doesn't work like ordinary logical operators and may give unexpected results as described in the documentation at mw:Help:CirrusSearch/Logical operators. --Dipsacus fullonum (talk) 19:39, 3 April 2022 (UTC)[reply]
Thank you very much! Very kind of you.
This works pretty well and amazingly quick.
The CirrusSearch is new to me -- I'll try to learn more about it ... Oli the sloth (talk) 08:51, 4 April 2022 (UTC)[reply]
@Oli the sloth: Cirrus search is just the technical name for the normal wiki search function. It the same search that you can do from the search field in the top right corner of every wiki page. There are a few functions that are specific for Wikidata (or the Wikibase extension to be precise) described at mw:Help:Extension:WikibaseCirrusSearch which you can use to search for labels in specific languages, statement values, and other things. --Dipsacus fullonum (talk) 12:08, 4 April 2022 (UTC)[reply]

Query members orgs by time and location edit

Could someone do a Query of all members organizations of a Clubture Network ideally by:

year of joining (and display a timeline?) and location (and display as map?)

--Zblace (talk) 21:32, 5 April 2022 (UTC)[reply]

@Zblace: Timeline:
#defaultView:Timeline
SELECT ?item ?itemLabel ?start WHERE {
  ?item p:P463 ?stat . 
  ?stat ps:P463 wd:Q96376996.
  OPTIONAL {?stat pq:P580 ?start . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Try it!
Map. The data's not very good for this; few items have coordinates. The best common property seems to be located in the administrative territorial entity (P131), and so I've fetched the coordinates for the P131s of the member items.
#defaultView:Map
SELECT ?item ?itemLabel ?admin_area ?coord WHERE {
  ?item p:P463 ?stat . 
  ?stat ps:P463 wd:Q96376996.
  OPTIONAL {?item wdt:P131 ?admin_area .
           OPTIONAL { ?admin_area wdt:P625 ?coord. } }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Try it!
--Tagishsimon (talk) 22:04, 5 April 2022 (UTC)[reply]
Thank you@Tagishsimon that was super fast :-)
I would try to advance it further once I figure out how to modify the display and insert more useful data. Any good pointers were to start?
For the map: I get your point about precision of geo data, but since it is small organizations that have registered at random street adresses without owning buildings, it is not so essential to be precise, rather to be in the administrative territorial entity. How can I check quickly which member organizations do not have certain type of data?
For the timeline: I would like to display just the most simple nodes with maybe overlay names (no dates in boxes) and at the bottom total number of members in that year. Is this easy to customize, or should I look for some services that do this in matching style?
What are the chances there is one that combines timeline scroll with map display?
Thank you again! --Zblace (talk) 06:27, 6 April 2022 (UTC)[reply]

List all the events where a person was a guest edit

Stephen Baxter (Q314416) was a guest of honor (P967) at Novacon (Q7064385) in 1993.

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P967 ?statement0.
      ?statement0 ps:P967 wd:Q314416.
    }
    LIMIT 100
  }
}

Gives me the overall event, but I can't get the year qualifier I made for the entry. I want "Novacon" "1993" returned.  – The preceding unsigned comment was added by Vicarage (talk • contribs) at 6 april 2022 kl. 18.15‎ (UTC).

Try this query:
SELECT ?itemLabel (YEAR(?p585) AS ?year) WHERE {
    ?item p:P967 [ps:P967 wd:Q314416; pq:P585 ?p585] .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Larske (talk) 18:51, 6 April 2022 (UTC)[reply]
Thanks very much. I am pondering how I might encode "Novacon contains Novacon 23 at point in time 1993 with GoH Stephen Baxter", and do the query against that, but that can wait until I get confirmation as to how best to structure the information. Vicarage (talk) 19:23, 6 April 2022 (UTC)[reply]

Query a random set of popular entries edit

I am trying to get a random set of entries from Wikidata that are "popular".

I define popular as having more than 20 site links. And I am interested in all kinds of entries (humans, cats, ... all, there are "filters" to avoid timeouts, but I want to remove them).

So far, this is what I have:

 SELECT ?item ?itemLabel ?sitelinks ?random WHERE {
 
   #Filters tried to avoid Timeout
   #?item wdt:P31 wd:Q146. #Cat, it works
   #?item wdt:P31 wd:Q11762356 . #Valley Glacier, it works
   #?item wdt:P31 wd:Q5 . #Humans, it gives a timeout!
 
   #Minimum sitelinks
   ?item wikibase:sitelinks ?sitelinks .
   FILTER (?sitelinks > 20).
 
   #Random stuff
   BIND(SHA512(CONCAT(STR(RAND()), STR(?item))) AS ?random) .
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en" . }
 
 } ORDER BY ?random
 LIMIT 100
 # jsadhflkjsd, change this before each run to bypass WDQS cache

I've got the code for the randomization from here. And I do believe it works.

However, I have issues with timeouts. It looks like it is trying to fetch all the entries that follow the "where" and then limit them to a 100, which is not feasible, obviously.

How should I do it? I'm quite lost at this point.

Yes, the query tries to fetch all matching items before limiting to 100 results. That is necessary to do the sorting. You asked for the 100 items with the lowest sorted values of ?random. To find these, it has to look at all values. Besides that, you can speed up the query by using a rangesafe hint:
SELECT ?item ?itemLabel ?sitelinks ?random
WHERE
{
  #Minimum sitelinks
  ?item wikibase:sitelinks ?sitelinks . hint:Prior hint:rangeSafe true .
  FILTER (?sitelinks > 20 )

  #Random stuff
  BIND(SHA512(CONCAT(STR(RAND()), STR(?item))) AS ?random) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en" . }
}
LIMIT 100
Try it!
--Dipsacus fullonum (talk) 11:33, 7 April 2022 (UTC)[reply]
That works like a charm! Thanks for that superfast response! Guplem (talk) 11:44, 7 April 2022 (UTC)[reply]
One question: if you remove the "order by", the "randomness" of the Wikidata service will have to be trusted. But I don't even know if it is random! Is it?
Where can I find information about it? It looks random, but I don't know if it will start to fetch always the same group of entries. Guplem (talk) 11:48, 7 April 2022 (UTC)[reply]
The unsorted order may not be very random, I don't know for sure but that's a reasonable assumption. I tweaked the query some more with that in mind, mainly by calling the label service only for the final 100 items. Also RAND() seems to be sufficiently random on its own so that also helps speed things up. The only time you'll see the same items pop up twice in a row is where you re-run an unmodified query, then it will just present you with the results from the previous run. Infrastruktur (talk) 12:31, 7 April 2022 (UTC)[reply]
SELECT ?item ?itemLabel ?sitelinks
WITH { SELECT ?item ?sitelinks WHERE {
  #Minimum sitelinks
  ?item wikibase:sitelinks ?sitelinks . hint:Prior hint:rangeSafe true .
  FILTER (?sitelinks > 20 )

  #Random stuff
  #BIND(RAND() AS ?random) . # Doesn't work
  BIND(SHA512(CONCAT(STR(RAND()), STR(?item))) AS ?random) .
} ORDER BY ?random LIMIT 100 } AS %i
WHERE {
  INCLUDE %i
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en" . }
}
Try it!
@Guplem, Infrastruktur: I think the original use of a hash function is necessary. The RAND() function is only called once, so all items get the same value in Infrastruktur's query as you can see if you add ?random to the two SELECT clauses. You will also see that all 100 results have very low Q-numbers, indicating the selected items are not random at all. There are about 439000 items with over 20 sitelinks, and it seems that the modified query with ordering and limiting in a subquery before finding labels, can run in 30-40 seconds with the original calculation of ?random for a more random result. --Dipsacus fullonum (talk) 12:57, 7 April 2022 (UTC)[reply]
Nice catch. As for finding out for sure what the randomness of unordered results might be, that's implementation specific. So if you can't find it in the Blazegraph wiki, try to search Google Scholar for open-access papers done with Blazegraph. One would expect such a question to pop up in the method section of papers. Infrastruktur (talk) 13:12, 7 April 2022 (UTC)[reply]
I am struggling to follow it all. Using the old random method should work? I am still working on the query and right now, I have this.
It looks random to me, but on my end it looks like the filters are adding quite some work and I can't make it not time out if I go under 30 site links (the original target was 20, but it was arbitrary). I am wondering if there is a better way of filtering out undesired entries. Guplem (talk) 13:35, 7 April 2022 (UTC)[reply]
@Guplem: Filters often make a query slow, and with so many filters I am not surprised for timeouts for lower values of ?sitelinks than 30. I suggest first finding, say, 1000 random items with over 20 sitelinks, and then select 100 of these that passes your filters. That way you don't need to apply the filter to 100 of thousands of items. The prize is there will be less than 100 results in some runs. You can adjust the initial number 1000 to weigh that risc against additional used time for higher numbers. The query will look like this:
SELECT ?item ?itemLabel ?itemDescription ?sitelinks
WITH
{
  SELECT ?item ?sitelinks
  WHERE
  {
    #Minimum sitelinks
    ?item wikibase:sitelinks ?sitelinks.
    hint:Prior hint:rangeSafe true.
    FILTER (?sitelinks > 20 )
  
    #Random stuff
    # BIND(RAND() AS ?random) . # Using this makes it not random
    BIND(SHA512(CONCAT(STR(RAND()), STR(?item))) AS ?random) 
  }
  ORDER BY ?random
  LIMIT 1000
} AS %subquery1
WITH
{
  SELECT ?item ?sitelinks
  WHERE
  {
    INCLUDE %subquery1

    #Filters to remove undesired entries (templates, categories, ...)
    FILTER NOT EXISTS {?item wdt:P31 wd:Q11266439}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q97950663}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q4167836}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q59541917}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q14204246}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q19842659}
    FILTER NOT EXISTS {?item wdt:P373 ?commonsCategory}
    FILTER NOT EXISTS {?item wdt:P301 ?categoryMainTopic}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q15184295}
    FILTER NOT EXISTS {?item wdt:P1423 ?templateHasTopic}
    FILTER NOT EXISTS {?item wdt:P910 ?topicMainCategory}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q20010800}
    FILTER NOT EXISTS {?item wdt:P360 ?isAListOf}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q108783631}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q11753321}
    FILTER NOT EXISTS {?item wdt:P4224 ?categoryContains}
    FILTER NOT EXISTS {?item wdt:P971 ?categoryCombinesTopics}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q97303168}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q59259626}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q110010043}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q1474116}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q15647814}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q19887878}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q107344376}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q36330215}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q14296}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q42032}
    FILTER NOT EXISTS {?item wdt:P2370 ?conversionToSIUnit}
    FILTER NOT EXISTS {?item wdt:P31 wd:Q4167410}
    #FILTER NOT EXISTS {?item wdt:P31 wd:aaa}
  }
  LIMIT 100
} AS %subquery2
WHERE 
{
  INCLUDE %subquery2
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
--Dipsacus fullonum (talk) 14:34, 7 April 2022 (UTC)[reply]
That's smart! That works like a charm and there is no issue on receiving less than 100 results. Thank you very much!!!! Guplem (talk) 15:43, 7 April 2022 (UTC)[reply]
Yes, there is an issue on receiving less than 100 results. If the selection truly is random, then there is even a risc (although with very low probability) that 1000 unwanted items are selected in the first subquery. --Dipsacus fullonum (talk) 15:57, 7 April 2022 (UTC)[reply]

List the names of the guests of honour at a convention edit

For Illumination (Q111529504) return guest of honor (P967) as labels, ie "Paul J. McAuley" and "Geoff Ryman" as separate array items in a json file. This in addition to single value items such as dates, urls, and locationLabel which I can get

I'm surprised extracting multiple values of a property isn't covered in the Examples page, but I couldn't see one. Vicarage (talk) 11:28, 11 April 2022 (UTC)[reply]

SELECT ?gohLabel ?start ?locLabel ?countryLabel ?website
WHERE {
  VALUES ?aconv { wd:Q111529504 }
  ?aconv wdt:P967 ?goh.
  OPTIONAL { ?aconv wdt:P580 ?start. }
  OPTIONAL { ?aconv wdt:P17 ?country. }
  OPTIONAL { ?aconv wdt:P276 ?loc. }
  ?conv ^wdt:P31 ?aconv;
    wdt:P856 ?website.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
There's a "download" button in the middle right of the screen which lets you grab the table in json format.
The data on the conventions is a bit lacking, but you can help by adding more precise information. There's a page at https://eastercon.fandom.com/wiki/History which lists more precise locations and you can use either preferably Fandom article ID (P6262) (consisting of wikiname + ":" + pagename) or official website (P856) to link to the convention pages themselves.
Extracting multiple values is no different from single values. If the triple you match against have no restrictions that would prevent multiple matches, it will do so. Using the "optional" keyword usually takes care of this. Infrastruktur (talk) 13:17, 11 April 2022 (UTC)[reply]
I know about that first page, I wrote it!
The json I get is with your query is a table with information repeated (sorry about the formatting, pre, nowiki etc don't seem to work here
{
"gohLabel": "Paul J. McAuley",
"start": "1992-01-01T00:00:00Z",
"countryLabel": "United Kingdom",
"website": "http://www.eastercon.org/"
},
{
"gohLabel": "Geoff Ryman",
"start": "1992-01-01T00:00:00Z",
"countryLabel": "United Kingdom",
"website": "http://www.eastercon.org/"
}
]
but what I wanted was the GoHLabels as a json array, separate from the date and country, more like
{
"start": "1992-01-01T00:00:00Z",
"countryLabel": "United Kingdom",
"website": "http://www.eastercon.org/",
"GoHs": [
{
"Label": "Paul J. McAuley"
},
{
"Label": "Geoff Ryman"
}
]
}
Vicarage (talk) 13:49, 11 April 2022 (UTC)[reply]
Doubt that can be done without programming. However you can use aggregation to combine the GOHs into a list separated by for instance semicolon.
SELECT
  (GROUP_CONCAT(?gohLabel; SEPARATOR="; ") AS ?guestofhonor)
  (SAMPLE(?start) AS ?start)
  (SAMPLE(?locLabel) AS ?locLabel)
  (SAMPLE(?countryLabel) AS ?countryLabel)
  (URI(CONCAT("https://community.fandom.com/index.php?title=w:c:", SAMPLE(?faid))) AS ?article)
WHERE {
  VALUES ?aconv { wd:Q111529504 }
  OPTIONAL { ?aconv wdt:P967 ?goh. }
  OPTIONAL { ?aconv wdt:P580 ?start. }
  OPTIONAL { ?aconv wdt:P17 ?country. }
  OPTIONAL { ?aconv wdt:P276 ?loc. }
  OPTIONAL { ?aconv wdt:P6262 ?faid. }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?goh rdfs:label ?gohLabel.
    ?loc rdfs:label ?locLabel.
    ?country rdfs:label ?countryLabel.
  }
}
GROUP BY ?aconv
Try it!
Infrastruktur (talk) 14:22, 11 April 2022 (UTC)[reply]
I've managed to work with that one. Thanks for your help. Vicarage (talk) 12:17, 12 April 2022 (UTC)[reply]
Oh, and let's know if you need a similar list covering all the Eastercons. Infrastruktur (talk) 13:40, 11 April 2022 (UTC)[reply]
I also wrote https://fancyclopedia.org/Eastercon, I'm trying to formalise the information in wikidata, and expand to other cons worldwide. Vicarage (talk) 13:53, 11 April 2022 (UTC)[reply]

Trying to remove duplicate authors in scientific articles edit

There's a bot that has caused some issues so long ago on so many items, I presume the only fix is to repair not restore or undo. I have found numerous items where the bot has added author (P50), but not deleted the author name string (P2093) and later ORCIDator efforts have duplicated the author values. See Search for Dark Matter and Large Extra Dimensions in pp Collisions Yielding a Photon and Missing Transverse Energy (Q56742969) as an example. There are many author (P50) values that are entered twice with one listing both object named as (P1932) and series ordinal (P1545) and again with only series ordinal (P1545). I recall there was a way to reorganize values according to the series ordinal (P1545). If I could do that, I could see what was listed twice rather because they'd follow one another rather than having to search each author. If that's not so easy (I'm not that clever with programming), maybe I can run a query that lists both author (P50) and author name string (P2093) and I can manipulate the csv file which I am clever at doing.

So, for the query, can I get all the author (P50) and author name string (P2093) values for Q56742969? Thanks in advance. Trilotat (talk) 20:51, 11 April 2022 (UTC)[reply]

I tried doing the obvious thing making a query that looked for exact matches, however even with the limited amount of author names in that item it made it cry uncle. This is still the method I would recommend though, however you need to do things in several steps. You'll need to grab the list of authors first, then comment out that paragraph and uncomment the next paragraph in the query below to grab a list of author strings. Save both lists to CSV files. Then the third and final step would be to use a tool to compare the sets of strings to find exact matches. There are several tools you could use, but I suspect a spreadhsheet application or OpenRefine would be the easiest ones. Other tools include awk, arq (jena) or any scripting language. Infrastruktur (talk) 12:40, 12 April 2022 (UTC)[reply]
Thank you!! Trilotat (talk) 14:46, 12 April 2022 (UTC)[reply]
SELECT ?authLabel ?authstr ?ord
WHERE {
  VALUES ?item { wd:Q56742969 }

  # Run first, then comment out paragraph
  ?item p:P50 ?auth_st.
  ?auth_st ps:P50 ?auth.
  OPTIONAL { ?auth_st pq:P1545 ?ord. }

  # Run second, uncomment paragraph after first run is complete
  #?item p:P2093 ?authstr_st.
  #?authstr_st ps:P2093 ?authstr.
  #OPTIONAL { ?authstr_st pq:P1545 ?ord. }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY xsd:integer(?ord)
Try it!
@Trilotat, Infrastruktur: This query will find repeated use of the same series ordinal (P1545) used as qualifiers in either author (P50) or author name string (P2093).
SELECT (GROUP_CONCAT(?authLabel; separator="; ") AS ?author_labels)
       (GROUP_CONCAT(?authstr; separator="; ") AS ?author_strings)
       ?ord (COUNT(?ord) AS ?count)
WHERE
{
  {
    VALUES ?item { wd:Q56742969 }
    ?item p:P50 ?auth_st .
    ?auth_st ps:P50 ?auth .
    ?auth_st pq:P1545 ?ord .
  }
  UNION
  {
    VALUES ?item { wd:Q56742969 }
    ?item p:P2093 ?authstr_st .
    ?authstr_st ps:P2093 ?authstr .
    ?authstr_st pq:P1545 ?ord .
  }
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
    ?auth rdfs:label ?authLabel . 
  }
}
GROUP BY ?ord
HAVING (?count > 1)
ORDER BY xsd:integer(?ord)
Try it!
--Dipsacus fullonum (talk) 13:58, 12 April 2022 (UTC)[reply]
This was perfect! Thank you. I've bookmarked the query and will reuse as needed! Awesome. Now, on to the next one... Thanks again, Trilotat (talk) 14:48, 12 April 2022 (UTC)[reply]

Help get the dates if available edit

In my query

# Problem getting start and end year if they exist
SELECT DISTINCT ?mandal ?mandalLabel  ?startyear ?endyear WHERE {
  ?mandal (wdt:P31/(wdt:P279*)) wd:Q817477;
    p:P131 ?districtnode.
  ?districtnode ps:P131 wd:Q15341.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "te,en". }
  OPTIONAL {
     ?districtnode pq:P580 ?startdate. 
     ?districtnode pq:P582 ?enddate. 
  }
  BIND(IF(BOUND(?startdate),year(?startdate),0) as ?startyear)
  BIND(IF(BOUND(?enddate),year(?enddate),0) as ?endyear)     
}
ORDER BY (?mandalLabel)
Try it!
,

I am unable to print the start year and end year for a statement of P131 if available. Can you help fix the query? Thanks.

Arjunaraoc (talk) 02:08, 12 April 2022 (UTC)[reply]

Your optional clause will match only districts that have both a start and an end date. To fix it put "optional" around both the line with the start date and around the line with the end date.
SELECT DISTINCT
  ?mandal ?mandalLabel (YEAR(?startdate) AS ?startyear) (YEAR(?enddate) AS ?endyear)
WHERE {
  ?mandal wdt:P31/wdt:P279* wd:Q817477;
    p:P131 ?districtnode.
  ?districtnode ps:P131 wd:Q15341.
  OPTIONAL { ?districtnode pq:P580 ?startdate. } 
  OPTIONAL { ?districtnode pq:P582 ?enddate. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "te,en". }
}
ORDER BY (?mandalLabel)
Try it!

--Infrastruktur (talk) 11:22, 12 April 2022 (UTC)[reply]

@Infrastruktur, Thanks a lot.--Arjunaraoc (talk) 15:55, 12 April 2022 (UTC)[reply]

help edit

Creating query for lebanese politicians who have missing articles in english. Im struggling to list out the images of them Prodrummer619 (talk) 17:44, 14 April 2022 (UTC)[reply]

SELECT ?item (COUNT(distinct ?sitelink) as ?count)  WHERE {    ?item wdt:P106 wd:Q82955 .   ?item wdt:P27 wd:Q822 .   ?sitelink schema:about ?item .   FILTER NOT EXISTS { ?wen schema:about ?item . ?wen schema:inLanguage "en" } }GROUP BY ?item ORDER BY DESC(?count)
Try it!
@Prodrummer619: You can use an optional section for getting images to your query, e.g. OPTIONAL { ?item wdt:P18 ?image . } You will need to also add the image variable to the GROUP BY clause, or use an aggregation function (typical SAMPLE) in the SELECT clause.
BTW. The query does find items without an article in English, but it doesn't test what type of article. So any English article e.g. in Wikipedia, Wikiquote or Wikisource will remove an item from results. Is that really your intention? --Dipsacus fullonum (talk) 18:19, 14 April 2022 (UTC)[reply]
Works now! And yes, that's not my intention, however I'm not prioritizing that right now. Thank you for the response. Prodrummer619 (talk) 18:45, 14 April 2022 (UTC)[reply]

Get all Kind of food of a given language edit

Hi there, since I am not a programmer, I have laboriously worked out the query. For other queries with similar problems it works, but here I run into a timeout. I try to get all kind of food for german.

 construct{
 ?subclass  ?rpredicate  ?entity.
 ?entity ?predicate ?subclass}
 WHERE {
 ?subclass (wdt:P279*) wd:Q2095.
 { ?entity ?predicate ?subclass. 
 }
 UNION
 { ?subclass ?rpredicate ?entity. 
 }}
The query above is trying to get all triplets where either subject or object is a subclass of food. What is probably many millions triplets, so the query will timeout. Please tell more precisely what you try to query about. We cannot help without knowing the intention. --Dipsacus fullonum (talk) 18:11, 10 April 2022 (UTC)[reply]
I wanted to create a list that includes all types of food (ID are sufficient). Obviously there are too many. But I have no idea how to reduce the amount or break them down into smaller parts. Maybe in the first step it is possible to determine only the all (sub)classes and in the second step for each of the classes the individual members. But I do not know how. Depending on how many classes there are you can't do that by hand anymore. 87.181.42.168 19:51, 11 April 2022 (UTC)[reply]
The query below will list all items that a subclass of food (Q2095) or an instance of a such subclass. First column is the item name, and second column is the item's German label if it exists. Is that fine, or should it be changed?
SELECT DISTINCT ?item ?itemLabel
WHERE
{
  ?item wdt:P31? / wdt:P279* wd:Q2095 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" . }
}
Try it!
--Dipsacus fullonum (talk) 20:44, 11 April 2022 (UTC)[reply]
Thank you very much Dipsacus fullonum. That is exactly what I am looking for. And it is still extremely fast 87.181.41.160 06:55, 15 April 2022 (UTC)[reply]

Requesting the value of a qualifier that may or may not exist edit

SELECT ?entity ?desc (year(?date) as ?year) ?work {
  wd:Q238702 p:P166 [ps:P166 ?entity ; pq:P585 ?date ; pq:P1686 ?work1]
  OPTIONAL { ?entity rdfs:label ?desc FILTER((LANG(?desc)) = "en") }
  OPTIONAL { ?work1 rdfs:label ?work FILTER((LANG(?work)) = "en") }
} ORDER BY ?year
Try it!

Returns the one award Brian Aldiss won where the work was named

SELECT ?entity ?desc (year(?date) as ?year){
  wd:Q238702 p:P166 [ps:P166 ?entity ; pq:P585 ?date]
  OPTIONAL { ?entity rdfs:label ?desc FILTER((LANG(?desc)) = "en") }
} ORDER BY ?year
Try it!

Returns a dozen works

How can I get both, so I get a table for the dozen awards, with one of them with the name of the book? My banal questions are because all the examples seem to go straight from trivial to showing-off, a vertical learning curve Vicarage (talk) 17:02, 14 April 2022 (UTC)[reply]

@Vicarage: You can use OPTIONAL sections so possible results are not discarded if some data, like here a qualifier for for work (P1686), isn't present. Your queries above use the so called abbreviated blank node syntax (in lines like wd:Q238702 p:P166 [ps:P166 ?entity ; pq:P585 ?date]). In order to use OPTIONAL sections, you need to first split these to use simple triplets, and explicit name a variable for the statement nodes which was only implicit in the abbreviated syntax. So my query below introduces the variable ?award_statement which is used in the OPTIONAL section to get values for the work. I also simplify how to get English labels by using the label service.
SELECT ?entity ?entityLabel (YEAR(?date) as ?year) ?workLabel
{
  wd:Q238702 p:P166 ?award_statement .
  ?award_statement ps:P166 ?entity .
  OPTIONAL { ?award_statement pq:P585 ?date . }
  OPTIONAL { ?award_statement pq:P1686 ?work . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?year
Try it!
--Dipsacus fullonum (talk) 18:02, 14 April 2022 (UTC)[reply]
Thanks. I managed to use IF(BOUND) to force nulls so the Mediawiki ExternalData keeps the work array in step with the other awards Vicarage (talk) 08:13, 15 April 2022 (UTC)[reply]

Query to produce the following output edit

Starting from usernames and using the username, usercontributions and the global account I would like to be able to produce the following table at any given point in time via a query:

User UserContributions DateFirstEdit TotalEdits n_enwikiEdits n_CommonsEdits n_WikidataEdits GlobalAccount DateLastEdit DaysEditing
MargaretRDonald https://en.wikipedia.org/wiki/Special:Contributions/MargaretRDonald 2017-12-07T00:00:00Z 535225 33802 18443 472570 https://en.wikipedia.org/wiki/Special:CentralAuth/MargaretRDonald 2022-04-13T00:00:00Z 1588
Oronsay https://en.wikipedia.org/wiki/Special:Contributions/Oronsay 2014-10-27T00:00:00Z 238065 75561 6254 154229 https://en.wikipedia.org/wiki/Special:CentralAuth/Oronsay 2022-04-13T00:00:00Z 2725
Cuyahoga44 https://en.wikipedia.org/wiki/Special:Contributions/Cuyahoga44 2019-07-22T00:00:00Z 11 11 0 0 https://en.wikipedia.org/wiki/Special:CentralAuth/Cuyahoga44 2022-04-07T00:00:00Z 990
DrPlantGenomics https://en.wikipedia.org/wiki/Special:Contributions/DrPlantGenomics 2019-07-25T00:00:00Z 1272 1247 0 0 https://en.wikipedia.org/wiki/Special:CentralAuth/DrPlantGenomics 2022-04-13T00:00:00Z 993

It would be good to be able to do it via a query and not by error-prone hand. MargaretRDonald (talk) 04:08, 16 April 2022 (UTC)[reply]

@MargaretRDonald: You can get editcounts for all WikiMedia projects and a total count from the meta=globaluserinfo API query (e.g. like this) which can be integrated into a SPARQL query. But is DateFirstEdit, DateLastEdit and DaysEditing for a certain wiki (like e.g. the English Wikipedia) or the first and last edit on any MediaWiki project? If it is the former, then that wiki can be queried, but if it the latter, I don't see an easy solution. --Dipsacus fullonum (talk) 07:02, 16 April 2022 (UTC)[reply]
@Dipsacus fullonum: I'd be happy if it was the first and last edit on enwiki and to forget other projects.. Although later I would love to do the same for edits on wikicommons. MargaretRDonald (talk) 07:15, 16 April 2022 (UTC)[reply]
@MargaretRDonald: Here you are:
SELECT
  ?User
  (URI(CONCAT("https://en.wikipedia.org/wiki/Special:Contributions/", ?User)) AS ?UserContributions)
  ?DateFirstEdit ?TotalEdits ?n_enwikiEdits ?n_CommonsEdits ?n_WikidataEdits
  (URI(CONCAT("https://en.wikipedia.org/wiki/Special:CentralAuth/", ?User)) AS ?GlobalAccount)
  ?DateLastEdit
  (xsd:dateTime(?DateLastEdit) - xsd:dateTime(?DateFirstEdit) AS ?DaysEditing)
WHERE
{
  VALUES ?User { "MargaretRDonald" "Oronsay" "Cuyahoga44" "DrPlantGenomics" }
  SERVICE wikibase:mwapi
  {
    # Get edit counts from globaluserinfo API call.    

    # Best supported service is Generator. We don't need or use the generator output,
    # but can construct the API call to also include the meta=globaluserinfo info
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "random" . # This generator doesn't make much output, and doesn't require additional parameters

    # What we really want here:
    bd:serviceParam mwapi:meta "globaluserinfo" . 
    bd:serviceParam mwapi:guiuser ?User .
    bd:serviceParam mwapi:guiprop "merged|editcount" .

    # XPath expressions to extract the results:
    ?TotalEdits wikibase:apiOutput "/api/query/globaluserinfo/@editcount" .
    ?n_enwikiEdits wikibase:apiOutput "/api/query/globaluserinfo/merged/account[@wiki='enwiki']/@editcount" .
    ?n_CommonsEdits wikibase:apiOutput "/api/query/globaluserinfo/merged/account[@wiki='commonswiki']/@editcount" .
    ?n_WikidataEdits wikibase:apiOutput "/api/query/globaluserinfo/merged/account[@wiki='wikidatawiki']/@editcount" .
    bd:serviceParam wikibase:limit "once" .
  }
  SERVICE wikibase:mwapi
  {
    # Get dates for first edit at enwiki using the usercontribs API call.

    # Best supported service is Generator. We don't need or use the generator output,
    # but can construct the API call to also include the list=usercontribs info
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "random" . # This generator doesn't make much output, and doesn't require additional parameters
    
    # What we really want here:
    bd:serviceParam mwapi:list "usercontribs" . 
    bd:serviceParam mwapi:ucuser ?User .    
    bd:serviceParam mwapi:ucdir "newer" . # Start with oldest edit
    bd:serviceParam mwapi:ucprop "timestamp" . # Get timestamp
    bd:serviceParam mwapi:uclimit "1" .
    
    # XPath expressions to extract the result:
    ?DateFirstEdit wikibase:apiOutput "/api/query/usercontribs/item/@timestamp" .
    bd:serviceParam wikibase:limit "once" .
  }
  SERVICE wikibase:mwapi
  {
    # Get dates for last at enwiki using the usercontribs API call.

    # Best supported service is Generator. We don't need or use the generator output,
    # but can construct the API call to also include the list=usercontribs info
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "random" . # This generator doesn't make much output, and doesn't require additional parameters
    
    # What we really want here:
    bd:serviceParam mwapi:list "usercontribs" . 
    bd:serviceParam mwapi:ucuser ?User .    
    bd:serviceParam mwapi:ucdir "older" . # Start with newest edit
    bd:serviceParam mwapi:ucprop "timestamp" . # Get timestamp
    bd:serviceParam mwapi:uclimit "1" .
    
    # XPath expressions to extract the result:
    ?DateLastEdit wikibase:apiOutput "/api/query/usercontribs/item/@timestamp" .
    bd:serviceParam wikibase:limit "once" .
  }
}
Try it!
--Dipsacus fullonum (talk) 13:15, 16 April 2022 (UTC)[reply]
That is beautiful and amazing @Dipsacus fullonum: I have now tried it with 33 names in the value list and it works perfectly. MargaretRDonald (talk) 21:21, 16 April 2022 (UTC)[reply]

Unable to filter the results edit

Hi guys, I am trying to query all diseases with a symptom of "headache", the following is my query:

SELECT ?item ?itemLabel ?symptomsLabel WHERE {

 ?item wdt:P31 wd:Q12136 .
 ?item wdt:P780 ?symptoms .
 FILTER (REGEX(?symptoms, "headache"))
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }

}

But the query returns me 0 record, can I know which part I did wrong?

Hi, your filter gives 0 results because ?symptoms will be bound to IRI values for items, and they doesn't contain any strings. For the appraoch with a REGEX filter to work, you would need an extra triple going from the symptoms item to either its label, description or aliases in some specific language. But that would be very inefficient. I propose to directly match the item for headache, headache (Q86):
SELECT ?item ?itemLabel
WHERE
{
 ?item wdt:P31 wd:Q12136 .
 ?item wdt:P780 wd:Q86 .
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Dipsacus fullonum (talk) 05:53, 18 April 2022 (UTC)[reply]
Thanks for the help!
By the way, if you know any query, where I can get the ID, for example: Q86, from using the "Heart disease" string?
I am learning to write query without knowing the item's ID, which I will query from HTTP request rather than from the wikidata query service where we can press CTRL + SPACE to get the ID.
Thanks in advanced. 161.139.102.162 08:10, 18 April 2022 (UTC)[reply]
I am not sure if I understand the request. But you can find any items which have the string "Heart disease" as its English label with this query:
SELECT ?item ?itemLabel ?instance_ofLabel
WHERE
{
  ?item rdfs:label "Heart disease"@en .
  ?item wdt:P31 ?instance_of .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
Note the @en suffix to the string giving the language code for English. Nothing is found if you omit a language code as all labels are language coded. I added the corresponding values for instance of (P31) in the query so you can see that the found items are all scholarly articles. The disease isn't found because the English label for the item heart disease (Q190805)) is in lowercase as recommended in Help:Label#Capitalization. --Dipsacus fullonum (talk) 08:58, 18 April 2022 (UTC)[reply]
Yes, you did understand the request and manage to solve it.
Really appreciate the help! 161.139.102.162 09:13, 18 April 2022 (UTC)[reply]

Last revision edit

How to get output item and ID of the item's last revision? Thanks. Игорь Темиров (talk) 11:49, 18 April 2022 (UTC)[reply]

@Игорь Темиров:
#title: Get revision ID for last revision of some items
SELECT ?item ?revid
WHERE
{
  VALUES ?item_strings { "Q2" "Q5" }
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "revisions" .
    bd:serviceParam mwapi:prop "revisions" . 
    bd:serviceParam mwapi:titles ?item_strings .
    ?item wikibase:apiOutputItem mwapi:title .
    ?revid wikibase:apiOutput "revisions/rev/@revid" .
  }
}
Get revision ID for last revision of some items
--Dipsacus fullonum (talk) 12:54, 18 April 2022 (UTC)[reply]
Bravo! Thank you very much! Игорь Темиров (talk) 17:03, 18 April 2022 (UTC)[reply]

two or more pictures in one item via one property edit

We want to have a list, containing all Upper Lusatian house (Q1362233) like f.e. Horní Podluží 56 (Q111523584) containing two images, because we want to restructure our data :) . Regards, thank you very much! Conny (talk) 06:48, 20 April 2022 (UTC)[reply]

@Conny: Counts:
SELECT ?item ?itemLabel (count(?stat) as ?count ) WHERE {
  ?item wdt:P31 wd:Q1362233.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,cs,en" . }
  
  ?item p:P18 ?stat.
  ?stat ps:P18 ?img.
} group by ?item ?itemLabel having (?count>1) order by desc(?count)
Try it!
image pairs - note some items will have multiple rows
SELECT ?item ?itemLabel ?img ?img2  WHERE {
  ?item wdt:P31 wd:Q1362233.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,cs,en" . }
  
  ?item p:P18 ?stat.
  ?stat ps:P18 ?img.
  ?item p:P18 ?stat2.
  ?stat2 ps:P18 ?img2.
  filter(str(?stat) < str(?stat2))
}
Try it!
--Tagishsimon (talk) 08:25, 20 April 2022 (UTC)[reply]
Thank you so much Tagishsimon :) . Regards, Conny (talk) 14:25, 20 April 2022 (UTC).[reply]

Query timeout limit reached edit

I’m trying to find all subjects and objects with the property “shares border with (P47)” on Wikipedia in Arabic (I need to retrieve the Arabic wikipedia Title), but it times out when I added the : ^schema:about schema:isPartOf Lines..

Is there any possible way to get the results with timing out?


The Query: https://w.wiki/55Un

SELECT ?sbj_ar ?obj_ar WHERE {

 ?sbj wdt:P47 ?obj .
 
 ?sbj ^schema:about ?sbj_url_ar .
 ?sbj_url_ar schema:isPartOf <https://ar.wikipedia.org/>; schema:name ?sbj_ar .
 
 ?obj ^schema:about ?obj_url_ar .
 ?obj_url_ar schema:isPartOf <https://ar.wikipedia.org/>; schema:name ?obj_ar .

}



Thank you!

Is there any possible way to get the results without timing out? I doubt it because P47 is used over 600,000 times and more than 2 million items are linked to arwiki. I suggest trying to limit the results some way, e.g. by country, type of items or something else. --Dipsacus fullonum (talk) 08:27, 21 April 2022 (UTC)[reply]

Server Timeout Issue edit

Hello all! I'm running into an issue with the following query where the server times out:

SELECT
  ?person ?personLabel (GROUP_CONCAT(?citizenshipLabel;SEPARATOR=', ') AS ?citizenship)
  WHERE
  {
    ?person wdt:P3171 ?value .
    OPTIONAL { ?person wdt:P27 ?value2 . 
             ?value2 rdfs:label ?citizenshipLabel . FILTER (LANG(?citizenshipLabel)='en') }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
  GROUP BY ?person ?personLabel
Try it!

I'm attempting to write a query that returns the citizenship of Olympians, with the eventual hope of adding other parameters (like sex or gender), but even just the combination of these two properties returns an error (even after attempting to limit the number of rows). I know this is a known limitation of current server resources, but if anybody knows any workarounds that might help, I'd be immensely grateful!

This seems to work, but don't ask me why it's faster. :P Got it down to about 30s, but adding gender made it dangerously close to timing out again. You can see where I've commented it out. Probably the only way you'll be able to have more columns is if you limit the working set by some criteria, gender for instance. Infrastruktur (talk) 16:39, 21 April 2022 (UTC)[reply]
Thank you very much for your helpful response! Tjg6ph (talk) 23:02, 21 April 2022 (UTC)[reply]
SELECT
  ?person (SAMPLE(?personLabel) AS ?personLabel) (SAMPLE(?genderLabel) AS ?genderLabel) (GROUP_CONCAT(?citizenshipLabel;SEPARATOR=', ') AS ?citizenship)
WHERE {
  ?person wdt:P3171 [] .
  OPTIONAL { ?person wdt:P27 ?value2 . }
  #OPTIONAL { ?person wdt:P21 ?gender . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?person rdfs:label ?personLabel.
    #?gender rdfs:label ?genderLabel.
    ?value2 rdfs:label ?citizenshipLabel.
  }
}
GROUP BY ?person
Try it!
SELECT
  ?person (SAMPLE(?personLabel) AS ?personLabel) (GROUP_CONCAT(?citizenshipLabel;SEPARATOR=', ') AS ?citizenship)
WHERE {
  ?person wdt:P3171 [];
    wdt:P21 wd:Q6581072. # Only females
  OPTIONAL { ?person wdt:P27 ?value2 . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?person rdfs:label ?personLabel.
    ?value2 rdfs:label ?citizenshipLabel.
  }
}
GROUP BY ?person
Try it!

Frwiki subways articles on a map edit

Hello, I'd like a map of (current) subways stations with frwiki article being dotted as green + their frwiki link and without frwiki articles being dotted as red with their (?en? wiki link). Is it feasible ? Bouzinac💬✒️💛 13:17, 20 April 2022 (UTC)[reply]

@Bouzinac:
#defaultView:Map{"hide": ["?rgb", "?coordinates"]}
SELECT DISTINCT ?item ?itemLabel ?coordinates ?wiki ?rgb
WHERE
{
  ?item wdt:P31 / wdt:P279* wd:Q928830 . # subway station
  ?item wdt:P17 wd:Q142 . # in France
  ?item wdt:P625 ?coordinates .
  OPTIONAL
  {
    ?frwiki schema:about ?item .
    ?frwiki schema:isPartOf <https://fr.wikipedia.org/> .
  }
  OPTIONAL
  {
    ?enwiki schema:about ?item .
    ?enwiki schema:isPartOf <https://en.wikipedia.org/> .
  }
  BIND (IF(BOUND(?frwiki), "00FF00", "FF0000") AS ?rgb)
  BIND (COALESCE(?frwiki, ?enwiki) AS ?wiki)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" . }
}
Try it!
--Dipsacus fullonum (talk) 14:53, 20 April 2022 (UTC)[reply]
Very nice! I've amended here https://w.wiki/55LL with a yellow color (enwiki) and a red color (neither fr nor enwiki) ? Bouzinac💬✒️💛 15:17, 20 April 2022 (UTC)[reply]
Discovered today that you can plot interconnecting lines on the map as well, which is very cool. Here's the query adapted to show Paris metro just as an example. Edit: I'm aware the lines are doubled. Cookie to whoever fixes that. Infrastruktur (talk) 16:42, 20 April 2022 (UTC)[reply]
#defaultView:Map{"hide": ["?rgb", "?coord1","?coord2","?line"]}
SELECT DISTINCT ?station1 ?station1Label ?coord1 ?wiki ?rgb ?line
WHERE {
  ?station1 wdt:P31/wdt:P279* wd:Q928830;
    wdt:P361 wd:Q50716;
    wdt:P625 ?coord1.
  OPTIONAL {
    ?station1 wdt:P197 ?station2.
    ?station2 wdt:P625 ?coord2.
    ?station1 p:P625 [ ps:P625 []; psv:P625 [ wikibase:geoLongitude ?coord1lon; wikibase:geoLatitude ?coord1lat; ] ] .
    ?station2 p:P625 [ ps:P625 []; psv:P625 [ wikibase:geoLongitude ?coord2lon; wikibase:geoLatitude ?coord2lat; ] ] .
    BIND(CONCAT('LINESTRING (', STR(?coord1lon), ' ', STR(?coord1lat), ',', STR(?coord2lon), ' ', STR(?coord2lat), ')') AS ?str) .
    BIND(STRDT(?str, geo:wktLiteral) AS ?line) .
  }
  OPTIONAL {
    ?frwiki schema:about ?station1 .
    ?frwiki schema:isPartOf <https://fr.wikipedia.org/> .
  }
  OPTIONAL {
    ?enwiki schema:about ?station1 .
    ?enwiki schema:isPartOf <https://en.wikipedia.org/> .
  }
  BIND (IF(BOUND(?frwiki), "006400", IF(BOUND(?enwiki), "F7E35F", "FF0000")) AS ?rgb)
  BIND (COALESCE(?frwiki, ?enwiki) AS ?wiki)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" . }
}
Try it!
@Infrastruktur:: Nice! We have seen maps using well-known text representation of geometry (Q4018860) before on this page, but a reminder of the possibilities is fine. You could add a simple filter: FILTER (STR(?station1) < STR(?station1)) but some lines might disappear where adjacent station (P197) are not used for both stations. The property has a suggestion constraint (Q62026391) of symmetric constraint (Q21510862), but that isn't a guarantee for completeness, so it is probably easiest to just leave the double lines. --Dipsacus fullonum (talk) 18:13, 20 April 2022 (UTC)[reply]
Yes, I would see Paris with this template {{Metro}} to see that , eg Map of Paris Métro (query). I would welcome any suggestion to improving the inside query :) Bouzinac💬✒️💛 05:08, 21 April 2022 (UTC)[reply]
That looks even better, and I guess the lines thing was old news. So yeah there are a couple of things I would do. First thing is put optional around the part that makes the line and requests P197, since if a terminal isn't connected it would be useful if it shows up in the map so that it can be fixed. Also there are places where you request both wdt and p for the same predicate, I would replace the wdt with p here to allow for reuse of data, but remember to add a wikibase:BestRank. Infrastruktur (talk) 13:02, 21 April 2022 (UTC)[reply]
Hi, could you elaborate the p / wdt stuff? (I know SPARQL but I am absolutely no expert in SPARQL). Feel free to edit the template directly ! Bouzinac💬✒️💛 11:45, 22 April 2022 (UTC)[reply]

Federation question edit

Here's a query (which works) to find the most common classes of the values of depicts (P180) statements on Commons:

# Most common classes for "depicts" (P180) on Commons
SELECT (SUM(?count) AS ?sum) ?class ?class_label (SAMPLE(?value) AS ?instance) (SAMPLE(?value_label) AS ?instance_label) (SAMPLE(?example) AS ?sample)

WITH { 
  SELECT (COUNT(DISTINCT(?file)) AS ?count) ?value (SAMPLE(?file) AS ?example) WHERE {
     service bd:sample {
       ?file wdt:P180 ?value . 
        bd:serviceParam bd:sample.limit 1000 .
        bd:serviceParam bd:sample.sampleType "RANDOM" .
     }  
  } GROUP BY ?value
  ORDER BY DESC(?count)
#  LIMIT 2000           
} AS %values
 
WHERE {
    INCLUDE %values .
    service <https://query.wikidata.org/sparql> {
      ?value wdt:P31 ?class
      OPTIONAL {?value rdfs:label ?value_label FILTER (lang(?value_label) = 'en') } .
      OPTIONAL {?class rdfs:label ?class_label FILTER (lang(?class_label) = 'en') } .
    }
} GROUP BY ?class ?class_label 
ORDER BY DESC(?sum)
Try it!

However, the query seems to break (it times out) if I try to do the grouping before the label lookup:

# Most common values for "depicts" (P180)
SELECT ?sum ?class ?class_label ?instance ?instance_label ?sample

WITH { 
  SELECT (COUNT(DISTINCT(?file)) AS ?count) ?value (SAMPLE(?file) AS ?example) WHERE {
     service bd:sample {
       ?file wdt:P180 ?value . 
        bd:serviceParam bd:sample.limit 10 .
        bd:serviceParam bd:sample.sampleType "RANDOM" .
     }  
  } GROUP BY ?value
  ORDER BY DESC(?count)
  LIMIT 2000           
} AS %values
 
WHERE {
    INCLUDE %values .
    service <https://query.wikidata.org/sparql> {
      {
         SELECT (SUM(?count) AS ?sum) ?class (SAMPLE(?value) AS ?instance) (SAMPLE(?example) AS ?sample) WHERE {
            ?value wdt:P31 ?class
         } GROUP BY ?class 
      }
      OPTIONAL {?instance rdfs:label ?instance_label FILTER (lang(?instance_label) = 'en') } .
      OPTIONAL {?class rdfs:label ?class_label FILTER (lang(?class_label) = 'en') } .
    }
} 
ORDER BY DESC(?sum)
Try it!

Can anybody spot what I've got wrong here, or what I need to do to fix it? Possibly it may be as simple as having mis-spelt a variable name somewhere, but I can't see it. Alternatively, is there anything I need to do, to make sure that the federated query can see the values of ?count, ?value and ?example that it needs? It's been a while since I've done anything with federation, have I missed something out?

Thanks for any help you can give, Jheald (talk) 20:38, 21 April 2022 (UTC)[reply]

One way round it is this:
# Most common values for "depicts" (P180)
SELECT ?sum ?class ?class_label ?instance ?instance_label ?sample

WITH { 
  SELECT (COUNT(DISTINCT(?file)) AS ?count) ?value (SAMPLE(?file) AS ?example) WHERE {
     service bd:sample {
       ?file wdt:P180 ?value . 
        bd:serviceParam bd:sample.limit 1000 .
        bd:serviceParam bd:sample.sampleType "RANDOM" .
     }  
  } GROUP BY ?value
  ORDER BY DESC(?count)
#  LIMIT 2000           
} AS %values
 
WITH {
  SELECT (SUM(?count) AS ?sum) ?class (SAMPLE(?value) AS ?instance) (SAMPLE(?example) AS ?sample) WHERE {
    INCLUDE %values .
    service <https://query.wikidata.org/sparql> {
         ?value wdt:P31 ?class
    }
  } GROUP BY ?class 
} AS %classes

WHERE {
  INCLUDE %classes . 
  service <https://query.wikidata.org/sparql> {
      OPTIONAL {?instance rdfs:label ?instance_label FILTER (lang(?instance_label) = 'en') } .
      OPTIONAL {?class rdfs:label ?class_label FILTER (lang(?class_label) = 'en') } .
  }
} 
ORDER BY DESC(?sum)
Try it!
but I'm curious as to why the second query above didn't work, and what I would have needed to have done to make it work, if anyone has any thoughts. Jheald (talk) 20:47, 21 April 2022 (UTC)[reply]
That's a wicked hard problem. I wasn't able to spot any issues with query two but I made two observations. In query one and three you perform the aggregation step on the commons endpoint, where on query two you perform it on the wikidata endpoint which gets a lot more traffic. I also tried converting your third query from using named subqueries to using nested regular subqueries and that worked just fine too, it didn't even look scary. Ok, maybe a little... Infrastruktur (talk) 22:40, 21 April 2022 (UTC)[reply]
@Jheald: I think that the problem with query 2 is that you have ?value wdt:P31 ?class in a nested subquery where the variable ?value from the INCLUDE isn't in scope, so it tries to match all triples with wdt:P31. You cannot move the INCLUDE inside the federation, so I don't have better solutions than your query 1 or 3. --Dipsacus fullonum (talk) 03:11, 22 April 2022 (UTC)[reply]
@Dipsacus fullonum, Infrastruktur: Aaah. Yes, that's it. I've got the INCLUDE %values in the wrong place in #2. And, as you say, if I try to move it inside the federation, the results aren't on wdqs to use, while on the Commons side they never get re-used, so the query then throws the error "Named subquery results are not used by this query: %values". So: #3 it is, then. Thanks everybody. Jheald (talk) 09:40, 22 April 2022 (UTC)[reply]
I apologize if I ended up just adding confusion, at the point I was saying that I suspected it could be a performance issue, but it turned out not to be. Dipsacus is spot on. A very nice catch. But like the manual says, the inner subquery is executed before any contents in the outer query so that means at that point the inner query doesn't know about the contents of %values yet, so ?value is unbound in the inner query making it match all triples. The INCLUDE is available to the federation as far as I can tell, otherwise your other queries would not have worked., but it can't be moved inside the federation because an external endpoint might not know about blazegraph extensions, it might have worked if the source "with" statement was also inside the same federation, but I didn't test that. Infrastruktur (talk) 13:07, 22 April 2022 (UTC)[reply]
Disregard the very last part. Just tested a query from Apache Jena to the Wikidata endpoint, and from the commons sparql GUI to the Wikidata endpoint and both times it throws a syntax error if I try to use a complete named subquery inside a federation. It looks like this happens before the query even executes so I'm guessing that type of query language extensions aren't allowed in federations, regardless of if the target endpoint supports them. I could find no mention of this being disallowed in the W3C docs, so it comes as a surprise. Infrastruktur (talk) 18:00, 22 April 2022 (UTC)[reply]
@Infrastruktur: I was thinking about what you wrote earlier, and it seems to me the problem would be that the INCLUDE in the federated query would be trying to include a solution set that wasn't available on WDQS -- it would have been computed on Commons. To make it work, you'd have to replace the INCLUDE with a federation request back from WDQS to perform a subquery on Commons. But I think at the moment Commons doesn't accept incoming federation, not even from WDQS, so it may not be possible even to try. Jheald (talk) 20:21, 22 April 2022 (UTC)[reply]

list objects by ID edit

hello,

can I get a list of items starting with Q1? Might be useful for other entities or properties too. But it would be useful I think to see a list starting with Q1, that displays 100 or 500 items, then it could paginate, and you see the next block. can someone help create this? Does it already exist? thank you, Skakkle (talk) 19:32, 23 April 2022 (UTC)[reply]

Sounds like a fun problem seing as there aren't really any obvious ways of doing this. I got two suggestions. Suggestion one require either knowledge of scripting or using an online tool to generate the range of items you're interested in. If you need to retrieve extra data for each item this is the one you want to use. Suggestion two is something I made as a fun excercise. It generates a list of the entities itself. Unfortunately I am not aware of any way to use the URLs for triple matches, which means I can't remove redirects or deleted items. Also I think it's probably not very portable. Infrastruktur (talk) 21:40, 23 April 2022 (UTC)[reply]
SELECT DISTINCT ?s ?sLabel
WHERE {
  # Generate the desired range of items with a script or for instance with this tool:
  # https://textmechanic.com/text-tools/numeration-tools/generate-list-numbers/
  VALUES ?s { wd:Q1 wd:Q2 wd:Q3 wd:Q4 wd:Q5 wd:Q6 wd:Q7 wd:Q8 wd:Q9 } # Replace the contents of the curly brackets
  MINUS { ?s owl:sameAs []. } # No redirects
  FILTER EXISTS { ?s ?p []. } # Must exist or be non-empty
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
ORDER BY xsd:integer(STRAFTER(STR(?s), "http://www.wikidata.org/entity/Q"))
Try it!
SELECT ?s ?sLabel
WHERE {
  {
    SELECT ?a WHERE {
      VALUES ?i { 0 1 2 3 4 5 6 7 8 9 }
      {
        SELECT ?j ?k WHERE {
          VALUES ?j { 0 1 2 3 4 5 6 7 8 9 }
          {
            SELECT ?k WHERE {
              VALUES ?k { 0 1 2 3 4 5 6 7 8 9 }
            }
          }
        }
      }
      BIND((?k * 100 + ?j * 10 + ?i) AS ?a)
    } ORDER BY DESC(?a)
  }
  VALUES ?start { 10 } # Start at this QID
  BIND(URI(CONCAT("http://www.wikidata.org/entity/Q", STR(?start + ?a))) AS ?s)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
ORDER BY ?a
LIMIT 100 # List at most this many items
Try it!
Nice! Tweaked version of query 2, returning only items that actually exist: https://w.wiki/56Ce -- Jheald (talk) 10:02, 24 April 2022 (UTC)[reply]
Very interesting. I tried wrapping the triple match inside a federation. I also tried wrapping it inside a regular subquery and both worked. In the process I learned that subqueries accepts data from the outside. I am still puzzled why this workaround is needed in the first place, that would be nice to know. Infrastruktur (talk) 17:15, 24 April 2022 (UTC)[reply]
@Infrastruktur: My working hypothesis would be that Blazegraph may by default try to reorder the query to do the BIND last -- ie to reorder it after the ?s wikibase:statements [] ; but by doing so ?s would then not yet have been defined, so that line would then try to match everything.
I used the named subquery to force the order of execution, to make sure ?s was built first; but the same can also be achieved using a query hint to turn off the optimiser: https://w.wiki/56Gf -- Jheald (talk) 17:59, 24 April 2022 (UTC)[reply]
Note also that subqueries effectively only accept data from the outside if the variable is one of the one's selected for. Technically, the variable is out-of-scope in the subquery. But if the engine knows the variable is going to be selected, and then joined with an existing set, it makes the optimisation that it only needs to consider those values of the variable inside the subquery.
In general, expect surprises -- the question of whether data from the outside is in scope or not within any particular set of braces { ... } is one that I still get wrong on a regular basis, and sometimes can seem the most anti-intuitive bit of the entire language. As a rule, it isn't, so subclauses that rely on tests like FILTER(bound(?x)) can be endlessly surprising. Jheald (talk) 18:09, 24 April 2022 (UTC)[reply]

Two stats about humans edit

  1. what is the number of items that are individual persons (instance of=human?), that have no date of birth and no date of death, and have a sitelink to commons or a P373?
  2. a percentage, that is the number of individual person items that have a date of birth between 1902 and 1937 (85-120 years ago) and no date of death, to the number of individual person items that have a date of birth after 1902 and no date of death. (i.e. the percentage of older-than-85 persons among all the persons that might be alive). RZuo (talk) 17:24, 23 April 2022 (UTC)[reply]
@RZuo: We currently have just short of 10 million items for individual instances of human,
SELECT (COUNT(*) AS ?count) WHERE {
  ?item wdt:P31 wd:Q5
}
Try it!
-- currently giving me 9,716,118.
This is a bit too many to analyse exhaustively in one go. But luckily for the stats you are looking for there is a Blazegraph addition to SPARQL that lets one ask for a random sample of cases of a particular triple. So using that we can estimate some answers to your questions:
SELECT ?item ?itemLabel WHERE {
   SERVICE bd:sample {
     ?item wdt:P31 wd:Q5 .
     bd:serviceParam bd:sample.limit 100000 .
     bd:serviceParam bd:sample.sampleType "RANDOM" .
   }
   MINUS {?item wdt:P569 []} .
   MINUS {?item wdt:P570 []} .
   OPTIONAL {?item wdt:P373 ?commonscat} .
   OPTIONAL {
      ?commons schema:about ?item .
      ?commons schema:isPartOf <https://commons.wikimedia.org/> .
   }
   FILTER(bound(?commonscat)|| bound(?commons)) . 
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
I got 1684 and 1757 the two times I ran it; so it looks like a typical run may give around 1720 with a random sampling error of about 40 either way. So multiplying that up by a factor of 100, there may be around 170,000 items for which your first question is true. (That was Commons cat or English wikipedia)
Trying again, specifically for Commons I got 298 and 313; so about 305 with a sampling error either way of about 20. And maybe 30,000 such items across wikidata as a whole.
You could also replace the first line with SELECT (COUNT(*) AS ?count), if all you are interested in is just the count.
Turning to your second question, with
SELECT ((xsd:integer(1000* SUM(?sub85)/COUNT(?item))/10.0) AS ?pct) WHERE {
   SERVICE bd:sample {
     ?item wdt:P31 wd:Q5 .
     bd:serviceParam bd:sample.limit 100000 .
     bd:serviceParam bd:sample.sampleType "RANDOM" .
   }
   MINUS {?item wdt:P570 []} .
   ?item wdt:P569 ?dob .
   FILTER(year(?dob) > 1901) .
   BIND(IF((year(?dob) > 1936), 1.0, 0.0) AS ?sub85) .
}
Try it!
I get about 91.2% born after 1936.
This is likely to be an underestimate of the true proportion, as the older that people are, probably the more likely they are to have died, without the death having been recorded on wikidata. Jheald (talk) 20:03, 23 April 2022 (UTC)[reply]
Also of interest may be this plot, https://w.wiki/566z showing the distribution for year-of-birth after 1901, out of all the people we do not have a date of death for. (The spike on 2000 is possibly an artefact, as this may also be aggregating 2000s births with only decade/century/millenium precision. Jheald (talk) 20:14, 23 April 2022 (UTC)[reply]
Also of potential interest, this chart https://w.wiki/566w showing the proportion of people by year of birth for whom we do not have a date of death. Jheald (talk) 20:27, 23 April 2022 (UTC)[reply]

Filter instances of species/subspecies by user input string edit

I want to reliably get the correct species of animal based on user input.

The closest I was able to come was this query:

 SELECT DISTINCT ?item
   (GROUP_CONCAT(DISTINCT lcase(?common_name); separator=", ") as ?common_names)
   ?scientific_name
   (GROUP_CONCAT(DISTINCT lcase(?taxon_common); separator=", ") as ?taxon_common)
 WHERE {
   ?item wdt:P31 wd:Q16521 .
   ?item wdt:P171+ wd:Q7377 .
   ?item wdt:P225 ?scientific_name.
   ?item wdt:P1843 ?taxon_common.
   ?item wdt:P366 ?common.
   ?common rdfs:label ?common_name.
   # Only return common names in English
   FILTER(LANGMATCHES(LANG(?common_name), "en"))
   FILTER(LANGMATCHES(LANG(?taxon_common), "en"))
   # Filter by string value.
   FILTER(
     CONTAINS(lcase(?common_name), "cat") || CONTAINS(lcase(?taxon_common), "cat")
   )
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
   }
 GROUP BY ?item ?scientific_name ?taxon_common
 LIMIT 100

[1]

But as this example query shows, "cattle" came before "cat" and I'm not sure how I can optimize this search. Something like a filter that could compare the % of the returned string that matches the filter string might work but I'm at a loss for how to make this performant.

I guess the simplest and most efficient scoring system would be to compare the difference in string length after the initial substring match. The proper way to compare the similarity between two strings would be to calculate the Levenshtein distance between the two strings, but that would be overkill here. HTH. Infrastruktur (talk) 06:08, 25 April 2022 (UTC)[reply]
SELECT DISTINCT ?item
  (GROUP_CONCAT(DISTINCT lcase(?common_name); separator=", ") as ?common_names)
  (SAMPLE(?scientific_name) AS ?scientific_name)
  (GROUP_CONCAT(DISTINCT lcase(?taxon_common); separator=", ") as ?taxon_common)
  (MIN(?score) AS ?matchscore)
WHERE {
  BIND ("cat" AS ?searchstr) # Fantastic cats and where to find them
  ?item wdt:P31 wd:Q16521;
    wdt:P171+ wd:Q7377;
    wdt:P225 ?scientific_name.
  OPTIONAL {
    ?item wdt:P1843 ?taxon_common.
    FILTER(LANGMATCHES(LANG(?taxon_common), "en"))
  }
  OPTIONAL {
    ?item wdt:P366 ?common.
    ?common rdfs:label ?common_name.
    FILTER(LANGMATCHES(LANG(?common_name), "en"))
  }
  FILTER(BOUND(?common_name)|| BOUND(?taxon_common))
  FILTER( CONTAINS(lcase(?common_name), ?searchstr) || CONTAINS(lcase(?taxon_common), ?searchstr) )
  BIND((STRLEN(?common_name) - STRLEN(?searchstr)) AS ?ldiff_a)
  BIND((STRLEN(?taxon_common) - STRLEN(?searchstr)) AS ?ldiff_b)
  BIND(
    IF(BOUND(?ldiff_a) && !BOUND(?ldiff_b), ?ldiff_a,
      IF(!BOUND(?ldiff_a) && BOUND(?ldiff_b), ?ldiff_b,
        IF(!BOUND(?ldiff_a) && !BOUND(?ldiff_b), 9999,
          IF(?ldiff_a < ?ldiff_b, ?ldiff_a, ?ldiff_b))))
  AS ?score)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?item
ORDER BY ?matchscore
Try it!
@Infrastruktur: You can also use regular expressions with word boundaries, but that will also match species like Homotherium (Q132606) and fishing cat (Q190674) etc. But why not simply try for an exact match with the search string? That would also be much faster. --Dipsacus fullonum (talk) 08:50, 25 April 2022 (UTC)[reply]
True. On a related note, do you know if the CirrusSearch extension for Wikibase supports fuzzy searches? Infrastruktur (talk) 09:38, 25 April 2022 (UTC)[reply]
Interesting approach. More reliable than what I originally wrote so thank you!

Both queries take quite a long time to compute so I would love to find a more efficient strategy. Does the query builder support anything like Levenshtein distance or a "LIKE" operator?

Maybe then I could start with a small query reserved for finding the string match & pipe that entity ID to a second query that returns the other properties.

The reason I can't rely on an exact match is because there is no reliable property for all species that could be matched against the search string. Sometimes the "common name" properties are tied to the subspecies or genus rather than the species entity for example.

That was why I was initially restricting the query to results that had a P366 property. If I tried to query based on "?item wdt:P31 wd:Q7432" (species), the matches become quite inaccurate. 83archie (talk) 14:52, 25 April 2022 (UTC)[reply]
@83archie: All taxa should have instance of (P31) taxon (Q16521) independent of taxon rank. Use taxon rank (P105) species (Q7432) to select species only. You could search for exact matches to either taxon common name (P1843) or the label of has use (P366) or the taxon item's label or the taxon item's aliases first. That would be a fast search. Then is nothing is found, you can make a slower search using either CONTAINS or REGEX with word boundaries added. --Dipsacus fullonum (talk) 16:06, 25 April 2022 (UTC)[reply]
For the record, apparently the search engine for the wiki does support fuzzy searches, but quick tests on Wikidata suggests that not only is this not supported for entities, but it is not supported for other namespaces either. There is a metric shit-ton of queries that fail because of time-out, but if people stopped relying on Wikidata to do the heavy lifting and instead do the heavy lifting on their own computers, 90% of these problems will simply go away. Queries would then look like 1) grab data from wikidata and 2) do the processing on your own computer, such as computing Levenshtein distances. I'm afforded the luxury of ignoring the 60 second timeout of Wikidata because I have Apache Jena installed on my box. Jena also allows users to supply custom javascript functions to do things like computing Levenshtein distances. I wonder how many people realize this is an option, and a damn good one to boot. #shamelessjenapromo Infrastruktur (talk) 17:00, 25 April 2022 (UTC)[reply]

YouTube videos with music video in the name edit

Would this be possibly?--Trade (talk) 23:28, 24 April 2022 (UTC)[reply]

@Trade: Do you mean name (P2561) as a qualifier to YouTube video ID (P1651) or as an independent statement? --Dipsacus fullonum (talk) 08:07, 25 April 2022 (UTC)[reply]
@Dipsacus fullonum: The first--Trade (talk) 14:19, 25 April 2022 (UTC)[reply]
@Trade: Good, that makes the query easy and fast because name (P2561) doesn't have many uses as qualifier. There is only 1 use as qulifier for YouTube video ID (P1651), and it doesn't meet the criteria, so the query has no results:
SELECT ?item ?itemLabel ?YouTubeVideoID ?name
WHERE
{
  ?statement pq:P2561 ?name . # Start here because P2561 is only used as qualifier ca. 2500 times.
  ?item p:P1651 ?statement .
  FILTER (CONTAINS (?name, "music video") || CONTAINS (?name, "musicvideo") || CONTAINS (?name, "official"))
  MINUS { ?item wdt:P31 wd:Q193977 }
  ?statement ps:P1651 ?YouTubeVideoID .
  SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
Try it!
--Dipsacus fullonum (talk) 20:52, 25 April 2022 (UTC)[reply]

Is it possible to create a hierarchical tree diagram for two known entities? edit

Looking at the template query for "Parent taxons of Blue Whale" I was curious about how it could be used to determine the common path between two species.

 #Parent taxons of Blue Whale
 #defaultView:Graph
 SELECT ?item ?itemLabel ?linkTo ?pic
 WHERE
 {
 wd:Q42196 wdt:P171* ?item . # <---- species ID goes here
 OPTIONAL { ?item wdt:P171 ?linkTo } # child of
 OPTIONAL { ?item wdt:P18 ?pic }
 SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
 }

Here the species "Balaenoptera musculus" is given, and it recursively climbs the list of P171 attributes. https://w.wiki/veR

But let's say I have a query for a cat, "Felis catus" wd:Q20980826

https://w.wiki/56SR

Could I join these two queries in order to return a single path that terminates at their first common parent?

I figure this requires:

- two queries made in series, then joined by UNION

- some calculation that can pare down each query to a single shortest path

If this is possible I would love to see an example of how to do so given the commands that are available on wikidata's endpoint.

edit:

I found a property called "preferred rank" Q71533031 which is used to highlight the preferred parent taxon green on Wikidata web pages. I'm not sure how to incorporate it into the query but this might be enough to "prune" the tree down to a single path.

Yes, that is probably possible. I would find the taxon hierarchies with the GAS service, join them, then limit the result to lowest sum of the depths.
PS. preferred rank (Q71533031) isn't a property. Read about ranking in Help:Ranking and how to use it in queries in mw:Wikibase/Indexing/RDF Dump Format. --Dipsacus fullonum (talk) 20:33, 25 April 2022 (UTC)[reply]
I've managed to get a list of all common ancestors, but I can't choose the latest... Probably this could be fixed for showing the graph:
#Common parent taxons of 2 species
# defaultView:Graph
SELECT distinct ?ECA ?ECALabel #?item1 ?item1Label ?item2 ?item2Label ?pic1 ?linkTo1 ?pic2 ?linkTo2
WITH 
  {
    SELECT distinct ?item1 WHERE {wd:Q42196 wdt:P171* ?item1.}
  } AS %sub1
WITH 
  {
     SELECT distinct ?item2 WHERE { wd:Q20980826 wdt:P171* ?item2.}
  } AS %sub2
WHERE
{
  INCLUDE %sub1.
  INCLUDE %sub2.
  ?item1 wdt:P171* ?ECA.
#  OPTIONAL { ?item1 wdt:P171 ?linkTo1 }
#  OPTIONAL { ?item1 wdt:P18 ?pic1 }
 
  ?item2 wdt:P171* ?ECA.
#  OPTIONAL { ?item2 wdt:P171 ?linkTo2 }
#  OPTIONAL { ?item2 wdt:P18 ?pic2 }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
}
Try it!

--Infovarius (talk) 20:48, 26 April 2022 (UTC)[reply]

@Infovarius: My proposed solution as outlined above solves the task in 1 second:
#Common parent taxons of 2 species
SELECT ?item ?itemLabel ?depth1 ?depth2 ?taxon_rankLabel
{
  SERVICE gas:service
  {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" .
    gas:program gas:in wd:Q42196 .
    gas:program gas:linkType wdt:P171 .
    gas:program gas:out ?item .
    gas:program gas:out1 ?depth1 .
  }
  SERVICE gas:service
  {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" .
    gas:program gas:in wd:Q20980826 .
    gas:program gas:linkType wdt:P171 .
    gas:program gas:out ?item .
    gas:program gas:out1 ?depth2 .
  }
  OPTIONAL { ?item wdt:P105 ?taxon_rank . }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
}
ORDER BY (?depth1 + ?depth2)
LIMIT 1 # Remove limit to see all higher levels
Try it!
--Dipsacus fullonum (talk) 08:22, 27 April 2022 (UTC)[reply]

Languages by number of native speakers at point in time edit

First time here, so please feel free to clue me in if I'm not doing it right.

I'm looking for a query that will give me the following output for P31 = language or P31 = dialect :

QNumber | P220 | LanguageLabel | P1098 | Qualifier: P585

Q1860 | eng | English | 379,007,140 | 2019

Q188 | deu | German | 76,540,740 | 2019


etc. where number of speakers

1) Has preferred rank, and...

2) has the qualifier applies to part = first language


Does that make sense, is it doable? Thanks in advance, Moebeus (talk) 17:18, 27 April 2022 (UTC)[reply]

@Moebeus: That is very doable if understood your request correctly:
SELECT DISTINCT (?item AS ?QNumber) ?P220 ?LanguageLabel ?P1098 ?Qualifier_P585
WHERE
{
  VALUES ?instance_of { wd:Q34770 wd:Q33384 } # Language or dialect
  ?item wdt:P31 ?instance_of .
  OPTIONAL { ?item wdt:P220 ?P220 . } # ISO 639 3 code
  ?item p:P1098 ?no_of_speakers_statement .
  ?no_of_speakers_statement wikibase:rank wikibase:PreferredRank . # Has preferred rank
  ?no_of_speakers_statement pq:P518 wd:Q36870 . # Has the qualifier applies to part = first language
  ?no_of_speakers_statement ps:P1098 ?P1098 . # No of speakers
  OPTIONAL { ?no_of_speakers_statement pq:P585 ?Qualifier_P585 . } # Optional timestamp
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
    ?item rdfs:label ?LanguageLabel .
  }
}
Try it!
--Dipsacus fullonum (talk) 18:33, 27 April 2022 (UTC)[reply]
Looking great, thank you so much for the quick response! Moebeus (talk) 19:52, 27 April 2022 (UTC)[reply]
I know you said preferred rank, but did you mean best rank? Preferred excludes results with normal rank. Infrastruktur (talk) 18:42, 27 April 2022 (UTC)[reply]
Yes, in this case I only wanted the statements that have actively been given preferred rank (many of them have numbers for several different points in time). Moebeus (talk) 19:55, 27 April 2022 (UTC)[reply]

State-affiliated Twitter accounts edit

Request for: All items with a X username (P2002) statement with a qualifier of has characteristic (P1552) state-affiliated Twitter account (Q111678803).

Pretty simple, I'm just not sure how to do qualifiers. AntisocialRyan (Talk) 19:07, 27 April 2022 (UTC)[reply]

@AntisocialRyan: Here's one way.

SELECT ?item ?itemLabel ?username
WHERE 
{
  ?item p:P2002 [ps:P2002 ?username; pq:P1552 wd:Q111678803]. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
}
Try it!
and looking at only BestRanked values (i.e. wdt:P2002 equivalent) :
SELECT ?item ?itemLabel ?username
WHERE 
{
  ?item p:P2002 [ps:P2002 ?username; pq:P1552 wd:Q111678803; a wikibase:BestRank]. 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
}
Try it!
--Tagishsimon (talk) 19:26, 27 April 2022 (UTC)[reply]
This is perfect, thank you! AntisocialRyan (Talk) 19:50, 27 April 2022 (UTC)[reply]

WikiProject Climate Change edit

Hi, I am new to sparql and I need your help! I would like to get a list of all articles which are part of Wikipedia:WikiProject Climate Change - Q15305047 and find out if those articles exists in four other languages ('no','sv','da', 'nn'). Anda Bordieanu (talk) 06:05, 28 April 2022 (UTC)[reply]

@Anda Bordieanu: For those marked in WD as being on the focus list of the project:
SELECT ?item ?itemLabel ?no ?sv ?da ?nn WHERE 
{
  ?item wdt:P5008 wd:Q15305047.
  
  OPTIONAL { ?article_no schema:about ?item ;
  schema:isPartOf <https://no.wikipedia.org/> .}
  OPTIONAL { ?article_sv schema:about ?item ;
  schema:isPartOf <https://sv.wikipedia.org/> . }
  OPTIONAL { ?article_da schema:about ?item ;
  schema:isPartOf <https://da.wikipedia.org/> . }
  OPTIONAL { ?article_nn schema:about ?item ;
  schema:isPartOf <https://nn.wikipedia.org/> . }  
  BIND(IF(BOUND(?article_no),"yes","no") as ?no)
  BIND(IF(BOUND(?article_da),"yes","no") as ?da)
  BIND(IF(BOUND(?article_sv),"yes","no") as ?sv) 
  BIND(IF(BOUND(?article_nn),"yes","no") as ?nn) 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Try it!
or perhaps
SELECT ?item ?itemLabel ?sitelink_no ?sitelink_sv ?sitelink_da ?sitelink_nn WHERE 
{
  ?item wdt:P5008 wd:Q15305047.
  OPTIONAL { ?article_no schema:about ?item ;
  schema:isPartOf <https://no.wikipedia.org/> ; 
  schema:name ?sitelink_no .}
  OPTIONAL { ?article_sv schema:about ?item ;
  schema:isPartOf <https://sv.wikipedia.org/> ;
  schema:name ?sitelink_sv . }
  OPTIONAL { ?article_da schema:about ?item ;
  schema:isPartOf <https://da.wikipedia.org/> ;
  schema:name ?sitelink_da . }
  OPTIONAL { ?article_nn schema:about ?item ;
  schema:isPartOf <https://nn.wikipedia.org/> ;
  schema:name ?sitelink_nn . }  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Try it!
--Tagishsimon (talk) 08:05, 28 April 2022 (UTC)[reply]
Hi again, thank you so so much! This is amazing! I have one question, I can see that the climate change project has 3741 articles, but the query shows 1600. Do you know what could be the cause? Is it because the rest of the articles are missing an identifier? Anda Bordieanu (talk) 12:29, 28 April 2022 (UTC)[reply]
@Anda Bordieanu: Yes. The project is (I guess) counting articles on WP having a certain template on their talk page. The report is looking at items on WD with a on focus list of Wikimedia project (P5008) value of WikiProject Climate change (Q15305047) and, as is normally the case, there's a large gap between the two; many items for CC articles do not have a P5008 value. Should be possible to identify which items would benefit from the addition of that value, perhaps with https://petscan.wmflabs.org/ but right now I've run out of time to dig further. --Tagishsimon (talk) 14:10, 28 April 2022 (UTC)[reply]
(After edit conflict) Hi Anda Bordieanu. The queries made by Tagishsimon finds Wikidata items with the Wikidata property on focus list of Wikimedia project (P5008) and the corresponding value WikiProject Climate change (Q15305047). This doesn't necessarily correspond to the articles on the English Wikipedia those talk pages have a {{WikiProject Climate change}} template. There will only be an up-to-date correspondance between Wikidata and the English Wikipedia if someone is actively maintaining it.
I can make a SPARQL query which will make an up-to-date list of the project articles in the English Wikipedia by calling the Wikipedia's MediaWiki API, but the list would only have the English article titles and not the corresponding Wikidata item titles, so you would not be able to use it get further information about e.g. linked pages in other Wikipedias. --Dipsacus fullonum (talk) 14:28, 28 April 2022 (UTC)[reply]
Hi, thank you for the explanation! I just needed to understand the difference in the number of articles. I am currently doing research about this and I just wanted to fully comprehend how those properties and items are used so I can further explain it. I need to see the articles in other languages so there is no need to link it if there would not be a way to further see if they are in other languages. Thank you both !! You are awesome!Anda Bordieanu (talk) 14:40, 28 April 2022 (UTC)[reply]
I'd appreciate seeing that query, Dipsacus fullonum, if you'd be so kind. --Tagishsimon (talk) 18:26, 28 April 2022 (UTC)[reply]
@Tagishsimon: No problem. Here it is:
SELECT ?title_enwiki (URI(CONCAT("https://en.wikipedia.org/wiki/", ?title_enwiki)) AS ?link) ?namespace
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    
    # Generator to get all transclusions of the Template "WikiProject Climate change"
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "alltransclusions" .
    bd:serviceParam mwapi:gatfrom "WikiProject Climate change" .
    bd:serviceParam mwapi:gatto "WikiProject Climate change" .
    bd:serviceParam mwapi:gatlimit "max" .

    # Use the info module to get the name of associated page
    # (the template is on talk pages, so this gives the subject pages)
    bd:serviceParam mwapi:prop "info" .
    bd:serviceParam mwapi:inprop "associatedpage" .

    # Output
    # (There is no point in trying to get associated items, because that would for the found talk pages) 
    ?title_enwiki wikibase:apiOutput "@associatedpage" .
    ?namespace wikibase:apiOutput "@ns" .
  }
}
Try it!
--Dipsacus fullonum (talk) 08:23, 29 April 2022 (UTC)[reply]
@Anda Bordieanu, Tagishsimon: I missed that you don't have to get the item titles from the English Wikipedia, but can find them with WDQS from the Wikipedia titles, and then combine Tagishsimon's and my query to give what Anda Bordieanu wants:
SELECT ?title_en ?article_en ?no ?da ?sv ?nn
WITH
{
  SELECT ?title_en ?article_en
  WHERE
  {
    SERVICE wikibase:mwapi
    {
      bd:serviceParam wikibase:endpoint "en.wikipedia.org" .

      # Generator to get all transclusions of the Template "WikiProject Climate change"
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam mwapi:generator "alltransclusions" .
      bd:serviceParam mwapi:gatfrom "WikiProject Climate change" .
      bd:serviceParam mwapi:gatto "WikiProject Climate change" .
      bd:serviceParam mwapi:gatlimit "max" .

      # Use the info module to get the name of associated page
      # (the template is on talk pages, so this gives the subject pages)
      bd:serviceParam mwapi:prop "info" .
      bd:serviceParam mwapi:inprop "associatedpage" .

      # Output
      # (There is no point in trying to get associated items, because that would be for the found talk pages) 
      ?title wikibase:apiOutput "@associatedpage" .
    }
    BIND (STRLANG(?title, "en") AS ?title_en)
    OPTIONAL { ?article_en schema:name ?title_en . }
    FILTER BOUND(?article_en)
  }
}
AS %get_titles
WHERE
{
  INCLUDE %get_titles
  ?article_en schema:isPartOf <https://en.wikipedia.org/> .
  ?article_en schema:about ?item .

  OPTIONAL { ?article_no schema:about ?item ;
  schema:isPartOf <https://no.wikipedia.org/> .}
  OPTIONAL { ?article_sv schema:about ?item ;
  schema:isPartOf <https://sv.wikipedia.org/> . }
  OPTIONAL { ?article_da schema:about ?item ;
  schema:isPartOf <https://da.wikipedia.org/> . }
  OPTIONAL { ?article_nn schema:about ?item ;
  schema:isPartOf <https://nn.wikipedia.org/> . }  
  BIND(IF(BOUND(?article_no),"yes","no") as ?no)
  BIND(IF(BOUND(?article_da),"yes","no") as ?da)
  BIND(IF(BOUND(?article_sv),"yes","no") as ?sv) 
  BIND(IF(BOUND(?article_nn),"yes","no") as ?nn) 
}
Try it!
NB. I had to use the named subquery to get ?article_en get bound before trying to bind ?item. Otherwise the query would timeout. --Dipsacus fullonum (talk) 09:21, 29 April 2022 (UTC)[reply]
Thank you so much, @Dipsacus fullonum, Tagishsimon: ! This is exactly what I needed! You have no idea how much this will help me! Thank youuuu! 188.177.32.70 10:15, 29 April 2022 (UTC)[reply]
It was a useful question, Anda Bordieanu, eliciting a much better understanding for the rest of us a particular part of the report toolset, so thank you. --Tagishsimon (talk) 10:28, 29 April 2022 (UTC)[reply]
Thank you, Dipsacus fullonum, much appreciated. Your MWAPI code is indistinguishable from magic; which is to say, I'm still quite far away from understanding the wide range of options MWAPI provides, far from being able to summon and marshall the syntax, and far from understanding the MWAPI documentation. I'm a bit preoccupaied with data curation right now, but nagging in the back of my mind is that WDQS users could do with a) a page on common SPARQL design patterns and b) a WDQS MWAPI cookbook page of example queries & explanations. I might come back and talk to you some more about these, later, if & when I draw together some more structured thoughts. --Tagishsimon (talk)
@Tagishsimon: To me, it feels more like a craft than magic. I agree that a more user friendly step by step guide would make it easier to learn the craft. Feel free to talk more about your thoughts later. --Dipsacus fullonum (talk) 16:30, 29 April 2022 (UTC)[reply]
The existing documentation could always be better. For one there should be a link to the API sandbox for the english wikipedia. There is a lot that can be learned simply by playing around with it. Infrastruktur (talk) 18:53, 29 April 2022 (UTC)[reply]