Open main menu

Wikidata:Request a query/Archive/2016/10

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Sitelink count

Recently phab:T129046 was resolved, so I thought one could write queries like

select * where{
  ?item wdt:P31 wd:Q79007 . 
  ?item wikibase:sitelinks ?cnt .
}

Try it!

but the result is definitly wrong. How to do it right? --Pasleim (talk) 21:24, 4 October 2016 (UTC)

For two items I checked sitelink count was OK. --Edgars2007 (talk) 21:27, 4 October 2016 (UTC)
The query results look correct to me (at a glance). Slight inconsistencies might occur, see T145712 for that. Cheers, Hoo man (talk) 21:44, 4 October 2016 (UTC)
The problem are not wrong numbers but many missing results.
select * where{
  ?item wdt:P31 wd:Q79007 . 
}
Try it! returns 261,837 items wheareas the first query only returns 2992 items. --Pasleim (talk) 21:59, 4 October 2016 (UTC)
It's only added when an item is edited.
--- Jura 05:28, 5 October 2016 (UTC)

List all Pokémon moves

I think this is a very simple query: How I can list all Pokémon moves?

By now moves are defined with instance of (P31) and a subclass of Pokémon move (Q15141195). For example:

and fire-type Pokémon move (Q26001209) is a subclass of Pokémon move (Q15141195)). However, I'd like to have a list of all moves, not just fire type moves (i.e. even normal, ice, steel, ... ones).

Then I'd like that the output is ordered alphabetically by English label. Could someone help me? Thank you in advance, --★ → Airon 90 15:41, 6 October 2016 (UTC)

@Airon90:
SELECT DISTINCT ?move ?moveLabel WHERE {
  ?move wdt:P31/wdt:P279* wd:Q15141195 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
} ORDER BY ?moveLabel
Try it!
Matěj Suchánek (talk) 16:09, 6 October 2016 (UTC)
@Airon90: I modified the class of move so that it's not a subclass of fire pokemon as of course a move is not a Pokémen. The point is that the main agent of the move is the pokémon itself, and those specific moves are performed by fire pokemons. author  TomT0m / talk page 18:52, 6 October 2016 (UTC)
@Matěj Suchánek: Thank you very much!
@TomT0m: I saw your edit on fire-type Pokémon move (Q26001209) and I did the same with every other similar item. Please check my other edit on fire-type Pokémon move (Q26001209). --★ → Airon 90 20:02, 6 October 2016 (UTC)

TV series in a language and in a interval of time

Hi, I'm bored and I'd like to watch a TV series (television series (Q5398426)) in a certain language (e.g. Italian (Q652)) and produced in a certain interval of time (i.e. 2001 - 2009). Could someone help me listing all those kind of TV series? Thank you very much! --★ → Airon 90 06:50, 7 October 2016 (UTC)

You can start with this:
SELECT ?item ?itemLabel ?incept {
  ?item wdt:P31/wdt:P279* wd:Q5398426;
        (wdt:P364|wdt:P407|wdt:P2439) wd:Q652;
        (wdt:P571|wdt:P580|wdt:P585) ?incept .
  FILTER ( 2000 < YEAR( ?incept ) && YEAR( ?incept ) < 2010 ) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
} ORDER BY ?incept
Try it!
Matěj Suchánek (talk) 12:18, 7 October 2016 (UTC)
Arf, {{SPARQL}} do not support pipes ('|') in its arguments :) My solution to this problem was to make a table.concat(frame.args,"|") on Module:Show Path Items. author  TomT0m / talk page 12:28, 7 October 2016 (UTC)
It works now. author  TomT0m / talk page 13:22, 7 October 2016 (UTC)

Archaic Greek lyric fragments

I tried to create this query:

SELECT ?fragment ?authorLabel ?fragmentLabel ?genreLabel ?dialectLabel ?title
WHERE {
  ?fragment wdt:P136/wdt:P279* wd:Q1885775 .
  ?fragment wdt:P136 wd:Q3906966 .
  ?fragment wdt:P136 wd:Q3164714 .
  ?fragment wdt:P50 ?author .
  ?fragment wdt:P136 ?genre .
  ?fragment wdt:P364 ?dialect .
  ?fragment wdt:P1476 ?title .
  	SERVICE wikibase:label { bd:serviceParam wikibase:language "en,grc" }    
} ORDER BY ?authorLabel

Try it!

My problem is: the three lines which describe what a ?fragment is should be alternative (either ... or ... or ...), not and ... and ... and ... as in this formulation. Could you help me? --Epìdosis 17:23, 11 October 2016 (UTC)

{ } UNION { } UNION { }
seems generally faster than
VALUES ?fragmenttype { wd:Q3906966 wd:Q3164714 }
?fragment wdt:P136 ?fragmenttype

--- Jura 17:32, 11 October 2016 (UTC)
@Jura1: Could you please rewrite the entire query with your first option? I tried but I evidently made a mistake. --Epìdosis 17:41, 11 October 2016 (UTC)
SELECT DISTINCT ?fragment ?authorLabel ?fragmentLabel ?genreLabel ?dialectLabel ?title
WHERE 
{
	{ ?fragment wdt:P136/wdt:P279* wd:Q1885775 . }
	UNION { ?fragment wdt:P136 wd:Q3906966 . }
	UNION { ?fragment wdt:P136 wd:Q3164714 . }

  	OPTIONAL { ?fragment wdt:P50 ?author .}
	OPTIONAL { ?fragment wdt:P136 ?genre .}
	OPTIONAL { ?fragment wdt:P364 ?dialect .}
	OPTIONAL { ?fragment wdt:P1476 ?title .}

  	SERVICE wikibase:label { bd:serviceParam wikibase:language "en,grc" }    
} 
ORDER BY ?authorLabel

Try it! np, I also added "optional" to the additional fields.
--- Jura 17:48, 11 October 2016 (UTC)

Kosovo

I'm speaking about Wikidata and Wikipedia in Pristina, Kosovo, next week, at "Software Freedom Kosova 2016: SFK16". I would like to demonstrate one or two local queries like:

  • Historical buildings (monuments) in Pristina
  • Public art in Pristina
  • Mosques in Pristina
  • Museums in Pristina
  • Railway stations in Kosovo
  • Parks in Kosovo
  • People born in Kosovo, but educated at a university in (say) Germany

or similar, mapped where possible, but I am not sure which will have many results - those with many will make Wikidata look good, while those with only a few will be a good motivation for Kosovan contributors!. Can someone write me a couple, please? If anyone is inclined to do several, free to drop them on User:Pigsonthewing/Kosovo, rather than flooding this page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:52, 11 October 2016 (UTC)

I will start with the ones you asked for. Feel free to ask me elsewhere (with a {{Ping}}) if you have more ideas. Matěj Suchánek (talk) 14:16, 12 October 2016 (UTC)
@Matěj Suchánek: That's really helpful, thank you. I added extra data to two items for museums in Pristina, including P31 & coordinates, but the latter do not show on the"museums" query. Also, the railway station query keeps timing out for me. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:27, 13 October 2016 (UTC)
Fixed accordingly. Matěj Suchánek (talk) 15:36, 13 October 2016 (UTC)

Works by Canova

Why this query doesn't work? Thank you, --Epìdosis 09:12, 13 October 2016 (UTC)

SELECT
  ?work ?workLabel ?instanceLabel (CONCAT(str(YEAR(?date))) as ?year) ?materialLabel ?image
  (GROUP_CONCAT(DISTINCT ?instanceLabel; separator=", ") as ?i)
WHERE {
  ?work wdt:P170 wd:Q5547 .
  ?work wdt:P31 ?instance .
  ?work wdt:P571 ?date .
  ?work wdt:P186 ?material .
  ?work wdt:P18 ?image .
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" }
}

Try it!

Because you haven't grouped results by some variable (but you are using GROUP_CONCAT). Easy fix is adding group by ?work ?workLabel ?instanceLabel ?date ?materialLabel ?image at the end of query - after closing bracket. --Edgars2007 (talk) 09:18, 13 October 2016 (UTC)
Would it be possible to group in the same way all the multiple values in all the columns I visualize? --Epìdosis 10:04, 13 October 2016 (UTC)
Yes:
SELECT
  ?work ?workLabel (GROUP_CONCAT(DISTINCT ?instanceLabel; separator=", ") AS ?inst)
  (GROUP_CONCAT(DISTINCT STR(YEAR(?date)); separator=", ") AS ?date)
  (GROUP_CONCAT(DISTINCT ?materialLabel; separator=", ") AS ?mater) (SAMPLE(?image) AS ?image)
WHERE {
  ?work wdt:P170 wd:Q5547;
        wdt:P31 ?instance;
        wdt:P571 ?date;
        wdt:P186 ?material;
        wdt:P18 ?image .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,fr" .
    ?work rdfs:label ?workLabel .
    ?instance rdfs:label ?instanceLabel .
    ?material rdfs:label ?materialLabel .
  } .
} GROUP BY ?work ?workLabel
Try it!
Matěj Suchánek (talk) 11:18, 13 October 2016 (UTC)

In some cases (e.g. The Three Graces (Q19347884)) the date is indicated as a period: qualifiers state the start year and the end year. Is it an incorrect way of indicating start and end time? Otherwise, how would you build a query in order to visualize at the same time both the artworks with a precise date and those with a date with qualifiers stating start and end time? --Epìdosis 15:50, 13 October 2016 (UTC)

I suppose earliest date (P1319) and latest date (P1326) are properties dedicated for this (also having even any main value is redundant but it makes sense in this case). The problem is that inception (P571) represents a point in time whereas this pair represents a range/interval, so I am afraid these two things cannot be combined into one field easily. Matěj Suchánek (talk) 18:32, 13 October 2016 (UTC)
I think that earliest date (P1319) and latest date (P1326) are meant for indicating a period of time in which an event can be collocated instead of a process (for a process I think start time (P580) and end time (P582) should be used. --Epìdosis 19:22, 13 October 2016 (UTC)
Oh, so the qualifiers don't represent the uncertainity but the fact that Canova was making the sculptures during some time? Although I don't want to make a decision on how this should be done correctly, it seems to me a bit weird that a point in time (as the description of inception (P571) states) is qualified as a process. Matěj Suchánek (talk) 19:49, 13 October 2016 (UTC)

Histogram

Is it possible to make a statistics of films by several parameters? For example, to get a histogram like this, or to get a distribution of films by countries? --Infovarius (talk) 14:18, 17 October 2016 (UTC)

The data for the number of films by year you can get with
SELECT ?year (count(?item) as ?cnt)  WHERE{
  ?item wdt:P31/wdt:P279* wd:Q11424 .
  ?item wdt:P577 ?publicationdate .
  BIND(str(year(?publicationdate)) as ?year)
  FILTER(bound(?year))
} GROUP BY ?year ORDER BY ?year

Try it!

To visualize this data you need to rely on external tools, e.g. http://demo.seco.tkk.fi/visu/. This could then look like this. --Pasleim (talk) 15:56, 17 October 2016 (UTC)
Looks nice :) --Edgars2007 (talk) 16:01, 17 October 2016 (UTC)
Thanks! Very useful. --Infovarius (talk) 12:56, 18 October 2016 (UTC)
Actually one can also produce the figure with the graph extension: (User:Pasleim/movies by year)
 
--Pasleim (talk) 15:55, 18 October 2016 (UTC)
Nice! Did you type this code manually, or do we have a generator for these graphs as well? —MisterSynergy (talk) 16:25, 18 October 2016 (UTC)
There is no generator but on mw:Extension:Graph/Demo you find many examples which you can use as templates. The graph extension is based on vega so you can also copy code from their examples if you want to create your own figures. --Pasleim (talk) 21:18, 18 October 2016 (UTC)
I´m trying to learn to code for graph, but it is really difficult to understand. I have seen that it is easier to write Vega lite code and let the Vega Editor transform that into Vega-code (and insert that Code into Wikidata). Introduction for Vega lite is here. --Molarus 16:28, 19 October 2016 (UTC)
Nice! Sandbox to play with graphs is at mw:Special:GraphSandbox.--Wesalius (talk) 17:25, 19 October 2016 (UTC)
With Vega-lite code it is only those few lines of code at User:Molarus/graph#Example_3. I have translated that code into Vega code and have changed the "values"-part for the "wikidatasparql" part from Pasleim and got a similar graph, but not as nice. One could add another dimension, for example, "country of origin" or "film genres" and got a more complex graph. --Molarus 01:43, 20 October 2016 (UTC)
What a peak on 2016! Why so much more than 2014-2015?.. --Infovarius (talk) 12:43, 19 October 2016 (UTC)
SELECT DISTINCT ?item ?itemLabel WHERE{
  ?item wdt:P31/wdt:P279* wd:Q11424 .
    ?item wdt:P577 ?_publication_date.
  FILTER (?_publication_date >= "2016-00-00T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

I´m no SPARQL-Expert, this is a query to print the 7051 films for 2016. Maybe the names of the films help with answering your question. --Molarus 01:43, 20 October 2016 (UTC)

The peak in 2016 is caused by User:Goo1Bot which created many items for films without WP articles [1]. --Pasleim (talk) 07:56, 20 October 2016 (UTC)
As query shows, the distribution over countries in 2016 films is usual:
SELECT ?countryLabel (count(?item) as ?cnt)  WHERE{
  ?item wdt:P31/wdt:P279* wd:Q11424 .
  ?item wdt:P577 ?publicationdate .
  ?item wdt:P495 ?country .
  FILTER (?publicationdate >= "2016-00-00T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ru,en". }
} GROUP BY ?countryLabel

Try it!

--Infovarius (talk) 15:07, 20 October 2016 (UTC)

All items that have P1440 with qualifier p585 set to "+2014-04-01T00:00:00Z/10" and S1440 set to "P1140"

I did not manage to construct such a query, I tried something like this:

select ?item WHERE {?item wdt:P1440 ?value .
                    ?item pq:P585 "+2014-04-01T00:00:00Z/10" .}

Try it!

I do not know how to get to the source though. Thank you for your help.--Wesalius (talk) 10:53, 18 October 2016 (UTC)

Well, the constraint "+2014-04-01T00:00:00Z/10" is not needed (it also occured in April 2013). Just look for all the Items with P1440 which have in the qualifier the statement "P1440". For reference, Q4391159 has to be found by the query. Steak (talk) 13:20, 18 October 2016 (UTC)
I assume your request is about Elo rating (P1087) and you mean "P1440" instead of "P1140". You get all items with a Elo rating (P1087) claim with source FIDE ID (P1440)="P1440" by:
SELECT DISTINCT ?item WHERE {
  ?item p:P1087 ?statement .
  ?statement prov:wasDerivedFrom ?source .
  ?source pr:P1440 'P1440' .
}

Try it! --Pasleim (talk) 14:37, 18 October 2016 (UTC)

No, I think this is something he wants. But this one doesn't work.
SELECT DISTINCT ?item WHERE {
  ?item p:P1087 ?statement .
  ?statement prov:wasDerivedFrom ?source .
  ?source pr:P1440 ?fide .
  ?source pr:P585 ?time .
  FILTER ( ?time = "2015-07-04T00:00:00Z"^^xsd:dateTime ) .
}
Try it! --Edgars2007 (talk) 16:03, 18 October 2016 (UTC)

I think the query produced by Pasleim is what I wanted. Thank you all for the input. --Wesalius (talk) 04:46, 19 October 2016 (UTC)

Count of occupation for women with no en.wikipedia article

Would anyone be willing to put together a report providing a count of occupations, for women who have no article on en.wikipedia, please. (i.e. a list of occupations and for each a count of the number of instances amongst the class of women with no en.wiki links). thanks --Tagishsimon (talk) 16:49, 20 October 2016 (UTC)

Should be something like this:
SELECT ?occupation ?occupationLabel ?cnt WHERE {
  {
    SELECT ?occupation (COUNT(?occupation) AS ?cnt) WHERE {
      FILTER NOT EXISTS { ?wikilink schema:about ?item; schema:inLanguage "en" . }
      ?item wdt:P21 wd:Q6581072 .
      OPTIONAL { ?item wdt:P106 ?occ . }
      BIND(IF(!BOUND(?occ), wd:Q19798648, ?occ) AS ?occupation) .
    } GROUP BY (?occupation)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY DESC(?cnt)
Try it! . Hope it does not time out… —MisterSynergy (talk) 17:30, 20 October 2016 (UTC)
Many thanks. 25775ms, and exactly what I was after. --Tagishsimon (talk) 17:37, 20 October 2016 (UTC)
my version :
select  ?oc  (count(?item) as ?number) where
  {
    
      ?item wdt:P21 wd:Q6581072 .
      ?item wdt:P31 wd:Q5 
      filter not exists {
        ?article schema:about ?item .
        ?article schema:inLanguage "en" .
      }
      ?item wdt:P106 ?oc .filter not exists { ?oc wdt:P279     
  } group by ?oc order by desc(?number)
Try it!
I'll try another one to avoid some potential problems with subclasses of occupations ... author  TomT0m / talk page 18:03, 20 October 2016 (UTC)
While (trying to) do this, I created an interesting queries related to the top occupations classes :
select ?oc ?ocLabel (sample(?subocLabel) as ?example) (count(?suboc) as ?number_of_subclass) where {
  ?oc wdt:P31 wd:Q28640 
      filter not exists { 
      ?superocc wdt:P31 wd:Q28640 . 
      ?oc wdt:P279 ?superocc .
    }
  ?suboc wdt:P31 wd:Q28640 .
  ?suboc wdt:P279+ ?oc .
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en" .
  }
} group by ?oc ?ocLabel  order by desc(?number_of_subclass)
Try it!
The idea is to list all occupation with no over occupation as a (direct) superclass and to count all their specialization. We sort of get an idea of how well organized our occupation class tree. It seems I actually can't find a class for sportsmen and woman for example. author  TomT0m / talk page 18:36, 20 October 2016 (UTC)
Thank you, TomT0m. I'll look in more detail when my brain recovers from the issue below. thanks --Tagishsimon (talk) 01:20, 21 October 2016 (UTC)
@Tagishsimon: A quick update to mix the two parts of the query :
select ?oc ?ocLabel  (count(?item) as ?number_of_women) where {

  #############################################################
  # select (an approximation of) the top level occupation classes
  
  #select occupations (instance of occupation) which have no occupation as a direct superclass
  ?oc wdt:P31 wd:Q28640 
    filter not exists { 
      ?superocc wdt:P31 wd:Q28640 . 
      ?oc wdt:P279 ?superocc .
    }
  
  # find all of their (non direct) specialisation
  ?suboc wdt:P31 wd:Q28640 .
  ?suboc wdt:P279+ ?oc .
  #############################################################

  # dealing with the "woman" part

  # ?item is a woman, which has any of the specialization of the occupation ?oc  
  ?item wdt:P106 ?suboc.

  
  {
    # subquery to select only the woman with no article on enwiki - made to be able to limit the size of the sample to avoid timout
    select ?item where {
      ?item wdt:P21 wd:Q6581072 .
      ?item wdt:P31 wd:Q5 
            filter not exists {
              ?article schema:about ?item .
              ?article schema:inLanguage "en" .
            }
    } limit 10000 # change the limit to change the size of the woman sample
  }
  
  #############################################################
  
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en" .
  }

} group by ?oc ?ocLabel order by desc(?number_of_subclass)
Try it! as said in the comments, I inserted limits on the number to take into account because without this the query timeout. You can play with the limit if you want to try to take a bigger sample :) With this query all mathematicians should be counted as scientists for example. Maybe some woman are counted twice if they are both mathematician and physicist however, there is probably other refinements to do. author  TomT0m / talk page 12:42, 21 October 2016 (UTC)
Thanks TomT0m. I will pore over your various versions ... as I note in the thread below, I have rudimentary SQL skills but am finding SPARQL a slightly uphill task, possibly because I've not found amenable documentation, but also a limit on the time I can give the matter. I do find it useful to work backwards from examples of working code to an understanding of the code, so I'm v.grateful. I suspect this desk has not heard the last question from me. --Tagishsimon (talk) 13:03, 21 October 2016 (UTC)

Listeria versus Query service problem

I have a report which works fine in the Wikidata Query Service - this one but which fails when composed as a Listeria report - here. Listeria specifies there's no data.

I suspect it's something to do with the limit statement and/or that there are more than 5k records that match the query ... as you'll see from the body of the report, it works fine if volleyball players are sought. All help welcomed; wits-end reached. --Tagishsimon (talk) 01:19, 21 October 2016 (UTC)

Listeria is always a little difficult. I would suggest to try something like this:
SELECT ?item (COUNT(?sitelink) AS ?slcnt) WHERE {
  ?item wdt:P106 wd:Q82955 .
  FILTER NOT EXISTS { ?wen schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> . }
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P31 wd:Q5 .
  OPTIONAL { ?sitelink schema:about ?item . }
} GROUP BY(?item) ORDER BY DESC(?slcnt) LIMIT 1000
Try it! . I’m not sure whether listeria can deal with the sitelinkcount in the results set, so in case it can’t just omit the (COUNT(?sitelink) AS ?slcnt) in the very first line. —MisterSynergy (talk) 05:53, 21 October 2016 (UTC)
Thank you MisterSynergy; that worked a treat. It's very odd; a listeria report dealing with writers worked perfectly, where politicians failed. --Tagishsimon (talk) 09:53, 21 October 2016 (UTC)
Well, Listeria…   I just took a look at the results. I would suggest to …
  1. … not use Listeria’s sort option; take the results as given by the query, which already sorts descending by sitelink count; this makes sure that the most important results are listed at the top of the table.
  2. … reduce the number of elements to something which is useful; at the moment 5000 entries are listed with more than 800 images – if I accidentally visited this page with my limited mobile data plan, I’d be seriously peeved.
Regards, MisterSynergy (talk) 10:02, 21 October 2016 (UTC)
Thanks, MS. I find the size troubling too; I'm on a satellite broadband system - v.expensive, metered & capped. At the risk of imposing on you, would you please provide an amendment such that we can produce multiple lists with ranges based on the initial letter of the item label - so Writers A-D, Writers E-F, etc. (I discount enough of the items having a surname field to made seeking to filter on surname a possibility ... by nationality is another option, but chances are we have 5k US writers.) Right now the listeria sort is on DoB, which makes some sort of presentational sense, albeit the tables are column-sortable in any event. Again, if you would be kind enough to integrate that into the query? I have rudimentary SQL skills, but I'm still very unfamiliar with SPARQL; much that I only dimly understand. I'm hoping to learn by studying your examples, besides implementing them for the WiR project. --Tagishsimon (talk) 12:58, 21 October 2016 (UTC)
Okay, I’ll put this to my worklist. Coincidentally there is an almost identical task I am trying to solve for another user right now, but the Listeria bot does not behave as I expect it to do. Also coincidentally the problem seems to be the same as yours this night: there is a valid query given to Listeria, but is says “no items” and does not update the list. I already left a note at Magnus Manske’s bitbucket repository where he hosts and manages the Listeria code, so let’s see whether he can come up with a solution and then I’ll try to provide solutions for your problem. Regards, MisterSynergy (talk) 13:32, 21 October 2016 (UTC)
Thanks; much obliged. Time is not of the essence, and I may well tinker on. One more for you. In, for instance, the Writers list, I'm currently excluding lots of occupations, e.g. excluding a Poet who is also a Writer, and to do this I use a series of statements in the form:
minus {?item wdt:P106 wd:Q3499072 . }
minus {?item wdt:P106 wd:Q2576499 . }
which works, but may lack efficiency and elegence - I might want to include 20 or 30 such exclusions in the report. I guess, though I have not tried it, that I might use:
minus {?item wdt:P106 ?prof .}
VALUES ?prof {
wd:Q3621491#archeologist
wd:Q15983985#classical_archeologist
wd:Q26424344#near_eastern_archeologist
}
Any thoughts on this welcome; thanks --Tagishsimon (talk) 14:26, 21 October 2016 (UTC)
you may prefer to use "not in" https://www.w3.org/TR/sparql11-query/#func-not-in with filter ... author  TomT0m / talk page 14:50, 21 October 2016 (UTC) (exampel
select ?item ?itemLabel where {
  ?item wdt:P31 ?class filter (?class not in (wd:Q5, wd:Q4, wd:Q68)) .
    
  ?item rdfs:label ?itemLabel filter (lang(?itemLabel) = "en") .
  
} limit 100
Try it! ) author  TomT0m / talk page
Thanks. But in that example, if we want to exclude P31:Q4 as well as P31:Q5 (and then many other Q values) ... --Tagishsimon (talk) 15:39, 21 October 2016 (UTC)
You can put as many values as you want. author  TomT0m / talk page 16:50, 21 October 2016 (UTC)
Doh. I tried other comma combinations but... thank you, TomT0m. You'll need an abundance of patience. --Tagishsimon (talk) 17:09, 21 October 2016 (UTC)

Properties with multiple formatter URLs

In order that I can properly apply third-party formatter URL (P3303), please can someone build a query listing every property with more than one value for formatter URL (P1630)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:06, 25 October 2016 (UTC)

Wikidata:Database reports/Constraint violations/P1630#Single value or [2] copied from Property talk:P1630 --Pasleim (talk) 10:15, 25 October 2016 (UTC)
alternatively there exists a {{Multiple values}} template that helps to build such queries :
select ?prop where { ?prop a wikibase:Property . {{tl|muliple Values|P1630|?prop}} }
will give
select ?prop where { ?prop a wikibase:Property . 
     ?prop wdt:P1630  ?propP1630val1, ?propP1630val2
                        filter(?propP1630val1 != ?propP1630val2) . }
Try it! which seem to do the trick. author  TomT0m / talk page 14:14, 25 October 2016 (UTC)

Median-length featured article in specific languages

I'm developing new character encoding converters for Firefox. For benchmarking purposes, I'd like to find typical pages in various languages. Of actual top sites whose performance is relevant to users, Wikipedia seems like an appropriate site to choose pages from, since it's suitable licensed and has the language coverage I'm looking for.

It turns out that in languages I can't read, it's hard to find a representative article manually. Initially I wanted to find equivalent content in different languages, but since different-language Wikipedias aren't translations of each other and the level of effort editors feel like putting in a given topic varies by language, that seems futile and not even strictly necessary for what I want to accomplish, since benchmarking a given language across converter changes is more important than comparing the performance of the converters with each other. (I intend to synthetize legacy-encoded benchmarks from Wikipedia's UTF-8.) Therefore, I'm now just looking for an example of "typical" article a user would browse to in a set of languages. To bias the selection towards non-stubs and articles that might be visited by more users and to try to select from some sort of core set whose article length hopefully more even across languages even when different languages have overall very different length patterns, I'm thinking that selecting from featured articles might make sense. If that's hard, it's not essential. I'm hoping to find for each of the listed languages, a page with a typical amount of text, links and references within some set of frequently-accessed pages.

Therefore, I'd like to request this query:

For each language-specific Wikipedia in the set of languages: ar, cs, de, el, en, es, fr, he, ja, ko, ru, th, tr, vi, zh: Find the set of featured articles. Return a median-length article from the set of feature articles.

If that particular query is infeasible, some other query that approximates "find a representative article in each of the languages ar, cs, de, el, en, es, fr, he, ja, ko, ru, th, tr, vi, zh" would suffice.

Hsivonen (talk) 06:26, 28 October 2016 (UTC)

Hi Hsivonen, we unfortunately cannot fully answer your request, since some data is not available on Wikidata and its tools. What I can offer is a list of entities which have most “featured article” badges on their sitelinks (this resembles the “featured article” status from Wikipedias, but it is maintained more or less manually and might contain very small deviations from the actual status of articles): The query would be:
SELECT DISTINCT ?item ?itemLabel (COUNT(?sitelink) AS ?cnt) WHERE {
  ?sitelink schema:about ?item .
  ?sitelink wikibase:badge wd:Q17437796 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} GROUP BY ?item ?itemLabel ORDER BY DESC(?cnt)
Try it! . As you can see, there are 21 featured articles available for Mars (Q111), 20 for the Solar System (Q544), and 18 for World War II (Q362). In total, ~26 000 “featured article” badges are distributed to ~18 000 Wikidata items.
Now the part we cannot answer. Wikidata does not contain any information about article length, number of links, number of references, or number of page views. Some of this information is available through other tools (definitely article length and page views), or only very difficult to estimate (# of links and # of references I guess). I could try to give you an impression about article size distribution across different Wikipedias (by source length), it that helps… —MisterSynergy (talk) 07:19, 28 October 2016 (UTC)
Thank you! The article for Mars seems substantial enough in all of the languages I listed. I will use various language versions of that topic as benchmark data. Hsivonen (talk) 10:27, 28 October 2016 (UTC)
You can list featured articles in specific language using PetScan ([0=1&sortby=size&interface_language=en&active_tab=tab_output&doit= example for enwiki]), export it to CSV and make calculation in your spreadsheet sw (btw it is 47,959 bytes). --Jklamo (talk) 13:46, 28 October 2016 (UTC)

Percentage of genders in ... (an art collection, on a certain language wiki, etc)

@Jane023: and I would like to have a generic way to query the percentages of people with certain genders for any given set of people. For instance, what's the percentage of women, men and other genders in the artists in the collection of the Rijksmuseum? Percentages women, men and other genders in RKDartists? What percentage of women, men and other gender biographies do we have on Dutch Wikipedia? We are stumped how to formulate the percentage part in a query, any help is welcome! Spinster 💬 13:14, 29 October 2016 (UTC)

@Spinster, Jane023: One solution :
select (count(?male)/count(?item) * 100 as ?percentman) (count(?female)/count(?item) * 100 as ?percentwoman) (count(?other)/count(?item) * 100 as ?percentother) where { 
    ################################################################################
    # this part defines the dataset you want to query, put whatever you want in it
    # example:
    #?item wdt:P31 wd:Q5 .
    #but this could be
     {
     select ?item where {
       ?item wdt:P31 wd:Q5 .
     } limit 500000  }
    ################################################################################

    optional { bind("1" as ?male) . ?item wdt:P21 wd:Q6581097 .}
    optional { bind("1" as ?female) . ?item wdt:P21 wd:Q6581072 . }
    optional { bind("1" as ?other) . ?item wdt:P21 ?gender filter (?gender not in (wd:Q6581097, wd:Q6581072) ). }
    
}
    
}
Try it!

One other to include the number of unknown gender :

select (count(?male)/count(?item) * 100 as ?percentman) (count(?female)/count(?item) * 100 as ?percentwoman) (count(?other)/count(?item) * 100 as ?percentother) (count(?other)/count(?item) * 100 as ?percentunknown) where { 
    ################################################################################
    # this part defines the dataset you want to query, put whatever you want in it
    # example:
    #?item wdt:P31 wd:Q5 .
    #but this could be
     {
     select ?item where {
       ?item wdt:P31 wd:Q5 .
     } limit 250000  }
    ################################################################################

    optional { bind("1" as ?male) . ?item wdt:P21 wd:Q6581097 .}
    optional { bind("1" as ?female) . ?item wdt:P21 wd:Q6581072 . }
    optional { bind("1" as ?other) . ?item wdt:P21 ?gender filter (?gender not in (wd:Q6581097, wd:Q6581072) ). }
    optional { bind("1" as ?unknown) . ?item wdt:P21 ?gender filter (!bound(?gender)). }
}

Try it!

Yet another solution : generates the query through the newly created template {{GenderPercent}}.

{{GenderPercent|query=
       # architects on Wikidata
       ?item wdt:P31 wd:Q5 .
       ?item wdt:P106 wd:Q42973 . }} 

will for example give

select (?countitemg/?countitemt*100 as ?item_percent) ?item_gender {
  { 
    select (count(?item) as ?countitemg ) ?item_gender where { 
       # architects on Wikidata
       ?item wdt:P31 wd:Q5 .
       ?item wdt:P106 wd:Q42973 .
       ?item wdt:P21 ?item_gender filter (!isBlank(?item_gender)) .
    } group by ?item_gender
  } 
  {
    select (count(?item) as ?countitemt) where {      
       # architects on Wikidata
       ?item wdt:P31 wd:Q5 .
       ?item wdt:P106 wd:Q42973 .
    }
  }
}

Try it!

All Grandmasters with qualifier

I would like to have a query which gives me all the chess grandmaster that have a qualifier, but the qualifier "imported from" shall be ignored (so it does not matter for the query if they have this qualifier). To reformulate, if a grandmaster statement has only the "imported from" qualifier, it shall not be found. I know that I can get all grandmaster with

SELECT ?item WHERE { ?item wdt:P2962 wd:Q105269 .}

Try it! , but I dont know who to state the criterium for the qualifiers. Steak (talk) 16:35, 29 October 2016 (UTC)

@Steak: That would be this:
SELECT distinct ?item WHERE {
  
  ?item p:P2962 [ ps:P2962 wd:Q105269 ; ?prop [] ] . # equivalent for ?item p:P2962 ?statement . ?statement ps:P2962 wd:Q105269 . ?statement ?prop ?anything .
  
  ?mprop a wikibase:Property .                       # finding the properties entity
  ?mprop wikibase:qualifierValue ?prop .             # ensure that ?prop is the qualifier equivalent of some property. 
  filter (! (?prop = pqv:P143))
}
Try it! We need to "expand" the whole statement by using "p:" instead of the "wdt:" who only gets the statement main value, then select the statement that have any qualifiers, last the filter excludes the non relevant property. (Side note, I don't like this property specific to chess players. It should be handled by a more generic one. author  TomT0m / talk page 17:01, 29 October 2016 (UTC)
Thank you, exactly what I wanted :) Steak (talk) 17:08, 29 October 2016 (UTC)
Well something’s wrong here I guess. imported from Wikimedia project (P143) is usually used in references, not in qualifiers. Steak might have confused that in the initial request, right…? —MisterSynergy (talk) 17:31, 29 October 2016 (UTC)
I was coming back here just to say that :) I'll correct. author  TomT0m / talk page 17:33, 29 October 2016 (UTC)
Here is the result :
SELECT distinct ?item WHERE {
  
  ?item p:P2962 [ ps:P2962 wd:Q105269 ;  prov:wasDerivedFrom [ ?prop [] ] ] .
  
  ?mprop a wikibase:Property .
  ?mprop wikibase:referenceValue ?prop .
  filter (! (?prop = pqv:P143))
}
Try it! just three items left. More realistic :) author  TomT0m / talk page 17:38, 29 October 2016 (UTC)
Not sure what you mean. Maybe my request was confusing, but the first query answer is still exactly what I wanted :). The second query is useless for me. Steak (talk) 20:40, 29 October 2016 (UTC)
@Steak: Maybe an explanation would be useful then. "Imported from" is not supposed to be used as a qualifier at all. It's supposed to be used as a reference. If you encounter it as a statement qualifier, you probably should delete it or ask for the user who added the statement why he did this. A qualifier modifies the meaning of the claim. A source says where the statement come from. "Imported from" definitely not change the meaning of the claim, it's just a metadata about it. So as is the query is a little weird and it seem instead of creating it to avoid improper qualified statements, the statements should be corrected. This is totally doable efficiently with wd:QuickStatement and wd:PetScan for example. author  TomT0m / talk page 08:32, 30 October 2016 (UTC)
Could you give me an example item where imported from is wrongly used as a qualifier? Steak (talk) 09:24, 30 October 2016 (UTC)
@Steak: Easy :
SELECT distinct ?item WHERE {
  
  ?item p:P2962 [ ps:P2962 wd:Q105269 ; ?prop [] ] . # equivalent for ?item p:P2962 ?statement . ?statement ps:P2962 wd:Q105269 . ?statement ?prop ?anything .
  
  ?mprop a wikibase:Property .                       # finding the properties entity
  ?mprop wikibase:qualifierValue ?prop .             # ensure that ?prop is the qualifier equivalent of some property. 
  filter (?prop = pqv:P143)
}
Try it!  there is none. Why did you want to filter out this in the first place ? author  TomT0m / talk page 11:03, 30 October 2016 (UTC)
Well, if not, I would have seemingly got your second query, which is not what I want. Steak (talk) 12:39, 30 October 2016 (UTC)