Wikidata:Contact the development team/Query Service and search/Archive/2021/03

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Mwapi service returns duplicates when querying categories: bug?

A question on WD:RAQ that involved querying categories (quality classification categories) could be solved with WDQS and its mwapi service, but an oddity that might be a bug is bothering me: some results are duplicated. The query below checks whether the talk pages of some articles are classified in one enwiki category, "Category:Start-Class biography articles". It works correctly. But the same query with one or two of the other categories uncommented, to include their members in the results, returns each article twice or thrice respectively. This is weird! Each article is classified in only one of these categories, so it should appear only once. There is a join on the "?category" variable.

Check this one:

select ?item ?itemLabel ?genreLabel ?article ?name (lang(?name) as ?lang) ?category {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P279* wd:Q266569 .
  optional {
    ?item wdt:P21 ?genre
  }
  filter (?genre != wd:Q6581097 ).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?name


  ########### find articles by their ratings on enwiki

  # compute the name of the talk page on enwiki
  bind (concat("Talk:", ?name) as ?title)

  # find the categories of the talkpage using mwapi
  SERVICE wikibase:mwapi {
      # Categories that contain these pages
     bd:serviceParam wikibase:api "Categories";
                      wikibase:endpoint "en.wikipedia.org";
                      mwapi:titles  ?title.
       # Output the page title and category
      #?otitle wikibase:apiOutput mwapi:title.
      ?category wikibase:apiOutput mwapi:category .  
  }
  values ?category { #### add relevant (sub?)categories if needed 
    "Category:Start-Class biography articles" 
    #"Category:Stub-Class biography articles"
    #"Category:C-Class biography articles"
  }
}

versus the same query with two categories uncommented, where each result appears twice:

select ?item ?itemLabel ?genreLabel ?article ?name (lang(?name) as ?lang) ?category {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P279* wd:Q266569 .
  optional {
    ?item wdt:P21 ?genre
  }
  filter (?genre != wd:Q6581097 ).
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?name


  ########### find articles by their ratings on enwiki

  # compute the name of the talk page on enwiki
  bind (concat("Talk:", ?name) as ?title)

  # find the categories of the talkpage using mwapi
  SERVICE wikibase:mwapi {
      # Categories that contain these pages
     bd:serviceParam wikibase:api "Categories";
                      wikibase:endpoint "en.wikipedia.org";
                      mwapi:titles  ?title.
       # Output the page title and category
      #?otitle wikibase:apiOutput mwapi:title.
      ?category wikibase:apiOutput mwapi:category .  
  }
  values ?category { #### add relevant (sub?)categories if needed 
    "Category:Start-Class biography articles" 
    "Category:Stub-Class biography articles"
    #"Category:C-Class biography articles"
  }
}

and thrice with the last one uncommented.

Is this my mistake, did I miss something, or is it a bug that should be filed? author  TomT0m / talk page 18:37, 20 February 2021 (UTC)

Anyway, there is a workaround: use two variables, one for the service output and another for the "values" clause, and set them equal in a filter; see https://w.wiki/$$G, which does not show any duplicates. author  TomT0m / talk page 11:30, 22 February 2021 (UTC)
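For readers following along, the two-variable workaround described above might look roughly like the query below. This is only a sketch based on the description, not a copy of the linked query, which may differ. The variable name ?apiCategory is an assumption introduced here for illustration; everything else is taken from the queries above.

```sparql
select ?item ?itemLabel ?genreLabel ?article ?name (lang(?name) as ?lang) ?category {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P279* wd:Q266569 .
  optional {
    ?item wdt:P21 ?genre
  }
  filter (?genre != wd:Q6581097 ).

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?name .

  # compute the name of the talk page on enwiki
  bind (concat("Talk:", ?name) as ?title)

  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:api "Categories";
                     wikibase:endpoint "en.wikipedia.org";
                     mwapi:titles ?title.
     # the service output gets its own variable (?apiCategory is a hypothetical name)
     ?apiCategory wikibase:apiOutput mwapi:category .
  }

  # the wanted categories are bound to a different variable
  values ?category {
    "Category:Start-Class biography articles"
    "Category:Stub-Class biography articles"
    "Category:C-Class biography articles"
  }

  # explicit comparison instead of an implicit join on one shared variable
  filter (?apiCategory = ?category)
}
```

The only difference from the queries above is that the service call and the "values" clause no longer share a variable, so the match is made by the filter rather than by a join.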
I have no clue why it behaves this way, but I think it might be because the same ?category variable is populated both from the wikibase:apiOutput in the service call and from the VALUES clause; while it seems able to do the join properly, it still duplicates entries. Have you considered asking the MW API to do the join using the clcategories parameter from category properties: example? DCausse (WMF) (talk) 14:53, 4 March 2021 (UTC)
@DCausse (WMF), TomT0m: It is a good idea to use the clcategories API parameter. However, it is wasteful to make a separate API call for each wanted value when you can put the whole list of wanted values into the parameter. So this version is faster:
select ?otitle ?item ?itemLabel ?genreLabel ?article ?name (lang(?name) as ?lang) ?category {
  ?item wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P279* wd:Q266569 .
  optional {
    ?item wdt:P21 ?genre
  }
  filter (?genre != wd:Q6581097 ).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?name
  ########### find articles by their ratings on enwiki
  # compute the name of the talk page on enwiki
  bind (concat("Talk:", ?name) as ?title)
  # find the categories of the talkpage using mwapi
  SERVICE wikibase:mwapi {
      # Categories that contain these pages
     bd:serviceParam wikibase:api "Categories";
      wikibase:endpoint "en.wikipedia.org";
      mwapi:cllimit "max" ;
      mwapi:titles  ?title ;
      mwapi:clcategories "Category:Start-Class biography articles|Category:Stub-Class biography articles|Category:C-Class biography articles" .
       # Output the page title and category
     ?otitle wikibase:apiOutput mwapi:title.
     ?category wikibase:apiOutput mwapi:category.
  }
}
--Dipsacus fullonum (talk) 17:03, 4 March 2021 (UTC)