Wikidata:SPARQL query service/WDQS backend update/Feb 2022 scaling community meetings

One of the Search team's priorities this year is scaling Wikidata Query Service (WDQS). Specifically, this conversation has centered around the need to move off of the Blazegraph backend that WDQS currently uses.

As part of this process, we want to gather feedback from our community of users and better understand your use cases and needs. There will be two feedback sessions during February 2022, which you are welcome and encouraged to join:

  1. WDQS scaling community meeting 1/2: SPARQL query features — Thursday, February 17, 2022 at 18:00 UTC
    Etherpad notes
    Video call link: https://meet.google.com/beu-fxov-etm
    Or dial: (US) +1 413–341–4301 — PIN: 108 765 815#
  2. WDQS scaling community meeting 2/2: RDF store backend needs — Monday, February 21, 2022 at 18:00 UTC
    Etherpad notes
    Video call link: https://meet.google.com/skc-enqb-bpr
    Or dial: (US) +1 601–803–2313 — PIN: 499 480 133#

The purpose of these meetings is primarily to meet each other and to gather requirements and use cases around WDQS. While this information will be used to plan future scaling, no decisions will be made during the meetings themselves.

Below, we present a rough outline of the topics we intend to cover in each meeting. We also welcome relevant feedback that may not be covered below, though we encourage and prioritize ideas that are also valuable to others. We ask that you please be mindful of allowing others to express their thoughts and perspectives, and helping facilitate a constructive conversation.

A rough outline of the meetings

#1: SPARQL query features

SPARQL is a powerful query language, and it is how information on Wikidata is accessed through WDQS. The flexibility and power of SPARQL also make it possible for complex or computationally expensive queries to strain WDQS, affecting all users. In considering how to balance the usability of SPARQL against limitations that could improve service reliability, we want to better understand which SPARQL features are most important to you and how frequently you use them.

The following list of features indicates most of the SPARQL features of interest, but is not exhaustive, and anything else that comes to mind is also valuable:

  • Query forms (SELECT, ASK, DESCRIBE and/or CONSTRUCT)
  • Queried entities
  • Query patterns (example queries would be appreciated)
    • Do you have constant subjects, predicates or objects? (Meaning that you know their values when you define the query)
    • Do you use property paths (e.g., a series of properties connected in sequence or as alternatives, inverted predicates, etc.)?
    • Do you use FILTERs, OPTIONALs, UNIONs... ?
      • For FILTERs, do you use regex or mathematical functions? Do you use EXISTS, NOT EXISTS or MINUS? Do you use SPARQL functions (such as logical functions like IF/and/or/..., string functions like CONCAT, date/time functions like YEAR...)?
    • Do you use aggregations (such as GROUP BY)?
    • Do you ORDER results?
  • SERVICEs (such as labels, GAS or date processing)
  • Federated endpoints (such as DBpedia, the Getty vocabularies, Lingua Libre, ...)
  • ...anything else?
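
If you want a rough picture of which of the features above appear in your own queries before the meeting, one option is to scan the query text for keywords. The following is a minimal sketch, not a SPARQL parser: the function name `tally_features`, the feature labels, and the regular expressions are all assumptions made for illustration. Keyword matching will miscount keywords that appear inside string literals or comments, and it does not detect property paths.

```python
import re
from collections import Counter

# Hypothetical feature labels mapped to keyword patterns (illustrative only,
# not an exhaustive or authoritative list of SPARQL features).
FEATURE_PATTERNS = {
    "SELECT": r"\bSELECT\b",
    "ASK": r"\bASK\b",
    "DESCRIBE": r"\bDESCRIBE\b",
    "CONSTRUCT": r"\bCONSTRUCT\b",
    "FILTER": r"\bFILTER\b",
    "OPTIONAL": r"\bOPTIONAL\b",
    "UNION": r"\bUNION\b",
    "MINUS": r"\bMINUS\b",
    "NOT EXISTS": r"\bNOT\s+EXISTS\b",
    "regex": r"\bREGEX\s*\(",
    "GROUP BY": r"\bGROUP\s+BY\b",
    "ORDER BY": r"\bORDER\s+BY\b",
    "SERVICE": r"\bSERVICE\b",
}


def tally_features(queries):
    """Count how many of the given queries mention each feature at least once."""
    counts = Counter()
    for query in queries:
        for feature, pattern in FEATURE_PATTERNS.items():
            if re.search(pattern, query, flags=re.IGNORECASE):
                counts[feature] += 1
    return counts


# An example query combining a property path, OPTIONAL, FILTER,
# the label SERVICE and ORDER BY.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q146 .
  OPTIONAL { ?item wdt:P569 ?born . }
  FILTER(!BOUND(?born) || YEAR(?born) > 2000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?itemLabel
"""

if __name__ == "__main__":
    for feature, count in sorted(tally_features([EXAMPLE_QUERY]).items()):
        print(f"{feature}: {count}")
```

Run over a saved collection of your queries, a tally like this gives approximate frequencies to bring to the discussion. Example queries themselves remain more informative than counts.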

#2: RDF store backend needs

In addition to SPARQL query features, we are interested in knowing more about what functionality is important to you from an RDF store and SPARQL endpoint. For example, many of you reported in the August 2021 WDQS user survey that the 60-second timeout limit was a top priority. This meeting will focus on how scaling the WDQS backend can best serve your interests and needs.

Other possible topics (non-exhaustive) may include:

  • update speeds
  • instrumentation and monitoring capabilities
  • query tuning
  • custom SPARQL extensions
  • geospatial support
  • support for other query languages
  • support for inference/reasoning
  • ...anything else?

Participants (Add yourself!)

Let us know if you plan to attend!

#1: SPARQL query features

#2: RDF store backend needs