Wikidata:WikidataCon 2017/Notes/The New SQID: Improving Wikidata Made Easy

Title: The New SQID: Improving Wikidata Made Easy

Speaker(s)

Name or username: Markus Krötzsch (=username), Maximilian Marx

Contact (email, Twitter, etc.): markus.kroetzsch tu-dresden.de, maximilian.marx tu-dresden.de (@korenchkin)

Useful links:

https://tools.wmflabs.org/sqid/

Abstract

SQID is a data browser for Wikidata that integrates data from various sources, including the Wikidata Query service, the Wikidata API, Wikidata dumps (analysed offline), and further Wikimedia sources. It features class and property browsers, and provides a concise display of both outgoing and incoming statements for items.

A recent addition to SQID is the integration of proposals from Primary Sources, where proposed references may be approved or rejected with just a single click.

We extend SQID with a reasoning component that applies user-defined rules to infer new statements from existing data. These rules can be used to:

  • materialise statements that should be present, but are not (such as inverses of “spouse” statements);
  • find inconsistencies in the data (e.g., statements missing mandatory qualifiers); and to
  • display statements that logically follow from the data, but do not need to be materialised (e.g., “relative” statements obtained from following “father”/“mother”-paths).

Rules of the latter type contribute to a better browsing experience by making explicit data that could only be obtained by combining different statements (not necessarily belonging to the same item), whereas rules of the former types help to improve the quality of the data itself.
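To illustrate the third kind of rule (statements that follow from the data without being stored), here is a minimal Python sketch that collects everyone reachable from an item via "father" (P22) and "mother" (P25) statements. It is not how SQID evaluates rules; the triple representation, the function name `ancestors`, and the placeholder item IDs are assumptions made for this example.

```python
from collections import deque

# Properties followed when walking up the family tree: father, mother.
PARENT_PROPS = {"P22", "P25"}

def ancestors(item, triples):
    """Return all items reachable from `item` via father/mother statements."""
    # Index: child -> set of direct parents.
    parents = {}
    for subject, prop, value in triples:
        if prop in PARENT_PROPS:
            parents.setdefault(subject, set()).add(value)

    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:  # guards against cycles in the data
                seen.add(parent)
                queue.append(parent)
    return seen

# Placeholder IDs, not real Wikidata items.
triples = [
    ("Q_CHILD", "P22", "Q_FATHER"),
    ("Q_FATHER", "P25", "Q_GRANDMOTHER"),
    ("Q_CHILD", "P26", "Q_SPOUSE"),  # spouse statements are ignored
]
print(ancestors("Q_CHILD", triples))
```

The result is computed on demand and never written back, which matches the idea that such "relative" statements only need to be displayed.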

After introducing the basic usage of SQID, we will demonstrate applications for rules in the use cases above, and have discussions with interested participants regarding the next steps for reasoning on Wikidata.

Example rules

Spouse is symmetric:

(?x.P26 = ?y)@?S -> (?y.P26 = ?x)@?S
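Read as a procedure, the rule above says: for every spouse statement, the reverse statement with the same qualifier set should also exist. The Python sketch below lists the reverse statements that are missing. The `Statement` type, the function name `missing_inverses`, and the placeholder item IDs are assumptions for illustration, not part of SQID.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple

class Statement(NamedTuple):
    subject: str
    prop: str
    value: str
    # Qualifiers as (property, value) pairs; plays the role of ?S in the rule.
    qualifiers: FrozenSet[Tuple[str, str]] = frozenset()

def missing_inverses(statements: Iterable[Statement],
                     prop: str = "P26") -> List[Statement]:
    """Return reverse statements for `prop` that are not yet present."""
    existing = set(statements)
    missing = set()
    for s in existing:
        if s.prop != prop:
            continue
        # Same property, subject and value swapped, qualifiers carried over.
        inverse = Statement(s.value, prop, s.subject, s.qualifiers)
        if inverse not in existing:
            missing.add(inverse)
    return sorted(missing, key=lambda s: (s.subject, s.value))

# Placeholder IDs, not real Wikidata items.
married = Statement("Q_A", "P26", "Q_B", frozenset({("P580", "2001")}))
for suggestion in missing_inverses([married]):
    print(suggestion)
```

This sketch treats a reverse statement with different qualifiers as missing; a more lenient check could compare only subject, property, and value.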

A female parent is a mother:

(?mother.P40 = ?child)@?X,
(?mother.P21 = Q6581072)@?Y
-> (?child.P25 = ?mother)@[]

Anyone holding a country's head of state position is its head of state, not taking qualifiers into account:

(?headOfState.P39 = ?headOffice)@?X,
(?country.P1906 = ?headOffice)@?Y
-> (?country.P35 = ?headOfState)@[]

Anyone holding a country's head of state position is its head of state, with the same start and end time:

?X:(P580=?startdate, P582=?enddate),
(?headOfState.P39 = ?headOffice)@?X,
(?country.P1906 = ?headOffice)@?Y
-> (?country.P35 = ?headOfState)@[P580=?X.P580, P582=?X.P582]
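The qualifier-aware rule above combines two statements and copies the start and end time onto the inferred one. A minimal Python sketch of that reading follows. It assumes the rule only fires when the position statement has both P580 and P582; the `Statement` type, the function name `infer_heads_of_state`, and the placeholder IDs are hypothetical and not taken from SQID.

```python
from typing import Dict, List, NamedTuple

class Statement(NamedTuple):
    subject: str
    prop: str
    value: str
    qualifiers: Dict[str, str]

def infer_heads_of_state(statements: List[Statement]) -> List[Statement]:
    """Infer P35 statements from P39 and P1906, copying P580/P582."""
    # office -> countries naming it as their head-of-state office (P1906)
    countries_by_office: Dict[str, List[str]] = {}
    for s in statements:
        if s.prop == "P1906":
            countries_by_office.setdefault(s.value, []).append(s.subject)

    inferred = []
    for s in statements:
        if s.prop != "P39":
            continue
        # Assumption: both start and end time must be present to match ?X.
        if "P580" not in s.qualifiers or "P582" not in s.qualifiers:
            continue
        for country in countries_by_office.get(s.value, []):
            copied = {"P580": s.qualifiers["P580"],
                      "P582": s.qualifiers["P582"]}
            inferred.append(Statement(country, "P35", s.subject, copied))
    return inferred

# Placeholder IDs and dates, not real Wikidata data.
data = [
    Statement("Q_PERSON", "P39", "Q_OFFICE", {"P580": "2010", "P582": "2015"}),
    Statement("Q_COUNTRY", "P1906", "Q_OFFICE", {}),
    Statement("Q_OTHER", "P39", "Q_OFFICE", {"P580": "2015"}),  # no end time
]
for s in infer_heads_of_state(data):
    print(s)
```

Other qualifiers on the position statement are deliberately not copied, mirroring the explicit qualifier list in the rule head.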

Example rules: Hands-on

owner of is inverse of owned by

currencies that have a country are the currency of that country

a female child of a child is a granddaughter

A body of water whose mouth of watercourse is another river is a tributary of that river

... define more complicated family relationships

... come up with further examples from other domains

Collaborative notes of the session

Incompleteness - many SPARQL queries seem to expect the Wikidata dataset to be complete (counting statements, logic depending on the absence of statements).

"Wikidata often doesn't know what Wikidata knows" - many things that are logically implied by Wikidata statements aren't explicitly included in Wikidata.

Trying to make logical implications writable by ordinary users... Keeping humans in the loop on these rules - record exceptions, or suggest how to fix data.

Already in SQID - suggestions from their rules: "MARS" - if you are logged in.

Looking for community input on rules. Right now a lot is based just on constraints, particularly inverse constraints. Maybe WikiProject Reasoning...

Questions / Answers

Discussion about handling constraints - the city "Florence" has a spouse. The new SQID surfaces this problem to people who might not otherwise notice the error, which is helpful.

Can also code constraints as rules.

Can do complex constraints beyond what Wikidata's constraint system can handle.
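As a rough sketch of how a constraint can be phrased as a check over statements, the Python below flags items that have a spouse (P26) statement but are not an instance of (P31) human (Q5), which is the kind of problem the "Florence" example points at. The choice of this particular check, the function name `spouse_violations`, and the placeholder item IDs are assumptions for illustration; this is not SQID's implementation.

```python
def spouse_violations(triples):
    """Return subjects of spouse statements that are not marked as human."""
    # Items with an explicit "instance of: human" statement.
    humans = {s for s, p, v in triples if p == "P31" and v == "Q5"}
    # Any spouse statement on a non-human subject is reported once.
    return sorted({s for s, p, v in triples if p == "P26" and s not in humans})

# Placeholder IDs (except P26, P31, Q5), not real Wikidata items.
triples = [
    ("Q_PERSON", "P31", "Q5"),
    ("Q_PERSON", "P26", "Q_PARTNER"),
    ("Q_CITY", "P31", "Q_CITY_CLASS"),
    ("Q_CITY", "P26", "Q_SOMEONE"),
]
print(spouse_violations(triples))
```

Because Wikidata is incomplete, an item flagged here may simply lack its "instance of" statement, so results like these are better treated as suggestions for a human to review.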

Can SQID be made speedier? Caching etc?

Could there be an API/batch mode returning the rule application results in machine-processable format?