Wikidata:WikidataCon 2019/Program/Sessions/Shape Expressions workshop

ID: SUB-113 Shape Expressions workshop
Speaker(s): EricP, Lucas Werkmeister (WMDE), Andra Waagmeester (Micelio), Jose Labra (Oviedo), Tom Baker (DCMI)
Timeblock: tb-saturday
Start: 16:00
Room: Darwin
Duration: 55min
Slides: No slides have been added for now.

Experience from the Linked Open Data Cloud shows that data quality is the biggest predictor of whether a collaborative database will be widely adopted. Well-structured RDF data such as UniProt tended to be backed by relational stores that ensured structural integrity. ShEx brings this kind of integrity validation to RDF, allowing graph databases to meet the same standards of data quality and encouraging more widespread use of the data by industry, academia and governments.

ShEx is a concise, formal modeling and validation language for knowledge graphs. It can be used to define shapes within the graph; in the case of Wikidata, these are sets of properties, qualifiers and references that describe the domain being modeled. Validation tools can then test whether subsets of the Wikidata graph conform to a specific shape. In this workshop, we will demonstrate the utility of ShEx: participants will learn how to write Shape Expressions to model and disseminate structural expectations, and how to use existing tools to test conformance with those expectations. We will discuss the infrastructure requirements for a healthy, interlinked data ecosystem and how to maintain a level of data quality that will attract institutional investment and dependence. These requirements have to meet the needs of the contributor community, who frankly don't always agree on a single structure. In short, ShEx can be used to signal structural issues, insert sensible information, document schemas and check Wikidata items for conformance.
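The idea of testing items against a shape can be illustrated with a minimal Python sketch. This is neither ShEx itself nor one of the tools used in the workshop: it is a toy model that assumes each item has been reduced to a mapping from property IDs to lists of values, and that a shape lists, per property, a cardinality range and an optional set of allowed values. The `Constraint` class, the `check` function and the items `item-A` and `item-B` are hypothetical; P31 (instance of), P569 (date of birth) and Q5 (human) are real Wikidata identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    """Expectation for one property (hypothetical toy model, not ShEx)."""
    min_count: int = 1
    max_count: Optional[int] = 1          # None means unbounded
    allowed: Optional[frozenset] = None   # None means any value is accepted


def check(item: dict, shape: dict) -> list:
    """Return a list of violations; an empty list means the item conforms.

    Properties that the shape does not mention are ignored.
    """
    problems = []
    for prop, c in shape.items():
        values = item.get(prop, [])
        if len(values) < c.min_count:
            problems.append(f"{prop}: expected at least {c.min_count}, found {len(values)}")
        if c.max_count is not None and len(values) > c.max_count:
            problems.append(f"{prop}: expected at most {c.max_count}, found {len(values)}")
        if c.allowed is not None:
            for value in values:
                if value not in c.allowed:
                    problems.append(f"{prop}: unexpected value {value}")
    return problems


# Toy "human" shape: at least one P31 with value Q5, at most one P569.
HUMAN = {
    "P31": Constraint(min_count=1, max_count=None, allowed=frozenset({"Q5"})),
    "P569": Constraint(min_count=0, max_count=1),
}

# Made-up items for illustration only.
items = {
    "item-A": {"P31": ["Q5"], "P569": ["1952-03-11"]},
    "item-B": {"P569": ["1900-01-01", "1901-01-01"]},
}

report = {name: check(data, HUMAN) for name, data in items.items()}
for name, problems in report.items():
    print(name, "conforms" if not problems else problems)
```

Running the same check over many items and collecting the violations per item gives a simple conformance report; a real ShEx validator does considerably more, covering qualifiers, references and nested shapes.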

General Introduction to Shape Expressions (Eric Prud'hommeaux)
Shape Expressions (ShEx) is a structural schema language for RDF graphs. It allows the creation of machine-readable data schemas and profiles. These can also serve as application profiles, against which data can be checked for conformance to a given application.
Introducing the ShEx extension in Wikidata (Lucas Werkmeister)
In May 2019, the EntitySchema extension was added to Wikidata, making it possible to store ShEx schemas on-wiki. These can then be used to check data consistency.
ShEx Toolchain (Jose Labra)
In addition to the tooling available in Wikidata, there are various external tools, parsers and libraries that support the use of ShEx in managing Wikidata content.
Large scale testing (Andra Waagmeester)
In the Gene Wiki project, ShEx is used for large-scale conformance tests. An example will be demonstrated.
Generating ShEx from a tabular form (Tom Baker)
An application profile is a set of data shapes that apply to a given application. We will demonstrate how a simple tabular input form can be used to generate a simple application profile expressed in ShEx.
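As a rough illustration of the last item, the following Python sketch turns a small table into ShEx compact syntax. It is not the tabular format or the tool shown in the session: the column names (`shape`, `property`, `value`, `cardinality`), the sample rows and the function `table_to_shex` are assumptions made for this example. The prefixes are the standard Wikidata and XML Schema namespaces.

```python
import csv
import io

PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)

# Hypothetical table: one row per expectation on a property.
# An empty cardinality cell means "exactly one".
TABLE = """shape,property,value,cardinality
human,wdt:P31,[wd:Q5],
human,wdt:P569,xsd:dateTime,?
human,wdt:P19,IRI,?
"""


def table_to_shex(table_text: str) -> str:
    """Group rows by shape name and emit one ShExC shape per group."""
    shapes = {}
    for row in csv.DictReader(io.StringIO(table_text)):
        constraint = f"{row['property']} {row['value']}"
        if row["cardinality"]:
            constraint += f" {row['cardinality']}"
        shapes.setdefault(row["shape"], []).append(constraint)

    blocks = []
    for name, constraints in shapes.items():
        # Triple constraints inside a shape are separated by ";".
        body = " ;\n".join(f"  {c}" for c in constraints)
        blocks.append(f"<{name}> {{\n{body}\n}}")
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


shex = table_to_shex(TABLE)
print(shex)
```

Keeping the table as the source of truth lets people who are unfamiliar with ShEx syntax review and edit the expectations, while the generated schema can be handed to any ShEx validator.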
Type: Workshop
Keywords: data quality, tool
Notes: #WikidataCon2019_SUB-113
People planning to attend:
  1. Ecritures (talk)
  2. Loz.ross (talk)
  3. Mlemusrojas (talk) 14:09, 26 October 2019 (UTC)
  4. ...