Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-07-27
Gene Wiki Project
Wikipedia--create articles for human genes
Wanted to add structured data to Wikipedia articles, but Wikipedia is built around text--getting data out of it meant text mining and Natural Language Processing
2012--moved attention to Wikidata once it launched, because Wikidata is:
Free and CC0
Queryable
Stable with active editors
The Gene Wiki project monitors data sources and synchronizes them with Wikidata
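One way to picture the synchronization step is as a diff between what a source reports and what Wikidata currently holds. The sketch below is a hypothetical illustration, not the Gene Wiki bots' code: both sides are plain dictionaries keyed by (item, property), and every ID and value in it is a made-up placeholder.

```python
from typing import Dict, Set, Tuple

# Hypothetical key: (item ID, property ID); values are compared as strings.
Key = Tuple[str, str]
Snapshot = Dict[Key, Set[str]]


def diff_for_sync(source: Snapshot, wikidata: Snapshot) -> Tuple[Snapshot, Snapshot]:
    """Compare a source snapshot with the current Wikidata values.

    Returns (to_add, to_review):
      to_add    - values the source reports that Wikidata lacks
      to_review - values in Wikidata that the source no longer reports;
                  only flagged, since other editors may have added them
    """
    to_add: Snapshot = {}
    to_review: Snapshot = {}
    for key in source.keys() | wikidata.keys():
        src = source.get(key, set())
        current = wikidata.get(key, set())
        if src - current:
            to_add[key] = src - current
        if current - src:
            to_review[key] = current - src
    return to_add, to_review


if __name__ == "__main__":
    source = {("Q_gene_A", "P_id"): {"v1"}, ("Q_gene_B", "P_id"): {"v2"}}
    wikidata = {("Q_gene_A", "P_id"): {"v1", "v_old"}}
    print(diff_for_sync(source, wikidata))
```

Flagging instead of deleting is a design assumption of this sketch: Wikidata has many editors, so a value missing from one source is not necessarily wrong.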
In March 2020, tried to get as much data as possible on coronaviruses into Wikidata
What Wikidata knows about coronaviruses, explored through:
Wikipedia--Wikidata links
SPARQL queries
Shape Expressions
Shape Expressions (ShEx)--a language to describe and validate RDF data
Human-readable
Syntax aligns with Turtle and SPARQL
RDF and knowledge graphs
RDF graphs can be merged
Reusable
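Because an RDF graph is a set of triples, merging graphs that contain only IRIs and literals is a set union, and identical statements from different sources collapse into one. A minimal sketch with made-up `ex:` terms; it deliberately ignores blank nodes, which a real merge has to rename apart first.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def merge_graphs(*graphs: Set[Triple]) -> Set[Triple]:
    """Merge graphs made only of IRIs and literals (no blank nodes).

    Without blank nodes an RDF merge is a set union, so a triple
    asserted by several sources appears once in the result.
    """
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


# Hypothetical triples; "ex:" terms are placeholders, not real vocabularies.
wikidata_part = {
    ("ex:virusA", "rdf:type", "ex:Virus"),
    ("ex:virusA", "ex:host", "ex:Human"),
}
other_source = {
    ("ex:virusA", "rdf:type", "ex:Virus"),  # same statement as above
    ("ex:virusA", "rdfs:label", '"virus A"'),
}
merged = merge_graphs(wikidata_part, other_source)
print(len(merged))  # 3
```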
SPARQL endpoints--often poorly documented, and it can be difficult to identify the full extent of the subset of data you are interested in
Shape Expressions--a language to describe shapes
Help to understand the contents of an RDF graph
Can be used to generate user interfaces
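As a rough illustration of how a shape could drive a user interface, the sketch below turns a simplified shape into form-field descriptions. The shape is a hypothetical Python list of (property, label, ShEx cardinality marker), not a parsed ShExC schema: the minimum count decides whether a field is required, the maximum whether it can be repeated.

```python
from typing import Dict, List, Optional, Tuple

# ShEx cardinality markers as (min, max); None means unbounded.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "": (1, 1),      # no marker: exactly one
    "?": (0, 1),
    "*": (0, None),
    "+": (1, None),
}


def form_fields(shape: List[Tuple[str, str, str]]) -> List[Dict[str, object]]:
    """Describe one form field per (property, label, marker) entry."""
    fields = []
    for prop, label, marker in shape:
        min_count, max_count = CARDINALITY[marker]
        fields.append({
            "property": prop,
            "label": label,
            "required": min_count >= 1,
            "repeatable": max_count is None or max_count > 1,
        })
    return fields


# Hypothetical simplified shape for items about authors.
author_shape = [
    ("P31", "instance of", ""),
    ("P735", "given name", "*"),
    ("P21", "sex or gender", "?"),
    ("P106", "occupation", ""),
]

for field in form_fields(author_shape):
    print(field)
```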
May 2019--entity schemas introduced to Wikidata
Allows storing Shape Expressions in Wikidata
Describe the expected shape of entities
Check whether entities conform to a shape
Created Shape Expressions for the project
e.g., virus strain
Enriched Wikidata with data on 7 human coronaviruses
ShEx Community Group: http://shex.io
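Shape Expressions are not the same as Wikidata property constraints; an item that does not fit a shape is not a constraint violation
Can be specific to your own use cases
Can express a user's expectations or a data donor's description of their data--a tool to align the datasets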
Example: a schema for items about authors
EXTRA wdt:P31 means every author item must have instance of (P31) human (Q5), but other instance of values are also allowed
For given name (P735), * means 0 or more given names, i.e., the property is optional
For sex or gender (P21), ? means 0 or 1 statement: optional, but more than one statement is an error
Occupation (P106) must be author; with no cardinality marker exactly 1 value is expected in this use case, so multiple occupations would render an error
Can list the values that are deemed acceptable (a value set)
Can validate items against the schema and detect errors
Then decide either that the data needs to be fixed in Wikidata or that the schema should be adapted to accommodate it
Or decide to ignore the errors and push the data to Wikidata anyway
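To make the cardinality rules of the author example concrete, the sketch below emulates them in plain Python. It is not a ShEx engine and only covers value sets, the cardinalities used in the example (none, `*`, `?`) and EXTRA. Items are assumed to be simple dicts of property to values, `wd:Q482980` is assumed to be the item for "author", and the other IDs in the sample data are placeholders. An empty error list means the item conforms; any errors are what you would either fix in Wikidata or use to adapt the schema.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set

# Toy emulation of a ShExC shape along the lines of:
#   <author> EXTRA wdt:P31 {
#     wdt:P31  [ wd:Q5 ] ;        # instance of: human
#     wdt:P735 . * ;              # given name: zero or more
#     wdt:P21  . ? ;              # sex or gender: zero or one
#     wdt:P106 [ wd:Q482980 ]     # occupation: exactly one, author (assumed Q-ID)
#   }
# Not a ShEx engine: it only mimics value sets, cardinalities and EXTRA.


@dataclass(frozen=True)
class TripleConstraint:
    prop: str
    allowed: Optional[FrozenSet[str]]  # None stands for "." (any value)
    min_count: int
    max_count: Optional[int]           # None means unbounded


AUTHOR_SHAPE = [
    TripleConstraint("P31", frozenset({"Q5"}), 1, 1),
    TripleConstraint("P735", None, 0, None),
    TripleConstraint("P21", None, 0, 1),
    TripleConstraint("P106", frozenset({"Q482980"}), 1, 1),
]
AUTHOR_EXTRA: Set[str] = {"P31"}  # other instance-of values are tolerated


def validate(item: Dict[str, List[str]],
             shape: List[TripleConstraint] = AUTHOR_SHAPE,
             extra: Set[str] = AUTHOR_EXTRA) -> List[str]:
    """Return human-readable errors; an empty list means the item conforms."""
    errors = []
    for tc in shape:
        values = item.get(tc.prop, [])
        matching = [v for v in values if tc.allowed is None or v in tc.allowed]
        unmatched = [v for v in values if tc.allowed is not None and v not in tc.allowed]
        # Without EXTRA, every value of a property named in the shape must match.
        if unmatched and tc.prop not in extra:
            errors.append(f"{tc.prop}: value(s) outside the allowed set: {unmatched}")
        n = len(matching)
        if n < tc.min_count or (tc.max_count is not None and n > tc.max_count):
            upper = "*" if tc.max_count is None else str(tc.max_count)
            errors.append(f"{tc.prop}: {n} matching value(s), expected {tc.min_count}..{upper}")
    return errors


print(validate({"P31": ["Q5", "Q_other"], "P106": ["Q482980"]}))  # []
print(validate({"P31": ["Q5"], "P21": ["Q_a", "Q_b"], "P106": ["Q482980"]}))
```

The second call reports one error, because two sex or gender values exceed the `?` cardinality; a second occupation would likewise be reported, since P106 is not listed as EXTRA.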
Working on supporting Shape Expressions in bot work before ingestion--for use by a bot operator or an individual editor
Simple ShEx: https://shex-simple.toolforge.org/wikidata/packages/shex-webapp/doc/shex-simple.html
Wikishape, another interface: https://wikishape.weso.es/
User-friendly--uses visualizations
Can do validations
Other uses for ShEx in Wikidata besides validating data via queries:
Documentation
Validating from local data dumps
Communication with other communities about how datasets defined
Can validate data coming into local Wikibase using ShEx
Wikidata’s Starting Point for Entity Schemas: https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas
Categorical listing of existing Entity Schemas to get ideas: https://www.wikidata.org/wiki/Wikidata:Database_reports/EntitySchema_directory
Thad’s example schema of “what is a Nobel Prize Winner?” https://www.wikidata.org/wiki/EntitySchema:E126
Uses comments to explain decisions
Can get input/help from others on the schema
Other good examples: https://www.wikidata.org/wiki/EntitySchema:E37
https://www.wikidata.org/wiki/EntitySchema:E89
Can create a set of linked Shape Expressions
Did this for coronaviruses
Can build on other Shape Expressions
Can use Shape Expressions to look for problems like statements lacking references
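A plain-Python approximation of the "statements lacking references" check, working on an item's JSON rather than through ShEx. It assumes the entity layout of Wikidata's JSON export (one entity object as found under `entities` in `Special:EntityData/<QID>.json`), where a statement without references has no `references` key; the sample entity is trimmed and hypothetical.

```python
import json
from typing import Any, Dict


def unreferenced_statements(entity: Dict[str, Any]) -> Dict[str, int]:
    """Count statements without references, per property.

    `entity` is one entity object in Wikidata's JSON layout, where a
    statement without references has no "references" key.
    """
    counts: Dict[str, int] = {}
    for prop, statements in entity.get("claims", {}).items():
        missing = sum(1 for st in statements if not st.get("references"))
        if missing:
            counts[prop] = missing
    return counts


# Trimmed, hypothetical entity; real statements carry more fields.
SAMPLE = json.loads("""
{
  "id": "Q_example",
  "claims": {
    "P31":  [{"mainsnak": {"property": "P31"}, "references": [{"snaks": {}}]}],
    "P21":  [{"mainsnak": {"property": "P21"}}],
    "P106": [{"mainsnak": {"property": "P106"}},
             {"mainsnak": {"property": "P106"}}]
  }
}
""")
print(unreferenced_statements(SAMPLE))  # {'P21': 1, 'P106': 2}
```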
Script recommended in the chat for finding errors when testing conformance to a ShEx schema:
importScript("User:Teester/EntityShape.js");
To find an entity schema in Wikidata, type e: in the Wikidata search box
Q: Are there related tools that can generate lists of items that violate an entity schema, with the violation for each item?
The main entry point for entity schemas is still the Wikidata Query Service
It is possible to query for constraint violations, but the query pattern is complex
Example of query for distinct value constraint violations involving the VIAF ID: https://w.wiki/3gwT
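The linked query is not reproduced here. As a separate, hypothetical sketch of the same idea, the query below lists pairs of distinct items that share a VIAF ID (P214), and the helper builds a Wikidata Query Service GET URL for it. On a property this widely used the query may be slow or time out; `fetch` needs network access, and its User-Agent string is a placeholder you would replace with your own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Pairs of distinct items sharing a VIAF ID (P214): candidates for a
# distinct-value constraint violation. The wdt: prefix is predefined on WDQS.
DUPLICATE_VIAF_QUERY = """
SELECT ?viaf ?item1 ?item2 WHERE {
  ?item1 wdt:P214 ?viaf .
  ?item2 wdt:P214 ?viaf .
  FILTER(STR(?item1) < STR(?item2))
}
LIMIT 100
"""


def wdqs_url(query: str) -> str:
    """Build a GET URL asking WDQS for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def fetch(query: str) -> dict:
    """Run the query (needs network access); the User-Agent is a placeholder."""
    req = Request(wdqs_url(query),
                  headers={"User-Agent": "entity-schema-notes-example/0.1 (placeholder contact)"})
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)


print(wdqs_url(DUPLICATE_VIAF_QUERY))
```

The `FILTER` on the string form of the two items keeps each pair once and excludes an item paired with itself.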
Wikishape (https://wikishape.weso.es/) can be used to extract a shape from a Wikidata item (using sheXer)
Build ShEx from simple CSV files with ShExStatements: https://github.com/johnsamuelwrites/ShExStatements (make sure to look at John Samuel's conference slides/PDF)
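A toy version of extracting a shape from examples: count how many values each property has on every sample item and pick the tightest standard ShEx cardinality marker that fits. This is a simplification over assumed inputs (items as property-to-values dicts with placeholder IDs), not sheXer's algorithm, which does considerably more.

```python
from typing import Dict, List


def cardinality_marker(min_count: int, max_count: int) -> str:
    """Tightest standard ShEx marker covering [min_count, max_count]."""
    if min_count >= 1 and max_count == 1:
        return ""  # no marker: exactly one
    if max_count <= 1:
        return "?"
    return "+" if min_count >= 1 else "*"


def infer_shape(items: List[Dict[str, List[str]]]) -> Dict[str, str]:
    """Map each property seen in the sample to an inferred cardinality marker."""
    props = sorted({prop for item in items for prop in item})
    return {
        prop: cardinality_marker(min(len(item.get(prop, [])) for item in items),
                                 max(len(item.get(prop, [])) for item in items))
        for prop in props
    }


# Hypothetical sample of three author items.
sample = [
    {"P31": ["Q5"], "P735": ["Q_a"], "P106": ["Q_x"]},
    {"P31": ["Q5"], "P735": ["Q_b", "Q_c"], "P21": ["Q_f"], "P106": ["Q_x"]},
    {"P31": ["Q5"], "P106": ["Q_x"]},
]
print(infer_shape(sample))  # {'P106': '', 'P21': '?', 'P31': '', 'P735': '*'}
```

An inferred shape only reflects the sample; as the notes above suggest, it is a starting point to be reviewed, not a consensus schema.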
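The idea of generating ShEx from a spreadsheet can be sketched in a few lines. The CSV layout below (`shape,property,value,cardinality,comment`) is a hypothetical one invented for this example, not ShExStatements' input format, and it does not cover EXTRA or other ShExC features. A value of `.` means any value, Q-IDs become a value set (Q482980 is assumed to be "author"), and `@<label>` refers to another shape, which is how linked shapes are written in ShExC.

```python
import csv
import io
from typing import Dict, List, Tuple

PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
)

# Hypothetical input layout, invented for this sketch.
SAMPLE_CSV = """shape,property,value,cardinality,comment
author,P31,Q5,,instance of human
author,P735,.,*,given name
author,P21,.,?,sex or gender
author,P106,Q482980,,occupation author
work,P50,@<author>,+,author
"""


def value_expr(value: str) -> str:
    """Turn the value column into a ShExC node constraint."""
    value = value.strip()
    if value in ("", "."):
        return "."  # any value
    if value.startswith("@"):
        return value  # reference to another shape, e.g. @<author>
    return "[ " + " ".join(f"wd:{q}" for q in value.split()) + " ]"  # value set


def csv_to_shexc(text: str) -> str:
    """Group rows by shape label and emit one ShExC shape per group."""
    shapes: Dict[str, List[Tuple[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        expr = f"wdt:{row['property'].strip()} {value_expr(row['value'] or '')}"
        card = (row.get("cardinality") or "").strip()
        if card:
            expr += f" {card}"
        comment = (row.get("comment") or "").strip()
        shapes.setdefault(row["shape"].strip(), []).append((expr, comment))

    out = [PREFIXES]
    for label, constraints in shapes.items():
        out.append(f"<{label}> {{")
        for i, (expr, comment) in enumerate(constraints):
            sep = " ;" if i < len(constraints) - 1 else ""  # no ';' after the last one
            note = f"  # {comment}" if comment else ""
            out.append(f"  {expr}{sep}{note}")
        out.append("}\n")
    return "\n".join(out)


print(csv_to_shexc(SAMPLE_CSV))
```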
Q: In a presentation by WESO, it was suggested that schemas could also provide better control over user permissions, e.g., who can edit what. Have you experimented with this?
Has not experimented with this
Forthcoming: the ability to lock an entity schema
Andra has also stored entity schemas in GitHub
Main advantage of using entity schemas:
Previously, when receiving a complaint about a design decision, had to go back through documents, and the answer could be hard to find
Once a consensus Shape Expression is linked to the documentation, it can be used to verify the design decision
Book: Validating RDF Data: http://book.validatingrdf.com/
Virtual hackathons--people could get together to write schemas for specific entities