User:ProteinBoxBot/2016 ShEx sprint
Overall summary
Exploring Shape Expressions (ShEx) to validate data added by bots.
Participants
Gameplan
- Extract a subset of the data added by PBB to use as a test case.
- Write a shape expression that identifies whether a Wikidata item on a disease contains a Disease Ontology ID.
- Identify a set of valuable data expressions.
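As a starting point for the second item, the check could be sketched in ShExC roughly as follows. This is only a sketch under stated assumptions: the shape name `<Disease>` is made up for illustration, the property is assumed to be Disease Ontology ID (P699), and "contains" is read as "at least one value".

```shex
PREFIX direct: <http://www.wikidata.org/prop/direct/>

# Hypothetical entry point: validate the focus node as <Disease>.
start = @<Disease>

<Disease> {
  # Assumption: Disease Ontology ID is P699 and is stored as a literal.
  # "+" means one or more values must be present.
  direct:P699 LITERAL +
}
```

Because shapes are open unless declared CLOSED, any other statements on the disease item are ignored by this check.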
Example Validation
Validate this gene
Requirements
- has exactly one: Entrez Gene ID (P351), Ensembl gene ID (P594)
- has zero or one: HGNC gene symbol (P353), HGNC ID (P354)
- has zero or more of: RefSeq RNA ID (P639), Ensembl transcript ID (P704)
- subclass of (P279): gene (Q7187), or an item that is itself a subclass of (P279) gene (Q7187), such as protein-coding gene (Q20747295)
- encodes (P688): an item that validates as a protein
- found in taxon (P703): a taxon item whose taxon rank (P105) is species (Q7432) or lower (is this possible?)
  - successfully validate this item as a taxon?
- chromosome (P1057): a chromosome item
  - successfully validate this item as a chromosome?
  - if found in taxon (P703) is Homo sapiens (Q15978631):
    - statement has as qualifier genomic assembly (P659) an assembly item
- strand orientation (P2548): forward strand (Q22809680) or reverse strand (Q22809711)
  - if found in taxon (P703) is Homo sapiens (Q15978631):
    - statement has as qualifier genomic assembly (P659) an assembly item
- has a genomic start (P644) and genomic end (P645)
  - if found in taxon (P703) is Homo sapiens (Q15978631):
    - statement has as qualifier genomic assembly (P659) an assembly item
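The cardinality and value requirements in this list could be written against the direct: triples roughly as below. This is a sketch, not the sprint's schema. It assumes that "has exactly one" means one of each listed identifier, that identifier values are literals, and that strand orientation is given exactly once. The shape names `<GeneSketch>` and `<GeneClass>` are hypothetical. The protein, taxon, chromosome and qualifier checks are left out.

```shex
PREFIX direct: <http://www.wikidata.org/prop/direct/>
PREFIX ent:    <http://www.wikidata.org/entity/>

start = @<GeneSketch>

<GeneSketch> {
  direct:P351 LITERAL ;      # Entrez Gene ID: exactly one (default cardinality)
  direct:P594 LITERAL ;      # Ensembl gene ID: exactly one
  direct:P353 LITERAL ? ;    # HGNC gene symbol: zero or one
  direct:P354 LITERAL ? ;    # HGNC ID: zero or one
  direct:P639 LITERAL * ;    # RefSeq RNA ID: zero or more
  direct:P704 LITERAL * ;    # Ensembl transcript ID: zero or more
  direct:P279 @<GeneClass> + ;   # subclass of gene, directly or indirectly
  # Strand orientation: forward strand or reverse strand (assumed exactly one).
  direct:P2548 [ ent:Q22809680 ent:Q22809711 ] ;
  direct:P644 LITERAL + ;    # genomic start: at least one
  direct:P645 LITERAL +      # genomic end: at least one
}

# Either gene (Q7187) itself, or an item with at least one subclass-of
# value that in turn conforms to <GeneClass>. EXTRA tolerates additional
# superclasses that do not lead to gene.
<GeneClass> [ ent:Q7187 ] OR EXTRA direct:P279 {
  direct:P279 @<GeneClass> +
}
```

The recursive `<GeneClass>` shape stands in for a property path, which ShEx does not provide; a validator would need access to the triples of the class items to follow it.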
Validate references also?
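Qualifiers and references are attached to the full statement nodes rather than to the direct: triples, so checking them means following p: to the statement and constraining it there. The sketch below does this for chromosome (P1057) only, using the prefixes from the listings on this page. The shape names are hypothetical. The "if Homo sapiens" condition is not modelled as a conditional: the shape simply describes the human case by requiring that taxon.

```shex
PREFIX direct: <http://www.wikidata.org/prop/direct/>
PREFIX ent:    <http://www.wikidata.org/entity/>
PREFIX p:      <http://www.wikidata.org/prop/>
PREFIX pstate: <http://www.wikidata.org/prop/statement/>
PREFIX qual:   <http://www.wikidata.org/prop/qualifier/>
PREFIX prov:   <http://www.w3.org/ns/prov#>

start = @<HumanGeneLocation>

<HumanGeneLocation> {
  direct:P703 [ ent:Q15978631 ] ;     # found in taxon: Homo sapiens
  p:P1057 @<ChromosomeStatement> +    # every chromosome statement is checked
}

<ChromosomeStatement> {
  pstate:P1057 IRI ;           # the chromosome item (not validated further here)
  qual:P659 IRI + ;            # genomic assembly qualifier: at least one
  prov:wasDerivedFrom IRI +    # at least one reference node
}
```

Replacing `IRI` after `prov:wasDerivedFrom` with a reference to a reference shape would extend the check to the content of each reference.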
gene.ttl
PREFIX direct: <http://www.wikidata.org/prop/direct/>
PREFIX ent: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pref: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX pstate: <http://www.wikidata.org/prop/statement/>
PREFIX qual: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ref: <http://www.wikidata.org/reference/>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX state: <http://www.wikidata.org/entity/statement/>
PREFIX val: <http://www.wikidata.org/prop/reference/value/>
PREFIX wikba: <http://wikiba.se/ontology#>
PREFIX wikd: <http://wikiba.se/ontology#>
PREFIX xml: <http://www.w3.org/XML/1998/namespace>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ent:Q17861702
    p:P1057 state:Q17861702-CA3E84B9-2404-428B-9AD2-3DE29ABB3FA2 ;
    p:P2548 state:Q17861702-0ED35367-33E1-4946-B508-F1437EE9A4B8 ;
    p:P279 state:Q17861702-7AA31ED4-C3FB-42B9-8CC6-6F4BAA1D5EA7 ;
    p:P2888 state:Q17861702-92391B71-07BB-4E1C-9381-408818596076 ;
    p:P351 state:Q17861702-B42D312B-8E55-454C-9D9B-30CE8470897E ;
    p:P353 state:Q17861702-FCC4CAB2-5B33-4271-B880-158CDC703397 ;
    p:P354 state:Q17861702-540AA318-7549-4C73-9532-827BC7D75238 ;
    p:P593 state:Q17861702-A34B8893-7ECC-4E49-9A39-4C548742DE8C ;
    p:P594 state:Q17861702-370E4E59-9E21-43F5-A6DF-10083805A620 ;
    p:P639 state:Q17861702-EBBC167A-027D-4008-96FA-95C66053A6CB ;
    p:P644 state:Q17861702-016DE9B5-A71B-4652-B4F3-E608ABB6E730,
        state:Q17861702-47FCA8A0-60CD-4D5B-8B8D-5E696FF32463 ;
    p:P645 state:Q17861702-8FDE462B-8548-4F9E-B11F-2E76CAD61229,
        state:Q17861702-95EA5477-1655-4A01-9E3C-29359694C3F1 ;
    p:P684 state:Q17861702-28BC9302-6B24-4BE1-B1DF-E221C3B9A222 ;
    p:P688 state:Q17861702-431573F8-5238-4758-8254-79BAFB5DE2DD ;
    p:P692 state:Q17861702-BD3AC56D-BEF7-40A5-89A1-5C7876428961,
        state:Q17861702-D03376D8-1EC0-4711-9062-2904D448D03D ;
    p:P703 state:Q17861702-995FB37D-C9FB-4C15-A532-30101C313EA0 ;
    p:P704 state:Q17861702-0DBED063-9218-468C-958D-77A716E869D4,
        state:Q17861702-2D01758A-4B9A-4313-A43C-7F6A412B93AF,
        state:Q17861702-EABB21D9-582F-42CC-BDDB-CF01CDC985A4 ;
    direct:P1057 ent:Q220677 ;
    direct:P2548 ent:Q22809680 ;
    direct:P279 ent:Q20747295 ;
    direct:P684 ent:Q18296779 ;
    direct:P688 ent:Q21100363 ;
    direct:P703 ent:Q15978631 .

state:Q17861702-CA3E84B9-2404-428B-9AD2-3DE29ABB3FA2 a wikba:BestRank ;
    wikba:rank wikba:NormalRank ;
    prov:wasDerivedFrom <http://www.wikidata.org/reference/548aa04930b65df7ca12bc71748040453baf6186> ;
    qual:P659 ent:Q20966585,
        ent:Q21067546 ;
    pstate:P1057 ent:Q220677 .
gene.shex
PREFIX direct: <http://www.wikidata.org/prop/direct/>
PREFIX ent: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pref: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX pstate: <http://www.wikidata.org/prop/statement/>
PREFIX qual: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ref: <http://www.wikidata.org/reference/>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX state: <http://www.wikidata.org/entity/statement/>
PREFIX val: <http://www.wikidata.org/prop/reference/value/>
PREFIX wikba: <http://wikiba.se/ontology#>
PREFIX wikd: <http://wikiba.se/ontology#>
PREFIX xml: <http://www.w3.org/XML/1998/namespace>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start=@<Q17861702>

<Q17861702> {
#  p:P1057 @wikba:BestRank ;
#  p:P2548 @wikba:BestRank ;
#  p:P279 @wikba:BestRank ;
#  p:P2888 @wikba:BestRank ;
#  p:P351 @wikba:BestRank ;
#  p:P353 @wikba:BestRank ;
#  p:P354 @wikba:BestRank ;
#  p:P593 @wikba:BestRank ;
#  p:P594 @wikba:BestRank ;
#  p:P639 @wikba:BestRank ;
#  p:P644 @wikba:BestRank+ ;
#  p:P645 @wikba:BestRank+ ;
#  p:P684 @wikba:BestRank ;
#  p:P688 @wikba:BestRank ;
#  p:P692 @wikba:BestRank+ ;
#  p:P703 @wikba:BestRank ;
#  p:P704 @wikba:BestRank+ ;
#  direct:P1057 @<Chromosome> ;
#  direct:P2548 @<Strand> ;
#  direct:P279 @<ProtCoding> ;
#  direct:P684 @<Gene> ;
#  direct:P688 @<Q21100363> ;
#  direct:P703 @<Species>
}

wikba:BestRank {
  a [wikba:BestRank];
  wikba:rank [wikba:NormalRank];
  prov:wasDerivedFrom @<Ref>
}

<Ref> {
  pref:P248 IRI;
  ( pref:P594 LITERAL
  | pref:P351 LITERAL ;
    pref:P813 xsd:dateTime ;
    val:P813 IRI
  )
}

<Chromosome> { }
<Strand> { }
<ProtCoding> { }
<Gene> { }
<Q21100363> { }
<Species> { }