Wikidata talk:WikiProject Schemas

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days. For the archive overview, see Archive/. The latest archive is located at Archive/2024.

Meeting Notes

14 Feb 2018

ericP: what is our mission?

Andra: to raise awareness and grow a community ... we can grow the WikiProject page over time ... use cases for ShEx ... namespace for ShEx: Lucas' idea from WikidataCon, we should host shapes on their own URIs, what would that URI be?

ericP: conflicting interests: one is to give ownership to the WD community, the conflicting one is to have them visible so people can copy and steal shapes even if they are outside the WD community ... maybe move schemas over to WD and then mirror to the ShEx schemas space when you want

Lucas: Creating a new namespace is probably not trivial. Even if no one argues, the technical side might be complicated.

Andra: maybe postpone this until we have more use cases. ... Kat's http://wikidp.org/ demo: can we use this for additional domains by driving it with ShEx? A property checklist driven by ShEx ... We could create a generic version of the portal, containerize it, and then people could slot in their own shape expressions to create their own property checklists ... need shapes available through a URL to reuse Harold Solbrig's pyshex, so that is why I need a namespace for shape URIs

ericP: demo manifests to run in Eric's or Jose's implementation, like the primer "try it" links: 1. create manifests so ... good queries and validation tests that are either picked up remotely or use static data, create the schemas that will be shared, demo data, and manifests in a picklist ... demos show why validation is useful, give hints on how it is used in different domains, give people ideas ... wiki page with "try it" links: if we have a data structure, we can express it like this, that catches errors like this, help people

Andra: create a page similar to the example queries


TODOs for next meeting:

   Lucas- ask around WMDE about how to request a new namespace
   Kat- create an example on the WikiProject page
   Andra- create an example on the WikiProject page
   Kat- paste notes in the talk page of the WikiProject
  ? Create Phabricator ticket for a new namespace?

Examples and tools

  WikiProject ShEx has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. Could you please first provide examples of ShEx shapes that check particular data models in Wikidata, and guidelines on how to check Wikidata against these shapes? I'd prefer:

  • a web form tailored to Wikidata to edit and check shape expressions, with syntax highlighting and typeahead, like the Wikidata Query Service
  • a bot that regularly runs ShEx schemas given on wiki pages and posts the results, like User:ListeriaBot

-- JakobVoss (talk) 07:15, 22 February 2018 (UTC)

I've just added a "Tutorials and examples" section on the project homepage, with a very basic example of how to get started with ShEx2. Please help improve it! (Thanks to Eric for fixing two minor issues in ShEx2!) Jneubert (talk) 12:06, 25 June 2018 (UTC)
Updated version of How to get started with ShEx on Wikidata? Please help improve it. --Jneubert (talk) 14:35, 25 July 2019 (UTC)

Wikidata ShEx Inference tool

Hi folks! I’ve been working on a tool to automatically infer ShEx schemas from Wikidata items, and a first version of the tool is now available at toolforge:wd-shex-infer (documentation). I would be very thankful if you could try it out and let me know how it works for you, preferably within the next two weeks (the tool will stay available after that, but eventually I’ll have to write and hand in my thesis). Let me know if you have any questions! --Lucas Werkmeister (talk) 12:33, 16 August 2018 (UTC)

Some initial observations: This tool is a great idea and could potentially become very useful. Thanks! It's understandable that only a small number of jobs can be run at any time, but it would be nice to be able to submit jobs into a queue if they cannot be run immediately. The tooltips when exploring the ShEx results are helpful. I haven't seen references covered in the ShEx output, but it would be handy to be able to run some jobs specifically to explore the data model used for references on items of particular types. --Daniel Mietchen (talk) 02:29, 18 August 2018 (UTC)
@Daniel Mietchen: thanks! I’ll think about adding a job queue, depending on how many people use the tool. And currently, qualifiers and references are ignored, yes – I’m afraid that the way RDF2Graph works doesn’t cope well with them (it heavily relies on “instance of” and “subclass of” relations, so it would see all statement and reference nodes as equivalent, since they all have the type wikibase:Statement/wikibase:Reference). It might be possible to fix that, but I don’t think I’ll have time for that before my thesis is done. --Lucas Werkmeister (talk) 12:14, 22 August 2018 (UTC)
Friendly reminder that the next few days would be an especially helpful time for feedback :) It should also be possible to run two jobs at once now. Please let me know if there are any problems! --Lucas Werkmeister (talk) 17:56, 28 August 2018 (UTC)
I’ve also updated the tool to fix several problems with the simplification step, so now the schemas should look much nicer. For example, compare the shape for human (Q5) between job #11 and job #29 (both for “films that won ten or more Oscars”): five target classes for nominated for (P1411) were merged into one (award (Q618779)), as were nine target classes for award received (P166); eight target classes for country of citizenship (P27) were merged into two (political territorial entity (Q1048835) and political system (Q28108) – that second one is probably a bug in the data); and so on. You might even see completely new predicates be mentioned, because the tool drops any predicate with more than ten possible target classes (rationale: that’s pointless noise), so predicates which would previously have been dropped might now be included due to the target classes being merged. If you were dissatisfied with the schemas before, perhaps take another look? :) --Lucas Werkmeister (talk) 15:49, 6 September 2018 (UTC)
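
The pruning rule mentioned above (drop any predicate with more than ten possible target classes) can be sketched as follows. This is only an illustration in Python, not the tool's actual code: the function name, the dictionary layout and the placeholder IDs `P_many` and `Q_merged` are assumptions made for this example.

```python
from typing import Dict, Set

# Threshold taken from the comment above: predicates with more than
# ten possible target classes are treated as noise and dropped.
MAX_TARGET_CLASSES = 10


def drop_noisy_predicates(
    targets: Dict[str, Set[str]], limit: int = MAX_TARGET_CLASSES
) -> Dict[str, Set[str]]:
    """Keep only predicates that have at most `limit` target classes.

    `targets` maps a property ID to the set of target class IDs inferred
    for it (a hypothetical data layout chosen for this sketch).
    """
    return {p: classes for p, classes in targets.items() if len(classes) <= limit}


# Made-up sample: "P_many" has eleven target classes before merging and
# a single merged class afterwards; the P27 pair is the one quoted above.
before_merge = {
    "P_many": {f"Q{n}" for n in range(1, 12)},
    "P27": {"Q1048835", "Q28108"},
}
after_merge = {
    "P_many": {"Q_merged"},
    "P27": {"Q1048835", "Q28108"},
}

kept_before = drop_noisy_predicates(before_merge)  # "P_many" is dropped
kept_after = drop_noisy_predicates(after_merge)    # "P_many" is kept
```

This shows why merging target classes can make a previously dropped predicate reappear in the inferred schema.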

You can now try Shape Expressions on a test system

Hello all,

The Wikidata team started working on support for Schemas, specifically Shape Expressions, to integrate a new extension into Wikidata, in order to store and reuse Schemas.

It’s still in development, but we wanted to share the first results with you, so you can give us early feedback.

On the test system, one can create and edit Schemas. You can see an example Schema here.

Please note that multilingual labels, descriptions and aliases are not enabled for now; this is the next step we will work on. After that we will work on linking to a tool that allows you to check the Schema against a list of Items.

If you have any questions or remarks at that stage, please let me know by replying to this section :) If you want to create Phabricator tickets, you can use the tag Shape Expressions.

Cheers, Lea Lacroix (WMDE) (talk) 14:13, 26 February 2019 (UTC)

Improvements on ShEx test system

Hello all,

Our developers keep working hard on Shape Expressions, and we would love to have your feedback on the current version :)

Here's what has been improved recently:

  • the "termbox" area of the page now displays several languages
  • if you switch your interface from English to a language that has a label filled in, the title of the page will change accordingly
  • if you want to add a label/description in a new language, two options are possible: you can switch your interface to this new language, and an editable line will appear in the table, or you can directly edit the URL to access the special page, e.g. https://wikidata-shex.wmflabs.org/wiki/Special:SetSchemaLabelDescriptionAliases/O2/fr
  • there is no longer an edit button at the top of the page, but the different sections are independently editable
  • A new special page, Special:SchemaText, provides the raw text of the Schema in an external file. Example: https://wikidata-shex.wmflabs.org/wiki/Special:SchemaText/O2

And here's what is coming next:

  • the "edit" buttons will be translated into the language of your interface
  • we will add a button to check the schema in the validator tool

Feel free to try the interface on the test system, create new schemas, play around. If you find any issue, or if there is a feature/improvement that you would like to add, please let me know :)

Cheers, Lea Lacroix (WMDE) (talk) 09:14, 14 March 2019 (UTC)

Please explain

What is the purpose, and how will it affect the existing structure that is opaque? That cannot be explained to me (I have asked repeatedly). What are the material benefits of this approach? Thanks, GerardM (talk) 16:45, 16 March 2019 (UTC)

Community requirements for data integrity

@GerardM: Hi, for completeness and to make sure we're addressing your issues, could you link to your previous requests for explanation?

While not all Wikidata communities require or even desire validation, it is essential for some of the more complex ones, e.g. GeneWiki (cf. GeneWiki grant proposal). Such validation can be hand-rolled, but having a standard schema language offers obvious advantages in terms of tooling, completeness and ease of maintenance. Compiling even a simple ShEx schema to SPARQL produces a 10-100x explosion in line noise, and scripting something with a conjunction of JSON path expressions would require tooling investment as well as maintenance of a corpus of rules to enforce cardinality, data type consistency and structural coherence. It would be possible to invent a Wikidata-specific schema language, but it would lack the tooling support that ShEx offers (validators in five languages, form generation, import from UML/XMI, etc.).

I've witnessed many publicly-curated databases lose relevance as their data rotted over time or changed structure so that potential users gave up trying to track it. Open PHACTS was founded specifically to provide integrity and consistency to Linked Data. Domain-specific databases typically have greater institutional investment because they offer integrity and consistency backed by schemas (e.g. UniProt, whose RDF structure reflects a conventional SQL (DDL) schema for genes and proteins). General knowledge stores have to add schema validation because their native schema is not domain-specific but instead one of generalized assertions, which can express incoherent data structures as easily as coherent ones.

Of course not all communities demand validation, but I believe that the offer of testable contracts to ensure the longevity and institutional investment in Wikidata more than justifies this effort.

--EricP (talk) 07:00, 18 March 2019 (UTC)

When technology is introduced that enforces particular behaviour, it is all too easy to use the same technology elsewhere when at first glance a similar situation exists. So you have been abstract in your answer and it does not satisfy. I am familiar with SwissProt/UniProt from my Wikiprotein days. I know that Wikidata is not as good as Wikiprotein used to be. The quality of the data is not the issue; the issue is that a schema enforces. It follows that a certain "completeness" will be enforced, and that is not necessarily a good thing. What I learned at Wikiprotein is how vital it is that people include information that is valid but not necessarily complete.
In conclusion, what is it EXACTLY that you aim to achieve/enforce? Thanks, GerardM (talk) 11:11, 18 March 2019 (UTC)
ShEx or any schema language is not about enforcing; it is more instrumental in checking for conformance. As a data consumer I want to be able to check data consistency according to relevant data models. Relevant to me, not necessarily to you. There are many cases where even within a single application multiple schemas could apply, depending on the use case. As you say, it is crucial that people include data that is valid, not necessarily complete. There is no intention to enforce, only to be able to check the validity. --Andrawaag (talk) 11:40, 18 March 2019 (UTC)
You asked EXACTLY what we aim to enforce. It would be tedious to enumerate everything, but as an example, in Gene Wiki we want to know when an item on a protein doesn't have properties related to genes (e.g. chromosomal location) AND that a genomic build is missing as a qualifier to the statement on the gene location, making the statement nonsensical. When these inconsistencies occur, having flags that indicate them as part of the workflow helps tremendously in curating protein and gene information. Early prototypes of this system have already helped me fix errors. --Andrawaag (talk) 12:10, 18 March 2019 (UTC)
That makes perfect sense. So in conclusion the intention is to signal structural issues in order to help people insert sensible information, and to use it as a template to query those records that fail a "sanity" check. Thanks, GerardM (talk) 15:16, 18 March 2019 (UTC)

Update documentation

Hello dear ShEx enthusiasts!

Because we will release Schemas on Wikidata very soon, I'm currently reviewing the existing documentation. When I announce it, I expect a lot of people in the Wikidata community to wonder "what is it exactly? how can I write my own?"

The main links I'll redirect people to are your WikiProject page and Wikidata:WikiProject ShEx/How to get started?. Is this second page still up to date from your point of view?

I think that now would be a good time to give a bit of polish to the presentation of shape expressions. On the development team side, we will add technical documentation about the new extension and data type.

If you have any questions or wishes, feel free to ping me. Cheers, Lea Lacroix (WMDE) (talk) 15:11, 23 April 2019 (UTC)

Shape Expressions arrive on Wikidata on May 28th

See full announcement on the Project Chat :)

Thanks a lot to all of you who have been involved in discussing, suggesting improvements, testing the feature! Lea Lacroix (WMDE) (talk) 13:30, 19 May 2019 (UTC)

Hello all,
As announced here, we just released shape expressions on Wikidata. You can for example have a look at E10, the shape for human, or create a new EntitySchema.
A few useful links:
If you have any question or encounter issues, feel free to ping me. Cheers, Lea Lacroix (WMDE) (talk) 16:07, 28 May 2019 (UTC)
Indeed it's CC0. Thanks for the reminder! I created a ticket. Lea Lacroix (WMDE) (talk) 07:18, 29 May 2019 (UTC)

Are the following validations possible?

1. Ensure that at least one statement for a given property (where multiple statements exist) has a value in a specified value set. If other statements exist for the property, ignore them. For example, validate that an item has at least the statement instance of (P31) sovereign state (Q3624078), but may also have other instance of (P31) statements that should be ignored. --Dhx1 (talk) 18:35, 28 May 2019 (UTC)

  • 1. Yes, the keyword EXTRA says that other values of the property may appear. This is common for P31. This example shows a schema with a simple value set [<Qx> <Qy>]. (In many schemas, that's a value set of one element.) <Q2> fails <WithoutExtra> because it has an extra P31 (outside the value set), but it passes <WithExtra>. I added a <Q3> which has two P31s within the value set. There you don't need EXTRA; instead you need to increase the number of expected P31s matching the value set. I added +, which is shorthand for {1,}, i.e. a minimum of 1 and no maximum. --EricP (talk)
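
The interplay of value set, cardinality and EXTRA described in this answer can be modelled in a few lines of Python. This is a deliberately simplified sketch covering a single triple constraint, not a ShEx validator; the function name and the placeholder values `Qx`, `Qy`, `Qz` are assumptions for illustration.

```python
from typing import Iterable, Optional, Set


def check_triple_constraint(
    values: Iterable[str],
    value_set: Set[str],
    min_count: int = 1,
    max_count: Optional[int] = 1,
    extra: bool = False,
) -> bool:
    """Simplified check of one triple constraint with a value set.

    values     -- all values the item has for the property (e.g. its P31 values)
    value_set  -- the allowed values, as in [<Qx> <Qy>]
    min_count, max_count -- cardinality; max_count=None means unbounded,
                            so "+" corresponds to min_count=1, max_count=None
    extra      -- True if the property is listed after EXTRA
    """
    values = list(values)
    matching = [v for v in values if v in value_set]
    other = [v for v in values if v not in value_set]
    if other and not extra:
        return False  # values outside the value set are only tolerated with EXTRA
    if len(matching) < min_count:
        return False
    if max_count is not None and len(matching) > max_count:
        return False  # too many values inside the value set; EXTRA does not help
    return True


value_set = {"Qx", "Qy"}
q2 = ["Qx", "Qz"]  # one value inside the value set, one outside it
q3 = ["Qx", "Qy"]  # two values, both inside the value set

q2_without_extra = check_triple_constraint(q2, value_set)
q2_with_extra = check_triple_constraint(q2, value_set, extra=True)
q3_default_cardinality = check_triple_constraint(q3, value_set, extra=True)
q3_plus = check_triple_constraint(q3, value_set, max_count=None)
```

In this model `q2` only passes with `extra=True`, while `q3` needs the unbounded maximum (the `+` cardinality) rather than EXTRA, which mirrors the two cases in the answer.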

2. Extract data on linked Wikidata items using EXTERNAL (?) or some other technique, allowing a country (P17) statement to be validated to ensure the linked item has a statement instance of (P31) sovereign state (Q3624078).

  • 2. Yes, but you don't need EXTERNAL. If I understand the question, you just want your constraints to link to another resource in the Wikidata world. I created a shape for national flags as an example. It has the constraint below (which 90% of flags fail, but...) to say that the NationalFlag must have a country with a given type. --EricP (talk)
   wdt:P17 { wdt:P31 [wd:Q3624078] }
      • At present, no, though there is a proposal to directly compare the value of some TripleConstraint against a property path, which is relatively simple to implement, and another to add more generic functions (example), which is more powerful but more complex. Aside from picking between the two, we also have to decide if we want to break the locality features of ShEx to add either one of them. --EricP (talk)
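
To make the nested constraint on country (P17) from answer 2 concrete, here is a small Python model over a toy in-memory graph. It is a sketch under stated assumptions: the item names `flag_a`, `country_a` and so on are made up, and the check assumes default cardinalities and no EXTRA, as in the one-line constraint quoted above.

```python
from typing import Dict, List

# Toy graph: item -> property -> values. The items are made up; only the
# property and class IDs (P17, P31, Q3624078, Q6256) appear on this page.
Graph = Dict[str, Dict[str, List[str]]]

SOVEREIGN_STATE = "Q3624078"


def is_sovereign_state(graph: Graph, item: str) -> bool:
    """Inner shape: exactly one P31 value, and it must be sovereign state."""
    return graph.get(item, {}).get("P31", []) == [SOVEREIGN_STATE]


def flag_conforms(graph: Graph, flag: str) -> bool:
    """Outer constraint: exactly one P17 value, which must satisfy the inner shape."""
    countries = graph.get(flag, {}).get("P17", [])
    return len(countries) == 1 and is_sovereign_state(graph, countries[0])


graph: Graph = {
    "flag_a": {"P17": ["country_a"]},
    "flag_b": {"P17": ["country_b"]},
    "country_a": {"P31": [SOVEREIGN_STATE]},
    # A second P31 value makes the inner shape fail, because P31 is not EXTRA.
    "country_b": {"P31": [SOVEREIGN_STATE, "Q6256"]},
}

flag_a_ok = flag_conforms(graph, "flag_a")
flag_b_ok = flag_conforms(graph, "flag_b")
```

The second flag illustrates one plausible reason why so many real flags fail such a strict constraint: the linked country has additional instance of (P31) values.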

Validate in Blazegraph/query server ?

It would be interesting if these schemas could be used directly on the query server, i.e. filter for items that match, check if items match, list errors. --- Jura 10:38, 29 May 2019 (UTC)

Running validation with API access (i.e. getStatements()) would greatly accelerate validation and reduce parsing and serialization effort on the query server. ---EricP (talk)

Structure e-entities ?

There are a few essential, but secondary elements sometimes included on entities:

  • queries of items that could be validated
  • lists of prefixes

I think the first could easily go into the long-announced "query" namespace. The second could probably be assumed in the configuration of whatever tool one uses, at least if they are WD prefixes. --- Jura 10:38, 29 May 2019 (UTC)

links between entities schema

I couldn't figure out how to refer from one entity schema to another: for instance, I would like to be able to write, from the E36 entry point, something like wdt:P629 @<someprefix:E35>. Is that possible? Is it the right pattern to have several EntitySchemas to describe different shapes of a schema? Pinging the ShExperts ;) @Andrawaag, YULdigitalpreservation, Jelabra, Tombakerii: -- Maxlath (talk) 15:35, 29 May 2019 (UTC)

I am definitely a ShEx beginner as well, but I have found the import command as described in [1] and [2] which looks promising. You can access the raw ShEx schema code via Special:EntitySchemaText (e.g. Special:EntitySchemaText/E10).
Unfortunately, I can't get it to work in the shex-simple tool, and I am not sure whether this is due to my poor ShEx skills or some bug in the tool (error message is: "failed to create validator: loadImports@https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/browser/shex-webapp-webpack.js:53845:9 …"). --MisterSynergy (talk) 09:31, 30 May 2019 (UTC)
I'll dive into this. @MisterSynergy, can you pass me an experiment that failed, and I'll see if I can tweak it to make it succeed? (One requirement of IMPORT <XXX> is that XXX returns the schema without any HTML around it; another is that we don't get defeated by CORS issues, which require administration beyond my fingertips.) -- EricP (talk)
For instance this one (sorry for the non-clickable link, there are several unmasked characters which I don't want to change in order not to break the link):
  • https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?schema=PREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20%3A%20%3Chttps%3A%2F%2Fwww.example.org%2F%23%3E%0A%0Aimport%20%3Chttps%3A%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE48%3E%0Astart%20%3D%20%40%3Asportsperson%0A%0A%3Asportsperson%20EXTRA%20wdt%3AP106%20{%0A%20%20wdt%3AP106%20[%20wd%3AQ2066131%20]%3B%0A%23%20wdt%3AP22%20%40%3Chuman%3E%0A}&data=Endpoint%3A%20https%3A%2F%2Fquery.wikidata.org%2Fsparql&shape-map=SPARQL%20%27%27%27SELECT%20DISTINCT%20%3Fid%20WHERE%20{%20%3Fid%20wdt%3AP106%20wd%3AQ2066131%3B%20wdt%3AP22%20[]%20}%20LIMIT%2010%27%27%27@START&interface=human&regexpEngine=threaded-val-nerr
It uses EntitySchema:E48 via Special:EntitySchemaText/E48 (raw ShEx without any HTML around it; just click on it). I already tried several things, including this older version of E48 with prefixes. Note that E48 does not have a "start" command, as required for imported shape expressions. In the shex-simple tool, you'll see that the line that would actually make use of the imported ShEx is commented out because it does not work anyway.
The error message displayed in Google Chrome is: failed to create validator TypeError: Cannot read property 'keepImports' of undefined at loadImports (https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/browser/shex-webapp-webpack.js:53845:23). Sounds like a JavaScript issue, but I am not very experienced with that… Thanks for investigating, --MisterSynergy (talk) 15:28, 30 May 2019 (UTC)
One engineering decision is whether that import would be just textual, like C's #include, or whether the prefixes (and inclusion thereof) should appear in the JSON (ShExJ) and RDF (ShExR) versions of the schema. You may want to raise a language issue with the tag "enhancement". -- EricP (talk)
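
For readers who want to reproduce the experiment from this thread, the following Python sketch builds the raw-schema URL, a schema that imports it, and a fully percent-encoded (and therefore clickable) shex-simple link. The URL patterns and query parameter names are copied from the link posted above; the helper names are made up for this example, and nothing here fixes or works around the loadImports error that was reported.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Both base URLs are taken from the links quoted in this thread.
ENTITY_SCHEMA_TEXT = "https://www.wikidata.org/wiki/Special:EntitySchemaText/"
SHEX_SIMPLE = (
    "https://tools.wmflabs.org/shex-simple/wikidata/packages/"
    "shex-webapp/doc/shex-simple.html"
)


def schema_text_url(schema_id: str) -> str:
    """URL of the raw ShEx text of an EntitySchema, e.g. 'E48'."""
    return ENTITY_SCHEMA_TEXT + schema_id


def importing_schema(schema_id: str) -> str:
    """A small schema that imports another EntitySchema (mirrors the test case above)."""
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "PREFIX : <https://www.example.org/#>\n"
        "\n"
        f"IMPORT <{schema_text_url(schema_id)}>\n"
        "start = @:sportsperson\n"
        "\n"
        ":sportsperson EXTRA wdt:P106 {\n"
        "  wdt:P106 [ wd:Q2066131 ];\n"
        "}\n"
    )


def shex_simple_link(schema: str, sparql: str) -> str:
    """Build a shex-simple link with every parameter percent-encoded."""
    params = {
        "schema": schema,
        "data": "Endpoint: https://query.wikidata.org/sparql",
        "shape-map": f"SPARQL '''{sparql}'''@START",
        "interface": "human",
        "regexpEngine": "threaded-val-nerr",
    }
    return SHEX_SIMPLE + "?" + urlencode(params, quote_via=quote)


schema = importing_schema("E48")
link = shex_simple_link(
    schema, "SELECT DISTINCT ?id WHERE { ?id wdt:P106 wd:Q2066131 } LIMIT 10"
)
# Decoding the link again recovers the original schema text.
decoded_schema = parse_qs(urlsplit(link).query)["schema"][0]
```

Encoding braces, brackets and quotes as well as spaces avoids the "unmasked characters" that made the original link non-clickable.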

What to do with duplicate schemas?

Hi all, since people are already creating their own schemas, and since we still haven't set up a list of all existing ones, there are already a few that are basically the same thing, like E10, E14 and E48. What do we do in this case? Do we delete them or "reuse" them? --Sannita - not just another it.wiki sysop 14:32, 11 June 2019 (UTC)

Hello,
Not directly answering your question, I just wanted to point to a few tickets - we will continue improving the software in the future.
Cheers, Lea Lacroix (WMDE) (talk) 08:58, 13 June 2019 (UTC)
Some more input: I do not think that we should be concerned about duplicates at this point. ShEx is a relatively new functionality, there is quite a lot of dev work going on, and the community still needs to become familiar with it. According to [3], not many EntitySchemas have been created so far. Later, we probably want to either merge duplicates (i.e. redirect the E-numbers), or simply allow "duplicated" EntitySchemas. Reuse does not seem to be a good idea, though. --MisterSynergy (talk) 09:24, 13 June 2019 (UTC)

CheckShex UserScript

I thought this project might be interested in a new userscript named CheckShex. It adds a field to items, properties, lexemes where you can enter an entitySchema and it will return whether it passes or fails. It also adds a field to entitySchemas, where you can do the reverse. The userscript can be installed to your common.js from User:Teester/CheckShex.js.

The userscript is backed by an API based on PyShEx (Q51672520). The API is located at https://tools.wmflabs.org/pyshexy/api and details about its use are at https://tools-static.wmflabs.org/pyshexy/. Teester (talk) 11:56, 22 June 2019 (UTC)

Thanks for this great tool! Sometimes, however, I get strange results: Checking Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) against "E94", I get "Pass Fail" as the message. When I hit "Check" again, I get "Fail". This behaviour does not seem to be reproducible, but I encountered it once for 20th Century Press Archives (Q36948990), too. A hint may be that hitting "Check" again on an item page after "Pass" always results in "Fail". From E94, both items are validated consistently as passing. Jneubert (talk) 09:55, 20 July 2019 (UTC)
Thanks. There was a bug in the userscript where, when you hit "Check" more than once, the schema would be checked against itself rather than the item being checked against the schema. I wonder if the "Pass Fail" behaviour comes from clicking "Check" a second time before the check is complete and running into the bug?
Looking at the items, Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) currently fails against E94 because of a missing parent organization (P749), while 20th Century Press Archives (Q36948990) currently passes. I get this result when using both the user script and the ShEx2 Validator. For the ShEx2 Validator, a query like this gets you just that item to validate:
SELECT ?item WHERE {BIND(wd:Q36948990 as ?item)} LIMIT 1
A simpler way:
SELECT (wd:Q36948990 as ?item) WHERE {}
--Vladimir Alexiev (talk) 09:00, 9 January 2020 (UTC)
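
If such queries are needed for more than one item, they can be generated. The Python sketch below reproduces the single-item query shown above and adds a multi-item variant using a standard SPARQL VALUES block; the function names and the QID check are assumptions made for this example, not part of the userscript or of the ShEx2 Validator.

```python
import re
from typing import Iterable

# Item IDs look like "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9][0-9]*")


def _require_qid(qid: str) -> str:
    if not QID.fullmatch(qid):
        raise ValueError(f"not an item ID: {qid!r}")
    return qid


def single_item_query(qid: str) -> str:
    """Return the one-row query shown above for a single item."""
    return f"SELECT (wd:{_require_qid(qid)} as ?item) WHERE {{}}"


def items_query(qids: Iterable[str]) -> str:
    """Return a query yielding one row per item, using a VALUES block."""
    values = " ".join(f"wd:{_require_qid(q)}" for q in qids)
    return f"SELECT ?item WHERE {{ VALUES ?item {{ {values} }} }}"


one = single_item_query("Q36948990")
both = items_query(["Q575202", "Q36948990"])
```

Here `one` is exactly the simpler query from the comment above, and `both` selects the two items discussed in this thread.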
Let me know if there are any other bugs or problems. Teester (talk) 14:21, 20 July 2019 (UTC)
A big sorry - I'm currently figuring out possible workflows, and indeed have made E94 more strict, which causes it to fail with Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202), while it passes the new relaxed E95. This messed up the test case - sorry again!
Now in multiple tests with some arbitrary clicking, I was not able to reproduce a case with "Pass Fail", so I suppose this is gone together with your bug fix, which also worked consistently well. Thank you for the quick fix! --Jneubert (talk) 08:01, 21 July 2019 (UTC)
May I suggest a possible extension of the script? The API already returns the reason for failing (e.g., [4]). So it should be possible to show it to the user on request (with a popup/mouse-over perhaps, because the messages do not look nice, but are helpful nonetheless). --Jneubert (talk) 08:10, 21 July 2019 (UTC)
Great idea. I've updated the user script so that now it shows some error information on failure. Now, if there's a missing or incorrect property in the response, the property number is shown beside the Fail message. Additionally, the raw error response is available on mouse over of the fail message. Teester (talk) 11:03, 23 July 2019 (UTC)
This is fantastic - thank you so much. --Jneubert (talk) 14:37, 23 July 2019 (UTC)
While adding the tool to the How to get started ... page, another possible improvement came to mind: On the item page, a tiny "schema" link, right of the validating result, would make it super-easy to navigate to the selected schema. --Jneubert (talk) 17:57, 23 July 2019 (UTC)
I added more suggestions at User_talk:Teester/CheckShex.js#Usability --~~

Add saved queries to EntitySchema entries?

The "check entities against this Schema" link on the schema pages is a great thing. However, it requires newbies and experts alike to write a query from scratch, which is tedious. Some Schema authors are working around this by embedding example query code in the schema text as comment - which helps, but looks a bit messy, and still needs manual copy+paste for transfer to the query field.

So it would be great if we could save a query - or even better, multiple named queries - with the schema. The code to load queries and allow for user selection is already in place (see ShEx2 on Toolforge) with the "dataLabel" and "queryMap" parameters in the manifest file (though perhaps not yet as an HTTP request query parameter).

On the Wikidata/Wikibase side, I wonder if setting the Wikidata SPARQL query equivalent (P3921) property could be enabled for EntitySchema entries. Together with subject named as (P1810) qualifiers, that would allow for multiple queries to be saved with each schema. --Jneubert (talk) 09:31, 21 July 2019 (UTC)

My idea was to re-use the property definition at EntitySchema:E123 in order to use it there directly (formatted as a link to ShEx2), not to add the property to an item about the schema. --Jneubert (talk) 09:46, 21 July 2019 (UTC)
This is currently not supported. Once an item is associated with a schema, you should be able to load its content on the schema page with Lua. --- Jura 10:09, 21 July 2019 (UTC)

Have there been improvements in this regard? Having a formal association of a shape with a query is essential, for example, for a reporting bot. --SCIdude (talk) 14:28, 10 November 2020 (UTC)

I think there is a solution if you have the shape in RDF, see https://stackoverflow.com/questions/65618009/rdf-namespace-that-can-describe-sparql-queries. It uses these namespaces:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix shex: <http://www.w3.org/ns/shex#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix spin: <http://spinrdf.org/sp#> .

<http://XYZ> a shex:Shape ;
  ...
  rdf:type skos:Concept.

<http://someuri> a spin:Select ;
  spin:text "SELECT ?item WHERE { ?item wdt:P31 xyz }";
  rdf:type skos:Concept.

<http://someuri> skos:related <http://XYZ>.

--SCIdude (talk) 08:56, 8 January 2021 (UTC)

Comparison between ShEx and constraints

What’s possible with ShEx that is not possible with constraints, and vice versa? What I have so far is:

  • Constraints are tied to a property; shapes are "free" to be checked against any item and reused
  • Constraints are somewhat easier to edit textually, and more efficient
  • Constraints are automatically checked by MediaWiki.
  • Shapes are more powerful; for example, it’s possible to express something like "any property that is not explicitly allowed is forbidden"

Anything else, or anything wrong? It’s unclear to me how type shape constraints can be dealt with on Wikidata, as "rdf:type" is irrelevant in Wikidata items. Wikibase has domain and range constraints; I’m not sure these can be dealt with using shape expressions, as there seems to be no notion analogous to SPARQL property paths in ShEx. author  TomT0m / talk page 19:42, 21 July 2019 (UTC)

Correction: it’s definitely possible to express paths, my bad (this is used in the example shape for file formats, and is shown, for example, on the 13th slide of this comparison between ShEx and SHACL). author  TomT0m / talk page

Could we generate constraints from shapes and vice versa?

author  TomT0m / talk page 19:42, 21 July 2019 (UTC)

Lack of help

There is also a lack of help. No mention of schemas in the Help namespace so far. There should be Help:Schemas just like Help:Constraints. -- JakobVoss (talk) 08:48, 25 October 2019 (UTC)

In my opinion the technical references, links to standards and implementations should be removed. For an overview of ShEx in general there is en:ShEx. This page, in contrast, should focus on the use of ShEx in/for Wikidata. -- JakobVoss (talk) 12:46, 26 October 2019 (UTC)

Request a Schema page

Schemas are still hard for people for various reasons. We had the same problem with queries, and one thing that beautifully helped was the Request a Query page. There, anyone who doesn't know how to write SPARQL can ask for help from people who can. I think a similar Request a Schema page could be super helpful to get more WikiProjects to adopt Schemas. Thoughts? --Lydia Pintscher (WMDE) (talk) 13:27, 31 October 2019 (UTC)

  Support Currently, there are only around 140 entity schemas. This number could probably be increased by creating a dedicated page for schema-related questions. John Samuel (talk) 13:32, 31 October 2019 (UTC)
  Support Fabulous idea. Do we have people willing to build schemas on request? - PKM (talk) 03:13, 15 November 2019 (UTC)
  Support Request a query is super-helpful, and an equivalent for schemas would be great. --Oravrattas (talk) 06:43, 8 July 2020 (UTC)

Human readable schemas

One of the biggest problems with schemas right now is that they are difficult to understand without sufficient technical knowledge. But it seems to me that it should be possible to translate a schema into human readable language without too much difficulty, for the most part.

For example, if my understanding of shex is correct, currently E10 could be translated as follows:

Schema:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

start = @<human>

<human> EXTRA wdt:P31 {
  wdt:P31 [wd:Q5];
  wdt:P21 [wd:Q6581097 wd:Q6581072 wd:Q1097630 wd:Q1052281 wd:Q2449503 wd:Q48270]?;   # gender
  wdt:P19 .;                     # place of birth
  wdt:P569 . + ;                 # date of birth
  wdt:P735 . * ;                 # given name
  wdt:P734 . * ;                 # family name
  wdt:P106 . * ;                 # occupation
  wdt:P27 @<country> *;  # country of citizenship
  rdfs:label rdf:langString+;
}

<country> EXTRA wdt:P31 {
  wdt:P31 [wd:Q6256 wd:Q3024240 wd:Q3624078] +;
}

Translation:

  • start with <human>

I could see this sort of thing being useful as part of a schema's talk page, similar to how property talk pages contain a template with useful information about a property and its constraints. Does anyone know of a service which will translate a schema into human readable language or vice versa? Teester (talk) 13:46, 16 November 2019 (UTC)

Since there seems to be nothing that can translate schemas into human readable language, I've put something together at https://tools-static.wmflabs.org/shextranslator/. Any feedback would be appreciated. Teester (talk) 12:23, 23 November 2019 (UTC)
  • Schemas have great potential to become a good tool but, in their present implementation, I don't think we can or should expect users to rely on them as a primary means of understanding which properties to add or what statements to fix.
  • A human readable version should always be outlined on a WikiProject page or with property constraints. --- Jura 12:42, 23 November 2019 (UTC)
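To make the translation idea above concrete, here is a minimal sketch in Python. It is not how the shextranslator tool works (its internals are not described here); it only handles simple one-line triple constraints like those in E10, and it reuses the trailing comment as the human-readable label. The names CARDINALITY, CONSTRAINT and describe are made up for this illustration.

```python
import re

# In ShEx a constraint without a cardinality marker means exactly one value.
CARDINALITY = {"": "exactly one", "?": "zero or one", "+": "one or more", "*": "zero or more"}

# Matches simple one-line triple constraints such as
#   wdt:P569 . + ;   # date of birth
CONSTRAINT = re.compile(
    r"^\s*(?P<prop>[\w:]+)\s+"                      # predicate, e.g. wdt:P569
    r"(?P<value>\[[^\]]*\]|@<\w+>|[\w:]+|\.)\s*"    # value set, shape ref, datatype or "."
    r"(?P<card>[?+*]?)\s*;?\s*"                     # optional cardinality and separator
    r"(?:#\s*(?P<label>.*))?$"                      # optional trailing comment
)


def describe(line):
    """Return an English phrase for one constraint line, or None if it is not one."""
    match = CONSTRAINT.match(line)
    if match is None:
        return None
    prop, value, card = match["prop"], match["value"], match["card"]
    label = (match["label"] or "").strip()
    subject = f"{label} ({prop})" if label else prop
    if value == ".":
        what = "with any value"
    elif value.startswith("["):
        what = "with a value from " + ", ".join(value.strip("[]").split())
    elif value.startswith("@"):
        what = "conforming to shape " + value[1:]
    else:
        what = "of type " + value
    return f"{CARDINALITY[card]} {subject} {what}"


# A few lines copied from E10 as quoted above.
for schema_line in [
    "  wdt:P31 [wd:Q5];",
    "  wdt:P569 . + ;                 # date of birth",
    "  wdt:P27 @<country> *;  # country of citizenship",
]:
    print(describe(schema_line))
```

Anything the pattern does not recognise (PREFIX lines, shape headers, multi-line expressions) is returned as None, so a real translator would need a proper ShEx parser rather than a regular expression.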

PyShexy and SPARQL query

https://tools-static.wmflabs.org/pyshexy/ Has anyone figured out a way to get it to work with a SPARQL query? I tried hard but failed; I get an HTTP 500 error. Example: query, pyshexy url --So9q (talk) 23:33, 25 November 2019 (UTC)

Troubles

  WikiProject Schemas has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

I'm a big fan of shapes, extensively reviewed the "Validating RDF" book, try to use them in my work, and we at Onto are helping with the rdf4j effort (though that's SHACL, not ShEx). I'm quite enthusiastic about the Wikidata ShEx project and see a lot of good things.

But I tried to validate a realistic list, eg BG painters (this selects 100 of 310 on WD) against E10:

select ?item {?item wdt:P106 wd:Q1028181; wdt:P27 wd:Q219} limit 100

and I think the results are not quite usable yet.

PyShexy

PyShexy just gave up on me, even with limit 1 it returns "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."

shex.js

ShEx.js behaves better (paste the query in the box).

But there are still usability problems:

  • Some of the errors are reported many times, e.g. (I cut to only the first few; cc @EricP):
wd:Q284264@!START
validating http://www.wikidata.org/entity/Q284264 as //www.wikidata.org/wiki/Special:EntitySchemaText/human:
    validating http://www.wikidata.org/entity/Q12287013:
        Missing property: http://www.wikidata.org/prop/direct/P19
        Missing property: http://www.wikidata.org/prop/direct/P19
        Missing property: http://www.wikidata.org/prop/direct/P19

wd:Q6957611@!START
validating http://www.wikidata.org/entity/Q6957611 as //www.wikidata.org/wiki/Special:EntitySchemaText/human:
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P569
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P569
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19

wd:Q11317581@!START
validating http://www.wikidata.org/entity/Q11317581 as //www.wikidata.org/wiki/Special:EntitySchemaText/human:
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  Missing property: http://www.wikidata.org/prop/direct/P19

wd:Q12283051@!START
validating http://www.wikidata.org/entity/Q12283051 as //www.wikidata.org/wiki/Special:EntitySchemaText/human:
    validating http://www.wikidata.org/entity/Q12299788:
      validating http://www.wikidata.org/entity/Q12283051:
        validating http://www.wikidata.org/entity/Q28194288:
            Missing property: http://www.wikidata.org/prop/direct/P19
            Missing property: http://www.wikidata.org/prop/direct/P19
            Missing property: http://www.wikidata.org/prop/direct/P19
            Missing property: http://www.wikidata.org/prop/direct/P19
            Missing property: http://www.wikidata.org/prop/direct/P19
  OR
  validating http://www.wikidata.org/entity/Q28194288:
      Missing property: http://www.wikidata.org/prop/direct/P19
      Missing property: http://www.wikidata.org/prop/direct/P19
      Missing property: http://www.wikidata.org/prop/direct/P19
      Missing property: http://www.wikidata.org/prop/direct/P19
      Missing property: http://www.wikidata.org/prop/direct/P19
      Missing property: http://www.wikidata.org/prop/direct/P19
  • It takes over 10s for some of the more difficult items. This isn't scalable.

WikiShape

http://wikishape.weso.es/ by @Jelabra: runs validations in parallel, so even though the hard items (e.g. 1, 2 and 4; the count is zero-based) have been spinning for 10 minutes already, I can inspect the easier items.

  • I think the hard items have relatives, so they cause recursive validation (see next section) and I'm doubtful their validation will ever finish.
  • Parallel threads reuse validations of subsidiary entries, which is great: after 100, it added 27 "country", "language" and "human", and each is checked only once.
  • The error messages are quite hard to grok; see below for item 6, wd:Q3650675. It would probably take me 20 minutes to understand what's wrong.
Error: None of the candidates matched. Attempt: Attempt: node: wd:Q3650675, shape: <internal://base/human>
Bag: C0,C1?,C2,C3+,C4*,C5*,C6*,C7*,C8*,C9*,C10*,C11*,C12*,C13*,C14*,C15*,C16*,C17+,C18*
Candidate lines:
CandidateLine: ((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C0)
((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1)
((<http://www.wikidata.org/prop/direct/P569>,"1827-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3)
((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q15501913>),C4)
((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6)
((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@de),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Никола Образописов"@bg),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@sq),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@nl),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@es),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@en),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@ga),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@fr),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@pt),C17)
CandidateLine: ((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1)
((<http://www.wikidata.org/prop/direct/P569>,"1827-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3)
((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q15501913>),C4)
((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6)
((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@de),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Никола Образописов"@bg),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@sq),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@nl),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@es),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@en),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@ga),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@fr),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@pt),C17)
((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C18)

Or look at item 10, wd:Q3804651:

Error: None of the candidates matched. Attempt: Attempt: node: wd:Q3804651, shape: <internal://base/human>
Bag: C0,C1?,C2,C3+,C4*,C5*,C6*,C7*,C8*,C9*,C10*,C11*,C12*,C13*,C14*,C15*,C16*,C17+,C18*
Candidate lines:
CandidateLine: ((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C0)
((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1)
((<http://www.wikidata.org/prop/direct/P569>,"1864-05-18T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3)
((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q21104340>),C4)
((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6)
((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ga),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@en),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ast),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@nl),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@de),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@it),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@sq),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@fr),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@es),C17)
CandidateLine: ((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1)
((<http://www.wikidata.org/prop/direct/P569>,"1864-05-18T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3)
((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q21104340>),C4)
((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6)
((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ga),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@en),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ast),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@nl),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@de),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@it),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@sq),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@fr),C17)
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@es),C17)
((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C18)

Recursive Shapes

E10 includes recursive shape refs.

<human> EXTRA wdt:P31 {
  wdt:P22 @<human> *;           # father
  wdt:P25 @<human> *;           # mother
  wdt:P3373 @<human> *;         # sibling
  wdt:P26 @<human> *;           # husband/wife
  wdt:P40 @<human> *;           # children
  wdt:P1083 @<human> *;         # relatives

But some British politician ancestries have been tracked back to Adam (through some uncertain/fictional rulers). So if you follow all these links recursively, back and forth, you may pick up a majority of Humans on WD (5-6M). So following such recursion faithfully is suicide, and shex.js does seem to recurse faithfully:

wd:Q2989196@!START
validating http://www.wikidata.org/entity/Q2989196 as //www.wikidata.org/wiki/Special:EntitySchemaText/human:
    validating http://www.wikidata.org/entity/Q3657670:
      validating http://www.wikidata.org/entity/Q2989196:
        validating http://www.wikidata.org/entity/Q4162892:
          validating http://www.wikidata.org/entity/Q35228:
            Missing property: http://www.wikidata.org/prop/direct/P31
  OR
  validating http://www.wikidata.org/entity/Q4162892:
    validating http://www.wikidata.org/entity/Q35228:
      Missing property: http://www.wikidata.org/prop/direct/P3

What we need instead is something like:

<human> EXTRA wdt:P31 {
  wdt:P31 [wd:Q5];
  wdt:P22 @<mini_human> *;           # father
  wdt:P25 @<mini_human> *;           # mother
  wdt:P3373 @<mini_human> *;         # sibling
  wdt:P26 @<mini_human> *;           # husband/wife
  wdt:P40 @<mini_human> *;           # children
  wdt:P1083 @<mini_human> *;         # relatives
  ...
}
<mini_human> EXTRA wdt:P31 {
  wdt:P31 [wd:Q5];
}

So it's really easy for a schema writer to shoot himself in the foot.

Discussion

I've thought a lot about shape validation performance and scalability, and I think that fetching entities ad nauseam (esp. through numerous SPARQL queries) can never scale. What we need is for ShEx engines to strictly enforce limits on what's checked about referenced WD entities: basically we need an "existence check" but not full recursive checking.

@EricP, Jelabra: what do you think? --Vladimir Alexiev (talk) 09:41, 9 January 2020 (UTC)
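A minimal sketch of the "existence check" idea, in Python over a toy in-memory graph rather than a real ShEx engine or SPARQL endpoint. The graph layout, the function names and the fetched set are assumptions made for illustration: the focus item gets the full check, while referenced relatives only get the shallow <mini_human> test, so the number of entities touched stays bounded by the focus item's direct links.

```python
# Relative properties taken from the E10 excerpt above.
RELATIVE_PROPS = ["P22", "P25", "P3373", "P26", "P40", "P1083"]


def is_mini_human(graph, item):
    """Shallow check, like <mini_human>: the item only needs wdt:P31 wd:Q5."""
    return "Q5" in graph.get(item, {}).get("P31", [])


def validate_human(graph, item, fetched):
    """Check the focus item fully, but referenced relatives only shallowly.

    `fetched` records every entity that had to be looked at, to show that
    the check never walks further than one hop from the focus item.
    """
    fetched.add(item)
    if not is_mini_human(graph, item):
        return False
    for prop in RELATIVE_PROPS:
        for relative in graph[item].get(prop, []):
            fetched.add(relative)
            if not is_mini_human(graph, relative):
                return False
    return True


# Toy data: a chain of fathers A -> B -> C -> D (hypothetical identifiers).
toy_graph = {
    "A": {"P31": ["Q5"], "P22": ["B"]},
    "B": {"P31": ["Q5"], "P22": ["C"]},
    "C": {"P31": ["Q5"], "P22": ["D"]},
    "D": {"P31": ["Q5"]},
}
seen = set()
print(validate_human(toy_graph, "A", seen), sorted(seen))
```

With a fully recursive <human> reference the same call would have to visit C and D as well, and on real data potentially millions of items.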

On pyshex: this one works. You have given incorrect input in the sparql= parameter. --MisterSynergy (talk) 11:27, 9 January 2020 (UTC)
On Blaze: @Vladimir Alexiev: we've raised an issue (https://phabricator.wikimedia.org/T243595) to move validation to a Blaze instance so we're not spending 99% of our time waiting for SPARQL scheduling. – The preceding unsigned comment was added by EricP (talk • contribs) at 07:50, January 24, 2020 (UTC).


shex-simple

For the tool at

https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html

Sample link:

https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?data=Endpoint:%20https://query.wikidata.org/sparql&hideData&manifest=[]&textMapIsSparqlQuery&schemaURL=%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE10

Is there a way to pass the SPARQL query in the URL (to avoid having to paste it into the query field)? --- Jura 01:11, 10 February 2020 (UTC)

I think a URL parameter "shape-map" does the work here: https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?data=Endpoint:%20https://query.wikidata.org/sparql&hideData&manifest=[]&textMapIsSparqlQuery&schemaURL=%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE10&shape-map=SELECT%20?item%20WHERE%20{%20?item%20wdt:P31%20wd:Q5%20}%20LIMIT%205 --MisterSynergy (talk) 07:11, 10 February 2020 (UTC)
Thanks @MisterSynergy:! Is there a way to autorun it? --- Jura 08:17, 10 February 2020 (UTC)
No idea. I no longer use this shex-simple tool, as it seems to be very basic in functionality and there are other ones available. At this time I like the pyshexy API most; it is linked about two sections above this one. --MisterSynergy (talk) 08:39, 10 February 2020 (UTC)
Interesting. I will try to add both to Talk:Q4925477. --- Jura 08:58, 10 February 2020 (UTC)
No idea what your exact plan is, but for your convenience, User:Teester/CheckShex.js might also be worth a try... It adds an input field on item pages where you just need to provide an E-number for an evaluation of that item. --MisterSynergy (talk) 09:08, 10 February 2020 (UTC)
The idea is to provide a link to check the item against a schema. There are a few other approaches at "This item:" in the list. --- Jura 09:27, 10 February 2020 (UTC)
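For anyone generating such links in bulk, here is a small Python sketch that assembles the same kind of URL from a schema ID and a SPARQL query. The parameter names are copied from the sample link above; the function name shex_simple_url is made up, and the query is percent-encoded more strictly than in the hand-written link, which is an assumption about what the tool accepts.

```python
from urllib.parse import quote

BASE = "https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html"


def shex_simple_url(schema_id, sparql):
    """Build a shex-simple link for an EntitySchema ID (e.g. "E10") and a SPARQL query."""
    schema_url = f"//www.wikidata.org/wiki/Special:EntitySchemaText/{schema_id}"
    parts = [
        # Same flags as in the sample link; only the space is escaped here.
        "data=" + quote("Endpoint: https://query.wikidata.org/sparql", safe=":/"),
        "hideData",
        "manifest=[]",
        "textMapIsSparqlQuery",
        "schemaURL=" + quote(schema_url, safe=""),
        # The "shape-map" parameter carries the query, fully percent-encoded.
        "shape-map=" + quote(sparql, safe=""),
    ]
    return BASE + "?" + "&".join(parts)


print(shex_simple_url("E10", "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5"))
```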

ShExStatements

During Wiki Techstorm 2019 [1], we started exploring simplification for creating shape expressions. One possibility that was explored was to make something like QuickStatements that will take CSV/tabular format as input to generate shape expressions.

ShExStatements is now released: https://github.com/johnsamuelwrites/ShExStatements

The main goal is to help newcomers write shape expressions. The users write a CSV file and ShExStatements will translate it to a shex file.

Take for example, a CSV file concerning a language (with prefixes): https://github.com/johnsamuelwrites/ShExStatements/blob/master/examples/language.csv is translated to a shape expression [2].

There are five columns. Column 1 specifies the node name, column 2 the property, column 3 one or more possible values, column 4 the cardinality (+,*) and column 5 comments.

Columns 3,4 and 5 are empty for prefixes. Columns 1, 2, 3 are mandatory. Column 3 can be . (to say any value).
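To illustrate the five-column layout described above, here is a rough Python sketch of such a conversion. It is not the ShExStatements implementation: the comma delimiter, the space-separated value list in column 3, the sample rows and the function name csv_to_shex are all assumptions made for this example; see the linked documentation for the real input format.

```python
import csv
import io


def csv_to_shex(csv_text):
    """Turn rows of (node, property, values, cardinality, comment) into ShEx-like text."""
    prefixes, shapes = [], {}
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row] + [""] * 5   # pad short rows
        name, prop, value, cardinality, comment = cells[:5]
        if not name:
            continue                                        # skip blank lines
        if not value:
            # Columns 3-5 empty: treat the row as a prefix declaration.
            prefixes.append(f"PREFIX {name}: {prop}")
            continue
        if value != "." and not value.startswith("@"):
            value = "[" + " ".join(value.split()) + "]"     # value set
        line = f"  {prop} {value}"
        if cardinality:
            line += f" {cardinality}"
        line += " ;"
        if comment:
            line += f"  # {comment}"
        shapes.setdefault(name, []).append(line)
    blocks = ["\n".join(prefixes)] if prefixes else []
    for name, lines in shapes.items():
        blocks.append(f"<{name}> {{\n" + "\n".join(lines) + "\n}")
    return "\n\n".join(blocks)


# Hypothetical input, loosely modelled on the language example.
EXAMPLE = """\
wd,<http://www.wikidata.org/entity/>,,,
wdt,<http://www.wikidata.org/prop/direct/>,,,
language,wdt:P31,wd:Q34770,,instance of a language
language,wdt:P1705,.,+,native label
"""
print(csv_to_shex(EXAMPLE))
```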

Examples related to Wikidata that were used to create some entity Schemas E177, E178, E179 can be found here [3], with some additional examples in [4].

For a detailed documentation, please check [5].

Please let me know if you have any questions/remarks.


  1. https://medium.com/@jsamwrites/wiki-techstorm-2019-a996d69c60a5
  2. https://github.com/johnsamuelwrites/ShExStatements#quick-start
  3. https://github.com/johnsamuelwrites/ShExStatements/tree/master/examples/wikidata
  4. https://github.com/johnsamuelwrites/ShExStatements/tree/master/examples
  5. https://github.com/johnsamuelwrites/ShExStatements/blob/master/docs.md

Experimenting with Bioschemas at Scholia

For Scholia, we have begun to explore how to annotate entities using Bioschemas (Q93995803). You can see this in action at taxon profiles like toolforge:scholia/taxon/Q12024, whose HTML now includes the following:

/* BioSchemas annotation */
if (item.claims.P225) {
   try { /* Taxon */
      var taxonName = item.claims.P225[0].mainsnak.datavalue.value;
      bioschemasAnnotation = {
         "@context" : "https://schema.org",
         "@type" : "Taxon" ,
         "name" : taxonName ,
         "url" : "http://www.wikidata.org/entity/Q12024"
      }
      if (item.claims.P105) {
         var taxonRank = item.claims.P105[0].mainsnak.datavalue.value.id;
         bioschemasAnnotation.taxonRank = "http://www.wikidata.org/entity/" + taxonRank ;
      }
      if (item.claims.P171) {
         var parent = item.claims.P171[0].mainsnak.datavalue.value.id;
         bioschemasAnnotation.parentTaxon = "http://www.wikidata.org/entity/" + parent ;
      }
      $( '#bioschemas' ).append( JSON.stringify(bioschemasAnnotation) );
      // console.log(JSON.stringify(bioschemasAnnotation, "", 2))
   } catch(e) {}
}

In the process, we were wondering to what extent such Wikidata-generic annotations could be represented on Wikidata rather than hardcoded on the Scholia end, and are inviting your comments, here or via a currently open pull request for similar annotation of molecular entities. --Daniel Mietchen (talk) 20:24, 11 May 2020 (UTC)
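As a point of comparison for the discussion, the same mapping can be written as a small standalone function. The Python sketch below mirrors the snippet above (P225 to name, P105 to taxonRank, P171 to parentTaxon) on the Wikidata entity JSON layout; the function names and the toy item are made up, and this is not Scholia's code.

```python
import json

ENTITY_BASE = "http://www.wikidata.org/entity/"


def first_value(item, prop):
    """Return the value of the first statement for `prop`, or None if there is none."""
    statements = item.get("claims", {}).get(prop)
    if not statements:
        return None
    return statements[0]["mainsnak"]["datavalue"]["value"]


def taxon_annotation(qid, item):
    """Build a schema.org Taxon annotation, or None if the item has no taxon name."""
    name = first_value(item, "P225")
    if name is None:
        return None
    annotation = {
        "@context": "https://schema.org",
        "@type": "Taxon",
        "name": name,
        "url": ENTITY_BASE + qid,
    }
    rank = first_value(item, "P105")
    if rank is not None:
        annotation["taxonRank"] = ENTITY_BASE + rank["id"]
    parent = first_value(item, "P171")
    if parent is not None:
        annotation["parentTaxon"] = ENTITY_BASE + parent["id"]
    return annotation


# Toy item with made-up values, shaped like Wikidata's entity JSON.
toy_item = {"claims": {
    "P225": [{"mainsnak": {"datavalue": {"value": "Examplus testus"}}}],
    "P171": [{"mainsnak": {"datavalue": {"value": {"id": "Q1"}}}}],
}}
print(json.dumps(taxon_annotation("Q12024", toy_item), indent=2))
```

Statements with "no value" or "unknown value" snaks have no datavalue and would raise a KeyError here; the original snippet swallows such errors with its try/catch.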

New subpage to document and explore subsets of Wikidata

See Wikidata:WikiProject Schemas/Subsetting. --Daniel Mietchen (talk) 09:51, 4 June 2020 (UTC)

Date-conditional checks

  WikiProject ShEx has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

Is it possible to have a check for election items that says something like "There should be a successful candidate (P991) unless there's a point in time (P585) that's in the future"? Or would we need to have two separate schemas for "future elections" and "past elections"? --Oravrattas (talk) 13:23, 8 July 2020 (UTC)

@Oravrattas: So far as I understand, the checking process will always report failure for missing fields. I think the solution would be to only have one schema, no checks for special conditions, but there could be a second process which runs either before or after the schema check to exclude items which have certain characteristics. Blue Rasberry (talk) 22:29, 1 August 2020 (UTC)
@Bluerasberry: I might be missing something about your suggestion, but I'm not sure how that would help. I don't want to skip all checks on future elections: I still want to ensure that those have a date, a jurisdiction, an office contested, candidates, the previous election, etc, but there are properties that will only have values once the election has passed, such as the number of votes cast, and the winning candidate. How can I validate that those do not have any value yet if the date of the election is in the future, but should if it's in the past? --Oravrattas (talk) 08:08, 2 August 2020 (UTC)
@Oravrattas: As far as I know, there is no easy option to compare dates. You could maybe use regular expressions for it. If you do so, you do not need multiple shape expressions; however, you will need multiple shapes. You can combine shapes via OR and AND, so <#election> { ... } and (@<#election-past> or @<#election-future>), with the shape for future elections using a regular expression that matches future dates, and the shape for past elections requiring successful candidate (P991) and point in time (P585), possibly with an expression for past dates as well, so that no future election can already have an outcome. --CamelCaseNick (talk) 20:45, 2 August 2020 (UTC)
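To spell out the rule being asked for, independently of how it would be encoded in ShEx, here is a minimal Python sketch of the date-conditional check as a separate pass, in the spirit of the "second process" suggested above. The claims dictionary layout and the function name are assumptions for illustration; a real check would read statements from Wikidata.

```python
from datetime import date


def check_election(claims, today):
    """Return a list of problems for one election item.

    `claims` maps property IDs to lists of values; P585 values are
    datetime.date objects in this toy layout.
    """
    dates = claims.get("P585", [])
    if not dates:
        return ["missing point in time (P585)"]
    in_future = min(dates) > today
    has_winner = bool(claims.get("P991"))
    problems = []
    if in_future and has_winner:
        problems.append("future election already has a successful candidate (P991)")
    if not in_future and not has_winner:
        problems.append("past election lacks a successful candidate (P991)")
    return problems


# Hypothetical items; the reference date is passed in so results are repeatable.
reference_day = date(2020, 8, 2)
print(check_election({"P585": [date(2019, 5, 1)]}, reference_day))
print(check_election({"P585": [date(2021, 5, 1)]}, reference_day))
```

The other requirements (date, jurisdiction, office contested and so on) stay in the single schema; only the date-dependent rule is handled outside it.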

EntitySchema labels

EntitySchema labels don't seem to be working for me - do they need to be updated to a new format? Sj (talk) 18:56, 30 July 2020 (UTC)

Early test schemas - still in use? What happens when the incrementer gets to 733?

There are a few schemas with odd numbers: E734 (family name), E735 (given name), E999 (Borked), E11424 (film). Are these actually in use? What was the mechanism for generating those EIDs, do we want to keep it, and will the ID incrementing pass over them smoothly? Sj (talk) 18:59, 30 July 2020 (UTC)

@Sj: I see - it looks like at some point there were 2+ processes for assigning item numbers, and now hopefully there should only be the automated one. You are asking what happens when the current canonical system counts up to the numbers which earlier systems assigned.
So far as I know, the current practice is keeping all schemas, even test schemas, regardless of whether anyone uses them. I expect that the desired practice is for the number assigning process to skip any existing numbers, not write over them. Eventually I suppose we should have inclusion or notability criteria for schemas; otherwise, anyone could automatically generate countless schemas never to be used. Blue Rasberry (talk) 22:26, 1 August 2020 (UTC)
Thank you kindly. Sj (talk) 15:24, 17 August 2020 (UTC)

Creating schemas for basic concepts

(Reposting here:)

The entity schemas E3 (Wikidata Item), E5 (Statement), E6 (Language mappings), E7 (Citation), E8 (External RDF), and E9 (Wikidata-Wikibase) are all blank.


  • There should be something there, even if it is all comments and optional elements.
  • Is there a quick way to find other blank entity schemas?
  • Was there discussion about this when schemas were being created for the first time?

Sj (talk) 15:28, 17 August 2020 (UTC)
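On the question of finding other blank schemas: one possible approach, sketched in Python below, is to fetch each schema's text and test whether it contains anything besides whitespace and comments. The Special:EntitySchemaText URL pattern is the one used in the shex-simple links on this page; the User-Agent string, the function names and the decision to count comment-only schemas as blank are assumptions made for this sketch.

```python
from urllib.request import Request, urlopen

TEXT_URL = "https://www.wikidata.org/wiki/Special:EntitySchemaText/{eid}"


def is_blank(schema_text):
    """True if the text has no line other than whitespace or '#' comments."""
    for line in schema_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return False
    return True


def fetch_schema_text(eid):
    """Download the raw ShEx text of one EntitySchema (needs network access)."""
    request = Request(
        TEXT_URL.format(eid=eid),
        headers={"User-Agent": "blank-schema-check/0.1 (example script)"},  # placeholder
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def find_blank(eids, fetch=fetch_schema_text):
    """Return the IDs whose schema text is blank; `fetch` can be swapped for testing."""
    return [eid for eid in eids if is_blank(fetch(eid))]


# Offline demonstration with a stand-in fetcher and made-up schema texts.
fake_texts = {"E3": "", "E10": "start = @<human>", "E5": "# to do\n"}
print(find_blank(["E3", "E5", "E10"], fetch=fake_texts.get))
```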

Help in creating Schema for Pokémon species

Hi, I don't know how to create a correct Schema for a Pokémon species. Could someone help me? Item QYYY which defines a Pokémon species must have:

Thank you very much to those who will help me! --★ → Airon 90 13:01, 5 October 2020 (UTC)

@Airon90: I see that you are active at Wikidata:WikiProject Pokémon, and presumably if you learned this then you would bring the practice back to that WikiProject. I am still learning this myself and I do not know how to help, but I wanted to thank you both for asking the question and doing documentation at that other WikiProject. Pokemon are very important for the history of Wikipedia as the origin of English Wikipedia's notability policy. There is a lot of interest in good Pokemon content everywhere, so I think we should get this right. Blue Rasberry (talk) 14:28, 18 November 2020 (UTC)
So has anyone done this? It's been like 3 years, and I think I can just go ahead and do it if nobody else is. OmegaFallon (talk) 15:54, 10 March 2023 (UTC)
Schema is located at E394, any additions or tweaks are appreciated. I also recommend checking out the constraints on Pokémon index (P1685) as a guideline for the schema. @Airon90 @Bluerasberry OmegaFallon (talk) 16:39, 10 March 2023 (UTC)

Best way to browse schemas?

Excuse me if I am missing this. How can I browse schemas?

I want to see schemas for instance of (P31) = human (Q5). Among other things, I am hoping to identify the most common properties among schemas, but I also would like to be able to browse individual schemas. I do not see how to search for schemas around a given theme. Thanks. Blue Rasberry (talk) 14:25, 18 November 2020 (UTC)

The answer is User:HakanIST/EntitySchemaList
Right now there are only 264 schemas. The reason I could not find many is that hardly any exist. All of this is still new. Blue Rasberry (talk) 21:14, 18 November 2020 (UTC)
@Bluerasberry:
Yes, it is still quite new and a bit too technical to be widely adopted yet.
That said, you can just use Special:Search and ask for results in the EntitySchema namespace only.
Regards, VIGNERON (talk) 14:33, 22 November 2020 (UTC)

lightweight Shapes

I think there is a need for a lightweight approach. In other words, the full ShEx specification may be too powerful to be implemented for e.g. a daily check of millions of entities. However, a lightweight approach would reduce the worth of the shape set we already have, unless there is an automatic conversion to the lightweight form. Practically, another standard is needed. Comments? --SCIdude (talk) 09:55, 19 December 2020 (UTC)

This seems to require somewhat more concrete data than « may be too powerful ». Which concrete shape is a problem? There are already implementations in the wild; is it really necessary to reimplement everything?
My guess is that complex shape constructions are already a bit too hard to write for most people. author  TomT0m / talk page 11:13, 19 December 2020 (UTC)
You mean there are implementations that can handle this? Let's talk about a real example: I have a list of 1.6M entities and a list of shapes; the task is to provide, for each shape, a list of the items out of the 1.6M that are valid for that shape, and the result should not be older than 24 hours since the full WD dump download. Bonus if it can be done on a quad-core desktop with 32G of RAM. --SCIdude (talk) 16:32, 19 December 2020 (UTC)

Complex constraints

Are constraints such as the complex constraint written on neutron number (P1148), which is intended for isotope items and checks that the number used as a value is the sum of two other property values, expressible in ShEx? I see this is in principle possible with SHACL.

Now that we have schemas, are these {{Complex constraint}}s better written as schemas? Are there reports in the same fashion that could report errors? author  TomT0m / talk page 14:42, 25 November 2021 (UTC)

Redirect project schema request page

I think we should redirect Wikidata:WikiProject_Schemas/Request an EntitySchema to Wikidata:Schema proposals to be the future canonical place for requests. If there are no objections in the next few days I'll go ahead and do it (supporting votes also welcome). --SilentSpike (talk) 22:07, 30 December 2021 (UTC)

I think we should not do this because the Request a Schema page was specifically set up for people to ask for help in creating a schema if they are not proficient in ShEx. This is similar to the Request a Query page we have and has been requested in several discussions about how schemas are too hard for people. LydiaPintscher (talk) 11:58, 31 December 2021 (UTC)
That being said the page for sure could use some love and attention. LydiaPintscher (talk) 11:59, 31 December 2021 (UTC)
Ah I see now the distinction between proposal and request. In that case I retract my suggestion above. SilentSpike (talk) 14:01, 31 December 2021 (UTC)

Thoughts about ShEx integration in Wikidata and OpenRefine

Not being aware of this WikiProject I posted some thoughts and questions over at Wikidata talk:Schemas, maybe it is of interest to people here. − Pintoch (talk) 15:15, 23 July 2022 (UTC)

Items for EntitySchemas?

Hello!

I am missing a way to query entity schemas with SPARQL (e.g. to search for all schemas that use a given property).

Should we have Wikidata items for each schema?

That way we can also track schemas on focus lists of WikiProjects, and better categorize them, alongside all the other cool linked data stuff we love. TiagoLubiana (talk) 14:43, 19 October 2022 (UTC)

I also am trying to put together a schema that will check a collection of items for validity. None of the examples have a query that produces the target set of items. Are there schemas that have a good target set, and how does one set up a schema with a query to get a set of violations of the schema in a way that can be used to help editors who do not have knowledge of the schema language?
My goal is to have a schema that reports violations of the intended meaning of is metaclass for (P8225). Peter F. Patel-Schneider (talk) 07:52, 15 August 2023 (UTC)
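On the first question in this section (finding all schemas that use a given property): as long as schemas cannot be queried with SPARQL, one workaround is to scan the schema texts and build an index offline. The Python sketch below assumes the texts have already been downloaded into a dictionary; the prefix names in the pattern are only the conventional ones (schema authors can declare any prefix), and the function name and sample texts are made up.

```python
import re
from collections import defaultdict

# Conventional Wikidata property prefixes; a schema may declare others.
PROPERTY = re.compile(r"\b(?:wdt|p|ps|pq|pr):(P\d+)\b")


def index_by_property(schemas):
    """Map each property ID to the set of schema IDs whose text mentions it."""
    index = defaultdict(set)
    for eid, text in schemas.items():
        for pid in PROPERTY.findall(text):
            index[pid].add(eid)
    return index


# Made-up schema texts standing in for downloaded EntitySchema content.
sample = {
    "E10": "<human> { wdt:P31 [wd:Q5]; wdt:P569 . + ; }",
    "E999": "<thing> { wdt:P31 . ; p:P585 { ps:P585 . } }",
}
index = index_by_property(sample)
print(sorted(index["P31"]), sorted(index["P585"]))
```

This is a text match rather than a parse, so properties written as full IRIs or under unusual prefixes would be missed.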

Data Modelling Days, online gathering, November 30 - December 2, 2023

Hello all,

Following the past events dedicated to data quality and data reuse, the Wikidata team wanted to host a new gathering dedicated to data modelling.

The Data Modelling Days will take place online over three days and will host a variety of discussions, workshops and practical sessions on the topics of Wikidata ontologies, EntitySchemas, modelling issues and various other challenges.

The event is open to everyone, regardless of your experience with modelling data on Wikidata. We particularly encourage people who are working on specific topics to join the event and present their modelling challenges.

If you know people or groups who are already discussing modelling issues on Wikidata, or would have something interesting to contribute, please share this message with them!

You can find more information on the dedicated page, where you can sign up and let us know what you are interested in. You can already propose discussions and workshops on the talk page until November 19th.

If you cannot attend, don’t worry, most sessions will be recorded, notes will be taken and slides will be shared.

We are looking forward to seeing you and learning more about your modelling challenges during the Data Modelling Days! If you have any questions, feel free to reach out to me. Best, Lea Lacroix (WMDE) (talk) 14:25, 9 October 2023 (UTC)

model schema

Is there a model schema that can be used to determine how schemas are supposed to work? A model schema has to be non-trivial, match what is currently in Wikidata, have some utility, and have a good explanation of this utility. Peter F. Patel-Schneider (talk) 21:29, 31 October 2023 (UTC)

Coming up soon: Wikidata Data Modelling Days, online, November 30-December 2

 
Hello all,

If you are regularly involved in adding, organizing or reusing data from Wikidata, you have certainly encountered some questions or issues related to data modelling: how to describe and structure information in a consistent way on Wikidata. This is a big topic for the community at large, and that's why we will address it together during a three-day online event, the Data Modelling Days, that will take place next week, on November 30th, December 1st and 2nd.

During this online gathering, we will have lots of discussions on various topics that you can discover in the program: we will talk about Entity Schemas and how they can be useful to improve data quality and consistency on Wikidata, how to model heritage, gender, references or web fiction, the challenges encountered by people reusing Wikidata's data inside and outside the Wikimedia projects, how to model data on a fresh new Wikibase instance, and many other exciting topics.

Aside from attending sessions and joining the discussions, you can also join our Data Modelling Clinic sessions, where you can bring any topic you are working on, ask questions or ask the community for feedback or help. You will find these sessions on each day in the program.

The event is taking place online on the video conference platform Jitsi, it is free, no registration needed (although you are invited to add your name to the participants list). Most sessions will be recorded in video and have collaborative notes, and we will publish a list of outcomes and next steps for each session.

We are hoping to see a lot of you at the event!

If you have any questions, feel free to ask on the talk page or directly by writing to me. Best, Lea Lacroix (WMDE) (talk) 16:02, 24 November 2023 (UTC)