Wikidata:WikiProject Reasoning/Rule format

This page is for discussing how rule-like information could best be kept on Wikidata. We could use templates (like for the current constraints, but more general), we could use a simple plain-text syntax, or we could use statements of some special form on Wikidata entity pages. This is not decided yet.

This page is part of the Wikidata:WikiProject Reasoning.

Option 1: Write rules on wiki pages

Rules can be written using templates on a page. Explanations and examples for the rule can be recorded on the same page. Discussions could be there as well or on the talk page. The details of the rule format would need to be defined.

General observations
  • Advantage: maximal freedom to design rule presentation in a user-friendly way
  • Advantage: internationalised format; easy to localise via templates
  • Advantage: formal rule and informal documentation on same page
  • Advantage: low cost for getting started with using rules (anybody could just start making some such pages)
  • Disadvantage: hard to extract automatically; parsing templates is extra work for bots
  • Disadvantage: rules are always slightly free-form, even when using templates
  • Disadvantage: rule editing is not so simple if rules use (nested) templates
  • Issue: statements on property pages cannot link to relevant rule pages easily (maybe encode as string with a gadget that makes links)
discussion
Markus, can you do an example? Joe Filceolaire (talk) 02:34, 4 September 2015 (UTC)
Yes, I can do this. Need to work out the details first though. --Markus Krötzsch (talk) 12:49, 6 September 2015 (UTC)

Option 2: Encode rules using statements

Rules can be described on their own item pages, using statements. Discussions and explanations would be on the talk page. Special properties would be needed to define the rules.

General observations
  • Advantage: rules are part of the data, and therefore machine readable
  • Advantage: internationalised format; easy to localise
  • Advantage: familiar format for Wikidata users
  • Disadvantage: not the most natural format for rules (being decomposed into statements); maybe harder to understand
  • Disadvantage: main article pages are primarily machine-readable; documentation for humans only on talk page
  • Disadvantage: requires creating new properties and items before the benefit of rules has been shown
  • Issue: rules might need an extra namespace to avoid mixing them with regular items (it could be technically hard to add another item namespace). However, we don't have an extra namespace for items about Wikipedia-internal things like templates, so this might not be required.

Proposal 1 for how to do this

The following was originally proposed by Joe Filceolaire on Wikidata:WikiProject_Reasoning/Use_cases and moved here.

Simple rules (constraints for example) can be statements on the property page. More complicated rules will each have a separate item as described below.

The pages for the properties will need a statement to identify which rules apply to that property.

Example of such a statement linking a property to a set of rules:
  • <'rules that apply to this property':'?Rule 1'> ('rules that apply to this property':'?Rule 2')
  • <'rules that apply to this property':'?Rule 3'>
Where 'rules that apply to this property' is a new property with datatype item.
?Rule 1, ?Rule 2, ?Rule 3 are placeholder items for rules.
<property:value> is a claim
(property:value) is a qualifier to the claim
qualifier means AND
additional value means OR
so the rules that apply in this case are ('?Rule 1' AND '?Rule 2' ) OR '?Rule 3'.
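
The AND/OR convention above can be read as a disjunction of conjunctions. The following Python sketch is only an illustration of that reading: the `Claim` class, the `combined_rule_holds` function, and the idea of passing in the set of rule items that currently hold are all assumptions made for this example, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One hypothetical 'rules that apply to this property' claim."""
    value: str               # main value of the claim, e.g. "?Rule 1"
    qualifiers: tuple = ()   # qualifier values, ANDed with the main value


def combined_rule_holds(claims, satisfied):
    """True if at least one claim (OR) has its value and all its qualifiers (AND) in `satisfied`."""
    return any(
        all(rule in satisfied for rule in (claim.value, *claim.qualifiers))
        for claim in claims
    )


# The two claims from the example: ('?Rule 1' AND '?Rule 2') OR '?Rule 3'
claims = [
    Claim("?Rule 1", ("?Rule 2",)),
    Claim("?Rule 3"),
]

print(combined_rule_holds(claims, {"?Rule 1", "?Rule 2"}))  # True
print(combined_rule_holds(claims, {"?Rule 1"}))             # False
print(combined_rule_holds(claims, {"?Rule 3"}))             # True
```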

Example 1: Rule for a Transitive property

Rules expressed via properties using new properties 'IF', 'THEN', and new items '?Foo', '?Bar', '?Baz'

<...> means claim. (...) signifies qualifier. Multiple values mean AND (within a rule item, the IF claims are ANDed together).
Note this only works if we have one rule, so each rule needs to be on a separate item. Joe Filceolaire (talk) 22:14, 27 August 2015 (UTC)
Rewritten to be more general:
discussion
OK? Joe Filceolaire (talk) 09:26, 30 August 2015 (UTC)

Example 2: Make sure the various 'head of government' properties match

<IF:?Foo> (head of government (P6):?Bar), (office held by head of state (P1906):?Baz)
<THEN:?Bar> (position held (P39):?Baz)
This translates as: "If Foo has head of government Bar and office held by head of state Baz, then Bar should have the claim <'position held':Baz>."
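
As a rough illustration of how a bot might check this rule, here is a minimal Python sketch. It assumes a toy in-memory data set in which each item maps to a list of (property, value) pairs; the item labels ("Country A", "Person X", and so on) and the function name `missing_position_claims` are invented for the example. Only the property IDs P6, P1906 and P39 come from the rule above.

```python
def missing_position_claims(items):
    """Return (bar, "P39", baz) for every match of the IF part whose THEN claim is absent."""
    missing = []
    for foo, claims in items.items():
        # IF part: Foo has head of government (P6) Bar ...
        bars = [value for prop, value in claims if prop == "P6"]
        # ... and office held by head of state (P1906) Baz.
        bazs = [value for prop, value in claims if prop == "P1906"]
        for bar in bars:
            for baz in bazs:
                # THEN part: Bar should have position held (P39) Baz.
                if ("P39", baz) not in items.get(bar, []):
                    missing.append((bar, "P39", baz))
    return missing


# Toy data with placeholder labels instead of real item IDs.
items = {
    "Country A": [("P6", "Person X"), ("P1906", "Office Y")],
    "Person X": [],
    "Country B": [("P6", "Person Z"), ("P1906", "Office W")],
    "Person Z": [("P39", "Office W")],
}

print(missing_position_claims(items))  # [('Person X', 'P39', 'Office Y')]
```

The sketch only reports missing claims; whether a rule should flag a violation or add the claim automatically is a separate design question.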

Example 3: Like example 2 but with qualifier

If the 'office held by head of state' claim has a time qualifier (the office has been Baz since some date) then we can write this as
<IF:?Foo> (head of government (P6):?Bar)
<IF:?Foo> (office held by head of state (P1906):?Baz) ('has qualifier':start time (P580))
<THEN:?Bar> (position held (P39):?Baz) ('qualifier greater than or equal to':start time (P580))
meaning: if Foo has head of government Bar, and the office held by Foo's head of state is Baz (with a 'start time' qualifier), then Bar should have position held Baz with a 'start time' qualifier that is the same as or later than (i.e. greater than or equal to) the start time on Foo's 'office held by head of state' claim.
New properties required - 'has qualifier', 'qualifier greater than or equal to', 'qualifier less than or equal to', 'qualifier equal to' all with datatype 'property'.
OK? Joe Filceolaire (talk) 09:21, 30 August 2015 (UTC)
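
The qualifier comparison in Example 3 could be checked along the following lines. This is a sketch under assumptions: claims are modelled as (property, value, qualifiers) triples with qualifiers in a plain dict, dates are `datetime.date` values, and the item labels and the function name `check_example_3` are hypothetical. Only P6, P1906, P39 and P580 are taken from the example.

```python
from datetime import date


def check_example_3(items):
    """Return (bar, baz, problem) tuples for every match of the IF part whose THEN part fails."""
    problems = []
    for foo, claims in items.items():
        bars = [value for prop, value, _ in claims if prop == "P6"]
        # IF part: only P1906 claims that carry a start time (P580) qualifier match.
        offices = [
            (value, quals["P580"])
            for prop, value, quals in claims
            if prop == "P1906" and "P580" in quals
        ]
        for bar in bars:
            for baz, office_start in offices:
                # Start times of Bar's matching 'position held' (P39) claims (None if unqualified).
                starts = [
                    quals.get("P580")
                    for prop, value, quals in items.get(bar, [])
                    if prop == "P39" and value == baz
                ]
                if not starts:
                    problems.append((bar, baz, "missing P39 claim"))
                elif not any(s is not None and s >= office_start for s in starts):
                    problems.append((bar, baz, "P580 missing or too early"))
    return problems


# Toy data with placeholder labels; the dates are invented.
items = {
    "Country A": [
        ("P6", "Person X", {}),
        ("P1906", "Office Y", {"P580": date(1950, 1, 1)}),
    ],
    "Person X": [("P39", "Office Y", {"P580": date(1940, 5, 1)})],
}

print(check_example_3(items))  # [('Person X', 'Office Y', 'P580 missing or too early')]
```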

Discussion

  • I am not sure I fully understand the proposal yet, so I will ask some questions.
    • If you use qualifiers to encode statements in a rule premise, how would you write a rule premise that refers to a qualifier?
      • Just add it as another qualifier using property "has qualifier". See example 3.
        • This would just say that there is some value for this qualifier. Usually, you would want to select a particular value ("people with website accounts on Wikidata") or use a variable that can be copied (e.g., when copying start and end date from premise to conclusion). --Markus Krötzsch (talk) 12:48, 6 September 2015 (UTC)
          • Well I did propose "qualifier same as" for the case where the start/end dates should be the same. Where the condition depends on the qualifier having a particular value my system does break down. Do we need that? Joe Filceolaire (talk) 23:24, 12 September 2015 (UTC)
    • How would you write a rule premise that refers to the absence of a qualifier? ("spouse statements without an end date" etc.)
    • I understand you are using special items like ?foo and ?baz as variable names. But how would you encode variable values for properties of types like Time or Quantity?
      • Specify that if a statement has a property with one of those datatypes, then another statement should have a qualifier with a value equal to/less than/greater than the first value. See example 3.
        • I don't see how variables like ?bar and ?foo for, say, date type properties would be expressed in this way. For example, let's use the rule: "If A is a twin of B, and A has birth date ?X, then B also has birth date ?X". How do you write this? I also don't think that your proposal works for qualifiers, since it only refers to the property (like "start time" in your example) but not to the exact occurrence of this property. The same property could be a qualifier in several statements. The name of the property is not enough to identify the qualifier. --Markus Krötzsch (talk) 12:48, 6 September 2015 (UTC)
          • True about qualifiers. For complicated cases you would need multiple items each dealing with a different case/condition. The syntax I proposed above ties qualifiers to particular claims. If a claim has the same qualifier twice then that would be confusing but I don't think I have ever seen that. (Where it could happen we put the value in twice and give each a qualifier, which should be ok: the IF statement/test will be applied to both these statements and the THEN statement applied in every case that the IF statement/test is true.) Joe Filceolaire (talk) 23:24, 12 September 2015 (UTC)
    • What is the purpose of the "rules that apply to this property" statements on the property pages? Isn't this like making an "items that use this property" statement? I didn't understand why the AND/OR nesting is relevant here. Shouldn't the rule already be clear about what it applies to? How can you overwrite this?
      • Rule items are generic rules that can apply to multiple different cases. The "rules that apply to this property" statement tells which of these generic rules apply to this property and lets you combine them in various ways. Because IF claims in a rule are ANDed together, you may need to break one rule down into a number of separate rule items and then use the "rules that apply to this property" statement to OR these together. Maybe it should be "rules that apply to this entity" instead so we can use it on items as well.
--Markus Krötzsch (talk) 16:24, 3 September 2015 (UTC)
  • Interesting. Have you considered embedding the rules (or perhaps individual antecedents and consequents) as text strings? It would be nice if WDQ were suitable for that. To work really well, we'd want to extend the WikiBase type set to support that syntax, which would allow for auto-completion and formatting. Bovlb (talk) 00:10, 27 October 2015 (UTC)
    • This would not be string datatype, but WDQ or SPARQL datatype, with some facilities to write them as strings. WDQ has "claim" or "tree" constructions that are actually structured, so we could have a JSON representation of this quite easily. Just like RDF has a number of formats. author TomT0m / talk page 09:04, 27 October 2015 (UTC)
      • Yes, that's exactly what I meant. Decomposing into atomic statements and using qualifiers seems to work at first, but ends up rather messy and inextensible. The key would be whether we could add such a complex datatype and get autocompletion, syntax checking, and natural language display. I haven't yet found the documentation for adding datatypes to WikiBase. Bovlb (talk) 18:40, 27 October 2015 (UTC)
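
To make the structured-representation idea in this thread a little more concrete, here is a hedged Python sketch of the twin rule mentioned earlier ("If A is a twin of B, and A has birth date ?X, then B also has birth date ?X") written as a small JSON document and applied to toy facts. The JSON layout, the property labels "twin of" and "birth date", and the functions `match` and `apply_rule` are assumptions invented for this illustration; this is neither WDQ nor SPARQL syntax, and the date is kept as a plain string, which sidesteps the open question of how a Time value would be stored.

```python
import json

# Hypothetical JSON encoding of the twin rule; terms starting with "?" are variables.
RULE_JSON = """
{
  "if":   [["?A", "twin of", "?B"], ["?A", "birth date", "?X"]],
  "then": [["?B", "birth date", "?X"]]
}
"""


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`, or return None if they clash."""
    binding = dict(binding)
    for term, actual in zip(pattern, fact):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing binding.
            if binding.setdefault(term, actual) != actual:
                return None
        elif term != actual:
            return None
    return binding


def apply_rule(rule, facts):
    """Return the conclusions derivable in one step that are not already in `facts`."""
    bindings = [{}]
    for pattern in rule["if"]:
        bindings = [
            extended
            for binding in bindings
            for fact in facts
            if (extended := match(pattern, fact, binding)) is not None
        ]
    derived = {
        tuple(binding.get(term, term) for term in pattern)
        for binding in bindings
        for pattern in rule["then"]
    }
    return derived - set(facts)


rule = json.loads(RULE_JSON)
facts = {
    ("Alice", "twin of", "Bob"),
    ("Alice", "birth date", "1980-02-29"),
}

print(apply_rule(rule, facts))  # {('Bob', 'birth date', '1980-02-29')}
```

Because the variable ?X is bound to whatever value appears in the matched fact, the same mechanism works for item, date, or quantity values, which is the point raised about variables for non-item datatypes.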