Wikidata talk:WikiProject property constraints

On this page, old discussions are archived after 60 days. An overview of all archives can be found at this page's archive index. The current archive is located at Wikidata talk:WikiProject property constraints/Archive 2.

Can we better document separator (P4155) and group by (P2304) for property constraints?

When trying to understand our current system, it's not clear to me what exactly separator (P4155) and group by (P2304) mean. Does separator (P4155) also work for distinct values constraint (Q21502410)? When I first read group by (P2304), I thought it was intended to serve the same function as separator (P4155). Given that's not the case, what is it for? Maybe we could add Wikidata usage instructions (P2559) to the items?
Lucas Werkmeister (WMDE)
Jarekt - mostly interested in properties related to Commons
John Samuel
Yair rand
Jon Harald Søby
Was a bee
Peter F. Patel-Schneider
ZI Jony
  Notified participants of WikiProject property constraints ChristianKl❫ 10:53, 22 June 2020 (UTC)

  • group by (P2304) is just for the layout of the constraint reports. --- Jura 10:55, 22 June 2020 (UTC)
Specifically, for the layout of the bot-generated reports – WikibaseQualityConstraints doesn’t use it. --Lucas Werkmeister (WMDE) (talk) 11:49, 22 June 2020 (UTC)
WikibaseQualityConstraints supports separator (P4155) for single value constraint (Q19474404), single best value constraint (Q52060874) and multi-value constraint (Q21510857). --Lucas Werkmeister (WMDE) (talk) 11:49, 22 June 2020 (UTC)
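The separator behaviour described above can be sketched in Python. This is a minimal illustration, not WikibaseQualityConstraints' actual code: the Statement class and the single_value_violations function are hypothetical, deprecated statements are assumed to be ignored, and two statements are assumed to clash only when they carry identical values for every separator qualifier. P580 (start time) is used only as an example separator.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    # Hypothetical minimal statement model: main value, rank, and qualifiers
    # stored as a tuple of (property ID, value) pairs.
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"
    qualifiers: tuple = ()


def separator_key(stmt, separators):
    """Return the statement's values for the separator qualifiers only."""
    return frozenset((p, v) for p, v in stmt.qualifiers if p in separators)


def single_value_violations(statements, separators=()):
    """Return every statement that clashes with another statement.

    Assumptions: deprecated statements are ignored, and two statements clash
    only if they have identical values for every separator qualifier.
    """
    active = [s for s in statements if s.rank != "deprecated"]
    counts = Counter(separator_key(s, separators) for s in active)
    return [s for s in active if counts[separator_key(s, separators)] > 1]


a = Statement("Q1", qualifiers=(("P580", "2010"),))
b = Statement("Q2", qualifiers=(("P580", "2015"),))
print(len(single_value_violations([a, b])))            # 2: no separator, so they clash
print(len(single_value_violations([a, b], {"P580"})))  # 0: different start times
```

With the separator configured, two values are accepted as long as their qualifier values differ, which is what allows properties with legitimately changing values to keep a single value constraint.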

Proposing new constraint: contemporary with given item

Right now we have contemporary constraint (Q25796498), which allows us to say that an item must have existed at the same time as an item it is linked to through a property. This is great; however, it can only be used with properties that have the data type Item. There are some other properties, especially External identifier properties, which should only be used on items that are contemporary with a specific item. For example, any item which has a USL Championship player ID (P4019) statement should be contemporary with USL Championship (Q1362411). Any person who died before the league was established in 2010 could not have played in the league, and therefore should not have a player ID. Another example is Twitter username (P2002), which should only be applied to items that existed at the same time as Twitter (Q918). There are many other external IDs which link to a source that covers a limited time range, and therefore could benefit from a "contemporary with given item" constraint. There are also likely a few properties other than external IDs which could benefit from this (e.g. items with e-mail address (P968) should be contemporary with email (Q9158)).

I don't know what the process is for implementing a new constraint, but what do others think of this proposal? –IagoQnsi (talk) 02:09, 29 July 2020 (UTC)
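A check for the proposed "contemporary with given item" constraint could look roughly like the following Python sketch. Everything here is hypothetical: the Lifespan class and the contemporary function are illustrative names, lifespans are reduced to years, the given item's lifespan is passed in directly rather than read from a constraint qualifier, and unknown dates are treated as not violating the constraint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lifespan:
    # Hypothetical simplification: years only; None means unknown or open-ended.
    start: Optional[int] = None
    end: Optional[int] = None


def contemporary(a: Lifespan, b: Lifespan) -> bool:
    """Return False only if one lifespan provably ends before the other starts."""
    if a.end is not None and b.start is not None and a.end < b.start:
        return False
    if b.end is not None and a.start is not None and b.end < a.start:
        return False
    return True


league = Lifespan(start=2010)            # USL Championship, established 2010
player = Lifespan(start=1920, end=1990)  # hypothetical person who died in 1990
print(contemporary(player, league))      # False: a player ID here would be a violation
```

Treating missing dates as compatible keeps false positives low, at the cost of missing violations on items that have no dates at all.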

Distinct best value constraint

The idea is that a given value (e.g. "AI") should be present only once among best-rank values. "AI" has been reassigned.

I added that as complex constraint at Property_talk:P297.

See also Wikidata_talk:WikiProject_property_constraints/Archive_2#Distinct_best_value_constraint?. --- Jura 12:22, 23 September 2020 (UTC)

  • Not sure whether another similar property constraint would be the best solution here. I tend to prefer a separator (P4155)-like approach, similar to how we use it for "single value constraint" and "single best value constraint". In other words: have some qualifier which needs to have distinct values when the identifier (or property in general) cannot have distinct values for whatever reason. This would also require changes to the "distinct values constraint". —MisterSynergy (talk) 12:45, 23 September 2020 (UTC)
  • The statements at Q1450765#P297 and Q25228#P297 have an end date or start date qualifier. I don't think a query with best rank should show both items.
    The nice thing with that identifier is that the dates are known. That might not be the case for air-transport-related codes that get reassigned. --- Jura 12:58, 23 September 2020 (UTC)
    • "start date" and "end date" are not necessarily the only possible separators here. One needs to find a solution which fits the situation regarding the problematic property.
      In general, we now know that a strict single value constraint is too restrictive for most (identifier) properties, so the "single best value constraint" makes sense, although unfortunately we do not use it very much. The situation with "distinct values" is quite different in my opinion, as I expect only very few properties to potentially use a "distinct best value constraint". —MisterSynergy (talk) 16:59, 23 September 2020 (UTC)
      • I think it's better to rely on ranks than some separator qualifier. Already in the AI sample above, a start and an end date qualifier would have to be compared somehow.
        The question is somewhat different from the "single value constraint".
        To see how many more there are, I created a P31 value: Wikidata property for an identifier value that can be re-assigned (Q99543626). --- Jura 17:47, 23 September 2020 (UTC)
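The "distinct best value" idea can be sketched in Python, assuming a hypothetical input that maps each item ID to its (value, rank) pairs for one property. An item's best values are taken to be its preferred values if any exist and its normal values otherwise; deprecated values never count. Q_OLD and Q_NEW are placeholder IDs, and both function names are illustrative.

```python
from collections import defaultdict


def best_values(statements):
    """Return the values of the best rank present: preferred, else normal."""
    for rank in ("preferred", "normal"):
        values = [value for value, r in statements if r == rank]
        if values:
            return values
    return []  # only deprecated statements, or none at all


def distinct_best_value_violations(items):
    """Map each value that is a best value on more than one item to those items.

    items: dict from item ID to a list of (value, rank) pairs for one property.
    """
    holders = defaultdict(set)
    for item_id, statements in items.items():
        for value in best_values(statements):
            holders[value].add(item_id)
    return {value: ids for value, ids in holders.items() if len(ids) > 1}


items = {
    "Q_OLD": [("AI", "normal")],  # placeholder: earlier holder of the code
    "Q_NEW": [("AI", "normal")],  # placeholder: current holder of the code
}
print(distinct_best_value_violations(items))  # reports "AI" on both items
```

In this sketch, only the rank of the statements decides whether the reassigned code is reported; start and end date qualifiers are not compared.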

Proposal: move all identifier properties from "single value constraint" to "single best value constraint"

Currently most of our identifier properties use the single value constraint (Q19474404) (4721 identifiers), and only very few the single best value constraint (Q52060874) (37 identifiers); see also this query. This is probably the outcome of two factors:

  1. routine; "single value" is older and much better known by editors than "single best value"
  2. software support; in particular, the KrBot constraint violation (covi) reports did not support "single best value" for quite some time, and using it broke the entire report (since solved)

However, given our current knowledge about identifiers, external databases, and the duplicate entries and redirecting identifiers in them, I think that single best value constraint (Q52060874) should be the standard constraint for all identifier properties, and single value constraint (Q19474404) should only be used if there is a specific reason (and consensus) to do so. I am therefore considering ways to seek community consensus for a systematic change, probably through an RfC. Before I draft something, I would like to hear opinions on this matter from this smaller, more expert group of editors. Thanks in advance for any input! —MisterSynergy (talk) 17:29, 23 September 2020 (UTC)
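The practical difference between the two constraints can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the WikibaseQualityConstraints implementation: statements are hypothetical (value, rank) pairs, the single value check is assumed to ignore deprecated statements, and the single best value check counts only statements of the best rank present. The function names and identifier values are made up.

```python
def non_deprecated(statements):
    """Values of all statements that are not deprecated."""
    return [value for value, rank in statements if rank != "deprecated"]


def best(statements):
    """Values of the best rank present: preferred if any, else normal."""
    for rank in ("preferred", "normal"):
        values = [value for value, r in statements if r == rank]
        if values:
            return values
    return []


def violates_single_value(statements):
    # Assumption: deprecated statements are not counted.
    return len(non_deprecated(statements)) > 1


def violates_single_best_value(statements):
    # Only statements of the best rank present on the item are counted.
    return len(best(statements)) > 1


ids = [("12345", "normal"), ("67890", "normal")]
print(violates_single_value(ids), violates_single_best_value(ids))  # True True
ids = [("12345", "preferred"), ("67890", "normal")]
print(violates_single_value(ids), violates_single_best_value(ids))  # True False
```

Marking one of two valid values as preferred satisfies the single best value constraint but not the single value constraint, which is the trade-off raised in this thread.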

  • Maybe we could remove all that have exceptions?
    VIAF has a single value constraint, but now that users try to solve them bottom up it's not entirely clear if it helps them. Certainly, I don't think one should try to determine a "best" value. --- Jura 17:35, 23 September 2020 (UTC)
    • We cannot make all situations immediately solvable just by selecting one of these constraints, particularly if there is more than one profile in the external database. However, "single best value" gives a better hint on how to properly resolve multiple-value situations using ranks in many scenarios, so that actual duplicate profiles remain constraint violations, while redirects and similar cases, which we currently often see as violations, no longer do.
      Some current "single value constraints" definitely need a closer evaluation before they could potentially be upgraded. Defined exceptions are generally not a good idea due to poor scaling (and in my opinion just technical debt from the early times of the constraint violations system). —MisterSynergy (talk) 18:13, 23 September 2020 (UTC)
      • I think the main advantage of the single value constraint is that it prevents bots from adding additional statements without especially overriding them. I don't think this requires defining exceptions for cases where multiple values are present. Frequently there are two valid values and both should be kept. Setting one to preferred eliminates the other from checks and this may not be desirable. It's clear that this can be something new to learn for those tempted to solve "all" constraint violations. --- Jura 18:20, 23 September 2020 (UTC)
        • After some thinking, I am convinced that bots are not a factor here. The predominant bot framework, pywikibot, does not include any constraint support anyway, so it is entirely up to the bot operators to handle all of this properly. Tool-wise, we have in particular QuickStatements (which does not add an existing value a second time regardless of rank, but otherwise ignores the constraint), HarvestTemplates (not sure exactly how it behaves), and OpenRefine (which claims to support both constraints). —MisterSynergy (talk) 20:43, 23 September 2020 (UTC)
          • What other indicator should operators use to check if one or several values are expected?
            Anyways, I don't see how defaulting to "best rank" would solve the problem with cases where once in a while more than one contemporary identifier is possible (this type). --- Jura 09:06, 24 September 2020 (UTC)