Wikidata:Property proposal/verifiability of property
verifiability of property
Originally proposed at Wikidata:Property proposal/Generic
Description | Verifiability of this property, one of "Verified", "Human verifiable", "no value" |
---|---|
Data type | Item |
Domain | property |
Allowed values | "Verified", "Human verifiable", "no value" |
Example 1 | stated in (P248) → "Verified" |
Example 2 | imported from Wikimedia project (P143) → "Human verifiable" |
Example 3 | retrieved (P813) → no value |
Example 4 | reference URL (P854) → "Verified" |
Planned use | Sourcing SPARQL query |
Robot and gadget jobs | no |
Motivation
We need a way to ensure that results from the Wikidata Query Service are reliably sourced. Currently there is no simple way to require a higher reliability standard in the Wikidata Query Service than "wdt", which means that unsourced statements, or statements imported by bots from Wikipedia, appear in query results. Simply requiring that a source be provided doesn't work, because the source can be imported from Wikimedia project (P143) Wikipedia (Q52). Requiring any source other than imported from Wikimedia project (P143) doesn't work either, because imported from Wikimedia project (P143) can be accompanied by retrieved (P813), and the retrieved (P813) triple will be matched by the SPARQL engine as a valid source. On top of that, we are starting to see new equivalents of imported from Wikimedia project (P143), such as Wikimedia import URL (P4656) and inferred from (P3452).
Using this new property, we can write a SPARQL query that requires the data to be properly sourced. Example:
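The second failure mode described above can be sketched as a query. This is only an illustration and not part of the proposal: it assumes the standard Wikidata Query Service prefixes (p:, ps:, pr:, prov:, wikibase:), and the variable names and the LIMIT are arbitrary choices made to keep the sketch cheap to run.

```sparql
# Naive attempt: accept any reference triple whose property is not P143.
SELECT ?item ?value ?refprop ?reference WHERE {
  ?item p:P31 ?statement .
  ?statement ps:P31 ?value ;
             a wikibase:BestRank ;
             prov:wasDerivedFrom ?refnode .
  ?refnode ?refprop ?reference .
  # This excludes only the P143 triple itself. A pr:P813 (retrieved) triple
  # on the same reference node still matches, so the statement is returned.
  FILTER(?refprop != pr:P143)
}
LIMIT 100
```

A reference holding only imported from Wikimedia project (P143) plus retrieved (P813) therefore still counts as "sourced" under this filter, which is the problem the proposal sets out to address.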
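SELECT ?item ?value ?reliablesource ?reference WHERE {
  # Best-rank P31 statements that carry at least one reference triple.
  ?item p:P31 [ ps:P31 ?value ;
                a wikibase:BestRank ;
                prov:wasDerivedFrom [ ?reliablereference ?reference ] ] .
  # Keep only reference properties marked "Verified"; the example.com IRIs
  # stand in for the proposed property and item.
  ?reliablesource <http://example.com/verifiability> <http://example.com/Verified> ;
                  wikibase:reference ?reliablereference .
}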
Alternatively, we can link the new items "Verified" or "Human verifiable" to properties using existing properties, for example instance of (P31) or has characteristic (P1552). But with a new property we can also define a hierarchy of reliability. For example, if the statement "Verified" next lower rank (P3729) "Human verifiable" exists, we can write the following SPARQL query, which requires each result to have at least one source, even if that source is linked to Wikipedia (Q52), but excludes unsourced statements and bogus sources such as a reference with only retrieved (P813):
SELECT ?item ?value ?reliablesource ?reference WHERE {
  ?item p:P31 [ ps:P31 ?value ;
                a wikibase:BestRank ;
                prov:wasDerivedFrom [ ?reliablereference ?reference ] ] .
  # Follow next lower rank (P3729) zero or more times, so that "Verified"
  # properties also satisfy the "Human verifiable" requirement.
  ?reliablesource <http://example.com/verifiability>/wdt:P3729* <http://example.com/HumanVerifiable> ;
                  wikibase:reference ?reliablereference .
}
Midleading (talk) 12:28, 10 December 2019 (UTC)
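For comparison, the instance of (P31) alternative mentioned above might look roughly like this. It is a sketch under assumptions: <http://example.com/Verified> is the same placeholder used in the queries above for an item that does not exist yet, the standard Wikidata Query Service prefixes are assumed, and the LIMIT is added only to keep the query light.

```sparql
# Same pattern as the first query, but the classification is carried by
# instance of (P31) on the reference property instead of a new property.
SELECT ?item ?value ?reliablesource ?reference WHERE {
  ?item p:P31 [ ps:P31 ?value ;
                a wikibase:BestRank ;
                prov:wasDerivedFrom [ ?reliablereference ?reference ] ] .
  # <http://example.com/Verified> is a placeholder, not a real item.
  ?reliablesource wdt:P31 <http://example.com/Verified> ;
                  wikibase:reference ?reliablereference .
}
LIMIT 100
```

The only change is the predicate linking the reference property to its classification, so the choice between the two models is mostly a question of ontology rather than query shape.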
Data Model: There are multiple ways to model source reliability. We should decide which one we want to use.
- "Generic reliable source" > "Unreliable source", as proposed above.
- "Reliable source from trusted organizations and governments" > "Generic reliable source" > "Unreliable source". Defines a multi-level reliability hierarchy. More levels can be added if necessary.
- "Reliable source of science", "Reliable source of law", "Reliable source of medicine", "Reliable source of bibliography", "Generic reliable source", "Unreliable source". Defines reliability by area.
- "Primary source", "Secondary source", "Tertiary source", "Generic reliable source", "Unreliable source".
We also need to discuss the criterion used to define this property: is it about verifiability or about reliability? Midleading (talk) 08:06, 11 December 2019 (UTC)
Discussion
- Support, seems interesting. Nomen ad hoc (talk) 18:29, 10 December 2019 (UTC).
- Support I think it would be very useful. --Tinker Bell ★ ♥ 20:28, 10 December 2019 (UTC)
- Support David (talk) 07:01, 11 December 2019 (UTC)
- Comment doesn't reliability/verifiability attach to the source, not the way in which it is referenced (i.e. why is this a property to be attached to reference properties?) I can link just about anything I want with reference URL (P854), that doesn't mean it's either reliable or verifiable! (for example if the website domain has disappeared it can probably no longer be verified...) This doesn't feel like the right approach to modeling this. To identify types of reference properties it would probably be better to just subclass Wikidata property to indicate a source (Q18608359), no? ArthurPSmith (talk) 20:03, 11 December 2019 (UTC)
- Using a subclass technically works, but the new items used to specify property verifiability are subclasses of Wikimedia internal item (Q17442446), as they are specific to Wikidata. The properties themselves are a subclass of Wikidata property to indicate a source (Q18608359) but not an indirect subclass of Wikimedia internal item (Q17442446). Also, this new property uses the parent relationship next higher rank (P3730)/next lower rank (P3729), which is different from subclass of (P279). This property doesn't indicate that the results are actually reliable; the statement can still be incorrect or vandalized. But the reference is listed in the query result, so you can verify it by consulting the sources. Without this property, you will be overwhelmed by a huge list of values without any means to verify them.--Midleading (talk) 02:25, 12 December 2019 (UTC)
- Properties are specific to Wikidata too, I'm not sure what your point is there. Here's specifically what I'm suggesting: create new items like "Wikidata property to indicate a verified source", "Wikidata property to indicate a human-verifiable source" as subclasses of Wikidata property to indicate a source (Q18608359), and make stated in (P248) an instance of the first, imported from Wikimedia project (P143) an instance of the second, and leave retrieved (P813) as it is. You can relate the new items with next lower rank (P3729) etc. relationships if you like, and a modified version of your SPARQL will work the same way, but using the subclass relation instead of a new property. ArthurPSmith (talk) 18:25, 12 December 2019 (UTC)
- I don't think properties are specific to Wikidata, because many instances of Wikidata property to indicate a source (Q18608359) have an external equivalent property, for example publication date (P577) = schema:datePublished. Making them an instance of Wikimedia internal item (Q17442446) is just like saying Universe (Q1) is an instance of "Wikipedia article". I'm fine with using a subclass instead of a new property, provided we keep the ontology stable so that we aren't forced to use an ugly/slow "?reliablesource wdt:P31/(wdt:P279|wdt:P3729)* wd:Q123456789" and/or a VALUES statement in the coming years, and provided the new subclass can't be used to infer that the properties are Wikimedia-only.--Midleading (talk) 05:29, 13 December 2019 (UTC)
- They're already listed as instances of Wikidata property to indicate a source (Q18608359), so whether or not you consider them Wikimedia-only, making them instances of subclasses of that won't change anything in that respect. We also have many existing items that were created for purposes like this and should remain stable, for example all the property constraint-related items. ArthurPSmith (talk) 18:47, 13 December 2019 (UTC)
- Support --Trade (talk) 22:33, 14 December 2019 (UTC)
- Depending on the source "stated in" can have different meanings. Sources for which we use "stated in" can be of very different quality. I do understand the desire to be able to query better but the approach that ArthurPSmith proposed seems to be more effective for that purpose and will take less toll on our query service, so I Oppose the proposal as it stands. ChristianKl ❪✉❫ 07:54, 18 December 2019 (UTC)
- If there is consensus to create the two new items and use instance of (P31) instead, can somebody create the two new items, link them to the examples given and close this property proposal?--Midleading (talk) 15:24, 21 December 2019 (UTC)
- Comment So is this accepted or rejected please? I don't know whether to use the above queries or those I have been given at https://www.wikidata.org/wiki/Wikidata:Request_a_query#Is_it_possible_to_return_only_referenced_results? Chidgk1 (talk) 11:21, 23 July 2020 (UTC)
- With living people protection class (P8274) created, why can't we have this property, or at least the new items, as well? --Midleading (talk) 15:16, 30 July 2020 (UTC)
- @Midleading: That property exists to express a relationship that's defined by policy. This policy wants to say things about what sources are reliable in the absence of policy. A data model of how we think about source reliability should be codified in policy and RfC approved. ChristianKl ❪✉❫ 11:23, 4 September 2020 (UTC)
- Wikidata:Living_people exists because we protect the rights of living people. If a user violates that policy, he or she might be warned or blocked. But this one? I may violate this "reliable sources" policy, if defined, at any time. Nobody would be blocked. I don't like such a useless policy being codified. If such a policy were defined, it is the statement in the reference that matters, not the property itself. --Midleading (talk) 05:32, 5 September 2020 (UTC)
- Oppose Bogus references with retrieved (P813) only should be removed. One could still exclude references with imported from Wikimedia project (P143). --Haansn08 (talk) 15:35, 30 August 2020 (UTC)
- As explained from the start, this doesn't work because (1) there are references that combine imported from Wikimedia project (P143) and retrieved (P813), and (2) there are other equivalents of imported from Wikimedia project (P143), for example Wikimedia import URL (P4656). --Midleading (talk) 05:32, 5 September 2020 (UTC)
- Oppose I don't get the benefits from this proposal tbh. There should be no credibility differences between bot edits and human edits. If there are low quality bot edits, request the bot to be blocked. If you do not want to use information imported from Wikimedia projects, filter for imported from Wikimedia project (P143). -- Dr.üsenfieber (talk) 09:10, 12 September 2020 (UTC)
- Oppose Filter out statements which are backed only by reference properties which you don't find reliable. If needed, it would also be possible to make reference properties instance of (P31) {some type of classification system for reference properties} if there is a reasonable classification system in existence. --Dhx1 (talk) 14:03, 21 December 2020 (UTC)
- Oppose as per Dr.üsenfieber. --FocalPoint (talk) 09:14, 14 February 2021 (UTC)
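The filtering approach suggested by several commenters above can be approximated today with a hand-maintained allow-list, the VALUES form mentioned earlier in the discussion. The following is a minimal sketch, not an agreed solution: the choice of stated in (P248) and reference URL (P854) simply follows the "Verified" examples in the proposal, the standard Wikidata Query Service prefixes are assumed, and the LIMIT is there only to keep the query light.

```sparql
SELECT ?item ?value ?refprop ?reference WHERE {
  # Hand-maintained allow-list of reference properties treated as reliable.
  # P248 and P854 are taken from the proposal's "Verified" examples; any
  # other entry would be an editorial decision.
  VALUES ?refprop { pr:P248 pr:P854 }
  ?item p:P31 ?statement .
  ?statement ps:P31 ?value ;
             a wikibase:BestRank ;
             prov:wasDerivedFrom ?refnode .
  ?refnode ?refprop ?reference .
}
LIMIT 100
```

Because the list is positive rather than negative, a reference containing only retrieved (P813), or only imported from Wikimedia project (P143) and its equivalents, does not match. The cost is that the list lives in each query rather than in the data, which is the maintenance concern raised in the discussion.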
Not done Clear lack of consensus. JesseW (talk) 02:34, 12 March 2021 (UTC)