Wikidata:Property proposal/exact match

exact match

   Done: exact match (P2888) (Talk and documentation)
Description: The property exact match is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications (tailored around skos:exactMatch, https://www.w3.org/2009/08/skos-reference/skos.html#exactMatch).
Data type: URL
Allowed values: URI
Example
Motivation

Through the Wikidata Query Service (WDQS), content from Wikidata can be extracted and integrated with external SPARQL endpoints using federated queries. To allow these queries, a property is needed that captures the link between a concept in Wikidata and its URI in an external resource. The property equivalent class (Property:P1709) does exist. However, using it to express similarity is a bit problematic, because the W3C definition states: "NOTE: The use of owl:equivalentClass does not imply class equality." I would like to propose the property exact match, tailored around skos:exactMatch, to store mappings to external URIs, allowing Wikidata content to be queried from external SPARQL endpoints. --Andrawaag (talk) 19:46, 25 April 2016 (UTC)

Discussion
  •   Support since, as stated above, the equivalent class property is not suitable for this kind of use. skos:exactMatch is a perfect fit for this. Emitraka (talk) 20:32, 25 April 2016 (UTC)
  •   Support. Another alternative would be owl:sameAs. I think the SKOS approach will generally work in more cases than the OWL approach, so I am supportive. But the same reason also means that less powerful automated reasoning is possible, so there is a downside: e.g. numerous OWL reasoners can process sameAs and correctly merge nodes. Is there equivalent support for automating node merges via skos:exactMatch? --I9606 (talk) 21:22, 25 April 2016 (UTC)
SKOS has the following mapping properties: skos:closeMatch, skos:exactMatch, skos:broadMatch, skos:narrowMatch and skos:relatedMatch. skos:exactMatch is a sub-property of skos:closeMatch and is also transitive, which allows some reasoning to be applied to merge nodes. --Andrawaag (talk) 12:20, 26 April 2016 (UTC)
  •   Support Egon Willighagen (talk) 06:26, 27 April 2016 (UTC)
  •   Support --jjkoehorst (talk) 10:23, 28 April 2016 (UTC)
  •   WikiProject Ontology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
  •   Comment There was already a proposal to create owl:sameAs (/Archive/15), which resulted in the creation of described at URL (P973). --Pasleim (talk) 21:46, 25 April 2016 (UTC)
    • Which ended up being schema:sameAs, it seems. It would be good if someone could compare owl:sameAs, schema:sameAs, and skos:exactMatch, especially the difference between the latter two, as the OWL version differs from them. --Srittau (talk) 00:18, 26 April 2016 (UTC)
      • owl:sameAs implies that both can be interchanged. Take, for example, a protein in Wikidata. It can be matched to, for example, a record in UniProt. However, the Wikidata entry contains multiple translations of the preferred label, whereas UniProt is English only. skos:exactMatch is used to link two concepts, indicating a high level of similarity between the concepts both URIs describe, but their properties are not necessarily interchangeable. schema.org:sameAs, on the other hand, is new to me. According to its description (https://schema.org/sameAs), it is the "URL of a reference Web page that unambiguously indicates the item's identity." This seems to be geared more towards human-readable websites: schema.org/sameAs seems more appropriate for pointing to a human-readable page (http://www.uniprot.org/uniprot/Pxxxxxx), in contrast to the URI needed in a SPARQL query (http://purl.uniprot.org/uniprot/Pxxxxxx). --Andrawaag (talk) 06:44, 27 April 2016 (UTC)
        • This "implication" means that a reasoner can apply the properties (and values) of one of the items to the other one, doesn't it? Which means, IMHO, that the items do not have to be identical value by value: the reasoner will just take the union of the facts it knows about both items. author  TomT0m / talk page 18:30, 30 April 2016 (UTC)
  •   Weak oppose I am concerned that this will or could lead to a large number of redundant statements on items: for example, almost all external identifiers could be considered (with their formatter URLs) to be an "exact match" for the item (in the SKOS sense as well as other "same as" senses). And we just moved external identifiers to a separate section to avoid cluttering up the main statements on an item. But if it were made clear that this property should only be used for linking to vocabularies represented with SKOS which do NOT have external-identifier properties in Wikidata, then I think it could be a good idea to cover representing that data here somehow. But the examples as presently given do not meet this criterion: we already have Disease Ontology ID (P699) for the first, which I don't think is a SKOS vocabulary anyway, and UniProt protein ID (P352) for the second. Can you come up with an example where this property really is helpful and not redundant? I will change my oppose to support given better examples. ArthurPSmith (talk) 11:55, 30 April 2016 (UTC)
    • @ArthurPSmith: Those examples are actually taken from real use cases where such a property would be helpful. Take, for example, the one from the Disease Ontology (http://purl.obolibrary.org/obo/DOID_2841). The literal version of that ID is "DOID:2841", which the formatter URL resolves to http://disease-ontology.org/term/DOID:2841. The URI in the Disease Ontology, however, is http://purl.obolibrary.org/obo/DOID_2841. To my knowledge, there is currently no way to reproduce that URI with a formatter URL in Wikidata, since a formatter URL only inserts the ID unchanged and cannot turn the ':' into '_'. Yet these URIs are what external (semantic web) resources use. The same applies to other resources where the prefix became part of the actual identifier (e.g. Gene Ontology ID (P686) and Mouse Genome Informatics ID (P671)). In earlier versions of the property descriptions, the prefix was added as part of the formatter URL. This led to issues, since for most if not all users the prefix is an intrinsic part of the identifier, producing identifiers of the form DOID:DOID:2841. With a property that captures these official URIs, Wikidata content could more easily be integrated through federated queries in external SPARQL endpoints such as sparql.uniprot.org, with a query of the following type:
      • @jerven: Actually, if the formatter could take a URL and format an ID in the interface, instead of taking an ID and formatting a URL, then this property would not be needed. Basically, the addition of this property is a hack around that weakness in the formatter.
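  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

  SELECT DISTINCT ?wd_url ?wduniprot ?uniprot ?wdLabel ?upLabel
  WHERE {
    # Wikidata graph, reached through a federated SERVICE call
    SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
      ?wd_url wdt:P279 wd:Q8054 ;       # subclass of protein (Q8054)
              rdfs:label ?wdLabel ;
              wdt:P352 ?wduniprot ;      # UniProt protein ID, a plain literal
              wdt:P2888 ?uniprot .       # exact match (P2888): the UniProt URI
      FILTER(lang(?wdLabel) = "en")      # keep only English Wikidata labels
    }
    # UniProt graph, local to the endpoint running this query
    ?uniprot rdfs:label ?upLabel .
    ?uniprot ?p ?o .
  }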

--Andrawaag (talk) 20:16, 30 April 2016 (UTC)
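The node-merging question raised earlier in the thread can be made concrete. Because skos:exactMatch is both symmetric and transitive (unlike its super-property skos:closeMatch, which is not transitive), a set of exactMatch links partitions the linked resources into clusters in which every member is implied to be an exact match of every other. The minimal Python sketch below assumes the links have already been extracted as (subject, object) URI pairs and groups them with a union-find structure; the function name and the example.org URIs are hypothetical. It only computes the clusters: unlike an OWL reasoner handling owl:sameAs, it does not copy statements between the clustered resources.

```python
def exact_match_clusters(pairs):
    """Group URIs linked by skos:exactMatch into clusters (list of sets).

    Assumes `pairs` is an iterable of (subject_uri, object_uri) tuples.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen nodes.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for subject, obj in pairs:
        # Symmetry: the direction of the link does not matter.
        # Transitivity: chained links end up under the same root.
        union(subject, obj)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical links between three placeholder knowledge bases.
    links = [
        ("http://example.org/kb1/A", "http://example.org/kb2/A"),
        ("http://example.org/kb2/A", "http://example.org/kb3/A"),
        ("http://example.org/kb1/B", "http://example.org/kb2/B"),
    ]
    for cluster in exact_match_clusters(links):
        print(sorted(cluster))
```

With these links, the first two chain into one cluster of three URIs and the third forms a separate cluster of two.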

@Andrawaag: Well, I suppose that's a reasonable argument, but I am concerned about the UI implications for Wikidata. If we can find examples that are not currently covered by an existing Wikidata property, I would be happier. Even better if the developers were looking at moving URI properties to a separate group similar to (or along with) the external IDs group. Here's a different example: I don't believe we have a Wikidata property covering the AGROVOC SKOS vocabulary from the UN FAO. For instance, shouldn't soil (Q36133) logically be a skos:exactMatch with http://aims.fao.org/aos/agrovoc/c_7156? ArthurPSmith (talk) 14:17, 2 May 2016 (UTC)
@ArthurPSmith: I am not entirely sure identifiers are skos:exactMatch... part of that discussion was actually about the fact that in many cases, at least currently, they are not. This is because the concept modeled in Wikidata is similar to, but different from, the one in the remote database. Wikidata is not really clear in this matter, though some properties suggest more than is really the case. But assuming that any identifier is a skos:exactMatch is just going to cause a lot of trouble (factual inconsistencies). Therefore, IMHO, having this explicit statement is helpful precisely because it's explicit, rather than implicitly assuming identifiers say something about equivalence. Egon Willighagen (talk) 08:26, 3 May 2016 (UTC)

@Markus Krötzsch: Maybe you want to comment on this proposal? --Pasleim (talk) 09:02, 3 May 2016 (UTC)

@ArthurPSmith: I share your concerns about the UI implications for Wikidata. However, they are not specific to this proposed property. There are already 32 URL properties implemented where the same UI concerns apply. I support a request to move URL properties to a separate group, but preferably outside the scope of this property proposal. Currently, Wikidata has a very powerful SPARQL endpoint, yet embedding Wikidata in the Linked Data cloud is challenging. A property capturing URIs for the same concept would instantly allow linking to other resources in the Linked Data cloud. We might even make Wikidata the central node in the cloud. --Andrawaag (talk) 09:09, 3 May 2016 (UTC)
@Andrawaag, Egon Willighagen: Could this be resolved better by a new property for Wikidata external-id properties that works like formatter URL but provides a regular-expression transformation of the ID into a Linked Data URI string, perhaps with an additional qualifier on what kind of relationship it should represent on Wikidata items (skos:exactMatch, skos:closeMatch, or something else)? It seems to me that adding huge numbers of new statements to items just because a ':' should be a '_' (and similar concerns) is wasteful and likely to lead to trouble: what if the ID is corrected on an item but not the "exact match" URI? Or the two statements end up on different items? ArthurPSmith (talk) 14:57, 3 May 2016 (UTC)
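To show what such a regular-expression formatter would have to encode, here is a minimal Python sketch. Each external-id property gets a hypothetical rule made of an ID pattern, a URI template and a relation qualifier; the rule table and the function name are assumptions for illustration, not an existing Wikidata feature. The Disease Ontology rule reproduces the DOID:2841 to http://purl.obolibrary.org/obo/DOID_2841 mapping discussed in the thread, and the Gene Ontology rule assumes the same OBO PURL convention.

```python
import re

# Hypothetical rule table: property ID -> (ID pattern, URI template, relation).
# Neither the table nor the relation labels exist in Wikidata; they only
# illustrate the "regex formatter plus relationship qualifier" idea.
URI_RULES = {
    # Disease Ontology ID, e.g. "DOID:2841"
    "P699": (re.compile(r"DOID:(\d+)"),
             "http://purl.obolibrary.org/obo/DOID_{}",
             "skos:exactMatch"),
    # Gene Ontology ID, e.g. "GO:0008150" (assumed OBO PURL pattern)
    "P686": (re.compile(r"GO:(\d{7})"),
             "http://purl.obolibrary.org/obo/GO_{}",
             "skos:exactMatch"),
}


def id_to_linked_data_uri(prop, ext_id):
    """Return (relation, URI) for an external-ID value, or None if no rule applies."""
    rule = URI_RULES.get(prop)
    if rule is None:
        return None  # no Linked Data rule defined for this property
    pattern, template, relation = rule
    match = pattern.fullmatch(ext_id)
    if match is None:
        return None  # value does not have the expected ID format
    # Only the captured local part is kept, so "DOID:" becomes "DOID_".
    return relation, template.format(match.group(1))


if __name__ == "__main__":
    print(id_to_linked_data_uri("P699", "DOID:2841"))
```

Called with ("P699", "DOID:2841"), it returns ("skos:exactMatch", "http://purl.obolibrary.org/obo/DOID_2841"), while a doubled-prefix value such as "DOID:DOID:2841" returns None. Because the rule is stored once per property rather than once per item, the URI can never drift out of sync with the ID, which is the trade-off this comment raises.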
@ArthurPSmith: I think a main feature is that this property makes the 'additional qualifier' obligatory. I like the expressiveness of your proposal of a mix of, basically, three predicates (qualifier, RDF formatter, and external-id property), but also note that it sounds rather complex (I would support this proposal). This could ultimately replace the current proposal, I guess. However, the current proposal has the advantage that it does not depend on external-id properties, which simplifies that part too. Regarding your consistency concerns, I agree, but I see them as similar to the consistency issues that already exist; in fact, part of that could be addressed with this proposal. For these, however, I would suggest the established route of consistency testing, as is now also done, for example, for ID strings against their expected string format. Egon Willighagen (talk) 15:16, 3 May 2016 (UTC)
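For the Disease Ontology case, the consistency testing suggested here might look roughly like the minimal Python sketch below. It works in the reverse direction (from URI back to ID, as in the formatter idea raised earlier in the thread) and flags exact-match URIs whose embedded ID is missing from the same item's Disease Ontology ID (P699) values, which is the "ID corrected on an item but not the URI" case. The function name, the reverse pattern and the input shapes (plain Python sets and lists rather than data fetched from Wikidata) are assumptions for illustration.

```python
import re

# Assumed reverse rule: recover "DOID:nnnn" from an OBO PURL.
DOID_URI = re.compile(r"http://purl\.obolibrary\.org/obo/DOID_(\d+)")


def inconsistent_exact_matches(doid_values, exact_match_uris):
    """List (uri, implied_id) pairs whose implied ID is not among doid_values.

    doid_values: set of P699 strings on one item, e.g. {"DOID:2841"}
    exact_match_uris: exact match URIs on the same item
    URIs outside the Disease Ontology namespace are ignored.
    """
    problems = []
    for uri in exact_match_uris:
        match = DOID_URI.fullmatch(uri)
        if match is None:
            continue  # e.g. an AGROVOC or UniProt URI: not checked here
        implied_id = "DOID:" + match.group(1)
        if implied_id not in doid_values:
            problems.append((uri, implied_id))
    return problems


if __name__ == "__main__":
    # Hypothetical item whose P699 value was edited but whose URI was not.
    print(inconsistent_exact_matches({"DOID:2840"},
                                     ["http://purl.obolibrary.org/obo/DOID_2841"]))
```

Like the existing checks of ID strings against their expected format, a check of this kind reports mismatches after the fact rather than preventing the edit.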
* OK, after some thought I'm changing my view to   Weak support: I think the plan as it stands is imperfect, but doing this at least for now does seem useful in lieu of better solutions. ArthurPSmith (talk) 17:53, 27 May 2016 (UTC)