Wikidata talk:Identifiers

Classification of identifier properties

At WikidataCon we decided that all identifier properties should be classified under Wikidata property for an identifier (Q19847637) (unless we find some counter-examples), so every property with datatype external identifier should be put there. A list of identifier properties not classified under Wikidata property for an identifier (Q19847637) or its subclasses can be queried this way:

SELECT DISTINCT ?p ?pLabel ?pDescription WHERE {
  # Wikidata property with datatype external identifier
  ?p wikibase:propertyType wikibase:ExternalId .
  FILTER NOT EXISTS {
    # instance of Wikidata property for an identifier (or a subclass)
    ?p wdt:P31/wdt:P279* wd:Q19847637 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
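
For a Listeria page or a bot report, the same query can also be run outside the query-service UI. The following is a minimal sketch, not an established tool. It makes several assumptions: the public endpoint https://query.wikidata.org/sparql accepts a `format=json` parameter, and `[AUTO_LANGUAGE]` (which the UI fills in) is replaced by plain `en`. The function names `fetch_results` and `property_ids` and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service (assumed).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, with "en" instead of the UI-only [AUTO_LANGUAGE].
QUERY = """
SELECT DISTINCT ?p ?pLabel ?pDescription WHERE {
  ?p wikibase:propertyType wikibase:ExternalId .
  FILTER NOT EXISTS { ?p wdt:P31/wdt:P279* wd:Q19847637 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def fetch_results(query: str) -> dict:
    """Send the query to WDQS and return the SPARQL JSON result document."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    request = urllib.request.Request(url, headers={"User-Agent": "identifier-class-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def property_ids(results: dict) -> list:
    """Extract bare property IDs (e.g. 'P1234') from the ?p bindings, sorted numerically."""
    ids = {b["p"]["value"].rsplit("/", 1)[-1] for b in results["results"]["bindings"]}
    return sorted(ids, key=lambda pid: int(pid[1:]))
```

Typical use would be `property_ids(fetch_results(QUERY))`. The parsing step is kept separate so that it can be checked against a saved result file without network access.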

Where can this constraint best be documented and enforced? -- JakobVoss (talk) 09:30, 29 October 2017 (UTC)

I don't believe we have any constraints that depend on the datatype of a property. Also, constraints currently (to the extent they are "enforceable") act on individual statements (though they often depend on the absence or existence of other statements), not on an item or property as a whole, so I'm not sure we have any mechanism for doing this right now. So it is probably best to start by documenting this here and possibly on the talk page of Wikidata property for an identifier (Q19847637). ArthurPSmith (talk) 12:47, 1 November 2017 (UTC)
I added a Wikidata usage instructions (P2559) statement at Wikidata property for an identifier (Q19847637) and asked the property creators how best to document this and make sure it happens. A Listeria page listing unclassified properties could also help. -- JakobVoss (talk) 20:44, 1 November 2017 (UTC)
@JakobVoss: Does that decision mean that all properties with external identifiers should have a unique value? --Pasleim (talk) 13:09, 1 November 2017 (UTC)
@Pasleim: No, because different notions of "unique value" exist. In fact "unique identifier" is a tautology, because all identifiers are unique (otherwise they are just names). Uniqueness as we can measure and enforce it in Wikidata is more a matter of property constraints. In any case, I would be careful about saying anything about uniqueness unless it is very clear what kind of uniqueness is meant. -- JakobVoss (talk) 20:44, 1 November 2017 (UTC)

Typical repair situations

The page is clearly evolving into something very useful, and I would like to suggest some more aspects to cover. A typical situation with identifiers in Wikidata is that they no longer work as expected:

  1. An individual identifier is withdrawn by the external authority (--> deprecate it in Wikidata)
  2. The external database mixes different concepts within one database entry, and the identifier now carries this problem into Wikidata (--> probably remove the bad identifier; what claim should be placed instead (e.g. novalue)? How should the external database be notified?)
  3. The URL structure of an external database changes, but the identifiers remain intact (--> update formatter URL in property)
  4. The external database changes its "identifiers", and possibly its URL structure as well (--> propose a new property; keep the old one, including the identifiers in items, but deprecate its formatter URL)
  5. The external database goes offline (--> deprecate formatter URL, but keep identifiers)

I hope that such aspects could somehow be covered on this page. Any thoughts? —MisterSynergy (talk) 13:26, 31 October 2017 (UTC)

We should sort these cases and give examples. Case 3 is no big deal; cases 1 and 5 are related. An example of case 5 is Dewey Decimal Classification (P1036). A common section grouping these questions could be "How to reflect changes in the external identifier system/database". -- JakobVoss (talk) 12:34, 1 November 2017 (UTC)
Yes, this could work, but I don't have a preference (or plan) for how to implement these ideas. In the past I have missed such an overview of dos and don'ts for identifier maintenance a couple of times, and I think it fits this page, particularly now that it has moved to the Help: namespace. —MisterSynergy (talk) 12:54, 1 November 2017 (UTC)

Deprecate or remove

@MisterSynergy, Horcrux, Ivan A. Krestinin: This discussion should be resumed; I have also linked it in the Project Chat. What should we do with external identifiers that are redirected or withdrawn? Should we delete them or deprecate them? I've seen deletions like this or this and I'm not sure they are correct. VIAF ID (P214) is one of the most important identifiers, so I think keeping obsolete values (perhaps not redirected ones) should be considered. Opinions? --Epìdosis 13:24, 15 September 2019 (UTC)

  • I think it depends on the type of identifier and on what happens to it:
If the identifier was withdrawn or marked otherwise as invalid, I'd deprecate.
If the identifier is redirecting to another one, I'd set that other identifier to preferred rank (this one would remain with normal rank).
If the identifier is re-assigned, I'd set an end date and set "novalue" or a new identifier with preferred rank.
VIAF clusters are a bit different from the usual identifiers, as VIAF keeps reorganizing their components. In this case, I'd continue the current practice of updating the redirecting identifier.
Without prior discussion, I wouldn't apply the VIAF approach to other identifiers.
BTW, I write "I'd" to express my personal opinion on what a bot could do automagically, not what a user should do manually. --- Jura 13:51, 15 September 2019 (UTC)
I work with music a lot and have found that certain identifiers (ISWC, ISRC, etc.) are worth keeping even though they now redirect, because they are used in older or outdated (printed) source material. I usually set these statements to deprecated rank with a reason for deprecation qualifier, typically "redirect". Moebeus (talk) 15:28, 19 September 2019 (UTC)
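
Jura's rules above can be read as a small decision table. The sketch below encodes them under several stated assumptions. The status labels ("valid", "withdrawn", "redirected", "reassigned") and the names `RankPlan` and `plan_for` are hypothetical. The sketch only plans changes and does not edit Wikidata. It follows Jura's suggestion of keeping redirected values at normal rank, not Moebeus' practice of deprecating them with a reason for deprecation.

```python
from dataclasses import dataclass
from typing import Optional

VIAF_ID = "P214"  # VIAF clusters get the "update in place" treatment


@dataclass
class RankPlan:
    old_rank: str                          # rank for the existing statement
    add_end_date: bool = False             # add an end date qualifier to the existing statement
    replace_with: Optional[str] = None     # overwrite the existing value (VIAF practice)
    add_preferred: Optional[str] = None    # value for a new statement with preferred rank


def plan_for(property_id: str, status: str, target: Optional[str] = None) -> RankPlan:
    """Suggest rank handling for one identifier statement (hypothetical status labels)."""
    if status == "valid":
        return RankPlan(old_rank="normal")
    if status == "withdrawn":
        # Withdrawn or otherwise marked invalid: deprecate.
        return RankPlan(old_rank="deprecated")
    if status == "redirected":
        if property_id == VIAF_ID and target:
            # VIAF only: keep updating the redirecting cluster ID.
            return RankPlan(old_rank="normal", replace_with=target)
        # Other identifiers: old value stays at normal rank, target becomes preferred.
        return RankPlan(old_rank="normal", add_preferred=target)
    if status == "reassigned":
        # End the old statement; prefer the new ID, or "novalue" if there is none.
        return RankPlan(old_rank="normal", add_end_date=True,
                        add_preferred=target if target else "novalue")
    raise ValueError(f"unknown status: {status!r}")
```

For example, `plan_for("P214", "redirected", target)` replaces the VIAF value in place, while the same call for any other property keeps the old value and adds the target with preferred rank. In line with Jura's caveat, the sketch does not apply the VIAF approach to other identifiers.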

Authority control identifiers

The subclass structure with Wikidata property for authority control (Q18614948) is a great idea for rating the quality of an identifier/database entry (third-party controlled vs. user-generated content such as social media handles, community projects, etc.). However, as I understand it, this page does not make clear enough in which situations a property with the externalId datatype qualifies for an instance of (P31): Wikidata property for authority control (Q18614948) (or a subclass thereof) claim.

MisterSynergy (talk) 13:06, 1 November 2017 (UTC)

I use wikidata-taxonomy and plain SPARQL to get an overview of subclasses and instances of Wikidata property for an identifier (Q19847637). The classification is not so messy in my eyes, but I suppose a large number of properties could be classified more precisely (maybe Wikidata property to identify horses (Q26883022) is a bit too precise). A task force would be nice if it helps. -- JakobVoss (talk) 11:15, 2 November 2017 (UTC)
Nice tool, thanks for the hint. Are you sure that all those subclasses of Wikidata property for authority control (Q18614948) are good as well? I don't see why they should inherit the "authority control" character. Wouldn't they be better placed as subclasses of Wikidata property for an identifier (Q19847637)? —MisterSynergy (talk) 11:34, 2 November 2017 (UTC)
Frankly, I have thought about merging Wikidata property for authority control (Q18614948) and Wikidata property for an identifier (Q19847637), because so far I have found no important difference. Maybe it also depends on how we define uniqueness. -- JakobVoss (talk) 20:35, 2 November 2017 (UTC)

A SPARQL query gives all identifier properties with their datatype and class.

User:JakobVoss - where is that SPARQL query? Can you publish it, so that readers can verify your claim? 77.179.33.91 22:52, 20 August 2018 (UTC)

See Help:Identifiers/queries, linked as "SPARQL query". -- JakobVoss (talk) 06:14, 21 August 2018 (UTC)

Identifier having datatype = quantity

How can a quantity qualify as an identifier? 77.179.33.91 21:20, 20 August 2018 (UTC)

I think IPA number order (BEING DELETED) (P3917) has the quantity datatype for historical reasons, to enforce integer values. This could also be achieved with the external-id datatype and constraints. -- JakobVoss (talk) 06:16, 21 August 2018 (UTC)
This obvious error should be rectified. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:13, 15 September 2019 (UTC)
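
To illustrate the external-id alternative: the value is then a string, so the integer requirement has to come from a constraint, for example a format constraint with a regular expression. The following is a minimal sketch of such a check, assuming the pattern must match the whole value. The pattern `[1-9][0-9]*` (a positive integer without leading zeros) and the function name are hypothetical choices, not what is actually configured on any property.

```python
import re

# Hypothetical format pattern: a positive integer without leading zeros.
POSITIVE_INTEGER = r"[1-9][0-9]*"


def violates_format(value: str, pattern: str = POSITIVE_INTEGER) -> bool:
    """Return True if the whole string value does not match the pattern."""
    # fullmatch anchors the pattern at both ends, so "12a" or "-4" fail.
    return re.fullmatch(pattern, value) is None
```

Under this pattern, a value such as "3917" passes, while "0", "12.5", "-4" and "0042" would all be reported as violations.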

Meta-Constraints on identifier classes

How can we systematically manage meta-constraints on identifier classes such as Wikidata property for authority control for academic journals (Q57589544)? There should be a statement that links Wikidata property for authority control for academic journals (Q57589544) to periodical (Q1002697) and generates warnings if properties don't have a corresponding constraint. See this manually crafted query:

SELECT ?property ?propertyLabel ?type ?typeLabel WHERE {
  # identifier for periodical
  ?property wdt:P31/wdt:P279* wd:Q57589544 .
  OPTIONAL {
  #FILTER NOT EXISTS {
    # instance-of type constraint in periodical
    ?property p:P2302 [
      ps:P2302 wd:Q21503250 ;
      pq:P2309 wd:Q21503252 ;
      pq:P2308 ?type
    ] .
    ?type wdt:P279* wd:Q1002697 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

-- JakobVoss (talk) 07:41, 29 October 2018 (UTC)
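
Because the constraint pattern sits inside OPTIONAL, properties that lack a matching type constraint still appear in the results, with `?type` unbound. The following is a minimal sketch of post-processing such results client-side. It should yield the same set of properties as the commented-out FILTER NOT EXISTS variant. It assumes the standard SPARQL 1.1 JSON results format, in which unbound variables are simply omitted from a binding. The function name is hypothetical.

```python
def missing_type_constraint(results: dict) -> list:
    """List properties whose rows never bind ?type, i.e. the OPTIONAL part never matched."""
    has_type = {}
    for binding in results["results"]["bindings"]:
        pid = binding["property"]["value"].rsplit("/", 1)[-1]
        # An unbound ?type is absent from the binding dict.
        has_type[pid] = has_type.get(pid, False) or "type" in binding
    missing = (pid for pid, ok in has_type.items() if not ok)
    return sorted(missing, key=lambda pid: int(pid[1:]))
```

Such a list could serve as the "warnings" mentioned above, until a proper meta-constraint mechanism exists.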

Would shape expressions work for this? --Lydia Pintscher (WMDE) (talk) 15:52, 30 October 2018 (UTC)
OK, I should finally try out Wikidata:WikiProject ShEx/How to get started?! Storing and managing schemas as Wikidata pages is on the TODO list, isn't it? -- JakobVoss (talk) 20:00, 4 November 2018 (UTC)
Yeah, I want to make that happen. --Lydia Pintscher (WMDE) (talk) 13:15, 9 November 2018 (UTC)