Wikidata:Property proposal/numeric ID

numeric ID

Originally proposed at Wikidata:Property proposal/Generic

Not done
Description: the stable numeric identifier for an entry on a website, used as a qualifier for potentially unstable human-readable values
Data type: External identifier
Allowed values: Any
Example 1: Kadim Al Sahir (Q1362223) → SoundCloud ID (P3040) → 72830036 (used as a qualifier)
Example 2: Justin Bieber (Q34086) → Genius artist ID (P2373) → 357 (used as a qualifier)
Example 3: Glenn Greenwald (Q5568842) → TED speaker ID (P2611) → 2081 (used as a qualifier)
Example 4: Signal (Q19718090) → Instagram username (P2003) → 8725236333 (used as a qualifier)
Planned use: Ensure that human-readable identifiers (e.g. usernames/slugs) are still current
Number of IDs in source: Usually one, but can be more in rare exceptions.
Expected completeness: eventually complete (Q21873974)
Robot and gadget jobs: Bots should be used to initially fetch the numeric ID and update the human-readable username/slug if it changes.

Motivation

We use human-readable values (e.g. usernames and slugs) as identifiers. These identifiers are easier to obtain and make it possible to link to the full external website experience. However, they are too often unstable and can change, resulting in obsolete statements. We have been trying to solve this issue using website-specific properties such as Genius artist numeric ID (P6351) and Twitter (X) numeric user ID (P6552), but this is limiting and biased towards giant digital monopolies. A better approach would be to have a generic property that can be used as a qualifier for any potentially unstable identifier. For example, if the only identifier we have for an account on SoundCloud is the human-readable username (e.g. oum), it is rendered useless once the username changes, and it becomes difficult to trace the target. As a solution, we should link both the human-readable identifier and, as a qualifier, the site-specific numeric ID (in the oum example: 553726). OsamaK (talk) 06:57, 7 July 2020 (UTC)
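The bot job described in the proposal (keep the numeric ID as a qualifier and refresh the username when it changes) could look roughly like the sketch below. This is only an illustration under stated assumptions: `Statement`, `refresh_username`, and `lookup_username` are hypothetical names, the lookup is a stand-in for whatever site-specific mechanism resolves a numeric ID to the current username, and the renamed username `oum-official` is made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Statement:
    """A human-readable identifier with its stable numeric ID as a qualifier."""
    username: str     # main value, e.g. a SoundCloud username
    numeric_id: str   # qualifier value, e.g. "553726"


def refresh_username(
    stmt: Statement,
    lookup_username: Callable[[str], Optional[str]],
) -> bool:
    """Update stmt.username if the site now reports a different one.

    lookup_username is a hypothetical, site-specific resolver: it takes the
    numeric ID and returns the current username, or None if the account
    cannot be resolved. Returns True when the statement was changed.
    """
    current = lookup_username(stmt.numeric_id)
    if current is None or current == stmt.username:
        return False
    stmt.username = current
    return True


# Stand-in for a real site lookup; the new username here is invented.
fake_site = {"553726": "oum-official"}

stmt = Statement(username="oum", numeric_id="553726")
changed = refresh_username(stmt, fake_site.get)
```

Because the numeric ID never changes, the same call can be repeated on a schedule; it only modifies the statement when the site reports a different username, and it leaves the statement untouched when the account cannot be resolved.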

Discussion

@Jura1: I feel that if we had to generalize a rule on "human-readable identifiers linked to a full-featured website" vs. "stable identifiers accessed via an API or an otherwise limited website version", I would choose the former, human-friendly version, to make contributing far more accessible for users regardless of their technical background. For a cooperative project, this has immense value. The stabilizing work should be left to bots. --OsamaK (talk) 08:37, 7 July 2020 (UTC)
It seems somewhat hard to get Wikibase to change for the above to work.
If the numeric ones are stable, there isn't much input needed once the bot has set it correctly. --- Jura 08:41, 7 July 2020 (UTC)
What kind of change is referred to here? When it comes to social media identifiers, for example, we are already using human-friendly identifiers wherever possible. --OsamaK (talk) 08:47, 7 July 2020 (UTC)
We try (tried) to use stable identifiers. The above samples can only link in your proposal. --- Jura 09:16, 7 July 2020 (UTC)
It's true that linking cannot currently be implemented with such a generic property. The workaround would be to have a separate property for each stable website identifier, or to redefine the existing properties to refer to stable identifiers instead of the human-friendly ones. Neither is practical, and our decision-making mechanism in Wikidata would fail to get this task done (it would take months if not years). I would accept the downside of giving up linking in exchange for one generic property for stable identifiers. --OsamaK (talk) 10:58, 7 July 2020 (UTC)
  •   Oppose The logic seems backwards here to me. I don't understand why we should prioritise human-readable identifiers, which are unstable. This is Wikidata: machine-readable, structured data. You mention in the motivation that these unstable identifiers become useless as soon as they change, but to me that implies that they are inherently useless as standalone identifiers at all times, because with the unstable identifier alone there is no way to tell whether it was ever true or untrue without a stable reference.

    In contrast, with a stable identifier, you know it doesn't change, so you can always access the identifier and check whether the subject is correct to know that it always was or wasn't correct. I think it makes more sense to use the stable numeric identifiers (the true identifiers), and capture username data separately using website username or ID (P554). In theory, any relevant human-readable string (not a true identifier) can be retrieved by machine via access using the numeric identifier. An extended discussion around this topic also took place at Wikidata:Requests_for_permissions/Bot/SilentSpikeBot. --SilentSpike (talk) 11:35, 10 July 2020 (UTC)
@SilentSpike, ChristianKl: I see your point. If the stable identifier is easily accessible, it should be adopted. In other cases, however, the logic behind using the human-readable property is to lower the technical bar required to contribute, which is an immensely worthwhile goal. In the end, this is what Wikimedia projects have always been about. Wikidata, specifically, has always been described as "read and edited by both humans and machines." If the baseline were to require people to dig into HTML or some (private?) API, we would only be open to a small minority of contributors. We need anyone to be able to fill in a Twitter or SoundCloud username with minimal barriers. Sustainability and permanence can be achieved by bots without compromising the mission or utility. --OsamaK (talk) 12:33, 18 July 2020 (UTC)