Wikidata:Requests for comment/Sort identifier statements on items that are instances of human

An editor has requested the community to provide input on "Sort identifier statements on items that are instances of human" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

Problem

Currently, on the page of a human (instance of Q5), the identifiers look randomly sorted. They may be sorted in the order in which they were added to the item, but no "time of addition" is shown to the reader. Even if the current sorting mechanism were transparent to the reader, it would relate not to any characteristic of the identifiers but to editorial processes.

Under current practice, editors can change the order by removing an ID and then re-adding it, as has been done in this example of vandalism.
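The effect can be illustrated with a minimal sketch. It assumes, as suspected above but not confirmed, that identifiers are displayed in the order in which their property was added to the item. The function name `remove_and_readd` and the sample item are hypothetical.

```python
def remove_and_readd(displayed, prop):
    """Model removing a property's statements and adding them back.

    Assumption: display order equals order of addition, so the
    re-added property ends up last.
    """
    remaining = [p for p in displayed if p != prop]
    return remaining + [prop]


# Hypothetical item whose identifiers were added in this order.
identifiers = ["P214", "P213", "P244", "P227"]  # VIAF, ISNI, LC, GND

after_edit = remove_and_readd(identifiers, "P214")
print(after_edit)  # ['P213', 'P244', 'P227', 'P214']
```

Under that assumption a single remove-and-re-add moves VIAF from first to last place, and the page itself shows no trace of why.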

Precedents

Precedents for sorting

| Project | Things | Page dependency of sorting | Sort algorithm | Comment | Wikidata-Item-Example |
| --- | --- | --- | --- | --- | --- |
| Wikidata | InstanceOf, SubClassOf | page independent sorting | sorted first for their importance | | Q937#claims |
| Wikidata | Wikipedia-Sitelinks | page independent sorting | sorted by code | | Q937#sitelinks-wikipedia |
| other Wikimedia projects | external identifiers | page independent sorting | sorted by importance | Template:Authority control (Q3907614), e.g. d:Template:Authority control, species:Template:Authority control, commons:Template:Authority control, en:Template:Authority control, have the identifiers in a fixed sorting order; en:Albert Einstein gives: VIAF: 75121530 LCCN: n79022889 ISNI: 0000 0001 2281 955X GND: 118529579 SELIBR: 184709 SUDOC: 026849186 BNF: cb119016075 (data) BIBSYS: 90053072 ULAN: 500240971 HDS: 28814 MusicBrainz: c98c325e-7277-46e8-8b44-e3517f3e041a MGP: 53269 NLA: 36582360 NDL: 00438728 NCL: 369710 NKC: jn19990002019 ICCU: IT\ICCU\CFIV\035853 BNE: XX834035 SNAC: w63g5cm3 | |
| Wikidata | external identifiers | page dependent sorting | time of addition(?) | Added here for comparison. | Q937#sitelinks |

Proposal

One ID stands out: VIAF. VIAF is not only the most used external ID for humans, but also offers a page with links to several internationally relevant libraries, and all VIAF data is free to download. Backlinks to Wikidata exist too. So it would be nice to make VIAF easily available to the reader by listing it first or among the first.

Would it be possible to list the identifiers with the highest number of uses on instances of human (Q5) first on the item pages? Three options:

  1. all IDs are sorted by overall usage on Q5-items and IDs are listed in that order in the section "Identifiers" - maybe difficult to implement
  2. the top 6 most used IDs come first in the section "Identifiers" (these are all in Wikidata:Database reports/List of properties/Top100)
  3. the single most used ID, VIAF, comes first in the section "Identifiers"

77.180.110.58 16:35, 25 April 2018 (UTC)
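The three options differ only in how many of the most-used properties are pinned to the top. A minimal sketch, not an existing Wikibase feature: `order_identifiers` and `top_n` are hypothetical names, the counts are the 2018-12-15 figures from the Statistics section, and properties that are not pinned are assumed to keep their current relative order.

```python
# Statements on instances of human (Q5), 2018-12-15, from the Statistics section.
USAGE_ON_HUMANS = {
    "P214": 1165258,  # VIAF
    "P213": 957802,   # ISNI
    "P244": 597675,   # LC
    "P227": 588080,   # GND
    "P269": 390831,   # SUDOC
    "P1006": 380783,  # NTA
    "P268": 366985,   # BNF
    "P497": 303348,   # CBDB
    "P345": 279388,   # IMDb
    "P950": 127136,   # BNE
}


def order_identifiers(on_item, usage, top_n=None):
    """Put the top_n most-used properties first; top_n=None ranks all of them."""
    ranked = sorted(usage, key=usage.get, reverse=True)
    pinned = ranked if top_n is None else ranked[:top_n]
    rank = {pid: i for i, pid in enumerate(pinned)}
    # sorted() is stable, so properties that are not pinned keep their order.
    return sorted(on_item, key=lambda pid: rank.get(pid, len(rank)))


on_item = ["P950", "P345", "P214", "P227"]  # hypothetical current display order
print(order_identifiers(on_item, USAGE_ON_HUMANS))           # option 1
print(order_identifiers(on_item, USAGE_ON_HUMANS, top_n=6))  # option 2
print(order_identifiers(on_item, USAGE_ON_HUMANS, top_n=1))  # option 3
```

The sketch ranks purely by the Q5 counts, which puts NTA ID (P1006) in sixth place. Option 2 as worded refers to properties listed in the Top100 report, so its top six may differ.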

Statistics

Statistics for external identifier usage on instances of human

| Property | Plain name | Property number | Statements on instances of Q5 (2018-12-15) | Instances of Q5 having at least one statement (2018-12-15) | Comment |
| --- | --- | --- | --- | --- | --- |
| VIAF ID (P214) | VIAF | 214 | 1165258 | 1154986 | Listed at Wikidata:Database reports/List of properties/Top100 |
| ISNI (P213) | ISNI | 213 | 957802 | 950364 | Listed at Wikidata:Database reports/List of properties/Top100 |
| Library of Congress authority ID (P244) | LC | 244 | 597675 | 596685 | Listed at Wikidata:Database reports/List of properties/Top100 |
| GND ID (P227) | GND | 227 | 588080 | 587202 | Listed at Wikidata:Database reports/List of properties/Top100 |
| SUDOC authorities ID (P269) | SUDOC | 269 | 390831 | 389446 | Listed at Wikidata:Database reports/List of properties/Top100 |
| NTA ID (P1006) | NTA | 1006 | 380783 | 379486 | |
| Bibliothèque nationale de France ID (P268) | BNF | 268 | 366985 | 365604 | Listed at Wikidata:Database reports/List of properties/Top100 |
| CBDB ID (P497) | CBDB | 497 | 303348 | 303228 | |
| IMDb ID (P345) | IMDb | 345 | 279388 | 278826 | Listed at Wikidata:Database reports/List of properties/Top100 |
| Biblioteca Nacional de España ID (P950) | BNE | 950 | 127136 | 126614 | |

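# Counts for one property (here ISNI, P213): number of statements on instances
# of human (Q5) and number of distinct Q5 items with at least one statement.
# The wd: and wdt: prefixes are predefined on the Wikidata Query Service.
SELECT (COUNT(?id) AS ?count_id) (COUNT(DISTINCT ?item) AS ?count_item)
WHERE {
  ?item wdt:P31 wd:Q5 .    # item is an instance of human
  ?item wdt:P213 ?id .     # item has an ISNI value
}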

Discussion

  • I'd rather see the list sorted on alphabet, easier to search that way. Sjoerd de Bruin (talk) 09:56, 26 April 2018 (UTC)
    User:Sjoerddebruin that is different from the treatment of instance of (P31) and subclass of (P279) which are listed on top for their importance. Maybe it could be done for all but VIAF or all but the top 6. With alphabetic sorting for all the most important (VIAF) comes somewhere at the end for users of the English-language interface. 78.55.205.137 11:56, 26 April 2018 (UTC)
  • I believe the wikidata UI already has a mechanism for setting up sorting, you would have to specify the properties in order somewhere, but it should be possible to do this. I don't believe it's possible to have it automatically sort by frequency of use, though - that would have to be generated separately and applied to the sort list. Another good simple default might be to order by property P number... ArthurPSmith (talk) 00:22, 27 April 2018 (UTC)
    Display order is defined at MediaWiki:Wikibase-SortedProperties. --Marsupium (talk) 09:39, 27 April 2018 (UTC)
  • I would also prefer alphabetical sorting. If you're interested in VIAFs, you would very quickly get used to seeing VIAF at or near the bottom of the page. The useful thing is consistent placing, rather than necessarily being at the top. Jheald (talk) 07:07, 27 April 2018 (UTC)
    Probably alphabetically by English label, right? --Marsupium (talk) 09:39, 27 April 2018 (UTC)
    Local language; that would need some software adjustment. Adding all properties to the mentioned system message seems troublesome. Sjoerd de Bruin (talk) 10:48, 3 May 2018 (UTC)
    BTW: Reasonator displays and sorts by English label. --Marsupium (talk) 00:54, 20 June 2018 (UTC)
  • I would have proposed this much earlier on MediaWiki talk:Wikibase-SortedProperties if I didn't think ordering by time of addition has its advantages as well. --Marsupium (talk) 09:39, 27 April 2018 (UTC)
    Marsupium, of course it has benefits, every method has some benefits. But it is not even transparent to the user. 77.179.112.1 09:01, 3 May 2018 (UTC)
    What advantages? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:11, 3 May 2018 (UTC)
    User:Pigsonthewing - the only one I can see is to see what was added earlier and what was added later. But that relates to editorial processes and I see no precedent for Wikidata showing that on item pages, e.g. sitelinks or sections (the external ids section itself) are not sorted on the page by time of addition to a page. 77.179.112.1 10:26, 3 May 2018 (UTC)
    One “can see is to see what was added earlier and what was added later” – exactly! Also statements for one property and qualifiers for one statement are ordered by time of addition. --Marsupium (talk) 04:55, 4 May 2018 (UTC)
    Marsupium, but you didn't provide the relevance. It seems to be relevant to you, but is it to readers? Wikipedias don't do it that way. "Also statements for one property" - very ugly too. On person pages the children are not sorted by date of birth, but by time of addition of the statement to Wikidata. Disgusting. 77.179.61.171 21:28, 4 May 2018 (UTC)
    @77.179.61.171: Yes, both points are right! Now I’d like to have back the times when it was possible to sort the statements manually. --Marsupium (talk) 18:47 and 19:01, 5 May 2018 (UTC)
  • If technically possible, sorting by the most common properties of items with the same instance of (P31)/subclass of (P279) (and breaking ties alphabetically) would be the most intuitive option IMO. Even better, providing the user with the option to choose between different sorting methods (creation order, frequency, alphabetical, P number, curated order from MediaWiki:Wikibase-SortedProperties, etc.) --Waldir (talk) 14:42, 7 May 2018 (UTC)
  • I've two thoughts here -
a) I don't think we should do this just for Q5 items. It would probably be easiest to have a uniform sorting order for all items - of course, as most identifiers are for people, and most people identifiers are only for people, this won't make much of a difference.
b) Any system would be fine as long as they're consistent, but I think a flexible system where we get the order from a list (like we have for normal properties with MediaWiki:Wikibase-SortedProperties) rather than a firm "always sort by most common" or "always sort by label" would be best.
There's a number of identifier properties that naturally group together - for example, the various national biographical dictionaries (ODNB, ADB, DNZB, etc), or the various gallery identifiers for artists. In my own work (UK politicians) there are a handful of common identifiers - P1614, P2015, etc - and it would make a lot of sense to have these always next to each other.
Maybe we could start out with a general rule of "most common IDs first" and then adjust that list so that there's some more natural grouping? New properties can then be tacked on at the end or inserted at a natural position in the sequence. Andrew Gray (talk) 21:50, 7 June 2018 (UTC)
  •   Support Agree to both a) and b). The IDs that are part of VIAF could also form a first group, the six first above at #Statistics are all part of VIAF I think. --Marsupium (talk) 14:44, 16 June 2018 (UTC)
  • Added example where VIAF went down on a page due to vandalism [1]. The vandalism was found because the next edit by that person turned a valid ISNI into an invalid one, which was caught by a tracking category on English Wikipedia (en:Category:Wikipedia articles with faulty authority control identifiers (ISNI)). 2.247.26.14 11:09, 16 June 2018 (UTC)
    @Marsupium: See Q690790, where a vandal managed to move VIAF down, due to ordering by time of last addition. 2.247.26.14 13:53, 16 June 2018 (UTC)
    That's yet another valid point against the current system! :) --Marsupium (talk) 14:46, 16 June 2018 (UTC)
  • I think identifiers should be sorted alphabetically on all items. Importance or usefulness of identifiers is subjective and I don't agree with promoting some external datasets over others by deliberately putting them first. - Nikki (talk) 09:32, 12 July 2018 (UTC)
  • I may be late to this, but anyway... I assume the preferred sort order will vary for people from different backgrounds. For an American, searching for the Library of Congress will be as obvious as for an Englishwoman to search for the British Library. A preference for VIAF is only suitable for member countries, making it unsuitable in the German-speaking area for Austrians, since the Austrian National Library is not a member. Alphabetical order is less confining than the solution currently suggested. Yotwen (talk) 00:23, 11 September 2019 (UTC)
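The orderings raised in this discussion (alphabetical by English label, by P number, and by frequency with alphabetical tie-breaking) can be compared side by side. This is a sketch only: the labels and counts are taken from the Statistics section, while `SORT_KEYS` and `ordered_ids` are hypothetical names, not part of Wikibase.

```python
# (property, English label, statements on Q5 items) from the Statistics section.
PROPERTIES = [
    ("P214", "VIAF ID", 1165258),
    ("P213", "ISNI", 957802),
    ("P244", "Library of Congress authority ID", 597675),
    ("P227", "GND ID", 588080),
    ("P345", "IMDb ID", 279388),
]

# One sort key per ordering mentioned in the discussion.
SORT_KEYS = {
    "alphabetical": lambda p: p[1].casefold(),
    "p_number": lambda p: int(p[0][1:]),
    # Most used first; ties broken alphabetically by label.
    "frequency": lambda p: (-p[2], p[1].casefold()),
}


def ordered_ids(method):
    """Return the property IDs ordered by the named method."""
    return [pid for pid, _label, _count in sorted(PROPERTIES, key=SORT_KEYS[method])]


for method in SORT_KEYS:
    print(method, ordered_ids(method))
```

With English labels, the alphabetical ordering places VIAF ID (P214) last among these five, which is the trade-off debated above.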

Discussing - using MediaWiki:Wikibase-SortedProperties

So far the only technique presented is ordering via MediaWiki:Wikibase-SortedProperties. With that technique it is

  1. impossible to sort alphabetically by user-language
  2. impossible to sort based on instance of/subclass of or any other properties of the item where the ID appears
  3. difficult to sort many or all identifiers
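A minimal sketch of how a single curated list yields a display order, and why it carries the limitations just listed. It assumes the page is a wikitext bullet list with one property ID per line; that format, the sample page content, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical page content; the real page may be laid out differently.
SAMPLE_PAGE = """\
== Identifiers ==
* P214
* P213
* P244
"""


def parse_sorted_properties(wikitext):
    """Extract property IDs from bullet lines, in page order."""
    return re.findall(r"^\*\s*(P\d+)", wikitext, flags=re.MULTILINE)


def apply_curated_order(on_item, curated):
    """Listed properties first, in list order; unlisted ones keep their order."""
    position = {pid: i for i, pid in enumerate(curated)}
    return sorted(on_item, key=lambda pid: position.get(pid, len(position)))


curated = parse_sorted_properties(SAMPLE_PAGE)
print(apply_curated_order(["P345", "P244", "P214"], curated))
# ['P214', 'P244', 'P345']
```

The sort key depends only on a property's position in the one global list. It has no access to the reader's language or to the item's instance of/subclass of values, and any property missing from the list falls back to its existing order.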

Some IDs are restricted to a specific domain, e.g. ISBN (books), ISIN (securities), ISAN (audiovisual works), ISWC (musical works), ISNI (persons), DOI, PubMed... In Wikidata:Database reports/List of properties/Top100, seventeen identifiers occur on more than 500000 item pages:

PubMed ID (P698)	17026971
DOI (P356)	13240130
PMCID (P932)	4108331
GeoNames ID (P1566)	3467183
Global Biodiversity Information Facility ID (P846)	1981754
Encyclopedia of Life ID (P830)	1375495
Freebase ID (P646)	1259903
Entrez Gene ID (P351)	1232579
IRMNG ID (P5055)	1205853
VIAF ID (P214)	1205757
ISNI (P213)	973414
China administrative division code (P442)	742822
GND ID (P227)	618950
Library of Congress authority ID (P244)	580562
IMDb ID (P345)	568573
ITIS TSN (P815)	522258
RefSeq Protein ID (P637)	510276

GND is Germany-centric and LCCN is US-centric, but neither is restricted to items related to Germany or the US. They will probably meet resistance if they appear on top. VIAF, on the other hand, is a cooperation between various institutions from various countries and can be considered more neutral.

@Marsupium, Waldir, Pigsonthewing, ArthurPSmith, Andrew_Gray, Sjoerddebruin: What do you think about starting with VIAF, which should only affect item pages about persons and works? It will not be perfect, but hopefully better than the current order, which is by time of addition of the property to an item. Any vandal can influence that order by deleting a property: if it is re-added, it ends up at the end of the list. There have been more than 400 new external-ID systems since 2018-01-03 [2]. 77.0.203.210 17:17, 19 June 2018 (UTC)

  •   Support using MediaWiki:Wikibase-SortedProperties - no real opinions on what sorting order we use, we can figure that out as time goes by and discuss it separately. For the time being, VIAF or ISNI being first would seem fine to me.
I don't think we really need to worry about specific sorting algorithms by user language - we don't support that for the main properties and they're probably more visible. In practical terms, we won't really need to worry about type of item either - the identifiers found on people or species or buildings or books are so completely different that there won't be much overlap and so each one will effectively have its own sorting order, as a subset of the main list. Andrew Gray (talk) 20:00, 19 June 2018 (UTC)
  •   Support for VIAF first. (As sub-part of this discussion also support for ISNI second. I think we could indeed take the first six most used simply in that order: VIAF ID (P214), ISNI (P213), Library of Congress authority ID (P244), GND ID (P227), SUDOC authorities ID (P269), Bibliothèque nationale de France ID (P268).) --Marsupium (talk) 00:47, 20 June 2018 (UTC)
  •   Support VIAF first, per Marsupium and Andrew Gray. Would also support ISNI second. Fine with having the six as listed by Marsupium. But for humans/persons nothing further for now. 77.0.203.210 01:26, 20 June 2018 (UTC)
  •   Neutral I'd still rather see the list sorted on alphabet. Even if it was only sorted on the English label, that would still give a single consistent ordering for all users across all external ID statements on all objects. According to User:Lea Lacroix (WMDE) (diff), it should be possible to make the ID statements sort according to their labels in the user's own language without too much overhead on either client or server-side, however it would take some development work, with no indication of when that might be schedulable. Jheald (talk) 15:59, 20 June 2018 (UTC)
    Jheald, "I'd still rather see the list sorted on alphabet." - but that is not available. Or do you suggest sorting by English using MediaWiki:Wikibase-SortedProperties, i.e. adding 2800+ properties there and update the edit protected page each time a new externalId property is created? Or do you see any other available option? 2.245.56.30 19:08, 20 June 2018 (UTC)
    I wouldn't see a problem in doing that, inflating MediaWiki:Wikibase-SortedProperties.
    Best would be to let the user have the choice … --Marsupium (talk) 20:09, 20 June 2018 (UTC)
  • I would support adding them to MediaWiki:Wikibase-SortedProperties using the English label for now, plus a ticket asking for it to be made possible to sort using the interface language in the future. - Nikki (talk) 09:32, 12 July 2018 (UTC)
    Nikki, the three supporting votes above refer to starting by adding VIAF to MediaWiki:Wikibase-SortedProperties. Could you clarify if you would support that? Note that Andrew wrote: "no real opinions on what sorting order we use, we can figure that out as time goes by and discuss it separately". Also, if only VIAF is added, then there is no sorting. Adding VIAF would just mean starting the process of giving the IDs a common order, an order which cannot be altered by a vandal via removing a property, which when re-added is at the bottom. 78.52.253.226 20:27, 18 August 2018 (UTC)
    I'm opposed to only adding VIAF, see my comment in the previous section. - Nikki (talk) 09:50, 19 August 2018 (UTC)
    Nikki, i.e. you mean instead of trying out if MediaWiki:Wikibase-SortedProperties works on identifiers by adding one element, all current Identifier-properties should be added in one step? Could you compile a list for that? 77.179.6.85 13:01, 19 August 2018 (UTC)
    Yes, it's easy to do with SPARQL, see this query. That also means we can easily update it in the future. - Nikki (talk) 16:16, 19 August 2018 (UTC)
  •   Support for VIAF ID (P214) first. --Epìdosis 11:27, 28 April 2019 (UTC)
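Compiling the full list from query results, as suggested above, could look like the following sketch. Nikki's linked query is not reproduced here. The sketch assumes its result is available as (property ID, English label) pairs and that the target page takes one bullet line per property; the function name and the sample rows are hypothetical.

```python
def sorted_properties_wikitext(rows):
    """Build bullet lines for the list page, ordered by English label.

    rows: iterable of (property_id, english_label) pairs, e.g. query results.
    Labels are compared case-insensitively; the P number breaks ties.
    """
    ordered = sorted(rows, key=lambda row: (row[1].casefold(), int(row[0][1:])))
    return "\n".join(f"* {pid}" for pid, _label in ordered)


# Sample rows using labels that appear in this discussion.
rows = [
    ("P214", "VIAF ID"),
    ("P213", "ISNI"),
    ("P227", "GND ID"),
    ("P268", "Bibliothèque nationale de France ID"),
]
print(sorted_properties_wikitext(rows))
```

Because the list is derived from data rather than written by hand, it can be regenerated whenever a new external-ID property is created, which is the maintenance concern raised earlier in this section.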