Topic on User talk:Emu

Suggestions for importing IDs from VIAF

Epìdosis (talkcontribs)

Hi (and also @Kolja21)! Over roughly the last two months I have started a massive cleanup of unique-value constraint violations for many authority IDs, mainly on human items, through Listeria lists; on the coordination page, Wikidata:WikiProject Duplicates/VIAF members, I have added the lists I'm using (I am still far from finished, despite having solved at least 3.5k cases) and some brief suggestions for dealing with these cases.

Now I have also collected some best practices as reference for future imports of IDs from VIAF, so that we have a good starting point for future discussions (there will be some, surely); of course the table can be enlarged, so I would like you to add other lines if you have other ideas of possible good filters for reducing the error percentage :)
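For readers who have not worked with these lists: a unique-value constraint violation means that the same ID value appears on more than one item. A minimal sketch of the check, assuming the statements have already been exported as (item, ID value) pairs; the function name and the sample data are hypothetical and are not taken from the Listeria lists:

```python
from collections import defaultdict


def find_unique_value_violations(statements):
    """Return {id_value: [items]} for every ID value used on more than one item.

    `statements` is an iterable of (item, id_value) pairs (hypothetical export format).
    """
    items_by_value = defaultdict(set)
    for item, id_value in statements:
        items_by_value[id_value].add(item)
    # Keep only values shared by at least two distinct items.
    return {
        value: sorted(items)
        for value, items in items_by_value.items()
        if len(items) > 1
    }


# Hypothetical sample: "A1" is attached to two different items.
sample = [("Q1", "A1"), ("Q2", "A1"), ("Q3", "B2"), ("Q1", "A1")]
violations = find_unique_value_violations(sample)
```

Each reported value still needs a manual decision (merge the items, or move the ID to the correct one); the sketch only finds the candidates.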

Kolja21 (talkcontribs)

@Epidos: Thanks for your great work! I don't know how this fits into the list, but there is one problem that should be mentioned: don't add VIAF as the source of a statement (year of birth etc.); try to name the original authority ID instead. This makes checking much easier. There are still persons "born 1950" based on VIAF "born 20th century".

Epìdosis (talkcontribs)

I know it well, and in many cases these persons saw 1950 in Google Knowledge Graph and complained about it on Twitter without knowing that it came from Wikidata (see my last comment in Wikidata:Requests for permissions/Bot/FischBot 8) ... and it also happens, in fewer cases, with 1900 and 1850 (and very rarely with earlier dates).

So I strongly agree with your point; I always say it when I teach courses, and for this reason we didn't apply the UseAsRef icon to VIAF ID (P214) (and ISNI (P213)). Although it's not immediately related to "importing IDs from VIAF" (but to "importing statements"), I was happy to add it to the table. Thanks for suggesting!
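A rough sketch of how such suspicious birth years could be flagged for review, assuming each claim has been reduced to a year, a Wikidata time precision (9 = year, 7 = century) and the item given as its source. The function name is hypothetical, and the use of Q54919 as the VIAF item is an assumption that should be checked before reuse:

```python
# Mid-century years mentioned in this thread as typical artifacts of "born Nth century".
SUSPICIOUS_YEARS = {1850, 1900, 1950}
YEAR_PRECISION = 9  # Wikidata time precision: 9 = year (7 would be century)


def is_suspicious_birth_claim(year, precision, stated_in, viaf_item="Q54919"):
    """Flag a year-precision birth date sourced only to VIAF.

    `viaf_item` is assumed to be the Wikidata item for VIAF; verify before reuse.
    """
    return (
        precision == YEAR_PRECISION
        and year in SUSPICIOUS_YEARS
        and stated_in == viaf_item
    )


flagged = is_suspicious_birth_claim(1950, 9, "Q54919")
not_flagged = is_suspicious_birth_claim(1950, 7, "Q54919")
```

A flagged claim is only a candidate: some people really were born in 1950, so the original authority record still has to be checked.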

Emu (talkcontribs)

Thank you for your work, I have already noticed that you corrected some of my errors of the past! This page is great and will hopefully help us move forward towards higher data quality. Special thanks for channeling my BboberBot rants into something productive. A few remarks:

  • Help:Conflation doesn’t really reflect current practice (and somehow contradicts the proposed action on the page), I’m not sure if we should link this essay.
  • This is a controversial issue (Kolja21 and others have a different opinion), but deprecation of wrong identifiers is a somewhat mixed bag. Since VIAF doesn’t seem to understand ranks, this effectively cements wrong VIAF clusters. I know that it’s highly problematic to bow to VIAF’s quirky ways but … well, good clusters are of value as well.
  • different from (P1889) should probably be mentioned for people that are easy to confuse or are indeed conflated in authority files or VIAF.
  • We might want to add additional guidance about filing error reports – error-report URL or e-mail (P10923) is a good start but there’s a lot of tacit knowledge that some day should probably be written down.
  • The same is true for tricks to force VIAF to reevaluate a cluster (e.g. enter books into Wikidata).

--Emu (talk) 22:07, 21 September 2022 (UTC)

Kolja21 (talkcontribs)

@Emu: I totally agree that "deprecation of wrong identifiers is a somewhat mixed bag". That's why I think it's not wise to keep wrong or outdated IDs in general. It depends on the individual case. Often there is not even a source given.

Epìdosis (talkcontribs)

OK, thanks for your remarks!

  • I agree, Help:Conflation (I had read it only quickly) is not of much help and is partly outdated; I see we also have Help:Split an item (which seems OK) and Help:Conflation of two people (which is OK but ... it is good for items which were created as conflations and contain many statements about person X and many statements about person Y; in most cases, however, the item has 95% statements about person X and only a few about person Y, and usually person Y already has an item, so there is no need to create either one or two new items; this case should probably be added to the page). So I would link Help:Split an item as the main guidance and I would thoroughly edit Help:Conflation and Help:Conflation of two people, if you agree
  • I perfectly agree that deprecation has issues because some (many?) external data reusers, most importantly VIAF, haven't understood (yet) rank differences; for this reason I try to reduce the use of deprecation to the minimum, removing outdated IDs (+1 Kolja!) and also less useful deprecations (typical case: an ID deprecated with reason for deprecated rank (P2241) applies to other person (Q35773207) can be removed if it is present in the correct item; if it is wrongly re-added, it will be spotted through the constraint violation); I have added this indication and I could expand it (but I wouldn't add a suggestion to remove outdated IDs because the RFC is still ongoing, although a bit stalled)
  • different from (P1889) is greatly useful, of course; added
  • already mentioned error-report URL or e-mail (P10923); we should probably start writing down somewhere our additional tacit knowledge :)
  • right (I've never tried it personally, but I remember that it can work), added
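The "typical case" for removing a less useful deprecation, described in the list above, can be written down as a small rule. This is only a sketch under assumptions: the dictionary keys (`value`, `rank`, `reason`) and the function name are hypothetical and do not match any particular tool's data model.

```python
APPLIES_TO_OTHER_PERSON = "Q35773207"  # reason for deprecated rank named in this thread


def can_remove_deprecated_id(statement, statements_on_other_items):
    """Return True if a deprecated ID statement is redundant.

    Redundant here means: it is deprecated with the reason "applies to other
    person" and the same ID value is present, non-deprecated, on another item.
    """
    if statement["rank"] != "deprecated":
        return False
    if statement.get("reason") != APPLIES_TO_OTHER_PERSON:
        return False
    return any(
        other["value"] == statement["value"] and other["rank"] != "deprecated"
        for other in statements_on_other_items
    )


# Hypothetical data: the ID is deprecated on one item and normal on the correct one.
wrong = {"value": "12345", "rank": "deprecated", "reason": APPLIES_TO_OTHER_PERSON}
elsewhere = [{"value": "12345", "rank": "normal"}]
removable = can_remove_deprecated_id(wrong, elsewhere)
```

If the ID is later re-added to the wrong item, the unique-value constraint violation will surface it again, which is why removal is considered safe in this case.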

Kolja21 (talkcontribs)

Three additions (concerning Wikidata in general):

  • Unsourced data: More attention should be paid to sources. Unsourced statements are only a kind of pre-information and can be replaced by a sourced statement. Printed standard works cited in Wikipedia are almost completely absent.
  • Ranks: It is not only external data reusers that can't read ranks. Maintenance lists (constraint violations) need to be improved as well.
  • VIAF: Bots and tools like authority control.js are still not able to read VIAF correctly (IDs with dashes and letters are interpreted incorrectly).
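
The last point can be illustrated with a small sketch. It assumes the failure mode is a tool that treats IDs as numbers; the pattern and the sample ID below are hypothetical, not the real format of any VIAF member ID, and the actual regular expression should come from the property's own format constraint.

```python
import re

# Hypothetical format: alphanumeric groups separated by single dashes.
HYPOTHETICAL_PATTERN = r"[A-Za-z0-9]+(-[A-Za-z0-9]+)*"


def naive_normalize(raw_id):
    """What a careless tool might do: keep digits only (shown as the failure mode)."""
    return re.sub(r"\D", "", raw_id)


def keep_opaque(raw_id, pattern=HYPOTHETICAL_PATTERN):
    """Treat the ID as an opaque string: trim whitespace, validate, never rewrite it."""
    candidate = raw_id.strip()
    return candidate if re.fullmatch(pattern, candidate) else None


sample_id = "ab-12345x"  # hypothetical ID with letters and a dash
broken = naive_normalize(sample_id)   # letters and dash are lost
intact = keep_opaque(" ab-12345x ")   # the ID survives unchanged
```

Validating against a per-property pattern and otherwise leaving the string untouched avoids silently turning one ID into another.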