Property talk:P214/Archive 2

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Recent synchronisation

Hi all! I want to report that @Bargioni: has just finished his huge work of synchronising VIAF clusters and Wikidata: wherever a VIAF cluster linked to a Wikidata item, a P214 statement has been added to that item. The synchronisation, based on the VIAF dump of 2019-11-04, involved ~570k items; the number of P214 statements on Wikidata grew from ~1.5M to ~2M.

Of course, VIAF clusters contain some errors; this is a known problem. If you find problematic additions (e.g. the VIAF cluster mixes two or more different subjects, or links to the wrong Wikidata item), please remove the P214 statement and also report the problem on the page Wikidata:WikiProject Authority control/VIAF errors, which is intended to collect all conflated VIAF clusters: it is really important to let VIAF know which clusters contain errors!

For any questions or suggestions about the VIAF synchronisation, please write here. Thank you very much, and thanks again to Bargioni! --Epìdosis 23:33, 22 November 2019 (UTC)

It's known that VIAF monitors the duplicates we flag, but just in case they're not working on it right now it may be worth giving them a heads up. (I see that the database report is not ready yet, may need to wait a few days.) Nemo 14:13, 23 November 2019 (UTC)
The new report is out and shows an increase of 10-15k for both unique value and single value constraints. Those could be good candidates for a targeted reconciliation effort. Nemo 14:51, 27 November 2019 (UTC)

Not very happy with this import, to be honest. VIAF performs a lot of automatic mix'n'match of their clusters to our items based on "same name and year of birth" (and apparently sometimes only "same name"). A while ago I went through the human items on my watchlist (several thousand sportspeople with several hundred VIAF identifiers): around 10% of the VIAF identifiers turned out to be wrong, and in most cases the reason for that horrendous error rate was their poor mix'n'match process (which was originally done for enwiki articles and then imported to Wikidata). At first glance, the recent import also generated a lot of obviously wrong matches, but there were simply far too many imports on my watchlist to check them all; if I extrapolate the 10% error rate from my earlier observation, we probably have some 60k wrong matches imported. I doubt that this will ever be repaired.
On a side note: mind that VIAF also rearranges matches on their side based on Wikidata P214 claims; this roughly happens once a month, according to the cluster history on their website. Best practice according to my experience is to move a wrongly matched identifier to another (possibly new) item. Unfortunately, VIAF sometimes fucks up the new situation on their side as well, by sort of re-defining their clusters in an inappropriate direction. —MisterSynergy (talk) 11:38, 26 November 2019 (UTC)
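A rough check of the extrapolation in the comment above, assuming (as the comment does) that the ~10% error rate seen on the watchlist sample carries over to all ~570k items touched by the synchronisation:

```latex
0.10 \times 570\,000 \approx 57\,000 \approx 60\,000 \text{ wrong matches}
```

Applying the same rate to the ~0.5M net growth in P214 statements (from ~1.5M to ~2M) gives about 50,000 instead, so under this assumption the robust part of the estimate is the order of magnitude: tens of thousands of wrong matches.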

Another thing: for manual curation of the import it would be cool if the imported raw data could be made available somewhere (Toolforge or so) as a plain text file. Just linewise pairs "QID<TAB>imported VIAF ID" would be sufficient. Is this possible? (@Bargioni) —MisterSynergy (talk) 12:05, 26 November 2019 (UTC)
@MisterSynergy: My Toolforge membership request is pending (Nov 27): https://toolsadmin.wikimedia.org/tools/membership/status/661. Do I have to wait more days? Thx. --Bargioni (talk) 21:56, 5 December 2019 (UTC)
Looking at this page … it seems they have not approved any membership requests since Nov 22, thus I would wait just some more days. This is not extremely urgent in my opinion, we should just make sure that it does not get forgotten completely. —MisterSynergy (talk) 22:02, 5 December 2019 (UTC)
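If the raw data were published as proposed above (plain text, one "QID<TAB>imported VIAF ID" pair per line), reading it for manual curation could look like the minimal Python sketch below. The file format is only the one suggested in this thread, not an existing published file; the function names are hypothetical, and VIAF IDs are assumed to be purely numeric.

```python
import re
from collections import defaultdict
from typing import Iterable

# Hypothetical file format, following the proposal above:
# one "QID<TAB>VIAF ID" pair per line, e.g. "Q4830374\t10944523".
QID_RE = re.compile(r"Q[1-9]\d*")
VIAF_RE = re.compile(r"\d+")  # assumption: plain numeric VIAF IDs only


def parse_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (QID, VIAF ID) pairs, skipping blank or malformed lines."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # not exactly two tab-separated fields
        qid, viaf = (p.strip() for p in parts)
        if QID_RE.fullmatch(qid) and VIAF_RE.fullmatch(viaf):
            pairs.append((qid, viaf))
    return pairs


def group_by_item(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect the imported VIAF IDs per item, for reviewing item by item."""
    by_item = defaultdict(list)
    for qid, viaf in pairs:
        by_item[qid].append(viaf)
    return dict(by_item)
```

Grouping by item makes it easy to check one watchlisted item at a time, which is the kind of manual curation asked for above.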

Cleaning up distinct value constraints from VIAF

  • Given recent complaints about the quality of Wikidata's authority data, I took a look at the distinct value constraint violations of VIAF ID (P214). With ~15,000 violating values, there are a lot of violations.
One pattern I found is the one between Axel Weber (Q4830374) and Axel A. Weber (Q64689), which both use the VIAF ID 10944523. Fortunately, VIAF links back to Wikidata and declares Axel Weber (Q4830374) to be the right person. I think this pattern appears in a number of the distinct value constraint violations, so there would be room for running a bot that cleans up a large number of them. ChristianKl 11:40, 18 June 2020 (UTC)
Be careful with linking back to Wikidata. I'm afraid this is an automated procedure, which might go wrong automatically. Adding another bot script to clean up will certainly remove the constraint violations, but the end result is only correct if we can be sure that VIAF has the correct back-link, not just the first/last/random link. Edoderoo (talk) 12:38, 18 June 2020 (UTC)
@ChristianKl: Very interesting discussion. If there are no objections, I will move it to Property talk:P214, leaving a link here, so as not to scatter it. Could you link the discussion where there were complaints about the quality of Wikidata's authority data? I agree that 15k violations are a lot and that they should be cleaned up (I'm working on it manually these weeks). However, VIAF links to Wikidata are often wrong: first of all, I can cite all these cases of VIAF clusters linking to two Wikidata items, correctly in one case and incorrectly in the other; moreover, non-personal clusters (especially place and organization clusters) are a real mess, and all reports are collected in Wikidata:VIAF/cluster/conflating entities. Some hundreds of violations are due to administrative units (see User talk:Bargioni#VIAF - errori geografici): VIAF often links to the Wikidata item for the main village of an administrative unit instead of the administrative unit as a whole (e.g. Haaren (Q9840) vs. Haaren (Q2309125) or San Esteban de Nogales (Q141191) vs. San Esteban de Nogales (Q24013115)); in these cases I would agree with removing VIAF by bot from the village and leaving it on the administrative unit. In general, I would prefer to have an overview of these 15k violations before running a bot on the basis of VIAF links to Wikidata. --Epìdosis 12:52, 18 June 2020 (UTC)
Sadly, the back-links are rather unreliable, as they are often wrong. E.g. this VIAF entry links to the actor Nam Yoon-su (Q65159194). I removed the VIAF link there, as the VIAF cluster actually describes the researcher of the same name. However, a bot might re-add the wrong VIAF link. One way to solve this could be to create a Wikidata item for the researcher and place the VIAF ID there, hoping that VIAF links back to that Wikidata item when they renew their cluster. --Christian140 (talk) 12:57, 18 June 2020 (UTC)
Okay, it seems the data quality of VIAF is lower than I would have wished. @Epìdosis: feel free to move the discussion. ChristianKl 13:29, 18 June 2020 (UTC)
Moved from https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=1210302862#Cleaning_up_distinct_value_constraints_from_VIAF. --Epìdosis 14:41, 18 June 2020 (UTC)
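A minimal Python sketch of the kind of bot logic discussed in this thread, assuming (QID, VIAF ID) pairs as input and a hypothetical viaf_backlink mapping from each VIAF ID to the item its cluster links back to. It finds distinct value violations and, following the caution above, only proposes review candidates where the back-link points at one of the conflicting items; it cannot tell whether the back-link itself is right, so the output is a list for human review, not for automatic removal.

```python
from collections import defaultdict
from typing import Iterable


def distinct_value_violations(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each VIAF ID that appears on more than one item to those items (sorted)."""
    items_by_viaf = defaultdict(set)
    for qid, viaf in pairs:
        items_by_viaf[viaf].add(qid)
    return {viaf: sorted(qids) for viaf, qids in items_by_viaf.items() if len(qids) > 1}


def suggest_removals(
    violations: dict[str, list[str]], viaf_backlink: dict[str, str]
) -> list[tuple[str, str]]:
    """List (QID, VIAF ID) statements a human might review for removal.

    viaf_backlink is a hypothetical mapping VIAF ID -> QID that the VIAF
    cluster links back to. A violation yields suggestions only when that
    back-link points at one of the conflicting items; otherwise it is skipped.
    """
    suggestions = []
    for viaf, qids in violations.items():
        target = viaf_backlink.get(viaf)
        if target in qids:
            suggestions.extend((qid, viaf) for qid in qids if qid != target)
    return suggestions
```

For the Axel Weber case above, with the back-link pointing at Q4830374, only the statement on Axel A. Weber (Q64689) would be proposed for review.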

How do we deal with the distinct value constraint violations between municipalities and human settlements?

Of the 33,159 current distinct value constraint violations, many are due to the distinction between human settlements and the municipalities in which they are located, which we imported from GeoNames via CebWiki, with the same VIAF ID applied to both items. There are multiple ways we could go about it, but I think we should find a way to remove these from the list.

  Notified participants of WikiProject property constraints ChristianKl 14:23, 20 June 2020 (UTC)

@ChristianKl: In Canada, we can add the places with names in multiple languages, like Lake Winnipegosis (Q934638) or Saskatchewan River (Q3047) and interprovincial feature, like Ottawa River (Q60974). Maybe a solution is to add separator (P4155), like Canadian Register of Historic Places ID (P477)? --Fralambert (talk) 14:49, 20 June 2020 (UTC)
The problem with using separator (P4155), as on Canadian Register of Historic Places ID (P477), is that it would also filter out mistakes where two distinct towns in different locations get mixed up and share the same VIAF ID. ChristianKl 18:21, 20 June 2020 (UTC)
I'm not really convinced by the usefulness of these VIAF clusters, so I generally don't add them. --- Jura 14:53, 20 June 2020 (UTC)
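One way to drop the settlement/municipality pairs from the list without a separator, and so without hiding two distinct towns that share a VIAF ID, is to exclude only violations whose two items are a settlement and the municipality that contains it. A minimal Python sketch; kind and located_in are hypothetical inputs (located_in could, for instance, be derived from located in the administrative territorial entity (P131)), and the QIDs in any example data are made up.

```python
def is_settlement_in_municipality(
    qids: list[str], kind: dict[str, str], located_in: dict[str, str]
) -> bool:
    """True if qids are exactly a settlement and the municipality containing it."""
    if len(qids) != 2:
        return False
    a, b = qids
    # Try both orders, since the violation list does not say which is which.
    for settlement, municipality in ((a, b), (b, a)):
        if (
            kind.get(settlement) == "settlement"
            and kind.get(municipality) == "municipality"
            and located_in.get(settlement) == municipality
        ):
            return True
    return False


def filter_violations(
    violations: dict[str, list[str]], kind: dict[str, str], located_in: dict[str, str]
) -> dict[str, list[str]]:
    """Drop settlement/municipality pairs; keep every other violation for review."""
    return {
        viaf: qids
        for viaf, qids in violations.items()
        if not is_settlement_in_municipality(qids, kind, located_in)
    }
```

Two settlements in different places that share a VIAF ID are not a settlement/municipality pair, so they stay on the list.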

Removal of redirected and deleted IDs


  Notified participants of WikiProject Authority control Hi all! Until recently, redirected and deleted VIAF clusters were periodically removed by KrBot, maintained by @Ivan A. Krestinin:; the bot has now been blocked (see here and here) because it was argued that these clusters should be kept with deprecated rank, adding reason for deprecated rank (P2241): redirection (Q8143062) or reason for deprecated rank (P2241): withdrawn identifier value (Q21441764). While it may be appropriate to act this way for single authority control IDs (such as GND ID (P227)), which should be discussed separately, for VIAF I think it is clearly better to remove redirected and deleted IDs, as the bots have done in past years. Quoting Ivan: "VIAF contains huge number of redirects. Several IDs for each item actually. Adding only some deprecated IDs make our data inconsistent. Adding all deprecated IDs will increase amount of data significantly. Adding such IDs to Wikidata makes the data usage more hard and add nothing to our data quality." I am opening this discussion to reach consensus on this point. If no objections are raised, I will unblock the bot (at least for this task) on 10 June. --Epìdosis 09:18, 4 June 2020 (UTC)

I've already replied on this point at Property talk:P227#Removal of redirected IDs. Please don't create multiple, duplicate discussions. Post them once, and then post pointers in other places. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:43, 4 June 2020 (UTC)
@Pigsonthewing: I do think it makes sense to have two discussions. VIAF only aggregates sources, so it has no information content of its own, only as an aggregator, while GND is a source itself; so there may be users who would like to keep only GND redirects but not VIAF redirects. Obviously there could also be users who would like to delete both, and users, like you, who would like to keep both. --Epìdosis 11:40, 4 June 2020 (UTC)
I disagree that these VIAF redirects should exist on Wikidata. From my perspective, the reconciliation of such numbers is best handled at source. There is a process where ORCiD duplicates are registered and marked in order to look for inaccurate merges. A scholar should have only one ORCiD, one VIAF. I am really happy that they are merged at source, and as long as redirects are handled properly at source we don't have to carry the baggage. Thanks, GerardM (talk) 10:52, 4 June 2020 (UTC)

  Support I support the removal of redirects. --Sotho Tal Ker (talk) 15:19, 7 June 2020 (UTC)

Better to continue in the general RfC: Wikidata:Requests for comment/Handling of stored IDs after they've been deleted or redirected in the external database. --Epìdosis 21:41, 8 June 2020 (UTC)
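The two options debated in this thread can be contrasted in a short Python sketch: removing redirected and deleted IDs, as KrBot did, versus keeping them with deprecated rank and a reason for deprecated rank (P2241) qualifier. The Statement class is a hypothetical stand-in, not the Wikibase data model or any bot's actual code, and the sets of redirected and deleted IDs are assumed to come from elsewhere (e.g. a VIAF dump).

```python
from dataclasses import dataclass, field

REASON_FOR_DEPRECATED_RANK = "P2241"
REDIRECTION = "Q8143062"  # redirection
WITHDRAWN = "Q21441764"   # withdrawn identifier value


@dataclass
class Statement:
    """Hypothetical, minimal stand-in for a P214 statement."""
    value: str
    rank: str = "normal"
    qualifiers: dict[str, str] = field(default_factory=dict)


def apply_policy(
    statements: list[Statement], redirected: set[str], deleted: set[str], policy: str
) -> list[Statement]:
    """Return the statements after applying the 'remove' or 'deprecate' policy."""
    if policy not in ("remove", "deprecate"):
        raise ValueError(f"unknown policy: {policy!r}")
    result = []
    for st in statements:
        if st.value in redirected:
            reason = REDIRECTION
        elif st.value in deleted:
            reason = WITHDRAWN
        else:
            result.append(st)  # still a live cluster: keep unchanged
            continue
        if policy == "deprecate":
            qualifiers = {**st.qualifiers, REASON_FOR_DEPRECATED_RANK: reason}
            result.append(Statement(st.value, "deprecated", qualifiers))
        # policy == "remove": the statement is simply dropped
    return result
```

Under "deprecate", every redirected or deleted VIAF ID stays on the item, which is the growth in data volume that Ivan's quote warns about.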

Closing this point

Single value constraint for multiple VIAF or LC NAF identifiers

I'm new, and this is my first discussion. I'm working on my university's item https://www.wikidata.org/wiki/Q1937387 and getting constraint errors for the VIAF and LC Name Authority File (NAF) identifiers. The problem is that an entity that changes its name gets a new identifier under library cataloging rules, hence the multiple identifiers in the LC NAF and VIAF. But in Wikidata, a new item is not created for a name change; instead we add the new official name with start and end times. So my question is: is this something to worry about? Does it need to be resolved? Every corporate body in the LC NAF, and by extension VIAF, that changes its name will have multiple identifiers. Blrtg1 (talk) 17:22, 16 July 2020 (UTC)

@Blrtg1: Welcome to Wikidata and thanks for writing here! No need to resolve it: in these cases you can keep all the VIAF ID (P214) and Library of Congress authority ID (P244) values; for Library of Congress authority ID (P244), adding the qualifier subject named as (P1810) removes the constraint violation. Ask me or ask here if you have any other questions! --Epìdosis 18:54, 16 July 2020 (UTC)
@Epìdosis: Now I see it! Thanks so much! Blrtg1 (talk) 20:00, 16 July 2020 (UTC)
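Why the qualifier helps can be sketched with a simplified model of a single value constraint with separators, assuming subject named as (P1810) is the separator: statements only conflict when they agree on every separator qualifier, so two IDs qualified with different names no longer count as a violation. This is a Python illustration under that assumption, not the actual constraint-checking implementation.

```python
from collections import Counter

SUBJECT_NAMED_AS = "P1810"


def violates_single_value(
    statements: list[dict], separators: tuple[str, ...] = (SUBJECT_NAMED_AS,)
) -> bool:
    """Simplified single value check with separator qualifiers.

    Each statement is a dict like {"value": "...", "qualifiers": {"P1810": "..."}}.
    Statements are grouped by their separator qualifier values (a missing
    qualifier counts as None); any group with two or more statements conflicts.
    """
    keys = Counter(
        tuple(st.get("qualifiers", {}).get(p) for p in separators)
        for st in statements
    )
    return any(count > 1 for count in keys.values())
```

For example, two identifiers qualified with a former and a current official name pass, while the same two identifiers without qualifiers are flagged.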

Remove single value constraint

Is there an advantage of keeping it? --- Jura 12:18, 2 January 2021 (UTC)

I think that having the single value constraint is mainly useful for having constraint-violation reports, and because it indicates that (at least theoretically) there should be one VIAF cluster for each entity. I would keep it as suggested, as it is now. --Epìdosis 12:38, 2 January 2021 (UTC)
In that case we could set it to deprecated rank. This way KrBot still generates the report, but the violations won't show on items. --- Jura 13:52, 2 January 2021 (UTC)
OK,   Weak support for deprecation of the constraint. --Epìdosis 15:13, 2 January 2021 (UTC)
  • I would support removing this constraint. Not everyone reports incorrect clustering of authority records in VIAF, and not all the agencies that create authorities follow the same cataloging rules, so it is inevitable there will be multiple VIAF IDs for some entities. UWashPrincipalCataloger (talk) 22:40, 2 January 2021 (UTC)
  • Maybe it should be mentioned that some of these clusters are permanent (multiple values remain), while many others get merged or deleted (partly, perhaps, because Wikidata uses them on the same item). subject named as (P1810) is currently set as a separator for the single value constraint, but VIAF labels tend to evolve, possibly even more often than the periodic re-clustering.
    My comments are primarily about VIAF used on items for people, but might apply to the other types as well. --- Jura 13:56, 3 January 2021 (UTC)

Not all VIAF IDs have an associated Worldcat ID

User:Sotho Tal Ker recently added a constraint that items with a VIAF ID should also have a WorldCat ID. While most VIAF IDs have an associated WorldCat ID, plenty of VIAF IDs do not. Do we feel that the constraint is still useful? I personally lean toward removing it, but I also know I have edited rather obscure items recently, so I want to hear others' opinions. Belteshassar (talk) 17:45, 13 August 2021 (UTC)

In my experience it seems valid in most cases, although at least 10% of VIAF IDs indeed don't have an associated WorldCat Identities ID. It can still be a useful reminder for inexperienced users to add WorldCat, but it's true that it can sometimes be misleading. I'm essentially neutral about removing the constraint; both positions have good reasons. --Epìdosis 18:53, 13 August 2021 (UTC)
Yes, I had noticed that. Items without a corresponding Identities link are usually Works and Expressions; all others (Personal, Corporate & Geographic Names) have one, as far as I have seen. The constraint is only a suggestion to help fill in missing links (superfluous for users of User:Bargioni/moreIdentifiers). If you think it is misleading, feel free to remove it. --Sotho Tal Ker (talk) 21:07, 13 August 2021 (UTC)
I think it is fine to let the constraint be. I wasn't sure how common this is so wanted to bring it up for discussion, but I don't have a strong opinion. Belteshassar (talk) 19:35, 14 August 2021 (UTC)
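A Python sketch of the check discussed in this thread, refined with the observation that clusters for Works and Expressions usually lack an Identities link. The claims layout and the cluster_type mapping are hypothetical inputs, and WorldCat Identities ID is assumed to be P7859; this is an illustration, not how the constraint is actually evaluated.

```python
VIAF = "P214"
WORLDCAT_IDENTITIES = "P7859"  # assumption: property ID of WorldCat Identities ID

# Cluster types that, per the observation above, usually lack an Identities link.
EXEMPT_TYPES = frozenset({"work", "expression"})


def missing_worldcat(
    items: dict[str, dict[str, list[str]]],
    cluster_type: dict[str, str],
    exempt: frozenset[str] = EXEMPT_TYPES,
) -> list[str]:
    """Return QIDs that have a VIAF ID but no WorldCat Identities ID.

    items maps QID -> {property ID: [values]}; cluster_type maps a VIAF ID to a
    hypothetical label such as "personal", "corporate", "geographic" or "work".
    Items whose VIAF clusters are all of exempt types are not flagged.
    """
    flagged = []
    for qid, claims in items.items():
        viaf_ids = claims.get(VIAF, [])
        if not viaf_ids or claims.get(WORLDCAT_IDENTITIES):
            continue  # no VIAF ID, or the WorldCat link is already there
        if all(cluster_type.get(v) in exempt for v in viaf_ids):
            continue  # only Work/Expression clusters: no Identities link expected
        flagged.append(qid)
    return flagged
```

Exempting Work and Expression clusters is one way to keep the reminder useful while avoiding the misleading cases mentioned above.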