Wikidata talk:VIAF

Add VIAF based on VIAF member

@Vladimir Alexiev, Kolja21, Epìdosis: 2416 human items have a GND but no VIAF: https://w.wiki/QMA . Sometimes I found that a VIAF was already stored in dewiki. Is there a tool to import VIAF from dewiki? What other ways exist to add a VIAF if a VIAF member ID already exists? PS: I know that 1) there is some delay, so sometimes the member is not yet in VIAF, and 2) my query includes items whose VIAF is deprecated for conflation, so not all 2416 can be solved. MrProperLawAndOrder (talk) 12:03, 9 May 2020 (UTC)
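For readers who cannot open the short link, a query of this kind might look like the sketch below. It is not the text behind https://w.wiki/QMA, which is not quoted in this thread; it only assumes the standard WDQS prefixes, P227 (GND ID), P214 (VIAF ID) and Q5 (human). The function name `build_query` and its flag are made up for illustration.

```python
def build_query(count_deprecated_as_missing: bool = True) -> str:
    """Return SPARQL for humans that have a GND ID but no VIAF ID."""
    # wdt: only sees truthy (non-deprecated) statements, so an item whose
    # only VIAF statement is deprecated still counts as "no VIAF".
    # p: sees statements of every rank, so such items drop out of the result.
    viaf_path = "wdt:P214" if count_deprecated_as_missing else "p:P214"
    return f"""
SELECT ?item ?gnd WHERE {{
  ?item wdt:P31 wd:Q5 ;      # instance of: human
        wdt:P227 ?gnd .      # has a GND ID
  FILTER NOT EXISTS {{ ?item {viaf_path} [] }}   # but no VIAF ID
}}
""".strip()


print(build_query())
```

With the default flag, items whose VIAF was deprecated for conflation stay in the list, which matches the caveat in the PS; passing `False` would leave them out.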

@MrProperLawAndOrder: Very interesting. Importing VIAFs from de:Vorlage:Normdaten is possible through this. However, it may be only partially successful: if a de.wiki article has a VIAF but Wikidata does not, the VIAF has probably been removed from Wikidata, either because it was deleted or because VIAF has reused it for a different entity. Such an import could therefore reinsert some obsolete VIAF clusters along with many good ones. Importing VIAFs from VIAF based on GND is of course possible, but requires knowledge of extracting this data from VIAF dumps, which I don't have. It's certainly doable, though; I will discuss this with @Bargioni: in the next days. --Epìdosis 12:49, 9 May 2020 (UTC)
@Epìdosis: now using the harvesting tool that you mentioned, thank you! @M2k~dewiki: you use it too [1] - is it possible to import VIAF too when you import GND? Two properties at the same time, to avoid extra page loading? MrProperLawAndOrder (talk) 14:51, 9 May 2020 (UTC)
@Mike Peel: your Pi bot creates human items based on Wikipedia articles [2] - if doing it from dewiki, could it also import VIAF and GND? MrProperLawAndOrder (talk) 15:28, 9 May 2020 (UTC)
Possible, but that bit of code is already quite complicated, so I'd prefer not to complicate it further. It sounds like there are other routes to import those values. Thanks. Mike Peel (talk) 15:57, 9 May 2020 (UTC)
Hello @MrProperLawAndOrder: in the last weeks, HarvestTools stopped with the message "loading..." (and never actually imported anything) when trying to import the VIAF (and sometimes also the GND). In some cases it is possible to import GND for entries in de:Kategorie:Wikipedia:GND in Wikipedia vorhanden, fehlt jedoch in Wikidata. A similar category does not exist for VIAF, as far as I know. --M2k~dewiki (talk) 15:47, 9 May 2020 (UTC)
During harvesting, it is possible to check for constraint violations (unique key constraint violations and/or format violations). This helps to find errors in the de-article (wrong or invalid GND and/or VIAF, syntax errors, ...), as well as cases where the GND/VIAF is correct but there are duplicate/redundant articles about the same person, or duplicate Wikidata objects that should be merged. --M2k~dewiki (talk) 15:53, 9 May 2020 (UTC)
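A rough sketch of the two checks mentioned above (format and uniqueness), applied to harvested values before they are written. The regular expression is an assumption modelled on the format constraint of P214 and should be compared with the property page before use; `check_harvest` and the "Example" titles are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern, modelled on the P214 format constraint (not verified here).
VIAF_PATTERN = re.compile(r"[1-9]\d(\d{0,7}|\d{17,20})")


def check_harvest(candidates):
    """candidates: iterable of (article_title, viaf_id) pairs.

    Returns (bad_format, duplicates):
      bad_format: pairs whose VIAF does not match the assumed pattern
      duplicates: {viaf_id: [titles]} for IDs claimed by more than one article
    """
    bad_format = []
    by_viaf = defaultdict(list)
    for title, viaf in candidates:
        if VIAF_PATTERN.fullmatch(viaf):
            by_viaf[viaf].append(title)
        else:
            bad_format.append((title, viaf))
    duplicates = {v: t for v, t in by_viaf.items() if len(t) > 1}
    return bad_format, duplicates


sample = [
    ("Andy Hopper", "38903674"),
    ("Example A", "12345"),
    ("Example B", "12345"),      # same ID twice: merge candidate or error
    ("Example C", "viaf-123"),   # syntax error in the article
]
bad, dup = check_harvest(sample)
print(bad)
print(dup)
```

A duplicate does not say which side is wrong; it only flags the pair for a human to decide between merging and correcting the ID.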
@M2k~dewiki: thank you. Loading from category:Mann took quite long, I almost wanted to give up, but finally 10000 pages were loaded. Around 10% led to a new VIAF in Wikidata. A new de:Kategorie:Wikipedia:VIAF in Wikipedia vorhanden, fehlt jedoch in Wikidata would be a big help in page loading. MrProperLawAndOrder (talk) 16:06, 9 May 2020 (UTC)
@Kolja21, M2k~dewiki: could this category be created in dewiki, probably inside de:Kategorie:Wikipedia:Abweichende Daten auf Wikidata? MrProperLawAndOrder (talk) 15:26, 29 May 2020 (UTC)
@MrProperLawAndOrder: in order to fill this category, the template de:Vorlage:Normdaten would also have to be modified, see section Wikidata-Funktionalitäten. --M2k~dewiki (talk)

Add VIAF from dewiki

@Kolja21, M2k~dewiki: there is

could you create

? In de:Vorlage:Normdaten just duplicate the last section and replace P227 with P214 and GND with VIAF?

Otherwise the harvest tool https://tools.wmflabs.org/pltools/harvesttemplates/index.php?htid=161 will not work properly: it needs a category, and Kategorie:Mann contains so many items that it takes very long to load. And Mann isn't the only category that one can harvest from. MrProperLawAndOrder (talk) 16:23, 31 May 2020 (UTC)

  Support. --Epìdosis 16:34, 31 May 2020 (UTC)
The category has been created, the template has been modified. Also see this discussion. --M2k~dewiki (talk) 16:49, 31 May 2020 (UTC)


Add VIAF from dewiki - 80 errors

@Kolja21:

I don't know why, for several hours now, it says 124 for the category but only ~80 errors for the tool. I reviewed some of the errors, but in several cases the best solution might be to create a GND.

MrProperLawAndOrder (talk) 12:15, 2 June 2020 (UTC)

They are part of the de:Kategorie:Wikipedia:GND fehlt. We are working on it ;) --Kolja21 (talk) 12:48, 2 June 2020 (UTC)
@Kolja21: 38, one less :-) MrProperLawAndOrder (talk) 16:15, 2 June 2020 (UTC)
@MrProperLawAndOrder: These cases aren't about single authority files but clusters. In Alina Cojocaru (Q451422) VIAF is marked as "undesirable merger of identities"; no date given. Now we can guess: Is the problem solved? Yes or no. If no: Which of the five authority files is wrong? None of the entries has a year of birth. Happy guessing. Imho we should check these problems in the 2030s. Till then we have enough work to do. I'm concentrating on the sources: the single authority file. VIAF is not a source. It's a help to find sources. --Kolja21 (talk) 17:21, 2 June 2020 (UTC)
@Kolja21: 34 - some can be solved through merging, yet another duplicate created by Pi bot: https://www.wikidata.org/w/index.php?title=Q90692975&action=history . But for some others it would help to improve the VIAF cluster, and one step is to improve the VIAF sources - that's why I selected with petscan those that have no GND. Creating a GND here means helping to clean two maintenance categories in dewiki + improving WD. Alina is top in the list, probably a conflation between mathematician and ballet dancer. MrProperLawAndOrder (talk) 17:54, 2 June 2020 (UTC)
BTW: Both Alinas are in GND: 2 entries. So again: First we should check the original files; then, in a second step, we can try to add and fix VIAF. --Kolja21 (talk) 19:09, 2 June 2020 (UTC)
@Kolja21: "I selected with petscan those that have no GND. Creating a GND here means helping" ... of course, I selected those that have no GND in dewiki according to de:Kategorie:Wikipedia:GND fehlt. If removing an item from there can be done by adding a GND to dewiki which already exists, this is fine. I just want to empty the lists:
  1. 26 elements in https://petscan.wmflabs.org/?psid=16377568
  2. 59 errors in https://tools.wmflabs.org/pltools/harvesttemplates/index.php?htid=642
  3. 106 elements in de:Kategorie:Wikipedia:VIAF in Wikipedia vorhanden, fehlt jedoch in Wikidata
If the VIAF in dewiki is 100% correct, then it should be added, but something prevents the harvest tool from doing it, e.g. the VIAF is on another item, so the items should be merged in WD. If it is not 100% correct, e.g. due to conflation, then I don't know what the quality procedure in dewiki would be. MrProperLawAndOrder (talk) 21:32, 2 June 2020 (UTC)
Take one example from your list: Andy Hopper (Q92694) = VIAF 38903674. Why? Because it has a link to WD Q38903674. There are 12 more IDs. Do you think anyone is checking all 12 IDs? Very unlikely and often even impossible. In cases like "Hopper, Andrew‏ (sparse)" (National Library of Australia) you can just make a guess. A bot will import these 12 IDs to Wikidata and maybe someone will find a mistake later. If it is obvious that the VIAF is not 100% correct, one should use the REMARK field like you have seen here. --Kolja21 (talk) 23:59, 2 June 2020 (UTC)
@Kolja21: I don't understand why you write this. In the petscan list there are 25 left. Whatever mess is in VIAF, these 25 can get a GND and disappear from GND fehlt and from the petscan list. I do what I can to empty the three places. But adding GND to dewiki or creating new GNDs is not part of my work. I know VIAF has endless errors and that the bots importing data may copy the errors. But as you see below, the clean-up process detected several duplicates. Not listed are the cases where dewiki was on the older item. Re REMARK-field - thanks to GND heroes like user:Silewe and user:Kolja21 the usage of that field can be reduced :-)[3]. MrProperLawAndOrder (talk) 05:23, 3 June 2020 (UTC)
Thank you. We are trying our best. I just wrote this to underline that comparing one WP article or one WD item with one authority file is work I can do. Comparing Wikidata with Wikipedia with two or three VIAF clusters containing multiple authority files is exhausting and often totally confusing. --Kolja21 (talk) 11:23, 3 June 2020 (UTC)


Add VIAF from dewiki - 40 errors

  1. 24 elements in https://petscan.wmflabs.org/?psid=16377568
  2. 37 errors in https://tools.wmflabs.org/pltools/harvesttemplates/index.php?htid=642
  3. 85 elements in de:Kategorie:Wikipedia:VIAF in Wikipedia vorhanden, fehlt jedoch in Wikidata

@Raymond, M2k~dewiki: the page

shows in "Folgende Vorlagen werden von diesem Artikel verwendet:" ("The following templates are used by this article:") a list of names, with one individual, a Wirtschaftswissenschaftler (economist), in between:

  • Joachim Winter (bearbeiten)
  • Joachim Winter (Wirtschaftswissenschaftler) (bearbeiten)
  • Johann Winter (bearbeiten)

The result is that de:Winter (Familienname) is listed at https://tools.wmflabs.org/pltools/harvesttemplates/index.php?htid=642 . The page was created and moved 4 days ago, so maybe a single edit solves the issue. MrProperLawAndOrder (talk) 05:50, 5 June 2020 (UTC)

@MrProperLawAndOrder: Fixed with a null edit. Purge did not help in this case. Raymond (talk) 07:15, 5 June 2020 (UTC)
@Raymond: thank you, didn't know that feature of MW. de:Kategorie:Wikipedia:VIAF in Wikipedia vorhanden, fehlt jedoch in Wikidata - now 76. MrProperLawAndOrder (talk) 08:11, 5 June 2020 (UTC)
@MrProperLawAndOrder: Very new, kindly created 2020-05-31 by User:M2k~dewiki. Raymond (talk) 08:48, 5 June 2020 (UTC)
@Raymond: the null edit? Is that documented anywhere? MrProperLawAndOrder (talk) 09:48, 5 June 2020 (UTC)
@MrProperLawAndOrder: Ah sorry, I misread your last question. It was about the null edit, not the category on de.wiki. The null edit is a very old trick to trigger re-rendering of the page, including some background tasks like regenerating the list of included pages/templates. Raymond (talk) 09:53, 5 June 2020 (UTC)



VIAF cluster conflating humans

@Kolja21, Epìdosis, Mautpreller: if the VIAF ID is ranked as deprecated and has the qualifier reason for deprecation = conflation, then it is not clear to the reader what was conflated. In a later check it might be impossible to see the conflation, because it may no longer exist in VIAF.

There are also two pages:

If the information is stored in items, it can be queried via WDQS, which is not possible for the pages. A benefit of the pages is that changes are easier to track. How about storing it in the items and letting a bot (ideally standard Listeria) write the pages based on the items? MrProperLawAndOrder (talk) 20:15, 13 June 2020 (UTC)

I tend to agree, although I would also continue using the report pages, because they allow describing the problems related to conflation in a more sophisticated way than simply marking the VIAF cluster as conflated. --Epìdosis 20:34, 13 June 2020 (UTC)
But wouldn't it be possible to create qualifiers in order to precisely determine what the conflation is? Taking a real example: Q1125847. VIAF ID is deprecated. Qualifier: reason: conflation. Qualifier: date 13-06-2020. Qualifier: Link to LCAuth concerns wrong person, namely Q96107618. Link to NUKAT concerns wrong person, namely Q96108054. This is exactly the case and it is certainly relevant. In my view, it would be a very good idea to store this in the items. Mautpreller (talk) 20:59, 13 June 2020 (UTC)
@Mautpreller, Epìdosis: only if the information is stored in the item is it available to tools like moreIdentifiers and WDQS and all others using only data stored in the item. MrProperLawAndOrder (talk) 21:24, 13 June 2020 (UTC)
Yes, this is my suggestion. Why not qualify the exact nature of the conflation in the item? I don't know how to do this but it should be possible. Mautpreller (talk) 21:41, 13 June 2020 (UTC)
@Mautpreller: one would need to say something about each source ID, but if the source ID is mentioned in a qualifier, then no additional information can be attached. Another option could be to store it in the properties for the source IDs: state that they are mentioned in a VIAF that is present in P214 and state why this source ID is false here, using "applies to other person". But a tool is needed to do all this, otherwise editing is very tedious. MrProperLawAndOrder (talk) 22:19, 13 June 2020 (UTC)

@Mautpreller, Kolja21, Epìdosis: VIAF is a moving target, so maybe storing the VIAF IDs can be fully automated. WD stores the source IDs, which are less often conflated. Then a bot can store the VIAF and state which source caused the VIAF to be on the WD item. No more stating why it is deprecated, but stating why the VIAF is on an item. And if another item has the ID too, then either it is a duplicate or VIAF is conflating. MrProperLawAndOrder (talk) 00:38, 14 June 2020 (UTC)
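The bot idea above could be sketched roughly as follows. This is only an illustration under assumptions: the lookup from source ID to VIAF cluster is taken as given (for example built from a VIAF dump), and all item and source IDs in the example are made up.

```python
from collections import defaultdict


def assign_viaf(items, viaf_lookup):
    """Derive VIAF IDs from the source IDs already stored on each item.

    items:       {qid: {(source, source_id), ...}}
    viaf_lookup: {(source, source_id): viaf_id}
    Returns (claims, shared):
      claims: {qid: {viaf_id: [(source, source_id), ...]}}  why the VIAF is there
      shared: {viaf_id: [qid, ...]}  VIAF IDs reached from more than one item
    """
    claims = {}
    holders = defaultdict(set)
    for qid, source_ids in items.items():
        per_viaf = defaultdict(list)
        for key in sorted(source_ids):
            viaf = viaf_lookup.get(key)
            if viaf is not None:
                per_viaf[viaf].append(key)   # record the supporting source
                holders[viaf].add(qid)
        claims[qid] = dict(per_viaf)
    # Same VIAF on two items: either duplicate items or a conflated cluster.
    shared = {v: sorted(q) for v, q in holders.items() if len(q) > 1}
    return claims, shared


# Hypothetical data: two items whose source IDs land in the same cluster.
items = {
    "Q_a": {("GND", "g1"), ("LCAuth", "n1")},
    "Q_b": {("NUKAT", "k1")},
}
lookup = {("GND", "g1"): "100", ("LCAuth", "n1"): "100", ("NUKAT", "k1"): "100"}
claims, shared = assign_viaf(items, lookup)
print(claims)
print(shared)
```

The sketch cannot tell a duplicate from a conflation; it only produces the list of shared IDs that a person would then review.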

Indeed, a cluster should not be treated like a stable ID. At the least, VIAF IDs should only be added with a date and checked once in a while. --Kolja21 (talk) 01:19, 14 June 2020 (UTC)
Yes, I agree. There is a permanent problem because of this very cluster nature of VIAF. Maybe I should explain: In the independent "Normdaten" (authority control) system of the German Wikipedia, I discovered in 2016 that the VIAF ID of de:Henry Bauer linked to a LCAuth data set that refers to another person (the article was written by me so I know exactly). So I removed the LCAuth data in the German Wikipedia authority control system. I had no big problem with the VIAF ID because most links are okay (NUKAT is wrong but the rest is fine). Now, in a discussion about using Wikidata authority control properties for German Wikipedia, I found out that both VIAF and Wikidata still had the wrong LCAuth and NUKAT data. So I tried to clarify and correct this on Wikidata. However, because of the scripted edits the mistakes came back. It is this problem I am trying to solve. I think this is important because, as far as I know, the VIAF cluster scripts also use Wikidata as a source. So this relatively simple mistake might be propagated again and again. This leads to corruption of data. So there must be a possibility for Wikidata to state: the link between the VIAF cluster and the LCAuth data set is wrong (at a given point in time) and should be removed. I hope that an update of VIAF might indeed remove this wrong link (and replace it by a correct one which I also indicated). Wikidata must not "confirm" a false claim; it has to identify it as false, hopefully helping VIAF to improve. Of course, it is necessary to check, say, once a month, whether VIAF has corrected this mistake, and if so, to remove the deprecation of the VIAF ID. Mautpreller (talk) 09:41, 14 June 2020 (UTC)


BNF systematically spreading false information

https://catalogue.bnf.fr/ark:/12148/cb139745974

Identifiant international :  ISNI 0000 0001 2039 8691 , cf. http://isni.org/isni/0000000120398691
Notice n° : FRBNF13974597
Création : 95/04/10
Mise à jour : 95/04/10

There was no ISNI on 95/04/10, http://archive.vn/wip/piP6d . MrProperLawAndOrder (talk) 00:28, 14 June 2020 (UTC)

The data sheet was created 95/04/10. ISNI was later added automatically. OCLC, Inc. (Q190593) is promoting ISNI and exported it to many databases. They have a monopoly on library software. --Kolja21 (talk) 01:27, 14 June 2020 (UTC)
Thank you Kolja21. If ISNI was added automatically, the "Mise à jour : 95/04/10" should have been changed. They didn't. So, systematically spreading false information. Re "monopoly" - if that is true, it may explain why their library websites are so bad. ISNI has had bugs right from the beginning - it seems they created items based on VIAF clusters. I remember that for some time ISNI was shown on top of all items in viaf.org. MrProperLawAndOrder (talk) 01:36, 14 June 2020 (UTC)
ISNI started in 2012 with great expectations but I have the impression that the project will not be developed further. --Kolja21 (talk) 01:46, 14 June 2020 (UTC)
ISNI clusters are very likely to conflate different entities and yes, the impression is often that they have been based on VIAF clusters. I always prefer to also add national libraries' IDs, because conflations are rare in them; the most frequent problem in national libraries is (a few) duplicates, which are surely easier to manage than conflations. --Epìdosis 08:26, 14 June 2020 (UTC)
ISNI has bad management. But it is an ISO standard, backed internationally; if all national libraries stopped having their own IDs, there would be more pressure to improve. ISBN-13 works more or less fine. https://www.editeur.org/files/about/EDItEUR%20February%202020%20Newsletter.html#isni dumps! json! ... and maybe the recent IDs are less conflated. There has been very little growth in recent years, but maybe that will change. Can we have an ISNI for every WD human that has 10 external IDs? WMF could maybe join them. MrProperLawAndOrder (talk) 08:41, 14 June 2020 (UTC)

Alert - moreIdentifiers behavior changed

User:Bargioni/moreIdentifiers - the script has been reported to no longer show deprecated VIAF clusters. Instead of giving the user more control over VIAF editing, the functionality is reduced. MrProperLawAndOrder (talk) 04:50, 15 June 2020 (UTC)

Fixed this morning. --Epìdosis 13:20, 15 June 2020 (UTC)

Reason for deprecation removed - deprecated rank left

@Kolja21, Mautpreller, Bargioni, Epìdosis: User:Silewe made it deprecated and gave reason conflation [4] and added a GND 4 minutes later. 7 days later User:Wurgl removes the reason, but the deprecation is kept [5] . As of today the cluster has two sources, each is in WD having normal rank.

Maybe the tool User:Bargioni/moreIdentifiers could show each source from the cluster, arranged in a table, so one doesn't need to go to VIAF. One column (maybe "WD status") could show how each source is present in WD ("in WD, normal rank" etc.). MrProperLawAndOrder (talk) 04:04, 16 June 2020 (UTC)

Well, maybe 7 days on some strange planet far out in the universe. April 30th:Silewe – June 30th:Wurgl. And yes, I forgot to set the rank back. Reason: VIAF removed two wrong IDs from its cluster, so it was not a mix of two persons anymore. --Wurgl (talk) 06:51, 16 June 2020 (UTC)
BTW: VIAF removed GND 1154499715 and RERO vtls020303341 from this cluster on May 3rd. GND 1154499715 is now in VIAF cluster 8238152140006311100008 and the RERO ID is in VIAF cluster 252155284922587062845. You can see that at the bottom of the VIAF page ("open History"). So Silewe's change was okay. And mine too, but as I said, I forgot to set back the rank. Sorry for that. --Wurgl (talk) 07:04, 16 June 2020 (UTC)
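The check described here (are the wrong members still in the cluster, and if not, reset the rank) could be sketched as below. It assumes the current cluster members have already been fetched somehow; no VIAF API call is shown, and the function name and the two "hypothetical" member IDs are made up. Only the GND and RERO IDs are taken from the comment above.

```python
def conflation_resolved(cluster_members, wrong_members):
    """True when none of the members recorded as wrong is still in the cluster."""
    return not (set(cluster_members) & set(wrong_members))


def suggested_rank(cluster_members, wrong_members):
    """Keep 'deprecated' while the conflation persists, else go back to 'normal'."""
    return "normal" if conflation_resolved(cluster_members, wrong_members) else "deprecated"


# The two IDs that VIAF removed from the cluster on May 3rd.
wrong = {("GND", "1154499715"), ("RERO", "vtls020303341")}

# Remaining members are placeholders; the real ones are not listed in this thread.
after_split = {("GND", "hypothetical-1"), ("LC", "hypothetical-2")}
before_split = after_split | wrong

print(suggested_rank(before_split, wrong))
print(suggested_rank(after_split, wrong))
```

Tying the rank to the same check that removes the reason qualifier would avoid the state reported in this section, where the reason was gone but the deprecated rank stayed.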