Property talk:P227/Archive 2

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.


Deprecate Tn claims; which "reason for deprecation" qualifier?

There are now more than 1,000 Tns in Wikidata again, so apparently users are still importing them from somewhere. I can batch-deprecate all of them to avoid another re-import, and add a reason for deprecated rank (P2241) qualifier. However, which value would be appropriate for this qualifier? Simply using incorrect identifier value (Q54975531) would be possible, but somewhat meaningless. Do we have something more specific? --MisterSynergy (talk) 20:21, 1 August 2019 (UTC)

I believe that Tns don’t fit any of the criteria in Help:Deprecation, so in theory they should just be deleted periodically. But as you said, they keep being added. The problem with incorrect identifier value (Q54975531) is that the wording does not convey any information about the real reason for deprecation to an unfamiliar user. Wouldn’t it be best to create a new item for “undifferentiated [in the sense of GND or LCCN]” and use this? --Emu (talk) 21:49, 1 August 2019 (UTC)
Shouldn't be necessary since DNB announced yesterday the deletion of all Tn records by June 16, 2020. -- Gymel (talk) 19:19, 29 August 2019 (UTC)
I support Emu's idea to create a new item for "undifferentiated". Even if the DNB deletes all Tns in June 2020 (which I doubt), these IDs will still remain in other databases for years, if not for decades. That said, the qualifier should only be added if a source is given; otherwise I would just delete the Tn. BTW: The list Wikidata:WikiProject Authority control/Tn is quite helpful for maintenance. --Kolja21 (talk) 00:00, 1 September 2019 (UTC)

@MisterSynergy, Kolja21, Emu, Gymel:, such tool editing would be very helpful, see my comment in section "#Duplicates". Deleting all or deprecating all would help to see real duplicates via SPARQL. GND real duplicates are bad, because they also result in VIAF duplicates. MrProperLawAndOrder (talk) 19:35, 11 May 2020 (UTC) / Linkfix Property talk:P227#Duplicates. --Kolja21 (talk) 20:42, 11 May 2020 (UTC)


(careful) import from VIAF?

Hi everyone, I was at a conference the last couple of days and some people mentioned that the GND coverage on Wikidata is a bit on the low side. For other countries I sometimes import links based on VIAF (example) and I could do the same for GND. https://w.wiki/C$5 gives an overview of potential candidates. After reading through this page I see I have to do some filtering:

  1. Check if the GND entry actually exists (VIAF seems to contain dead links)
  2. Check if the GND entry is of type person and not of type name

Do you think this is a good plan? Has this been tried before? Do I need to apply additional filtering to prevent errors? Multichill (talk) 11:55, 30 November 2019 (UTC)
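The two filtering steps listed above could be sketched roughly as follows. This is only a minimal illustration, assuming each VIAF-derived candidate has already been looked up in GND and reduced to a small record; the `Candidate` class, its field names, the type labels `"person"` and `"name"`, and all IDs are hypothetical placeholders, not the actual GND schema or the bot's real code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    qid: str          # Wikidata item (placeholder values below)
    gnd_id: str       # GND ID taken from the VIAF cluster (placeholder values below)
    exists: bool      # hypothetical: did the GND record resolve at all?
    entity_type: str  # hypothetical labels: "person" or "name"


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that pass both checks from the list above."""
    return [
        c for c in candidates
        if c.exists                      # step 1: drop dead links
        and c.entity_type == "person"    # step 2: drop undifferentiated name records
    ]


sample = [
    Candidate("Q-A", "gnd-a", True, "person"),
    Candidate("Q-B", "gnd-b", False, "person"),  # dead link in VIAF
    Candidate("Q-C", "gnd-c", True, "name"),     # name record, not a person
]

for c in filter_candidates(sample):
    print(c.qid, c.gnd_id)
```

Further filters suggested in the replies would be added as extra conditions in the same list comprehension.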

Sounds great! A challenging project. User:Magnus Manske is one of the experts. AFAIK a third filter is needed: there are old GNDs with a dash. Example: GND 4029236-8 for Cairo (= VIAF DNB-040292363). If you start with GND type p (person) you can ignore this problem (it only affects corporate bodies and geographical place names). --Kolja21 (talk) 16:47, 30 November 2019 (UTC)
Well, if you had asked before the recent mass VIAF import took place, I would have supported this idea. Now I am not sure, as plenty of wrong VIAF identifiers were imported recently. For persons, VIAF does pretty aggressive automatic matching of its clusters to existing Wikidata items, based on "same name + same year of birth" comparisons, which results in way too many wrong matches. That said, it would probably be safe to import GNDs from VIAF clusters about anything except humans. --MisterSynergy (talk) 17:21, 30 November 2019 (UTC)
How about a direct import? In VIAF clusters, the GND part usually has quite a lot of detail. Someone might already be doing that with the subset of economists. --- Jura 17:29, 30 November 2019 (UTC)
Could you prepare a random sample import set of about 500 GND? We could check them for systematic and specific problems before the big import. Not a big fan of the last VIAF import either, almost all the changes on my watchlist were faulty. --Emu (talk) 17:56, 30 November 2019 (UTC)
@Kolja21, MisterSynergy, Jura1, Emu: I did a small test run. Please have a look.
As a general remark: If a link is incorrect, please don't remove it, but set it to rank deprecated with reason for deprecated rank (P2241) set to applies to other person (Q35773207). This avoids re-introducing mistakes. Multichill (talk) 19:54, 2 December 2019 (UTC)
I've checked 10 edits: 8 are good (some of them were even missing on German WP), 2 were wrong:
Imho even the wrong edits are helpful if they are marked as "rank deprecated", since many editors on Wikidata do the same kind of import. BTW: Is there a list like "the 500 most common names"? The bot could ignore persons with these names or put these edits on a separate list for "please check"? --Kolja21 (talk) 22:08, 2 December 2019 (UTC)
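To illustrate the advice above (deprecate a wrong link instead of removing it), here is a minimal sketch of what such a statement looks like in the Wikibase JSON serialization, as far as I understand that format. The helper name `build_deprecated_gnd_claim` and the ID `EXAMPLE-ID` are hypothetical; how the statement is actually submitted (API call, bot framework, QuickStatements) is deliberately left out.

```python
import json


def build_deprecated_gnd_claim(gnd_id: str, reason_qid: str = "Q35773207") -> dict:
    """Build a deprecated GND ID (P227) statement with a P2241 qualifier.

    The default reason is "applies to other person" (Q35773207), as suggested above.
    """
    return {
        "type": "statement",
        "rank": "deprecated",
        "mainsnak": {
            "snaktype": "value",
            "property": "P227",
            "datavalue": {"value": gnd_id, "type": "string"},
        },
        "qualifiers": {
            "P2241": [
                {
                    "snaktype": "value",
                    "property": "P2241",
                    "datavalue": {
                        "value": {
                            "entity-type": "item",
                            "numeric-id": int(reason_qid[1:]),
                            "id": reason_qid,
                        },
                        "type": "wikibase-entityid",
                    },
                }
            ]
        },
    }


# "EXAMPLE-ID" is a placeholder, not a real GND ID.
claim = build_deprecated_gnd_claim("EXAMPLE-ID")
print(json.dumps(claim, indent=2))
```

Because the deprecated value stays on the item, a later import can see that this exact link was already checked and rejected.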

For humans add VIAF if GND exists

@Multichill: could your bot for humans add VIAF ID if GND ID and DtBio ID (P7902) are equal and present and VIAF missing? One can reach the VIAF cluster via GND ID, e.g. for P227=P7902=1047557762 the link is https://viaf.org/viaf/sourceID/DNB%7C1047557762 MrProperLawAndOrder (talk) 22:36, 23 May 2020 (UTC)
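The link pattern mentioned above can be reproduced with a small helper. This is only a sketch: the function name `viaf_source_url` is hypothetical, and it covers only the URL form shown in the comment; it does not fetch anything or handle the older GND IDs with a dash that are discussed elsewhere on this page.

```python
from urllib.parse import quote


def viaf_source_url(gnd_id: str) -> str:
    """Return the VIAF sourceID link for a GND ID, percent-encoding the '|' separator."""
    return "https://viaf.org/viaf/sourceID/" + quote(f"DNB|{gnd_id}", safe="")


print(viaf_source_url("1047557762"))
# https://viaf.org/viaf/sourceID/DNB%7C1047557762
```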

I'm pretty sure I didn't continue this because the error rate was too high. Not sure. I have no plans to work on this anytime soon. Multichill (talk) 08:47, 24 May 2020 (UTC)

GND saturation of Wikidata

GND-only items currently saturate almost every other application. Given that we have more than 160,000 items with merely GND IDs, can we see an outline of how this will be fixed?

According to MrProperLawAndOrder (see talk page of @Mike Peel:), they count on @Bargioni: (or @Epìdosis:) to fix it for them [1]. --- Jura 10:03, 25 May 2020 (UTC)

@Jura1: you are aware of the fact that your claim about me is a personal attack? I never said what you claim. MrProperLawAndOrder (talk) 06:28, 26 May 2020 (UTC)
@Jura1: In less than one week dates of birth and death will be imported by Bargioni from GND ID (P227). --Epìdosis 10:06, 25 May 2020 (UTC)
Given the number of items, it seems unlikely that this can be done in a week, but I think we can hold that long. --- Jura 10:13, 25 May 2020 (UTC)
@Jura1: Work in progress. We have to access GND a lot of times to grab dates. If more info is available, I'll grab it too. -- Bargioni 🗣 10:35, 25 May 2020 (UTC)
Didn't they have downloadable dump? It might be easier to just create new items from scratch and nuke the others. --- Jura 10:38, 25 May 2020 (UTC)
@Jura1: importing from the most recent dump would mean to import information that is already outdated. MrProperLawAndOrder (talk) 13:22, 25 May 2020 (UTC)
@Jura1: Can you provide a source for the claim in your first sentence? MrProperLawAndOrder (talk) 12:54, 25 May 2020 (UTC)
@Jura1: reminder. MrProperLawAndOrder (talk) 14:05, 25 May 2020 (UTC)
@Jura1: reminder. MrProperLawAndOrder (talk) 23:01, 25 May 2020 (UTC)

Wrong gender imported from GND

Can you repair this: https://www.wikidata.org/w/index.php?title=Q94853704&oldid=1185186853 person is obviously female. --- Jura 10:13, 25 May 2020 (UTC)

More examples are Q95335213, Q95338703, Q95339302, Q95350061, Q95349834, Q95348529, Q95349608, Q95350494. DNB seems to have wrong gender data (male instead of female) in at least some cases, even if they show the right (female) form of occupation (e.g. "Schriftstellerin" instead of "Schriftsteller"). --M2k~dewiki (talk) 12:12, 25 May 2020 (UTC)

Should be

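# Humans without dates of birth/death whose gender is female
# but whose given name is a male given name
SELECT ?person ?gnd
WHERE { 
  ?person wdt:P227 ?gnd .              # GND ID
  ?person wdt:P7902 ?gnd .             # DtBio ID equal to the GND ID
  MINUS { ?person wdt:P569 ?b . }      # no date of birth
  MINUS { ?person wdt:P570 ?d . }      # no date of death
  ?person wdt:P31 wd:Q5 .              # human
  ?person wdt:P21 wd:Q6581072 .        # sex or gender: female
  ?person wdt:P735 ?firstname .        # given name
  ?firstname wdt:P31 wd:Q12308941 .    # male given name
}
ORDER BY DESC(?person)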

and

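# Humans without dates of birth/death whose gender is male
# but whose given name is a female given name
SELECT ?person ?gnd
WHERE { 
  ?person wdt:P227 ?gnd .              # GND ID
  ?person wdt:P7902 ?gnd .             # DtBio ID equal to the GND ID
  MINUS { ?person wdt:P569 ?b . }      # no date of birth
  MINUS { ?person wdt:P570 ?d . }      # no date of death
  ?person wdt:P31 wd:Q5 .              # human
  ?person wdt:P21 wd:Q6581097 .        # sex or gender: male
  ?person wdt:P735 ?firstname .        # given name
  ?firstname wdt:P31 wd:Q11879590 .    # female given name
}
ORDER BY DESC(?person)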

However, it works only on items having given name (P735): probably items created in the last days don't have it yet. --Epìdosis 13:52, 25 May 2020 (UTC)

@Jura1, M2k~dewiki: Thank you for the list of differing data. In GND gender and occupation are added separately so an actress can be male. All errors concerning articles in German WP have been corrected a few months ago. I've added the new items to this list: de:Wikipedia:GND/Fehlermeldung/Mai 2020#Todesjahr nach 1850. These errors will be corrected as well. --Kolja21 (talk) 00:32, 26 May 2020 (UTC)
  Done The GNDs with a wrong gender have been corrected. --Kolja21 (talk) 20:03, 28 May 2020 (UTC)

Removal of redirected IDs


  Notified participants of WikiProject Authority control Hi all! Until recently, redirected GND IDs were periodically removed by KrBot, maintained by @Ivan A. Krestinin:; recently the bot has been blocked (see here and here) because it has been argued that these IDs should be kept, deprecating them and adding reason for deprecated rank (P2241) = redirection (Q8143062). I think that it can be appropriate to act in the aforementioned way for single authority control IDs; at the same time, it is inconsistent to have only some redirected IDs. We should decide in this discussion whether we want that

  1. the bot always deprecates redirected IDs (which may be inconsistent, as in the past many redirected IDs have been deleted; but this solution is still somewhat possible), never removing them
  2. the bot deprecates the redirected IDs only in some cases (we should establish a criterion and it should be possible for the bot to understand and respect this criterion, which may not be easy) and removes them in all the other cases
  3. the bot always removes the redirected IDs, unless they have already been deprecated (we should establish a criterion and apply it manually)
  4. the bot always removes redirected IDs, as it did before the block

In my opinion, option 3 is probably a good compromise, at least temporarily; so, if no objections are raised, I will unblock the bot (at least for this task) on the 10th of June, asking Ivan not to remove IDs which are already deprecated.

If objections are raised about this temporary compromise and in the meanwhile the bot gets unblocked for other tasks (e.g. for VIAF tasks, see this discussion), I will ask Ivan not to edit GND IDs until some consensus is reached about the above proposals. --Epìdosis 09:47, 4 June 2020 (UTC)

Redirected IDs should never have been removed; any edits that did so should be reverted. The fact that some were wrongly removed in the past should not be used as a reason to remove more in the future. Your point 3 is not a good compromise; it would continue the harm done by such removals. I am opposed to the block being lifted unless an undertaking is given to remove no IDs. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:15, 4 June 2020 (UTC)
  • What is the scale of this? (db size GND: actual entries, number of redirects, number added over some period of time; at Wikidata: number of redirects removed in a run). --- Jura 13:11, 4 June 2020 (UTC)
  •   Oppose option 3 and 4. Agree with Andy Mabbett: no GND IDs should be removed, if they still apply to the same entity. And what is the source for the bot's operations? Is it the VIAF DB? See also User talk:Ivan A. Krestinin#VIAF replacement without adjusting the reference, which still said "imported from Wikimedia project : German Wikipedia" MrProperLawAndOrder (talk) 03:40, 7 June 2020 (UTC)
    The data to use for GND is in GND LDS / GND dump. If the next dump is out, a bot or QS could add the redirects. I don't know what the value of having this is outside GND "piz" (=Q5). @Kolja21, Raymond: third parties use piz a lot and often don't change the old IDs. So it is important for resolving. Q5/piz should be the most important? MrProperLawAndOrder (talk) 02:07, 7 June 2020 (UTC)
    Care should be taken: WD mixes entities from GND, e.g. pseudonym GNDs are on the human. But each redirect applies to one GND entity. One should think about how to store this information. Most important seems to be to do it for humans, as many 3rd parties also only have items for humans, not one for "real" person and one for what GND calls "pseudonym". MrProperLawAndOrder (talk) 03:45, 7 June 2020 (UTC)
  •   Support I support the removal of redirects. I have not read any valid reasons yet why these should be kept. Andy states his opinion against removal, but never actually gives any arguments. Neither does MrProper. What are the benefits for Wikidata and/or entities using the data that WD provides if we allow for redirects to be kept? Keeping redirected IDs would be the same as keeping Wikipedia article redirects after those articles have been renamed. Is that done? --Sotho Tal Ker (talk) 15:07, 7 June 2020 (UTC)
    You ask what is the benefit of keeping and claim I didn't state a reason, while I did, not only for keeping but also for adding, at 02:07, 7 June 2020 right on this page. You don't only oppose adding them, but also support removing them. Since you asked for a benefit, why didn't you state a benefit for the action 'you' ask for, namely removal of verifiable information provided by a high quality source? MrProperLawAndOrder (talk) 23:22, 7 June 2020 (UTC)
You might want to clarify what you mean by your statement at 02:07, 7 June 2020. Because to me it does not make any sense. Why are old redirects important for resolving? Which third parties use piz a lot? Why does it matter in this context for redirects? Why would any third party be interested in obtaining current values AND older values that are redirects? Good reasons for keeping redirects are surely possible, i.e. this is correct historical information or the redirects are still valid data in itself. But none of this has been provided, only some vague gibberish. But feel free to elaborate more. Benefits for removal of redirects are easy to see:
  • Values for redirects point to the same data as the current values. There is no use in keeping distinct values if they actually point to the same source. See also my example for Wikipedia redirects above.
  • Values are constantly updated. Third parties can be confident that they always get the up-to-date values without any mixed-in deprecated values. Of course this could be filtered as well, but removal will make the handling for third parties much easier.
  • A cleaner database. There will only be a few items left with more than one value and those cases are well substantiated, like pseudonyms which have their own GND entry.
  • Keeping redirects will waste computing power as these redirects have then to be resolved externally. This is not energy efficient.
I never asked for anything, I just picked one of the options provided. But I am indeed in favor of removing outdated data. Knowing that an item had this, this and that GND value, which have then been merged into a single item, is of no practical use. The only reason these redirects are kept by the authority data providers is that they were intended to be permanent. This is much better than suddenly having a dead link, but it does not mean that these redirects have to be kept forever, especially not by secondary databases like Wikidata. In my opinion, the sensible approach would be to just replace the older value with the new one and move on.
From another discussion I can see that people want to keep redirects that are already present, but do not want to add other redirects that exist. How does that make any sense? And I really would love to see your "high quality source" for "verifiable information" that still includes redirects. Unless it is a GND data dump, which could be classified as mediocre at its best. --Sotho Tal Ker (talk) 02:26, 8 June 2020 (UTC)
Re "The only reason these redirects are kept by the authority data providers is that they were intended to be permanent." + "it does not mean that these redirects have to be kept forever" : could you post your definitions of "permanent" and "forever"?
Re "You might want to clarify what you mean by your statement at 02:07, 7 June 2020. Because to me it does not make any sense." :
  • "Why are old redirects important for resolving?" : Without them it isn't possible.
  • "Which third parties use piz a lot?" : de:Wikipedia:BEACON#Datenquellen zu Personen (GND)
  • "Why does it matter in this context for redirects?" : If nobody would use the values, the values would have no value.
  • "Why would any third party be interested in obtaining current values AND older values that are redirects?" : Cleaning up their data, matching with other piz users
Re "Good reasons for keeping redirects are surely possible, i.e. this is correct historical information or the redirects are still valid data in itself." - That is what all the requests for keeping are about.
Re "And I really would love to see your "high quality source" for "verifiable information" that still includes redirects. Unless it is a GND data dump, which could be classified as mediocre at its best." - GND LDS.
MrProperLawAndOrder (talk) 13:51, 8 June 2020 (UTC)
Any English dictionary can give you the definitions I use. Permanent: "continuing or enduring without fundamental or marked change". Forever: "for a limitless time". Please also note that I explicitly stated the quoted second part for secondary databases. Primary databases are expected (at least by me) to never delete any redirects. But the explanations you gave only partially satisfy me. They make sense for primary databases, but why should secondary databases like Wikidata keep obsolete values? I also do not see any commonalities between the linked BEACON sources and Wikidata. The data in those sources clearly link to GND values and often their own websites. Where does Wikidata come into play and why would these third parties be interested in obsolete values stored in Wikidata? I already explained the reasons why redirects exist: So that users can update their data without getting any dead links or inconsistency, for example to avoid stuff like this: [3]. But this only applies to the primary database. Why would any secondary source want to keep obsolete values? If a clean-up of their data is needed, I would advise those third parties to use the primary database directly, not any secondary one which usually lags behind a bit. If requests for keeping are made for historical purposes, why do I not see anyone mentioning this in the discussion? And lastly: The GND LDS dump is not of "high quality", sorry to be blunt. There are lots and lots of issues, but this is not the point in this discussion.
Long reply made short: In my opinion your arguments apply mostly to primary databases which are the maintainers of original values. Wikidata is only a user of that data like those other mentioned third parties. But maybe I am missing something here. --Sotho Tal Ker (talk) 20:46, 8 June 2020 (UTC)
@Sotho Tal Ker: thanks for insisting. A third party that uses an old GND can link via a resolver to WD. If the value is deleted, it cannot do that any more. "High quality" in general or not, GND is at least for the redirects the authoritative source, even if they make errors in their redirects. GND LDS is current data, while the dump can be outdated. GND IDs for humans are widely re-used, for example by sites providing beacon files. Beacons cannot be safely matched if the third parties use different IDs for the same human. Also note that Deutsche Biographie supports redirected IDs; they use the dump or GND LDS or whatever to obtain them. WD could do the same. MrProperLawAndOrder (talk) 22:58, 8 June 2020 (UTC)

Better to continue in the general RfC: Wikidata:Requests for comment/Handling of stored IDs after they've been deleted or redirected in the external database. --Epìdosis 21:41, 8 June 2020 (UTC)

Replacement of redirected values

@Epìdosis, Pigsonthewing, Bargioni, Kolja21, Raymond: the bot didn't only remove, it replaced, and it did so on deprecated values without changing the rank, so the redirect target was then marked as deprecated in WD. And, worse, it did so on items that already had the target value as preferred value, creating inconsistency in WD [4]. This kind of edit by the bot is just a disaster. Maybe it did more harm to the data than any vandal. MrProperLawAndOrder (talk) 20:26, 8 June 2020 (UTC)

Yes, these edits are the main problem. KrBot helped a lot to find duplicates (and also fixed internal VIAF-GND-IDs, since VIAF has problems with dashes) but overwriting values that have references or ranks is unacceptable. --Kolja21 (talk) 20:38, 8 June 2020 (UTC)
I surely agree. When correcting redirects (if we decide it is a good practice removing the redirected ID), the new value should have normal rank and old references should be removed. --Epìdosis 23:00, 8 June 2020 (UTC)
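The inconsistency described in this section (the same value ending up both deprecated and preferred on one item) can be detected mechanically. The following is a minimal sketch, assuming the statements of one item are available as `(value, rank)` pairs; the function name and the sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def inconsistent_values(statements: Iterable[Tuple[str, str]]) -> List[str]:
    """Return values that are deprecated in one statement but
    normal or preferred in another statement on the same item."""
    ranks_by_value = defaultdict(set)
    for value, rank in statements:
        ranks_by_value[value].add(rank)
    return sorted(
        value
        for value, ranks in ranks_by_value.items()
        if "deprecated" in ranks and ranks & {"normal", "preferred"}
    )


# Placeholder IDs: "ID-NEW" stands for a redirect target that was written
# over a deprecated statement while also being present as preferred value.
item_statements = [
    ("ID-NEW", "preferred"),
    ("ID-NEW", "deprecated"),
    ("ID-OTHER", "normal"),
]
print(inconsistent_values(item_statements))
```

Run over all items with more than one P227 statement, a check like this would list the cases that need a manual look.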

Closing this point

Can the users who haven't been block evading clarify what they prefer for this property? I think the options are:

  • (a.) delete redirecting and deleted ids
  • (b.) delete redirecting ids
  • (c.) delete deleted ids
  • (d.) skip deletion of (manually or otherwise) deprecated ids
  • (e.) add all redirecting ids

I think krbot did (a). I'm not really sure that we have the resources to maintain anything beyond that. We could do (d.), but that implies that when an id is fixed somewhere, redirecting ids are fixed as well. I would be glad if Krbot would be reactivated soon. @Epìdosis, Bargioni, Kolja21, Raymond, Ivan A. Krestinin: --- Jura 09:04, 20 June 2020 (UTC)

I would go with (d). At least for the GND I can confirm that redirected ids are permanent. Raymond (talk) 09:08, 20 June 2020 (UTC)
The risk at Wikidata is that the redirecting GND id and the actual GND are on different items. If (d.) concerns only few items, this wouldn't be much of a risk. --- Jura 09:12, 20 June 2020 (UTC)
  •   Comment @Jura1: The question is very pertinent and I fully agree with the risk you underline in option (d.) (I've already started looking for these cases, see the second query here), but the problem should probably not be discussed here, since the general RfC Handling of stored IDs after they've been deleted or redirected in the external database is still open and was going in another direction: seemingly there was consensus for deprecating deleted and redirected IDs and some users also supported adding already redirected IDs (which still doesn't convince me completely). So I think the discussion should probably continue with a general perspective in the RfC, since, if a few users decide something for this property here and the RfC is then closed with a different result, the latter would prevail. --Epìdosis 09:30, 20 June 2020 (UTC)
    • RFCs are only applicable if the other discussions didn't lead to a result. If you feel other users should be invited to comment on this, please ping them. The general approach for IDs doesn't exclude that we use better solutions for some IDs. --- Jura 09:34, 20 June 2020 (UTC)
      • I think we should probably mention @MisterSynergy: (the main supporter of deprecation of redirected/deleted IDs and of their insertion ex novo in items in the RfC) and also {{ping project|Authority control}} (I don't do it myself because I've received complaints for supposedly using it too frequently). --Epìdosis 09:44, 20 June 2020 (UTC)
        • I would prefer to standardize the handling of redirecting and deleted identifiers for all properties, including this one. Everything else is pretty difficult to implement and teach to the community. ---MisterSynergy (talk) 09:58, 20 June 2020 (UTC)
          • If we can formulate an approach that covers a decent variety of use cases, why not. I don't really see that in the RFC's initial proposal: it seems to assume that there are only permanent redirects and everything else is stable. With external-ids at Wikidata covering properties like this one, social media account names, VIAF and ISO country codes, there are at least four different things that need to be explicitly mentioned. --- Jura 10:42, 20 June 2020 (UTC)
  • So shall we go for (d)? These would be skipped. It should be possible to phrase this into a more general summary of how to handle them. --- Jura 17:30, 27 June 2020 (UTC)
    @Jura1: Two problems: first, I would prefer waiting for the closure of the general RfC, although I acknowledge that it could require a long wait, so maybe a provisional solution is the best way to have KrBot restart its work; second, I've not fully understood what solution d) implies: the bot should not touch deprecated IDs (OK), but when finding values which GND has redirected but which aren't yet deprecated on Wikidata, would it a) delete them b) deprecate them c) skip them? --Epìdosis 17:52, 27 June 2020 (UTC)
    • As you are aware, RFC shouldn't be used to duplicate ongoing discussions elsewhere (that is here). Option (d) would skip the statements in the query when doing the updates it does currently. --- Jura 17:55, 27 June 2020 (UTC)
    @Jura1: So if the bot finds a GND value which has normal rank on Wikidata while in GND redirects to another value, should the bot remove the GND value from Wikidata, according to option (d)? --Epìdosis 18:00, 27 June 2020 (UTC)
    Given that everyone had time to comment, I think we can re-activate this per option (d). --- Jura 23:45, 15 July 2020 (UTC)
    @Jura1: I agree, but could you please clarify this point for me: "if the bot finds a GND value which has normal rank on Wikidata while in GND redirects to another value, should the bot remove the GND value from Wikidata, according to option (d)?" --Epìdosis 09:38, 16 July 2020 (UTC)
    If it runs as it did for now, I think it would update it to the new value effectively deleting the old one. Ivan runs the bot, so he would be the person to confirm this. Option (d) only changes what happens to deprecated statements. Someone else would need to find a way to maintain the redirecting deprecated statements (or people shouldn't rely on them to point to the correct redirect target). --- Jura 09:49, 16 July 2020 (UTC)
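As a reading aid, option (d) as described in the answer above could be summarised in a few lines. This is only a sketch of the behaviour as described in this thread, not KrBot's actual code (which only its operator could confirm); the function name, the returned action labels and the sample IDs are hypothetical.

```python
from typing import Optional


def option_d_action(rank: str, redirect_target: Optional[str]) -> str:
    """Decide what to do with one GND statement under option (d).

    rank            -- rank of the statement on Wikidata
    redirect_target -- ID the value redirects to in GND, or None if it does not redirect
    """
    if rank == "deprecated":
        # Option (d): statements that were deprecated (manually or otherwise) are skipped.
        return "skip"
    if redirect_target is not None:
        # As before the block: the old value is updated to the redirect target,
        # which effectively deletes the old value.
        return "replace"
    return "keep"


print(option_d_action("deprecated", "ID-NEW"))  # skip
print(option_d_action("normal", "ID-NEW"))      # replace
print(option_d_action("normal", None))          # keep
```

Note that under this reading nobody maintains the skipped deprecated statements, which is the open point raised at the end of the answer above.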

Usage of P227 in dewiki

@Christian140, Kolja21, Raymond, emu, berita: Re de:Wikipedia:Umfragen/Normdaten aus Wikidata maybe some in the discussion are not aware of the recent changes in WD. 100000+ new items about humans that are in Deutsche Biographie have been created, VIAF ID is now added to them, directly from the GND DB, dozens of duplicated items that have been created from a dewiki article have been found and merged. On humans GND ID became the second VIAF source ID after ISNI that is used on 1 million items, see Wikidata:VIAF/type/human. The items are enriched not only with more data from GND but also by other edits. When the next GND dump is out, maybe user:Bargioni and user:Epìdosis can analyze the whole dump and maybe create more items, one could start with those that have relationships with other humans or have a value for ISNI and VIAF.

In WD it is also very easy to find format violations, recently I found cases in dewiki where a GND was stored in the field VIAF, this can also happen in WD, but a warning symbol will be displayed after saving.

Would be interesting to hear, what is bad about using the data from WD. The discussion isn't only about GND but also about VIAF and LCCN. Maybe a tool would be nice that allows easy editing of VIAF, GND, LCCN in WD by dewiki-users. One can link directly to the properties, e.g. P227: Q57188#P227, but this is not as user friendly as displaying the three fields next to each other. I don't know how a change in P227 appears in article watch lists, maybe there is something to be improved too. MrProperLawAndOrder (talk) 19:25, 8 June 2020 (UTC)

Regarding editing Wikidata by (de)wiki-users also see:

"Maybe a tool would be nice ..." deWP has a nice tool, see: de:Hilfe:Normdaten#Helferlein. --Kolja21 (talk) 20:20, 8 June 2020 (UTC)

Kolja21, I meant a tool for editing in WD. If the tool could do that, that would be nice. Instead of storing in Vorlage:Normdaten the data would be stored in WD. So, this part could be made the same as is in dewiki now. dewiki also has a field for type, this could be done in WD via qualifier, one could even store the entity subtypes, piz, pis etc. which is more than dewiki currently has? MrProperLawAndOrder (talk) 20:33, 8 June 2020 (UTC)
Just a few random examples from the last few days:
  • bot and batch problems: Stefan Haas (Q15433293) (history): Silewe removed an incorrect VIAF, Bargioni re-entered it, I removed it. Sometimes it’s ping-pong over years.
  • plain wrong: Otto Keller (Q2039491) (history) – all identifiers save for the Austrian parliament had the wrong Otto Keller. I found out because there was no entry in de.wp (but in WD) and I could cross-check – that would have been impossible with the envisioned Wikidata only approach.
  • especially worrisome: Michael Fischer (Q95316658) used to be correctly considered distinct from Michael Fischer (Q21588913). Then User:MrProperLawAndOrder merged the two for some reason (my only information is: batch #35935 which isn’t helpful at all). Again, I could find the problem because there still is Authority Control in de.wp.
I fail to see how a tool would be of any help here. --Emu (talk) 20:48, 8 June 2020 (UTC)
Emu, the third one was me. I restored Q95316658 and merged the duplicate Q96106664 created by an unclickable temporary batch [5]. If you click on the batch link from my batch you can find more information. Today I also created a section for these batches here on the page: Property talk:P227#VIAF batch merge using QS. user:Raymond recently reported two errors in my batch merges. I merged 4400 - sorry for the three errors found so far. All merges involved Deutsche Biographie for which I created many new items. MrProperLawAndOrder (talk) 21:23, 8 June 2020 (UTC) //// (edit conflict, putting it here, extension to the last sentence of the post) All merges involved Deutsche Biographie for which I created many new items and that recently got a VIAF directly from the GND DB and where the VIAF value existed on another item. There were so many duplicates that I thought some mass operation could help - of course I feared wrong mergers. I included a check for name and for date of birth. MrProperLawAndOrder (talk) 22:02, 8 June 2020 (UTC)
I’m not sure how what you did now is any improvement over the solution I created. Anyway, we’ve reached the core of the problem: If I point out a problem (or even hundreds of problems), the answer is always threefold: 1) the error rate is low (easy if it’s hard to find errors) 2) all problems can be solved with some gizmo 3) philosophical concerns about the relationship between de.wp and Wikidata. --Emu (talk) 21:47, 8 June 2020 (UTC)
@Emu: the merge was wrong, Q95316658 should not be a redirect to the other M. Fischer, and your item had a higher number, but WD's standard is to merge into the lower, so I merged into Q95316658. I don't see where my answer was threefold by the definition you provided. Please criticize my answer directly. MrProperLawAndOrder (talk) 23:03, 8 June 2020 (UTC)
Okay, I understand. After looking at it again: You also caused the problem with Otto Keller (and I solved it). So to answer your original question: Yes, I am aware of the recent changes. No, they don’t change my position. --Emu (talk) 14:01, 9 June 2020 (UTC)
Emu, which "problem with Otto Keller"? MrProperLawAndOrder (talk) 17:34, 9 June 2020 (UTC)
Otto Keller (Q95342132) was incorrectly merged with Otto Keller (Q2039491) --Emu (talk) 21:06, 9 June 2020 (UTC)
@Emu: thank you, added to Property talk:P227#VIAF batch merge using QS. So, I am happy that out of the three errors two were mine, it means fewer different causes. MrProperLawAndOrder (talk) 00:04, 10 June 2020 (UTC)
@MrProperLawAndOrder: For the GND type WD had P107 (P107). This property was deleted. Also properties for GNDName, GNDCheck and REMARK are missing. --Kolja21 (talk) 20:49, 8 June 2020 (UTC)
@Kolja21: one could ask for undeletion, it is verifiable information from a high quality source. One could restrict the scope to qualifier, so it is attached to the GND, not the item. I don't know about the value of GNDname, Tn seems to be phased out. We should think about GNDCheck and REMARK. But that would be helpful for WD anyway. MrProperLawAndOrder (talk) 21:07, 8 June 2020 (UTC)
+1. The undeletion would be helpful but I don't want to fight this through. --Kolja21 (talk) 21:15, 8 June 2020 (UTC)

https vs http for formatter url

Is there a reason this property uses https for the formatter URL? If you visit https://d-nb.info/gnd/121191699 you will see that it is not the destination URL (that's a search link) but also not the given data permalink (which is http). --Reosarevok (talk) 15:25, 29 April 2021 (UTC)

Because that's the identity GND gave it. Look for the RDF of https://d-nb.info/gnd/121191699 : "<https://d-nb.info/gnd/121191699> a gndo:DifferentiatedPerson;". Multichill (talk) 15:37, 29 April 2021 (UTC)
According to the link data release change notes from March 2019 those were deliberate changes toward https in late 2019. In their dumps, as seen above, they already use https, while the displayed URIs have not changed. By accessing them through http, they behave the same, so it made sense to use the URI format from the dumps, that is supposed to be the new one. --CamelCaseNick (talk) 15:52, 29 April 2021 (UTC)
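
As a small illustration of the point above, here is a minimal sketch of how a formatter URL with the "$1" placeholder expands into a GND URI, and how the http and https variants can be treated as the same resource when comparing. The function names are hypothetical and not part of any Wikidata tool; only the two URL prefixes come from this discussion.

```python
# Hypothetical helpers for illustration only; names are not from any existing tool.
FORMATTER_HTTPS = "https://d-nb.info/gnd/$1"  # form used in the GND dumps, per the discussion
FORMATTER_HTTP = "http://d-nb.info/gnd/$1"    # form shown as the permalink on the page


def format_gnd_uri(gnd_id: str, formatter: str = FORMATTER_HTTPS) -> str:
    """Expand a formatter URL by replacing the "$1" placeholder with the identifier."""
    return formatter.replace("$1", gnd_id)


def same_resource(uri_a: str, uri_b: str) -> bool:
    """Compare two URIs while ignoring the scheme (http vs https)."""
    def strip_scheme(uri: str) -> str:
        return uri.split("://", 1)[1] if "://" in uri else uri

    return strip_scheme(uri_a) == strip_scheme(uri_b)


https_uri = format_gnd_uri("121191699")
http_uri = format_gnd_uri("121191699", FORMATTER_HTTP)
print(https_uri, http_uri, same_resource(https_uri, http_uri))
```

Comparing without the scheme is a design choice for matching against third-party data that may still carry the older http form.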

Value not in GND

2021-04-17 User:M2k~dewiki did harvesting https://www.wikidata.org/w/index.php?title=Q470177&diff=1403385624&oldid=1368147285 but the value doesn't exist in GND. Can a bot check each P227 value if it exists in the GND? Michael Montag (talk) 09:23, 4 June 2021 (UTC)

Hello @Michael Montag: there have been mass-deletions of VIAF-entries, which have been invalidated, in the past, see for example:

Maybe something similar could be done for other identifiers, like LCCN, GND, IMDb, WFb, .... ? --M2k~dewiki (talk) 18:21, 4 June 2021 (UTC)

Sounds good, thank you. Michael Montag (talk) 22:29, 4 June 2021 (UTC)

There had been a couple of bot runs already. More than 6,000 Tns have been removed, see Wikidata:WikiProject Authority control/Tn. Other bots fixed Wikidata:Database reports/Constraint violations/P227#"Format" violations and duplicates. --Kolja21 (talk) 12:13, 5 June 2021 (UTC)
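
A purely local format check could serve as a cheap first filter before any existence check against the GND. The sketch below is only that: the regular expression is an assumption (my recollection of the property's format constraint, to be verified against the current constraint on P227), and the function name is hypothetical. A format check cannot tell whether an ID exists in the GND or whether it is a Tn; that still requires a lookup.

```python
import re

# ASSUMPTION: pattern recalled from the P227 format constraint; verify before relying on it.
GND_ID_PATTERN = re.compile(
    r"1[0123]?\d{7}[0-9X]|[47]\d{6}-\d|[1-9]\d{0,7}-[0-9X]|3\d{7}[0-9X]"
)


def looks_like_gnd_id(value: str) -> bool:
    """Return True if the whole string matches the assumed GND ID pattern."""
    return GND_ID_PATTERN.fullmatch(value) is not None


# IDs taken from this page, plus two obviously malformed strings.
for candidate in ["121191699", "1055107037", "10211207X", "12119169", "abc"]:
    print(candidate, looks_like_gnd_id(candidate))
```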

Data check

Data check - Wilhelm Heuser

  • Wilhelm Heuser, http://d-nb.info/gnd/1055107037 Prof., Lebensdaten: 1885-1970, Geburtsort: Radewege (Westhavelland), Sterbeort: Bayreuth, Agrarwissenschaftler
    • Q26350670 13 September 1885 Düsseldorf - 22 August 1956 de:Wilhelm Heuser (Politiker) Wilhelm Anton Heuser (* 13. September 1885 in Düsseldorf; † 22. August 1956 in Neuss) war ein deutscher Politiker der Zentrumspartei und der NSDAP, Beigeordneter, Bürgermeister und Oberbürgermeister von Sterkrade sowie Oberbürgermeister von Oberhausen. !!! I removed the GND 1055107037 from this item
    • Q94822325 - item created 2020-05-19 based on GND/DtBio

GEPRIS:

Heuser, Wilhelm
Personen-ID: 5104985
Prof. Dr. Wilhelm Heuser, * 3.4.1885 in Düsseldorf, † 17.3.1970 in Bayreuth
Deutscher Pflanzenzüchter und Politiker (Zentrum, NSDAP)
Wirkungsorte: Gorzów Wielkopolski; Melk
Wikipedia
Gemeinsame Normdatei (GND) 1055107037
Deutsche Biographie
WIKIDATA Q26350670

Wikipedia-Link leads to de:Wilhelm Heuser, a disambiguation page since 2020-03-29 [6]

HumanAFuser (talk) 10:10, 11 June 2021 (UTC)

Do I understand the problem correctly that GEPRIS 5104985 is conflated? --Emu (talk) 10:51, 11 June 2021 (UTC)
Or the others missed some facts. HumanAFuser (talk) 14:57, 11 June 2021 (UTC)
WBIS seems to be conflated, too. I will write them an email. --Emu (talk) 15:28, 11 June 2021 (UTC)

Data check - Joseph Linde von Linden

https://d-nb.info/gnd/136722369 based on https://www.deutsche-biographie.de/pnd136722369.html + 1810 found on

https://de.wikisource.org/wiki/BLK%C3%96%3ALinde_Freiherr_von_Linden%2C_Joseph "Linde Freiherr von Linden, Joseph (k. k. General-Major und Ritter des Maria Theresien-Ordens, geb. zu Münster im Jahre 1728, gest. zu Wien 16. November 1804)" + " Im Jahre 1794 trat der 66jährige Veteran nach 47jähriger Dienstzeit als General-Major in den Ruhestand, den er noch zehn Jahre genoß und dann als Greis von 76 Jahren zu Wien starb."

  1. Is BLKÖ wrong and DtBio did "+1810" on purpose?
  2. Or did DtBio make a mistake?
  3. or are these two General-Majors?

HumanAFuser (talk) 14:57, 11 June 2021 (UTC)

Same person, merged. BLKÖ is right, NDB is clearly referencing BLKÖ („Wurzbach“) but is wrong about the death year. --Emu (talk) 16:28, 11 June 2021 (UTC)

Data check - Erika Hanel


  • Q59653094
    • created 2018 by Reinheitsgebot "from catalog 2060" and description "Geburtsdatum:07.12.1916; Geburtsort:Wien; Sterbedatum:18.10.1965; Sterbeort:Wien; Geschlecht:weiblich; Beruf:Schriftstellerin, Journalistin, Übersetzerin; GND:123636612"
    • 07:57, 14 April 2021‎ AxelCorti talk contribs‎ 10,388 bytes 0‎ ‎Changed claim: GND ID (P227): 116445831 undothank Tag: new editor changing statement (restore)
    • 22:41, 30 December 2020‎ Emu talk contribs‎ 10,388 bytes +1,437‎ ‎Created claim: Vienna History Wiki ID (P7842): 36845, add Vienna History Wiki ID (P7842) based on MnM matching (details) undothank Tag: OpenRefine [3.4] (restore)
  • Q95696729 created 2020-05-29 based on GND/DtBio 116445831

Revert Q59653094 to GND 123636612 and remove Vienna connection? @Emu, AxelCorti: HumanAFuser (talk) 15:16, 11 June 2021 (UTC)

@HumanAFuser: That’s a tricky one. Items Q59653094 and Q95696729 and GND 116445831 clearly refer to Austrian jurist (Dr.iur.), journalist and (above all) P.E.N. club functionary Erika Hanel (1916–1965). It’s unclear who GND 123636612 is about: It was created by the Bavarian State Library on 2002-03-08 and modified 2008-04-06. Name, birth year and profession match, but then again there’s “Dt. Juristin”. I’m pretty sure that the Bavarian State Library just assumed that Hanel had to be German because she published in 1952 in the Munich-based Droemer publishing house. So in the end I’m pretty sure that GND 123636612 also refers to the same person. I will do some additional research in the near future. --Emu (talk)
@Emu:, Q59653094 was 123636612, nothing "Dr. iur." there. HumanAFuser (talk) 17:17, 11 June 2021 (UTC)
@HumanAFuser: But Q59653094 was first and foremost created for this article from the Vienna History Wiki, MnM catalogue 2060. (They link to GNDs as well, not always the right ones.) And this person was – among other things like translator, writer and functionary – a “Dr.iur.” according to both WBIS and several books on the literature in Austria in the 50s and early 60s. --Emu (talk) 17:34, 11 June 2021 (UTC)
@Emu:, thanks for the pointer that it was created for/from https://www.geschichtewiki.wien.gv.at/index.php?curid=36845 . Maybe it should be reported to GND and the GND items improved and maybe both merged? I leave this to you, ok? HumanAFuser (talk) 17:44, 11 June 2021 (UTC)
It’s pretty pointless, the DNB will just respond that they have too little information and the BSB will probably not respond at all. At least that’s my view, maybe @Kolja21: has a different opinion? --Emu (talk) 17:54, 11 June 2021 (UTC)
@Emu, HumanAFuser: Both GNDs refer to the same person: Dr. Erika Hanel, Sekretärin des Pen-Clubs Wien. I've filed a report, since DNB has a publication of her: de:Wikipedia:GND/Fehlermeldung/Juni 2021#Todesjahr nach 1850. --Kolja21 (talk) 18:38, 11 June 2021 (UTC)
@Kolja21: My worry is rather that the DNB might say it is unclear whether the work recorded by the BSB and the one recorded by you are by the same person. But in this case I would be happy to be wrong :-) --Emu (talk) 19:13, 11 June 2021 (UTC)

Data check - Wessely, Franx/Franz Xaver

@Kolja21, Emu: can you help? HumanAFuser (talk) 19:45, 11 June 2021 (UTC)

I've merged and added:
--Kolja21 (talk) 20:58, 11 June 2021 (UTC)
Other merge: Franz Wesselý (Q95136470) > high school teacher --Emu (talk) 21:22, 11 June 2021 (UTC)
Another namesake: Dr. iur. František Xaver Veselý (Q95151669) (1869-1900), legal scholar ≠ Q94988959. --Kolja21 (talk) 22:21, 11 June 2021 (UTC)

@Emu: Could you take another look at Franz Wessely (Q94987830), contributing editor? In s:de:BLKÖ:Wesselý, Fr. he was recorded with GND 1073521591 → (now) Franz Wessely (Q107195996), bookseller. I have found no evidence that the two persons are identical, but it would be possible (keyword: Buchhändler-Correspondenz). --Kolja21 (talk) 21:20, 11 June 2021 (UTC)

@Kolja21: Already on it. That would be my guess too; maybe I will find something in the parish registers and newspapers. --Emu (talk) 21:22, 11 June 2021 (UTC)
@Kolja21, HumanAFuser: Same person; I have created an article at regiowiki:Franz Wessely. --Emu (talk) 18:00, 12 June 2021 (UTC)

Data check - Fernando Niño

http://d-nb.info/gnd/118999818 Niño de Guevara, Fernando
Quelle	M/Reg., Enc. univ., LoC-NA
Zeit	Lebensdaten: -1552 (Sterbedatum nach LoC-NA)
Lebensdaten: ca. 2. Hälfte 16. Jh.
Land	Spanien (XA-ES)
Weitere Angaben	Span. Kardinal, Grossinquisitor, Erzbischof von Sevilla
  • Fernando Niño de Guevara (Q1392508) (1541-1609) Cardinal, Archbishop of Seville and Spanish Grand Inquisitor
  • Fernando Niño (Q5859989) [+1552] Archbishop of Granada and Patriarch of the West Indies

@Kolja21, Emu: GND conflated? Create a new GND for one of the two? HumanAFuser (talk) 23:07, 11 June 2021 (UTC)

GND 118999818 ( -1552; Sterbedatum nach LoC-NA) = Fernando Niño de Guevara (Q1392508). I suspect the source LCAuth n81147685 has been modified:
  • Niño de Guevara, Fernando, 1541-1609
  • Variant: Niño de Guevara, Fernando, d. 1552
I will leave a note at de:WP:GND/F. --Kolja21 (talk) 23:32, 11 June 2021 (UTC)

Data check - Karl Freiherr Pergler von Perglas

GND 116147407 on each:

There is only one other Carl/Karl in GND: Pergler von Perglas, Carl August 1783-1843 / Regierungsrat http://d-nb.info/gnd/1045616362 ( https://www.geni.com/people/KARL-August-Freiherr-Pergler-von-Perglas/6000000097093020853 ) .

@Mfchris84, Emu, Kolja21: Create new GND for BLKÖ? Or are both the same? I will remove 116147407 from BLKÖ-Carl, added 21 July 2020‎ by Mfchris84. HumanAFuser (talk) 11:28, 12 June 2021 (UTC)

I've added different from (P1889) since both persons are exactly described (b. 1793 Katzengrün in Böhmen vs. b. 1800 München). --Kolja21 (talk) 11:54, 12 June 2021 (UTC)
PS: Problem already known, see de:Karl Pergler von Perglas (disambiguation). --Kolja21 (talk) 11:58, 12 June 2021 (UTC)
@Kolja21: could you create a new GND based on BLKÖ? That should be a sufficient source for GND? So, it would better prevent another person from adding the other GND again, and BLKÖ gets better GND coverage. HumanAFuser (talk) 12:03, 12 June 2021 (UTC)
  OK GND 1235284883. --Kolja21 (talk) 12:12, 12 June 2021 (UTC)

Data check - Anton Kaschnitz

s:de:BLKÖ:Kaschnitz zu Weinberg, Joseph Ritter und Anton Valentin Freiherr is for father Josef Q55853227 (http://d-nb.info/gnd/136992609) and son Anton Q93585262, the latter wrongly had the same GND ID as the father. @Kolja21: could you create GND for Anton to prevent re-import by bots, maybe due to the mixed BLKÖ article? HumanAFuser (talk) 17:38, 12 June 2021 (UTC)

  OK GND 1235299236. Author of Praktische Bemerkungen und Anleitung zur Veredlung der Schaafzucht in Galizien (1805). --Kolja21 (talk) 20:41, 12 June 2021 (UTC)

Data check - Gerontius

http://d-nb.info/gnd/102394075 Gerontius, Hierosolymitanus, ... Jerusalem, Gerontius, Archimandrita

@Kolja21, Epìdosis, Emu: I merged, but then saw that Epìdosis created the newer item on purpose, removing dewiki from the older, so I reverted myself. GND conflated? From current data I tend to say they are the same. HumanAFuser (talk) 10:26, 13 June 2021 (UTC)

Archimandrite vs. artist. @Lantus: I suspect that the authority data you added for the woodcarver does not match. --Kolja21 (talk) 13:09, 13 June 2021 (UTC)

More than one item with same GND

  • 2020-05-31 04:30 : 10 [7]
  • 2021-06-10 08:23 : 463 [8]
  • 2021-06-11 21:25 : 115 [9]
  • 2021-06-12 20:06 : 20 [10]
  • 2021-06-13 11:33 : 7 [11]

HumanAFuser (talk) 21:32, 11 June 2021 (UTC) , @Kolja21, Emu: 20 ... 1) some are tricky, and I left them for later review when I tried to make fast clean-up. 2) some are "said to be the same" 3) maybe some are easy, but I need a break now. HumanAFuser (talk) 20:08, 12 June 2021 (UTC)

@Kolja21: a lot of old cases, created by batch operations (starting Q94.. and Q95..), but some people created items by hand and added a GND. They should see a warning? And merge? Maybe they don't know what to do? And now: you also did it [12] - is there some bug in the software, so people don't get a notification? Or maybe you just made a mistake here and would normally merge? I am curious whether the software could be improved. Or could a bot notify people if they didn't merge after 48 hours or something like that? HumanAFuser (talk) 21:51, 11 June 2021 (UTC)

Encouraging people to merge items isn’t a very good idea. Duplicates aren’t good, but they can be fixed in time – there’s really not that much hurry. On the other hand: Wrong merges are often catastrophic and may take hours to untangle. So it makes sense that merging is largely an activity reserved to users with some experience. --Emu (talk) 22:00, 11 June 2021 (UTC)
"in time" - duplicates prevent some people (like me) from enriching items, because they don't know where to put new information. It also can result in VIAF having sources more distributed between clusters. GND itself has lots of duplicates, maybe thousands recorded in WD (unique value violations). "and may take hours to untangle" - also a software limitation, there is no button "undo merge". HumanAFuser (talk) 23:00, 11 June 2021 (UTC)
@HumanAFuser: That’s not a software limitation per se. Unmerging is easy if done swiftly (just hit the undo button twice). It gets messy (and indeed catastrophic) if done after some time when redirects are fixed by bots, other users make their contributions and sometimes even other sources rely on Wikidata to add to their own databases. You would need cutting edge AI technology to fix this – and if we had that kind of technology, editing Wikidata as we know it now would cease to exist. (While we are at it: I assume we all speak German. Can't we simply hold this conversation in German?) --Emu (talk) 18:07, 12 June 2021 (UTC)
The Gustav Jacoby duplicate (Q97453671) is strange. Usually you get a warning if a GND is entered twice. Maybe there was a time lag. --Kolja21 (talk) 23:42, 11 June 2021 (UTC)

Hedwig Storch (admin) added a new human with several edits, the last edit by her, done 2020-09-09, added the GND ID [13]. Afterwards no more edits by her on the item, but some edits by others. HumanAFuser (talk) 12:00, 12 June 2021 (UTC)

Bot for removing Tn

Can a bot remove all remaining Tn, like I manually removed at https://www.wikidata.org/w/index.php?title=Q88075428&diff=1442732027&oldid=1344349454 ? HumanAFuser (talk) 09:15, 16 June 2021 (UTC)

Yes, this has been done since 2015, see Wikidata:WikiProject Authority control/Tn. There are a few cases where an invalid GND (like a Tn) is used as identifier in other projects. These cases must be left out. @HumanAFuser: Do you know how many Tns are left in Wikidata? VIAF has deleted most of them. --Kolja21 (talk) 15:08, 16 June 2021 (UTC)
@Kolja21, Emu, Epìdosis: I have no idea how many Tn exist in WD. But since they are no longer part of the GND they should IMO be removed from P227 to reduce maintenance cost. Some projects may never remove the Tn - but WD cannot even reliably show which values are Tn and which are not. Maybe create another property GND-Tn-ID for those users interested in Tn. I am not aware that VIAF stores any Tn. HumanAFuser (talk) 12:49, 17 June 2021 (UTC)
VIAF has stored Tns until some months ago, so many have been imported from VIAF in the last months. Personally I agree about removing them (or, as a secondary option, deprecating them). --Epìdosis 13:07, 17 June 2021 (UTC)
Tns should be removed (and have been removed) - but only if no source is given. --Kolja21 (talk) 20:06, 17 June 2021 (UTC)

@Kolja21, Epìdosis: while cleaning up the 1400+ VIAF duplicates in Q94/Q95 range I find more Tn, e.g. [14]. Is there any user that can remove all Tn from P227? Is there a SPARQL to see those that have a source given, as Kolja21 says? Kolja21, do you have an example? HumanAFuser (talk) 12:09, 18 June 2021 (UTC)

@HumanAFuser: Imho you should not remove links to DtBio only because the link contains an invalid GND. We need to keep these IDs. Please add in this case URL source https://www.deutsche-biographie.de/sfz019289-5.html and mark GND as deprecated.[15] --Kolja21 (talk) 12:23, 18 June 2021 (UTC)
@Kolja21: I wanted to revert my edit, but you had already changed it. I will stop this Tn-related work. It looks like a waste of time to work on this manually. So DtBio is storing not only Tp but also Tn, where for the latter some links on the page are broken by design (http://viaf.org/viaf/sourceID/DNB%7C10211207X) and the others may break at any moment when the third party removes the Tn. HumanAFuser (talk) 12:34, 18 June 2021 (UTC)
Every database using GND contains incorrect IDs: Tns, typos, and transposed digits (Zahlendreher). It's not a waste of time to work on these cases manually. It's the first step to correct them. --Kolja21 (talk) 13:50, 18 June 2021 (UTC)
That is not correct. There are tools for this. Software can do it. HumanAFuser (talk) 13:50, 19 June 2021 (UTC)
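
The rule stated in this thread (remove a Tn only if no source is given, otherwise keep it and mark it as deprecated) can be sketched as a small decision function. This is a minimal sketch under assumptions: the claim structure, the function name and the placeholder value are hypothetical, and whether a value is a Tn has to be supplied from outside, since it cannot be derived from the ID itself. Only P2241 and Q54975531 are taken from the discussion on this page.

```python
# Hypothetical data model: a claim is a dict with "value", "is_tn" and "references".
REASON_PROPERTY = "P2241"      # reason for deprecated rank
INCORRECT_VALUE = "Q54975531"  # incorrect identifier value


def plan_tn_action(claim: dict) -> dict:
    """Decide what to do with one P227 claim, following the rule from this thread."""
    if not claim.get("is_tn"):
        return {"action": "keep"}
    if claim.get("references"):
        # A sourced Tn documents an ID that is still in use elsewhere: keep it, deprecated.
        return {
            "action": "deprecate",
            "qualifier": {REASON_PROPERTY: INCORRECT_VALUE},
        }
    return {"action": "remove"}


sourced = {
    "value": "EXAMPLE-TN",  # placeholder, not a real identifier
    "is_tn": True,
    "references": ["https://www.deutsche-biographie.de/sfz019289-5.html"],
}
unsourced = {"value": "EXAMPLE-TN", "is_tn": True, "references": []}
print(plan_tn_action(sourced)["action"], plan_tn_action(unsourced)["action"])
```

Returning a plan instead of editing directly keeps the decision reviewable before any batch is run.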

Storing Tn deprecated in P227

https://www.wikidata.org/w/index.php?title=Q94778860&diff=1444688173&oldid=1444687091 why? Some use was claimed for P7902, but why on P227? HumanAFuser (talk) 13:52, 19 June 2021 (UTC)

Because: https://www.deutsche-biographie.de/pnd105367451.html. If an incorrect identifier value (Q54975531) is still in use we need this information. --Kolja21 (talk) 22:02, 19 June 2021 (UTC)
@Kolja21, Emu, Epìdosis: But then it should be stored in P7902? Why would DtBio management force WD to store Tn in P227? Are there other websites that have that leverage? And for DtBio, is this Tn-info stored in P7902 too, i.e. it has to be done twice? And all this is done mostly or exclusively manually as Tn are out of VIAF and GND LDS control - who is actually checking the data? If in DtBio the Tn for Q94778860 is deleted and this deletion is also done in P7902, how does one know that the Tn-marking in P227 was due to DtBio/P7902 and act accordingly, e.g. delete the Tn in P227? How about restricting Tn-related problems to P7902, so people managing P227 can more easily use tools and bots? HumanAFuser (talk) 08:52, 20 June 2021 (UTC)
@HumanAFuser: If an incorrect GND is in circulation, it should be recorded under P227. (This concerns not only DtBio but also the Historical Dictionary of Switzerland (Q642074) and other reference works.) That is exactly what the ranks and the marker incorrect identifier value (Q54975531) are for. If an ID is deleted, then in the worst case a new item is created or, in the best case, I or someone else checks once more why the ID was not entered in Wikidata. It is the same case as with a wrong year of birth. If a person is known under the birth year 1755, that date is recorded even if they were born in 1765. Otherwise one would have to research afresh whether it is the same person every time the difference in birth year is noticed. --Kolja21 (talk) 12:51, 20 June 2021 (UTC)

Bot for resolving redirects

@Emu, Kolja21, Epìdosis: 2020-09-13 a VIAF redirect was resolved [16], but the underlying GND ID was not changed, remains a redirect until today. That is bad, since GND IDs help in detecting duplicates and are a basis for DtBio IDs which recently have been added to WD based on P227. On the aforementioned item Q94771869 the (new) DtBio ID was not added. Can a bot resolve the GND IDs? HumanAFuser (talk) 12:39, 17 June 2021 (UTC)

User:Ivan A. Krestinin could you also resolve GND redirects? HumanAFuser (talk) 12:51, 17 June 2021 (UTC)

@HumanAFuser: KrBot effectively used to resolve GND redirects, as VIAF redirects. However, about one year ago the bot was blocked because there were protests against the removal of these redirected IDs; as you can see #Removal of redirected IDs, users were split between support for deletion of redirected IDs and deprecation of redirected IDs; since Ivan said it was too difficult for him to set the bot for the deprecation (instead of deletion) of redirected IDs and since there was no consensus for the deletion of redirected IDs, in the last months (from June 2020) redirected IDs have been left untouched. --Epìdosis 13:12, 17 June 2021 (UTC)
@Emu, Kolja21, Epìdosis, Ivan A. Krestinin: WD doesn't seem to store all redirects, only storing some is confusing. At least the bot could add the redirect target as a new value to each item, if that value doesn't exist yet? HumanAFuser (talk) 13:33, 17 June 2021 (UTC)
I personally agree about preferring deletion to deprecation in cases of redirection; however, the above discussion and this stalled RfC seem to head towards the other solution. Anyway, I surely agree about this, if possible for Ivan: "add the redirect target as a new value to each item, if that value doesn't exist yet". --Epìdosis 13:38, 17 June 2021 (UTC)
The problem was that the bot was changing values even if a source was given. --Kolja21 (talk) 19:55, 17 June 2021 (UTC)
This is true too, unfortunately. --Epìdosis 20:55, 17 June 2021 (UTC)
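
The compromise quoted above ("add the redirect target as a new value to each item, if that value doesn't exist yet") can be sketched without touching the existing value or its sources. This is a minimal sketch, not KrBot's behaviour: the claim structure and function name are hypothetical, and the redirect mapping would have to come from the GND data. The example pair is the redirect reported further down this page.

```python
def add_redirect_targets(claims: list, redirects: dict) -> list:
    """Return a new claim list: redirected values are kept (rank deprecated,
    references untouched) and the redirect target is added if it is missing."""
    existing = {claim["value"] for claim in claims}
    result = []
    additions = []
    for claim in claims:
        target = redirects.get(claim["value"])
        if target is None:
            result.append(dict(claim))  # not a redirect: copy unchanged
            continue
        # Keep the old value and its sources; only the rank changes.
        result.append({**claim, "rank": "deprecated"})
        if target not in existing:
            additions.append({"value": target, "rank": "normal", "references": []})
            existing.add(target)
    return result + additions


claims = [{"value": "1073396460", "rank": "normal", "references": ["some source"]}]
redirects = {"1073396460": "174358903"}  # redirect reported on this page
for claim in add_redirect_targets(claims, redirects):
    print(claim["value"], claim["rank"])
```

Running the function a second time on its own output adds nothing new, so a periodic bot run would not pile up duplicate values.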

Stuck without bot support

@Emu, Kolja21, Epìdosis, Ivan A. Krestinin: and now WD is stuck, no GND bot support but:

  1. 1000000+ GND items in WD
  2. a GND DB containing already nearly 500000 redirects
  3. DtBio based on GND
  4. items to come
    1. GND having ~4000000 further Tp not yet in WD
    2. DtBio having 200000+ GND IDs not in WD
  5. 1135 single value violations on DtBio Wikidata:Database_reports/Constraint_violations/P7902#"Single_value"_violations - for pseudonyms they may stay, but the rest probably needs merging in GND and then fixing the data in WD (if already merged in GND DB, then simply fixing in WD is needed)
  6. 1450+ VIAF duplicates related to Q94../Q95.. items where one item probably has GND https://w.wiki/3VzL
  7. VIAF missing
    1. 8018+ missing VIAF on DtBio items Wikidata:Database_reports/Constraint_violations/P7902#"Item_VIAF_ID_(P214)"_violations
    2. 15194 GND ID - type human should have VIAF Wikidata:Database_reports/Complex_constraint_violations/P227#GND_ID_-_type_human_should_have_VIAF
  8. sex missing
    1. 278 Wikidata:Database_reports/Complex_constraint_violations/P7902#Deutsche_Biographie_ID_-_type_human_should_have_sex // 2021-06-18 increased to 388 [17], nosex-SPARQL: https://w.wiki/3W7p
    2. no results or query error Wikidata:Database_reports/Complex_constraint_violations/P227#GND_ID_-_type_human_should_have_sex // must be query error, since for DtBio there are 278 already // 2021-06-18 3692 in SPARQL https://w.wiki/3W8C
  9. 3310 outdated/former DtBio GND IDs Property_talk:P7902#3310_IDs_in_WD_but_not_in_DtBio_source_as_preferred_ID
  10. missing de-label
    1. DtBio 1000+ [18] // 2021-06-18 no results / time out [19]
    2. P227 no results / timeout
  11. missing en-label
  12. Deutsche Biographie ID should be equal to GND ID // 3727 [20] // 2021-06-18 3759 [21]
  13. Tn still on items e.g. [22]

HumanAFuser (talk) 23:49, 17 June 2021 (UTC)

@HumanAFuser: Very good summary of the situation, thanks for this. I think 8.2 is a query timeout. I would suggest to solve 8.1 manually (instead of importing sex from GND), since in the past I have found a relevant number of GND IDs having wrong sex. 7 can be (mostly) solved through a bot-import from VIAF (searching in which cluster each GND is and adding it); 9 can similarly be solved by a bot - obviously, the two bots have to be programmed. 6 requires manual intervention. 5 can probably be solved in cooperation with GND. After these, we can reflect on how to deal with 4. Best, --Epìdosis 06:55, 18 June 2021 (UTC)
Another interesting piece of data: 4974 GND IDs added to Wikidata during May 2021, probably all of them manually. --Epìdosis 07:01, 18 June 2021 (UTC)

@Epìdosis: fixing 8.1 add P21 for 329 out of 388 https://quickstatements.toolforge.org/#/batch/56769 , added two nosex-SPARQLs: https://w.wiki/3W7p (DtBio) https://w.wiki/3W8C (GND) - the latter had a time out in the first run, had to try again, got "3692 results in 58780 ms" - maybe 1 min limit. But it would be more efficient if all/more "items to come" would be already there, and one does not have to do this in little chunks again and again. New items are created without sex - what percentage, I don't know. HumanAFuser (talk) 09:24, 18 June 2021 (UTC)

"1135 single value violations on DtBio ..." de:Wikipedia:GND/Fehlermeldung contains around 200 error messages per months + backlog of one year = about 2.000 errors (including duplicates) will be solved within the next months. --Kolja21 (talk) 10:20, 18 June 2021 (UTC)
Regarding 7: the number is higher, for 7.1 https://w.wiki/3WUT shows 9311 items as of now and for 7.2 https://w.wiki/3WUV shows 66644 items as of now. @Bargioni: is working on 7.1. --Epìdosis 13:59, 19 June 2021 (UTC)
@HumanAFuser, Kolja21, Emu: After @Bargioni:'s intervention point 7 has improved: 7.1 (https://w.wiki/3WUT) down to 716, 7.2 (https://w.wiki/3WUV) down to 58052; considering only instance of (P31)human (Q5) for 7.2, https://w.wiki/3WYu shows 5463. Good evening, --Epìdosis 19:36, 19 June 2021 (UTC)

2020 GND LDS data - VIAF and ISNI not matching current situation

One example below, saw it before with other items, but didn't take note:

HumanAFuser (talk) 11:04, 20 June 2021 (UTC)

Tn deleted from DtBio but stored in P227

Cf. [23]. @Kolja21, Emu: can you delete Tn everywhere, where you stored them in P227 because they were found in P7902 in case you cannot find another source confirming that usage? HumanAFuser (talk) 22:19, 21 June 2021 (UTC)

I do not see the urgency, not least because it is not at all clear to me how things will proceed with Deutsche Biographie (GND) ID (P7902). So far we have a vague hint from you (“Today I got information, that DtBio is in process of deleting Tn”) and a few deleted values; we do not know whether that is permanent. For me that is no basis for starting larger actions.
And to say it quite plainly: I dislike that here, as several times before, I have had the impression that you want to assign homework to other users. --Emu (talk) 22:33, 21 June 2021 (UTC)

New GND human related duplicates - strategy for prevention

2021-06-10 there were 463 GND human duplicates [24]. I worked on cleaning that up to a quantity below 10. Now I have found the following new GND human related duplicates:

Item 1 | Item 2 | GND | Date | Diff
Leila von Meister / Q95207486 | Leila Trapmann / Q106173123 | 116871180 | 2021-06-17 | [25]
Christoph Wagner / Q87821925 | Christian Wolff / Q107300469 | 137859953 | 2021-06-20 | [26]
Heidrun Stein-Kecks / Q1594436 | Heidrun Stein-Kecks / Q107300471 | 129728284 | 2021-06-20 | [27]

There might well be 400+ again after one year. What can be done against this? The state of the GND managed by Deutsche Nationalbibliothek is suboptimal anyway: they store obvious human duplicates and it is not visible that they have any advanced strategy to reduce the amount or to prevent creation of new duplicates. To some extent the humans in P227 are managed better, but to some extent worse. One part is that people add the same GND ID to different items. Any ideas how to reduce the amount? HumanAFuser (talk) 09:19, 22 June 2021 (UTC)

The GND is a joint authority file (Gemeinsame Normdatei) in which several databases were merged. Accordingly there are numerous duplicates, which are being worked through step by step. Apart from errors by individual librarians, no new ones are being added. If items are created twice here in WD, that is not the fault of the authority files; on the contrary, the authority data helps to track down such duplicates. --Kolja21 (talk) 18:28, 22 June 2021 (UTC)
I would suggest the following: the first step is always to clean the existing problems, mainly pointed out by constraint-violations (which has now been done for unique-constraint-violations of P227); the second step is to periodically monitor new problems, through constraint-violations, and analyse how they generate and how they can be dealt with. Causes are mainly two: new users, which create new items and add GNDs and don't recognise constraint violations (in this case, we can send messages to them explaining the importance of constraints, so that they will merge duplicates if they mistakenly create them); massive imports, which tend to generate some duplicates (in this case, we can send messages to the importers and ask them both to be more careful and to help in cleaning the duplicates). Obviously keeping low the number of constraint-violations makes it easier to deal with new cases with the time necessary to educate single users. --Epìdosis 19:24, 22 June 2021 (UTC)
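
For the periodic monitoring suggested above, the core step is grouping items by GND ID and reporting every ID that sits on more than one item. The sketch below assumes the (item, GND ID) pairs have already been obtained, for instance from a query result; the function name and input shape are hypothetical. The sample rows are the three duplicate pairs listed in this section plus one unrelated item from this page.

```python
from collections import defaultdict


def find_shared_gnd(pairs):
    """Group (item QID, GND ID) pairs and return {gnd_id: [qids]} for every
    GND ID used on more than one item, with QIDs sorted by their number."""
    by_gnd = defaultdict(set)
    for qid, gnd_id in pairs:
        by_gnd[gnd_id].add(qid)
    return {
        gnd_id: sorted(qids, key=lambda q: int(q[1:]))
        for gnd_id, qids in by_gnd.items()
        if len(qids) > 1
    }


pairs = [
    ("Q95207486", "116871180"), ("Q106173123", "116871180"),
    ("Q87821925", "137859953"), ("Q107300469", "137859953"),
    ("Q1594436", "129728284"), ("Q107300471", "129728284"),
    ("Q55853227", "136992609"),  # single use, not reported
]
for gnd_id, qids in find_shared_gnd(pairs).items():
    print(gnd_id, qids)
```

Sorting by QID number puts the older item first, which is convenient given the convention mentioned earlier on this page of merging into the lower number. The output is only a candidate list; as the examples on this page show, each pair still needs a human check before any merge.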

Value marked as redirect but target value not stated

[28] As of today https://d-nb.info/gnd/1073396460 redirects to http://d-nb.info/gnd/174358903 . HumanAFuser (talk) 22:33, 25 June 2021 (UTC)

It’s a wiki. Just fix it and stop creating pointless discussions. --Emu (talk) 22:54, 25 June 2021 (UTC)
+1. @HumanAFuser: Besides, I have already explained to you several times why IDs that are redirects or invalid are not simply deleted. And before you ask new questions or start pointless discussions, please first answer the questions that were put to you here. --Kolja21 (talk) 20:30, 26 June 2021 (UTC)

Value marked as incorrect but is a redirect

2021-06-18 value marked as incorrect but is a redirect [29] HumanAFuser (talk) 10:44, 26 June 2021 (UTC)

It’s a wiki. Just fix it and stop creating pointless discussions. --Emu (talk) 11:45, 26 June 2021 (UTC)