User:Magnus Manske/Date bot issues

Since about September 2017, I have had my bot User:Reinheitsgebot add (some) dates and/or (mostly) date references (date of birth (P569) and date of death (P570)) to a large number of items on Wikidata. The date information comes from my Mix'n'match tool, which in turn got it from various upstream (external, third-party) catalogs. I have been made aware of issues with some of those edits, so here is some detail on what's happening, and how we can fix issues together:

  • Mix'n'match is seeded with entries from third-party sites, either by myself, or other users doing imports. This includes a short description, from which I can extract birth/death dates.
  • The upstream (third-party) data might be wrong. Nothing I can do about that. We don't decide "The Truth"™ here, we report and give sources. If a specific upstream site is particularly bad or unsuitable, please let me know, and I will prevent the bot from using it. I have done so for some sites already
  • The short description could be generated wrongly. Again, if that happens a lot for a particular site, please let me know, and I can deactivate it in the bot, at least until it is fixed in Mix'n'match
  • The date extraction could have gone wrong. That has happened, and it is easy to fix, so please let me know which upstream site has issues
  • New birth/death dates are only added to an item by the bot if there is either no such date, or the existing date is of lower precision; for example, Wikidata has only a year, but the bot can add a precise day (with reference)
  • There has been some discussion about the reference format, which I have changed accordingly. However, it would be best if there were a consensus-agreed "best practices" page on how to set references for external sites, especially where we have a property (with or without URL pattern)
  • There has been some discussion about adding dates (or references to dates) which might be in the Julian calendar, but the calendar is not specified by the source. I have changed the bot to not add dates before the year 1584 any more, but again, there should be a "best practices" way to add dates of unknown calendar, maybe with a qualifier, or an "unknown calendar, presumed Gregorian" calendar item. Disallowing all bot edits on dates before 1923, as was suggested by someone, strikes me as ludicrously limiting
  • In October 2017, the bot added 6,859 new dates to 5,500 entries, and 268,527 references to existing dates
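
As a rough illustration of the two rules listed above (add a date only if none exists or the existing one is less precise, and skip anything before 1584), here is a minimal Python sketch. It is not the bot's actual code: `DateValue`, `should_add_date` and `MIN_YEAR` are hypothetical names made up for this example. The precision codes follow the Wikidata time datatype (9 = year, 10 = month, 11 = day).

```python
from dataclasses import dataclass
from typing import Optional

# Wikidata time precision codes: 9 = year, 10 = month, 11 = day.
PRECISION_YEAR = 9
PRECISION_MONTH = 10
PRECISION_DAY = 11

# Cut-off from the list above: dates before the year 1584 are not added,
# because the source usually does not say which calendar was used.
MIN_YEAR = 1584


@dataclass(frozen=True)
class DateValue:
    """Hypothetical, simplified stand-in for a Wikidata time value."""
    year: int
    precision: int  # one of the PRECISION_* codes


def should_add_date(existing: Optional[DateValue], candidate: DateValue) -> bool:
    """Return True if the candidate date may be added as a new statement."""
    if candidate.year < MIN_YEAR:
        return False  # calendar unknown, skip
    if existing is None:
        return True  # the item has no such date yet
    # Only add when the new value is strictly more precise than the old one.
    return candidate.precision > existing.precision
```

For example, an item that only has the year 1900 (precision 9) would accept a day-precision candidate (precision 11), whereas an item that already has a day-precision date would be left alone.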

Some bad or skewed edits have undoubtedly been done by my bot. Here are my views and proposals:

  • I'll be happy to fix any large-scale issue that is machine-accessible, e.g. "remove new dates added from a specific source"
  • I'll be happy to help in cleaning up some references (double references, one just URL, another the "proper" source), but that should probably wait until we have agreed on a "final format" for such references. This will likely go beyond my bot edits
  • I cannot possibly fix errors if all you have to say is "your bot made some errors" without further details, and especially without a pattern for such edits
  • Mistakes will happen, for both bot and human edits. I have taken great care to avoid mistakes in my bot edits, and keep improving the bot based on my own research, and your feedback, but demanding 100% correctness would mean ceasing all bot and human edits to Wikidata immediately. I find that not acceptable. Therefore, we must accept some level of mistakes, and try to minimize that level as much as humanly possible.

Adding data from, and references to, third-party sources is what we do. Bots can help with that. I look forward to constructive comments. --Magnus Manske (talk) 14:26, 31 October 2017 (UTC)

There are still quite a few bad entries like this from GND economists (de):--Masegand (talk) 17:49, 31 October 2017 (UTC)
Thanks, found and fixed 54 entries, looking for more now. --Magnus Manske (talk) 08:51, 1 November 2017 (UTC)
20 more down. That should be all from GND economists. --Magnus Manske (talk) 08:57, 1 November 2017 (UTC)
Thanks, now there is only a handful left like this one. [1]--11:51, 1 November 2017 (UTC)

Hi Magnus, I just want to give you a bit of encouragement in what you're doing! The first major task (coreferencing between authority files) is well underway, due to your magnificent MnM efforts. The next major task (how to ingest, reference, then fuse and reconcile data) is largely unexplored and your import of birth/death dates is a first step in that direction. Of course it will be difficult and hitches will be hit, but it's required innovative work. Cheers! --Vladimir Alexiev (talk) 14:01, 19 November 2017 (UTC)

Swedish National Encyclopedia / User:Reinheitsgebot

Today a few dozen went wrong like this one [2].

--Masegand (talk) 17:00, 29 November 2017 (UTC)