About this board

Previous discussion was archived at User talk:Emu/Archive 1 on 2020-09-28.

Suggestions for importing IDs from VIAF

Epìdosis (talkcontribs)

Hi (and also @Kolja21)! Over roughly the last two months I have been doing a massive cleanup of unique-value constraint violations for many authority IDs, mainly on human items, using Listeria lists. On the coordination page, Wikidata:WikiProject Duplicates/VIAF members, I have added the lists I'm using (I am still far from finished, despite having solved at least 3.5k cases) and some brief suggestions for dealing with these cases.
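
To illustrate what a unique-value constraint violation looks like in practice, here is a minimal sketch. It is not how the Listeria lists are built; it only shows the idea of grouping statements by identifier value and reporting values that appear on more than one item. The function name, item IDs, and identifier values are all hypothetical.

```python
from collections import defaultdict


def find_unique_value_violations(statements):
    """Return identifier values that are used on more than one item.

    `statements` is an iterable of (item_id, identifier_value) pairs.
    The result maps each duplicated value to the sorted list of items using it.
    """
    items_by_value = defaultdict(set)
    for item_id, value in statements:
        items_by_value[value].add(item_id)
    return {
        value: sorted(items)
        for value, items in items_by_value.items()
        if len(items) > 1
    }


# Hypothetical sample data: the item IDs and identifier values are made up.
sample = [
    ("Q1000001", "11111111"),
    ("Q1000002", "11111111"),  # same identifier on a second item: a violation
    ("Q1000003", "22222222"),
]
violations = find_unique_value_violations(sample)
print(violations)
```

Each reported value still needs a manual check: either the two items are duplicates to be merged, or the identifier is wrong on one of them.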

Now I have also collected some best practices as a reference for future imports of IDs from VIAF, so that we have a good starting point for future discussions (there will surely be some). Of course the table can be extended, so please add further rows if you have other ideas for good filters that would reduce the error rate :)

Kolja21 (talkcontribs)

@Epidos: Thanks for your great work! I don't know how this fits into the list, but there is one problem that should be mentioned: don't add VIAF as the source of a statement (year of birth etc.); try to name the original authority ID instead. This makes checking much easier. There are still persons "born 1950" based on VIAF "born 20th century".
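
A minimal sketch of why "born 20th century" can turn into "born 1950": a time value carries a precision next to the date, and a consumer that ignores the precision shows the placeholder year as if it were exact. The precision codes below (7 = century, 8 = decade, 9 = year) follow the Wikibase data model; the function name and the assumption that the import stored 1950 as the placeholder year are mine, for illustration only.

```python
# Precision codes as used in the Wikibase time datatype.
PRECISION_CENTURY = 7
PRECISION_DECADE = 8
PRECISION_YEAR = 9


def describe_birth(year, precision):
    """Render a birth date without claiming more precision than is stored.

    Handles positive (CE) years only; this is a sketch, not a full formatter.
    """
    if precision >= PRECISION_YEAR:
        return f"born {year}"
    if precision == PRECISION_DECADE:
        return f"born in the {year // 10 * 10}s"
    if precision == PRECISION_CENTURY:
        century = (year - 1) // 100 + 1
        return f"born in century {century}"
    return "birth date too imprecise to display"


# The same stored year reads very differently depending on the precision.
print(describe_birth(1950, PRECISION_YEAR))     # treated as an exact year
print(describe_birth(1950, PRECISION_CENTURY))  # treated as a century
```

If an import drops the precision, or a reuser ignores it, only the first reading remains, which is the error described above.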

Epìdosis (talkcontribs)

I know it well, and in many cases these persons saw 1950 in the Google Knowledge Graph and complained about it on Twitter without knowing that it came from Wikidata (see my last comment in Wikidata:Requests for permissions/Bot/FischBot 8) ... and it also happens, in fewer cases, with 1900 and 1850 (and very rarely with earlier dates).

So I strongly agree with your point. I always say it when I teach courses, and for this reason we didn't apply the UseAsRef icon to VIAF ID (P214) (and ISNI (P213)). Although it's not directly related to "importing IDs from VIAF" (but rather to "importing statements"), I was happy to add it to the table. Thanks for the suggestion!

Emu (talkcontribs)

Thank you for your work, I have already noticed that you corrected some of my errors of the past! This page is great and will hopefully help us move forward towards higher data quality. Special thanks for channeling my BboberBot rants into something productive. A few remarks:

  • Help:Conflation doesn’t really reflect current practice (and somehow contradicts the proposed action on the page), I’m not sure if we should link this essay.
  • This is a controversial issue (Kolja21 and others have a different opinion), but deprecation of wrong identifiers is a somewhat mixed bag. Since VIAF doesn’t seem to understand ranks, this effectively cements wrong VIAF clusters. I know that it’s highly problematic to bow to VIAF’s quirky ways but … well, good clusters are of value as well.
  • different from (P1889) should probably be mentioned for people that are easy to confuse or are indeed conflated in authority files or VIAF:
  • We might want to add additional guidance about filing error reports – error-report URL or e-mail (P10923) is a good start but there’s a lot of tacit knowledge that some day should probably be written down.
  • The same is true for tricks to force VIAF to reevaluate a cluster (e.g. enter books into Wikidata).

--Emu (talk) 22:07, 21 September 2022 (UTC)

Kolja21 (talkcontribs)

@Emu: I totally agree that "deprecation of wrong identifiers is a somewhat mixed bag". That's why I think it's not wise to keep wrong or outdated IDs in general. It depends on the individual case. Often there is not even a source given.

Epìdosis (talkcontribs)

OK, thanks for your remarks!

  • I agree, Help:Conflation (I had read it only quickly) is not of much help and is partly outdated. I see we also have Help:Split an item (which seems OK) and Help:Conflation of two people. The latter is OK, but it suits items that were born as conflations and contain many statements about person X and many statements about person Y. In most cases the item has 95% statements about person X and only a few about person Y, and usually person Y already has an item, so there is no need to create either one or two new items; this case should probably be added to the page. So I would link Help:Split an item as the main guidance and thoroughly edit Help:Conflation and Help:Conflation of two people, if you agree
  • I perfectly agree that deprecation has issues because some (many?) external data reusers, most importantly VIAF, haven't understood (yet) rank differences. For this reason I try to keep the use of deprecation to a minimum, removing outdated IDs (+1 Kolja!) and also less useful deprecations (typical case: an ID deprecated with reason for deprecated rank (P2241) applies to other person (Q35773207) can be removed if it is present in the correct item; if it is wrongly re-added, it will be spotted through the constraint violation). I have added this indication and I could expand it (but I wouldn't add a suggestion to remove outdated IDs because the RFC is still ongoing, although a bit stalled)
  • different from (P1889) is greatly useful, of course; added
  • already mentioned error-report URL or e-mail (P10923); we should probably start writing down somewhere our additional tacit knowledge :)
  • right (I've never tried it personally, but I remember that it can work), added
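
The rule for removing "less useful" deprecations in the second point above can be sketched as a small check. This is only an illustration under stated assumptions: the dictionary layout, the function name, and the sample items are hypothetical and do not mirror the Wikidata API; only the reason item Q35773207 (applies to other person) comes from the discussion.

```python
APPLIES_TO_OTHER_PERSON = "Q35773207"  # reason for deprecated rank (P2241) value


def can_remove_deprecated_id(statement, other_items):
    """Return True if a deprecated ID statement is safe to remove.

    A statement qualifies when it is deprecated with the reason
    "applies to other person" and the same ID value is present,
    with a non-deprecated rank, on some other item.
    """
    if statement["rank"] != "deprecated":
        return False
    if statement.get("reason") != APPLIES_TO_OTHER_PERSON:
        return False
    return any(
        other["value"] == statement["value"] and other["rank"] != "deprecated"
        for statements in other_items.values()
        for other in statements
    )


# Hypothetical data: one deprecated ID and the items it might belong to.
deprecated = {"value": "11111111", "rank": "deprecated", "reason": "Q35773207"}
others = {"Q1000002": [{"value": "11111111", "rank": "normal"}]}
print(can_remove_deprecated_id(deprecated, others))
```

If the ID is later re-added to the wrong item, the unique-value constraint violation should surface it again, which is why the deprecated copy is not needed as a guard.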
Kolja21 (talkcontribs)

Three additions (concerning Wikidata in general):

  • Unsourced data: More attention should be paid to sources. Unsourced statements are only a kind of pre-information and can be replaced by a sourced statement. Printed standard works cited in Wikipedia are almost completely absent.
  • Ranks: Not only external data reusers can't read ranks. Maintenance lists (constraint violations) need to be improved.
  • VIAF: Bots and tools like authority control.js are still not able to read VIAF correctly (IDs with dashes and letters are misinterpreted).
Reply to "Suggestions for importing IDs from VIAF"
Угрожаемого положения (talkcontribs)

Hi Emu,


I'm based in Morocco and can tell you that there has been quite an affair about Mr. Govrin.

Emu (talkcontribs)
Угрожаемого положения (talkcontribs)

Yes, but what fits this activity?

Emu (talkcontribs)

I can’t think of a suitable property.

Reply to "ambassador Govrin"
Gymnicus (talkcontribs)

Hello Emu. If you have the time, I would like to ask you to take a look at the item Antoine Beaubien (Q62065538) and its related items. I would appreciate your opinion and assessment.

Emu (talkcontribs)

I assume this is about WD:N? Requests of that kind are not filed on my talk page …

Gymnicus (talkcontribs)

Yes, this is about notability. But at the moment it is not about a deletion request. I did not ask you to delete the item and its related items; I asked for your opinion and assessment. Tilting at windmills alone is not always pleasant, so for this item with its many related items I wanted to know whether you share my view. Of course, I am also aware that, because of this, you would not be able to decide a possible future deletion request, due to a kind of conflict of interest.

Emu (talkcontribs)

Doesn't seem overly notable; on the other hand, Chantal Raymond (Q87154088) has a sitelink … let's put it this way: there are probably clearer cases of missing notability to take care of … --Emu (talk) 15:31, 11 September 2022 (UTC)

Gymnicus (talkcontribs)

Thanks for the hint. I had completely overlooked that the item Chantal Raymond (Q87154088) had a sitelink. But after looking at the article and finding no sources in it at all, I nominated it for speedy deletion, and the request was granted. As a result, the item Chantal Raymond (Q87154088) no longer meets the first of the notability criteria.

Reply to "Request for an assessment"
Yousiphh (talkcontribs)
Emu (talkcontribs)

@Yousiphh I don’t think this would be a good idea as the three items are about different concepts. Nargiz Aliyeva (Q18427819) is a person while the other two seem to link to subpages on az.wikibooks. --Emu (talk) 15:00, 2 September 2022 (UTC)

Reply to "Need help"

Years after the description

S.v.Mering (talkcontribs)

Hello Emu,

You reverted one of my edits (with the comment "no person of the same name, removed superfluous years"). Is there a recommended practice for this? I add the life dates in parentheses even when the name is unambiguous, because the information then appears right at the top of the page. Otherwise you sometimes have to scroll quite far down to DOB & DOD. Since I edit a great many persons, it would be good to know whether this contradicts any community recommendations.

Thank you very much and best regards, ~~~~

Emu (talkcontribs)
S.v.Mering (talkcontribs)

Thanks! That already helps me.

Reply to "Years after the description"
Justus Nussbaum (talkcontribs)

What has happened there is good and at the same time not good at all. It is good that the film professional Julia Nika Neviandt now has her preferred public name on Wikidata as well. What is not good at all, however, is that because you used the merge script, my edits are presumably still present in the end result, but without any documentation whatsoever that I contributed them. For example, I wrote all the descriptions in the header (German, English, French) and contributed further entries. Yet nobody can tell that from the current version history any more. Could it be that you have never noticed, or critically taken note of, this nasty quirk of the script? Do you understand that I find this *collateral damage* of the merge script neither welcome nor acceptable? Well, you will probably now point out that you merely used the script, which is in common use, and did not program it.

I had a very good reason to request the deletion of "Monique", because I saw this outcome of the merge process coming and wanted to avoid it. Incidentally, I would like to point out that there is no provision anywhere for giving more than a few keywords as a deletion rationale. Hence I am doing it here.

The actual purpose of this message, however, is a question/request to you. Could you imagine supporting me if I submit this as an example case for the demand for a thorough overhaul of the merge script? That the entire version history gets destroyed this way cannot possibly stay like this! At least in the future, once the script has been recoded/fixed, the version history and the appreciation for contributors should no longer fall by the wayside! Since you, as an admin, have deeper insight into the structures, a helpful pointer as to whom one should sensibly approach informally about this, and where, would also be welcome. Here's to good cooperation! -- ~~~~

Emu (talkcontribs)

@Justus Nussbaum To be honest, it is not quite clear to me what you mean. The development can be traced through the history of Q95189626 and the history of Q113571129. Who made an individual edit is generally relevant only in pathological cases (vandalism, forgery, copyright infringement, sock puppets, etc.) and can, as I said, be traced without difficulty. Attribution in a copyright sense is not required under CC0 1.0, nor is it practicable. --Emu (talk) 15:46, 22 August 2022 (UTC)

Reply to "Julia Nika Neviandt (Q95189626)"
Uli Elch (talkcontribs)
Emu (talkcontribs)

@Uli Elch I have reverted the merges. --Emu (talk) 15:18, 18 August 2022 (UTC)

Uli Elch (talkcontribs)

Hello again. Now, however, I have messed things up myself. While trying to restore "Compañia Boliviana de Aviacion" (Q112602404), I overlooked that I had already created "Compañía Boliviana de Aviación - BOA (Q113513486)". I would now like to hand you, through gritted teeth, the thankless task of merging these two sensibly. Sorry and many thanks! Best regards

Emu (talkcontribs)

✓ Done

Reply to "Compania Boliviana ..."
Bookfence (talkcontribs)

Hi!

Maybe you can help me with the aforementioned article on Esperanto Wikipedia and its Wikidata item. There's a death date which shows 30 Nov 1876, although there's no source for it and I could not trace where the template gets this piece of information. It is neither in the Wikidata entry nor in the Esperanto article when I open it for editing. On the Hungarian page (Hora János Alajos), there is a reliable source for 1877. Until we find a reliable document for the exact date, I want to remove the unsourced information from the Esperanto version and add just the year 1877, but I don't know how. Thanks in advance.

Emu (talkcontribs)

@Bookfence It seems to be a problem with eo.wp’s handling of dates with year precision. The only solution seems to be to hardcode it in the eo.wp article, at least that’s my understanding of w:eo:Ŝablono:Informkesto homo. I performed the necessary edit but ideally eo.wp should work on their templates. --Emu (talk) 14:11, 18 August 2022 (UTC)

Reply to "János Alajos Hora"

Tricky merge

Summary by Gymnicus

The items have been merged.

Gymnicus (talkcontribs)

Hello Emu! I have a somewhat tricky merge that I would like to ask you for. Here on Wikidata there is the item Silke Schwager (Q3483888) and the item Silke Braun (Q7515240). Both items are about the same person. The only problem is that each item also has an English sitelink. But the sitelink of the item Silke Braun (Q7515240) is only a redirect, so it could be removed before merging. Many thanks in advance.

Emu (talkcontribs)
Gymnicus (talkcontribs)

Thank you very much

Lisa-Maria Kellermayr: Duplication of information, should be modeled within the object

HarryNº2 (talkcontribs)

Hello Emu,

what are you saying, why are you deleting the information?

Emu (talkcontribs)

The information you entered isn’t information about Dr. Kellermayr (the subject) but rather about her place of death (P20) (the object). This means that there is an unnecessary duplication of information. This type of qualifier makes data curation really hard in the long run. --Emu (talk) 17:14, 5 August 2022 (UTC)

HarryNº2 (talkcontribs)

Some Wikimedia projects use this information in their infoboxes. Smaller Wikipedias in particular do not have an article on each location that they can link to. Therefore, the information on P131 and P17 is necessary, also because of the changeable history of many countries and municipalities, whose borders have shifted continuously and will shift in the future. I see only advantages here. --HarryNº2 (talk) 17:26, 5 August 2022 (UTC)

Emu (talkcontribs)

No, it’s not necessary for the use case you mention, see for example w:ru:Келлермайр, Лиза-Мария. Yes, political divisions do change over time – and this information should be reflected within the items about places. I see no advantage and a lot of problems. --Emu (talk) 17:36, 5 August 2022 (UTC)

HarryNº2 (talkcontribs)

For many places, the information is not available at all. The information is even more important if the place of birth or the place of death is a certain street or hospital, for which there is no article in most Wikipedias. Therefore, your example is the exception, not the rule, especially for smaller Wikipedias. I also don't see any problems with data processing or data maintenance in Wikidata. --HarryNº2 (talk) 17:48, 5 August 2022 (UTC)

Emu (talkcontribs)

I’m not an expert in LUA programming, but your examples could probably all be fixed by additional code. Could you provide an example of a smaller Wikipedia with an article about Lisa-Maria Kellermayr (Q113344024) that really needs this duplication of information within the qualifier? If not, I consider your point to be pretty moot, to be honest.

HarryNº2 (talkcontribs)

For one thing, not every Wikipedia has an expert on LUA. For another, this is generally about the additional specification of P131 and P17, not just in the case of Lisa-Maria Kellermayr. If there was no general discussion about deleting this information from all records, please restore it. --HarryNº2 (talk) 18:03, 5 August 2022 (UTC)

Emu (talkcontribs)
Reply to "Lisa-Maria Kellermayr: Duplication of information, should be modeled within the object"