Wikidata talk:WikiProject Visual arts/Getty Vocabularies
Wikidata as contributor to Getty Vocabularies
@Vladimir Alexiev: Hi, I think the irregular alignment with the Getty vocabularies by email that I've practiced until now doesn't scale, so I've written the following letter to send to firstname.lastname@example.org. Since I mention you in it, I wanted to ask whether you'd be okay with that. If you recommend any other changes, I'll of course be happy to make them. The text follows; you can change it directly if you like. Thanks in advance! --Marsupium (talk) 18:38, 11 November 2018 (UTC)
Dear Sir or Madam,
Over the last few years I've had some email correspondence with Patricia and Robin regarding various corrections to AAT and ULAN, which led to some changes. The issues I reported were found through several methods of comparison with the Wikidata database. Meanwhile, both the data and those methods have improved, and the number of issues found has grown far beyond the 10 records you recommend reporting by email. The most visible collections of issues are these:
- <https://www.wikidata.org/wiki/Property_talk:P245#Error_reports> lists about 20 duplicate person records I haven't reported yet;
- <https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P245#Single_value> currently has 94 groups of records, some of which are corporate body duplicates, some already matched pairs and some false positives; and
- <https://www.wikidata.org/wiki/Wikidata:Database_reports/Complex_constraint_violations/P245> currently has 550 cases where the gender in ULAN and Wikidata does not match; some of these will be errors in Wikidata, but the larger part are errors in ULAN.
Up to now I have occasionally reported issues by email. Collecting, filtering and reformatting the groups takes me some time, and it probably also causes work for you in transferring the data from the email into your internal format. With the increased number of issues, this no longer seems to be the most practicable way.
That is why I would like to ask for access to the online contribution forms for the Getty vocabularies. I have an informal relationship with the LMU Munich Institute for Art History and the Zentralinstitut für Kunstgeschichte (ZIKG). If needed, I could probably establish a more institutional relationship via one of those. But the resource the input would actually come from is Wikidata. The "WikiProject Visual arts" is where work is coordinated among many volunteer participants, including me, who add and correct data. Thus, it seems most fitting to me to record Wikidata or the "WikiProject" as the contributor of the data. I would probably be the only or main user reporting data directly back to your vocabularies, so there would be no need to hand over a password; nevertheless, if your policies allow it, this might be an advantage in the future. Vladimir Alexiev, as an insider of both projects, might be able to provide further information.
All data that would be submitted is either self-evident (this applies to some of the duplicates) or has been checked against external sources, including the Allgemeines Künstlerlexikon (or AKLONLINE), RKDartists and other printed and non-printed sources. All references are recorded in the Wikidata database and can be supplied with every contribution.
The Getty vocabularies themselves are heavily used in Wikidata, as references (via both the SPARQL endpoint and the human-readable pages) and as a template for our ontology. Our knowledge of the guidelines and structure of the GVP is deep. Fixing errors in your vocabularies is also in the interest of the Wikidata database, since consistency with the more professionally maintained Getty vocabularies helps to confirm our data. All data in the Wikidata database is licensed under CC0.
I would greatly appreciate a reply on whether and how you think collaboration and data input from here might be possible. Thanks in advance!
Yours sincerely, […]
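The letter cites two kinds of constraint report on P245 (the ULAN ID property). A minimal Python sketch of what those two checks compute, assuming the data has already been fetched and flattened into in-memory records: the `Item` class, its field names, and the sample values are hypothetical, not the actual report generator, and genders are assumed to be normalized to the same strings in both sources. A ULAN ID shared by more than one item is a single-value violation; an item whose Wikidata gender (P21) differs from ULAN's is a mismatch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical flattened record: one Wikidata item with its ULAN link."""
    qid: str                    # Wikidata item ID (placeholders below)
    ulan_id: str                # value of P245 (ULAN ID)
    wd_gender: Optional[str]    # gender per Wikidata (P21), if set
    ulan_gender: Optional[str]  # gender per ULAN, assumed already normalized


def single_value_violations(items: list[Item]) -> dict[str, list[str]]:
    """Group items by ULAN ID and keep only IDs used by more than one item."""
    by_ulan: dict[str, list[str]] = defaultdict(list)
    for item in items:
        by_ulan[item.ulan_id].append(item.qid)
    return {uid: qids for uid, qids in by_ulan.items() if len(qids) > 1}


def gender_mismatches(items: list[Item]) -> list[Item]:
    """Items where both sources state a gender and the two disagree."""
    return [
        item for item in items
        if item.wd_gender and item.ulan_gender
        and item.wd_gender != item.ulan_gender
    ]


# Hypothetical sample data, not taken from the actual reports.
sample = [
    Item("Qa", "ulan-1", "female", "male"),
    Item("Qb", "ulan-2", "male", "male"),
    Item("Qc", "ulan-2", "male", None),
]
print(single_value_violations(sample))             # {'ulan-2': ['Qb', 'Qc']}
print([i.qid for i in gender_mismatches(sample)])  # ['Qa']
```

Such a check only finds candidates: as the letter notes, a group can be a real duplicate, an already matched pair or a false positive, so each case still needs individual review before it is reported.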
@Marsupium: this is an excellent letter, and of course you can mention my name. I can only add this for your consideration (not for the letter):
- for big data submissions, Getty prefers mass submission through XML files (maybe also tabular). But I guess that does not pertain to constraint violations, which require individual checking and more intellectual work.
- Similar to WD, Getty insists on sources and contributors for each entry and label. The contributor could simply be Wikidata, as you suggest (and that will have positive political value), but the source should be specific to each case (this matters most for unclear cases). WD's reference mechanism will be very useful here.
Thank you for your advice! I've made some minor changes, taking into account especially your second comment, and sent the letter above. Let's see what happens! :-) Regarding the input format, I think I'll have to figure out what works best, but that might be easier if or once I can look into the internal structure of the GVP vocabularies. So far I haven't planned any large data submissions, such as new records, but in principle that might be interesting as well; let's see how difficult or easy it all turns out to be. Thanks again, --Marsupium (talk) 22:07, 21 November 2018 (UTC)
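Following the points about a tabular submission format and a case-specific source for every entry, one possible layout is one row per reported issue, with a fixed contributor and a mandatory source. A minimal Python sketch, assuming a CSV format: the `Report` class and the column names are hypothetical, and Getty's actual contribution format is not described in this discussion and may well differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical record of one issue to report to the Getty vocabularies."""
    ulan_id: str  # ULAN record concerned
    issue: str    # free-text description, e.g. "duplicate of ulan-3"
    source: str   # case-specific source, taken from the Wikidata reference


CONTRIBUTOR = "Wikidata"  # one contributor for all rows, as suggested above


def to_csv(reports: list[Report]) -> str:
    """Serialize reports as CSV, rejecting any row without a specific source."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["ulan_id", "issue", "source", "contributor"])
    for r in reports:
        if not r.source.strip():
            raise ValueError(f"missing source for {r.ulan_id}")
        writer.writerow([r.ulan_id, r.issue, r.source, CONTRIBUTOR])
    return buf.getvalue()


print(to_csv([Report("ulan-1", "gender: female", "AKL")]))
```

Refusing rows without a source mirrors the requirement that every entry carry its own reference; individually checked constraint violations could still be entered one by one through the forms instead.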