
Wikidata:Requests for permissions/Bot/VIAFbot/Meeting agenda


  • GND main types (entity). Do we add 'n'?
  • Record PND and GND differently?
    • en:Name Authority File (PND) = GND, type p
    • GKD and SWD are outdated and need to be checked (better ignore them)
  • Write only authorities, that don't have conflicts? Or all authorities, but with sources?
  • What is the source?
  • Programming details of the bot, link to source. got to put this on github shortly.
  • ISNI through VIAF map. What to do with multiple correspondences? I.e. for some VIAF there are multiple ISNIs.
  • Which of the remaining identifiers to add?
import pywikibot

# `wikidata` is assumed to be the Wikidata repository object created elsewhere in the bot.
idMap = {'TYP': pywikibot.PropertyPage(wikidata, 'Property:P107'),
         'LCCN': pywikibot.PropertyPage(wikidata, 'Property:P244'),
         'VIAF': pywikibot.PropertyPage(wikidata, 'Property:P214'),
         'GND': pywikibot.PropertyPage(wikidata, 'Property:P227'),
         'BNF': pywikibot.PropertyPage(wikidata, 'Property:P268'),
         'SUDOC': pywikibot.PropertyPage(wikidata, 'Property:P269'),
         }
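
For the agenda item on ISNI and multiple correspondences, one cautious option is to write an ISNI only when a VIAF ID maps to exactly one, and to set the rest aside for review. The following is a minimal sketch, not part of the bot: it assumes the VIAF-to-ISNI map is available as (VIAF, ISNI) pairs, and the function names are hypothetical. The check character follows ISO 7064 MOD 11-2, which ISNI and ORCID both use.

```python
from collections import defaultdict


def compact(identifier):
    """Strip spaces and hyphens so ISNI and ORCID spellings compare equal."""
    return identifier.replace(" ", "").replace("-", "").upper()


def isni_check_char(first15):
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(isni):
    """True if the 16-character identifier carries a correct check character."""
    c = compact(isni)
    if len(c) != 16 or not c[:15].isdigit():
        return False
    return isni_check_char(c[:15]) == c[15]


def split_by_ambiguity(pairs):
    """Group (viaf, isni) pairs into unambiguous and ambiguous mappings.

    Returns (unique, ambiguous): `unique` maps a VIAF ID to its single ISNI,
    `ambiguous` maps a VIAF ID to a sorted list of competing ISNIs.
    Pairs whose ISNI fails the check-character test are dropped.
    """
    by_viaf = defaultdict(set)
    for viaf, isni in pairs:
        if is_valid_isni(isni):
            by_viaf[viaf].add(compact(isni))
    unique = {v: next(iter(s)) for v, s in by_viaf.items() if len(s) == 1}
    ambiguous = {v: sorted(s) for v, s in by_viaf.items() if len(s) > 1}
    return unique, ambiguous
```

Only the `unique` mappings would be written in a first pass; the `ambiguous` ones could go to a review list rather than being resolved automatically.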


Thursday, 28 March 2013

  • 3pm PDT (California time)
  • 10pm UTC
  • IRC, Server: freenode, channel: #wikimedia-glam


  1. IRC   Support Kolja21 (talk) 18:45, 18 March 2013 (UTC)
    1. Ok, you get your way Kolja, We'll meet on freenode at #wikimedia-glam
  2. Skype
  3. Google Hangout
  4. Webex
  5. toll-free conference call

Meeting transcript

  • <notconfusing> hi Kolja21 , we'll start in 10 minutes ja?
  • <notconfusing> hi Shimgray
  • <shimgray> hi all
  • <shimgray> remind me of our plan?
  • <notconfusing> sure, we have an agenda
  • <notconfusing> shimgray, feel free to add, if there's something you want to talk about
  • <Kolja21> where is max?
  • <notconfusing> Kolja21, I am Max
  • You are now known as MaximilianKlein
  • <MaximilianKlein> let's just wait 2 more minutes until it's officially the hour, and then we'll start
  • royt ( has joined #wikimedia-glam
  • <Kolja21> ok
  • Aubrey (972f9851@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <MaximilianKlein> Hello Aubrey, hello royt
  • <royt> hi
  • <Aubrey> hi max
  • Aubrey is now known as Guest43223
  • <MaximilianKlein> Ok, it's 3pm (for me). Let's start. Why doesn't everyone who is here for the meeting introduce themselves and their interest in authority control?
  • <Kolja21> any news?
  • Mathias_WMDE ( has joined #wikimedia-glam
  • <Mathias_WMDE> hi
  • <shimgray> (I apologise if I fall offline, I'm running a bulk upload just now)
  • <MaximilianKlein> Anybody who has specific topics they want to discuss please put them on this page
  • <MaximilianKlein> Mathias_WMDE, hi, I was just saying that if you're here for the Authority Control meeting, just say what you'd like to see from Wikidata authority control and add to the agenda if necessary
  • <royt> Roy Tennant, OCLC Research - I expect to lurk more than anything, but if I can contribute anything I will. We (librarians) like authority control.
  • <Kolja21> Kolja from Berlin. I helped last summer to manage the change from PND to GND in Wikipedia.
  • <Mathias_WMDE> MaximilianKlein: Thanks. I am more or less afk right now
  • <Mathias_WMDE> will be reading later
  • James_F|Away is now known as James_F
  • <MaximilianKlein> thanks, so that's a good opportunity to start, Kolja, you said you didn't want to import things that weren't GND Typ-P
  • <shimgray> Andrew Gray, WiR British Library (and actual librarian); worked on VIAF in enwiki and working with the BL ISNI group to get that out once it's ready
  • <Kolja21> Is OCLC/VIAF planning to change from en.Wikipedia to Wikidata? That would be great for international readers.
  • <MaximilianKlein> Kolja, yes that's the idea
  • <Guest43223> Aubrey, Italy. Interested in library related stuff, and everything book related in Wikidata
  • <shimgray> max: ...with some great big limits on it :-)
  • <Kolja21> GND Typ N = disambiguation. The same with VIAF "undifferentiated".
  • <Kolja21> All other GND Types are ok
  • <MaximilianKlein> I see, so if you look at
  • <Dmcdevit> Hi, I'm Dominic, former WiR at the US National Archives.
  • Guest43223 is now known as aubremcfato
  • <MaximilianKlein> at line 107, I listed all the types of Authority control from English, German, French and Italian Wikipedias. so we can import as much or as little as we like
  • <MaximilianKlein> Kolja21 do you know how much GND Typ = N there is?
  • <Kolja21> line 110/111: GKD and SWD are outdated
  • <MaximilianKlein> Does anyone have any thoughts about integrating undifferentiated Authority control?
  • jarekt (48422381@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <Kolja21> ... and GKD-V1 was just an experiment
  • <MaximilianKlein> hello jarekt, please introduce yourself and find the agenda at
  • <shimgray> here's my thought; we should go slow. only match the things we know are definitely correct.
  • <shimgray> which probably rules out undifferentiated
  • <shimgray> *and* probably means we need to be careful with enwiki VIAF, as we know that has a few-percent error rate
  • <Kolja21> "undifferentiated Authority control" are no authority control, just names
  • <MaximilianKlein> so one interesting thing about Wikidata is that we don't have to tell "the Truth" we just make claims with sources
  • <MaximilianKlein> so we could import everything, just noting who said what
  • <shimgray> yeah, the undifferentiated ones are just placeholders with no real information
  • <MaximilianKlein> ok, any objections to NOT IMPORTING GND Typ = N?
  • <shimgray> MaximilianKlein, we're in a weird situation there. no-source (taken from a Wikipedia contributor) is actually more reliable than VIAF-algorithm source
  • <shimgray> I agree to _not_ import it
  • <Kolja21> yes, but authority control - at least GND - doesn't change (or, if there is a change/merge there will be a redirect). "undifferentiated" is not stable and will be changed or deleted.
  • <shimgray> we don't lose anything by only importing some things now :-)
  • <MaximilianKlein> shimgray, good point
  • <MaximilianKlein> ps. this is a sample edited page from my very early code
  • <jarekt> Sorry for missing the beginning of the discussion. It is my first time on IRC. I maintain Commons templates and have some bots for copying authority control templates between Wikipedias.
  • <shimgray> jarekt, oh, it's you who does those! awesome
  • <Kolja21> 113 WORLDCATID: do we need this? worldcat ID produces some very strange results.
  • <MaximilianKlein> jarekt, we are currently discussing what to import. If you look at this code around line 111, you can see what is possible to import
  • <shimgray> worldcatid is there for historic reasons, IIRC
  • <MaximilianKlein> yes, so all the things commented out i am not planning to import
  • jarekt has quit (Ping timeout: 245 seconds)
  • <shimgray> please import ORCID!
  • <shimgray> hmm, actually, wait a bit :-)
  • <MaximilianKlein> ok, any objections to importing ORCID data?
  • <MaximilianKlein> ok, objections to importing ORCID data
  • <shimgray> I'm wondering if it should go in as P213 or if we need an additional datatype
  • <MaximilianKlein> shimgray, can you create an ORCID property?
  • <shimgray> I can request one
  • <MaximilianKlein> Ok, excellent can you be point person for that?
  • jarekt (48422381@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <MaximilianKlein> next topic, if we find conflicts, i.e. two languages disagree, do we want to a) not write anything, or b) write both and note the source of the import?
  • <shimgray> what are you using as source? "german wikipedia", or...?
  • <jarekt> maybe put them in some log and try to reconcile later
  • <Kolja21> b - would be fine with me
  • <Kolja21> what do you want to do with the duplicate VIAFs?
  • <shimgray> MaximilianKlein, b sounds good, but two things
  • <MaximilianKlein> ok jarekt, why do you say "add later", Kolja21 why do you say "write both with source"?
  • <shimgray> a) we want to log a report as well, so they can be chased up
  • <shimgray> b) we want to make sure OCLC-algorithm matches are marked as algorithm matches (and possibly wrong)
  • <shimgray> since they're definitively less reliable than human matching
  • <Kolja21> @Max: "why do you say"? I didn't say that.
  • <jarekt> I guess b would be fine if there is a way to chase records with multiple conflicting codes
  • <MaximilianKlein> Kolja21, I just meant that you voted for b) and that jarekt voted for a), so I was trying to get each of your reasons to debate it out
  • <MaximilianKlein> but it seems like b) is good, if we write a log about where the conflicts exists
  • <shimgray> the thing is, there *are* cases where one person has two or three IDs assigned
  • <shimgray> either because of pseudonyms etc or because of explicit conflicts
  • <jarekt> BTW I did a lot of matching between Commons and Wikipedias to add interwiki links. I used the years of birth and death: if both matched and the name matched, then I called it a match
  • <MaximilianKlein> jarekt, excellent
  • <Kolja21> @Max: Got it. We're working - as you know - on Wikipedia with maintenance categories. So it would be the same kind of work on Wikidata.
  • <Kolja21> Example:
  • <MaximilianKlein> Kolja21, do you feel like we should be doing maintenance categories on Wikidata?
  • <shimgray> jarekt, we had a lot of problems with this for OCLC/VIAF matching. lots of people match an author record (name, dates) to, eg, a politician or footballer
  • <Kolja21> For GND (only persons) we have since 2005 a feedback list:
  • <Kolja21> Maintenance categories on Wikidata? If this is possible, yes. But maybe there are other ways to create working lists. I only care for the main functions: See the conflicts and delete the entry if the conflict is solved (or leave a comment).
  • <jarekt> Can we have maintenance categories on wikidata?
  • pigsonthewing (5d61a896@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <MaximilianKlein> afaik wikidata doesn't have categories
  • <MaximilianKlein> does it?
  • <shimgray> not as such
  • <pigsonthewing> Hi
  • <shimgray> but Magnus has written some report tools
  • <MaximilianKlein> pigsonthewing, please introduce yourself, and find the agenda at
  • pigsonthewing_ (5d61a896@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <MaximilianKlein> pigsonthewing, so far we've agreed that we are not going to import GND Typ = N, and that where languages conflict it's best to import both languages' values and note the import as a source
  • <MaximilianKlein> pigsonthewing, if you have opinions on those topics please feel free
  • <pigsonthewing_> Thanks - sorry I'm late; PC issues :-(
  • James_F is now known as James_F|Away
  • <MaximilianKlein> no prob, ok so we can make maintenance categories by writing report software
  • <MaximilianKlein> what would be the things we'd like to have reported?
  • <pigsonthewing_> In cases of conflict, could a note be written to a log, too?
  • <MaximilianKlein> the group seemed to think that was a good idea, what kind of log would we like? I can sort of imagine a very minimal webpage containing links. Any opinions about how to store those conflict logs?
  • <shimgray> dump it to a workpage on WD somewhere, yeah
  • jarekt has quit (Ping timeout: 245 seconds)
  • <pigsonthewing_> A page on, say, meta?
  • <shimgray> why not wikidata?
  • pigsonthewing has quit (Ping timeout: 245 seconds)
  • <pigsonthewing_> wikidata would work, too.
  • <Kolja21> A list like this would be fine: Benutzer:Gymel/VIAF-Zuordnungen
  • <Kolja21>
  • <MaximilianKlein> ok, any objections to storing logs on wikidata subpages?
  • <MaximilianKlein> ok, last topic, ISNI. ISNI isn't currently on any Wikipedias, but it has a wikidata property already
  • <pigsonthewing_> Kolja21 - we have a similar page on en.WP * <searching>
  • <MaximilianKlein> there exists a map between VIAF and ISNI, as ISNI used the VIAF database as a "basefile"
  • <shimgray> and ORCID is a subset of ISNI.
  • <pigsonthewing_>
  • jarekt (48422381@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <shimgray> I am going to start talking to the ISNI group about getting them into Wikidata
  • <aubremcfato> @MAx when are you planning to run the bot?
  • <MaximilianKlein> correct. so are there reasons not to import ISNI?
  • <Kolja21> The category could be called "Authority control task force".
  • <shimgray> no, but there's not many (or any?) to import yet :-)
  • <aubremcfato> in the Italian ml between librarians/wikipedians there is a LONG thread about the ICCU ids, the ones that are in VIAF
  • <MaximilianKlein> aubremcfato, I can run it in about 1 week; just struggling with some programming difficulties
  • <shimgray> MaximilianKlein, would it be worth running it just for, say, LCCN first?
  • <MaximilianKlein> shimgray, you tell me
  • <shimgray> limited set of entries, and we know they're all curated
  • <aubremcfato> my problem is that people (as you already know) have discovered *huge* issues in those ids
  • jarekt_ (48422381@gateway/web/freenode/ip. has joined #wikimedia-glam
  • pigsonthewing (5d61a896@gateway/web/freenode/ip. has joined #wikimedia-glam
  • <aubremcfato> and are thinking about making a study/article or asking to reimport all the data into VIAF
  • <MaximilianKlein> for those that don't know, ICCU (the Italian authority control) has rules where they don't include dates unless necessary, so those records don't stick to the VIAF clusters as well
  • <pigsonthewing> en.WP has ORCID, which is a subset of ISNI
  • <MaximilianKlein> so do we want to do both ORCID and ISNI?
  • <pigsonthewing> I'm on an ORCID working group, if that helps
  • <shimgray> pigsonthewing, we were wondering if we should just import ORCID into the ISNI field or if it would be better for reusers to separate them
  • <MaximilianKlein> or just import ISNI, and then have Lua scripts that can convert?
  • <aubremcfato> I still have to understand things well, but it seems there's the chance to *correct* a big part of the Italian cataloging system :-)
  • <pigsonthewing> If you do ORCID, you are doing ISNI . Q is what to label them as
  • <shimgray> MaximilianKlein, no conversion needed! ORCID is just a specific range in ISNI
  • <shimgray> pigsonthewing, we could just have ORCID as an alternative name for ISNI! but that could get confusing
  • <pigsonthewing> It would, because not all ISNI are ORCID
  • <pigsonthewing> However if something is labelled ORCID, you know it's an ISNI
  • <MaximilianKlein> what are the disadvantages to having separate ORCID and ISNI fields?
  • jarekt has quit (Ping timeout: 245 seconds)
  • pigsonthewing_ has quit (Ping timeout: 245 seconds)
  • <MaximilianKlein> and aubremcfato, if and when the ICCU problem is solved, it's possible to run an updater script
  • <pigsonthewing> Q is "what is the harm of entering an ORCID twice, once as an ISNI", vs...
  • <aubremcfato> ok, that's better, it leaves us more room
  • <Kolja21> What strikes me at ICCU: They do not import the authority control from the local libraries they work with.
  • <aubremcfato> and time
  • <pigsonthewing> ...vs. labelling some as ISNI and some as ORCID
  • <aubremcfato> we are trying to talk with ICCU, and it's not easy
  • <aubremcfato> PS is someone logging?
  • <MaximilianKlein> aubremcfato, i will copy-paste log onto the agenda page afterwards
  • <Kolja21> ORCID and ISNI have different numbers. So please don't mix them.
  • <aubremcfato> ok, i keep that link
  • <shimgray> Kolja21, are you sure? the ORCID xxxx-xxxx-xxxx-xxxx is also a functioning ISNI
  • <pigsonthewing>
  • <MaximilianKlein> the disadvantage to recording them separately is that it's extra data on servers
  • <shimgray> hmm. shall we leave these out for now? we don't have many ORCIDs and no ISNI-specific IDs
  • <MaximilianKlein> and then anything updating a specific id, like for a redirect, needs to be aware that it has to be done twice
  • <shimgray> and if we end up merging them, no point in creating an ORCID property just now :-)
  • <Kolja21> Everyone can add his personal ORCID ID - and it will not be imported to ISNI. (But ISNI has reserved a number block for ORCID.)
  • <pigsonthewing> I encourage you all to register for your own ORCID ID (you're entitled to, as editors of Wikipedia or sister projects). You can then display them on your user pages.
  • <MaximilianKlein> pigsonthewing, :) that makes me feel important
  • <shimgray> Kolja21, that's not the impression I got, but I'll ask them about it :-)
  • <royt> "ORCID identifiers utilize a format compliant with the ISNI ISO standard. ISNI has reserved a block of identifiers for use by ORCID, so there will be no overlaps in assignments. "
  • <royt>
  • <pigsonthewing> I would like us to include ORCID IDs; I'm also encouraging ORCID to include links to Wikipedia and/or Wikidata
  • <shimgray> I think that means ISNI won't assign any numbers in that area, but they will accept ORCIDs as valid ISNIs
  • <MaximilianKlein> so royt, does that mean that it will always be possible to convert an orcid into an isni and an isni into an ORCID?
  • <pigsonthewing> register for ORCID at:
  • <Kolja21> No, these are two different numbers.
  • <jarekt_> pigsonthewing I would suggest links to wikidata - more stable
  • <royt> it means they will never collide, but who knows what it means about resolution services
  • <pigsonthewing> jarekt_ True, but easier to get lay people to enter Wikipedia links.
  • <Kolja21> If you enter a ORCID ID in the search field @ ISNI: no person is found.
  • <royt> in other words, it may be necessary to keep them straight if ORCIDS are only resolved by ORCID and ISNIs are only resolved by ISNI
  • <pigsonthewing> ...then ORCID can do lookup.
  • <royt> jinx
  • <Kolja21> I think two properties are the best solution.
  • <pigsonthewing> Good point by royt, about resolution. Probably best to keep separate. Can always merge later.
  • <shimgray> Kolja21, I believe that's just because ISNI hasn't imported them yet :-)
  • <pigsonthewing> First ORCID page I gave says they're "working together to consider additional opportunities for collaboration."
  • <royt> shimgray: that could very well be, but if so it points out a refresh problem
  • <pigsonthewing> ...but they move slowly!
  • <royt> i like pigsonthewing's point that they could always be merged later, but pulling them apart again might be difficult
  • <shimgray> *nods*
  • <pigsonthewing> Wonder if one person can have an ORCID and a separate ISNI?
  • <MaximilianKlein> ok, so let's take a quick straw-poll vote: a) record both ORCID and ISNI now as separate fields, b) record just ISNI, and put ORCIDs in the ISNI field, c) something else [explain]
  • <Kolja21> @max: probably they will try to merge them one day. I've tried a half-year-old ORCID.
  • <shimgray> pigsonthewing, at the moment, quite possible - but it's also possible to have two ISNIs assigned if the system can't tell you apart. eventually there'll be some deduping.
  • <pigsonthewing> BTW, we're also working on putting ORCIDs in en.WP citation templates for authors who are cited, but about whom there is no WP article
  • <MaximilianKlein> I would like to vote a) even though it's not the most elegant
  • <shimgray> MaximilianKlein, c), punt it down the road a few months? we have very few ORCIDs in the live DB, so we're not losing very much yet.
  • <shimgray> otherwise I vote a)
  • <pigsonthewing> a)
  • <Kolja21> @pigsonthewing: of course. ORCID and ISNI are different identifiers.
  • <royt> I do: ORCID: ISNI: 0000 0000 8160 9395
  • <jarekt_> a)
  • <royt> a)
  • <Kolja21> a)
  • <pigsonthewing> royt How did you get an ISNI?
  • <MaximilianKlein> ok, that's mostly a). shimgray, can you tell us why you'd rather wait?
  • <royt> pigsonthewing: don't remember, but in looking at the record it likely came from VIAF
  • <jarekt_> How about VIAF? Some people can have several numbers - they will be merged at some point I assume, so some numbers will disappear
  • <royt> well, or LC NAF numbers should never disappear, they should be joined
  • <shimgray> MaximilianKlein, we don't lose much by putting it off, and I've just checked - will have at least one ISNI board member present, so we'll have a chance to grill him :-)
  • <pigsonthewing> Who's that?
  • <MaximilianKlein> shimgray, how much time are we talking?
  • <shimgray> Andrew MacEwan, our metadata head
  • <MaximilianKlein> ok, it's close to an hour, so I want to end the meeting in 5 minutes
  • <MaximilianKlein> any other topics for discussion?
  • <Kolja21> VIAF dubs: they get deleted. GND dubs: they create redirects.
  • <shimgray> MaximilianKlein, a month or two? we should have clear answers by then
  • <shimgray> pigsonthewing, by the way, are you thinking of coming down?
  • <pigsonthewing> [grateful for an intro to him, at some point, please]
  • <pigsonthewing> Wasn't; when is it?
  • <MaximilianKlein> Kolja, i have created a VIAF dub recognizer and will run it over the data before writing
  • <MaximilianKlein>
  • <pigsonthewing> Sorry , no.
  • <shimgray> 26/4. it is somewhat recent, I only sent the emails out yesterday :-)
  • Mathias_WMDE has quit (Ping timeout: 248 seconds)
  • <Kolja21> @max: great work!
  • <MaximilianKlein> Kolja21, thanks
  • <shimgray> MaximilianKlein, all looks good. my only caveat is that we should treat the enwiki VIAF data as lower-reliability than anything else if there's a conflict
  • <shimgray> and make sure to source it as from-algorithm
  • <shimgray> so that people can filter it out later if they choose, and/or we can run later (and better) scripts to check it
  • <MaximilianKlein> ok, if you make a wikidata property "provenance algorithm" I will attach it
  • <MaximilianKlein> Ok, great, so as for the ORCID/ISNI issue, we can wait until 26/4 for "clear" answers, but otherwise, this meeting has shown unanimous agreement to make separate properties and record them separately
  • <pigsonthewing> Let me know if you have any Qs I can put to ORCID on behalf of the group.
  • <shimgray> hmm. could we just use source:OCLC?
  • <MaximilianKlein> we could do that actually
  • <aubremcfato> guys, I need to go, it's midnight :-)
  • <aubremcfato> bye
  • <shimgray> aubremcfato, g'night!
  • <MaximilianKlein> aubremcfato, anything else you want to say?
  • <MaximilianKlein> thanks for joining
  • <aubremcfato> nothing important, the Italian team needs to work on internal issues
  • <shimgray> here's a demo of a two-VIAF record, by the way:
  • <shimgray>
  • <aubremcfato> we'll keep in touch
  • <shimgray> (they're both valid, there were just matching issues)
  • aubremcfato has quit (Quit: Page closed)
  • <MaximilianKlein> ok, let's do another vote. for data that comes from English VIAF (algorithmic matching, 99% accurate), do we want to record it as a) imported from English Wikipedia, or b) imported from OCLC?
  • <pigsonthewing> neutral
  • <shimgray> b). that's the original source in the vast, vast majority of cases
  • <shimgray> hmm. can your script tell if it was there before viafbot?
  • <jarekt_> b)
  • <MaximilianKlein> shimgray, i can probably tell the difference
  • <pigsonthewing> If we had an en.wp article on Arkady Natanovich Strugatsky, we'd only include one VIAF. Choice of which would be fairly arbitrary
  • <shimgray> *nods*
  • <shimgray> actually, one of those was included on the article for the two brothers
  • <jarekt_> wikidata will support ratings so one can mark preferred options
  • <shimgray> I just discovered the second one by accident, so thought I'd play around :-)
  • <MaximilianKlein> ok, b) seems to be almost unanimous
  • <MaximilianKlein> ok, meeting is 10 minutes over, any last items before adjourning the meeting?
  • <pigsonthewing> I'm now wondering why we /don't/ have an en.WP article on Arkady Natanovich Strugatsky ...
  • <jarekt_> I have seen VIAF records with a separate number for each library
  • <pigsonthewing> Next meeting? Do we need one? Continue by email?
  • <pigsonthewing> Wiki talk page?
  • <MaximilianKlein> let's use the talk page of the agenda for further discussion
  • <jarekt_> wiki talk page
  • <Kolja21> +1
  • <MaximilianKlein> and if there are more big issues, i can organize another meeting
  • <shimgray> pigsonthewing, it's my case study for weird matching :-). he coauthored almost everything with his brother, so many projects have an article on the pair of them
  • <pigsonthewing> Suits me. Goodnight, all.
  • <MaximilianKlein> K great. Meeting closed, will send out a link to the transcripts
  • <shimgray> some projects have articles on the individuals as well
  • <shimgray> some have it only on the individuals
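
The decisions reached in the meeting (skip GND Typ N, write conflicting values side by side with their source, label algorithmic enwiki VIAF matches as coming from OCLC, and log conflicts for follow-up) can be sketched roughly as below. This is only an illustration under assumptions, not VIAFbot's code: the record layout, the `added_by_bot` flag, the function names and the log line format are all hypothetical.

```python
from collections import defaultdict


def source_label(wiki, prop, added_by_bot=False):
    """Source to attach to a claim.

    Assumption: enwiki VIAF values that were added by the earlier bot run are
    algorithmic matches, so they are attributed to OCLC instead of the wiki.
    """
    if prop == "VIAF" and wiki == "enwiki" and added_by_bot:
        return "OCLC"
    return wiki


def merge_claims(records):
    """Merge per-wiki identifier records into claims with sources.

    Each record is a dict with keys item, prop, value, wiki and optionally
    gnd_type and added_by_bot. Returns (claims, conflicts), both mapping
    (item, prop) to {value: [sources]}; conflicts holds only the keys that
    ended up with more than one distinct value.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for rec in records:
        # GND Typ N entries are undifferentiated names, so they are skipped.
        if rec["prop"] == "GND" and rec.get("gnd_type", "").lower() == "n":
            continue
        source = source_label(rec["wiki"], rec["prop"], rec.get("added_by_bot", False))
        merged[(rec["item"], rec["prop"])][rec["value"]].add(source)
    claims = {
        key: {value: sorted(sources) for value, sources in values.items()}
        for key, values in merged.items()
    }
    conflicts = {key: values for key, values in claims.items() if len(values) > 1}
    return claims, conflicts


def conflict_log(conflicts):
    """Format conflicts as list lines suitable for pasting onto a work page."""
    lines = []
    for (item, prop), values in sorted(conflicts.items()):
        parts = ["%s (%s)" % (value, ", ".join(sources))
                 for value, sources in sorted(values.items())]
        lines.append("* %s %s: %s" % (item, prop, " vs ".join(parts)))
    return lines
```

Every value is kept with its sources, so nothing is decided automatically; the log lines only point reviewers at the items where sources disagree.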