-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:03, 18 January 2017 (UTC)


Did you notice that Ringgold ID (P3500) was created? Please can you provide your data to Mix'n'Match? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:50, 27 January 2017 (UTC)

@Pigsonthewing: Absolutely! But I don't think my extraction from the ORCID dump was the right method. It returns only a few IDs and is not very reliable. I have been working on a different extraction procedure that uses the ORCID autocompletion method to retrieve the clean metadata associated with the Ringgold IDs. Here is how:
  • provides a database of 400,000 institutions, with ISNI identifiers, but without Ringgold IDs. However, they come from Ringgold's own database, so these institutions also have Ringgold IDs (not sure why Ringgold does not provide them).
  • we can match these records to the metadata returned by the autocompletion method in ORCID, because they have exactly the same tuple (name,city,region,country) (as they come from the same database).
  • by doing so, we obtain a much richer database, with both ids and a few other interesting columns:
I am currently processing this dump (slowly, to keep the load on ORCID minimal).
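The matching step described above can be sketched as an exact join on the (name,city,region,country) tuple. This is only a minimal illustration, not the actual extraction script: the field names (`name`, `city`, `region`, `country`, `isni`, `ringgold`) and both function names are assumptions, and trimming surrounding whitespace before comparing is an extra precaution that is not stated above.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, ...]
FIELDS = ("name", "city", "region", "country")


def tuple_key(record: dict) -> Key:
    """Build the (name, city, region, country) join key for one record.

    Missing or None values become empty strings; surrounding whitespace
    is trimmed (an assumption, the values are otherwise compared exactly).
    """
    return tuple((record.get(field) or "").strip() for field in FIELDS)


def join_on_tuple(isni_records: List[dict], orcid_records: List[dict]) -> List[dict]:
    """Attach a Ringgold ID to each ISNI record with exactly one ORCID match."""
    by_key: Dict[Key, List[dict]] = {}
    for rec in orcid_records:
        by_key.setdefault(tuple_key(rec), []).append(rec)

    merged = []
    for rec in isni_records:
        candidates = by_key.get(tuple_key(rec), [])
        # Keep unambiguous matches only; anything else needs a closer look.
        if len(candidates) == 1:
            merged.append({**rec, "ringgold": candidates[0]["ringgold"]})
    return merged
```

Records whose tuple matches zero or several ORCID entries are simply left out of the result, so they can be handled separately.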
Concerning Mix'n'Match, I have mixed feelings. I don't think it is worth rushing to use it right now, because a lot of the dataset could be matched automatically, using more reliable methods than name matching. For instance, by matching existing ISNI identifiers (and first making sure that we have pulled all the ISNIs we could from other sources such as GRID). But also by fuzzy-matching on the other fields of the dataset (including the URL), which Mix'n'Match does not currently support. Let me know what you think! − Pintoch (talk) 21:22, 27 January 2017 (UTC)
Matching on IDs is good. My concern with fuzzy matching, especially on "home page" URLs, is that we might wrongly match a faculty to the main university, or a department to a faculty. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:52, 27 January 2017 (UTC)
Yeah, of course we don't want to match only on URLs, but I think that a combined match on (name,country,URL,type) could be more robust than name matching. There could be a lot of cases where this matching is good enough to be automated, and the harder cases would be done manually. This would be a perfect use case for the reconciliation feature of OpenRefine, so I have just contributed to the bounty there: − Pintoch (talk) 23:22, 27 January 2017 (UTC)
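To make the idea of a combined match on (name,country,URL,type) concrete, here is a rough sketch. It is not how Mix'n'Match or OpenRefine work: the weights, the two thresholds and all function names are illustrative assumptions, and the name similarity simply uses `difflib.SequenceMatcher` from the Python standard library. The URL weight is deliberately too small to trigger an automatic match on its own, which is one possible way to address the faculty/university concern raised above.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlparse


def normalize_host(url: Optional[str]) -> str:
    """Reduce a URL to its lowercased host name without a leading 'www.'."""
    if not url:
        return ""
    parsed = urlparse(url if "://" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(a: dict, b: dict) -> float:
    """Combine name, country, URL and type into one score (weights are assumptions)."""
    if a["country"] != b["country"]:
        return 0.0  # a different country rules the pair out
    score = 0.6 * name_similarity(a["name"], b["name"])
    host_a, host_b = normalize_host(a.get("url")), normalize_host(b.get("url"))
    if host_a and host_a == host_b:
        score += 0.25
    if a.get("type") and a.get("type") == b.get("type"):
        score += 0.15
    return score


def classify(score: float, auto: float = 0.9, review: float = 0.6) -> str:
    """Split candidate pairs into automatic matches, manual review, and rejects."""
    if score >= auto:
        return "auto"
    if score >= review:
        return "manual"
    return "reject"
```

Pairs classified as "manual" would be the harder cases left for human review.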

Country property quickstatements

I note that you added historic countries to a modern school (founded 1989). This doesn't seem right. Runner1928 (talk) 21:06, 22 March 2017 (UTC)

@Runner1928: Whoops, thanks for spotting that! I'll fix it asap. − Pintoch (talk) 21:08, 22 March 2017 (UTC)

Ipv6 prefixes

Not sure how useful it is to add these, but if you do like [1] could you please also include retrieved (P813)? You might want to include the start time (P580) too. Do you have an idea about how this data can be used? Multichill (talk) 08:32, 13 May 2017 (UTC)

@Multichill: I agree, I should have added retrieved (P813), sorry about that! And yes, I have been thinking about start time (P580), but it is not clear to me if this date is available from RIPE: they have a creation time, which is very often set to epoch (Q2703), and a time of last update. I suppose that if the creation date is not Epoch then it is reasonably safe to assume that it is the right value for start time (P580) but I'm not entirely sure. About using this data, yes I have plenty of use cases in mind, and I plan to release a tool that uses this data in the coming weeks. − Pintoch (talk) 19:17, 13 May 2017 (UTC)
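The heuristic discussed above (treat the creation time as a start time unless it is the epoch placeholder) could be sketched as follows. This assumes that "epoch" means the Unix epoch (1970-01-01T00:00:00Z) and that the creation time arrives as a UTC timestamp string in the form `2003-05-12T09:30:00Z`; the function name is hypothetical, and whether the resulting date is really correct for start time (P580) remains the open question raised above.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: the placeholder creation time is the Unix epoch.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def start_time_from_created(created: str) -> Optional[datetime]:
    """Return the creation time as a candidate start time.

    Returns None when the creation time is the epoch placeholder,
    in which case no start time should be added.
    """
    parsed = datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return None if parsed == UNIX_EPOCH else parsed
```

A caller would add a start time qualifier only when the function returns a value, and skip it otherwise.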

P17 or not P17…


I just saw this claim :

I'm not sure that country (P17) is appropriate here. Could you take a look?

Cdlt, VIGNERON (talk) 13:18, 24 May 2017 (UTC)

@VIGNERON: First, sorry for this batch of country (P17) imports, I had many issues with it. But for that particular example it does not seem too bad to me! The English Wikipedia does put the page in en:Category:International organisations based in Denmark… Would you prefer ? Why is it wrong to add country (P17) in these circumstances? Thanks! − Pintoch (talk) 13:28, 24 May 2017 (UTC)
No problem, I understand, I have had my fair share of problematic imports myself.
Yes, headquarters location (P159) would be better, but after a deeper look I think that this item should be split in two: the online international database and the organisation behind it. That way, the article on the French Wikipedia could be linked to the first and the English Wikipedia article to the second. What do you think?
Cdlt, VIGNERON (talk) 13:49, 24 May 2017 (UTC)
You can use ORCID iD (Q51044) and ORCID, Inc. (Q19861084) as a model ;-) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:48, 24 May 2017 (UTC)
@VIGNERON:, @Pigsonthewing: Sounds great! − Pintoch (talk) 14:51, 24 May 2017 (UTC)
@Pigsonthewing: thank you for this example, that's exactly what I had in mind. I'll do the split later today. Cdlt, VIGNERON (talk) 07:57, 25 May 2017 (UTC)
@Pintoch, Pigsonthewing: I've created GBIF Secretariat (Q30068103) and moved the P17 into it. Could you take a look? I didn't move the interwiki link yet as I don't want to break the navigation. Cdlt, VIGNERON (talk) 12:17, 27 May 2017 (UTC)
@VIGNERON: Looks OK. I gave them reciprocal item operated (P121) / operator (P137) links. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:44, 28 May 2017 (UTC)

Creating wikidata items for GRID

You may be interested in my new bot request! ArthurPSmith (talk) 20:13, 6 June 2017 (UTC)

FYI I've started running it: see Special:Contributions/APSbot. I added some additional fields, including the alternate language labels and aliases as you had suggested, geographic coordinates, and inception date, where available. I didn't include the ISNIs, Fundref or other external IDs at this point. ArthurPSmith (talk) 20:32, 13 June 2017 (UTC)


Hi, Pintoch. Wrt this edit... since Centro de Estudios y Experimentación de Obras Públicas (Q20240946) is an institute/organization/non-physical-entity/whatever... wouldn't it be preferable to use headquarters location (P159) instead of located in the administrative territorial entity (P131) to store this data? Strakhov (talk) 20:33, 19 June 2017 (UTC)

Hi Strakhov. I have used headquarters location (P159) for companies (instances of business (Q4830453)) but now I am using located in the administrative territorial entity (P131) for everything else. The reason is that there is a constraint on P131 that indicates that headquarters location (P159) should be used for instances of business (Q4830453). I think the distinction between the two properties is quite tenuous: sometimes I find examples where I agree with you that headquarters location (P159) would be better suited, but sometimes I find located in the administrative territorial entity (P131) appropriate. For instance, I feel it is a bit weird to use headquarters location (P159) for a museum (Q33506). I think it is a bit annoying that we have to choose between the two, to be honest. If you want to change some of the claims I added to headquarters location (P159), feel free! − Pintoch (talk) 22:46, 19 June 2017 (UTC)
My point is... there's a constraint in P131 (incompatible with P159) ...and Centro de Estudios y Experimentación de Obras Públicas (Q20240946) is potentially "fill-able" with a P159 statement (for example headquarters location (P159) -> Calle de Alfonso XII (Q5659170); qualifier street number (P670) -> 3-5), if you don't wanna create an item for the building itself if it's not top-level architecture (it's this one and it's not databased in Official Architects' Association of Madrid (Q5777096) registry (as the adjacent building (headquarters of "Centro de Documentación de Música y Danza") is)).
With museum (Q33506) ...there's the institution and... the building(s). They are often mixed, yeah, but the distinction is clear to me (Museo del Prado (Q160112) vs Villanueva building (Q5818335) & Co.).
Do as you wish, but IMHO these location edits would be better stored in P159, because, in the end... "when Wikidata is perfect" ...that would be their place. Thanks for your edits! Strakhov (talk) 23:14, 19 June 2017 (UTC)
I totally get your point, I am just sticking to the constraints as they are now. If you think the ban on P131 should be changed from business (Q4830453) to something broader (or a list of other classes), then this should probably be discussed on its talk page. If there is consensus for that, it should be straightforward to run a bot that changes the relevant statements from located in the administrative territorial entity (P131) to headquarters location (P159). I just don't have the time or interest to look into the whole ontology myself and figure out which classes should get which property. − Pintoch (talk) 23:23, 19 June 2017 (UTC)
I agree with Strakhov. Bad constraints should be fixed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:13, 20 June 2017 (UTC)
Yeah I think we all agree, so let's discuss that somewhere more public. − Pintoch (talk) 22:52, 20 June 2017 (UTC)
