Wikidata:Edit groups/OR/8cb3004

Edit group OR/8cb3004

Summary: adding affiliation info from ORCID, following instructions in Wikidata:Tools/OpenRefine/Editing/Tutorials/Working with APIs
Author: Daniel Mietchen
Number of edits: 45
Example edit: Q37380852

Discussion

This was my first somewhat successful attempt at making Wikidata edits using OpenRefine, so I'm leaving some feedback here that will hopefully help improve the items in this batch and similar batches (by anyone) in the future. I closely followed the instructions in Wikidata:Tools/OpenRefine/Editing/Tutorials/Working with APIs and refined them on the go. The workflow mostly made sense, and the edits that were made look fine. However, I did not manage to get all the Wikidata-compatible data from my test set (200 ORCID IDs, as per the tutorial) converted into edits, for different reasons that I will briefly outline and comment on in the following:

  • the string "iMM Lisboa - Instituto de Medicina Molecular" did not reconcile to Instituto de Medicina Molecular (Q10302897), which at the time had no instance of (P31) statement and no alias similar to the string. OpenRefine thus wanted to create a new item for it and (rightly) complained that I had provided no additional information about this new item, which apparently blocked the entire batch of edits from being performed. I did not see how I could resolve this in OpenRefine itself (the examples did not seem applicable to my case), so I looked up the institute on Wikidata and, since the item already existed, simply made a few edits to it. OpenRefine did not pick up these changes immediately (at least not in a way that I could see), which made sense to me. I then tried to re-reconcile just that entry but could not figure out how, so I had to reconcile the entire "organizations" column again. That took something like 10 minutes, which was annoying because I strongly suspect that it is possible to re-reconcile only the relevant entries rather than everything.
  • along with the notice that the iMM item would need to be created, I was warned that "Some qualifiers were ignored. Qualifier values could not be parsed, so they will not be added to the corresponding statements." The warning came with a number (163), which I assume is the number of "start date" and "end date" qualifiers in my edit batch. I did not see how to act upon that either (perhaps a separate reconciliation step for the "start date" and "end date" columns?), so I ignored it, albeit with the regret that this would make the edits less useful.
  • I had also extracted the "role" information from the JSON as an additional column, as was suggested as an exercise, but could not add that to the schema (I could drag it there but not drop it), presumably again because of some missing reconciliation step. It was odd, though, that the date columns, which I had not reconciled either, could be added to the schema while the "role" column could not.

Right now, I have no idea how best to add the missing statements based on this Edit groups batch, but I will look into this a bit more and report any insights here, especially if it would involve undoing the batch or parts of it. --Daniel Mietchen (talk) 02:22, 30 June 2018 (UTC)

[Screenshot 1: Matched and unmatched cells]
[Screenshot 2: Appearance of cells reconciled to "new", in comparison to a cell matched to an existing item.]

@Daniel Mietchen: thanks a lot for this detailed feedback! Here are a few explanations that will hopefully make things a bit clearer:
A cell in OpenRefine can have multiple reconciliation states:
  • By default, it is just "unreconciled": the cell just contains a string, with no information about what item it corresponds to. Your "role" column likely contains only unreconciled cells - and therefore you cannot use this column in a place of the schema where items are expected (such as the value of a subject has role (P2868) qualifier, because the datatype of subject has role (P2868) is "item" and not "string"). If a cell contains the role "professor", OpenRefine does not know if it should use professor (Q121594) or full professor (Q25339110) as value for the qualifier - a reconciliation step is needed.
  • A cell can be "matched" to an existing item, which means that we think it corresponds to that item exactly. The cell contains a dark blue link to the item. See "University of Michigan" in the screenshot, for instance.
  • A cell can be in the "none" or "unmatched" state, where some candidate items are suggested as light blue links. An additional link is also provided to create a new item for that cell. See the status of "Universidad de Zaragoza" in the screenshot for instance.
  • A cell can be "matched to a new item", which means that a new item will be created for this cell once we upload the data to Wikidata. OpenRefine never chooses this state by itself: if you have cells with this status in your project, that is because you have clicked the corresponding button to create a new item (see "Temel Kotil" in the second screenshot).
In any case, it is always possible to change the reconciliation status of an individual cell by clicking the "Choose new match" link (see both screenshots). No need to reconcile the entire column again! If the item you want to match the cell to is not one of the light blue links in the cell, then just click Search for match and select the item in the dialog.
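The four states above can be sketched as a small data model. This is only an illustration in Python, not OpenRefine's actual code: the names `Judgment`, `Cell` and `usable_as_item` are hypothetical, and the rule encoded in `usable_as_item` is an assumption drawn from the description above (only cells matched to an existing item or marked as new can fill a slot of the schema where an item is expected).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Judgment(Enum):
    """Hypothetical names for the states a reconciled cell can be in."""
    NONE = "none"        # candidates are suggested, but none has been chosen
    MATCHED = "matched"  # linked to an existing item
    NEW = "new"          # a new item will be created on upload


@dataclass
class Cell:
    value: str
    judgment: Optional[Judgment] = None  # None: the cell was never reconciled
    qid: Optional[str] = None            # only set for matched cells


def usable_as_item(cell: Cell) -> bool:
    """Return True if the cell could fill an item-valued slot of the schema.

    Assumption: unreconciled and unmatched cells are rejected, because there
    is no way to tell which item the string stands for.
    """
    if cell.judgment is Judgment.MATCHED:
        return cell.qid is not None
    return cell.judgment is Judgment.NEW


role = Cell("professor")  # plain string, as in the "role" column
matched_role = Cell("professor", Judgment.MATCHED, "Q121594")
unmatched = Cell("Universidad de Zaragoza", Judgment.NONE)
new_item = Cell("Temel Kotil", Judgment.NEW)

for cell in (role, matched_role, unmatched, new_item):
    print(cell.value, usable_as_item(cell))
```

In this model the "role" column fails the check simply because its cells carry no judgment at all, which is consistent with the drag-but-not-drop behaviour described above.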
As you realized, dates do not need to be reconciled. Reconciliation is the process of transforming strings to Qids. Dates are not represented by Qids in Wikidata - they are just represented by their year, month and day values, which OpenRefine expects to read in the "YYYY-MM-DD" format. You can learn more about the expected format of cells for each datatype. − Pintoch (talk) 09:11, 30 June 2018 (UTC)
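If the "start date" and "end date" cells were ignored because they were not in that format, one possible fix is to normalise them before building the schema. The following Python sketch assumes that each date arrives as a nested object of the form `{"year": {"value": "2015"}, "month": {"value": "3"}, "day": {"value": "1"}}`; that shape is an assumption about the ORCID JSON, and `orcid_date_to_iso` is a hypothetical helper, not part of OpenRefine or the tutorial.

```python
import datetime
from typing import Optional


def orcid_date_to_iso(date: Optional[dict]) -> Optional[str]:
    """Convert a nested year/month/day object to a "YYYY-MM-DD" string.

    Returns None when any part is missing or the values do not form a valid
    calendar date, so that incomplete dates can be reviewed by hand instead
    of being silently dropped at upload time.
    """
    if not date:
        return None
    parts = []
    for key in ("year", "month", "day"):
        field = date.get(key)
        if not field or not field.get("value"):
            return None  # incomplete date: leave it for manual handling
        parts.append(field["value"])
    try:
        year, month, day = (int(p) for p in parts)
        # datetime.date validates the values and zero-pads the output
        return datetime.date(year, month, day).isoformat()
    except ValueError:
        return None


complete = {"year": {"value": "2015"}, "month": {"value": "3"}, "day": {"value": "1"}}
partial = {"year": {"value": "2015"}, "month": {"value": "3"}, "day": None}

print(orcid_date_to_iso(complete))
print(orcid_date_to_iso(partial))
```

Dates with only a year or a year and month return `None` here on purpose; how such partial dates should be written is best checked against the documentation on the expected cell format for each datatype.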