Wikidata:Tools/OpenRefine/Editing/Tutorials/Basic editing/fr

This page is a translated version of the page Wikidata:Tools/OpenRefine/Editing/Tutorials/Basic editing and the translation is 34% complete.

OpenRefine comprend une extension Wikidata qui permet de transformer des données tabulaires en déclarations Wikidata. Ce didacticiel donne un aperçu de son fonctionnement.

Pour commencer

Dans ce didacticiel, nous allons importer des données sur des universités au Royaume-Uni. Notre source est un portail gouvernemental qui permet d'exporter ses données au format CSV. Ouvrez ce fichier avec OpenRefine (vous pouvez copier-coller le lien et OpenRefine le téléchargera directement).

 
État initial du projet OpenRefine

La première étape est de réconcilier la colonne EstablishmentName à Wikidata. Cliquez sur le menu de cette colonne et choisissez RéconcilierDémarrer la réconciliation, et choisissez de réconcilier vers Wikidata.

OpenRefine essaie de deviner le type des entités de cette colonne et propose university (Q3918) : en pratique, c'est trop spécifique car certaines de nos lignes concernent des collèges ou des académies. Par conséquent, choisissez l'option Réconcilier avec le type et sélectionnez un type plus large dans ce champ, tel que educational organization (Q5341295).

Dans le panneau de droite, vous avez la possibilité d'utiliser d'autres colonnes pour affiner le processus de rapprochement. Dans notre cas, la colonne URN contient des identifiants qui sont stockés dans Wikidata sous la forme DfE URN (P2253). Cochez donc Inclure pour cette colonne, tapez DfE URN dans la case à droite, et validez votre choix dans la boîte de dialogue de suggestion déroulante.

 
Capture d'écran d'un projet OpenRefine dans le cadre d'un tutoriel sur l'import Wikidata. Boîte de dialogue de réconciliation pour l'étape de réconciliation initiale.

Enfin, démarrez le processus de réconciliation et attendez qu'il se termine. La plupart des cellules auraient dû être réconciliées automatiquement grâce à l'identifiant unique que nous avons fourni, mais en général il est important de vérifier la qualité de ces correspondances.

Certaines des cellules n'ont pas été réconciliées automatiquement, en raison de divergences entre notre table et Wikidata. Vous pouvez les isoler en cliquant sur aucun dans la facette jugement à gauche. Si vous avez le temps, vous pouvez ensuite les faire correspondre manuellement - mais ce n'est pas obligatoire pour continuer ce tutoriel : les cellules non rapprochées seront ignorées dans les étapes suivantes.

 
Capture d'écran d'un projet OpenRefine dans le cadre d'un tutoriel sur l'import Wikidata. Projet après rapprochement initial.

Planification de l'importation

Scroll horizontally to review the columns in this table. Many of them could be imported in Wikidata: the address of the headquarters (including the postcode), the URL of the official website, the main telephone number and even the name and title of the head of the institution.[1] We first need to understand how this sort of data is currently modeled in Wikidata. Click on a few reconciled cells and look what statements they already have.

Take University of Birmingham (Q223429) for instance. This item already has the following statements:

official website
  http://www.birmingham.ac.uk/
search formatter URL https://www.birmingham.ac.uk/search.aspx?q=$1
country United Kingdom
0 reference
add reference


add value
headquarters location
  Birmingham
street address Edgbaston
postal code B15 2TT
country United Kingdom
0 reference
add reference


add value

That shows us which Wikidata properties are used for this sort of data. To have a better idea of their current coverage for the items we have at hand, we are going to use the data enrichment capabilities of OpenRefine to pull the existing Wikidata statements in our OpenRefine project. Click on the reconciled column and choose Edit columnAdd columns from reconciled values. Select the properties that we are interested in, such as official website (P856), headquarters location (P159), located in the administrative territorial entity (P131) and phone number (P1329). Validate and wait for the process to complete.

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Data extension.

We can now use facets on these columns to evaluate their coverage on our dataset and compare their values to the ones we have in our source. For instance, click on the official website column that we have just created. Select FacetCustomized facetsFacet by blank. This creates a new facet on the left-hand site. The false value corresponds to the cells that are not empty, so the rows for which Wikidata already has an official website (P856).

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Facet of a column by blank.

At the time of writing, all these universities already have a website on Wikidata (good job Wikidatans!). The remaining two rows do not correspond to any university. This might be different for you if you have not reconciled all cells. We can use the same facet on other rows, and we discover that none of these items have a phone number (P1329) yet. Let's import that then!

Creating a schema

We now need to instruct OpenRefine how to translate our table into Wikidata statements. This is done by creating a schema: this is a template (or pattern) of statements that will be applied to each row. Click on the Wikidata button in the top right corner of your screen. Click Edit Wikidata schema to open the schema editor. You can think of the schema editor as a space where you can create new statements pretty much as you would create them directly on Wikidata. The main difference is that you can use columns as placeholders for values in these statements. For each row, these placeholders (or variables) will be replaced by the actual content of the column in that row. This is useful in our case, because the statements we want to add follow a simple pattern:

EstablishmentName 1329 TelephoneNum

First, let us indicate that the EstablishmentName column contains the items on which we want to add statements. Click the add item button to add a new item in the schema. Then, drag and drop the EstablishmentName column to the subject of the item.

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Specifying the subject item to modify.

Then, we can add a statement on this item template. Fill in the property (with phone number (P1329)), and drag and drop the TelephoneNum column as value for the statement. In the same way, we could add other statements on the same item, as well as labels, descriptions or aliases (for which you need to use the add term button).

Detecting and fixing issues

At this stage, you can see that the Issues tab reports 238 issues with our candidate statements.[2] Open the issues tab to survey them:

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Issues reported by OpenRefine after our first schema draft.

OpenRefine reports that the format of the telephone numbers that we are trying to add does not match the expected format on Wikidata. While the regular expression it provides is a bit cryptic, we can click on the property and discover that it expects a format where the international dialing code is supplied: instead of 01784434455, we should add +44-1784434455.

Luckily, we can fix that: we know that all these institutions are from the UK, and hence have the same international dialing code "+44". All we need is to append it in front of our values, with the initial 0 removed. Go back to the table. Find the TelephoneNum column, open its menu and select Edit cellsTransform. We can now enter a simple expression in GREL[3] to transform the values:

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Fixing the format of the telephone numbers.

Apply the transformation and open the schema editor again. The warning about the format has disappeared, but OpenRefine still complains about the fact that we have not added any reference to our statements. We want to indicate the source of this information and make it easy for others to verify it. In our case, the website we drew the information from is the one DfE URN (P2253) links to. So, we can just include this identifier as a reference. The value of the identifier will change for each statement, so again we need to drag and drop the column that contains the identifier:

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Final schema.

We can also add a retrieved date, as the data on this website might be updated in the future. We have now solved all the issues reported by OpenRefine, and we can preview the edits generated by our schema.

 
OpenRefine Wikidata basic editing tutorial, preview of edits

Uploading the changes to Wikidata

Now that we are happy with our statements, we can upload them to Wikidata. Click on the Wikidata button in the top right corner and select Push to Wikidata. You will need to login with your Wikidata account: the statements will be added via this account. This is a fairly small edit batch so we can do it on our own, but for larger groups, it is better to discuss the changes on Wikidata and request a bot flag to perform the edits.

 
Screenshot of an OpenRefine project as part of a tutorial about Wikidata import. Final dialog where edits can be performed.

Add a descriptive edit summary to your edits: this comment will appear in the history view of each item that you edit. Once you are done, click Perform edits. The upload will take a while and you can watch its progress by checking your own contributions. If you cancel the operation in OpenRefine, the upload will stop but existing edits will not be undone. If you have made a mistake and want to undo the entire edit group, you can use the EditGroups tool.

Conclusion

Well done, you have added more than 100 well sourced statements to Wikidata, just in a few clicks! Why don't you try this on another dataset? Oh, and if you enjoy using OpenRefine, feel free to add the {{User loves OpenRefine}} userbox to your user page. Do you want to try something a bit more challenging? Then go to the following tutorial in this series, the Inverse Listeria tutorial.

Notes

  1. We could add the head of the institution with statements such as University of Birmingham (Q223429)director / manager (P1037)Michael Sterling (Q6834612), but after reviewing the associated Wikipedia page we realize that this information is already outdated by a few years, so that should also be excluded from the import.
  2. If OpenRefine does not report any format violation, you might have selected the wrong column. We have two columns with phone numbers in our project: the original one from our source (TelephoneNum) and the one that we pulled from Wikidata (phone number). In the schema editor we want to use the TelephoneNum, because that is where our fresh values are.
  3. To learn more about this expression language, visit OpenRefine's wiki (you can also use Python or Clojure if you are more familiar with these languages).