Wikidata:Tools/OpenRefine


Awards: WikidataCon Award 2019; Coolest Tool Award 2022 (winner in the Eggbeater category).
Introductory video tutorial
OpenRefine Beginners Tutorial by Emma Carroll

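OpenRefine is a data-wrangling tool (free software) that can clean tabular data and link it to knowledge bases, including Wikidata. It was formerly developed by Google (under the name Google Refine) and has since become a community-supported project.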

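This page collects OpenRefine recipes that help import datasets into Wikidata or enrich datasets with additional data pulled from Wikidata. Feel free to use the talk page to ask for help with the software. If you like the tool, you can promote it by displaying the {{User loves OpenRefine}} userbox.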

OpenRefine currently only supports reconciling items. Lexemes are not supported as of September 2022.

Installing and running OpenRefine

OpenRefine can be downloaded as an application. It works on desktop and laptop computers with Windows, Mac and Linux operating systems. It runs a small server on your computer and you then use a web browser to interact with it. It works best with browsers based on Webkit, such as Google Chrome, Chromium, Opera and Microsoft Edge, and is also supported on Firefox.
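Because OpenRefine runs as a local server that you reach through a browser, a quick way to confirm that it has started is to check whether its port accepts connections. The following is a minimal sketch, assuming the default address 127.0.0.1 and port 3333 (adjust both if you started OpenRefine with other settings); the function names are illustrative and not part of OpenRefine.

```python
import socket

# Assumption: OpenRefine is listening on its default address and port.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 3333


def openrefine_url(host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Return the address to open in a web browser."""
    return f"http://{host}:{port}/"


def is_listening(host=DEFAULT_HOST, port=DEFAULT_PORT, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if is_listening():
        print("A server is listening; open", openrefine_url())
    else:
        print("Nothing is listening on", openrefine_url())
```

The check only tells you that some program is listening on the port, not that it is OpenRefine, so treat a positive result as a hint rather than a guarantee.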

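OpenRefine has a graphical user interface and is available in more than 15 languages.

Installing OpenRefine on your computer

You can download the latest stable version of OpenRefine here.

Running OpenRefine directly on PAWS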

Since May 2021, everyone with a registered Wikimedia account can run OpenRefine in PAWS on Wikimedia's Cloud Services. Please note that this is an experimental feature which is not supported by the OpenRefine team itself, and which may break or malfunction. It is however an interesting option for people who can't install software on their local computer.

PAWS is a Wikimedia Cloud tool that provides hosted access to Jupyter notebooks and other tools without needing any local installation.

You can access your own installation of OpenRefine with this link: https://hub-paws.wmcloud.org/hub/user-redirect/openrefine. You will have to log in with your wiki credentials, but do not tick the Remember me box: because all files written on PAWS are publicly available, you do not want to leave your credentials accessible. You may also get an error message; if so, refresh the page and it should work.

Please contact YuviPanda with questions about OpenRefine via PAWS.

Main features

Wikidata reconciliation

In OpenRefine terminology, reconciliation is the process of linking free-text tabular cells to identifiers in knowledge bases. OpenRefine's built-in reconciliation capabilities make it a versatile tool to reconcile tabular data to a wide range of databases, including Wikidata.

 
Semi-automatic reconciliation of universities in OpenRefine

The OpenRefine wiki contains a detailed step-by-step guide to reconciliation. The main features are:

  • Restrict the reconciliation to a Wikidata class. Only items from subclasses of this Wikidata class will be considered;
  • Use multiple columns in your dataset and match them against values of properties in Wikidata, which refines the reconciliation score and acts as a tiebreaker between namesakes;
  • Use the external identifiers shared by your dataset and Wikidata to reconcile your items;
  • Use the sitelinks provided in your dataset as external identifiers: if these Wikimedia pages are linked to a Wikidata item, they will automatically be reconciled to that item.

If you want to use the reconciliation features, consider engaging with the following instruction materials:

The reconciliation service is also available through language-specific API endpoints. For instance, to search French (fr) labels in Wikidata, use https://wikidata.reconci.link/fr/api.
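To make the features listed above concrete, here is a minimal sketch of how a batch of reconciliation queries could be assembled for such an endpoint. It assumes the endpoint follows the generic reconciliation service protocol, in which a JSON object of queries is sent as a form field named queries; the helper names, the sample row, and the choice of Q3918 (university) and P17 (country) are illustrative assumptions. The sketch only builds the request body and does not contact the service.

```python
import json
from urllib.parse import urlencode


def endpoint(lang="en"):
    """Language-specific endpoint, following the URL pattern quoted above."""
    return f"https://wikidata.reconci.link/{lang}/api"


def build_queries(rows, name_column, type_qid=None, property_columns=None):
    """Build a query batch keyed q0, q1, ... from tabular rows.

    type_qid restricts candidates to a Wikidata class; property_columns
    maps a column name to the property ID it should be matched against.
    """
    queries = {}
    for index, row in enumerate(rows):
        query = {"query": row[name_column]}
        if type_qid:
            query["type"] = type_qid
        properties = [
            {"pid": pid, "v": row[column]}
            for column, pid in (property_columns or {}).items()
            if row.get(column)  # skip blank cells
        ]
        if properties:
            query["properties"] = properties
        queries[f"q{index}"] = query
    return queries


def encode_request_body(queries):
    """Form-encode the batch as a single 'queries' field (assumed protocol)."""
    return urlencode({"queries": json.dumps(queries)})


# Illustrative row, not taken from a real dataset.
rows = [{"name": "University of Oxford", "country": "United Kingdom"}]
batch = build_queries(rows, "name", type_qid="Q3918",
                      property_columns={"country": "P17"})
print(endpoint("fr"))
print(json.dumps(batch, indent=2))
```

In OpenRefine itself these requests are issued for you from the reconciliation dialog; the sketch is only meant to show which parts of a row feed the type restriction and the property-based tiebreakers.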

Data extension

 
This screencast demonstrates how to add new columns based on a reconciled column in OpenRefine 2.8.

This feature has been available since OpenRefine 2.8.

Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. If there are multiple claims for a given property, the values will be grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode might therefore be more suitable for the later transformations you want to carry out on your table. Access to item labels, item descriptions and item sitelinks is provided by properties Lxx, Dxx and Syyyy, where xx is a language code (en, fr, yue, etc.) and yyyy is a site ID (enwiki, ptwikisource, etc.).
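The two conventions described above can be sketched in a few lines. This is an illustration only, not OpenRefine's own code: the first three helpers build the Lxx, Dxx and Syyyy pseudo-property identifiers, and the last one groups rows into records by treating any row whose reconciled column is blank as a continuation of the previous row. The column names and sample values are invented for the example.

```python
def label_property(lang):
    """Pseudo-property for an item label, e.g. 'Len'."""
    return f"L{lang}"


def description_property(lang):
    """Pseudo-property for an item description, e.g. 'Dfr'."""
    return f"D{lang}"


def sitelink_property(site_id):
    """Pseudo-property for a sitelink, e.g. 'Senwiki'."""
    return f"S{site_id}"


def group_into_records(rows, key_column):
    """Group rows into records: a blank key cell continues the last record."""
    records = []
    for row in rows:
        if row.get(key_column) or not records:
            records.append([row])    # non-blank key starts a new record
        else:
            records[-1].append(row)  # blank key: extra value for same item
    return records


# Invented sample: item A has two values for the fetched property.
rows = [
    {"item": "A", "value": "x"},
    {"item": "", "value": "y"},
    {"item": "B", "value": "z"},
]
print(label_property("en"), description_property("fr"),
      sitelink_property("enwiki"))
for record in group_into_records(rows, "item"):
    print([row["value"] for row in record])
```

Grouping by record rather than by row is what lets a later transformation treat all values fetched for one item together.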

You can use this function recursively on the newly-created columns if they correspond to Wikidata items. This lets you explore the Wikidata graph along selected properties. It is also possible to configure the way you retrieve the properties in various ways (for instance, filtering by rank or references).

Editing Wikidata

This feature has been available since OpenRefine 3.0.

OpenRefine can help you transform tabular data into Wikidata statements. This works by creating a schema: a template for Wikidata edits that is applied to each row of your table. Once you have created a schema, you can:

  • preview the Wikidata edits and inspect them manually;
  • analyze and fix any issues raised automatically by the tool;
  • upload your changes to Wikidata by logging in with your own account;
  • export the changes to the QuickStatements v1 format.
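The idea of a schema applied to every row can be illustrated with a small sketch that turns rows into QuickStatements v1 commands. It is not OpenRefine's exporter: it assumes a simplified schema made of (property, column) pairs, tab-separated output of item, property and value, item values written as bare Q-ids, and other values written as double-quoted strings that themselves contain no double quotes. The column names and sample values are placeholders.

```python
import re

ITEM_ID = re.compile(r"Q\d+")


def format_value(value):
    """Write item IDs bare and everything else as a quoted string."""
    if ITEM_ID.fullmatch(value):
        return value
    return f'"{value}"'


def rows_to_quickstatements(rows, subject_column, schema):
    """Apply the same statement template to each row.

    schema is a list of (property_id, column_name) pairs. Rows without a
    subject Q-id (for example, unreconciled rows) are skipped.
    """
    lines = []
    for row in rows:
        subject = row.get(subject_column, "")
        if not ITEM_ID.fullmatch(subject):
            continue
        for property_id, column in schema:
            value = row.get(column, "")
            if value:
                lines.append("\t".join([subject, property_id,
                                        format_value(value)]))
    return lines


# Placeholder data: Q4115189 is the Wikidata Sandbox item.
rows = [
    {"qid": "Q4115189", "instance": "Q5", "website": "https://example.org"},
    {"qid": "", "instance": "Q5", "website": ""},  # not reconciled: skipped
]
schema = [("P31", "instance"), ("P856", "website")]
for line in rows_to_quickstatements(rows, "qid", schema):
    print(line)
```

Previewing generated commands like these before uploading mirrors the preview step in the list above: the template is fixed once, and each row only supplies the values.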

Recipes

OpenRefine workflows can be shared by copying the JSON representation of the edit history. This represents the operations you have made in OpenRefine, and can be reused by others on similar datasets. This section lists some recipes that can be useful when working with Wikidata. See also OpenRefine Recipes.

  • Obtaining Wikidata Q numbers. Once you have reconciled a column to Wikidata, you can obtain the Qids in a new column, by using the Add column based on this column operation with the following GREL expression: cell.recon.match.id
  • More variables. You can access many different variables for the reconciled cell. See the reference page for variables.
  • Share your own recipes too!
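As an illustration of sharing a recipe as JSON, the sketch below builds an operation-history entry for the Q-number recipe. The field names reflect what OpenRefine typically exports for an "Add column based on this column" step, but they are an assumption here and may differ between versions; the column names name and qid are hypothetical. Compare the result with what your own installation shows when you extract the operation history before relying on it.

```python
import json


def qid_column_operation(base_column, new_column, insert_index=1):
    """Build one operation-history entry (assumed field names)."""
    expression = "grel:cell.recon.match.id"
    return {
        "op": "core/column-addition",
        "engineConfig": {"facets": [], "mode": "row-based"},
        "baseColumnName": base_column,
        "expression": expression,
        "onError": "set-to-blank",
        "newColumnName": new_column,
        "columnInsertIndex": insert_index,
        "description": (
            f"Create column {new_column} at index {insert_index} "
            f"based on column {base_column} using expression {expression}"
        ),
    }


# A shared recipe is a JSON list of such operations.
recipe = [qid_column_operation("name", "qid")]
print(json.dumps(recipe, indent=2))
```

The printed JSON is the kind of text that can be pasted into another project on a similar dataset, provided the base column has already been reconciled there.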

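Helping OpenRefine

OpenRefine needs your help! There are many things you can do:

We have a Phabricator project to track OpenRefine-related activity within Wikimedia; feel free to tag related tasks with it.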

Over 2021-22, OpenRefine is being extended with Structured Data on Wikimedia Commons (SDC) support. This project is funded by a Wikimedia Foundation Project Grant.