Wikidata:Creating a bot

This page explains how to create a bot for Wikidata. Please share your code, keep a log, and state what you intend to use the bot for.

Requirements

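To create a bot, you need:

  • Some programming skills (for example Python, Perl, PHP)
  • A framework (one of the frameworks below) and the code needed to carry out your task
  • A bot account (and please wait for it to be approved)
  • A source code editor (Notepad++, Geany, vi, emacs)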

Recommendation

Pywikibot

In the following sections, you will learn how to install, configure and log in using pywikibot. You only need to do these first three steps once. There are also some basic examples that are useful for learning the basics of bot programming.

Installation

For more on installing pywikibot, see the corresponding page on MediaWiki.

To install pywikibot, follow the instructions given there.

Configuration

For more on configuring pywikibot, see the corresponding page on MediaWiki.

You must configure the user-config.py file with the bot username, family project and language. For Wikidata, the family and language parameters are both wikidata.

You can reduce the delay between edits by adding: put_throttle = 1
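
As a sketch of what these settings might look like in user-config.py: the bot name ExampleBot is a placeholder, and usernames is the mapping that Pywikibot provides when it reads the file, so this fragment is not meant to be run on its own.

```python
# user-config.py: read by Pywikibot at startup.
# "ExampleBot" is a placeholder; use the name of your own bot account.
family = 'wikidata'
mylang = 'wikidata'
usernames['wikidata']['wikidata'] = 'ExampleBot'

# Optional: reduce the delay between edits (in seconds).
put_throttle = 1
```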

Login

After you configure the user-config.py file, log in as follows:

$ python login.py

It will ask for your bot password; type it and press Enter. You should now be logged in.

Example 1: Get data

This example gets data from the page about Douglas Adams. Save the source code to a file and run it: python example1.py
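
A minimal sketch of such a script, assuming Pywikibot is installed and configured as described above. The helper names fetch_item and main are illustrative and not part of Pywikibot; the import sits inside the function because Pywikibot expects a user-config.py when it is loaded.

```python
def fetch_item(title, lang="en"):
    """Return the Wikidata item connected to a Wikipedia article, plus its data."""
    # Imported lazily: Pywikibot needs its configuration file when it loads.
    import pywikibot

    site = pywikibot.Site(lang, "wikipedia")
    page = pywikibot.Page(site, title)
    item = pywikibot.ItemPage.fromPage(page)  # the item linked to the article
    data = item.get()  # connects to Wikidata and fetches the data
    return item, data


def main():
    item, data = fetch_item("Douglas Adams")
    print(data)               # the full dictionary
    print(list(data.keys()))  # only its keys
    print(item)               # the item itself
```

To run it as example1.py, call main() at the end of the file.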

item.get() connects to Wikidata and fetches the data. The output is (reformatted for clarity):

{
    'claims': {
        'P646': [<pywikibot.page.Claim instance at 0x7f1880188b48>],
        'P800': [<pywikibot.page.Claim instance at 0x7f1880188488>, <pywikibot.page.Claim instance at 0x7f1880188368>]
        ...
    }
    'labels': {
        'gu': '\u0aa1\u0a97\u0acd\u0ab2\u0abe\u0ab8 \u0a8f\u0aa1\u0aae\u0acd\u0ab8',
        'scn': 'Douglas Adams',
        ...
    }
    'sitelinks': {
        'fiwiki': 'Douglas Adams',
        'fawiki': '\u062f\u0627\u06af\u0644\u0627\u0633 \u0622\u062f\u0627\u0645\u0632',
        'elwikiquote': '\u039d\u03c4\u03ac\u03b3\u03ba\u03bb\u03b1\u03c2 \u0386\u03bd\u03c4\u03b1\u03bc\u03c2',
        ...
    }
    'descriptions': {
        'eo': 'angla a\u016dtoro de sciencfikcio-romanoj kaj humoristo',
        'en': 'English writer and humorist',
    },
    'aliases': {
        'ru': ['\u0410\u0434\u0430\u043c\u0441, \u0414\u0443\u0433\u043b\u0430\u0441'],
        'fr': ['Douglas Noel Adams', 'Douglas No\xebl Adams'],
        ...
    }
}
['claims', 'labels', 'sitelinks', 'descriptions', 'aliases']
[[wikidata:Q42]]

It prints a dictionary with keys for

  • the set of claims in the page: Property:P646 is the Freebase identifier, Property:P800 is "notable work", etc.
  • the label of the item in many languages
  • the sitelinks for the item, not just Wikipedias in many languages, but also Wikiquote in many languages
  • the item description in many languages
  • the aliases for the item in many languages

It then prints a list of all the keys of the key-value pairs in the dictionary. Finally, you can see that the Wikidata item about Douglas Adams is Q42.

Alternative

The example above gets the ItemPage using the English Wikipedia article. Alternatively, we can get the ItemPage directly from its Q-id.
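
A minimal sketch of the direct route, assuming Pywikibot is configured for Wikidata. The helper name get_item_by_id is illustrative and not part of Pywikibot.

```python
def get_item_by_id(qid="Q42"):
    """Load a Wikidata item directly from its Q-id."""
    # Imported lazily: Pywikibot needs its configuration file when it loads.
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()  # the Wikibase repository behind the site
    item = pywikibot.ItemPage(repo, qid)
    item.get()  # fetch labels, claims, sitelinks and so on
    return item
```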

Example 2: Get interwiki links

After item.get(), the sitelinks, for example, can be accessed. These are links to all Wikipedias that have the article.
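
A minimal sketch, assuming item is an ItemPage obtained as in Example 1. The helper name load_sitelinks is illustrative.

```python
def load_sitelinks(item):
    """Return the item's sitelinks, keyed by wiki database name (e.g. 'fiwiki')."""
    item.get()  # make sure the item's data has been fetched
    return item.sitelinks
```

Calling print(load_sitelinks(item)) shows the mapping. Depending on the Pywikibot version, the values may be plain titles or link objects.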

The output is:

{'fiwiki': 'Douglas Adams', 'eowiki': 'Douglas Adams', 'dewiki': 'Douglas Adams', ...}

With item.iterlinks(), an iterator over all these sitelinks is returned, where each article is given not as plain text as above but already as a Page object for further treatment (e.g., edit the text in the corresponding Wikipedia articles).
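
A short sketch of iterating with item.iterlinks(), assuming item is an ItemPage. The helper name linked_page_titles is illustrative; it only reads titles, but each Page object could equally be edited.

```python
def linked_page_titles(item, family="wikipedia"):
    """Collect the titles of the pages linked from an item, keyed by wiki."""
    titles = {}
    for page in item.iterlinks(family=family):
        # Each element is a Page object bound to its own wiki.
        titles[page.site.dbName()] = page.title()
    return titles
```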

Example 4: Set a description

This example sets an English and a German description for the item about Douglas Adams.
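
A minimal sketch, assuming item is an ItemPage on which the logged-in bot may edit. The helper name set_descriptions and the edit summary are illustrative, and the German text is a placeholder rather than the live description.

```python
def set_descriptions(item, descriptions, summary="Setting/updating descriptions."):
    """Write descriptions given as a {language code: text} mapping."""
    item.editDescriptions(descriptions, summary=summary)


# Illustrative values only; the German text is a placeholder.
EXAMPLE_DESCRIPTIONS = {
    "en": "English writer and humorist",
    "de": "englischer Schriftsteller und Humorist",
}
```

With a loaded item, set_descriptions(item, EXAMPLE_DESCRIPTIONS) performs the edit.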

Setting labels and aliases works accordingly.

Example 6: Set a sitelink

To set a sitelink, we can either create a corresponding dict, as in Example 4, or use Page objects.
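
Both variants can be sketched as follows, assuming item is an ItemPage and page is a Pywikibot Page on the target wiki. The helper names are illustrative.

```python
def set_sitelink_from_dict(item, dbname, title, summary):
    """Set a sitelink from a dict with the wiki's database name and the page title."""
    item.setSitelink({"site": dbname, "title": title}, summary=summary)


def set_sitelink_from_page(item, page, summary):
    """Set a sitelink from a Page object; the wiki is taken from the page."""
    item.setSitelink(page, summary=summary)
```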

Example 7: Set a statement

Statements are set using the Claim class. In the following, we set for Douglas Adams place of birth (P19): Cambridge (Q350).
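
A minimal sketch of setting an item-valued statement, assuming repo is the Wikidata data repository and item is the ItemPage to edit. The helper name add_item_claim is illustrative.

```python
def add_item_claim(repo, item, prop, target_qid, summary):
    """Add a statement whose value is another item, e.g. P19 -> Q350."""
    # Imported lazily: Pywikibot needs its configuration file when it loads.
    import pywikibot

    claim = pywikibot.Claim(repo, prop)
    target = pywikibot.ItemPage(repo, target_qid)
    claim.setTarget(target)
    item.addClaim(claim, summary=summary)
    return claim
```

For the case in the text this would be called as add_item_claim(repo, item, "P19", "Q350", "Adding claim").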

For other datatypes, this works similarly. In the following, we add claims with string (IMDb ID (P345)) and coordinate (coordinate location (P625)) datatypes (URL is the same as string).
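
A sketch of both datatypes, assuming repo is the Wikidata data repository and item is the ItemPage to edit. The helper names are illustrative, and the actual values (the identifier string, the latitude, longitude and precision) are left to the caller.

```python
def add_string_claim(repo, item, prop, value, summary):
    """Add a statement whose value is a plain string (URLs work the same way)."""
    # Imported lazily: Pywikibot needs its configuration file when it loads.
    import pywikibot

    claim = pywikibot.Claim(repo, prop)
    claim.setTarget(value)
    item.addClaim(claim, summary=summary)
    return claim


def add_coordinate_claim(repo, item, prop, lat, lon, precision, summary):
    """Add a statement whose value is a geographic coordinate."""
    import pywikibot

    claim = pywikibot.Claim(repo, prop)
    coordinate = pywikibot.Coordinate(lat=lat, lon=lon, precision=precision, site=repo)
    claim.setTarget(coordinate)
    item.addClaim(claim, summary=summary)
    return claim
```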

Example 8: Add a qualifier

Qualifiers are also represented by the Claim class. In the following, we add the qualifier incertae sedis (P678): family (Q35409) to the Claim "claim". Make sure you add the item before adding the qualifier.
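
A minimal sketch, assuming claim is a Claim that already belongs to an item and repo is the Wikidata data repository. The helper name add_item_qualifier is illustrative.

```python
def add_item_qualifier(repo, claim, prop, target_qid, summary):
    """Attach an item-valued qualifier to an existing claim."""
    # Imported lazily: Pywikibot needs its configuration file when it loads.
    import pywikibot

    qualifier = pywikibot.Claim(repo, prop)
    qualifier.setTarget(pywikibot.ItemPage(repo, target_qid))
    claim.addQualifier(qualifier, summary=summary)
    return qualifier
```

For the case in the text this would be called as add_item_qualifier(repo, claim, "P678", "Q35409", "Adding a qualifier").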

Example 9: Add a source

Also, sources are represented by the Claim class. Unlike for qualifiers, a source may contain more than one Claim. In the following, we add stated in (P248): Integrated Taxonomic Information System (Q82575) with retrieved (P813) March 20, 2014 as source to the Claim "claim". The claim has to be either retrieved from Wikidata or added to an itempage beforehand.
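
A minimal sketch of a two-part source, assuming claim is a Claim that was retrieved from Wikidata or already added to an item. The helper name add_stated_in_source is illustrative.

```python
def add_stated_in_source(repo, claim, source_qid, year, month, day, summary):
    """Add a source made of 'stated in' (P248) and 'retrieved' (P813)."""
    # Imported lazily: Pywikibot needs its configuration file when it loads.
    import pywikibot

    stated_in = pywikibot.Claim(repo, "P248")
    stated_in.setTarget(pywikibot.ItemPage(repo, source_qid))

    retrieved = pywikibot.Claim(repo, "P813")
    retrieved.setTarget(pywikibot.WbTime(year=year, month=month, day=day))

    # Both parts are passed together so they form a single source.
    claim.addSources([stated_in, retrieved], summary=summary)
```

For the case in the text this would be called as add_stated_in_source(repo, claim, "Q82575", 2014, 3, 20, "Adding sources").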

Example 10: Page monitoring

TODO

Example 11: Get values of sub-properties

In the following, we get the values of sub-properties from the branch described by source (P1343) -> Great Soviet Encyclopedia (1969–1978) (Q17378135) -> properties reference URL (P854) and title (P1476).
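
A sketch of one way to read such values, under the assumption that the sub-properties are stored as qualifiers on the matching statement. The helper name get_qualifier_values is illustrative, and item is assumed to be an ItemPage.

```python
def get_qualifier_values(item, main_prop, target_qid, qualifier_props):
    """Collect qualifier values from the statements of main_prop that point at target_qid."""
    item.get()  # load the item's claims
    results = {prop: [] for prop in qualifier_props}
    for claim in item.claims.get(main_prop, []):
        target = claim.getTarget()
        # Skip statements whose value is not the item we are looking for.
        if getattr(target, "id", None) != target_qid:
            continue
        for prop in qualifier_props:
            for qualifier in claim.qualifiers.get(prop, []):
                results[prop].append(qualifier.getTarget())
    return results
```

For the branch in the text this would be called as get_qualifier_values(item, "P1343", "Q17378135", ["P854", "P1476"]).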

More examples

Some users have shared their source code. Follow the links below to learn more:

WikidataIntegrator

WikidataIntegrator is a library for reading and writing to Wikidata/Wikibase. We created it for populating Wikidata with content from authoritative resources on Genes, Proteins, Diseases, Drugs and others. Details on the different tasks can be found on the bot's Wikidata page.

Pywikibot is an existing framework for interacting with the MediaWiki API. The reason why we came up with our own solution is that we need a high integration with the Wikidata SPARQL endpoint in order to ensure data consistency (duplicate checks, consistency checks, correct item selection, etc.). Compared to Pywikibot, WikidataIntegrator currently is not a full Python wrapper for the MediaWiki API but is solely focused on providing an easy means to generate Python-based Wikidata bots.

For more information, documentation, download & installation instructions, see here: https://github.com/SuLab/WikidataIntegrator/

Example notebook

An example notebook demonstrating an example bot to add therapeutic areas to drug items, including using fastrun mode, checking references, and removing old statements:

http://public-paws.wmcloud.org/46883698/example%20ema%20bot.ipynb

WikibaseIntegrator

WikibaseIntegrator was forked from WikidataIntegrator by User:Myst in 2020 and has since seen several API improvements that make it even easier to create bots with the library.

For more information, documentation, download & installation instructions, see here: https://github.com/LeMyst/WikibaseIntegrator

Example semi-automatic script

LexUse is a semi-automatic tool for finding and adding usage examples to lexemes. It is free software written in Python 3 in 2020. See Wikidata:LexUse.

Wikibase.NET (deprecated)

Wikibase.NET is the API that replaces the now-deprecated DotNetDataBot. It is an API client for the MediaWiki extension Wikibase. The two are not compatible, because Wikibase.NET no longer needs the DotNetWikiBot framework.

Download and installation

You can download Wikibase.NET from GitHub. Just follow the instructions on that page.

Known issues

Examples

Coming soon...

DotNetDataBot (deprecated)

Installation

Configuration

After unpacking the package you will see a file called DotNetDataBot.dll and one called DotNetDataBot.xml. The XML document is only for documentation. To use the library, add a new reference to it in your project. Then you can write using DotNetDataBot; to import the framework.

Login

To log in, create a new Site object with the URL of the wiki, your bot's username and its password.

Example 1: Get the ID using a wiki page

You can get the ID of an item by searching for it using the site and the title of the connected page.

Example 2: Get interwiki links

You can get the interwiki links of an item by loading the content and accessing the links field of the object.

Example 3: Set a description

To set a description, you must call the setDescription function.

Example 4: Set a label

It works the same way for setting a label. Just call setLabel.

Example 5: Get interwiki links for 100 pages

This feature is not supported. Just iterate over the list.

Wikibase API for PHP

This is an API client for Wikibase written in PHP. It can be downloaded from here.

Example 1: Basic example

Take a look at the source comments to understand how it works.


Example 2: Creating claims

Take a look at the source comments to understand how it works.

VBot (no updates since 2017)

A framework for Wikidata and Wikipedia. It reads and writes on Wikidata and other Wikimedia projects, and includes a list generator for producing lists of Wikipedia pages and Wikidata entities. It can also read JSON dumps of Wikidata.

Overview

A bot for reading and editing Wikidata and Wikipedia.

  • License: CC0 1.0
  • Language: C#
  • Can read and write entities with all datatypes on Wikidata
  • Can read and write pages on all wiki projects
  • Can read parameters from templates on wiki pages
  • Can read JSON dumps
  • Can create lists using the list generator
  • Tested with Visual Studio Express 2013 for Windows Desktop.
    • Newtonsoft.Json is required. You can install it with NuGet inside Visual Studio.
    • A reference to System.Web must be added manually for "HttpUtility.UrlEncode".

Download

The framework can be downloaded from GitHub here.

Guide

Example 1

Update en label for all items with instance of (P31): short film (Q24862) that have director (P57) and that have publication date (P577) in 1908. (Use of Wikidata query)

LexData (Python; for Lexicographical data)

LexData is an easy-to-use Python library to create and edit Lexemes, Senses and Forms.

Tips

The documentation of LexData is still a bit lacking, so look at existing implementations in MachtSinn or Wikidata Lexeme Forms for ideas on how to use it.

If you only want to add statements to Lexemes (not Forms or Senses), WikibaseIntegrator might be a better choice, as it is more versatile and supports many data types.

Installation

You can install LexData via pip:

$ pip install LexData

Login

For all operations you need a WikidataSession. You can create it with your credentials, a bot password or an edit token (for example, to edit via OAuth).


Retrieve a Lexeme

You can open existing Lexemes and read their content.

Searching and creating Lexemes

If you don't know the L-Id of a lexeme you can search for it. And if it doesn't exist you can create it.

Adding information

You can easily create forms or senses, with or without additional claims.

Using the Wikidata API directly

The other sections describe how to use bot frameworks to access and update Wikidata information. You can also interact directly with the Wikibase API that Wikidata provides. You need to do this if you are developing your own framework or if you need to do something that a framework does not support. The documentation for the Wikibase API can be found at mediawiki.org. You can also experiment with it at Special:ApiSandbox; try action=wbgetentities.

Wikibase provides its API as a set of modules for MediaWiki's "action" API. You access this by making HTTP requests to /w/api.php. The default response format is JSON. So for your language of choice, you only need a library to perform HTTP requests and a JSON or XML library to parse the responses.

Example 1: Get the Q number

This example gets the item Q number for the English Wikipedia article about Andromeda Galaxy. The Wikibase API's main "workhorse" module action=wbgetentities provides this information. The HTTP request (using jsonfm format for human-readable JSON output) is simply

https://www.wikidata.org/w/api.php?action=wbgetentities&titles=Andromeda%20Galaxy&sites=enwiki&props=&format=jsonfm&formatversion=2

Try following the link. This requests no additional information about the entity; remove &props= from the URL to see much more information about it. See the generated help for wbgetentities for more parameters you can specify.

Python
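
A minimal sketch using only the Python standard library. The function names and the User-Agent string are illustrative placeholders, and the code assumes that entries for pages without an item carry a "missing" key in the response.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_request_url(title, site="enwiki"):
    """Build the wbgetentities URL for one page title on one wiki."""
    params = {
        "action": "wbgetentities",
        "titles": title,
        "sites": site,
        "props": "",  # request no additional information, only the ids
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_item_ids(response):
    """Return the ids of the entities that were found in a parsed response."""
    entities = response.get("entities", {})
    # Assumption: entries for pages without an item carry a "missing" key.
    return [key for key, entity in entities.items() if "missing" not in entity]


def get_item_id(title, site="enwiki"):
    """Query the live API and return the first matching item id, or None."""
    request = urllib.request.Request(
        build_request_url(title, site),
        # Placeholder User-Agent; replace it with one identifying your bot.
        headers={"User-Agent": "ExampleBot/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as reply:
        ids = extract_item_ids(json.load(reply))
    return ids[0] if ids else None
```

Calling print(get_item_id("Andromeda Galaxy")) sends the request and prints the item's Q number.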

The output is:

Q2469

Example 2: Get list of items without particular interwiki

...please contribute if you know how...

See also

Other links