User:Smalyshev (WMF)/User-Agent policy

This document describes the Wikidata Query Service (WDQS) policy with regard to the User-Agent header. It is derived from the WMF User-Agent policy. The capitalized terms in the policy have the meanings defined in RFC 2119.

Policy

Each client using the WDQS query endpoint MUST send a User-Agent header. Clients that do not send a User-Agent header MAY be denied access to the service. It is strongly RECOMMENDED that scripted clients use a descriptive User-Agent that allows us to contact the script owner or runner if the script misbehaves or causes issues.

Agent recommendations

If you are writing a bot or script to access the WDQS SPARQL endpoint, please include as part of the User-Agent:

  • A descriptive name for the tool that is unlikely to be confused with others.
  • Some information that would allow us to contact the developer or immediate controller of the tool (wiki username, web page, wiki project talk page, email, etc.).

Do NOT spoof browser UA strings; doing so helps no one and creates a mess on both the operational and analytic levels.

Examples of suitable User-Agent headers:

User-Agent: CoolToolName/1.1 (https://example.org/CoolTool/; CoolTool@example.org)
User-Agent: SparqlBot/3.14 (by User:BotMaker); runner: User:ILoveBots

You can use any format and any names, as long as the resulting string is distinctive (not likely to be confused with other tools) and identifies the point of contact in some way.
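As an illustration of the format used in the first example above, here is a minimal Python sketch that assembles such a string and attaches it to a request using only the standard library. The helper names `build_user_agent` and `make_request` are hypothetical, and the endpoint URL and the `format=json` parameter are assumptions not stated in this policy. The request is built but not sent.

```python
import urllib.parse
import urllib.request

# Assumed public WDQS SPARQL endpoint (not stated in this policy).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_user_agent(tool, version, *contacts):
    """Return 'Tool/Version (contact1; contact2)' (hypothetical helper)."""
    if not contacts:
        raise ValueError("at least one point of contact is required")
    return f"{tool}/{version} ({'; '.join(contacts)})"


def make_request(query, user_agent):
    """Build, but do not send, a GET request for a SPARQL query."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},
    )


ua = build_user_agent(
    "CoolToolName", "1.1", "https://example.org/CoolTool/", "CoolTool@example.org"
)
req = make_request("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 1", ua)
# Passing req to urllib.request.urlopen() would send the query with this header.
```

Requiring at least one contact in the helper keeps the second recommendation (a point of contact) from being silently skipped.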

If you are writing a library/toolkit for accessing the WDQS SPARQL endpoint, please consider allowing the end user to customize the User-Agent string that is being sent to the endpoint, and encourage them to set it to a descriptive name in your documentation.
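One way a toolkit might follow this advice is to make the User-Agent a required constructor argument rather than an optional setting with a generic default. The sketch below is only an illustration: the class `SparqlClient`, the token `ExampleSparqlLib/0.1`, and the default endpoint URL are all hypothetical and do not describe any existing library.

```python
import urllib.parse
import urllib.request


class SparqlClient:
    """Hypothetical toolkit class; not part of any real library."""

    # Token identifying the toolkit itself (made-up name and version).
    LIBRARY_TOKEN = "ExampleSparqlLib/0.1"

    def __init__(self, user_agent, endpoint="https://query.wikidata.org/sparql"):
        # Refuse to fall back to a generic default: the caller must identify the tool.
        if not user_agent or not user_agent.strip():
            raise ValueError("user_agent must name your tool and a point of contact")
        # Caller's identification comes first; the toolkit token is appended.
        self.user_agent = f"{user_agent.strip()} {self.LIBRARY_TOKEN}"
        self.endpoint = endpoint

    def build_request(self, query):
        """Return an unsent urllib request carrying the configured User-Agent."""
        url = self.endpoint + "?" + urllib.parse.urlencode({"query": query})
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})


client = SparqlClient("SparqlBot/3.14 (by User:BotMaker)")
request = client.build_request("ASK { wd:Q42 wdt:P31 wd:Q5 }")
```

Appending the toolkit token after the caller's string keeps the tool's own name at the front of the header while still recording which library made the request.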

Note that the information included in the User-Agent is covered by the Privacy policy and is treated as personal information. This means raw User-Agent strings will not be published, though aggregated data from these strings, which may contain generic parts of the UA string, such as toolkit name or tool name, may be used in research and published in the aggregated research results.

Generic Agents

Many software toolkits, such as Python libraries, the curl library, etc., by default use a UA string that identifies the toolkit. While this is not considered an empty UA and such requests will not be denied, a tool using a UA identifying only the library may be grouped with other tools using similar toolkits and treated as the same client for the purposes of resource allocation. This may lead to your request being throttled or rejected by the service due to performance constraints. Adding a distinctive UA string (see above) will allow you to avoid this.
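A script can guard against accidentally sending a library default before it issues any queries. The following Python sketch is a client-side check only: the function name `looks_generic` is hypothetical, the list of library prefixes is an illustrative sample, and it does not reflect how the service itself classifies requests.

```python
import re

# Illustrative sample of toolkit default prefixes; an assumption, not the
# service's actual list.
_GENERIC_UA = re.compile(
    r"^(python-requests|python-urllib|curl|java)/\S+$", re.IGNORECASE
)


def looks_generic(user_agent):
    """Return True if the string is empty or is only a bare toolkit token."""
    ua = user_agent.strip()
    return not ua or bool(_GENERIC_UA.match(ua))


for candidate in (
    "python-requests/2.31.0",
    "curl/8.4.0",
    "SparqlBot/3.14 (by User:BotMaker); runner: User:ILoveBots",
):
    status = "generic, please customize" if looks_generic(candidate) else "ok"
    print(f"{candidate!r}: {status}")
```

A string that starts with a tool name and carries contact details does not match the pattern, so the two example headers from the recommendations above pass the check.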

Browser-based applications by default use the UA string of the browser hosting them. These are currently not considered to be generic agents. In the future we may add support for identifying browser-based applications using the Api-User-Agent header, but it is not supported at present. If you are implementing a browser-based application that is likely to issue a significant number of SPARQL queries, consider setting a distinctive UA string as well (the Fetch specification no longer lists User-Agent as a forbidden header, although browser support for overriding it varies).

Examples

TODO: EXAMPLES

PHP/curl

curl_setopt($ch, CURLOPT_USERAGENT, 'CoolToolName/1.1 (https://example.org/CoolTool/; CoolTool@example.org)');

Python/pywikibot

Ensure you are using the latest checkout of pywikibot, then add this to user-config.py:

user_agent_description = "https://example.org/CoolTool/; CoolTool@example.org"

See more details in the pywikibot docs.