User:DrTrigonBot/Subster

DrTrigonBot Subster
This is the configuration/template page belonging to Category:DrTrigonBot:DrTrigonBot/Subster and labels DrTrigonBot:DrTrigonBot/Subster:???.

These items will be updated automatically and regularly (daily) with external text from {{{url}}}. The text is inserted between the <!--SUBSTER-abc-->...<!--SUBSTER-abc--> tags. All inserted text is linked to labels as Property:P370 unless specified otherwise. (This is done by DrTrigonBot.)

Documentation

If this template is included on a page, DrTrigonBot modifies the page content according to the given criteria and inserts text from arbitrary sources (if they have changed). In case of problems, please leave a note on this page (in English or German, please). This template was derived from w:en:Template:Auto archiving notice and simplified.

All (plain) text sources are supported, as are HTML sources via regex and BeautifulSoup (and thus, partially, XML) and RSS sources (through online RSS2HTML conversion, or alternatively with feed2html and Universal Feed Parser on the TS). These sources may also be ZIP compressed. Finally, pure Wikipedia sources can be used as well.

One thing to keep in mind when using external sources is copyright: the external page used as a source has to be either available under a free license, as is the case for Identi.ca and blog.wikimedia.de, or the user has to show plausibly that they are the author of the content used from there. The choice has to be made carefully (e.g. avoid pages containing lots of advertising).

Usage

regex mode (default)

{{User:DrTrigonBot/Subster
|url=...
|regex=...
|value=abc
}}
...
<!--SUBSTER-abc--><!--SUBSTER-abc-->
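To illustrate what "inserting text between the tags" means, here is a minimal Python sketch. It is not the bot's actual code; the function name `substitute` and the sample page text are made up for illustration, and the real bot may handle edge cases differently.

```python
import re


def substitute(page_text, value, new_text):
    """Replace whatever sits between a pair of <!--SUBSTER-value--> tags.

    Hypothetical helper for illustration only, not DrTrigonBot's code.
    """
    tag = "<!--SUBSTER-%s-->" % value
    # Non-greedy match so each opening tag pairs with the nearest closing tag.
    pattern = re.compile(re.escape(tag) + r".*?" + re.escape(tag), re.DOTALL)
    # A function replacement avoids backslash interpretation in new_text.
    return pattern.sub(lambda match: tag + new_text + tag, page_text)


page = "Latest: <!--SUBSTER-abc-->old<!--SUBSTER-abc--> (updated daily)"
result = substitute(page, "abc", "new text")
print(result)
```

Because the tags themselves are kept, the same page can be updated again on the next run.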

Beautiful-Soup mode

{{User:DrTrigonBot/Subster
|url=...
|beautifulsoup=True
}}
...
<!--SUBSTER-BS:body--><!--SUBSTER-BS:/-->

simple mode

{{User:DrTrigonBot/Subster
|simple={{xyz|...}}
|value=abc
}}
...
<!--SUBSTER-abc--><!--SUBSTER-abc-->

where the template (e.g. xyz) has the following format:

{{xyz
|url=...
|regex=...
}}

Here arbitrary variables (e.g. anything known from w:en:Help:Magic words#Parser functions) can be used.

Parameters

  • url: Webpage that serves as the data/text source to read from. By using 'mail://[address]' here, e-mail can also be used as a source; see the Mail section for details.

optional:

  • wiki: Whether to use external text from an arbitrary URL (False) or the internal text of a Wikipedia page (True) as the source (default: False).
    • expandtemplates: Fully expand or resolve all templates in the wiki text (like subst:). Can only be used in combination with wiki=True and has no effect otherwise (default: False).
  • zip: The source given by the URL is ZIP compressed. If set (True or any number greater than 0), the first file in the archive (or the one selected by the number) is decompressed and used (default: False).
  • xlsx: The source given by the URL is in Excel format. The name of the sheet to export has to be given here (default: False).
  • cron: Schedule on which the bot should process this page. The entry has to be given in w:en:cron format but without the minute and hour fields, thus: '[day of month] [month] [day of week]' (default: * * *).
  • Show:
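As a rough illustration of the three-field cron value, the following Python sketch checks whether a given date is covered by an entry. It is a simplification under stated assumptions: only '*' and comma-separated integers are handled (no ranges or steps), all three fields must match, Sunday is written as 0, and the names `field_matches` and `is_due` are hypothetical. The bot's real scheduler may behave differently.

```python
from datetime import date


def field_matches(field, value):
    """True if a cron field ('*' or comma-separated integers) covers value."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def is_due(cron, day):
    """Check a '[day of month] [month] [day of week]' entry against a date.

    Simplified, hypothetical check: every field has to match.
    """
    day_of_month, month, day_of_week = cron.split()
    # cron counts Sunday as 0; isoweekday() gives Monday=1 .. Sunday=7.
    weekday = day.isoweekday() % 7
    return (field_matches(day_of_month, day.day)
            and field_matches(month, day.month)
            and field_matches(day_of_week, weekday))


print(is_due("* * *", date(2024, 1, 1)))   # default: every day
print(is_due("1,15 * *", date(2024, 1, 2)))  # only on the 1st and 15th
```

With this reading, '* * 1' would mean "every Monday" and '1 * *' "on the first day of every month".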

regex mode (default)

  • regex: Regular expression used to extract the text from the webpage. Use '(.*)' or '(.*?)' to mark the part of the text to extract (the regex can be tested and confirmed with the Python Regex Tool).
  • value: Description or label of the local tag into which the text will be inserted.
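The difference between '(.*)' and '(.*?)' can be seen in a short Python sketch. The helper `extract` and the sample HTML are assumptions made for illustration; in particular, whether the bot applies `re.DOTALL` or other flags is not stated here.

```python
import re


def extract(source_text, regex):
    """Return the first capture group of regex in source_text, or None.

    Hypothetical helper; re.DOTALL is an assumption, not confirmed bot behavior.
    """
    match = re.search(regex, source_text, re.DOTALL)
    if match is None:
        return None
    return match.group(1)


html = "<b>first</b> and <b>second</b>"
lazy = extract(html, r"<b>(.*?)</b>")   # stops at the nearest closing tag
greedy = extract(html, r"<b>(.*)</b>")  # runs to the last closing tag
print(lazy)
print(greedy)
```

The non-greedy form '(.*?)' is usually the safer choice when the surrounding markup repeats on the page.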

optional:

  • count: Number of text insertions into local tags to perform.
  • postproc: Post-processing of the text using one of several methods:
    • ('formatedlist', regex, '* [[%s]]'): The extracted text is processed again with a regular expression, and a list of all resulting matches, formatted as Wikipedia links (with [[...]]), is returned (in wiki format).
    • ('formatedlist_frommatrix', regex, format, cols, head, check): Intended especially for big tables (such as CSV), with the option to filter entries according to certain criteria (check).
    • ('replacetext', '<.*?>', 'abc'): The extracted text is filtered again with a regular expression and every match is replaced; e.g. here all contained HTML tags are replaced by 'abc'.
    • ('chain', postprocs): Use multiple postproc functions in sequence.
    • for more, see User:DrTrigon/DrTrigonBot/subster-postproc.css
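The postproc methods above can be pictured with the following Python sketch. These functions only mimic the described behavior; they are not taken from the bot, and the dispatch table `POSTPROCS` and the tuple layout passed to `chain` are assumptions.

```python
import re


def replacetext(text, regex, replacement):
    """Replace every match of regex in text (sketch of 'replacetext')."""
    return re.sub(regex, replacement, text)


def formatedlist(text, regex, fmt):
    """Format every match of regex with fmt, one per line (sketch of 'formatedlist')."""
    return "\n".join(fmt % match for match in re.findall(regex, text))


# Hypothetical lookup table used by the chain sketch below.
POSTPROCS = {"replacetext": replacetext, "formatedlist": formatedlist}


def chain(text, postprocs):
    """Apply several (name, arg, ...) postproc tuples in sequence."""
    for name, *args in postprocs:
        text = POSTPROCS[name](text, *args)
    return text


source = "<li><i>Berlin</i></li><li>Zurich</li>"
steps = [
    ("formatedlist", r"<li>(.*?)</li>", "* [[%s]]"),
    ("replacetext", "<.*?>", ""),
]
wikitext = chain(source, steps)
print(wikitext)
```

Here the list is built first and the leftover HTML tags are stripped in a second step, which is the kind of sequence 'chain' is meant for.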

Beautiful Soup mode

  • beautifulsoup: Replace all Beautiful Soup tags, ignoring all other parameters and settings except 'url'. Only Beautiful Soup tags belonging to a single URL can be processed per page. For help with the syntax, see the Beautiful Soup Documentation (default: False).

MagicWords mode

  • magicwords_only: Replace Magic Words only; all other parameters, settings and templates are ignored, including 'url' (default: False).

Simple mode

  • simple: Can be combined with all other modes and is basically a simplification or abbreviation that hides complex settings from inexperienced users. It also enables dynamic parameter values and can therefore be used together with functions from w:en:Help:Magic words#Parser functions. To use it, move the desired parameters into a template of their own and pass this template to 'simple' (an example).

Simulation

Because the bot is not yet running continuously (just daily), this is the simplest way to test and check the settings and parameters of the template and to choose them properly. The tool in fact calls the bot code directly, so it is a simulation using the real bot (the production environment).

DrTrigonBot subster simulation panel

Mail

The bot is also able to receive mails. These are stored and used as a data source. The mails can also be viewed in order to see what information can be extracted and how.

To access received mails, the following syntax has to be used in the url parameter:

mail://sender@server.bla/all

for the whole mail text (body) or '/attachment' for attachments.
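How such a mail URL breaks down into sender and requested part can be sketched with Python's standard `urllib.parse`. The function name `parse_mail_url` is hypothetical and the bot's own parsing may differ; the sketch only illustrates the syntax described above.

```python
from urllib.parse import urlsplit


def parse_mail_url(url):
    """Split 'mail://sender@server/part' into (sender, part).

    Hypothetical helper for illustration, not the bot's implementation.
    """
    parts = urlsplit(url)
    if parts.scheme != "mail":
        raise ValueError("not a mail:// URL: %r" % url)
    sender = parts.netloc              # e.g. 'sender@server.bla'
    part = parts.path.lstrip("/")      # e.g. 'all' or 'attachment'
    return sender, part


body_request = parse_mail_url("mail://sender@server.bla/all")
attachment_request = parse_mail_url("mail://sender@server.bla/attachment")
print(body_request)
print(attachment_request)
```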

DrTrigonBot subster mail queue: drtrigon+subster@toolserver.org

Examples

My apologies for not having translated this yet, but you can have a look at w:de:Benutzer:DrTrigonBot/Subster/Doku#Beispiele.
* STEP 1: as in dewiki, enwiki, ... a page (preferably a template) has to be set up with THIS template and the bot
* STEP 2: place the corresponding aliases on the items to update/link
* STEP 3: enjoy ;)

more

Dynamical/fast updates (irc channel daemon)

Part of the bot runs permanently (as a daemon) and reacts to certain edits by users.

  • direct update after edit on a page containing or using the template
  • special jobs on dewiki
This part of the bot has not been activated on Wikidata yet!

See also