Wikidata:Requests for permissions/Bot/FischBot 5
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 19:55, 7 January 2016 (UTC)
FischBot
FischBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Pyfisch (talk • contribs • logs)
Task/s: Import information about Welsh books from the wikipedia:cy template Gwybodlen llyfr (infobox book).
Code: https://github.com/pyfisch/fischbot/blob/master/welshbooks/script.py (export script)
Function details: The bot will be run in two steps. First, the bot will try to extract all information useful for Wikidata from Gwybodlen llyfr (infobox book) into a CSV file. I will then check this list and import the data into Wikidata. The bot will not overwrite existing statements.
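As an illustration only (this is not the code in script.py), the two-step flow described above could be sketched as follows. The column layout (an `item` column plus one column per property ID), the function names, and the `existing` lookup are all assumptions made for the example.

```python
import csv
import io


def write_export(rows, fieldnames):
    """Step 1: serialise extracted infobox values to CSV text for manual review."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def plan_import(csv_text, existing):
    """Step 2: list the (item, property, value) triples that are safe to add.

    `existing` is a hypothetical mapping from an item ID to the set of
    property IDs the item already carries. Any property already present
    is skipped, so existing statements are never overwritten.
    """
    planned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = row["item"]
        for prop, value in row.items():
            if prop == "item" or not value:
                continue  # skip the key column and empty cells
            if prop in existing.get(item, set()):
                continue  # the item already has this property
            planned.append((item, prop, value))
    return planned


# "Q_EXAMPLE" is a placeholder, not a real item ID.
text = write_export(
    [{"item": "Q_EXAMPLE", "P577": "1995", "P1104": "208"}],
    ["item", "P577", "P1104"],
)
todo = plan_import(text, {"Q_EXAMPLE": {"P577"}})
```

In this sketch the manual check happens between the two calls: the CSV text is what would be reviewed before anything is written to Wikidata.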
The import was suggested by User:Thryduulf at Wikidata:Bot requests#Welsh books. Thanks to User:Ham II for his help.
Exported data:
- information about the book
  - author (P50): I will only import it if there is a single article linked from the awdur field in the template that exists and has a Wikidata item. This won't import data for unlinked authors, authors without a wiki page, or books with multiple authors.
  - original language of film or TV show (P364): Sourced from pages linked in iaith. The case-insensitive plain-text values "cymraeg" (Welsh) and "saesneg" (English) are also used.
  - country of origin (P495): Sourced from pages linked in gwlad. The plain-text value "cymru" (case-insensitive) for Wales is also recognized.
  - publication date (P577): Only bare years, or a day, Welsh month name and year, are recognized.
  - number of pages (P1104): Only plain numbers are imported; a suffix " tudalen" is optional.
  - publisher (P123): a single value linked from cyhoeddwr, or one of these special strings, which are also recognized:
    - gwasg carreg gwalch: Gwasg Carreg Gwalch (Q3405970)
    - gwasg gomer: Gomer Press (Q3042717)
    - y lolfa: Y Lolfa (Q3404425)
    - gwasg prifysgol cymru: University of Wales Press (Q7896557)
    - gwasg gee: Gwasg Gee (Q13128815)
    - cyhoeddiadau barddas: Cyhoeddiadau Barddas (Q13127628)
    - gwasg y bwthyn: Gwasg y Bwthyn (Q13128820)
    - gwasg gwynedd: Gwasg Gwynedd (Q13128814)
  - editor (P98): a single value linked from golygydd.
  - genre (P136): Genres are included in the template, but they are usually not linked, and often multiple genres are given. It would be possible to filter some common values like novel or drama from the text.
- authority control
  - ISBN-13 (P212) and ISBN-10 (P957): For most books the template contains at least one of the two numbers. The number is formatted and checked for well-formedness with isbnlib.
  - OCLC control number (P243): A few books have this identifier; the script checks that it contains only digits.
  - Library of Congress Classification (P1149): There is a field called cyngres in the template, but it is virtually unused. I won't import it.
  - Dewey Decimal Classification (P1036): There is a field called deway, but it is not used. I won't import it.
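The plain-text rules in the list above can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the logic of script.py: all function names are hypothetical, the publisher lookup is assumed to be case-insensitive, years are assumed to have four digits, and the ISBN-13 check is a standard-library checksum test standing in for isbnlib.

```python
import re

# Publisher strings and item IDs exactly as listed in the request.
PUBLISHERS = {
    "gwasg carreg gwalch": "Q3405970",
    "gwasg gomer": "Q3042717",
    "y lolfa": "Q3404425",
    "gwasg prifysgol cymru": "Q7896557",
    "gwasg gee": "Q13128815",
    "cyhoeddiadau barddas": "Q13127628",
    "gwasg y bwthyn": "Q13128820",
    "gwasg gwynedd": "Q13128814",
}

# Welsh month names, January to December, lowercased for matching.
WELSH_MONTHS = [
    "ionawr", "chwefror", "mawrth", "ebrill", "mai", "mehefin",
    "gorffennaf", "awst", "medi", "hydref", "tachwedd", "rhagfyr",
]


def parse_pages(text):
    """Return the page count for '208' or '208 tudalen', else None."""
    m = re.fullmatch(r"([0-9]+)( tudalen)?", text.strip())
    return int(m.group(1)) if m else None


def parse_date(text):
    """Return (year, month, day); month and day are None for a bare year.

    Assumption: four-digit years. The day is not range-checked here.
    """
    text = text.strip()
    if re.fullmatch(r"[0-9]{4}", text):
        return (int(text), None, None)
    m = re.fullmatch(r"([0-9]{1,2}) (\w+) ([0-9]{4})", text)
    if m and m.group(2).lower() in WELSH_MONTHS:
        month = WELSH_MONTHS.index(m.group(2).lower()) + 1
        return (int(m.group(3)), month, int(m.group(1)))
    return None


def lookup_publisher(text):
    """Map a special publisher string to its item ID (assumed case-insensitive)."""
    return PUBLISHERS.get(text.strip().lower())


def parse_oclc(text):
    """Accept an OCLC control number only if it consists solely of digits."""
    text = text.strip()
    return text if re.fullmatch(r"[0-9]+", text) else None


def is_valid_isbn13(text):
    """Checksum test for ISBN-13: weights 1 and 3 alternate, sum must be 0 mod 10."""
    digits = text.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

Each helper returns `None` (or `False`) for anything it does not recognize, which matches the conservative approach of the request: values that do not fit a known pattern are simply not exported.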
The statement instance of (P31): book (Q571) will be added to all books. All items will be sourced with imported from Wikimedia project (P143): Welsh Wikipedia (Q848525).
If Wikipedia is not considered a good enough source for these claims, I propose adding only instance of (P31), ISBN-13 (P212) and OCLC control number (P243), which link the items to other data sources so that future users and bots can more easily find further data about the books. The problem with Welsh books is that in many cases there are no other free sources. To limit the spread of unchecked data to other areas, statements could be imported only for books in the Welsh language or from Wales.
I have created a sample of the generated output. --Pyfisch (talk) 10:53, 30 December 2015 (UTC)
- The full output: https://github.com/pyfisch/fischbot/blob/master/welshbooks/2015-12-30-run1.csv --Pyfisch (talk) 11:26, 31 December 2015 (UTC)